Obstacle Avoidance for Blind People Using the YOLO Algorithm, Darknet, and GTTS

The obstacle avoidance system uses a YOLO model to detect obstacles in real time and provide spatial information about their location and size. This information is then passed to the GTTS system, which generates audio alerts notifying the user of the presence of an obstacle and its location. The alerts are rendered in a natural-sounding voice so that the information is clear and concise. To evaluate the effectiveness of the proposed system, we conducted experiments with visually impaired individuals in real-world scenarios. The results show that the system significantly improves obstacle detection and avoidance performance compared to traditional methods, and participants reported high levels of satisfaction with its performance and ease of use.


Introduction
Obstacle avoidance is a critical challenge for blind individuals, as it affects their ability to navigate the world safely and independently. With advancements in computer vision and natural language processing, technologies like YOLO and GTTS offer promising solutions for improving the lives of the visually impaired. YOLO is a cutting-edge object detection system that can recognize and locate objects in real-time video feeds. GTTS (Google Text-to-Speech) is a powerful text-to-speech engine that converts text into natural-sounding speech output. By combining the capabilities of these two technologies, it is possible to build a system that detects obstacles in the environment and verbally communicates their presence to the user in real time, enabling the user to avoid potential hazards and navigate safely. This combination has the potential to revolutionize the way blind individuals interact with their surroundings and to enhance their independence and mobility.
YOLO is a popular object detection approach in computer vision, known for its speed and accuracy in real-time detection tasks. The main steps in object detection with YOLO are as follows. Input resizing: the input image is resized to a fixed size (e.g., 416x416). Anchor boxes: YOLO uses anchor boxes, predefined boxes of different sizes and aspect ratios, to predict the location and size of objects via bounding-box regression. [2] Feature extraction: YOLO uses a convolutional neural network (CNN) to extract features from the input image. GTTS (Google Text-to-Speech) is a text-to-speech service provided by Google that converts text into spoken audio. While GTTS can be used in many applications, it is not directly related to obstacle avoidance.
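The resizing step above can be sketched as follows. This is a minimal illustration (not the paper's implementation) of letterboxing an image to the fixed 416x416 input size YOLO expects; it uses only NumPy with nearest-neighbor sampling, and the function name `letterbox` is our own.

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 416) -> np.ndarray:
    """Resize img (H, W, 3) to (size, size, 3), preserving aspect ratio
    by padding the shorter side with gray (128), as YOLO preprocessing does."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor index maps for the scaled region
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[ys[:, None], xs[None, :]]
    canvas = np.full((size, size, 3), 128, dtype=img.dtype)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

In practice a bilinear resize (e.g., via OpenCV) is used rather than nearest-neighbor, but the letterbox geometry is the same.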
Obstacle avoidance typically involves using sensors such as cameras, LiDAR, or ultrasonic sensors to detect obstacles in the environment and take appropriate actions to avoid them. For example, in a self-driving car, obstacle detection and avoidance algorithms detect obstacles in the environment and control the vehicle's speed, direction, and braking to avoid collisions. While text-to-speech services like GTTS are not directly related to obstacle avoidance, they can be used in conjunction with obstacle avoidance systems to provide audio feedback to the user. For example, a self-driving car could use GTTS to issue audio alerts to passengers when an obstacle is detected, or a blind person could use GTTS to receive audio feedback about obstacles in their path, such as people and cars. A trained model can likewise be integrated into a robot's obstacle avoidance system, allowing it to navigate safely and autonomously through complex environments.

Methodologies
In this project we propose a methodology that uses COCO as the input dataset, [21] the YOLO algorithm to perform object detection, and GTTS to convey the detected obstacles to the blind person. The following procedure is used to perform obstacle avoidance for blind persons.

A. Data Collection:
Collect a diverse set of images of the environment where the obstacle avoidance system will be used. The images should be taken from different angles, distances, and lighting conditions.

B. Annotation:
Annotate the images using the Coco dataset format. The Coco dataset format provides a standard format for labeling objects in an image. The annotation should include the coordinates of the bounding boxes around the objects in the images.
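As an illustration of the COCO annotation format described above, a single labeled image can be represented as follows; the file name, IDs, and box coordinates are made up for the example.

```python
import json

# Minimal COCO-style annotation: one image with one labeled bounding box.
coco = {
    "images": [
        {"id": 1, "file_name": "street_001.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,             # refers to "person" below
            "bbox": [120, 200, 60, 150],  # [x, y, width, height] in pixels
            "area": 60 * 150,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
}

print(json.dumps(coco, indent=2)[:60])
```

Each annotation links an `image_id` to a `category_id` and a bounding box, which is exactly the information YOLO training consumes.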

C. Training:
Train the YOLO v3 model using the annotated data. The YOLO v3 model is a state-of-the-art object detection model that can detect multiple objects in an image. Training involves configuring the YOLO v3 model with hyperparameters, feeding the annotated data, and optimizing the model's weights and biases.
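Typical training hyperparameters for YOLO v3 in Darknet are set in the model's `.cfg` file. The values below are illustrative defaults from the stock YOLO v3 configuration, not the exact settings used in this work:

```ini
[net]
batch=64
subdivisions=16
width=416
height=416
channels=3
learning_rate=0.001
momentum=0.9
decay=0.0005
max_batches=500200
steps=400000,450000
scales=.1,.1
```

`batch` and `subdivisions` trade memory for gradient quality, while `steps`/`scales` schedule learning-rate decay late in training.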

D. Integration:
Integrate the trained YOLO v3 model with Darknet. Darknet is an open-source framework for building and running deep neural networks. The integration involves modifying the Darknet configuration file to include the YOLO v3 model and configuring the model to use the Coco dataset.

E. Testing:
Test the obstacle detection system by running it in a real-time environment. The system should detect obstacles in the images captured by a camera and generate alerts to the user when an obstacle is detected. The performance of the system should be evaluated based on metrics such as accuracy, precision, and recall.
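The evaluation metrics mentioned above can be computed from the counts of true positives (TP), false positives (FP), and false negatives (FN); a minimal sketch:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: 80 obstacles correctly detected, 10 false alarms, 20 missed.
p, r = precision_recall(tp=80, fp=10, fn=20)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.889 recall=0.800
```

High precision means few false alarms (spurious audio warnings), while high recall means few missed obstacles; for a safety system, recall is usually the more critical of the two.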

F. Integration with GTTS:
Integrate GTTS to generate audio alerts to the user whenever an obstacle is detected. The GTTS package is a Python library used to convert text to speech using Google's Text-to-Speech API. The system should generate alerts based on the location of the detected obstacle and the distance to the obstacle.
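Step F can be sketched with the gTTS library as below. The `alert_text` helper and its position/distance wording are our own illustration; only the library calls (`gTTS(...)`, `.save(...)`) come from the gtts package.

```python
def alert_text(label: str, position: str, distance_m: float) -> str:
    """Compose a short spoken alert for a detected obstacle."""
    return f"Warning: {label} {position}, about {distance_m:.0f} meters ahead."

msg = alert_text("person", "on your left", 3.2)
print(msg)  # Warning: person on your left, about 3 meters ahead.

# Requires the gtts package and network access, so it is guarded here.
try:
    from gtts import gTTS
    gTTS(text=msg, lang="en").save("alert.mp3")  # write the audio alert
except Exception:
    pass  # gtts unavailable or offline; text alert only
```

The saved MP3 can then be played through the user's earphones by any audio backend.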
Obstacle avoidance is a critical task in robotics and autonomous systems, where the goal is to navigate an environment while avoiding obstacles. There are several methodologies for obstacle avoidance, and the most appropriate approach depends on the specific application and the characteristics of the environment.
One approach is the reactive method, where the robot reacts to the presence of obstacles in its immediate surroundings. This method involves using sensors such as ultrasonic, infrared, or lidar to detect obstacles and then taking appropriate actions to avoid them. The advantage of this method is that it can be implemented in real-time and does not require a priori knowledge of the environment. However, it may not be suitable for complex environments with many obstacles or where long-term planning is necessary.
Another approach is the deliberative method, where the robot plans a path around obstacles before it starts moving. [19] This method involves using maps or models of the environment to plan a collision-free path. The advantage of this method is that it can handle complex environments with many obstacles and plan long-term paths. However, it may not be suitable for dynamic environments where obstacles can move or change.
A hybrid approach combines reactive and deliberative methods to take advantage of their respective strengths. For example, the robot can use a reactive method to navigate in a local area, while a deliberative method is used to plan a path to the next goal. The advantage of this method is that it can handle both static and dynamic environments.
In conclusion, the choice of methodology for obstacle avoidance depends on the specific application and the characteristics of the environment. A reactive method is suitable for real-time obstacle avoidance, while a deliberative method is suitable for complex environments with many obstacles. A hybrid approach can take advantage of the strengths of both methods. Related work includes "Real-time Object Detection and Tracking System for Autonomous Robots using YOLO and COCO Dataset" by M. A. Shafique et al., which presents a real-time object detection and tracking system for autonomous robots using YOLO and the COCO dataset; the authors combine YOLO with Kalman filtering to track objects and demonstrate the effectiveness of their system in real-world experiments. [3] "Object Detection and Obstacle Avoidance for Autonomous Navigation using YOLO and ROS" by R. K. Mok et al. proposes an object detection and obstacle avoidance system for autonomous navigation using YOLO and ROS (Robot Operating System); the authors train their model on the COCO dataset and demonstrate the effectiveness of their system in real-world experiments.

Proposed System
Here, a camera is used to capture the real-time environment, and the video stream is passed to the YOLO v3 object detection algorithm. [11] YOLO v3 (You Only Look Once version 3) is a deep learning-based object detection algorithm that can identify various objects in a given image or video frame. It uses the COCO (Common Objects in Context) dataset, which contains 80 different object categories.
The next stage is to localize the obstacles after YOLO v3 has identified them. This can be done with the Darknet neural network framework, an open-source toolkit written in C and CUDA. Darknet processes the output from YOLO v3 and indicates where the obstacles are in the video stream.
Finally, the detected obstacles and their location information can be passed to the GTTS (Google Text-to-Speech) API, which can convert the text into speech output. The speech output can then be played back to the user as a warning or alert regarding the obstacles in the environment.
In summary, the architecture diagram illustrates how real-time obstacle avoidance can be achieved using YOLOv3, COCO, Darknet, and GTTS. The camera captures the environment, YOLO v3 detects the obstacles, Darknet localizes the detected obstacles, and GTTS converts the text into speech output for the user.
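The pipeline summarized above can be expressed as a small loop. Detection and speech are injected as callables so the sketch stays framework-agnostic; the function and callback names are ours, not a Darknet or gTTS API.

```python
from typing import Callable, Iterable

# A detection: (class label, confidence, (x, y, w, h) box in pixels).
Detection = tuple[str, float, tuple[int, int, int, int]]

def avoidance_loop(frames: Iterable,
                   detect: Callable[[object], list[Detection]],
                   speak: Callable[[str], None],
                   min_conf: float = 0.5) -> int:
    """For each camera frame, run the detector and voice confident obstacles.
    Returns the number of alerts issued."""
    alerts = 0
    for frame in frames:
        for label, conf, (x, y, w, h) in detect(frame):
            if conf < min_conf:
                continue  # suppress low-confidence detections
            side = "left" if x + w / 2 < 208 else "right"  # 416-wide frame
            speak(f"{label} on your {side}")
            alerts += 1
    return alerts

# Stub detector standing in for YOLO v3 + Darknet:
fake = lambda frame: [("car", 0.9, (300, 100, 80, 60)),
                      ("dog", 0.3, (10, 10, 20, 20))]
spoken = []
n = avoidance_loop(frames=[None], detect=fake, speak=spoken.append)
print(n, spoken)  # 1 ['car on your right']
```

In a real deployment `frames` would come from an OpenCV capture, `detect` would wrap the Darknet/YOLO v3 inference call, and `speak` would hand the text to gTTS.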

Taxonomy
In this section, [21] we categorize the available literature on obstacle avoidance for the blind based on YOLO v3 and Darknet. Here is a possible taxonomy for obstacle avoidance using YOLO, COCO, Darknet, and GTTS:

Object Detection
YOLO is a real-time object detection system that can detect objects in images or video frames. [17] It uses a single neural network to predict the bounding boxes and class probabilities of objects in an image. The COCO (Common Objects in Context) dataset is a large object detection, segmentation, and captioning dataset consisting of approximately 330,000 images with 2.5 million object instances labeled across 80 distinct categories. It can be used to train and test YOLO and other object detection algorithms.
Darknet is an open-source neural network framework that can be used to train and deploy object detection models, including YOLO. It is written in C and CUDA, making it fast and efficient on both CPUs and GPUs. (ITM Web of Conferences 56, 05010 (2023), https://doi.org/10.1051/itmconf/20235605010, ICDSAC 2023)

Path planning and Obstacle avoidance
Once objects have been detected in the environment, [20] a path planning and obstacle avoidance algorithm can be used to navigate around them. This can involve creating a map of the environment, planning a collision-free path, and controlling the robot's motion.
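As a toy illustration of the deliberative planning described above, a breadth-first search over an occupancy grid finds a shortest collision-free path; the grid and helper below are our own minimal example, not a production planner.

```python
from collections import deque

def plan_path(grid, start, goal):
    """BFS over a 2D occupancy grid (1 = obstacle). Returns a list of
    (row, col) cells from start to goal, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:  # reconstruct the path by walking back
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = plan_path(grid, (0, 0), (0, 2))
print(path)  # routes around the wall in column 1
```

BFS guarantees the shortest path in step count; real planners use weighted variants (A*, D*) over metric maps, but the obstacle-avoiding structure is the same.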

Text-to-speech output
GTTS (Google Text-to-Speech) is a Python library that can be used to generate speech from text. It can give a robot audio output that indicates the presence of obstacles or provides other instructions to the user.
In summary, YOLO and COCO can be used for object detection, Darknet can be used for training and deploying object detection models, and a path planning and obstacle avoidance algorithm can be used to navigate around obstacles. Finally, GTTS can be used to provide audio output to the user.

Loss Function
Even though the final predictions are known in advance, it is still instructive to understand how the weights were adjusted [18] so that the model's loss function was lowered over the course of training. Broken down, the function appears complex but is actually rather straightforward. Three candidate losses are considered.

L2 Loss:
The L2 loss, also known as mean squared error (MSE), measures the squared distance between the predicted trajectory and the environmental obstacles.

Huber Loss:
The Huber loss is a robust loss function that is less sensitive to outliers than the L2 loss.

Smooth L1 Loss:
The Smooth L1 loss is similar to the Huber loss and is also less sensitive to outliers than the L2 loss.
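The three losses above can be written out directly. A minimal NumPy sketch over per-point residuals follows, where `delta` is the Huber threshold and Smooth L1 is the Huber loss with `delta = 1`:

```python
import numpy as np

def l2_loss(err: np.ndarray) -> float:
    """Mean squared error over the residuals."""
    return float(np.mean(err ** 2))

def huber_loss(err: np.ndarray, delta: float = 1.0) -> float:
    """Quadratic near zero, linear for |err| > delta (robust to outliers)."""
    a = np.abs(err)
    per_point = np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))
    return float(np.mean(per_point))

def smooth_l1_loss(err: np.ndarray) -> float:
    """Smooth L1 is the Huber loss with delta = 1."""
    return huber_loss(err, delta=1.0)

err = np.array([0.5, -0.5, 4.0])  # one outlier
print(l2_loss(err))     # 5.5  -- dominated by the outlier
print(huber_loss(err))  # 1.25 -- outlier penalized only linearly
```

The example shows why the robust losses are preferred: a single large residual inflates the L2 loss quadratically but the Huber/Smooth L1 loss only linearly.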

Future Works
Some future scopes for obstacle avoidance using these technologies are:

Real-time Obstacle Detection:
The YOLO algorithm is fast and accurate, making it suitable for real-time obstacle detection. It can detect objects in images and videos in real time and provide instant feedback to the system for obstacle avoidance.

Multi-class Object Detection:
The COCO dataset provides a wide range of object classes, which can help in the detection of multiple types of obstacles. This can help in making the system more versatile and capable of detecting and avoiding a variety of obstacles.

Conclusion
Obstacle avoidance is an important application of computer vision and artificial intelligence. [22] In this context, YOLO, COCO, Darknet, and GTTS are all tools that can be used to develop effective obstacle avoidance systems.
YOLO (You Only Look Once) is a real-time object detection system that uses deep neural networks to detect objects in images or video frames. COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset that can be used to train object detection models such as YOLO. Darknet is an open-source neural network framework that can be used to build and train neural networks for a variety of tasks, including object detection. Finally, GTTS (Google Text-to-Speech) is a tool that converts text to speech, which is useful for generating audio warnings in obstacle avoidance systems.
By combining these tools, [20] developers can create effective obstacle avoidance systems that can detect and avoid obstacles in real time. YOLO can be used to detect obstacles in images or video frames, while Darknet can be used to train and fine-tune the neural network to improve accuracy. COCO can be used to provide a large dataset of objects to train the neural network on. Finally, GTTS can be used to generate audio warnings to alert the user when an obstacle is detected.
Overall, obstacle avoidance using YOLO, COCO, Darknet, and GTTS is a promising application of computer vision and artificial intelligence, with the potential to improve safety and autonomy in a variety of contexts.