Design of a Semi-Autonomous Vehicle Using Reinforcement Machine Learning for the Indian Infrastructure
1*Rugved Naik; 2Omkar Jadhav; 3Vaibhav Yelam; 4Soham Rajopadhye

1Dept. of Mech. Engg., Marathwada Mitra Mandal's College of Engg., Pune, Maharashtra, India
2Design Engineer, V.R. Coatings Pvt. Ltd., Pune, Maharashtra, India
3System Engineer, Tata Consultancy Services (TCS), Pune, Maharashtra, India
4Operations and Planning Dept., Tata Motors, Pune, Maharashtra, India
Abstract:- The rapid growth in the transportation sector demands innovative solutions to address safety, efficiency, and environmental challenges, especially in countries with complex and dynamic road infrastructures like India. This research explores the design of a semi-autonomous vehicle tailored for Indian road conditions using reinforcement learning (RL) techniques. The unique characteristics of Indian infrastructure, including mixed traffic, unpredictable behavior of pedestrians, varying road conditions, and inconsistent adherence to traffic regulations, pose challenges to the implementation of autonomous driving technologies. This paper proposes an RL-based approach to navigate these challenges and discusses the potential design, algorithmic frameworks, practical case studies, and implications.

Keywords:- Arduino-Based Automation, Autonomous Driving, Obstacle Avoidance, Obstacle Detection, Real-Time Navigation, Reinforcement Learning, Sensor Fusion, Semi-Autonomous Vehicle

I. INTRODUCTION

Autonomous driving technology has seen significant advancements in recent years, driven by improvements in computational capability and machine learning techniques. However, these advancements have mostly been optimized for structured environments with well-defined road systems, such as those found in developed countries. The Indian road network is vastly different, characterized by unmarked lanes, heavy congestion, non-linear traffic patterns, and frequent obstacles, including animals and pedestrians. These conditions necessitate a tailored approach to semi-autonomous vehicle (SAV) technology.

This paper aims to design a semi-autonomous vehicle system using reinforcement learning optimized for Indian infrastructure. By focusing on a semi-autonomous model, human intervention can be integrated to handle scenarios where full automation is infeasible due to the complexity of Indian roads.

II. PROBLEM STATEMENT

The Indian road infrastructure presents several unique challenges that require special consideration:

Mixed Traffic Conditions: The coexistence of motorbikes, auto-rickshaws, buses, pedestrians, and stray animals.
Unpredictable Human Behavior: Pedestrians and drivers often do not adhere to traffic rules.
Varying Road Conditions: Roads can range from well-paved highways to poorly maintained rural paths.
Environmental Factors: The system must account for frequent fog, rain, and dust conditions.

III. REINFORCEMENT LEARNING FOR SEMI-AUTONOMOUS DRIVING

Reinforcement learning (RL) is a subset of machine learning in which an agent learns optimal behaviors by interacting with its environment. The agent makes decisions based on a reward-punishment mechanism designed to maximize cumulative rewards. In the context of semi-autonomous driving, the RL agent would learn to make decisions regarding navigation, obstacle avoidance, and speed regulation based on real-time data inputs.

A. Markov Decision Process (MDP)

The driving task can be modeled as an MDP, where:

State (S): Includes the vehicle's position, speed, sensor readings, and road conditions.
Action (A): Steering, acceleration, braking, and signaling.
Reward (R): Designed to reinforce safe driving, such as staying in the correct lane, maintaining a safe distance, and minimizing abrupt maneuvers.
Transition (T): Represents the probabilities of moving from one state to another given an action.
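To make this formulation concrete, a minimal Python sketch of such an MDP is given below. The state fields, action bounds, and reward weights shown are illustrative assumptions, not the exact values used by the proposed system.

# Illustrative sketch only: a simplified MDP for the driving task.
# State fields, action bounds, and reward weights are hypothetical assumptions.
from dataclasses import dataclass

@dataclass
class DrivingState:
    lane_offset_m: float   # lateral offset from the lane centre (part of S)
    speed_mps: float       # current vehicle speed
    front_gap_m: float     # distance to the nearest obstacle ahead (sensor reading)
    road_quality: float    # 0 (broken surface) .. 1 (well paved)

@dataclass
class DrivingAction:
    steer: float           # -1 (full left) .. +1 (full right)  (part of A)
    accel: float           # -1 (full brake) .. +1 (full throttle)

def reward(state: DrivingState, action: DrivingAction) -> float:
    """Reward (R): encourage lane keeping, safe gaps, and smooth control."""
    r = 0.0
    r -= abs(state.lane_offset_m)                      # stay near the lane centre
    r -= 5.0 if state.front_gap_m < 2.0 else 0.0       # penalise unsafe gaps
    r -= 0.1 * abs(action.steer) + 0.1 * abs(action.accel)  # avoid abrupt maneuvers
    r += 0.05 * state.speed_mps                        # mild incentive to make progress
    return r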
IV. SYSTEM ARCHITECTURE AND DESIGN

The system architecture for a semi-autonomous vehicle tailored to Indian infrastructure is a multi-layered design that integrates perception, decision-making, and control modules with a human-in-the-loop mechanism. This architecture is designed to address the unique challenges posed by Indian roads, such as unstructured traffic, unpredictable pedestrian behavior, and inconsistent road quality.

A. Perception Module
The perception module is responsible for understanding the vehicle's surroundings using a combination of sensors. The module relies on three key subsystems:

Sensor Suite: LiDAR (Light Detection and Ranging): Provides 3D mapping for object detection, depth perception, and environment modeling.
Cameras: Capture images for lane detection, traffic sign recognition, and obstacle identification.
Ultrasonic Sensors: Useful for detecting nearby objects during low-speed maneuvers such as parking or navigating through congested streets.
Radar: Provides reliable distance measurements, especially in adverse weather conditions like fog and rain.
Sensor Fusion: A sensor fusion algorithm combines data from the sensor suite to create a coherent understanding of the vehicle's environment. Kalman Filters or Particle Filters are commonly used for this purpose to estimate the vehicle's state and nearby objects' positions (a minimal example is sketched after this list).
Environment Modeling: The data processed through sensor fusion is used to build a real-time map of the environment. This includes identifying lanes, road edges, dynamic obstacles (e.g., pedestrians, vehicles), and static objects (e.g., traffic signs, buildings).
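As an illustration of the fusion step, the sketch below implements a one-dimensional Kalman filter that merges two noisy range readings (e.g., ultrasonic and radar) into a single distance estimate. The process and measurement noise values are assumed for the example and are not calibrated sensor parameters.

# Minimal sketch: a 1-D Kalman filter fusing two noisy range readings
# (e.g., ultrasonic and radar) into one distance estimate.
# Noise values are illustrative assumptions, not calibrated parameters.
import numpy as np

class RangeFusionKF:
    def __init__(self, initial_range_m: float):
        self.x = np.array([initial_range_m, 0.0])  # state: [range, range-rate]
        self.P = np.eye(2) * 1.0                   # state covariance
        self.Q = np.diag([0.05, 0.1])              # process noise (assumed)

    def predict(self, dt: float):
        F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity model
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z_m: float, sensor_var: float):
        H = np.array([[1.0, 0.0]])                 # only the range is observed
        S = H @ self.P @ H.T + sensor_var
        K = self.P @ H.T / S                       # Kalman gain
        self.x = self.x + (K * (z_m - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P

kf = RangeFusionKF(initial_range_m=10.0)
kf.predict(dt=0.05)
kf.update(z_m=9.8, sensor_var=0.20)   # ultrasonic reading (noisier)
kf.update(z_m=9.9, sensor_var=0.05)   # radar reading (more precise)
print(f"fused range: {kf.x[0]:.2f} m")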
B. Decision-Making Module
The decision-making module is the core of the semi-autonomous system, where reinforcement learning algorithms play a vital role. The RL agent learns to make decisions in real time based on the current environment, predefined goals (like reaching a destination safely), and driving policies. The following algorithms are integral to this module:

Proximal Policy Optimization (PPO): PPO is well-suited for continuous action spaces like those encountered in driving scenarios. It ensures stability and performance while avoiding drastic updates, making it reliable for steering, speed regulation, and obstacle avoidance.
Deep Q-Networks (DQN) with Prioritized Experience Replay: For scenarios with discrete action spaces, such as choosing between different predefined driving maneuvers (e.g., overtaking or yielding), DQN is used. Prioritized experience replay ensures that more important experiences (e.g., near-crash situations) are replayed more frequently during training (see the sketch after this list).
Soft Actor-Critic (SAC): SAC is another algorithm that handles continuous control tasks by maximizing the expected reward while maintaining a certain level of entropy, enabling better exploration. This is particularly useful for situations where the system needs to explore unconventional driving strategies, like navigating narrow, congested roads.
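The sketch below illustrates the proportional-prioritization idea behind prioritized experience replay: transitions with larger temporal-difference (TD) error, such as near-crash events, are sampled more often. The buffer capacity and the exponent alpha are assumed hyperparameters, and the surrounding DQN training loop is omitted.

# Simplified sketch of proportional prioritized experience replay:
# transitions with larger TD error (e.g., near-crash events) are sampled
# more often. Capacity and alpha are illustrative hyperparameters.
import random

class PrioritizedReplayBuffer:
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        priority = (abs(td_error) + 1e-5) ** self.alpha
        if len(self.data) >= self.capacity:      # overwrite the oldest entry
            self.data.pop(0); self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # sample indices proportionally to their priority
        idx = random.choices(range(len(self.data)),
                             weights=self.priorities, k=batch_size)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + 1e-5) ** self.alpha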
C. Control Module
The control module converts the decisions made by the RL agent into precise commands that the vehicle's actuators can execute. This involves:

Longitudinal Control (Speed and Braking): Adaptive Cruise Control (ACC) is implemented to maintain safe distances while considering the speed of surrounding vehicles. The RL agent dynamically adjusts acceleration and braking based on real-time conditions (a simplified gap-keeping rule is sketched after this list).
Lateral Control (Steering): Lane-keeping and lane-changing maneuvers are managed by a Model Predictive Control (MPC) framework, which ensures smooth trajectory tracking. MPC works in conjunction with the RL agent to predict and correct vehicle paths in real time.
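For illustration, the sketch below shows a simple gap-keeping longitudinal controller of the kind the RL agent could adjust. The gains, time headway, and acceleration limits are assumed values, not the tuned parameters of the proposed system.

# Sketch of a simple gap-keeping longitudinal controller (ACC-like).
# Gains, headway, and limits are assumed; in the proposed system the RL
# agent would adjust acceleration on top of such a baseline.
def acc_command(ego_speed_mps: float, lead_speed_mps: float,
                gap_m: float, headway_s: float = 1.8,
                k_gap: float = 0.4, k_speed: float = 0.8) -> float:
    """Return an acceleration command in m/s^2 (positive = throttle)."""
    desired_gap = max(5.0, headway_s * ego_speed_mps)    # minimum standstill gap
    gap_error = gap_m - desired_gap
    speed_error = lead_speed_mps - ego_speed_mps
    accel = k_gap * gap_error + k_speed * speed_error
    return max(-3.0, min(1.5, accel))                     # comfort/safety limits

# Example: ego at 10 m/s, slower lead vehicle 15 m ahead -> braking command
print(acc_command(ego_speed_mps=10.0, lead_speed_mps=8.0, gap_m=15.0))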
D. Human-in-the-Loop Mechanism
Given the unpredictability of Indian roads, the semi-autonomous system integrates a human-in-the-loop mechanism that allows manual intervention when necessary. The human driver can take control in complex scenarios where the RL agent may be uncertain or when the risk of failure is high (e.g., navigating through crowded markets). The system also provides feedback to the driver, suggesting actions while maintaining safety margins.

V. CASE STUDIES

To evaluate the effectiveness of the designed system, real-world case studies have been conducted in two distinct environments: a congested metropolitan city and smaller tier-2 cities with different traffic patterns.

A. Case Study: RL-Based Semi-Autonomous Driving in Bangalore [1]
Bangalore, one of India's most congested cities, offers a challenging environment for semi-autonomous vehicles (SAVs). Known for its narrow lanes, erratic traffic signals, and jaywalking pedestrians, the city presents a practical testbed for RL-based driving systems.

Implementation: A prototype semi-autonomous vehicle was deployed in collaboration with local authorities. The vehicle was trained using a combination of simulated data and real-world data from Bangalore's streets. Training scenarios included:
Narrow Lanes and Congested Traffic: The RL agent learned to navigate through tight spaces, often having to decide between yielding to aggressive drivers or maintaining course.
Erratic Traffic Signals: Traffic signals in Bangalore are often poorly synchronized. The RL model was trained to handle situations where traffic lights suddenly change or where other drivers do not comply with the signal.
Jaywalking Pedestrians: Pedestrians in Bangalore frequently cross roads unexpectedly. The RL agent adapted by learning to predict pedestrian behavior based on historical data and real-time sensor inputs.
Case Study Outcomes: Initial tests demonstrated significant improvements in travel time and decision-making efficiency. The vehicle handled complex traffic scenarios with minimal human intervention, especially in congested areas, reducing the overall time spent navigating through bottlenecks by 20% compared to human drivers. The system's ability to anticipate sudden changes, like pedestrians entering the road or erratic driver behavior, was particularly effective.

B. Case Study: Integration of RL in Tier-2 Cities [1]
The dynamics in smaller cities differ significantly from those in metropolitan areas. Tier-2 cities in India often have less congested but poorly maintained roads, erratic traffic, and different cultural driving practices. For this study, the RL model was retrained using localized data from cities like Lucknow and Indore.

Implementation: The vehicle was deployed in mid-sized cities, where it was trained on local driving conditions. Key adaptations included:
Irregular Road Conditions: Roads in tier-2 cities often have potholes, abrupt speed breakers, and unmarked lanes. The RL agent was trained to recognize these irregularities and adjust its speed and trajectory accordingly.
Traffic Patterns: Unlike in metropolitan areas, traffic in these cities is less dense but more unpredictable. The vehicle needed to adapt to diverse traffic patterns, including slow-moving vehicles, animal crossings, and frequent stops for roadside markets.
Cultural Driving Practices: Drivers in these regions may frequently honk, overtake in non-standard ways, or use hand signals instead of indicators. The RL model had to incorporate these unconventional cues into its decision-making.
Case Study Outcomes: Customizing the RL model for these environments led to a 30% improvement in safety metrics, such as fewer abrupt stops and better avoidance of road hazards. The localized training also resulted in higher user acceptance rates, as passengers noted smoother driving behaviors more aligned with the typical driving experience in their regions.

VI. PROTOTYPE SYSTEM DESIGN AND COMPONENTS

A. Collision Detection Circuit
The collision detection circuit is crucial for ensuring the vehicle's safety in India's congested traffic environments. The system uses a combination of LIDAR, ultrasonic sensors, and cameras to continuously monitor the vehicle's surroundings.

Working Principle:

Sensor Fusion: Data from LIDAR and ultrasonic sensors are processed through sensor fusion algorithms, which filter noise and provide accurate distance measurements to nearby objects.
Reinforcement Learning Integration: The RL model is trained using a reward-based system where penalties are assigned for near-collision scenarios, helping the vehicle to predict and react to potential threats in real time.
Decision-Making: Upon detecting an imminent collision, the system overrides the vehicle's control, applying emergency braking or steering adjustments (a simplified override rule is sketched after this subsection).

Challenges Addressed:

Unpredictable movements of pedestrians, animals, and non-standard vehicles like auto-rickshaws.
Dynamic obstacle behavior and crowded streets, often lacking clear traffic rules.
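A simplified version of the override decision is sketched below, using the fused range and closing speed to estimate time-to-collision. The thresholds and the braking ramp are illustrative assumptions.

# Sketch of the emergency-override decision: if the fused range and
# closing speed imply an imminent collision, the system overrides the
# current command with braking. Thresholds are illustrative assumptions.
def emergency_override(range_m: float, closing_speed_mps: float,
                       ttc_threshold_s: float = 1.5,
                       min_range_m: float = 1.0):
    """Return (override, brake_command in 0..1)."""
    if range_m <= min_range_m:
        return True, 1.0                          # full emergency braking
    if closing_speed_mps > 0:
        ttc = range_m / closing_speed_mps         # time to collision
        if ttc < ttc_threshold_s:
            return True, min(1.0, ttc_threshold_s / max(ttc, 0.1) - 1.0)
    return False, 0.0

print(emergency_override(range_m=3.0, closing_speed_mps=4.0))  # TTC = 0.75 s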
B. Obstacle Avoidance Circuit
Obstacle avoidance is pivotal given the frequent, unexpected obstructions on Indian roads, such as potholes, debris, and unregulated road barriers. The obstacle avoidance system integrates deep reinforcement learning with a focus on real-time environmental adaptability.

Working Principle:

Perception Layer: The vehicle's sensors map the surrounding area, identifying static and dynamic obstacles.
Path Planning: The RL model continuously evaluates multiple potential paths based on real-time sensor input. Paths with fewer obstacles and smoother terrain are prioritized (see the scoring sketch after this subsection).
Control Layer: The model adjusts the vehicle's steering and speed while maintaining stability, even in complex environments like narrow lanes or crowded junctions.

Challenges Addressed:

Navigating around unexpected road obstacles, including stationary vehicles or street vendors.
Managing split-second decisions in chaotic urban conditions where obstacles can suddenly appear.
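The path-prioritization idea can be sketched as a simple scoring rule over candidate paths, as below. The path representation and weights are assumptions made only for illustration.

# Sketch of the path-selection idea: candidate paths are scored on
# obstacle clearance and terrain smoothness, and the best one is chosen.
# The path representation and scoring weights are illustrative assumptions.
def score_path(min_clearance_m: float, roughness: float,
               w_clear: float = 1.0, w_rough: float = 2.0) -> float:
    """Higher is better: reward clearance, penalise rough terrain (0..1)."""
    return w_clear * min_clearance_m - w_rough * roughness

def choose_path(candidates):
    """candidates: list of dicts with 'id', 'min_clearance_m', 'roughness'."""
    return max(candidates,
               key=lambda p: score_path(p["min_clearance_m"], p["roughness"]))

paths = [
    {"id": "keep_lane",   "min_clearance_m": 0.4, "roughness": 0.1},
    {"id": "swerve_left", "min_clearance_m": 1.2, "roughness": 0.6},
]
print(choose_path(paths)["id"])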
C. Line Tracing Circuit
Line tracing is essential for maintaining lane discipline, even when road markings are inconsistent or faded, as is common on Indian roads. The line tracing circuit is designed to perform under various conditions, from well-marked highways to narrow rural lanes.

Working Principle:

Image Processing: The system employs convolutional neural networks (CNNs) for edge detection and lane recognition, even when markings are faint or partially obscured.
Reinforcement Learning: The RL agent is trained to balance between following lane markings and adapting to deviations like potholes or construction work.
Dynamic Adjustment: The model continuously learns from feedback, improving its performance across different environments.

Challenges Addressed:

Poor or inconsistent lane markings, common in urban areas and rural roads.
Frequent changes in lane structure due to temporary or informal road layouts.
D. Real-Time Data Gathering System
The real-time data gathering system is designed to support continuous learning, adaptation, and improvement of the vehicle's performance. This system collects and analyzes data from the vehicle's sensors and external conditions, feeding it back into the RL model.

Working Principle:

Data Logging: All sensor data, including collision events, obstacle encounters, and lane deviations, are logged and tagged with GPS coordinates and timestamp information (an example record is sketched after this subsection).
Edge Computing: Data is pre-processed on the vehicle using edge computing techniques to reduce latency and enhance real-time decision-making.
Cloud Integration: Processed data is uploaded to a cloud server for long-term storage and further analysis, enabling more comprehensive model retraining and updates.

Challenges Addressed:

The diverse and evolving nature of Indian road conditions necessitates continuous learning for the RL model.
Real-time adaptation to sudden environmental changes, such as weather shifts or road repairs.
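An example of what one logged record might look like is sketched below. The field names and record structure are assumptions for illustration, not the actual logging schema used by the system.

# Sketch of one logged record: a sensor event tagged with GPS coordinates
# and a timestamp before edge pre-processing and cloud upload.
# Field names and structure are illustrative assumptions.
import json, time

def make_log_record(event_type: str, lat: float, lon: float,
                    sensor_snapshot: dict) -> str:
    record = {
        "event": event_type,          # e.g. "near_collision", "lane_deviation"
        "gps": {"lat": lat, "lon": lon},
        "timestamp": time.time(),     # UNIX time
        "sensors": sensor_snapshot,   # already noise-filtered on the edge
    }
    return json.dumps(record)

print(make_log_record("lane_deviation", 18.5204, 73.8567,
                      {"front_gap_m": 7.2, "speed_mps": 6.1}))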
VII. SYSTEM MODEL

This section presents the design of the model, the mounting of the sensors and the various modules, and the individual circuits designed to accomplish the predefined objectives.

A. Mounting Diagram
The following diagram represents the mounting of the sensors and the components onto the model.
D. Circuit Diagram
The circuit diagram serves as a visual representation of
the electrical connections and components within a system.
It plays a crucial role in understanding the flow of current
and the relationships between various elements, enabling the
design, analysis, and troubleshooting of circuits effectively.
Fig 3: Front View of CAD Model along with the assembly of the various subsystems and individual circuits

G. Testing of the Model
The prototype model was then tested against the various objectives. Necessary changes, such as repositioning the sensors, modifying the code, and replacing sensors/modules, were made as required after testing. The flow of voltage and current was checked throughout the circuit and adjusted where required. The delays in the code, as well as the sensors, were fine-tuned and adjusted. The process was then iterated from steps 5-7 until the tuning of the model was complete.

Line Tracing Circuit – Two IR sensors were mounted and programmed specifically to detect black lines, so the vehicle seamlessly followed the tracks laid out for line tracing (the decision logic is sketched below).
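A minimal sketch of the two-sensor line-following decision logic is given below, assuming the IR sensors straddle the black line. The sensor polarity and the returned motor commands are hypothetical; the prototype implements the equivalent logic on the Arduino.

# Sketch of the two-IR-sensor line-following logic described above.
# Assumes the sensors straddle the black line; sensor polarity and the
# motor command names are hypothetical.
def line_follow_step(left_sees_black: bool, right_sees_black: bool) -> str:
    if not left_sees_black and not right_sees_black:
        return "forward"     # line still between the sensors: keep going straight
    if left_sees_black and not right_sees_black:
        return "turn_left"   # line has drifted to the left: correct left
    if right_sees_black and not left_sees_black:
        return "turn_right"  # line has drifted to the right: correct right
    return "stop"            # both on black: junction or end of track

print(line_follow_step(left_sees_black=True, right_sees_black=False))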
Initially, a problem of overpowering the Arduino was found because multiple alkaline batteries connected in parallel were used to power the assembly; these batteries would also drain out very fast. This was solved by making use of Li-ion batteries, which have a higher mAh rating and can also be recharged.

We initially used two HC-SR04 sensors mounted on the front ends, diagonally facing forwards. However, their readings would sometimes interfere with each other, creating a small blind spot at the very center. This problem was countered by adding a servomotor supporting a single ultrasonic sensor, which sweeps through 120° to sense the surroundings whenever an obstacle is detected in the forward direction (see the sketch below).
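The servo-sweep behavior can be sketched as below. The sweep angles, the clearance threshold, and the hardware stub functions (read_distance_cm, set_servo_angle) are hypothetical placeholders for the Arduino-side implementation.

# Sketch of the servo-swept ultrasonic scan: when an obstacle is detected
# ahead, a single HC-SR04 on a servo sweeps 120 degrees and the clearest
# direction is chosen. Hardware stubs are hypothetical placeholders.
def scan_for_clear_direction(read_distance_cm, set_servo_angle,
                             angles=(30, 60, 90, 120, 150),
                             clear_threshold_cm=40):
    """Sweep the servo; return the angle with the largest clear distance."""
    readings = {}
    for angle in angles:                     # 120-degree sweep around centre (90)
        set_servo_angle(angle)
        readings[angle] = read_distance_cm()
    best_angle = max(readings, key=readings.get)
    return best_angle if readings[best_angle] >= clear_threshold_cm else None

# Usage with dummy stubs standing in for the real sensor and servo drivers:
fake_readings = iter([25, 35, 18, 80, 60])
best = scan_for_clear_direction(lambda: next(fake_readings), lambda a: None)
print(best)   # -> 120 (the most open direction in this example)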
X. IMPLEMENTATION OF MACHINE LEARNING INTO THE MODEL
Implementation Overview: