1. Introduction
Today, autonomous driving is a very active field of research, and first solutions in the form of increasingly advanced driver assistance systems are already available in recent models of commercial cars. As mentioned in [1], “the majority of the technologies required to create a fully autonomous vehicle already exist. The challenge is to combine existing automated functions with control, sensing and communications systems, to allow the vehicle to operate autonomously and safely”. That report also presents a classification of the levels of autonomy based on the capabilities provided by an autonomous system. The simplest level includes the human driver along with electronic stability and cruise control, which are available in most new car models. The next level adds driver assistance, in which steering and/or acceleration is automated in specific situations such as parking assistance and adaptive cruise control. The classification then continues with partial autonomy, in which the driver does not control the steering and/or acceleration but can take over control again if required, as in lane keeping. After that, there is the level of high autonomy, in which the car is able to operate autonomously in different sections of the journey and only gives control back to the human driver in specific dangerous situations. Finally, there is the level of full autonomy, in which the vehicle is capable of driving an entire journey without human intervention. Herein, the vehicle must be able to provide all of the following specific capabilities:
Self-localization in a map or on a predefined path.
Sensing the surroundings and identification of potential collisions.
Control of the basic driving functions, i.e., braking, accelerating and steering.
Decision making, path planning and path following while respecting traffic regulations.
Information collection and exchange, such as maps, traffic status and road incidents.
Platooning with other vehicles.
The work presented in this paper builds on the authors’ previous work [2], which considers a visual line guided system on board an Autonomous Guided Vehicle (AGV) to control its steering so that it follows a guided path. This work was initiated with a project done in collaboration with Siemens Spain S.A., focusing on the development of a driver assistance system for buses in the city center using a vision-based line guided system. The main idea was that the driver should still be able to actuate the brake pedal in order to avoid any potential collisions, but no longer has a steering wheel to manually guide the vehicle. Following the previously mentioned levels of autonomy, the system presented in this paper can be assigned to a level somewhere between partial and high autonomy. The presented control system takes over complete control of the steering wheel, which corresponds to a high level of autonomy. However, the speed is controlled as an assisted cruise control by keeping the user’s desired speed under the maximum speed limit of each section of the predefined path, even if the user pushes the gas pedal to exceed this limit. The speed assistance control also allows the vehicle to be stopped in case of an emergency, such as the detection of an absence of the line, the push of an emergency button or the detection of a specific localization mark on the road.
The localization of the vehicle on the predefined path was implemented with the help of specific visual localization marks. In case of a false or missing detection of one or more localization marks, the localization is supported by an odometry approach, i.e., the integration of the speed of the vehicle. These marks were not only used to localize the vehicle, but also to provide additional information to the control system and to the assistance cruise control. This additional information noticeably improves the behavior of the system, for instance by allowing higher speeds and increasing robustness. Regarding the previous list of capabilities of a fully autonomous system, we focus here on Point 1 and partially on Point 3. Collision avoidance control is out of the scope of this work. Various and potentially adverse road conditions, such as rainy days, are also not considered in the work at hand. Therefore, the main focus is to find a low-cost solution for a vision-based control approach including (I) the steering of an autonomous vehicle using a line guide, (II) a speed control assistance and (III) the localization of the vehicle, while preserving robustness against brightness variations in inner-city environments.
The layout of the paper is as follows. Section 2 discusses related works on autonomous cars. Section 3 describes the full system architecture as well as the low-level car controller (Section 3.2) and the human-machine interface developed to remotely supervise the system, to command the desired speed and to perform an emergency stop if needed (Section 3.3). Section 4 presents the derived computer vision algorithms. Section 5 presents the lateral control approach for the steering wheel. The results of the experiments that were carried out on a closed test road are presented in Section 6. Finally, Section 7 presents the conclusions and future work.
2. Related Works
Autonomous guided vehicles (AGVs) are generally used in manufacturing and logistic systems inside warehouses, but their acceptance inspired many other applications, such as guided buses in city transportation. They were introduced during the 1950s and, by the 1960s, the Personal Rapid Transit (PRT) concept began to emerge. Different guidance systems were introduced for AGVs, such as systems based on optical distance measurements, wires, magnetic tapes or computer vision. Each type is based on its own design requirements and comes with its own advantages and disadvantages. For instance, in wire guidance systems, a wire is installed below the floor on which the AGV is moving. The wire emits a radio signal which can be detected by a sensor on the bottom of the AGV close to the ground. The relative position to the radio signal is used by the AGV to follow the path. In a magnetic tape guidance system, a flexible tape of magnetic material is buried in the floor or road, as in the case of the wire guidance system. The advantage of this method with respect to the wire guidance system is that it remains unpowered, i.e., passive. In laser navigation systems, the AGV is equipped with a laser transmitter and receives the reflection of the laser from reflective tape installed on the surrounding walls. The localization and navigation are done using the measurements of the angles and distances to the reflectors. However, this method is generally only used in indoor environments [3,4]. A vision-based navigation system uses a vision sensor to track landmarks in the environment, which means that no magnets, no induction wires and also no laser technique is required to let the AGV follow a specified path [5,6].
On the other hand, the motivation to reduce traffic jams, to improve fuel economy and to reduce the number of vehicle accidents in transportation has led to the introduction of different levels of automated driving. Many research institutes and automotive manufacturers worldwide are introducing their automated driving solutions, based on proprioceptive sensors such as the Anti-lock Brake System or the Electronic Stability Program, or based on exteroceptive sensors such as radar, video, or LiDAR [7]. The very first experiments on autonomous vehicles were started in the 1920s, and promising steps were conducted in the 1950s. The research on autonomous driving in Europe started with the PROMETHEUS project (Program for a European Traffic with Highest Efficiency and Unprecedented Safety), launched in 1986, which was one of the largest research projects in fully automated driving. The obtained results of this project are regarded as milestones in the history of vehicular and robotic systems. Two of the vehicles were ARGO by VisLab [8,9] and VaMoRs [10], tested in 1998. Both of them used two cameras to detect road lanes and to avoid obstacles, but the implemented algorithms and strategies were different. In 1995, the NAHSC project (National Automated Highway System Consortium) started in the United States within the California PATH (Partners for Advanced Transit and Highways) program [11]. In 1997, the important Demo’97 was carried out in San Diego, in which some cars were guided by a magnetic guide line inside the asphalt. An array of different sensors had been installed in those cars to execute self-driving tests and to form automated platoons of 8 cars.
In the last decade, many authorities around the world introduced plans to promote the development and establishment of automated vehicles [12]. Numerous commercial vehicles offer some levels of automation, such as adaptive cruise control, collision avoidance, parallel parking systems, lane keeping assistance, etc. Research on this topic received a strong impulse from the challenging test-beds of the DARPA Grand Challenge and Urban Challenge in 2005 and 2007 [13], with impressive results obtained by Sebastian Thrun and his team from Stanford University in 2005 [14] and 2008 [15], or by the Braunschweig University in 2009 [16]. All of these works tried to cover all the capabilities listed for a fully autonomous system, which is also the case for the recent Google Car [17]. In this specific case, the obtained results pushed for the legal changes required to achieve the first license for a self-driving car. The European Union also has a long history of contributing to automated driving, such as the Vehicle and Road Automation (VRA) program, the GCDC (Grand Cooperative Driving Challenge), and others. Many countries plan to develop sensors, control systems and services in order to have competitive autonomous driving systems and infrastructures [18]. A considerable number of studies and projects have been funded or are still running within the HORIZON2020 research framework of the European Union.
For instance, a Mercedes-Benz S-Class vehicle equipped with six radar sensors covering the full 360° angular range around the vehicle in the near and far range was introduced in 2013. The vehicle drove completely autonomously for about 100 km from Mannheim to Pforzheim, Germany, in normal traffic [19].
Moreover, there are also some works focusing on the sixth point of the list of autonomous capabilities (i.e., platoon formation). In 2010, the multidisciplinary European project SARTRE used new approaches in platoon formation and leader systems to successfully present an autonomous platooning demo traveling 120 miles [20]. The platoon comprised one human-driven truck followed by four cars equipped with cameras, laser sensors, radar and GPS technology. A complete test of different systems for leader following, lane and obstacle detection and terrain mapping has been done by VisLab. In 2010, the laboratory directed by Alberto Broggi covered a distance of approximately 13,000 km from Parma to Shanghai with a convoy of four motor homes [21,22]. All of them were equipped with five cameras and four laser scanners, and no road maps were used. The first vehicle drove autonomously in selected sections of the trip, while the other vehicles were fully autonomous, using their sensors and the GPS way-points sent by the leader vehicle. The control of speed and/or steering of autonomous vehicles with a localization system based on GPS information is also presented in the literature, see, e.g., [23]. Herein, a cruise control approach for an urban environment comprising the control of the longitudinal speed based on the speed limits, the curvature of the lane and the state of the next traffic light is proposed. In [24], control tests of a high-speed car running the Pikes Peak rally drive are presented. The work in [25] shows a localization system without GPS, based on the detection of intersections and the use of a virtual cylindrical scanner (VCS) to adapt the vehicle speed.
Highly automated levels of driving require a very wide range of capabilities, like sensing the environment, assessing the situation and taking proper actions on behalf of the driver. The design of a cost-effective solution for such highly automated driving systems is challenging and most of the time leads to an expensive multi-sensor configuration like the one introduced in [26]. Vision-based systems are considered to be a cost-effective approach for automated driving systems [27]. Vision-based systems can be categorized into different research areas and applications in the field of automated driving, such as distance estimation using stereo vision [28,29] or monocular vision data [30]. A review of the literature on on-road vision-based vehicle detection, tracking, and behavior understanding is provided in [31]. From an algorithmic point of view, computationally more complex algorithms require an understanding of the trade-off between computational performance (speed and power consumption) and accuracy [6,32]. For instance, an offline-online strategy has been introduced in [33] to overcome this trade-off. Furthermore, vision-based systems have many applications in automated driving, such as road detection, which is one of the key issues of scene understanding for Advanced Driving Assistance Systems (ADAS). In [34], road geometries for road detection are classified, and [35] introduces an improved road detection algorithm that provides a pixel-level confidence map. The paper [36] describes a neural-network-based road and intersection detection approach. Another vision-based application for ADAS is lane keeping assistance, for which a technique for the identification of unwanted lane departures of a traveling vehicle is presented in [37].
Despite some new and improved computer vision algorithms that have been introduced in recent years, such as [38], it has to be noted that variations in lighting conditions, occlusions of the lane marking or road shoulders, and the effects of shadows make current vision-based solutions not reliable enough for the steering control of an autonomous car. Furthermore, these algorithms are still not fully real-time capable for use in the closed control loop. Based on this and on the specific constraints of our project mentioned in the previous section, this work focuses on a vision-based line guided system approach. To the authors’ best knowledge, a vision-based line guided system has not yet been presented to control the steering of an autonomous car under the maximum speed constraints of urban environments.
4. Computer Vision System
A computer vision algorithm processes in real time the images captured by a monocular camera under illumination with ultraviolet (UV) light. This camera is placed in the front part of the vehicle, looking downwards and isolated from the sunlight by a black box. A similar approach with a downward-looking camera at the bottom of the car is presented in [42]. In that work, the car structure was used to prevent illumination changes from affecting the image acquisition. The camera was used to get information from the road in order to match vision-based ground features with the global localization given by an RTK-GPS system. No control of the steering wheel was presented in that work. In the presented work, we have to mount the camera at the front of the vehicle, since we are controlling the steering wheel of an Ackermann-model vehicle. The mentioned approach could be used in our case to get the information from the visual marks painted on the road, but not to guide the vehicle.
The presented algorithm detects both the line to be followed by the vehicle and the visual marks painted on the road. The visual marks provide coded information associated with forward path properties like curvature, maximum allowed speed, etc., which is used by the controller to anticipate changes and react faster.
Two kinds of paint were used for the line and the visual marks. Due to their different pigments, the line is seen as blue while the marks are seen as yellow in the images captured by the camera under illumination with UV light. The rest of the image remains black, as depicted in Figure 4.
The visual algorithm has been designed with a special focus on robustness, thus being able to detect lines fragmented due to occlusions or small irregularities on the road. The full system has been tested under different weather conditions, including sunny and cloudy days as well as sparkling days. To evaluate the robustness of the system, it is important to know the exact speed up to which the installed camera is able to see every single centimetre of the track. To determine this value, it has to be taken into account that the frame rate of the camera is 29 fps and the length of the camera system (the metallic box) is 30 cm. The distance covered by the system per second is therefore equal to 29 fps × 30 cm = 870 cm/s, and 870 cm/s corresponds to approximately 31 km/h. That means that at this speed the captured frames cover the whole track without losing a single centimetre of the road. When the speed of the vehicle is higher than approximately 31 km/h, the system loses some of the distance covered by the vehicle between consecutive frames. In the case of 40 km/h (approximately 1111 cm/s), the system covers approximately 39 cm per frame, and for 50 km/h (approximately 1389 cm/s) it covers approximately 47 cm per frame. That means that the system cannot see approximately 9 cm and 17 cm between consecutive frames at speeds of 40 and 50 km/h, respectively. Despite this limitation of the vision system, the vehicle was able to detect the line and the visual marks, successfully covering long distances at different speeds. The computer vision algorithm has two different parts, which are described in the following sections.
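The per-frame coverage arithmetic above can be reproduced with a short calculation. The following sketch only illustrates the numbers quoted in this section (29 fps camera and a 30 cm viewing window are taken from the text); it is not part of the implemented system.

```python
# Sketch of the frame-coverage arithmetic described above.
FPS = 29            # camera frame rate [frames/s] (from the text)
WINDOW_CM = 30.0    # length of road seen in one frame [cm] (from the text)

def coverage_per_frame(speed_kmh: float) -> tuple[float, float]:
    """Return (distance travelled per frame [cm], blind gap per frame [cm])."""
    speed_cm_s = speed_kmh * 100000.0 / 3600.0   # km/h -> cm/s
    dist_per_frame = speed_cm_s / FPS
    blind_gap = max(0.0, dist_per_frame - WINDOW_CM)
    return dist_per_frame, blind_gap

if __name__ == "__main__":
    for v in (31.3, 40.0, 50.0):
        d, gap = coverage_per_frame(v)
        print(f"{v:5.1f} km/h -> {d:5.1f} cm per frame, blind gap {gap:4.1f} cm")
```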
4.1. Line Detection
This first part of the visual algorithm processes the currently acquired image to obtain information about the line to be followed by the vehicle. If there is a line in the analyzed image, the line angle and distance are determined with respect to the image center.
The first step of the line detection is a color segmentation in YUV space that exploits the blue appearance of the line in the image. Some other color spaces were tested, but YUV provided better results under different light conditions. A rectangular prism inside the YUV space is defined so that only the pixel values inside this volume are considered part of the line. The output of this first step is a binary image in which only the line pixels are set. This method proved to be robust in detecting lines of different blue tones and brightnesses.
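A minimal sketch of this segmentation step is shown below, assuming OpenCV is used; the YUV thresholds are placeholders for illustration only, not the values tuned for the actual UV-illuminated camera.

```python
import cv2
import numpy as np

# Hypothetical YUV bounds defining the rectangular prism for "blue line" pixels.
LOWER_YUV = np.array([40, 120, 60], dtype=np.uint8)
UPPER_YUV = np.array([220, 200, 120], dtype=np.uint8)

def segment_line(bgr_frame: np.ndarray) -> np.ndarray:
    """Return a binary mask where only candidate line pixels are set."""
    yuv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YUV)
    return cv2.inRange(yuv, LOWER_YUV, UPPER_YUV)
```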
To reduce the noise, a second step is performed. In the binary image, every 8-connected pixel group is marked as a blob. Blobs having an area outside a defined range are discarded. Then, for every surviving blob, the centroid, the dominant direction and the maximal length are computed. Those blobs with a too short maximal length are ignored. The remaining blobs are clustered according to proximity and parallelism, so that each cluster becomes a candidate line. The centroid and dominant direction of each candidate line are calculated from the weighted sum of the features of its component blobs, where the weight of each blob is proportional to its relative area. In this way, the algorithm can accurately detect lines that are fragmented because of the aging of the paint.
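The blob extraction and filtering step can be sketched as follows, again assuming OpenCV; the area and length limits are illustrative assumptions, and the final clustering into candidate lines is only indicated.

```python
import cv2
import numpy as np

MIN_AREA, MAX_AREA = 50, 20000   # assumed valid blob-area range [px]
MIN_LENGTH = 20.0                # assumed minimum blob length [px]

def extract_blobs(mask: np.ndarray) -> list:
    """Extract 8-connected blobs from a binary mask and filter them by size."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
    blobs = []
    for i in range(1, n):  # label 0 is the background
        area = stats[i, cv2.CC_STAT_AREA]
        if not (MIN_AREA <= area <= MAX_AREA):
            continue
        ys, xs = np.nonzero(labels == i)
        pts = np.column_stack([xs, ys]).astype(np.float32)
        # Dominant direction from a least-squares line fit, length as the extent along it.
        vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
        length = np.ptp(pts @ np.array([vx, vy]))
        if length < MIN_LENGTH:
            continue
        blobs.append({"centroid": tuple(centroids[i]),
                      "direction": (vx, vy), "length": length, "area": area})
    # The surviving blobs would next be clustered by proximity and parallelism
    # into candidate lines, as described in the text.
    return blobs
```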
The last step consists of choosing the winning line from the whole set of candidate lines. The decision is made using temporal information between the current and the previous frame, i.e., the candidate closest to the last frame's winner in terms of centroid distance is selected as the current frame's winner. This rejects false positives caused by old line traces along the circuit. In the case that all candidates are far enough from the last frame's winner, a bifurcation is assumed and the winner will be the leftmost or rightmost candidate, depending on the information associated with the last detected visual mark.
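A compact sketch of this selection logic is given below; the distance threshold and the left/right flag derived from the last visual mark are assumptions used only for illustration.

```python
import math

BIFURCATION_DIST = 80.0  # assumed distance [px] beyond which a bifurcation is declared

def select_winner(candidates, prev_winner, keep_left: bool):
    """Pick the current line among candidates given the previous winner's centroid."""
    if not candidates:
        return None
    dist = lambda c: math.hypot(c["centroid"][0] - prev_winner[0],
                                c["centroid"][1] - prev_winner[1])
    closest = min(candidates, key=dist)
    if dist(closest) <= BIFURCATION_DIST:
        return closest
    # All candidates are far away: assume a bifurcation and follow the side
    # indicated by the last decoded visual mark.
    key = lambda c: c["centroid"][0]
    return min(candidates, key=key) if keep_left else max(candidates, key=key)
```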
4.2. Mark Detection and Decoding
The second part of the computer vision algorithm includes the detection and decoding of visual marks painted on the road next to the line to follow. The visual marks are detected and decoded even when they appear rotated in the image as a result of vehicle turns.
Each visual mark is labeled with a unique identifier that represents a binary-encoded number, where the mark bits are drawn as bars parallel to the line. Because of the reduced visual field of the camera, instead of painting a bar for each bit as in common barcodes, where the width of the bar depends on the bit's value, a more compact encoding scheme was chosen. All bits are bars of the same width with no spacing between them. When a bit is one, the bar is painted; when it is zero, the bar space is left unpainted. In the image, a mark appears as a set of yellow rectangles, where every painted rectangle is a bit with value one.
A start bar is added at the furthest bit slot from the line to designate the beginning of the mark. The mark specification also defines the number of bits per mark, a fixed bit width, a minimum bit length, and valid ranges for line-mark angle and line-to-start-bit distance. According to the specification, the algorithm will only detect marks that are placed on the right of the line in the direction of motion of the vehicle.
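To make the encoding concrete, the sketch below maps a mark identifier to painted/unpainted bar slots under this scheme; the start-bar position and the bit ordering follow the description in this section, while the concrete number of identifier bits is an assumption for illustration.

```python
N_BITS = 6  # assumed number of identifier bits per mark (excluding the start bar)

def mark_to_bars(identifier: int) -> list:
    """Return bar slots ordered from the line outwards.

    The start bar occupies the slot furthest from the line, and the least
    significant bit is the slot next to it, as described in the text.
    """
    assert 0 <= identifier < 2 ** N_BITS
    bits_msb_first = [(identifier >> i) & 1 for i in reversed(range(N_BITS))]
    return bits_msb_first + [1]          # painted slots are ones; the last one is the start bar

def bars_to_mark(slots: list) -> int:
    """Inverse mapping: drop the start bar and rebuild the identifier."""
    value = 0
    for b in slots[:-1]:
        value = (value << 1) | b
    return value

# Example: identifier 19 (binary 010011) -> slots [0, 1, 0, 0, 1, 1, 1]
```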
Similarly to the line detection phase, the mark detection algorithm follows a set of steps. First of all, the acquired image is segmented by color in YUV space, extracting the potential mark pixels. The color space boundaries of this segmentation are set so that several yellow tones are admitted, ranging from tangerine yellow to bright green, as observed in tests with multiple paints. This makes the color-based filter less restrictive and avoids false negatives. As the probability that any yellow noise present in the image has a valid mark structure is low, the following steps of the algorithm search for this valid structure among the color-segmented pixels to reduce the false positives.
After the color segmentation, the resulting eight-connected pixels are grouped into blobs. The blobs that do not meet the following criteria are considered noise and are discarded: the blobs must appear at the right of the line, the blob area must be in a valid range (computed from the visual mark specification), the angular distance between the dominant blob direction and the line must be in the specified range, and the blob length in the dominant direction must be larger than the minimum bit length.
The blobs that pass the filters correspond to a set of bits with value one. A pattern matching step determines the specific number of one-bits for each blob. Assuming each mark has a total of N bits (including the start bit), N pattern tests are carried out for each blob, one test for each bit count in the range from 1 to N. For every bit count i, the pattern P_i is a rectangle with the direction and length of the blob B and a width equal to i times the bit width given in the specification (in pixels). The number of one-bits assigned to the blob B is the value of i that minimizes the cost function presented in Equation (1), where area(S) denotes the area in pixels inside a shape S.
Function f indicates how much the pattern covers the blob, while function g evaluates the similarity between the blob and pattern areas. Patterns whose f or g values are above a threshold are discarded, which forces the best solution to have a minimum quality in both indicators. The minimization process then favors patterns that cover the blob while having a similar size. The threshold for g is applied to a normalized version of g that stays in the range [0, 1], like f does.
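Equation (1) is not reproduced in this excerpt. A plausible form consistent with the description of f and g above (pattern coverage of the blob and blob/pattern area similarity), given here only as an illustrative sketch and not necessarily the exact expression used in the paper, is:

```latex
% Illustrative sketch of a coverage/similarity cost, not the paper's exact Equation (1).
% area(S) is the area in pixels inside shape S; B is the blob, P_i the i-bit pattern.
f(B, P_i) = 1 - \frac{\mathrm{area}(B \cap P_i)}{\mathrm{area}(B)}, \qquad
g(B, P_i) = \frac{\lvert \mathrm{area}(P_i) - \mathrm{area}(B) \rvert}{\mathrm{area}(P_i)}, \qquad
i^{*} = \arg\min_{1 \le i \le N} \bigl( f(B, P_i) + g(B, P_i) \bigr)
```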
After assigning a number of one-bits to all processed blobs, the rightmost blob is interpreted as the start bit. If its distance to the line is within the range allowed by the specification, the mark remains in the detection process; otherwise it is ignored. The mark's dominant direction is computed as the average over all its blobs. The vector orthogonal to this direction defines a baseline that is divided into N consecutive fixed-width bit slots, starting from the start bit. All bit slots are initialized to zero. Then, the blob centroids are projected onto the baseline; each projection falls into one of the slots, which is then filled with a one. Adjacent slots are also filled with ones according to the blob's number of bits. Finally, the slot values define the binary-encoded mark identifier, whose least significant bit is the one closest to the start bit.
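The slot-filling and identifier reconstruction can be sketched as follows. Blob projections along the baseline are assumed to be given directly as slot indices (slot 0 being the identifier bit closest to the start bit), which is a simplification of the geometry described above.

```python
def decode_identifier(blob_slots: list, n_bits: int) -> int:
    """Rebuild the mark identifier from detected blobs.

    blob_slots: one (first_slot, bit_count) pair per blob, where first_slot is the
    slot index of the blob's projection (slot 0 is the identifier bit closest to
    the start bit) and bit_count is the number of one-bits assigned to that blob.
    """
    slots = [0] * n_bits                 # all bit slots start at zero
    for first_slot, bit_count in blob_slots:
        for k in range(first_slot, min(first_slot + bit_count, n_bits)):
            slots[k] = 1                 # fill the slot and its adjacent slots with ones
    # The least significant bit is the slot closest to the start bit (slot 0).
    return sum(bit << k for k, bit in enumerate(slots))

# Example: two blobs -> slots [1, 1, 0, 0, 1, 0] -> identifier 19
print(decode_identifier([(0, 2), (4, 1)], n_bits=6))   # prints 19
```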
The visual mark identifier that is finally passed to the control system is selected via two different voting processes. The first voting process evaluates the detected visual mark identifier within each frame. Working under the assumption that the code is always located on the right of the line, this part of the image is divided into nine horizontal sections. In each of these sections, the system tries to identify a visual mark, and the final result is the code detected most often. An example of this voting process is shown in Figure 5.
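A minimal sketch of this per-frame vote is shown below; the detection routine for a single section is assumed to exist elsewhere and to return None when no mark is found.

```python
from collections import Counter

def vote_frame(section_detections: list):
    """Majority vote over the codes detected in the nine horizontal sections.

    section_detections holds one decoded identifier (or None) per section.
    """
    votes = Counter(c for c in section_detections if c is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Example: seven sections agree on 19, one misreads, one sees nothing.
print(vote_frame([19, 19, None, 19, 23, 19, 19, 19, 19]))   # prints 19
```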
The second voting process evaluates the code detection over the last M frames. Large values of M produce higher detection delays but increase detection robustness, as more image samples are taken into account. The value of M chosen in our experiments gave good results. Besides the detected mark identifier, the algorithm provides an estimation of the number of frames and the time elapsed since the mark was last seen. This information is especially useful at high speeds and high values of M, when the decision is delayed until M frames have been captured but the mark was only seen in the first few frames. In addition, an estimation of the mark quality is given based on its average luminance. Figure 4 shows the detection of the line and of the mark that represents the number 19.
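The temporal vote over the last M frames can be sketched in the same way; the default window size and the bookkeeping of frames since the last sighting are shown only as an illustration, not as the values used in the experiments.

```python
from collections import Counter, deque

class MarkVoter:
    """Vote over the per-frame results of the last M frames."""

    def __init__(self, m_frames: int = 10):      # the value of M here is an assumption
        self.window = deque(maxlen=m_frames)
        self.frames_since_seen = None             # frames since a mark was last seen

    def update(self, frame_code):
        """Add the current frame's code (or None) and return the voted identifier."""
        self.window.append(frame_code)
        if frame_code is not None:
            self.frames_since_seen = 0
        elif self.frames_since_seen is not None:
            self.frames_since_seen += 1
        votes = Counter(c for c in self.window if c is not None)
        if len(self.window) < self.window.maxlen or not votes:
            return None                           # decision delayed until M frames are available
        return votes.most_common(1)[0][0]
```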
Once the mark detection stage provides a visual mark identifier, the decoding stage starts. The information encoded in these marks comprises the current location of the mark on the track, the size and curvature of the following track section, and the maximum permitted speed on it.
The detection of a mark is always checked against a database of available marks, avoiding false detections that could localize the vehicle in another section of the track.
5. Lateral Control System: A Vision-Based Fuzzy-Logic Controller
The steering control system of the vehicle includes three additive components: The first one is a fuzzy-logic feedback controller that acts with a behavior equivalent to a PD controller. The second one is the weighted integral of the error (the distance between the line reference and the measured line position). The third component is a steering offset that acts like a feedforward controller, changing the operating point of the fuzzy-logic controller to improve its performance based on the track information given by the detected mark. All three components are added at the end of the control loop to generate the output of the control system, resulting in a structure equivalent to a PID controller with an additional feedforward term, as shown in Figure 6.
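The additive structure can be summarized in a few lines. The fuzzy PD evaluation, the integral gain and the per-mark offset are placeholders standing in for the components described in this section; the velocity scaling and the tuning of the individual terms are discussed further below.

```python
class SteeringController:
    """Additive lateral controller: fuzzy PD + weighted integral + feedforward offset."""

    def __init__(self, fuzzy_pd, k_i: float, dt: float):
        self.fuzzy_pd = fuzzy_pd        # callable(error, d_error) -> steering angle [deg]
        self.k_i = k_i                  # integral weight (value not asserted here)
        self.dt = dt                    # sample time given by the camera frame rate [s]
        self.integral = 0.0
        self.offset = 0.0               # feedforward term updated on each detected mark

    def on_mark(self, curvature_offset_deg: float):
        """Update the feedforward steering offset from the decoded mark information."""
        self.offset = curvature_offset_deg

    def update(self, error_px: float, d_error_px: float, speed_kmh: float) -> float:
        self.integral += error_px * self.dt
        # Velocity scaling of the fuzzy PD output (numerator from the ~10 km/h
        # training session described below); clamped to avoid division by zero.
        pd_term = self.fuzzy_pd(error_px, d_error_px) * (10.0 / max(speed_kmh, 1.0))
        return pd_term + self.k_i * self.integral + self.offset
```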
The objective of this work is to develop a controller for a track with small-radius curves. In such conditions, the speed of the car cannot be very high (less than 50 km/h).
From several real experiments with the vehicle, the authors can confirm that it is practically impossible for a human pilot, using just the information received from the downward-looking camera, to drive faster than 10 km/h while keeping the line-following error low enough to meet the requirements of the application. This is because the pilot only sees 39 cm ahead and, at that speed, the contents of this area change completely approximately every 0.14 s.
The first and main component, the fuzzy-logic feedback controller, was implemented using software called MOFS (Miguel Olivares’ Fuzzy Software). This C++ library has previously been used successfully to implement fuzzy-logic control systems on other kinds of robotic platforms, such as an unmanned helicopter for autonomous landing [43] or quadrotors for collision avoidance [44]. Using this software, a fuzzy controller can be defined by specifying the desired number of inputs, the type of membership functions, the defuzzification model and the inference operator. In [43], a more detailed explanation of this software is provided.
The fuzzy-logic controller designed with PD-like behavior has two inputs and one output. Triangular membership functions are used for the inputs and the output. The first input, called the error, is the difference in pixels between the line reference and the measured line position with respect to the center of the image (Figure 7a). The second input is the derivative of this error (Figure 7b). The output of the controller is the absolute turn of the steering wheel, in degrees, required to correct this error (Figure 7c).
The rule base of the presented fuzzy control component is formed by 49 if-then rules. Heuristic information has been used to define the output of each rule as well as the range and the sets of each variable. The developed fuzzy system is a Mamdani-type controller that uses a height defuzzification model with the product inference model, as described in Equation (2). Herein, N and M represent the number of input variables and the total number of rules, respectively, μ_il(x_i) denotes the membership function of the l-th rule for the i-th input variable, and y_l represents the output of the l-th rule.
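Equation (2) itself is not reproduced in this excerpt. The standard height defuzzification with product inference, which matches the description above and is shown here as a reference form rather than a verbatim copy of the paper's equation, reads:

```latex
% Standard height defuzzification with product inference (reference form).
y = \frac{\sum_{l=1}^{M} y_{l} \prod_{i=1}^{N} \mu_{il}(x_i)}
         {\sum_{l=1}^{M} \prod_{i=1}^{N} \mu_{il}(x_i)}
```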
To tune the fuzzy-logic controller, a driving session performed by a human driver at 10 km/h provided the necessary training data to modify the initial rule base of the controller and the size of the fuzzy sets of its variables. For the definition of the initial fuzzy sets, a heuristic method based on the extraction of statistical measures from the training data was used. For the initial rule base, a supervised learning algorithm implemented in MOFS was used. This algorithm evaluates the situation (the values of the input variables) and looks for the rules that are involved in it (the active rules). Then, according to the steering command given by the human driver, the weights of these rules are changed. Each time the output of an active rule coincides with the human command, its weight is increased; otherwise, when the output differs from the human command, its weight is decreased by a constant. Whenever the weight of a rule becomes negative, the system sets the output of that rule to the one given by the human driver.
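The weight-update logic of this supervised tuning step can be sketched as follows; the weight increment/decrement constant, the agreement tolerance and the rule representation are illustrative assumptions, not the MOFS implementation.

```python
STEP = 0.1  # assumed constant used to increase/decrease rule weights

def update_rule(rule: dict, human_output: float, tolerance: float = 1.0) -> None:
    """Adjust one active rule's weight (and possibly its output) from a human sample.

    rule is a dict with keys "output" (consequent value) and "weight".
    """
    if abs(rule["output"] - human_output) <= tolerance:
        rule["weight"] += STEP                     # rule agrees with the human command
    else:
        rule["weight"] -= STEP                     # rule disagrees with the human command
        if rule["weight"] < 0.0:
            # Negative weight: replace the rule's output with the human command.
            rule["output"] = human_output
            rule["weight"] = 0.0
```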
Since the velocity of the vehicle is not an input of the fuzzy controller, it is taken into account by multiplying the output of the fuzzy-logic controller by the factor 10/v, where v is the current velocity of the vehicle in km/h. The numerator of this factor corresponds to the velocity, in km/h, of the driving session with a skilled human driver in which the data used to tune the rule base of the fuzzy controller was acquired.
The second component of the presented control system is the weighted integral of the error. The objective of this component is to ensure that the error converges to zero on every kind of track. The output of this component follows Equation (3), where e is the current error between the center of the line and the center of the image, t is the frame rate, and K_I is a constant that appropriately weights the effect of the integrator.
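Equation (3) is not reproduced in this excerpt. A discrete-time weighted integral consistent with the description, written with the symbols introduced above as a sketch rather than the paper's exact expression, is:

```latex
% Sketch of a weighted discrete integral of the line-position error,
% with t the camera frame rate (so 1/t is the sampling period).
u_{I}(k) = K_{I} \sum_{j=0}^{k} \frac{e(j)}{t}
```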
Finally, the third component of the lateral control system is the steering offset. It behaves like a feedforward controller that compensates the effect of the change in curvature of the circuit in each different track section and is updated each time a new mark is detected. It is theoretically calculated using the equations of the Frenet-frame kinematic model of a car-like mobile robot [45]. More detailed information about this control component can be found in [46].
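As a rough illustration of how such a feedforward offset can be obtained from the curvature encoded in a mark, the steady-state steering angle of a simple Ackermann/bicycle model is shown below; the wheelbase value and the use of this particular formula are assumptions for illustration, not the derivation of [45,46].

```python
import math

WHEELBASE_M = 2.5  # assumed wheelbase of the vehicle [m]

def steering_offset_deg(curvature_1_per_m: float) -> float:
    """Steady-state steering angle for a given path curvature (bicycle model)."""
    return math.degrees(math.atan(WHEELBASE_M * curvature_1_per_m))

# Example: a curve of 20 m radius (curvature 0.05 1/m) -> about 7.1 degrees at the wheels
print(round(steering_offset_deg(1.0 / 20.0), 1))
```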