Expert Systems With Applications 251 (2024) 123940

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

Development of a random forest based algorithm for road health monitoring


Revanth Dugalam ∗, Guru Prakash
Department of Civil Engineering, IIT Indore, Indore, MP, 453552, India

ARTICLE INFO

Keywords:
Road health monitoring
Random forest algorithm
Road damages
Vibration data
Repair cost estimation

ABSTRACT

Road damages, such as potholes and rutting, must be addressed in their early stages to prevent accidents and minimize maintenance costs. The presence of these damages not only causes discomfort to passengers but also accelerates vehicle deterioration. Improper material mixtures and inadequate maintenance practices often contribute to road damage. In recent years, numerous machine learning (ML) algorithms have been developed with a predominant focus on binary classification, specifically targeting potholes. However, roads are susceptible to various damages beyond potholes, necessitating a comprehensive approach for effective solutions. This study introduces a novel algorithm utilizing a random forest (RF) classifier for multi-class classification of road damages. The proposed algorithm is validated through rigorous simulation and field studies. Unlike previous models, this approach embraces a comprehensive perspective by considering a wider range of damages, thereby facilitating a more nuanced and inclusive process for the classification of road damages. In the field study, data has been collected using a set of uni-axial accelerometers and a smartphone camera mounted on a test vehicle. The data has been processed using the sliding window technique, and features have been extracted from each window to train the RF classifier. In this process, an optimal window size has been determined and employed to enhance the effectiveness of feature extraction for training the RF classifier. The proposed algorithm demonstrates significant accuracy in both simulation and field studies, with notable performance in identifying and classifying road damages. The algorithm’s outcomes are utilized to estimate a detailed cost for repairing the identified damages. This paper showcases proficiency in addressing various damages, offering valuable insights for the implementation of cost-effective road maintenance strategies.

1. Introduction

A nation’s development hinges on its transportation system, with many citizens relying on roads as their primary mode of transportation. Consequently, roads play a pivotal role in the country’s economic growth and development. Moreover, they act as vital links connecting essential services such as employment, healthcare, and education, fundamental needs of human beings. In essence, roads form a crucial component of civil infrastructure and require regular maintenance. Therefore, the inspection of these roads is imperative for optimal performance and to mitigate the escalation of damages.

There are different types of road damages, including cracking, potholes, rutting, and other depressions. These damages typically result from factors such as improper material mix, the use of low-quality materials during construction, and settlement of the base or subgrade during operation. Various reasons can contribute to road damages during construction, often influenced by the person or official in charge. Aging of roads, abnormal vehicular loads, and inadequate maintenance further contribute to the formation of damages during operation. It is unrealistic to expect a road with zero damage, as aging is inevitable. However, the early detection and rectification of these damages are crucial for effective maintenance.

Therefore, researchers are actively exploring methods to identify road damages early, with the aim of expediting the repair process and preventing further deterioration. Traditionally, visual inspection has served as the primary means of locating and categorizing road damages. However, for extensive road stretches, visual inspections can be time-consuming and demand considerable labor. Moreover, redirecting traffic during these inspections may lead to congestion or blockages. Consequently, there is a critical need for the development of technology capable of swiftly detecting road damages without inconveniencing road users and minimizing inspection duration. In recent years, these challenges have been addressed by applying machine learning (ML) and deep learning (DL) techniques. Some of the most commonly used models include support vector machine (SVM), logistic regression, k-nearest neighbors (KNN), Naive Bayes, and neural networks (NN) (Dugalam & Prakash, 2024; Kumar, Kalita, Singh, et al., 2020; Maeda, Sekimoto,

∗ Corresponding author.
E-mail addresses: [email protected] (R. Dugalam), [email protected] (G. Prakash).

https://doi.org/10.1016/j.eswa.2024.123940
Received 8 September 2023; Received in revised form 8 March 2024; Accepted 6 April 2024
Available online 16 April 2024
0957-4174/© 2024 Elsevier Ltd. All rights reserved.

Seto, Kashiyama, & Omata, 2018). These techniques have been applied to recognize road damage patterns by collecting image and vibration data from extensive road networks.

In vision-based methods, data from roads is collected using cameras or any optical instrument that produces two-dimensional (2D) images. Some previous researchers (Atencio, Plaza-Muñoz, Muñoz-La Rivera, & Lozano-Galant, 2022; Lakmal & Dissanayake, 2020; Lopes Amaral Loures & Rezazadeh Azar, 2023) conducted studies using image data captured from damaged roads. They proposed computer vision-based solutions where the data were collected using aerial and participatory approaches. Later, the captured images were processed using deep neural networks to detect potholes, corrugations containing ripples, rutting, etc. Ayman and Fakhr (2023) reviewed articles related to computer vision-based models and outlined research gaps, the latest technologies, merits, and demerits related to pavement health monitoring. In another study, Zhang and Hamdulla (2022) reviewed existing studies on pothole detection that use vibrations, computer vision, and 3-D methods. Li, Yuan, Liu, and Cai (2016) proposed a vision-based method by combining 2D images with ground-penetrating radar (GPR) to detect potholes. Through this fusion, they were able to find the position and dimensions of the pothole. Ouma and Hahn (2017) studied pothole detection in asphalt pavements in urban areas using a vision-based approach. The fuzzy C-means algorithm (FCM) was used to cluster defective and non-defective images of pavements. This vision-based approach has demonstrated remarkable accuracy in detecting and extracting potholes with minimal errors. Ryu, Kim, and Kim (2015) investigated the detection of potholes on both asphalt and concrete pavements using a vision-based method. They collected 2D images via a survey vehicle in Korea and developed a system that included an optical device and a pothole detection algorithm. They claimed that their system performs better in detecting damage and alerting drivers. However, it is worth noting that vision-based methods often require expensive equipment and can be insensitive to depth (Kamal et al., 2018). Hence, this study adopts a vibration-based method for the detection and classification of road damages.

In a vibration-based method, data from the road is collected using an accelerometer, which is commonly found in smartphones. Carlos, Aragón, González, Escalante, and Martínez (2018), Carlos, Gonzalez, Wahlström, Cornejo, and Martinez (2019) proposed a novel algorithm employing vibration data and an SVM classifier to detect potholes. The features used in this study were derived from Mednis, Strazdins, Zviedris, Kanonirs, and Selavo (2011) based on standard deviation analysis. They created an online platform for vibration data, assembling 30 datasets collected from roads with potholes and bumps. Changes in acceleration, easily measured through the standard deviation of the readings, were highlighted. They later extended their study to extract physical properties of road damages by analyzing the same acceleration and gyroscope data. With a dataset comprising 163 potholes, they achieved an RMS error of 1.68 cm for pothole depth prediction. Fox, Kumar, Chen, and Bai (2017) developed a system for the detection and localization of potholes using acceleration data on multilane roads, addressing issues like GPS position error, sensor noise, and sensor mobility. They compared their crowdsourced system results for multilane roads with those for single-lane roads. Some researchers (Chen, Lu, Fan, Wei, & Wu, 2011; Wickramarathne, Garg, & Bauer, 2018) conducted experiments with micro-electrical mechanical system (MEMS) accelerometers to collect data from roads. Nowadays, MEMS accelerometers, such as ADXL345 and LIS33DE, are commonly used as inexpensive sensors. The use of these sensors aims at developing a low-cost, robust anomaly detection system. Previous researchers, including (Agrawal, Gupta, Sharma, & Singh, 2021; Bosi, Ferrera, Brevi, & Pastrone, 2019; Kaushik & Kalyan, 2022; Pandey et al., 2021; Silvister et al., 2019), collected data using the built-in accelerometers in smartphones. Chibani, Sebbak, Cherifi, and Belmessous (2022) proposed a novel technique for the effective preprocessing of acceleration readings using a dynamic sliding window (DSW). Due to the irregular occurrence of damages on the road, DSW has been applied to the acceleration readings to qualitatively detect potholes. Comparing the results with other preprocessing methods available in the literature, they concluded that DSW yielded the most effective results in detecting potholes. Pandey et al. (2022) collected acceleration data from different roads using an iPhone operating system (iOS) smartphone. Binary classification was performed using a convolutional neural network (CNN) for the collected data. Anaissi, Khoa, Rakotoarivelo, Alamdari, and Wang (2019) developed a virtual road network inspector (VRNI) with a false rate of 4% for the detection of potholes. While many researchers have primarily focused on considering potholes as the sole road damages that need rectification, there are other damages on roads that need to be addressed to ensure proper maintenance, and these often go undetected and unclassified.

Egaji, Evans, Griffiths, and Islas (2021) compared the performance of ML models for pothole detection using a vibration-based method based on smartphone acceleration data. After preprocessing the data with the moving window technique, binary classification was carried out. Their findings suggested that K-nearest neighbors (KNN) and RF models were effective for pothole detection. Bansal, Mittal, Ahuja, Singh, and Gill (2020) conducted classification using various ML models and proposed RF as the most suitable model for the binary classification of potholes. Bhatlawande, Deshpande, Deshpande, and Shilaskar (2022) developed a vision-based method for pothole classification by collecting images from existing literature and found that the RF classifier performed better for binary classification. Borlea, Precup, and Borlea (2022) conducted studies to enhance the quality of clusters in the supervised KNN algorithm without reducing processing time. Protic and Stankovic (2023) proposed an XOR-based detector to identify anomalies in computer network traffic, explaining that the only way to reduce the size of datasets is by converting them into features. For this purpose, they employed four binary classifiers: KNN, decision tree, feedforward neural networks, and SVM. Zhao et al. (2022) designed an approach using particle clustering-deep Q-network (PC-DQN) for the localization of poisonous gas sources. They introduced the concept of transfer learning to reuse the trained Q-network for new scenarios. Ghadge, Pandey, and Kalbande (2015) used the RF classifier to detect road bumps, allowing drivers to prepare in advance.

Most of the existing literature utilizing ML classifiers for road health monitoring is only focused on pothole damage. In this paper, the RF classifier has been used for the classification of multiple road damages. The RF, known for its proficiency in handling diverse patterns and complex datasets, emerges as a powerful and versatile classifier for comprehensive road damage classification. The primary motivation behind our study is to extend the application of RF beyond pothole classification, encompassing a wide array of road damages. A detailed comparison of the proposed classifier’s main advantages with the limitations of existing algorithms is presented in Table 1. This table provides a comprehensive overview, categorizing existing methods into Vision-Based and Vibration-Based approaches. The comparative analysis underscores the advancements and versatility of the proposed classifier, addressing drawbacks associated with existing methods and making it a robust solution for road damage assessment. The results of the proposed classifier not only facilitate efficient road damage identification but also expedite the cost estimation process, enhancing overall road maintenance efforts by governmental and relevant organizations. The main contributions of this paper are as follows:

(i) A novel approach utilizing an RF classifier has been developed for multi-class classification of road damages.
(ii) Simulation and field studies have been conducted to assess the performance of the proposed RF algorithm.
(iii) The proposed classifier has been compared with other classifiers, including SVM, KNN, Naive Bayes, and decision tree. Notably, the results demonstrate that the proposed classifier outperforms them, making it an effective choice for road damage classification.


Table 1
Limitations of existing methods compared to the proposed approach.

Method | Ref. | Limitations of existing methods | Proposed approach
Vision-Based | Zhang and Hamdulla (2022) | Lack of cost estimation | Detailed repair cost estimation provided.
Vision-Based | Li et al. (2016) | Expensive equipment, insensitivity to road damage depth | Cost-effective equipment, enables depth analysis of road damages.
Vision-Based | Ouma and Hahn (2017) | Sole focus on potholes | Consideration of multiple road damages.
Vision-Based | Ryu et al. (2015) | Limited to binary classification | Multi-class classification approach.
Vibration-Based | Carlos et al. (2018) | Sole use of SVM algorithm | Utilization of multiple algorithms with comparative analysis.
Vibration-Based | Fox et al. (2017) | Binary classification only | Implementation of multi-class classification.
Vibration-Based | Chibani et al. (2022) | Dependency on dynamic sliding window technique | Integration of sliding window technique with optimized window size.
Vibration-Based | Pandey et al. (2022) | Restricted to binary classification | Implementation of multi-class classification.

Fig. 1. Quarter car model.

(iv) Road repair costs were estimated using the results obtained from the proposed classifier.

1.0.1. Organization

The rest of the paper is organized as follows: Section 2 provides background information, elucidating the road–vehicle interaction and the operational principles of the RF method. Section 3 delves into the methodology employed in this study, elucidating the window tuning process for achieving an optimal window. To enhance clarity and facilitate implementation understanding, this section also includes the pseudo-code for the proposed RF classifier, shedding light on its algorithmic structure and the steps involved in the multi-class classification of road damages. The simulation study is detailed in Section 4, succeeded by the field study in Section 5. In addition to this, Section 6 has been included with the comparative analysis of the proposed classifier with other existing classifiers. Finally, Section 7 concludes the paper, encompassing future study prospects and acknowledging the limitations of the proposed work.

2. Background

Modeling vehicle–road interaction plays a critical role in the design, construction, and maintenance of civil infrastructure. It aids in identifying and mitigating factors that impact vehicle performance on the road. This knowledge is leveraged to understand vehicle responses concerning both damaged and undamaged roads. By collecting and analyzing these responses, the detection of road damages becomes possible. To efficiently analyze these large response datasets, the machine learning technique, RF classifier, has been adopted for this study. The performance of the classifier is not dependent on this quarter car model; instead, its efficacy relies on the quality of the features extracted and the classifier’s ability to generalize across diverse road conditions.

2.1. Modeling of vehicle and road interaction

In this study, the quarter car model has been considered to establish contact between the road and vehicle, as shown in Fig. 1, where $m_2$ represents the quarter mass of the car, and $m_1$ is the mass of the wheel ($m_2 \gg m_1$). $\mathbf{k}_s$ and $\mathbf{k}_w$ are the stiffness matrices of the car and wheel, and $\mathbf{b}$ is the damping coefficient. $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{r}$ are the displacements of the masses $m_1$, $m_2$ and the road, which change with respect to time. Fig. 1(c) shows the free body diagram of the model.

The forces acting on the bodies include the following:

$$\mathbf{F}_1 = \mathbf{k}_s(\mathbf{y} - \mathbf{x}) \tag{1}$$
$$\mathbf{F}_2 = \mathbf{b}(\dot{\mathbf{y}} - \dot{\mathbf{x}}) \tag{2}$$
$$\mathbf{F}_3 = \mathbf{k}_w(\mathbf{x} - \mathbf{r}) \tag{3}$$


and the equations of motion can be given as:

$$m_1 \ddot{\mathbf{x}}_1 = \mathbf{k}_s(\mathbf{y} - \mathbf{x}) + \mathbf{b}(\dot{\mathbf{y}} - \dot{\mathbf{x}}) - \mathbf{k}_w(\mathbf{x} - \mathbf{r}) \tag{4}$$
$$m_2 \ddot{\mathbf{x}}_2 = -\mathbf{k}_s(\mathbf{y} - \mathbf{x}) - \mathbf{b}(\dot{\mathbf{y}} - \dot{\mathbf{x}}) \tag{5}$$

Both $\ddot{\mathbf{x}}_1$ and $\ddot{\mathbf{x}}_2$ (the accelerations of the masses $m_1$ and $m_2$) are the vibration responses that occur when the vehicle travels on the road. By measuring the changes in these vibration responses, it is possible to know the condition of the road, because the response changes with the road profile $\mathbf{r}$. The quarter car model serves as a precursor to emphasize the foundational understanding of how vibration responses can be indicative of road conditions. In Fig. 1(a), the depicted system exhibits a smooth surface, indicating a constant system stiffness. Conversely, in Fig. 1(b), the illustration represents an uneven surface, leading to a variation in the system stiffness. It is clear that the vibration responses of the vehicle will also vary with respect to the road surface conditions, emphasizing the key idea behind using accelerometer sensors in this study to capture these variations and analyze road conditions. A set of accelerometers mounted inside the vehicle measured the change in vibration responses in this study. Sensors were strategically placed in four critical locations: the dashboard, the trunk, and on the floor of the vehicle.

2.2. Random forest technique

A Random Forest technique is an ensemble learning method that constructs a group of decision trees during training for classification purposes. This study mainly focuses on the multi-class classification of road anomalies including potholes, speed bumps, and rutting. The size and shape of a road damage vary significantly across different roads, and the occurrence is not uniform. Hence, a sliding window technique was utilized for feature extraction, as a single acceleration reading may not adequately represent road damage.

In the RF classifier, a random subset of windows of the training data (R) is selected to train each decision tree, as shown in Fig. 2. At each node of the decision tree, a random subset of features is selected for splitting the data. The splitting in each decision tree is based on the assigned randomness. An appropriate decision tree was built based on entropy and Gini impurity (GI). Both were used to measure the degree of purity of each feature sub-split. Here three classes have been considered for the classification, each having a probability $p_i$. The entropy $H(X)$ of any split can be calculated using:

$$H(X) = -\sum_{i=1}^{3} p_i \log_2 p_i \tag{6}$$

where 1 = speed bump class; 2 = rutting class; 3 = pothole class. The GI of the features at any tree after splitting can be calculated using:

$$\mathrm{GI} = 1 - \sum_{i=1}^{3} (p_i)^2 \tag{7}$$

Multiple decision trees are constructed using the above two steps. Each tree is trained on a different subset of the training data and a different set of features. The output of the random forest algorithm is the majority vote of the predictions of all the decision trees. The class with the highest number of votes is selected as the final output.

2.2.1. Metrics in classification of model

The performance of the RF classifier is validated using validation scores of predicted values on the testing dataset. Validation scores like accuracy, precision, specificity, f1-score and recall can be used to assess the model (Miao & Zhu, 2022). These scores can be calculated using the metrics shown in Fig. 3, where TP, TN, FP and FN represent true positive, true negative, false positive and false negative, respectively. TP is the number of predictions where the classifier correctly identified the positive class of anomaly as positive. Similarly, FP represents the number of predictions where the classifier mistakenly identified the negative class as positive. Based on these, the classification report of the model was calculated to check its performance.

3. Proposed methodology

A flowchart of the proposed methodology is presented in Fig. 4. The main steps to execute the proposed methodology are as follows: (i) data collection, (ii) data pre-processing and window selection, (iii) feature extraction and labeling, (iv) training the RF model and (v) multiple road damage classification.

The initial step of this work was data collection from the roads, where a set of accelerometers were mounted on the test vehicle. The data was processed by dividing it into windows using the sliding window technique, as anomalies were not occurring uniformly. A pseudocode detailing the implementation of the sliding window technique has been provided in Algorithm 1, offering a clear and structured representation of the systematic approach taken to extract meaningful windows from the collected data.

Algorithm 1 Sliding Window

function sliding_window(iterable, size=30, step=1, fillvalue=9.81)
    if size < 0 or step < 1 then
        raise ValueError
    end if
    it ← iter(iterable)
    q ← deque(islice(it, size), maxlen=size)
    if not q then
        return    ⊳ empty iterable or size == 0
    end if
    q.extend(fillvalue for _ in range(size - len(q)))    ⊳ pad to size
    while True do
        try:
            yield iter(q)    ⊳ iter() to avoid accidental outside modifications
            q.append(next(it))
            q.extend(next(it, fillvalue) for _ in range(step - 1))
        except StopIteration:
            return    ⊳ stop iteration if no more elements in the iterable
        end try
    end while
end function
⊳ A visual representation of the sliding window is depicted in Fig. 15.

A set of twelve relevant features, such as mean, standard deviation, variance, coefficient of variation, capability potential, etc., were extracted for each window. To provide a detailed understanding of the implemented features and their equations, pseudocode has been provided in Algorithm 2. This algorithm elucidates the extraction and computation of the key features. The whole data was converted into features and labels, which were used to train the RF classifier. The model parameters obtained from the training phase were used to predict the labels of the test data set. The Random Forest classifier, incorporating the proposed features, has successfully detected and classified road anomalies, including speed bumps, potholes, and rutting. Importantly, if there are no damages present on the selected road segment, the classifier accurately predicts it as undamaged. This capability enhances the classifier’s utility in providing precise and reliable assessments of road conditions. To provide a comprehensive insight into the training process of the RF classifier with the features and corresponding labels, a pseudocode has been presented in Algorithm 3. This algorithm details the steps involved in training and testing of the classifier.


In the pursuit of determining the optimal window size, we conducted a comprehensive study, exploring window sizes ranging from 10 to 1500 observations with an interval of 10 observations and a 50% overlap. For each window size, an RF classifier was trained, and the resulting classification outcomes were systematically compared. In the context of the field study, we specifically adopted a window size of 1250 observations for the RF classifier, as this particular window size demonstrated notable performance during the comprehensive study. The selection of an appropriate window size depends on factors such as road conditions, the occurrence time of anomalies, and the sampling frequency of the data. This approach ensures a tailored and effective choice of window size based on the specific characteristics of the road and the data under analysis.

3.1. Application of proposed classifier in real life road maintenance

The block diagram (Fig. 5) illustrates a ‘Road Anomaly Classification and Repair Cost Estimation System’, demonstrating the integration of both hardware and software components. The hardware section comprises an accelerometer sensor and a camera, collecting real-time data as the test vehicle moves along the selected road segment. This collected data then flows into the software section, which houses the proposed trained Random Forest (RF) classifier. Using the features extracted, the trained RF classifier predicts the types and quantities of road damages.

A Graphical User Interface (GUI) will be developed to interact with the system. It will include options for various types of repairs and their current repair costs, displaying prompts to input quantities for each type of damage. Users can choose the repair type needed, and the system will calculate a rough estimation of the total maintenance cost required for the selected road segment.

Algorithm 2 Extract Features

function extract_features(zlist, window_size, step_size)
    features ← []
    positions ← []
    for window in sliding_window(zlist, size=window_size, step=step_size) do
        mean (μ) ← calculate_mean(window)    ⊳ $\mu = \frac{1}{n}\sum_{i=1}^{n} r_i$
        var (θ) ← calculate_variance(window)    ⊳ $\theta = \sigma^2$
        stdev (σ) ← calculate_standard_deviation(window)    ⊳ $\sigma = \sqrt{\frac{\sum_{i=1}^{n}(r_i - \mu)^2}{n-1}}$, where $r_i$ is the $i$-th observation and $n$ is the number of observations in a window ($n = 30$ for the default window size)
        cv (α) ← calculate_coefficient_variation(mean, stdev)    ⊳ $\alpha = \sigma / \mu$
        cp (β) ← evaluate_cp(window)    ⊳ Threshold value = $g \pm 0.3g$; if the observed acceleration is > threshold, β = 0.6 else 0.3.
        cvc (τ) ← evaluate_cvc(window)    ⊳ If $\sigma > 0.15g$, then the value is 0.9 else 0.2.
        diff (δ) ← calculate_difference(window)    ⊳ Difference of maximum and minimum values in each window.
        threshold (ν) ← evaluate_threshold(window)    ⊳ If $\theta > 0.15g$, then the value is 0.8 else 0.2.
        varc (η) ← evaluate_varc(window)    ⊳ Based on ν, α, τ, and β: if any three of these exceed their limits, then the value of η is 0.8 else 0.2.
        cont (χ) ← evaluate_cont(window)    ⊳ If $\alpha > 0.015$, then the value of χ is 0.8 else 0.2.
        sc (ε) ← calculate_score(varc, threshold, cvc, cp, cont)    ⊳ $\epsilon = \nu + \tau + \chi + \beta + \eta$
        cs (γ) ← evaluate_cs(sc)    ⊳ If $\epsilon \geq 2$, then γ = 0.8 else 0.2.
        features.append([mean, var, stdev, cv, cp, cvc, diff, threshold, varc, cont, sc, cs])
        ⊳ Positions
        positions.append(len(window) / 2)
    end for
    return features, positions
end function
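A few of the per-window features of Algorithm 2 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `window_features` is hypothetical, the readings are assumed to be in m/s² with g = 9.81, and only the features whose rules are fully specified in Algorithm 2 (mean, variance, standard deviation, coefficient of variation, max-min difference, and the `cvc` and `cont` indicators) are computed. The composite indicators (`cp`, `varc`, `sc`, `cs`) are left out because their limits are not fully spelled out.

```python
import statistics

G = 9.81  # assumption: gravitational acceleration in m/s^2, same unit as the readings


def window_features(window):
    """Return a subset of the Algorithm 2 features for one window of readings."""
    mean = statistics.mean(window)
    var = statistics.variance(window)   # sample variance, divisor n - 1
    stdev = statistics.stdev(window)    # square root of the sample variance
    cv = stdev / mean                   # coefficient of variation, sigma / mu
    diff = max(window) - min(window)    # max minus min within the window
    cvc = 0.9 if stdev > 0.15 * G else 0.2   # indicator on sigma
    cont = 0.8 if cv > 0.015 else 0.2        # indicator on the coefficient of variation
    return {
        "mean": mean, "var": var, "stdev": stdev, "cv": cv,
        "diff": diff, "cvc": cvc, "cont": cont,
    }


# Made-up windows: a smooth stretch and one containing a sharp jolt.
smooth = [9.81, 9.80, 9.82, 9.81]
rough = [9.8, 14.2, 5.1, 9.9]
smooth_f = window_features(smooth)
rough_f = window_features(rough)
print(smooth_f)
print(rough_f)
```

The smooth window leaves both indicators at their low value (0.2), while the window with the jolt raises them, which is the kind of separation the classifier is trained on.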

Algorithm 3 Random Forest Classifier 4. Simulation study


1: Initialize empty lists: accuracy, precision, recall, f1score
2: for 𝑖 in range(10, 1500, 10) do A simulation study has been conducted to investigate the perfor-
3: 𝑥, 𝑌 ← extract_features(acceleration data, 𝑖, 2𝑖 ) mance of RF classifier. A flowchart of same is shown in Fig. 6. This
4: 𝑌 𝑦 ← [] simulation study has three major parts, (i) data generation, which
5: for 𝑖 in 𝑌 do resembles acceleration signals of road anomalies, (ii) slicing of data
6: 𝑌 𝑦.append(labels[int(𝑖)]) into windows and then extraction of features for each window, labeling
according to the signals, and (iii) training and testing of the RF model on these generated signals for the classification of road anomalies. The generated data were sliced with a fixed window size, and features such as mean, kurtosis, standard deviation and variance were extracted. The extracted features were used to train the RF model. The steps of the simulation are described in detail below.

7: end for
8: 𝑦 ← np.array(𝑌 𝑦)
9: 𝑥_train, 𝑥_test, 𝑦_train, 𝑦_test ← train_test_split(𝑥, 𝑦, random_state = 0, test_size = 0.2)
10: Function initialize_rf_classifier():
11: ⊳ ... Other initialization steps
12:
13: function calculate_entropy(probabilities)
14: return $-\sum_{i=1}^{3} p_i \log_2 p_i$
15: end function
16:
17: entropy ← calculate_entropy([p1, p2, p3])
18:
19: function calculate_gini_index(probabilities)
20: return $1 - \sum_{i=1}^{3} (p_i)^2$
21: end function
22:
23: gini_index ← calculate_gini_index([p1, p2, p3])
24: 𝑚𝑜𝑑𝑒𝑙.fit(𝑥_train, 𝑦_train)
25: 𝑦_pred ← 𝑚𝑜𝑑𝑒𝑙.predict(𝑥_test)
26: Append accuracy_score(𝑦_test, 𝑦_pred) to accuracy
27: Append precision_score(𝑦_test, 𝑦_pred, average=None) to precision
28: Append recall_score(𝑦_test, 𝑦_pred, average=None) to recall
29: Append f1_score(𝑦_test, 𝑦_pred, average=None) to f1score
30: end for

4.1. Data generation

The acceleration sample signals were drawn from a Gaussian distribution following Eq. (8), where 𝜇 is the mean and 𝜎 is the standard deviation.

𝑦 = 𝜇 + 𝜎 × np.random.randn(. . . ) (8)

The values of 𝜇 and 𝜎 were chosen according to the road condition. To represent three distinct classes of road damage, simulations were conducted with 𝜎 values of 0.5, 2, and 8, while keeping the mean value (𝜇) constant at 10 for all three damage types. The resulting acceleration signals were labeled as class 1, class 2, and class 3, respectively. For each class, ten thousand data points were simulated and concatenated, generating datasets representative of various road damage scenarios. The overall synthesized signals are visualized in Fig. 7.
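The data generation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' script: the standard-library `random.gauss` stands in for `np.random.randn`, the seed and the function name `generate_signals` are hypothetical, and the three segments are concatenated in class order rather than shuffled.

```python
import random

# Class label -> standard deviation of the simulated acceleration (Section 4.1).
SIGMA_BY_CLASS = {1: 0.5, 2: 2.0, 3: 8.0}
MU = 10.0             # common mean for all three classes
N_PER_CLASS = 10_000  # samples simulated per class


def generate_signals(seed=0):
    """Return (signal, labels): one concatenated Gaussian segment per class."""
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    signal, labels = [], []
    for label, sigma in SIGMA_BY_CLASS.items():
        # y = mu + sigma * z with z ~ N(0, 1), as in Eq. (8)
        segment = [MU + sigma * rng.gauss(0.0, 1.0) for _ in range(N_PER_CLASS)]
        signal.extend(segment)
        labels.extend([label] * N_PER_CLASS)
    return signal, labels


signal, labels = generate_signals()
```

With these settings the concatenated signal holds 30,000 samples, and the spread of each segment grows with the assumed severity of the road condition.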


Fig. 2. Working principle of random forest classifier.
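The two impurity measures named in the algorithm listing, entropy and the Gini index over three class probabilities, can be written out as a small Python sketch. The node proportions below are hypothetical and serve only to show the arithmetic.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def gini_index(probabilities):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probabilities)


# Hypothetical class proportions at one tree node (three road classes).
node = [0.5, 0.25, 0.25]
node_entropy = entropy(node)
node_gini = gini_index(node)
```

Both measures are zero for a pure node and largest when the three classes are equally mixed, which is why a tree prefers splits that reduce them.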

Fig. 3. Confusion matrix in multi class classification.
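Per-class precision, recall, F1-score and specificity, together with macro and weighted averages, all follow from a multi-class confusion matrix. The sketch below shows one way to compute them. The function names are hypothetical, and the example matrix is an assumption: its diagonal and row totals follow the simulation results reported in the text (59 of 59, 66 of 69, 50 of 52), while the placement of the five misclassified windows is chosen for illustration only.

```python
def per_class_metrics(cm):
    """Per-class metrics from a square confusion matrix.

    cm[i][j] is the number of windows whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    rows = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class differs
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        rows.append({
            "precision": precision,
            "recall": recall,
            "f1": 2 * precision * recall / denom if denom else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "support": tp + fn,
        })
    return rows


def macro_and_weighted(rows, key):
    """Unweighted mean and support-weighted mean of one metric."""
    total = sum(r["support"] for r in rows)
    macro = sum(r[key] for r in rows) / len(rows)
    weighted = sum(r[key] * r["support"] for r in rows) / total
    return macro, weighted


# Diagonal and row totals follow the simulation results; the off-diagonal
# placement is an assumption made for illustration.
cm = [[59, 0, 0],
      [2, 66, 1],
      [0, 2, 50]]
rows = per_class_metrics(cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
```

The support-weighted recall always equals the overall accuracy, which gives a quick consistency check on any reported table of this kind.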

4.2. Feature extraction

The sliding window technique was then applied with a window size of 50 observations. For each window, the mean, kurtosis, standard deviation, and variance were extracted, and the corresponding label was assigned. The extracted features, illustrated in Fig. 8, provide an overview of the dataset's characteristics. Each data point in the plot represents one window of observations, with the feature value for that window shown on the 𝑦-axis. This visualization shows how the features vary across different sections of the simulated road damage signals. The dataset was then divided into training and test sets with a split ratio of 70% for training and 30% for testing. Fig. 7 illustrates the concatenation of simulated data for each damage type, with damages randomly ordered to resemble real road conditions.

4.3. RF model

To train the random forest model, 100 estimators were considered. The features extracted from the windows, as described earlier, were randomly assigned at each node of the decision trees. Multiple decision trees were constructed, each trained on a different set of acceleration data and its corresponding features. The random forest (RF) model was then built from these decision trees. For testing, the 30% test data was fed into the trained RF model for damage prediction. Features were extracted from each window of the test data and passed to the trained decision trees. Each decision tree independently predicted an output based on the features assigned to it. The final prediction was determined by the majority vote of all decision trees. A total of 180 windows were considered for evaluation, comprising 59 smooth, 69 rutting, and 52 pothole windows. The true positives (TP) for smooth, rutting, and potholes were 59, 66, and 50, respectively. The overall accuracy of the model was 97%: of the 180 windows in the test data, 175 were classified correctly. Only windows from classes 2 and 3 were misclassified.

Table 2
Performance of the RF classifier.
Class          Precision  Recall  F1-score  Specificity  Support
1              0.967      1       0.983     0.99         59
2              0.967      0.956   0.963     0.981        69
3              0.98       0.961   0.971     0.992        52
Accuracy       0.97 (overall)                            180
macro avg.     0.971      0.972   0.972     0.987        180
weighted avg.  0.949      0.964   0.954     0.967        180

Table 2 illustrates the performance of the RF classifier in detecting different road anomalies, including smooth surfaces (class 1), rutting (class 2), and potholes (class 3). Class 1 shows higher values than class 2 and class 3. This difference may be attributed to the smaller feature values associated with class 1. The acceleration values for smooth surfaces are expected to be approximately equal to the acceleration due to gravity, unless affected by noise. This consistency in acceleration values makes it easier for the model to identify features specific to smooth surface conditions, resulting in a recall of 1 for class 1, which indicates that the model correctly predicted all positives for smooth surfaces. Conversely, class 2 and


class 3 exhibit slight discrepancies, leading to misclassifications and a reduction in recall values. In terms of F1-score, class 1 outperforms the other classes. Additionally, class 3 shows a notable precision value of 0.98, attributed to fewer false positives compared to the other classes. The macro average was calculated by averaging each metric over the classes, while the weighted average was obtained by weighting each class metric by the share of that anomaly in the test data. These two values serve as key indicators of the overall performance of the model. The proposed algorithm is effective in classifying the simulated road anomalies.

Fig. 4. A flowchart detailing the sequential steps for the process of multiple road anomaly classification.

5. Field study

Indian roads are mainly classified into four categories, namely national highways (NH), state highways (SH), district roads and rural roads. Among these, state highways are the backbone of the transportation system, linking district roads and rural roads with national highways. Hence, SH carry heavy vehicular loads, which results in pavement damage and requires regular maintenance. The damages present on these roads differ in nature, as illustrated in Fig. 9. In addition, the presence of these damages on SH causes an abrupt increase in road accidents. Therefore, it is essential to detect and rectify these damages as early as possible to reduce road accidents and to increase the life span of the SH. The total length of each type of road is presented in Table 3.

Table 3
Length of roads in India.
Type of road    Km
NH              1,51,000
SH              1,86,528
District roads  6,32,154
Rural roads     45,35,511

A field study was planned to validate the proposed algorithm, and a road segment of SH 27 near our university was selected for this purpose. This road segment, which is 13 km in length, is located in a suburban area between Tejaji Nagar and our university, as shown in Fig. 10. It contains several types of damage, such as potholes, rutting, cracks, and depressions. Typical pothole and rutting damage captured from the selected road segment is shown in Fig. 11(a) and (c).

5.1. Data collection

For the selected road segment, vibration data were collected in sunny daytime conditions using a uni-axial accelerometer (type 4534-B-002) from Hottinger Bruel and Kjaer with a sensitivity of 50 mV/ms⁻². The main benefit of using an accelerometer is that road damage causes the acceleration to rise or fall, as shown in Fig. 11. Therefore, mounting an accelerometer on a moving vehicle makes it possible to detect road anomalies after analyzing the data. Four accelerometers and a mobile camera were mounted on a test vehicle, a Maruti Suzuki Ertiga. Table 4 provides detailed information about the car and the accelerometer sensor.
Fig. 12 shows the four accelerometers mounted inside the test vehicle. They are located on the left and right rear wheels, the dashboard, and the trunk. As the vehicle approaches a road anomaly, the front wheels make contact first and the dashboard begins vibrating, followed by the rear wheels and the trunk after a time difference. The dashboard and trunk accelerometers are firmly fixed to the car to prevent noise in the acceleration data while the vehicle is moving. The sampling frequency was set to 1000 Hz, and data were collected at an average speed of 28 km/hr. The accelerometer's axis was aligned with the car's Z-axis, which is perpendicular to the road. The recorded acceleration data, illustrated in Fig. 13, represent the Z-axis acceleration of the dashboard. In this visualization, the x-axis represents time in seconds, and the y-axis represents acceleration in m/s², excluding the acceleration due to gravity. Video was recorded continuously alongside the accelerometer data. The smartphone camera used for recording the video was securely positioned on the dashboard using a mobile holder, as illustrated in Fig. 14. Additionally, the setup included a laptop displaying real-time readings of the accelerometer data, which made it possible to monitor acceleration patterns throughout the field study. This setup ensured alignment between the video footage and the corresponding accelerometer data, contributing to the robustness of the model training process. The time stamps of the video were used to label anomalies in the acceleration data, which were subsequently used to train the RF model. A total of 11,00,000 data points were collected for the test road segment.

5.2. Data processing and feature extraction

After analyzing the data collected from the four accelerometers, it was determined that the accelerometer on the dashboard was better suited for labeling and gave better results. It was observed that road anomalies do not occur uniformly and that their length varies. Hence, a single acceleration reading may not be able to detect and classify road anomalies effectively. To address this issue, a sliding window technique was implemented, as illustrated in Fig. 15. For this technique, selecting an optimal window size is crucial for the effective extraction of features. The process of determining the optimal window size is detailed below. The sliding window technique works as follows. First, it takes the initial set of observations from the collected data and extracts all the proposed features. Then, it moves to the next set of observations with 50% overlap and extracts the features again. In this way, it converts the acceleration observations into windows along with their corresponding labels. During this process, the total number of observations may not divide exactly into windows of the same size, i.e., some of the observations will not fill the whole window


Fig. 5. Block diagram showing real life application of the present approach.

Fig. 6. Flowchart of simulation study.

Fig. 7. Concatenated simulation data.

size. In this case, these observations are concatenated with smooth-condition acceleration signals (g = 9.81 m/s²) and the features are then extracted. Observations in the window are denoted by 'r'.
After segmentation into windows, features must be assigned to the acceleration data for training the RF model. A total of 12 features were used in this study; they are presented in Table 5. Five statistical features and seven non-statistical features were calculated. For the non-statistical features, the threshold limits were set by conducting experiments on roads. The weights (importance) of the different features of the trained model for predicting road damages are shown in Fig. 16. Other features such as median, peak value, skewness (for the randomness of the distribution), percentiles, quartiles, and outliers are available. The problem is that these features only consider some of the observations of the window, and most only use one observation, which is ineffective for detecting damages. These features also affect the performance of the model by increasing its dimensionality.
Fig. 17 presents a typical plot of features such as variance, standard deviation, coefficient of variation, and mean, illustrating the change


Fig. 8. Various features extracted from simulated data.
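Feature extraction for the simulation study (windows of 50 observations, with mean, kurtosis, standard deviation and variance per window) can be sketched as follows. Several details are assumptions: the windows are taken as non-overlapping (30,000 samples then give 600 windows, consistent with the 180 windows in a 30% test split), kurtosis is the Pearson fourth standardized moment, variance is the population variance, and each window takes the label of its first sample. The function names are hypothetical.

```python
import statistics

WINDOW = 50  # observations per window, as in the simulation study


def window_features(window):
    """Mean, kurtosis, standard deviation and variance of one window."""
    mean = statistics.fmean(window)
    variance = statistics.pvariance(window, mu=mean)  # population variance (assumption)
    std = variance ** 0.5
    # Pearson kurtosis (fourth standardized moment); a normal signal gives about 3.
    fourth = sum((v - mean) ** 4 for v in window) / len(window)
    kurtosis = fourth / variance ** 2 if variance > 0 else 0.0
    return {"mean": mean, "kurtosis": kurtosis, "std": std, "variance": variance}


def extract_features(signal, labels, window=WINDOW):
    """Slice the signal into consecutive windows; label each by its first sample."""
    features, window_labels = [], []
    for start in range(0, len(signal) - window + 1, window):
        features.append(window_features(signal[start:start + window]))
        window_labels.append(labels[start])
    return features, window_labels
```

Because each simulated class segment is a multiple of the window length, every window in this sketch contains samples from a single class.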

Fig. 9. Main classification of roads within the state.

Fig. 10. Selected road segment for the field study.


Fig. 11. Different road damages and their respective vibration signals.

Fig. 12. Sensors placement on the test vehicle.

Table 4
Details of car and accelerometer.
Item           Component         Specification
Car            Engine type       K15C Smart Hybrid
               Length (mm)       4395
               Width (mm)        1735
               Wheel base (mm)   2740
               Dashboard         Sculpted with metallic Teak-Wooden finish
               Front suspension  MacPherson strut & coil spring
               Rear suspension   Torsion beam & coil spring
Accelerometer  Company           Hottinger Bruel & Kjaer
               Frequency range   0.2–12800 Hz
               Sensitivity       50 mV/ms⁻²
               Weight            8.6 g

in feature values. The training data were labeled in the initial stage using the recorded video data. The extracted features and the corresponding labels were then used to train the RF classifier.

5.2.1. Optimal window size selection

The choice of an optimal window size for classification depends on the type and length of the road damages. To determine the most effective window size, a sensitivity analysis was conducted. A set of window sizes, ranging from 10 to 1500 observations in steps of 10, was considered. The analysis started with a window size of 10 observations and a 50% overlap, from which the proposed features were extracted. The process then continued with window sizes of 20, 30, 40, and so on, up to 1500 observations. The Random Forest (RF) model was trained using the features from each window size, and the resulting classification outcomes are plotted in Fig. 18. This approach systematically explores the window sizes to identify the one that best captures the characteristics of road damages for improved classification accuracy. Fig. 18 has four parts: accuracy, precision, recall, and f1-score. The last three were calculated separately for each anomaly. Fig. 18(a) illustrates how accuracy changes with the window size. At a window size of 1250, the accuracy reaches a high value of 83.34% and decreases continuously afterwards. Fig. 18(b) and (c) present how precision and recall change


Fig. 13. A typical acceleration data collected from the dashboard accelerometer.

Fig. 14. Monitoring of vibration response during field study.

Fig. 15. Feature extraction using sliding window.
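The sliding window procedure with 50% overlap, including the padding of an incomplete final window with the smooth-road value g = 9.81 m/s², can be sketched as below. The function name and its parameters are hypothetical, and the stopping rule (stop once a window reaches the end of the record) is an assumption about how the last window is handled.

```python
G = 9.81  # m/s^2, smooth-road value used to pad an incomplete final window


def sliding_windows(samples, size, overlap=0.5, pad_value=G):
    """Split samples into windows of `size` observations with fractional `overlap`."""
    step = max(1, int(size * (1 - overlap)))
    windows = []
    start = 0
    while start < len(samples):
        window = list(samples[start:start + size])
        if len(window) < size:
            # Not enough observations left: fill the remainder with pad_value.
            window.extend([pad_value] * (size - len(window)))
        windows.append(window)
        if start + size >= len(samples):
            break  # this window already reached the end of the record
        start += step
    return windows
```

For a window size of 1250 observations, this step rule advances 625 observations at a time, so every observation except those at the two ends of the record appears in two windows.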

for each anomaly with the window size, and Fig. 18(d) shows the variation of the f1-score for each anomaly. For speed bumps, the precision is 1 at a window size of 1250, while for rutting the precision is 1 within the 920 to 1250 window size range. Potholes, however, have a precision of 0.88 at a window size of 820. Recall also changes with window size for each class: it reaches 1 for class-2 in the range of 1000–1500, is at its maximum for class-3 in the range of 1250–1500, and for class-1 peaks in the range of 50–75. Moreover, the maximum f1-score for classes 1 and 3 was in the range of 1000–1500. For class-2, the first peak was at 750 and a later one appeared in the range of 1250–1500. After analyzing these plots, the optimal window size for the recorded road damages was identified as 1250. This window size was chosen for the extraction of all proposed features, as detailed in Section 5.2. Subsequently, using these features and their corresponding labels, the Random Forest (RF) classifier was trained.

5.3. Results and discussions

The model was trained using the extracted features, and testing was carried out to assess its performance. The relationship between the number of estimators and the key performance metrics (accuracy, precision, F1 score, and recall) is depicted in Fig. 19, with the x-axis representing the number of estimators and the 𝑦-axis indicating the corresponding


Fig. 16. Importance or weight of various features.

Fig. 17. Important features used in the field study.

Table 5
Features used in field study.
Type                      Feature                                          Calculation
Statistical features      Mean (𝜇)                                         $\mu = \frac{\sum_{n=1}^{30} r_n}{n}$
                          Standard deviation (𝜎)                           $\sigma = \sqrt{\frac{\sum_{n=1}^{30} (r_n - \mu)^2}{n - 1}}$
                          Variance (𝜃)                                     $\theta = \sigma^2$
                          Coefficient of variation (𝛼)                     $\alpha = \sigma / \mu$
                          Difference (𝛿)                                   Difference of maximum and minimum values in each window.
Non-statistical features  Capability potential (𝛽)                         Threshold value = 𝑔 ± 0.3𝑔; if the observed acceleration is > threshold, 𝛽 = 0.6, else 0.3.
                          Threshold for standard deviation (𝜏)             If 𝜎 > 0.15𝑔, then the value is 0.9, else 0.2.
                          Threshold for variance (𝜈)                       If 𝜃 > 0.15𝑔, then the value is 0.8, else 0.2.
                          Derived feature1 (𝜂)                             𝜈, 𝛼, 𝜏, and 𝛽. If any three of these exceed their limits, then the value of 𝜂 is 0.8, else 0.2.
                          Threshold for coefficient of variation (𝜒)       If 𝛼 > 0.015, then the value of 𝜒 is 0.8, else 0.2.
                          Derived feature2 (𝜖)                             𝜖 = 𝜈 + 𝜏 + 𝜒 + 𝛽 + 𝜂
                          Derived feature3 (𝛾)                             If 𝜖 ≥ 2, then 𝛾 = 0.8, else 0.2.
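The twelve features of Table 5 can be computed per window along the following lines. This is a sketch of one possible reading of the table, not the authors' implementation. In particular, it assumes that "observed acceleration > threshold" means that any sample in the window lies outside 𝑔 ± 0.3𝑔, and that 𝜂 takes its high value when at least three of the four listed quantities exceed their limits. The function name and dictionary keys are hypothetical.

```python
import statistics

G = 9.81  # m/s^2


def table5_features(window, g=G):
    """Five statistical and seven threshold-based features for one window."""
    mu = statistics.fmean(window)
    sigma = statistics.stdev(window, xbar=mu)  # sample standard deviation (n - 1)
    theta = sigma ** 2
    alpha = sigma / mu if mu else 0.0
    delta = max(window) - min(window)

    # Assumption: any sample outside g +/- 0.3g counts as exceeding the threshold.
    beta_hit = any(abs(r - g) > 0.3 * g for r in window)
    tau_hit = sigma > 0.15 * g
    nu_hit = theta > 0.15 * g
    chi_hit = alpha > 0.015

    beta = 0.6 if beta_hit else 0.3
    tau = 0.9 if tau_hit else 0.2
    nu = 0.8 if nu_hit else 0.2
    chi = 0.8 if chi_hit else 0.2
    # Assumption: eta is high when at least three of (nu, alpha, tau, beta) exceed limits.
    eta = 0.8 if sum([nu_hit, chi_hit, tau_hit, beta_hit]) >= 3 else 0.2
    epsilon = nu + tau + chi + beta + eta
    gamma = 0.8 if epsilon >= 2 else 0.2

    return {"mu": mu, "sigma": sigma, "theta": theta, "alpha": alpha, "delta": delta,
            "beta": beta, "tau": tau, "nu": nu, "eta": eta, "chi": chi,
            "epsilon": epsilon, "gamma": gamma}
```

Under this reading, a perfectly smooth window gives 𝜖 = 1.1 and 𝛾 = 0.2, while a window in which every threshold is exceeded gives 𝜖 = 3.9 and 𝛾 = 0.8.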

metric scores. The final model parameters used to train the RF classifier are presented in Table 6, which lists the key parameters of the model and indicates how they influence the behavior of the Random Forest classifier. 'Bootstrap' was set to 'True', allowing the training data to be sampled with replacement during tree construction. The 'Maximum depth' parameter was set to 80, limiting the depth of individual decision trees. The model relies on 12 features for each split, denoted by the 'Features' parameter. The 'Minimum impurity split' parameter was set to 10, which is the impurity threshold for node splitting.
After the data were collected from the selected road, they were processed and divided into windows, and features were extracted and labeled. At this point, the whole dataset had been converted into feature arrays and the corresponding labels. This data has been split into training data and


Fig. 18. Appropriate selection of window size.
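The sensitivity analysis over window sizes amounts to a simple sweep: evaluate the classifier for every candidate size from 10 to 1500 in steps of 10 and keep the scores for comparison. The sketch below shows only the sweep logic. The `evaluate` callback is a hypothetical stand-in for "extract features at this window size, train the RF model, and return a score", and the toy scoring function is a placeholder, not the measured curve of Fig. 18. The study also inspected per-class precision, recall and f1-score, whereas this sketch ranks by a single score.

```python
def select_window_size(evaluate, sizes=range(10, 1501, 10)):
    """Return (best_size, scores), where scores maps each size to evaluate(size)."""
    scores = {size: evaluate(size) for size in sizes}
    best_size = max(scores, key=scores.get)  # first maximum wins on ties
    return best_size, scores


def toy_score(size):
    """Placeholder score that peaks at 1250; NOT the measured accuracy."""
    return 1.0 - abs(size - 1250) / 1500


best_size, scores = select_window_size(toy_score)
```

In practice the callback would be the expensive part, since each call retrains the classifier on a newly windowed dataset.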

Fig. 19. Convergence study for classifier metrics.

Table 6
Model parameters.
Bootstrap                True
Maximum depth            80
Features                 12
'n' estimators           80
'n' jobs                 1
Random state             0
Minimum impurity split   10

testing data with a ratio of 0.6 and 0.4. The 60% training split was used to train the RF classifier with the final model parameters. The RF model was trained using the features and the corresponding labels, and the 40% test dataset was then given to the trained model for prediction. This test dataset contains 20 speed bumps, 54 rutting instances and 24 potholes. The classifier predicted the TPs as 12, 54 and 20 for speed bumps, rutting and potholes, respectively; the classification report for each anomaly is presented in Table 7. Class 1, representing speed bumps, shows a precision of 1.00, yet a recall of 0.68 indicates that some instances were not identified. Its F1-score, which balances precision and recall, stands at 0.75. Class 2, corresponding to rutting, shows high precision (0.86) and perfect recall (1.00), resulting in an F1-score of 0.93. Class 3, denoting potholes, shows a precision of 0.83 and a recall of 0.82, yielding an F1-score of 0.83.


Table 7
Model performance for classification of various damages.
Class          Precision  Recall  F1-score  Support
1              1          0.68    0.75      20
2              0.86       1       0.93      54
3              0.83       0.82    0.83      24
Accuracy       0.85 (overall)               98
Macro avg.     0.89       0.87    0.84      98
Weighted avg.  0.88       0.89    0.86      98

The performance of the Random Forest (RF) classifier decreased by 12% in the transition from the simulated study to the real-world field study. In the simulation, road damages were replicated with consistent acceleration patterns, assuming damages of the same length and pattern throughout the road. In the real-world scenario, however, the lengths of road damages differed, which affected how long it took for vehicles to pass each type. Several factors contributed to the differences between simulation and reality. Mechanical factors, such as variations in sensor sensitivity despite using better accelerometers, played a role. Vehicle vibrations also added complexity to real-world conditions. Environmental factors, such as the sunny daytime conditions during data collection, might have influenced the accuracy of the classifier.
For a better understanding of the classifier, the precision–recall curve on the test data is shown in Fig. 20. The curve illustrates the relationship between precision and recall of each class at every possible threshold. The cut-off values determine the fraction of true positive or true negative predictions of the model for each class. A model with perfect performance would be able to discriminate between road damage and no damage with 100% recall and 100% precision. Hence, the graph line will pass through the co-ordinates (0, 1) and (1, 1). In the present study, the best performance was achieved for class 2, followed by class 3 and class 1.

Fig. 20. Precision–recall curve for the classification.

ROC curves for the RF model on the test data are shown in Fig. 21 to study the output of the classifier. In this study, the output was binarized to extend the ROC curve to multi-class classification. ROC curves show the separability of the classes over all possible thresholds, i.e., how well the model classifies each class. For a model with perfect performance, the graph line passes through the co-ordinates (0,0), (0,1) and (1,1). From Fig. 21, it is clear that class 2 performed well.

Fig. 21. ROC curve for the classification.

The cost estimation for repairing road damages depends on various factors, including the size, type, and number of damages, along with their location and the current date of repair. This detailed information is crucial to determine the most suitable repair type and the necessary materials. The proposed Random Forest (RF) algorithm, applied to the test data from the selected road segment, provides the number of damages. For a breakdown of unit costs, including materials, labor, and equipment, please refer to Appendices A and B. These appendices detail the repair cost estimation and serve as references for understanding the financial considerations associated with road maintenance strategies. The cost estimation template employed in this study follows the methodology established by Dong, Huang, and Jia (2014). All material rates, labor wages, and equipment hiring charges are sourced from the Central Public Works Department (CPWD), Government of India.

6. Comparison of the proposed classifier with other classifiers

The performance of the proposed RF classifier is compared in Table 8 with other well-established classifiers: Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Naive Bayes, and Decision Tree. These classifiers were selected for comparison based on the complexity of the task, ensuring a meaningful and relevant assessment of performance across the classes. In the evaluation of classification metrics, the proposed RF classifier stands out, showing its effectiveness in classifying various road damages. The classification results show high accuracy, precision, recall, and F1 score for the RF classifier across multiple classes, underlining its predictive capabilities.

Table 8
Classification metrics for different classifiers.
Classifier     Class  Accuracy  Precision  Recall  F1 score
Random Forest  1      0.87      1.00       0.68    0.75
               2      0.87      0.86       1.00    0.93
               3      0.81      0.83       0.82    0.83
SVM            1      0.77      1.00       0.36    0.53
               2      0.81      0.72       0.93    0.81
               3      0.71      0.33       0.50    0.40
KNN            1      0.87      0.82       0.82    0.82
               2      0.81      0.79       0.79    0.79
               3      0.81      0.50       0.50    0.50
Naive Bayes    1      0.81      1.00       0.45    0.63
               2      0.81      0.72       0.93    0.81
               3      0.74      0.38       0.50    0.43
Decision Tree  1      0.74      0.64       0.64    0.64
               2      0.74      0.75       0.64    0.69
               3      0.74      0.38       0.50    0.43

When specifically examining the predictions for road damages, where class-1 corresponds to speed bumps, class-2 to rutting, and class-3 to potholes, the RF classifier consistently excels. With an accuracy of 0.87 for both class-1 and class-2, and an accuracy of 0.81 for class-3,


Fig. 22. Repair of pothole.

the RF model demonstrates its proficiency in distinguishing between different types of road damages. Comparatively, other classifiers like SVM, KNN, Naive Bayes, and Decision Tree exhibit varied performance across different metrics and classes. SVM, for example, achieves high precision but comparatively lower recall for rutting (class-2), indicating potential challenges in capturing all instances of this specific road damage. KNN, while competitive, may encounter issues with precision for class-3, suggesting a tendency for false positives in this category. The Naive Bayes classifier shows competitive results, but its performance is slightly lower compared to RF, particularly in precision and recall for class 1. The Decision Tree classifier performs reasonably well, aligning closely with RF in terms of accuracy and precision. However, it shows lower recall and F1 score for class 2 and class 3. In summary, RF emerges as a robust classifier, showing superior performance in multi-class classification of road damages when compared to SVM, KNN, Naive Bayes, and Decision Tree.

7. Conclusion, drawbacks and future direction

In this paper, a road damage classification algorithm has been developed that utilizes data collected from accelerometers. The collected raw data were processed and cleaned, and features were extracted using a sliding window technique with 50% overlap. Image data were also recorded to label the extracted features. The random forest (RF) was applied for the classification of road damages such as potholes, speed bumps and rutting. The RF model performance was improved by tuning the sliding window size using the accuracy, recall, f1-score and precision. The main conclusions of the present study are as follows:
(i) Using the novel RF-based algorithm, multiple road damages on the selected road segment have been classified accurately.
(ii) The proposed RF algorithm has been validated using both simulation and field studies, with accuracies of 97% and 85%, respectively.
(iii) Additionally, the proposed model was validated against four other models, namely SVM, KNN, Naive Bayes, and Decision Tree. The comparative analysis revealed that the Random Forest model performed best in the classification of multiple road damages. This further supports the effectiveness of the proposed model in comparison to alternative methodologies.
(iv) By utilizing the results obtained from the RF algorithm, the total repair cost of the road damages (potholes and ruts) on the selected road segment has been estimated.
Despite the valuable findings, it is important to acknowledge certain drawbacks. The following are the main drawbacks of the present study, along with suggestions from the authors for future research: (i) The present study collected data under specific environmental conditions, specifically sunny daytime. However, it is impractical to maintain identical environmental factors consistently. Therefore, the authors propose investigating the impact of varied environmental conditions on the algorithm's performance and determining whether adjustments are necessary. This serves as an avenue for further exploration. (ii) The selected road segment in the present study is geographically situated in a suburban locality. The authors recommend collecting datasets from various geographical areas for a more comprehensive analysis. (iii) The present study involved manual labeling using video time stamps from smartphone-collected data, which is a time-consuming process. Therefore, the authors suggest exploring unsupervised learning methods, such as K-means clustering, for road damage classification.

CRediT authorship contribution statement

Revanth Dugalam: Conceptualization, Methodology, Data collection, Investigation, Writing – original draft, Writing – review & editing. Guru Prakash: Conceptualization, Supervision, Investigation, Resources, Writing – review & editing.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Dr. Guru Prakash reports financial support was provided by IITI DRISHTI CPS Foundation. If there are other authors they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

The authors gratefully acknowledge the support provided by the Technology Development Programme of IITI DRISHTI CPS Foundation under the National Mission on Interdisciplinary Cyber-Physical System (NM-ICPS) of the Department of Science and Technology, Government of India and support from the Indian Institute of Technology Indore, Madhya Pradesh, India.

Appendix A. Cost estimation for potholes

Based on the observed acceleration changes on the selected road segment, it was noted that the average depth of potholes ranged from 7 cm to 10 cm. The majority of these potholes exhibited irregular shapes, necessitating the excavation of a rectangular area for repair, as depicted in Fig. 22. In the figure, the solid line represents the actual pothole, and the dotted line outlines the excavated area. For the repair process, throw-and-roll patches with a compacted thickness of 40/50 mm, using bitumen of grade VG-30 at 5.5% (percentage by weight of the total mix), lime filler at 3% (percentage by weight of aggregate), and waste plastic additive at 8% (percentage by weight of bitumen) were prepared in a Batch Type Hot Mix Plant with a capacity


Table 9
Cost details of material, equipment and labour.
Type       Details                                     Unit cost (₹/day or ₹/m³)
Material   Bitumen-aggregate mix                       11499.40/-
           Base coarse material (Aggregate)            1350 + 103.77/- (including transportation)
           Sand                                        1350 + 103.77/-
Labor      2 mates                                     617/-
           2 safety officers at both ends of work site 1000/-
           Operator of vibrating plate compactor       738/-
Equipment  Safety signs                                4533.85 per each sign
           Mechanical broom (hydraulic)                360/-
           Vibrating plate compactor                   400/-
           Cones, vests, shovels, pickaxes             500/-

Table 10
Cost estimation for pothole repair.
Classification  Item  Details of each item                           Cost (₹)
Material        A     Material cost                                  22,260.85/-
Labor           B     Number of men × wage                           3972/-
Equipment       C     Cost of preparation and compaction equipment   10,327.7/-
User delay      D     User delay costs                               0/-
Productivity    E     Average daily productivity                     50 m² per day
                F     Estimated days for patching operation          1
Total cost      G     [𝐴 + (𝐵 + 𝐶 + 𝐷) × 𝐹]                          36,560.55/-
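The total in the last row of Table 10 follows directly from the formula G = A + (B + C + D) × F. A short sketch of that arithmetic, with a hypothetical function name and the values taken from the table:

```python
def total_repair_cost(material, labor, equipment, user_delay, days):
    """G = A + (B + C + D) * F, following the last row of Table 10."""
    return material + (labor + equipment + user_delay) * days


# A, B, C, D in rupees and F in days, as listed in Table 10.
total = total_repair_cost(22260.85, 3972.0, 10327.7, 0.0, 1)
```

Material cost is counted once, while labor, equipment and user delay scale with the number of working days, so a longer patching operation raises only the second term.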

Table 11
Various unit cost used in the cost estimation.
Type       Details                                     Unit cost (₹/hr or ₹/kg)
Material   Aggregate - 18, 12 mm                       1350 + 103.77 per m³
           VG 30                                       39570 per tonne
Labour     4 men                                       617/-
           2 safety officers at both ends of work site 1000/-
           Operator of vibrating plate compactor       738
Equipment  Safety signs                                4533.85 per each sign
           Mechanical broom (hydraulic)                360/- per hr
           Vibrating plate compactor                   400 per day
           Spraying lances                             300
           Cones, vests, shovels, pickaxes             700/-

of 100–120 TPH. The working procedure to fill the pothole with the bitumen mix involves several steps. First, the pothole is swept and cleared of debris. For potholes with a depth exceeding 5 cm, coarse material (aggregates of 18 mm) is filled, followed by the placement of the bitumen mix. A surplus of 40 percent is provided above the roadbed for compaction purposes. The bitumen mix is then compacted to level it with the road surface, and sand is spread over the patch to reduce friction between the cold mix patch and car tires.
Table 9 provides a breakdown of the unit costs associated with the key elements required for the road repair process. It covers materials, labor, and equipment, and lists the expenses related to components such as the bitumen-aggregate mix, base coarse material, labor wages, and essential equipment such as vibrating plate compactors and safety signs. The table serves as a reference for the individual costs of each element involved in the road repair and clarifies the financial considerations associated with the project.

Appendix B. Cost estimation for rutting

In this study, rutting, encompassing issues such as depressions, pavement surface uplift, and undulations, has been examined alongside potholes and speed bumps. To address these types of damage, it is crucial to select a specific area around the damage to level the road. Based on acceleration data, rutting damage is more likely to occur over an average length of 1 meter on the chosen road segment. The repair procedure involves marking the area to be repaired, cleaning it using a mechanical broom, spreading 18 mm aggregates, and compacting them with a vibrating plate compactor. Subsequently, VG 30 is sprayed at a rate of one kilogram per square meter. Following this, 12 mm aggregates are spread and compacted, and resealing binder is sprayed at a rate of 0.4 kilograms per square meter. It is imperative to ensure that the repaired surface is 10 mm higher than the original road level. Table 11 presents the unit costs used in the cost estimation for repairing rutting damage. It categorizes costs into material, labor, and equipment, specifying unit costs in Indian Rupees per hour or per kilogram. The table includes details such as the cost of 18 mm and 12 mm aggregate per cubic meter, VG 30 bitumen per tonne, labor wages for 4 men, and
Table 10 has presented a meticulous cost estimation for pothole the costs of equipment like safety signs, hydraulic mechanical broom
repair, systematically categorizing expenses into material, labor, equip- per hour, vibrating plate compactor per day, and spraying lances. A
detailed breakdown of the estimated costs associated with repairing
ment, user delay, and productivity. The total cost has been calculated
typical rutting damage has been conducted using these materials, labor,
by aggregating material costs with the product of labor, equipment,
and equipment, and the results are presented in Table 12.
and user delay costs, multiplied by the estimated days for the patching
The classification includes material costs (A), which encompass ex-
operation. The presented total cost of 36,560.55/- in Indian Rupees penses related to procuring necessary materials, totaling |13,912.56/-.
signifies a comprehensive assessment, addressing the complexities of Labor costs (B) are calculated based on the number of men and their
material composition, labor requirements, and equipment expenses wages, resulting in |5206/-. Equipment costs (C) cover expenses re-
crucial for an efficient and sustainable pothole repair approach. lated to the preparation and compaction equipment, amounting to


Table 12
The estimated cost of repairing typical rutting damage.
| Classification | Item | Details of each item | Cost (₹) |
|---|---|---|---|
| Material | A | Material cost | 13,912.56/- |
| Labor | B | Number of men × wage | 5206/- |
| Equipment | C | Cost of preparation and compaction equipment | 10,327.7/- |
| User delay | D | User delay costs | 0 |
| Productivity | E | Average daily productivity | 50 m² per day |
| | F | Estimated days for patching operation | 2 |
| Total cost | G | A + (B + C + D) × F | 45,179.96/- |

₹10,327.7/-. User delay costs (D) are specified as ₹0, indicating no
additional expenses in this category. Productivity factors (E) are
considered with an average daily productivity of 50 m² per day. In this
analysis, the anticipated duration for the patching operation (F) has
been predetermined at 2 days. The total cost (G) is calculated using
the formula [A + (B + C + D) × F], culminating in a cost estimate of
₹45,179.96/-.
