
Wireless-Signal-Based Vehicle Counting and Classification in Different Road Environments

This document presents a study on using Wi-Fi signals for low-cost vehicle counting and classification. The study aims to address limitations in previous work by developing an algorithm to automatically count vehicles, comparing the performance of 2.4GHz and 5GHz Wi-Fi frequencies, and systematically evaluating different machine learning models for vehicle classification. Data was collected from three different road environments in Münster, Germany. Vehicle classification accuracy ranged from 83% to 100% depending on the method and road scenario. The 5GHz Wi-Fi frequency provided better results than 2.4GHz overall. The results suggest Wi-Fi-based techniques can enable cost-effective traffic monitoring in a privacy-preserving way.


Received 5 August 2021; revised 17 December 2021; accepted 17 March 2022.

Date of publication 21 March 2022; date of current version 31 March 2022.


Digital Object Identifier 10.1109/OJITS.2022.3160934

Wireless-Signal-Based Vehicle Counting and Classification in Different Road Environments
RAOUL KANSCHAT¹, SHIVAM GUPTA², AND AURIOL DEGBELO¹

¹ Institute for Geoinformatics, University of Münster, 48149 Münster, Germany

² Bonn Alliance for Sustainability Research, University of Bonn, 53113 Bonn, Germany

CORRESPONDING AUTHOR: A. DEGBELO (e-mail: [email protected])

ABSTRACT Traffic monitoring is key to modern city planning. However, the costs associated with monitoring devices limit the large-scale deployment of existing traffic monitoring systems. In this article, we propose and evaluate an algorithm to automatically count the number of vehicles that have passed through a low-cost system for traffic monitoring. The system uses deviations in Wi-Fi signal strength to predict the presence of a vehicle on the road and its type (car, bus). The study further systematically compares six analytical techniques for the classification of detected vehicles. The methods were tested with data from three road scenarios in the city of Münster, Germany. Vehicle classification accuracy ranged from 83% up to 100% in our study. We also observed that the higher Wi-Fi frequency (5 GHz) was superior to 2.4 GHz for both vehicle detection and vehicle classification. The results suggest that the Wi-Fi-based techniques proposed in this study are promising for cost-efficient, privacy-preserving traffic monitoring in cities.

INDEX TERMS Low-cost, machine learning, road traffic monitoring, smart cities, vehicle classification,
vehicle counting, Wi-Fi.

I. INTRODUCTION

With the growing population and traffic in urban areas, there is a need to organize and improve road traffic efficiently. Numerous problems arise when road traffic is not efficiently planned, such as increased traffic jams, high gasoline consumption, air pollution, and noise pollution [1], [2], [3]. Poor management of traffic flow may result in extended waiting times at traffic lights or in congestion that affects gasoline consumption as well as drivers' mood and behavior. Access to timely, reliable traffic data with high spatial coverage of cities can help to minimize or address these problems. Information about the speed and direction of a vehicle, or about the number of vehicles that use specific roads, can be used to improve and redirect traffic flow. These data can further feed more complex spatiotemporal models that infer harmful noise levels or air pollution in cities. Additional application areas of timely, high-granularity traffic data include road safety, traffic control systems, statistics, economic development, and federal reporting, all of which could benefit significantly from increased data availability.
Despite these potential benefits of timely, high-granularity traffic monitoring data, the costs of installing and maintaining existing traffic monitoring techniques (e.g., inductive loop detectors, video analysis) limit their large-scale deployment. For instance, inductive loops cost at least 5,000 €.¹ The Georgia Department of Transportation has stated in a report that installing a Continuous Counting Station (CCS) on a two-lane roadway costs approximately $25,000 and can go up to $80,000 [4]. There is thus a need for low-cost techniques that produce reliable traffic data. Along these lines, Gupta et al. [5] recently proposed a traffic monitoring approach that uses Wi-Fi signals to detect vehicles. Their approach was low-cost (they indicated a cost of less than $50), and they reported promising results for vehicle classification using

The review of this article was arranged by Associate Editor Winnie Daamen.
¹ https://ct-technologyinfo.com/2020/11/09/traffic-detection-systems/ (last accessed: July 31, 2021).

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

236 VOLUME 3, 2022


Wi-Fi signals. Nonetheless, their work had a few limitations. First, the vehicle counting strategy used led to multiple false positives. It is desirable to avoid such false positives as much as possible, so that vehicle classification is performed on relevant entities instead of noise. Second, the scenario of multiple vehicles passing at the same time was not covered. Third, their work reported that the k-nearest neighbor (kNN) technique led to good classification accuracies, but it is unclear whether other models would have performed better. These limitations are addressed in this work, with a focus on the following four research questions.
1) How can the counting of vehicles on the road be automated based on deviations of wireless signals? The emphasis here is on increasing the precision of vehicle counting strategies.
2) What are the respective merits of different types of signals (5 GHz vs. 2.4 GHz) for detection and classification tasks?
3) How can the type of vehicles be classified automatically based on deflections of wireless signals? Here, the aim is to systematically compare several models for the classification task.
4) To what extent can models learned in one road environment be applied to other road environments? This question concerns the generalization of the models. ‘Other road environments’ here can denote the same road at another time (in which case we speak of temporal generalization) or an entirely new road (in which case we speak of spatial generalization).
The key contributions of this article include (i) an algorithm for automated vehicle counting based on Wi-Fi signals; (ii) a comparative evaluation of the 2.4 GHz and 5 GHz signals for vehicle counting and classification; and (iii) a comparative evaluation of different classification algorithms for analyzing Wi-Fi signals. The rest of the paper is organized as follows. Section II gives a brief overview of related work, followed by Section III, which describes the methods used. The performance of six analytical approaches for classifying vehicle types is described in Section IV. A discussion of the results and their implications is presented in Section V before Section VI concludes the article.

II. RELATED WORK
Gupta et al. [5] proposed grouping traffic monitoring techniques into five classes: intrusive devices, non-intrusive devices, off-roadway devices, sensor combination devices, and relatively low-cost devices. Related work in these categories is briefly reviewed in this section.
Intrusive devices: These devices are permanently installed in the pavement. They have high accuracy but also high installation and maintenance costs. Inductive loops, magnetic detectors, micro-loop probes, pneumatic road tubes, piezoelectric sensors, and other weigh-in-motion sensors are examples of intrusive devices. Barbagli et al. [6] described the disruption of traffic during installation and repair as a major drawback of intrusive devices, along with their increased installation costs. They state that “as a result, those solutions are not suitable for large-scale deployment and hence are restricted to small scale applications” [6].
Non-intrusive devices: These devices are “more reliable and cost-effective” [6] than intrusive devices. Examples include technologies such as video image processing, microwave radar, laser radar, passive infrared, ultrasonic, and passive acoustic arrays, in which devices are mounted overhead on roadways or at roadsides (see [5]). Recently, Asiain and Antolín [7] developed a Low Power Wide Area Network (LPWAN) based system for detecting traffic flow. However, like intrusive devices, most non-intrusive devices also have the downside of being expensive and energy intensive, as well as being affected by the environment (e.g., they are prone to errors when environmental conditions such as weather or time of day change, see [6]).
Off-roadway devices: These devices use remote sensing techniques for traffic monitoring. These techniques include aircraft and satellite monitoring as well as tracking phones or using probes within the vehicles. They are cheap and easy to deploy and offer a high spatial resolution, but they raise privacy concerns. Chourasia et al. [8] proposed Wi-Fi-based roadside sniffers that examine signals broadcast by smartphones inside vehicles to estimate traffic statistics for road segments. Hoogendoorn et al. [9] conducted a study using grayscale imagery recorded by a camera mounted on a helicopter. The study drew “insight into the behavior of drivers during [...] congestion, and to develop and test theories and models describing congested driving behavior”. The setup could detect and track 98% of the vehicle positions and their dimensions, with a spatial resolution of 22 cm and a temporal resolution of 8.6 Hz. Overall, the outcome of their study suggests that the weather had a significant impact on the quality of the data collected. The helicopter was also affected by wind strength, which in turn influenced image quality. Another study by Schreuder et al. [10] found a drop in reliability from 98% of cars detected to 90% “after the weather conditions worsened.”
Counting vehicles with probes, which uses Global Positioning System (GPS) signals to track the vehicles, also belongs to the off-roadway devices class. The benefit of this approach is that it can easily be applied to existing cars with GPS sensors and works with high accuracy. However, its obvious downside is the privacy concerns that come with it. Even with anonymous data collection, data mining algorithms have been shown to infer where individuals live (see [11]). Nanthawichit et al. [12] proposed a method for treating probe vehicle data together with fixed detector data in order to estimate the traffic state variables of traffic volume, space mean speed, and density using a macroscopic model along with the Kalman filtering technique (KFT). The model combines data collected by probe vehicles and conventional data in a microscopic traffic flow model. The KFT was used to handle



KANSCHAT et al.: WIRELESS-SIGNAL-BASED VEHICLE COUNTING AND CLASSIFICATION

the inconsistencies of data collected by the probe vehicles. The researchers also reported experimental results on traffic prediction, which confirm “that the proposed method can provide reasonable estimation not only for traffic states but also, [...], travel time can be effectively estimated and predicted” [12]. Recently, Ryan et al. [13] and Byun et al. [14] proposed using unmanned aerial vehicles (UAVs) to estimate road traffic and vehicle speed automatically by analyzing video feeds with machine learning approaches such as deep neural networks.
Sensor combination devices: These devices combine multiple traffic monitoring techniques, such as passive infrared with ultrasound and Doppler microwave radar, to enhance their overall accuracy. Even though they achieve higher accuracy, these combinations are often highly complex to install, which makes it challenging to deploy them for data collection with high spatial coverage.
Relatively low-cost devices: Examples of systems in this category mentioned in Gupta et al. [5] were continuous-wave radar, low-cost computer vision sensors, and radio-wave technologies. Some of their disadvantages include the need for specialized hardware and procedures, limited computation capability for large dataset analysis, and privacy concerns. Forren and Jaarsma [15] proposed a low-cost system that uses acoustic sensors to monitor traffic. The study used microphones to count vehicles by analyzing their tire noise. Sen [16] investigated traffic monitoring of non-lane-based chaotic traffic using road noise and the excessive use of vehicle honks. The author used the noise level, the number of honks, and their duration to check for congested roads. Two microphones were placed along the road to measure the noise variation over time and distance. The system was able to detect different traffic conditions in real time. However, the study also acknowledged that honking depends on driver behavior: chaotic traffic jams result in many loud honks, but sometimes road users also queued up in a traffic jam, which resulted in a quiet but still congested traffic situation. More recently, Kochláň et al. [17] proposed a low-cost vision-based traffic monitoring system using a Raspberry Pi and a high-definition camera connected to a car battery through a step-down transformer, together with an antenna to send the data to a remote server. The authors did not state the overall cost of the system. They reported that the system had low power consumption, since it was able to run on a single car battery for over one week. The computer vision algorithms were able to detect 95.7% and classify 93.2% of the vehicles. Ryan et al. [13] also proposed a computer-vision-based approach assisted by small unmanned aerial systems (sUAS) to capture detailed vehicle data.
A novel approach based on dedicated short-range communications (DSRC) signals to measure and classify traffic demand was introduced by Tulay and Koksal [18]. DSRC is a method whereby vehicles communicate with other vehicles without the driver being aware of the communication. This has the advantage that no wired infrastructure is needed to deploy the system. They used a static roadside transmitter, which captured the signals of passing cars, and were able to distinguish different traffic intensities with accuracies of 96.3% and 87.6% [18]. Another approach, which uses a microwave Doppler radar sensor connected to a Raspberry Pi 3 Model B, was proposed by Czyżewski et al. [19]. Their algorithm's results were compared with a pneumatic tube counting system deployed on the same road. The study achieved an accuracy of 90% for vehicle counting, and the authors stated that this approach would be sufficient to collect traffic statistics. One problem the authors acknowledged was the difficulty of detecting vehicles at high velocity (greater than 100 km/h).
Wi-Fi channel state information (CSI) was used in field studies conducted by Zhang et al. [20] and Won et al. [21] to detect passing vehicles. Accessing the CSI requires particular chipsets and firmware, which were not available in the generation of Raspberry Pi used in this study. Furthermore, this study focuses on comparing the two frequencies and the different classification algorithms. CSI is also used for indoor human gesture detection. They achieved an “average detection accuracy of 99.4% and an average classification accuracy of 91.1%.” When a vehicle passes, the system detects peaks with a standard outlier detection technique using a CNN. Homchan and Aswakul [22] further proposed a Wi-Fi packet measurement-based approach for vehicular traffic monitoring using a software-defined mesh network.
Summary: Overall, many efforts are underway to develop novel methods to count and classify traffic in a transport system. The central objective has been to reduce the cost of an individual counting station to allow large-scale deployment by using existing infrastructure or cheap sensor technology. Our work is in line with this objective. In particular, as mentioned in Section I, we extend the work of Gupta et al. [5] and address several of its limitations to develop a more automated and reliable Wi-Fi-based approach for vehicle detection and classification.

III. METHOD
This section presents an overview of the data and methods used in the study.

A. DATA COLLECTION SCENARIOS
The focus of this study was on investigating the performance of the system in urban areas, and three road types that are common in German cities were chosen to provide a realistic test setting. The first type is a one-way road, where vehicles are only allowed to drive in one lane and in one direction. The second type is a road with two lanes and vehicles driving in one direction, and the third type is a road with two lanes where vehicles drive in both directions. The speed limit in all scenarios is 50 km/h. Highways were excluded at this point for safety reasons. The measurements took place in November and December 2020



in the city of Münster (North Rhine-Westphalia, Germany). The data collection activities were approved by the city council of Münster and required the deployment of additional traffic signs to ensure the safety of drivers. The three scenarios are described in detail below.
Scenario 1 - The one-way street: For this scenario, the Austermannstraße was chosen. It is a busy road in the city, with cars and buses as the most common vehicle types passing. The two units of the hardware system were placed approximately 6 meters apart across the road. Two measurements were collected on this road, on 16.12.2020 and 17.12.2020. The first-day measurement started at 16:00 and ended at 19:00, with weather conditions of 9 °C, cloudy, 87% relative humidity, and 1015 hPa. The second-day measurement started at 16:30 and ended at 19:30, with weather conditions of 10 °C, cloudy, 79% relative humidity, and 1021 hPa. Sunset was at 16:17 for the first measurement and at 16:18 for the second.
Scenario 2 - Two lanes one direction: Here, the Corrensstraße was chosen. It is a large but not busy road in front of a university building. The most common vehicle types using this road were cars and buses. The hardware units were approximately 11 meters apart across the road. Three measurements were collected, on 19.11.2020 (16:15-17:30), 24.11.2020 (17:00-18:00), and 25.11.2020 (17:20-18:40). The sunset times for the measurements were 16:32, 16:26, and 16:25, respectively. One noteworthy observation for this scenario was that a few buses drove extremely slowly through the system, because the hardware installation was located close to a bus stop and the buses needed some time to accelerate after starting. The weather conditions were 2 °C, cloudy, 85% relative humidity, and 1027 hPa for the first measurement; 0 °C, cloudy, 90% relative humidity, and 1020 hPa for the second; and 5 °C, cloudy, 82% relative humidity, and 1018 hPa for the third.
Scenario 3 - Two lanes two directions: The third scenario, with two lanes and two directions, was located at the Mendelstraße, with the two parts of the installation approximately 10 meters apart across the road. The two most frequent vehicle classes on this road were cars and buses, with the possibility of two cars passing at the same time. Two measurements of 2.5 hours each were collected, on 08.12.2020 (starting 16:00) and 09.12.2020 (starting 16:20). Sunset was at 16:17 for both measurements. The weather conditions were 1 °C, foggy, 97% relative humidity, and 1006 hPa for the first measurement and 0 °C, foggy, 100% relative humidity, and 1010 hPa for the second.

FIGURE 1. Graphical representation of the installed system and its working (Icons made by Freepik.com).
FIGURE 2. A picture of the system's sending unit. The different hardware components (Router, laser pointer, camera and power supply) are labeled.

B. DATA COLLECTION APPROACH
1) SENDER AND RECEIVER
The data were collected using a system extended and much improved from [5]. A key novelty here was the addition of the 5 GHz frequency. The sending unit consists of multiple elements, which are shown in Figure 2. A “TP-Link WLAN-Router model Archer C7 AC1750” was used, which supports the dual-band frequencies 2.4 GHz and 5 GHz. The router's SSIDs were separated into the 2.4 GHz and 5 GHz frequencies to allow the receiver to distinguish between the two signals. The receiver, which is displayed in Figure 3, consists of a Raspberry Pi 3 B, two USB Wi-Fi dongles, two dual-band directional antennas, and a light sensor. The whole installation is illustrated in Figure 1, and the annotated configuration of the installation is presented in Figures 2 and 3.
The signal strength of the frequencies was measured in decibels (dB). The setup allowed us to capture the interruptions in the signal caused by vehicles driving between the sender and the receiver on the road.
In contrast to the systems discussed in the previous section, our prototype can be installed within a couple of hours without stopping or disrupting the traffic flow on the road. To install the system, the two units need to be placed on opposite sides of the road. As the hardware system can be accessed and restarted without interfering with traffic, detecting errors and restarting the system can be performed remotely.


FIGURE 3. A picture of the receiving unit of the system. The different hardware
components (directional antennas, Raspberry Pi, light sensor, camera and power
supply) are labeled.

As it consists of multiple components, defective parts can be replaced individually with no effect on traffic flow, which makes the system promising in terms of easy maintenance and cost-effectiveness. Considering the safety approvals required before installation and the fact that data collection can only be initiated with some programming skills, the system in its current state is not suitable for citizen science initiatives. The approximate cost of the installation was 200 €, which is much cheaper than state-of-the-art monitoring systems [4] or even newer approaches [21].

2) GROUND TRUTH
A laser tripwire system was used to collect ground truth data on whether and when vehicles passed. In addition, the types of passing vehicles were recorded by cameras mounted close to the ground (at most 20 cm above it). That way, only the wheels of the vehicles were recorded, ensuring that no personal data about the drivers was collected; this was a mandatory requirement of the local authorities. The video data were subsequently annotated manually with the class of each passing vehicle to generate the ground truth data.

FIGURE 4. Visualization of the change in signal strength in dB when a car drives through the system. Top: 2.4 GHz signal strength; Middle: 5 GHz signal strength; Bottom: Laser value (0 = the laser is not interrupted, 1 = the laser is interrupted).
A subsequence of the data is shown in Figure 4. The visualization shows an extract of the different time series collected while a car drove through the system. In both frequencies, a noticeable drop in signal strength is visible. Corresponding to the signal drop, the laser was interrupted, as can be seen in Figure 4 (bottom). The system, which recorded 10 data points per second, registered three data points of laser interruption, which correspond to three data points of signal drop at 5 GHz (middle) and four at 2.4 GHz (top).

C. VEHICLE COUNTING APPROACH
As mentioned in Section I, one of the aims of this work is to reduce noise during the vehicle detection task. In previous work [5], peaks were used to detect vehicles. A peak starts when the Wi-Fi signal strength drops by more than a given threshold and ends when it recovers to within the threshold of the starting dB value. A peak thus has a time window, i.e., the duration between its start and end. Because the main idea of Gupta et al.'s [5] method was to use the point in time at which the signal recovers (i.e., returns to its value before the deflection) as the end of a peak, we refer to their approach here as the recovery-based method of peak extraction. A threshold of 2 dB was used for peak detection in [5], and the same value was used in this work when implementing the recovery-based approach as a baseline for comparison. One weakness of the recovery-based approach



is its limited ability to detect more complex patterns in signals. For example, when a bus passes the system, the signal strength sometimes recovers momentarily and then drops again, causing the algorithm to detect two different buses instead of one.

TABLE 1. Comparison of the peak extraction methods: NaV = Not a vehicle, indicating a false positive.
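To make this weakness concrete, the recovery-based baseline can be sketched in a few lines of Python. This is a minimal illustration, not the code of [5]. It assumes that the signal is a list of dB readings sampled at 10 Hz, that a peak starts when a reading falls more than the threshold (2 dB) below the preceding reading, and that the peak ends once the signal is back within the threshold of that pre-drop value. The function name `recovery_peaks` and the bus trace are hypothetical.

```python
def recovery_peaks(signal, threshold=2.0):
    """Recovery-based peak extraction (illustrative sketch).

    Returns a list of (start, end) sample indices; `end` is the first
    sample at which the signal has recovered, i.e., it is exclusive.
    """
    peaks = []
    start = None      # index where the current peak began
    reference = None  # signal value just before the drop
    for i in range(1, len(signal)):
        if start is None:
            # Assumption: a peak starts when the reading falls more than
            # `threshold` dB below the preceding reading.
            if signal[i - 1] - signal[i] > threshold:
                start, reference = i, signal[i - 1]
        elif reference - signal[i] <= threshold:
            # The signal is back within `threshold` dB of its pre-drop value.
            peaks.append((start, i))
            start = None
    if start is not None:  # peak still open at the end of the trace
        peaks.append((start, len(signal)))
    return peaks


# Hypothetical bus trace (dB): the signal briefly recovers mid-bus.
bus = [-50] * 6 + [-55, -56, -51, -55, -56] + [-50] * 6
print(recovery_peaks(bus))  # prints [(6, 8), (9, 11)]
```

The momentary recovery at index 8 (-51 dB, within 2 dB of the pre-drop -50 dB) closes the first peak, and the next drop immediately opens a second one. This reproduces the double counting of a single bus described above.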
To improve on the threshold method and make it more robust to fluctuations, we use a new method called median-based peak extraction. We first compute the median of the time series, which we then use as a reference value. In addition, a time window of three data points is used to allow the detection of more complex patterns in the signal stream. With this approach, a new peak is identified when the mean signal strength within the window differs from the median by more than 1.5 dB. The peak ends when the mean signal strength of the three data points recovers to within 1.5 dB of the median. The threshold of 1.5 dB was chosen after multiple iterations during the data analysis. Table 1 shows the results across all three scenarios. Note that the detection algorithm does not perform any classification at this stage. Since the outcomes of the extraction step are the input values for the classification methods, the detection algorithm determines the maximum number of vehicles that can potentially be classified. The vehicle classes were assigned to the peaks manually to generate the data in this table. As the table illustrates, median-based peak extraction allowed more complex and robust peak extraction and was less prone to detecting noise across scenarios. Nonetheless, it might still be prone to noise errors when the idle signal strength changes or when a vehicle stands between the sensors for a long time. In all three scenarios, the median-based approach reduced the number of NaV (false positives) dramatically, leading to substantial improvements in precision and overall F-score, especially when applied to 5 GHz signals.
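The median-based method can likewise be sketched in Python. Again, this is an illustration under stated assumptions rather than the authors' implementation. The deviation from the median is taken as an absolute difference, the three-point window slides one sample at a time, and peaks are reported as window-start indices. `median_peaks` is a hypothetical name, and the trace is a toy bus signal with a momentary recovery, not measured data.

```python
from statistics import mean, median


def median_peaks(signal, threshold=1.5, window=3):
    """Median-based peak extraction (illustrative sketch).

    Returns (start, end) pairs of window-start indices: `start` is the first
    window whose mean deviates from the median by more than `threshold` dB,
    `end` is the first window that is back within `threshold` dB.
    """
    reference = median(signal)  # series-wide reference value
    peaks = []
    start = None
    for i in range(len(signal) - window + 1):
        # Assumption: deviation is measured as an absolute difference.
        deviates = abs(mean(signal[i:i + window]) - reference) > threshold
        if deviates and start is None:
            start = i
        elif not deviates and start is not None:
            peaks.append((start, i))
            start = None
    if start is not None:  # peak still open at the end of the trace
        peaks.append((start, len(signal) - window + 1))
    return peaks


# Hypothetical bus trace (dB) with a momentary recovery in the middle.
bus = [-50] * 6 + [-55, -56, -51, -55, -56] + [-50] * 6
print(median_peaks(bus))  # prints [(4, 11)]
```

Averaging over three points smooths out the momentary recovery at index 8: every window that covers it still averages 4 dB below the median. As a result, the bus yields a single peak. A fixed, series-wide median is also the method's weak point noted above. If the idle signal level drifts, or a vehicle stays between the units for a long time, the reference no longer represents the empty road.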

D. VEHICLE CLASSIFICATION TECHNIQUES


Once the peaks had been extracted from both frequencies with the new approach discussed above, the next step was to classify them. The goal here is to determine the type of vehicle that caused the disturbance in the Wi-Fi signal. To accomplish this, different algorithms were used and compared. First, the k-nearest neighbor classifier used in previous work [5] was implemented as a comparison baseline. Afterwards, different preprocessing techniques were applied to the data, and various parameters for the kNN were investigated. In addition, a Random Forest classifier was used to classify the peaks. Finally, a new approach using the matrix profile was implemented in our study for vehicle classification. A brief description of each method, as well as the rationale for its choice, is presented next.
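The kNN steps described in the following subsection (peak features, standardization, Euclidean distance, and a majority vote among the k = 3 nearest training peaks) can be sketched from scratch in Python. The authors used scikit-learn for standardization and classification. The sketch below instead writes out each step without third-party libraries so that every step is visible. Two details are assumptions not fixed by the text: the Euclidean-distance feature is taken as the sum of absolute differences between consecutive readings of a peak, and amplitude is measured against a known idle baseline. The function names and the toy peaks are hypothetical and are not data from the study.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def peak_features(segment, baseline):
    """Features of one extracted peak: duration (samples), amplitude
    (maximum deflection from the idle baseline, dB) and the summed
    distance between consecutive readings (assumed 'ED' feature)."""
    duration = len(segment)
    amplitude = max(abs(baseline - v) for v in segment)
    ed = sum(abs(b - a) for a, b in zip(segment, segment[1:]))
    return [duration, amplitude, ed]


def fit_standardizer(rows):
    """Zero-mean, unit-variance scaling fitted on the training rows
    (population standard deviation, as in scikit-learn's StandardScaler)."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard constant features
    return lambda row: [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, sample, k=3):
    """Majority vote among the k training samples closest in Euclidean distance."""
    nearest = sorted(zip((math.dist(x, sample) for x in train_x), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy peaks (dB readings at 10 Hz), not measurements from the study.
baseline = -50.0
labelled = [
    ([-53, -55, -53], "car"),
    ([-52, -54, -52], "car"),
    ([-53, -54, -54, -52], "car"),
    ([-55, -58, -59, -58, -57, -55, -53], "bus"),
    ([-56, -59, -60, -59, -56, -54], "bus"),
]
raw_x = [peak_features(seg, baseline) for seg, _ in labelled]
train_y = [label for _, label in labelled]
scale = fit_standardizer(raw_x)  # fitted on training data only
train_x = [scale(row) for row in raw_x]

new_peak = [-54, -58, -60, -58, -55]
print(knn_predict(train_x, train_y, scale(peak_features(new_peak, baseline))))  # prints bus
```

With k = 3, the two bus peaks and one car peak are the nearest neighbors of the new peak, so the vote returns "bus". The pipeline described in the text also uses a 70/30 train/test split and oversampling of the minority classes. This sketch omits both.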

1) K-NEAREST NEIGHBOR
K-nearest neighbor is a “simple but effective” [23] classification method, suitable as a basis of comparison for subsequent classification algorithms because it “should be one of the first choices for a classification study when there is little or no prior knowledge about the distribution of the data” [24]. This supervised machine


learning technique [25] learns by simply storing the training samples and their features. Classification is then based on the Euclidean distance between training and test samples [24]. For each sample to be classified, the method computes the Euclidean distances between that sample and all training samples. The k training samples closest to the tested sample then determine the predicted class by majority vote [26].
In a first step, the peaks were extracted from the two frequencies (see Section III-C). The amplitude and the length of each peak define the features used to classify the vehicles; these are the same features used in previous work [5]. The amplitude is the maximum deflection of the signal strength during the peak. The length of the peak indicates how long the signal needed to recover to its standard strength. The resulting data are then split into 70% training data and 30% testing data, the kNN is trained, and the test data are classified. After testing the values 3, 5, and 7 for k, the value 3 was chosen because it resulted in the highest accuracy.
To enhance this method further, an additional preprocessing step first applied feature scaling to the input data. The features were normalized, because differences in their scales can affect the overall accuracy of the kNN. The standardization method from scikit-learn's preprocessing package [27] was used for this step. Second, the Euclidean Distance (ED) of the peak was added as a third feature for the kNN classification. The ED was computed by adding up the distances between each of the data points of a peak.
Since more cars than vehicles of other classes passed the sensors during the measurements, an oversampling technique was also used to improve the accuracy of the kNN further. Chawla et al. [28] stated that a class imbalance is present when the number

robust classification. The number of classifiers, which are decision trees in the case of random forest, determines the accuracy and computation time of the algorithm; more trees lead to more computation time [32]. For each new instance, the individual trees vote on a class, and the decision is based on the majority of votes [33]. Previous work [34] reported that random forest models perform well on a wide variety of classification problems. We chose to use random forest in our study because many weak classifiers together usually perform better than a single classifier [35], and because, in contrast to kNN, this method is efficient and able to “operate quickly over larger datasets” [33].
In our study, the random forest used the same input data as the new kNN approach to make a comparison possible. First, the peaks for both frequencies are extracted using the new counting method. The features used are peak duration, maximum amplitude, and Euclidean distance. The features were also normalized. The dataset was then split into the same training and testing sets, using the same random seed as the kNN classification, so that the results are comparable. scikit-learn's random forest classifier was then used with 100 trees. Increasing the number of trees up to 512 did not change the classification accuracy.

3) MATRIX PROFILE
The matrix profile technique was proposed in 2016 by Yeh et al. [36] and is based on the all-pairs-similarity-search. This method, also known as similarity join, retrieves the nearest neighbors for each object in a data collection. In the context of time series analysis, the method can be used to identify patterns or outliers. Yeh et al. [36] created a “simple, fast, parallelizable and parameter-free” algorithm called Scalable Time series Anytime Matrix Profile (STAMP), which uses the concept of the matrix profile. At the time of its publication in 2016, their algorithm was the fastest at detecting motifs and anomalies in a time series. We found this tech-
of instances of the different classes are not approximately nique to be relevant and worth of exploration in this study, as
equal. This class imbalance problem results in wrong Wi-Fi-based signals can be modeled as time series. To utilize
detection of the dominant class which in this case are the matrix profile, the first step is to count the peaks in the
cars [29]. The fact other vehicle classes occur less often time series. A peak in the context of the matrix profile would
corresponds to the explanation by Chawla et al. [28] that be defined as a discord or anomaly. However, multiple peaks
“real-world data sets are predominately composed of “nor- with similar patterns occur within the time series which is
mal” examples with only a small percentage of “abnormal” the reason why the algorithm is not able to detect them as
or “interesting” examples” [29]. The Synthetic Minority an anomaly. This is called the “twin freak problem” [37].
Oversampling Technique (SMOTE) provided by the python This problem was investigated by He et al. [37] suggesting
package “imblearn.over_sampling” [30] was used to calcu- that the algorithm “fails to identify rare subsequences when
late the additional samples of the minority classes. Only it occurs more than once in the time series.”
classes with a minimum instance occurrence of 4 were Due to the “twin freak problem,” the matrix profile can-
included. Tests with the oversampling technique did not not simply be used for the peak extraction and classification.
result in significant improvements, and are not reported in Therefore a different approach was used in our study. The
this article. python library “STUMPY” [38] provides different algorithms
for motif and discord detection using the concept of the
2) RANDOM FOREST matrix profile. The initial step is to split the data into a
Random forest is a supervised machine learning classifica- training set which was chosen to be 70% of the data and
tion and regression method that uses a divide and conquer a testing set with the remaining 30%. Next, the new peak
approach [31]. Multiple classifiers are used to achieve a extraction was used on the testing dataset, and a distance

242 VOLUME 3, 2022


profile to the training dataset was computed for each peak, which is essentially an all-pairs-similarity-search. The global minimum of this distance profile is then sought. The result is the index of the subsequence in the training dataset that is most similar to the extracted peak. If a class was annotated within the length of that subsequence, it was assigned as the predicted class for the peak.

To make the detection more robust, not only the global minimum of the distance profile was used, but the three lowest minima in conjunction with majority voting. However, combining the counting method with the pattern search in this way leaves out a major advantage of the matrix profile: all the previous methods need a separate counting step and classification step, whereas the matrix profile offers the possibility to combine those steps. A first attempt at utilizing this advantage while avoiding the “twin freak problem” is proposed in the following.

The aim was to first manually extract the peaks from the training dataset. This extraction uses the same technique that was used to find the ground truth of vehicles. Therefore, no further changes had to be made to the data, and the extraction is not prone to errors other than those made during data annotation or caused by the sensors. For each manually extracted peak, the distance profile to the testing dataset is then computed. Thereafter, all local minima are extracted from the distance profile. For each minimum, a vote consisting of the extracted peak’s class and the distance profile’s value at that index is assigned to the same index of the testing dataset. The votes then form clusters on the testing dataset. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) was used for robustness and to ignore some outliers. Each cluster contains votes for classes and has to be assigned to one peak (see Figure 5).

FIGURE 5. Matrix profile clustering approach. The signal strength of the 5 GHz frequency is visualized in blue. The vehicle classes shown in red indicate 3 cars (left) with a class of and a bus (right) with a class of 2. The clustering algorithm found 7 different clusters from the votes.

4) SUMMARY
In summary, six algorithms were compared during the study:
• kNN (baseline): kNN without any preprocessing, using the peak’s duration and maximum amplitude as features;
• kNN (normalization): kNN baseline with duration and amplitude as features, both normalized;
• kNN (ED+normalization): kNN baseline with the Euclidean distance added as a third feature to amplitude and duration; all three features were normalized;
• Random Forest (ED+normalization): random forest with amplitude, duration, and Euclidean distance as features, all three normalized;
• Matrix profile 1: STUMPY Fast Pattern Search for each peak; the global minimum was used for the classification;
• Matrix profile 2: STUMPY Fast Pattern Search for each peak; the three lowest minima were used in conjunction with majority voting for the classification.

E. GENERALIZABILITY
Ferguson [39] summarizes generalizability as the combination of internal and external validity of the findings. Validity is a criterion for the quality of results and can be separated into internal and external validity. Internal validity is necessary to interpret the results of an experiment. However, the “external validity pertains to the generalizability of the

KANSCHAT et al.: WIRELESS-SIGNAL-BASED VEHICLE COUNTING AND CLASSIFICATION

treatment effect to other populations, settings, treatment variables, or measurement variables” [39].

Generalizability in the context of the proposed system has multiple dimensions. Based on Ferguson’s [39] definition, hardware and software generalizability across different scenarios or settings is desirable. Also, different measurement variables, such as weather conditions, could be investigated with respect to generalizability. Generalizability is also important for a large-scale deployment of the system. To assess the generalizability of our proposed system, we tested it in three different road environments. An additional step was to investigate the generalizability of the different counting and classification models. Temporal and spatial generalizability were tested. Temporal generalizability relates to whether the counting and classification methods and their parameters apply to the same scenario at a different time, or whether an adjustment to the parameters or learning data is needed for a new measurement. Spatial generalizability relates to whether the counting and classification methods and models can be used in other, entirely different road scenarios.

To test generalizability, first the training data from 16.12.2020 at the Austermannstraße was used to train the models of the classification methods. Then the testing data from 17.12.2020 at the Austermannstraße was classified (temporal generalizability). For spatial generalizability, the training data from the Mendelstraße and the testing data from the Austermannstraße were used. The Mendelstraße is a two-lane street where three vehicle classes occur: car, bus, and two cars passing at the same time. On the Austermannstraße, which has only one lane, only cars and busses occurred. This test generalizes the counting and classification methods spatially, because the training and testing data were collected at different locations, with a different distance between the sensors, a different number of lanes, and different vehicle classes.

IV. RESULTS
Tables 2 to 6 show the results. The support values in the tables indicate the number of instances that were tested for each method.

A. MENDELSTRAßE
This scenario, a street with two lanes in two directions, had two separate measurements of about 3 hours each. Two noteworthy observations are that the distance between the sensors was smaller and that some cars slowed down significantly after seeing the traffic warning signs. In total, 1416 cars and 6 busses were counted. The results are shown in Table 2.

TABLE 2. Accuracy results of the different classification methods for the measurement on 08.12.2020 at the Mendelstraße.

Mendelstraße - 08.12.2020: A first measurement at the Mendelstraße was conducted on 08.12.2020. The table shows that the overall accuracies of the machine learning approaches classifying the 5 GHz peaks are almost 10% higher than those of the 2.4 GHz classifications. Regarding the classification techniques, the normalization did not improve classification accuracies over the baseline (quite unexpectedly). The use of the Euclidean distance increased the accuracy for the 2.4 GHz slightly, but not for
the 5 GHz. The accuracies of both matrix profile techniques were much lower than those of the previous models.

Mendelstraße - 09.12.2020: A second measurement at the Mendelstraße was conducted on 09.12.2020. The results of the classification methods are shown in Table 3. Again, the 5 GHz frequency counted more cars, and the classification accuracy is higher in all cases. The highest overall accuracy, 0.88725, was scored by the kNN with normalized feature values on the 5 GHz frequency. Furthermore, the number of “2 cars” instances was again very low, and none was detected. The feature normalization decreased the accuracy of the kNN algorithm for 2.4 GHz but improved it for 5 GHz. Adding the peak’s Euclidean distance increased the accuracy for 2.4 GHz but decreased it for 5 GHz. The random forest had results similar to the kNN classification, with a slight improvement for 2.4 GHz. This time, the matrix profile results were closer to the machine learning results, with 0.67619 for 2.4 GHz and 0.82581 for 5 GHz.

TABLE 3. Accuracy results of the different classification methods for the measurement on 09.12.2020 at the Mendelstraße.

B. AUSTERMANNSTRAßE
The next scenario, a one-way street, had two separate measurements of approximately three hours each. Two observations worth mentioning are that the distance between the sensors was smaller and that some cars slowed down significantly after seeing the traffic warning signs. In total, 1306 cars and 18 busses were counted.

Austermannstraße - 16.12.2020: The results of the first measurement at the Austermannstraße are shown in Table 4. Higher accuracy than at the previous two-lane street can be seen, with the enhanced kNN method even scoring 100% accuracy on the 5 GHz peaks. This time, the number of counted cars was the same for both frequencies, but more NaV peaks were extracted from the 2.4 GHz frequency. In the 2.4 GHz frequency, no bus was detected, compared to 2 correctly detected bus instances in the higher frequency. This time, the normalization improved the accuracy of the kNN, and the Euclidean distance also increased the accuracy of the 2.4 GHz classification. The random forest performed slightly worse than the kNN baseline for 2.4 GHz and achieved the same accuracy for 5 GHz. The matrix profile’s overall accuracies were again worse than those of the machine learning approaches. For the 5 GHz peaks, the accuracy of the method got closer to the previous methods, with the improved matrix profiling version scoring 0.96226. However, for the 2.4 GHz frequency, the majority voting system decreased the accuracy.

Austermannstraße - 17.12.2020: The results of the classification methods for the second measurement at the Austermannstraße can be found in Table 5. Similar to the first measurement’s results in Table 4, the difference in peak extraction between the frequencies is visible. At the lower frequency, 693 peaks without a vehicle class were extracted, compared to zero at the higher frequency. Furthermore, 8 cars and 6 busses that the 2.4 GHz frequency did not register were detected by the higher frequency. The best accuracy, 0.99061, was achieved with the kNN method in combination with feature normalization. The baseline algorithm’s overall accuracy was 0.97648 for 2.4 GHz and 0.98592 for 5 GHz. Normalizing the data decreased the result for the lower frequency but increased it for the 5 GHz. This time, adding the peak’s Euclidean distance to the input features increased the accuracies of both frequencies. The random forest classifier worked best for the 2.4 GHz peaks and was especially good for bus detection. The enhanced matrix profiling classification scored 0.98889 on the 5 GHz peaks
surpassing the baseline algorithm’s accuracy with a similar number of vehicle instances.

TABLE 4. Accuracy results of the different classification methods for the measurement on 16.12.2020 at the Austermannstraße.

TABLE 5. Accuracy results of the different classification methods for the measurement on 17.12.2020 at the Austermannstraße.

C. CORRENSSTRAßE
The measurements at the Corrensstraße on 19, 24, and 25 November 2020 were chronologically the first ones and had different problems during data collection and analysis. Overall, only 320 cars and 35 busses were counted due to low traffic density. For the analysis, the instances are split into three different measurements, resulting in a very low number of vehicles per measurement and making the results not very meaningful. Therefore, only the results of the first measurement on 19.11.2020 are shown in Table 6. The first problem that occurred was a signal loss in the 2.4 GHz frequency: the Wi-Fi disconnected multiple times for brief periods. Another problem was the location of the setup, which was next to a bus station. This resulted in a large variation in the time the busses needed to drive past the system. The results of the three measurements also exhibit better counting capability in the 5 GHz frequency.
The overall accuracies of all methods were much lower, ranging from 0.47863 to 0.97917.

TABLE 6. Accuracy results of the different classification methods for the measurement on 19.11.2020 at the Corrensstraße.

TABLE 7. Generalizability performance of the models.

D. TEMPORAL AND SPATIAL GENERALIZABILITY
The results of the model generalizability tests are shown in Table 7. Only the kNN (ED+normalization) and the matrix profile 2 models were tested for generalizability at this point, because they led to the best results overall (Tables 2 to 6). Temporal generalizability was tested by extracting the peaks from the first measurement at the Austermannstraße and using them as training data for the classifications. Then the peaks from the second measurement at the Austermannstraße were extracted and used as testing data. For the kNN and random forest algorithms, the classification accuracy was between 0.96190 and 0.96750 for 2.4 GHz and between 0.97806 and 0.98276 for 5 GHz. The enhanced matrix profiling approach scored an overall accuracy of 0.97753 using the higher frequency. For spatial generalizability testing, data from the Mendelstraße was used for training, and the peaks from the Austermannstraße were tested. A significant drop in accuracy, ranging from 6% to 10%, is noticeable for the two machine learning approaches. The enhanced matrix profiling method, however, was able to score 0.97753 accuracy.

V. DISCUSSION
The research questions are now revisited, before we discuss implications and limitations.

A. REVISITING THE RESEARCH QUESTIONS

1) HOW TO AUTOMATIZE THE COUNTING OF VEHICLES ON THE ROAD BASED ON DEVIATIONS OF WIRELESS SIGNALS?
The results of the peak extraction algorithms show that peaks that do not correspond to a vehicle type are counted and passed to the classification algorithm. Because these NaVs (i.e., false positives) are passed to the classification methods, it is desirable to avoid them as much as possible. The median-based approach suggested and implemented in this work reduced the number of irrelevant peaks extracted in the three scenarios by several orders of magnitude, for both the 2.4 GHz and 5 GHz signals (Table 1). A median-based peak extraction is thus more effective for vehicle counting than a recovery-based approach and is an important step towards automated Wi-Fi-based vehicle counting. In particular, it is more robust to noise, which means greater efficiency for the Wi-Fi-based traffic monitoring approach as a whole.

2) WHAT ARE THE RESPECTIVE MERITS OF DIFFERENT TYPES OF SIGNALS (5GHZ VS 2.4GHZ) FOR DETECTION AND CLASSIFICATION ENDEAVOURS?
The results have shown that the higher Wi-Fi frequency (5 GHz) was superior to the 2.4 GHz, improving the overall
amount of counted vehicles as well as the results of the classification algorithms. Furthermore, the higher frequency shows greater robustness in urban areas. Regarding the detection of multiple vehicles passing the system simultaneously, the preliminary results indicate that this was not possible. However, the low number of occurrences does not allow a final conclusion in that regard. The advantage of the 2.4 GHz frequency is that it provided slightly better accuracy for the spatial generalizability of two models (kNN and random forest).

3) HOW TO AUTOMATICALLY CLASSIFY THE TYPE OF VEHICLES CONSIDERING THE SHIFT OF PATTERNS IN WIRELESS SIGNALS?
The first lesson learned from Tables 2 to 6 is that kNN with normalization and the Euclidean distance as a feature performed best for vehicle classification using the 2.4 GHz signal. This technique consistently performed best across scenarios. For classification using the 5 GHz signal, the results are more inconclusive, as no method consistently provided the highest accuracy across scenarios. kNN techniques seem to have a slight advantage, but the random forest approach provided largely comparable results. A second lesson from the tables is that the distance between the sensors seems to matter. The accuracy results obtained at the Austermannstraße, which is a one-lane street, are typically about 10% higher than the results in the other scenarios. This suggests that the shorter the distance, the better the results, but the impact of distance on the accuracy needs to be investigated more systematically in future work. A surprising observation from the tables, though, is that the accuracy values of the classifiers may differ considerably (by about 5–7%) within a given scenario (here, the Mendelstraße). It is unclear why this has been the case, given that the two measurements were only 24 hours apart and taken under relatively similar conditions (see Section III-A). This, too, needs a more systematic investigation in future work. A third lesson is that matrix profiling techniques have some potential, especially in conjunction with the 5 GHz frequency. How to make them more robust across scenarios and signal frequencies is an interesting issue for further work.

Fourth, the choice of location matters. Overall, the Corrensstraße led to lower accuracy values than the other scenarios. There are two possible reasons for this. The first one, already mentioned, is that the system was installed next to a bus station. Therefore, the busses that stopped at the station were picked up by the system while accelerating. Compared to the busses that did not stop at the station, the duration of these peaks was much longer, making classification very difficult. Second, this scenario was located in a more populated area than the previous ones. Thus, a higher number of devices used the 2.4 GHz frequency, possibly leading to more disturbance of the signal (there were even signal losses during the data collection for the 2.4 GHz channel).

Finally, it must be mentioned that the scope of all observations made in this subsection is limited to the classification of cars and busses. There were too few instances of multiple vehicles passing at the same time in the collected data to offer solid conclusions on the merits of the techniques for the “2 cars” class. The best vehicle classification accuracies in the scenarios ranged from 83.4% to 99.8% (2.4 GHz) and from 78.6% to 100% (5 GHz). The received signal strength was used in this study to derive the amplitudes used as features (see Figure 4), but channel state information (CSI) provides more information than received signal strength (see [40]). It is thus likely that using CSI as the basis for the classification could improve the accuracies even further, but this remains to be tested empirically.

4) TO WHAT EXTENT CAN MODELS LEARNED FROM ONE ROAD ENVIRONMENT BE APPLIED TO OTHER ROAD ENVIRONMENTS?
When a model trained on one scenario was used to classify data from the same scenario at a different point in time, the accuracy values were comparable (Table 7). Thus, model reuse within scenarios is possible and sensible. However, when a model trained on one scenario was used to classify peaks from another scenario, the accuracy was significantly reduced. This suggests that reusing models across different scenarios comes at the cost of accuracy or, put differently, that a model needs to be trained separately for every road scenario. The enhanced matrix profiling method, by contrast, scored 0.97753 accuracy, which would be a reasonable classification accuracy for a traffic monitoring system. However, it had the downside of not offering consistent results in the internal validity, which is also necessary for generalizability.

B. IMPLICATIONS
Overall, the proposed traffic monitoring system fulfills many desirable requirements. The desirable characteristics mentioned in Barbagli et al. [6] are the capability of large-scale deployment, being passive and operating at low power, and being cheap and easy to install and maintain. A large-scale deployment was not tested in this work; nonetheless, a few comments can be made regarding this. The main hurdle for a large-scale traffic monitoring deployment is the high cost of an individual monitoring device. The low cost of the proposed system is a first step towards a large-scale deployment. The proposed approach is a passive system because it uses Wi-Fi signals, does not interfere with the traffic, and can detect the traffic automatically. The power consumption of the setup was not quantified in this work; however, it was able to operate for multiple hours on a car and motorcycle battery. Its power consumption can therefore be compared to that of the approach of Kochláň et al. [17], who indicated that their system operates at low power because a car battery is enough to satisfy its energy needs. Furthermore, the system was installed in under one hour
for each measurement without stopping the traffic, demonstrating how easy it is to install. In addition, Wi-Fi signals are not influenced by the day-night cycle and operate in many different weather conditions. The accuracies obtained for temporal generalizability suggest that, in the absence of more recent models, models trained on previous data may still provide useful results. Models trained on data from a two-lane, two-direction street can also be used for one-lane, one-direction streets, but at the cost of a decrease in accuracy.

C. LIMITATIONS
The analysis is subject to multiple limitations as a result of a dependency chain. First of all, the peak extraction is limited to the raw data actually collected by the system. If no signal change is recorded on either frequency although a vehicle passed the system, a peak extraction is not possible. Furthermore, the classification is limited by the outcomes of the peak extraction. Two extraction methods were compared during the work. Both methods have different parameters, which have to be tuned for each scenario. One input feature for the machine learning approach is the duration that a vehicle spends between the system’s two units. This means that it is strongly influenced by the driver’s behavior. This became clear when some vehicles drove very slowly at the Austermannstraße because the drivers saw the traffic signs, as well as at the Corrensstraße, where some busses had to accelerate after stopping at a bus station. It follows that the vehicles’ speed and length have to be taken into account. During this work, all streets had a speed limit of 50 km/h. Data visualization during the analysis showed that some short cars, as well as cars exceeding the speed limit, were not picked up by the system. Increasing the temporal resolution of the data collection could help mitigate this issue. The results also showed the limits of using the 2.4 GHz frequency. In total, the lower frequency picked up fewer cars and had problems with signal loss, which could be traced back to the number of devices using it in a city. Also, the number of vehicles passing through the streets used in the scenarios is arguably relatively small and could be increased through data collection in non-residential areas. Finally, the manual data annotation, which was labor intensive, limited the number of vehicles that could be used during the data analysis. This also led to an underrepresentation of certain vehicle types.

VI. CONCLUSION AND FUTURE WORK
This work has deployed a low-cost, Wi-Fi-based system for traffic data monitoring in three different urban scenarios. It then introduced and thoroughly evaluated a new algorithm for the automatic counting of vehicles (cars, busses). The work also evaluated the performance of different models for the classification of vehicles (cars, busses). Normalization and three features (amplitude, duration, Euclidean distance) helped to improve the accuracy over the baseline used in previous work. Our results also suggest that a trained model can be used to classify data collected on the same road, but at a different point in time.

There are a few directions for future work. First, as mentioned above, it would be interesting to investigate the impact of spatial distance on the accuracy results more systematically. Also, the impact of noise (e.g., competing signals on the 2.4 GHz frequency) could be looked into more closely. In addition, the study outcomes could be combined with the work of [41] to improve public transport systems in cities. Building on the work of [42], the present study could be extended to estimate the speed of vehicles on the road. Finally, the matrix profiling technique has shown promise. The density-based clustering algorithm DBSCAN was able to detect clusters of votes for individual cars but had problems combining the votes for busses. In future work, a method for deciding on a cluster’s class needs to be implemented. Furthermore, the problem that multiple clusters are detected within the time window of a longer vehicle presents interesting challenges for future work.

REFERENCES
[1] A. Stevanovic, J. Stevanovic, K. Zhang, and S. Batterman, “Optimizing traffic control to reduce fuel consumption and vehicular emissions: Integrated approach with VISSIM, CMEM, and VISGAOST,” Transp. Res. Rec., vol. 2128, no. 1, pp. 105–113, 2009.
[2] P. E. K. Fiedler and P. H. T. Zannin, “Evaluation of noise pollution in urban traffic hubs—Noise maps and measurements,” Environ. Impact Assess. Rev., vol. 51, pp. 1–9, Feb. 2015.
[3] J. Khan, M. Ketzel, K. Kakosimos, M. Sørensen, and S. S. Jensen, “Road traffic air and noise pollution exposure assessment—A review of tools and techniques,” Sci. Total Environ., vol. 634, pp. 661–676, Sep. 2018.
[4] K. Wiegand, Challenges of the Day-to-Day Operation of a Traffic Monitoring Program, Georgia Dept. Transp., Atlanta, GA, USA, 2016, p. 25.
[5] S. Gupta, A. Hamzin, and A. Degbelo, “A low-cost open hardware system for collecting traffic data using Wi-Fi signal strength,” Sensors, vol. 18, no. 11, p. 3623, 2018.
[6] B. Barbagli, G. Manes, R. Facchini, and A. Manes, “Acoustic sensor network for vehicle traffic monitoring,” in Proc. 1st Int. Conf. Adv. Veh. Syst. Technol. Appl., 2012, pp. 24–29.
[7] D. Asiain and D. Antolín, “LoRa-based traffic flow detection for smart-road,” Sensors, vol. 21, no. 2, p. 338, 2021.
[8] A. Chourasia, B. R. Tamma, and A. A. Franklin, “Wi-Fi based road traffic monitoring system with channel hopping functionality,” in Proc. Int. Conf. COMmun. Syst. NETw. (COMSNETS), 2021, pp. 680–684.
[9] S. P. Hoogendoorn, H. J. Van Zuylen, M. Schreuder, B. Gorte, and G. Vosselman, “Microscopic traffic data collection by remote sensing,” Transp. Res. Rec. J. Transp. Res. Board, vol. 1855, no. 1, pp. 121–128, Jan. 2003.
[10] M. Schreuder, S. P. Hoogendoorn, H. J. Van Zuylen, B. Gorte, and G. Vosselman, “Traffic data collection from aerial imagery,” in Proc. IEEE Int. Conf. Intell. Transp. Syst., Shanghai, China, 2003, pp. 779–784.
[11] B. Hoh, M. Gruteser, H. Xiong, and A. Alrabady, “Enhancing security and privacy in traffic-monitoring systems,” IEEE Pervasive Comput., vol. 5, no. 4, pp. 38–46, Oct.–Dec. 2006.
[12] C. Nanthawichit, T. Nakatsuji, and H. Suzuki, “Application of probe-vehicle data for real-time traffic-state estimation and short-term travel-time prediction on a freeway,” Transp. Res. Rec. J. Transp. Res. Board, vol. 1855, no. 1, pp. 49–59, Jan. 2003.
[13] A. Ryan, C. Ai, M. Knodler, and C. Fitzpatrick, “Evaluation of small unmanned aerial system highway volume and speed-sensing applications,” IET Intell. Transp. Syst., vol. 15, no. 1, pp. 84–97, 2021.
[14] S. Byun, I.-K. Shin, J. Moon, J. Kang, and S.-I. Choi, “Road traffic monitoring from UAV images using deep learning networks,” Remote Sens., vol. 13, no. 20, p. 4027, 2021.
[15] J. F. Forren and D. Jaarsma, “Traffic monitoring by tire noise,” in Proc. Conf. Intell. Transp. Syst., Nov. 1997, pp. 177–182.
[16] R. Sen, “Different sensing modalities for traffic monitoring in developing regions,” Ph.D. thesis, Dept. Comput. Sci., Indian Inst. Tech., Bombay, India, 2014, p. 212.
[17] M. Kochláň, M. Hodoň, L. Čechovič, J. Kapitulík, and M. Jurečka, “WSN for traffic monitoring using raspberry Pi board,” in Proc. IEEE Federated Conf. Comput. Sci. Inf. Syst., 2014, pp. 1023–1026.
[18] H. B. Tulay and C. E. Koksal, “Road traffic monitoring using DSRC signals,” Dec. 2020, arXiv:2012.13448.
[19] A. Czyżewski, J. Kotus, and G. Szwoch, “Estimating traffic intensity employing passive acoustic radar and enhanced microwave doppler radar sensor,” Remote Sens., vol. 12, no. 1, p. 110, Jan. 2020.
[20] S. Zhang, M. Won, and S. H. Son, “WiTraffic: Non-intrusive vehicle classification using WiFi: Poster abstract,” in Proc. 14th ACM Conf. Embedded Netw. Sens. Syst. CD-ROM, 2016, pp. 358–359.
[21] M. Won, S. Sahu, and K.-J. Park, “DeepWiTraffic: Low cost WiFi-based traffic monitoring system using deep learning,” in Proc. IEEE 16th Int. Conf. Mobile Ad Hoc Sens. Syst. (MASS), 2019, pp. 476–484.
[22] M. Homchan and C. Aswakul, “Testing of vehicular traffic monitoring technique by using WiFi packet measurement in software-defined wireless mesh network,” in Proc. Int. Symp. Electr. Electron. Inf. Eng., 2021, pp. 190–194.
[23] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “KNN model-based approach in classification,” in On the Move to Meaningful Internet Systems: CoopIS, DOA, and ODBASE (Lecture Notes in Computer Science), R. Meersman, Z. Tari, and D. C. Schmidt, Eds. Heidelberg, Germany: Springer, 2003, pp. 986–996.
[24] L. E. Peterson, “K-nearest neighbor,” Scholarpedia, vol. 4, no. 2, p. 1883, Feb. 2009.
[25] M. M. Islam, H. Iqbal, M. R. Haque, and M. K. Hasan, “Prediction of breast cancer using support vector machine and K-Nearest neighbors,” in Proc. IEEE Region 10 Humanitarian Technol. Conf. (R10-HTC), Dhaka, Bangladesh, Dec. 2017, pp. 226–229.
[26] Y. Song, J. Huang, D. Zhou, H. Zha, and C. L. Giles, “IKNN: Informative K-nearest neighbor pattern classification,” in Knowledge Discovery in Databases: PKDD (Lecture Notes in Computer Science), J. N. Kok, J. Koronacki, R. L. de Mantaras, S. Matwin, D. Mladenič, and A. Skowron, Eds. Heidelberg, Germany: Springer, 2007, pp. 248–264.
[27] “6.3. Preprocessing Data—Scikit-learn 0.24.1 Documentation.” [Online]. Available: https://scikit-learn.org/stable/modules/preprocessing.html (Accessed: Jul. 31, 2021).
[28] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002.
[29] W. Liu and S. Chawla, “Class confidence weighted kNN algorithms for imbalanced data sets,” in Proc. Pacific–Asia Conf. Knowl. Discov. Data Min., 2011, pp. 345–356.
[30] “imblearn.over_sampling.SMOTE—Imbalanced-Learn 0.3.0.dev0 Documentation.” [Online]. Available: http://glemaitre.github.io/imbalanced-learn/generated/imblearn.over_sampling.SMOTE.html (Accessed: Jul. 31, 2021).
[31] G. Biau and E. Scornet, “A random forest guided tour,” TEST, vol. 25, no. 2, pp. 197–227, 2016.
[32] P. Bonissone, J. M. Cadenas, M. C. Garrido, and R. A. Díaz-Valladares, “A fuzzy random forest,” Int. J. Approximate Reason., vol. 51, no. 7, pp. 729–747, Sep. 2010.
[33] T. M. Oshiro, P. S. Perez, and J. A. Baranauskas, “How many trees in a random forest?” in Machine Learning and Data Mining in Pattern Recognition (Lecture Notes in Computer Science, 7376), D. Hutchison et al., Eds. Heidelberg, Germany: Springer, 2012, pp. 154–168.
[34] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?” J. Mach. Learn. Res., vol. 15, no. 90, pp. 3133–3181, 2014.
[35] N. Sirikulviriya and S. Sinthupinyo, “Integration of rules from a random forest,” in Proc. Int. Conf. Inf. Electron. Eng., vol. 6, 2011, pp. 194–198.
[36] C.-C. M. Yeh et al., “Matrix profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets,” in Proc. IEEE 16th Int. Conf. Data Min. (ICDM), Barcelona, Spain, Dec. 2016, pp. 1317–1322.
[37] Y. He, X. Chu, and Y. Wang, “Neighbor profile: Bagging nearest neighbors for unsupervised time series mining,” in Proc. IEEE 36th Int. Conf. Data Eng. (ICDE), Dallas, TX, USA, Apr. 2020, pp. 373–384.
[38] “The Matrix Profile—Stumpy 1.8.0 Documentation.” [Online]. Available: https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html (Accessed: Jul. 31, 2021).
[39] L. Ferguson, “External validity, generalizability, and knowledge utilization,” J. Nurs. Scholarship, vol. 36, no. 1, pp. 16–22, 2004.
[40] Y. Ma, G. Zhou, and S. Wang, “WiFi sensing with channel state information: A survey,” ACM Comput. Surv., vol. 52, no. 3, pp. 1–36, Jul. 2019. [Online]. Available: https://dl.acm.org/doi/10.1145/3310194
[41] R. Zhang et al., “WiFi sensing-based real-time bus tracking and arrival time prediction in urban environments,” IEEE Sensors J., vol. 18, no. 11, pp. 4746–4760, Jun. 2018.
[42] S. Li et al., “WiFi-based device-free vehicle speed measurement using fast phase correction MUSIC algorithm,” in Proc. Int. Conf. Internet Things (iThings) IEEE Green Comput. Commun. (GreenCom) IEEE Cyber Phys. Soc. Comput. (CPSCom) IEEE Smart Data (SmartData) IEEE Congr. Cybermatics (Cybermatics), 2020, pp. 93–99.