Ghost Target Classification Using Scene Models in Radar
Anton Sedin
David Wadmark
In surveillance contexts, radars can be used to monitor an area, detecting and tracking moving objects inside it. Monitored areas in urban environments often contain many surfaces that reflect radar waves, which can have the undesired consequence of a single object producing multiple tracks due to multipath propagation effects. This thesis considers a method of identifying whether a track is produced by a real object or stems from multipath effects. The proposed method works by creating a machine-learning-based classifier and modelling the monitored scene over time. Tracks are assigned features based on their characteristics and on the state of the scene model at their position. These features are then used as inputs to the classifier model to produce the classification. We propose four machine-learning-based classifier models, with two different sets of structures and features. The classifier models are compared to a naive classifier model for reference.
The proposed models all outperform the naive classifier, although some of them are biased. As for the usefulness of the scene model, the results are mixed but show promise. We believe that the scene model can improve classification performance further with more and better data.
Keywords
radar surveillance, multipath, machine-learning, classification
Acknowledgements
We acknowledge the fantastic support and counsel provided by our industry supervisors Dr. Sebastian Heunisch and Dr. Aras Papadelis, as well as our unofficial industry supervisors Dr. Stefan Adalbjörnsson, Dr. Anders Mannesson and Daniel Ståhl. We would like to thank them for all the great discussions, remarks and advice, and for providing superb feedback on the report on short notice.
Additionally, we would like to thank our opponents David Carpenfelt and Gustaf
Broström for their fair critique, and André Nüßlein for proofreading the report.
Finally, we want to give a heartfelt thank you to our families for their unconditional support.
Abbreviations
Contents
1. Introduction
   1.1 Background
   1.2 The multipath problem
   1.3 Purpose
   1.4 Previous work
2. Theory
   2.1 Radar fundamentals
   2.2 FMCW radar
   2.3 Tracker
   2.4 Multipath fundamentals
   2.5 Machine learning methods
   2.6 Scene mapping
   2.7 Evaluation metrics
3. Data
   3.1 Gathering data
   3.2 Processing and annotating data
   3.3 Structure of dataset
4. Scene model
   4.1 Scene-specific information
   4.2 Mapping reflective surfaces
   4.3 Feature density maps
5. Features
   5.1 Track-specific features
   5.2 Scene-specific features
6. Model structures and evaluation
   6.1 General structure
   6.2 Naive classifier
   6.3 Model 1 — Random forest with only track-specific features
   6.4 Model 2 — Random forest with scene-specific features
1. Introduction
1.1 Background
Short-range radar sensors have become a field of interest in the surveillance industry in recent years. Unlike video surveillance systems, radar-based surveillance systems can operate independently of lighting conditions, and they do not suffer from the same large decrease in performance in foggy, rainy and snowy conditions. Objects are detected by transmitting an electromagnetic signal from the radar and collecting the reflections from the surroundings. By using certain transmit patterns and combining multiple receiver antennas in an array, the range, radial velocity and angle of detected objects can be deduced. However, there is no direct way to tell whether a signal comes from a direct reflection from the object or whether it has been reflected off other surfaces. This is a significant problem for radar surveillance systems mounted in urban environments with many reflecting surfaces such as walls, containers or cars.
Figure 1.1 Illustration of the multipath phenomenon. The radar signals travel along two different paths back to the radar after reflecting off the target, causing two tracks to appear from one target.
1.3 Purpose
This thesis aims to investigate the possibility of using radar data to reliably classify whether a detected target is real or a result of the multipath phenomenon. More specifically, we will try to solve this problem using machine-learning methods, and investigate whether the classification can be aided by constructing a model of the monitored scene over time.
Scope
We make no attempt to distinguish between types of objects, such as pedestrians, cars and bikes, in this thesis, only whether the detected target is a ghost target or a real target. All objects are treated the same, regardless of type. We also assume that the radar is stationary. The investigation will be done using a single radar sensor with proprietary signal processing and tracking software provided by our industry supervisors. This thesis will not consider any changes to these systems.
2. Theory
[Figure 2.2: frequency-time and amplitude-time plots of transmitted (Tx) and received (Rx) chirps, showing the chirp bandwidth B, the frequency difference ∆f, the delay τ, and the chirp time tc.]
If there is an object within range of the radar, the chirp will be reflected from the
object and a time-delayed chirp will be received by the radar. This transmit-receive
pattern is illustrated in Figure 2.2.
The received signal is then mixed with the transmitted signal to a so-called intermediate frequency (IF) signal. This is a sinusoid whose frequency and phase are the differences of the frequencies and phases of the received and transmitted signals:

x_IF(t) = sin((ω2 − ω1)t + (φ2 − φ1)) (2.1)

where ω1 and ω2 are the frequencies of the transmitted and received signal, and φ1 and φ2 are the phases of the transmitted and received signal.
f0 = ω2 − ω1 = Sτ = 2Sd/c (2.2)

which is equivalent to ∆f in Figure 2.2. Here, S is the rate of change of the chirp
frequency, d is the distance to the interfering object, and c is the speed of light. From
Equation (2.2) the distance from the radar to an interfering object can be deduced.
We will refer to this distance as the range of an object.
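As a numerical illustration, Equation (2.2) can be rearranged and evaluated in a few lines of Python. The chirp slope and beat frequency below are assumed example values, not parameters of the radar used in this thesis.

```python
C = 3.0e8  # speed of light [m/s]

def range_from_beat(f0, slope):
    """Distance to the object from the IF beat frequency (Eq. 2.2):
    f0 = S * tau = 2 * S * d / c  =>  d = f0 * c / (2 * S)."""
    return f0 * C / (2.0 * slope)

slope = 30e6 / 1e-6   # assumed chirp slope S: 30 MHz per microsecond
f0 = 2.0e6            # assumed measured beat frequency [Hz]
d = range_from_beat(f0, slope)  # -> 10.0 m
```

In a real pipeline f0 is not measured directly but read off as the peak bin of the range FFT described later in this section.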
Doppler
To deduce the radial velocity of an object, multiple chirps are transmitted in rapid
succession. With the assumption that the object is traveling much slower than the
speed of light, the resulting IF signals of each chirp will have approximately the
same frequencies, but their phases (φ2 − φ1 in Equation (2.1)) will be different for
non-zero radial velocities. This is because the distance d to the object changes between chirps due to the movement of the object. This change in distance ∆d is small,
but significant in relation to the wavelength λ of the chirp. The relation between
phase difference ∆φ and difference in distance ∆d is [Rao, 2017]
∆φ = 4π∆d/λ . (2.3)
By transmitting chirps separated by a time of tc and inserting the relation ∆d = vtc in Equation (2.3) we have

∆φ = 4πvtc/λ =⇒ v = λ∆φ/(4πtc) (2.4)
where v is the radial velocity of an object. The signals received by transmitting a set
of chirps will be referred to as a frame from here on.
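Equation (2.4) can likewise be sketched numerically. The 60 GHz carrier matches the operating frequency in Table 3.1, while the chirp repetition time and phase difference are assumed example values.

```python
import math

C = 3.0e8  # speed of light [m/s]

def velocity_from_phase(dphi, wavelength, t_c):
    """Radial velocity from the inter-chirp phase difference (Eq. 2.4)."""
    return wavelength * dphi / (4.0 * math.pi * t_c)

wavelength = C / 60e9  # ~5 mm at the 60 GHz operating frequency
t_c = 1e-4             # assumed chirp repetition time [s]
v = velocity_from_phase(math.pi / 2, wavelength, t_c)  # -> 6.25 m/s
```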
Azimuth
To determine the angle of arrival θ of an object, several equally spaced receiver
antennas are used. The incoming signals are assumed to be plane waves. The scenario is illustrated in Figure 2.3, where the incoming signals to each of the receiver
antennas (black rectangles) are represented by the red arrows.
For non-zero angles, the received signals at each of the antennas will have a
difference in phase which we call ω. From this phase difference, the angle of arrival
of the received signal can be deduced using the relation
ω = 2πd sin θ/λ =⇒ θ = arcsin(λω/(2πd)) . (2.5)
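A minimal sketch of Equation (2.5); the half-wavelength antenna spacing is an assumption commonly made for uniform linear arrays, not a documented property of the radar used here.

```python
import math

def angle_of_arrival(omega, wavelength, spacing):
    """Azimuth angle from the inter-antenna phase difference (Eq. 2.5)."""
    return math.asin(wavelength * omega / (2.0 * math.pi * spacing))

wavelength = 0.005           # ~5 mm at 60 GHz
spacing = wavelength / 2     # assumed half-wavelength antenna spacing
theta = angle_of_arrival(math.pi / 2, wavelength, spacing)
# asin(0.5) ≈ 0.524 rad, i.e. 30 degrees
```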
Figure 2.3 Incoming signal (red arrows) from an object at an angle θ to some
receiver antennas (black rectangles).
Fourier transforms
When several targets appear in front of the radar, the transmitted chirps are reflected multiple times, leading to the IF signal becoming a sum of sinusoids. To identify
the individual sinusoids, a Fourier transform is applied to convert this time-domain
signal into the frequency domain. From here the independent sinusoids that make
up the IF signal can be deduced by looking at where the peaks of the transformed
signals are. Since the converted signal is complex, each value containing both an
amplitude and a phase, we can also get the initial phase of each sinusoid by looking
at the phase of its peak in the frequency domain.
Doing this in practice means using a fast Fourier transform (FFT) [Duhamel
and Vetterli, 1990], which is a computationally efficient algorithm that computes
the discrete Fourier transform (DFT) of a sequence. When an IF signal is generated
from a receiving antenna, it is transformed using an FFT and the results are stored
in a vector. This is repeated for each chirp in a frame, the resulting vectors being
stored as rows in a matrix. For each range bin another FFT is applied, creating a
range-velocity plot. These steps are repeated for every receiver antenna, creating a
3D-array where a final FFT can be applied for each range-velocity bin. Thus, the
range, radial velocity, and azimuth angle of individual objects can be detected in a
scene with multiple reflections. These steps can be seen in Figure 2.4. For further
information, we refer to [Rao, 2017].
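The first (range) stage of this chain can be sketched as follows. A naive DFT stands in for the FFT for clarity, and the synthetic single-target IF signal is an assumed illustration; in practice the transform is repeated across chirps and antennas as in Figure 2.4.

```python
import cmath
import math

def dft(x):
    """Naive DFT; an FFT computes the same result in O(N log N)."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]

# Synthetic IF signal: one sinusoid whose frequency encodes one target's range.
N = 64
f_beat = 5.0  # beat frequency in bins, so the peak should land in bin 5
signal = [cmath.exp(2j * math.pi * f_beat * n / N) for n in range(N)]

spectrum = dft(signal)
peak_bin = max(range(N), key=lambda k: abs(spectrum[k]))  # -> 5
```

Each peak bin maps back to a range via Equation (2.2), and the phase of the complex value at the peak feeds the Doppler and azimuth stages.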
Signal strength
There are two measures commonly used in radar sensors for describing the strength of the reflected signal. The radar cross-section (RCS), denoted by σ, is a measurement of how well an object reflects radio waves. It interprets the object as if it were a perfectly reflecting sphere, the cross-sectional area of which is the dimension of the measure. While RCS formally describes the detected object rather than the strength of the received signal, the two are correlated, as can be shown by the radar range equation [Richards, 2005]:

Pr = Pt Gt σ Ae / ((4π)² r⁴) =⇒ σ = (4π)² r⁴ Pr / (Pt Gt Ae) (2.6)
Figure 2.4 Signal processing steps. (top left) range-chirp index matrix, (top right)
range-velocity matrix, (bottom left) range-velocity-antenna cube, and (bottom right)
range-velocity-azimuth cube.
where Pt and Pr are, respectively, the power to the transmitting antenna and the
received power. Ae is the effective area of the receiving antenna, a measure of its
geometric area and efficiency. Finally, Gt is the gain of the transmitting antenna,
and r is the distance from the radar to the object.
Another, more straightforward way to describe the strength of the reflected signal is the signal-to-noise ratio (SNR). For a bin i, in a set of bins I, we define this ratio as:

SNRi = Pr,i / P̃r,I (2.7)
where Pr,i is the received power in a specific bin and P̃r,I is the mean noise power
estimated here as the median of the received power in all bins. Note that SNR does
not take the distance to the detected object into account, meaning a small object
close to the radar could have a higher SNR than a large object further away.
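The median-based noise estimate of Equation (2.7) can be sketched directly; the power values below are assumed toy numbers.

```python
import statistics

def snr_per_bin(power):
    """SNR of each bin (Eq. 2.7): received power divided by the noise power,
    estimated as the median received power over all bins."""
    noise = statistics.median(power)
    return [p / noise for p in power]

# One strong reflection in bin 3 against a flat noise floor.
snr = snr_per_bin([1.0, 1.0, 1.0, 9.0, 1.0])  # bin 3 gets SNR 9.0
```

The median is a robust noise estimate here because a few strong target bins barely shift it, whereas they would inflate a mean.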
2.3 Tracker
The purpose of the tracker is to follow the position of a target over multiple frames.
The high velocity resolution of millimeter wave radars means that an object like a
car, person, or even a small animal will give off multiple detections. This is not only
due to the spatial extent of the object, but also due to different parts of the detected
object having different radial velocities, for example the swaying limbs of a person.
This phenomenon is known as a micro-Doppler effect [Chen et al., 2006]. After the
Fourier transforms, each frame contains a point cloud of detections, with each point
containing polar coordinates, radial velocity, RCS and SNR values. By grouping
these detections into clusters and connecting them over time, tracks are created. In
this thesis, this was done by proprietary tracking software, which we were given
access to by our industry supervisors. We will refer to this software as the tracker.
Updating tracks
Updating the existing tracks is based on a version of the well-known Kalman filter [Kalman, 1960]. If a track has an associated cluster, the new position will be
a weighted average of its predicted position and the position of the cluster. In
cases where a track has no associated cluster, the tracker will perform dead reckoning to predict the current position of the object. If the track is not associated with a new cluster frequently enough over a certain number of frames, it will no longer be considered alive, and will subsequently be removed. Any remaining clusters that have not been associated with any existing track are used to create new tracks.
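The update logic above can be sketched as follows. The fixed gain is a simplification for illustration: a full Kalman filter derives the gain from the state and measurement covariances, and the proprietary tracker's actual implementation is not public.

```python
def update_track(predicted, cluster, gain=0.5):
    """One track update step. With an associated cluster, the new position is
    a weighted average of the prediction and the cluster position; without
    one, the prediction is kept (dead reckoning)."""
    if cluster is None:
        return predicted
    return tuple(gain * c + (1.0 - gain) * p for p, c in zip(predicted, cluster))

new_pos = update_track((0.0, 0.0), (2.0, 4.0))  # -> (1.0, 2.0)
coasted = update_track((1.0, 1.0), None)        # -> (1.0, 1.0)
```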
2.5 Machine learning methods
In a binary classification problem such as the one in this thesis, the Gini impurity can be simplified to

G = 1 − p² − (1 − p)² = 2p(1 − p)

where p is the proportion of samples belonging to one of the classes. An optimal split which minimizes the Gini impurity is then a split where one set
only contains samples of class 1 and the other set only contains samples of class
2. An example of a decision tree fit to the dataset used in this thesis can be seen in
Figure 2.5. Two features, "track lifetime" and "closest", which are explained further in Table 5.1, were used to create the tree, and the depth was limited to 2.
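The binary Gini criterion and an exhaustive threshold search on one feature can be sketched as below. This is an illustrative toy, not the library implementation presumably used in the thesis.

```python
def gini(labels):
    """Binary Gini impurity: 2 * p * (1 - p), p = fraction of class 1."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Find the threshold on one feature minimizing the weighted Gini
    impurity of the two resulting sets."""
    best_t, best_g = None, float("inf")
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g

# A perfectly separable toy feature: splitting at 2 yields impurity 0.
threshold, impurity = best_split([1, 2, 3, 4], [0, 0, 1, 1])  # -> (2, 0.0)
```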
Figure 2.5 Example of a very simple decision tree fit to the dataset used in this
thesis. The tree has a depth of 2 and only 2 features were used to create the tree.
and feature bagging (also known as the random subspace method) [James et al., 2017]. Bagging is the practice of sampling, with replacement, a subset of the dataset used for training, and then using that subset to fit the model instead of the original training set. Feature bagging is the practice of restricting the features considered when making a split in a decision tree to a random subset of the features. In this thesis, the number of features considered in the random forests used was √nf rounded to the nearest integer, where nf is the number of features in the feature vectors x. This is a common choice for classification tasks [Hastie et al., 2008].
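The two sampling steps can be sketched as follows; this shows only the mechanics of bagging and the √nf feature subset, under the assumption that tree fitting itself is handled elsewhere.

```python
import math
import random

def draw_bootstrap(dataset, rng):
    """Bagging: resample the training set with replacement, same size."""
    return [rng.choice(dataset) for _ in dataset]

def draw_feature_subset(n_features, rng):
    """Feature bagging: pick sqrt(n_f) feature indices (rounded) for a split."""
    k = max(1, round(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)

rng = random.Random(0)
subset = draw_feature_subset(9, rng)        # 3 of the 9 feature indices
bootstrap = draw_bootstrap([1, 2, 3], rng)  # 3 samples, possibly repeated
```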
The result of combining several artificial neurons like this is commonly known as
an artificial neural network (ANN) [Gurney, 1997]. An ANN is usually divided into
layers where the output vector y from one layer is used as input to the next. The
structure of an example ANN is visualized in Figure 2.6. This particular network
accepts 5 input values and produces one output value.
Activation functions
The main purpose of using activation functions in artificial neural networks is to add non-linearity to the network. Without activation functions, artificial neural networks are nothing more than linear transformations of the input. Many activation functions also keep the outputs of neurons from growing in an unbounded fashion; unbounded outputs can make ANNs unstable and cause computational problems.
Three main activation functions were used in this thesis: ReLU, sigmoid and softmax [Sharma, 2017]. ReLU is short for Rectified Linear Unit and is defined as

ReLU(x) = x if x > 0, and 0 otherwise. (2.12)
In reality, sigmoid functions refer to a class of functions, but in the context of machine learning the term typically refers to the logistic function:

S(x) = 1/(1 + exp(−x)). (2.13)
The softmax function is a generalization of the logistic function beyond the one-dimensional case. Consider a vector x consisting of K real numbers. The softmax function σ produces a K-dimensional output where the output at index i is defined as:

σ(xi) = exp(xi) / ∑j exp(xj). (2.14)
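The three functions (2.12)-(2.14) translate directly into code:

```python
import math

def relu(x):
    """Eq. (2.12)."""
    return x if x > 0 else 0.0

def sigmoid(x):
    """Eq. (2.13), the logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Eq. (2.14); a production version would subtract max(xs) before
    exponentiating to avoid overflow for large scores."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```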
Supervised learning
With several input vectors xk and corresponding targets yk it is possible to adjust the weights in an ANN to estimate a function which maps xk to yk. This process is called supervised learning. For this process two things are necessary: a loss function and an optimization algorithm. The loss function computes a measure of how poor the network output ŷk = f(xk) is in relation to the label yk. A commonly used loss function in classification contexts is the cross entropy loss function [Brownlee, 2020].
Consider a classification task where c is the target class index, and ŷ is the output from the network, where each element ŷc in ŷ represents the score of class c. This score is first approximated as a probability with the softmax function from Equation (2.14), and then the negative logarithm of the likelihood of the data is calculated:

L(ŷ, c) = −wc log( exp(ŷc) / ∑j exp(ŷj) ) (2.15)

where wc is the weight assigned to class c.
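Equation (2.15) can be computed as below; the default class weight of 1 is an assumption for illustration.

```python
import math

def cross_entropy(scores, target, weight=1.0):
    """Weighted cross entropy (Eq. 2.15): softmax over the class scores,
    then the negative log-probability of the target class times its weight."""
    exps = [math.exp(s) for s in scores]
    prob = exps[target] / sum(exps)
    return -weight * math.log(prob)

loss = cross_entropy([0.0, 0.0], 0)  # uniform scores over 2 classes -> ln 2
```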
Dropout
To avoid overfitting the neural networks to the training sets, a regularization technique known as dropout [Srivastava et al., 2014] was used. In the implementation of dropout used in this thesis, random elements of the input vector x are set to zero after multiplication with the weights of each layer except for the final layer. Each element is set to zero according to a Bernoulli distribution Zik ∈ B(p), where k is the index of the layer and i is the index of the element in the vector. All Zik are independent. The value of p used in this thesis was p = 0.3.
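A minimal sketch of this masking step; the rescaling used by inverted dropout at inference time is deliberately omitted, matching the plain description above.

```python
import random

def dropout(vector, p=0.3, rng=random):
    """Zero each element independently with probability p (training only)."""
    return [0.0 if rng.random() < p else v for v in vector]
```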
2.6 Scene mapping
and each cell can be given its own posterior probability of being occupied by an
object:
p(mi | z1:t, x1:t) = p(mi | z1:t) (2.18)
where z1:t is the set of all sensor measurements, and x1:t is the path of the robot
[Thrun et al., 2006]. The equality in Equation (2.18) follows from the fact that the
radar is static and its position is considered known in our case. Expanding on this
expression for the occupancy probability and applying Bayes' rule to the measurement model p(zt | mi), the odds of occupancy can be written as

p(mi | z1:t) / (1 − p(mi | z1:t)) = [p(mi | zt) / (1 − p(mi | zt))] · [p(mi | z1:t−1) / (1 − p(mi | z1:t−1))] · [(1 − p(mi)) / p(mi)] . (2.21)
By taking the logarithm of the odds of occupancy from Equation (2.21) and denoting it lt(mi), we arrive at an additive and thus more stable representation of occupancy in grid cell mi at time t:

lt(mi) = log[p(mi | zt) / (1 − p(mi | zt))] + log[p(mi | z1:t−1) / (1 − p(mi | z1:t−1))] − log[p(mi) / (1 − p(mi))]
       = log[p(mi | zt) / (1 − p(mi | zt))] + lt−1(mi) − l0(mi) . (2.22)
The expression lt−1(mi) represents the log-odds of occupancy of grid cell mi at time t − 1, and l0(mi) represents the prior log-odds of occupancy of the same grid cell. Thus, all we need to construct the map is the prior occupancy probability and a way to calculate p(mi | zt).
This can be done with a so-called inverse sensor model [Thrun et al., 2006].
Given a measurement, this sensor model updates the map according to a probability
distribution based on the accuracy and characteristics of the sensor. One way to do
this is by raising the occupancy odds in the cells around the measurement point, and
also lowering the occupancy odds in the cells between the measurement point and
the position of the sensor. It is worth noting that the inverse sensor model assumes
that there is no correlation between the occupancy of a cell and the occupancy of its
neighboring cells.
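The log-odds update of Equation (2.22) can be sketched as a small grid class; the uniform prior p0 = 0.5 and the measurement probability below are assumed example values.

```python
import math

def log_odds(p):
    return math.log(p / (1.0 - p))

class OccupancyGrid:
    """Minimal log-odds occupancy grid implementing Eq. (2.22)."""

    def __init__(self, width, height, p0=0.5):
        self.l0 = log_odds(p0)  # prior log-odds of occupancy
        self.grid = [[self.l0] * width for _ in range(height)]

    def update(self, i, j, p_meas):
        # l_t = log-odds from the inverse sensor model + l_{t-1} - l_0
        self.grid[i][j] += log_odds(p_meas) - self.l0

    def probability(self, i, j):
        return 1.0 - 1.0 / (1.0 + math.exp(self.grid[i][j]))

grid = OccupancyGrid(2, 2)
grid.update(0, 0, 0.9)  # a detection raises the occupancy of cell (0, 0)
```

Because the representation is additive, repeated detections in the same cell simply accumulate log-odds, which is what makes the filter stable over long recordings.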
3. Data
A common sentiment in the area of machine learning is that a model can only be as good as the data it is trained with. This is of course true, as the model can only learn patterns present in the dataset used for training, and there are no guarantees for the quality of predictions on data from new scenarios and contexts. Another aspect is that the dataset needs to contain enough samples for the model to be able to accurately capture the information contained within it. Therefore, a large and diverse dataset is essential to produce a robust and accurate machine learning model.
Table 3.1 Parameters for the radar used to record the data used in this thesis.
Radar type FMCW
Operating frequency 60 GHz
Frame rate 10 frames/s
Maximum range 138 m
Range resolution 0.765 m
Velocity resolution 0.098 m/s
Azimuth accuracy 1°
The recordings can be divided into two subsets: 13 short ones of around one to three minutes each, and three long ones of around 10 minutes each. They mostly feature people walking in the scene. Some recordings also contain cars and bikes, but in this thesis we make no attempt to distinguish between types of objects.
3.3 Structure of dataset
Figure 3.1 Plots of tracks generated by the tracker. A simple frame (left) where
track 2203 is real, and a more problematic frame (right) where it is difficult to tell
which tracks are real.
4. Scene model
4.2 Mapping reflective surfaces
3. For each sampled point, update the grid cell containing the point with probability p = p0 + αm/Ns according to Equation (2.22).
4. For each sampled point, use Bresenham's line algorithm [Bresenham, 1965] to find all grid cells in a straight line from the point to the radar at the point of origin. For all these grid cells, update the occupancy value with probability p = p0 − βm/Ns according to Equation (2.22).
The parameter p0 is the prior probability of occupancy, and σr² and σθ² are known beforehand from the forward sensor model. The user parameters Ns, αm and βm are chosen with regard to the computational complexity and to how much the occupancy probability is updated when a detection is received. Because the radar in our model is stationary and detections are accumulated at a high frequency in some grid cells, αm and βm were chosen to be much lower than what was first theorized to be appropriate values.
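Step 4 relies on rasterizing the ray between a detection and the radar. A standard integer-arithmetic version of Bresenham's algorithm, shown here as an illustrative sketch, returns every cell on that line:

```python
def bresenham(x0, y0, x1, y1):
    """Grid cells on a straight line from (x0, y0) to (x1, y1), inclusive
    [Bresenham, 1965]."""
    cells = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return cells

line = bresenham(0, 0, 2, 2)  # -> [(0, 0), (1, 1), (2, 2)]
```

In the mapping loop, every cell on the line except the detection's own cell would receive the lowered probability p = p0 − βm/Ns.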
4.3 Feature density maps
This process results in a map similar to the occupancy grid map, but with the purpose of mapping where certain features commonly occur, or where they are notably absent.
5. Features
Features are the properties of the tracks that were fed into the classifiers. They were chosen in order to capture only the most relevant information in the data; this selection process is described in Section 6.8. The features are divided into two types: track-specific features, which were extracted from the data generated by the tracker, and scene-specific features, which were extracted from the scene models.
5.2 Scene-specific features
6. Model structures and evaluation
6.3 Model 1 — Random forest with only track-specific features
Figure 6.2 Model structure of classifier which uses a scene model to improve clas-
sification performance.
6.7 Evaluation
In order to evaluate the performance of the classifiers, leave-one-group-out cross-validation (LOGOCV) was used. LOGOCV is a special case of leave-one-out cross-validation [Magnusson et al., 2020], the difference being that LOGOCV omits a group of samples in each training set instead of a single sample. The groups in our case were the different recordings in our dataset. Data from all but one recording was used to train the classifier model, and then precision, recall and F1-score were calculated for the two classes, as well as the weighted and unweighted averages of these metrics over the classes. This was then repeated for all the recordings in the dataset. Finally, the metrics were averaged over all the splits, with the splits weighted equally regardless of the length of the recording being evaluated. We consider this method to be the harshest but fairest way of evaluating the classification performance.
Another method of evaluating the classification would be to split every track
in every frame randomly into a training set and evaluation set regardless of which
recording the track is from. We will refer to this as random split evaluation. This
yields a much better classification performance, but tells us little about how the
classifier would perform in a completely new setting.
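The LOGOCV split generation can be sketched as below; libraries such as scikit-learn provide an equivalent `LeaveOneGroupOut` utility, and the recording labels here are assumed toy values.

```python
def logo_splits(groups):
    """Leave-one-group-out: yield one (train_idx, test_idx) pair per group."""
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        test = [i for i, gi in enumerate(groups) if gi == g]
        yield train, test

# Three samples from two recordings: each recording is held out once.
splits = list(logo_splits(["rec1", "rec1", "rec2"]))
```

Because whole recordings are held out, no track from the test recording can leak into training, which is what makes this evaluation stricter than the random split.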
The number of trees in the random forests, as well as the number and size of the layers in the multilayer perceptrons, were chosen large enough that an increase in the size of the models would yield no significant improvements in the classification performance. To make this feasible, it was done without cross-validation, using the random split method of evaluation instead. Small models were trained and evaluated, iteratively increasing the model sizes until no significant increase in performance could be measured.
7. Results
7.1 Classification results
score for ghost targets is very high. We can see that there is an interesting difference
between the classifiers that use a random forest, and the ones that use a multilayer
perceptron. The difference is that the classification results are more balanced for
Model 1 and Model 2, while Model 3 and Model 4 classify more samples as ghost
targets. A possible explanation for this is that there are a few features which have a
particularly strong correlation with the class of the sample. Due to the imbalanced dataset, the multilayer perceptrons can transform these features such that most of the samples get classified as ghost targets. However, a random forest consists of many decision trees, each of which uses only a subset of the features. If these particularly informative features are not selected when fitting a particular decision tree, the classification result could be worse for that tree, essentially meaning that the results are more random and thus more balanced.
Table 7.6 Classification results using the random split evaluation method for
Model 1.
Class precision recall F1 score
Ghost target 0.93 0.96 0.94
Real target 0.84 0.75 0.80
weighted average 0.91 0.91 0.91
Table 7.7 Classification results using the random split evaluation method for
Model 2.
Class precision recall F1 score
Ghost target 0.97 0.99 0.98
Real target 0.95 0.90 0.92
weighted average 0.97 0.97 0.97
Figure 7.1 Photograph (left) and aerial view (right) of one of the recorded scenes.
The red triangle in the right image represents the position and orientation of the radar.
The blue rectangle represents the metal container that was missing in the original
image.
7.2 Visualization of scene models
Figure 7.2 Occupancy grid map produced by recording the scene shown in
Figure 7.1. The walls of the building, the container and the metal furniture are visible
in the map.
Figure 7.3 Density maps of "closest" feature (left) and "strongest" feature (right).
Another example of a recorded scene can be seen in Figure 7.4. The scene consists of a half-full parking lot with a footpath through the middle of it and some trees
next to the path. Data was collected for approximately 10 minutes, during which
some people walked in the parking lot and along the footpath in the middle of the
scene. Figure 7.5 shows the occupancy map to the left and density map of the closest
feature to the right. The reason that the occupancy map looks more sharply defined
than the occupancy map in Figure 7.2 is that the map has been created over a longer
period of time, and has thus been updated with more detections. In the occupancy
map, a problem with the occlusion feature can be seen. The cars and trees in front of
the radar resemble a wall or fence in the map, and thus all tracks appearing behind
this line of trees and cars will have a high occlusion feature value even though they
are not really occluded by a reflective surface. The right image in Figure 7.5 shows what a feature density map might look like in a more realistic scene. The footpath in the middle is clearly visible, where the presence of real targets is high relative to the rest of the scene.
Figure 7.4 Photograph (left) and aerial view (right) of one of the recorded scenes.
The red triangle in the right image represents the position and orientation of the radar.
Figure 7.5 Occupancy map (left) and density map of "closest" feature (right) pro-
duced by recording the scene in Figure 7.4.
8. Discussion
8.1 Limitations
Quality of data
A significant problem with the data collected and used in this thesis is that some sequences do not really resemble realistic scenarios. Some sequences were recorded with the specific purpose of eliciting ghost tracks, and this data can certainly be used to train classifier models that only use track-specific features. However, when building a model of the scene it is preferable to have longer and more realistic sequences, because more detections then accumulate to improve the quality of the maps of the static environment and of the feature density maps. Longer and more realistic sequences of data are necessary to improve the quality of the scene-specific features and, in turn, the performance of the classifier models that use them.
Another consequence of the sequences recorded with the purpose of eliciting ghost tracks is that the dataset is quite imbalanced, with about three times as many samples from ghost tracks as from real tracks. An attempt was made to mitigate this with weighted loss functions and samples, as described in Chapter 6. Regardless, the classifier models trained in this thesis have a propensity towards classifying samples as ghost tracks. To make a classifier more suited to realistic scenarios, more realistic data would be needed for the training process.
One final issue with the data used in this thesis is the limitations of the proprietary tracking software and the manual annotation process outlined in Section 3.2. These limitations led to a small number of tracks being annotated incorrectly. A solution was briefly discussed where ambivalent or problematic tracks would be pruned or split by manually altering the data. This was, however, deemed too time-consuming for the marginal gain that would probably be achieved.
duced some methods for evaluating grid maps, the most promising being the solution proposed by [Schwertfeger et al., 2010]; however, it would take quite some time to implement. Since the purpose of this thesis is to classify ghost tracks, and not to build as accurate a scene model as possible, it was deemed satisfactory to use the scene model's effect on the classification as the only metric of its performance.
This, however, leads to another issue. Since the scene models are built iteratively over time, the quality of the scene models, and thus also of the scene-specific features, will be low for all samples early in the recording. In a realistic scenario, the radar will monitor the same environment for months, perhaps even years. Assuming that the mapping parameters are chosen so that the scene model converges after a week, all tracks after that point will be classified using roughly the same converged scene model. The effects of this could have been investigated by training and evaluating the classifiers with all scene-specific features extracted from the final converged scene models at the end of the recording. Additionally, some perfect and some purposefully poor scene models could have been generated and tested in the same way to see how much this impacted the classification results.
8.2 Future work
Classifier
An area of improvement for the classifier is to better utilize the information contained in the temporal evolution of a track's features, perhaps by using a recurrent neural network. An alternative is to represent the features of a track as spectrograms, which are then used as input to a convolutional neural network. Another way of classifying tracks could be to perform a classification once the track has been active for some fixed period of time, and then never change the classification. As many ghost tracks are short-lived, and samples with a short current lifetime are prone to being classified as ghost tracks, this could alleviate some incorrect classifications of real tracks at the beginning of their lifetime.
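The "classify once, then freeze" idea above can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the `Track` fields, the `decision_age` threshold, and the toy classifier are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Track:
    features: List[float]          # latest feature vector for the track
    age: float = 0.0               # seconds since the track was created
    label: Optional[str] = None    # "real" / "ghost", frozen once set


def update_label(track: Track,
                 classify: Callable[[List[float]], str],
                 decision_age: float = 3.0) -> Optional[str]:
    """Assign a permanent label once the track is old enough."""
    if track.label is None and track.age >= decision_age:
        track.label = classify(track.features)   # one-shot decision
    return track.label


# Toy stand-in classifier: low first feature value suggests a ghost track.
toy_classifier = lambda f: "ghost" if f[0] < 0.5 else "real"

t = Track(features=[0.8], age=1.0)
assert update_label(t, toy_classifier) is None    # too young: undecided
t.age = 4.0
assert update_label(t, toy_classifier) == "real"  # frozen from now on
```

Because the label is never revised, a real track that briefly looks ghost-like early in its lifetime cannot flip back and forth; the cost is that a genuinely wrong early decision is also permanent.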
Scene model
In its current implementation, the scene model incorporates data into its maps as soon as it becomes available. In a real deployment, however, the scene model could be built over a much longer time span than the data in this thesis has allowed. It would therefore be possible to take a slower approach to mapping. One way is to wait until a track has died, and then analyse it over its entire lifespan before updating the scene model with that information. This could potentially be used to prevent the more uncertain tracks from being added to the feature density maps.
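A minimal sketch of this deferred-mapping idea follows. The acceptance thresholds, sample field names, and cell size are illustrative assumptions, not values from the thesis.

```python
def should_map(track_history):
    """Accept a finished track for mapping only if it looks reliable."""
    lifetime = track_history[-1]["t"] - track_history[0]["t"]
    mean_snr = sum(s["snr"] for s in track_history) / len(track_history)
    # Filter out short-lived or low-SNR (uncertain) tracks entirely.
    return lifetime >= 5.0 and mean_snr >= 10.0


def update_density_map(density_map, track_history, cell_size=0.5):
    """Accumulate per-cell detection counts for an accepted track."""
    for sample in track_history:
        cell = (int(sample["x"] // cell_size), int(sample["y"] // cell_size))
        density_map[cell] = density_map.get(cell, 0) + 1
    return density_map


# A finished track: long-lived and high SNR, so it passes the filter.
history = [{"t": 0.0, "x": 1.2, "y": 3.4, "snr": 12.0},
           {"t": 6.0, "x": 1.4, "y": 3.6, "snr": 14.0}]
dmap = {}
if should_map(history):
    update_density_map(dmap, history)
```

The key design choice is that the map update only ever sees whole, vetted track histories, so an uncertain track contributes nothing rather than polluting the feature density maps.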
The occupancy grid map could be improved by adding a filtering step before the scene-specific features are generated. Objects like trees and metal poles generate many static detections, but they do not necessarily block detections from people walking behind them. An extreme example of this can be seen in Figure 7.5, where a row of trees creates what the scene model treats as a solid wall. Finding a way to identify which static objects obscure the objects behind them, and which do not, would mitigate this problem. The produced occupancy grid maps could also be used to find lines in the maps and thus infer the positions of walls and large reflective surfaces in the scene more reliably. One could also expand on the work in [Nüßlein, 2021] to detect reflective surfaces by looking at the positions of ghost tracks in the scene and estimating where reflections have occurred. Finally, removing the assumption that cells in the grid maps are independent could produce better maps, at the cost of more computationally complex map updates [Thrun et al., 2006].
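For context, the standard per-cell update that the independence assumption enables is the binary Bayes filter in log-odds form (cf. [Thrun et al., 2006]). The sketch below is generic, with illustrative probability parameters; it is not the thesis's mapping algorithm.

```python
import math

# Log-odds increments for one observation of a single, independent cell.
L_OCC = math.log(0.7 / 0.3)    # detection ("hit") falls in the cell
L_FREE = math.log(0.4 / 0.6)   # a ray passed through the cell ("miss")


def update_cell(log_odds: float, hit: bool) -> float:
    """Binary Bayes filter update for one cell, independent of all others."""
    return log_odds + (L_OCC if hit else L_FREE)


def occupancy_probability(log_odds: float) -> float:
    """Convert a cell's log-odds value back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


lo = 0.0                        # prior: p(occupied) = 0.5
for _ in range(3):              # three consecutive hits in the same cell
    lo = update_cell(lo, hit=True)
print(round(occupancy_probability(lo), 3))   # → 0.927
```

Dropping the independence assumption means this cheap per-cell addition is replaced by an update that couples neighbouring cells, which is exactly where the extra computational cost comes from.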
9
Conclusion
The proposed classifier models prove especially effective at detecting ghost tracks, but a significant proportion of the real tracks are misclassified as ghost tracks, especially in the case of Model 3 and Model 4. If the objective is to minimize false alarms, this is preferable to a low recall score for ghost tracks. Regardless, we believe that this flaw could be mitigated either by using balanced sampling of datapoints when training the models, or by extending the dataset with more balanced data.
We believe that many of the proposed comparative features, such as closest and relative SNR, provide a solid base on which many different types of classifiers can be constructed. It is also our opinion that the scene-specific features extracted from the scene models have the potential to improve the classification results. However, to be able to utilize and evaluate these features, data recorded over longer periods of time will be necessary, preferably from radar installations in real environments.
From a purely visual perspective, the mapping algorithm produced surprisingly accurate occupancy grid maps for many of the recorded scenes. The algorithm is quite simple and customizable, although it can be difficult to tune the parameters so that it produces accurate maps for multiple scenes with a single set of parameters.
We believe that this thesis shows that it is possible to reliably classify ghost tracks with a stationary radar using our proposed method. However, there is considerable potential for better results with higher-quality data and with the proposed improvements to our models.
Bibliography
Lund University
Department of Automatic Control
Box, SE Lund, Sweden

Document name: MASTER'S THESIS
Date of issue: June
Document Number: TFRT

Author(s): Anton Sedin, David Wadmark
Supervisors: Sebastian Heunisch, Axis Communications AB, Sweden; Aras Papadelis, Axis Communications AB, Sweden; Björn Olofsson, Dept. of Automatic Control, Lund University, Sweden; Rolf Johansson, Dept. of Automatic Control, Lund University, Sweden (examiner)

Keywords: radar surveillance, multipath, machine-learning, classification

Classification system and/or index terms (if any):
Supplementary bibliographical information:
ISSN and key title:
ISBN:
Language: English
Number of pages:
Recipient's notes:
Security classification:

http://www.control.lth.se/publications