Python Drone Tracker Project Report
Machine Translated by Google
Introduction
Prior to this project, a previous project characterized and implemented a video drone-tracking algorithm. That algorithm was implemented in Matlab and runs at very low rates, far from real-time.
The purpose of this project is to take another step toward running the algorithm in real time on a drone, which would allow it (together with a complementary control algorithm) to autonomously track other drones using a camera only.
To enable execution of such an algorithm on an embedded platform, the project contains two parts:
1. Examining the performance of the existing trackers in Python's OpenCV library, in terms of both tracking quality and runtime.
2. Rewriting the algorithm in Python, fixing bugs, optimizing, and testing performance.
Table of Contents
Introduction
Points of interest from the difference image
Training an SVM classifier to separate drone from background using AlexNet
Appendix – Training an SVM classifier to separate drone from background using layer fc7 of AlexNet
The algorithm that preceded this work was developed in Matlab and tested on a set of videos taken from a drone. The OpenCV trackers were tested on the same videos (numbered as in Matlab), starting from exactly the same bounding box.
(Table: tracking results of the OpenCV trackers on videos 10–27.)
It can be seen that the best tracker to examine in depth is MILTrack, which shows good tracking performance on most videos while maintaining reasonable runtimes.
MILTrack
This is an object-tracking algorithm that belongs to a family of methods called "tracking by detection", which have good runtime performance and are suitable for real-time use.
These methods train an adaptive classifier whose purpose is to separate the object from the background. To train the classifier online, the tracker's current state is used at each step to extract positive (object) and negative (background) examples.
This approach is sensitive to small inaccuracies in the tracker's position, which produce defective training examples and rapid drift of the tracker. MILTrack addresses this problem using MIL (Multiple Instance Learning): instead of training on a single positive example, which is very sensitive to inaccuracies, it trains on a positive "bag" containing patches from the neighborhood of the tracked position, such that at least one example in the bag is positive.
The algorithm is based on the concept of boosting: combining many weak classifiers (usually each deciding on a single feature) into one strong classifier, where at each stage the weak classifiers are updated one after the other according to the feature that best separates the positive and negative examples.
By using a positive "bag" that contains the object, the algorithm copes well with appearance changes and partial occlusions, but it cannot handle complete occlusion.
Thanks to the adaptive classifier, this tracker shows very good performance on the drone videos and copes with the rapid shifts, scale changes, and changes in the angle from which the drone is seen. It can also be seen that, thanks to the classifier, the tracker copes well with the changing background; all the videos in which it fails completely (17, 21, 23, 24) are ones in which the drone starts small against a problematic background, so tracking fails immediately in the first frames, before the classifier has had significant learning to separate the drone from the background.
Frame(t) → Frame(t+1)
Example of MILTrack operation (from the original paper, where the tracked object is a face): the tracker predicts the object's position in the next frame, takes patches in the vicinity of the object, and treats them as a positive "bag" X1 in which the object should appear. Each negative example is placed in its own "bag".
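The bag idea above can be sketched in a few lines. The noisy-OR rule below is the bag-probability model from the MILBoost formulation; the function name and the probabilities are illustrative, not from the project code:

```python
# Sketch of the Multiple Instance Learning idea behind MILTrack. Instead of a
# single positive example, training uses a positive "bag" of patches sampled
# around the estimated position; the bag counts as positive if at least one
# instance in it truly contains the object.

def bag_probability(instance_probs):
    """Noisy-OR: p(bag is positive) = 1 - prod_i (1 - p_i)."""
    p_all_negative = 1.0
    for p in instance_probs:
        p_all_negative *= (1.0 - p)
    return 1.0 - p_all_negative

# A slightly off-center tracker still yields a confidently positive bag,
# because one well-centered patch is enough:
positive_bag = [0.05, 0.9, 0.1]   # one strong instance dominates
negative_bag = [0.05, 0.02, 0.1]  # background patches only
print(bag_probability(positive_bag))
print(bag_probability(negative_bag))
```

This is why small localization errors do not poison the training set: the one patch that does contain the object keeps the bag label correct.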
if center_point(1)-ceil(size_to_crop(1)/2) < 0
    size_to_crop(1) = size_to_crop(1) - abs((center_point(1) - ceil(size_to_crop(1)/2))*2);
end
but it does not handle the case where center_point(1)-ceil(size_to_crop(1)/2) = 0.
For example, for a 100x100 patch and a midpoint of [50,50] in the original image, the code actually crops a 99x99 patch whose center point lies one pixel before the middle, which causes a slight deviation later in the algorithm.
The Matlab code also does not support situations where the drone is at the edge of the image. The first thing that crashes is an attempt to insert an image smaller than expected into the queue cropHistory_img_ref, which causes an error.
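A boundary-safe center crop avoids both problems: the off-by-one for an exactly centered patch, and the edge-of-image crash. This is a sketch with illustrative names, not the project's actual code:

```python
import numpy as np

def crop_centered(img, center, size):
    """Crop a size[0] x size[1] patch centered on `center`, clamping the
    window to the image bounds instead of shrinking it asymmetrically
    (the behaviour that produced the 99x99 patch in the Matlab code)."""
    h, w = img.shape[:2]
    ph, pw = size
    # Clamp the top-left corner so the full-size window stays inside the image.
    top = min(max(center[0] - ph // 2, 0), h - ph)
    left = min(max(center[1] - pw // 2, 0), w - pw)
    return img[top:top + ph, left:left + pw]

img = np.zeros((200, 200), dtype=np.uint8)
print(crop_centered(img, (50, 50), (100, 100)).shape)   # (100, 100)
print(crop_centered(img, (0, 0), (100, 100)).shape)     # (100, 100) even at the edge
```

Clamping keeps the patch size fixed, which also keeps downstream buffers such as the crop-history queue at a constant shape.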
E(u,v) = Σ_{x,y} w(x,y) · [I(x+u, y+v) − I(x,y)]²

In Python I used a ready-made OpenCV function that uses the corner measure defined in the original Harris & Stephens article (http://www.bmva.org/bmvc/1988/avc-88-023.pdf):

R = det(M) − k · (trace(M))²
The function parameters were chosen to be as similar as possible to the Matlab implementation:
1. The Matlab library implementation computes a gradient and then passes it through a Gaussian filter with σ = 1. Python uses the cv2.Sobel() function, which combines the gradient and the Gaussian filter. Therefore the parameter SIZE_K_SOBEL_HARRIS = -1 was chosen, which uses a 3x3 kernel that is, in a sense, equivalent to σ = 1.
2. The free sensitivity parameter k was set to 0.04, which is an acceptable value for this parameter, but it has no equivalent in the Matlab code.
3. SIZE_BLOCK_HARRIS = 2, which refers to the size of the Harris window, is minimal, so that, as in the Matlab implementation, we do not disqualify corners that are close to each other.
B. The Matlab code uses a nonmaxsuppts function that does not exist in Python, so I implemented it myself. The behavior of the two functions is the same up to the difference between the dilate function in Python and imdilate in Matlab, which behave similarly on the several images I tested.
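The two pieces above, the Harris response and dilation-based non-maximum suppression, can be sketched with NumPy only. The gradient and box smoothing here are simplified stand-ins for the Sobel/Gaussian filtering that OpenCV and Matlab perform, and k = 0.04 matches the value chosen in the project:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner measure R = det(M) - k * trace(M)^2, computed from a
    structure tensor smoothed with a simple 3x3 box window."""
    img = img.astype(np.float64)
    Iy, Ix = np.gradient(img)

    def box(a, r=1):
        # 3x3 neighbourhood sum via shifted copies (wraps at borders).
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def nonmax_dilate(R, r=1):
    """Non-maximum suppression in the spirit of nonmaxsuppts: a pixel is a
    corner candidate iff it equals the maximum of its neighbourhood
    (the dilate/imdilate trick)."""
    dil = R.copy()
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            dil = np.maximum(dil, np.roll(np.roll(R, dy, axis=0), dx, axis=1))
    return (R == dil) & (R > 0)

img = np.zeros((20, 20))
img[8:, 8:] = 1.0                      # one bright square => a strong corner near (8, 8)
corners = nonmax_dilate(harris_response(img))
print(corners.any())                   # True
```

Flat regions and straight edges are rejected by the R > 0 condition, so only true corner-like points survive the suppression.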
A. Registration – In Matlab, registration is performed by the imregcorr function, based on phase correlation and the computation of a similarity transform (4 degrees of freedom: translation, rotation, and scaling). The Python code uses a different method, ECC, which is described in the article:
http://xanthippi.ceid.upatras.gr/people/evangelidis/george_files/PAMI_2008.pdf
ECC cannot compute a similarity transform, so an affine transform (6 degrees of freedom) is computed instead. The algorithm has a number-of-iterations parameter that greatly affects the overall performance and runtimes.
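For intuition, the phase-correlation principle behind imregcorr can be sketched in a few lines of NumPy, restricted here to pure translation (imregcorr additionally recovers rotation and scale, and the ECC method replaces this with an iterative fit):

```python
import numpy as np

def phase_correlation_shift(ref, mov):
    """Recover the integer translation between two frames by phase
    correlation: the normalized cross-power spectrum of the two FFTs is an
    impulse at the shift. Illustrative sketch, translation only."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(mov)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12              # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
mov = np.roll(np.roll(ref, 3, axis=0), 5, axis=1)   # shift ref by (3, 5)
print(phase_correlation_shift(mov, ref))            # (3, 5)
```

One FFT pair per frame makes this cheap and non-iterative, which is exactly why the iteration count of ECC dominates the Python runtime by comparison.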
I used the morphology.remove_small_objects function from the skimage library, which performs the same morphological operation (opening) with the same parameters: it removes every connected component with fewer than min_size = 3 pixels, where diagonal pixels are also considered connected (connectivity = 8, which is also the default in Matlab).
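The same cleanup can be sketched with scipy.ndimage (assuming SciPy is available); an all-ones 3x3 structuring element gives the 8-connectivity described above:

```python
import numpy as np
from scipy import ndimage

def remove_small(mask, min_size=3):
    """Remove connected components smaller than min_size pixels, counting
    diagonal neighbours as connected (8-connectivity), mirroring the
    skimage.morphology.remove_small_objects call used in the project."""
    labels, _ = ndimage.label(mask, structure=np.ones((3, 3), dtype=int))
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False                    # label 0 is the background
    return keep[labels]

mask = np.zeros((8, 8), dtype=bool)
mask[0, 0] = True                              # 1-pixel speck: removed
mask[3, 3] = mask[4, 4] = mask[5, 5] = True    # 3 diagonal pixels: one 8-connected blob, kept
cleaned = remove_small(mask, min_size=3)
print(int(cleaned.sum()))                      # 3
```

With 4-connectivity the three diagonal pixels would be three separate 1-pixel components and all would be removed, which is why the connectivity choice matters here.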
In Matlab, SURF is used; in Python I use ORB (http://www.willowgarage.com/sites/default/files/), which is the only descriptor available.
For each track, we try to see whether a new point of interest matches it. The matching is based on a weighting of two indices, where the point with the minimum total index is assigned to the track. The two indices are:
a. the distance of the new point from the expected position of the drone;
b. the distance between the descriptor of the new point and that of the last point in the track.
Index (a) is divided by the size of the search window (the value of this parameter is 10), so after the division it is usually a number between 0 and 10.
The descriptor distance in SURF is a very small number (usually less than 1), while the same calculation for the ORB distance gives a much higher number (usually tens to hundreds). It turns out that if the same formula were kept, the spatial-distance criterion would have no effect on tracking.
To balance the effect of the two indices, I divided the ORB distance by 512, which brought that index to a reasonable size. Strictly speaking, the threshold for the minimum-distance test is no longer valid (because differently scaled quantities are summed), but in the absence of a smarter choice I left it at its original value, dist_min = 2.
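The weighting described above can be sketched as follows; the function name follows the text, the window size of 10 and the 512 divisor for ORB's Hamming distances are the values quoted above, but the exact formula in the project code may differ:

```python
def match_cost(pos_dist, window_size, desc_dist, desc_scale=512.0):
    """Weighted matching index between a new interest point and a track:
    (a) spatial distance from the predicted position, divided by the search
    window size, plus (b) descriptor distance. SURF distances are < 1, while
    ORB Hamming distances run into the hundreds, so the ORB distance is
    divided by desc_scale=512 to bring the two indices to a comparable range."""
    return pos_dist / window_size + desc_dist / desc_scale

# With normalization, both terms contribute comparably:
print(match_cost(pos_dist=5.0, window_size=10.0, desc_dist=256.0))   # 1.0
# Without it (desc_scale=1), the descriptor term swamps the spatial term:
print(match_cost(pos_dist=5.0, window_size=10.0, desc_dist=256.0, desc_scale=1.0))
```

The second call illustrates the failure mode from the text: an unnormalized ORB distance of 256 drowns out any plausible spatial distance.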
Training an SVM classifier to separate drone from background using AlexNet
The AlexNet network I used in Python has the same structure as the network in Matlab but was trained on different examples, so the SVM classifier had to be retrained.
It is not documented which examples were used to train the classifier in Matlab, so I extracted, using the Matlab tracker, images of drones (from the frames it tracks properly) and of background (from the frames it does not track properly), and trained on them.
The resulting database contains 4614 drone images and 6874 background images, and they were split into 60% of the images for training and 40% for testing.
In practice, although checking on the test set yields a score of 99.4%, the accuracy of the drone/background decision on the videos themselves seems mediocre, and this is, in my opinion, due to overfitting: because many consecutive images were taken from the Matlab tracker, it is likely that for most images in the test set there is a very similar image in the train set, so the high test score does not reflect how well the classifier actually separates drones from background.
(Figure: example images from the database – Drone / Background.)
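One common remedy for this kind of leakage, sketched below with illustrative video ids, is to split by source video rather than by individual frame, so that near-duplicate consecutive frames never straddle the train/test boundary:

```python
def split_by_video(samples, test_videos):
    """samples: list of (video_id, image) pairs. Frames from the same video
    never cross the train/test boundary, so the test score cannot be inflated
    by near-duplicates of training frames."""
    train = [s for s in samples if s[0] not in test_videos]
    test = [s for s in samples if s[0] in test_videos]
    return train, test

# Three videos with four frames each; hold out one whole video for testing.
samples = [(vid, f"frame{i}") for vid in ("v1", "v2", "v3") for i in range(4)]
train, test = split_by_video(samples, test_videos={"v3"})
print(len(train), len(test))   # 8 4
```

A classifier evaluated this way would likely report a lower, but more honest, test score than the 99.4% obtained with a random frame-level split.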
Performance comparison
Tracking performance
Comparison on videos from the file GOPR0014.mp4, which appear under the same numbering in GetVideoInfo.m in Matlab.
(Table: per-video tracking success ("yes"/"no", with notes such as "loses after half the video" and "did not run") for the Matlab and Python implementations and MILTrack on videos 6–27.)
It can be seen that the algorithm's performance did not change significantly between the Matlab and Python implementations. In general, MILTrack tracks better and more stably than both implementations, even though it was not specifically designed for tracking drones, and there are a number of particularly difficult videos in which all the algorithms fail: 17, 23, 24, 26.
Runtime performance
Running times in Matlab
When running video number 1, which is about 5 seconds long, the main loop of the code ran for 206 seconds over 345 frames, i.e. about 1.68 FPS. Matlab's profiler shows that there are two main processes limiting the running pace (marked in color).
Calculating the transform for registration alone takes 102.6 s, half the total running time.
The drone/background classification process using AlexNet and the SVM is also time-consuming, taking about 40 s (of which about 30% of the time …).
Running times in Python
When running video number 1, which lasts about 5 seconds, the main loop of the code ran for 64 seconds over 345 frames, i.e. about 5.4 FPS. Here too, it can be seen that there are two main processes that limit the running pace.
The process of finding points of interest is the most expensive, and alone takes 36.7 s, about 55% of the total running time.
The process of updating the tracks, which includes the drone/background classification using AlexNet and the SVM, is also time-consuming.
Looking at the list of the most time-consuming functions to understand exactly which operations are holding us back, two basic operations stand out as particularly expensive:

1. findTransformECC
As part of finding points of interest from the difference image, this is the function that finds an affine transform for the purpose of registration between successive frames. It is called once per frame and takes about 55% of the algorithm's runtime, about 0.1 s per call. This alone limits the performance to about 10 FPS.
The number-of-iterations parameter of this algorithm is the main influence on its runtime.

2. Passing through AlexNet
Passing images through AlexNet to extract features for the SVM classifier is a relatively expensive operation, taking about 33 ms per call on my CPU, and it can happen multiple times per frame. Assuming an average of 4 calls per frame (roughly the ratio across all the videos tested), the network passes alone limit us to the region of 8 FPS. If the network pass were run in parallel on a GPU, performance could be improved significantly.
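The frame-rate caps quoted above follow directly from the per-call timings:

```python
# Back-of-the-envelope frame-rate budget from the measurements above:
# findTransformECC at ~0.1 s per call, once per frame, and AlexNet feature
# extraction at ~33 ms per call with ~4 calls per frame on average.

ecc_per_frame = 0.100            # seconds, one call per frame
alexnet_per_frame = 0.033 * 4    # seconds, ~4 calls per frame

print(round(1.0 / ecc_per_frame, 1))       # 10.0 -> ECC alone caps us at ~10 FPS
print(round(1.0 / alexnet_per_frame, 1))   # 7.6  -> AlexNet alone caps us at ~8 FPS
```

Since the two costs add up per frame, the combined budget is roughly 1 / (0.100 + 0.132) ≈ 4.3 FPS, consistent with the measured 5.4 FPS once the remaining, cheaper steps are included.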
The optimization and reimplementation of the algorithm in Python accelerates the runtime several-fold (from about 1.68 FPS to about 5.4 FPS on video 1) and brings us closer to the possibility of running in real time. The transition to Python also allows a simple transition between platforms and operating systems.
The main thing currently slowing the algorithm (in both the Matlab and Python implementations) is finding the transform required for registration between consecutive images, which is necessary for the use of the difference image. Choosing a more successful alternative for finding points of interest in the image would allow significant acceleration of the algorithm and real-time operation.
Among the alternatives examined in Python, the MILTrack tracker was found to fit our problem, with better performance both in tracking and in runtime. To further improve performance, one can take the MILTrack concept and adapt it to drone tracking in our scenario:
1. Adding MTT (multi-target tracking) logic and keeping several tracks at any given moment.
2. Exploiting the fact that we only track drones, using an SVM classifier trained specifically on drones.
Appendix – Installation instructions
Installing PyCharm
Installing Anaconda
If it is not possible to click OK, delete the venv folder and repeat steps a, b.
Installing libraries
2. Install the following packages in the order in which they appear; the exact versions must be observed:
Package          Version
opencv-python    3.4.5.20
joblib           0.13.2
numpy            1.11.3
scipy            1.2.1
scikit-image     0.13.1
sklearn          0.0
scikit-learn     0.20.3
matplotlib       1.5.1
Pillow           2.9.0
Installing Caffe
Installing CMake
Installing Git
1. WITH_NINJA = 0
2. PYTHON_VERSION = 3
3. CONDA_ROOT = <your conda root, e.g. C:\Program Files\Anaconda3>
1. Replace the bvlc_alexnet folder in caffe/models/bvlc_alexnet with the prepared bvlc_alexnet folder.
2. Make sure the files bvlc_alexnet.caffemodel and deploy_fc7.prototxt are there.
Appendix – Training an SVM classifier to separate drone from background using layer fc7 of AlexNet
The classifier_svm_tracker_drone folder contains two useful scripts, one for training and one for testing, split in order to alleviate the memory limitations of loading all the images and operating on them at once.
Changes to make in the code before using it to train and test a new classifier:
For training:
1. Change the path to the database that contains the drone and background images.
3. Change the path to the trained model (important, so as not to accidentally overwrite the model currently in use).
For testing:
1. Change the path to the database that contains the drone and background images.
Typical results on 4614 drone images and 6874 background images (of which 60% training and 40% testing):
Running train_svm_classifier.py
Running test_svm_classifier.py
Database structure
If you want to use other images of drones and backgrounds, keep the same format: