
CONCURRENT SEGMENTATION AND OBJECT DETECTION CNNS FOR AIRCRAFT DETECTION AND IDENTIFICATION IN SATELLITE IMAGES

Damien Grosgeorge, Maxime Arbelot, Alex Goupilleau, Tugdual Ceillier, Renaud Allioux

Earthcube, Paris, France


arXiv:2005.13215v1 [cs.CV] 27 May 2020

ABSTRACT

Detecting and identifying objects in satellite images is a very challenging task: objects of interest are often very small, and features can be difficult to recognize even using very high resolution imagery. For most applications, this translates into a trade-off between recall and precision. We present here a dedicated method to detect and identify aircraft, combining two very different convolutional neural networks (CNNs): a segmentation model, based on a modified U-net architecture [1], and a detection model, based on the RetinaNet architecture [2]. The results we present show that this combination significantly outperforms each unitary model, drastically reducing the false negative rate.

Fig. 1. Illustration of the data diversity (with ground truth): (a) Soukhoï Su-25, (b) F-16 Fighting Falcon.

Index Terms— CNNs, deep learning, segmentation, identification, aircraft, satellite images
1. INTRODUCTION

The last decade has seen a huge increase in available high resolution satellite images, which are used more and more for surveillance tasks. When monitoring military sites, it is necessary to automatically detect and identify objects of interest to derive trends. In this domain, aircraft recognition is of particular interest: each aircraft model has its own role, and a variation in the number of a specific type of aircraft at a given location can be a highly relevant insight. This recognition task needs to be reliable to allow the automation of site analysis, in particular to derive alerts corresponding to unusual events. Robustness to noise, shadows, illumination or ground texture variation is challenging to obtain but mandatory for real-life applications (see Fig. 1).

Nowadays, CNNs are considered one of the best techniques to analyse image content and are the most widely used ML technique in computer vision applications. They have recently produced state-of-the-art results for image recognition, segmentation and detection tasks [3]. A typical CNN architecture is generally composed of alternate layers of convolution and pooling (encoder) followed by a decoder that can comprise one or more fully connected layers (classification), a set of transpose convolutions (segmentation) or some classification and regression branches (object detection). The arrangement of the CNN components plays a fundamental role in designing new architectures and thus in achieving higher performance [4].

For segmentation tasks, the U-net architecture has been widely used since its creation by [1]. This architecture allows a better reconstruction in the decoder by using skip connections from the encoder (Fig. 2). Various improvements to each CNN component have been proposed in the literature [4], but the global U-net architecture is still one of the state-of-the-art architectures for segmentation tasks.

For detection tasks, two main categories have been developed in the literature. The most well-known uses a two-stage, proposal-driven mechanism: the first stage generates a sparse set of candidate object locations, and the second stage classifies each candidate location either as one of the foreground classes or as background using a CNN. One of the most widely used two-stage models is Faster-RCNN [5], which has been considered the state-of-the-art detector, achieving top accuracy on the challenging COCO benchmark. However, in the last few years, one-stage detectors, such as the Feature Pyramid Network (FPN) [6], have matched the accuracy of the most complex two-stage detectors on the COCO benchmark. In [2], the authors identified that, since one-stage detectors are applied over a regular, dense sampling of object locations, scales and aspect ratios, class imbalance during training is the main obstacle impeding them from achieving state-of-the-art accuracy. They thus proposed a new loss function that eliminates this barrier (the focal loss), while integrating improvements such as the FPN [6] in their model known as the RetinaNet [2].
In this paper, we are looking for a dedicated and robust approach to the aircraft detection and identification problems, one that can easily be adapted to multiple applications. We propose a hybrid solution based on two different CNN strategies: a segmentation model based on the U-net architecture [1], for a better detection rate, and an object detection model based on the RetinaNet [2], a fast one-stage detector, for identification and for improving the precision. Section 2 details this concurrent approach, while Section 3 presents results obtained on high-resolution satellite images.

2. CONCURRENT SEGMENTATION AND OBJECT DETECTION APPROACH

In this section, we present the choices made in designing each model for the aircraft recognition problem, and how the two models interact. These choices are based on simple observations: (i) changing the training paradigm modifies the way features are learnt/extracted inside the model, (ii) segmentation models are really efficient but suffer from bad separation and identification of objects, (iii) in high-resolution satellite images, aircraft are of limited size. We also based our choices on the latest developments in the field.

2.1. Segmentation CNN

Our segmentation model is based on the U-net architecture, illustrated in Fig. 2 (original architecture).

Fig. 2. Original U-Net architecture (from [1]).

For our concurrent approach, the objectives of this model are: (i) to detect aircraft (without identification), (ii) to have a very high recall (in particular for the location, even if the delineation is of low quality), (iii) to be robust to difficult cases (such as occlusion, shadow or noise). For that purpose, the U-net architecture has been updated:

• convolutional layers have been replaced by identity mapping (IM) blocks, as proposed by [7]; this choice has been shown to ease the training of deep networks and improve their efficiency;
• maxpool layers have been replaced by convolutional layers with a stride of 2 (we reduce the spatial information while increasing the number of feature maps);
• the depth and the width of the network have been set according to the application: spatial information is only reduced twice (while doubling filters), and the encoder is composed of 36 IM blocks and the decoder of 8 IM blocks (resp. 72 and 16 conv. layers).

Skip connections of the U-net are used for a better reconstruction of the prediction map.
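To make these modifications concrete, here is a minimal PyTorch sketch of a pre-activation identity mapping (IM) block in the style of [7], with a stride-2 convolution taking the place of max pooling. The channel widths and the two-block stage are illustrative assumptions; the paper only specifies the block counts (36/8) and the two downsampling steps.

```python
import torch
import torch.nn as nn

class IMBlock(nn.Module):
    """Pre-activation identity mapping block [7]: (BN-ReLU-Conv) x 2 plus a
    skip connection. Using stride=2 on the first convolution replaces max
    pooling: spatial size is halved while the channel count can be doubled."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False),
        )
        # 1x1 projection on the skip path when the shape changes, else identity
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))

    def forward(self, x):
        return self.body(x) + self.skip(x)

# One encoder stage: spatial size /2 while doubling the filters, matching the
# "reduce spatial information twice (while doubling filters)" design choice.
stage = nn.Sequential(IMBlock(64, 128, stride=2), IMBlock(128, 128))
out = stage(torch.randn(1, 64, 256, 256))  # shape: (1, 128, 128, 128)
```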
2.2. Object detection CNN

Our object detector is based on the RetinaNet architecture, illustrated in Fig. 3.

Fig. 3. Original RetinaNet architecture (from [2]).

For our concurrent approach, the objectives of this model are: (i) to split the detected objects and (ii) to correctly identify the objects. For that purpose, the RetinaNet architecture has been carefully set:

• one level has been added in the feature pyramid network [6] for a finer detection level (small objects);
• the backbone of the RetinaNet model used to extract features is a ResNet101;
• a non-maximum suppression (NMS) algorithm is used to remove duplicated results.

The focal loss proposed by [2] is used to address the foreground-background class imbalance encountered during the training.
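The paper does not detail its NMS step, so the following NumPy sketch shows the standard greedy algorithm; the default threshold of 0.35 anticipates the value quoted in Section 3.2.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.35):
    """Greedy non-maximum suppression. boxes: (N, 4) array of [x1, y1, x2, y2],
    scores: (N,) confidences. Returns the indices of the boxes to keep."""
    order = scores.argsort()[::-1]              # process best-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the kept box with every remaining candidate
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]    # drop duplicates of the kept box
    return keep
```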

2.3. Concurrent approach

The training strategy of each model is different. The features of the segmentation model are learnt on the aircraft objects: the model is then good at localizing objects, but is not designed to separate or recognize them. The features of the object detection model are learnt on the finest aircraft identification level (see Section 3.1): the model is then good at separating and recognizing aircraft, but its precision becomes very low when it is tuned for a high recall. The idea of our concurrent approach is to use these complementary properties to improve detections. The process of the system is sequential and can be summarized by the following steps.
1. Apply the segmentation model to the unknown image to extract a prediction value for each pixel. This is the localization step.

2. Apply the object detector to each positive area of the localization step. This process can be made iterative, depending on how shift-invariant the object detection model is, by repeating: (i) apply the detection model, (ii) remove the detected objects from the segmentation map.

3. (optional) Study the remaining positive areas of the prediction map to increase the recall: add objects to the detected list based on their size or their distance to the detected aircraft.

These steps allow the definition of several operating modes that exploit the intrinsic qualities of the models: the parameter settings can yield a system dedicated to high recall, dedicated to high precision, or balanced. A sketch of this sequential process is given below.
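A minimal sketch of this sequential process follows. The helpers segment, detect and erase are hypothetical: they stand for the segmentation forward pass, the detector applied to the positive areas, and the removal of detected objects from the prediction map, none of which are specified as code in the paper.

```python
def concurrent_detection(image, segment, detect, erase,
                         max_rounds=3, seg_thresh=0.5):
    """Sketch of the concurrent pipeline (steps 1-2, with step 3 left as a
    comment). segment(image) -> per-pixel probability map; detect(image, mask)
    -> list of labelled boxes inside the positive areas; erase(mask, boxes)
    -> mask with the detected objects removed."""
    mask = segment(image) >= seg_thresh          # step 1: localization
    detections = []
    for _ in range(max_rounds):                  # step 2: iterative detection
        boxes = detect(image, mask)
        if not boxes:
            break
        detections.extend(boxes)
        mask = erase(mask, boxes)                # re-detect on what remains
    # step 3 (optional): promote remaining positive areas to detections,
    # based on their size or their distance to already detected aircraft.
    return detections, mask
```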
3. EXPERIMENTAL RESULTS

3.1. Data information

Our method has been applied to the aircraft recognition problem. Our datasets have three levels of aircraft identification: the first level is the type of the object ('aircraft'), the second level represents the function of the aircraft ('bomber', 'civilian', 'combat', 'drone', 'special' and 'transport') and the third level is the precise aircraft identification. This last level is currently composed of 61 classes (for example, the F-16 Fighting Falcon is a third-level class of type 'combat' and the Tupolev Tu-95 a third-level class of type 'bomber'). Fig. 4 shows an example of the ground truth at level 3.

Fig. 4. Example of the ground truth at level 3.

Train and test datasets have been created using images from different satellites (resolution 30-50 cm). Train tiles are of size 512 pixels, with an overlap of 128 pixels to improve shift invariance (a tiling sketch is given after Table 1). The test dataset is composed of 30 satellite images of unknown locations (not seen during training). Details of the datasets are given in Table 1.

Datasets    | N img  | N obj   | N tiles | Area
Train - Seg | 9 984  | 122 479 | 105 206 | 51 166
Train - Obj | 10 179 | 128 422 | 361 843 | 49 905
Test        | 30     | 689     | -       | 403

Table 1. Dataset information. Areas are in km².
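As an illustration of this tiling, the sketch below cuts an image into 512-pixel tiles with a 128-pixel overlap (hence a 384-pixel stride). The border handling (smaller last tiles) is an assumption, as the paper does not specify it.

```python
def tile_image(img, tile=512, overlap=128):
    """Cut a (H, W, ...) image array into overlapping square tiles; the
    overlap exposes each border region to two tiles, which helps the models
    become shift-invariant. Returns (origin, tile) pairs."""
    stride = tile - overlap                      # 384 px between tile origins
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            tiles.append(((y, x), img[y:y + tile, x:x + tile]))
    return tiles
```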

3.2. Method parameterization

The segmentation model has been trained using a weighted categorical cross-entropy loss:

wCE(y, \hat{y}) = -\sum_{i=1}^{C} \alpha_i \, y_i \log \hat{y}_i    (1)

where \hat{y} is the prediction, y the ground truth, C the number of classes and \alpha_i the median frequency balancing weights. These weights balance the class distribution (compensating for the high number of background pixels). The ADAM optimizer has been used with an initial learning rate of 0.001, decreased on plateau based on the validation loss. A sketch of this weighted loss is given below.
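A NumPy sketch of Eq. (1) follows. The median-frequency weight formula (median class frequency divided by the class frequency) is the standard definition and is assumed here, as the paper only names the technique.

```python
import numpy as np

def median_frequency_weights(pixel_counts):
    """alpha_i = median(freq) / freq_i: down-weights the dominant background
    class and up-weights the rare aircraft classes."""
    freq = pixel_counts / pixel_counts.sum()
    return np.median(freq) / freq

def weighted_cross_entropy(y_true, y_pred, alpha, eps=1e-7):
    """Eq. (1), averaged over pixels. y_true: one-hot targets (N, C);
    y_pred: softmax probabilities (N, C); alpha: per-class weights (C,)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(alpha * y_true * np.log(y_pred), axis=-1))

# Toy usage: heavy background imbalance (9 900 background px vs 100 aircraft px)
alpha = median_frequency_weights(np.array([9900.0, 100.0]))
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.6, 0.4]])
print(weighted_cross_entropy(y_true, y_pred, alpha))
```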
each model are corrected by the other one to obtain better
Train and test datasets have been created using images results (the false positives produced by the two models are
from different satellites (resolution 30-50 cm). Train tiles are not the same). This is illustrated in Fig. 5: we can observe
of size 512 pixels, with an overlap of 128 to improve shift in- that false positives obtained with the object detection model
variance. The test dataset is composed of 30 satellite images are removed by our method.
at unknown locations (not seen during the training). Details On the same test dataset, we evaluated the identification of
of the datasets are given in Table 1. well-detected aircraft. The identification rate for the level 2 is
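The combined detector objective can be sketched as follows; the focal loss form and its alpha = 0.25, gamma = 2 defaults follow [2], while the 1.5 classification weighting comes from the text. This is an illustration, not the authors' implementation.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss of [2]: -alpha_t * (1 - p_t)^gamma * log(p_t).
    p: predicted probabilities; y: binary targets (same shape)."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss used for the box-regression branch."""
    d = np.abs(pred - target)
    return np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta))

def detector_loss(cls_p, cls_y, box_p, box_y, cls_weight=1.5):
    """Classification (focal) term up-weighted by a factor of 1.5 relative
    to the regression (smooth L1) term, as described above."""
    return cls_weight * focal_loss(cls_p, cls_y) + smooth_l1(box_p, box_y)
```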
For both trainings, various data augmentations have been used to increase model generalization: geometric transformations (flip, rotate) and radiometric transformations (grayscale, histogram equalization and normalization). For both models, different operating modes can be set by modifying two parameters: the prediction threshold and the minimum object size. We empirically defined several modes to balance recall and precision.

3.3. Quantitative and qualitative results

On our test dataset, we evaluated: (i) the segmentation model alone (an overlap of 50% is required for an aircraft to be considered detected; a scoring sketch is given at the end of this section), (ii) the object detection model alone and (iii) our concurrent approach. Table 2 shows the detection results for each case, in two different modes: one balanced between recall and precision, and one favouring recall. As expected, our concurrent method significantly improves the detection results compared to the segmentation model or the detection model alone: the errors of each model are corrected by the other one (the false positives produced by the two models are not the same). This is illustrated in Fig. 5: the false positives obtained with the object detection model are removed by our method.

                   Balanced mode      Recall mode
                   R      P           R      P
Segmentation       0.91   0.78        0.95   0.50
Object detection   0.87   0.75        0.95   0.37
Our approach       0.95   0.88        0.96   0.84

Table 2. Quantitative results of the aircraft detection on the test dataset (R: recall, P: precision).

Fig. 5. Visual comparison of the ground truth (light green), the segmentation result (pink), the object detection model (orange dotted lines) and our method (light blue).

On the same test dataset, we evaluated the identification of the well-detected aircraft. The identification rate is 0.91 at level 2 and 0.80 at level 3. Some errors happen because of the definition of some level 3 labels: regrouping different aircraft in the same class (for example, small-aircraft) can lead to confusion with combat aircraft. This can be seen in Fig. 6: the misclassified aircraft in the top image should have been assigned the small-aircraft label.

Fig. 6. Illustration of the aircraft classification. The good classifications are in blue, the wrong classifications in red, the false positives in yellow.
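For completeness, a sketch of the 50% overlap scoring used for the segmentation model is given below. Interpreting the overlap as the covered fraction of each ground-truth object and matching greedily are assumptions, as the exact protocol is not given.

```python
def detection_scores(pred_regions, gt_regions, overlap_thresh=0.5):
    """pred_regions / gt_regions: lists of sets of (row, col) pixel coords.
    A ground-truth aircraft counts as detected when one unmatched predicted
    region covers at least overlap_thresh of its pixels."""
    matched = set()
    tp = 0
    for gt in gt_regions:
        for j, pred in enumerate(pred_regions):
            if j not in matched and len(gt & pred) / len(gt) >= overlap_thresh:
                matched.add(j)
                tp += 1
                break
    recall = tp / len(gt_regions) if gt_regions else 1.0
    precision = tp / len(pred_regions) if pred_regions else 1.0
    return recall, precision
```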
4. CONCLUSION AND PERSPECTIVES

In this work, we developed a concurrent method combining two CNNs: a segmentation model and a detection model. We have shown that this combination significantly improves aircraft detection results (very low false detection rate and high rate of correct identification). In the future, we plan on: (i) refining our level 3 dataset in order to avoid some identification confusions and (ii) designing an all-in-one model integrating level 1 and level 3 features in the same architecture.

5. REFERENCES

[1] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.

[2] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.

[3] Xiaolong Liu, Zhidong Deng, and Yuhan Yang, "Recent progress in semantic image segmentation," Artificial Intelligence Review, vol. 52, no. 2, pp. 1089–1106, 2019.

[4] Asifullah Khan, Anabia Sohail, Umme Zahoora, and Aqsa Saeed Qureshi, "A survey of the recent architectures of deep convolutional neural networks," arXiv preprint arXiv:1901.06032, 2019.

[5] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.

[6] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.

[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Identity mappings in deep residual networks," in European Conference on Computer Vision. Springer, 2016, pp. 630–645.
