Advances in Scene Classification of Remotely Sensed High Resolution Images and the Existing Datasets
Published By: Blue Eyes Intelligence Engineering & Sciences Publication
Retrieval Number: J88410881019/2019©BEIESP
DOI: 10.35940/ijitee.J8841.0881019
Table 1: Publicly available datasets for remote sensing scene classification

| Sl No | Dataset                | Total Images | Image Classes | No. of Images Per Class                    | Spatial Resolution (m)      | Image Size | Year |
|-------|------------------------|--------------|---------------|--------------------------------------------|-----------------------------|------------|------|
| A     | UC-Merced Land-Use [1] | 2100         | 21            | 100                                        | 0.3                         | 256 x 256  | 2010 |
| B     | WHU-RS19 [2]           | 1005         | 19            | 50 (approx.)                               | 0.5 and less                | 600 x 600  | 2012 |
| C     | SIRI-WHU [3]           | 2400         | 12            | 200                                        | 2                           | 200 x 200  | 2016 |
| D     | RSSCN7                 | 2800         | 7             | 400                                        | No fixed resolution         | 400 x 400  | 2015 |
| E     | RSC11                  | 1232         | 11            | 100 (approx.)                              | 0.2                         | 512 x 512  | 2016 |
| F     | PatternNet [4]         | 30400        | 38            | 800                                        | 0.06 to 4.7                 | 256 x 256  | 2017 |
| G     | NWPU-RESISC45 [5]      | 31500        | 45            | 700                                        | 0.2 to 30                   | 256 x 256  | 2017 |
| H     | RSI-CB128 [6]          | > 36000      | 45            | Varies between 198 and 1331                | 0.3 to 3                    | 128 x 128  | 2017 |
| I     | RSI-CB256 [6]          | > 24000      | 35            | Varies between 173 and 1550                | 0.3 to 3                    | 256 x 256  | 2017 |
| J     | AID [7]                | 10000        | 30            | Varies between 173 and 1550                | 0.5 to 0.8                  | 600 x 600  | 2017 |
| K     | AID++ [8]              | > 400000     | 46            | Relatively higher no. of images per class  | Variable, higher resolution | 512 x 512  | 2018 |
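As a quick consistency check on Table 1, the per-class figures for the fixed-size datasets can be recomputed from the totals. The sketch below is illustrative only: the dictionary `TABLE_1` and the helper `mean_images_per_class` are hypothetical names introduced here, and the numbers are copied from the table rather than taken from the datasets themselves.

```python
# (total images, number of classes) copied from Table 1.
# TABLE_1 and the helper below are illustrative names, not part of any dataset API.
TABLE_1 = {
    "UC-Merced Land-Use": (2100, 21),
    "WHU-RS19": (1005, 19),
    "SIRI-WHU": (2400, 12),
    "RSSCN7": (2800, 7),
    "RSC11": (1232, 11),
    "PatternNet": (30400, 38),
    "NWPU-RESISC45": (31500, 45),
}


def mean_images_per_class(total, classes):
    """Average number of images per class for a dataset."""
    return total / classes


# Round to one decimal so that approximate entries are easy to compare.
summary = {
    name: round(mean_images_per_class(total, classes), 1)
    for name, (total, classes) in TABLE_1.items()
}

for name, mean in summary.items():
    print(f"{name:20s} {mean:8.1f}")
```

The balanced datasets reproduce the tabulated values exactly (for example 2100 / 21 = 100 for UC-Merced), while WHU-RS19 and RSC11 give averages of about 52.9 and 112, which is why the table marks them as approximate.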
International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-8 Issue-10, August 2019
'thermal-power-station'). It has 700 images per class.

H. RSI-CB128 and RSI-CB256 [6]
RSI-CB is extracted from Google Earth and Bing Maps with 0.2 m to 3 m spatial resolution. RSI-CB128 has images of 128 x 128 pixels and RSI-CB256 has images of 256 x 256 pixels, so researchers can select the dataset according to the depth of their classification model.
RSI-CB128 has 45 classes ('turning circle', 'town', 'tower', 'stream', 'storage room', 'sparse forest', 'snow mountain', 'shrub wood', 'sea', 'sapling', 'sand beach', 'river protection forest', 'river', 'residents', 'rail', 'pipeline', 'parking lot', 'over pass', 'natural grass land', 'mountain road', 'mountain', 'marina', 'mangrove', 'lakeshore', 'hirst', 'highway', 'green farm land', 'grave', 'fork road', 'forest', 'dry farm', 'desert', 'dam', 'cross road', 'container', 'coast line', 'city road', 'city green tree', 'city building', 'city avenue', 'bridge', 'bare land', 'avenue', 'artificial grass land', 'airport run way').
RSI-CB256 has 35 classes ('town', 'stream', 'storage room', 'sparse forest', 'snow mountain', 'shrub wood', 'sea', 'sapling', 'sand beach', 'river protection forest', 'river', 'residents', 'pipeline', 'parking lot', 'mountain', 'marina', 'mangrove', 'lakeshore', 'hirst', 'highway', 'green farm land', 'forest', 'dry farm', 'desert', 'dam', 'cross road', 'container', 'coast line', 'city building', 'bridge', 'bare land', 'avenue', 'artificial grass land', 'airport run way', 'airplane').

I. AID [7]
The Aerial Image Dataset (AID) is another recent large-scale benchmark dataset extracted from Google Earth. It contains 30 aerial scene classes ('viaduct', 'stream', 'storage tank', 'square', 'sparse residential', 'school', 'river protection forest', 'river', 'railway station', 'resort', 'port', 'pond', 'play ground', 'parking', 'park', 'mountain', 'meadow', 'medium residential', 'industrial', 'forest', 'farm land', 'dense residential', 'desert', 'commercial', 'centre', 'church', 'bridge', 'beach', 'base ball field', 'bare land', 'airport'). AID is multi-source, unlike UC-Merced, and its samples are taken from different regions such as the United States, China, Italy, France, England and Germany at different times and seasons. It has high intra-class variation, small inter-class dissimilarity and a larger scale, which makes the classification task challenging and attracts more researchers.

J. AID++ [8]
This is the latest large-scale aerial image dataset. It consists of 400,000 images distributed over 46 classes ('airport', 'runway', 'bridge', 'parking', 'parking by the road', 'road', 'viaduct', 'port', 'railway station', 'beach', 'lake', 'river', 'bare land', 'desert', 'ice', 'rock', 'mountain', 'mix resident', 'multi-family', 'single family', 'dry land', 'paddy fields', 'terraces', 'meadow', 'shrub', 'forest', 'solar power station', 'wind power station', 'hydraulic power station', 'storage tanks', 'work factory', 'mine', 'oil field', 'commercial', 'church', 'base ball field', 'basket ball field', 'golf course', 'stadium', 'soccer field', 'tennis court', 'cemetery', 'amusement park', 'park', 'pool', 'square'). It was constructed by (i) forming a category network, derived from available geodatabases (Google Map API and Open Street Map), to obtain the category coordinates, (ii) querying and then downloading the images using those coordinates, (iii) manually eliminating the annotation errors to scale up the dataset and (iv) improving the separation between similar classes. This is the most powerful dataset, as it has the largest number of images that can be used extensively for training deep CNNs, which in turn helps RSSC.

III. REMOTE SENSING SCENE CLASSIFICATION METHODS
Many scene classification techniques using aerial and satellite images have been proposed in the last decade. The process of scene classification consists of two steps: feature extraction, followed by classification based on the extracted features. An effective feature representation is therefore required to develop a high-performance remote sensing scene classifier. There are three main types of scene classification techniques, distinguished by the features they use: traditional RSSC using handcrafted features, RSSC using unsupervised feature learning (UFL) and RSSC based on deep learning.

A. Handcrafted Feature [15] Based RSSC Methods:
In these methods, classification is based on handcrafted features, which depend on human engineering skill: colour, shape, spectral resolution, spatial resolution, size, texture and so on. Classification was based on these features individually or on a combination of them, and various descriptors were proposed on this basis, such as colour histograms, GIST, texture descriptors, the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG). The main drawback of this approach is the difficulty of obtaining discriminative features on challenging scene image datasets.

B. Unsupervised Feature Learning Based RSSC Methods [13,14]:
The limitation of the handcrafted-feature methods was overcome by UFL-based RSSC methods, in which the required features are extracted automatically from the scene image. Examples of these methods are principal component analysis, k-means clustering, sparse coding and autoencoders. These methods performed better than the handcrafted-feature methods but failed to give state-of-the-art performance, because without the semantic information provided by the category labels they could not learn the most discriminative features between classes.

C. Deep Learning Based Classification Methods [16-26]:
In the last decade, researchers developed various deep learning based classification methods that are capable of learning discriminative features on their own using deep neural network architectures. Unsupervised feature learning uses a shallow architecture, whereas deep learning uses a multi-layered architecture and therefore has a more powerful feature learning capability. It is thus capable of extracting hidden information and discriminative features from multi-dimensional data. The semantic features of the data are also captured in the top layers. All these factors led to the successful implementation and state-of-the-art performance of deep neural network architectures in semantic-level scene classification.
RSSC using CNN: Though there are many deep learning architectures, the CNN architecture is the predominant architecture
used for classification techniques.

CNNs have proved successful in classifying challenging, large-scale and highly variable image datasets using efficient high-performance GPUs. A CNN processes its input in the form of multi-dimensional arrays. For example, an RGB image consists of three 2-D arrays, one per spectral band, and similarly a multispectral image consists of multiple 2-D arrays.
A CNN architecture basically consists of the following main layers.
(i) Convolutional layers extract low-level features in the initial layers; more discriminative and expressive features are obtained as the depth of the layers increases.
(ii) Pooling layers reduce the size of the representations (down-sampling), which speeds up computation and makes some of the detected features a bit more robust. There are two types: max pooling and average pooling. The stride, the window size and the pooling type are the hyperparameters of pooling.
(iii) Fully connected layers are the last few layers of the network. They process the information from the lower layers and feed it to the output layer to make decisions.
Overfitting is one of the major problems in convolutional neural networks and has to be addressed by proper design. Another main issue in classification is the small number of images in the training dataset, which can be rectified to some extent by data augmentation. Data augmentation can be defined as a way to create new data from the existing data by applying different transformation techniques. It also helps to prevent overfitting. Mirroring, random cropping, scaling, rotation, shearing, local warping and colour shifting are some of the data augmentation methods.

CNN Architectures: Some of the CNN architectures that have shown state-of-the-art performance in RSSC are:
(i) AlexNet [9] was proposed in 2012 and consists of approximately 60 million parameters for the purpose of classification.
(ii) VGGNet [10]: VGG-16 was proposed in 2014 and has 16 trainable layers and about 138 million parameters. The advantages of this architecture are its simplicity, reduced filter dimensions and increased depth.
(iii) ResNet [11]: Earlier deep neural networks are harder to train; the training error starts to rise again as the number of layers increases, partly because of the exploding and vanishing gradient problems. These problems were overcome by ResNet (Residual Network), proposed in 2015, which implements skip connections, i.e. the output of one layer is fed to the input of deeper layers. The advantages of this network are efficient performance with very deep networks, lower computational cost and the ability to train very deep networks effectively.
(iv) GoogLeNet [12] is an Inception network that was proposed in 2014 and has 9 Inception modules. In an Inception network, the network can choose the suitable filter size on its own from the given choices, and the computational cost is reduced.
Recent Research Works in RSSC [21-25]: Various research works have been proposed for RSSC [16-20]. The existing deep learning methods for scene classification are based on using a pretrained network, on adapting a pretrained model, or on training new networks. Fine-tuning pretrained CNNs has shown better performance. In some models the activations of the fully connected layers are taken directly as the image representation. In other models discriminative features are extracted by encoding the CNN activations of the convolutional layers with feature coding schemes; here the convolutional feature maps are viewed as a 2-D array of local features. In general, all of the above models were able to provide state-of-the-art performance on the publicly available high-resolution datasets.
The year 2018 saw a major breakthrough in RSSC, with various techniques proposed. We list a few here: (i) RSSC by concatenation of global features and rearranged local features [21]; (ii) RSSC by fusion of features of the same image at different scales [23]; (iii) RSSC by extracting intermediate-level features to prevent overfitting and fusing them through canonical correlation analysis to obtain more powerful discriminative features [22,24]; (iv) RSSC by concentric circle pooling to address the rotation-invariance problem [25].
The Capsule Network (CapsNet) [26] is a novel network architecture that has been an active research area in classification for the past two years. Here the term capsule refers to a group of neurons, or a vector, used as input. It is capable of exploiting the properties and spatial information of features in the image to a greater extent and thereby gives efficient performance. It is expected that the Capsule Network will soon replace the traditional CNN architecture, though it is still under research.

IV. FUTURE RESEARCH DIRECTION

Developing Improved Datasets: Almost all of the research on scene classification aims at improving accuracy on the existing datasets, and we can say that it has reached a saturation level because of the limitations of the available datasets. Deep learning architectures are powerful, with millions of parameters, and the quantity of data used does not match them. The available datasets are not enough for practical implementation, i.e. we require an enormous amount of data ("big data"). We therefore expect the research community to develop high-quality, high-quantity and challenging datasets in the coming years that are suitable for real-world applications. New algorithms for the classification of these datasets can then be developed.
Fusion of Remote Sensing Data with Social Media Data: A lot of data comes from social media such as Facebook, Twitter and Instagram, which can give a better understanding of the image scene of interest. Scene classification that combines remote sensing images with this social media information, using a suitable deep learning architecture, can provide improved state-of-the-art performance for real-time applications in the coming years.
Scene Classification with Caption: Another possible research direction is, instead of simply classifying the image scene by its class, to try to describe the scene of interest. The description can contain details about the objects present in the scene, the size of the objects, their orientation details, texture,
the reason for it to be classified into a particular scene category, etc. Combining image scene classification models with image scene captioning can give a better understanding of the image scene.

V. CONCLUSION
In this paper we have discussed the basic ideas and concepts behind RSSC of HR image datasets and indicated directions of investigation for researchers. We have summarised the different publicly available datasets for classification with their merits and demerits. The advancement of scene classification techniques from the traditional BoVW methods to deep learning methods was also discussed, followed by the scene classification techniques based on CNN models and their architectures. The CapsNet architecture and recent research works in RSSC were briefly covered as well. Finally, we have provided future research ideas for RSSC. We hope that the RSSC research community benefits from this paper and shares its own research ideas in turn.

REFERENCES
1. Y. Yang and S. Newsam, "Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification," in International Conference on Advances in Geographic Information Systems (ACM GIS), 2010.
2. G.-S. Xia, W. Yang, J. Delon, Y. Gousseau, H. Sun, et al., "Structural High-resolution Satellite Image Indexing," in ISPRS TC VII Symposium - 100 Years ISPRS, Vienna, Austria, Jul. 2010, pp. 298-303.
3. B. Zhao, Y. Zhong, G.-S. Xia, and L. Zhang, "Dirichlet-Derived Multiple Topic Scene Classification Model for High Spatial Resolution Remote Sensing Imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, pp. 2108-2123, 2016.
4. W. Zhou, S. Newsam, C. Li, and Z. Shao, "PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 197-209, 2018.
5. G. Cheng, J. Han, and X. Lu, "Remote Sensing Image Scene Classification: Benchmark and State of the Art," Proceedings of the IEEE, vol. 105, pp. 1865-1883, 2017.
6. H. Li, C. Tao, Z. Wu, J. Chen, J. Gong, and M. Deng, "RSI-CB: A large scale remote sensing image classification benchmark via crowdsource data," arXiv preprint arXiv:1705.10450, 2017.
7. G. Xia et al., "AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3965-3981, July 2017.
8. P. Jin, G. Xia, F. Hu, Q. Lu, and L. Zhang, "AID++: An Updated Version of AID on Scene Classification," in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, 2018, pp. 4721-4724.
9. A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, 2012.
10. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.
11. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," arXiv:1512.03385v1, 10 Dec. 2015.
12. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in CVPR, 2015.
13. F. Zhang, B. Du, and L. Zhang, "Saliency-guided unsupervised feature learning for scene classification," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 4, pp. 2175-2184, 2015.
14. A. Romero, C. Gatta, and G. Camps-Valls, "Unsupervised deep feature extraction for remote sensing image classification," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1349-1362, 2016.
15. L. Huang, C. Chen, W. Li, and Q. Du, "Remote Sensing Image Scene Classification Using Multi-Scale Completed Local Binary Patterns and Fisher Vectors," Remote Sensing, vol. 8, no. 6, p. 483, 2016.
16. M. Castelluccio, G. Poggi, C. Sansone, and L. Verdoliva, "Land use classification in remote sensing images by convolutional neural networks," arXiv:1508.00092, 2015.
17. K. Nogueira, O. A. Penatti, and J. A. d. Santos, "Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification," arXiv:1602.01517, 2016.
18. G. Cheng, P. Zhou, and J. Han, "RIFD-CNN: Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2884-2893.
19. G. Cheng, C. Ma, P. Zhou, X. Yao, and J. Han, "Scene classification of high resolution remote sensing images using convolutional neural networks," in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2016, pp. 767-770.
20. X. Yao, J. Han, G. Cheng, and L. Guo, "Semantic segmentation based on stacked discriminative autoencoders and context-constrained weakly supervised learning," in Proc. ACM Int. Conf. on Multimedia, 2015, pp. 1211-1214.
21. D. Zeng, S. Chen, B. Chen, and S. Li, "Improving Remote Sensing Scene Classification by Integrating Global-Context and Local-Object Features," Remote Sensing, vol. 10, 734, 2018. doi: 10.3390/rs10050734
22. Y. Yuan, J. Fang, X. Lu, and Y. Feng, "Remote Sensing Image Scene Classification Using Rearranged Local Features," IEEE Trans. Geosci. Remote Sens., pp. 1-14, 2018.
23. Y. Liu, Y. Zhong, and Q. Qin, "Scene Classification Based on Multiscale Convolutional Neural Network," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 12, pp. 7109-7121, Dec. 2018. doi: 10.1109/TGRS.2018.2848473
24. M. Usman, W. Wang, and A. Hadid, "Feature Fusion with Deep Supervision for Remote-Sensing Image Scene Classification," in 2018 IEEE 30th Int. Conf. on Tools with Artificial Intelligence (ICTAI), pp. 249-253. doi: 10.1109/ICTAI.2018.00046
25. K. Qi, Q. Guan, C. Yang, F. Peng, S. Shen, and H. Wu, "Concentric Circle Pooling in Deep Convolutional Networks for Remote Sensing Scene Classification," Remote Sensing, vol. 10, 934, 2018. doi: 10.3390/rs10060934
26. W. Zhang, P. Tang, and L. Zhao, "Remote Sensing Image Scene Classification Using CNN-CapsNet," Remote Sensing (MDPI), vol. 11, no. 5, 2019.

AUTHORS PROFILE

Akila G received the B.E. degree in Electronics and Communication Engineering in 2005 and the M.E. degree in Applied Electronics from Anna University, Chennai, in 2014. She is currently pursuing a PhD at Anna University, Chennai, India. Her areas of interest are computer vision, deep learning, machine learning, artificial intelligence and remote sensing. She has been teaching since 2010 and is currently working as an Assistant Professor in the Department of Electronics and Communication Engineering.

Gayathri R received the B.E. degree in Electronics and Communication Engineering from Madras University, and the M.Tech. and Ph.D. degrees from Anna University, College of Engineering Guindy, Chennai, India, in 1999, 2001 and 2014 respectively. She is currently working as an Associate Professor in the Department of Electronics and Communication Engineering, Sri Venkateswara College of Engineering, an autonomous institution affiliated to Anna University, Chennai, India. Her main research interests are in the fields of computer vision, pattern recognition, VLSI signal processing and machine learning. She has published more than 48 papers in international journals and conferences.
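Supplementary sketch for Section III (pooling layers): the text describes pooling as down-sampling with three hyperparameters, namely window size, stride and type (max or average). The pure-Python example below illustrates that idea on a single 2-D feature map. It is a minimal sketch rather than the implementation used by any of the surveyed networks; the function name `pool2d`, its parameters and the sample values are assumptions made for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a 2-D feature map given as a list of equal-length rows.

    size   -- side length of the square pooling window
    stride -- step between consecutive windows
    mode   -- "max" for max pooling, "avg" for average pooling
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values covered by the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                out_row.append(max(window))
            else:
                out_row.append(sum(window) / len(window))
        pooled.append(out_row)
    return pooled


# A hypothetical 4 x 4 feature map (values chosen only for illustration).
fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 5],
    [0, 8, 3, 4],
]

max_pooled = pool2d(fmap, size=2, stride=2, mode="max")  # [[6, 2], [8, 9]]
avg_pooled = pool2d(fmap, size=2, stride=2, mode="avg")  # [[3.5, 1.25], [4.25, 5.25]]
print(max_pooled)
print(avg_pooled)
```

With a 2 x 2 window and a stride of 2, the 4 x 4 map shrinks to 2 x 2, which is the size reduction the text refers to; a stride of 1 with the same window would instead give a 3 x 3 output.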