
Ship Classification Using an Image Dataset

Okan Atalar (okan@stanford.edu), Burak Bartan (bbartan@stanford.edu)

Abstract—In this project, we developed three different sets of classification algorithms to classify different ship types based on color (RGB) images of ships. An image of a ship is the input to our classification algorithms, which are mainly bag of features, convolutional neural networks (CNNs), and SVM along with some preprocessing. The best-performing of these is the CNN, which is able to classify a given ship into its corresponding category with an accuracy of 0.8822 when using 10 different classes of ships.
I. INTRODUCTION

The increased presence of autonomous systems requires reliable classification algorithms to understand their surrounding environment. These autonomous systems have the potential to find widespread use in sea and ocean waters, necessitating a reliable classification of their surroundings. Since ships are the most popular means of transportation and warfare in seas and oceans, they need to be classified by autonomous systems. We are therefore interested in applying machine learning and computer vision techniques to the problem of reliably classifying ships into different classes using ship images captured under varying lighting conditions, image quality, and distance to the ships.

Fig. 1. 4 examples from each of the first 5 image categories: aircraft carriers, bulkers, cruise ships, fire-fighting vessels, fishing vessels (from left to right).
The set of algorithms we use for ship classification takes ship images in red, green, and blue (RGB) color format as input and outputs the most likely ship category that the input image represents. For the ship classification setting, we applied three different classification algorithms: preprocessing of images followed by SVM with a set of features that we choose, the bag of features method, and a convolutional neural network trained by transfer learning from AlexNet.

II. RELATED WORK

The work [2] uses the same images from the same website we are using. They consider 140,000 ship images from 26 different ship categories for classification. Their baseline method, which uses the Crammer and Singer multi-class SVM with feature vectors extracted from a VGG-F network, achieves an accuracy of 0.54. Their CNN, which utilizes AlexNet, achieves an accuracy of 0.73. They do not, however, consider using the bag of features method or the SVM method with preprocessing.

III. DATASET AND FEATURES

The dataset that we used consists of 10 different classes of ships: aircraft carriers, bulkers, cruise ships, fire-fighting vessels, fishing vessels, inland dry cargo vessels, restaurant ships, motor yachts, drilling rigs, and submarines. We obtained the dataset by downloading classified ship images from [1]. For the SVM algorithm we used 1000 images for training and 200 for validation, with 5 different classes: aircraft carriers, bulkers, cruise ships, fire-fighting vessels, and fishing vessels. The reason is that the accuracy we obtained with preprocessing followed by SVM was comparable to that of the other two methods, both of which used all 10 classes: the bag of features method and the convolutional neural networks. On average, we used 1000 images from each class for training and 200 for testing for the bag of features method. The convolutional neural networks used on average 1680 images per class for training and 360 for testing.

A. Preprocessing of Images

The dataset consists of ship images taken at different orientations, at different distances, and with varying backgrounds. Since the background of the ship contains limited information regarding the ship category and introduces significant noise to the image, we aim to remove this randomness by cropping a part of the image which contains only the ship. The objective in this step is to crop the ship image without throwing away pixels related to the ship. We first perform edge detection on the initial image, since the ship has well-defined borders. The flow diagram for image cropping based on edge detection is demonstrated in Fig. 2.

Fig. 2. Edge Detection

Since the ship has well-defined borders, our goal is to determine the rows and columns in the image corresponding to these edges. We therefore sum the pixels along a row in the image and compare with the sum in the next row. If there is a big difference between the values for the two rows, this indicates an edge in that row. Due to the random background, edge detection also detects random artefacts. To overcome the noise and make the algorithm robust against errors, we find the sum over a number of rows, which we define as the bandwidth. After the row and column locations are found where there is a significant change in the detected edges, we crop the image. The flow diagram for cropping is shown in Fig. 3.

Fig. 3. Image Cropping

We first determine the minimum and maximum row locations for the ship by finding the row such that the ratio of the row summed over the row bandwidth to the row a bandwidth away summed over the same bandwidth is maximized. The row locations are estimated first rather than the column locations since most ship images are oriented horizontally (parallel to the sea surface) and therefore longer in the horizontal direction. We then use the row estimates when determining the column locations for robustness. Let $|\mathrm{Image}(r, c)|$ denote the grayscale intensity converted from RGB by taking the magnitude with respect to RGB, where $r$ denotes the row number and $c$ the column number. We are trying to estimate $r_{\min}$ and $r_{\max}$ by summing over the row bandwidth $r_{BW}$. After the minimum and maximum row locations have been estimated, we use these estimates for determining the minimum and maximum column locations, $c_{\min}$ and $c_{\max}$, by summing again over a column bandwidth of $c_{BW}$. The equations used for determining the minimum and maximum row and column locations are shown in Eqs. 1-4.

$$r_{\min} = \arg\max_{r_x} \frac{\sum_{r=r_x-0.5r_{BW}}^{r_x+0.5r_{BW}} \sum_{c=1}^{c_{\max}} |\mathrm{Image}(r,c)|}{\sum_{r=r_x-1.5r_{BW}}^{r_x-0.5r_{BW}} \sum_{c=1}^{c_{\max}} |\mathrm{Image}(r,c)|} \quad (1)$$

$$r_{\max} = \arg\min_{r_x} \frac{\sum_{r=r_x-0.5r_{BW}}^{r_x+0.5r_{BW}} \sum_{c=1}^{c_{\max}} |\mathrm{Image}(r,c)|}{\sum_{r=r_x-1.5r_{BW}}^{r_x-0.5r_{BW}} \sum_{c=1}^{c_{\max}} |\mathrm{Image}(r,c)|} \quad (2)$$

$$c_{\min} = \arg\max_{c_y} \frac{\sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_y-0.5c_{BW}}^{c_y+0.5c_{BW}} |\mathrm{Image}(r,c)|}{\sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_y-1.5c_{BW}}^{c_y-0.5c_{BW}} |\mathrm{Image}(r,c)|} \quad (3)$$

$$c_{\max} = \arg\min_{c_y} \frac{\sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_y-0.5c_{BW}}^{c_y+0.5c_{BW}} |\mathrm{Image}(r,c)|}{\sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_y-1.5c_{BW}}^{c_y-0.5c_{BW}} |\mathrm{Image}(r,c)|} \quad (4)$$

Due to noise artefacts in the background, this algorithm may not properly crop the image. To capture such mistakes, the dimensions of the cropped image are checked. If the cropped image is smaller than 10 pixels in the vertical direction and 75 pixels in the horizontal direction, then it probably does not contain the ship. In this case, we do not crop the image and work with the whole picture. It should be noted that these numbers were based on the dataset that we used and will vary based on the number of pixels used to represent each image in the dataset.
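To make the cropping step concrete, the following MATLAB sketch implements the bandwidth-ratio heuristic of Eqs. 1-4 on an edge map. The function names, the Canny detector, and the exact fallback logic are illustrative assumptions; the text does not specify these implementation details.

```matlab
function cropped = cropShip(img, rBW, cBW)
% Bandwidth-ratio cropping (Eqs. 1-4). The ratios are applied here to
% an edge map, following the edge-detection step described in the
% text; Eqs. 1-4 write the same ratios in terms of |Image(r,c)|.
gray = double(rgb2gray(img));
E = double(edge(gray, 'Canny'));       % binary edge map
rowSum = sum(E, 2);                    % edge mass per row
colSum = sum(E, 1)';                   % edge mass per column

rmin = bestBoundary(rowSum, rBW, 'max');   % Eq. 1
rmax = bestBoundary(rowSum, rBW, 'min');   % Eq. 2
cmin = bestBoundary(colSum, cBW, 'max');   % Eq. 3
cmax = bestBoundary(colSum, cBW, 'min');   % Eq. 4

% Fall back to the whole picture if the crop is implausibly small
% (the text checks 10 vertical and 75 horizontal pixels).
if (rmax - rmin) < 10 || (cmax - cmin) < 75
    cropped = img;
else
    cropped = img(rmin:rmax, cmin:cmax, :);
end
end

function idx = bestBoundary(s, bw, mode)
% Ratio of the band centered at x to the band one bandwidth before
% it; maximized for the lower boundary, minimized for the upper.
n = numel(s);
ratio = nan(n, 1);
for x = ceil(1.5*bw) + 1 : floor(n - 0.5*bw)
    inBand   = sum(s(x - round(0.5*bw) : x + round(0.5*bw)));
    prevBand = sum(s(x - round(1.5*bw) : x - round(0.5*bw)));
    ratio(x) = inBand / max(prevBand, 1);   % guard against division by zero
end
if strcmp(mode, 'max')
    [~, idx] = max(ratio);   % NaN entries are ignored
else
    [~, idx] = min(ratio);
end
end
```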
B. Feature Extraction

After the ship image has been cropped, the relevant features must be extracted for classification. Relevant features that we identified include samples of the power spectrum of the two-dimensional image after normalizing the cropped image size to 70 rows by 140 columns, the RGB color moments, and the ratio of the number of rows to the number of columns in the cropped image.

The power spectrum for an image is calculated by normalizing the size of the cropped image to 70 rows by 140 columns, followed by taking the two-dimensional Fourier transform of the normalized image (calculated as in Eq. 5) and taking the absolute square. Since a cruise ship contains many windows and variations within its body, high-frequency components show up in its two-dimensional power spectrum, shown in Fig. 5, unlike the aircraft carrier, shown in Fig. 4. To prevent overfitting, we only use some samples of the two-dimensional power spectrum. In particular, we sample the Fourier transform at regular intervals (we used 7 samples with regular spacing from the power spectrum). Additionally, since the Fourier transform also contains phase information which we do not care about, we only pass on the magnitude square of the Fourier transform coefficients after sampling at regular intervals to capture the 7 samples. Since the ship images are captured under different lighting conditions, the total amount of power present in each image varies. To normalize this variation, we normalize the power in each image and therefore compute the power-normalized power spectrum for each image. The power-normalized power spectrum is calculated as in Eq. 6 by using the two-dimensional Fourier transform of the image computed by Eq. 5. $F_s(k_1, k_2)$ is then used as a set of features, where $n = 5$ and $m = 10$ (sampling the power spectrum with a spacing of 5 in the row and 10 in the column directions, respectively).

$$F(k_1, k_2) = \sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_{\min}}^{c_{\max}} \mathrm{Image}(r,c)\, e^{-i 2\pi (k_1 r / r_{\max} + k_2 c / c_{\max})} \quad (5)$$

$$F_s(k_1, k_2) = \frac{|F(n k_1, m k_2)|^2}{\sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_{\min}}^{c_{\max}} \mathrm{Image}(r,c)^2} \quad (6)$$

Fig. 4. Aircraft Carrier Power Spectrum

Fig. 5. Cruise Ship Power Spectrum
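A minimal MATLAB sketch of the power-spectrum features of Eqs. 5-6 is given below. The function name and the exact positions of the 7 samples are illustrative assumptions consistent with the stated spacings.

```matlab
function feats = powerSpectrumFeatures(cropped)
% Power-normalized power-spectrum samples (Eqs. 5-6): resize to
% 70x140, take the 2-D FFT, normalize the magnitude square by the
% total image power, and keep 7 regularly spaced samples (spacing
% n = 5 in rows, m = 10 in columns).
gray = double(rgb2gray(imresize(cropped, [70 140])));
F = fft2(gray);                              % Eq. 5
P = abs(F).^2 / sum(gray(:).^2);             % Eq. 6
k = 0:6;                                     % 7 samples
feats = P(sub2ind(size(P), 1 + 5*k, 1 + 10*k));
end
```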

Another feature we use is the color distribution. Different ship categories are predominantly the same color. For instance, aircraft carriers are usually gray, fire-fighting vessels are usually red, etc. The RGB color planes for a fire-fighting vessel are shown in Fig. 6. We also use higher-order moments of the color distributions, since the shape of the color distribution is also critical. This is motivated by the mathematical fact that knowing all the moments of a distribution (equivalently, its moment generating function, when it exists) determines the distribution. Calculating the $n$th order moment for the red color channel (denoted $R_n$) is expressed in Eq. 7. Taking the first 5 orders yielded the best accuracy; this choice depends on the dataset and may show variability when tested with different datasets. The $n$th order moments for the green and blue colors are computed in the same way, using the green and blue values of the image. Similar to the power normalization used when computing the samples of the power spectrum, we also normalize the moments with respect to the total power in the image to achieve consistency among images captured under different lighting conditions.

$$R_n = \frac{\sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_{\min}}^{c_{\max}} \mathrm{Image}_R(r,c)^n}{\sum_{r=r_{\min}}^{r_{\max}} \sum_{c=c_{\min}}^{c_{\max}} |\mathrm{Image}_R(r,c)|^2} \quad (7)$$

Fig. 6. RGB Representation of Image
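The color moments of Eq. 7 amount to a few lines of MATLAB; the function name and the channel loop below are illustrative.

```matlab
function moments = colorMoments(cropped, nOrders)
% Power-normalized color moments (Eq. 7) for each RGB channel.
% nOrders = 5 gave the best accuracy on our dataset.
moments = zeros(nOrders, 3);
for ch = 1:3
    v = double(cropped(:,:,ch));
    v = v(:);
    for n = 1:nOrders
        moments(n, ch) = sum(v.^n) / sum(v.^2);   % Eq. 7
    end
end
moments = moments(:)';   % 15 features for nOrders = 5
end
```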
The color distributions and the power spectrum of an image show great variability in their values. To compensate for this large difference in scales, similar to the preprocessing done before Principal Component Analysis (PCA), we normalize the mean and variance of the features. Let $x^{(i)}$ represent the features extracted from the $i$th image, and let $j$ index the $j$th feature. Let $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$. Replace each $x^{(i)}$ with $x^{(i)} - \mu$ to obtain zero mean. Let $\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} (x_j^{(i)})^2$. Replace each $x_j^{(i)}$ with $x_j^{(i)} / \sigma_j$ to normalize the variance.
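A minimal sketch of this normalization, assuming the features are collected in an m x 37 matrix X with one row per image:

```matlab
% Zero-mean, unit-variance normalization of the feature matrix X,
% as described above (implicit expansion requires R2016b or later).
mu = mean(X, 1);              % per-feature mean over the m images
Xc = X - mu;                  % zero mean
sigma = sqrt(mean(Xc.^2, 1)); % per-feature spread of centered data
Xn = Xc ./ max(sigma, eps);   % unit variance; guard against zero spread
```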
IV. METHODS
We used three different learning algorithms: preprocessing followed by SVM, bag of features, and convolutional neural networks trained from scratch and using transfer learning (AlexNet).

A. Preprocessing Followed by SVM

The relevant features of the image are extracted after the image has been cropped. As aforementioned, the relevant features are: the samples of the two-dimensional power spectrum of the image, the color moments for red, green, and blue, and the ratio of the number of rows to columns in the cropped image. These features are vectorized and used with a conventional SVM. The SVM optimization problem is shown in Eq. 8, where $w$ is the parameter vector we are trying to learn and the $\epsilon_i$ are the error margins.

$$\begin{aligned} \underset{\gamma, w, b}{\text{minimize}} \quad & \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \epsilon_i^2 \\ \text{subject to} \quad & y^{(i)} (w^T x^{(i)} + b) \geq 1 - \epsilon_i, \quad i = 1, \ldots, m, \\ & \epsilon_i \geq 0, \quad i = 1, \ldots, m. \end{aligned} \quad (8)$$

The SVM algorithm tries to separate the data by hyperplanes in the high-dimensional space of the feature vectors. Based on the samples from the training data, the algorithm separates the data points by hyperplanes, whose number is determined by the number of classes present. During the testing stage, the algorithm extracts the relevant features as aforementioned and then outputs the ship class based on the point defined by the features in this high-dimensional space. The parameter $C$ in Eq. 8 is equal to $1/n$, where $n$ is the number of features we have. For the preprocessing algorithm followed by SVM, we have a total of 37 features: 7x3 from the samples of the power spectrum, 5x3 from the color moments, and 1 from the ratio of the number of rows to columns. Therefore, $C = 1/37$.
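One way to train such a multi-class SVM in MATLAB is sketched below. The paper does not name its exact solver, so fitcecoc, which reduces the multi-class problem to binary SVMs, is an assumption; Xn, labels, and XnTest are hypothetical variable names, and the box constraint is set to C = 1/37 as in the text.

```matlab
% Multi-class SVM on the normalized 37-dimensional feature vectors.
t = templateSVM('KernelFunction', 'linear', 'BoxConstraint', 1/37);
svmModel = fitcecoc(Xn, labels, 'Learners', t);
predicted = predict(svmModel, XnTest);   % predicted ship classes
```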
B. Bag of Features

This method [3] is based on SURF feature description. SURF is an algorithm that can be used for both feature detection and description. In our method, we do not use the feature detection; instead, we use grids of various sizes as the features to be described. For feature description, however, we use the SURF algorithm. Feature description is the process of obtaining a numerical vector for every feature; in our case, for every grid. The steps of the bag of features method are visually summarized in Fig. 7, taken from [4].

Fig. 7. Summary of the bag of features method

After obtaining feature vectors for every image, we drop 25% of the feature vectors. We then apply k-means clustering to the remaining 75% of the feature vectors so that we can have a feature basis in which we can represent our images. The basis vectors are the centroids from the k-means algorithm, and they are referred to as the vocabulary, as in the bag of words method for language processing. The next step of the algorithm is to project every image onto this feature space, and the resulting projections are our inputs to the multi-class SVM classifier.
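Since [4] documents MATLAB's Computer Vision Toolbox pipeline, a minimal sketch is shown below. The folder layout is an assumption, and the 0.75 fraction mirrors the 25% of descriptors dropped above; other settings are toolbox defaults.

```matlab
% Bag of features: grid-based SURF descriptors, k-means vocabulary,
% and a multi-class SVM on the resulting histograms.
imds = imageDatastore('ships', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
bag = bagOfFeatures(imds, 'PointSelection', 'Grid', ...
    'StrongestFeatures', 0.75);            % keep 75% of descriptors
classifier = trainImageCategoryClassifier(imds, bag);  % multi-class SVM
confMat = evaluate(classifier, imds);      % per-class confusion matrix
```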
C. Convolutional Neural Networks (CNNs)

We have experimented with two different networks: a network trained from scratch, and another that uses transfer learning (namely, AlexNet). First let us describe the common properties of both approaches, and then we will talk about the differences between the methods. A representative figure of the CNNs we considered is depicted in Fig. 8, taken from [5].

Fig. 8. Layers of the CNN

In both of the methods, we use the rectifier function as the activation function, given in Eq. 9.

$$g(z) = \max(0, z) \quad (9)$$

The rectifier function is desirable in CNNs because it makes the computations faster, and it is widely preferred in practice.

We use dropout layers in both methods because we found that there was a considerable amount of overfitting. Dropout layers make the network less complicated by dropping some portion of the neurons from the previous layer. A less complicated network implies less variance, which helps reduce overfitting.

We have max pooling layers in both methods to speed up the training and to somewhat control overfitting. The batch normalization layers normalize the intermediate layer outputs, which prevents the neuron outputs from getting too big and also leads to faster learning.

The ending layers in both methods are also the same. We use a softmax layer as the last layer for classification. The softmax function is given in Eq. 10.

$$\phi_i = \frac{\exp(\theta_i^T z)}{\sum_{j=1}^{K} \exp(\theta_j^T z)} \quad (10)$$

The output of the softmax layer is a K-dimensional vector of probabilities. Namely, the $i$th entry of this vector gives the probability that the input ship image belongs to class $i$.
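A small numeric illustration of Eq. 10 (the scores below are made up, not from the paper):

```matlab
% Softmax (Eq. 10) for K = 3 class scores. Subtracting max(z) before
% exponentiating is a standard numerical-stability trick and does
% not change the result.
z = [2.0; 1.0; 0.1];
phi = exp(z - max(z)) ./ sum(exp(z - max(z)));
% phi is approximately [0.659; 0.242; 0.099]; the entries sum to 1.
```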
1) From scratch: In this method, we train a network from scratch. We consider a network with 5 convolutional layers. Each convolutional layer is followed by a batch normalization layer, a ReLU layer, and a max pooling layer. Connected to the last max pooling layer is a dropout layer to reduce the overfitting. A fully connected layer with 10 neurons (corresponding to the 10 ship categories) follows. Finally, we have the softmax layer for classification.

In MATLAB, we also specify a classification layer after the softmax layer. This layer indicates what type of loss we want to use. We used the cross entropy function as the cost function, which is given in Eq. 11 for a single example:

$$CE = -\sum_{k=1}^{K} y_k \log(\hat{y}_k), \quad (11)$$

where $y_k$ is the true label in one-hot representation, and $\hat{y}_k$ is the predicted probability that the given example belongs to class $k$.
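A sketch of such a network in MATLAB's layer notation is given below. The filter counts, filter sizes, and input size are illustrative assumptions; the text specifies only the layer types and their ordering.

```matlab
% Hypothetical from-scratch architecture: five conv -> batch norm ->
% ReLU -> max pool blocks, dropout, a 10-way fully connected layer,
% softmax, and a cross-entropy classification layer (Eq. 11).
layers = [
    imageInputLayer([227 227 3])
    convolution2dLayer(3, 16, 'Padding', 1)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 1)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 64, 'Padding', 1)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 128, 'Padding', 1)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 256, 'Padding', 1)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    dropoutLayer(0.5)          % reduces overfitting, as described
    fullyConnectedLayer(10)    % one neuron per ship category
    softmaxLayer
    classificationLayer];      % cross-entropy loss of Eq. 11
```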
2) AlexNet: In this method, we use transfer learning (AlexNet, [6]) to decrease the training time and obtain higher performance. We copy the first 22 layers of the AlexNet network and attach three layers to it: a dropout layer, a fully connected layer, and a softmax layer. The initial layers of CNNs, even for very different classification tasks, are similar in that they detect similar low-level features such as edges. Transfer learning basically aims to transfer the learning done before on a large dataset to a new network. Because the weights from a pre-trained network will likely be closer to the optimal weights than a randomly chosen set of weights, transfer learning is quite a useful technique.
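A sketch of this construction follows, assuming the datastores imdsTrain and imdsVal (hypothetical names) hold images already resized to 227 x 227 with bicubic interpolation, as described in Section V; the minibatch size and learning rate are the values stated there.

```matlab
% Transfer learning: reuse the first 22 AlexNet layers [6] and attach
% new dropout, fully connected, and softmax layers.
net = alexnet;                       % pre-trained network
layers = [
    net.Layers(1:22)                 % copied, pre-trained layers
    dropoutLayer(0.5)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
opts = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'InitialLearnRate', 1e-4, ...
    'ValidationData', imdsVal);
shipNet = trainNetwork(imdsTrain, layers, opts);
```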
V. EXPERIMENTS/RESULTS/DISCUSSION

We evaluated the performance of our learning algorithms using the ratio of the number of correct labels to the number of examples. We call this ratio the accuracy of the algorithm. To evaluate the accuracy of our results, we extracted the confusion matrix from the data and the overall accuracy for the test samples.

A. Preprocessing with SVM

The SVM learning method obtained an overall training accuracy of 0.4685 and a test accuracy of 0.457. Since the training and test accuracies are very similar, we can conclude that there is very little variance (overfitting) in our learning algorithm. To prevent overfitting, we only used regularly spaced samples of the power spectrum for each image. Nearby points were not used since they would be very similar in value but highly prone to noise. The same also applied to the color moments; orders higher than 5 were not used. The confusion matrix for the test samples is shown in Table I. The entries are the probabilities of classification, and the diagonal entries correspond to the probability of correct classification for each class. From the confusion matrix we see that the algorithm was most successful in classifying aircraft carrier images and least successful in classifying bulkers.
TABLE I
CONFUSION MATRIX FOR THE METHOD OF PREPROCESSING WITH SVM

y \ ŷ      1        2        3        4        5
1        0.5350   0.1700   0.2050   0.0150   0.0750
2        0.2500   0.3100   0.1400   0.0850   0.2150
3        0.2200   0.1400   0.4400   0.0200   0.1800
4        0.1150   0.1250   0.1100   0.5050   0.1450
5        0.0750   0.1000   0.1450   0.1850   0.4950
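For completeness, a sketch of how these metrics are computed from test-set predictions (the variable names predicted and testLabels are hypothetical):

```matlab
% Overall accuracy and a row-normalized confusion matrix in the
% style of Table I.
acc = mean(predicted == testLabels);       % fraction classified correctly
C = confusionmat(testLabels, predicted);   % raw counts, rows = true class
Cnorm = C ./ sum(C, 2);                    % rows sum to 1, as in Table I
```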
B. Bag of Features

The training accuracy of this method is 0.57, and the test accuracy is 0.44; therefore, we observe overfitting. We tried reducing the bias as well, but this was the best we could get. There is a lot of noise in the images, and the background can vary greatly from one image to another. For example, a considerable number of images have mountains in the background. This could be misleading the bag of features method, especially if such backgrounds are common among images from different classes. The confusion matrix for this method is given in Table II, which illustrates that the images from classes 4, 5, and 7 (namely, fire-fighting vessels, fishing vessels, and tour boats) are the ones that get misclassified most often. This observation is reasonable since the variation among the images in these classes is the highest.
C. Convolutional Neural Networks (CNNs)

We will mostly talk about the results of the CNN with AlexNet since it outperforms the CNN trained from scratch. Specifically, the test accuracy is 0.7633 for the CNN from scratch, and 0.8822 for the CNN with AlexNet. To be able to use AlexNet, we resized every image to 227 × 227, using bicubic interpolation.

The confusion matrix of the CNN with AlexNet for the test set is given in Table III. The confusion matrix illustrates that the categories that cause the highest misclassification probabilities are the 5th, 7th, and 8th classes. These are fishing vessels, tour boats, and motor yachts, respectively. A visual inspection of the images from these classes makes it clear why they are difficult to classify; namely, they do not have a lot of distinctive features.

The training progress of the CNN that uses AlexNet is given in Fig. 9. This figure illustrates that there is a little overfitting. There was initially more overfitting, but we added a dropout layer which drops its outputs with probability 0.5. Some more parameters: the minibatch size was set to 32, and the learning rate was 0.0001. The parameters of the first 22 layers all come from the pre-trained AlexNet network. Furthermore, we used the validation set accuracy to choose the hyperparameters.

Fig. 9. Training progress of the CNN with AlexNet

VI. CONCLUSION/FUTURE WORK

It is clear that the CNN with AlexNet significantly outperforms the other methods, which is expected given that CNNs' performance in image classification tasks is superior to that of the other known methods. Combined with transfer learning, the performance went up even further.

There is still some overfitting in the CNN method, and even more in the bag of features method. Even though we tried to reduce it by adding regularization terms and making the models simpler by reducing the number of parameters, in the end we are still left with some overfitting. This is mainly due to the fact that we do not have enough data. The CNN method used 16800 training samples, which is too little for a CNN. This problem could be addressed through data augmentation as future work.

Although there is no overfitting for the preprocessing followed by SVM method, the attained accuracy is low compared to the other two methods. To improve the performance, wavelet analysis could be used to generate additional features.
VII. CONTRIBUTIONS

• Okan Atalar: Worked on preprocessing, SVM, experiments, and downloading the dataset.
• Burak Bartan: Worked on the bag of features and CNN methods, and their experiments.
REFERENCES

[1] Shipspotting.com, "Home - ShipSpotting.com - Ship Photos and Ship Tracker," ShipSpotting.com. [Online]. Available: http://www.shipspotting.com/. [Accessed: 21-Oct-2017].
[2] E. Gundogdu, B. Solmaz, V. Yücesoy, and A. Koç, "MARVEL: A Large-Scale Image Dataset for Maritime Vessels," in Computer Vision - ACCV 2016, Lecture Notes in Computer Science, pp. 165-180, 2017.
[3] C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, "Visual categorization with bags of keypoints," in ECCV International Workshop on Statistical Learning in Computer Vision, Prague, 2004.
[4] Mathworks.com. [Online]. Available: https://www.mathworks.com/help/vision/ug/image-classification-with-bag-of-visual-words.html
[5] Mathworks.com. [Online]. Available: https://www.mathworks.com/discovery/convolutional-neural-network.html
[6] Mathworks.com. [Online]. Available: https://www.mathworks.com/help/nnet/ref/alexnet.html
