Batch 6
DEEP LEARNING BASED DETECTION OF DEFORESTATION
IN SATELLITE IMAGES
A Report submitted
in partial fulfillment for the award of the Degree of
BACHELOR OF TECHNOLOGY
in
ELECTRONICS AND COMMUNICATION ENGINEERING
By
L.RAMANA 1012004907
P.SWETHA 1011904032
K.AKSHAYA 1011904002
N.SAI GOWTHAM 1011904025
Under the Guidance of
Smt G.SILPALATHA, (Ph.D),
Academic Consultant
Department of ECE
CERTIFICATE
This is to certify that the project report entitled DEEP LEARNING
BASED DETECTION OF DEFORESTATION IN SATELLITE IMAGES
submitted by L.Ramana, P.Swetha, K.Akshaya & N.Sai Gowtham to the
Y.S.R Engineering College of Yogi Vemana University, Proddatur, in partial
fulfillment for the award of the degree of B.Tech in Electronics and
Communication Engineering is a bonafide record of project work carried out
by them under my supervision.
The contents of this report, in full or in parts, have not been submitted to
any other Institution or University for the award of any degree or diploma.
DEPARTMENT OF ELECTRONICS AND COMMUNICATION
ENGINEERING
Y.S.R ENGINEERING COLLEGE OF YOGI VEMANA UNIVERSITY
Proddatur-516360, Y.S.R (Dt)
ANDHRA PRADESH
DECLARATION
We declare that this project report titled Deep Learning Based Detection Of
Deforestation In Satellite Images submitted in partial fulfillment of the degree of B.Tech in
Electronics and Communication Engineering is a record of original work carried out by us
under the guidance of Smt G.Silpalatha. The matter embodied in this project report has not
been submitted by us for the award of any other degree or diploma.
L.Ramana 1012004907
P.Swetha 1011904032
K.Akshaya 1011904002
N.Sai Gowtham 1011904025
ACKNOWLEDGMENTS
We take this opportunity to express our deepest gratitude and appreciation to all those who have helped us, directly or indirectly, towards the successful completion of this project.
It is a great pleasure to express our deep sense of gratitude and veneration to our guide, Smt G.Silpalatha, Academic Consultant, Department of Electronics and Communication Engineering, for her valuable guidance and thought-provoking discussions throughout the course of the project work.
We extend our profound gratefulness to Dr. S. Shafiulla Basha, Associate Professor, Department of Electronics and Communication Engineering, for his encouragement and support throughout the project.
We extend our profound gratefulness to Dr. B.P. Santosh Kumar, Associate Professor, Project Coordinator & Head of the Department of Electronics and Communication Engineering, for his encouragement and support throughout the project.
We take this opportunity to offer our gratefulness to Prof. K. Venkata Ramanaiah, Dean, Faculty of Engineering, Y.S.R Engineering College of Yogi Vemana University, Proddatur, for providing all sorts of help during the project work.
We take this opportunity to offer our gratefulness to Prof. C. Nagaraju, Principal of Y.S.R Engineering College of Yogi Vemana University, Proddatur, for providing all sorts of help during the project work.
We express our thanks to all our college teaching and non-teaching staff members
who encouraged and helped us in some way or other throughout the project work.
Finally, we are thankful to all our friends who have in some way or the other helped us towards the completion of this project work.
L.Ramana 1012004907
P.Swetha 1011904032
K.Akshaya 1011904002
N.Sai Gowtham 1011904025
ABSTRACT
Human activity is an undeniable factor in the increase of total forest area loss. NGOs, governments, and private companies are looking for ways to prevent human-driven deforestation. A model that can generate interpretable deforestation predictions is a valuable asset for preventing some causes of forest area loss, such as illegal logging and the creation of pasture or plantation fields. Advances in satellite data analysis and artificial neural networks have resulted in different methods of creating such a model. This work lists the machine learning techniques most used to create interpretable predictions. The problem of automatically monitoring the deforestation process is considered for efficient prevention of illegal deforestation. An image segmentation model based on the U-Net architecture and the ResNet family of deep neural networks (DNNs) was created. The forest/deforestation dataset was collected by parsing areas of Ukrainian forestries, where satellite images of 512×512 pixels contain areas with forest, deforestation, and other land cover. To overcome the class imbalance of the created dataset, a hybrid loss function was designed and tested in the training environment. K-fold cross-validation and numerous runs with different random seeds were conducted to demonstrate the usefulness and stability of the model and dataset during training and validation. These results demonstrate that variation of images in the dataset and randomness of initialization have no significant effect on model performance. Future research will be needed as datasets grow: performance could be improved by a larger data representation, but some decrease in performance could also be observed due to wider data variability. This is especially important for the deployment of DNNs on devices with limited computational resources at the Edge Computing layer.
Keywords: Deep Learning, U-Net, ResNet-50, Deep Neural Networks
TABLE OF CONTENTS
DESCRIPTION PAGE NO
CERTIFICATE ii
DECLARATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
TABLE OF CONTENTS vi
LIST OF FIGURES x
LIST OF TABLES xi
1. INTRODUCTION 1-12
1.1 Digital Image Processing 1
1.2 What is an Image 1
1.2.1 Analog Image 1
1.2.2 Digital Image 1
1.3 Representation of Digital Image 2
1.3.1 Neighbors of a Pixel 3
1.3.2 Image Resolution 3
1.3.3 Human Visual System 3
1.3.4 Brightness and Contrast 3
1.4 Image Formats 3
1.4.1 JPG 3
1.4.2 GIF 4
1.4.3 PNG 4
1.4.4 SVG 4
1.5 Types of Digital Images 5
1.5.1 Black and White Images 5
1.5.2 Colour Images 5
1.5.3 Binary or Bi-level Images 5
1.5.4 Indexed Coloured Images 5
1.6 Resolution 6
1.6.1 Pixel Resolution 7
1.7 Colour Terminology 7
1.7.1 Primary and Secondary Colours and
Additive and Subtractive Colour Mixing 7
1.7.2 Colour Gamut 8
1.7.3 Colour Management 8
1.7.4 Hue 8
1.7.5 Saturation 8
1.7.6 Brightness 9
1.7.7 Luminance 9
1.7.8 Chrominance 9
1.8 Digital Image Colour Spaces 9
1.8.1 RGB 9
1.8.2 Hue Saturation Value 10
1.8.3 Hue Saturation Lightness 11
1.9 Forest 11
1.9.1 Deforestation 11
1.9.2 Degradation 12
2. LITERATURE SURVEY 13-16
3. SEGMENTATION TECHNIQUES AND CNN 17-43
3.1 A Mathematical Definition of Segmentation 17
3.2 A Review on Existing Segmentation Techniques 17
3.2.1 Histogram Thresholding 18
3.2.2 Edge Based Segmentation 19
3.2.3 Active Contour Based Segmentation 19
3.2.4 Region Based Segmentation 20
3.3 Region Growing 22
3.4 Region Based Method 23
3.4.1 Growth of Regions 23
3.4.2 Growth Algorithm 23
3.4.3 Growth Types 24
3.4.4 Seed Growth 24
3.4.5 Neighbor Growth 24
3.4.6 Disadvantages of Region Growing 25
3.5 Clustering Method 25
3.6 Types of Clustering 25
3.6.1 Hierarchical Clustering 26
3.6.2 K-Means Clustering 28
3.6.3 Fuzzy C-Means Clustering 29
3.6.4 QT Clustering Algorithm 30
3.6.5 Spectral Clustering 31
3.7 Comparison Between Data Clustering 31
3.7.1 K-Means Algorithm 33
3.7.2 Fuzzy C-Means 33
3.7.3 EM Algorithm 33
3.8 Convolution Neural Networks 34
3.9 Convolutional Layer 35
3.10 Pooling Layer 39
3.11 Batch Normalization 40
3.12 Residual Connections 41
4. PROPOSED METHODOLOGY 44-50
4.1 Introduction 44
4.2 Resnet 45
4.3 Proposed Model 47
4.3.1 Data Set Used 47
4.3.2 Loss Function 49
4.3.3 Model 50
6. RESULTS 58-59
7. CONCLUSION 60
REFERENCES 61-62
APPENDIX 63-65
LIST OF FIGURES
FIGURE NO. DESCRIPTION PAGE NO.
1.1 Resolution of the image if the pixels get larger and larger, details in the image are less 6
1.2 Resolution of an image if the size is reduced 7
1.3 RGB Cube 10
1.4 Different percentage of HSV 10
1.5 Different percentage of HSL 11
2.1 RADAR data examples 12
3.1 Histogram of images 19
3.2 Raw data 27
3.3 Traditional representation 27
3.4 Hierarchical feature extraction CNN 35
3.5 Illustration of convolution layer 36
3.6 Example of valid cross correlation without zero padding 38
3.7 Example of valid cross correlation with zero padding 38
3.8 Illustration pooling layer 39
3.9 Illustration of residual connection 41
3.10 Comparison of the standard residual design with the bottleneck design 42
4.1 ResNet functioning 46
4.2 Satellite image, image from initial dataset, mask 48
6.1 Input image 58
6.2 Output image using U-NET 58
6.3 Output image using RESNET 59
LIST OF TABLES
TABLE NO. DESCRIPTION PAGE NO.
3.1 Example data 33
4.1 ResNet description 47
CHAPTER 1
INTRODUCTION
1.1 Digital Image Processing
Digital images play an important role both in daily-life applications, such as satellite television, magnetic resonance imaging and computed tomography, and in areas of research and technology, such as geographical information systems and astronomy. An image is a 2D representation of a three-dimensional scene. A digital image is basically a numerical representation of an object. The term digital image processing refers to the manipulation of an image by means of a processor. The different elements of an image processing system include image acquisition, image storage, image processing and display.
Digital image processing is the use of computer algorithms to perform processing on digital images. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and signal distortion during processing. Since images are defined over two dimensions (perhaps more), digital image processing may be modeled in the form of multidimensional systems. Digital image processing methods stem from two principal application areas: improvement of pictorial information for human interpretation, and processing of image data for storage, transmission and representation for autonomous machine perception.
1.2 What is an Image?
An image is a two-dimensional function that represents a measure of some characteristic, such as brightness or colour, of a viewed scene. An image is a projection of a 3D scene onto a 2D projection plane. It can be defined as a two-variable function f(x, y) where, for each position (x, y) in the projection plane, f(x, y) defines the light intensity at this point.
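As a small illustrative sketch (our addition, not part of the original text), the function view f(x, y) maps directly onto array indexing in Python; the file name sample.png is a placeholder:

import numpy as np
from PIL import Image

# Load an image and convert it to a grayscale 2D array: the discrete f(x, y).
img = np.array(Image.open("sample.png").convert("L"))

x, y = 10, 20                 # a position (x, y) in the projection plane
intensity = img[y, x]         # f(x, y): NumPy indexes rows (y) first
print(f"f({x}, {y}) = {intensity}")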
1.2.1 Analog Image
An analog image can be mathematically represented as a continuous range of values representing position and intensity. An analog image is characterized by a physical magnitude varying continuously in space.
1.2.2 Digital Image
A digital image is composed of picture elements called pixels. Pixels are the smallest samples of an image. A pixel represents the brightness at one point. Conversion of an analog image into a digital image involves two important operations, namely sampling and quantization.
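A minimal sketch of these two operations with NumPy (illustrative only; the continuous scene is simulated here by a finely sampled array):

import numpy as np

# Stand-in for an analog image: a smooth intensity surface on a fine grid.
t = np.linspace(0, np.pi, 1024)
fine = np.sin(t)[:, None] * np.sin(t)[None, :]

# Sampling: keep every 8th point in each spatial direction.
sampled = fine[::8, ::8]

# Quantization: map continuous amplitudes to 256 discrete gray levels.
quantized = np.round(sampled * 255).astype(np.uint8)

print(sampled.shape, quantized.dtype, int(quantized.max()))  # (128, 128) uint8 255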
If you decrease the quality of a JPG too much, however, you will begin to lose important colour information that cannot be recovered. The JPG file format also allows you to save progressive
JPGs, which will load in stages. You may have experienced this before when visiting a website
and watching as an image slowly loses its blurriness and becomes clearer. Use JPGs for
product photos, human portraits and other images where colour variances are important. Do
not use JPGs if you need transparency, which is the ability to see through an image and
decipher the background behind it. JPGs do not support transparency.
1.4.3 PNG
PNGs, or Portable Network Graphics, were created as an alternative to the GIF file format when the GIF technology was patented and required permission to use. PNGs allow for 5 to 25 percent greater compression than GIFs, and with a wider range of colours. Like GIFs, PNG file formats also support transparency, but PNGs support variable transparency, where users can control the degree to which an image is transparent. The downside to advanced transparency in PNGs is that not all older browsers will display the transparency the same way. PNGs also support image interlacing, similar to GIFs, but PNGs use two-dimensional interlacing, which makes them load twice as fast as GIF images.
1.4.4 SVG
The standard has actually been around for more than a decade, but with the recent emergence of HTML5 it is finally coming of age. For now, know that SVG allows you to create very high-quality graphics and animations that do not lose detail as their size increases. This means that with SVG you could create one graphic that looks great on a tiny mobile phone screen or on a 60-inch computer monitor.
1.5 Types of Digital Images
For photographic purposes, there are two important types of digital images—colour
and black and white. Colour images are made up of coloured pixels while black and white
images are made of pixels in different shades of gray.
1.5.1 Black and White Images
A black and white image is made up of pixels, each of which holds a single number corresponding to the gray level of the image at a particular location. These gray levels span the full range from black to white in a series of very fine steps, normally 256 different grays. Since the eye can barely distinguish about 200 different gray levels, this is enough to give the illusion of a stepless tonal scale. Assuming 256 gray levels, each black and white pixel can be stored in a single byte (8 bits) of memory.
1.5.2 Colour Images
A colour image is made up of pixels each of which holds three numbers corresponding
to the red, green, and blue levels of the image at a particular location. Red, green, and blue
(sometimes referred to as RGB) are the primary colours for mixing light—these so-called
additive primary colours are different from the subtractive primary colours used for mixing
paints (cyan, magenta, and yellow). Any colour can be created by mixing the correct amounts
of red, green, and blue light. Assuming 256 levels for each primary, each colour pixel can be stored in three bytes (24 bits) of memory. This corresponds to roughly 16.7 million different possible colours. Note that for images of the same size, a black and white version will use three times less memory than a colour version.
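As a quick arithmetic check (our illustration), the storage figures above follow directly:

# Storage for a 1024x768 image at 8 bits per channel.
width, height = 1024, 768

grayscale_bytes = width * height * 1   # one byte per pixel
rgb_bytes = width * height * 3         # three bytes (24 bits) per pixel

print(rgb_bytes // grayscale_bytes)    # 3: colour needs three times the memory
print(256 ** 3)                        # 16777216, roughly 16.7 million colours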
1.5.3 Binary or Bi-Level Images
Binary images use only a single bit to represent each pixel. Since a bit can only exist
in two states—on or off, every pixel in a binary image must be one of two colours, usually
black or white. This inability to represent intermediate shades of gray is what limits their
usefulness in dealing with photographic images.
1.5.4 Indexed Colour Images
Some colour images are created using a limited palette of colours, typically 256
different colours. These images are referred to as indexed colour images because the data for
each pixel consists of a palette index indicating which of the colours in the palette applies to
that pixel. There are several problems with using indexed colour to represent photographic images. First, if the image contains more different colours than are in the palette, techniques such as dithering must be applied to represent the missing colours, and this degrades the image. Second, combining two indexed colour images that use different palettes, or even retouching part of a single indexed colour image, creates problems because of the limited number of available colours.
1.6 Resolution
The more points at which we sample the image by measuring its colour, the more
detail we can capture. The density of pixels in an image is referred to as its resolution. The
higher the resolution, the more information the image contains. If we keep the image size the
same and increase the resolution, the image gets sharper and more detailed. Alternatively, with
a higher resolution image, we can produce a larger image with the same amount of detail.
Fig. 1.1 shows the resolution of an image when the pixels get larger and larger: the details in the image are reduced. As Fig. 1.2 shows, if we reduce the resolution of an image while keeping its pixels the same size, the image gets smaller and smaller while the amount of detail (per square inch) stays the same.
1.7 Colour Terminology
1.7.1 Primary and Secondary Colours and Additive and Subtractive Colour Mixing
Subtractive colour mixing is the one most of us learned in school, and it describes how two coloured paints or inks combine on a piece of paper. The three subtractive primaries are Cyan (blue-green), Magenta (purple-red), and Yellow (not Blue, Red, and Yellow as we were taught). Additive
colour mixing refers to combining lights of two different colours, for example by shining two
coloured spotlights on the same white wall. The additive colour model is the one used in
computer displays as the image is formed on the face of the monitor by combining beams of
red, green, and blue light in different proportions. Colour printers use the subtractive colour
model and use cyan, magenta, and yellow inks. To compensate for the impure nature of most
printing inks, a fourth colour, black is also used since the black obtained by combining cyan,
magenta, and yellow inks is often a murky dark green rather than a deep, rich black. For this
and other reasons, commercial colour printing presses use a 4-colour process to reproduce
colour images in magazines. A colour created by mixing equal amounts of two primary colours
is called a secondary.
1.7.2 Colour Gamut
In the real world, the ideal of creating any visible colour by mixing three primary
colours is never actually achieved. The dyes, pigments, and phosphors used to create colours
on paper or computer screens are imperfect and cannot recreate the full range of visible
colours. The actual range of colours achievable by a particular device or medium is called its
colour gamut, and this is mostly but not entirely determined by the characteristics of its primary colours. Since different devices such as computer monitors, printers, scanners, and photographic film all have different colour gamuts, the problem of achieving consistent colour is quite challenging. Different media also differ in their total dynamic range: how dark the darkest achievable black is and how light the brightest white.
1.7.3 Colour Management
The process of getting an image to look the same between two or more different media
or devices is called colour management, and there are many different colour management
systems available today. Unfortunately, most are complex, expensive, and not available for a
full range of devices.
1.7.4 Hue
The hue of a colour identifies what is commonly called “colour.” For example, all reds
have a similar hue value whether they are light, dark, intense, or pastel.
1.7.5 Saturation
The saturation of a colour identifies how pure or intense the colour is. A fully saturated colour is deep and brilliant; as the saturation decreases, the colour gets paler and more washed out until it eventually fades to neutral.
1.7.6 Brightness
The brightness of a colour identifies how light or dark the colour is. Any colour whose
brightness is zero is black, regardless of its hue or saturation. There are different schemes for
specifying a colour's brightness and depending on which one is used, the results of lightening
a colour can vary considerably.
1.7.7 Luminance
The luminance of a colour is a measure of its perceived brightness. The computation
of luminance takes into account the fact that the human eye is far more sensitive to certain
colours (like yellow-green) than to others (like blue).
1.7.8 Chrominance
Chrominance is a complementary concept to luminance. If you think of how a
television signal works, there are two components—a black and white image which represents
the luminance and a colour signal which contains the chrominance information. Chrominance
is a 2-dimensional colour space that represents hue and saturation, independent of brightness.
1.8 Digital Image Colour Spaces
A colour space is a mathematical system for representing colours. Since it takes at least
three independent measurements to determine a colour, most colour spaces are three-
dimensional. Many different colour spaces have been created over the years in an effort to
categorize the full gamut of possible colours according to different characteristics.
1.8.1 RGB
Most computer monitors work by specifying colours according to their red, green, and blue components. These three values define a 3-dimensional colour space called the RGB colour space. The RGB colour space can be visualized as a cube with red varying along one axis, green varying along the second, and blue varying along the third. Every colour that can be created by mixing red, green, and blue light is located somewhere within the cube. Fig. 1.3 shows the outside of the RGB cube viewed from two different directions.
The eight corners of the cube correspond to the three primary colours (Red, Green, Blue), the
three secondary colours (Cyan, Magenta, Yellow) and black and white. All the different
neutral grays are located on the diagonal of the cube that connects the black and the white
vertices.
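As a small illustrative sketch (our addition), the cube's corners and its neutral gray diagonal can be listed numerically, using Python's standard colorsys module to convert to HSV:

import colorsys

# Corners of the RGB cube: primaries, secondaries, black and white.
corners = {
    "black": (0, 0, 0), "white": (255, 255, 255),
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "cyan": (0, 255, 255), "magenta": (255, 0, 255), "yellow": (255, 255, 0),
}

# Neutral grays lie on the diagonal where r == g == b.
grays = [(v, v, v) for v in range(0, 256, 51)]
print("neutral grays on the diagonal:", grays)

for name, (r, g, b) in corners.items():
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    print(f"{name:8s} RGB=({r},{g},{b}) -> HSV=({h:.2f}, {s:.2f}, {v:.2f})")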
1.9.1 Deforestation
Deforestation has many harmful effects, including, but not limited to, soil erosion, flooding, forest fires, and hurricanes. While deforestation effects can be perceived with satellite images, the same cannot be stated about selective logging.
1.9.2 Degradation
Forest degradation is a process that usually precedes deforestation. The Kyoto protocol
does not provide a definition, and there is no universally agreed interpretation (Sasaki and
Putz, 2009). This project considers forest degradation to be in progress when a forest loses 10 to 30% of its canopy cover. The effect is easy to see from the ground, but not so easy using satellites.
Figure 2.1: Radar data examples. Each green pixel is 15 m² of forest area. A red pixel corresponds to deforestation, and colors between yellow and orange are levels of degradation.
For this project, radar sensors provide the technology to differentiate the two, as seen in Figure 2.1. Human-based degradation can also be the result of selective logging, forest usage by guerillas, or drug trafficking. Selective logging is a type of tree removal in which the objective is to retrieve a limited number of marketable tree species. This modality can damage other trees and affect soil and local fauna. Since it is hard to capture with satellite images, it will be out of scope for this study, unless it can be categorized as forest degradation.
CHAPTER 2
LITERATURE SURVEY
2.1 Introduction
Forests are considered an important part of the environment. Their major purpose is to absorb carbon dioxide and generate oxygen in the cycle of photosynthesis, maintaining a balanced and healthy atmosphere. Examination of environmental disasters, such as biodiversity loss, deforestation, depletion of natural resources, etc., necessitates continuous change detection in the forest. Nowadays, land cover change analysis is performed using satellite images. Several techniques have been introduced for forest change detection, but missing data in the satellite images is a serious problem due to artifacts, cloud occlusion, and so on. Thus, techniques handling missing data for forest change detection are essential. As a result, this survey provides a review of forest change detection mechanisms. It presents an analysis of 25 papers on forest change detection methods, such as machine learning techniques and pixel-based techniques. In addition, a detailed investigation is carried out based on the performance measures, images adopted, datasets used, evaluation metrics, and accuracy range. Finally, the issues faced by different forest change detection methods are discussed to help researchers develop enhanced detection methods.
Sustainable development is an agreed-upon global forestry trend because it impacts
other deforestation-related industries. Many factors, such as forest canopy density, forest degradation, the patterns and processes of deforestation, and logging intensity, are important
for the sustainable development of forest ecosystems and the global carbon budget [1–8]. The
important principles of Chinese forestry blueprinting have been well integrated with the
harnessing of natural resources, environmental protection, and ecological balance. Improving
the ecological environment, focusing on ecoengineering, and effectively maintaining ecology
have become the leading demands of socioeconomic forestry development in this century [1–
8]. The change in this leading demand has given the Chinese forestry industry the most favourable economic development position.
Specifically, forestry has become the center and foundation of ecological maintenance
and socioeconomic sustainable development. Forestry is no longer a single industry; rather, it
is a comprehensive and dynamic system within which any development or change of its
components would directly or indirectly impact the entire economy. Therefore, in response to
sustainable forestry development, a broad set of regulations, guidelines, and technologies
were developed to control and safeguard forest management practice that encompasses
silvicultural treatment, forest conservation strategy, cutting rate control, deforestation and
forest degradation monitoring, developing cableway logging [9–12], reducing impact logging,
assessment of biomass and carbon stock, and so forth [12]. Many studies have examined the
patterns and processes of deforestation [6–8, 13–17], but information about the light cableway
skidding technology beneficial to forest ecosystems is still limited [10, 18–25].
Noticeably, technological innovation is one of the most important components of the
forestry system. Scientific deforestation plays a vital role in sustainable forestry development and forestry competitiveness. The adoption of deforestation technology may
have a direct or an indirect and beneficial or damaging impact on the overall forestry system;
furthermore, it also has immediate implications for the sustainable development of forest
resources as well as the economy and society. Accordingly, the engineering-based study of
the adoption of rational analytic methods for the evaluation of sustainable forestry
development is of great significance.
2.2 Fundamental Principles of the Evaluation Model
2.2.1. Systematization and Connectivity
The forest ecosystem is complicated and features systematization and comprehensiveness, with many interrelated and interacting components. This system is
also interlocked with every operation of forestry business, demonstrating its innate
diversification. Noticeably, every subsystem in the forest ecosystem is not only relatively
independent but also interdependent. They are directly or indirectly related to each other to
develop dynamically and to compose a compatible system.
2.2.2. Scientific Integrity and Objectivity
Scientific integrity refers to the fact that research in any discipline should make its
subject objective, use detectable laws, and have theoretical verifiability, strict logicality, and
united scientific value. Specifically, the construction of the model should not only conform to
the fundamental scientific principles but also reflect the internal components, law of
development and characteristics of the material and technical base of the forest ecosystem
itself. Additionally, the construction of the evaluation model and the choosing of an index
system should consider the architecture layer, organization, and archetypes. Objectivity means
that the construction of the evaluation model and the chosen index should be as consistent as
possible with the objective reality and the objective laws of forest ecosystem development. In
addition, all of the data from the index and evaluation systems should be as objective and
accurate as possible. The data collection process should be based on the statistics released by
national or provincial statistics departments or profession-qualified statistics institutions to
guarantee data authority and reliability.
2.2.3. Sustainable Development
Because the forest ecosystem has economic, ecological, and social benefits, forest
operators cannot be driven merely by economic interests but must be guided by a sustainable
development principle to reconcile these main three benefits. Additionally, forest operators
should be oriented by the market mechanism and the law of value reasonably and should
consider the economic and social sustainable development of the forest in the pursuit of profits.
Hence, the related governmental departments should provide necessary guidelines, education
and regulatory supervision to ensure that the forest operators are engaged in sustainable
development efforts.
2.2.4. Transparency
Transparency means that the construction of the evaluation model and index system for
the sustainable forest development is set to be open, transparent, accurate, and specific so that
some constructive suggestions can be adopted.
2.2.5. Reliability
Reliability requires that the methods adopted by the evaluation model for sustainable
forest development should be feasible and practical. Otherwise, the model would lose its
significance and operability. The index system should have a reliable, continuous, and
authoritative data source. Since some important indexes fail to attain reliable data sources,
they should be preserved until the data collection process is ready or they are just regarded as
a theoretical basis and the calculation is omitted. On all accounts, the index system should
represent the facts and be precise, workable, and practical.
2.2.6. Comparability
Comparability means that the construction of the evaluation model and index system for
the sustainable forest development should be available for objective comparison with the
alternatives. Different types of statistics are used to reflect the sustainable development of the
forest during construction, so different types of indexes should be compared. For example,
the dimensionless method can give index comparability.
CHAPTER 3
SEGMENTATION TECHNIQUES AND CNN
3.1 A Mathematical Definition of Segmentation
The following is a very general definition of image segmentation. It uses a homogeneity predicate P(R) that helps formalize the notion of homogeneity in an image: a region R is homogeneous if and only if P(R) = True. The homogeneity can therefore be defined in infinitely many ways: on the grey levels, on the textures, or even on non-obvious properties of the image.
Definition 1 (segmentation): Let I be the set of pixels (the input image) and P(R_i) the homogeneity predicate defined on groups of connected pixels.
A segmentation S of I is a partitioning set of image regions {R_1, R_2, ..., R_n} such that
R_1 ∪ R_2 ∪ ... ∪ R_n = I and R_i ∩ R_j = ∅ for all i ≠ j (eq. 3.1)
is a mathematical definition of a partition: the union of all the regions forms the whole image and all the regions are distinct.
P(R_i) = True for all i (eq. 3.2)
signifies that the homogeneity predicate is valid on every region.
P(R_i ∪ R_j) = False for all R_i adjacent to R_j (eq. 3.3)
signifies that the union of two adjacent regions cannot satisfy the homogeneity predicate, i.e. two adjacent regions must be distinct regarding the homogeneity predicate.
(R'_i ⊂ R_i ∧ R'_i ≠ ∅ ∧ P(R_i) = True) ⇒ (P(R'_i) = True) (eq. 3.4)
signifies that the homogeneity predicate is valid on any sub-region of a region where it is verified.
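As an illustrative sketch (our addition, not part of the original definition), the partition conditions of eq. 3.1 can be checked programmatically for regions produced by any segmentation algorithm:

def check_partition(image_shape, regions):
    # eq. 3.1: the regions must cover the whole image and be pairwise disjoint.
    all_pixels = {(r, c) for r in range(image_shape[0])
                  for c in range(image_shape[1])}
    union = set().union(*regions)
    disjoint = sum(len(r) for r in regions) == len(union)
    return union == all_pixels and disjoint

# Example: a 4x4 image split into a left and a right region.
left = {(r, c) for r in range(4) for c in range(2)}
right = {(r, c) for r in range(4) for c in range(2, 4)}
print(check_partition((4, 4), [left, right]))  # True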
3.2 A Review on Existing Segmentation Techniques
A wide range of very specialized segmentation techniques currently exists, and since research is very active in this field, the panel of available techniques and algorithms constantly evolves. Therefore, a complete study that would review all the state-of-the-art techniques is not relevant in the context of this document. Instead, this section tries to present a simple yet homogeneous and relevant classification of the existing techniques into a number of families. For each family, the general functional philosophy is analyzed and a non-exhaustive list of algorithms is presented, with a short explanation of the specificities of each of them.
There are numerous types of classifications proposed in the specialized literature, each of which is relevant to the point of view required by the study. Since this research project deals with medical image segmentation, where a large majority of the acquired data is grey-scaled, all the techniques concerning color images will be left aside. The techniques are categorized into three main families:
a. Pixel based techniques (also known as histogram thresholding);
b. Edge based techniques;
c. Region based techniques.
This classification is very commonly encountered in numerous papers, which list families such as:
Histogram thresholding
Edge based segmentation
Tree or graph based approaches
Region growing
Clustering
Probabilistic and Bayesian approaches
Neural networks segmentation
Other approaches.
3.2.1 Histogram Thresholding
The pixel-based family of techniques is probably the simplest one; it essentially consists in finding an acceptable threshold in the grey levels of the input image in order to separate the object(s) from the background. It is often referred to as histogram thresholding, since the grey-level histogram of an ideal image will clearly show two distinct peaks assimilable to Gaussians (which can be obtained by applying a filter to the image), representing the distribution of grey levels for one object and its background.
Histogram-Based Methods
Histogram-based methods are very efficient when compared to other image
segmentation methods because they typically require only one pass through the pixels. In this
technique, a histogram is computed from all of the pixels in the image, and the peaks and
valleys in the histogram are used to locate the clusters in the image. Color or intensity can be
used as the measure. A refinement of this technique is to recursively apply the histogram-
seeking method to clusters in the image in order to divide them into smaller clusters. This is
repeated with smaller and smaller clusters until no more clusters are formed.
One disadvantage of the histogram-seeking method is that it may be difficult to identify significant peaks and valleys in the image. In this technique of image classification, the distance metric and integrated region matching are familiar tools.
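To make the idea concrete, the following is a minimal histogram-thresholding sketch (our illustration, implementing Otsu's classic between-class variance criterion with NumPy; it is not a method prescribed by this report):

import numpy as np

def otsu_threshold(gray):
    # Pick the grey level that maximizes the between-class variance.
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Synthetic bimodal image: dark background and a brighter object.
img = np.clip(np.concatenate([np.random.normal(60, 10, 5000),
                              np.random.normal(180, 10, 5000)]),
              0, 255).astype(np.uint8).reshape(100, 100)
print(otsu_threshold(img))  # a threshold between the two histogram peaks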
3.2.3 Active Contour Based Segmentation
Unlike edge detection, active contour methods are much more robust to noise, as the requirements for contour smoothness and contour continuity act as a type of regularization.
Another advantage of this approach is that prior knowledge about the object’s shape can be
built into the contour parameterization process. However, active contour-based algorithms
usually require initialization of the contour close to the object boundary for it to converge
successfully to the true boundary. More importantly, active contour methods have difficulty handling deeply convoluted boundaries, such as CSF, GM and WM boundaries, due to their contour smoothness requirement. Hence, they are often not appropriate for the segmentation of brain tissues. Nevertheless, they have been applied successfully to the segmentation of the intracranial boundary, the brain outer surface and neuro-anatomic structures in MR brain images.
3.2.4 Region-Based Segmentation
The region-based family of techniques fundamentally aims at iteratively building
regions in the image until a certain level of stability is reached. The region growing algorithms
start from well-chosen seeds (usually defined by the user). They then expand the seed regions
by annexing their homogeneous neighbors. The process is iterated until all the pixels in the
image have been classified. The region splitting algorithms use the entire image as a seed and
split it into regions until no more heterogeneity can be found. An algorithm that associates the
advantages of both methods, called the Split, Merge and Group (SMG) algorithm, has been
developed by Horowitz and Pavlidis.
The shape of an object can be described in terms of its boundary or the region it
occupies. Image region belonging to an object generally have homogeneous characteristics,
e.g. similar in intensity or texture. Region-based segmentation techniques attempt to segment
an image by identifying the various homogeneous regions that correspond to different objects
in an image. Unlike clustering methods, region-based methods explicitly consider spatial
interactions between neighboring voxels. In their simplest form, region growing methods usually start by locating some seeds representing distinct regions in the image. The seeds are then grown until they eventually cover the entire image. The region growing process is therefore governed by a rule that describes the growth mechanism and a rule that checks the homogeneity of the regions at each growth step. The region growing technique has been applied to MRI segmentation.
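The following is a bare-bones seeded region-growing sketch (our illustration of the generic growth/homogeneity rules described above, not the report's specific algorithm); the homogeneity rule is a simple intensity-difference threshold:

import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    # Grow from `seed`, annexing 4-connected neighbors whose intensity
    # differs from the seed value by at most `tol` (the homogeneity rule).
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    seed_val = float(img[seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and abs(float(img[nr, nc]) - seed_val) <= tol):
                region[nr, nc] = True      # growth rule: expand the frontier
                queue.append((nr, nc))
    return region

img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 200                         # a bright square "object"
print(region_grow(img, seed=(3, 3)).sum())  # 16 pixels annexed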
The main goal of segmentation is to partition an image into regions. Some
segmentation methods such as "Thresholding" achieve this goal by looking for the boundaries
between regions based on discontinuities in gray levels or color properties. Region-based
segmentation is a technique for determining the region directly. The basic formulation for
Region-Based Segmentation is:
(a) R_1 ∪ R_2 ∪ ... ∪ R_n = R (eq. 3.5)
(b) R_i is a connected region, i = 1, 2, ..., n (eq. 3.6)
(c) R_i ∩ R_j = ∅ for all i ≠ j (eq. 3.7)
(d) P(R_i) = TRUE for i = 1, 2, ..., n (eq. 3.8)
(e) P(R_i ∪ R_j) = FALSE for adjacent regions R_i and R_j (eq. 3.9)
Here P(R_i) is a logical predicate defined over the points in set R_i, and ∅ is the null set.
(a) means that the segmentation must be complete; that is, every pixel must be in a region.
(b) requires that points in a region must be connected in some predefined sense.
(c) indicates that the regions must be disjoint.
(d) deals with the properties that must be satisfied by the pixels in a segmented region. For example, P(R_i) = TRUE if all pixels in R_i have the same gray level.
(e) indicates that regions R_i and R_j are different in the sense of predicate P.
A semi-automatic, interactive MRI segmentation algorithm was developed that employs a simple region growing technique for lesion segmentation. An automatic statistical region growing algorithm based on a robust estimation of the local region mean and variance for every voxel in the image was also proposed for MRI segmentation. Furthermore, relaxation labeling,
region splitting, and constrained region merging were used to improve the quality of the MRI
segmentation. The determination of an appropriate region homogeneity criterion is an
important factor in region growing segmentation methods. However, such homogeneity
criterion may be difficult to obtain a priori. An adaptive region growing method is proposed
where the homogeneity criterion is learned automatically from characteristics of the region to
be segmented while searching for the region. Other region-based segmentation techniques, such as
1. split-and-merge based segmentation and
2. watershed based segmentation,
have also been proposed for MRI segmentation.
Split-and-merge based segmentation
In the split-and-merge technique, an image is first split into many small regions during the splitting stage according to a rule, and then the regions are merged if they are similar enough.
An important step in clustering is selecting a distance measure, which determines how the similarity of two elements is calculated. This influences the shape of the clusters, as some elements may be close to one another according to one distance and farther away according to another.
For example, in a 2-dimensional space, the distance between the point (x = 1, y = 0)
and the origin (x = 0, y = 0) is always 1 according to the usual norms, but the distance between
the point (x = 1, y = 1) and the origin can be 2, √2 or 1 if you take respectively the 1-norm, 2-
norm or infinity-norm distance.
Common distance functions
The Euclidean distance (also called distance as the crow flies or 2-norm distance). A
review of cluster analysis in health psychology research found that the most common
distance measure in published studies in that research area is the Euclidean distance or
the squared Euclidean distance.
The Manhattan distance (aka taxicab norm or 1-norm)
The maximum norm (aka infinity norm)
The Mahalanobis distance corrects data for different scales and correlations in the
variables.
The angle between two vectors can be used as a distance measure when clustering high
dimensional data. See Inner product space.
The Hamming distance measures the minimum number of substitutions required to
change one member into another.
Another important distinction is whether the clustering uses symmetric or asymmetric
distances. Many of the distance functions listed above have the property that distances are
symmetric (the distance from object A to B is the same as the distance from B to A). In other
applications this is not the case. (A true metric gives symmetric measures of distance.)
3.6.1 Hierarchical Clustering
Hierarchical clustering creates a hierarchy of clusters which may be represented in a
tree structure called a dendrogram. The root of the tree consists of a single cluster containing
all observations, and the leaves correspond to individual observations.
Algorithms for hierarchical clustering are generally either agglomerative, in which one
starts at the leaves and successively merges clusters together; or divisive, in which one starts
at the root and recursively splits the clusters. The choice of which clusters to merge or split is
determined by a linkage criterion, which is a function of the pairwise distances between
observations. Cutting the tree at a given height will give a clustering at a selected precision.
In the following example, cutting after the second row will yield clusters {a} {b c} {d e} {f}.
Cutting after the third row will yield clusters {a} {b c} {d e f}, which is a coarser clustering,
with a smaller number of larger clusters.
Agglomerative hierarchical clustering
For example, suppose this data is to be clustered, and the Euclidean distance is the distance metric. One common linkage criterion is the mean distance between elements of each cluster (also called average linkage clustering, used e.g. in UPGMA):
d(A, B) = (1 / (|A|·|B|)) Σ_(x∈A) Σ_(y∈B) d(x, y) (eq. 3.11)
Other criteria are the sum of all intra-cluster variance, or the increase in variance for the cluster being merged (Ward's criterion), where c_A and c_B denote the cluster centroids:
Δ(A, B) = (|A|·|B| / (|A| + |B|)) · ‖c_A − c_B‖² (eq. 3.12)
Each agglomeration occurs at a greater distance between clusters than the previous agglomeration, and one can decide to stop clustering either when the clusters are too far apart to be merged (distance criterion) or when there is a sufficiently small number of clusters (number criterion).
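A short sketch of agglomerative clustering with average linkage, using SciPy (our illustration; the six 2-D points are arbitrary):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points forming three well-separated pairs.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]])

# Average linkage: merge the clusters with the smallest mean pairwise distance.
Z = linkage(X, method="average", metric="euclidean")

# Cut the dendrogram so that three clusters remain (number criterion).
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)  # one cluster label per point, e.g. [1 1 2 2 3 3]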
3.6.2 K-means Clustering
The k-means algorithm assigns each point to the cluster whose center (also called
centroid) is nearest. The center is the average of all the points in the cluster — that is, its
coordinates are the arithmetic mean for each dimension separately over all the points in the
cluster.
Example: The data set has three dimensions and the cluster has two points: X = (x1, x2, x3)
and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where z1 = (x1 + y1)/2
and z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.
The algorithm steps are
Choose the number of clusters, k.
Randomly generate k clusters and determine the cluster centers, or directly generate k
random points as cluster centers.
Assign each point to the nearest cluster center.
Recompute the new cluster centers.
Repeat the two previous steps until some convergence criterion is met.
The main advantages of this algorithm are its simplicity and speed which allows it to
run on large datasets. Its disadvantage is that it does not yield the same result with each
run, since the resulting clusters depend on the initial random assignments. It minimizes
intra-cluster variance, but does not ensure that the result has a global minimum of variance.
Other popular variants of K-means include the Fast Genetic K-means Algorithm (FGKA)
and the Incremental Genetic K-means Algorithm (IGKA).
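The steps above translate almost line for line into NumPy; this is a bare-bones sketch (our illustration), not the variant used anywhere in this report:

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Directly generate k random points (drawn from the data) as centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the nearest cluster center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute the new cluster centers as per-cluster means.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):   # convergence criterion
            break
        centers = new
    return labels, centers

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)
print(centers.round(1))  # roughly (0, 0) and (5, 5)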
3.6.3 Fuzzy C-means Clustering
In fuzzy clustering, each point has a degree of belonging to clusters, as in fuzzy logic, rather than belonging completely to just one cluster. Thus, points on the edge of a cluster may be in the cluster to a lesser degree than points in the center of the cluster. For each point x we have a coefficient giving the degree of being in the kth cluster, u_k(x), and the coefficients of each point are normalized to sum to 1:
Σ_k u_k(x) = 1 (eq. 3.13)
With fuzzy c-means, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster:
center_k = Σ_x u_k(x)^m · x / Σ_x u_k(x)^m (eq. 3.14)
The degree of belonging is related to the inverse of the distance to the cluster center:
u_k(x) = 1 / d(center_k, x) (eq. 3.15)
Then the coefficients are normalized and fuzzified with a real parameter m > 1 so that their sum is 1:
u_k(x) = 1 / Σ_j (d(center_k, x) / d(center_j, x))^(2/(m−1)) (eq. 3.16)
For m equal to 2, this is equivalent to normalizing the coefficients linearly to make their sum 1. When m is close to 1, the cluster center closest to the point is given much more weight than the others, and the algorithm is similar to k-means.
The fuzzy c-means algorithm is very similar to the k-means algorithm:
Choose a number of clusters.
Assign randomly to each point coefficients for being in the clusters.
Repeat until the algorithm has converged (that is, the coefficients' change between two iterations is no more than ε, the given sensitivity threshold): compute the centroid for each cluster, using the formula above; for each point, compute its coefficients of being in the clusters, using the formula above.
The algorithm minimizes intra-cluster variance as well, but has the same problems as k-means,
the minimum is a local minimum, and the results depend on the initial choice of weights.
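A compact NumPy sketch of the two update formulas, eq. 3.14 (centroids) and eq. 3.16 (memberships); this is our illustration with arbitrary data, not a prescribed implementation:

import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)            # eq. 3.13: rows sum to 1
    for _ in range(iters):
        w = u ** m                               # fuzzified memberships
        centers = (w.T @ X) / w.sum(axis=0)[:, None]          # eq. 3.14
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :],
                           axis=2) + 1e-12       # avoid division by zero
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)  # eq. 3.16
    return u, centers

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4])
u, centers = fuzzy_c_means(X)
print(u[0].round(2), u[-1].round(2))  # soft memberships of two sample points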
The expectation-maximization algorithm is a more statistically formalized method which includes some of these ideas, such as partial membership in classes. It has better convergence properties and is in general preferred over fuzzy c-means.
3.6.4 QT Clustering Algorithm
QT (quality threshold) clustering is an alternative method of partitioning data, invented
for gene clustering. It requires more computing power than k-means, but does not require
specifying the number of clusters a priori, and always returns the same result when run several
times. The algorithm is:
The user chooses a maximum diameter for clusters.
Build a candidate cluster for each point by including the closest point, the next closest, and so on, until the diameter of the cluster surpasses the threshold.
Save the candidate cluster with the most points as the first true cluster, and remove all points in the cluster from further consideration.
Recurse with the reduced set of points.
The distance between a point and a group of points is computed using complete linkage, i.e. as the maximum distance from the point to any member of the group.
Locality-sensitive hashing:
Locality-sensitive hashing can be used for clustering. Feature space vectors are sets,
and the metric used is the Jaccard distance. The feature space can be considered high-
dimensional. The min-wise independent permutations LSH scheme (sometimes MinHash) is
then used to put similar items into buckets. With just one set of hashing methods, there are
only clusters of very similar elements. By seeding the hash functions several times, it is
possible to get bigger clusters.
Graph-theoretic methods:
Formal concept analysis is a technique for generating clusters of objects and attributes,
given a bipartite graph representing the relations between the objects and attributes. Other methods for generating overlapping clusters are discussed by Jardine and Sibson (1968) and Cole and Wishart (1970).
3.6.5 Spectral Clustering
Given a set of data points A, the similarity matrix may be defined as a matrix S where S_ij represents a measure of the similarity between points i and j.
3.7 Comparison Between Data Clustering
3.7.1 K-Means Algorithm
A description of clustering methods for images is given in the literature. Another approach to partition an image into K clusters is the statistical hierarchical agglomerative clustering technique for the identification of image regions by colour similarity. This method uses a binary mask and ranks the colour components of the clusters' central components. The basic algorithm is:
1. Each pixel is a separate cluster.
2. Clusters with the same masks join into new clusters.
3.7.2 Fuzzy C-Means
The fuzzy c-means algorithm, like the k-means algorithm, aims to minimize an objective function. The fuzzy c-means algorithm is better than the k-means algorithm, since in the k-means algorithm the feature vectors of the dataset can only be partitioned into hard clusters, and a feature vector can be a member of exactly one cluster. The fuzzy c-means relaxes this condition and allows a feature vector to have multiple membership grades in multiple clusters. Suppose a dataset with known clusters and a data point which is close to both clusters but also equidistant to them; fuzzy clustering gracefully copes with such dilemmas by assigning this data point equal but partial memberships to both clusters, that is, the point belongs to both clusters with membership grades varying from 0 to 1.
Example:
Suppose we have taken the data in Table 3.1. We choose k = 2 (two clusters), where k is the number of clusters, and we use both the crisp clustering method and the fuzzy clustering method to make 2 clusters. In the crisp clustering, each object belongs to exactly one cluster; in the fuzzy clustering, the object belongs to both clusters with different degrees of membership.
Table 3.1 Example Data
3.7.3 EM Algorithm
Expectation Maximization (EM) is one of the most common algorithms used for density estimation of data points in an unsupervised setting. The algorithm relies on finding the maximum likelihood estimates of parameters when the data model depends on certain latent variables. In EM, alternating steps of Expectation (E) and Maximization (M) are performed iteratively until the results converge. The E step computes an expectation of the likelihood by including the latent variables as if they were observed, and the maximization (M) step computes the maximum likelihood estimates of the parameters by maximizing the expected likelihood found in the last E step. The parameters found in the M step are then used to begin another E step, and the process is repeated until convergence. Mathematically, for a given training dataset {x(1), x(2), ..., x(m)} and model p(x, z), where z is the latent variable, we have the log-likelihood
ℓ(θ) = Σ_(i=1..m) log p(x(i); θ) (eq. 3.17)
ℓ(θ) = Σ_(i=1..m) log Σ_(z(i)) p(x(i), z(i); θ) (eq. 3.18)
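As a hedged illustration (our addition), scikit-learn's GaussianMixture runs exactly this E/M loop for a Gaussian latent-variable model:

import numpy as np
from sklearn.mixture import GaussianMixture

# Two latent Gaussian components; the component id z is never observed.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])

# fit() alternates E steps (soft assignments) and M steps (parameter updates).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.means_.round(1))       # recovered component means, near (0,0) and (5,5)
print(gmm.predict_proba(X[:1]))  # E-step-style soft membership of one point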
3.8 Convolutional Neural Networks
Fig 3.4: Hierarchical feature extraction of a convolutional neural network. The bottom row illustrates features extracted from a network trained on the ImageNet dataset; the top row illustrates the layers of a CNN.
For example, the first layers extract color blobs and edges, while the middle layers extract combinations such as circles. Thereafter, certain objects are extracted that are hopefully linearly separable by a classifier (i.e., the final fully-connected layer).
3.9 Convolutional Layer
The convolutional layer is motivated by the fact that, in an image, the information of each pixel has a strong local correlation to neighboring pixels (e.g., edges are an important feature formed by local correlations). Since features can be present in several areas of an image, a filter needs to slide over the complete input data to extract them. The local correlations are utilized by convolving a small filter K with the input data. The filter often has a symmetric kernel size of k × k. Although the layer is called a convolutional layer, the cross-correlation is typically calculated, because this helps to omit kernel flipping. For a two-dimensional input matrix I and filter K, the two-dimensional cross-correlation is calculated as follows:
(I ⋆ K)(i, j) = Σ_(m=−h..h) Σ_(n=−h..h) I(i + m, j + n) · K(m + h, n + h) (eq. 3.19)
Notably, we calculate a valid cross-correlation. This means that the calculation area is constrained to pixels (i, j) where the filter K ∈ ℝ^(k×k) is fully within the input matrix I ∈ ℝ^(p×q). Let h = ⌊k/2⌋, where ⌊·⌋ denotes integer division. Thus, we can define the calculation area with i ∈ {h, h+1, ..., p−h} and j ∈ {h, h+1, ..., q−h}. The parameters of the filters are learned during training of the neural network.
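A direct NumPy transcription of the valid cross-correlation (our illustrative sketch, written in the equivalent zero-based, corner-anchored form rather than the centered indexing of eq. 3.19):

import numpy as np

def valid_cross_correlation(I, K):
    # Slide K over I and keep only positions where K fits fully inside I.
    k = K.shape[0]
    p, q = I.shape
    out = np.zeros((p - k + 1, q - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # No kernel flipping: cross-correlation, not true convolution.
            out[i, j] = np.sum(I[i:i + k, j:j + k] * K)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)
K = np.ones((2, 2))                     # a simple 2x2 summing filter
print(valid_cross_correlation(I, K))    # output shape (3, 3)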
Each neuron in a convolutional layer has a local receptive field R_(x,y,v), i.e., the region of the input that contributed to the feature calculation (eq. 3.20). As such, each local receptive field can learn its own filter K_(x,y,v) with the same size as R_(x,y,v). The displacement of each local receptive field in a convolutional layer is defined by the stride s_l ∈ ℕ*. Without weight sharing (which is explained next), each of the N_l neurons would have k_l·k_l·d_(l−1) + 1 parameters, while the convolutional layer would have N_l(k_l·k_l·d_(l−1) + 1) parameters in total. Notably, one parameter is added due to the bias b of each neuron.
Weight sharing:
Since the same feature can appear at multiple locations, the concept of weight sharing was proposed. This makes it unnecessary to learn the same feature extractor multiple times and reduces the parameters significantly. Weight sharing implies that all neurons belonging to the same slice v have the same filter K_v. Therefore, the depth d_l controls how many filters can be learned. This reduces the total parameters of the convolutional layer by a factor of w_l·h_l; hence, the layer only has d_l(k_l·k_l·d_(l−1) + 1) parameters. In Fig 3.5, we provide a simple example of a convolutional layer with stride s_l = 2 and kernel size k_l = 2. To calculate the final results, we use the cross-correlation in eq. 3.19 and add the bias b_l. For example, in the top row of Fig 3.5, we calculate the result for the first cell as follows:
(eq. 3.21)
First, the filter K_l (size 2 × 2) is applied to the top left area of the (l−1)-th layer (i.e., the light red area). Thereafter, the bias b_l is added and the result is the top left pixel of the l-th layer (i.e., the light red pixel). Then, the filter is shifted by the stride s_l to the right and the same calculation is performed again. This calculation is shown as the light green area and pixels.
The local receptive field must be fully connected to the input. Thus, the size of the feature map F_l can be calculated by:
w_l = (w_(l−1) − k_l) / s_l + 1 (eq. 3.22)
h_l = (h_(l−1) − k_l) / s_l + 1 (eq. 3.23)
Without padding, this always reduces the spatial size of the input tensor by at least k_l − 1. Therefore, padding was introduced. Padding artificially increases the size of the (l−1)-th layer by adding a border around the input tensor. The size of the border is defined by p_l ∈ ℕ, and the added border typically contains only zeros; hence, padding is also known as zero-padding. In Fig 3.7, we illustrate zero-padding with padding p_l = 1 and stride s_l = 2 for an example matrix. The width and height are then calculated as follows:
w_l = (w_(l−1) + 2p_l − k_l) / s_l + 1, h_l = (h_(l−1) + 2p_l − k_l) / s_l + 1 (eq. 3.24)
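A small helper (our sketch) that evaluates eq. 3.24 and reproduces the shrinking and padding behaviour described above:

def conv_output_size(n, k, s=1, p=0):
    # Output width/height of a convolutional layer (eq. 3.24).
    return (n + 2 * p - k) // s + 1

print(conv_output_size(512, k=3))        # 510: shrinks by k - 1 without padding
print(conv_output_size(512, k=3, p=1))   # 512: padding preserves the size
print(conv_output_size(512, k=2, s=2))   # 256: stride 2 halves the size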
Fig 3.6: Example of a valid cross-correlation calculation with stride s_l = 2 and without zero-padding. Only the first two steps are shown.
3.10 Pooling Layer
Fig 3.8: Illustration of a pooling layer example. The input layer (size: 4 × 4 × 1) is max-pooled with filter size k_l = 2 and stride s_l = 2 into an output layer of size 2 × 2 × 1.
Pooling layers help a model become invariant for small translations of the input;
however, the spatial meaning of a pixel is lost [Goodfellow et al., 2016]. In this context,
invariant means that most output values of the pooling layer do not change if the input is
shifted (i.e., translated) by a small amount. In the past, pooling layers were integrated into
neural networks many times because they are an efficient way to reduce the total parameters.
This acts as a regularization method and can counter overfitting on small datasets [Krizhevsky
et al., 2012]. Due to increased computing power and data availability, Springenberg et al. [2014] suggest that pooling layers should be replaced by convolutional layers or omitted. For example, the convolutional neural networks in our experiments only contain two or three pooling layers.
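A minimal max-pooling sketch matching the Fig 3.8 example (our illustration, assuming the input size is divisible by the filter size and the stride equals the filter size):

import numpy as np

def max_pool(x, k=2):
    # Max-pool a 2D map with filter size k and stride k.
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [0, 1, 3, 4]], dtype=float)
print(max_pool(x))  # [[6. 4.] [7. 9.]]: a 4x4x1 input pooled to 2x2x1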
3.11 Batch Normalization
Batch normalization normalizes the activations of a layer over each mini-batch:
x̂_i = (x_i − μ) / √(σ² + ε) (eq. 3.25)
where μ and σ are the mean and standard deviation for each column, respectively. Mean and variance are computed over the mini-batch by
μ = (1/m) Σ_(i=1..m) x_i (eq. 3.26)
σ² = (1/m) Σ_(i=1..m) (x_i − μ)² (eq. 3.27)
A simple normalization can reduce the representation power of a neural network. For example, a normalized input to a sigmoid nonlinearity would constrain the function to its linear area. Therefore, two additional parameters are used to apply a linear transformation:
y_i = γ·x̂_i + β (eq. 3.28)
where γ and β are parameters of the neural network that are optimized during gradient descent. This allows the neural network to restore the original activation by driving γ to σ and β to μ.
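Eqs. 3.25-3.28 as a NumPy forward pass (our sketch; ε is the usual small constant added for numerical stability):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature column over the mini-batch, then rescale and shift.
    mu = x.mean(axis=0)                      # eq. 3.26
    var = x.var(axis=0)                      # eq. 3.27
    x_hat = (x - mu) / np.sqrt(var + eps)    # eq. 3.25
    return gamma * x_hat + beta              # eq. 3.28

x = np.random.randn(32, 4) * 10 + 5          # mini-batch of 32, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 per feature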
3.12 Residual Connections
Fig 3.9: Illustration of a residual connection, which is the shortcut from x to the sum F(x) + x (i.e., the identity connection).
He et al. [2015a] concluded that the optimizer often faces difficulties in finding a
favorable solution with a small error for deep neural networks. As a result, He et al. [2015a]
introduced residual connections to ease the optimization process for very deep neural
networks. Figure 3.6 illustrates the basic concept of a residual connection. A residual
connection is often implemented in deep neural networks by adding connections that act as a
shortcut over one or more stacked layers and forward the identity x to the output of the stacked
layers. Let H(x) be the desired mapping. Instead of driving F(x) to H(x), we can reformulate
the problem so that F(x) := H(x)- x fits the residual mapping. Thus, the desired mapping H(x)
is F(x) + x. This is realized by the shortcut connection (as seen in Figure 3.9) and is
motivated by the fact that it might be more difficult for deeper layers to learn an identity
mapping than to drive F(x) to zero [He et al., 2016].
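The residual connection of Fig 3.9 can be sketched with MATLAB's Deep Learning Toolbox as follows; the layer sizes and names are illustrative, and the additionLayer realizes the sum F(x) + x.

% A residual block: two stacked convolutions plus an identity shortcut.
lgraph = layerGraph([
    imageInputLayer([32 32 64], 'Name', 'in', 'Normalization', 'none')
    convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv1')
    reluLayer('Name', 'relu1')
    convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv2')
    additionLayer(2, 'Name', 'add')                  % computes F(x) + x
    reluLayer('Name', 'relu_out')]);
lgraph = connectLayers(lgraph, 'in', 'add/in2');     % identity shortcut
% plot(lgraph) shows the shortcut from 'in' around the two convolutional layers.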
A bottleneck architecture was also proposed to reduce computation complexity in
terms of floating-point operations (FLOPs) since complexity does not scale well by adding
more layers to a neural network. For example, training a 200-layer ResNet with bottleneck
architecture on ImageNet takes approximately three weeks on eight graphics processing units
(GPUs) and would not otherwise be possible [He et al., 2016]. In a bottleneck architecture, a
block of two convolutional layers is replaced with three convolutional layers. While this may
seem counterintuitive at first due to the additional convolutional layer, it has a major impact
on computational complexity. The convolutional layers perform the following three steps.
Figure 3.10: Comparison of the standard residual connection design with the
bottleneck design.
The bottleneck has a four times greater input dimension when compared to the standard design.
However, the time complexity is the same for both designs. First, a convolutional layer with a
filter size of 1 × 1 × 𝑑 is employed to reduce the depth dimension of the input [Lin et al.,
2013]. As explained in Section 4.2, the convolutional layer can reduce the depth dimension d of the input map F ∈ ℝ^{w×h×d} to d_l by having only d_l filters. This is illustrated in Figure 3.10 (b), where the input map with d = 256 is reduced to d_l = 64. Secondly, the time-consuming 3 × 3 × d_l convolution is calculated only on the reduced depth d_l. Finally, the last convolutional layer restores the depth dimension d by performing a 1 × 1 × d_l convolution again; the depth dimension is restored via the same method used to reduce it in the first layer, except that the number of filters is now greater than the input depth.
For the example in Figure 3.10, the numbers of parameters are 73,728 and 69,632 for the standard and the bottleneck design, respectively. While both have similar complexity in terms of FLOPs, the bottleneck design operates on an input with a four times greater depth dimension.
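These parameter counts can be verified with the short MATLAB calculation below (weights only, biases ignored; the channel sizes follow the example above).

% Standard block: two 3x3 convolutions on 64 channels.
standard = 2 * (3*3*64*64);                          % = 73,728
% Bottleneck block: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256).
bottleneck = 1*1*256*64 + 3*3*64*64 + 1*1*64*256;    % = 69,632
fprintf('standard: %d, bottleneck: %d\n', standard, bottleneck);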
CHAPTER 4
PROPOSED METHODOLOGY
4.1 Introduction
Deforestation, whether legal or not, should be monitored by the authorities, yet it is difficult to spot all possible forest cuttings with the human eye alone; moreover, measuring the area of deforestation by hand from unfiltered satellite imagery is a further problem for a human. Regions of illegal deforestation might remain unplanted for a long time and reduce the amount of usable trees. Attempts to solve these problems were made in the past [1], [2]. Image thresholding [3] and morphological image transformations [4] were used in [1]; these work well for regions of deforestation that are visible and understandable to the human eye, for example, cuttings created a few days earlier that left the terrain without any vegetation. Such cuttings, however, are only a small percentage of all of them: most areas of interest contain some amount of vegetation and possibly even newly planted trees, so they cannot be detected by such methods. Satellite imagery has also been taken in the ultraviolet or infrared spectra, which carry more information about the amount of vegetation, and deforestation tasks have been solved using the Normalized Difference Vegetation Index (NDVI) [5], [6].
This is an acceptable solution, but the use of open-source satellite imagery with low resolution is not an option in our case, because the area of the forestry sections can be much smaller than the resolution of such data. Therefore, the dataset was created in the visible range of the electromagnetic spectrum. It consists of 322 images of 512×512 pixels and is saved in the tfrecord format [7], which was chosen to increase compatibility with Google Cloud services. There are, however, a number of obstacles that need to be overcome; for example, the model predictions are more accurate in the regions that were most numerous in the dataset distribution. Since the training dataset was created from data on Ukrainian forestries, most of the forest is located in steppe areas, and the terrain was not an important factor during training. Some areas with special conditions, such as forests located on mountain slopes, could be completely misclassified because of the mountain's shadow, making predictions more dependent on the time of day when the satellite images were taken. Areas around the forest boundary are also areas of uncertainty for the model.
The aforesaid problems could be alleviated by a dataset much larger than the current one. The main challenge of such an enlargement is keeping the labelling rules consistent across all of the images to prevent ambiguous areas. For example, some areas may look like light deforestation: one labeller could mark them as forest and another as deforestation, and such images make model learning more complicated. The research was first done from the model-centric [8] view, which holds that results improve with a more sophisticated model architecture or better hyperparameters. After satisfactory results with our U-Net model [9], the dataset was recreated from the data-centric view [8] to further increase the accuracy. The data-centric view holds that a model cannot return good results without good data, also known as "garbage in, garbage out". Therefore, the dataset was recreated with more precise segmentation.
4.2 ResNet
A convolutional neural network processes a picture by generating weights that depend on the complexity of the image and by sharing those weights across consecutive positions. Compared with other deep learning algorithms, a CNN requires less pre-processing of the given input. The training of classifiers in a CNN uses a simple technique with fundamental capabilities, which helps to identify similar features of the targeted object. The structure of a CNN is modelled on the human brain, in which the arrangement of neurons mirrors the visual cortex: every neuron responds to stimuli in a particular section of the visual area, called its receptive field.
One such model is the deep residual network, or ResNet. This design was created to overcome difficulties of plain convolutional networks, whose training time grows quickly with the number of layers. Skipping connections, i.e., creating shortcuts, is the core operation of ResNet and is widely used in such applications. The ResNet model has a benefit over other architectures in that its efficiency does not degrade as the design becomes deeper. Furthermore, its computational complexity is low and the training capability of the network is drastically improved. One of the advantages of the proposed model is its level-skipping functionality, whereby it can skip two to three levels that include ReLU and batch normalization. This work applies residual learning to several levels of layers.
In ResNet, the residual block is given as follows:

\[ y = F(x, \{W_i\}) + x \quad \text{(eq. 4.1)} \]
Convolutional block
This block is used when the dimensions of the input and output activations do not match: its shortcut path contains a Conv2D layer, which distinguishes it from the identity block.
For the identity block, the input and output dimensions need to be the same. Each ResNet block is made up of two or three layers: the blocks of ResNet-18 and ResNet-34 have two layers, while those of ResNet-50 and ResNet-101 have three. The first two layers of every ResNet perform a 7×7 convolution and a 3×3 max-pooling, both with a stride of 2.
In the suggested work, ResNet-18 and ResNet-20 are investigated. The input image is resized to a 224×224 grid before processing. The ResNet weights are optimized using Stochastic Gradient Descent (SGD) with typical momentum settings. The proposed network structure is shown in Table 4.1.
Table 4.1. ResNet description
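The stem described above can be sketched with MATLAB layer objects as follows; the values are the standard ResNet ones, and the layer names are illustrative.

% ResNet stem: 7x7 convolution and 3x3 max pooling, both with stride 2,
% on a 224x224 RGB input.
stem = [
    imageInputLayer([224 224 3], 'Name', 'input')
    convolution2dLayer(7, 64, 'Stride', 2, 'Padding', 3, 'Name', 'conv1')
    batchNormalizationLayer('Name', 'bn1')
    reluLayer('Name', 'relu1')
    maxPooling2dLayer(3, 'Stride', 2, 'Padding', 1, 'Name', 'pool1')];
% analyzeNetwork(layerGraph(stem)) shows the 224 -> 112 -> 56 spatial reduction.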
4.3 Dataset
The satellite images were collected without atmospheric correction. The dataset spans diverse types of forestries with different terrain, which is useful for more robust model training. Table 1 contains the overall number of segmented pixels per class; 40 pixels were left unlabeled during the dataset creation. The distribution of classes in the dataset is imbalanced, and this became one of the main training problems.
The dataset was created by parsing areas of Ukrainian forestries using PyAutoGUI and Google Earth Pro. The images contain not only areas with forest or deforestation, but also roads, villages, rivers, and ponds, which makes the model more robust and its predictions of deforested areas more accurate. The dataset contains masks with three classes: "Forest", "Deforestation" and "Other". Finally, the dataset with satellite imagery was uploaded to a GitHub repository, where it is available together with the corresponding code for distributed training on TPU.
4.3.1. Dataset benchmarking
After creating the baseline U-Net model [9] for initial predictions, we segmented the dataset one more time from scratch to create a more accurate version. The second version of the dataset was much better than the first, but both share a weakness: metrics such as the F1 score and Intersection over Union (IoU) are hard to interpret, because the number of minor deforestation areas and small trees was so large that marking absolutely all of them would have been too labour-intensive. Also, no subject matter experts were involved in the dataset segmentation, so there may be incorrectly labelled areas of deforestation. For example, can an area be segmented as deforestation if it contains a certain density of trees of a certain age? This question is hard to answer from satellite imagery alone.
Fig 4.2. The original satellite imagery (left), the image from the initial dataset (centre),
the mask from the final dataset (right).
Fig. 4.2 shows an example of an image and mask from different versions of the training dataset. The quality of segmentation in the first mask is lower than in the second one, but the predictions of the model trained on the initial dataset were still accurate enough.
4.3.2. Loss Function
Using the categorical cross-entropy [11] loss function gave poor results because it treats all classes equally, whereas in our case the prediction of the "Deforestation" class is the most important. Dice loss and Tversky loss are common choices in image segmentation tasks. Dice loss is widely used in computer vision, and Tversky loss can be seen as a generalization of the Dice coefficient that weights false positives and false negatives with the help of coefficients. We therefore combined these loss functions to merge their strengths: a loss similar to the hybrid loss in AnatomyNet [12] was created and modified to treat the "Deforestation" class as the most valuable, by multiplying its contribution to the total loss by a value proportional to the number of classes the model should predict. Using the available literature on loss functions for image segmentation [5], [6], different combinations were checked. The following loss functions were tested: Focal Tversky loss [13], Dice loss [14], [15], Focal loss [16], [17], etc. The Tversky and Dice loss functions proved to be the best solution for the current problem [18], so they were combined with a factor lambda controlling the Dice sub-loss. Focal Tversky loss was not as good as Tversky loss with manual weighting of the class sub-losses. Although the hybrid loss [12] is a better idea than plain categorical cross-entropy, the results could be improved much further with a new loss function that uses the borders of the segmented classes instead of their areas [19].
\[ TP_p(c) = \sum_{n=1}^{N} p_n(c)\, g_n(c) \quad \text{(eq. 4.2)} \]

where TP_p(c) counts the true positives of class c, p_n(c) and g_n(c) are the predicted probability and the ground truth of pixel n for class c, and C is the number of classes, C = 3 in our case. λ is the trade-off between the Dice loss L_Dice(c) and the Tversky loss, α and β are the trade-offs of the penalties for false negatives and false positives, which are both set to 0.5 in our case, and w_c is the weight for class c: in our case the weights for the classes "Forest", "Deforestation" and "Other" are 0.4, 2.2 and 0.4, respectively. A high value for the "Deforestation" class is important to overcome the dataset imbalance, since the correct representation of the "Forest" class is the easiest to learn.
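A minimal MATLAB sketch of this class-weighted Tversky-style loss is shown below (saved as weightedTverskyLoss.m); the array layout, epsilon, class order, and function name are our assumptions, with α, β and the class weights taken from the text. The actual training pipeline was TPU-based rather than MATLAB-based.

function L = weightedTverskyLoss(p, g)
    % p, g: H x W x C arrays (softmax probabilities and one-hot masks).
    % Class order assumed: Forest, Deforestation, Other.
    w = [0.4 2.2 0.4];                    % class weights quoted in the text
    alpha = 0.5; beta = 0.5; eps = 1e-7;
    L = 0;
    for c = 1:size(p, 3)
        pc = p(:,:,c); gc = g(:,:,c);
        TP = sum(pc(:) .* gc(:));                    % eq. 4.2
        FN = sum((1 - pc(:)) .* gc(:));
        FP = sum(pc(:) .* (1 - gc(:)));
        tversky = TP / (TP + alpha*FN + beta*FP + eps);
        L = L + w(c) * (1 - tversky);                % weighted per-class sub-loss
    end
end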
4.3.3. Model
The architecture of the model was chosen to be a standard U-Net/ResNet architecture with the following numbers of filters in the encoder: {32, 64, 128, 256, 512, 1024}, a bottleneck with 2048 filters, and a decoder with the same filters as the encoder, starting with 1024 and ending with 32. The total number of trainable parameters is 124,424,995, which equals 475 megabytes of disk space. The RMSprop optimizer [20] was used with a learning rate of 1e-6 and all other parameters set to their default values. This learning rate, found experimentally, is optimal for this problem and helps the model learn correct representations of the "Deforestation" class more accurately; the default learning rate resulted in constant "overshoot" in the weight updates. To speed up the training of a model with more than 100 million parameters, the distributed TPU (Tensor Processing Unit) [21], [22] strategy was used. Recently, we demonstrated the efficiency of TPU-based training and inference of various DNNs on applications ranging from classification problems [23] (including medical applications [24]) to gesture and pose recognition, with a detailed scaling analysis of GPU and TPU performance [25].
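For comparison, the optimizer setting can be expressed with MATLAB's trainingOptions as in the sketch below; the learning rate follows the text, while the epoch count is illustrative, and the actual training was performed on TPUs outside MATLAB.

% RMSprop optimizer with the learning rate quoted above; other values default.
opts = trainingOptions('rmsprop', ...
    'InitialLearnRate', 1e-6, ...
    'MaxEpochs', 50, ...               % illustrative epoch count
    'Verbose', true);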
CHAPTER 5
SOFTWARE DESCRIPTION
5.1 Introduction
MATLAB is a high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use environment where problems
and solutions are expressed in familiar mathematical notation. MATLAB stands for matrix laboratory; it was originally written to provide easy access to the matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is therefore built on a foundation of sophisticated matrix software in which the basic element is an array that does not require pre-dimensioning, allowing many technical computing problems, especially those with matrix and vector formulations, to be solved in a fraction of the time.
MATLAB features a family of application-specific solutions called toolboxes. Very
important to most users of MATLAB, toolboxes allow learning and applying specialized
technology. These are comprehensive collections of MATLAB functions (M-files) that extend
the MATLAB environment to solve particular classes of problems. Areas in which toolboxes
are available include signal processing, control system, neural networks, fuzzy logic, wavelets,
simulation and many others.
Typical uses of MATLAB include: math and computation; algorithm development; data acquisition; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface building.
5.2 Basic Building Blocks of MATLAB
The basic building block of MATLAB is the matrix. The fundamental data type is the array. Vectors, scalars, real matrices and complex matrices are handled as specific classes of this basic data type. The built-in functions are optimized for vector operations. No dimension
statements are required for vectors or arrays.
5.2.1 MATLAB Window
MATLAB works with several windows: the command window, workspace window, current directory window, command history window, editor window, graphics window and online-help window.
a. Command window
The command window is where the user types MATLAB commands and expressions
at the prompt (>>) and where the output of those commands is displayed. It is opened
when the application program is launched. All commands including user-written
programs are typed in this window at MATLAB prompt for execution.
b. Work space window
MATLAB defines the workspace as the set of variables that the user creates in a work
session. The workspace browser shows these variables and some information about them.
Double clicking on a variable in the workspace browser launches the Array Editor, which can
be used to obtain information.
c. Current directory window
The current Directory tab shows the contents of the current directory, whose path is
shown in the current directory window. For example, in the windows operating system the
path might be as follows: C:\MATLAB\Work, indicating that the directory “work” is a subdirectory of the main directory “MATLAB”, which is installed in drive C. Clicking on the arrow in the current directory window shows a list of recently used paths. MATLAB uses a search path to find M-files and other MATLAB-related files.
d. Command history window
The Command History Window contains a record of the commands a user has entered
in the command window, including both current and previous MATLAB sessions. Previously
entered MATLAB commands can be selected and re-executed from the command history
window by right clicking on a command or sequence of commands.
e. Editor window
The MATLAB editor is both a text editor specialized for creating M-files and a
graphical MATLAB debugger. The editor can appear in a window by itself, or it can be a sub
window in the desktop. In this window one can write, edit, create and save programs in files
called M-files.
f. Graphics or figure window:
The output of all graphic commands typed in the command window is seen in this
window.
g. Online help window: MATLAB provides online help for all its built-in functions and
programming language constructs. The principal way to get help online is to use the MATLAB
help browser, opened as a separate window either by clicking on the question mark symbol
(?) on the desktop toolbar, or by typing helpbrowser at the prompt in the command window. The Help Browser is a web browser integrated into the MATLAB desktop that displays a
Hypertext Markup Language (HTML) document. The Help Browser consists of two panes,
the help navigator pane, used to find information, and the display pane, used to view the
information. Self-explanatory tabs other than navigator pane are used to perform a search.
5.3 MATLAB Files
MATLAB has two types of files for storing information: M-files and MAT-files.
5.3.1 M-Files
These are standard ASCII text files with the ‘m’ extension to the file name. One can create matrices using M-files, which are text files containing MATLAB code. The MATLAB editor or another text editor is used to create a file containing the same statements that would be typed at the MATLAB command line, and the file is saved under a name that ends in .m. There are two
types of M-files:
1. Script Files
It is an M-file with a set of MATLAB commands in it, executed by typing the name of the file on the command line. These files operate on the global variables currently present in the environment.
2. Function Files
A function file is also an M-file except that the variables in a function file are all local.
This type of file begins with a function definition line.
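A minimal example of a function file follows; the name and body are illustrative.

% Saved as addtwo.m; called from the command line as: addtwo(2, 3)
function s = addtwo(a, b)
    % ADDTWO  Return the sum of two inputs; all variables here are local.
    s = a + b;
end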
5.3.2 MAT-Files
These are binary data files with the .mat extension, created by MATLAB when data is saved. The data is written in a special format that only MATLAB can read, and is loaded into MATLAB with the ‘load’ command.
5.4 MATLAB System
The MATLAB system consists of five main parts:
5.4.1 Development Environment
This is the set of tools and facilities that help you use MATLAB functions and files.
Many of these tools are graphical user interfaces. It includes the MATLAB desktop and
Command Window, a command history, an editor and debugger, and browsers for viewing
help, the workspace, files, and the search path.
5.5 Some Basic Commands
find :- find indices of nonzero elements, e.g.: d = find(x>100) returns the indices
of the vector x that are greater than 100
break :- terminate execution of m-file or WHILE or FOR loop
load :- loads contents of matlab.mat into current workspace
save filename x y z :- saves the matrices x, y and z into the file titled filename.mat
save filename x y z /ascii :- save the matrices x, y and z into the file titled filename.dat
load filename :- loads the contents of filename into current workspace; the file can
be a binary (.mat) file
load filename.dat:- loads the contents of filename.dat into the variable filename
xlabel(‘ ’) :- Allows you to label x-axis
ylabel(‘ ‘) :- Allows you to label y-axis
title(‘ ‘) :- Allows you to give title for plot
subplot() :- Allows you to create multiple plots in the same window
5.6 Some Basic Plot Commands
Kinds of plots:
plot(x,y) :- creates a Cartesian plot of the vectors x & y
plot(y) :- creates a plot of y vs. the numerical values of the elements in the y-
vector
semilogx(x,y) :- plots log(x) vs y
semilogy(x,y) :- plots x vs log(y)
loglog(x,y) :- plots log(x) vs log(y)
polar(theta,r) :- creates a polar plot of the vectors r & theta where theta is in radians
bar(x) :- creates a bar graph of the vector x. (Note also the command stairs(x))
bar(x, y) :- creates a bar-graph of the elements of the vector y, locating the bars
according to the vector elements of 'x'
Plot description:
grid :- creates a grid on the graphics plot
title('text') :- places a title at top of graphics plot
xlabel('text') :- writes 'text' beneath the x-axis of a plot
ylabel('text') :- writes 'text' beside the y-axis of a plot
text(x,y,'text') :- writes 'text' at the location (x,y)
text(x,y,'text','sc') :-writes 'text' at point x,y assuming lower left corner is (0,0) and
upper right corner is (1,1)
axis([xmin xmax ymin ymax]) :- sets scaling for the x- and y-axes on the current plot
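A short example combining several of the commands listed above (illustrative data):

x = 0:0.1:2*pi;
y = sin(x);
plot(x, y)                       % Cartesian plot of x vs. y
grid                             % add a grid to the plot
title('Sine wave')               % title at top of the plot
xlabel('x (radians)')            % label beneath the x-axis
ylabel('sin(x)')                 % label beside the y-axis
axis([0 2*pi -1.2 1.2])          % set scaling for the x- and y-axes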
5.7 MATLAB Working Environment
5.7.1 MATLAB Desktop
The MATLAB desktop is the main MATLAB application window. The desktop contains five sub
windows, the command window, the workspace browser, the current directory window, the
command history window, and one or more figure windows, which are shown only when the
user displays a graphic. The command window is where the user types MATLAB commands
and expressions at the prompt (>>) and where the output of those commands is displayed.
MATLAB defines the workspace as the set of variables that the user creates in a work session.
The workspace browser shows these variables and some information about them.
Double clicking on a variable in the workspace browser launches the Array Editor, which can
be used to obtain information about, and in some instances edit, certain properties of the variable.
The Current Directory tab above the workspace tab shows the contents of the current directory, whose path is shown in the current directory window. For example, in the Windows operating system the path might be as follows: C:\MATLAB\Work, indicating that the directory “work” is a subdirectory of the main directory “MATLAB”, which is installed in drive C. Clicking on the arrow in the current directory window shows a list of recently used paths. Clicking on the button to the right of the window allows the user to change the current directory.
MATLAB uses a search path to find M-files and other MATLAB-related files, which are organized in directories in the computer file system. Any file run in MATLAB must reside in the current directory or in a directory that is on the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are included in the search path. The easiest way to see which directories are on the search path, or to add or modify the search path, is to select Set Path from the File menu of the desktop and then use the Set Path dialog box. It is good practice to add commonly used directories to the search path to avoid repeatedly having to change the current directory.
The Command History Window contains a record of the commands a user has entered
in the command window, including both current and previous MATLAB sessions. Previously
entered MATLAB commands can be selected and re-executed from the command history
window by right clicking on a command or sequence of commands. This action launches a menu from which various options can be selected in addition to executing the commands, which is a useful feature when experimenting in a work session.
5.7.2 Using the MATLAB Editor to Create M-Files
The MATLAB editor is both a text editor specialized for creating M-files and a
graphical MATLAB debugger. The editor can appear in a window by itself, or it can be a sub
window in the desktop. M-files are denoted by the extension .m, as in pixelup.m.
The MATLAB editor window has numerous pull-down menus for tasks such as
saving, viewing, and debugging files. Because it performs some simple checks and also uses
color to differentiate between various elements of code, this text editor is recommended as the
tool of choice for writing and editing M-functions.
Typing edit at the prompt opens the editor; typing edit filename.m opens the M-file filename.m in an editor window, ready for editing. As noted earlier, the file must be in the current directory, or in a directory in the search path.
5.7.3 Getting Help
The principal way to get help online is to use the MATLAB help browser, opened as a
separate window either by clicking on the question mark symbol (?) on the desktop toolbar,
or by typing helpbrowser at the prompt in the command window. The Help Browser is a web browser integrated into the MATLAB desktop that displays Hypertext Markup Language (HTML) documents. The Help Browser consists of two panes: the help navigator pane, used to find information, and the display pane, used to view the information. Self-explanatory tabs other than the navigator pane are used to perform a search.
CHAPTER 6
RESULTS
Output:
3×1 cell array: {'OTHERS'} {'DEFORESTATION'} {'FOREST'}
Evaluating semantic segmentation results using UNET
Selected metrics: global accuracy, class accuracy, IoU, weighted IoU, BF score.
CHAPTER 7
CONCLUSION
The results obtained allow us to conclude that the problem of automatically monitoring the deforestation process for efficient prevention of illegal deforestation can be resolved by the proposed method. Despite the limited number of satellite images in the considered dataset, the proposed image segmentation model based on the U-Net and ResNet families achieved reasonable results under strictly defined segmentation metrics, with mean and standard deviation values measured by k-fold cross-validation and numerous runs with different random seeds. The dataset with satellite imagery and segmented masks was uploaded to a GitHub repository and could be increased in size and variety of data to check the corresponding influence. It should be emphasized that the training/validation methods and segmentation results obtained can be used in a more general context (they are actually used for the medical applications mentioned above), but more extended research will be necessary, especially for the deployment of the U-Net and ResNet DNNs on Edge-Computing TPU-based devices with limited computational resources for the aforementioned applications.
REFERENCES
[1] A. K. Ludeke, R. C. Maggio, and L. M. Reid, “An analysis of anthropogenic deforestation
using logistic regression and GIS,” Journal of Environmental Management, vol. 31, no. 3, pp.
247–259, 1990.
[2] J. R. Makana and S. C. Thomas, “Impacts of selective logging and agricultural clearing on
forest structure, floristic composition and diversity, and timber tree regeneration in the Ituri
Forest, Democratic Republic of Congo,” in Forest Diversity and Management, pp. 315–337,
Springer, 2006.
[3] L. Miles and V. Kapos, “Reducing greenhouse gas emissions from deforestation and forest
degradation: global land-use implications,” Science, vol. 320, no. 5882, pp. 1454–1455, 2008.
[4] J. Phelps, E. L. Webb, and A. Agrawal, “Does REDD+ threaten to recentralize forest
governance?” Science, vol. 328, no. 5976, pp. 312–313, 2010.
[5] M. R. W. Rands, W. M. Adams, L. Bennun et al., “Biodiversity conservation: Challenges
beyond 2010,” Science, vol. 329, no. 5997, pp. 1298–1303, 2010.
[6] E. H. Baur, R. B. McNab, L. E. Williams, V. H. Ramos, J. Radachowsky, and M. R. Guariguata, “Multiple forest use through commercial sport hunting: Lessons from a community-based model from the Petén, Guatemala,” Forest Ecology and Management, vol.
268, pp. 112–120, 2012.
[7] P. Cronkleton, M. R. Guariguata, and M. A. Albornoz, “Multiple use forestry planning:
Timber and Brazil nut management in the community forests of Northern Bolivia,” Forest
Ecology and Management, vol. 268, pp. 49–56, 2012.
[8] M. S. Mon, N. Mizoue, N. Z. Htun, T. Kajisa, and S. Yoshida, “Factors affecting deforestation and forest degradation in selectively logged production forest: a case study in Myanmar,” Forest Ecology and Management, vol. 267, pp. 190–198, 2012.
[9] L. G. Z. Xinnian and W. Yilong, “Design models of the single span cableway on the
accurate catenary method,” Journal of Fujian College of Forestry, 1999.
[10] Z. Xinnian, Z. Zhengxiong, and W. Zhilong, “Progress in forest ecological logging,”
Journal of Fujian College of Forestry, vol. 27, p. 6, 2007.
[11] Z. Chuanfang, L. Minrong, Z. Chunxia, and Z. Huiru, Annual Report on Competitiveness of China’s Provincial Forestry No. 1 (2004–2006), Social Sciences Academic Press, Beijing, China, 2010.
[12] F. Huirong, Z. Xinnian, L. Minhui et al., “Three benefits comparison on skidding methods
of light-duty cableway and road-cutting,” Scientia Silvae Sinicae, vol. 48, p. 6, 2012.
APPENDIX
SOURCE CODE
clc; close all; warning off all   % clear the command window, close figures, disable warnings
% Load the satellite images and the ground-truth pixel-label masks.
imgDir = ('ImageDatastore\*');
imds = imageDatastore(imgDir);
pixDir_t = ('PixelLabelDatastore 2d\*');
classes = ["OTHERS" "DEFORESTATION" "FOREST"];
pixelLabelID = [47 80 126];                                      % grayscale values encoding each class
pxds_t = pixelLabelDatastore(pixDir_t, classes, pixelLabelID);   % reference masks
pixDir = ('PixelLabelDatastore 2d 1st\*');
pxds = pixelLabelDatastore(pixDir, classes, pixelLabelID);       % first-version masks
image = 98;                                    % index of the example image to display
imageSize = [512 512 3];
numClasses = 3;
unet_net = unetLayers(imageSize, numClasses);  % U-Net for semantic segmentation
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 5, ...
    'VerboseFrequency', 10);
ds = combine(imds, pxds);                      % pair each image with its label mask
net = resnet50();                              % pretrained ResNet-50 for feature extraction
net.Layers; % net = trainNetwork(ds,lgraph,options)
imds1 = imageDatastore('database', 'LabelSource', 'foldernames', 'IncludeSubfolders', true);
% net = trainNetwork(ds,lgraph,options)
[trainingSet, testSet] = splitEachLabel(imds1, 0.6, 'randomize');
imageSize = net.Layers(1).InputSize;
augmentedTrainingSet = augmentedImageDatastore(imageSize, trainingSet, ...
    'ColorPreprocessing', 'gray2rgb');
augmentedTestSet = augmentedImageDatastore(imageSize, testSet, ...
    'ColorPreprocessing', 'gray2rgb');
featureLayer = 'fc1000';                       % ResNet-50 layer used for deep features
% Specify the network image size. This is typically the same as the training image sizes.
imageSize = [512 512 3];
% Specify the number of classes.
numClasses = numel(classes);
% Re-load the images and the three versions of the segmentation masks.
imgDir = ('ImageDatastore\*');
imds = imageDatastore(imgDir);
classNames = ["OTHERS" "DEFORESTATION" "FOREST"];
pixelLabelID = [47 80 126];
pixDir = ('PixelLabelDatastore 2d 3rd\*');
pxds2 = pixelLabelDatastore(pixDir, classNames, pixelLabelID);   % third version
pixDir = ('PixelLabelDatastore 2d 2nd\*');
pxds1 = pixelLabelDatastore(pixDir, classNames, pixelLabelID);   % second version
pixDir = ('PixelLabelDatastore 2d 1st\*');
pxds = pixelLabelDatastore(pixDir, classNames, pixelLabelID);    % first version
I = readimage(imds, image);
C2 = readimage(pxds2, image);
C1 = readimage(pxds1, image);
C = readimage(pxds, image);
categories(C)
B = labeloverlay(I, C);              % overlay masks on the input image
B1 = labeloverlay(I, C1);
% Deep features are assumed to come from the ResNet 'fc1000' layer; the two
% activations calls below are our addition (the original script did not
% define trainingFeatures/test_features).
trainingFeatures = activations(net, augmentedTrainingSet, featureLayer, ...
    'MiniBatchSize', 32, 'OutputAs', 'columns');
test_features = activations(net, augmentedTestSet, featureLayer, ...
    'MiniBatchSize', 32, 'OutputAs', 'columns');
labels = trainingSet.Labels;
mdl_knn = fitcknn(trainingFeatures', labels);            % k-NN classifier on deep features
predictedLabels_knn = predict(mdl_knn, test_features');
CVMdl = crossval(mdl_knn);
L = kfoldLoss(CVMdl);
Acc = (1 - L) * 100;                                     % cross-validated accuracy (%)
% Class-frequency statistics over the reference masks, then evaluation of
% each mask version against the reference.
imgDir = ('ImageDatastore\*');
imds = imageDatastore(imgDir);
pixDir = ('PixelLabelDatastore 2d\*');
classes = ["OTHERS" "DEFORESTATION" "FOREST"];
pixelLabelID = [47 80 126];
pxds = pixelLabelDatastore(pixDir, classes, pixelLabelID);
total_pixels = countEachLabel(pxds);                   % pixel count per class
frequency = total_pixels.PixelCount / sum(total_pixels.PixelCount);
metrics2 = evaluateSemanticSegmentation(pxds2, pxds_t);
training_loss2 = (1 - metrics2.DataSetMetrics.GlobalAccuracy);
metrics1 = evaluateSemanticSegmentation(pxds1, pxds_t);
training_loss1 = (1 - metrics1.DataSetMetrics.GlobalAccuracy);
figure;imshow(I)
title('Input Image')
figure;imshow(B1)
title('Output Deforestation recognition Image using U-NET with KNN')
figure; imshow(B)
title('Output Deforestation recognition Image using Resnet50 with KNN')