S1TBX Landcover Classification With Sentinel-1 GRD
Andreas Braun
This tutorial requires basic understanding of radar data and its processing, for example as provided by the
SAR Basics Tutorial or the S1TBX Introduction. Also, the S1TBX Graph Building Tutorial is recommended.
A separate tutorial on Time-series analysis with Sentinel-1 focuses on agricultural aspects in particular.
For an introduction into SAR-based classification approaches, the following materials and references are
recommended.
• Caetano, M. (2018): Land cover and land use theory. ESA Land Training 2018 (PDF)
• NASA (2019): SAR for landcover applications (URL)
• NASA (2018): SAR for mapping land cover (URL)
• EO College (2016): Classification (URL), Land cover classification (URL)
• Abdikan et al. (2016): Land cover mapping using Sentinel-1 SAR data (URL)
• Banqué et al. (2015): Polarimetry-based landcover classification with Sentinel-1 data (PDF)
The data used in this tutorial can be downloaded from the Copernicus Open Access Hub at
https://scihub.copernicus.eu/dhus (free registration required). Search and download the following products:
S1A_IW_GRDH_1SDV_20180220T165225_20180220T165250_020693_023723_FB7D
S1A_IW_GRDH_1SDV_20180726T165231_20180726T165256_022968_027E14_0A09
Instead of entering the product IDs, you can also search for the following criteria:
Please note: Since September 2018, the Copernicus Open Access Hub has transitioned into a long-term
archive which means that products of older acquisition dates have to be requested first. Please find more
information about this here: https://scihub.copernicus.eu/userguide/#LTA_Long_Term_Archive_Access
and here https://scihub.copernicus.eu/userguide/LongTermArchive.
Figure 1: Retrieval of suitable images
The data used in this tutorial is a part of a Sentinel-1 IW product located at the border between Germany
and Poland and covers an area of ca. 150 x 110 km (Figure 2, blue line). The subset which is selected
contains the city of Szczecin and the Dąbie Lake (bottom center), surrounded by agricultural areas and
forests, and the Szczecin lagoon and the Baltic Sea in the north.
Unlike multi-spectral missions, such as Sentinel-2 or Landsat-8, Sentinel-1 operates at a single wavelength
only (5.6 cm which corresponds to a frequency of 5.4 GHz). However, Sentinel-1 mostly operates in dual-
polarization mode which means that vertical waves are emitted from the sensor and both vertical and
horizontal waves are measured when returning to the sensor, leading to backscatter intensity of VV and
VH. As the proportion of vertically transmitted waves returning to the sensor horizontally is small, the
intensity of the VH band is generally lower than that of the VV band (Figure 9, middle and right).
Still, the feature space (the number of variables which can be used to predict target classes) of S1 data is
limited compared to optical data. Therefore, two images are used in this tutorial which were taken in the same
year, but in different seasons. This allows surfaces to be described, to a certain degree, based on their
temporal characteristics. For this reason, image 1 was acquired in the month with the least precipitation
(February) and image 2 was acquired at the end of July (most precipitation), as shown in Figure 3.
Figure 3: Monthly temperature [°C] and precipitation [mm] in the study area
Pre-processing
Open the products
Use the Open Product button in the top toolbar and browse
for the location of the downloaded data. Select the two
downloaded zip files and press Open Product.
In the Products View you will see the opened products. Each
Sentinel-1 product consists of Metadata, Vector Data, Tie-
Point Grids, Quicklooks and Bands (which contains the actual
raster data, represented by Intensity and Amplitude),
as demonstrated in Figure 4.
The World View or World Map menus are also a good way to
check the coverage of one or more products (indicated by a
small black number). To reduce the effects of looking direction
and angles, both images were acquired from the same track.
Figure 4: Product Explorer
Figure 5: Preview of the S1 GRD product
Create a subset
Apply orbit file
If precise orbits are not yet available for your product, restituted orbits can be selected; these may not be
as accurate as the precise orbits, but they are more accurate than the predicted orbits available within the product.
Execute the operator for both subsets as generated in the previous step (Figure 7) and select the following
target product names (also make sure to select suitable target directories):
▪ 20180220_Orb.dim
▪ 20180726_Orb.dim
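The same step can also be scripted with SNAP's Python interface (snappy). The following is only a minimal sketch: the operator name 'Apply-Orbit-File' and the parameter names are assumptions based on typical SNAP usage and should be checked against your SNAP version.

    from snappy import ProductIO, GPF, HashMap

    # read one of the downloaded GRD products (the zip archive can be read directly)
    product = ProductIO.readProduct(
        'S1A_IW_GRDH_1SDV_20180220T165225_20180220T165250_020693_023723_FB7D.zip')

    # apply precise orbits, falling back gracefully if none are available yet
    params = HashMap()
    params.put('orbitType', 'Sentinel Precise (Auto Download)')
    params.put('continueOnFail', True)
    orbit_applied = GPF.createProduct('Apply-Orbit-File', params, product)

    ProductIO.writeProduct(orbit_applied, '20180220_Orb', 'BEAM-DIMAP')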
Radiometric calibration
Radiometric calibration converts backscatter intensity as received by the sensor to the normalized radar
cross section (Sigma0) as a calibrated measure taking into account the global incidence angle of the image
and other sensor-specific characteristics. This makes radar images of different dates, sensors, or imaging
geometries comparable.
Open the Calibration operator (under Radar > Radiometric) and use each product from the previous step
(20180220_Orb.dim and 20180726_Orb.dim) as a source product. For the target product name,
add _Cal at the end of the name as suggested (Figure 7). All other settings can be left on default. SNAP
uses the image metadata to calibrate the image as described in this documentation. After completion, the
output products are added to the Product Explorer.
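Conceptually, the GRD calibration divides the squared digital numbers by the squared sigma nought values from the calibration annotation. A strongly simplified sketch of this relation (ignoring the noise term and the bilinear interpolation of the calibration vectors) is:

    import numpy as np

    # dn: digital numbers (Amplitude band), a_sigma: sigmaNought calibration values
    # interpolated from the annotation LUT; both are stand-in arrays here
    dn = np.random.uniform(0.0, 400.0, (100, 100))
    a_sigma = np.full((100, 100), 450.0)

    sigma0 = dn ** 2 / a_sigma ** 2   # calibrated backscatter, roughly between 0 and 1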
As shown in Figure 9, the data ranges between 0 and 1 after calibration to Sigma0. Thermal Noise Removal
is not applied in this tutorial but should be considered when working with larger subsets or entire images.
Figure 9: Histograms before (left) and after (middle and right) radiometric calibration
Coregistration
After both images were radiometrically calibrated in the previous step, the coregistration brings both into
one stack. Even if both images were acquired from the same track (relative orbit), smaller differences in the
incidence angle can cause inaccurate pixel positioning in parts of the image. These differences are
identified and compensated by the Coregistration operator (under Radar > Coregistration) which produces
one output product containing both images from February and July with the best possible geometric overlap.
Load the two calibrated products (with orbit files applied) to the ProductSet-Reader tab (Figure 10, left) and leave all
settings as predefined. Enter S1_Orb_Cal_Stack as target product name and click Run to start the
coregistration. Depending on the size of the products and the number of GCPs, this process can take a
while.
Besides the accuracy of the residuals, it is recommended to visually check the quality of the stack. This can
be done by an RGB representation of the master and slave product which shows if the images are correctly
aligned. Select February VV for red, February VH for green and July VV for blue (Figure 12).
Zoom in to an area with distinct surfaces, for example the border between land and water. The RGB image
should be clear and sharp. The only exceptions are changes in land cover or scattering mechanisms which
occurred in the time between the first and the second image acquisition. In Figure 13, the rivers, roads and
agricultural fields are sharply displayed in all colors. White pixels indicate high backscatter in all three bands
(urban areas) and black pixels low backscatter values in all three bands (water). Blue pixels indicate a
slightly different shoreline in July compared to February.
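If you prefer to inspect such an RGB composite outside SNAP, it can be rebuilt with a few lines of Python; the arrays below are synthetic stand-ins for the three exported bands, and the 2-98 percentile stretch is just one common display choice.

    import numpy as np

    def stretch(band, low=2, high=98):
        """Percentile stretch of one band to the 0-1 range for display."""
        lo, hi = np.nanpercentile(band, [low, high])
        return np.clip((band - lo) / (hi - lo), 0.0, 1.0)

    # stand-ins for Sigma0_VV (Feb), Sigma0_VH (Feb) and Sigma0_VV (Jul)
    vv_feb, vh_feb, vv_jul = (np.random.gamma(1.0, 0.07, (300, 300)) for _ in range(3))

    rgb = np.dstack([stretch(vv_feb), stretch(vh_feb), stretch(vv_jul)])
    # rgb can now be displayed, e.g. with matplotlib.pyplot.imshow(rgb)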
Figure 13: RGB image of VV1 (red), VH1 (green) and VV2 (blue)
Speckle filtering
As shown in Figure 12, many areas which are expected to have a homogenous backscatter intensity are
characterized by granular patterns. These salt-and-pepper effects are speckle, which is inherent in any SAR
image and caused by the constructive and destructive interference of the many scattering contributions
within a resolution cell. Adaptive speckle filters were designed to enhance the
image quality by suppressing random variations in the image while maintaining sharp edges.
Open Radar > Speckle Filter > Single Product Speckle Filter and select S1_Orb_Cal_Stack as input
product. In the second tab, select the Lee Sigma filter from the drop-down menu and leave the window
sizes and parameters as predefined. A new filtered product (S1_Orb_Cal_Stack_Spk) is created which
has the same band names as the input, but now speckle has been reduced.
You can compare the image before and after filtering by using the Split Window function from the
toolbar (Figure 14). As always, the different types of filters will produce slightly different results. Also, the
impact of filter sizes and other parameters can strongly affect how the result looks. It is suggested to
explore and compare some filters and configurations to find the most suitable for the respective application.
As shown in the example, the Lee Sigma filter smoothed larger patches of similar backscatter while
preserving linear features and sharp edges between different surfaces. This is an important prerequisite for
the later land cover classification because speckle introduces unwanted patterns and granular effects in the
classified outputs. Strong filtering is therefore especially advised for SAR-based land cover classifications.
Figure 14: Sigma0 VV before (left) and after (right) application of a speckle filter
Terrain correction
Terrain Correction will geocode the image by correcting SAR geometric distortions using a digital elevation
model (DEM) and producing a map projected product.
Geocoding converts an image from slant range or ground range geometry into a map coordinate system.
Terrain geocoding involves using a Digital Elevation Model (DEM) to correct for inherent geometric
distortions, such as foreshortening, layover and shadow (Figure 15). More information on these effects is
given in the ESA radar course materials.
Open the Range Doppler Terrain Correction operator (under Radar > Geometric > Terrain Correction).
Select the coregistered product as an input in the first tab.
In the Processing Parameters tab, select SRTM 1Sec HGT (AutoDownload) as input DEM.
Please be aware that SRTM is only available between 60° North and 54° South. If your area lies outside
this coverage (https://www2.jpl.nasa.gov/srtm/coverage.html), you can use one of the other DEMs with
AutoDownload option or use an external DEM (just make sure it is stored as a GeoTIFF and projected in
geographic coordinates / WGS84).
In this tutorial we select WGS84 as Map Projection. It is based on geographic coordinates (latitude and
longitude). For later use in a geographic information system (GIS), projected coordinate systems, such
as UTM (Automatic), could be selected as well.
If no Source Band is selected, all bands of the input product are geometrically corrected.
Click on Run to start the terrain correction. SNAP now establishes a connection to an external elevation
database to download all SRTM tiles required to fully cover the input dataset.
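Scripted with snappy, the terrain correction follows the same pattern as the earlier sketch; 'Terrain-Correction', 'demName' and 'mapProjection' are assumed operator and parameter names and may need to be adapted to your SNAP version.

    from snappy import ProductIO, GPF, HashMap

    src = ProductIO.readProduct('S1_Orb_Cal_Stack_Spk.dim')

    params = HashMap()
    params.put('demName', 'SRTM 1Sec HGT')     # auto-downloaded SRTM tiles
    params.put('mapProjection', 'WGS84(DD)')   # geographic coordinates, as in the GUI example

    terrain_corrected = GPF.createProduct('Terrain-Correction', params, src)
    ProductIO.writeProduct(terrain_corrected, 'S1_Orb_Cal_Stack_Spk_TC', 'BEAM-DIMAP')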
Figure 16: Range Doppler Terrain Correction
Conversion to dB scale
As highlighted by the histograms in Figure 9, the values of Sigma0_VV roughly range between 0 and 1,
with average backscatter values of 0.07 (VV) and 0.02 (VH). This means that there are many dark pixels
and only very few bright pixels with large values. This is not ideal in a statistical sense and leads to low
visual contrasts.
To achieve a more normal distribution of values, a logarithmic function is applied to the radar image. It translates
the pixel values into a logarithmic scale and yields higher contrast, because the bright values are
shifted towards the mean while dark values become stretched over a wider range (Figure 18, right). The
values of calibrated dB data roughly range between -35 and +10 dB.
Right-click on each of the four terrain corrected bands and select Linear to/from dB. Confirm with Yes
to create a virtual band (indicated by the symbol and the _db suffix). These virtual bands are not
physically stored on the hard drive but can still be displayed based on the underlying mathematical
expression.
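The Linear to/from dB conversion is simply sigma0_dB = 10 · log10(sigma0). A NumPy equivalent, with a small guard against zero values, looks like this:

    import numpy as np

    sigma0 = np.random.uniform(1e-4, 1.0, (100, 100))       # stand-in for a calibrated band
    sigma0_db = 10.0 * np.log10(np.maximum(sigma0, 1e-10))  # avoid log10(0)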
Figure 18: Sigma0 VV before (left) and after (right) conversion to dB scale
Unsupervised classification
An unsupervised classification is a good way to identify and aggregate pixels with similar features.
As shown in Figure 20, you can use the Color Manipulation tab to assign colors and even names to these
clusters to see how well they coincide with the general land use and land cover classes.
Figure 19: Unsupervised clustering
Accordingly, the advantage of an unsupervised classification is that it does not require any a-priori
knowledge of the study area or the data and still groups pixels of similar characteristics. The downside
is that the number of classes strongly determines if the result is useful or not. As shown in the example,
water consists of five different classes (3-7) while urban areas are represented by only one class (8). In
case of class overlap, a higher number of clusters should be selected and merged subsequently. On the
other hand, too many classes make it hard to find patterns in the data.
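The same idea can be reproduced outside SNAP, for example with scikit-learn's KMeans applied to the stacked dB bands. This is only a sketch on synthetic stand-in arrays and not the exact algorithm behind SNAP's clustering operator; the number of clusters is the critical parameter discussed above.

    import numpy as np
    from sklearn.cluster import KMeans

    shape = (300, 300)
    # stand-ins for the four terrain-corrected dB bands (VV/VH, February/July)
    bands = [np.random.normal(-15.0, 4.0, shape) for _ in range(4)]

    X = np.stack([b.ravel() for b in bands], axis=1)
    valid = np.all(np.isfinite(X), axis=1)

    labels = np.full(X.shape[0], -1)
    labels[valid] = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X[valid])
    clusters = labels.reshape(shape)   # one cluster id per pixel, comparable to Figure 19/20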
Rule-based classification
Although there are automated ways of assigning classes to pixels, radar images in particular are suitable for
rule-based classification approaches, because the number of input features is small (here four bands: VV and VH for two dates).
This can be done in the following order:
1. Definition of classes: In contrast to the unsupervised classification, rule-based classification
requires the a-priori definition of target classes. This means, a list of expected land cover types has
to be defined in advance. In this case, these are: 1=water, 2=urban, 3=forest, 4=agriculture.
2. Inspection of pixel values: Find value ranges which represent the four defined classes
based on the available bands (VV and VH for February and July). You can use the Pixel Information
tab, the Histogram (of defined ROIs), or the Profile Plot view (across two different classes) to
understand which values represent the different surface types. You can even use the Pin Manager
to label different surfaces and export a table with the class names and the corresponding values,
for example for the Tree classification module in the free and open data mining software Orange to
systematically search for thresholds in the data to separate the classes.
3. Implement the thresholds in the Mask Manager. In this case, a systematic thresholding is applied
to the four bands in order to separate the four target classes in a hierarchical way. An example is
given in Figure 21.
Now we use the Mask Manager to create the expressions shown in Table 1 which represent the
predicted spatial occurrence of the four target classes. Note: A detailed explanation on the use of the Mask
Manager is given in the tutorial “Synergetic use of S1 (SAR) and S2 (optical) data Tutorial”.
Note: The band names depend on the order in the stack and might differ accordingly. Also, the thresholds might be
different if you used another filter technique.
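Expressed in NumPy, the hierarchical thresholding could look like the sketch below. The threshold values are purely illustrative placeholders, not the ones from Table 1; as noted above, the real values depend on the speckle filter and the band statistics of your stack.

    import numpy as np

    shape = (300, 300)
    # stand-ins for the four dB bands; replace with the actual terrain-corrected bands
    vv_feb, vh_feb, vv_jul, vh_jul = [np.random.normal(-15.0, 4.0, shape) for _ in range(4)]

    water  = (vv_jul < -20) & (vh_jul < -26)          # illustrative thresholds only
    urban  = ~water & (vv_feb > -5) & (vv_jul > -5)
    forest = ~water & ~urban & (vh_jul > -15)
    agri   = ~water & ~urban & ~forest

    # same coding as the Band Maths export: 1=water, 2=urban, 3=agriculture, 4=forest
    classes = np.select([water, urban, agri, forest], [1, 2, 3, 4], default=0)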
Figure 22: Implementation and result of the rule-based classification
To export the masks, for example to work with them in external software packages such as QGIS, they
must be converted into a single band. This can be done with the Band Maths. Deactivate the option “Virtual”
and activate “Show masks” as shown to build the following statement:
IF water THEN 1 ELSE IF urban THEN 2 ELSE IF agriculture THEN 3 ELSE IF forest THEN 4 ELSE 0
After its creation, confirm with File > Save product. You can now open the created .img file in any GIS.
Please also check the last chapter on how to display the exported raster in an external GIS.
Principal component analysis
To illustrate the information content of the newly generated stack, a principal component analysis (PCA)
can be performed. It is a technique which identifies redundancies and differences in a stack of bands and
transforms them into new uncorrelated bands (principal components) which have a maximized information
content. Accordingly, the first principal component (PC1) represents the maximum proportion of variance
from the original stack. The remaining variation is expressed by PC2, PC3 and so on. More information can
be found here.
Open the Principal Component Analysis operator (under Raster > Image Analysis). Select the dB image
bands as Source bands (Figure 23). Leave the other settings and start the processing with Run. Note that
a PCA can take a considerable amount of time, especially for large datasets or stacks with many bands.
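The principle behind the operator can be sketched with scikit-learn: every pixel is treated as a four-dimensional sample (one value per dB band) and projected onto the first three uncorrelated components. The arrays below are stand-ins for the actual bands.

    import numpy as np
    from sklearn.decomposition import PCA

    shape = (300, 300)
    bands = [np.random.normal(-15.0, 4.0, shape) for _ in range(4)]   # stand-in dB bands

    X = np.stack([b.ravel() for b in bands], axis=1)
    pca = PCA(n_components=3)
    pcs = pca.fit_transform(X)

    pc_images = [pcs[:, i].reshape(shape) for i in range(3)]   # PC1, PC2, PC3 rasters
    print(pca.explained_variance_ratio_)                       # share of variance per component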
Figure 24 is an RGB image of the first three components and shows that they contain most of the variation
of all images, including differences resulting from the two polarizations (VV and VH) and the backscatter
differences between the two acquisition dates (February and July). Urban areas are largely red
(PC1 = red, representing the signal which is common to all four input bands), because they are constantly high in
VV and VH for both dates. Water areas can take various colors because they differ throughout the
four input bands and are therefore represented by different principal components (additive mixture of
PC2 = green and PC3 = blue). All other land use or landcover classes show mixtures of red, green and blue:
croplands, for example, appear purple because of a strong red component (volume scattering in VV and VH)
and a strong blue component (strong difference between February and July). Generally, a PCA reveals
interesting patterns in the data, but interpreting the result can be difficult.
Generation of textures
Image textures are metric derivatives which describe local spatial patterns of a greyscale image in a
quantitative way. They are especially popular for SAR products, because most of them consist of only a
limited number of bands (single or dual polarization). As image classifications based on a one- or two-
dimensional feature space often do not achieve the desired accuracies, image textures are a way to
increase the number of input bands (Ulaby et al. 1986). Additionally, they are capable of describing the
degree of speckle in different parts of the image and are therefore also contributing to a better separation
of surface types. Image textures are implemented in SNAP in two ways:
• Right-click on a raster band > Filtered band: Applies edge detection filters, statistical
measures, non-linear and morphological filters and directly creates the output as a virtual band.
• Menu > Raster > Image Analysis > Texture > Grey Level Co-occurrence Matrix: Computes
a set of image features which describe contrast, orderliness and local statistics. These
were initially introduced by Haralick et al. (1973). A detailed introduction on the concepts and the
mathematical derivation of the different texture measures is given in Hall-Beyer (2007): GLCM
Texture: A Tutorial.
In this tutorial, GLCM textures are used as additional features for the later supervised classification.
However, as the information content of the four bands (VV and VH for two dates) is largely redundant and many
textural features produce similar results, the module is only applied to the VV_dB and VH_dB bands from
July, and only the following textures are used: Homogeneity, Energy, Maximum Probability, Entropy and
GLCM Mean. Since SNAP version 8 it is possible to set a No Data value outside the range of
possible outcomes, so -9999 is a legitimate choice.
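To show how such measures are derived from a grey level co-occurrence matrix, the sketch below uses scikit-image (the functions are spelled greycomatrix/greycoprops in older versions) and computes the statistics for a whole image instead of a moving window, which keeps the example short; the quantisation to 32 grey levels is an assumption, not SNAP's default.

    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    db = np.random.normal(-15.0, 4.0, (256, 256))   # stand-in for the Sigma0_VV_db band (July)

    # quantise the dB values to a limited number of grey levels
    levels = 32
    bins = np.linspace(db.min(), db.max(), levels)
    q = (np.digitize(db, bins) - 1).astype(np.uint8)

    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]

    homogeneity = graycoprops(glcm, 'homogeneity')[0, 0]
    energy      = graycoprops(glcm, 'energy')[0, 0]
    entropy     = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    glcm_mean   = np.sum(np.arange(levels)[:, None] * p)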
The map in Figure 26 illustrates the information content of the generated textures by the use of RGB
composites (right-click on the product > Open RGB Image Window) using the bands Homogeneity VV (red),
Energy VH (green) and Entropy VV (blue).
Supervised Classification
Very important:
• To make the changes permanent, select each of the input products where bands were renamed,
and select File > Save product.
• Please visually check each of the input bands for correct content before starting the
classification.
• Any pixel which is defined as no data in the input features will not be classified.
Digitization of training areas
Supervised classification requires training data which defines the landcover classes you want to have in
your later result and where they are located. This can be done in different ways:
Examples for the import options (shp and csv) are attached to this document. They can be imported by
selecting the main product (S1_Orb_Cal_Stack_Spk_TC) in the Product Explorer and then selecting from
the Menu > Vector > Import > ESRI Shapefile or Import from CSV.
To manually digitize training areas for the different classes, click on New Vector Data Container. A
new window pops up which asks for a name of the class (Figure 28, left side). We enter “urban” and confirm
with OK. The new vector container will appear in the Layer Manager and also in the folder “Vector
Data” in the corresponding product. Repeat this step for the following classes so that the Layer Manager
will look like displayed in Figure 28 (right side). Note: pins and ground_control_points are automatically set
as vectors, but they can be ignored for this task.
1. urban
2. agriculture
3. forest_broadleaf
4. forest_deciduous
5. grassland
6. water
Once you select the Polygon Drawing Tool and click in the map,
you will be asked which of the vector containers you want to edit or fill
with geometries (Figure 29). This allows to digitize multiple polygons
which belong to the same class (container) and to effectively manage
vectors of multiple classes.
Select a container and proceed with OK. You can then digitize polygons and finish with a double-click.
You can modify or delete existing polygons using the Selection tool (Figure 29). In this tutorial, a polygon
is created covering a homogeneous crop area.
Figure 29: Class selection
To supply the Random Forest classifier with a sufficient number of training samples (5000 randomly
selected pixels are used for each class per iteration) the result of the training data collection could look like
this:
Raise the Number of trees to 25. If you select no Feature bands, all bands in the stack will be used for
the training of the classifier. Keep this in mind if the stack contains bands that should not be used for training (for
example masks, or the rule-based classification result in case it was not deleted from the stack).
Click on Run to start the training and classification of the product. As there are many polygons and input
bands, this can also take some time.
The Random Forest classifier now extracts all pixels under the training polygons and tries to find thresholds
which separate the different input classes as well as possible.
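The behaviour of the classifier, including the confidence threshold of 0.5 described below, can be mimicked with scikit-learn's RandomForestClassifier. This is a sketch on synthetic stand-in data; SNAP's internal implementation (for example the sampling of 5000 pixels per class and iteration) is not reproduced here.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n_features = 17     # placeholder for the number of input feature bands in the stack

    # stand-ins for the pixels extracted under the training polygons (6 classes)
    X_train = rng.normal(size=(6 * 500, n_features))
    y_train = np.repeat(np.arange(6), 500)

    rf = RandomForestClassifier(n_estimators=25, random_state=0)
    rf.fit(X_train, y_train)

    # classify a full feature stack (rows x cols x n_features) and mask low-confidence pixels
    stack = rng.normal(size=(100, 100, n_features))
    proba = rf.predict_proba(stack.reshape(-1, n_features))
    confidence = proba.max(axis=1)
    labels = rf.classes_[proba.argmax(axis=1)].astype(float)
    labels[confidence < 0.5] = np.nan    # mimics the confidence threshold applied by SNAP
    labels = labels.reshape(stack.shape[:2])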
Once the processing is finished, a new product is created containing a band called LabeledClasses,
which is the result of the classification, and a Confidence band as a by-product of the Random Forest
classification. To view the classification, double-click the bands. You will see that the pixels are now
assigned to one of the training classes. The legend is shown in the Color Manipulation tab, which also
displays the relative frequency of each class (Figure 33, left).
However, the Random Forest classifier does not automatically assign a class to each pixel: some of the
pixels remain transparent. These are pixels which could not be classified with a confidence higher than
0.5. This can have various reasons:
1. The pixel has statistical values which do not match any of the training data
2. The pixel was assigned to many different classes during the 25 iterations without a clear majority
3. The training area representing this pixel was inhomogeneous and did not represent the majority of its
actual class
Accordingly, the quality of the training data plays a significant role for the final result. Check the confidence
band (Figure 33, right side, scaled from red [0] to green [1]) to identify those areas which have low
confidence and try to modify the training data accordingly. Another option is to lower the confidence
threshold in the band properties of LabeledClasses.
Figure 33: Result of the Random Forest classification (left) and confidence (right)
To display classes with low confidence values, open the Band Properties of the LabeledClasses dataset
and change the Valid-Pixel Expression from Confidence >= 0.5 to Confidence >=0.2. Pixels with
low confidence are now no longer masked out. An example is given in Figure 34.
Figure 34: Classification with confidence threshold 0.5 (middle) and 0.2 (right)
Validating the training accuracy
If Evaluate classifier was selected in the Random Forest operator (Figure 32), a text file RF_25.txt will
open during the classification process which gives information on how well the rule-set of the classifier was
able to describe the training data based on the input feature bands (Sigma0 dB, PCA and textures). It does
not validate the accuracy of the entire prediction, but only how well the training data was classified based
on the hierarchical thresholding of the Random Forest classifier.
Table 2 presents the different accuracy metrics for the 6 trained classes.
Accuracy, precision, correlation and errorRate are relative measures (1 = 100 %). The numbers show
that the Random Forest classifier was able to predict 95.4 % (deciduous forest) to 100 % (water) of the
training pixels correctly using the SAR input raster features. For a more detailed interpretation of these
metrics, please see Accuracy and precision and Sensitivity and specificity.
The last four metrics refer to the statistical distribution of the training pixels per class (x).
A. True Positive: Number of pixels which are class x and have been assigned to class x.
High values indicate that most of the pixels of this class were reliably assigned.
B. False Positive: Number of pixels which are not class x, but have been assigned to class x.
High values indicate that this class is overestimated in the prediction.
C. False Negative: Number of pixels which are class x, but have not been assigned to class x.
High values indicate that this class is underestimated in the prediction.
D. True Negative: Number of pixels which are not class x and have not been assigned to class x.
High values indicate that pixels of other classes were rarely confused with class x.
                               reference
                               +                  -
prediction       +        True Positive      False Positive
                 -        False Negative     True Negative
Lastly, the file reports: Using Testing dataset, % correct predictions = 92.35, which means
that the training accuracy is 92.35 % in total. Accordingly, the Random Forest was not able to predict 7.7 % of
the training areas correctly. This should be considered when applying this classifier to non-trained pixels
(next chapter).
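For reference, the metrics listed in RF_25.txt (accuracy, precision, errorRate, etc.) can be derived from the four cells of the confusion matrix above, as sketched below; the exact formulas used internally by SNAP are not documented here, so treat these as the common textbook definitions.

    def training_metrics(tp, fp, fn, tn):
        """Common accuracy metrics derived from the confusion matrix of one class."""
        total = tp + fp + fn + tn
        accuracy   = (tp + tn) / total
        precision  = tp / (tp + fp)     # reliability of the pixels assigned to the class
        recall     = tp / (tp + fn)     # share of the reference class that was found
        error_rate = 1.0 - accuracy
        return accuracy, precision, recall, error_rate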
Validating the prediction accuracy
A high training accuracy, as observed in the previous chapter, does not guarantee a good result. It simply
indicates how well the classification method (Random Forest) was able to replicate the classes from the
training areas based on the available features (Sigma0 dB, PCA and textures). Accordingly, a low training
accuracy can result from a bad configuration of the classifier (not enough trees), an insufficient number of
features, or unsuitable training areas. In turn, some classifiers tend to over-fitting, which means that the
training data was predicted to a very high accuracy, but the accuracy for non-trained pixels can still be
low. Therefore, an accuracy assessment of the actual result has to be undertaken which is not based on the
training areas, but on independently collected reference areas.
At the moment, SNAP has no automated accuracy assessment, but the number of True Positive, False
Positive, False Negative and True Negative pixels can be easily assessed with the Mask Manager.
In this study, we use landcover information from the Corine Landcover dataset of the year 2018 (CLC
2018) provided by the Copernicus Programme: It is available for large parts of Europe at a spatial resolution
of around 100 meters with 45 classes of land use and landcover and can be freely downloaded under this
address: https://land.copernicus.eu/pan-european/corine-land-cover/clc2018
Because of its high number of classes, it cannot directly be used as a reference: many of the CLC classes
which exist in the study area are not part of our classification and therefore have to be aggregated to the
trained classes first (Figure 36).
The result is a mask which can now be used in the Mask Manager (here as clc_urban).
Figure 36: Re-classification of CLC2018 data
Figure 37: Classification result with imported ‘urban’ reference areas (black) from CLC2018
To get the prediction accuracy of the urban class, we need the following numbers:
We use the Mask Manager to identify these pixels in a first step. The Mask Manager allows logical
or mathematical expressions to be defined to see which pixels fulfill certain conditions. A raster product (or RGB
composite) must be opened in the current view for the tools of the Mask Manager to become active.
A new mask can be created with . A new window (Figure 38, left) opens where a statement can be
created in the Expression field. Make sure to activate Show masks to see the imported reference mask.
Given that the imported mask of urban areas is called clc_urban and the urban class in the classified
raster has the pixel value of 0 (Figure 33), the expression for the True Positive pixels is, for example, clc_urban and LabeledClasses == 0.
Click OK to confirm. A new mask appears in the Mask Manager (Figure 38, right); you can name it all_pixels
by double-clicking in the first column (Name). If you want to change the expression, double-click Maths to
open the expression editor again.
Once all masks are created, we can use the menu > Raster > Mask > Mask Area to get the number of
pixels of each mask.
A new window opens asking for the mask to be analyzed (Figure 39, left). Select a mask and confirm with
OK. After some seconds of computation, a report is shown listing the number of pixels and the area of the
selected mask (Figure 39, right).
Figure 39: Retrieve mask areas
Based on these values, the prediction accuracies of the urban class can be calculated. A visual form of
True Positive, True Negative, False Positive and False Negative is shown in Figure 40. The proportions
of these four values indicate how accurate a defined landcover type was classified.
The overall accuracy indicates how far the urban classification is correct in general. 90.94 % seems
a good result at first sight, but as you can see in Figure 40 and the equation, this high accuracy is mostly
caused by the large amount of True Negative pixels (correctly classified non-urban landcover). The actual
urban areas are partly overestimated and underestimated. Therefore, the producer and user accuracies
are calculated as more specific measures per class.
producer accuracy = True Positive / (True Positive + False Negative) = 80 / (80 + 175) = 0.3137 = 31.37 %
The producer accuracy is a measure for the probability that urban landcover in the study area is classified
correctly. It means that only 31 % of the urban areas were identified by the classifier. The large number of
False Negative pixels leads to a high error of omission, that means many urban areas were missed by
the classifier (pink pixels). Accordingly, these areas were strongly underestimated.
user accuracy = True Positive / (True Positive + False Positive) = 80 / (80 + 356) = 0.1834 = 18.34 %
In turn, the user accuracy is a measure for the reliability of the classified pixels. It means that only 18.34
% of all pixels which were classified as urban are correct. The large number of False Positive pixels leads
to a high error of commission, that means that many pixels were also classified as urban while they are
of different landcover in the reference dataset (red pixels). In these regions, urban areas were strongly
over-estimated.
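Expressed as a few lines of Python, the two measures and their complementary errors of omission and commission are:

    tp, fn, fp = 80, 175, 356                     # counts for the urban class from the text above

    producer_accuracy = tp / (tp + fn)            # 0.3137 -> 31.37 %
    user_accuracy     = tp / (tp + fp)            # 0.1834 -> 18.34 %

    error_of_omission   = 1 - producer_accuracy   # urban pixels missed by the classifier
    error_of_commission = 1 - user_accuracy       # pixels wrongly labelled as urban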
To conclude, this example underlines the necessity for independent validation datasets and shows that
• high training accuracies can still lead to poor results (prediction accuracies)
• high overall prediction accuracies can be misleading, because they do not really reflect the
overestimation and underestimation of classes which hold a small share of the study area.
Finally, it is also the quality and processing of the reference data which can have an effect on the estimated
accuracies. For example, the aggregation of classes as shown in Figure 36 can contain false assumptions
and therefore have a negative impact on the accuracy.
Please also be aware that the class names and colors assigned by SNAP are only stored in the .dim file,
so they are not part of the raster itself. Accordingly, you have to assign the colors again in any other software based on
the pixel coding. For example, in QGIS, this can be done by switching from a “singleband grayscale” to a
“paletted/unique values” symbology (Figure 41). An example is given in Figure 42.
Figure 41: Files inside the data folder of the BEAM DIMAP format (left)
and selection of unique values representation in QGIS (right)
Figure 42: Imported img file of the classification before (top) and after (bottom) assignment of colors
For more tutorials visit the Sentinel Toolboxes website
http://step.esa.int/main/doc/tutorials/
http://forum.step.esa.int/