IPR REPORT
ASSIGNMENT
Rashmi T
(USN: 1MS20EC078)
Contents:
1. IPR Forms
2. Semantic Change Detection Innovation Document.
3. Similarity Check Report
ABSTRACT
Semantic change detection (SCD) plays a crucial role in remote sensing and image analysis
by identifying and monitoring dynamic changes in land cover and land use. SCD extends
conventional change detection (CD): CD reports only 'where' changes have occurred, whereas
SCD reports both 'where' and 'how'. This project focuses on binary semantic change
detection of buildings and roads using bi-temporal satellite images. Accurate detection of
changes in urban infrastructure is important for applications such as urban planning,
infrastructure management, and navigation systems.
The project begins with a concise introduction to the U-Net convolutional neural network
architecture and its implementation for multiclass semantic segmentation. The U-Net model
serves as the foundation for the subsequent binary semantic change detection tasks. The
main objective is to develop a robust system that can accurately detect and track changes
in buildings and roads across different timestamps. The LEVIR-CD dataset is used, with
buildings and roads manually annotated in Label Studio to provide ground truth labels
for training and evaluation. Semantic segmentation is then performed using the U-Net
model, which identifies and segments buildings and roads in the bi-temporal satellite
images. To quantify changes between the two timestamps, the project uses the Gray-Level
Co-occurrence Matrix (GLCM). GLCM features extracted from the two segmented masks are
compared, and the resulting differences in texture and spatial relationships indicate
change in roads and buildings. Multiclass semantic segmentation achieves an accuracy of
91.41% with a mean IoU of 50.72%. Binary semantic segmentation, which focuses on roads
and buildings, achieves an accuracy of 95.25% with a mean IoU of 92.21%.
The system has applications in several domains. It automates the detection of dynamic
changes in buildings and roads, which supports urban planning, infrastructure management,
and navigation.
CLAIMS
U-Net is a convolutional neural network architecture designed for image segmentation
tasks. It was introduced in the paper "U-Net: Convolutional Networks for Biomedical Image
Segmentation" by Ronneberger et al. in 2015. The architecture consists of a contracting
path and an expanding path. The contracting path is a typical convolutional network that
extracts features from the input image, while the expanding path uses upsampling and
concatenation operations to produce a segmentation mask with the same dimensions as the
input image. The contracting path is composed of multiple convolutional and max-pooling
layers that gradually reduce the spatial resolution of the feature maps, which allows the
network to capture high-level features and contextual information. The expanding path
consists of multiple deconvolutional (transposed convolution) and concatenation layers
that gradually increase the spatial resolution of the feature maps. The deconvolutional
layers perform the upsampling, while the concatenation layers merge the feature maps from
the contracting path with the corresponding feature maps in the expanding path. This
allows the network to recover the spatial information lost in the contracting path and
produce an accurate segmentation mask.
Fig.(1)
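The data flow through the two paths can be illustrated without a deep-learning framework.
The following is a minimal NumPy sketch, not the trained model used in this project. It
assumes 2x2 max pooling for the contracting step, nearest-neighbour upsampling as a
stand-in for the learned deconvolution, and channel-wise concatenation for the skip
connection. The function names and array sizes are hypothetical, and the convolutions
themselves are omitted.

```python
import numpy as np

def max_pool_2x2(x):
    """Halve the height and width of a (C, H, W) array with 2x2 max pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_2x(x):
    """Double the height and width by nearest-neighbour repetition.
    Assumption: stands in for the learned deconvolution layer."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(decoder_feat, encoder_feat):
    """Merge encoder and decoder feature maps along the channel axis."""
    return np.concatenate([encoder_feat, decoder_feat], axis=0)

rng = np.random.default_rng(0)
image_feat = rng.random((8, 64, 64))  # hypothetical 8-channel feature map

# Contracting path: spatial resolution drops 64 -> 32 -> 16.
enc1 = image_feat
enc2 = max_pool_2x2(enc1)
bottleneck = max_pool_2x2(enc2)

# Expanding path: upsample, then concatenate the matching encoder map.
dec2 = skip_concat(upsample_2x(bottleneck), enc2)  # 16 channels at 32x32
dec1 = skip_concat(upsample_2x(dec2), enc1)        # 24 channels at 64x64

print(enc2.shape, bottleneck.shape, dec2.shape, dec1.shape)
```

The final feature map has the same height and width as the input, which is what lets a
last per-pixel classification layer produce a mask aligned with the original image.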
The first part of our project demonstrates the effectiveness of multiclass semantic
segmentation using the existing U-Net architecture on a representative dataset. This lays
the foundation for the subsequent phases, which involve semantic change detection in
buildings and roads from bi-temporal satellite images. We select an existing dataset that
covers diverse urban scenes and contains annotated ground truth labels for buildings and
roads. By training the U-Net model on this dataset, we show its ability to classify and
segment the different classes within the urban environment. Through experimentation and
parameter tuning, we optimise the model's performance for multiclass segmentation. We
evaluate the model using metrics such as intersection over union (IoU), accuracy, and
precision-recall curves. The results provide insight into the strengths and limitations of
the U-Net architecture for multiclass semantic segmentation and serve as a precursor to
the main focus of our project: extending this methodology to detect semantic changes in
buildings and roads from bi-temporal satellite images.
Fig.(2)
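Pixel accuracy and mean IoU can both be computed from a confusion matrix. The sketch below
is a minimal NumPy illustration on made-up label maps. The function names and the toy data
are assumptions, not the project's evaluation code. It also shows one common reason why
pixel accuracy can sit well above mean IoU: a dominant class inflates accuracy, while IoU
weights every class equally.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Average of per-class intersection over union, skipping absent classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return (tp[present] / union[present]).mean()

# Toy 4x4 example: class 0 dominates, classes 1 and 2 are small.
y_true = np.array([[0, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 2, 2]])
y_pred = np.array([[0, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 2]])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print("accuracy:", pixel_accuracy(cm))  # 14 of 16 pixels correct = 0.875
print("mean IoU:", mean_iou(cm))        # (12/14 + 1/2 + 1/2) / 3
```

In this toy case two wrong pixels cost little accuracy but halve the IoU of the two small
classes.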
The main focus of our project is binary semantic segmentation. Using the multiclass
semantic segmentation model as our reference, we narrow the scope to the two target
classes and optimise the model to accurately segment buildings and roads in satellite
imagery. The dataset used for this phase is manually annotated with masks specifically for
buildings and roads, enabling focused training and evaluation.
• Binary Semantic Segmentation: Binary semantic segmentation is an image analysis
technique that partitions an image into two classes, usually labelled "foreground" and
"background". The goal is to separate the objects or regions of interest from the
background. Each pixel in the image is assigned a binary value, where 0 represents the
background and 1 represents the foreground. The segmentation process identifies the
boundaries of the foreground objects and separates them from the background based on
certain criteria. In our analysis the foreground consists of roads and buildings, and all
other classes are treated as background.
• Dissimilarity Decoder: A dissimilarity block is a component commonly used in image
processing and computer vision systems to detect changes between two or more images of
the same scene taken at different times. The goal is to identify significant changes in
the scene, which is useful in applications such as surveillance, environmental monitoring,
and industrial quality control. The block takes the two semantically segmented images and
measures the dissimilarity between them. The resulting output image highlights the regions
where significant changes have occurred between the two input images. Writing the two
feature vectors as p and q, each of length n, the Euclidean distance between them is
calculated as:
$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
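The two components described in this list can be sketched together in NumPy. This is a
minimal illustration, not the project's implementation. It assumes hypothetical class ids
(1 for buildings, 2 for roads), a GLCM built from horizontally adjacent pixels only, and a
feature vector of contrast, homogeneity, and energy. None of these choices is stated in
this report.

```python
import numpy as np

FOREGROUND_CLASSES = (1, 2)  # hypothetical ids: 1 = building, 2 = road

def to_binary_mask(label_map, foreground=FOREGROUND_CLASSES):
    """Map foreground classes to 1 and every other class to 0."""
    return np.isin(label_map, foreground).astype(np.uint8)

def glcm(mask, levels=2):
    """Normalised co-occurrence matrix for horizontally adjacent pixels."""
    left = mask[:, :-1].ravel().astype(np.intp)
    right = mask[:, 1:].ravel().astype(np.intp)
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (left, right), 1.0)  # count each (left, right) pair
    return counts / counts.sum()

def glcm_features(p):
    """Feature vector (contrast, homogeneity, energy); an assumed choice."""
    i, j = np.indices(p.shape)
    contrast = (p * (i - j) ** 2).sum()
    homogeneity = (p / (1.0 + (i - j) ** 2)).sum()
    energy = np.sqrt((p ** 2).sum())
    return np.array([contrast, homogeneity, energy])

def euclidean_distance(a, b):
    """Square root of the summed squared differences between two vectors."""
    return float(np.sqrt(((a - b) ** 2).sum()))

# Toy multiclass label maps for two timestamps (class 3 is background here).
t1 = np.array([[0, 0, 0, 0],
               [0, 1, 1, 0],
               [0, 1, 1, 0],
               [3, 3, 0, 0]])
t2 = np.array([[0, 0, 0, 0],
               [0, 1, 1, 0],
               [2, 2, 2, 2],
               [3, 3, 0, 0]])

m1, m2 = to_binary_mask(t1), to_binary_mask(t2)
change_map = np.bitwise_xor(m1, m2)  # 1 where the foreground label differs
distance = euclidean_distance(glcm_features(glcm(m1)), glcm_features(glcm(m2)))

print(change_map)
print("GLCM feature distance:", distance)
```

The pixel-wise change map shows where the masks differ, while the feature distance gives a
single score for how much the texture of the two masks differs. A distance of zero means
the two masks have identical co-occurrence statistics at the chosen offset.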
REQUIREMENT SPECIFICATION
HARDWARE REQUIREMENT
• Intel i7 microprocessor.
• 16 GB RAM.
• NVIDIA® GeForce® GTX 1650 Ti (4 GB).
• 256 GB SSD and 1 TB HDD.
• Windows 10 OS.
SOFTWARE REQUIREMENT
• Miniconda 3.
• Conda 23.1.0.
• Python 3.9.16.
• CUDA 11.3.1.
• cuDNN 8.2.1.
RESULTS