Researchers can develop new deep learning models specifically designed to process UHR images. The goal of these architectures should be to increase computational performance while preserving information from distant regions of the image and integrating it efficiently. This would allow UHR imagery to improve the performance of segmentation tasks. UHR segmentation remains a difficult but important area of computer vision.
IV. METHODOLOGY

Model Definition:

Input Dataset:

The dataset utilized in this research is the Inria Aerial Image Dataset, obtained from Kaggle (uploaded by Sagar Rathod). It contains remote sensing images of urban areas, with 180 files for training and 180 files for testing. First, the dataset must be loaded into Google Colab, as sketched below.
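As an illustration, the snippet below is a minimal sketch of this loading step. It assumes the Kaggle API token (kaggle.json) has already been uploaded to the Colab session; the dataset slug and the folder patterns are placeholders that must be replaced with the actual identifiers and layout of the Kaggle upload.

import glob
import os
import zipfile

# Assumes kaggle.json was uploaded to the Colab session beforehand.
os.environ["KAGGLE_CONFIG_DIR"] = "/content"

# Placeholder slug: replace with the actual Kaggle identifier of the Inria upload.
DATASET_SLUG = "<kaggle-user>/<inria-aerial-image-dataset>"
!kaggle datasets download -d {DATASET_SLUG} -p /content/data

# Unpack the downloaded archive(s).
for archive in glob.glob("/content/data/*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall("/content/data")

# Folder names depend on how the archive is organized; adjust the patterns as needed.
train_images = sorted(glob.glob("/content/data/**/train/images/*.tif", recursive=True))
train_masks = sorted(glob.glob("/content/data/**/train/gt/*.tif", recursive=True))
test_images = sorted(glob.glob("/content/data/**/test/images/*.tif", recursive=True))
print(len(train_images), "training files,", len(test_images), "test files")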
Data Preprocessing:

The initial step involves preparing the training and testing/validation datasets. This is done by resizing the images, rescaling the pixel values, and batching the data. These steps are crucial for efficient processing during training. In addition, normalization, augmentation, noise reduction, and deblurring can be applied, as in the sketch below.
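The following is a minimal PyTorch sketch of such a preprocessing pipeline (the class name, image size, and flip augmentation are illustrative choices, not the exact configuration of this work): it resizes image/mask pairs, rescales pixels to [0, 1], applies an optional horizontal flip, and batches the data with a DataLoader.

import random

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset

class BuildingDataset(Dataset):
    """Resizes, rescales and (optionally) augments aerial image / mask pairs."""

    def __init__(self, image_paths, mask_paths, size=512, augment=False):
        self.image_paths, self.mask_paths = image_paths, mask_paths
        self.size, self.augment = size, augment

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB").resize((self.size, self.size))
        mask = Image.open(self.mask_paths[idx]).convert("L").resize(
            (self.size, self.size), Image.NEAREST)
        if self.augment and random.random() < 0.5:  # simple horizontal-flip augmentation
            image = image.transpose(Image.FLIP_LEFT_RIGHT)
            mask = mask.transpose(Image.FLIP_LEFT_RIGHT)
        image = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0  # rescale to [0, 1]
        mask = torch.from_numpy((np.array(mask) > 127).astype(np.int64))  # binary building mask
        return image, mask

# Batching for efficient processing; the path lists come from the loading step above.
train_loader = DataLoader(
    BuildingDataset(train_images, train_masks, augment=True),
    batch_size=4, shuffle=True)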
Model Training:

The next step is to train on the dataset using the Mask2Former model. This involves importing the Mask2Former model and training it on the prepared images so that it learns from the data and improves its performance. Feature extraction and segmentation can then be performed, as in the sketch below.
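The snippet below is a minimal sketch of this step, assuming the Hugging Face transformers implementation of Mask2Former; the checkpoint name, label mapping, learning rate, and toy two-image batch are illustrative assumptions rather than the exact configuration used in this work.

import numpy as np
import torch
from PIL import Image
from transformers import Mask2FormerForUniversalSegmentation, Mask2FormerImageProcessor

CHECKPOINT = "facebook/mask2former-swin-small-ade-semantic"  # illustrative pretrained checkpoint

processor = Mask2FormerImageProcessor.from_pretrained(CHECKPOINT)
model = Mask2FormerForUniversalSegmentation.from_pretrained(
    CHECKPOINT,
    id2label={0: "background", 1: "building"},
    ignore_mismatched_sizes=True,  # replace the pretrained classification head
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()

# One illustrative training step on a toy batch of image/mask file pairs.
images = [Image.open(p).convert("RGB") for p in train_images[:2]]
masks = [(np.array(Image.open(p).convert("L")) > 127).astype(np.uint8) for p in train_masks[:2]]

# The processor turns images + segmentation maps into the mask/class labels the loss expects.
batch = processor(images=images, segmentation_maps=masks, return_tensors="pt")
outputs = model(
    pixel_values=batch["pixel_values"],
    mask_labels=batch["mask_labels"],
    class_labels=batch["class_labels"],
)
outputs.loss.backward()  # Mask2Former's combined mask/class loss
optimizer.step()
optimizer.zero_grad()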
Performance Validation:

Calculation of Assessment Metrics: Following model training, the model is assessed on the testing dataset using the chosen assessment metrics. These metrics provide insights into various facets of the model's performance, such as its capability to accurately predict the buildings and other parts of the data in the dataset, to minimize errors in segmenting building outlines, and to strike a balanced trade-off between precision and recall. Together, these steps and metrics ensure the accuracy and precision of classifying buildings from the aerial image dataset.
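As an illustration, the short sketch below computes pixel-wise precision, recall, F1 score, and IoU from a predicted binary building mask and its ground truth (the function name and epsilon constant are illustrative).

import numpy as np

def pixel_metrics(pred, gt, eps=1e-9):
    """Pixel-wise precision, recall, F1 and IoU for binary building masks (0/1 arrays)."""
    tp = np.logical_and(pred == 1, gt == 1).sum()
    fp = np.logical_and(pred == 1, gt == 0).sum()
    fn = np.logical_and(pred == 0, gt == 1).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}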
Finally, the output will be a set of images with accurately segmented buildings. These outputs are essential for applications such as urban planning, disaster management, and environmental monitoring. The extracted footprints can be used to analyze urban growth, assess damage after natural disasters, and plan new infrastructure developments.

V. MODEL TRAINING

In the past, the field of segmentation has made significant progress with the help of deep learning. These architectures are designed to identify individual objects in an image by assigning a pixel-level mask to each instance. One such model is Mask2Former, introduced by Facebook, which uses the advantages of Transformers to achieve state-of-the-art performance on segmentation tasks; this section explores its architecture, advantages, limitations, and future directions.

Groundbreaking models such as Mask R-CNN have been successfully implemented using a two-stage approach, in which a Region Proposal Network (RPN) subnetwork identifies candidate objects in the image and localizes each one with a surrounding bounding box. The Transformer, originally developed for natural language processing (NLP), has since revolutionized many computer vision applications. This architecture is good at capturing relationships between objects in the data. Unlike CNNs, which rely on local convolutions, Transformers use an attention mechanism that allows them to focus on any portion of the input data at once. These properties make them well suited for tasks that require global understanding, such as segmentation, as illustrated by the sketch below.
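To make the contrast with local convolutions concrete, the following is a minimal sketch of scaled dot-product self-attention (a single head, with randomly initialized projection matrices for illustration), in which every token, for example an image patch, attends to every other token.

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a set of tokens.

    x: (num_tokens, dim) features; w_q/w_k/w_v: (dim, dim) projections.
    Every token attends to every other token, giving the global view that
    local convolutions lack.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / (k.shape[-1] ** 0.5)  # pairwise token similarities
    weights = F.softmax(scores, dim=-1)        # attention weights per token
    return weights @ v                         # weighted mixture of all tokens

tokens = torch.randn(16, 64)  # e.g. 16 image patches with 64-dimensional features
w_q, w_k, w_v = (0.1 * torch.randn(64, 64) for _ in range(3))
print(self_attention(tokens, w_q, w_k, w_v).shape)  # torch.Size([16, 64])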
Figure 4: Resulting hyperparameter comparison graphs, which show how well the model is performing.

Figure 6: F1 score curve for our model.

Figure 6: Normalized confusion matrix of our model.

Figure 7: Labels correlogram for our model.

Figure 9: Confusion matrix for our model.

Figure 11: Labels graphs for our model.

VII. CONCLUSION

Further research could explore ways to improve the generalization of the Mask2Former model. This would include training on more varied data, covering different image resolutions, environments, and building types, so that building footprint extraction from overhead imagery remains reliable across settings. The model's ability to capture long-range dependencies and its strong segmentation performance make it useful for many applications.
VIII. REFERENCES