Phase 1 Report
CERTIFICATE
Certified that the project work entitled “Facial Recognition On Low Resolution Images” carried out
by Ms. Muteeba Shoukat, USN 1CR20CS121, Ms. Moksha Sri S, USN 1CR20CS119, Ms. P Varshika
Prashanth, USN 1CR20CS133, bonafide students of CMR Institute of Technology, in partial
fulfillment for the award of Bachelor of Engineering in Computer Science and Engineering of the
Visvesvaraya Technological University, Belgaum, during the year 2023-2024. It is certified that all
corrections/suggestions indicated for Internal Assessment have been incorporated in the Report
deposited in the departmental library.
The project report has been approved as it satisfies the academic requirements in respect of Project
work prescribed for the said Degree.
_____________________________ _____________________________
Signature of Guide Signature of HOD
Prof. Paramita Mitra Dr. Shreekanth M Prabhu
Assistant Professor Professor & HoD
Dept. of CSE, CMRIT Dept. of CSE, CMRIT
DECLARATION
We, the students of 7th semester of Computer Science and Engineering, CMR Institute of
Technology, Bangalore, declare that the work entitled “Facial Recognition On Low Resolution
Images” has been successfully completed under the guidance of Prof. Paramita Mitra, Assistant
Professor, Computer Science and Engineering Department, CMR Institute of Technology,
Bangalore. This dissertation work is submitted in partial fulfillment of the requirements for the
award of Degree of Bachelor of Engineering in Computer Science and Engineering during the
academic year 2023-2024. Further, the matter embodied in the project report has not been
submitted previously by anybody for the award of any degree or diploma to any university.
Place: Bangalore
Date:
Team members:
ABSTRACT
The project aims to address the challenge of enhancing image resolution through the application
of advanced deep learning techniques. Image resolution enhancement is a critical task in various
domains, including medical imaging, satellite imagery, surveillance, and photography.
Traditional methods often suffer from limitations in handling complex patterns and generating
high-quality results. In this project, we propose a novel approach leveraging SR3 (Super-Resolution via Repeated Refinement) for image resolution enhancement.
ACKNOWLEDGEMENT
We take this opportunity to express our sincere gratitude and respect to CMR Institute of Technology, Bengaluru for providing us a platform to pursue our studies and carry out our final year project.
We have great pleasure in expressing our deep sense of gratitude to Dr. Sanjay Jain, Principal, CMRIT, Bangalore, for his constant encouragement.
We would like to thank Dr. Shreekanth M Prabhu, HOD, Department of Computer Science and Engineering, CMRIT, Bangalore, who has been a constant support and encouragement throughout the course of this project.
We consider it a privilege and honor to express our sincere gratitude to our guide Prof. Paramita Mitra, Assistant Professor, Department of Computer Science and Engineering, for her valuable guidance throughout the tenure of this project.
We also extend our thanks to all the faculty of Computer Science and Engineering who directly or indirectly encouraged us.
Finally, we would like to thank our parents and friends for all the moral support they have given us during the completion of this work.
TABLE OF CONTENTS
Page No.
Certificate ii
Declaration iii
Abstract iv
Acknowledgement v
Table of contents vi-vii
List of Figures viii
List of Tables ix
List of Abbreviations x
1 INTRODUCTION 1-4
1.1 Relevance of the Project 1
1.2 Problem Statement 1
1.3 Objectives 2
1.4 Scope of the project 2
1.5 Software Engineering Methodology 2-3
1.6 Tools and Technologies 3-4
1.7 Chapter Wise Summary 4
2.5 High-Resolution Image Synthesis and Semantic Manipulation with 8-9
Conditional GANs
2.6 Convolutional Sparse Coding for Compressed Sensing CT 9-10
Reconstruction
2.7 A Variational Auto-Encoder Approach for Image Transmission in 10-11
Noisy Channel
2.8 A Comparative Study on Variational Autoencoder and Generative 11-12
Adversarial Networks
2.9 Research Gap / Market Analysis 13-14
3 PROBLEM FORMULATION 15
REFERENCES 17
LIST OF FIGURES
Page No.
Fig 1.1 Software Engineering Methodology Model 3
LIST OF TABLES
Page No.
Table 2.1 Comparison of different approaches 13
Table 4.1 Schedule of project 16
LIST OF ABBREVIATIONS
CT Computed Tomography
CNN Convolutional Neural Network
DDPM Denoising Diffusion Probabilistic Model
GANs Generative Adversarial Networks
PSNR Peak Signal-to-Noise Ratio
RNNs Recurrent Neural Networks
SR3 Super Resolution Via Repeated Refinement
SSI Structural Similarity Index
VAEs Variational Autoencoders
CHAPTER 1
INTRODUCTION
In the realm of computer vision and image processing, the demand for high-resolution
images continues to surge across various domains, including medical imaging, satellite
observations, and surveillance systems. This project introduces a novel approach to Image
Resolution Enhancement through the implementation of SR3 (Super-Resolution via Repeated Refinement) modeling.
Super-Resolution (SR) techniques aim to reconstruct high-resolution images from their
low-resolution counterparts, offering a solution to enhance visual quality and extract finer
details. The SR3 technique employed in this project combines a diffusion-based denoising formulation with iterative refinement to achieve superior image resolution.
1.3 Objectives
• Develop a State-of-the-Art Super Resolution Technique: Create an innovative
image super-resolution method that leverages diffusion models to significantly
enhance image quality and detail.
• Human Perception Testing: Conduct rigorous human evaluation tests to validate
the perceptual quality and realism of super-resolved images generated by the
diffusion model.
• User-Friendly Implementation: Develop user-friendly interfaces or frameworks
to make the diffusion model accessible and practical for a wider range of
applications.
• Achieve High-Quality Results: Aim to produce super-resolved images that
exhibit superior visual quality, closely resembling high-resolution ground truth
images.
Fig 1.1: Software Engineering Methodology Model (requirements gathering, dataset retrieval, training, optimization, monitoring and documentation)
• OpenCV: For image and video processing tasks, including face detection and video capture (a minimal usage sketch appears after this list).
➢ Data Collection and Annotation Tools:
• Video surveillance hardware or a surveillance camera.
➢ Data Preprocessing:
• Image and video preprocessing libraries for tasks like resizing, normalization, and
augmentation.
➢ PyCharm.
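The following is a minimal sketch, assuming OpenCV is installed, of how face detection and basic preprocessing (resizing and normalization) might be wired together; the file name, target size, and cascade choice are illustrative assumptions rather than the project's actual pipeline.

import cv2

INPUT_PATH = "sample_frame.jpg"   # hypothetical frame grabbed from a surveillance camera
TARGET_SIZE = (128, 128)          # illustrative size for a low-resolution face crop

# Haar cascade bundled with OpenCV for frontal face detection.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread(INPUT_PATH)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces, then crop, resize, and normalize each one for the model.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    crop = cv2.resize(image[y:y + h, x:x + w], TARGET_SIZE,
                      interpolation=cv2.INTER_CUBIC)
    normalized = crop.astype("float32") / 255.0   # scale pixel values to [0, 1]
    # `normalized` would then be fed to the super-resolution / recognition model.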
CHAPTER 2
LITERATURE SURVEY
The literature survey on the SR3 model for image resolution enhancement reveals a foundational paper introducing iterative refinement with denoising diffusion models. Researchers have explored architectural innovations, emphasizing
the importance of curated datasets and optimal training strategies. Quantitative metrics
such as PSNR and SSI are commonly used for performance evaluation, and real-world
applications in medical imaging, satellite observations, and other domains have been
investigated. Studies also focus on interpretability and visualization of SR3 models,
highlighting ongoing research on attention mechanisms, multi-scale architectures, and
adversarial training. Challenges related to computational complexity, generalization, and
real-time deployment are addressed as future directions. The literature reflects a dynamic
field with continuous advancements in super-resolution techniques.
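To make these metrics concrete, the short sketch below computes PSNR between a reference image and its reconstruction using NumPy; the random arrays are placeholders for real image data, and this is only an illustrative helper, not code from the project.

import numpy as np

def psnr(reference, reconstructed, max_val=255.0):
    # Peak Signal-to-Noise Ratio (in dB) between two images of the same shape.
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# Example with random 8-bit images standing in for ground truth and a super-resolved output.
rng = np.random.default_rng(0)
hr = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
sr = np.clip(hr.astype(np.int16) + rng.integers(-5, 6, size=hr.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(hr, sr):.2f} dB")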
2.1 Overview
Sources:
• Google
• IEEE Xplore
• Springer
• Elsevier
• Google Scholar
Keywords used for the search: Image super-resolution, diffusion models, deep generative
models, image-to-image translation, denoising process, iterative methods, face
recognition
Advantages
1. The iterative nature of the approach may enable the model to capture finer details
in the images over successive iterations, resulting in a more accurate and detailed
reconstruction.
2. Iterative refinement methods may enhance the robustness of the super-resolution
model by mitigating noise and artifacts present in low-resolution images through
successive improvements.
3. The model may adapt and learn from its own previous iterations, allowing it to
refine its predictions based on the feedback and information gained during each
iteration.
Disadvantages
1. Iterative refinement approaches can be computationally intensive, requiring
multiple passes through the network for each image. This may lead to increased
computational time and resource requirements.
2. Training models with iterative refinement might be more complex compared to
single-pass models. It may involve additional challenges related to convergence,
stability, and tuning hyperparameters for multiple stages.
3. The interpretation of the learning process and feature extraction in each iteration
may be challenging, making it harder to understand and explain the decisions
made by the model.
The integration of dense nested attention mechanisms facilitates the model's ability to capture both global
context and intricate local features, enabling it to discern subtle target signatures against
challenging backgrounds.
Advantages
1. The proposed Dense Nested Attention Network may lead to improved accuracy in
detecting small targets in infrared imagery, thanks to the integration of advanced
attention mechanisms that capture both global and local context.
2. The dense nested attention mechanisms can contribute to a more effective
representation of features, allowing the network to discern subtle details of small
targets against complex backgrounds, leading to better discrimination.
3. If the proposed Dense Nested Attention Network is computationally efficient, it
could be advantageous for real-time applications, where quick and accurate small
target detection is crucial.
Disadvantages
1. If the Dense Nested Attention Network has a high computational cost, it might
limit its practicality, especially in real-time applications or scenarios with resource
constraints.
2. The iterative and complex nature of attention mechanisms could potentially lead
to overfitting, where the model memorizes details from the training data but
struggles to generalize well to new, unseen infrared images.
3. The success of the iterative refinement process may be sensitive to the quality of
the initializations. If the model's performance is highly dependent on the initial
estimates, it could be a limitation.
Advantages
1. Deep convolutional neural networks are known for their ability to learn complex
mappings, enabling more accurate reconstructions in inverse problems. The paper
may demonstrate improved accuracy in reconstructing images or information from
noisy or incomplete data.
2. A well-designed deep learning model can often generalize well to unseen data. If
the paper proposes a model that performs well across a variety of imaging tasks
and datasets, it would be considered an advantage.
3. Deep learning models can automatically learn relevant features from data,
reducing the need for manually designed algorithms. This can be advantageous in
situations where the underlying mathematical model of the inverse problem is
complex or not well understood.
Disadvantages
1. Deep learning models, especially deep convolutional neural networks, can be
computationally intensive. The paper might face criticism if it does not adequately
address concerns about the computational complexity of the proposed method,
especially in scenarios where computational resources are limited.
2. Deep learning models often require large amounts of labelled training data to
generalize well. If the paper suffers from a lack of diverse and representative
training data, the model's performance might be limited in real-world applications.
3. Deep models are susceptible to overfitting, where the model performs well on the
training data but fails to generalize to new, unseen data. The paper may face
criticism if it does not adequately address or mitigate overfitting issues.
Advantages
1. The paper may introduce a novel architecture or training strategy that enhances
the fidelity of generated images, producing results with higher resolution and
visual quality compared to existing methods.
2. Leveraging conditional GANs allows for the generation of images based on
specific conditions or attributes. This capability enables more controlled and
customizable image synthesis, addressing the needs of various applications
requiring specific visual characteristics.
3. If the paper focuses on semantic manipulation, it could provide a method for
precise control over specific features or aspects of the generated images. This
fine-grained control is valuable in applications where users need to modify or
customize certain visual elements.
Disadvantages
1. GANs are susceptible to mode collapse, where the generator produces limited
varieties of samples, failing to capture the full diversity of the target distribution.
If the proposed model is prone to mode collapse, it could limit the range of
generated images.
2. GAN training is known for being sensitive and prone to instability. If the paper
does not address or mitigate training challenges, such as oscillations or divergence
issues, it may hinder the practical applicability of the proposed model.
3. Generating high-resolution images with complex models can be computationally
intensive, requiring substantial resources in terms of memory and processing
power. This could limit the accessibility of the proposed approach, particularly for
users with limited computational resources.
Traditional sparse coding represents images as sparse combinations of basis functions, and convolutional sparse coding extends this idea by incorporating local spatial
relationships through convolutional operations. The paper explores how convolutional
sparse coding methods can be tailored to the specific challenges of CT image
reconstruction from sparse data.
Advantages
1. Compressed sensing techniques aim to reconstruct images from a reduced set of
acquired data, potentially leading to a lower radiation dose for patients undergoing
CT scans. If the paper successfully demonstrates a reduction in radiation exposure
without compromising image quality, it would be a significant advantage.
2. Convolutional sparse coding methods may capture local spatial relationships more
effectively than traditional sparse coding approaches. This could lead to improved
image quality, reduced artifacts, and better preservation of fine details in the
reconstructed CT images.
3. Convolutional sparse coding, by incorporating local spatial relationships, may
contribute to achieving higher spatial resolution in reconstructed CT images.
Higher resolution is crucial for accurate diagnosis and better visualization of
anatomical structures.
Disadvantages
1. Convolutional sparse coding methods, especially when integrated into complex
algorithms or neural network architectures, can be computationally intensive. This
might pose challenges for real-time applications or environments with limited
computational resources.
2. Convolutional neural networks and sparse coding models often require large
amounts of training data to generalize well. If the proposed method demands
extensive training datasets, it could be a limitation in scenarios where such data
are scarce or difficult to obtain.
3. Convolutional sparse coding models might have numerous hyperparameters that
require careful tuning for optimal performance. Finding the right balance between
model complexity and generalization can be a challenging task.
The paper focuses on leveraging variational autoencoders for the transmission of images
over a noisy communication channel. Variational autoencoders are a type of generative
model that aims to learn a probabilistic representation of input data, and they have been
widely used in image processing and compression tasks. The paper discusses how the
variational autoencoder is structured and trained to encode images into a latent space and
decode them back to the original form, emphasizing its ability to handle noisy channel
conditions.
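As a minimal sketch of this idea (not the architecture from the paper), the code below defines a tiny fully connected VAE in PyTorch and simulates a noisy channel by adding Gaussian noise to the latent code before decoding; the layer sizes, image dimensions, and noise level are illustrative assumptions.

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    # Minimal VAE-style encoder/decoder for flattened grayscale images.
    def __init__(self, image_dim=28 * 28, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(image_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, image_dim), nn.Sigmoid())

    def forward(self, x, channel_noise_std=0.0):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        # Simulate an additive-noise transmission channel acting on the latent code.
        z_received = z + channel_noise_std * torch.randn_like(z)
        return self.decoder(z_received), mu, logvar

# Usage with a random batch standing in for real images.
model = TinyVAE()
batch = torch.rand(4, 28 * 28)
reconstruction, mu, logvar = model(batch, channel_noise_std=0.1)
print(reconstruction.shape)   # torch.Size([4, 784])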
Advantages
1. VAEs are known for their ability to generate data with inherent noise robustness.
The probabilistic nature of VAEs allows them to handle noise in the transmission
channel more effectively, resulting in improved image reconstruction under noisy
conditions.
2. VAEs encode images into a latent space, which often captures meaningful and
compact representations of the input data. This can lead to efficient transmission
as the information is concentrated in a lower-dimensional space.
3. VAEs are generative models, meaning they can generate new samples from the
learned latent space. This generative capability can be advantageous in scenarios
where reconstructed images need to be generated from partial or degraded data
received in a noisy channel.
Disadvantages
1. Although VAEs can generate samples from the learned latent space, the quality of
generated images may not always match the quality of the input images. This
could be a limitation in scenarios where high-fidelity image reconstruction is
crucial.
2. The latent space representation learned by VAEs might lack interpretability.
Understanding the significance of specific dimensions in the latent space may be
challenging, impacting the model's transparency and explainability.
3. The effectiveness of VAEs in handling noise may depend on the type and
characteristics of the noise. The model may not generalize well to certain types of
noise patterns, limiting its robustness.
The paper explores and contrasts two prominent generative models: variational
autoencoders (VAEs) and generative adversarial networks (GANs). Both VAEs and
GANs are popular frameworks in the field of deep learning for generating realistic data,
such as images, and their comparative analysis provides valuable insights into their
strengths and weaknesses. The paper highlights the significance of generative models in
various applications, including image synthesis, data augmentation, and generative tasks.
Advantages
1. The paper aids researchers and practitioners in making informed decisions about
selecting the appropriate generative model for their specific tasks. Understanding
the advantages and disadvantages of both VAEs and GANs can guide the choice
based on the requirements of the application.
2. A comparative study offers a comprehensive overview of the strengths and
weaknesses of VAEs and GANs. This can serve as a valuable resource for readers
seeking a deeper understanding of these generative models.
3. The paper may provide insights into the architectural differences between VAEs
and GANs, explaining how each model operates and generates realistic data. This
knowledge can be beneficial for researchers aiming to design or modify
generative models.
Disadvantages
1. Comparative studies can be sensitive to the choice of datasets, hyperparameters,
and evaluation metrics. Small variations in these factors might lead to different
conclusions. It's essential for authors to thoroughly detail their experimental setup
to enhance the study's reproducibility.
2. Findings from a comparative study may be specific to the datasets and tasks
chosen for evaluation. The study's generalizability to different domains or
applications might be limited, and this limitation should be acknowledged.
3. Both VAEs and GANs can be sensitive to hyperparameter tuning. The study might not capture the full range of each model's performance if certain hyperparameter configurations are not explored.
Analyzing research gaps and conducting a market analysis for image resolution involve identifying areas where current research or market offerings fall short of meeting specific needs or expectations.
Research Gaps-
CHAPTER 3
PROBLEM FORMULATION
This project aims to address the challenge of enhancing low-resolution images by harnessing the power of diffusion-based probabilistic models. The primary issue at hand is the limitation of low-resolution images, which lack the fine details, sharpness, and clarity necessary for various applications, such as medical imaging, surveillance, entertainment, and remote sensing. This problem arises from the inherent constraints of image sensors, hardware, or transmission systems, resulting in images with reduced visual fidelity. The project therefore revolves around the development of a novel framework that can produce high-resolution images from low-resolution inputs, utilizing the principles of diffusion and denoising.
The goal of this project is to use SR3 (Super-Resolution via Repeated Refinement), a new
approach to conditional image generation, inspired by recent work on Denoising
Diffusion Probabilistic Models (DDPM) and denoising score matching. SR3 works by
learning to transform a standard normal distribution into an empirical data distribution
through a sequence of refinement steps. The key is a U-Net architecture that is trained
with a denoising objective to iteratively remove various levels of noise from an image.
We adapt DDPMs to image-to-image translation by proposing a simple effective
modification to the U-Net architecture. In contrast to GANs, which require inner-loop
maximization, we minimize a well-defined loss function. Unlike autoregressive models,
SR3 uses a constant number of inference steps regardless of output resolution. SR3
models work well across a range of magnification factors and input resolutions.
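A schematic sketch of this refinement loop is given below; it is not the project's implementation: the denoiser is a trivial stand-in for the conditional U-Net, and the linear noise schedule, step count, and tensor shapes are illustrative assumptions.

import torch
import torch.nn as nn

class DenoiserStub(nn.Module):
    # Stand-in for the conditional U-Net; a real SR3 model predicts the noise in y_t
    # given the timestep and the (upsampled) low-resolution conditioning image x_lr.
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, y_t, x_lr, t):
        return self.net(torch.cat([y_t, x_lr], dim=1))

def sr3_refinement(model, x_lr, steps=50):
    # DDPM-style ancestral sampling: start from Gaussian noise and iteratively denoise,
    # conditioned on the low-resolution input, using a simple linear beta schedule.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    y = torch.randn_like(x_lr)                       # y_T ~ N(0, I)
    for t in reversed(range(steps)):
        eps_hat = model(y, x_lr, t)                  # predicted noise at step t
        mean = (y - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps_hat) / torch.sqrt(alphas[t])
        noise = torch.randn_like(y) if t > 0 else torch.zeros_like(y)
        y = mean + torch.sqrt(betas[t]) * noise      # sample y_{t-1}
    return y

# Usage with random tensors standing in for a bicubically upsampled low-resolution face.
model = DenoiserStub()
x_lr_upsampled = torch.rand(1, 3, 64, 64)
sr_output = sr3_refinement(model, x_lr_upsampled, steps=10)
print(sr_output.shape)   # torch.Size([1, 3, 64, 64])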
CHAPTER 4
STATUS AND ROADMAP
To identify research gaps and user demands, a comprehensive literature study and market analysis were conducted before beginning the implementation of SR3 for enhancing low-resolution facial images.
Comprehensive testing on standard datasets will be used to measure the average accuracy boost in facial recognition compared to traditional upscaling methods.
The roadmap outlines important stages including requirement analysis, system design,
testing, deployment, optimization, monitoring, and documentation. Implementation
entails creating the application and incorporating the optimization method. Testing and
optimization phases aim for high accuracy, effective accessibility, and real-time
performance. The deployment phase includes a pilot release, iterative refinements, and
comprehensive documentation.
REFERENCES
[1] Image Super-Resolution via Iterative Refinement, Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi, IEEE, 2023
[2] Dense Nested Attention Network for Infrared Small Target Detection, Boyang Li, Chao Xiao, Longguang Wang, Yingqian Wang, Zaiping Lin, Miao Li, Wei An, and Yulan Guo, IEEE, 2023
[3] Deep Convolutional Neural Network for Inverse Problems in Imaging, Kyong Hwan Jin, Michael T. McCann, Emmanuel Froustey, and Michael Unser, IEEE, 2017
[4] High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs, Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro, NVIDIA Corporation and UC Berkeley, 2018
[5] Convolutional Sparse Coding for Compressed Sensing CT Reconstruction, Peng Bao et al., IEEE, 2019
[6] A Variational Auto-Encoder Approach for Image Transmission in Noisy Channel, Amir Hossein Estiri, Ali Banaei, Benyamin Jamialahmadi, and Mahdi Jafari Siavoshani, 2021
[7] A Comparative Study on Variational Autoencoder and Generative Adversarial Networks, Mirza Sami and Iftekharul Mobin, 2019