Computer Vision
Submitted by:
Harsh Chaudhary
(21CS38/220089020045)
2024
Candidate’s Declaration
I hereby declare that the seminar report titled “Computer Vision” was prepared
by me based on available literature, and that I have not submitted it anywhere
else for the award of any other degree or diploma.
I certify that the above statement made by the candidate is true to the best of my
knowledge.
Acknowledgement
Harsh Chaudhary
(21CS38/220089020045)
Table of Contents
Candidate’s Declaration
Acknowledgement
1 Introduction
1.1 Overview of Computer Vision
1.2 Importance of Computer Vision
2 Fundamentals of Computer Vision
2.1 Image Processing Basics
2.2 Key Concepts
3 Core Architectures used in Computer Vision
3.1 CNN
3.2 YOLO
4 Applications of Computer Vision
5 Challenges & Limitations of Computer Vision
6 Conclusion
7 References
1 Introduction
Computer Vision is a field of Artificial Intelligence (AI) that enables machines to
interpret, analyse, and make decisions based on visual data from the world, such
as images and videos. It involves techniques for tasks like image recognition,
object detection, image segmentation, and video analysis, aiming to replicate the
capabilities of human vision. Applications include facial recognition, autonomous
vehicles, medical imaging, and augmented reality.
It relies on deep learning models such as Convolutional Neural Networks
(CNNs) and on advanced hardware like GPUs and edge devices for real-time
processing. Despite its potential, challenges such as handling varying lighting
conditions, angles, and occlusions persist. However, with continuous
advancements in AI and computing power, Computer Vision is expanding into
robotics, environmental monitoring, and personalized technologies, shaping the
future of automation and intelligent systems.
Medical imaging and diagnosis benefit from the application of computer vision.
By examining medical images such as X-rays, CT scans, and MRIs, computer
vision algorithms can assist radiologists in detecting anomalies, tumors, and
other problems in the human body. Computer vision not only improves diagnostic
accuracy but also permits early diagnosis and quick treatment, which has the
potential to save lives. It also supports surgical procedures by offering
real-time guidance and analysis, which in turn improves surgical accuracy and
patient outcomes.
Autonomous vehicles rely on computer vision to perceive and evaluate the
environment in which they are operating. It enables them to make informed
decisions, handle difficult situations successfully, and contribute to
improved road safety.
3.1 CNN
Convolutional Neural Networks (CNNs) are a class of deep learning models
primarily used for processing and analyzing visual data, such as images and
videos. CNNs are designed to automatically and adaptively learn spatial
hierarchies of features from the input data by applying convolutional operations.
These networks consist of several layers, each performing specific tasks to extract
increasingly complex features. The core components of CNNs include
convolutional layers, pooling layers, and fully connected layers. In a
convolutional layer, filters (or kernels) slide over the input data, performing
element-wise multiplication and summing the results to produce feature maps.
This operation helps the network learn local patterns such as edges, textures, or
shapes, which are essential for recognizing objects in images.
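
The sliding-filter operation described above can be illustrated with a minimal sketch in plain Python. The function names (conv2d_valid, max_pool2d), the toy 6x6 image, and the vertical-edge kernel are assumptions chosen for illustration and do not come from any particular library; real CNNs learn their kernel values during training and use optimized tensor libraries rather than nested loops.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the patch under it, then sum.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Toy 6x6 image (hypothetical data): dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-written vertical-edge kernel; a trained CNN would learn such values.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

features = conv2d_valid(image, vertical_edge)  # 4x4 feature map
pooled = max_pool2d(features)                  # 2x2 after pooling
```

In this toy case the feature map responds only where the window straddles the dark-to-bright boundary, which is the kind of local edge pattern the paragraph refers to. The pooling step then shrinks the map while keeping the strongest responses.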
3.2 YOLO
YOLO (You Only Look Once) is a groundbreaking real-time object detection system that has
significantly transformed how we approach detecting objects in images and videos. Introduced
by Joseph Redmon and colleagues, YOLO streamlines the object detection process by treating
it as a single regression problem, predicting bounding boxes and class probabilities in one step.
This approach contrasts with two-stage object detection methods, such as R-CNN or Faster
R-CNN, which first generate candidate regions (Faster R-CNN uses a region proposal network
for this) and then classify and localize each one. YOLO’s single-network architecture
eliminates the need for a multi-stage pipeline, making it faster and more efficient.
One of YOLO’s core advantages is its speed. By processing an image in a single neural network
pass, YOLO achieves real-time performance, enabling applications where rapid decision-
making is crucial. For example, it is widely used in autonomous driving, surveillance systems,
and robotics, where quick object detection is essential to ensure safety and operational
efficiency. YOLO divides an input image into a grid and predicts bounding boxes and class
probabilities for each cell, allowing it to detect multiple objects simultaneously with high
accuracy.
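
As a rough illustration of the grid idea, the sketch below finds the grid cell that contains an object's centre and counts how many numbers the network would output per image. The function names are hypothetical. The default values (a 7x7 grid, 2 boxes per cell, 20 classes) and the 448x448 image are assumptions taken from the configuration commonly described for the first YOLO version; this report does not state them.

```python
def responsible_cell(center_x, center_y, image_w, image_h, grid_size=7):
    """Return (row, col) of the grid cell containing the object's centre point."""
    # Scale pixel coordinates to grid units; clamp so the far edge stays in range.
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row, col


def output_values_per_image(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Count the numbers predicted in one forward pass.

    Each box is assumed to carry 5 values (x, y, width, height, confidence),
    and each cell additionally predicts one probability per class.
    """
    return grid_size * grid_size * (boxes_per_cell * 5 + num_classes)


# Hypothetical 448x448 image with an object centred at pixel (224, 100).
cell = responsible_cell(224, 100, 448, 448)
total_outputs = output_values_per_image()
```

Because every cell's predictions come out of the same forward pass, objects in different parts of the image are detected together rather than one region at a time.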
Over the years, YOLO has undergone several iterations, such as YOLOv2, YOLOv3, and more
recent versions like YOLOv4 and YOLOv5, each introducing enhancements in accuracy,
speed, and computational efficiency. These versions optimize anchor boxes, loss functions, and
architecture to improve detection capabilities for small objects, reduce false positives, and
increase compatibility with edge devices. YOLO’s adaptability to various hardware
environments, from GPUs to mobile devices, has made it a preferred choice for researchers
and practitioners alike.
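
Anchor boxes and false positives both depend on measuring how well two bounding boxes overlap. A common measure is Intersection over Union (IoU). The report does not name it, so the sketch below is an illustrative assumption and not a description of any specific YOLO version. The function name and the (x_min, y_min, x_max, y_max) box format are also assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two hypothetical 2x2 boxes overlapping in a 1x1 square.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

An IoU near 1 means two boxes describe almost the same region, while a value of 0 means they do not overlap at all. Detectors typically compare this score against a threshold when matching predictions to ground-truth boxes.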
CT Scan and MRI: Computer vision is now widely applied in CT scan and
MRI analysis. AI systems built on computer vision analyse radiology
images with accuracy that can approach that of a human doctor and reduce
the time needed to detect disease, improving the chances of saving a
patient's life. Deep learning algorithms are also used to enhance the
resolution of MRI images, which can improve patient outcomes.
Self-driving cars: Computer vision is widely used in self-driving cars. It
is used to detect and classify objects (e.g., road signs or traffic lights),
build 3D maps, and estimate motion, and it plays a key role in making
autonomous vehicles a reality.
Optical character recognition (OCR): OCR helps us detect and extract
printed or handwritten text from visual data such as images. It also
enables us to extract text from documents like invoices, bills, and
articles and to verify it against databases.
Fingerprint recognition and Biometrics: Computer vision technology is
used to detect fingerprints and other biometrics to validate a user's
identity. Biometrics is the measurement or analysis of the physiological
characteristics that make a person unique, such as the face, fingerprints,
and iris patterns. It combines computer vision with knowledge of human
physiology and behaviour.
Lastly, ethical concerns like privacy invasion and the potential for misuse of
surveillance technologies pose societal challenges. As computer vision becomes
more integrated into daily life, addressing these limitations is crucial for building
robust, fair, and reliable systems that align with ethical and practical standards.
6 Conclusion
Computer vision has emerged as a pivotal technology that enables machines to
interpret and interact with the visual world, bringing significant advancements
across various industries. From healthcare diagnostics and autonomous vehicles
to retail and security, its applications have transformed traditional practices and
opened new possibilities for innovation. By leveraging advancements in machine
learning and deep learning, computer vision systems now achieve remarkable
accuracy in tasks such as object detection, image recognition, and video analysis,
proving to be indispensable in solving real-world challenges. Despite its
impressive achievements, computer vision faces limitations that must be
addressed for broader adoption and reliability. Issues like data dependency,
computational requirements, and generalization across domains pose significant
hurdles. Moreover, ethical and security considerations, such as privacy
concerns and vulnerability to adversarial attacks, emphasize the need for
responsible development. Continued research and collaboration between academia and
industry are essential to overcoming these challenges and building robust, fair,
and secure systems.