Computer Vision

A Seminar Report Submitted in partial fulfilment for the award of
Degree in Bachelor of Technology
IN
Computer Science & Information Technology

Submitted by:
Harsh Chaudhary
(21CS38/220089020045)

Under Supervision of:
Dr. Anil Bisht

Department of Computer Science & Information Technology
Faculty of Engineering & Technology
M.J.P Rohilkhand University, Bareilly

2024

Candidate’s Declaration

I hereby declare that the seminar report titled “Computer Vision” was prepared by
me based on available literature, and that I have not submitted it anywhere else
for the award of any other degree or diploma.

Date: Harsh Chaudhary


(21CS38)

Certificate from Supervisor

I certify that the above statement made by the candidate is true to the best of my
knowledge.

Date: Dr. Anil Bisht


Acknowledgement

First and foremost, I wish to express my sincere thanks and gratitude
to my esteemed mentor, Dr. Anil Bisht, who contributed so much to the
successful completion of my seminar report through his thoughtful
reviews and valuable guidance.

I extend my heartfelt appreciation to all the faculty members of the
Department of Computer Science & Information Technology who
have encouraged me throughout this journey and provided their
insightful suggestions.

Signature of the Student

Harsh Chaudhary
(21CS38/220089020045)

Table of Contents
Candidate’s Declaration
Acknowledgement
1 Introduction
1.1 Overview of Computer Vision
1.2 Importance of Computer Vision
2 Fundamentals of Computer Vision
2.1 Image Processing Basics
2.2 Key Concepts
3 Core Architectures used in Computer Vision
3.1 CNN
3.2 YOLO
4 Applications of Computer Vision
5 Challenges & Limitations of Computer Vision
6 Conclusion
7 References

1 Introduction
Computer Vision is a field of Artificial Intelligence (AI) that enables machines to
interpret, analyse, and make decisions based on visual data from the world, such
as images and videos. It involves techniques for tasks like image recognition,
object detection, image segmentation, and video analysis, aiming to replicate the
capabilities of human vision. Applications include facial recognition, autonomous
vehicles, medical imaging, and augmented reality.

Figure 1: Computer Vision vs Human Vision [1]

1.1 Overview of Computer Vision


Computer Vision is a field of Artificial Intelligence (AI) that empowers machines
to interpret and analyze visual data, such as images and videos, to extract
meaningful information and make decisions. It involves techniques like image
processing, feature extraction, object recognition, and image segmentation to
enable applications in diverse areas. Key uses include medical imaging for
diagnostics, autonomous vehicles for navigation, facial recognition for security,
and augmented reality for enhanced user experiences. Modern Computer Vision
heavily relies on deep learning, particularly Convolutional Neural Networks
(CNNs), and advanced hardware like GPUs and edge devices for real-time
processing. Despite its potential, challenges such as handling varying lighting
conditions, angles, and occlusions persist. However, with continuous
advancements in AI and computing power, Computer Vision is expanding into
robotics, environmental monitoring, and personalized technologies, shaping the
future of automation and intelligent systems.

1.2 Importance of Computer Vision


Computer vision is an essential component of many real-world applications and
brings considerable benefits to a wide range of business sectors. Its significance
derives from the fact that it enables machines and systems to comprehend and
interpret visual input in a manner analogous to human vision and perception.
This capability has transformative consequences in a variety of fields including,
but not limited to, healthcare, autonomous vehicles, manufacturing, surveillance,
and augmented reality.

Medical imaging and diagnosis both benefit from computer vision. By examining
medical images such as X-rays, CT scans, and MRIs, computer vision algorithms
can assist radiologists in detecting anomalies, tumors, and other problems in the
human body. Computer vision not only improves diagnostic accuracy but also
permits early diagnosis and quick treatment, which has the potential to save lives.
It can also support surgical procedures by offering real-time guidance and
analysis, which in turn improves surgical accuracy and patient outcomes.

Computer vision is central to the perception and object identification
capabilities of autonomous vehicles. Computer vision algorithms examine sensor
data obtained from cameras, LiDAR, and radar systems in order to detect and
categorize objects, recognize lane markings and traffic signs, and evaluate the
environment in which the vehicle is operating. This enables autonomous vehicles
to make informed decisions, handle difficult situations, and contribute to
improved road safety.

Computer vision is becoming increasingly important in the manufacturing
industry, particularly in quality control and process optimization. Computer
vision systems inspect products and components using high-speed cameras and
complex algorithms, which allows them to detect flaws, maintain product
consistency, and improve production efficiency. This helps cut down on waste,
boost product quality, and streamline production processes.

Integrating computer vision into surveillance systems is also highly beneficial.
By analyzing surveillance camera footage, computer vision algorithms can detect
and track objects, recognize suspicious behavior, and raise notifications in real
time. This improves security measures, contributes to public safety, and supports
the investigative efforts of law enforcement authorities.

Augmented reality (AR) combines virtual elements with the real world with the
help of computer vision. By comprehending and interpreting the visual
environment, computer vision algorithms enable AR systems to accurately
superimpose virtual objects onto a real-world scene. The development of
computer vision has opened the door to more immersive experiences, interactive
games, and practical applications such as architectural visualization, interior
design, and remote collaboration.

The application of computer vision has repercussions for society that go beyond
the scope of these particular disciplines.

2 Fundamentals of Computer Vision


The fundamentals of Computer Vision focus on enabling machines to interpret
and analyze visual data effectively. The process begins with image acquisition,
where visual inputs are collected using cameras or sensors. These images undergo
preprocessing, such as filtering, resizing, and noise reduction, to enhance quality
and prepare them for further analysis. Feature extraction follows, identifying crucial
patterns like edges, textures, and shapes that carry meaningful information. Core
tasks like object detection and recognition involve locating and classifying
objects within images or videos, while image segmentation divides an image into
distinct regions to isolate objects or areas of interest. Additionally, motion and
tracking analysis help monitor changes in visual data over time, enabling the
tracking of moving objects in videos. These building blocks form the foundation
for advanced Computer Vision applications in diverse industries.
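
The segmentation and localization steps described above can be illustrated with a small sketch. It assumes the simplest possible approach, a fixed intensity threshold, which is only one of many segmentation techniques and is not prescribed by this report. The function names, the 5x5 image, and the threshold value are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: segment a tiny grayscale image with a fixed
# threshold, then locate the segmented object with a bounding box.

def threshold_segment(image, threshold):
    """Return a binary mask: 1 where the pixel is brighter than threshold."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None."""
    coords = [(r, c)
              for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 5x5 image: a bright 2x2 object on a dark background.
image = [
    [10, 12, 11, 10, 13],
    [11, 200, 210, 12, 10],
    [10, 205, 215, 11, 12],
    [12, 10, 11, 13, 10],
    [10, 11, 12, 10, 11],
]

mask = threshold_segment(image, threshold=128)
box = bounding_box(mask)
print(box)  # (1, 1, 2, 2)
```

Real systems rarely rely on a single global threshold, since lighting varies across a scene, but the sketch shows how a segmented region becomes a located object.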

2.1 Image Processing Basics


Image processing is a technique used to enhance, analyze, and manipulate visual
data, often in the form of digital images, using algorithms and mathematical
models. It begins with image acquisition, where images are captured using digital
cameras or scanners and prepared for analysis. Key steps include image
enhancement to improve visual quality by adjusting contrast, reducing noise, or
removing artifacts, and image restoration to correct degradation like blurring or
distortion. Advanced tasks like segmentation divide images into regions or
objects for specific analysis, while representation and description focus on
transforming visual data into meaningful and compact formats for computational
purposes. Image analysis extracts valuable information, such as recognizing
objects or detecting patterns, while synthesis and compression focus on
generating new images or reducing storage requirements. Images are often
represented as matrices of pixel values, where each pixel denotes intensity or
color. Image processing finds applications in areas like medical
imaging, remote sensing, multimedia, and computer vision, making it a
cornerstone of many modern technological advancements.

Figure 2: Basic steps in Image Processing [2]
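
To make the idea of an image as a matrix of pixel values concrete, the following sketch applies two of the enhancement steps mentioned above: contrast adjustment and noise reduction. It assumes linear contrast stretching and a 3x3 mean filter, which are common textbook choices rather than methods specified in this report. The function names and sample matrices are hypothetical.

```python
# Illustrative sketch: a grayscale image as a list of rows of intensities,
# enhanced by contrast stretching and smoothed by a 3x3 mean filter.

def contrast_stretch(image, new_min=0, new_max=255):
    """Linearly rescale intensities so they span [new_min, new_max]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    scale = (new_max - new_min) / (hi - lo)
    return [[round(new_min + (p - lo) * scale) for p in row] for row in image]


def mean_filter_3x3(image):
    """Replace each interior pixel by the integer mean of its 3x3
    neighbourhood. Border pixels are copied unchanged for simplicity."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = sum(window) // 9
    return out


# Hypothetical low-contrast image: intensities only span 100..140.
low_contrast = [
    [100, 110, 120],
    [110, 120, 130],
    [120, 130, 140],
]
stretched = contrast_stretch(low_contrast)

# Hypothetical noisy image: a single bright spike in a flat region.
noisy = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
denoised = mean_filter_3x3(noisy)
print(denoised[1][1])  # 20
```

After stretching, the darkest pixel maps to 0 and the brightest to 255, and the mean filter pulls the isolated spike from 100 down to 20.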

2.2 Key Concepts


The key concepts of Computer Vision revolve around enabling machines to
understand and interpret visual data effectively. Image Acquisition is the first
step, involving the collection of images or videos through sensors or cameras.
Image Processing follows, where raw visual data is preprocessed using
techniques like filtering, resizing, and noise reduction to enhance quality. Feature
Extraction focuses on identifying critical patterns such as edges, textures, or
shapes that help in analyzing images. Object Detection and Recognition play a
vital role in locating objects within images and classifying them into predefined
categories. Image Segmentation divides images into meaningful regions to
isolate objects or areas of interest. Motion Analysis and Tracking monitor changes
over time, enabling applications like gesture recognition or autonomous
navigation. 3D Vision allows the reconstruction of three-dimensional
representations from 2D data, enhancing depth perception. Finally, Deep
Learning and neural networks, particularly Convolutional Neural Networks
(CNNs), are integral to modern Computer Vision for tasks like face recognition,
semantic segmentation, and real-time object detection, ensuring accuracy and
efficiency in numerous applications.
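
Motion analysis, as described above, monitors how pixel values change over time. A minimal sketch of this idea is frame differencing, assumed here as the simplest motion cue. Practical trackers use more robust methods, and the function names, frames, and threshold below are hypothetical.

```python
# Illustrative sketch: detect motion by comparing two consecutive frames.

def frame_difference(prev_frame, next_frame, threshold=30):
    """Return a binary mask marking pixels whose intensity changed by
    more than `threshold` between the two frames."""
    return [[1 if abs(b - a) > threshold else 0
             for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(prev_frame, next_frame)]


def changed_pixels(mask):
    """List the (row, column) positions flagged in the motion mask."""
    return [(r, c)
            for r, row in enumerate(mask)
            for c, value in enumerate(row) if value]


# Hypothetical frames: one bright pixel moves one step to the right.
frame1 = [
    [0, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
frame2 = [
    [0, 0, 0, 0],
    [0, 0, 255, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

motion_mask = frame_difference(frame1, frame2)
print(changed_pixels(motion_mask))  # [(1, 1), (1, 2)]
```

Both the old and the new position of the moving pixel are flagged, which is why frame differencing is usually followed by further steps to decide where the object actually is.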

3 Core Architectures used in Computer Vision


Core architectures in Computer Vision include models specifically designed to
process, analyze, and understand visual data. Convolutional Neural Networks
(CNNs) are the foundation, excelling in tasks like image classification and feature
extraction by leveraging convolutional layers to detect spatial hierarchies. YOLO
(You Only Look Once) is a key architecture for real-time object detection,
allowing fast identification and localization of objects in images. These
architectures form the backbone of modern Computer Vision, powering
applications in various industries like healthcare, autonomous driving, and
robotics.

3.1 CNN
Convolutional Neural Networks (CNNs) are a class of deep learning models
primarily used for processing and analyzing visual data, such as images and
videos. CNNs are designed to automatically and adaptively learn spatial
hierarchies of features from the input data by applying convolutional operations.
These networks consist of several layers, each performing specific tasks to extract
increasingly complex features. The core components of CNNs include
convolutional layers, pooling layers, and fully connected layers. In a
convolutional layer, filters (or kernels) slide over the input data, performing
element-wise multiplication and summing the results to produce feature maps.
This operation helps the network learn local patterns such as edges, textures, or
shapes, which are essential for recognizing objects in images.
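
The sliding-filter operation described above can be sketched in a few lines. The example assumes no padding and a stride of 1, and it does not flip the kernel, which matches how deep learning libraries usually implement "convolution" (strictly, cross-correlation). In a trained CNN the kernel values are learned. Here a hand-written vertical-edge kernel and a hypothetical 4x6 image are used purely for illustration.

```python
# Illustrative sketch: one convolutional filter sliding over an image.

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1). At each position,
    multiply overlapping values element-wise and sum them."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(img_h - k_h + 1):
        row = []
        for c in range(img_w - k_w + 1):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k_h) for j in range(k_w))
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical image: dark left half, bright right half (a vertical edge).
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The feature map is zero over the flat regions and large where the window straddles the edge, which is the sense in which a filter "detects" a local pattern.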

Figure 3: CNN basic architecture [3]

The pooling layer, typically a max-pooling or average-pooling layer, reduces the
spatial dimensions of the feature maps, which helps in reducing the computational
cost and controlling overfitting. Finally, the fully connected layers act as
classifiers, where the high-level features learned by the convolutional and pooling
layers are used to predict the final output, such as the classification of an image
into specific categories. CNNs are highly effective in computer vision tasks like
image classification, object detection, and facial recognition due to their ability
to detect spatial hierarchies in data.
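
The pooling and classification stages can be sketched in the same spirit. The example assumes 2x2 max pooling with stride 2, a single fully connected layer, and a softmax to turn scores into class probabilities. The feature map, weights, and biases are hypothetical values, not the output of a trained network.

```python
# Illustrative sketch: max pooling followed by a fully connected classifier.
import math


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block (stride 2)."""
    height, width = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, width - 1, 2)]
            for r in range(0, height - 1, 2)]


def fully_connected(features, weights, biases):
    """Compute one score per class: dot(weights_for_class, features) + bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4x4 feature map produced by earlier layers.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]

# Flatten the pooled map and classify it into two hypothetical classes.
flat = [value for row in pooled for value in row]
weights = [[0.1, 0.0, 0.0, 0.2],
           [0.3, 0.1, 0.1, -0.2]]
biases = [0.0, 0.0]
probabilities = softmax(fully_connected(flat, weights, biases))
print(probabilities)
```

Pooling shrinks the 4x4 map to 2x2 while keeping the strongest response in each block, so the classifier sees 4 inputs instead of 16.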

3.2 YOLO
YOLO (You Only Look Once) is a real-time object detection system that has significantly
changed how objects are detected in images and videos. Introduced by Joseph Redmon and
colleagues, YOLO streamlines the object detection process by treating it as a single regression
problem, predicting bounding boxes and class probabilities in one step. This approach contrasts
with two-stage object detection methods such as R-CNN and Faster R-CNN, which first generate
region proposals (using selective search in R-CNN and a region proposal network in Faster
R-CNN) and then classify and localize each proposal. YOLO’s architecture eliminates the need
for a multi-stage pipeline, making it faster and more efficient.

One of YOLO’s core advantages is its speed. By processing an image in a single neural network
pass, YOLO achieves real-time performance, enabling applications where rapid decision-making
is crucial. For example, it is widely used in autonomous driving, surveillance systems,
and robotics, where quick object detection is essential to ensure safety and operational
efficiency. YOLO divides an input image into a grid and predicts bounding boxes and class
probabilities for each cell, allowing it to detect multiple objects simultaneously with high
accuracy.
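
The grid idea can be made concrete with a short sketch. It follows the formulation of the original YOLO paper as best understood here: the image is divided into an S x S grid, the cell containing an object's centre is responsible for that object, and each cell predicts B boxes (5 values each: x, y, width, height, confidence) plus C class probabilities. Later YOLO versions change these details. The function names and the sample object below are hypothetical.

```python
# Illustrative sketch: YOLO-style grid assignment and output size.

def responsible_cell(center_x, center_y, img_w, img_h, grid_size):
    """Return the (row, col) of the grid cell containing the object centre."""
    col = min(int(center_x / img_w * grid_size), grid_size - 1)
    row = min(int(center_y / img_h * grid_size), grid_size - 1)
    return row, col


def output_size(grid_size, boxes_per_cell, num_classes):
    """Number of values predicted for one image:
    S * S cells, each with B boxes of 5 values plus C class probabilities."""
    return grid_size * grid_size * (boxes_per_cell * 5 + num_classes)


# Settings reported in the original paper for a 20-class dataset.
S, B, C = 7, 2, 20

# Hypothetical object centred at pixel (300, 100) in a 448x448 image.
cell = responsible_cell(300, 100, 448, 448, S)
print(cell)                  # (1, 4)
print(output_size(S, B, C))  # 1470
```

With these settings each grid cell covers 64 x 64 pixels, and the whole image is described by a single 7 x 7 x 30 output, which is why one forward pass is enough.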

Over the years, YOLO has undergone several iterations, such as YOLOv2, YOLOv3, and more
recent versions like YOLOv4 and YOLOv5, each introducing enhancements in accuracy,
speed, and computational efficiency. These versions optimize anchor boxes, loss functions, and
architecture to improve detection capabilities for small objects, reduce false positives, and
increase compatibility with edge devices. YOLO’s adaptability to various hardware
environments, from GPUs to mobile devices, has made it a preferred choice for researchers
and practitioners alike.

Figure 4: Object Detection using YOLO [4]

4 Applications of Computer Vision

• X-Ray Analysis: Computer vision can be applied to medical X-ray
imaging. Although most doctors still prefer manual analysis of X-ray
images to diagnose and treat diseases, computer vision can automate X-ray
analysis with improved efficiency and accuracy. State-of-the-art image
recognition algorithms can detect patterns in an X-ray image that are too
subtle for the human eye.
• Cancer Detection: Computer vision is being applied to breast and skin
cancer detection. With image recognition, doctors can identify anomalies
by comparing cancerous and non-cancerous cells in images. With
automated cancer detection, doctors can diagnose cancer faster from an
MRI scan.

Figure 5: Skin Cancer detection using Computer Vision [5]

• CT Scan and MRI: Computer vision is now widely applied in CT scan
and MRI analysis. AI systems built on computer vision can analyse
radiology images with a high level of accuracy, comparable to that of a
human doctor, and reduce the time needed for disease detection, improving
the chances of saving a patient's life. Deep learning algorithms can also
enhance the resolution of MRI images and hence improve patient outcomes.
• Self-driving cars: Computer vision is widely used in self-driving cars. It
is used to detect and classify objects (e.g., road signs or traffic lights),
build 3D maps, and estimate motion, and it plays a key role in making
autonomous vehicles a reality.

Figure 6: Self Driving Car [6]

• Pedestrian detection: Pedestrian detection is an active area of computer
vision research and application because of its impact on the design of
pedestrian systems in smart cities. Using cameras, pedestrian detection
automatically identifies and locates pedestrians in an image or video. It
must also account for variations among pedestrians in attire, body
position, and illumination across different scenarios. Pedestrian detection
is helpful in fields such as traffic management, autonomous driving, and
transit safety.

Figure 7: Pedestrian detection system [7]

• Road Condition Monitoring & Defect detection: Computer vision has
also been applied to monitoring the condition of road infrastructure by
assessing variations in concrete and tar. A computer vision-enabled system
automatically senses pavement degradation, which increases the efficiency
of road maintenance allocation and decreases safety risks related to road
accidents. To perform road condition monitoring, CV algorithms collect
image data and then process it to build an automatic crack detection and
classification system.
• Analyzing text and barcodes (OCR): Most products today carry a
barcode on their packaging, which can be read with the help of computer
vision. The related technique of optical character recognition (OCR)
detects and extracts printed or handwritten text from visual data such as
images. It also makes it possible to extract text from documents like
invoices, bills, and articles and to verify it against databases.
• Fingerprint recognition and Biometrics: Computer vision technology is
used to detect fingerprints and other biometrics to validate a user's identity.
Biometrics is the measurement or analysis of the physiological
characteristics that make a person unique, such as the face, fingerprints,
and iris patterns. It makes use of computer vision along with knowledge of
human physiology and behaviour.

Figure 8: Fingerprint detection using Deep Learning [8]

• 3D Model building: 3D model building, or 3D modelling, is a technique
for generating a 3D digital representation of an object or surface using
software. Computer vision also plays a role here, in constructing 3D
computer models from existing objects.

5 Challenges & Limitations of Computer Vision


Computer vision, while powerful, faces several challenges and limitations that
can hinder its accuracy and scalability. One of the primary issues is data quality
and diversity. Computer vision models require large amounts of labeled data for
training, and poor-quality or biased datasets can lead to inaccurate results,
especially in diverse real-world scenarios. For instance, models trained on limited
datasets may struggle with recognizing objects in varying lighting, angles, or
environmental conditions.

Another major challenge is the computational complexity of computer vision
tasks. High-resolution image processing, real-time object detection, and video
analysis demand significant computational resources, making it difficult to
deploy these models on edge devices or in environments with limited power and
processing capabilities. Additionally, interpretability remains a concern, as the
black-box nature of deep learning models in computer vision makes it difficult to
understand why certain decisions or errors occur, which can be critical in
sensitive applications like healthcare or autonomous vehicles.

Computer vision systems are also vulnerable to adversarial attacks, where
small, imperceptible changes to an image can trick a model into misclassifying
objects. This raises concerns in security-critical applications like surveillance or
autonomous navigation. Furthermore, generalization across different domains
remains a limitation; a model trained in one environment often fails when applied
to another without further fine-tuning.
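
The effect of an adversarial perturbation can be illustrated on a deliberately tiny model. The sketch assumes a linear classifier over four input values and shifts each input by a small amount in the direction that lowers the score, in the spirit of the fast gradient sign method (for a linear model, the gradient of the score with respect to the input is simply the weight vector). The weights, bias, input, and epsilon are hypothetical. Real attacks target deep networks with far more inputs, where even smaller per-pixel changes add up.

```python
# Illustrative sketch: a small, bounded perturbation flips a linear
# classifier's decision.

def score(weights, bias, x):
    """Linear classifier score: positive means class A, negative class B."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(value):
    """Return 1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


def perturb(x, weights, epsilon):
    """Move every input by at most epsilon in the direction that
    decreases the score (opposite to the sign of its weight)."""
    return [xi - epsilon * sign(w) for xi, w in zip(x, weights)]


# Hypothetical model and input (values stand in for normalized pixels).
weights = [0.5, -0.3, 0.8, -0.6]
bias = -0.05
x = [0.5, 0.5, 0.5, 0.5]
epsilon = 0.1

x_adv = perturb(x, weights, epsilon)
print(score(weights, bias, x) > 0)      # True: original is class A
print(score(weights, bias, x_adv) > 0)  # False: perturbed input is class B
```

No input value moves by more than 0.1, yet the predicted class changes, because every small change is chosen to push the score in the same direction.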

Lastly, ethical concerns like privacy invasion and the potential for misuse of
surveillance technologies pose societal challenges. As computer vision becomes
more integrated into daily life, addressing these limitations is crucial for building
robust, fair, and reliable systems that align with ethical and practical standards.

6 Conclusion
Computer vision has emerged as a pivotal technology that enables machines to
interpret and interact with the visual world, bringing significant advancements
across various industries. From healthcare diagnostics and autonomous vehicles
to retail and security, its applications have transformed traditional practices and
opened new possibilities for innovation. By leveraging advancements in machine
learning and deep learning, computer vision systems now achieve remarkable
accuracy in tasks such as object detection, image recognition, and video analysis,
proving to be indispensable in solving real-world challenges. Despite its
impressive achievements, computer vision faces limitations that must be
addressed for broader adoption and reliability. Issues like data dependency,
computational requirements, and generalization across domains pose significant
hurdles. Moreover, ethical considerations such as privacy concerns and
vulnerability to adversarial attacks emphasize the need for responsible
development. Continued research and collaboration between academia and
industry are essential to overcoming these challenges and building robust, fair,
and secure systems.

Looking ahead, the future of computer vision is promising, with advancements in
AI, hardware optimization, and ethical frameworks paving the way for more
efficient and inclusive solutions. As technology evolves, computer vision will
continue to play a crucial role in shaping smarter cities, enhancing healthcare
outcomes, and improving everyday experiences. Its potential to impact society
positively is vast, making it an exciting field with boundless opportunities for
innovation and growth.

7 References
[1] Qtravel.ai https://www.qtravel.ai/wpcontent/uploads/2024/01/HumanVisionvsComputerVisionEN.png

[2] A Comparative Review of Various Approaches for Plant Disease Detection - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Basic-steps-of-Image-Processing-Techniques_fig1_342379058

[3] Medium.com https://miro.medium.com/

[4] Medium.com https://miro.medium.com/

[5] Nature.com https://media.springernature.com

[6] Vesttech.com https://vesttech.com/wp-content/uploads/2017/06/dangers-of-self-driving-cars.jpg

[7] Veise.com https://image.made-in-china.com/

[8] Aratek.com https://cdn.prod.website-files.com/