Multimedia and Computer Vision unit 5
Multimedia and Computer Vision unit 5
The entire process involves image acquiring, screening, analyzing, identifying, and extracting
information. This extensive processing helps computers to understand any visual content and
act on it accordingly. Computer vision projects translate digital visual content into precise
descriptions to gather multi-dimensional data. This data is then turned into a computer-readable
language to aid the decision-making process. The main objective of this branch of Artificial
intelligence is to teach machines to collect information from images.
Applications of Computer Vision
• Medical Imaging: Computer vision helps in MRI reconstruction, automatic pathology,
diagnosis, and computer aided surgeries and more.
• AR/VR: Object occlusion, outside-in tracking, and inside-out tracking for virtual and
augmented reality.
• Smartphones: All the photo filters (including animation filters on social media), QR
code scanners, panorama construction, Computational photography, face detectors,
image detectors like (Google Lens, Night Sight) that we use are computer vision
applications.
• Oil and natural gas -The oil and natural gas companies produce millions of barrels
of oil and billions of cubic feet of gas every day but for this to happen, first, the
geologists have to find a feasible location from where oil and gas can be extracted.
• Video surveillance -The Concept of video tagging is used to tag videos with
keywords based on the objects that appear in each scene. Now imagine being that
security company who’s asking to look for a suspect in a blue van amongst hours and
hours of footage
• Internet: Image search, Mapping, photo captioning, Ariel imaging for maps, video
categorization and more.
OpenCV (Open Source Computer Vision), a cross-platform and free to use library of
functions is based on real-time Computer Vision which supports Deep Learning frameworks
that aids in image and video processing. In Computer Vision, the principal element is to extract
the pixels from the image to study the objects and thus understand what it contains. Below are
a few key aspects that Computer Vision seeks to recognize in the photographs:
• Object Detection: The location of the object.
• Object Recognition: The objects in the image, and their positions.
• Object Classification: The broad category that the object lies in.
• Object Segmentation: The pixels belonging to that object.
Need of Computer Vision
From selfies to landscape images, we are flooded with all kinds of photos today. A report by
Internet Trends says people upload more than 1.8 billion photos daily, and that’s just the number
of uploaded images. Consider what the number would come to if you count the images stored
in phones. We consume more than 4, 146, 600 videos on YouTube and send 103, 447, 520 spam
mails daily. Again, that’s just a part of it – communication, media, and entertainment, the IoT
are all actively contributing to this number. This abundantly available visual content demands
analyzing and understanding and Computer vision helps in doing that by way of teaching
machines to “see” these images and videos.
• Hardware:
• Image Acquisition: This involves capturing images using devices like cameras
(still or video), medical imaging devices, or other sensors.
• Processing: This requires powerful processors (CPUs, GPUs) and memory to
handle the large amounts of image data.
• Display: Monitors or other output devices are needed to visualize the captured
and processed images.
• Software:
• Image Processing Algorithms: These algorithms are used to enhance, filter,
and manipulate images.
• Pattern Recognition Algorithms: These algorithms identify patterns and
features in images, enabling object recognition and other tasks.
• Machine Learning Models: These models, often trained on large datasets of
images, are used to make predictions and inferences about images.
tasks like object recognition or facial recognition.
Advantages
• 1) Improved Accuracy: Digital imaging is less susceptible to human factors and gives
accurate output of the object with high detailed capture.
• 2) Enhanced Flexibility: Digital images are easy to manipulate, edit or analyse as per
the requirements through different software hence they provide flexibility of post
processing.
• 3) High Storage Capacity: Data in any digital format such as in one or more digital
images can still be stored in large amount with very high resolution and quality and will
not suffer physical wear and tear.
• 4) Easy Sharing and Distribution: The use of digital images allows them to be quickly
duplicated and transmitted across various channels and to various gadgets, helping to
speed up the work.
• 5) Advanced Analysis Capabilities: Digital imaging enables the application of
analytical tools, including image recognition and machine learning, which can provide
better insights and increase productivity.
Disadvantages
• 1) Data Size: Large-structured digital image could occupy large storage space and
computational power hence may be expensive.
• 2) Image Noise: Digital images may be compromised by noise and artifacts, which
degrades the image quality mainly when photographed at night or using low image
sensors.
• 3) Dependency on Technology: Digital imaging entails the use of sophisticated
technology and equipment that may be costly and there may be constant need to service
or replace the equipment.
• 4) Privacy Concerns: The ability to take and circulate photographs digitally also poses
concern because personal information can be photographed without the subject’s
permission.
• 5) Data Loss Risks: Digital image repositories, however, are prone to data loss caused
by hardware failures, corrupting software, or unintentional erasure.
Applications
• 1) Medical Imaging: Digital imaging is employed in the medical fields in the diagnostic
process such as X-ray pictures, MRI scans, and CT scans, for internal body reflections.
• 2) Surveillance and Security: Digital cameras and imaging systems are greatly needed
for various security or surveillance purposes as they offer live feed and are also useful
in acquiring data for investigations.
• 3) Remote Sensing: Digital imaging plays an important role in remote sensing
applications in terms of monitoring and mapping of environment and disasters and
involve data captured from satellite and aerial systems.
• 4) Entertainment and Media: The entertainment industry involves the use of digital
imaging in films, video games, and virtual reality to deliver improved visual impact.
• 5) Scientific Research: Digital imaging helps in scientific studies through providing
best picture at research fields like astronomy, biology, and material science.
Image Analysis in Computer Vision
image analysis involves using algorithms to extract meaningful information and insights from
digital images, encompassing tasks like object recognition, segmentation, and feature
extraction.
Here's a more detailed explanation:
• What is Image Analysis?
Image analysis is a core component of computer vision, focusing on enabling computers to
"see" and understand images, much like humans do.
• Key Tasks in Image Analysis:
• Object Recognition: Identifying and classifying specific objects within an
image.
• Image Segmentation: Dividing an image into distinct regions or segments based
on characteristics like color, texture, or edges.
• Feature Extraction: Identifying and extracting relevant features from an image,
such as edges, corners, or shapes, to facilitate further analysis.
• Motion Detection: Identifying and tracking moving objects or changes in an
image sequence.
• Image Enhancement: Improving the quality of an image, for example, by
reducing noise or increasing contrast.
• Image Restoration: Reconstructing images that have been degraded or
damaged.
• Color Image Processing: Analyzing and manipulating images with color
information.
• Applications of Image Analysis:
• Medical Imaging: Analyzing medical scans (X-rays, CT scans, MRIs) to detect
diseases or abnormalities.
• Autonomous Vehicles: Enabling self-driving cars to "see" and navigate their
environment.
• Security Systems: Detecting intruders or unusual activity in surveillance
footage.
• Quality Control: Inspecting products for defects or inconsistencies.
• Document Analysis: Extracting text or data from scanned documents.
• Techniques Used in Image Analysis:
• Digital Image Processing: Techniques for manipulating and enhancing images.
• Pattern Recognition: Identifying patterns and structures in images.
• Machine Learning: Training algorithms to recognize objects and classify
images.
• Deep Learning: Using artificial neural networks to perform complex image
analysis tasks.
Note: The images we give into these algorithms should be in black and white. This helps the
algorithms to focus on the features more.
• Image Classification: Assigns a specific label to the entire image, determining the
overall content such as identifying whether an image contains a cat, dog, or bird. It uses
techniques like Convolutional Neural Networks (CNNs) and transfer learning.
• Object Localization: Goes beyond classification by identifying and localizing the
main object in an image, providing spatial information with bounding boxes around
these objects. This method allows for more specific analysis by indicating the object's
location.
• Object Detection: Combines image classification and object localization, identifying
and locating multiple objects within an image by drawing bounding boxes around each
and assigning labels. Techniques include Region-Based CNNs (R-CNN), You Only
Look Once (YOLO), and Single Shot MultiBox Detector (SSD).
• Comparison: While image classification assigns a single label to the entire image,
object localization focuses on the main object with a bounding box, and object detection
identifies and locates multiple objects within the image, providing both labels and
spatial positions for each detected item. These methods are applied in various fields,
from medical imaging to autonomous vehicles and retail analytics.
How Image Classification Works?
The process of image classification can be broken down into several key steps:
Data Collection and Preprocessing:
• Data Collection: The first step involves gathering a large dataset of labeled images.
These images serve as the foundation for training the classification model.
• Preprocessing: This step includes resizing images to a consistent size, normalizing pixel
values, and applying data augmentation techniques like rotation, flipping, and
brightness adjustment to increase the dataset's diversity and robustness.
Feature Extraction:
• Traditional methods involve extracting hand-crafted features like edges, textures, and
colors. However, modern techniques leverage Convolutional Neural Networks (CNNs)
to automatically learn relevant features from the raw pixel data during training.
Model Training:
• Choosing a Model: CNNs are the most commonly used models for image classification
due to their ability to capture spatial hierarchies in images.
• Training the Model: The dataset is split into training and validation sets. The model is
trained on the training set to learn the features and patterns that distinguish different
classes. Optimization techniques like backpropagation and gradient descent are used to
minimize the error between the predicted and actual labels.
• Validation: The model's performance is evaluated on the validation set to fine-tune its
parameters and prevent overfitting.
Model Evaluation and Testing:
• The trained model is tested on a separate test set to assess its accuracy, precision, recall,
and other performance metrics, ensuring it generalizes well to unseen data.
Deployment:
• Once validated, the model can be deployed in real-world applications where it processes
new images and predicts their classes in real-time or batch processing modes.
Algorithms and Models of Image Classification
There isn't one straightforward approach for achieving image classification, thus we will take
a look at the two most notable kinds: supervised and unsupervised classification.
Supervised Classification
Supervised learning is well-known for its intuitive concept - it operates like an apprentice
learning from a master. The algorithm is trained on a labeled image dataset, where the correct
outputs are already known and each image is assigned to its corresponding class. The algorithm
is the apprentice, learning from the master (the labeled dataset) to make predictions on new,
unlabeled data. After the training phase, the algorithm uses the knowledge gained from the
labeled data to identify patterns and predict the classes of new images.
• Supervised algorithms can be divided into single-label classification and multi-label
classification. Single-label classification assigns a single label to an image, which is the
most common type. Multi-label classification, on the other hand, allows an image to be
assigned multiple labels, which is useful in fields like medical imaging where an image
may show several diseases or anomalies.
• Famous supervised classification algorithms include k-nearest neighbors, decision
trees, support vector machines, random forests, linear and logistic regressions, and
neural networks.
• For instance, logistic regression predicts whether an image belongs to a certain
category by modeling the relationship between input features and class probabilities. K-
nearest neighbors (KNN) assigns labels based on the closest k data points to the new
input, making decisions based on the majority class among the neighbors. Support
vector machines (SVM) find the best separating boundary (hyperplane) between classes
by maximizing the margin between the closest points of each class. Decision trees use
a series of questions about the features of the data to make classification decisions,
creating a flowchart-like model.
Unsupervised Classification
Unsupervised learning can be seen as an independent mechanism in machine learning; it doesn't
rely on labeled data but rather discovers patterns and insights on its own. The algorithm is free
to explore and learn without any preconceived notions, interpreting raw data, recognizing
image patterns, and drawing conclusions without human interference.
• Unsupervised classification often employs clusterization, a technique that naturally
groups data into clusters based on their similarities. This method doesn't automatically
provide a class; rather, it forms clusters that need to be interpreted and labeled. Notable
clusterization algorithms include K-means, Mean-Shift, DBSCAN, Expectation–
Maximization (EM), Gaussian mixture models, Agglomerative Clustering, and
BIRCH. For instance, K-means starts by selecting k initial centroids, then assigns each
data point to the nearest centroid, recalculates the centroids based on the assigned
points, and repeats the process until the centroids stabilize. Gaussian mixture models
(GMMs) take a more sophisticated approach by assuming that the data points are drawn
from a mixture of Gaussian distributions, allowing them to capture more complex and
overlapping data patterns.
• Among the wide range of image classification techniques, convolutional neural
networks (CNNs) are a game-changer for computer vision problems. CNNs
automatically learn hierarchical features from images and are widely used in both
supervised and unsupervised image classification tasks.
Techniques Used in Image Classification
Machine Learning Algorithms
Traditional machine learning algorithms, such as Support Vector Machines (SVM), k-Nearest
Neighbors (k-NN), and Decision Trees, were initially used for image classification. These
methods involve manual feature extraction and selection, which can be time-consuming and
less accurate compared to modern techniques.
Deep Learning
Deep learning, a subset of machine learning, has revolutionized image classification with the
advent of Convolutional Neural Networks (CNNs). CNNs automatically learn hierarchical
features from raw pixel data, significantly improving classification accuracy. Some popular
deep learning architectures include:
• AlexNet: One of the first CNNs to demonstrate superior performance in image
classification tasks.
• VGGNet: Known for its simplicity and depth, achieving high accuracy with deep
networks.
• ResNet: Introduces residual connections to address the vanishing gradient problem,
allowing for the training of very deep networks.
• Inception: Utilizes parallel convolutions with different filter sizes to capture multi-
scale features.
Transfer Learning
Transfer learning involves using pre-trained models on large datasets, such as ImageNet, and
fine-tuning them on specific tasks with smaller datasets. This approach saves time and
computational resources while achieving high accuracy.
Applications of Image Classification
Image classification has a wide range of applications across various industries:
1. Medical Imaging
In the medical field, image classification is used to diagnose diseases and conditions from
medical images such as X-rays, MRIs, and CT scans. For instance, it can help in detecting
tumors, fractures, and other abnormalities with high accuracy.
2. Autonomous Vehicles
Self-driving cars rely heavily on image classification to interpret and understand their
surroundings. They use cameras and sensors to classify objects like pedestrians, vehicles,
traffic signs, and road markings, enabling safe navigation and decision-making.
3. Facial Recognition
Facial recognition systems use image classification to identify and verify individuals based on
their facial features. This technology is widely used in security systems, smartphones, and
social media platforms for authentication and tagging purposes.
4. Retail and E-commerce
In the retail industry, image classification helps in product categorization, inventory
management, and visual search applications. E-commerce platforms use this technology to
provide personalized recommendations and enhance the shopping experience.
5. Environmental Monitoring
Image classification is used in environmental monitoring to analyze satellite and aerial images.
It helps in identifying land cover types, monitoring deforestation, tracking wildlife, and
assessing the impact of natural disasters.
Challenges in Image Classification
Despite its advancements, image classification faces several challenges:
• Data Quality and Quantity: High-quality, labeled datasets are essential, but collecting
and annotating these datasets is resource-intensive.
• Variability and Ambiguity: Images can vary widely in lighting, angles, and
backgrounds, complicating classification. Some images may contain multiple or
ambiguous objects.
• Computational Resources: Training deep learning models requires significant
computational power and memory, often necessitating specialized hardware like GPUs.