Computer Vision
Challenges in Computer Vision:
1. Loss of Information in 3D to 2D: Cameras and eyes capture images in 2D, losing depth
information from the real world. This makes it hard to determine the size and distance of
objects accurately. For instance, a small object close to the camera might look similar in
size to a larger object far away.
2. Interpretation of Images: Humans naturally interpret images based on past experiences
and knowledge. However, teaching computers to understand images requires encoding
vast amounts of information and reasoning abilities, which is still limited in practice.
3. Mapping from Image Data to Models: Interpreting images involves mapping them to
logical models representing real-world concepts. This requires understanding syntax
(rules) and semantics (meanings) similar to language processing.
4. Dealing with Noise and Uncertainty: Real-world measurements, including images, are
prone to noise and uncertainty. Computer vision algorithms must account for this
uncertainty, often using complex mathematical tools like probability theory.
5. Handling Large Amounts of Data: Images and videos contain vast amounts of data,
which can strain processing and memory resources. While technology has improved,
efficient processing remains crucial, especially for real-time applications.
6. Complex Image Formation Physics: The brightness of objects in images depends on
various factors like light sources, surface geometry, and reflectance properties.
Reconstructing these properties accurately from images is challenging due to the ill-
posed nature of the inverse tasks involved.
7. Balancing Local and Global Perspectives: Most image analysis algorithms focus on
local regions, limiting their ability to understand the broader context of a scene.
Incorporating global context is crucial for accurate interpretation, posing a longstanding
challenge in computer vision.
The motivation behind the study of computer vision stems from the desire to enable machines to
perceive and understand the visual world in a manner similar to humans. Here are some key
motivations:
1. Human-Like Perception: Vision is one of the primary senses through which humans
interact with the world. By giving computers the ability to "see," we aim to replicate the
human ability to understand and interpret visual information.
2. Automation and Efficiency: Many tasks that rely on visual perception, such as
inspection, surveillance, and monitoring, are currently performed by humans. Automating
these tasks through computer vision can increase efficiency, reduce costs, and improve
accuracy.
3. Safety and Security: Computer vision systems can enhance safety and security by
detecting anomalies, identifying objects or individuals, and monitoring environments in
real-time. Applications include detecting intruders in secure areas, monitoring traffic for
safety violations, and identifying potential hazards in industrial settings.
4. Medical Applications: Computer vision plays a crucial role in medical imaging, aiding
in the diagnosis and treatment of various diseases and conditions. From detecting tumors
in medical scans to tracking the movement of surgical instruments, computer vision
technology has the potential to revolutionize healthcare.
5. Improved User Interfaces: Advancements in computer vision technology can lead to
more intuitive and immersive user interfaces. For example, gesture recognition systems
can allow users to interact with devices using hand movements, while augmented reality
applications can overlay digital information onto the physical world.
6. Scientific Exploration: Computer vision is essential for scientific exploration, enabling
researchers to analyze and interpret data from diverse sources such as telescopes,
microscopes, and satellites. It allows for the extraction of valuable insights from large
volumes of visual data, contributing to advancements in fields such as astronomy,
biology, and environmental science.
7. Enhanced Accessibility: By enabling computers to understand visual information,
computer vision technology can improve accessibility for individuals with disabilities.
For example, it can facilitate text recognition for the visually impaired or enable gesture-
based interfaces for individuals with mobility impairments.
Image representation in computer vision can be categorized into different levels based on the abstraction of the data. Four possible levels of image representation suitable for image analysis problems that detect and classify objects are: iconic images (raw pixel data), segmented images (regions likely to belong to objects), geometric representations (shapes and their positions), and relational models (high-level symbolic descriptions of objects and their relationships). Together these levels describe the process of going from objects or scenes to images and then to features, and finally back to recognized objects.
The hierarchy of image representation levels encompasses both low-level and high-level image
processing techniques, each serving different purposes in the analysis and understanding of
images. Here's a discussion of the hierarchy with low and high-level image processing:
1. Low-Level Image Processing:
o Data Input: Begins with raw image data captured by sensors such as cameras, represented as matrices of pixel values.
Example: The process begins with a raw image of a person's face captured by a camera.
o Preprocessing (Noise Suppression): Example: Noise in the image, such as camera sensor noise or lighting variations, is suppressed to enhance the quality of the image.
o Edge Detection: Example: Edges of facial features such as the eyes, nose, and mouth are detected using edge detection algorithms. This helps in identifying the boundaries of facial components.
o Image Segmentation: Separates objects from the background and from each other, possibly distinguishing total and partial segmentation. Partial segmentation focuses on extracting cues useful for higher-level processing.
Example: The image is segmented to separate the face from the background. This helps in isolating the facial region for further analysis.
o Feature Extraction: Example: Low-level features like corners, edges, and textures are extracted from the facial region. For example, the eyes may be represented by keypoints indicating their position and orientation.
Example: Relevant information such as facial features, shape, and proportions is extracted from the low-level features, reducing the data to a symbolic representation of the face.
2. High-Level Image Processing:
o Knowledge Incorporation: Utilizes domain-specific or general knowledge about the image content, including object size, shape, and spatial relationships.
Example: The facial features are represented symbolically, such as by a set of key facial landmarks or descriptors representing attributes like eye color or facial symmetry.
o Recognition and Interpretation: Example: The symbolic representation is used to recognize the face and understand its identity. This involves comparing the extracted features with a database of known faces to determine the identity of the person.
o Feedback: Example: Feedback from the recognition process may be used to refine the initial feature extraction or segmentation steps. For example, if a face is misidentified, adjustments may be made to the edge detection parameters to improve accuracy.
Unit 2
1. Edge:
o An edge in image processing refers to a significant change in intensity or color
between neighboring pixels in an image.
o It represents a local property of a pixel and its immediate neighborhood,
indicating the boundary or transition between objects or regions in an image.
o Edges are typically detected using gradient-based methods, where the magnitude and direction of the gradient of the image function indicate how fast the image intensity varies in the vicinity of a pixel (a small gradient-computation sketch appears after these definitions).
2. Crack Edge:
o A crack edge is a structure between pixels in an image that is similar to cellular
complexes but is more pragmatic and less mathematically rigorous.
o Each pixel has four crack edges attached to it, defined by its relation to its 4-
neighbors.
o The direction of a crack edge is determined by the increasing brightness between
the relevant pair of pixels, and its magnitude is the absolute difference in
brightness.
3. Border:
o In image analysis, the border of a region refers to the set of pixels within the
region that have one or more neighbors outside the region.
o It represents the boundary of the region and is composed of pixels that transition
between the region and its surroundings.
o The inner border corresponds to pixels within the region, while the outer border
corresponds to pixels at the boundary of the background (complement) of the
region.
4. Convex Region:
o A convex region is a region in an image where any two points within it can be
connected by a straight line segment, and the entire line lies within the region.
o It is characterized by its convexity, meaning that it does not have any indentations
or concavities.
o Convex regions can be identified by their property of allowing straight-line
connections between any two points within them.
5. Non-Convex Region:
o A non-convex region is a region in an image that does not satisfy the criteria of
convexity.
o It may contain indentations or concavities, making it impossible to draw a straight line between certain points within the region without crossing its boundary.
6. Contrast: Difference in luminance or color that helps distinguish an object from its
background.
7. Acuity: Sharpness or clarity of vision, essential for seeing fine details.
8. Perceptual Grouping: The visual system's method of organizing elements into
coherent groups based on principles like proximity, similarity, continuity, closure, and
common fate.
1. Edge:
2. Crack Edge:
Definition: A crack edge is a structure that connects pixels in a way that is similar to
cellular complexes but is more practical and less mathematically rigorous.
Relation to Pixels: Each pixel has four crack edges, each connecting it to one of its four
neighbors (up, down, left, right).
Direction and Magnitude: The direction of a crack edge is determined by the increase in
brightness between the connected pixels, and the magnitude is the absolute difference in
brightness. This helps in identifying the edge based on intensity changes.
3. Border:
Definition: The border of a region in an image consists of the pixels that have one or
more neighboring pixels outside the region.
Boundary Representation: It represents the boundary of the region, marking the
transition between the region and its surroundings.
Inner and Outer Borders: The inner border includes pixels within the region that touch
the boundary, while the outer border consists of pixels at the boundary of the region’s
complement (background).
4. Convex Region:
Definition: A convex region is a region where any two points within it can be connected
by a straight line, and the entire line segment lies within the region.
Characteristic: It has no indentations or concavities, meaning it is "bulged out" without
any inward curves.
Identification: Convex regions are identified by their property of allowing straight-line
connections between any two points within them, ensuring the line stays inside the
region.
5. Non-Convex Region:
Definition: A non-convex region is a region that does not meet the criteria of convexity.
Indentations/Concavities: It may have indentations or concavities, making it impossible
to connect certain points within the region with a straight line without crossing the
boundary.
Complex Shape: Non-convex regions have more complex shapes due to these inward
curves and indentations.
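To make the gradient-based edge detection mentioned above concrete, here is a minimal sketch in Python. It assumes NumPy is available; the toy image and threshold value are illustrative, not from the source.

import numpy as np

def gradient_edges(image, threshold=30.0):
    """Estimate edge magnitude and direction for a grayscale image (2D array)."""
    img = image.astype(float)
    # Finite-difference approximation of the image gradient.
    gy, gx = np.gradient(img)            # gy: change along rows, gx: change along columns
    magnitude = np.hypot(gx, gy)         # how fast intensity varies at each pixel
    direction = np.arctan2(gy, gx)       # orientation of the steepest change, in radians
    edge_mask = magnitude > threshold    # simple thresholding marks likely edge pixels
    return magnitude, direction, edge_mask

# Toy example: a dark square on a bright background produces edges at its border.
img = np.full((8, 8), 200.0)
img[2:6, 2:6] = 50.0
mag, ang, edges = gradient_edges(img)
print(edges.astype(int))

The same idea underlies practical edge detectors (Sobel, Canny), which add smoothing and more careful thresholding on top of the gradient magnitude and direction.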
Additive Noise: This type of noise is added to the original signal as an additional component.
It alters the original signal by adding random fluctuations or disturbances. Additive noise affects
all parts of the signal equally and independently of the signal's magnitude. An example of
additive noise is Gaussian noise, where the noise values are sampled from a Gaussian
distribution.
Multiplicative Noise: Multiplicative noise alters the original signal by scaling it with a
random factor. Unlike additive noise, multiplicative noise affects different parts of the signal
differently, depending on the signal's magnitude. An example of multiplicative noise is television
raster degradation, where the noise level varies across the image depending on the TV lines.
Gaussian Noise: Gaussian noise is a type of additive noise characterized by values sampled
from a Gaussian (normal) distribution. It is often used to model random fluctuations or errors in
signals and measurements.
Impulsive Noise: Impulsive noise, also known as spike noise, introduces sudden and extreme
changes in signal values. It corrupts the signal with individual noisy pixels whose brightness
significantly differs from their neighborhood. Salt-and-pepper noise, where individual pixels are
corrupted with either white or black values, is a common example of impulsive noise.
1. Additive Noise:
Definition: Additive noise is noise that is added to the original signal as an additional
component. It introduces random fluctuations or disturbances to the signal, affecting all
parts of the signal equally and independently of the signal's magnitude.
Example: An image of a clear sky with Gaussian noise added. The noise values are
sampled from a Gaussian distribution, resulting in a speckled appearance across the entire
image. Each pixel in the image has a small, random value added to it, which can make the
image look grainy.
For example, the classic "Lenna" test image with Gaussian noise added takes on a grainy appearance throughout.
2. Multiplicative Noise:
Definition: Multiplicative noise scales the original signal with a random factor. Unlike
additive noise, it affects different parts of the signal differently, depending on the signal's
magnitude.
Example: An image taken with a camera sensor that has varying sensitivity across the
sensor's surface, resulting in a pattern where the noise level changes depending on the
brightness of different parts of the image.
3. Gaussian Noise:
Definition: Gaussian noise is a type of additive noise where the noise values are sampled
from a Gaussian (normal) distribution. It is often used to model random fluctuations or
errors in signals and measurements.
Example: A photograph of a dark scene with Gaussian noise added. The noise introduces
random variations in pixel values, which can appear as small white or black dots
throughout the image.
For instance, an image of a boat with Gaussian noise added shows a speckled effect across the whole image.
4. Impulsive Noise:
Definition: Impulsive noise, also known as spike noise, introduces sudden and extreme
changes in signal values. It corrupts the signal with individual noisy pixels whose
brightness significantly differs from their neighborhood.
Example: A scanned document with impulsive noise. Certain pixels in the image are
randomly replaced with either very bright or very dark values, making it look like there
are white or black specks scattered across the document.
5. Salt-and-Pepper Noise:
Definition: Salt-and-pepper noise is a form of impulsive noise in which individual pixels are corrupted to either the maximum (white, "salt") or minimum (black, "pepper") intensity value.
Example: A photograph in which isolated pixels appear as scattered white and black dots, as if the image had been sprinkled with salt and pepper.
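A minimal simulation sketch of additive Gaussian noise and salt-and-pepper (impulsive) noise, assuming NumPy and an 8-bit grayscale image array; the parameter values are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(image, sigma=15.0):
    # Additive noise: a random value drawn from N(0, sigma^2) is added to every pixel,
    # independently of the pixel's own brightness.
    noise = rng.normal(0.0, sigma, size=image.shape)
    return np.clip(image.astype(float) + noise, 0, 255).astype(np.uint8)

def add_salt_and_pepper(image, p=0.05):
    # Impulsive noise: a fraction p of pixels is replaced by pure white (salt)
    # or pure black (pepper); all other pixels are left untouched.
    noisy = image.copy()
    mask = rng.random(image.shape)
    noisy[mask < p / 2] = 0                      # pepper
    noisy[(mask >= p / 2) & (mask < p)] = 255    # salt
    return noisy

clean = np.full((4, 4), 128, dtype=np.uint8)     # a flat gray test image
print(add_gaussian_noise(clean))
print(add_salt_and_pepper(clean, p=0.25))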
1. Euclidean Distance:
Definition: Euclidean distance measures the straight-line distance between two points in a multidimensional space.
Calculation: It is computed as the square root of the sum of squared differences between the corresponding coordinates.
Use Cases: Commonly used in clustering (e.g., k-means), nearest-neighbour classification, and geometric computations.
Example: Measuring the straight-line ("as the crow flies") distance between two locations on a map.
2. Manhattan Distance (City Block Distance):
Definition: Manhattan distance calculates the distance between two points by summing the absolute differences of their coordinates.
Calculation: It is particularly useful in grid-based environments where movement is restricted to horizontal and vertical paths.
Use Cases: Commonly used in routing algorithms, computer vision, and robotics.
Example: Determining the distance between two locations on a city grid, where movement is constrained to streets.
3. Hamming Distance:
Definition: Hamming distance measures the number of positions at which corresponding
elements of two equal-length strings are different.
Calculation: It is often used for comparing strings of equal length, particularly in
genetics and error correction.
Use Cases: Widely applied in DNA sequence analysis, data transmission, and error
detection.
Example: Comparing binary strings to identify errors in transmission or mutations in
genetic sequences.
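A small sketch computing the three proximity measures described above, in plain Python; the example points and strings are illustrative.

import math

def euclidean(p, q):
    # Straight-line distance: square root of the sum of squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def hamming(s, t):
    # Number of positions at which two equal-length strings differ.
    if len(s) != len(t):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(a != b for a, b in zip(s, t))

print(euclidean((1, 2), (4, 6)))   # 5.0
print(manhattan((1, 2), (4, 6)))   # 7
print(hamming("10110", "10011"))   # 2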
Color constancy refers to the ability of the human visual system (and ideally, artificial perception
systems) to perceive the colors of objects as relatively stable and consistent under varying
lighting conditions. This means that even when an object is viewed under different illuminations,
such as sunlight or artificial indoor lighting, humans tend to perceive the colors of the object as
remaining the same.
The concept of color constancy is essential for image processing and analysis for several reasons:
1. Perceptual Consistency: Color constancy ensures that objects in images are perceived
consistently across different lighting conditions, allowing viewers to accurately interpret
and understand the content of the image. Without color constancy, objects may appear to
change color dramatically depending on the lighting, leading to confusion or
misinterpretation.
2. Color Correction: In image processing, color constancy algorithms can be used to
correct for variations in lighting and white balance, ensuring that colors are accurately
represented in the final image. This is particularly important in applications such as
photography, where accurate color reproduction is crucial for conveying the intended
visual information.
3. Object Recognition and Segmentation: Color constancy aids in object recognition and
segmentation by enabling algorithms to focus on the inherent color characteristics of
objects rather than being influenced by changes in illumination. This allows for more
reliable and accurate identification of objects in images, which is important for tasks such
as automated surveillance, medical imaging, and autonomous navigation.
4. Human Perception Alignment: By mimicking the color constancy abilities of the
human visual system, artificial perception systems can produce images that are more
perceptually consistent with how humans perceive the world. This improves the usability
and interpretability of images generated by these systems, leading to better human-
computer interaction and visual communication.
Photosensitive cameras utilize photosensitive sensors to capture images by converting incoming
light into electrical signals. These sensors can be broadly categorized into two types: those based
on photo-emission principles and those based on photovoltaic principles.
Photo-emission sensors exploit the photoelectric effect, where incoming photons cause the
emission of electrons in a material. This principle is most evident in metal-based sensors and has
been historically used in technologies like photomultipliers and vacuum tube TV cameras.
Photovoltaic sensors, on the other hand, utilize semiconductor materials. Incoming photons
elevate electrons from the valence band to the conduction band, generating an electric current
proportional to the intensity of the incident light. Various semiconductor devices such as
photodiodes, avalanche photodiodes, photoresistors, and Schottky photodiodes are based on this
principle.
Among semiconductor photosensitive sensors, two widely used types are CCDs (Charge-Coupled Devices) and CMOS (Complementary Metal Oxide Semiconductor) sensors. CCDs,
developed in the 1960s and 1970s, initially dominated the market due to their maturity and
widespread usage. CMOS technology, while developed around the same time, became
technologically mastered in the 1990s.
Advantages of CMOS sensors:
1. Lower Power Consumption: CMOS sensors typically consume less power compared to
CCDs, making them suitable for battery-operated devices like smartphones and portable
cameras.
2. Random Access: Each pixel in a CMOS sensor has its own charge-to-voltage conversion
circuitry, allowing for individual pixel readout and manipulation. This enables faster
readout speeds and on-chip processing capabilities.
3. Integration with Other Technologies: CMOS sensors can be integrated with other
components on the same chip, such as processors and memory, leading to compact and
cost-effective "smart" camera solutions.
4. High Sensitivity: CMOS sensors exhibit high sensitivity, making them capable of
capturing images in low-light conditions.
Disadvantages of CMOS sensors:
1. Higher Noise Levels: CMOS sensors generally have higher levels of noise compared to
CCDs, which can affect image quality, especially in low-light environments.
2. Limited Dynamic Range: CMOS sensors can struggle to capture scenes with extremely high contrast, and their dynamic range has traditionally lagged behind that of CCDs, although modern designs have narrowed the gap.
3. Complex Design: The integration of additional functionalities on the same chip can
increase design complexity and potentially reduce the area available for light capture,
impacting image quality.
Components and Working of Analog and Digital Cameras
Analog Cameras:
Analog cameras use photosensitive sensors, such as CCDs (Charge-Coupled Devices), to convert
incoming light into electrical signals. Analog cameras are traditional cameras that produce a
continuous video signal. Here’s a breakdown of their components and working:
2. Optics (Lens):
o The lens focuses the light onto the CCD sensor, forming an image.
3. Q/U Conversion:
o Converts the luminance and chrominance information into a digital format, preparing
the data for further processing.
5. γ (Gamma) Correction: This block adjusts the brightness to correct the display on older
TV screens (CRTs), making sure the image looks right by balancing out any distortions.
6. Video Circuits:
o These circuits add synchronization pulses to the signal. These pulses are necessary for
the proper display of the video signal, ensuring that the image is displayed row by row.
7. High-Pass Filter:
o This filter compensates for the loss of high frequencies in the optical part, improving the
clarity and sharpness of the image.
8. Analog Filters:
o These filters process the analog signal to remove noise and improve signal quality.
Working:
The lens focuses light onto the CCD sensor, generating an analog video signal.
The AGC adjusts the gain based on the light intensity.
The γ correction compensates for the nonlinear response of CRTs.
Video circuits add synchronization pulses for proper display.
High-pass and analog filters process the signal to improve quality.
The analog signal is transmitted via a coaxial cable to the digitizer.
The digitizer converts the analog signal into digital form and sends it to the computer for
processing.
Advantages:
Cost-effective
Long cable lengths possible (up to 300 meters)
Widely used in legacy systems
Disadvantages:
Susceptible to transmission noise and signal degradation along the analog cable
Lower resolution and image quality compared to digital cameras
Requires a separate digitizer (frame grabber) before the image can be processed by a computer
Digital Cameras:
Digital cameras convert the captured light directly into digital data. Here’s a breakdown of their
components and working:
2. Optics (Lens):
o The lens focuses light onto the CCD sensor to form an image.
4. γ (Gamma) Correction:
o Performs a nonlinear transformation of the intensity scale, if needed.
6. Q/U Conversion:
o Converts the luminance and chrominance information into a digital format, preparing
the data for further processing.
7. Digital Cable:
o Transmits the digital data to the computer. This can be done using USB, FireWire, or
other digital interfaces.
8. Computer Memory:
o Stores the digital video data for further processing.
Working:
The lens focuses light onto the CCD sensor, generating an analog signal.
The AGC adjusts the gain based on the light intensity.
The γ correction adjusts the intensity scale if needed.
The A/D converter digitizes the analog signal from the CCD.
The Q/U conversion processes the luminance and chrominance information.
The digital data is transmitted to the computer via a digital cable.
The digital data is stored in the computer's memory for further processing.
1. Advantages:
o No transmission noise or signal degradation
o Higher resolution and image quality compared to analog cameras
o Easier integration with digital systems and devices
2. Disadvantages:
o Initially more expensive than analog cameras (although prices have dropped
significantly)
o Limited cable length for certain digital interfaces (e.g., FireWire)
o May require additional processing power for real-time image processing
Comparison:
Unit 3
The definitions, formulae, and theorems along with their respective notations:
7. Union:
o Definition: The union of two events A and B, denoted by A ∪ B, is the event containing all the elements that belong to A, to B, or to both.
o Notation: A ∪ B
o Formula: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
8. Multiplication Rule:
o Definition: If an operation can be performed in n1 ways, and if for each of these ways a second operation can be performed in n2 ways, then the two operations can be performed together in n1 × n2 ways.
o Formula: n1 × n2
9. Permutation:
o Definition: A permutation is an arrangement of all or part of a set of objects, where different orderings count as distinct arrangements.
o Notation: nPr
o Formula: nPr = n! / (n − r)!
10. Factorial Rule:
o Definition: The number of permutations of n distinct objects taken all together is n!.
o Notation: n!
o Formula: n! = n × (n − 1) × (n − 2) × … × 2 × 1
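A short worked example of the rules above (the numbers are illustrative); it uses Python's math module, which provides perm and factorial in Python 3.8 and later.

import math

# Factorial rule: 5 objects can be arranged in 5! = 120 ways.
print(math.factorial(5))        # 120

# Permutation rule: choosing and ordering 2 objects out of 5 gives 5P2 = 5!/(5-2)! = 20.
print(math.perm(5, 2))          # 20

# Union rule with illustrative probabilities P(A)=0.5, P(B)=0.4, P(A ∩ B)=0.2:
p_a, p_b, p_ab = 0.5, 0.4, 0.2
print(p_a + p_b - p_ab)         # P(A ∪ B) = 0.7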
Types of features
In various fields such as mathematics, statistics, computer science, and machine
learning, features refer to individual measurable properties or characteristics of
data points or objects. Features play a crucial role in representing and analyzing
data. Here are some types of features commonly encountered across different
domains:
1. Numerical Features:
Numerical features represent quantitative measurements or values. They can
be continuous (e.g., height, weight, temperature) or discrete (e.g., counts,
ratings).
2. Categorical Features:
Categorical features represent qualitative attributes with distinct categories
or labels. They can be nominal (categories with no inherent order) or ordinal
(categories with a specific order).
3. Binary Features:
Binary features are a special case of categorical features with only two
possible values, often represented as 0 and 1 (e.g., yes/no, true/false,
presence/absence).
4. Text Features:
Text features represent textual data, such as documents, sentences, or words.
Text features may require preprocessing techniques like tokenization,
stemming, or vectorization for analysis.
5. Temporal Features:
Temporal features represent time-related information, such as timestamps,
dates, or durations. They can be used to analyze trends, seasonality, or time-
based patterns in data.
6. Spatial Features:
Spatial features represent geographic or spatial attributes, such as
coordinates, regions, or distances. They are commonly used in geographic
information systems (GIS) and location-based applications.
7. Image Features:
Image features represent visual characteristics extracted from images, such
as colors, textures, shapes, or edges. Image features are essential for tasks
like object detection, recognition, and classification.
8. Audio Features:
Audio features represent characteristics of sound signals, such as frequency,
amplitude, pitch, or duration. They are used in speech recognition, music
analysis, and audio processing.
9. Derived Features:
Derived features are created by transforming or combining existing features.
Examples include polynomial features, logarithmic transformations, or
feature interactions.
10. Meta-Features:
Meta-features describe properties or characteristics of other features. They
can include statistics (e.g., mean, standard deviation), properties of
distributions (e.g., skewness, kurtosis), or information gain measures.
Understanding the types of features present in a dataset is crucial for data
preprocessing, feature engineering, and selecting appropriate algorithms for
analysis or modeling tasks. Each type of feature may require different
preprocessing techniques and algorithms for effective analysis and interpretation.
1. Nominal Data:
Nominal data consist of categories or labels that represent distinct groups or
classes. These categories have no inherent order or ranking.
Examples: Gender (male, female), marital status (single, married, divorced),
types of fruits (apple, banana, orange).
Nominal data can be represented using numbers, but the numbers are
arbitrary and do not have any quantitative significance.
2. Ordinal Data:
Ordinal data represent categories with a specific order or ranking. The
intervals between categories may not be equal or well-defined.
Examples: Educational levels (elementary, high school, college), Likert scale
responses (strongly disagree, disagree, neutral, agree, strongly agree),
ranking of movie ratings (1 star, 2 stars, 3 stars, etc.).
While ordinal data have an inherent order, the differences between
categories may not be consistent or meaningful.
3. Interval-Valued Variables:
Interval-valued variables represent numerical data where the differences
between values are meaningful and consistent, but there is no true zero
point.
Examples: Temperature measured in Celsius or Fahrenheit, years (AD),
longitude and latitude coordinates.
Interval-valued variables can be added, subtracted, and compared, but ratios
between values are not meaningful because there is no true zero point.
4. Ratio Variables:
Ratio variables represent numerical data where there is a true zero point, and
ratios between values are meaningful and interpretable.
Examples: Height, weight, age, time (in seconds), income, number of
children.
Ratio variables allow for all mathematical operations (addition, subtraction,
multiplication, division), and ratios between values convey meaningful
information.
1. Euclidean Distance:
Euclidean distance is one of the most common measures of proximity
between two points in a multidimensional space.
It is calculated as the square root of the sum of squared differences between
corresponding coordinates.
2. Manhattan Distance (City Block Distance):
Manhattan distance calculates the distance between two points by summing
the absolute differences of their coordinates.
It is particularly useful in grid-based environments where movement is
restricted to horizontal and vertical paths
3. Hamming Distance:
Hamming distance measures the number of positions at which
corresponding elements of two equal-length strings are different.
It is often used for comparing strings of equal length, particularly in genetics
and error correction.
Formula: Hamming distance = the number of positions at which the corresponding symbols differ.
In summary, similarity measures quantify how much two objects are alike or
similar, while dissimilarity measures quantify how much they are different or
distinct. These measures play a crucial role in various data analysis tasks, including
clustering (grouping similar objects together), classification (assigning objects to
predefined categories), and retrieval (finding objects similar to a given query). The
choice of similarity or dissimilarity measure depends on the specific application,
the type of data being analyzed, and the desired properties of the analysis.
Unit 4
Pattern recognition is a branch of machine learning and artificial intelligence that focuses on the
identification and classification of patterns and regularities in data. It involves the use of
algorithms and models to recognize patterns, structures, or regularities within a set of data, often
for the purpose of making predictions, decisions, or classifications.
Biometrics
Bioinformatics
Document Recognition
Fault Diagnostics
Expert Systems
Paradigms:
Syntactic Pattern Recognition:
Definition: Syntactic pattern recognition involves using formal language theory and grammatical rules to describe and recognize patterns.
Approach: It focuses on defining syntactic rules or grammars that capture the structure
of patterns and using these rules for recognition.
Key Characteristics:
o Based on formal language theory and symbolic manipulation.
o Defines rules or grammars to represent the syntax or structure of patterns.
o Requires precise descriptions of patterns using formal languages.
o Not well-suited for noisy environments or complex real-world data.
Applications:
o Handwritten Text Recognition
o Document Analysis
o Speech Understanding (limited)
o DNA Sequence Analysis (limited)
o Compiler Design and Parsing
Comparison:
Statistical Pattern Recognition is more popular and widely used due to its effectiveness
in handling real-world data with noise and uncertainty.
Syntactic Pattern Recognition is less common and often limited to applications where
patterns have well-defined and structured syntax, such as in document analysis or
language processing tasks.
Feature Selection
Feature selection involves identifying and eliminating irrelevant or redundant features from a
dataset to improve classification accuracy and reduce computational cost. The process ensures
that only the most meaningful features are used, which simplifies the representation of patterns
and the complexity of classifiers, making them faster and more memory-efficient. It also helps
avoid the "curse of dimensionality" by improving classification accuracy, especially when the
number of training samples is limited.
Purpose: to retain only the most informative features, which improves classification accuracy and reduces computational cost.
Search strategies for selecting feature subsets include:
1. Exhaustive Search:
o Involves evaluating all possible feature subsets to find the best one.
o Impractical for large feature sets due to computational infeasibility.
4. Sequential Selection:
o Operates by either adding or removing features sequentially.
o Two types:
Sequential Forward Selection (SFS): Starts with an empty set and adds features.
Sequential Backward Selection (SBS): Starts with the full set and removes
features.
o Both methods suffer from the "nesting effect," where features once added or removed
cannot be reconsidered.
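A minimal sketch of sequential forward selection, using scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24 and later; the dataset and classifier choice are illustrative, not from the source).

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# SFS: start from an empty feature set and greedily add the feature that most
# improves cross-validated accuracy; direction="backward" would give SBS instead.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support())          # boolean mask of the selected features
print(sfs.transform(X).shape)     # reduced dataset: (150, 2)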
Artificial Neural Networks (ANNs) can be effectively used in feature selection through a process known as node pruning: a network is first trained on all features, nodes whose removal least degrades performance are pruned away, and the input features whose nodes survive pruning form the selected feature subset.
Supervised vs. Unsupervised Learning:
Computational Complexity: Supervised learning has less computational complexity; unsupervised learning is computationally more complex.
Model learning: In supervised learning it is not possible to learn larger and more complex models than in unsupervised learning; in unsupervised learning it is possible to learn larger and more complex models than in supervised learning.
Training data: In supervised learning, training data is used to infer the model; in unsupervised learning, labelled training data is not used.
Test of model: We can test a supervised model; we cannot test an unsupervised model in the same way.
Example: Optical Character Recognition (supervised); finding a face in an image (unsupervised).
Bayesian statistics is a statistical paradigm that uses probability to represent uncertainty in our
inferences. Unlike traditional (frequentist) statistics, which only uses data from the current study,
Bayesian statistics incorporates prior knowledge or beliefs along with current data to make
statistical inferences.
Key Concepts
1. Bayes' Theorem
o Formula: p(θ|y) = p(y|θ) p(θ) / p(y)
o Explanation: This theorem updates the probability estimate for a parameter θ given new data y. Here,
o p(θ|y) is the posterior probability,
o p(y|θ) is the likelihood,
o p(θ) is the prior probability,
o and p(y) is the marginal likelihood or evidence.
(A small numeric sketch appears under "Example Application" below.)
2. Prior Distribution (p(θ)):
o What it is: Your initial belief or knowledge about the parameter before seeing the
current data. For example, in medical research, this could be based on past studies
or expert knowledge.
3. Likelihood (p(y|θ)):
o What it is: The probability of the observed data y given a parameter value θ; it is derived from the data and the statistical model used.
4. Posterior Distribution (p(θ|y)):
o What it is: Your updated belief about the parameter after considering the new
data. It combines the prior distribution and the likelihood to give a complete
picture of what the parameter might be, considering both past and present
information.
Example Application
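As a hypothetical example application (the numbers are illustrative, not from the source), the posterior in a simple disease-screening setting can be computed directly from Bayes' theorem:

# Hypothetical screening example: θ ∈ {disease, healthy}.
prior = {"disease": 0.01, "healthy": 0.99}          # p(θ): prior belief
likelihood = {"disease": 0.95, "healthy": 0.05}     # p(y=positive | θ)

# Evidence p(y): total probability of a positive test result.
evidence = sum(prior[t] * likelihood[t] for t in prior)

# Posterior p(θ | y=positive) = p(y|θ) p(θ) / p(y).
posterior = {t: prior[t] * likelihood[t] / evidence for t in prior}
print(posterior)   # roughly {'disease': 0.16, 'healthy': 0.84}

Even with a fairly accurate test, the low prior probability of disease keeps the posterior probability of disease modest, which illustrates how the prior and likelihood combine.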
Computational Methods: In practice the posterior distribution is usually approximated numerically, most commonly with Markov chain Monte Carlo (MCMC) sampling or variational methods.
Applications
1. Policy Decisions
o Example: The FDA might use Bayesian methods to incorporate various sources of
evidence when deciding on the approval of a new medical device.
2. Moderate-size Problems
o Example: Hierarchical models in meta-analysis, where results from multiple
studies are combined, accounting for variability between studies.
3. Large Complex Models
o Example: Bayesian methods in image processing or genetics, where traditional
methods fail due to the complexity and size of the data.
Advantages
Incorporates Prior Knowledge: Allows the inclusion of prior information, making the
analysis more robust and contextually relevant.
Updates with New Data: Flexible framework that can continuously update the parameter
estimates as new data becomes available.
Handles Complex Models: Capable of dealing with complex models and multiple
sources of variability through hierarchical modeling.
Challenges
Subjectivity in Priors: Choosing the prior distribution can be subjective and may
influence the results significantly.
Computational Intensity: Bayesian methods, especially with large datasets or complex
models, can be computationally demanding.
Sensitivity Analysis: Requires thorough sensitivity analysis to ensure that the results are
not unduly influenced by the choice of priors or the likelihood model.
KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and finds intense application in
pattern recognition,
data mining, and
intrusion detection.
Euclidean Distance
This is nothing but the Cartesian distance between two points lying in the plane/hyperplane. Euclidean distance can also be visualized as the length of the straight line that joins the two points under consideration. This metric helps us calculate the net displacement between the two states of an object.
Manhattan Distance
Manhattan Distance metric is generally used when we are interested in the total
distance traveled by the object instead of the displacement. This metric is calculated
by summing the absolute difference between the coordinates of the points in n-
dimensions.
Minkowski Distance
We can say that the Euclidean, as well as the Manhattan distance, are special cases
of the Minkowski distance
For a given data point, compute the distance between this point and all other points in the
training dataset. Common distance metrics include:
o Euclidean distance: d(p, q) = √(Σᵢ (pᵢ − qᵢ)²)
o Manhattan distance: d(p, q) = Σᵢ |pᵢ − qᵢ|
o Minkowski distance: a generalized form that includes the Euclidean and Manhattan distances as special cases.
Sort all the distances and identify the k data points in the training set that are closest to the given
data point.
Step 4: Voting for Classification or Taking Average for Regression
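A minimal from-scratch KNN classifier following the steps above, in plain Python (the toy points and k value are illustrative; math.dist requires Python 3.8 or later).

import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: feature tuple."""
    # Compute the Euclidean distance to every training point and sort them.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    # Majority vote among the k nearest neighbours (for regression, average instead).
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_predict(train, (2, 2), k=3))   # "A"
print(knn_predict(train, (5, 5), k=3))   # "B"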
Decision Tree
Decision trees are a popular and powerful tool used in various fields such
as machine learning, data mining, and statistics. They provide a clear and
intuitive way to make decisions based on data by modeling the relationships
between different variables.
1. Root Node: Represents the entire dataset and the initial decision to be
made.
2. Internal Nodes: Represent decisions or tests on attributes. Each internal
node has one or more branches.
3. Branches: Represent the outcome of a decision or test, leading to
another node.
4. Leaf Nodes: Represent the final decision or prediction. No further splits
occur at these nodes.
Pruning
To overcome overfitting, pruning techniques are used. Pruning reduces the size of
the tree by removing nodes that provide little power in classifying instances. There are
two main types of pruning:
Pre-pruning (Early Stopping): Stops the tree from growing once it meets certain
criteria (e.g., maximum depth, minimum number of samples per leaf).
Post-pruning: Removes branches from a fully grown tree that do not provide
significant power.
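A minimal scikit-learn sketch showing both pruning styles described above (assuming scikit-learn is available; the dataset and parameter values are illustrative, not from the source).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning (early stopping): limit depth and leaf size while growing the tree.
pre_pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
pre_pruned.fit(X_tr, y_tr)

# Post-pruning: grow a full tree, then prune it back with cost-complexity pruning.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post_pruned.fit(X_tr, y_tr)

print("pre-pruned accuracy :", pre_pruned.score(X_te, y_te))
print("post-pruned accuracy:", post_pruned.score(X_te, y_te))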
Applications of Decision Trees
Business Decision Making: Used in strategic planning and resource allocation.
Healthcare: Assists in diagnosing diseases and suggesting treatment plans.
Finance: Helps in credit scoring and risk assessment.
Marketing: Used to segment customers and predict customer behavior.
INTRODUCTION:
Support Vector Machines (SVMs) are a type of supervised learning algorithm that can
be used for classification or regression tasks. The main idea behind SVMs is to find a
hyperplane that maximally separates the different classes in the training data. This is
done by finding the hyperplane that has the largest margin, which is defined as the
distance between the hyperplane and the closest data points from each class. Once
the hyperplane is determined, new data can be classified by determining on which
side of the hyperplane it falls. SVMs are particularly useful when the data has many
features, and/or when there is a clear margin of separation in the data.
What are Support Vector Machines?
Support Vector Machine (SVM) is a relatively simple Supervised Machine Learning Algorithm used for classification and/or regression. It is more often preferred for classification but is sometimes very useful for regression as well. Basically, SVM finds a hyperplane that creates a boundary between the types of data. In 2-dimensional space, this hyperplane is nothing but a line. In SVM, we plot each data item in the dataset in an N-dimensional space, where N is the number of features/attributes in the data, and then find the optimal hyperplane to separate the data. From this it follows that, inherently, SVM can only perform binary classification (i.e., choose between two classes). However, there are various techniques to use it for multi-class problems.
Support Vector Machine for Multi-Class Problems
To perform SVM on multi-class problems, we can create a binary classifier for each class of the data. The two results of each classifier will be:
The data point belongs to that class, OR
The data point does not belong to that class.
For example, in a set of fruit classes, to perform multi-class classification we can create a binary classifier for each fruit. For, say, the 'mango' class, there will be a binary classifier to predict whether it IS a mango OR it is NOT a mango. The classifier with the highest score is chosen as the output of the SVM.
SVM for Complex (Non-Linearly Separable) Data
SVM works very well without any modifications for linearly separable data. Linearly separable data is any data that can be plotted in a graph and separated into classes using a straight line.
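A minimal scikit-learn sketch of the one-per-class scheme described above, using a one-vs-rest wrapper around a linear SVM (the dataset and parameters are illustrative, not from the source).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)          # 3 classes, 4 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One binary SVM per class ("is this class" vs "is not this class");
# the class whose classifier gives the highest score wins.
clf = OneVsRestClassifier(SVC(kernel="linear", C=1.0))
clf.fit(X_tr, y_tr)

print("accuracy:", clf.score(X_te, y_te))
print("number of binary classifiers:", len(clf.estimators_))   # one per class: 3

A non-linear kernel (e.g., kernel="rbf") could be substituted for the linear kernel when the data is not linearly separable.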
Applications
Face detection
Classification of images
Bioinformatics
Handwriting recognition
Advantages
It is memory efficient, since the decision function uses only a subset of the training points (the support vectors).
It is very effective when the number of dimensions is greater than the number of observations.
Disadvantages
Unsupervised Learning
Unsupervised learning is where you have unlabeled data (or no target variable) in the
dataset.
The goal of Unsupervised Learning Algorithms is to find some structure in the dataset.
These are called unsupervised learning algorithms because, unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left on their own to discover and present the interesting structure in the data.
Unsupervised Learning = Learning without labels
Why clustering?
Clustering is a popular unsupervised machine learning technique that groups similar data
points together based on their characteristics. By organizing data into clusters, we can uncover
hidden insights and make predictions about future data points.
Clustering - Divide and Rule
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar (in some sense) to each other than to those
in other groups (clusters).
Clustering
Clustering can be considered the most important unsupervised learning problem. It deals with
finding a structure in a collection of unlabelled data.
A loose definition of clustering could be “the process of organizing objects into groups
whose members are similar in some way”. A cluster is therefore a collection of objects
which are “similar” between them and are “dissimilar” to the objects belonging to other
clusters.
Types of Hierarchical Clustering: agglomerative (bottom-up) and divisive (top-down). The steps below illustrate agglomerative clustering.
Steps:
Consider each letter (A-F) as a single cluster and calculate the distance of one cluster from all the other clusters.
In the second step, comparable clusters are merged together to form a
single cluster. Let’s say cluster (B) and cluster (C) are very similar to each
other therefore we merge them in the second step similarly to cluster (D)
and (E) and at last, we get the clusters [(A), (BC), (DE), (F)]
We recalculate the proximity according to the algorithm and merge the
two nearest clusters([(DE), (F)]) together to form new clusters as [(A),
(BC), (DEF)]
Repeating the same process; The clusters DEF and BC are comparable
and merged together to form a new cluster. We’re now left with clusters
[(A), (BCDEF)].
At last, the two remaining clusters are merged together to form a single
cluster [(ABCDEF)].
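A minimal sketch of the agglomerative procedure above using SciPy's hierarchical-clustering routines (the 2-D points standing in for A-F are illustrative, and single linkage is just one possible choice of cluster-distance definition).

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six illustrative points standing in for A-F.
points = np.array([[0.0, 0.0],   # A
                   [5.0, 5.0],   # B
                   [5.2, 5.1],   # C
                   [9.0, 0.0],   # D
                   [9.1, 0.2],   # E
                   [9.5, 0.8]])  # F

# Start with every point as its own cluster and repeatedly merge the two
# closest clusters (single linkage = distance between nearest members).
Z = linkage(points, method="single")
print(Z)  # each row: the two merged clusters, their distance, and the new cluster size

# Cut the merge tree to obtain, e.g., 3 flat clusters.
print(fcluster(Z, t=3, criterion="maxclust"))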
K-Means Clustering
Output:
A set of K clusters
Method:
1. Randomly select K objects from the dataset (D) as the initial cluster centres (C).
2. (Re)assign each object to the cluster whose mean it is most similar to (i.e., the nearest cluster centre).
3. Update the cluster means, i.e., recalculate the mean of each cluster with the updated assignments.
4. Repeat Steps 2 and 3 until no change occurs.
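A minimal sketch of the k-means method above, in plain Python with NumPy (the data points and K are illustrative, not from the source).

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick K objects from the dataset at random as the initial cluster centres.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: (re)assign each object to the nearest cluster centre.
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: update each centre to the mean of its assigned objects
        # (keep the old centre if a cluster happens to be empty).
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        # Step 4: stop when the centres no longer change.
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0], [8.8, 1.2]])
labels, centres = kmeans(X, k=3)
print(labels)    # cluster index assigned to each point
print(centres)   # final cluster means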