Computer Vision

Unit 1

Computer vision is challenging for several reasons:

1. Loss of Information in 3D to 2D: Cameras and eyes capture images in 2D, losing depth
information from the real world. This makes it hard to determine the size and distance of
objects accurately. For instance, a small object close to the camera might look similar in
size to a larger object far away.
2. Interpretation of Images: Humans naturally interpret images based on past experiences
and knowledge. However, teaching computers to understand images requires encoding
vast amounts of information and reasoning abilities, which is still limited in practice.
3. Mapping from Image Data to Models: Interpreting images involves mapping them to
logical models representing real-world concepts. This requires understanding syntax
(rules) and semantics (meanings) similar to language processing.
4. Dealing with Noise and Uncertainty: Real-world measurements, including images, are
prone to noise and uncertainty. Computer vision algorithms must account for this
uncertainty, often using complex mathematical tools like probability theory.
5. Handling Large Amounts of Data: Images and videos contain vast amounts of data,
which can strain processing and memory resources. While technology has improved,
efficient processing remains crucial, especially for real-time applications.
6. Complex Image Formation Physics: The brightness of objects in images depends on
various factors like light sources, surface geometry, and reflectance properties.
Reconstructing these properties accurately from images is challenging due to the ill-
posed nature of the inverse tasks involved.
7. Balancing Local and Global Perspectives: Most image analysis algorithms focus on
local regions, limiting their ability to understand the broader context of a scene.
Incorporating global context is crucial for accurate interpretation, posing a longstanding
challenge in computer vision.

The motivation behind the study of computer vision stems from the desire to enable machines to
perceive and understand the visual world in a manner similar to humans. Here are some key
motivations:

1. Human-Like Perception: Vision is one of the primary senses through which humans
interact with the world. By giving computers the ability to "see," we aim to replicate the
human ability to understand and interpret visual information.
2. Automation and Efficiency: Many tasks that rely on visual perception, such as
inspection, surveillance, and monitoring, are currently performed by humans. Automating
these tasks through computer vision can increase efficiency, reduce costs, and improve
accuracy.
3. Safety and Security: Computer vision systems can enhance safety and security by
detecting anomalies, identifying objects or individuals, and monitoring environments in
real-time. Applications include detecting intruders in secure areas, monitoring traffic for
safety violations, and identifying potential hazards in industrial settings.
4. Medical Applications: Computer vision plays a crucial role in medical imaging, aiding
in the diagnosis and treatment of various diseases and conditions. From detecting tumors
in medical scans to tracking the movement of surgical instruments, computer vision
technology has the potential to revolutionize healthcare.
5. Improved User Interfaces: Advancements in computer vision technology can lead to
more intuitive and immersive user interfaces. For example, gesture recognition systems
can allow users to interact with devices using hand movements, while augmented reality
applications can overlay digital information onto the physical world.
6. Scientific Exploration: Computer vision is essential for scientific exploration, enabling
researchers to analyze and interpret data from diverse sources such as telescopes,
microscopes, and satellites. It allows for the extraction of valuable insights from large
volumes of visual data, contributing to advancements in fields such as astronomy,
biology, and environmental science.
7. Enhanced Accessibility: By enabling computers to understand visual information,
computer vision technology can improve accessibility for individuals with disabilities.
For example, it can facilitate text recognition for the visually impaired or enable gesture-
based interfaces for individuals with mobility impairments.

Image representation in computer vision can be categorized into different levels based on the
abstraction of the data. Here are four possible levels of image representation suitable for image
analysis problems to detect and classify objects:

Let's break down the process of going from objects or scenes to images and then to features, and
finally back to objects, using the specified levels of representation:

1. From Objects or a Scene to 2D Image:


o At the initial stage, we have real-world objects or scenes that we want to capture
and analyze.
o These objects or scenes are typically three-dimensional and exist in the physical
world.
o Using imaging devices such as cameras, these objects or scenes are captured and
projected onto a 2D plane to create a 2D image.
o The resulting 2D image represents a flattened version of the original objects or
scenes, with spatial information preserved but depth information lost.
2. From 2D Image to Digital Image:
o The 2D image captured by the imaging device is then converted into a digital
format.
o This conversion involves digitizing the image, where each pixel in the 2D image
is represented by numerical values corresponding to its brightness or color.
o The digital image consists of a grid of pixels, with each pixel containing digital
data representing its color or intensity.
3. From Digital Image to Features:
o Once we have the digital image, we can extract various features from it through
image processing techniques.
o These features can be low-level features such as edges, corners, and textures,
which capture basic visual patterns in the image.
o Mid-level features such as regions, shapes, and gradients can also be extracted,
providing more structured information about the image content.
o Additionally, scale-invariant interest points may be identified, representing
distinctive locations in the image that are robust to changes in scale.
4. From Features to Objects:
o Finally, using the extracted features, we can analyze and interpret the image to
identify and understand the objects or scenes depicted.
o Features such as edges, regions, and textures can be used to detect and classify
objects present in the image.
o By matching features to known object representations or using machine learning
algorithms, we can recognize and label objects in the image.
o The process of going from features to objects involves high-level image
understanding and interpretation, where the extracted features are used to infer the
presence and properties of objects in the scene.

The hierarchy of image representation levels encompasses both low-level and high-level image
processing techniques, each serving different purposes in the analysis and understanding of
images. Here's a discussion of the hierarchy with low and high-level image processing:
1. Low-Level Image Processing:
o Data Input: Begins with raw image data captured by sensors such as cameras, represented as matrices of pixel values.
Example: The process begins with a raw image of a person's face captured by a camera.
o Image Pre-processing: Includes tasks like noise suppression and enhancement of relevant object features. Techniques may involve filtering, smoothing, and contrast adjustment.
Example: Noise in the image, such as camera sensor noise or lighting variations, is suppressed to enhance the quality of the image.
o Edge Extraction: Identifies abrupt changes in pixel intensity, often indicating object boundaries or features.
Example: Edges of facial features such as the eyes, nose, and mouth are detected using edge detection algorithms. This helps in identifying the boundaries of facial components.
o Image Segmentation: Separates objects from the background and from each other, possibly distinguishing total and partial segmentation. Partial segmentation focuses on extracting cues useful for higher-level processing.
Example: The image is segmented to separate the face from the background, isolating the facial region for further analysis.
o Feature Extraction: Identifies low-level features such as corners, edges, textures, or keypoints, which are essential for subsequent analysis.
Example: Low-level features like corners, edges, and textures are extracted from the facial region. For instance, the eyes may be represented by keypoints indicating their position and orientation.
o Object Description and Classification: Describes objects based on their features and classifies them into predefined categories. Techniques may include template matching or statistical classifiers. (A minimal code sketch of these low-level steps appears after this outline.)
2. High-Level Image Processing (Image Analysis):
o Data Abstraction: Involves extracting relevant information from the image data, reducing the data quantity considerably.
Example: Relevant information such as facial features, shape, and proportions is extracted from the low-level features, reducing the data to a symbolic representation of the face.
o Knowledge Incorporation: Utilizes domain-specific or general knowledge about the image content, including object size, shape, and spatial relationships.
Example: Domain-specific knowledge about facial structures and proportions is utilized. For instance, the knowledge that eyes are typically located above the nose and mouth can guide the analysis.
o Symbolic Representation: Represents high-level data in symbolic form, facilitating reasoning and decision-making processes.
Example: The facial features are represented symbolically, such as by a set of key facial landmarks or descriptors representing characteristics like eye color or facial symmetry.
o Object Recognition and Understanding: Involves identifying and interpreting objects or scenes in the image based on their characteristics and relationships. This may include tasks like object detection, tracking, and scene understanding.
Example: The symbolic representation is used to recognize the face and determine its identity, comparing the extracted features with a database of known faces.
o Feedback and Iterative Processing: Incorporates feedback from high-level processing to guide low-level operations and iteratively refine the understanding of the image content.
Example: Feedback from the recognition process may be used to refine the initial feature extraction or segmentation steps. For example, if a face is misidentified, the edge detection parameters may be adjusted to improve accuracy.
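The low-level stages in the outline above can be sketched with a few standard operations. The snippet below is a minimal illustration assuming OpenCV and NumPy are available; the file name "face.jpg" and all parameter values are placeholders rather than values taken from these notes.

```python
import cv2
import numpy as np

# Data input: read a raw image as a matrix of pixel values (grayscale).
img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# Pre-processing: suppress noise with Gaussian smoothing.
smoothed = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)

# Edge extraction: detect abrupt intensity changes (e.g., facial feature boundaries).
edges = cv2.Canny(smoothed, threshold1=50, threshold2=150)

# Simple segmentation: separate foreground from background by thresholding.
_, mask = cv2.threshold(smoothed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Feature extraction: corner-like keypoints as low-level features.
corners = cv2.goodFeaturesToTrack(smoothed, maxCorners=100,
                                  qualityLevel=0.01, minDistance=5)

print(edges.shape, mask.shape, None if corners is None else corners.shape)
```

The parameter choices (kernel size, Canny thresholds, corner count) are illustrative; in practice they are tuned to the image content.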
Unit 2

Here are the definitions of the terms mentioned in the text:

1. Edge:
o An edge in image processing refers to a significant change in intensity or color
between neighboring pixels in an image.
o It represents a local property of a pixel and its immediate neighborhood,
indicating the boundary or transition between objects or regions in an image.
o Edges are typically detected using gradient-based methods, where the magnitude
and direction of the gradient of the image function indicate how fast the image
intensity varies in the vicinity of a pixel.
2. Crack Edge:
o A crack edge is a structure between pixels in an image that is similar to cellular
complexes but is more pragmatic and less mathematically rigorous.
o Each pixel has four crack edges attached to it, defined by its relation to its 4-
neighbors.
o The direction of a crack edge is determined by the increasing brightness between
the relevant pair of pixels, and its magnitude is the absolute difference in
brightness.
3. Border:
o In image analysis, the border of a region refers to the set of pixels within the
region that have one or more neighbors outside the region.
o It represents the boundary of the region and is composed of pixels that transition
between the region and its surroundings.
o The inner border corresponds to pixels within the region, while the outer border
corresponds to pixels at the boundary of the background (complement) of the
region.
4. Convex Region:
o A convex region is a region in an image where any two points within it can be
connected by a straight line segment, and the entire line lies within the region.
o It is characterized by its convexity, meaning that it does not have any indentations
or concavities.
o Convex regions can be identified by their property of allowing straight-line
connections between any two points within them.
5. Non-Convex Region:
o A non-convex region is a region in an image that does not satisfy the criteria of
convexity.
o It may contain indentations or concavities, making it impossible to draw a straight line between certain points within the region without crossing its boundary.
6. Contrast: Difference in luminance or color that helps distinguish an object from its background.
7. Acuity: Sharpness or clarity of vision, essential for seeing fine details.
8. Perceptual Grouping: The visual system's method of organizing elements into coherent groups based on principles like proximity, similarity, continuity, closure, and common fate.

1. Edge:

 Definition: An edge is a significant change in intensity or color between neighboring pixels in an image. It marks the boundary or transition between different objects or regions.
 Local Property: The edge represents a local property of a pixel and its immediate neighborhood, indicating where one region ends and another begins.
 Detection: Edges are typically detected using gradient-based methods. The gradient of the image function (which measures how the intensity changes) is computed. The magnitude of the gradient indicates how sharp the change is, and the direction shows the orientation of the edge.
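As a concrete illustration of gradient-based edge detection, the sketch below convolves a grayscale image with Sobel kernels and derives the gradient magnitude and direction. It is a minimal NumPy-only sketch of the idea, not a production edge detector; the toy image is made up for illustration.

```python
import numpy as np

def sobel_gradients(img: np.ndarray):
    """Return gradient magnitude and direction (radians) of a 2D grayscale image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal changes
    ky = kx.T                                                          # vertical changes
    h, w = img.shape
    gx, gy = np.zeros((h, w)), np.zeros((h, w))
    padded = np.pad(img.astype(float), 1, mode="edge")
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    magnitude = np.hypot(gx, gy)      # how sharp the intensity change is
    direction = np.arctan2(gy, gx)    # orientation of the edge
    return magnitude, direction

# Toy image: a dark left half and a bright right half produce a strong vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 255
mag, ang = sobel_gradients(img)
print(mag.max(), ang[4, 4])
```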

2. Crack Edge:

 Definition: A crack edge is a structure that connects pixels in a way that is similar to
cellular complexes but is more practical and less mathematically rigorous.
 Relation to Pixels: Each pixel has four crack edges, each connecting it to one of its four
neighbors (up, down, left, right).
 Direction and Magnitude: The direction of a crack edge is determined by the increase in
brightness between the connected pixels, and the magnitude is the absolute difference in
brightness. This helps in identifying the edge based on intensity changes.

3. Border:

 Definition: The border of a region in an image consists of the pixels that have one or
more neighboring pixels outside the region.
 Boundary Representation: It represents the boundary of the region, marking the
transition between the region and its surroundings.
 Inner and Outer Borders: The inner border includes pixels within the region that touch
the boundary, while the outer border consists of pixels at the boundary of the region’s
complement (background).

4. Convex Region:

 Definition: A convex region is a region where any two points within it can be connected
by a straight line, and the entire line segment lies within the region.
 Characteristic: It has no indentations or concavities, meaning it is "bulged out" without
any inward curves.
 Identification: Convex regions are identified by their property of allowing straight-line
connections between any two points within them, ensuring the line stays inside the
region.
5. Non-Convex Region:

 Definition: A non-convex region is a region that does not meet the criteria of convexity.
 Indentations/Concavities: It may have indentations or concavities, making it impossible
to connect certain points within the region with a straight line without crossing the
boundary.
 Complex Shape: Non-convex regions have more complex shapes due to these inward
curves and indentations.

Here are the definitions of the terms related to image noise:

 Additive Noise: This type of noise is added to the original signal as an additional component.
It alters the original signal by adding random fluctuations or disturbances. Additive noise affects
all parts of the signal equally and independently of the signal's magnitude. An example of
additive noise is Gaussian noise, where the noise values are sampled from a Gaussian
distribution.

Ex. Grainy image.

 Multiplicative Noise: Multiplicative noise alters the original signal by scaling it with a
random factor. Unlike additive noise, multiplicative noise affects different parts of the signal
differently, depending on the signal's magnitude. An example of multiplicative noise is television
raster degradation, where the noise level varies across the image depending on the TV lines.

 Gaussian Noise: Gaussian noise is a type of additive noise characterized by values sampled
from a Gaussian (normal) distribution. It is often used to model random fluctuations or errors in
signals and measurements.

 Impulsive Noise: Impulsive noise, also known as spike noise, introduces sudden and extreme
changes in signal values. It corrupts the signal with individual noisy pixels whose brightness
significantly differs from their neighborhood. Salt-and-pepper noise, where individual pixels are
corrupted with either white or black values, is a common example of impulsive noise.

 Salt-and-Pepper Noise: Salt-and-pepper noise is a specific type of impulsive noise where random pixels in an image are corrupted with either the maximum or minimum intensity values, resembling salt and pepper sprinkled on the image. This type of noise can particularly affect binary images, introducing false contours and artifacts.

1. Additive Noise:
 Definition: Additive noise is noise that is added to the original signal as an additional
component. It introduces random fluctuations or disturbances to the signal, affecting all
parts of the signal equally and independently of the signal's magnitude.
 Example: An image of a clear sky with Gaussian noise added. The noise values are
sampled from a Gaussian distribution, resulting in a speckled appearance across the entire
image. Each pixel in the image has a small, random value added to it, which can make the
image look grainy.

For instance, when the classic "Lenna" test image is altered by adding Gaussian noise, the result is a grainy effect throughout the image.

2. Multiplicative Noise:

 Definition: Multiplicative noise scales the original signal with a random factor. Unlike
additive noise, it affects different parts of the signal differently, depending on the signal's
magnitude.
 Example: An image taken with a camera sensor that has varying sensitivity across the
sensor's surface, resulting in a pattern where the noise level changes depending on the
brightness of different parts of the image.

With multiplicative noise, the visible noise level varies across the image rather than being uniform.

3. Gaussian Noise:

 Definition: Gaussian noise is a type of additive noise where the noise values are sampled
from a Gaussian (normal) distribution. It is often used to model random fluctuations or
errors in signals and measurements.
 Example: A photograph of a dark scene with Gaussian noise added. The noise introduces
random variations in pixel values, which can appear as small white or black dots
throughout the image.

A typical illustration is a photograph of a boat with Gaussian noise added, creating a speckled effect across the image.

4. Impulsive Noise:
 Definition: Impulsive noise, also known as spike noise, introduces sudden and extreme
changes in signal values. It corrupts the signal with individual noisy pixels whose
brightness significantly differs from their neighborhood.
 Example: A scanned document with impulsive noise. Certain pixels in the image are
randomly replaced with either very bright or very dark values, making it look like there
are white or black specks scattered across the document.

A text document corrupted with impulsive noise shows random bright and dark specks scattered across it.

5. Salt-and-Pepper Noise:

 Definition: Salt-and-pepper noise is a specific type of impulsive noise where random pixels in an image are corrupted with either the maximum or minimum intensity values, resembling salt and pepper sprinkled on the image.
 Example: A digital image with salt-and-pepper noise added. Random pixels in the image
are set to either white (salt) or black (pepper), creating a distinctive "salt and pepper"
appearance.
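To make the additive and impulsive noise models above concrete, the following sketch adds Gaussian noise and salt-and-pepper noise to a grayscale image array using NumPy; the noise levels and the flat test image are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Additive noise: independent Gaussian values added to every pixel."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img: np.ndarray, p: float = 0.02) -> np.ndarray:
    """Impulsive noise: a fraction p of pixels set to black (pepper) or white (salt)."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < p / 2] = 0          # pepper
    noisy[mask > 1 - p / 2] = 255    # salt
    return noisy

img = np.full((4, 4), 128, dtype=np.uint8)   # flat gray test image
print(add_gaussian_noise(img))
print(add_salt_and_pepper(img, p=0.25))
```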

1. Euclidean Distance:

 Definition: Euclidean distance is a measure of proximity between two points in a multidimensional space.
 Calculation: It is calculated as the square root of the sum of squared differences between
corresponding coordinates.
 Use Cases: Widely used in fields such as machine learning, pattern recognition, and data
mining.
 Example: Calculating the distance between two points in a 2D or 3D space.

2. Manhattan Distance (City Block Distance):

 Definition: Manhattan distance calculates the distance between two points by summing
the absolute differences of their coordinates.
 Calculation: It is particularly useful in grid-based environments where movement is
restricted to horizontal and vertical paths.
 Use Cases: Commonly used in routing algorithms, computer vision, and robotics.
 Example: Determining the distance between two locations on a city grid, where
movement is constrained to streets.

3. Hamming Distance:
 Definition: Hamming distance measures the number of positions at which corresponding
elements of two equal-length strings are different.
 Calculation: It is often used for comparing strings of equal length, particularly in
genetics and error correction.
 Use Cases: Widely applied in DNA sequence analysis, data transmission, and error
detection.
 Example: Comparing binary strings to identify errors in transmission or mutations in
genetic sequences.
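A small sketch computing the three distances just defined; the points and strings are chosen only for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # square root of summed squared differences
manhattan = np.sum(np.abs(a - b))           # sum of absolute coordinate differences

s1, s2 = "10110", "11100"                   # equal-length binary strings
hamming = sum(c1 != c2 for c1, c2 in zip(s1, s2))  # positions where the symbols differ

print(euclidean, manhattan, hamming)        # approx. 3.606, 5.0, 2
```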

Color constancy refers to the ability of the human visual system (and ideally, artificial perception
systems) to perceive the colors of objects as relatively stable and consistent under varying
lighting conditions. This means that even when an object is viewed under different illuminations,
such as sunlight or artificial indoor lighting, humans tend to perceive the colors of the object as
remaining the same.

The concept of color constancy is essential for image processing and analysis for several reasons:

1. Perceptual Consistency: Color constancy ensures that objects in images are perceived
consistently across different lighting conditions, allowing viewers to accurately interpret
and understand the content of the image. Without color constancy, objects may appear to
change color dramatically depending on the lighting, leading to confusion or
misinterpretation.
2. Color Correction: In image processing, color constancy algorithms can be used to
correct for variations in lighting and white balance, ensuring that colors are accurately
represented in the final image. This is particularly important in applications such as
photography, where accurate color reproduction is crucial for conveying the intended
visual information.
3. Object Recognition and Segmentation: Color constancy aids in object recognition and
segmentation by enabling algorithms to focus on the inherent color characteristics of
objects rather than being influenced by changes in illumination. This allows for more
reliable and accurate identification of objects in images, which is important for tasks such
as automated surveillance, medical imaging, and autonomous navigation.
4. Human Perception Alignment: By mimicking the color constancy abilities of the
human visual system, artificial perception systems can produce images that are more
perceptually consistent with how humans perceive the world. This improves the usability
and interpretability of images generated by these systems, leading to better human-
computer interaction and visual communication.
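Many simple color-constancy and white-balance corrections rely on the gray-world assumption, i.e., that the average color of a scene is a neutral gray. That specific algorithm is not named in these notes; the NumPy sketch below is one common, minimal illustration of such a correction under that assumption.

```python
import numpy as np

def gray_world_balance(img: np.ndarray) -> np.ndarray:
    """Scale each RGB channel so its mean matches the overall mean intensity."""
    img = img.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)   # mean of R, G, B
    gray_mean = channel_means.mean()                  # target neutral level
    gains = gray_mean / channel_means                 # per-channel correction factors
    balanced = img * gains                            # apply the gains
    return np.clip(balanced, 0, 255).astype(np.uint8)

# Toy image with a strong bluish cast: after balancing, the channel means are roughly equal.
img = np.dstack([np.full((4, 4), 80),
                 np.full((4, 4), 90),
                 np.full((4, 4), 160)]).astype(np.uint8)
print(gray_world_balance(img).reshape(-1, 3).mean(axis=0))
```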
Photosensitive cameras utilize photosensitive sensors to capture images by converting incoming
light into electrical signals. These sensors can be broadly categorized into two types: those based
on photo-emission principles and those based on photovoltaic principles.

Photo-emission sensors exploit the photoelectric effect, where incoming photons cause the
emission of electrons in a material. This principle is most evident in metal-based sensors and has
been historically used in technologies like photomultipliers and vacuum tube TV cameras.

Photovoltaic sensors, on the other hand, utilize semiconductor materials. Incoming photons
elevate electrons from the valence band to the conduction band, generating an electric current
proportional to the intensity of the incident light. Various semiconductor devices such as
photodiodes, avalanche photodiodes, photoresistors, and Schottky photodiodes are based on this
principle.

Among semiconductor photosensitive sensors, two widely used types are CCDs (Charge-Coupled Devices) and CMOS (Complementary Metal Oxide Semiconductor) sensors. CCDs,
developed in the 1960s and 1970s, initially dominated the market due to their maturity and
widespread usage. CMOS technology, while developed around the same time, became
technologically mastered in the 1990s.

Advantages and disadvantages of CMOS sensors:

Advantages:

1. Lower Power Consumption: CMOS sensors typically consume less power compared to
CCDs, making them suitable for battery-operated devices like smartphones and portable
cameras.
2. Random Access: Each pixel in a CMOS sensor has its own charge-to-voltage conversion
circuitry, allowing for individual pixel readout and manipulation. This enables faster
readout speeds and on-chip processing capabilities.
3. Integration with Other Technologies: CMOS sensors can be integrated with other
components on the same chip, such as processors and memory, leading to compact and
cost-effective "smart" camera solutions.
4. High Sensitivity: CMOS sensors exhibit high sensitivity, making them capable of
capturing images in low-light conditions.

Disadvantages:

1. Higher Noise Levels: CMOS sensors generally have higher levels of noise compared to
CCDs, which can affect image quality, especially in low-light environments.
2. Limited Dynamic Range: Although the dynamic range of CMOS sensors has improved, they may still have limitations in capturing scenes with extremely high contrast.
3. Complex Design: The integration of additional functionalities on the same chip can
increase design complexity and potentially reduce the area available for light capture,
impacting image quality.
Components and Working of Analog and Digital Cameras

Analog Cameras:

Analog cameras use photosensitive sensors, such as CCDs (Charge-Coupled Devices), to convert
incoming light into electrical signals. Analog cameras are traditional cameras that produce a
continuous video signal. Here’s a breakdown of their components and working:

1. CCD (Charge-Coupled Device):


o This is the image sensor that converts light into an electronic signal. It captures the
intensity of light at each pixel and generates a continuous analog signal.

2. Optics (Lens):
o The lens focuses the light onto the CCD sensor, forming an image.

3. Q/U Conversion:
o Converts the luminance and chrominance information into a digital format, preparing
the data for further processing.

4. AGC (Automatic Gain Control):


o This block automatically adjusts the gain of the camera according to the light intensity in
the scene. It balances the sensitivity needed in low light areas with the need to avoid
saturation in bright areas.

5. γ (Gamma) Correction: This block adjusts the brightness to correct the display on older
TV screens (CRTs), making sure the image looks right by balancing out any distortions.

6. Video Circuits:
o These circuits add synchronization pulses to the signal. These pulses are necessary for
the proper display of the video signal, ensuring that the image is displayed row by row.

7. High-Pass Filter:
o This filter compensates for the loss of high frequencies in the optical part, improving the
clarity and sharpness of the image.

8. Analog Filters:
o These filters process the analog signal to remove noise and improve signal quality.

9. Digitizer (Frame Grabber):


o This device converts the analog video signal into a digital format that can be processed
by a computer. It samples the signal and digitizes it.

10. Analog Cable:


o The coaxial cable transmits the analog video signal from the camera to the digitizer.

11. Computer Memory:


o Once digitized, the video data is stored in the computer's memory for further
processing.

Working:

 The lens focuses light onto the CCD sensor, generating an analog video signal.
 The AGC adjusts the gain based on the light intensity.
 The γ correction compensates for the nonlinear response of CRTs.
 Video circuits add synchronization pulses for proper display.
 High-pass and analog filters process the signal to improve quality.
 The analog signal is transmitted via a coaxial cable to the digitizer.
 The digitizer converts the analog signal into digital form and sends it to the computer for
processing.

 Advantages:
 Cost-effective
 Long cable lengths possible (up to 300 meters)
 Widely used in legacy systems

 Disadvantages:
 Susceptible to noise and interference during transmission
 Limited resolution and image quality compared to digital cameras
 Requires additional circuitry for synchronization and correction
Digital Cameras:

Digital cameras convert the captured light directly into digital data. Here’s a breakdown of their
components and working:

1. CCD (Charge-Coupled Device):


o This is the image sensor that converts light into an electronic signal, similar to analog
cameras.

2. Optics (Lens):
o The lens focuses light onto the CCD sensor to form an image.

3. AGC (Automatic Gain Control):


o Adjusts the gain based on the light intensity, similar to analog cameras.

4. γ (Gamma) Correction:
o Performs a nonlinear transformation of the intensity scale, if needed.

5. Analog-to-Digital (A/D) Converter:


o Converts the analog signal from the CCD into a digital format. This step digitizes the
intensity information captured by the CCD.

6. Q/U Conversion:
o Converts the luminance and chrominance information into a digital format, preparing
the data for further processing.

7. Digital Cable:
o Transmits the digital data to the computer. This can be done using USB, FireWire, or
other digital interfaces.

8. Computer Memory:
o Stores the digital video data for further processing.

Working:

 The lens focuses light onto the CCD sensor, generating an analog signal.
 The AGC adjusts the gain based on the light intensity.
 The γ correction adjusts the intensity scale if needed.
 The A/D converter digitizes the analog signal from the CCD.
 The Q/U conversion processes the luminance and chrominance information.
 The digital data is transmitted to the computer via a digital cable.
 The digital data is stored in the computer's memory for further processing.
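The γ-correction and A/D conversion steps described above can be illustrated numerically. The sketch below applies a power-law intensity transform and then quantizes the normalized signal to 8 bits; γ = 2.2 is a typical display value used here only as an assumption, not a figure from these notes.

```python
import numpy as np

def gamma_correct(signal: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Nonlinear transform of the intensity scale: out = in**(1/gamma), with in in [0, 1]."""
    return np.power(np.clip(signal, 0.0, 1.0), 1.0 / gamma)

def analog_to_digital(signal: np.ndarray, bits: int = 8) -> np.ndarray:
    """Quantize a normalized analog signal into 2**bits discrete code values."""
    levels = 2 ** bits - 1
    return np.round(signal * levels).astype(np.uint16)

analog = np.linspace(0.0, 1.0, 5)        # normalized sensor output samples
corrected = gamma_correct(analog)        # gamma correction
digital = analog_to_digital(corrected)   # A/D conversion to 8-bit codes
print(corrected.round(3), digital)
```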
1. Advantages:
o No transmission noise or signal degradation
o Higher resolution and image quality compared to analog cameras
o Easier integration with digital systems and devices
2. Disadvantages:
o Initially more expensive than analog cameras (although prices have dropped
significantly)
o Limited cable length for certain digital interfaces (e.g., FireWire)
o May require additional processing power for real-time image processing

Unit 3

The definitions, formulae, and theorems along with their respective notations:

1. Sample Space (S):
o Definition: The sample space, denoted by S, is the set of all possible outcomes of a statistical experiment.
o Notation: S = {outcome_1, outcome_2, ..., outcome_n}
2. Sample Point:
o Definition: Each outcome in a sample space is called a sample point, element, or member of the sample space.
o Notation: Individual sample points may be written as sample_1, sample_2, ...
3. Event:
o Definition: An event is a subset of a sample space, representing a collection of outcomes.
o Notation: An event is denoted by a capital letter, such as A or B.
4. Complement:
o Definition: The complement of an event A, denoted A' (or Aᶜ), is the event that A does not occur.
o Notation: A' or Aᶜ
o Formula: P(A') = 1 − P(A)
5. Intersection:
o Definition: The intersection of two events A and B, denoted by A ∩ B, is the event containing all elements that are common to both A and B.
o Notation: A ∩ B
o Probability: P(A ∩ B)
6. Mutually Exclusive:
o Definition: Events that have no outcomes in common are said to be disjoint or mutually exclusive.
o Notation: Two events A and B are mutually exclusive if A ∩ B = ∅ (where ∅ denotes the empty set).
7. Union:
o Definition: The union of two events A and B, denoted by A ∪ B, is the event containing all the elements that belong to either A or B or both.
o Notation: A ∪ B
o Formula: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
8. Multiplication Rule:
o Definition: If one operation can be performed in n1 ways, and for each of these ways a second operation can be performed in n2 ways, then the two operations can be performed together in n1 × n2 ways.
o Formula: n1 × n2
9. Permutation:
o Definition: A permutation is an arrangement of all or part of a set of objects; the arrangements are distinct.
o Notation: nPr
o Formula: nPr = n! / (n − r)!
10. Factorial Rule:
o Definition: The number of permutations of n distinct objects is denoted by n!.
o Notation: n!
o Formula: n! = n × (n − 1) × (n − 2) × ... × 2 × 1
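The formulas above translate directly into code. The minimal sketch below uses assumed example probabilities chosen only for illustration.

```python
import math

# Assumed example probabilities, for illustration only.
p_a, p_b, p_a_and_b = 0.4, 0.5, 0.2

p_a_complement = 1 - p_a              # P(A') = 1 - P(A)
p_a_or_b = p_a + p_b - p_a_and_b      # P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

def permutations(n: int, r: int) -> int:
    """nPr = n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)

print(p_a_complement, p_a_or_b)               # 0.6 0.7
print(permutations(5, 2), math.factorial(4))  # 20 24
```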

Types of features
In various fields such as mathematics, statistics, computer science, and machine
learning, features refer to individual measurable properties or characteristics of
data points or objects. Features play a crucial role in representing and analyzing
data. Here are some types of features commonly encountered across different
domains:

1. Numerical Features:
 Numerical features represent quantitative measurements or values. They can
be continuous (e.g., height, weight, temperature) or discrete (e.g., counts,
ratings).
2. Categorical Features:
 Categorical features represent qualitative attributes with distinct categories
or labels. They can be nominal (categories with no inherent order) or ordinal
(categories with a specific order).
3. Binary Features:
 Binary features are a special case of categorical features with only two
possible values, often represented as 0 and 1 (e.g., yes/no, true/false,
presence/absence).
4. Text Features:
 Text features represent textual data, such as documents, sentences, or words.
Text features may require preprocessing techniques like tokenization,
stemming, or vectorization for analysis.
5. Temporal Features:
 Temporal features represent time-related information, such as timestamps,
dates, or durations. They can be used to analyze trends, seasonality, or time-
based patterns in data.
6. Spatial Features:
 Spatial features represent geographic or spatial attributes, such as
coordinates, regions, or distances. They are commonly used in geographic
information systems (GIS) and location-based applications.
7. Image Features:
 Image features represent visual characteristics extracted from images, such
as colors, textures, shapes, or edges. Image features are essential for tasks
like object detection, recognition, and classification.
8. Audio Features:
 Audio features represent characteristics of sound signals, such as frequency,
amplitude, pitch, or duration. They are used in speech recognition, music
analysis, and audio processing.
9. Derived Features:
 Derived features are created by transforming or combining existing features.
Examples include polynomial features, logarithmic transformations, or
feature interactions.
10. Meta-Features:
 Meta-features describe properties or characteristics of other features. They
can include statistics (e.g., mean, standard deviation), properties of
distributions (e.g., skewness, kurtosis), or information gain measures.
Understanding the types of features present in a dataset is crucial for data
preprocessing, feature engineering, and selecting appropriate algorithms for
analysis or modeling tasks. Each type of feature may require different
preprocessing techniques and algorithms for effective analysis and interpretation.

Nominal Data, Ordinal Data, Interval-Valued Variables, and Ratio Variables


In statistics and data analysis, various types of data are classified based on their
properties and level of measurement. Here are the definitions and characteristics of
four common types of data:

1. Nominal Data:
 Nominal data consist of categories or labels that represent distinct groups or
classes. These categories have no inherent order or ranking.
 Examples: Gender (male, female), marital status (single, married, divorced),
types of fruits (apple, banana, orange).
 Nominal data can be represented using numbers, but the numbers are
arbitrary and do not have any quantitative significance.
2. Ordinal Data:
 Ordinal data represent categories with a specific order or ranking. The
intervals between categories may not be equal or well-defined.
 Examples: Educational levels (elementary, high school, college), Likert scale
responses (strongly disagree, disagree, neutral, agree, strongly agree),
ranking of movie ratings (1 star, 2 stars, 3 stars, etc.).
 While ordinal data have an inherent order, the differences between
categories may not be consistent or meaningful.
3. Interval-Valued Variables:
 Interval-valued variables represent numerical data where the differences
between values are meaningful and consistent, but there is no true zero
point.
 Examples: Temperature measured in Celsius or Fahrenheit, years (AD),
longitude and latitude coordinates.
 Interval-valued variables can be added, subtracted, and compared, but ratios
between values are not meaningful because there is no true zero point.
4. Ratio Variables:
 Ratio variables represent numerical data where there is a true zero point, and
ratios between values are meaningful and interpretable.
 Examples: Height, weight, age, time (in seconds), income, number of
children.
 Ratio variables allow for all mathematical operations (addition, subtraction,
multiplication, division), and ratios between values convey meaningful
information.

In summary, nominal data consist of categories without an inherent order, ordinal data have an order but unequal intervals, interval-valued variables have equal intervals but no true zero, and ratio variables have equal intervals and a true zero point. Understanding the type of data being analyzed is crucial for selecting appropriate statistical methods and interpreting the results accurately.

Proximity Measures
Proximity measures, also known as similarity or dissimilarity measures, are used to
quantify the similarity or dissimilarity between pairs of objects in a dataset. These
measures play a crucial role in various fields such as machine learning, data
mining, pattern recognition, and clustering. Different proximity measures are
suitable for different types of data and analysis tasks. Here are some common
proximity measures:

1. Euclidean Distance:
 Euclidean distance is one of the most common measures of proximity
between two points in a multidimensional space.
 It is calculated as the square root of the sum of squared differences between
corresponding coordinates.
2. Manhattan Distance (City Block Distance):
 Manhattan distance calculates the distance between two points by summing
the absolute differences of their coordinates.
 It is particularly useful in grid-based environments where movement is
restricted to horizontal and vertical paths
3. Hamming Distance:
 Hamming distance measures the number of positions at which
corresponding elements of two equal-length strings are different.
 It is often used for comparing strings of equal length, particularly in genetics
and error correction.
 Formula: Hamming distance = number of positions at which the symbols differ

Similarity and Dissimilarity


Similarity and dissimilarity are two concepts used to quantify the relationship
between objects or data points in various fields such as statistics, machine learning,
data mining, and pattern recognition. They are often used in tasks such as
clustering, classification, and retrieval. Here's an overview of each concept:
1. Similarity:
 Similarity measures the degree of resemblance or proximity between two
objects or data points. It quantifies how alike or similar two entities are.
 High similarity values indicate that two objects are very similar or alike,
while low similarity values indicate that they are dissimilar or different.
 Similarity measures are typically symmetric, meaning that the similarity
between object A and object B is the same as the similarity between object B
and object A.
 Common similarity measures include Euclidean distance, cosine similarity,
Jaccard similarity, and correlation coefficient.
2. Dissimilarity:
 Dissimilarity measures the degree of difference or divergence between two
objects or data points. It quantifies how dissimilar or distinct two entities
are.
 High dissimilarity values indicate that two objects are very dissimilar or
different, while low dissimilarity values indicate that they are similar or
alike.
 Dissimilarity measures need not be symmetric; the dissimilarity between
object A and object B may differ from the dissimilarity between object B and
object A.
 Common dissimilarity measures include Euclidean distance, Manhattan
distance, Hamming distance, and Mahalanobis distance.

In summary, similarity measures quantify how much two objects are alike or
similar, while dissimilarity measures quantify how much they are different or
distinct. These measures play a crucial role in various data analysis tasks, including
clustering (grouping similar objects together), classification (assigning objects to
predefined categories), and retrieval (finding objects similar to a given query). The
choice of similarity or dissimilarity measure depends on the specific application,
the type of data being analyzed, and the desired properties of the analysis.
Unit 4

Pattern recognition is a branch of machine learning and artificial intelligence that focuses on the
identification and classification of patterns and regularities in data. It involves the use of
algorithms and models to recognize patterns, structures, or regularities within a set of data, often
for the purpose of making predictions, decisions, or classifications.

Here are key aspects of pattern recognition:

1. Definition: Pattern recognition is the process of detecting patterns and regularities in data. It is used to classify input data into predefined categories or to identify patterns that emerge naturally within the data.

Applications:

Biometrics

1. Fingerprint Recognition: Identifying individuals based on unique patterns in their fingerprints.
2. Face Recognition: Recognizing individuals by analyzing facial features and patterns.
3. Iris Recognition: Authenticating individuals using patterns in the iris of the eye.
4. Voice Recognition: Identifying individuals based on voice patterns and characteristics.
5. Handwriting Recognition: Converting handwritten text into digital format by analyzing
handwriting patterns.

Bioinformatics

1. DNA Sequence Analysis: Identifying patterns and structures in DNA sequences to understand genetic information.
2. Protein Structure Prediction: Predicting the three-dimensional structure of proteins by
analyzing sequence patterns.
3. Gene Expression Analysis: Analyzing gene expression patterns to understand gene
functions and regulatory mechanisms.
4. Drug Discovery: Identifying potential drug candidates by analyzing patterns in
molecular structures and interactions.

Multimedia Data Analytics

1. Image Classification: Classifying images into different categories based on visual patterns and features.
2. Video Content Analysis: Analyzing videos to detect and recognize objects, actions, and
events.
3. Audio Classification: Categorizing audio signals based on patterns in sound waves and
spectrograms.
4. Music Genre Recognition: Automatically categorizing music tracks into different genres
based on audio patterns.

Document Recognition

1. Handwritten Text Recognition: Converting handwritten documents into digital text by analyzing writing patterns.
2. Optical Character Recognition (OCR): Recognizing printed text in documents and
images by identifying character patterns.
3. Document Classification: Automatically categorizing documents based on content
patterns and features.
4. Signature Verification: Authenticating signatures by analyzing patterns and
characteristics in handwritten signatures.

Fault Diagnostics

1. Predictive Maintenance: Detecting patterns in sensor data to predict equipment failures and perform maintenance proactively.
2. Anomaly Detection: Identifying abnormal patterns or behaviors in systems or processes
to detect faults or malfunctions.
3. Failure Mode Analysis: Analyzing historical data to identify common failure patterns
and improve system reliability.
4. Root Cause Analysis: Investigating patterns in data to determine the underlying causes
of faults or failures.

Expert Systems

1. Medical Diagnosis: Assisting healthcare professionals in diagnosing diseases by analyzing patient data and symptoms.
2. Financial Fraud Detection: Detecting fraudulent activities in financial transactions by
analyzing transaction patterns.
3. Customer Support Systems: Providing personalized recommendations and assistance to
users based on their interaction patterns.
4. Quality Control Systems: Monitoring and controlling manufacturing processes by
analyzing patterns in production data to ensure product quality.

Paradigms:

1. Statistical Pattern Recognition

 Definition: Statistical pattern recognition involves using statistical methods to analyze patterns and make decisions based on probability models.
 Approach: It focuses on modeling the statistical distributions of data classes and using
these models to classify or recognize patterns.
 Key Characteristics:
o Utilizes probability theory and statistical methods.
o Assumes that patterns are generated from underlying probabilistic processes.
o Deals effectively with noisy data and uncertainty.
o Suitable for real-world problems where data variability is high.
 Applications:
o Speech Recognition
o Handwriting Recognition
o Object Recognition in Images
o Biometric Identification
o Medical Diagnosis

Statistical Pattern Recognition (further points):

 Popularity: The more popular paradigm, receiving most of the attention in the literature.
 Reason for Popularity:
o Effective in handling noisy data and uncertainty.
o Utilizes statistics and probability which are suitable for dealing with such problems.
 Vector Spaces:
o Patterns and classes are represented in multi-dimensional vector spaces.
o Involves probability density/distributions of points in these spaces.
o Concepts of sub-spaces/projections and similarity (distance measures) are meaningful.
 Soft Computing Tools:
o Neural networks
o Fuzzy set
o Rough set-based pattern recognition schemes
 Nearest Neighbour Classifier:
o Simple and popular.
o Classifies a new pattern based on the class label of its nearest neighbour (a minimal sketch appears after this list).
o No training phase required.
 Bayes Classifier:
o Focuses on optimality in terms of minimum error rate classification.
o Deals with theoretical aspects under uncertainty.
 Hidden Markov Models (HMMs):
o Popular in fields like speech recognition.
 Decision Trees:
o Transparent data structure.
o Handles classification with both numerical and categorical features.
 Neural Networks:
o Mimic human brain learning.
o Perceptron used for finding linear decision boundaries in high-dimensional spaces.
o Support Vector Machines (SVMs) are built on this concept.
 Combination of Classifiers:
o Using more than one classifier can be meaningful.
 Clustering for Classification:
o Large training data sets can be used directly for classification.
o Clustering generates abstractions of data for classification.
o Clusters form sub-classes represented by prototypical patterns.
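The nearest-neighbour classifier mentioned above is simple enough to sketch directly. Below is a minimal NumPy implementation with a tiny made-up 2D dataset; it illustrates the idea only and omits efficiency concerns such as indexing structures.

```python
import numpy as np

def nearest_neighbour_predict(train_x, train_y, query):
    """Assign the query the class label of its closest training pattern (Euclidean distance)."""
    distances = np.linalg.norm(train_x - query, axis=1)  # distance to every training pattern
    return train_y[np.argmin(distances)]                 # label of the nearest one

# Tiny illustrative training set: two well-separated clusters, classes 0 and 1.
train_x = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])

print(nearest_neighbour_predict(train_x, train_y, np.array([0.1, 0.0])))    # -> 0
print(nearest_neighbour_predict(train_x, train_y, np.array([0.95, 1.0])))   # -> 1
```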

2. Syntactic Pattern Recognition

 Definition: Syntactic pattern recognition involves using formal language theory and
grammatical rules to describe and recognize patterns.
 Approach: It focuses on defining syntactic rules or grammars that capture the structure
of patterns and using these rules for recognition.
 Key Characteristics:
o Based on formal language theory and symbolic manipulation.
o Defines rules or grammars to represent the syntax or structure of patterns.
o Requires precise descriptions of patterns using formal languages.
o Not well-suited for noisy environments or complex real-world data.
 Applications:
o Handwritten Text Recognition
o Document Analysis
o Speech Understanding (limited)
o DNA Sequence Analysis (limited)
o Compiler Design and Parsing

Comparison:

 Statistical Pattern Recognition is more popular and widely used due to its effectiveness
in handling real-world data with noise and uncertainty.
 Syntactic Pattern Recognition is less common and often limited to applications where
patterns have well-defined and structured syntax, such as in document analysis or
language processing tasks.

Feature Selection

Feature selection involves identifying and eliminating irrelevant or redundant features from a
dataset to improve classification accuracy and reduce computational cost. The process ensures
that only the most meaningful features are used, which simplifies the representation of patterns
and the complexity of classifiers, making them faster and more memory-efficient. It also helps
avoid the "curse of dimensionality" by improving classification accuracy, especially when the
number of training samples is limited.

Purpose

1. Reduction in Classification Cost: Simplifies both representation and classifier complexity, resulting in faster and less memory-intensive classifiers.
2. Improvement of Classification Accuracy:
o Performance depends on the inter-relationship between sample sizes, number of
features, and classifier complexity.
o Beyond a certain point, adding more features can lead to higher error probabilities due
to finite training samples.
o Reducing dimensionality alleviates the "curse of dimensionality."

Methods of Feature Selection:

1. Exhaustive Search:
o Involves evaluating all possible feature subsets to find the best one.
o Impractical for large feature sets due to computational infeasibility.

2. Branch and Bound Search:


o Avoids exhaustive search by using intermediate results to obtain bounds on the final
criterion value.
o Assumes monotonicity, meaning the criterion value should not increase by removing
features.
o Algorithm:
1. Start at the root node.
2. Enumerate successors of the current node.
3. Select the successor with the maximum partial criterion.
4. Update the bound if the last level is reached.
5. Backtrack and explore unexplored nodes.
6. Terminate when all nodes are exhausted.
3. Selection of Best Individual Features:
o Evaluates and selects features independently based on their significance.
o Simple but may fail to account for feature dependencies.

4. Sequential Selection:
o Operates by either adding or removing features sequentially.
o Two types:

 Sequential Forward Selection (SFS): Starts with an empty set and adds features.
 Sequential Backward Selection (SBS): Starts with the full set and removes
features.

o Both methods suffer from the "nesting effect," where features once added or removed cannot be reconsidered (a minimal SFS sketch appears after this list of methods).

5. Sequential Floating Search:


o Overcomes the nesting effect by dynamically determining the number of forward and
backward steps.
o Two types:

 Sequential Floating Forward Search (SFFS):


1. Start with an empty feature subset.
2. Add the most significant feature.
3. Conditionally remove the least significant feature.
4. Repeat until the optimal subset is found.
 Sequential Floating Backward Search (SBFS):
1. Start with the full feature set.
2. Remove the least significant feature.
3. Conditionally add the most significant feature.
4. Repeat until the optimal subset is found.

6. Max-Min Approach to Feature Selection:


o Selects features based on the maximum of the minimum differences in criterion values.
o Computationally efficient but may result in information loss.

7. Stochastic Search Techniques:


o Genetic algorithms (GA) are commonly used.
o Population consists of binary strings representing feature subsets.
o Fitness of each chromosome (feature subset) is evaluated based on performance.

8. Artificial Neural Networks:


o Uses a multilayer feed-forward network with back-propagation.
o Prunes unnecessary nodes iteratively to reduce the network size without degrading
performance.
o Saliency of nodes determines their importance.
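As referenced in the sequential-selection item above, here is a minimal sketch of Sequential Forward Selection. The criterion function J is a placeholder supplied by the caller (for example, cross-validated accuracy of a classifier on the candidate feature subset); it is not specified in these notes, and the toy criterion below is purely illustrative.

```python
def sequential_forward_selection(all_features, J, k):
    """Greedily add the feature that most improves criterion J until k features are chosen.

    all_features : iterable of feature identifiers (e.g., column indices)
    J            : callable mapping a feature subset (tuple) to a score (higher is better)
    k            : desired number of selected features
    """
    selected = []
    remaining = list(all_features)
    while remaining and len(selected) < k:
        # Evaluate the criterion with each candidate feature added in turn.
        scores = {f: J(tuple(selected + [f])) for f in remaining}
        best = max(scores, key=scores.get)   # the most significant feature at this step
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy criterion: prefers subsets containing features 2 and 5 (purely illustrative).
toy_J = lambda subset: sum(1 for f in subset if f in (2, 5)) - 0.01 * len(subset)
print(sequential_forward_selection(range(6), toy_J, k=3))   # e.g. [2, 5, 0]
```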

Artificial Neural Networks (ANNs) can be effectively used in feature selection through a process
known as node pruning. Here's how it works:

1. Initial Network Setup: Initially, a multilayer feed-forward neural network is constructed, typically with more nodes and connections than necessary. Each input node in the network corresponds to a feature in the dataset.
2. Training the Network: The network is trained using a standard training algorithm such
as backpropagation, where the weights of the connections between nodes are adjusted to
minimize the error between the network's output and the target output.
3. Node Pruning: Once the network is trained, the process of feature selection begins. This
involves iteratively removing nodes (or features) from the network and evaluating the
impact on the network's performance.
4. Evaluating Node Salience: The salience of each node (or feature) is determined based
on its contribution to the network's performance. This can be measured by calculating the
increase in error over all training patterns caused by the removal of a particular node.
5. Removing Nodes: The least salient nodes (features) are systematically removed from the
network. This process is repeated iteratively until a desired level of performance is
achieved or until further pruning negatively impacts the network's performance.
6. Network Optimization: As nodes are pruned from the network, the remaining
connections and weights are adjusted to maintain network performance. This may involve
retraining the network with the reduced set of features to fine-tune its performance.
Unit 5

Main differences between supervised and unsupervised learning:

 Input data: Supervised learning uses known, labeled data as input; unsupervised learning uses unknown (unlabeled) data.
 Computational complexity: Supervised learning has lower computational complexity; unsupervised learning is more computationally complex.
 Analysis: Supervised learning typically uses off-line analysis; unsupervised learning uses real-time analysis of the data.
 Number of classes: The number of classes is known in supervised learning; it is not known in unsupervised learning.
 Accuracy of results: Supervised learning gives accurate and reliable results; unsupervised learning gives moderately accurate and reliable results.
 Output data: The desired output is given in supervised learning; it is not given in unsupervised learning.
 Model: In supervised learning it is not possible to learn larger and more complex models than in unsupervised learning.
 Training data: In supervised learning, training data is used to infer the model; in unsupervised learning, training data is not used.
 Another name: Supervised learning is also called classification; unsupervised learning is also called clustering.
 Test of model: We can test a supervised model; we cannot test an unsupervised model.
 Example: Optical Character Recognition (supervised); finding a face in an image (unsupervised).

Bayesian Statistics: Detailed Overview

Bayesian statistics is a statistical paradigm that uses probability to represent uncertainty in our
inferences. Unlike traditional (frequentist) statistics, which only uses data from the current study,
Bayesian statistics incorporates prior knowledge or beliefs along with current data to make
statistical inferences.

Key Concepts

1. Bayes' Theorem
o Formula: p(θ∣y) = p(y∣θ) p(θ) / p(y)
o Explanation: This theorem updates the probability estimate for a parameter θ
given new data y. Here,
o p(θ∣y) is the posterior probability,
o p(y∣θ) is the likelihood,
o p(θ) is the prior probability,
o and p(y) is the marginal likelihood, or evidence.
2. Prior Distribution, p(θ)
o What it is: Your initial belief or knowledge about the parameter before seeing the
current data. For example, in medical research, this could be based on past studies
or expert knowledge.
3. Likelihood, p(y∣θ)
o What it is: This is derived from the data and the statistical model used.
4. Posterior Distribution, p(θ∣y)
o What it is: Your updated belief about the parameter after considering the new
data. It combines the prior distribution and the likelihood to give a complete
picture of what the parameter might be, considering both past and present
information.

Example Application

Estimating Infection Rates in a Hospital

 Scenario: A hospital wants to know the infection rate of MRSA (methicillin-resistant
Staphylococcus aureus).
 Data: Over the first six months of the year, 20 infections were recorded in 40,000 bed-
days.
 Prior Belief: Based on past data, the infection rate is believed to be around 10 infections
per 10,000 bed-days, with a range of 5 to 17.
 Calculation:
o Prior Distribution: Assume a Gamma distribution with parameters chosen to
reflect the prior belief.
o Likelihood: Based on a Poisson distribution, since infections are rare events.
o Posterior Distribution: Using Bayes' Theorem, combine the prior distribution
and the likelihood to update the belief about the infection rate.
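
A short sketch of the calculation outlined above is given below. The specific prior parameters, a Gamma(10, 10 000) prior whose mean is 10 infections per 10,000 bed-days, are an illustrative assumption chosen to roughly match the stated prior belief; they are not taken from the source.

from scipy import stats

# Prior: Gamma(shape=10, rate=10000 bed-days) -> mean 10/10000 = 10 per 10,000 bed-days (assumed)
prior_shape, prior_rate = 10.0, 10_000.0
# Data: 20 infections in 40,000 bed-days
infections, bed_days = 20, 40_000

# Gamma prior + Poisson likelihood is conjugate, so the posterior is Gamma again
post_shape = prior_shape + infections
post_rate = prior_rate + bed_days

posterior = stats.gamma(a=post_shape, scale=1.0 / post_rate)
mean = posterior.mean() * 10_000
lo, hi = posterior.ppf([0.025, 0.975]) * 10_000
print(f"Posterior: roughly {mean:.1f} infections per 10,000 bed-days "
      f"(95% interval {lo:.1f} to {hi:.1f})")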

Computational Methods

1. Monte Carlo Methods


o What they do: Use random sampling to approximate complex posterior
distributions. For example, generating many possible values of the parameter and
observing the distribution of these values.
2. Markov Chain Monte Carlo (MCMC)
o What it does: Generates dependent samples that represent the posterior
distribution. Techniques like Gibbs Sampling or the Metropolis-Hastings
algorithm are used to explore the parameter space efficiently.
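
As an illustration, a bare-bones Metropolis-Hastings sampler for the infection-rate posterior above might look like the sketch below. The Gamma(10, 10 000) prior, the proposal scale, and the burn-in length are all illustrative assumptions.

import numpy as np

def log_post(lam):
    # Unnormalised log-posterior: Gamma(10, 10000) prior times Poisson(20 | lam * 40000) likelihood
    if lam <= 0:
        return -np.inf
    log_prior = (10 - 1) * np.log(lam) - 10_000 * lam
    log_lik = 20 * np.log(lam * 40_000) - lam * 40_000
    return log_prior + log_lik

rng = np.random.default_rng(0)
lam, samples = 1e-3, []
for _ in range(20_000):
    proposal = lam + rng.normal(scale=1e-4)            # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(lam):
        lam = proposal                                 # accept
    samples.append(lam)                                # otherwise keep the current value

samples = np.array(samples[5_000:])                    # discard burn-in
print("Posterior mean:", samples.mean() * 10_000, "per 10,000 bed-days")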

Applications

1. Policy Decisions
o Example: The FDA might use Bayesian methods to incorporate various sources of
evidence when deciding on the approval of a new medical device.
2. Moderate-size Problems
o Example: Hierarchical models in meta-analysis, where results from multiple
studies are combined, accounting for variability between studies.
3. Large Complex Models
o Example: Bayesian methods in image processing or genetics, where traditional
methods fail due to the complexity and size of the data.

Advantages and Challenges

Advantages

 Incorporates Prior Knowledge: Allows the inclusion of prior information, making the
analysis more robust and contextually relevant.
 Updates with New Data: Flexible framework that can continuously update the parameter
estimates as new data becomes available.
 Handles Complex Models: Capable of dealing with complex models and multiple
sources of variability through hierarchical modeling.

Challenges
 Subjectivity in Priors: Choosing the prior distribution can be subjective and may
influence the results significantly.
 Computational Intensity: Bayesian methods, especially with large datasets or complex
models, can be computationally demanding.
 Sensitivity Analysis: Requires thorough sensitivity analysis to ensure that the results are
not unduly influenced by the choice of priors or the likelihood model.

K-Nearest Neighbor(KNN) Algorithm


The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning
method employed to tackle classification and regression problems.
Supervised learning and unsupervised learning are two main types of machine
learning.
 In supervised learning, the machine is trained on a set of labeled data, which
means that the input data is paired with the desired output.
 The machine then learns to predict the output for new input data.
 Supervised learning is often used for tasks such as classification, regression,
and object detection.
 In unsupervised learning, the machine is trained on a set of unlabeled data,
which means that the input data is not paired with the desired output.
 The machine then learns to find patterns and relationships in the data.
 Unsupervised learning is often used for tasks such as clustering, dimensionality
reduction, and anomaly detection.

KNN is one of the most basic yet essential classification algorithms in machine
learning. It belongs to the supervised learning domain and finds intense application in
 pattern recognition,
 data mining, and
 intrusion detection.

Why do we need a KNN algorithm?

 The K-NN algorithm is a versatile and widely used machine learning algorithm
that is valued primarily for its simplicity and ease of implementation.
 It does not require any assumptions about the underlying data distribution.
 It can also handle both numerical and categorical data, making it a flexible
choice for various types of datasets in classification and regression tasks.
 It is a non-parametric method that makes predictions based on the similarity of
data points in a given dataset. K-NN is less sensitive to outliers compared to
other algorithms.
 The K-NN algorithm works by finding the K nearest neighbors to a given data
point based on a distance metric, such as Euclidean distance.
 The class or value of the data point is then determined by the majority vote or
average of the K neighbors.
 This approach allows the algorithm to adapt to different patterns and make
predictions based on the local structure of the data.

Distance Metrics Used in KNN Algorithm


As we know that the KNN algorithm helps us identify the nearest points or the groups
for a query point. But to determine the closest groups or the nearest points for a query
point we need some metric. For this purpose, we use below distance metrics:
Euclidean Distance

This is nothing but the cartesian distance between the two points which are in the
plane/hyperplane. Euclidean distance can also be visualized as the length of the
straight line that joins the two points which are into consideration. This metric helps us
calculate the net displacement done between the two states of an object.

Manhattan Distance

Manhattan Distance metric is generally used when we are interested in the total
distance traveled by the object instead of the displacement. This metric is calculated
by summing the absolute difference between the coordinates of the points in n-
dimensions.

Minkowski Distance
We can say that the Euclidean, as well as the Manhattan distance, are special cases
of the Minkowski distance
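
The three metrics can be computed directly; a small numpy sketch (with made-up points) is shown below.

import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))           # straight-line (net displacement)
manhattan = np.sum(np.abs(p - q))                   # total distance travelled along the axes
r = 3
minkowski = np.sum(np.abs(p - q) ** r) ** (1.0 / r) # r = 2 gives Euclidean, r = 1 gives Manhattan

print(euclidean, manhattan, minkowski)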

How to choose the value of k for KNN Algorithm?


 The value of k is very crucial in the KNN algorithm to define the number of
neighbors in the algorithm.
 The value of k in the k-nearest neighbors (k-NN) algorithm should be chosen
based on the input data.
 If the input data has more outliers or noise, a higher value of k would be better.
 It is recommended to choose an odd value for k to avoid ties in classification.
 Cross-validation methods can help in selecting the best k value for the given
dataset.
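
The last point can be made concrete with a small sketch that uses scikit-learn's cross_val_score on a toy dataset; the library and the dataset are illustrative choices, not prescribed by the source.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16, 2):                               # odd values of k to avoid ties
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean() # 5-fold cross-validated accuracy

best_k = max(scores, key=scores.get)
print("best k:", best_k, "accuracy:", round(scores[best_k], 3))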
Workings of KNN algorithm
The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity,
where it predicts the label or value of a new data point by considering the labels or
values of its K nearest neighbors in the training dataset.

How KNN works is discussed below:


Step 1: Selecting the optimal value of K

 K represents the number of nearest neighbors that needs to be considered while


making prediction.
Step 2: Calculating distance

 For a given data point, compute the distance between this point and all other points in the
training dataset. Common distance metrics include:
o Euclidean distance: d(p, q) = sqrt( Σ_{i=1..n} (p_i − q_i)^2 )
o Manhattan distance: d(p, q) = Σ_{i=1..n} |p_i − q_i|
o Minkowski distance: d(p, q) = ( Σ_{i=1..n} |p_i − q_i|^r )^(1/r), a generalized form that reduces to Euclidean distance for r = 2 and Manhattan distance for r = 1.

Step 3: Finding Nearest Neighbors

 Sort all the distances and identify the k data points in the training set that are closest to the given
data point.
Step 4: Voting for Classification or Taking Average for Regression

 In the classification problem, the class label is determined by performing
majority voting. The class with the most occurrences among the neighbors
becomes the predicted class for the target data point.
 In the regression problem, the predicted value is calculated by taking the average of the
target values of the K nearest neighbors. The calculated average value becomes the
predicted output for the target data point.
The algorithm selects the K data points from X that have the shortest distances to x.
For classification tasks, the algorithm assigns the label y that is most frequent among
the K nearest neighbors to x. For regression tasks, the algorithm calculates the
average or weighted average of the values y of the K nearest neighbors and assigns it
as the predicted value for x.
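
The four steps above can be condensed into a minimal from-scratch sketch (Euclidean distance, majority vote); the data and function names are illustrative.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Step 2: distance from the query point to every training point
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Step 3: indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Step 4: majority vote among their class labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.0, 6.0]), k=3))   # predicts class 1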
Advantages of the KNN Algorithm
 Easy to implement as the complexity of the algorithm is not that high.
 Adapts Easily – As per the working of the KNN algorithm it stores all the data in
memory storage and hence whenever a new example or data point is added then
the algorithm adjusts itself as per that new example and has its contribution to the
future predictions as well.
 Few Hyperparameters – The only parameters required in the training of
a KNN algorithm are the value of k and the choice of the distance metric used to
compare points.
Disadvantages of the KNN Algorithm
 Does not scale – The KNN algorithm is often called a lazy algorithm: it defers all
computation to prediction time and stores the entire training set, so it requires a lot of
computing power and data storage. This makes the algorithm both time-consuming and
resource-intensive.
 Curse of Dimensionality – Because of the peaking phenomenon, the KNN algorithm is
affected by the curse of dimensionality, which means the algorithm has a hard time
classifying data points properly when the dimensionality is too high.
 Prone to Overfitting – Since the algorithm is affected by the curse of dimensionality, it is
prone to the problem of overfitting as well. Hence, feature selection as well as
dimensionality reduction techniques are generally applied to deal with this problem.

Applications of the KNN Algorithm


 Data Preprocessing – When dealing with a machine learning problem, we first perform
exploratory data analysis (EDA); if the data contains missing values, several imputation
methods are available. One such method is the KNN imputer, which is quite effective and
commonly used for sophisticated imputation.
 Pattern Recognition – KNN works very well for pattern recognition; for example, a KNN
classifier trained on the MNIST handwritten-digit dataset typically achieves high
accuracy on evaluation.

 Recommendation Engines – The main task performed by a KNN algorithm is to assign a
new query point to a pre-existing group that has been created from a large corpus of data.
This is exactly what recommender systems require: assign each user to a particular group
and then provide recommendations based on that group’s preferences.
 Classification: Handwriting recognition, image classification, bioinformatics.
 Regression: Predicting house prices, stock price forecasting, medical diagnostics.
Variants of NN algorithm
 k-nearest neighbor algorithm
 Modified k-nearest neighbor algorithm
 Fuzzy kNN algorithm
 R near neighbour

Decision Tree



Decision trees are a popular and powerful tool used in various fields such
as machine learning, data mining, and statistics. They provide a clear and
intuitive way to make decisions based on data by modeling the relationships
between different variables.

What is a Decision Tree?

A decision tree is a non-parametric supervised learning algorithm for
classification and regression tasks. It has a hierarchical tree structure
consisting of a root node, branches, internal nodes, and leaf nodes, and it
provides easy-to-understand models.

Structure of a Decision Tree

1. Root Node: Represents the entire dataset and the initial decision to be
made.
2. Internal Nodes: Represent decisions or tests on attributes. Each internal
node has one or more branches.
3. Branches: Represent the outcome of a decision or test, leading to
another node.
4. Leaf Nodes: Represent the final decision or prediction. No further splits
occur at these nodes.

How Decision Trees Work?

The process of creating a decision tree involves:


1. Selecting the Best Attribute: Using a metric such as entropy or
information gain, the best attribute on which to split the data is selected.
2. Splitting the Dataset: The dataset is split into subsets based on the
selected attribute.
3. Repeating the Process: The process is repeated recursively for each
subset, creating a new internal node or leaf node until a stopping criterion
is met (e.g., all instances in a node belong to the same class or a
predefined depth is reached).
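
Step 1 can be illustrated with a short sketch that scores a categorical attribute by information gain (entropy before the split minus the weighted entropy after it); the toy weather-style data are made up for illustration.

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature_values, labels):
    gain = entropy(labels)                       # entropy before the split
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        gain -= (len(subset) / len(labels)) * entropy(subset)   # weighted entropy after
    return gain

outlook = np.array(["sunny", "sunny", "rain", "rain", "overcast"])
play    = np.array(["no",    "no",    "yes",  "yes",  "yes"])
print(round(information_gain(outlook, play), 3))   # higher gain -> better attribute to split on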

Advantages of Decision Trees


 Simplicity and Interpretability: Decision trees are easy to understand
and interpret. The visual representation closely mirrors human decision-
making processes.
 Versatility: Can be used for both classification and regression tasks.
 No Need for Feature Scaling: Decision trees do not require
normalization or scaling of the data.
 Handles Non-linear Relationships: Capable of capturing non-linear
relationships between features and target variables.

Disadvantages of Decision Trees


 Overfitting: Decision trees can easily overfit the training data, especially
if they are deep with many nodes.
 Instability: Small variations in the data can result in a completely
different tree being generated.
 Bias towards Features with More Levels: Features with more levels
can dominate the tree structure.

Pruning
To overcome overfitting, pruning techniques are used. Pruning reduces the size of
the tree by removing nodes that provide little power in classifying instances. There are
two main types of pruning:

 Pre-pruning (Early Stopping): Stops the tree from growing once it meets certain
criteria (e.g., maximum depth, minimum number of samples per leaf).
 Post-pruning: Removes branches from a fully grown tree that do not provide
significant power.
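
Both styles of pruning can be sketched with scikit-learn's DecisionTreeClassifier; the library, dataset, and parameter values below are illustrative assumptions rather than recommendations.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning (early stopping): limit depth and minimum samples per leaf
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow the tree fully, then prune weak branches via cost-complexity (ccp_alpha)
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print("pre-pruned test accuracy: ", round(pre.score(X_te, y_te), 3))
print("post-pruned test accuracy:", round(post.score(X_te, y_te), 3))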
Applications of Decision Trees
 Business Decision Making: Used in strategic planning and resource allocation.
 Healthcare: Assists in diagnosing diseases and suggesting treatment plans.
 Finance: Helps in credit scoring and risk assessment.
 Marketing: Used to segment customers and predict customer behavior.

Construction of Decision tree


Page 144
Bayes Classifier
Page number 86
Comparison with NNC
Page number 91

Introduction to Support Vector Machines (SVM)

INTRODUCTION:

Support Vector Machines (SVMs) are a type of supervised learning algorithm that can
be used for classification or regression tasks. The main idea behind SVMs is to find a
hyperplane that maximally separates the different classes in the training data. This is
done by finding the hyperplane that has the largest margin, which is defined as the
distance between the hyperplane and the closest data points from each class. Once
the hyperplane is determined, new data can be classified by determining on which
side of the hyperplane it falls. SVMs are particularly useful when the data has many
features, and/or when there is a clear margin of separation in the data.
What are Support Vector Machines?

Support Vector Machine (SVM) is a relatively simple supervised machine learning
algorithm used for classification and/or regression. It is preferred for classification but is
sometimes very useful for regression as well. Basically, SVM finds a hyperplane that
creates a boundary between the types of data. In 2-dimensional space, this hyperplane is
nothing but a line. In SVM, we plot each data item in the dataset in an N-dimensional
space, where N is the number of features/attributes in the data, and then find the optimal
hyperplane to separate the data. From this it follows that, inherently, SVM can only
perform binary classification (i.e., choose between two classes). However, there are
various techniques to handle multi-class problems.

Support Vector Machine for Multi-Class Problems

To perform SVM on multi-class problems, we can create a binary classifier for each class
of the data. The two results of each classifier will be:
 The data point belongs to that class, OR
 The data point does not belong to that class.
For example, to perform multi-class classification on fruits, we can create a binary
classifier for each fruit. For the ‘mango’ class, there will be a binary classifier that
predicts whether a sample IS a mango or is NOT a mango. The classifier with the highest
score is chosen as the output of the SVM.

SVM for Complex (Non-Linearly Separable) Data

SVM works very well without any modifications for linearly separable data. Linearly
separable data is any data that can be plotted in a graph and can be
separated into classes using a straight line.
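
The per-class binary-classifier idea described above is the one-vs-rest strategy; a short scikit-learn sketch follows (the library, dataset, and linear kernel are illustrative choices).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One binary SVM per class; the class whose classifier gives the highest score wins
clf = OneVsRestClassifier(SVC(kernel="linear", C=1.0)).fit(X_tr, y_tr)
print("test accuracy:", round(clf.score(X_te, y_te), 3))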

Applications

 Face detection

 Text and hypertext categorization

 Classification of images

 Bioinformatics

 Protein fold and remote homology detection

 Handwriting recognition

 Generalized predictive control (GPC)

Advantages

 SVMs are very efficient in handling data in high-dimensional space

 It is memory-efficient because it uses a subset of training points in the support

vectors
 It is very effective when the number of dimensions is greater than the number of

observations

Disadvantages

 It is not suitable for very large data sets

 It is not very efficient for data sets that have many outliers

 It does not directly provide probability estimates

Unsupervised Learning
Unsupervised learning is where you have unlabeled data (or no target variable) in the
dataset.
The goal of Unsupervised Learning Algorithms is to find some structure in the dataset.
These are called unsupervised learning algorithms because, unlike supervised learning, there
are no correct answers and there is no teacher. Algorithms are left on their own to discover and
present the interesting structure in the data.
Unsupervised Learning = Learning without labels

Why clustering?
Clustering is a popular unsupervised machine learning technique that groups similar data
points together based on their characteristics. By organizing data into clusters, we can uncover
hidden insights and make predictions about future data points.
Clustering - Divide and Rule
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar (in some sense) to each other than to those
in other groups (clusters).
Clustering
Clustering can be considered the most important unsupervised learning problem. It deals with
finding a structure in a collection of unlabelled data.
A loose definition of clustering could be “the process of organizing objects into groups
whose members are similar in some way”. A cluster is therefore a collection of objects
which are “similar” between them and are “dissimilar” to the objects belonging to other
clusters.
Types of Hierarchical Clustering

Basically, there are two types of hierarchical Clustering:


1. Agglomerative Clustering
2. Divisive clustering
Hierarchical Agglomerative Clustering

It is also known as the bottom-up approach or hierarchical agglomerative
clustering (HAC). It produces a structure that is more informative than the unstructured
set of clusters returned by flat clustering, and it does not
require us to prespecify the number of clusters. Bottom-up algorithms treat
each data point as a singleton cluster at the outset and then successively
agglomerate pairs of clusters until all clusters have been merged into a
single cluster that contains all the data.
Algorithm:
given a dataset (d1, d2, d3, ..., dN) of size N
# compute the distance matrix
for i = 1 to N:
    # the distance matrix is symmetric about the primary
    # diagonal, so we compute only its lower part
    for j = 1 to i:
        dist_mat[i][j] = distance(di, dj)
each data point is a singleton cluster
repeat
    merge the two clusters having minimum distance
    update the distance matrix
until only a single cluster remains

Steps:
 Consider each letter (data point) as a single cluster and calculate the distance of
each cluster from all the other clusters.
 In the second step, comparable clusters are merged together to form a
single cluster. Let’s say cluster (B) and cluster (C) are very similar to each
other therefore we merge them in the second step similarly to cluster (D)
and (E) and at last, we get the clusters [(A), (BC), (DE), (F)]
 We recalculate the proximity according to the algorithm and merge the
two nearest clusters([(DE), (F)]) together to form new clusters as [(A),
(BC), (DEF)]
 Repeating the same process; The clusters DEF and BC are comparable
and merged together to form a new cluster. We’re now left with clusters
[(A), (BCDEF)].
 At last, the two remaining clusters are merged together to form a single
cluster [(ABCDEF)].
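
The merging process in the steps above can be reproduced with SciPy's hierarchical clustering routines; the library, the six toy points standing in for A to F, and the "single" (minimum-distance) linkage are illustrative assumptions.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

labels = list("ABCDEF")
points = np.array([[0.0, 0.0], [1.0, 1.0], [1.2, 0.9],
                   [5.0, 5.0], [5.1, 5.2], [4.0, 6.0]])

# Bottom-up merging; "single" linkage uses the minimum distance between clusters
Z = linkage(points, method="single")

# Cut the resulting tree into 2 clusters
clusters = fcluster(Z, t=2, criterion="maxclust")
print(dict(zip(labels, clusters)))   # e.g. A, B, C in one cluster and D, E, F in the other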

Hierarchical Divisive clustering

It is also known as the top-down approach. This algorithm also does not
require us to prespecify the number of clusters. Top-down clustering requires a
method for splitting a cluster that contains the whole data and proceeds by
splitting clusters recursively until individual data points have been split into
singleton clusters.
Algorithm:
given a dataset (d1, d2, d3, ..., dN) of size N
at the top we have all data in one cluster
the cluster is split using a flat clustering method, e.g. K-Means
repeat
    choose the best cluster among all the clusters to split
    split that cluster by the flat clustering algorithm
until each data point is in its own singleton cluster

Computing Distance Matrix


While merging clusters we check the distance between every pair of
clusters and merge the pair with the least distance/most similarity. But the
question is how that distance is determined. There are different ways of
defining inter-cluster distance/similarity. Some of them are:
1. Min Distance: Find the minimum distance between any two points of the
cluster.
2. Max Distance: Find the maximum distance between any two points of the
cluster.
3. Group Average: Find the average distance between every two points of
the clusters.
4. Ward’s Method: The similarity of two clusters is based on the increase in
squared error when two clusters are merged.
Partitioning Method (K-Means) in Data Mining

Partitioning Method: This clustering method classifies the information into
multiple groups based on the characteristics and similarity of the data. It is up to the
data analyst to specify the number of clusters to be generated by the clustering
method. In the partitioning method, given a database D that contains N objects, the
method constructs a user-specified number K of partitions of the data, in which each
partition represents a cluster and a particular region. Many algorithms come under the
partitioning method; some of the popular ones are K-Means, PAM (K-Medoids), and
CLARA (Clustering Large Applications). In this article, we will be seeing the working
of the K-Means algorithm in detail.

K-Means (A Centroid-Based Technique): The K-Means algorithm takes the input
parameter K from the user and partitions the dataset containing N objects into K
clusters so that the resulting similarity among the data objects inside a group
(intra-cluster) is high, while the similarity of data objects with data objects outside
their cluster (inter-cluster) is low. The similarity of a cluster is measured with respect
to the mean value of the cluster. It is a type of squared-error algorithm. At the start,
K objects are chosen at random from the dataset, each representing a cluster mean
(centre). Each of the remaining data objects is assigned to the nearest cluster based
on its distance from the cluster mean. The new mean of each cluster is then
recalculated with the added data objects.
Algorithm: K-Means
Input:
K: The number of clusters into which the dataset has to be
divided
D: A dataset containing N objects

Output:
A dataset of K clusters
Method:
1. Randomly choose K objects from the dataset (D) as the initial cluster centres (C).
2. (Re)assign each object to the cluster whose mean it is most similar to (i.e., the
nearest cluster mean).
3. Update cluster means, i.e., recalculate the mean of each cluster with the
updated assignments.
4. Repeat Steps 2 and 3 until no change occurs.
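
A short sketch of the method using scikit-learn's KMeans follows; the library and the toy objects are illustrative, and any K-Means implementation follows the same steps.

import numpy as np
from sklearn.cluster import KMeans

D = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [8.0, 8.0], [9.0, 11.0], [8.5, 9.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(D)   # K = 2
print("cluster of each object:", km.labels_)
print("cluster means (centres):", km.cluster_centers_)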
