
UNIT 2 (Chapters 1, 2)

Text Classification / Categorization Algorithms
• Supervised Machine Learning algorithms can be broadly classified into Regression and Classification algorithms.
• Regression algorithms predict continuous values, such as the price of a house, temperature, or stock market trends. To predict categorical values, such as whether an email is spam or not spam, or whether a customer is "High Risk" or "Low Risk," we need Classification algorithms.
What is a Classification Algorithm?
A Classification algorithm is a Supervised Learning technique used to identify the category of new observations on the basis of training data.
In classification, a program learns from a given dataset of labeled observations and then assigns each new observation to one of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog. Classes are also called targets, labels, or categories.
Key Definitions:
Documents and Classes:
D: a collection of documents to be classified.
C = {C1, C2, ..., Ck}: a set of k predefined classes or categories, such as Promotion, Social, Private, etc.
Classification Function F:
F(d, Cp): a binary function that determines whether a document d belongs to a specific class Cp in C:
F(d, Cp) = 1 if d belongs to Cp
F(d, Cp) = 0 if d does not belong to Cp
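• A minimal Python sketch of F(d, Cp) as a binary membership function. The keyword rule below is a hypothetical stand-in for a trained classifier, purely for illustration:

# Hypothetical keyword rule standing in for a trained classifier.
def F(d: str, Cp: str) -> int:
    """Return 1 if document d belongs to class Cp, else 0."""
    keywords = {
        "Promotion": {"sale", "offer", "discount"},
        "Social": {"friend", "party", "invite"},
    }
    words = set(d.lower().split())
    return 1 if words & keywords.get(Cp, set()) else 0

print(F("huge discount sale today", "Promotion"))  # 1
print(F("huge discount sale today", "Social"))     # 0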
• Types of Classification:
• Single-label Classification: Each document is assigned exactly one class.
• Multi-label Classification: A document can belong to multiple classes
simultaneously.
Text Classification Algorithms
• Text categorization can be accomplished using a variety of classification algorithms.
• Text classification algorithms are categorized into two groups.
1. Supervised algorithms
2. Unsupervised algorithms
Supervised learning
• Supervised learning is a type of machine learning algorithm that learns from
labeled data.
• Labeled data is data that has been tagged with a correct answer or
classification.
• Supervised learning involves training a machine on labeled data.
• The machine learns the relationship between inputs (for example, fruit images) and outputs (fruit labels).

• The trained machine can then make predictions on new, unlabeled data.
Unsupervised learning
• Unsupervised learning is a type of machine learning that learns from unlabeled
data.
• This means that the data does not have any pre-existing labels or categories.
• The goal of unsupervised learning is to discover patterns and relationships
in the data without any explicit guidance.
• Here the task of the machine is to group unsorted information according
to similarities, patterns, and differences without any prior training of
data.
Naïve Bayes / Bayes' Theorem
• Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional training
dataset.
• It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to a class.
• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple.
• Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
• The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)
Where:
P(A∣B) is the posterior probability, the probability of event A occurring
given that event B has occurred.
P(B∣A) is the likelihood, the probability of observing event B given that event
A is true.
P(A) is the prior probability, the initial probability of event A before
any evidence is taken into account.
P(B) is the marginal likelihood or total probability of observing event B,
regardless of the occurrence of event A.
Real-Life Example (Medical Test):
• Event A: The person has the disease.
• Event B: The person tests positive for the disease.
Let’s define the following:
Prior Probability P(A): The probability that a person has the disease.
Likelihood P(B∣A): The probability that a person with the disease will test
positive (say 99%).
Marginal Likelihood P(B): The total probability that a person will test positive, regardless of whether they have the disease or not.
Posterior Probability P(A∣B): The probability that the person has the disease given that they tested positive.
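• A worked computation of this example in Python. Only the 99% likelihood comes from above; the 1% prior and 5% false-positive rate are assumed for illustration:

# Worked Bayes example; the prior and false-positive rate are assumptions.
p_disease = 0.01            # P(A): prior probability of disease (assumed)
p_pos_given_disease = 0.99  # P(B|A): likelihood, from the slide
p_pos_given_healthy = 0.05  # P(B|not A): false-positive rate (assumed)

# Marginal likelihood P(B) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(A|B) via Bayes' theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.167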
• Let's walk through the process of applying Naive Bayes' theorem, using the frequency table provided, to predict which type of fruit corresponds to the properties {Yellow, Sweet, Long}, step by step for both fruits: 1. Mango and 2. Banana.
• Naive Bayes' theorem for classification (dropping the constant denominator P(X)):
P(Fruit∣X) ∝ P(Yellow∣Fruit) × P(Sweet∣Fruit) × P(Long∣Fruit) × P(Fruit)
We first calculate the conditional probabilities for Mango and Banana, as in the sketch below.
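• The frequency table itself was a slide image and is not reproduced here, so the counts below are hypothetical placeholders; substitute the real table values. A sketch of the computation for both fruits:

# Naive Bayes scoring for the observation {Yellow, Sweet, Long}.
# All counts are hypothetical placeholders for the slide's table.
counts = {
    "Mango":  dict(total=650, yellow=350, sweet=450, long=0),
    "Banana": dict(total=400, yellow=400, sweet=300, long=350),
}
n_fruits = sum(c["total"] for c in counts.values())

def score(fruit):
    c = counts[fruit]
    prior = c["total"] / n_fruits           # P(Fruit)
    likelihood = (c["yellow"] / c["total"]  # P(Yellow|Fruit)
                  * c["sweet"] / c["total"] # P(Sweet|Fruit)
                  * c["long"] / c["total"]) # P(Long|Fruit)
    return likelihood * prior               # proportional to P(Fruit|X)

for fruit in counts:
    print(fruit, score(fruit))
# With these counts Banana wins: P(Long|Mango) = 0 rules Mango out.
# In practice Laplace smoothing is used to avoid such hard zeros.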
Support Vector Machine (SVM)
• Definition:
SVM is a supervised machine learning algorithm used for classification
and regression tasks.
• It works by finding the best boundary (or hyperplane) that separates different
classes in the data.
• SVMs are highly adaptable, making them suitable for various applications
such as text classification, image classification, spam
detection, handwriting identification, face detection, and anomaly
detection.
• SVM is best suited for classification tasks.
• The primary objective of the SVM algorithm is to identify the optimal
hyperplane in an N-dimensional space that can effectively separate data points
into different classes in the feature space.
• Support Vectors: The data points closest to the hyperplane; these points influence the position and orientation of the hyperplane.
• Margin: The distance between the hyperplane and the nearest data points
of each class. SVM tries to maximize this margin for better
classification.
• Example in Real Life:
• Spam Email Detection: SVM separates spam emails from regular emails by
analyzing features like word frequency.
• Tumor Classification:
• In medical imaging, SVM separates images into "tumor" and "non-tumor"
classes.
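• A minimal scikit-learn sketch of SVM-based spam detection; the four toy emails and their labels are assumed for illustration:

# TF-IDF turns word frequencies into features; LinearSVC finds the
# maximum-margin hyperplane separating the two classes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

emails = ["win a free offer now", "cheap deal buy today",
          "team meeting schedule", "project status update"]
labels = ["spam", "spam", "not spam", "not spam"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(emails, labels)
print(model.predict(["free offer today"]))  # likely ['spam']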
Feature Selection
• Definition:
Feature selection is a way of selecting the subset of the most relevant
features from the original features set by removing the redundant,
irrelevant, or noisy features.
• By reducing the number of terms (features), we decrease the time and
resources needed to train a classifier.
• Example: Imagine sorting through a library. Instead of looking at every
book (all terms), you only focus on books about “science fiction”
(important terms).
• Here’s a simple explanation of the feature selection algorithm:
Step 1: Extract Vocabulary: Identify all unique terms in the training documents.
• Example: Classifying Emails as Spam or Not
• Email 1: "cheap deal today"
• Email 2: "limited offer buy now"
• Email 3: "team meeting schedule."
• Vocabulary: {cheap, deal, today, limited, offer, buy, now, team, meeting,
schedule}.
Step 2: Calculate how much each word helps in classifying emails:
• Words like "cheap" and "buy" often appear in spam emails.
• Words like "team" and "meeting" appear in non-spam emails.
Step 3: Rank the words based on their contribution to
distinguishing spam from non-spam:
• Spam-related: cheap (0.8), buy (0.7), offer (0.6).
• Non-spam-related: team (0.9), meeting (0.8).
Step 4: Select the top k words. For k=3 choose:
{cheap, team, meeting}.
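• A sketch of these four steps with scikit-learn. The 0.8/0.9 scores above are illustrative; chi-squared scoring is one concrete way to compute such term contributions:

# Feature selection on the three example emails via chi-squared scores.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["cheap deal today", "limited offer buy now", "team meeting schedule"]
y = [1, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)           # Step 1: extract vocabulary
selector = SelectKBest(chi2, k=3).fit(X, y)  # Steps 2-3: score and rank terms
keep = selector.get_support()                # Step 4: keep the top k terms
print([t for t, k in zip(vectorizer.get_feature_names_out(), keep) if k])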
Dimensionality Reduction
• The number of input features, variables, or columns
present in a given dataset is known as dimensionality, and
the process to reduce these features is called
dimensionality reduction.
• In many cases a dataset contains a huge number of input features, which makes the predictive modeling task more complicated. Because it is very difficult to visualize or make predictions for a training dataset with a high number of features, dimensionality reduction techniques are required in such cases.
Benefits of applying
Dimensionality Reduction
• By reducing the dimensions of the features, the space required to store the
dataset also gets reduced.
• Less computation/training time is required for reduced dimensions of features.
• Reduced dimensions of features of the dataset help in visualizing the data quickly.
• It removes the redundant features (if present) by taking care of multicollinearity.
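• A minimal dimensionality-reduction sketch using PCA, one common technique; the random data is assumed for illustration:

# Project 4 input features down to 2 derived components with PCA.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 4)        # 100 samples, 4 input features (assumed)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # new shape: (100, 2)
print(X_reduced.shape, pca.explained_variance_ratio_)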
Differentiate between Feature Selection and Dimensionality Reduction
• Feature selection keeps a subset of the original features and discards the rest, so the retained features stay interpretable.
• Dimensionality reduction transforms the original features into a new, smaller set of derived features (for example, principal components), which may not correspond directly to any original feature.
Applications of Text Categorization and Filtering
• Text categorization and filtering involve organizing and extracting relevant
information from text data based on predefined categories or criteria.
• These methods are widely used in various real-life applications, such as
1. spam detection,
2. sentiment analysis, and
3. Classifying Advertisements
4. Text Filtering
1. Spam Detection
• Purpose: Identify and filter out spam (unwanted) emails or messages.
• How It Works: A classifier is trained on labeled examples of spam and non-spam emails.
• It looks for features like: Specific keywords: "win", "free", "offer".
2. Sentiment Analysis
• Purpose: Analyze text to determine the sentiment (positive, negative,
or neutral).
• How It Works: The system identifies opinion-based words and phrases in
text.
• Assigns a sentiment score based on the context.
• Example:
• Review: "The movie was amazing, I loved it!" → Positive Sentiment.
• Review: "Terrible customer service, very disappointed." → Negative
Sentiment.
3. Classifying Advertisements:
Purpose: Categorize ads into relevant categories for better targeting.
• How It Works:Text in the ad (e.g., title, description) is analyzed to identify
the most suitable category.
• Categories could include "Jobs," "Real Estate," "Electronics," etc.
• Example:
• Ad: "Brand new iPhone 14 for sale. Excellent condition." → Classified as
Electronics.
• Ad: "Looking for a software developer in New York" → Classified as Jobs.
4. Text Filtering
• Text filtering is a technique used to remove or modify unwanted content
from user interactions, such as messages or search queries, based on
certain criteria.
• The different types of text filtering are:
1. Content-based Filtering
2. Collaborative Filtering
3. Hybrid Filtering
1. Content-Based Filtering
• Definition: This approach recommends items based on the
similarity between the content of items and the user’s
past preferences.
• How it Works:
• Analyzes the features of the items (e.g., keywords and descriptions).
• Matches these features with a user profile built from previously interacted content.
• Example:
• A movie recommendation system suggests films with similar genres, actors, or directors as the movies a user has already rated highly.
• 2. Collaborative Filtering
• Definition: This approach relies on user interaction data
(ratings, clicks, etc.) to recommend items based on similarities
between users or items.
• 3. Hybrid Filtering
• Definition: This approach combines content-based and
collaborative filtering methods to leverage their strengths
and mitigate their weaknesses.
• Example:
• A music app combines user preferences (content-based) and the
listening habits of similar users (collaborative) to recommend
new songs.
Differentiate Between Information Filtering and Retrieval
• Information retrieval answers a user's short-term, ad-hoc query against a relatively stable collection of documents.
• Information filtering matches a stream of incoming documents against a long-term user profile (for example, a news feed or a spam filter).
Difference Between Classification and Clustering
• Classification is supervised: the classes are predefined and a model is trained on labeled examples.
• Clustering is unsupervised: groups are discovered from unlabeled data based on similarity.
Clustering Techniques
• Clustering is a process of grouping a set of objects or data points into
clusters so that:
• Similar objects are grouped together in one cluster.
• Dissimilar objects are placed in different clusters.
• When we want to divide a large group of things (like customers, students,
or items) into smaller groups based on how similar they are, we use a
process called Clustering.
• Clustering is the most common form of Unsupervised Learning.
• After clustering, each cluster is assigned a number called a Cluster ID.
• The two most popular clustering algorithms are K-Means and Hierarchical Clustering.
Types of Clustering Methods
Partitioning Clustering
Divides the dataset into k distinct, non-hierarchical groups or clusters.
It is also known as the centroid-based method.
Algorithm: K-Means
Example: Grouping customers in a store based on purchasing behavior (e.g., frequent buyers, occasional buyers).
• Density-Based Clustering
Clusters are formed based on density of data points.
• Dense areas form clusters, while sparse regions are considered noise or outliers.
• A sparse region typically refers to an area in data or a system where information
or resources are distributed sparsely or unevenly.
• Algorithm: DBSCAN

Example:
• Imagine you are grouping houses in a city based on their price and size:
• Most houses are grouped in areas where the prices and sizes are close to each other
(e.g., small houses with low prices and large houses with high prices form clusters).
A few houses are isolated because:
• They are too expensive for their size (an outlier).
• They are in remote areas with very few neighbors (sparse region).
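• A DBSCAN sketch of this house example, with assumed toy values for price and size:

# Dense neighborhoods become clusters; isolated points become noise.
import numpy as np
from sklearn.cluster import DBSCAN

# columns: price, size (toy values; units are illustrative)
houses = np.array([[20, 600], [22, 650], [21, 620],     # small, cheap cluster
                   [90, 2500], [95, 2580], [92, 2540],  # large, costly cluster
                   [80, 1500]])                         # isolated house
labels = DBSCAN(eps=100, min_samples=2).fit_predict(houses)
print(labels)  # -1 marks noise/outliers; other integers are cluster IDs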
Distribution Model-Based Clustering
Assumes data is generated by a specific probability distribution and tries to
fit the data to that model.
• Algorithm: Expectation-Maximization (EM)
Example: Grouping students based on their test scores using Gaussian
distributions.
• Expectation Step (E-step):The algorithm guesses which probability
distribution each data point likely belongs to based on the current
parameters of the distributions.
• Step 1: Guess the number of clusters. Let's say you assume there are 3
clusters (low, medium, high scores).
• Step 2: In the E-step, the algorithm estimates which students belong to each
of these three groups based on their test scores.
• Step 3:
• In the M-step, the algorithm updates the parameters of the Gaussian
distributions (e.g., the mean and standard deviation) for each group based
on the students assigned to each group.
• The EM algorithm helps you find groups even if the data is not perfectly
separated, by fitting it into a distribution (like a Gaussian curve) and refining
it over time.
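• A sketch of the student-scores example with scikit-learn's GaussianMixture, which runs the E and M steps internally; the scores are assumed toy values:

# Fit 3 Gaussians (low / medium / high scores) via Expectation-Maximization.
import numpy as np
from sklearn.mixture import GaussianMixture

scores = np.array([35, 38, 42, 60, 63, 65, 88, 90, 93]).reshape(-1, 1)
gm = GaussianMixture(n_components=3, random_state=0).fit(scores)
print(gm.means_.ravel())   # learned group means after the E/M iterations
print(gm.predict(scores))  # which Gaussian each student belongs to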
Connectivity-Based Clustering
Also called hierarchical clustering, it builds a tree-like structure of clusters.
1. Objects close to each other are grouped first, and this process continues.
2. Algorithms: Agglomerative Hierarchical Clustering, Divisive Clustering

Example: Organizing books in a library by topics and subtopics.


• Step-by-Step Example:
1. Start with individual books as separate clusters:
   1. Book A: "Data Science Basics"
   2. Book B: "Machine Learning Algorithms"
   3. Book C: "Data Structures and Algorithms"
   4. Book D: "Python Programming"
   5. Book E: "Deep Learning for Beginners"
• Find the closest pairs of books (based on similarity in content):
• Book A ("Data Science Basics") and Book B ("Machine Learning
Algorithms") are closest because both are about data science and machine
learning.
• Book C ("Data Structures and Algorithms") is closest to Book D
("Python Programming") because both are related to computer
science.
• Continue merging:
• Now, you have the following clusters:
• Cluster 1: "Data Science Basics" and "Machine Learning Algorithms"
• Cluster 2: "Data Structures and Algorithms" and "Python Programming"
• Book E ("Deep Learning for Beginners") is still in its own cluster.
• Now, Cluster 1 and Cluster 2 are both about computer science, so you merge
these into a larger cluster.
• Book E could be grouped into the "deep learning" subtopic.
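• A rough sketch of this book example with scikit-learn, using TF-IDF similarity of the titles as a stand-in for "similarity in content" (with such short titles, the groups may differ a little from the narrative above):

# Agglomerative clustering of the five book titles into 3 clusters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

books = ["Data Science Basics", "Machine Learning Algorithms",
         "Data Structures and Algorithms", "Python Programming",
         "Deep Learning for Beginners"]
X = TfidfVectorizer().fit_transform(books).toarray()
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
print(dict(zip(books, labels)))  # books sharing a label were merged first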
K-Means Clustering
• K-Means Clustering is an Unsupervised Machine Learning algorithm,
which groups the unlabeled dataset into different clusters.
• Unsupervised Machine Learning is the process of teaching a computer to
use unlabeled, unclassified data and enabling the algorithm to operate
on that data without supervision.
• Without any previous data training, the machine’s job in this case is to
organize unsorted data according to parallels, patterns, and variations.
• K-means is an iterative, centroid-based clustering algorithm that
partitions a dataset into similar groups based on the distance between their
centroids.
• Here K defines the number of pre-defined clusters that need to be created
in the process, as if K=2, there will be two clusters, and for K=3, there will be
three clusters, and so on.
• It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group of points with similar properties.
• It starts by randomly placing the cluster centroids in the space.
• Each data point is then assigned to one of the clusters based on its distance from the cluster's centroid.
• After assigning each point to a cluster, new cluster centroids are computed.
• This process runs iteratively until the clusters stabilize, as sketched below.
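• A minimal K-Means sketch on assumed toy 2-D points, showing the labels and the final centroids:

# Two obvious groups; K-Means iteratively assigns points and moves centroids.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],      # one natural group
              [10, 2], [10, 4], [10, 0]])  # another natural group
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster ID assigned to each point
print(km.cluster_centers_)  # final centroids after iterative refinement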
Hierarchical Clustering
• A Hierarchical clustering method works via grouping data into a tree
of clusters.
• Hierarchical clustering begins by treating every data point as a separate
cluster.
• Then, it repeatedly executes the subsequent steps:
1. Identify the two clusters that are closest together, and
2. Merge the two most similar clusters.
3. Continue these steps until all the clusters are merged together.
Agglomerative Hierarchical Clustering (Bottom-Up Approach):
• This is the most common approach.
• Starts with each data point as its own individual cluster.
• The algorithm gradually merges the closest clusters to form larger clusters
until all points are in one cluster.
Divisive Hierarchical Clustering (Top-Down Approach):
This approach starts with all data points in a single cluster.
The algorithm splits the large cluster into smaller clusters, based on some
criteria, and continues this splitting process until each data point is in its own
individual cluster.
Example: You start with all books in the library in one big "Books" cluster, and
then you split them into "Science," "Arts," "Technology," etc., until each book
is in its own cluster.
Methods to Find the Closest Pair of Clusters
• In Hierarchical Clustering, one important aspect is deciding how
to determine the "closeness" between clusters when they need to
be merged.
• There are several methods used to find the closest pair of
clusters, and the most common ones are:
1. Single-Linkage (Nearest Point Linkage)
2. Complete-Linkage (Farthest Point Linkage)
3. Average-Linkage
1. Single-Linkage (Nearest Point Linkage)
In single linkage, the distance between two clusters is defined as the shortest
distance between any two points—one from each cluster.
Essentially, it's the distance between the closest pair of points in the two
clusters.
For two clusters R and S, the single linkage returns the minimum distance
between two points i and j such that i belongs to R and j belongs to S.
2. Complete Linkage:(Farthest Point Linkage)
• In complete linkage, the distance between two clusters is defined as the
maximum distance between any two points—one from each cluster.
• Essentially, it's the distance between the farthest pair of points in the two
clusters.
• For two clusters R and S, the complete linkage returns the maximum distance
between two points i and j such that i belongs to R and j belongs to S.
3. Average-Linkage
In average linkage, the distance between two clusters is defined as the average
distance between all pairs of points—one from each cluster.
This means you calculate the distance between every pair of points, one
from each cluster, and then take the average of all these distances.
For two clusters R and S, first the distance between every data point i in R and every data point j in S is computed, and then the arithmetic mean of these distances is calculated. Average linkage returns this arithmetic mean, as in the sketch below.
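• A SciPy sketch comparing the three linkage criteria on the same assumed toy points; the merge order of the hierarchy depends on the linkage rule:

# Build a hierarchy under each linkage rule, then cut it into 3 clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)  # pairwise merges under this linkage rule
    print(method, fcluster(Z, t=3, criterion="maxclust"))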
Difference Between K-Means and Hierarchical Clustering

• K-Means: Using a pre-specified number of clusters, the method assigns records to mutually exclusive, roughly spherical clusters based on distance.
  Hierarchical: Methods can be either divisive or agglomerative.

• K-Means: Needs advance knowledge of K, i.e. the number of clusters into which you want to divide your data.
  Hierarchical: One can stop at any number of clusters found appropriate by interpreting the dendrogram (bottom to top or top to bottom).

• K-Means: One can use the median or mean as a cluster centre to represent each cluster.
  Hierarchical: Agglomerative methods begin with n clusters and sequentially combine similar clusters until only one cluster remains; divisive methods work in the opposite direction, beginning with one cluster that includes all the records.

• K-Means: Normally less computationally intensive and suited to very large datasets; it is relatively simple and fast because it only calculates distances and updates cluster centers.
  Hierarchical: Especially useful when the goal is to arrange the clusters into a natural hierarchy (smaller clusters nested within larger ones).

• K-Means: Since it starts with a random choice of centroids, the results produced by running the algorithm many times may differ.
  Hierarchical: Results are reproducible across runs.

• K-Means: Simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
  Hierarchical: A set of nested clusters arranged as a tree.
Evaluation of Clustering Results
• Three important Factors:
1. Clustering Tendency
2. Number of clusters k
3. Clustering Quality
Clustering Tendency
Clustering Tendency is the ability to determine whether a
dataset naturally forms groups (clusters) before actually
applying a clustering algorithm. It's like asking, "Can my data be grouped meaningfully, or is it just random?"
Why is Clustering Tendency Important?
• Not all datasets are suitable for clustering.
• If the data doesn't have a natural structure, clustering
may give meaningless or arbitrary results.
• Clustering tendency helps us check if clustering is even
possible or worth it for a given dataset.
Example: Suppose you want to group customers based on their buying habits.
If customers clearly behave differently (e.g., some buy luxury items, others buy budget products), clustering is possible.
If all customers buy random things with no clear patterns, clustering won't make sense.
2. Number of clusters k :
Choosing the number of clusters (K) in clustering is a critical step, and there
are different approaches to decide the best K.
Here's a breakdown of the two main approaches
1. Domain Knowledge Approach
2. Data-Driven Approach

1. Domain Knowledge Approach:
This approach uses prior knowledge of the data or the problem you're solving to choose the number of clusters.
If you already have some knowledge about the data or the context, you can make an educated guess about the appropriate number of clusters.
Example:
Imagine you are working for a clothing store and want to cluster customers
based on their shopping habits.
If you know that there are three main types of customers (budget, mid-range,
and high-end shoppers),
you might choose K=3 based on that understanding.
2. Data-Driven Approach:
This approach uses data itself to find the best number of clusters.
Here are a few methods within this approach:
1. Empirical Approach
2. Elbow Method
3. Statistical Methods
Empirical Approach:
You try different values of K (for example, K=2, 3, 4, etc.) and see which one
produces meaningful or useful clusters.
Elbow Method:
The Elbow Method is a way to help decide the best number of clusters (K)
when you're using a clustering algorithm like K-Means.
How Does It Work?
• Choose different K values (the number of clusters): start with K=1 (one cluster) and increase K step by step (K=2, K=3, etc.).
• Plot a graph: put the number of clusters (K) on the X-axis and the within-cluster sum of squares (WCSS) on the Y-axis.
• Look for the "elbow" point: the elbow is the point where adding more clusters no longer improves the results much. For example, when plotting the points for K = 1, 2, 3, 4, etc., you may notice the drop slows significantly at 3 clusters. A sketch follows below.
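• A sketch of the Elbow Method on assumed synthetic data containing three natural groups; the inertia (WCSS) curve flattens at K=3:

# Plot K against inertia and look for the elbow.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

np.random.seed(0)
X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [6, 6], [0, 8])])
ks = range(1, 8)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]
plt.plot(ks, inertias, "o-")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS (inertia)")
plt.show()  # the drop flattens at K=3: the elbow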
3. Statistical Methods:
Gap Statistics helps you find the best number of clusters by comparing the
performance of your clustering model with random clustering.
It looks for a "gap" between the performance of your real clustering and
random clustering to determine how meaningful the clusters are.
3. Clustering Quality:
When evaluating the quality of a clustering result, we use two types
of measures:
1. Extrinsic and
2. Intrinsic.
These measures help assess how well the clustering reflects the actual structure
of the data.
Clustering for Query Expansion and Result Grouping
• Query Expansion: This refers to the process of enhancing a search query by
adding additional terms or phrases to retrieve more relevant results.
Clustering techniques can help identify groups of related terms or
documents, which can then suggest additional keywords for expanding the
original query.
• Result Grouping: Once search results are retrieved, clustering algorithms
group them into clusters based on similarity. This helps users navigate results
by topic or theme, making it easier to explore related information.
• Dice's Coefficient for Measuring Similarity
• Dice's Coefficient is a statistical measure used to determine the
similarity between two sets. It is often used in clustering to compare the
similarity between terms, queries, or documents.
• Formula: Dice's coefficient between two sets A and B is defined as:
Dice(A, B) = 2 |A ∩ B| / (|A| + |B|)
• Explanation with Example (see the sketch below):
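• A small Python sketch with an assumed pair of term sets:

# Dice(A, B) = 2*|A intersect B| / (|A| + |B|)
def dice(a: set, b: set) -> float:
    return 2 * len(a & b) / (len(a) + len(b))

A = {"information", "retrieval", "system"}  # assumed example sets
B = {"information", "system", "design"}
print(dice(A, B))  # 2*2 / (3+3) ≈ 0.667: fairly similar term sets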
• Intrinsic Measures (Internal Measures)
• Intrinsic measures evaluate the clustering quality based only on the data
and the clusters themselves, without any external reference.
• These metrics focus on how well the data points are grouped within each
cluster and how separated the clusters are.
• Extrinsic Measures (External Measures)
• Extrinsic measures assess the quality of the clustering by comparing the clustering result with some external reference (e.g., predefined labels).
• These measures are useful when you have labeled data to compare your
clustering result against.
