U-5 IML Questions
1) Explain how clustering tasks differ from classification tasks and how
clustering defines groups. What are different types of clustering
techniques?
Ans:
Differences Between Clustering and Classification Tasks:
• Type of Learning: Clustering is unsupervised learning; classification is supervised learning.
• Goal: Clustering groups data points into clusters based on similarity or patterns in the data; classification assigns data points to predefined categories or classes.
• Input Data: Clustering uses no labeled data, only feature data; classification requires labeled data (features and their corresponding labels).
• Output: Clustering produces a set of clusters, each representing similar data points; classification predicts labels for data points based on the trained model.
• How Groups Are Defined: In clustering, groups are defined by the algorithm using similarity measures such as distance, density, or connectivity; in classification, groups are defined by the training process using labeled data and decision boundaries.
• Evaluation Metrics: Clustering uses internal metrics (e.g., silhouette score, Davies-Bouldin index) or external benchmarks; classification uses accuracy, precision, recall, F1-score, confusion matrix, etc.
• Applications: Clustering — market segmentation, anomaly detection, document clustering, image compression; classification — spam detection, fraud detection, medical diagnosis, sentiment analysis.
Types of Clustering Techniques (Simple Explanation):
Clustering is about grouping similar items together. Here are the main types of clustering
techniques explained simply:
1. Partitioning Clustering
• What it does: Divides the data into k groups (clusters) where each item belongs to one
group.
• Example:
o k-means: Groups data by finding cluster centers (called centroids) and putting items near them in the same group.
• When to use: If you know the number of clusters you want and the clusters are roughly round-shaped.
2. Hierarchical Clustering
• What it does: Makes a tree of clusters, where smaller clusters merge into bigger ones
or bigger ones split into smaller ones.
• Example:
o Agglomerative (Bottom-Up): Start with one item per cluster, then combine similar
clusters.
o Divisive (Top-Down): Start with one big cluster, then split into smaller ones.
• When to use: When you want to see clusters at different levels of detail.
3. Density-Based Clustering
• What it does: Finds clusters in areas where items are packed tightly and ignores areas
with few items (outliers).
• Example:
o DBSCAN: Groups items that are close together and marks isolated items as
noise.
• When to use: If clusters are oddly shaped or you have outliers.
4. Grid-Based Clustering
• What it does: Divides the data space into squares (grids) and groups items in the grids
that have many points.
• Example:
o STING: Groups based on density in grid cells.
• When to use: For very large datasets.
5. Model-Based Clustering
• What it does: Assumes the data is made up of different groups, each with a certain
shape or pattern (like bell-shaped).
• Example:
o Gaussian Mixture Models (GMM): Groups items based on probabilities that
they belong to a certain cluster.
• When to use: If you think clusters follow specific patterns (e.g., bell-shaped).
6. Spectral Clustering
• What it does: Groups items based on their relationships, like how connected they are
in a network.
• When to use: For data with complex shapes or relationships, like in social networks.
7. Fuzzy Clustering
• What it does: Lets an item belong to more than one cluster, with a percentage for each.
• Example:
o Fuzzy c-means: Groups items but allows overlap between groups.
• When to use: If items could naturally belong to multiple groups.
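The following short Python sketch (an illustration using scikit-learn on toy data; the make_blobs dataset and all parameter values are assumptions for demonstration, and grid-based and fuzzy methods are omitted because they are not part of scikit-learn) runs one representative algorithm from several of the families above:
```python
# Sketch: one representative algorithm from several clustering families,
# run on the same toy dataset (parameter values are illustrative, not tuned).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN, SpectralClustering
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # toy data

models = {
    "Partitioning (k-means)":       KMeans(n_clusters=3, n_init=10, random_state=42),
    "Hierarchical (agglomerative)": AgglomerativeClustering(n_clusters=3),
    "Density-based (DBSCAN)":       DBSCAN(eps=0.8, min_samples=5),
    "Model-based (GMM)":            GaussianMixture(n_components=3, random_state=42),
    "Spectral":                     SpectralClustering(n_clusters=3, random_state=42),
}

for name, model in models.items():
    labels = model.fit_predict(X)                 # every estimator here supports fit_predict
    print(name, "->", len(set(labels) - {-1}), "clusters found")  # -1 is DBSCAN noise
```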
2) In detail, explain Hierarchical clustering (What is Hierarchical
clustering? What are the types of Hierarchical clustering? With a neat
sketch, explain the Hierarchical Agglomerative algorithm. With a neat
sketch, explain the Hierarchical Divisive clustering algorithm.)
Ans:
Hierarchical Clustering:
Hierarchical clustering is a connectivity-based clustering model that groups the data points
together that are close to each other based on the measure of similarity or distance. The
assumption is that data points that are close to each other are more similar or related than
data points that are farther apart.
Types of Hierarchical Clustering:
There are two types: Agglomerative (bottom-up) and Divisive (top-down).
Hierarchical Agglomerative Clustering:
Also known as the bottom-up approach or hierarchical agglomerative clustering (HAC), it produces a structure that is more informative than the unstructured set of clusters returned by flat clustering. This clustering algorithm does not require us to prespecify the number of clusters. Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been merged into a single cluster that contains all the data.
Steps (illustrated with points A to F):
• Treat each point as a single cluster and calculate the distance of every cluster from all the other clusters.
• Merge the most comparable clusters into a single cluster. Say cluster (B) and cluster (C) are very similar, so we merge them; similarly clusters (D) and (E). We are left with the clusters [(A), (BC), (DE), (F)].
• Recalculate the proximities according to the chosen distance measure and merge the two nearest clusters ((DE) and (F)) to form the new clusters [(A), (BC), (DEF)].
• Repeat until all points have been merged into a single cluster; the sequence of merges is drawn as a dendrogram.
Hierarchical Divisive Clustering:
Also known as the top-down approach, this algorithm likewise does not require us to prespecify the number of clusters. Top-down clustering starts with a single cluster containing the whole dataset and proceeds by splitting clusters recursively until every data point is in its own singleton cluster.
Computing the Distance Matrix:
While merging two clusters, we check the distance between every pair of clusters and merge the pair with the least distance (most similarity). The question is how that distance is determined. There are different ways of defining inter-cluster distance/similarity; some of them are:
1. Min Distance (single linkage): the minimum distance between a point in one cluster and a point in the other.
2. Max Distance (complete linkage): the maximum distance between a point in one cluster and a point in the other.
3. Group Average: the average distance between every pair of points taken one from each cluster.
4. Ward's Method: the similarity of two clusters is based on the increase in squared error when the two clusters are merged.
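As an illustration, the following SciPy sketch (the toy data points and the cut level are arbitrary assumptions) builds agglomerative clusterings with each of the linkage methods listed above:
```python
# Sketch: agglomerative clustering with SciPy; the 'linkage' methods correspond
# to the inter-cluster distance definitions above (single = min distance,
# complete = max distance, average = group average, ward = Ward's method).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], dtype=float)  # tiny toy data

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                      # bottom-up merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into 2 clusters
    print(method, labels)

# scipy.cluster.hierarchy.dendrogram(Z) can be used to draw the tree of merges.
```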
3) What is the Partitioning Method (K-Means) in Data Mining? Explain K-
Means.
Ans:
Partitioning Method:
This clustering method classifies the information into multiple groups based on the characteristics and similarity of the data. It requires the data analyst to specify the number of clusters to be generated. Given a database D that contains N objects, the partitioning method constructs K user-specified partitions of the data, where each partition represents a cluster and a particular region.
K-Means (A Centroid-Based Technique):
The K-means algorithm takes the input parameter K from the user and partitions the dataset containing N objects into K clusters so that the similarity among the data objects inside a group (intra-cluster) is high, while the similarity between data objects in different clusters (inter-cluster) is low. The similarity of a cluster is determined with respect to the mean value of the cluster. It is a type of squared-error algorithm. At the start, K objects are chosen randomly from the dataset, each representing a cluster mean (centre).
Algorithm:
1. Randomly choose K objects from the dataset (D) as the initial cluster centres (C).
2. (Re)assign each object to the cluster whose mean it is most similar to.
3. Update the cluster means, i.e., recalculate the mean of each cluster with the updated assignments.
4. Repeat steps 2-3 until the assignments no longer change.
Example: Suppose we want to group the visitors to a website using just their age as
follows:
16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial Cluster:
K=2
Centroid(C1) = 16 [16]
Centroid(C2) = 22 [22]
Note: These two points are chosen randomly from the dataset.
Iteration-1:
C1 = 16.33 [16, 16, 17]
C2 = 37.25 [20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-2:
C1 = 19.55 [16, 16, 17, 20, 20, 21, 21, 22, 23]
C2 = 46.90 [29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-3:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
Iteration-4:
C1 = 20.50 [16, 16, 17, 20, 20, 21, 21, 22, 23, 29]
C2 = 48.89 [36, 41, 42, 43, 44, 45, 61, 62, 66]
There is no change between iterations 3 and 4, so we stop. The K-Means algorithm therefore gives us two clusters: (16-29) and (36-66).
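A minimal plain-Python sketch of this worked example is shown below; the initial centroids are fixed at 16 and 22 to match the example above rather than chosen randomly, as a real implementation would do:
```python
# 1-D K-means sketch reproducing the age example above.
ages = [16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66]
centroids = [16.0, 22.0]          # fixed to match the worked example

while True:
    # Assignment step: put each age into the cluster of the nearest centroid.
    clusters = [[] for _ in centroids]
    for a in ages:
        nearest = min(range(len(centroids)), key=lambda i: abs(a - centroids[i]))
        clusters[nearest].append(a)
    # Update step: recompute each centroid as the mean of its cluster.
    new_centroids = [sum(c) / len(c) for c in clusters]
    if new_centroids == centroids:    # stop when the means no longer change
        break
    centroids = new_centroids

print(centroids)   # ~[20.5, 48.89], matching iterations 3-4 above
print(clusters)    # the (16-29) and (36-66) clusters
```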
4) (a) While performing K-means clustering, how do you determine the
value of K?
(b) How would you perform K-Means on very large data sets?
(c) How would you preprocess the data for K-Means?
(d) Briefly explain about K-Medoids.
Ans:
a. Determining the Value of K:
Elbow Method
• Process:
1. Perform K-means clustering for a range of K values (e.g., K = 1 to K = 10).
2. Compute the within-cluster sum of squares (WCSS) for each K. WCSS measures the total variance within clusters.
3. Plot K versus WCSS.
4. Identify the "elbow point" where the rate of decrease in WCSS changes sharply (i.e., the curve starts flattening). This point suggests the optimal K.
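A short scikit-learn sketch of the elbow method (the make_blobs toy data and the K range of 1 to 10 are illustrative assumptions) could look like this:
```python
# Sketch of the elbow method: plot WCSS (inertia) for K = 1..10 and look
# for the point where the curve starts flattening.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)  # toy data

ks = range(1, 11)
wcss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)      # inertia_ is the within-cluster sum of squares

plt.plot(ks, wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS (inertia)")
plt.title("Elbow method")
plt.show()                        # the 'elbow' suggests the optimal K
```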
b. Performing K-Means on very large datasets:
Performing K-means clustering on very large datasets requires modifications and optimizations to handle the computational and memory demands.
1. Use Mini-Batch K-Means
• Mini-Batch K-Means is a variation of K-Means that processes small, random subsets (mini-batches) of the dataset instead of the entire dataset at each iteration (see the sketch after this list).
• Advantages:
o Faster convergence.
o Significantly reduces memory usage.
• Drawback: Might slightly reduce clustering accuracy compared to full K-means.
2. Dimensionality Reduction
• High-dimensional data increases computational complexity. Reduce dimensionality
using techniques like:
o Principal Component Analysis (PCA).
o t-SNE (for visualization).
o Truncated SVD (for sparse data).
• Perform K-means on the reduced dataset.
• Benefits:
o Speeds up clustering.
o Can help reduce noise in the data.
3. Initialize with Efficient Methods
• The K-means++ initialization improves convergence speed by choosing initial
centroids that are spread out.
• Scalable initialization algorithms like Fast K-means++ can further enhance
performance on large datasets.
4. Cluster on a Sampled Subset
• Randomly sample a subset of the data for initial clustering.
• Use the resulting centroids to cluster the entire dataset.
• Steps:
1. Perform K-means on a random subset.
2. Use the obtained centroids to assign all data points.
• Tradeoff: Risk of missing rare patterns in the data.
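As referenced in point 1 above, a brief Mini-Batch K-Means sketch with scikit-learn (the toy dataset, batch size, and cluster count are illustrative assumptions) might look like this:
```python
# Sketch of Mini-Batch K-Means: centroids are updated one random mini-batch
# at a time instead of over the whole dataset.
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1_000_000, centers=5, random_state=0)  # stand-in for a large dataset

mbk = MiniBatchKMeans(
    n_clusters=5,
    batch_size=10_000,   # size of each random mini-batch
    n_init=3,
    random_state=0,
)
labels = mbk.fit_predict(X)
print(mbk.cluster_centers_)
```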
c. Preprocessing the data for K-Means:
To preprocess data for K-Means, follow these steps:
1. Handle Missing Values
o Replace missing values with the mean, median, or mode of the respective column.
2. Normalize/Standardize the Data
o Scale features to the same range (e.g., using Min-Max scaling or Z-score standardization), since K-Means uses distance metrics that are sensitive to magnitude.
3. Remove Outliers (Optional)
o Identify and handle outliers, as they can distort cluster centroids. Use methods like z-scores or IQR filtering.
4. Convert Categorical Data
o If the dataset has categorical variables, encode them using one-hot encoding or label encoding.
5. Reduce Dimensionality (Optional)
o For high-dimensional data, reduce the number of features using PCA or similar techniques.
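The steps above can be combined into a single preprocessing pipeline. The sketch below uses scikit-learn with hypothetical column names ("age", "income", "city") purely for illustration:
```python
# Sketch: imputation + scaling for numeric columns, one-hot encoding for a
# categorical column, then K-Means at the end of the pipeline.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.cluster import KMeans

numeric = ["age", "income"]
categorical = ["city"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),   # fill missing values
                      ("scale", StandardScaler())]), numeric),        # z-score standardization
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),     # encode categories
])

model = Pipeline([("prep", preprocess),
                  ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0))])

df = pd.DataFrame({"age": [25, None, 40],
                   "income": [30_000, 52_000, None],
                   "city": ["A", "B", "A"]})
print(model.fit_predict(df))   # cluster label for each row
```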
d. K-Medoids:
K-Medoids is a clustering algorithm similar to K-Means, but instead of using the mean of
the data points to represent a cluster (as in K-Means), it uses an actual data point, called the
medoid, as the cluster center.
Key Points:
1. Medoid: The most representative data point in a cluster, minimizing the total
distance to other points in the cluster.
2. Distance Metric: It works with any distance metric (e.g., Euclidean, Manhattan),
making it more robust for non-numeric or irregular data.
3. Robust to Outliers: Since it uses actual data points as centers, it's less sensitive to
outliers compared to K-Means.
How It Works:
1. Initialize K random medoids (data points).
2. Assign each data point to the nearest medoid based on the chosen distance metric.
3. Update the medoids by selecting a new data point within the cluster that minimizes
the total distance to other points.
4. Repeat steps 2–3 until the medoids stabilize.
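A minimal, illustrative K-Medoids sketch is shown below (plain NumPy, restricted to 1-D data for brevity; this is not the full PAM algorithm, and a library such as scikit-learn-extra provides a ready-made KMedoids class):
```python
# Sketch of K-Medoids: alternate between assigning points to the nearest
# medoid and choosing, within each cluster, the member that minimizes the
# total distance to the other members.
import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    dist = np.abs(X[:, None] - X[None, :])                 # pairwise distances (1-D data)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)    # step 1: random medoids
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)       # step 2: assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):                                 # step 3: best member becomes medoid
            members = np.where(labels == c)[0]
            if len(members) == 0:                          # keep old medoid if cluster is empty
                continue
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):           # step 4: stop when medoids stabilize
            break
        medoids = new_medoids
    return medoids, labels

ages = np.array([16, 16, 17, 20, 21, 29, 41, 43, 45, 62, 66], dtype=float)
medoids, labels = k_medoids(ages, k=2)
print(ages[medoids], labels)   # medoids are actual data points from the dataset
```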
5) What is DBSCAN? Describe the parameters required for DBSCAN
Algorithm. Briefly explain the steps used in DBSCAN Algorithm.
Ans:
DBSCAN:
Clustering analysis, or simply clustering, is an unsupervised learning method that divides the data points into a number of specific batches or groups, such that the data points in the same group have similar properties and data points in different groups have different properties in some sense.
Clusters are dense regions in the data space, separated by regions of lower point density. The DBSCAN algorithm is based on this intuitive notion of "clusters" and "noise".
Parameters Required For DBSCAN Algorithm:
1. eps: Defines the neighbourhood around a data point: if the distance between two points is less than or equal to eps, they are considered neighbours. If eps is chosen too small, a large part of the data will be treated as outliers; if it is chosen too large, clusters will merge and the majority of the data points will fall into the same cluster. One way to choose the eps value is from the k-distance graph.
2. MinPts: The minimum number of neighbours (data points) within the eps radius. The larger the dataset, the larger the value of MinPts that should be chosen. As a general rule, MinPts can be derived from the number of dimensions D in the dataset as MinPts >= D + 1, and it should be at least 3.
Steps Used In DBSCAN Algorithm:
1. Find all the neighbouring points within eps of every point and identify the core points, i.e., points that have more than MinPts neighbours.
2. For each core point, if it is not already assigned to a cluster, create a new cluster.
3. Recursively find all of its density-connected points and assign them to the same cluster as the core point.
Points a and b are said to be density-connected if there exists a point c that has a sufficient number of points in its neighbourhood and both a and b are within eps distance of it. This is a chaining process: if b is a neighbour of c, c is a neighbour of d, and d is a neighbour of e, which in turn is a neighbour of a, then b is density-connected to a.
4. Iterate through the remaining unvisited points in the dataset. Points that do not belong to any cluster are noise.
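A brief scikit-learn sketch of DBSCAN is shown below; eps and min_samples correspond to the eps and MinPts parameters described above (the make_moons toy data and the chosen parameter values are illustrative assumptions):
```python
# Sketch of DBSCAN on toy crescent-shaped data; -1 in labels_ marks noise.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)  # two crescent-shaped clusters
X = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_
print("clusters:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points:", list(labels).count(-1))
```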
6) What is an Association Rule Learning Algorithm? How does it work?
Explain the process of finding patterns using association rules with suitable
examples. Explain how Market Basket Analysis uses the concepts of
association analysis.
Ans:
Association Rule Learning Algorithm:
Association Rule Learning is a machine learning technique used to find interesting
relationships or patterns (rules) between items in large datasets. It is widely used in
transactional databases to discover frequent itemsets and correlations.
How It Works:
The goal is to identify rules of the form:
If (Condition A), then (Condition B)
where A and B are sets of items.
Example: "If a customer buys bread, they are likely to buy butter."
Key Metrics:
1. Support: Measures how often items appear together.
Support(A → B) = (Transactions containing A and B) / (Total transactions)
2. Confidence: Measures the likelihood of B given A.
Confidence(A → B) = (Transactions containing A and B) / (Transactions containing A)
3. Lift: Measures the strength of the rule compared to random chance.
Lift(A → B) = Confidence(A → B) / Support(B)
Process of Finding Patterns Using Association Rules:
1. Data Preparation:
Organize the dataset into a transactional format, e.g., a list of items purchased in each
transaction.
2. Find Frequent Itemsets:
Use algorithms like Apriori or FP-Growth to find itemsets that occur frequently
together (based on Support).
3. Generate Rules:
From the frequent itemsets, generate association rules that meet the thresholds for
Confidence and Lift.
4. Evaluate Rules:
Rank rules based on metrics like Lift or Confidence to identify the most valuable
patterns.
Example:
Dataset of transactions:
Transaction Items Purchased
1 Bread, Butter, Milk
2 Bread, Butter
3 Milk, Bread
4 Butter, Milk
Step 1: Frequent Itemsets
• Bread & Butter: Support = 2/4 = 0.5
• Bread & Milk: Support = 2/4 = 0.5
Step 2: Generate Rules
• Rule: "If Bread → Butter"
o Confidence = 2/3 = 0.67
Step 3: Evaluate
• Lift: Lift(Bread → Butter) = Confidence(Bread → Butter) / Support(Butter) = 0.67 / 0.75 ≈ 0.89. A higher Lift (above 1) indicates a stronger association than random chance; here the association is weak.
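The metrics in this example can be verified with a few lines of plain Python (no association-rule library is needed for a dataset this small):
```python
# Compute Support, Confidence, and Lift for the rule "Bread -> Butter"
# over the four example transactions above.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Butter", "Milk"},
]
n = len(transactions)

def support(items):
    # fraction of transactions that contain all the given items
    return sum(items <= t for t in transactions) / n

sup_rule   = support({"Bread", "Butter"})    # 0.5
confidence = sup_rule / support({"Bread"})   # 2/3 ≈ 0.67
lift       = confidence / support({"Butter"})  # ≈ 0.89

print(sup_rule, round(confidence, 2), round(lift, 2))
```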
Market Basket Analysis and Association Rules:
Market Basket Analysis applies association rule learning to understand customer purchasing
behavior.
Concept:
• Identifies products often bought together (e.g., "diapers and beer").
• Helps businesses with:
1. Cross-Selling: Suggest related items.
2. Store Layout Optimization: Place frequently purchased items closer.
3. Promotions: Offer discounts on associated items.
Example:
Rule: "If a customer buys a smartphone, they are likely to buy a phone case."
• Action: Recommend phone cases during smartphone checkout.
In summary, association rule learning discovers valuable patterns in data, enabling
businesses to make data-driven decisions.
7) In detail Compare Hierarchical clustering and K-means clustering.
Ans:
Comparison of Hierarchical Clustering and K-Means Clustering:
• Definition: Hierarchical clustering builds a hierarchy of clusters, either by merging or splitting them; K-Means is a centroid-based algorithm that partitions data into K clusters based on distance.
• Approach: Hierarchical clustering is either Agglomerative (bottom-up: start with each data point as its own cluster and merge) or Divisive (top-down: start with all data as one cluster and split); K-Means is partitional: it divides data into K clusters in a flat manner, without a hierarchy.
• Output: Hierarchical clustering produces a dendrogram (tree-like structure) showing clusters at various levels; K-Means produces a flat partition into K clusters.
• Number of Clusters (K): Hierarchical clustering does not need K to be pre-defined (the dendrogram allows choosing the number of clusters visually); K-Means requires the number of clusters K to be pre-specified.
• Cluster Shape: Hierarchical clustering can detect clusters of arbitrary shapes; K-Means assumes clusters are spherical (circular or convex).
• Algorithm Type: Hierarchical clustering is deterministic (if no ties occur); K-Means is iterative and non-deterministic (results may vary with different initializations).
• Scalability: Hierarchical clustering is computationally expensive for large datasets (O(n^2) or worse); K-Means is efficient for large datasets (O(n × k × i), where i is the number of iterations).
8) Write the differences between DBSCAN and K-Means.
Ans:
Differences between DBSCAN and K-Means:
• Number of clusters: DBSCAN does not require the number of clusters to be specified; K-Means is very sensitive to the number of clusters, which must be specified.
• Cluster shape: Clusters formed by DBSCAN can be of any arbitrary shape; clusters formed by K-Means are spherical or convex in shape.
• Noise and outliers: DBSCAN works well with datasets containing noise and outliers; K-Means does not handle outlier data well, and outliers can skew its clusters to a very large extent.
• Parameters: DBSCAN requires two parameters (eps and MinPts) for training the model; K-Means requires only one (the number of clusters K).
9) Compare Supervised Vs Unsupervised learning
Ans:
Comparison of Supervised vs. Unsupervised Learning:
• Definition: Supervised learning learns from labeled data (input-output pairs); unsupervised learning learns without labeled data, focusing on finding hidden patterns.
• Goal: Supervised learning predicts outcomes or classifies data based on labeled training data; unsupervised learning groups similar data points or discovers underlying structure.
• Data Requirement: Supervised learning requires labeled data; unsupervised learning works with unlabeled data.
• Output: Supervised learning predicts specific outcomes (classes or values); unsupervised learning produces clusters, patterns, or reduced dimensions.
• Techniques: Supervised — classification (e.g., Decision Trees, SVM, Logistic Regression) and regression (e.g., Linear Regression); unsupervised — clustering (e.g., K-Means, Hierarchical Clustering) and dimensionality reduction (e.g., PCA).
• Examples: Supervised — spam email classification, predicting house prices, diagnosing diseases; unsupervised — customer segmentation, Market Basket Analysis, anomaly detection.
• Performance Evaluation: Supervised learning uses metrics like accuracy, precision, recall, and RMSE; unsupervised learning is evaluated using internal metrics (e.g., silhouette score) or visual inspection.
• Dependency on Labels: Supervised learning depends completely on labeled data; unsupervised learning needs no labels and learns patterns from the structure of the data.
• Applications: Supervised — fraud detection, weather forecasting, sentiment analysis; unsupervised — grouping customers by behavior, reducing data for visualization.