ML U5
Unsupervised Learning
Unsupervised learning is a type of machine learning where algorithms discover
patterns, relationships, or groupings within data without prior knowledge of labeled
outcomes.
Types of Unsupervised Learning
Algorithms
Applications
Evaluation Metrics
Supervised vs. Unsupervised Learning
Key Differences
Comparison Table
Real-World Examples
Supervised Learning:
Unsupervised Learning:
Clustering Techniques
Clustering is a type of unsupervised learning that groups similar data points into
clusters.
Types of Clustering Techniques
1. Hierarchical Clustering
Agglomerative (bottom-up)
Divisive (top-down)
Visualizes cluster hierarchy using dendrograms
2. K-Means Clustering
Partitional clustering
Simple, efficient, and widely used
Requires specifying number of clusters (k)
Cluster Evaluation Metrics
1. Silhouette Coefficient
2. Calinski-Harabasz Index
3. Davies-Bouldin Index
4. Elbow Method
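The first three metrics are implemented in scikit-learn, so they can be computed on the same clustering for comparison. A minimal sketch on synthetic blob data, assuming scikit-learn is installed:

# Hypothetical example: scoring one K-Means clustering with three internal metrics.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # illustrative data
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

print("Silhouette:       ", silhouette_score(X, labels))         # higher is better, in [-1, 1]
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))  # higher is better
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))     # lower is better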
Partitioning Methods
Partitioning methods are a type of clustering algorithm that divide data into
non-overlapping subsets, or clusters.
Types of Partitioning Methods
1. K-Means Clustering
Simple, efficient, and widely used
Requires specifying number of clusters (k)
Sensitive to initial centroids
2. K-Medoids
Similar to K-Means, but uses medoids (actual data points) instead of centroids
More robust to outliers
3. K-Modes
Extension of K-Means for categorical data
Uses modes (most frequent categories) instead of means; see the sketch after this list
4. CLARA (Clustering Large Applications)
Applies K-Medoids to samples of the data, making it practical for large datasets
Robust to outliers and noise
Handles varying cluster densities
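As a sketch of K-Modes on categorical data, assuming the third-party kmodes package (pip install kmodes) is available; the dataset here is a made-up toy example:

# K-Modes clusters categorical records by matching category values,
# using the mode of each cluster as its "center".
import numpy as np
from kmodes.kmodes import KModes

# Toy categorical dataset: (color, size, shape)
X = np.array([
    ["red",   "small", "round"],
    ["red",   "small", "oval"],
    ["blue",  "large", "round"],
    ["blue",  "large", "square"],
    ["green", "small", "round"],
])

km = KModes(n_clusters=2, init="Huang", n_init=5)
labels = km.fit_predict(X)
print("labels:       ", labels)
print("cluster modes:", km.cluster_centroids_)  # one row of modes per cluster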
Advantages
Disadvantages
Real-World Applications
Hierarchical Methods
Hierarchical methods are a type of clustering algorithm that build a hierarchy of
clusters by merging or splitting existing clusters.
Types of Hierarchical Methods
1. Agglomerative Hierarchical Clustering (AHC)
Bottom-up: each point starts as its own cluster, and the closest clusters are merged
2. Divisive Hierarchical Clustering
Top-down: the full dataset starts as one cluster and is recursively split
3. Ward's Method
AHC with Ward's linkage
Minimizes within-cluster variance
4. Single Linkage
AHC with single linkage
Merges clusters based on closest points
5. Complete Linkage
AHC with complete linkage
Merges clusters based on farthest points
6. Average Linkage
AHC with average linkage
Merges clusters based on average distance
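The four linkage criteria can be compared side by side with SciPy's hierarchy module. A sketch assuming scipy, scikit-learn (only for synthetic data), and matplotlib are installed:

# Build and plot one dendrogram per linkage criterion on the same data.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=0)  # illustrative data

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, method in zip(axes, ["ward", "single", "complete", "average"]):
    Z = linkage(X, method=method)        # merge history under this linkage
    dendrogram(Z, ax=ax, no_labels=True)
    ax.set_title(method)
plt.show()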
Advantages
Disadvantages
Real-World Applications
Density-Based Methods
Density-based methods are a type of clustering algorithm that groups data points into
clusters based on their density and proximity.
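DBSCAN is the classic density-based method. A minimal scikit-learn sketch; the eps and min_samples values are illustrative, not tuned:

# DBSCAN groups points in dense neighborhoods and labels sparse points as noise.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

# Label -1 marks noise points that belong to no cluster.
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("clusters found:", n_clusters)
print("noise points:  ", list(db.labels_).count(-1))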
Advantages
Real-World Applications
K-Means Algorithm
K-Means is a popular unsupervised learning algorithm for clustering data points into
K distinct groups.
How K-Means Works
1. Choose K and initialize K centroids (e.g., by picking K random data points).
2. Assign each data point to its nearest centroid.
3. Recompute each centroid as the mean of the points assigned to it.
4. Repeat steps 2-3 until the assignments stop changing or a maximum number of
iterations is reached.
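A from-scratch NumPy sketch of this loop (scikit-learn's KMeans is the usual choice in practice); the data below is random and purely illustrative:

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # step 1: init
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # step 4: converged
            break
        centroids = new_centroids
    return labels, centroids

X = np.random.default_rng(1).normal(size=(300, 2))  # illustrative data
labels, centroids = kmeans(X, k=3)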
Elbow Method
The Elbow Method is a visual technique used to determine the optimal number of
clusters (K) in K-Means clustering.
How Elbow Method Works
1. Compute Distortion: Calculate the sum of squared errors (SSE), or distortion, for
different values of K.
2. Plot Elbow Curve: Plot SSE against K.
3. Identify Elbow Point: Choose K at the elbow point, where the rate of decrease in
SSE becomes less pronounced.
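A sketch of the elbow curve using scikit-learn, where the fitted model's inertia_ attribute is the SSE; the synthetic data and range of K are illustrative:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # 4 true blobs

ks = range(1, 10)
sse = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("SSE (distortion)")
plt.show()  # look for the elbow where the curve flattens (here, near K = 4)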
K-Medoids Algorithm
K-Medoids is a clustering algorithm that partitions data into K clusters, using actual
data points (medoids) as cluster centers.
How K-Medoids Works
1. Select K data points as the initial medoids.
2. Assign each point to its nearest medoid.
3. For each medoid, consider swapping it with a non-medoid point; keep the swap if it
reduces the total dissimilarity within clusters.
4. Repeat steps 2-3 until the medoids no longer change.
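A sketch using the KMedoids estimator from the third-party scikit-learn-extra package (pip install scikit-learn-extra); a hand-rolled PAM loop would follow the same steps as above:

from sklearn.datasets import make_blobs
from sklearn_extra.cluster import KMedoids

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # illustrative data
km = KMedoids(n_clusters=3, method="pam", random_state=0).fit(X)

print("labels:", km.labels_[:10])
print("medoids (actual data points):")
print(km.cluster_centers_)  # each row is a real observation from X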
Agglomerative Clustering
Agglomerative clustering is a type of hierarchical clustering algorithm that builds
clusters by merging existing clusters.
How Agglomerative Clustering Works
1. Start with each data point as its own cluster.
2. Compute the pairwise distances between clusters.
3. Merge the two closest clusters, as defined by the chosen linkage criterion.
4. Repeat steps 2-3 until all points belong to a single cluster, recording each merge in
a dendrogram.
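A minimal scikit-learn sketch; the linkage parameter selects the merge rule discussed under Hierarchical Methods:

# Cut the merge hierarchy at 3 clusters using Ward linkage.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)  # illustrative data
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(labels[:20])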
Divisive Clustering
Divisive clustering is a type of hierarchical clustering algorithm that splits data into
smaller clusters by dividing them into more homogeneous subsets. Unlike
agglomerative clustering, which starts with individual data points and merges them,
divisive clustering begins with the entire dataset and recursively divides it.
Key characteristics:
1. Top-down approach
2. Divides the dataset into smaller clusters
3. Focuses on splitting clusters based on dissimilarity
Algorithms:
Popular divisive clustering algorithms include DIANA (DIvisive ANAlysis) and bisecting
K-Means, sketched below.
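A sketch of the bisecting K-Means approach using scikit-learn's BisectingKMeans (available in scikit-learn 1.1 and later); the data is synthetic:

# Bisecting K-Means works top-down: start with one cluster and repeatedly
# split a chosen cluster in two with 2-means until K clusters remain.
from sklearn.cluster import BisectingKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # illustrative data
labels = BisectingKMeans(n_clusters=4, random_state=0).fit_predict(X)
print(labels[:20])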
Advantages:
Applications:
1. Image segmentation
2. Gene expression analysis
3. Customer segmentation
4. Network analysis
Challenges:
Association Rule Mining
Association rule mining discovers relationships between items that frequently occur
together in data. Common algorithms and techniques include:
1. Apriori Algorithm
2. Eclat Algorithm
3. FP-Growth Algorithm
4. Association Rule Mining (ARM)
5. Correlation Analysis (e.g., Pearson, Spearman)
Key Concepts:
Applications:
Benefits:
Challenges:
Real-World Examples:
5.4.1 Common terms for association rules (pattern, itemset, support, count)
Additional Terms:
Understanding these terms will help you work effectively with association rule mining
algorithms and interpret results.
Example: {Milk, Bread} → {Butter} (Support: 40%, Confidence: 80%, Lift: 1.6)
Interpretation: 40% of transactions contain Milk, Bread, and Butter together. 80% of
transactions with Milk and Bread also contain Butter. Since Butter alone appears in
50% of transactions, Butter is 1.6 times more likely to be purchased when Milk and
Bread are bought together.
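These three metrics are easy to recompute by hand. A sketch on a made-up ten-transaction basket, chosen so the numbers reproduce the example above:

# Support, confidence, and lift for the rule {Milk, Bread} -> {Butter}.
transactions = [
    {"Milk", "Bread", "Butter"}, {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Butter"}, {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Butter"},
    {"Eggs"}, {"Eggs"}, {"Jam"}, {"Tea"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / n

antecedent, consequent = {"Milk", "Bread"}, {"Butter"}
sup = support(antecedent | consequent)  # P(A and C)           -> 0.40
conf = sup / support(antecedent)        # P(C | A)             -> 0.80
lift = conf / support(consequent)       # confidence vs. P(C)  -> 1.60
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")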
Types of Association Rules:
Evaluation Metrics:
1. Support
2. Confidence
3. Lift
4. Conviction
5. Interest Factor
Algorithms:
1. Apriori
2. Eclat
3. FP-Growth
4. ARM (Association Rule Mining)
Applications:
Benefits:
Challenges:
Apriori Algorithm
Overview:
The Apriori algorithm is a popular association rule mining technique used to discover
frequent itemsets and generate association rules. It is simple and widely used, though
it can become expensive on large datasets (see 5.4.4).
Key Steps:
1. Data Preparation: Transform data into transactions (e.g., customer purchases).
2. Itemset Generation: Find frequent itemsets (1-item, 2-item, ...).
3. Rule Generation: Derive association rules from frequent itemsets.
4. Rule Filtering: Filter rules based on support, confidence, and lift.
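These steps map directly onto the third-party mlxtend package (pip install mlxtend). A sketch with made-up transactions and illustrative thresholds:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["milk", "bread", "butter"],
                ["milk", "bread"],
                ["bread", "butter"],
                ["milk", "butter"],
                ["milk", "bread", "butter"]]

# Step 1: one-hot encode the transactions.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Steps 2-3: find frequent itemsets, then derive rules from them.
itemsets = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)

# Step 4: filter, e.g., keep only rules with lift above 1.
print(rules[rules["lift"] > 1.0][["antecedents", "consequents",
                                  "support", "confidence", "lift"]])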
5.4.4 Strengths and Weaknesses of Apriori algorithm
Strengths:
1. Simple to understand and implement.
2. Prunes the search space using the Apriori (downward-closure) property: every
subset of a frequent itemset must itself be frequent.
3. Produces easily interpretable rules.
Weaknesses:
1. Generates many redundant rules: Apriori produces multiple rules with similar
antecedents and consequents.
2. Requires careful parameter tuning: minimum support and confidence thresholds
need careful adjustment.
3. Not suitable for very large itemsets: performance degrades as itemsets grow,
since candidate generation explodes combinatorially.
4. Sensitive to the minimum support threshold: setting it too low floods the output
with rules, while setting it too high misses rare but interesting patterns.
5. Does not handle sequential patterns: Apriori captures co-occurrence, not order.
6. Limited handling of categorical variables: Apriori assumes binary (present/absent)
items.
Business Applications:
Scientific Applications:
Healthcare Applications:
Financial Applications:
Other Applications: