AIMLB PGP 2024 – Session 12

Artificial Intelligence and Machine Learning for Business (AIMLB)
Mukul Gupta (Information Systems Area)
What is customer segmentation?
• Grouping customers based on shared characteristics.
• This allows companies to refine their messaging, sales strategies, and products to target, advertise, and sell to those audiences more effectively.
Customer segmentation strategy
• STP approach: Segmentation, Targeting, and Positioning
Customer Segmentation Techniques
• Roughly, three categories:
  • Rule-based segmentation
    • Based on manually designed rules.
    • Segmentation is not portable to other analyses, so with a new goal, new knowledge, or new data, the whole rule system needs to be redesigned.
    • For example, you might categorize customers by the number of days since their first order (e.g., new vs. established customers).
  • Segmentation using binning
    • Binning data based on one or more features (see the sketch after this list).
    • This does not necessarily require domain knowledge, but the business goal must be clear.
  • Segmentation with zero knowledge
    • Common clustering algorithms can be applied.
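For instance, a minimal sketch of the first two approaches in Python, assuming a hypothetical customer table with columns days_since_first_order and annual_spend (the column names, thresholds, and bin counts are illustrative, not from the session):

```python
import pandas as pd

# Hypothetical customer data; values are illustrative only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "days_since_first_order": [12, 95, 400, 30, 800],
    "annual_spend": [120.0, 560.0, 2300.0, 90.0, 4100.0],
})

# Rule-based segmentation: manually designed thresholds on customer tenure.
def tenure_rule(days):
    if days <= 30:
        return "new"
    if days <= 365:
        return "active"
    return "established"

customers["tenure_segment"] = customers["days_since_first_order"].apply(tenure_rule)

# Binning-based segmentation: quartile bins on spend, no hand-written rules.
customers["spend_segment"] = pd.qcut(
    customers["annual_spend"], q=4, labels=["low", "mid", "high", "top"]
)

print(customers)
```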
What is Cluster Analysis?
• Given a set of objects, place them in groups such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups
• Intra-cluster distances are minimized; inter-cluster distances are maximized
Applications of Cluster Analysis
• Understanding
  • Group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations
  • Example: discovered clusters of stocks that moved together, labelled by industry group:
    • Technology1-DOWN: Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down, Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN
    • Technology2-DOWN: Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN
    • Financial-DOWN: Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN
    • Oil-UP: Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP
• Summarization
  • Reduce the size of large data sets (e.g., clustering precipitation data in Australia)
Notion of a Cluster can be Ambiguous
• How many clusters? The same set of points can reasonably be grouped into two, four, or six clusters.

Types of Clustering
• A clustering is a set of clusters
• Important distinction between hierarchical and partitional sets of clusters
  • Partitional Clustering: a division of data objects into non-overlapping subsets (clusters)
  • Hierarchical clustering: a set of nested clusters organized as a hierarchical tree
Partitional Clustering
(Figure: original points and one possible partitional clustering.)

Hierarchical Clustering
(Figure: nested clusters over points p1–p4 and the corresponding dendrogram.)

K-means Clustering
• Partitional clustering approach
• Number of clusters, 𝐾, must be specified
• Each cluster is associated with a centroid (center point)
• The center of a cluster is often a centroid, the average of all the points
in the cluster
• Each point is assigned to the cluster with the closest centroid
• The basic algorithm is very simple
Example of K-means Clustering
(Figure: cluster assignments and centroids over iterations 1–6; the centroids move at each step until the assignments stabilize.)
K-means Clustering – Details
• Simple iterative algorithm (a minimal sketch in code follows this slide):
  • Choose initial centroids;
  • repeat {assign each point to the nearest centroid; re-compute cluster centroids}
  • until centroids stop changing.
• Initial centroids are often chosen randomly.
  • Clusters produced can vary from one run to another
• K-means will converge for common proximity measures with an appropriately defined centroid.
• Most of the convergence happens in the first few iterations.
  • Often the stopping condition is changed to 'until relatively few points change clusters'
• Complexity is $O(n \cdot K \cdot I \cdot d)$
  • $n$ = number of points, $K$ = number of clusters, $I$ = number of iterations, $d$ = number of attributes
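A minimal NumPy sketch of this basic algorithm — random initial centroids, then alternating assignment and centroid re-computation until the centroids stop changing. It is illustrative only, not code from the session:

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Basic K-means: X is an (n, d) array, K the number of clusters."""
    rng = np.random.default_rng(seed)
    # Choose K distinct data points as the initial centroids (random initialization).
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-compute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped changing
        centroids = new_centroids
    return labels, centroids

# Tiny example: two well-separated blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, K=2)
print(centroids)
```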
K-means Objective Function
• A common objective function (used with the Euclidean distance measure) is the Sum of Squared Errors (SSE)
  • For each point, the error is the distance to the nearest cluster center
  • To get the SSE, we square these errors and sum them:

$$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$$

  • $x$ is a data point in cluster $C_i$ and $m_i$ is the centroid (mean) of cluster $C_i$
• SSE improves in each iteration of K-means until it reaches a local or global minimum.
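As a small illustration (a sketch, not from the session), the SSE can be computed directly from the points, the cluster labels, and the centroids; scikit-learn's KMeans reports the same quantity as `inertia_`:

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared distances of each point to its assigned centroid (SSE)."""
    return float(np.sum((X - centroids[labels]) ** 2))

# Tiny worked example: two 1-D clusters {1, 2} and {4, 5} with centroids 1.5 and 4.5.
X = np.array([[1.0], [2.0], [4.0], [5.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.5], [4.5]])
print(sse(X, labels, centroids))  # 1.0
```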
Two Different K-means Clusterings
(Figure: the same original points and two different K-means results — an optimal clustering and a sub-optimal clustering.)


Importance of Choosing Initial Centroids
(Figure: a different choice of initial centroids on the same data; the algorithm converges within five iterations, illustrating how the starting centroids determine which clustering is found.)
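In practice, a common mitigation (shown here as a sketch assuming scikit-learn; the data and parameters are illustrative) is to run K-means from many random initializations and keep the run with the lowest SSE, or to use the k-means++ seeding scheme; scikit-learn's `KMeans` supports both via `n_init` and `init`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data with three blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in [(0, 0), (5, 0), (0, 5)]])

# k-means++ seeding plus 10 restarts; the run with the lowest SSE (inertia_) is kept.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.inertia_)         # SSE of the best run
print(km.cluster_centers_)
```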
Limitations of K-means
• K-means has problems when clusters are of differing
  • Sizes
  • Densities
  • Non-globular shapes
• K-means has problems when the data contains outliers.
  • One possible solution is to remove outliers before clustering
Limitations of K-means: Differing Sizes
(Figure: original points vs. K-means with 3 clusters.)

Limitations of K-means: Differing Density
(Figure: original points vs. K-means with 3 clusters.)

Limitations of K-means: Non-globular Shapes
(Figure: original points vs. K-means with 2 clusters.)
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
  • A tree-like diagram that records the sequences of merges or splits
(Figure: a clustering of six points and the corresponding dendrogram; the height of each join records when the merge occurred.)
Strengths of Hierarchical Clustering
• Do not have to assume any particular number of clusters
  • Any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level
• They may correspond to meaningful taxonomies
  • Examples in the biological sciences (e.g., animal kingdom, phylogeny reconstruction, …)
Hierarchical Clustering
• Two main types of hierarchical clustering
  • Agglomerative:
    • Start with the points as individual clusters
    • At each step, merge the closest pair of clusters until only one cluster (or 𝑘 clusters) is left
  • Divisive:
    • Start with one, all-inclusive cluster
    • At each step, split a cluster until each cluster contains an individual point (or there are 𝑘 clusters)
• Traditional hierarchical algorithms use a similarity or distance matrix
  • Merge or split one cluster at a time
Agglomerative Clustering Algorithm
• Key Idea: Successively merge the closest clusters
• Basic algorithm (a sketch in code follows this slide):
  1. Compute the proximity matrix
  2. Let each data point be a cluster
  3. Repeat
  4.   Merge the two closest clusters
  5.   Update the proximity matrix
  6. Until only a single cluster remains
• Key operation is the computation of the proximity of two clusters
  • Different approaches to defining the distance between clusters distinguish the different algorithms
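A minimal sketch of this agglomerative procedure using SciPy; the session does not name a library, and the data, metric, and linkage choices below are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy 2-D points forming two obvious groups.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])

# Step 1: proximity (condensed distance matrix); steps 3-6: successive merges.
D = pdist(X, metric="euclidean")
Z = linkage(D, method="average")   # group-average linkage; 'single' or 'complete' are other options

# Cut the resulting tree to obtain a desired number of clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# scipy.cluster.hierarchy.dendrogram(Z) would draw the merge tree, given a plotting backend.
```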
Steps 1 and 2
• Start with clusters of individual points and a proximity matrix
(Figure: individual points p1–p12 and the initial proximity matrix indexed by p1, p2, p3, …)
Intermediate Situation
• After some merging steps, we have some clusters
(Figure: five clusters C1–C5 and the current proximity matrix indexed by C1, …, C5.)
Step 4
• We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
(Figure: the proximity matrix with the C2–C5 entry highlighted as the smallest inter-cluster distance.)
Step 5
• The question is "How do we update the proximity matrix?"
(Figure: after merging C2 and C5, the proximities of the new cluster C2 ∪ C5 to C1, C3, and C4 are unknown ("?") and must be recomputed.)
How to Define Inter-Cluster Similarity
(Figure, repeated across several slides: two clusters and their proximity matrix over p1–p5, highlighting which pairwise distances each definition below uses.)
• MIN (Single linkage)
• MAX (Complete linkage)
• Group Average
• Distance Between Centroids
• Other methods driven by an objective function
  • Ward's Method uses squared error
MIN, MAX, and Group Average
• MIN (single linkage)
  • Can handle non-elliptical shapes
  • Sensitive to noise and outliers
• MAX (complete linkage)
  • Less susceptible to noise and outliers
  • Tends to break large clusters and is biased towards globular clusters
• Group average
  • Compromise between MIN and MAX
  • Less susceptible to noise and outliers
  • Biased towards globular clusters
Hierarchical Clustering: Comparison
(Figure: the same six points clustered with MIN, MAX, Group Average, and Ward's Method; the choice of linkage changes which points are grouped together and the order of the merges.)
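To make the comparison concrete, a small sketch (assuming scikit-learn; the data and parameters are illustrative, not from the session) clusters the same points with single, complete, average, and Ward linkage:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
# Two elongated, noisy bands, a case where the linkage choice matters.
X = np.vstack([
    np.column_stack([np.linspace(0, 4, 60), rng.normal(0.0, 0.1, 60)]),
    np.column_stack([np.linspace(0, 4, 60), rng.normal(1.5, 0.1, 60)]),
])

for method in ["single", "complete", "average", "ward"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=method).fit_predict(X)
    # Count how many points land in each of the two clusters under this linkage.
    print(method, np.bincount(labels))
```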
Hierarchical Clustering: Time and Space Requirements
• $O(N^2)$ space, since it uses the proximity matrix
  • $N$ is the number of points
• $O(N^3)$ time in many cases
  • There are $N$ steps, and at each step the proximity matrix, of size $N^2$, must be updated and searched
  • Complexity can be reduced to $O(N^2 \log N)$ time with some cleverness
Hierarchical Clustering: Problems and Limitations
• Once a decision is made to combine two clusters, it cannot be undone
• No global objective function is directly minimized
• Different schemes have problems with one or more of the following:
  • Sensitivity to noise
  • Difficulty handling clusters of different sizes and non-globular shapes
  • Breaking large clusters
Measures of Cluster Validity: Cohesion and Separation
• Cluster Cohesion: measures how closely related the objects in a cluster are
  • Example: SSE
• Cluster Separation: measures how distinct or well-separated a cluster is from other clusters
  • Example: between-cluster sum of squares (SSB)
• Cohesion is measured by the within-cluster sum of squares (SSE):

$$SSE = \sum_{i} \sum_{x \in C_i} (x - m_i)^2$$

• Separation is measured by the between-cluster sum of squares (SSB):

$$SSB = \sum_{i} |C_i| \, (m - m_i)^2$$

  where $|C_i|$ is the size of cluster $i$, $m_i$ is its centroid, and $m$ is the overall mean of the data
Unsupervised Measures: Cohesion and Separation
• Example: SSE and SSB
  • SSB + SSE = constant
  • Consider four one-dimensional points 1, 2, 4, 5 with overall mean $m = 3$; splitting them into {1, 2} and {4, 5} gives centroids $m_1 = 1.5$ and $m_2 = 4.5$
• K = 1 cluster:
  $SSE = (1-3)^2 + (2-3)^2 + (4-3)^2 + (5-3)^2 = 10$
  $SSB = 4 \times (3-3)^2 = 0$
  $Total = 10 + 0 = 10$
• K = 2 clusters:
  $SSE = (1-1.5)^2 + (2-1.5)^2 + (4-4.5)^2 + (5-4.5)^2 = 1$
  $SSB = 2 \times (3-1.5)^2 + 2 \times (3-4.5)^2 = 9$
  $Total = 1 + 9 = 10$
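A few lines of NumPy (a sketch) reproduce this check that SSE + SSB stays constant for the four points 1, 2, 4, 5:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0])
m = x.mean()  # overall mean = 3

def sse_ssb(points, labels):
    """Within-cluster (SSE) and between-cluster (SSB) sums of squares."""
    sse = ssb = 0.0
    for k in np.unique(labels):
        cluster = points[labels == k]
        mk = cluster.mean()
        sse += np.sum((cluster - mk) ** 2)
        ssb += len(cluster) * (m - mk) ** 2
    return sse, ssb

print(sse_ssb(x, np.array([0, 0, 0, 0])))  # K=1: (10.0, 0.0)
print(sse_ssb(x, np.array([0, 0, 1, 1])))  # K=2: (1.0, 9.0) -> total is 10 in both cases
```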
Unsupervised Measures: Silhouette Coefficient
• The silhouette coefficient combines ideas of both cohesion and separation, but for individual points as well as for clusters and clusterings
• For an individual point $i$:
  • Calculate $a$ = average distance of $i$ to the points in its own cluster
  • Calculate $b$ = min (average distance of $i$ to the points in another cluster)
  • The silhouette coefficient for the point is then $s = (b - a) / \max(a, b)$
(Figure: point $i$ with the within-cluster distances used to calculate $a$ and the distances to the nearest other cluster used to calculate $b$.)
• The value can vary between -1 and 1
  • It typically ranges between 0 and 1; the closer to 1, the better
• Can calculate the average silhouette coefficient for a cluster or a clustering
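A small sketch (assuming scikit-learn; the synthetic data is illustrative) of the silhouette coefficient for a whole clustering and for individual points:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

rng = np.random.default_rng(0)
# Two compact, well-separated blobs.
X = np.vstack([rng.normal((0, 0), 0.4, (50, 2)), rng.normal((4, 4), 0.4, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))        # average silhouette over all points
print(silhouette_samples(X, labels)[:5])  # per-point values s = (b - a) / max(a, b)
```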
Determining the Number of Clusters
• SSE is good for comparing two clusterings or two clusters
• SSE can also be used to estimate the number of clusters
(Figure: a sample data set and a plot of SSE versus the number of clusters K for K = 2 to 30; the point where the SSE curve stops dropping sharply and flattens out suggests the natural number of clusters.)
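A sketch of this "elbow" heuristic (assuming scikit-learn and matplotlib; the synthetic data and the range of K are illustrative): run K-means for a range of K, record the SSE (`inertia_`), and look for the K where the curve flattens.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data with 5 well-separated blobs.
centers = rng.uniform(-10, 10, size=(5, 2))
X = np.vstack([rng.normal(c, 0.6, size=(100, 2)) for c in centers])

ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), sse, marker="o")
plt.xlabel("K")
plt.ylabel("SSE")
plt.title("Elbow method: SSE vs. number of clusters")
plt.show()
```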
Thank You
