
18CSE397T – Computational Data Analysis

Unit – 4: Session – 7: SLO – 1


Hierarchical Clustering

Introduction

It is crucial to understand customer behavior in any industry. I realized this last year
when my chief marketing officer asked me: "Can you tell me which existing
customers we should target for our new product?"

That was quite a learning curve for me. As a data scientist, I quickly realized how
important it is to segment customers so that my organization can build tailored,
targeted strategies. This is where the concept of clustering came in ever so handy!

Problems like customer segmentation are often deceptively tricky because we are not
working with any target variable. We are officially in the land of unsupervised
learning, where we need to figure out patterns and structures without a set outcome
in mind. It's both challenging and thrilling as a data scientist.

Now, there are a few different ways to perform clustering (as you'll see below). In this
article, I will introduce you to one such type: hierarchical clustering.

We will learn what hierarchical clustering is, its advantages over other clustering
algorithms, the different types of hierarchical clustering and the steps to perform it.
We will finally take up a customer segmentation dataset and implement
hierarchical clustering in Python. I love this technique and I'm sure you will too after
this article!

Supervised vs Unsupervised Learning

It's important to understand the difference between supervised and unsupervised
learning before we dive into hierarchical clustering. Let me explain this difference
using a simple example.

Suppose we want to estimate the number of bikes that will be rented in a city
every day. Or, let's say we want to predict whether a person on board the Titanic
survived or not.

We have a fixed target to achieve in both these examples:

• In the first example, we have to predict the count of bikes based on
features like the season, holiday, workingday, weather, temp, etc.
• In the second example, we are predicting whether a passenger survived or not.
In the 'Survived' variable, 0 represents that the person did not
survive and 1 means the person did make it out alive. The independent
variables here include Pclass, Sex, Age, Fare, etc.

So, when we are given a target variable (count and Survived in the above two
cases) which we have to predict based on a given set of predictors or
independent variables (season, holiday, Sex, Age, etc.), such problems are
called supervised learning problems.
Let’s look at the figure below to understand this visually:

Here, y is our dependent or target variable, and X represents the independent
variables. The target variable depends on X and hence it is also called the
dependent variable. We train our model using the independent variables under
the supervision of the target variable, and hence the name supervised
learning.

Our aim, when training the model, is to generate a function that maps the
independent variables to the desired target. Once the model is trained, we can
pass new sets of observations and the model will predict the target for them.
This, in a nutshell, is supervised learning.
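
To make this concrete, here is a minimal sketch in Python of what a supervised
learning setup looks like. The tiny hand-made table, the numeric encoding of Sex,
and the choice of a logistic regression model are all illustrative assumptions for
this sketch, not the dataset or model used later in the article.

# A minimal supervised learning sketch: learn a mapping from X to y.
# The data below is invented purely for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 2],
    "Sex":      [0, 1, 0, 1, 1, 0],            # 0 = female, 1 = male (already encoded)
    "Age":      [29, 22, 35, 28, 40, 19],
    "Fare":     [80.0, 7.9, 26.0, 8.1, 120.0, 13.0],
    "Survived": [1, 0, 1, 0, 1, 1],             # the target variable y
})

X = data[["Pclass", "Sex", "Age", "Fare"]]      # independent variables
y = data["Survived"]                            # dependent / target variable

model = LogisticRegression(max_iter=1000).fit(X, y)   # train under the supervision of y

# Once trained, the model predicts the target for new observations
new_passenger = pd.DataFrame([[2, 0, 30, 20.0]],
                             columns=["Pclass", "Sex", "Age", "Fare"])
print(model.predict(new_passenger))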

There might be situations when we do not have any target variable to predict.
Such problems, without any explicit target variable, are known as unsupervised
learning problems. We only have the independent variables and no
target/dependent variable in these problems.
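
For contrast, here is a minimal sketch of an unsupervised setup: the same kind of
feature matrix, but with no target column at all, so the algorithm has to discover
the groups on its own. The toy income/spending numbers and the use of SciPy's
linkage and fcluster functions are assumptions made for this sketch; the article's
full hierarchical clustering implementation on a real customer segmentation dataset
comes later.

# A minimal unsupervised learning sketch: no target variable, only features.
# Hierarchical (agglomerative) clustering groups the observations on its own.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Unlabelled observations, e.g. [annual_income, spending_score] (made-up values)
X = np.array([
    [15, 39], [16, 81], [17, 6],
    [88, 77], [90, 80], [93, 14],
])

Z = linkage(X, method="ward")                     # build the hierarchy (the dendrogram tree)
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)                                     # cluster assignment for each observation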
