Unit 4
Clustering
Clustering or cluster analysis is the task of grouping a set of objects in such a way
that objects in the same group (called a cluster) are more similar (in some sense)
to each other than to those in other groups (clusters). Clustering is a main task of
exploratory data mining and is used in many fields, including machine learning,
pattern recognition, image analysis, information retrieval, bioinformatics, data
compression, and computer graphics. It can be achieved by various algorithms that
differ significantly in their notion of what constitutes a cluster and how to efficiently
find them. Popular notions of clusters include groups with small distances between
cluster members, dense areas of the data space, etc.
Examples of data with natural clusters
In many applications, there will naturally be several groups or clusters in samples.
1. Consider the case of optical character recognition: there are two ways of writing
the digit 7; the American style is a plain '7', whereas the European style has a
horizontal bar through the middle (a crossed seven). In such a case, when the sample
contains examples from both continents, it will contain two clusters or
groups, one corresponding to the American 7 and the other corresponding to the
European crossed 7.
2. In speech recognition, where the same word can be uttered in different ways
due to differences in pronunciation, accent, gender, age, and so forth, there is not a
single, universal prototype. In a large sample of utterances of a specific word, all
the different ways of saying it should be represented.
k-means clustering
Outline
The k-means clustering algorithm is one of the simplest unsupervised learning
algorithms for solving the clustering problem. Let it be required to classify a given
data set into a certain number of clusters, say, k clusters. We start by choosing k
points arbitrarily as the “centres” of the clusters, one for each cluster. We then
associate each of the given data points with the nearest centre. We now take the
averages of the data points associated with a centre and replace the centre with
the average, and this is done for each of the centres. We repeat the process until
the centres converge to some fixed points. The data points nearest to the centres
form the various clusters in the dataset. Each cluster is represented by its
associated centre.
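As a minimal sketch of the procedure just described (written in Python with NumPy; the data array X, the number of clusters k, and the function name k_means are placeholders, not from the text), the iteration might look like this:

import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    # X is an (n_samples, n_features) array of data points
    rng = np.random.default_rng(seed)
    # choose k points arbitrarily as the initial centres
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # associate each data point with the nearest centre
        distances = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # replace each centre with the average of the points assigned to it
        new_centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # stop when the centres converge to fixed points
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels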
Example
We illustrate the algorithm in the case where there are only two variables so that
the data points and cluster centres can be geometrically represented by points in a
coordinate plane. The distance between the points (x1, x2) and (y1, y2) will be
calculated using the familiar distance formula of elementary analytical geometry:

d((x1, x2), (y1, y2)) = √((x1 − y1)² + (x2 − y2)²)
Problem
Use the k-means clustering algorithm to divide the following data into two clusters and
also compute the representative data points for the clusters.
Solution
We choose initial cluster centres, compute the distances of the given data points from the
cluster centres, assign each point to its nearest centre, and update the centres until they converge.
How to choose the value of K (the number of clusters) in K-means clustering?
The performance of the K-means algorithm depends on the quality of the clusters it forms,
and choosing the optimal number of clusters is a challenging task. There are
several different ways to find the optimal number of clusters, but here we discuss the most
appropriate method to find the number of clusters, or the value of K. The method is
given below:
Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters.
This method uses the concept of WCSS value. WCSS stands for Within Cluster Sum of
Squares, which measures the total variation within the clusters. The formula to calculate the
value of WCSS (for 3 clusters) is given below:

WCSS = ∑Pi in Cluster1 distance(Pi, C1)² + ∑Pi in Cluster2 distance(Pi, C2)² + ∑Pi in Cluster3 distance(Pi, C3)²

Here ∑Pi in Cluster1 distance(Pi, C1)² is the sum of the squared distances between each data
point Pi in cluster 1 and its centroid C1, and the same holds for the other two terms.
To measure the distance between data points and centroid, we can use any method such
as Euclidean distance or Manhattan distance.
To find the optimal value of clusters, the elbow method follows the below steps:
o It executes the K-means clustering on a given dataset for different K values (ranges from
1-10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and the number of clusters K.
o The point where the plot bends sharply (so that the curve looks like an arm with an elbow)
is considered the best value of K.
Since the graph shows a sharp bend that looks like an elbow, the method is known as
the elbow method. The plot of WCSS against the number of clusters typically looks like
an arm bent at the elbow.
Before implementation, let's understand what type of problem we will solve here.
We have a Mall_Customers dataset, which contains data about customers who
visit the mall and spend money there.
The dataset has the columns Customer_Id, Gender, Age, Annual Income ($), and
Spending Score (a calculated value of how much a customer has spent in the mall:
the higher the value, the more the customer has spent). From this dataset we need
to discover patterns; since this is an unsupervised method, we don't know in advance
exactly what to look for.
o Data Pre-processing
o Finding the optimal number of clusters using the elbow method
o Training the K-means algorithm on the training dataset
o Visualizing the clusters
The first step will be the data pre-processing, as we did in our earlier topics of
Regression and Classification. But for the clustering problem, it will be different
from other models. Let's discuss it:
o Importing Libraries
As we did in previous topics, firstly, we will import the libraries for our
model, which is part of data pre-processing. The code is given below:
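A sketch of this step (the file name Mall_Customers.csv and the aliases nm and mtp are assumptions consistent with the rest of the section) might be:

# importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset (assumed to be saved as Mall_Customers.csv)
dataset = pd.read_csv('Mall_Customers.csv')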
By executing the above lines of code, we will get our dataset loaded in the Spyder IDE,
where we can inspect its rows and columns.
Here we don't need any dependent variable for the data pre-processing step, as this is a
clustering problem and we have no label to predict. So we will just add a
line of code for the matrix of features.
x = dataset.iloc[:, [3, 4]].values
As we can see, we are extracting only the 3rd and 4th columns. This is because we need a 2D plot
to visualize the model, and some features, such as customer_id, are not required.
As we know, the elbow method uses the WCSS concept to draw the plot by plotting WCSS
values on the Y-axis and the number of clusters on the X-axis. So we are going to calculate
the value for WCSS for different k values ranging from 1 to 10. Below is the code for it:
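A sketch of this step, consistent with the description that follows (the variable names wcss_list and x match the surrounding text; the k-means++ initializer and the random_state value are assumptions), might be:

# finding the optimal number of clusters using the elbow method
from sklearn.cluster import KMeans

wcss_list = []  # empty list to hold the WCSS value for each k

for i in range(1, 11):  # range(1, 11) so that k = 10 is included
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model

mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('wcss_list')
mtp.show()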
As we can see in the above code, we have used the KMeans class of the sklearn.cluster library
to form the clusters.
Next, we have created the wcss_list variable, initialized as an empty list, which is used to
hold the WCSS value computed for each value of k from 1 to 10.
After that, we have written a for loop that iterates over values of k from 1 to 10; since
Python's range excludes the upper bound, it is written as range(1, 11) so that the 10th value
is included.
The rest part of the code is similar as we did in earlier topics, as we have fitted the model
on a matrix of features and then plotted the graph between the number of clusters and
WCSS.
Output: After executing the above code, we get the elbow plot of WCSS against the number
of clusters. From this plot, we can see that the elbow point is at k = 5, so the number of
clusters here will be 5.
Step-3: Training the K-means algorithm on the training
dataset
As we have got the number of clusters, so we can now train the model on the dataset.
To train the model, we will use the same two lines of code as we have used in the above
section, but here instead of using i, we will use 5, as we know there are 5 clusters that
need to be formed. The code is given below:
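A sketch of these two lines (parameter choices other than n_clusters=5 are assumptions) might be:

# training the K-means model on the dataset with 5 clusters
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)  # cluster label (0-4) for each data point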
The first line is the same as above: it creates the object of the KMeans class.
In the second line of code, we have created the variable y_predict, which trains the
model and holds the cluster assigned to each data point.
By executing the above lines of code, we will get the y_predict variable. We can check it
under the variable explorer option in the Spyder IDE. We can now compare the values
of y_predict with our original dataset. Consider the below image:
From the above image, we can see that CustomerID 1 belongs to cluster 3 (as the index
starts from 0, the value 2 corresponds to the 3rd cluster), CustomerID 2 belongs to cluster 4, and
so on.
To visualize the clusters, we will draw a scatter plot using the mtp.scatter() function of matplotlib. The code is given below:
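A sketch of the visualization code described below (the colours and labels are arbitrary choices) might be:

# visualizing the clusters
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='blue', label='Cluster 1')
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s=100, c='magenta', label='Cluster 5')
# plotting the cluster centroids
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroids')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income ($)')
mtp.ylabel('Spending Score')
mtp.legend()
mtp.show()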
In the above lines of code, we have written one scatter call for each of the clusters, ranging from 1 to 5. The
first argument of mtp.scatter, i.e., x[y_predict == 0, 0], selects the x-values of the points in the
matrix of features whose predicted cluster label is 0; the labels in y_predict range from 0 to 4.
Output:
The output image clearly shows five different clusters in different colors. The
clusters are formed between two features of the dataset: the annual income of the customer
and the spending score. We can change the colors and labels as per requirement or choice. We
can also observe some points from the above patterns, which are given below:
o Cluster1 shows the customers with average salary and average spending, so we can
categorize these customers as standard.
o Cluster2 shows the customers with high income but low spending, so we can categorize
them as careful.
o Cluster3 shows the customers with low income and low spending, so they can be categorized as
sensible.
o Cluster4 shows the customers with low income but very high spending, so they can be
categorized as careless.
o Cluster5 shows the customers with high income and high spending so they can be
categorized as target, and these customers can be the most profitable customers for the
mall owner.
Hierarchical Clustering
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-
shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar,
but the two differ in how they work. In particular, there is no need to
predetermine the number of clusters as we did in the K-means algorithm.
o Step-1: Treat each data point as a single cluster, so that initially there are N clusters.
o Step-2: Take the two closest data points or clusters and merge them to form one cluster. So,
there will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form one cluster.
There will be N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left.
o Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram
to divide the clusters as per the problem.
There are several ways to measure the distance between two clusters, known as linkage methods. The most popular ones are:
1. Single Linkage: It is the shortest distance between the closest points of the two clusters.
2. Complete Linkage: It is the farthest distance between the two points of two different
clusters. It is one of the popular linkage methods as it forms tighter clusters than single-
linkage.
3. Average Linkage: It is the linkage method in which the distance between each pair of
points (one from each cluster) is added up and then divided by the total number of pairs to
obtain the average distance between the two clusters. It is also one of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which the distance between the centroids of
the two clusters is calculated.
From the above-given approaches, we can apply any of them according to the type of
problem or business requirement.
The working of the dendrogram can be explained using the below diagram:
In the above diagram, the left part is showing how clusters are created in agglomerative
clustering, and the right part is showing the corresponding dendrogram.
o As we have discussed above, first the data points P2 and P3 combine together and form
a cluster; correspondingly, a dendrogram is created, which connects P2 and P3 with a
rectangular shape. The height is decided according to the Euclidean distance between the
data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created.
It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little
greater than that between P2 and P3.
o Again, two new dendrograms are created that combine P1, P2, and P3 in one dendrogram,
and P4, P5, and P6, in another dendrogram.
o At last, the final dendrogram is created that combines all the data points together.
We can cut the dendrogram tree structure at any level as per our requirement.
1. Data Pre-processing
2. Finding the optimal number of clusters using the Dendrogram
3. Training the hierarchical clustering model
4. Visualizing the clusters
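A sketch of the data pre-processing step (same assumptions as in the k-means section: the file Mall_Customers.csv and the aliases nm and mtp) might be:

# importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset (assumed to be saved as Mall_Customers.csv)
dataset = pd.read_csv('Mall_Customers.csv')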
The above lines of code import the libraries needed for specific tasks: numpy for
mathematical operations, matplotlib for drawing graphs or scatter
plots, and pandas for importing the dataset.
Here we will extract only the matrix of features as we don't have any further information
about the dependent variable. Code is given below:
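A sketch of this line, matching the k-means section, is:

# extracting the matrix of features: columns 3 and 4 (Annual Income and Spending Score)
x = dataset.iloc[:, [3, 4]].values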
Here we have extracted only columns 3 and 4, as we will use a 2D plot to see the clusters.
So, we are considering the Annual Income and Spending Score as the matrix of features.
Step-2: Finding the optimal number of clusters using the
Dendrogram
Now we will find the optimal number of clusters using the Dendrogram for our model.
For this, we are going to use scipy library as it provides a function that will directly return
the dendrogram for our code. Consider the below lines of code:
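A sketch of these lines (the plot title and axis labels are assumptions) might be:

# finding the optimal number of clusters using the dendrogram
import scipy.cluster.hierarchy as shc

dendro = shc.dendrogram(shc.linkage(x, method='ward'))
mtp.title('Dendrogram Plot')
mtp.ylabel('Euclidean Distances')
mtp.xlabel('Customers')
mtp.show()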
In the above lines of code, we have imported the hierarchy module of the scipy library. This
module provides us with the method shc.dendrogram(), which takes the output of linkage() as a
parameter. The linkage function defines the distance between two clusters, so
here we have passed x (the matrix of features) and the method "ward", a popular linkage
method in hierarchical clustering.
The remaining lines of code are to describe the labels for the dendrogram plot.
Output:
By executing the above lines of code, we will get the below output:
Using this Dendrogram, we will now determine the optimal number of clusters for our
model. For this, we will find the maximum vertical distance that does not cut any
horizontal bar. Consider the below diagram:
In the above diagram, we have shown the vertical distances that do not cut any
horizontal bar. As we can see, the 4th distance appears to be the largest, so according
to this, the number of clusters will be 5 (the number of vertical lines crossed in this range). We could also take
the 2nd largest distance, as it is approximately equal to the 4th, but we will choose 5
clusters because that is the same number we obtained with the K-means algorithm.
So, the optimal number of clusters will be 5, and we will train the model in the next
step, using the same.
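Step-3: Training the hierarchical clustering model on the dataset
For this we can use the AgglomerativeClustering class of the sklearn.cluster library. A sketch of the training code described below might be (note that older scikit-learn versions expose the affinity parameter mentioned later, while newer versions call it metric; euclidean is the default either way):

# training the hierarchical (agglomerative) clustering model on the dataset
from sklearn.cluster import AgglomerativeClustering

# with ward linkage the euclidean distance is used by default
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred = hc.fit_predict(x)  # cluster label (0-4) for each data point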
Then we have created the object of this class named as hc. The AgglomerativeClustering
class takes the following parameters:
o n_clusters=5: It defines the number of clusters, and we have taken here 5 because it is the
optimal number of clusters.
o affinity='euclidean': It is the metric used to compute the linkage (in newer scikit-learn versions this parameter is named metric).
o linkage='ward': It defines the linkage criteria, here we have used the "ward" linkage. This
method is the popular linkage method that we have already used for creating the
Dendrogram. It reduces the variance in each cluster.
In the last line, we have created the variable y_pred by calling fit_predict on the model. This
not only trains the model but also returns the cluster to which each data point
belongs.
After executing the above lines of code, if we go to the variable explorer option in
the Spyder IDE, we can check the y_pred variable and compare it with the original
dataset.
As we can see, y_pred shows the cluster values: customer id 1 belongs to the 5th cluster
(as indexing starts from 0, a value of 4 means the 5th cluster), customer id 2 belongs to
the 4th cluster, and so on.
Here we will use the same lines of code as we did in k-means clustering, except one
change. Here we will not plot the centroid that we did in k-means, because here we have
used dendrogram to determine the optimal number of clusters. The code is given below:
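A sketch of the visualization code (same arbitrary colour choices as before, but without the centroid plot) might be:

# visualizing the clusters
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s=100, c='blue', label='Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s=100, c='magenta', label='Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income ($)')
mtp.ylabel('Spending Score')
mtp.legend()
mtp.show()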
Output: By executing the above lines of code, we get a scatter plot showing the five clusters.
Association Rule Learning
Association rule learning is one of the very important concepts of machine learning,
and it is employed in market basket analysis, web usage mining, continuous
production, etc. Here, market basket analysis is a technique used by various big
retailers to discover the associations between items. We can understand it with the
example of a supermarket, where all products that are frequently purchased together
are placed together.
For example, if a customer buys bread, he is likely to also buy butter, eggs, or milk,
so these products are stored on the same shelf or nearby.
Association rule learning can be divided into three types of algorithms:
1. Apriori
2. Eclat
3. F-P Growth Algorithm
Association rules are evaluated with three measures:
o Support
o Confidence
o Lift
Support
Support is the frequency of an item or itemset, i.e., how frequently it appears in the dataset. It is
defined as the fraction of the transactions that contain the itemset X. For an itemset X and a
total of T transactions, it can be written as:

Support(X) = (Number of transactions containing X) / T
Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often the
items X and Y occur together in the dataset given that X occurs. It is the ratio of the
number of transactions that contain both X and Y to the number of transactions that contain X:

Confidence(X → Y) = (Number of transactions containing X and Y) / (Number of transactions containing X)
Lift
It is the strength of a rule, defined by the formula below:

Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y)) = Confidence(X → Y) / Support(Y)

It is the ratio of the observed support to the expected support if X and Y were
independent of each other. It has three possible ranges of values:
o Lift = 1: X and Y are independent; there is no association between them.
o Lift > 1: X and Y are positively correlated, i.e., they occur together more often than expected.
o Lift < 1: X and Y are negatively correlated, i.e., one tends to substitute for the other.
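To make the three measures concrete, here is a small hand-worked example in plain Python (the transactions are made up for illustration):

# five hypothetical supermarket transactions
transactions = [
    {'bread', 'butter', 'milk'},
    {'bread', 'butter'},
    {'bread', 'eggs'},
    {'butter', 'milk'},
    {'bread', 'butter', 'eggs'},
]
T = len(transactions)

def support(itemset):
    # fraction of the transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / T

# rule: bread -> butter
sup_bread = support({'bread'})            # 4/5 = 0.8
sup_both = support({'bread', 'butter'})   # 3/5 = 0.6
confidence = sup_both / sup_bread         # 0.75
lift = confidence / support({'butter'})   # 0.75 / 0.8 = 0.9375 (< 1: slightly negative association)
print(sup_both, confidence, lift)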
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work
on databases that contain transactions. The algorithm uses a breadth-first search and a
Hash Tree to count the itemsets efficiently.
It is mainly used for market basket analysis and helps to understand the products that can
be bought together. It can also be used in the healthcare field to find drug reactions for
patients.
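As a usage sketch (assuming the third-party mlxtend library is installed; exact signatures can vary slightly between versions), the Apriori algorithm could be run on a toy transaction list like this:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [['bread', 'butter', 'milk'],
                ['bread', 'butter'],
                ['bread', 'eggs'],
                ['butter', 'milk'],
                ['bread', 'butter', 'eggs']]

# one-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# mine frequent itemsets breadth-first, then derive association rules from them
frequent_itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.6)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])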
Eclat Algorithm
The Eclat algorithm stands for Equivalence Class Transformation. It uses a
depth-first search technique to find frequent itemsets in a transaction database and
generally executes faster than the Apriori algorithm.
o Market Basket Analysis: It is one of the popular examples and applications of association
rule mining. This technique is commonly used by big retailers to determine the association
between items.
o Medical Diagnosis: With the help of association rules, patients can be treated more easily, as
they help in identifying the probability of illness for a particular disease.
o Protein Sequence: The association rules help in determining the synthesis of artificial
Proteins.
o It is also used for the Catalog Design and Loss-leader Analysis and many more other
applications.
Dimensionality Reduction
In many cases a dataset contains a huge number of input features, which makes the
predictive modeling task more complicated. Because it is very difficult to visualize or make
predictions for a training dataset with a large number of features, dimensionality
reduction techniques are required in such cases.
A dimensionality reduction technique can be defined as "a way of converting a
higher-dimensional dataset into a lower-dimensional dataset while ensuring that it provides
similar information." These techniques are widely used in machine learning
to obtain a better-fitting predictive model while solving classification and regression problems.
It is commonly used in the fields that deal with high-dimensional data, such as speech
recognition, signal processing, bioinformatics, etc. It can also be used for data
visualization, noise reduction, cluster analysis, etc.
The Curse of Dimensionality
Handling high-dimensional data is very difficult in practice; this difficulty is commonly known as
the curse of dimensionality. As the dimensionality of the input dataset increases, any
machine learning algorithm and model becomes more complex. As the number of
features increases, the number of samples needed to cover the feature space grows rapidly, and the
chance of overfitting increases. A machine learning model trained on high-
dimensional data can easily become overfitted and perform poorly.
Hence, it is often required to reduce the number of features, which can be done with
dimensionality reduction.
Some benefits of applying dimensionality reduction are given below:
o By reducing the dimensions of the features, the space required to store the dataset
also gets reduced.
o Less computation and training time is required for reduced dimensions of features.
o Reduced dimensions of features of the dataset help in visualizing the data quickly.
o It removes the redundant features (if present) by taking care of multicollinearity.
Principal Component Analysis (PCA)
Principal Component Analysis is a statistical process that converts the observations of correlated features into a set of linearly
uncorrelated features with the help of an orthogonal transformation. These new transformed features are
called the Principal Components. It is one of the popular tools used for exploratory data
analysis and predictive modeling. It is a technique for drawing strong patterns from a given dataset by
projecting the data onto the directions of greatest variance.
PCA generally tries to find the lower-dimensional surface to project the high-dimensional
data.
PCA works by considering the variance of each attribute, because an attribute with high
variance tends to separate the data well, and PCA uses this to reduce the dimensionality. Some real-
world applications of PCA are image processing, movie recommendation systems, and
optimizing power allocation in various communication channels. It is a feature
extraction technique, so it keeps the important variables and drops the least important
ones.
o The principal component must be the linear combination of the original features.
o These components are orthogonal, i.e., the correlation between a pair of variables is zero.
o The importance of each component decreases when going from 1 to n: the 1st PC has
the most importance, and the nth PC has the least importance.
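A minimal PCA sketch with scikit-learn (the iris data is used purely as a stand-in dataset) might look like this:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                        # 4 correlated input features
X_std = StandardScaler().fit_transform(X)   # standardize before applying PCA

pca = PCA(n_components=2)                   # keep only the 2 most important components
X_reduced = pca.fit_transform(X_std)

print(pca.explained_variance_ratio_)        # importance decreases from PC1 to PC2
print(X_reduced.shape)                      # (150, 2): 4 dimensions reduced to 2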
Ensemble learning
Ensemble methods combine several decision trees to deliver better predictive results
than using a single decision tree. The primary principle behind an ensemble
model is that a group of weak learners come together to form a strong learner.
Two techniques used to build ensembles of decision trees are given below.
Bagging
Bagging is used when our objective is to reduce the variance of a decision tree. The idea
is to create several subsets of data from the training sample, chosen
randomly with replacement. Each subset of data is then used to train its own
decision tree, so we end up with an ensemble of different models. The average of the
predictions from the numerous trees is used, which is more robust than a single decision
tree.
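A short bagging sketch with scikit-learn (synthetic data; all parameter values are illustrative) might be:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# each of the 50 trees (the default base learner is a decision tree) is trained
# on a bootstrap sample drawn with replacement from the training data
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))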
Random Forest
Random Forest is an extension of bagging. It takes one additional step: in addition to taking a
random subset of the data, it also takes a random selection of features rather than using
all features to grow the trees. When we have numerous random trees, the result is called a
Random Forest.
These are the following steps which are taken to implement a Random forest:
o Let us consider X observations and Y features in the training data set. First, a sample from the
training data set is taken randomly with replacement.
o A tree is grown to its largest extent.
o The above steps are repeated, and the final prediction is based on the collection of
predictions from the n trees.
Since the final prediction depends on the mean of the predictions from the individual trees,
the regression model returns an averaged rather than an exact value.
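A random forest sketch with scikit-learn (synthetic regression data; parameter values are illustrative) might be:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# each tree sees a bootstrap sample and a random subset of features at every split
forest = RandomForestRegressor(n_estimators=100, max_features='sqrt', random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # the forest's prediction is the mean of the trees' predictions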
Boosting:
Boosting is another ensemble procedure for building a collection of predictors. In other
words, we fit consecutive trees, usually on random samples, and at each step the objective
is to reduce the net error from the prior trees.
If a given input is misclassified by a hypothesis, its weight is increased so that the next
hypothesis is more likely to classify it correctly; by combining the whole set at the end,
weak learners are converted into better-performing models.
Gradient boosting utilizes a gradient descent algorithm that can optimize any differentiable loss function.
An ensemble of trees is built one at a time, and the individual trees are summed
sequentially. Each next tree tries to recover the loss (the difference between actual and
predicted values) left by the previous ones.
Algorithm:
1. Initialise the dataset and assign an equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data
points.
3. Increase the weights of the wrongly classified data points and decrease the
weights of the correctly classified data points, then normalize the weights of
all data points.
4. if (got required results)
Goto step 5
else
Goto step 2
5. End
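The reweighting scheme above corresponds to AdaBoost; a sketch of it with scikit-learn (synthetic data, illustrative parameters) might be:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# each new weak learner (a shallow decision tree by default) concentrates on the
# points that the previous learners misclassified, via increased sample weights
boost = AdaBoostClassifier(n_estimators=100, random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))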
Meta-learning
Meta-learning can help machine learning algorithms deal with such challenges
by optimizing and finding learning algorithms that perform better.
Working of Meta-learning
In general, a meta-learning algorithm is trained using the outputs (i.e., model
predictions) and metadata of other machine learning algorithms. After training is
done, the resulting model is tested and used to make the final predictions.
Meta-learning includes tasks such as learning to learn new tasks from only a few examples.
For example, we may want to train a machine learning model to label different
breeds of dogs when only a few labelled images of each breed are available.
Recommendation Systems
A recommendation system ranks items according to their relevance and recommends the most
relevant ones to the user. The system must assess this relevance, which is primarily based
on past data. For example, if a user listens to rock music every day, his YouTube
recommendation feed will fill up with rock music and music of related genres.
Collaborative filtering
In collaborative filtering, all past data about user interactions with target items is fed
into the system. This information is usually recorded as a
matrix, with the rows representing users and the columns representing items.
The basic premise of such systems is that the users' previous data should be
sufficient to generate a prediction. That is, we don't require anything other than
historical data, no more user input, no current trending data, and so on.
• Memory Based
Memory-based methods are the most basic because they use no model at
all. They assume that predictions can be made based solely on the "memory"
of past data, and they typically use a simple distance measure,
such as nearest neighbours (see the sketch after this list).
• Model Based
Model-based methods, in contrast, learn a model (for example, a matrix
factorization of the user-item matrix) from the recorded interactions and use
it to predict the missing ratings.
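The toy sketch referred to above illustrates the memory-based idea: a small made-up user-item rating matrix, cosine similarity between users, and a weighted-average prediction.

import numpy as np

# rows = users, columns = items; 0 means "not rated yet" (all values are made up)
ratings = np.array([[5, 4, 0, 1],
                    [4, 5, 1, 0],
                    [1, 0, 5, 4],
                    [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target, item = 0, 2            # predict user 0's rating for item 2
sims = np.array([cosine(ratings[target], ratings[other]) for other in range(len(ratings))])
sims[target] = 0.0             # ignore the user's similarity with themselves

# weighted average of the ratings given to the item by the other ("neighbour") users
rated = ratings[:, item] > 0
prediction = (sims[rated] @ ratings[rated, item]) / sims[rated].sum()
print(round(prediction, 2))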
Content-based Recommendation System
Now let us jump to the main course of our discussion, which is the second
category of recommender system, i.e., the content-based recommendation
system.
Here, the system uses your features and likes in order to recommend
things that you might like. It uses the information you provide over the
internet, together with the information it is able to gather, and then curates
recommendations accordingly.
The goal behind content-based filtering is to classify products with specific
keywords, learn what the customer likes, look up those terms in the database,
and then recommend similar things.
Example
Suppose I am a fan of the Harry Potter series and watch only such kinds of
movies on the internet. When my data is gathered from Google or
Wikipedia, it will show that I am a fan of fantasy movies. Therefore, my
recommendation will be filled with fantasy movies. Among all the movies, the
ones best for me will be curated and then recommended to me.
Suppose there are two movies, Fantastic Beasts and The Shawshank
Redemption; then, according to my preference for fantasy movies,
Fantastic Beasts will be recommended to me.
Let us suppose you read a crime thriller by Agatha Christie and
review it on the internet. You also review one more fictional book, of the
comedy genre, rating the crime thriller as good and the
comedy one as bad.
With this information, the next book recommendation you get will most
probably be of the crime thriller genre, as that is the highest-rated genre
for you.
For this ranking system, a user vector is created which encodes the
information provided by you. After this, an item vector is created in which
each book is scored according to its genres.
Using these vectors, every book is assigned a value by taking
the dot product of the user and item vectors, and this value is
then used for recommendation.
In this way, the dot products of all the available books are computed and
ranked, and the top 5 or top 10 books are recommended.
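A toy version of this user-vector/item-vector dot-product ranking (all names, genres, and scores are invented for illustration) might look like:

import numpy as np

genres = ['crime thriller', 'comedy', 'fantasy']

# user vector: how much the reader liked each genre, based on their reviews
user = np.array([5.0, 1.0, 2.0])

# item vectors: how strongly each book belongs to each genre
books = {
    'Murder on the Orient Express': np.array([1.0, 0.0, 0.0]),
    'A Comic Caper':                np.array([0.0, 1.0, 0.0]),
    'Dragon Quest':                 np.array([0.1, 0.0, 0.9]),
    'The Quiet Detective':          np.array([0.8, 0.2, 0.0]),
}

# score every book with the dot product of the user and item vectors,
# then recommend the highest-ranked titles
scores = {title: float(user @ vec) for title, vec in books.items()}
for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{score:4.1f}  {title}')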
Now suppose a new book appears. Based on the user data, we first look at the author's name, and it is not
Agatha Christie. Then, the genre is not a crime thriller, nor is it a type of
book you have ever reviewed. With these classifications, we conclude that this
book shouldn't be recommended to you.
Content-based filtering has some limitations:
• The model can only give suggestions based on the user's current
interests. To put it another way, the model's potential to build on the
users' existing interests is limited.
• Since it must align the features of a user's profile with available products,
content-based filtering offers only a small amount of novelty.
Only item profiles are generated in the case of item-based filtering, and
users are recommended items that are close to what they rate or search
for, rather than their previous background. A perfect content-based
filtering system can reveal nothing surprising or unexpected.
What is Collaborative Filtering? Types, Working and
Case Study
What is Collaborative Filtering?
Ever thought about how e-commerce sites recommend products to their customers while
they are looking for something exactly like that? Ever wondered how Netflix recommends
similar movies based on what we have recently watched or added to our watchlist?
Artificial intelligence technology has advanced to such an extent that such recommendations
have become an everyday experience. With various techniques like deep learning, machine
learning, and artificial neural networks, AI tools and techniques have enabled the internet
to become far more personalised. In this respect, they have also enabled the internet to
recommend users or items to netizens.
A variety of machine learning applications and software use recommender systems that
are empowered by machine learning techniques and tools for recommending content to their users.
Broadly, there are 2 types of recommendation techniques in use as of now. First,
content-based filtering requires users to enter data that is then processed to produce
the desired outputs.
For instance, a user logs in to his/her Netflix account and enters "Hollywood Romantic
Movies" in the search bar. The results obtained from this search are produced with the
help of the content-based approach, which works on the basis of content inputs.
Second, the technique of collaborative filtering implies that computers produce outputs
based on a user's past interaction on a platform. Herein, we shall understand this with an
example.
Let us suppose that an individual is inclined towards romanticism and likes to watch
movies belonging to the romantic genre on his Netflix account. Whenever he logs
in to his account, he will see a separate section that only displays recommended
romantic movies.
Therefore, the technique of collaborative filtering filters information and infers from the
past interactions of a user to recommend similar items or content. The key to this is
collaborative filtering, which recommends items or users based on a user's historical
browsing data. Be it Instagram recommending people we may know, or similar clothing
items that resemble the items we've just added to our carts, collaborative filtering is a
technique that helps a computer filter information based on past interactions and data.
Simply put, collaborative filtering algorithms produce similar results based on the user's
historical data. For instance, it has been established that a user is interested in Pop
songs.
Perhaps the collaborative filtering algorithms in music streaming applications will record
this interaction of the user and interpret that the user prefers Pop Genre over other
genres.
The recommendation system built with this technique will then display other popular Pop
songs that share similar traits.
"It's based on the idea that people who agreed in their evaluation of certain
items are likely to agree again in the future." Collaborative Filtering in
Recommender Systems
As collaborative filtering procures its results from implicit data, it is able to retrieve
information that users otherwise might not provide. The first class of collaborative
filtering is user-based: this approach narrows down users who have similar tastes and
recommends items that those similar users have liked.
Moreover, this approach is also employed for targeted ads and suggested items
based on other users who have similar choices and preferences. Among its various
applications is Facebook, where this category recommends people that users might know
based on their mutual connections and interactions.
The second class is item-based collaborative filtering. By measuring similarity among
products and inferring respective ratings, items are recommended that are similar to the
ones a user has already rated highly.
This class of collaborative filtering was invented and first used by Amazon in 1998.
Even today, e-commerce sites like Amazon and Flipkart use item-based collaborative
filtering. Filtering in this approach works effectively and presents users with relevant
recommendations.
In this segment, we will look at various real-world case studies that will help us
understand how collaborative filtering is used in practice.
1. FACEBOOK
A social networking site that was launched in the year 2004, Facebook has pioneered the
world of social networking that aims to connect people from one corner of the world to
another.
Currently led by Mark Zuckerberg, Facebook uses numerous AI techniques that
have advanced the social networking site. One of the most striking is its
collaborative filtering based friend-suggestion feature, which recommends people a user
may know.
2. AMAZON
Amazon uses some of the best machine learning tools for better performance and an
enhanced user experience.
Since an e-commerce platform like Amazon has millions of users surfing through
the platform, this technique is of great use to the company and its users. With a
colossal technological interface, Amazon offers both a user-based approach and an
item-based approach.
All in all, the platform's item-based collaborative filtering has proved to be a useful
technique, and the platform relies on item-based collaborative filtering more than on the
user-based approach. It was Amazon that developed the item-based approach, which looks
at similarities between items rather than between users.
3. NETFLIX
The third case study is based on one of the most renowned OTT platforms
worldwide: Netflix. It is known for its humongous entertainment collection and its
latest releases.
With millions of users from around the world, the platform offers various
recommendations to its users, thanks to a collaborative filtering movie
recommendation system. Users can
make the most out of the recommendations that are displayed at every step of the
way.
With so much to watch and learn from, platforms like Netflix have brought in
collaborative filtering approaches that narrow down our options and work according to
our preferences.