
GMR Institute of Technology                                        GMRIT/ADM/F-44
Rajam, AP                                                          REV.: 00
(An Autonomous Institution Affiliated to JNTUGV, AP)

Cohesive Teaching – Learning Practices (CTLP)

Class: 4th Sem. – B.Tech                 Department: CSE-AI&ML

Course: Fundamentals of Machine Learning         Course Code: 21ML405
Prepared by: Dr. S. Akila Agnes, Ms. Manisha Das
Lecture Topic: Clustering algorithms (k-Means, Agglomerative/Divisive, DBSCAN and Self-Organizing Maps) and Evaluation Metrics, Data Science Tools
Course Outcome(s): CO6                   Program Outcome(s): PO1, PO2, PSO1, PSO2
Duration: 50 Min         Lecture: 42-45          Unit: IV
Pre-requisite(s): Fundamentals of Python

1. Objective

• Understand different clustering techniques.
• Become familiar with commonly used data science tools.

2. Intended Learning Outcomes (ILOs)

At the end of this session the students will be able to:

1. Summarize different types of clustering algorithms.


2. Understand various data science tools.

3. 2D Mapping of ILOs with Knowledge Dimension and Cognitive Learning Levels of RBT

                          Cognitive Learning Levels
Knowledge Dimension | Remember | Understand | Apply | Analyze | Evaluate | Create
Factual             |    ✓     |     ✓      |       |         |          |
Conceptual          |    ✓     |     ✓      |       |         |          |
Procedural          |          |            |       |         |          |
Meta Cognitive      |          |            |       |         |          |

4. Teaching Methodology
• PowerPoint presentation, chalk and talk, visual presentation

5. Evocation
6. Deliverables

Lecture Notes-42:

Agglomerative Clustering:
Hierarchical clustering is a connectivity-based clustering model that groups together data points
that are close to each other, based on a measure of similarity or distance. The assumption is that
data points that are close to each other are more similar or related than data points that are
farther apart.

A dendrogram, the tree-like figure produced by hierarchical clustering, depicts the hierarchical
relationships between groups. Individual data points are located at the bottom of the dendrogram,
while the largest cluster, which includes all the data points, is located at the top. The dendrogram
can be sliced at various heights to generate different numbers of clusters.
The dendrogram is created by iteratively merging or splitting clusters based on a measure of
similarity or distance between data points. Clusters are divided or merged repeatedly until all data
points are contained within a single cluster, or until the predetermined number of clusters is
attained.
To estimate a suitable number of clusters, we can inspect the dendrogram and find the height at
which its branches separate into distinct clusters; slicing the dendrogram at this height gives the
number of clusters.

Types of Hierarchical Clustering

Basically, there are two types of hierarchical Clustering:


1. Agglomerative Clustering
2. Divisive clustering
Hierarchical Agglomerative Clustering
It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC). It
produces a structure that is more informative than the unstructured set of clusters returned by
flat clustering, and it does not require us to pre-specify the number of clusters. Bottom-up
algorithms treat each data point as a singleton cluster at the outset and then successively merge
pairs of clusters until all clusters have been merged into a single cluster that contains all the data.
Algorithm:
given a dataset (d1, d2, d3, ..., dN) of size N
# compute the distance matrix
for i = 1 to N:
    # the distance matrix is symmetric about the primary
    # diagonal, so we compute only its lower triangle
    for j = 1 to i:
        dis_mat[i][j] = distance(di, dj)
treat each data point as a singleton cluster
repeat
    merge the two clusters having the minimum distance
    update the distance matrix
until only a single cluster remains
Hierarchical Agglomerative Clustering

Steps:
• Consider each alphabet (data point) as a single cluster and calculate the distance of each
  cluster from all the other clusters.
• In the second step, comparable clusters are merged into a single cluster. Say cluster (B)
  and cluster (C) are very similar to each other; we merge them, and likewise clusters (D)
  and (E). This leaves the clusters [(A), (BC), (DE), (F)].
• We recalculate the proximity according to the algorithm and merge the two nearest
  clusters ([(DE), (F)]) to obtain the new clusters [(A), (BC), (DEF)].
• Repeating the same process, the clusters (DEF) and (BC) are comparable and are merged,
  leaving the clusters [(A), (BCDEF)].
• At last, the two remaining clusters are merged into a single cluster [(ABCDEF)]. A
  runnable sketch of this procedure follows this list.
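
The same bottom-up procedure can be reproduced with standard libraries. Below is a minimal
sketch using SciPy's scipy.cluster.hierarchy module; the six 2-D points standing in for A–F are
made-up values, assumed only for illustration:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# toy 2-D points standing in for A..F (illustrative values only)
points = np.array([[0.0, 0.0], [1.0, 0.1], [1.1, 0.2],
                   [5.0, 5.0], [5.1, 5.2], [9.0, 1.0]])
labels = list("ABCDEF")

# 'single' linkage merges the pair of clusters with the minimum
# distance between their closest members, as in the algorithm above
Z = linkage(points, method="single", metric="euclidean")

# cut the merge tree to obtain, say, 3 clusters
assignments = fcluster(Z, t=3, criterion="maxclust")
for label, c in zip(labels, assignments):
    print(label, "-> cluster", c)

# dendrogram(Z, labels=labels)  # draws the merge tree with matplotlib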

Lecture Notes-43:

Hierarchical Divisive clustering

It is also known as the top-down approach. This algorithm, too, does not require us to
pre-specify the number of clusters. Top-down clustering starts with one cluster containing the
whole data, requires a method for splitting a cluster, and proceeds by splitting clusters
recursively until each data point has been placed in its own singleton cluster.
Algorithm:
given a dataset (d1, d2, d3, ..., dN) of size N
start with all data in one cluster at the top
split the cluster using a flat clustering method, e.g. k-Means
repeat
    choose the best cluster among all the clusters to split
    split that cluster with the flat clustering algorithm
until each data point is in its own singleton cluster
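
As a concrete illustration, the sketch below implements one common top-down variant,
bisecting k-Means, which repeatedly splits a cluster in two with flat k-Means. The policy of
always splitting the largest remaining cluster is an assumption; other criteria (e.g. highest
within-cluster error) are equally valid:

import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters):
    # start with a single cluster holding every point (indices into X)
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        # choose the largest cluster to split (one possible policy)
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        # split it in two with flat k-Means
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[members])
        clusters.append(members[halves == 0])
        clusters.append(members[halves == 1])
    return clusters

X = np.random.rand(20, 2)  # toy data
for i, idxs in enumerate(divisive_clustering(X, 4)):
    print("cluster", i, ":", idxs)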


Computing Distance Matrix

While merging two clusters, we check the distance between every pair of clusters and merge
the pair with the least distance (most similarity). But how is that distance determined? There
are different ways of defining inter-cluster distance/similarity. Some of them are:
1. Min distance (single linkage): the minimum distance between any two points, one from
   each cluster.
2. Max distance (complete linkage): the maximum distance between any two points, one
   from each cluster.
3. Group average: the average distance between every pair of points across the two clusters.
4. Ward's method: the similarity of two clusters is based on the increase in squared error
   when the two clusters are merged.
Grouping the same data with different methods can therefore produce different results, as the
short comparison below illustrates.
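
A quick way to see this is to vary the linkage argument of scikit-learn's
AgglomerativeClustering on the same (toy, randomly generated) data:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.rand(30, 2)  # toy data
for link in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=link).fit_predict(X)
    print(link, "->", labels)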
Lecture Notes-44:
Self Organizing Maps:
A Self Organizing Map (or Kohonen Map, or SOM) is a type of Artificial Neural Network,
inspired by biological models of neural systems from the 1970s. It follows an unsupervised
learning approach and trains its network through a competitive learning algorithm. SOM is
used for clustering and for mapping (dimensionality reduction): it maps multidimensional
data onto a lower-dimensional space, reducing complex problems to a form that is easier to
interpret. A SOM has two layers: an Input layer and an Output layer.

The architecture of a Self Organizing Map with two clusters and n input features per sample
is given below:

How does SOM work?

Consider input data of size (m, n), where m is the number of training examples and n is the
number of features in each example. First, the network initializes weights of size (n, C), where
C is the number of clusters. Then, iterating over the input data, for each training example it
updates the winning vector (the weight vector with the shortest distance, e.g. Euclidean
distance, from the training example). The weight update rule is given by:

wij(new) = wij(old) + α(t) * (xik − wij(old))

where α is the learning rate at time t, j denotes the winning vector, i denotes the ith feature of
the training example, and k denotes the kth training example in the input data. After training
the SOM network, the trained weights are used for clustering new examples: a new example
falls in the cluster of its winning vector.

Algorithm
Training:
Step 1: Initialize the weights wij; small random values may be assumed. Initialize the learning rate α.

Step 2: For each output unit j, calculate the squared Euclidean distance:

D(j) = Σ (wij − xi)^2, where i = 1 to n and j = 1 to m

Step 3: Find the index J for which D(j) is minimum; J is the winning index.

Step 4: For each unit j within a specific neighborhood of J, and for all i, calculate the new weight:

wij(new) = wij(old) + α[xi − wij(old)]

Step 5: Update the learning rate using:

α(t+1) = 0.5 * α(t)

Step 6: Test the stopping condition.
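
A minimal sketch of this training loop is given below. It assumes a flat SOM with C output
units, shrinks the neighborhood to the winner alone (a winner-take-all simplification of
Step 4), and uses the halving learning-rate schedule from Step 5:

import numpy as np

def train_som(X, C, epochs=10, alpha=0.5):
    m, n = X.shape                        # m examples, n features
    rng = np.random.default_rng(0)
    W = rng.random((C, n))                # one weight vector per output unit
    for t in range(epochs):
        for x in X:
            # Steps 2-3: squared Euclidean distance to each unit, pick winner J
            d = ((W - x) ** 2).sum(axis=1)
            J = int(np.argmin(d))
            # Step 4: move the winner's weights toward the example
            W[J] += alpha * (x - W[J])
        # Step 5: decay the learning rate once per epoch
        alpha *= 0.5
    return W

weights = train_som(np.random.rand(100, 4), C=3)
print(weights)  # trained weights; a new example joins its winner's cluster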

Lecture Notes-45:
Data Science Tools:
Data science tools are application software or frameworks that help data science
professionals perform various data science tasks such as analysis, cleansing, visualization,
mining, reporting, and filtering of data. Each tool supports some subset of these tasks. Let us
now look at what these tools are and how they help data scientists and professionals.

General-purpose tools
1. MS Excel:
It is the most fundamental and essential tool that everyone should know. For freshers, this tool
helps in easy analysis and understanding of data. MS Excel comes as part of the MS Office
suite. Freshers and even seasoned professionals can get a basic idea of what the data says
before getting into high-end analytics. It helps in quickly understanding the data, comes with
built-in formulae, and provides various types of data visualization elements such as charts and
graphs. Through MS Excel, data science professionals can represent data simply in rows and
columns, a representation that even a non-technical user can understand.

Cloud-based tools
2. BigML:
BigML is an online, cloud-based, event-driven tool that helps in data science and machine
learning operations. This GUI-based tool allows beginners with little or no previous
experience to create models through drag-and-drop features. For professionals and
companies, BigML can help blend data science and machine learning projects into various
business operations and processes. Many companies use BigML for risk assessment, threat
analysis, weather forecasting, etc. It uses REST APIs to produce user-friendly web interfaces,
and users can also leverage it to generate interactive visualizations of their data. It also comes
with many automation techniques that enable users to eliminate manual data workflows.

3. Google Analytics:
Google Analytics (GA) is a professional data science tool and framework that gives an in-depth
look at the performance of an enterprise website or app, yielding data-driven insights. Data
science professionals work across many industries, one of which is digital marketing; with this
tool, a web admin can easily access, visualize, and analyze website traffic and related data. It
helps businesses understand how customers or end-users interact with a website. It works in
close tandem with other products such as Search Console, Google Ads, and Data Studio, which
makes it a widespread option for anyone leveraging different Google products. Through Google
Analytics, data scientists and marketing leaders can make better marketing decisions, and even
a non-technical data science professional can use its high-end functionality and easy-to-use
interface to perform data analytics.

Multipurpose Data science Tools


4. Apache Spark:
Apache Spark is a well-known data science tool, framework, and library with a robust analytics
engine that provides both stream processing and batch processing. It can analyze data in real
time and perform cluster management, and it is much faster than other analytics workload
tools such as Hadoop. Apart from data analysis, it also helps in machine learning projects: it
offers various built-in machine learning APIs that allow machine learning engineers and data
scientists to create predictive models. In addition, Apache Spark provides APIs that Python,
Java, R, and Scala programmers can leverage in their programs.

5. Matlab:
Matlab is a closed-source, high-performance, multi-paradigm numerical computing and
simulation tool for processing mathematical and data-driven tasks. With it, researchers and
data scientists can perform matrix operations, analyze algorithmic performance, and carry out
statistical modeling of data. The tool combines visualization, mathematical computation,
statistical analysis, and programming in one easy-to-use environment. Data scientists find
many applications for Matlab, especially in signal and image processing, neural network
simulation, and testing of different data science models.

6. SAS:
SAS is a popular data science tool designed by the SAS Institute for advanced analytics,
multivariate analysis, business intelligence (BI), data management operations, and predictive
analytics. This closed-source software provides a wide range of data science functionality
through its graphical interface, its SAS programming language, and Base SAS. Many MNCs
and Fortune 500 companies use it for statistical modeling and data analysis. It allows easy
access to data from database files, online databases, SAS tables, and Microsoft Excel tables,
and it can manipulate existing data sets to derive data-driven insights through its statistical
libraries and tools.
7. KNIME:
KNIME is another widely used open-source, free data science tool that helps in data reporting,
data analysis, and data mining. With it, data science professionals can quickly extract and
transform data. Leveraging its modular data pipelining concept, it integrates various data
analysis and data-related components for machine learning (ML) and data mining. It offers an
excellent graphical interface through which data science professionals can define workflows
between the various predefined nodes provided in its repository; because of this, minimal
programming expertise is needed to carry out data-driven analysis and operations. Its visual
data pipelines help render interactive visuals for a given dataset.

8. Apache Flink:
Flink is another Apache data science tool that helps perform real-time data analysis. It is one
of the most popular open-source data science tools and frameworks, and it uses a distributed
stream-processing engine to perform various data science operations. Data scientists and
professionals often need to perform real-time analysis and computation on data such as users'
web activity, measurements emitted by Internet of Things (IoT) devices, location-tracking
feeds, and financial transactions from apps or services. This is where Flink can deliver both
parallel and pipelined execution of data flows at low latency. The same engine handles
enormous data streams (unbounded data, with no fixed start and end point) as well as stored
datasets (bounded data). Apache Flink has a reputation for high-speed processing and analysis
while reducing the complexity of real-time data processing.

Programming Language-driven Tools


9. Python:
Python is, by far, the most widely used data science programming language. Also considered a
data science tool, Python helps data science professionals perform data analysis over large
datasets and over data of different kinds (structured, semi-structured, and unstructured). This
high-level, general-purpose, dynamic, interpreted programming language has built-in data
structures and a massive collection of libraries that help in data analysis, data cleaning, data
visualization, etc. It has a simple syntax, is easy to learn, and reduces the cost of maintaining
data science programs. Since the language supports developing mobile, desktop, and web
applications alongside data science work, many prefer to learn it to gain both data science and
software development capabilities. It has outstanding community support, and contributors
keep developing modules and libraries that make data science and programming tasks easier.

10. R Programming:
R is a robust programming language that competes with Python when it comes to data science.
Professionals and companies widely use it for statistical computing and data analysis. It has an
excellent user interface, which is regularly updated for a better programming and data analysis
experience. Exceptional contributor and community support make it a valuable tool for data
science. It scales well because it has a huge collection of data science packages and libraries
such as tidyr, dplyr, readr, SparkR, data.table, ggplot2, etc. Apart from statistical and data
science operations, R also offers powerful machine learning algorithms in a simple and fast
manner. This open-source programming language comes with 7,800 packages and
object-oriented features, and RStudio is its most widely used development environment.

11. Jupyter Notebook:


This computational notebook is a popular data science web application that helps users manage
and interact with data effectively. Apart from data science professionals, researchers,
mathematicians, and even beginners in Python leverage this tool. It is popular mostly for its
easy data visualization features and computational abilities. Data science professionals and
analysts can run a single line or multiple lines of code at a time. It is a spin-off of the IPython
project and supports programming languages such as Julia, Python, and R.

12. MongoDB:
MongoDB is a cross-platform, open-source, document-oriented NoSQL database management
system that allows data science professionals to manage semi-structured and unstructured
data. It acts as an alternative to a traditional relational database management system, in which
all data must be structured. MongoDB helps data science professionals manage
document-oriented data, storing and retrieving information as and when required. It easily
handles large volumes of data and offers many of SQL's querying capabilities and more,
including support for dynamic queries. MongoDB stores data as documents in a JSON-like
format and delivers high-level data replication capabilities. Handling Big Data has become
easier with MongoDB because it increases data availability. Beyond basic database queries,
MongoDB can also execute advanced analytics, and its data scalability makes it one of the most
widely used data science tools.

7. Keywords
• Clustering
• Agglomerative
• DBSCAN

8. Sample Questions

Remember:
1. List any four data science tools.
2. What is DBSCAN?
Understand:
1. Explain Self-Organizing Maps with an example.
2. Explain the Divisive algorithm with an example.
Apply:
1. Apply agglomerative single-link clustering to the following distance matrix and draw the
   dendrogram.

        A    B    C    D    E    F
   A    0
   B    5    0
   C   14    9    0
   D   11   20   13    0
   E   18   15    6    3    0
   F   10   16    8   10   11    0
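
For checking an exercise like this, single-link clustering can be run directly on a distance
matrix with SciPy. The sketch below uses the matrix above, mirrored into its full symmetric
form:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

labels = list("ABCDEF")
D = np.array([[ 0,  5, 14, 11, 18, 10],
              [ 5,  0,  9, 20, 15, 16],
              [14,  9,  0, 13,  6,  8],
              [11, 20, 13,  0,  3, 10],
              [18, 15,  6,  3,  0, 11],
              [10, 16,  8, 10, 11,  0]], dtype=float)

# squareform converts the square matrix to the condensed form linkage expects
Z = linkage(squareform(D), method="single")
print(Z)  # each row: [cluster_i, cluster_j, merge_distance, new_cluster_size]
# dendrogram(Z, labels=labels)  # draws the dendrogram with matplotlib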

9. Stimulating Question (s)


1. What is the need for clustering?
10. Mind Map
11. Student Summary

At the end of this session, the facilitator (teacher) shall randomly pick a few students to
summarize the deliverables.

12. Reading Materials


1. Stephen Marsland, "Machine Learning: An Algorithmic Perspective", CRC Press, 2009.
2. Tom M. Mitchell, "Machine Learning", Tata McGraw Hill, 1997.

13. Scope for Mini Project

NIL

---------------
