Unit#8 - Top - Most Popular DS Algorithms

The document summarizes several popular data science algorithms:
1. It lists 14 popular data science algorithms, including tree building, classification, EM, K-means clustering, and statistical learning.
2. It details the tree-building algorithm: growing a tree from training data by partitioning the data recursively.
3. It explains the EM algorithm, with an expectation step that assigns points to clusters and a maximization step that estimates the model parameters.
4. It discusses finding split points for categorical attributes by constructing a class/value matrix and evaluating splits on subsets of the attribute's values.

19 CS, 1st Term Final Year
Data Sciences and Analytics (DSA)
Prof. Dr. M. S. Memon
Course In-charge
8. Top Most Popular DS Algorithms
1. Building Tree
2. Classification
3. EM
4. Split Point
5. K-MEANS
6. Statistical Learning
7. Link Mining
8. Clustering
9. Association and Aggregation
10. Bagging and Boosting
11. Sequential Patterns
12. Integrated Mining
13. Rough Sets
14. Graph Mining
M. S. Memon
CSE Dept. QUEST Nawabshah
Building tree

• GrowTree(TrainingData D)
    Partition(D);

• Partition(Data D)
    if (all points in D belong to the same class) then
        return;
    for each attribute A do
        evaluate splits on attribute A;
    use best split found to partition D into D1 and D2;
    Partition(D1);
    Partition(D2);
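The recursive partitioning above can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's implementation: the slide does not name a splitting criterion, so the Gini index, the binary test `row[a] == v`, the majority-class fallback, and the names `gini`, `best_split`, `partition`, `grow_tree`, and `predict` are all hypothetical choices made for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed splitting index)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Evaluate splits of the form row[a] == v on every attribute A."""
    best = None
    for a in range(len(rows[0])):
        for v in sorted({r[a] for r in rows}, key=repr):
            left = [y for r, y in zip(rows, labels) if r[a] == v]
            right = [y for r, y in zip(rows, labels) if r[a] != v]
            if not left or not right:
                continue  # this test does not separate the records
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, a, v)
    return best

def partition(rows, labels):
    """Partition(D): stop on a pure node, else split into D1 and D2 and recurse."""
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    split = best_split(rows, labels)
    if split is None:
        # Assumed fallback (not on the slide): no test separates D, use the majority class.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, a, v = split
    d1 = [(r, y) for r, y in zip(rows, labels) if r[a] == v]
    d2 = [(r, y) for r, y in zip(rows, labels) if r[a] != v]
    return {"attr": a, "value": v,
            "yes": partition([r for r, _ in d1], [y for _, y in d1]),
            "no": partition([r for r, _ in d2], [y for _, y in d2])}

def grow_tree(rows, labels):
    """GrowTree(D): entry point that starts the recursion."""
    return partition(rows, labels)

def predict(tree, row):
    """Follow the split tests from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["attr"]] == tree["value"] else tree["no"]
    return tree["leaf"]
```

Each record is a tuple of attribute values, and `grow_tree(rows, labels)` returns a nested dictionary that `predict` can walk for a new record.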
EM Algorithm
• Initialize K cluster centers
• Iterate between two steps
• Expectation step: assign points to clusters

$$P(d_i \in c_k) = \Pr(c_k \mid d_i) = \frac{\Pr(c_k)\,\Pr(d_i \mid c_k)}{\Pr(d_i)}$$

$$\Pr(d_i \mid c_k) = N(d_i;\ \mu_k, \sigma_k)$$

• Maximization step: estimate model parameters

$$\mu_k = \frac{1}{m} \sum_{i=1}^{m} \frac{d_i\, P(d_i \in c_k)}{\sum_{j} P(d_i \in c_j)}$$
Finding Split Points: Categorical Attributes
• Consider splits of the form: value(A) ∈ {x1, x2, ..., xn}
• Example: CarType ∈ {family, sports}
• Evaluate this split-form for subsets of domain(A)
• To evaluate splits on attribute A for a given tree node:

    initialize class/value matrix of node to zeroes;
    for each record in the attribute list do
        increment appropriate count in matrix;
    evaluate splitting index for various subsets using the constructed matrix;
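The matrix-based evaluation above can be sketched as follows. This is a minimal sketch under stated assumptions: the attribute list is taken to be a list of `(value, class_label)` pairs, the splitting index is assumed to be the Gini index (the slide does not name one), and the names `count_matrix`, `gini`, and `best_subset_split` are hypothetical. Enumerating every subset is exponential in the size of domain(A), so this is only practical for small domains.

```python
from collections import Counter
from itertools import combinations

def count_matrix(attribute_list):
    """Build the class/value count matrix from (value, class_label) records."""
    matrix = {}
    for value, label in attribute_list:
        matrix.setdefault(value, Counter())[label] += 1
    return matrix

def gini(counts):
    """Gini index of a class histogram (assumed splitting index)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_subset_split(matrix):
    """Score the test value(A) in S for every non-trivial subset S of domain(A)."""
    values = sorted(matrix)
    total = sum(sum(c.values()) for c in matrix.values())
    best = None
    for size in range(1, len(values)):
        for subset in combinations(values, size):
            inside, outside = Counter(), Counter()
            for v in values:
                # Only the matrix is read here; the records are not rescanned.
                (inside if v in subset else outside).update(matrix[v])
            n_in = sum(inside.values())
            score = (n_in * gini(inside) + (total - n_in) * gini(outside)) / total
            if best is None or score < best[0]:
                best = (score, set(subset))
    return best
```

The point of the design is that the records are scanned once to fill the matrix, and every candidate subset is then scored from the counts alone.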
Performing the Splits
• The attribute lists of every node must be divided among the two children
• To split the attribute lists of a given node:

    for the list of the attribute used to split this node do
        use the split test to divide the records;
        collect the record ids;
    build a hashtable from the collected ids;
    for the remaining attribute lists do
        use the hashtable to divide each list;
    build class-histograms for each new leaf;
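The splitting procedure above can be sketched in Python, with a `set` of record ids playing the role of the hashtable. This is a minimal sketch under stated assumptions: each attribute list is taken to be a list of `(value, class_label, record_id)` tuples, and the function name `split_attribute_lists` and the `test` callback are hypothetical.

```python
from collections import Counter

def split_attribute_lists(attr_lists, split_attr, test):
    """Divide every attribute list of a node between its two children.

    attr_lists maps attribute name -> list of (value, class_label, record_id).
    test(value) returns True for records that go to the left child.
    """
    left, right = {split_attr: []}, {split_attr: []}
    left_ids = set()  # hash table of the record ids sent to the left child

    # Scan the list of the splitting attribute: apply the test, collect ids.
    for rec in attr_lists[split_attr]:
        if test(rec[0]):
            left[split_attr].append(rec)
            left_ids.add(rec[2])
        else:
            right[split_attr].append(rec)

    # Divide the remaining lists by probing the hash table; order is preserved.
    for name, records in attr_lists.items():
        if name == split_attr:
            continue
        left[name] = [rec for rec in records if rec[2] in left_ids]
        right[name] = [rec for rec in records if rec[2] not in left_ids]

    # Build the class histograms of the two new leaves.
    hist_left = Counter(label for _, label, _ in left[split_attr])
    hist_right = Counter(label for _, label, _ in right[split_attr])
    return left, right, hist_left, hist_right
```

Because each list is filtered in its existing order, a list that was sorted before the split stays sorted in both children.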


K-MEANS ALGORITHM
1) Decide on a value for k.
2) Initialize the k cluster centers
   • randomly, or
   • smartly
3) Decide the class memberships of the N objects by assigning them to the nearest cluster center
4) Re-estimate the k cluster centers, by assuming the memberships found above are correct
5) If none of the N objects changed membership in the last iteration, EXIT. Otherwise GOTO 3)
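The five steps can be sketched in plain Python. This is a minimal sketch under stated assumptions: points are tuples of floats, initialization is the random option from step 2 (sampling k of the input points), and the names `sq_dist` and `k_means`, the `seed` and `max_iter` parameters, and the policy of keeping the old center for an empty cluster are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: random initialization
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Step 3: assign every object to its nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Step 5: exit when no object changed membership.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: re-estimate each center as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumed policy: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment
```

The result depends on the initial centers, so running with several values of `seed` and keeping the best clustering is a common precaution.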
[Figure: K-means visualization]
