
Module - 3 (Advanced classification and

Cluster analysis)

Module – 3.1
(Advanced classification)

Major Source For This Material:


Data Mining: Concepts and Techniques (3rd ed.) — Chapter 8 —
Jiawei Han, Micheline Kamber, and Jian Pei

1
Module - 3 (Advanced classification and Cluster analysis)
1.1 Classification - Introduction
1.2 Decision tree construction principle
1.3 Splitting indices - Information Gain, Gini index
1.4 Decision tree construction algorithms - ID3
1.5 Decision tree construction with presorting - SLIQ
2 Classification Accuracy - Precision, Recall
3.1 Introduction to clustering - Clustering Paradigms
3.2 Partitioning Algorithm - PAM
3.3 Hierarchical Clustering - DBSCAN
3.4 Categorical Clustering - ROCK
Note: Data Analytics Coverage
1. Classification: Naïve Bayes, KNN
2. Clustering: Hierarchical, Partitioning
2
Module – 3.1 Classification
1.1 Classification - Introduction
1.2 Decision tree construction principle
1.3 Splitting indices - Information Gain, Gini index
1.4 Decision tree construction algorithms - ID3
1.5 Decision tree construction with presorting - SLIQ
Note: Data Analytics Coverage
1. Classification: Naïve Bayes, KNN

3
3.1 Short Notes
3.1.1. Define the entropy of a dataset. Write a formula to compute the entropy of a two-class dataset. (Slide: 17, 18)
3.1.2. How is Gain Ratio calculated? What is the advantage of Gain Ratio over Information Gain? (3)
Define information gain and Gini index. (Slide 19, 29)
3.1.3. Explain the steps in creating a decision tree using the ID3 algorithm. (4)
Explain the ID3 algorithm for building decision trees - Slide 12, 13, 14
3.1.4. Discuss the issues in the implementation of a decision tree. (3)
What are the challenges in building a decision tree? How are they overcome?
3.1.5. Explain the construction of a decision tree using the SLIQ algorithm with an example. (8)
Explain the working of the SLIQ algorithm. (6)

4
Regression Vs Classification

5
Classification Methods
◼ Bayes Classification Methods
◼ K nearest neighbours (KNN)
◼ Support Vector Machine (SVM)
◼ Neural Networks
◼ Decision Tree
◼ Logistic Regression
◼ Linear Discriminant Analysis (LDA)
(Covered here: 1.2 Decision tree construction principle; 1.3 Splitting indices - Information Gain, Gini index; 1.4 Decision tree construction algorithms - ID3; 1.5 Decision tree construction with presorting - SLIQ)

6
3.1.6.
Consider the dataset for a binary classification problem with class labels "yes" and "no". The table shows a class-labeled dataset of customers in a bank. Explain the information gain attribute selection measure, and find the information gain of the attribute "age".

7
Decision Tree Construction - Illustration:
(Note: first, categorize the attribute “Age”)
age income student credit buys computer
<=30 high no fair no
<=30 high no poor no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes poor no
31…40 low yes poor yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes poor yes
31…40 medium no poor yes
31…40 high yes fair yes
>40 medium no poor no
8
Select the attribute ‘Age’ for the root node split. Sort the data by age categories.

age      buys                        counts (Buys Yes : Buys No)
<=30     no, no, no, yes, yes        2 : 3
31…40    yes, yes, yes, yes          4 : 0
>40      no, yes, no, yes, yes       3 : 2

Root split on age?: <=30 (2:3), 31..40 (4:0) -> “yes”, >40 (3:2)
9
For the branch Age <= 30, select attribute ‘Student’ for the node split.

Within Age <= 30 (2 Yes : 3 No):
  student = no  -> 0 Yes : 3 No
  student = yes -> 2 Yes : 0 No
Within Age > 40 (3 Yes : 2 No), ‘Student’ gives impure partitions:
  student = no  -> 1 Yes : 1 No
  student = yes -> 2 Yes : 1 No
Age 31…40 (4 Yes : 0 No) is already pure.
10
age?
  <=30 (2:3): Student?
    Stud-Yes (2:0) -> yes
    Stud-No (0:3) -> no
  31..40 (4:0) -> yes
  >40 (3:2)
For Age > 40, select attribute ‘Credit Rating’ for the split.

age?
  <=30 (2:3): Student?
    Stud-Yes (2:0) -> yes
    Stud-No (0:3) -> no
  31..40 (4:0) -> yes
  >40 (3:2): credit rating?
    Fair (3:0) -> yes
    Poor (0:2) -> no
Decision Tree Construction
◼ Decision tree construction is a top-down recursive tree
induction algorithm, which uses an attribute selection
measure to select the best attribute to split each non-leaf node.
◼ Algorithms like ID3, C4.5, and CART employ different
selection criteria to build efficient decision trees.

13
Algorithm for Decision Tree Construction …
◼ At the start, all the training samples are at the root
◼ Attributes are assumed to be categorical (if the attributes
are continuous-valued, discretize them in advance)
◼ Construct a decision tree by splitting a node and growing
each branch. This is done in a top-down recursive manner,
until the conditions for termination of the algorithm are
reached (see the next slide). The steps involved are:
◼ If the tuples are all from the same class, then the node
becomes a leaf, labeled with that class.
◼ Otherwise, partition the node using a selected attribute.
The attribute selection is done based on a heuristic or
statistical measure such as information gain, Gini index,
etc.

14
… Algorithm for Decision Tree Construction
◼ Conditions for termination of the algorithm
◼ All samples for a given node belong to the same class
◼ There are no more attributes for further partitioning. In this
case, majority voting is employed for classifying the leaf
◼ There are no samples left
◼ Other conditions set by the researcher. E.g., maximum tree
depth is attained

15
The ID3 Algorithm for Decision Tree Construction
◼ ID3 (Iterative Dichotomiser 3) is an algorithm for
decision tree construction.
◼ It follows the basic steps listed in the ‘Algorithm for
Decision Tree Construction’. In addition, the following
characteristics apply.
◼ The ID3 algorithm assumes that there are only two
class labels, namely, “+” and “−”.
◼ The attributes can be multi-valued
◼ The algorithm uses information gain to select the
attribute for node split.

16
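To make the steps above concrete, here is a minimal runnable sketch of ID3-style tree growth in Python. It is illustrative only, not the slides' pseudocode: the list-of-dictionaries data layout and the function names are assumptions.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the class-label distribution in `rows`."""
    counts = Counter(row[target] for row in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr, target):
    """Gain(A) = Info(D) - Info_A(D) for a categorical attribute `attr`."""
    n = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        remainder += (len(subset) / n) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    """Grow the tree top-down; returns a class label (leaf) or a nested dict (internal node)."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:             # all tuples in the same class -> leaf
        return labels[0]
    if not attributes:                    # no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, a, target))
    node = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        node[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return node
```

Called on the 14-row 'buys computer' table used in the following slides, with attributes ['age', 'income', 'student', 'credit'], a routine like this would place 'age' at the root, in line with the worked example.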
Decision Tree Construction using the ID3 Algorithm

Info(D, Age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694
Gain(Age) = 0.246
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit) = 0.048

Resulting tree:
Age?
  Young (2:3): Student?
    Stud-Yes (2:0) -> yes
    Stud-No (0:3) -> no
  Middle (4:0) -> yes
  Senior (3:2): credit rating?
    Fair (3:0) -> yes
    Poor (0:2) -> no
Attribute Selection Measures

◼ Information Gain
◼ Gini Index

18
Entropy
◼ Entropy, in information theory, is a measure of the uncertainty
associated with a random variable.
◼ Calculation: For a discrete random variable Y taking m distinct values {y1, ..., ym},

  H(Y) = − Σ_{i=1}^{m} p_i log2(p_i),   where p_i = P(Y = y_i)

◼ Higher entropy => higher uncertainty
◼ Lower entropy => lower uncertainty

19
Entropy of a 2-class dataset
◼ Entropy(D) = I(p1, p2) = −p1 log2(p1) − p2 log2(p2)
where:
p1 is the proportion of samples in D belonging to class 1
p2 is the proportion of samples in D belonging to class 2
◼ If p1 or p2 equals zero, the entropy = 0. The dataset is pure.
◼ For a 2-class dataset, if p1 = p2 = 0.5, entropy = 1. The
dataset is perfectly impure (equal probability for each class).
◼ Consider a dataset with 9 objects labeled ‘Yes’ and 5 labeled ‘No’.
Entropy(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940

20
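The 0.940 value (and the pure/impure extremes) can be checked in a few lines of Python; a minimal sketch, with an illustrative helper name:

```python
import math

def two_class_entropy(p1, p2):
    # I(p1, p2) = -p1*log2(p1) - p2*log2(p2); a term is taken as 0 when its proportion is 0
    return sum(-p * math.log2(p) for p in (p1, p2) if p > 0)

print(two_class_entropy(9/14, 5/14))   # ~0.940 for the 9 'Yes' / 5 'No' dataset above
print(two_class_entropy(0.5, 0.5))     # 1.0 -> perfectly impure
print(two_class_entropy(1.0, 0.0))     # 0.0 -> pure dataset
```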
Attribute Selection Measure: Information Gain
◼ Let pi be the probability that an arbitrary tuple in D belongs
to class Ci. Therefore, pi = |Ci,D| / |D|
◼ Entropy of D: Info(D) = − Σ_{i=1}^{m} p_i log2(p_i)
◼ If attribute A takes v values, A splits D into v partitions:

  Info_A(D) = Σ_{j=1}^{v} (|Dj| / |D|) × Info(Dj)

◼ Information gained by a node split using attribute A:
  Gain(A) = Info(D) − Info_A(D)
◼ Compute the information gain for all attributes. Select the
attribute with the highest information gain for the node split.

21
Decision Tree Construction - Illustration:
(Note: first, categorize the attribute “Age”)
age income student credit buys computer
<=30 high no fair no
<=30 high no poor no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes poor no
31…40 low yes poor yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes poor yes
31…40 medium no poor yes
31…40 high yes fair yes
>40 medium no poor no
22
Information Gain Example
Use the example of a laptop purchase. Identify the best attribute based on the ‘information gain’.
In database ‘D’, there are 14 samples, with 9 ‘yes’ and 5 ‘no’ (the ‘buys computer’ column).
Probabilities p1 = 9/14 and p2 = 5/14.
Info(D) = I(9,5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
23
Information Gain: Consider ‘Age’ for the Root Node Split
Info(D) = 0.940
Attribute ‘age’ splits ‘D’ into 3 subsets: Young (2 yes, 3 no), Middle (4 yes, 0 no), Senior (3 yes, 2 no).
Info(D, Age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2)
             = (5/14)(0.97) + (4/14)(0) + (5/14)(0.97)
             = 0.694
Gain(D, Age) = Info(D) − Info(D, Age) = 0.246
24
Info. Gain: Consider ‘Income’ for the Root Node Split
Info(D) = 0.940
Attribute ‘income’ splits ‘D’ into 3 subsets: high (2 yes, 2 no), medium (4 yes, 2 no), low (3 yes, 1 no).
Info(D, Income) = (4/14) I(2,2) + (6/14) I(4,2) + (4/14) I(3,1)
                = (4/14)(1.0) + (6/14)(0.92) + (4/14)(0.81)
                = 0.911
Gain(D, Income) = Info(D) − Info(D, Income) = 0.029
25
Info. Gain: Consider ‘Student’ for the Root Node Split
Info(D) = 0.940
Attribute ‘student’ splits ‘D’ into 2 subsets: student = yes (6 yes, 1 no), student = no (3 yes, 4 no).
Info(D, Student) = (7/14) I(6,1) + (7/14) I(3,4)
                 = (7/14)(0.59) + (7/14)(0.99) = 0.79
Gain(D, Student) = Info(D) − Info(D, Student) = 0.151

26
Information Gain Example: Feature Selection for the Root Node Split
Gain(age) = Info(D) − Info_age(D) = 0.246
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
Based on the above, the attribute ‘age’ provides the maximum information gain. So, we select ‘age’ for the root node split.
27
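For reference, the four gain values quoted above can be reproduced from the 14-row table with a short script. This is a sketch (the tuple layout and column map are assumptions), using the same entropy/gain definitions as the earlier ID3 sketch:

```python
import math
from collections import Counter

# (age, income, student, credit, buys_computer) rows from the 14-row table above
data = [
    ("<=30", "high",   "no",  "fair", "no"),  ("<=30",   "high",   "no",  "poor", "no"),
    ("31..40", "high", "no",  "fair", "yes"), (">40",    "medium", "no",  "fair", "yes"),
    (">40",  "low",    "yes", "fair", "yes"), (">40",    "low",    "yes", "poor", "no"),
    ("31..40", "low",  "yes", "poor", "yes"), ("<=30",   "medium", "no",  "fair", "no"),
    ("<=30", "low",    "yes", "fair", "yes"), (">40",    "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "poor", "yes"), ("31..40", "medium", "no",  "poor", "yes"),
    ("31..40", "high", "yes", "fair", "yes"), (">40",    "medium", "no",  "poor", "no"),
]
COLS = {"age": 0, "income": 1, "student": 2, "credit": 3}

def entropy(rows):
    n = len(rows)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(r[-1] for r in rows).values())

def gain(rows, col):
    n = len(rows)
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r for r in rows if r[col] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(rows) - remainder

for name, col in COLS.items():
    print(name, round(gain(data, col), 3))   # age 0.246, income 0.029, student 0.151, credit 0.048
```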
Attribute Selection Measures

◼ Information Gain
◼ Gini Index

28
Gini Index of a Dataset
◼ If a data set D contains samples from n classes, the Gini index Gini(D) is defined as

  Gini(D) = 1 − Σ_{j=1}^{n} p_j²

  where p_j is the relative frequency of class j in D.
◼ The computer purchase database D has customers who buy computers and those who do not buy (9 ‘yes’, 5 ‘no’).
  The probability of a customer buying a computer: p1 = 9/14
◼ Probability of not buying a computer: p2 = 5/14
  Gini(D) = 1 − ((9/14)² + (5/14)²) = 0.46
29
Attribute Selection Measure: Gini Index
◼ If attribute A splits the data set D into two subsets D1 and D2, the resulting Gini index is computed as:

  Gini(D, A) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)

  where |D|, |D1| and |D2| are the sizes of the datasets D, D1, and D2 respectively.
◼ Reduction in impurity when ‘A’ splits D into D1 and D2:
  δGini(D, A) = Gini(D) − Gini(D, A)
◼ Compute the Gini index for splitting node ‘D’ using each attribute. Choose the attribute that results in the greatest impurity reduction.
30
‘Age’ for the Root Node Split
Gini(D) = 1 − ((9/14)² + (5/14)²) = 0.46
Attribute ‘Age’ splits ‘D’ into 3 subsets: Young (2 yes, 3 no), Middle (4 yes, 0 no), Senior (3 yes, 2 no).
Gini(D, Age) = (5/14)[1 − ((2/5)² + (3/5)²)]
             + (4/14)[0]
             + (5/14)[1 − ((3/5)² + (2/5)²)]
             = 0.34
Reduction in impurity: δGini(D, Age) = Gini(D) − Gini(D, Age) = 0.46 − 0.34 = 0.12
31
‘Income’ for the Root Node Split
Gini(D) = 1 − ((9/14)² + (5/14)²) = 0.46
Attribute ‘income’ splits ‘D’ into 3 subsets: high (2 yes, 2 no), medium (4 yes, 2 no), low (3 yes, 1 no).
Gini(D, Income) = (4/14)[1 − ((2/4)² + (2/4)²)]
                + (6/14)[1 − ((4/6)² + (2/6)²)]
                + (4/14)[1 − ((3/4)² + (1/4)²)]
                = 0.44
Reduction in impurity: δGini(D, Income) = Gini(D) − Gini(D, Income) = 0.46 − 0.44 = 0.02

32
DT Using the Gini Index
Gini(D, Age) = (5/14)[1 − ((2/5)² + (3/5)²)] + 0 + (5/14)[1 − ((3/5)² + (2/5)²)] = 0.34
Gini(D, Income) = 0.44

Age?
  Young (2:3): Student?
    Stud-Yes (2:0) -> yes
    Stud-No (0:3) -> no
  Middle (4:0) -> yes
  Senior (3:2): credit rating?
    Fair (3:0) -> yes
    Poor (0:2) -> no
Gini Index example:
Feature Selection for Root Node Split
◼ Gini(D) = 0.46

◼ Reduction in Impurity δ Gini(D, Age)


= 0.46 – 0.34 = 0.12

◼ Reduction in Impurity δ Gini(D, Income)


= 0.46 – 0.44 = 0.02

◼ From the above analysis, the attribute ‘Age’ reduces Gini
impurity more than ‘Income’ when used for the root node split.
◼ Therefore, we select ‘Age’ for the root node split.

34
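The same comparison can be verified numerically from the per-partition class counts; a minimal sketch (helper names are illustrative):

```python
def gini(counts):
    """Gini index from a list of per-class counts, e.g. [9, 5]."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(partitions):
    """Weighted Gini of a split, given the class counts of each partition."""
    n = sum(sum(p) for p in partitions)
    return sum((sum(p) / n) * gini(p) for p in partitions)

gini_d = gini([9, 5])                                  # 0.459, rounded to 0.46 on the slides
gini_age = gini_split([[2, 3], [4, 0], [3, 2]])        # 0.343 (0.34)
gini_income = gini_split([[2, 2], [4, 2], [3, 1]])     # 0.443 (0.44)
print(round(gini_d - gini_age, 2), round(gini_d - gini_income, 2))   # 0.12 0.02
```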
Comparing Attribute Selection Measures

Information gain:
- used in the ID3 algorithm
- biased towards multivalued attributes (e.g., ‘income’ takes the values low, medium, high)
- handles multiple classes effectively
- requires computation of logarithms

Gini index:
- used in the CART algorithm
- also biased towards multivalued attributes; however, the bias is less compared to information gain
- has difficulty when the number of classes is large
- tends to favor tests that result in equal-sized partitions with high purity in both partitions
- computation is simpler
35
Challenges in Decision Tree Construction
1. Overfitting: The tree becomes too complex and captures
noise instead of patterns. Use pruning (removing
unnecessary branches), set minimum split criteria, or apply
ensemble methods like Random Forest (PTO)
2. Handling Missing Data: Missing values affect decision-
making at nodes. Use imputation techniques
3. Class Imbalance: When one class dominates, the tree
may favor it. Use oversampling (SMOTE), undersampling, or
adjust class weights during training.
4. Computational Complexity: Large datasets with many
features slow down tree construction. Use feature selection,
sampling, or scalable algorithms like RainForest.
5. Bias-Variance Tradeoff: Simple trees have high bias,
deep trees have high variance. Use cross-validation to find
the optimal tree depth.

36
Overfitting
◼ Overfitting:
◼ If there are too many branches, some branches
may reflect anomalies due to noise or outliers
◼ This results in poor accuracy for unseen samples
◼ Two approaches to avoid overfitting
◼ Pre-pruning: Halt tree construction early; do not split
a node if this would result in the goodness measure
falling below a threshold. Problem: it is difficult to
choose an appropriate threshold
◼ Post-pruning: Remove branches from a “fully grown”
tree. Reserve some data as test data to evaluate
and identify the “best pruned tree”

37
For the dataset given below, find the first splitting attribute for the
decision tree by using the ID3 algorithm (4)

38
3.1.7
Consider the following data for a binary
classification problem with class labels C1 and
C2.
i.Calculate the gain in Gini index when splitting
the root node using the attributes ‘A’ and ‘B’.
Which attribute would the decision tree
induction algorithm choose?
ii.Calculate the information gain when splitting
the root node using the attributes ‘A’ and ‘B’.
Which attribute would the decision tree
induction algorithm choose? (8)
(i) Calculation of the gain in Gini index – 2
marks Attribute selection based on gain value
comparison –2 marks
ii) Calculation of the information gain – 2 marks
Attribute selection based on information gain –
2 marks]

39
Gini Index at the Root Node

A  B  Class
T  F  -
F  F  -
F  F  -
F  F  -
T  T  -
T  F  -
T  F  +
T  T  +
T  T  +
T  T  +

10 records: 6 of class ‘-’ and 4 of class ‘+’.
Gini(D) = 1 − ((6/10)² + (4/10)²) = 0.48
40
Gini Index when we split D using attribute A

Subset D1 (A = F): 3 records (0 ‘+’ : 3 ‘-’)
  Gini(D1) = 1 − ((0/3)² + (3/3)²) = 0
Subset D2 (A = T): 7 records (4 ‘+’ : 3 ‘-’)
  Gini(D2) = 1 − ((4/7)² + (3/7)²) = 0.49
Gini(D, A) = 0 × |D1|/|D| + 0.49 × |D2|/|D| = 0.49 × 7/10 = 0.34
Reduction in impurity = Gini(D) − Gini(D, A) = 0.48 − 0.34 = 0.14
41
Gini Index when we split D using attribute B

Subset D1 (B = F): 6 records (1 ‘+’ : 5 ‘-’)
  Gini(D1) = 1 − ((1/6)² + (5/6)²) = 0.28
Subset D2 (B = T): 4 records (3 ‘+’ : 1 ‘-’)
  Gini(D2) = 1 − ((3/4)² + (1/4)²) = 0.38
Gini(D, B) = 0.28 × |D1|/|D| + 0.38 × |D2|/|D| = 0.28 × 6/10 + 0.38 × 4/10 = 0.32
Reduction in impurity = Gini(D) − Gini(D, B) = 0.48 − 0.32 = 0.16
42
Select Attribute
▪ The reduction in impurity resulting from a root node split by attribute
‘B’ (0.16) is higher than that of ‘A’ (0.14).
▪ Therefore, select the attribute 'B' for the root node split.

43
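A script along these lines can be used to check part (i), since it reproduces the 0.14 and 0.16 reductions above, and to run the analogous entropy-based computation needed for part (ii). The row layout is an assumption; the '+'/'-' labels stand in for C1/C2.

```python
import math
from collections import Counter

# (A, B, class) rows from the exercise table
rows = [("T", "F", "-"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"), ("T", "T", "-"),
        ("T", "F", "-"), ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "T", "+")]

def impurity(subset, measure):
    n = len(subset)
    probs = [c / n for c in Counter(r[-1] for r in subset).values()]
    if measure == "gini":
        return 1 - sum(p * p for p in probs)
    return -sum(p * math.log2(p) for p in probs)        # entropy

def reduction(col, measure):
    """Gini gain (or information gain) of splitting the root on column `col`."""
    n = len(rows)
    split = 0.0
    for value in {r[col] for r in rows}:
        part = [r for r in rows if r[col] == value]
        split += (len(part) / n) * impurity(part, measure)
    return impurity(rows, measure) - split

for measure in ("gini", "entropy"):
    print(measure, "A:", round(reduction(0, measure), 3), "B:", round(reduction(1, measure), 3))
# The Gini line reproduces the 0.14 (A) and 0.16 (B) reductions computed on the slides.
```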
3.1.8.
Find the first splitting attribute for the decision tree by using the
ID3 algorithm with the following dataset. (8)
[Finding the first splitting attribute – 4 marks
Calculating the gain values of attributes – 4 marks]

44
3.1.9
◼ See the table below. The goal is to build a decision tree using the ID3
algorithm to predict whether a person buys a computer game based
on the given features. The target variable is "Buy Computer Game,"
which can be either "Yes" or "No." The features are "Genre" (with
values: Action, Puzzle, Adventure) and "Price Range" (with values:
Low, Medium, High). Which node will be selected at the root?
Data Point  Genre      Price Range  Buy Game
1           Action     Low          Yes
2           Puzzle     Medium       No
3           Adventure  High         Yes
4           Action     Medium       No
5           Puzzle     Low          Yes
6           Adventure  Medium       Yes
7           Action     High         No
8           Puzzle     Medium       No
9           Adventure  Low          Yes
10          Action     Low          Yes
45
The target variable is "Buy Computer Game," which can be either
"Yes" or "No." Which node will be selected at the root?

Data Point  Genre  Price Range  Buy Computer Game
1 Action Low Yes
2 Puzzle Medium No
3 Adventure High Yes
4 Action Medium No
5 Puzzle Low Yes
6 Adventure Medium Yes
7 Action High No
8 Puzzle Medium No
9 Adventure Low Yes
10 Action Low Yes

46
Root Node Split by Price

The attribute ‘Price Range’ splits the root node into 3 subsets (the values in brackets give the counts of Yes and No):
  Low     I(4,0)   (4 Yes : 0 No)
  Medium  I(1,3)   (1 Yes : 3 No)
  High    I(1,1)   (1 Yes : 1 No)
47
Root Node Split by Price
• I(4,0) = 0
• I(1,3) = −(1/4) log2(1/4) − (3/4) log2(3/4) = 0.811
• I(1,1) = 1
• Info(D, Price) = weighted sum of the entropy of the three data subsets
  = (4/10) I(4,0) + (4/10) I(1,3) + (2/10) I(1,1)
  = 0 + (4/10)(0.811) + (2/10)(1) = 0.52

48
Sort by Genre                    Sort by Price Range
Genre      Buy Game              Price Range  Buy Game
Action     Yes                   Low          Yes
Action     Yes                   Low          Yes
Action     No                    Low          Yes
Action     No                    Low          Yes
Adventure  Yes                   Medium       Yes
Adventure  Yes                   Medium       No
Adventure  Yes                   Medium       No
Puzzle     Yes                   Medium       No
Puzzle     No                    High         Yes
Puzzle     No                    High         No
49
Root Node Split by Genre

• Genre-wise: 4 Action, 3 Adventure, and 3 Puzzle games
• Info(D, Genre) = (4/10) I(2,2) + (3/10) I(3,0) + (3/10) I(1,2)
  = (4/10)(1) + 0 + (3/10)[−(1/3) log2(1/3) − (2/3) log2(2/3)]
  = 0.4 + (3/10)(0.918) = 0.68
• Info(D, Genre) = 0.68
• Recall that Info(D, Price) = 0.52
• Info(D, Price) < Info(D, Genre)
∴ Information gain by Price > information gain by Genre
∴ The Price Range attribute must be selected for the root node split
50
3.1.11 (From Data Analytics, S6)

51
52
3.1.12

53
54
Module – 3.1 Classification

1.1 Classification- Introduction,

1.2 Decision tree construction principle,

1.3 Splitting indices -Information Gain, Gini index,

1.4 Decision tree construction algorithms-ID3,

1.5 Decision tree construction with presorting-SLIQ

55
Decision tree construction with presorting-SLIQ

3.1.5. Explain the construction of decision tree using SLIQ algorithm


with an example. (8)
Explain the working of SLIQ algorithm (6)
• SLIQ algorithm explanation – 4 marks
• Example/ diagram 2 marks
• References
1. https://www.cs.cmu.edu/~natassa/courses/15-721/papers/mehta96sliq.pdf
2. Mehta, M., Agrawal, R., & Rissanen, J. (1996). SLIQ: A fast scalable
classifier for data mining. In Advances in Database Technology—
EDBT'96: 5th International Conference on Extending Database
Technology Avignon, France, March 25–29, 1996 Proceedings 5
(pp. 18-32). Springer Berlin Heidelberg.
SLIQ

• Is a decision tree classifier


• Handles both numeric and categorical attributes.
SLIQ Lists
• Attribute List
• Each entry in the sorted attribute list contains two values.
• Sorted Attribute Values: A list of attribute values in
sorted order.
• Pointers to the Class List: Each attribute value links to
corresponding entries in the class list.
• Class list.
• There is an entry for every data item in this list.
• It consists of
• the data items’ class-label
• and the data item’s location in the decision tree.
Dataset
Index | Age | Income | Education Level | Class Label |

0 | 25 | 50000 | High School | Yes |


1 | 30 | 60000 | College | No |
2 | 30 | 70000 | College | Yes |
3 | 35 | 80000 | Graduate | Yes |
4 | 35 | 90000 | Graduate | No |

Attribute List Class List


Age | Pointer to Class List Index | Class Label | Node |
--------------------------- ----------------------------
25 | Index 0 0 | Yes | N0 |
30 | Index 1, Index 2 1 | No | N1 |
35 | Index 3, Index 4 2 | Yes | N2 |
3 | Yes | N3 |
4 | No | N4 |
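One way to picture the two SLIQ lists in code; an illustrative sketch only (the record layout and the use of a single root label are assumptions, following the algorithm description that all records start at the root):

```python
# Toy records mirroring the table above: (Age, Income, Education Level, Class Label)
records = [
    (25, 50000, "High School", "Yes"),
    (30, 60000, "College",     "No"),
    (30, 70000, "College",     "Yes"),
    (35, 80000, "Graduate",    "Yes"),
    (35, 90000, "Graduate",    "No"),
]

# Class list: one entry per record, holding the class label and the record's current tree node.
class_list = [{"label": cls, "node": "root"} for (_age, _inc, _edu, cls) in records]

# Attribute list: attribute values in sorted order, each pointing back into the class list.
def attribute_list(col):
    return sorted((rec[col], idx) for idx, rec in enumerate(records))

age_list = attribute_list(0)       # [(25, 0), (30, 1), (30, 2), (35, 3), (35, 4)]
income_list = attribute_list(1)

# During tree growth the attribute lists are never re-sorted; only the 'node'
# field of the class list entries is updated as records move to child nodes.
for value, idx in age_list:
    print(value, "->", class_list[idx])
```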
SLIQ Tree Growth Algorithm
The SLIQ (Supervised Learning in Quest) algorithm constructs
a decision tree in a level-wise (breadth-first) manner rather
than the traditional recursive depth-first approach.
1. Initial Step – Prepare sorted attribute list and class list
▪ The dataset is first sorted for each attribute separately.
▪ This sorting is done once at the beginning to avoid
repeated sorting during the tree growth.
▪ Each attribute value links to corresponding entries in the
class list.
▪ The class list contains one entry for each data
item: the data item’s class label and the data item’s
location (node number) in the decision tree.
▪ Initially, assume that all data items are in the root node
SLIQ Tree Growth Algorithm
2. Entropy Calculation at Each Level (Impure Frontier Nodes)
• This is an iterative step.
• At each step, the algorithm processes all the impure
nodes at the current frontier simultaneously (breadth-
first) to effect node split.
• For splitting a node
• Scan all the pre-sorted attribute lists. For each
attribute, calculate the entropy for all distinct attribute
values.
• Select the best attribute for splitting the given node
SLIQ Tree Growth Algorithm
3. The frontier nodes are split, and the tree is expanded to a
new frontier. Scan the sorted attribute lists once to update
the new node location in the class list
4. Repeat steps 2 and 3 until a termination criterion is met:
• Leaf nodes are pure
• No further significant splits can be made.
• A predefined stopping criterion (like a minimum number
of samples per node) is met.
Key Advantages of SLIQ
1. Optimized Sorting: Numeric attributes are sorted once,
reducing computational overhead.
2. Breadth-first tree-growing strategy.
• All nodes at a level are processed simultaneously. Therefore,
redundant calculations are avoided, and splitting is efficient.
• This strategy also enables SLIQ to classify disk-resident datasets.
3. Uses a fast sub-setting algorithm to determine splits for
categorical attributes.
4. Uses a tree-pruning algorithm based on the minimum
description length (MDL) principle. This algorithm is
inexpensive, and results in compact and accurate trees.
The combination of the above techniques enables SLIQ to scale to
large data sets and to classify data sets with many classes and
many attributes.
Example …
Dataset, Sorted Attribute Lists, and Class List (training data after pre-sorting)

Training data                Age list (sorted)        Salary list (sorted)       Class list
Index  Age  Salary  Class    Age  Class List Index    Salary  Class List Index   Index  Class  Leaf
1      30   65      G        23   2                   15      2                  1      G      N1
2      23   15      B        30   1                   40      4                  2      B      N1
3      40   75      G        40   3                   60      6                  3      G      N1
4      55   40      B        45   6                   65      1                  4      B      N1
5      55   100     G        55   5                   75      3                  5      G      N1
6      45   60      G        55   4                   100     5                  6      G      N1
Example continued …
• Assume that the root node is split using (Age <= 35).
• The left and right children are then split using the salary attribute:

(Age <= 35)
  left:  (Salary <= 40)
  right: (Salary <= 50)
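The benefit of the pre-sorted attribute lists is that candidate numeric splits such as (Age <= v) or (Salary <= v) can be evaluated in a single scan while running class histograms are maintained. The sketch below illustrates that idea for a single node using the Gini index; it is a simplification, not the paper's evaluation code, and the threshold it reports need not match the (Age <= 35) split assumed in the example above.

```python
from collections import Counter
from itertools import groupby

# (Age, Salary, Class) records from the SLIQ example above
data = [(30, 65, "G"), (23, 15, "B"), (40, 75, "G"),
        (55, 40, "B"), (55, 100, "G"), (45, 60, "G")]

def gini(counts):
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

def best_numeric_split(col):
    """Scan one pre-sorted attribute list, keeping running class histograms."""
    n = len(data)
    left, right = Counter(), Counter(r[-1] for r in data)
    best_value, best_gini = None, float("inf")
    attribute_list = sorted((r[col], r[-1]) for r in data)     # (value, class) pairs in sorted order
    for value, group in groupby(attribute_list, key=lambda pair: pair[0]):
        for _, label in group:                                 # move this value's records to the left side
            left[label] += 1
            right[label] -= 1
        if sum(right.values()) == 0:                           # '<= largest value' is not a real split
            break
        weighted = (sum(left.values()) / n) * gini(left) + (sum(right.values()) / n) * gini(right)
        if weighted < best_gini:
            best_value, best_gini = value, weighted
    return best_value, best_gini

print("Age:", best_numeric_split(0))       # best 'Age <= v' candidate for this node
print("Salary:", best_numeric_split(1))    # best 'Salary <= v' candidate
```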
SLIQ -
More Details and Illustrations

66
MDL
• The pruning strategy used in SLIQ is based on the principle of
Minimum Description Length (MDL)
• The MDL principle states that the best model for encoding data is
the one that minimizes the sum of the cost of describing the
data in terms of the model and the cost of describing the
model.
• If M is a model that encodes the data D, the total cost of the
encoding is:
cost(M, D) = cost(D|M) + cost(M),
where cost(D|M) is the number of bits of encoding the data given
a model M and cost(M) is the cost of encoding the model M.
• The models are the set of trees obtained by pruning the initial
decision tree T, and the data is the training data set S. The
objective of MDL pruning is to find the subtree of T that best
describes the training set S.
Subsetting for Categorical Attributes
• Let S be the set of possible values of a categorical attribute A
• The evaluation of all the subsets of S can be very expensive,
if the cardinality of S is large.
• SLIQ uses a hybrid approach to overcome this issue. If the
cardinality of S is less than a threshold, MAXSETSIZE, then all
of the subsets of S are evaluated
• Otherwise, a greedy algorithm is used to obtain the desired
subset. The greedy algorithm starts with an empty subset S’
and adds that one element of S to S’ which gives the best
split. The process is repeated until there is no improvement
in the splits. This hybrid approach finds the optimal subset, if
S is small. It also performs well for larger subsets.
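A condensed sketch of that hybrid strategy for one categorical attribute is shown below; it is illustrative only. The MAXSETSIZE threshold, the evaluation of a binary split, and the greedy growth loop follow the description above, while the Gini-based scoring, the data layout, and the example rows (the 'income' column paired with the buys_computer label) are assumptions.

```python
from collections import Counter
from itertools import chain, combinations

def gini(counts):
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def split_gini(rows, subset):
    """Weighted Gini of the binary split: attribute value in `subset` vs. not in `subset`."""
    left = Counter(label for value, label in rows if value in subset)
    right = Counter(label for value, label in rows if value not in subset)
    n = len(rows)
    return sum(left.values()) / n * gini(left) + sum(right.values()) / n * gini(right)

def best_subset(rows, maxsetsize=10):
    values = {value for value, _ in rows}
    if len(values) <= maxsetsize:
        # Small domain: evaluate every non-empty proper subset (exhaustive search).
        candidates = chain.from_iterable(combinations(values, k) for k in range(1, len(values)))
        return min((set(c) for c in candidates), key=lambda s: split_gini(rows, s))
    # Large domain: grow the subset S' greedily, one value at a time.
    subset, best_score = set(), float("inf")
    while len(subset) < len(values):
        score, value = min((split_gini(rows, subset | {v}), v) for v in values - subset)
        if score >= best_score:
            break                              # no improvement in the split: stop
        subset.add(value)
        best_score = score
    return subset

# Example: the 'income' values paired with the buys_computer label from the 14-row table
rows = [("high", "no"), ("high", "no"), ("high", "yes"), ("medium", "yes"),
        ("low", "yes"), ("low", "no"), ("low", "yes"), ("medium", "no"),
        ("low", "yes"), ("medium", "yes"), ("medium", "yes"), ("medium", "yes"),
        ("high", "yes"), ("medium", "no")]
print(best_subset(rows))                       # subset giving the lowest weighted Gini
```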
Module – 3.2
Classification Accuracy-Precision, Recall.
1.1 Classification- Introduction,
1.2 Decision tree construction
principle,
1.3 Splitting indices -
Information Gain, Gini
index,
1.4 Decision tree construction
algorithms-ID3,
1.5 Decision tree construction
with presorting-SLIQ
2 Classification Accuracy-
Precision, Recall.

1
3.2.1 Exercise
Suppose the dataset has 9700 cancer-free images out of 10000
images from patients. A clinic conducts cancer tests.
The test finds 230 positive. However, 140 are wrongly
categorized as positive. Find the precision, recall and accuracy.
Is it a good classifier? Justify.

2
3.2.1 Exercise
Actual

+ 300

- 9700

10000

The dataset has 9700 cancer-free images from 10000


images from patients.

3
3.2.1 Exercise
+ -

Predicted 230 9770 10000

There are 10000 images. A clinic conducts cancer tests.


The test finds 230 positive.

4
3.2.1 Exercise
+ - Actual

+ 300

- 9700

Predicted 230 9770 10000

The dataset has 9700 cancer-free images from 10000


images from patients.
A clinic conducts cancer tests. The test finds 230
positive.

5
3.2.1 Exercise
+ - Actual

+ 300

- 140 (FP) 9700

Predicted 230 9770 10000

A clinic conducts cancer tests. The test finds 230


positive. However, 140 are wrongly categorized as
positive – False Positive

6
3.2.1 Exercise
+ - Actual

+ 90 (TP) 300

- 140 (FP) 9700

Predicted 230 9770 10000

The test finds 230 positive. However, 140 are wrongly


categorized as positive – False Positive.
Therefore, 90 are True positive

7
3.2.1 Exercise
+ - Actual

+ 90 (TP) 210 (FN) 300

- 140 (FP) 9700

Predicted 230 9770 10000

The test finds 230 positive. However, 140 are wrongly


categorized as positive – False Positive.
Therefore, 90 are True Positive.
210 are False Negative

8
3.2.1 Exercise
+ - Actual

+ 90 (TP) 210 (FN) 300

- 140 (FP) 9560 (TN) 9700

Predicted 230 9770 10000

Total images 10000; Cancer-free images 9700.


A clinic conducts cancer tests. The test finds 230
positive. But 140 are wrongly categorized as positive.

9
3.2.1 Exercise
Confusion matrix (rows: actual class, columns: predicted class):

                    Predicted +   Predicted -
Actual +            TP            FN
Actual -            FP            TN

Recall = TP / (TP+FN)
Precision = TP / (TP+FP)
Accuracy = (TP+TN) / (TP + FP + FN + TN)

10
3.2.1 Exercise
                    Predicted +   Predicted -   Total
Actual +            90 (TP)       210 (FN)      300
Actual -            140 (FP)      9560 (TN)     9700
Total               230           9770          10000

Precision = TP / (TP+FP) = 90 / (90+140) = 39%
Recall = TP / (TP+FN) = 90 / (90+210) = 30%
Accuracy = (TP+TN) / (TP + FP + FN + TN) = (90 + 9560) / 10000 = 96.5%
11
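The same arithmetic, expressed as a small helper (a minimal sketch):

```python
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

p, r, a = metrics(tp=90, fp=140, fn=210, tn=9560)
print(f"precision={p:.1%}  recall={r:.1%}  accuracy={a:.1%}")   # ~39.1%, 30.0%, 96.5%
```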
Precision, Recall, Accuracy – A Comparison
◼ Accuracy is best when both false positives and false negatives
need balance (e.g., surveillance studies like tracking covid cases
globally).
◼ Precision matters when false positives are risky (e.g., immunity
tests).
◼ Recall matters when missing real cases (or false negatives) is
dangerous (e.g., the presence of cancer, airport screening).

12
Classifier Performance:
Precision and Recall, and F-measures

Accuracy = {TP + TN} / {TP + FP + FN + TN}

13
3.2.2 Exercise
◼ A binary classification result, where the correct labels are [T, T,
F, T, F, T, F, T] and the predicted labels are [T, F, T, T, F, F, F,
T]. Assume T means “true” (the desired class) and F (“false”) is
the “default” class. Compute the recall, precision, and accuracy.

• True labels:

• [ T, T, F, T, F, T, F, T]

• Prediction:

• [ T, F, T, T, F, F, F, T]

14
3.2.2 Exercise (1. Actual, 2. Prediction)
[ T, T, F, T, F, T, F, T]
True Positives (3)
[ T, F, T, T, F, F, F, T]

[ T, T, F, T, F, T, F, T]
False Positives (1)
[ T, F, T, T, F, F, F, T]

[ T, T, F, T, F, T, F, T]
True Negatives (2)
[ T, F, T, T, F, F, F, T]

[ T, T, F, T, F, T, F, T]
False Negatives (2)
[ T, F, T, T, F, F, F, T]

15
3.2.2 Exercise
◼ Prediction : [ T, F, T, T, F, F, F, T]
◼ True labels: [ T, T, F, T, F, T, F, T]

◼ TP = 3, FP = 1, FN = 2, TN = 2

                    Predicted +   Predicted -
Actual +            3 (TP)        2 (FN)
Actual -            1 (FP)        2 (TN)

◼ P = Precision = TP/(TP+FP) = 3 / (3+1) = 3/4
◼ R = Recall = TP/(TP+FN) = 3 / (3+2) = 3/5
◼ Accuracy = (TP+TN) / (TP + FP + FN + TN) = (3+2) / 8 = 5/8

16
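When the input is a pair of label lists, as here, the four counts can be tallied pairwise before applying the same formulas (a minimal sketch):

```python
actual    = ["T", "T", "F", "T", "F", "T", "F", "T"]
predicted = ["T", "F", "T", "T", "F", "F", "F", "T"]

pairs = list(zip(actual, predicted))
tp = sum(a == "T" and p == "T" for a, p in pairs)   # 3
fp = sum(a == "F" and p == "T" for a, p in pairs)   # 1
fn = sum(a == "T" and p == "F" for a, p in pairs)   # 2
tn = sum(a == "F" and p == "F" for a, p in pairs)   # 2

print(tp / (tp + fp), tp / (tp + fn), (tp + tn) / len(pairs))   # 0.75 (3/4), 0.6 (3/5), 0.625 (5/8)
```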
3.2.3 Exercise
◼ Suppose a computer program for recognizing dogs in
photographs identifies eight dogs in a picture containing 12
dogs and some cats. Of the eight dogs identified, five actually
are dogs, while the rest are cats. Compute the recall, precision,
F1 score, and accuracy.

17
3.2.3 Exercise
Actual

Dogs 12

Cats ?

a picture containing 12 dogs and some cats

18
3.2.3 Exercise
Dogs Cats

Predicted 8

a computer program identifies 8 dogs.


Of the 8 dogs identified, 5 actually are dogs, while the
rest are cats.

19
3.2.3 Exercise
Dogs Cats Actual

Dogs 5 12

Cats 3 ?

Predicted 8

A picture containing 12 dogs and some cats.


A computer identifies 8 dogs. Of these, 5 are actually
dogs

20
3.2.3 Exercise
Dogs Cats Actual

Dogs 5 7 12

Cats 3 ?

Predicted 8

A picture containing 12 dogs and some cats.


Computer identifies 8 dogs. Of these, 5 are actually dogs.
Therefore, there are 7 cats

21
3.2.3 Exercise
Dogs Cats Actual

Dogs 5 (TP) 7 (FN) 12

Cats 3 (FP) ?

Predicted 8

P = Precision = TP/(TP+FP) = 5 / (5+3) = 5 / 8
R = Recall = TP/(TP+FN) = 5 / (5+7) = 5 / 12
F Score = 2.P.R / (P+R) = 0.5

22
3.2.3 Exercise
Dogs Cats Actual

Dogs 5 (TP) 7 (FN) 12

Cats 3 (FP) TN? ?

Predicted 8

Accuracy = (TP+TN) / (TP + FP + FN + TN);


The number of cats is not given. So, let us consider
only the accuracy of predicting dogs alone.
Accuracy = TP / (TP + FN) = 5 / 12

23
June 2023
A database contains 80 records on a particular topic of which 55
are relevant to a certain investigation.
A search was conducted on that topic and 50 records were
retrieved.
Of the 50 records retrieved, 40 were relevant.
Construct the confusion matrix and calculate the precision and
recall scores for the search.

24
3.2.3 Exercise
Actual

Relevant 55
Not
Relevant

80

A database contains 80 records on a particular topic of


which 55 are relevant to a certain investigation

25
3.2.3 Exercise
Not
Relevant
Relevant
40

Predicted 50

A search was conducted on that topic


50 records were retrieved.
Of the records retrieved, 40 were relevant

26
3.2.3 Exercise
Not
Relevant Actual
Relevant
Relevant 40 55
Not
Relevant

Predicted 50 80

A database contains 80 records on a particular topic of


which 55 are relevant to a certain investigation.
A search was conducted on that topic. 50 records were
retrieved. Of the records retrieved, 40 were relevant

27
3.2.3 Exercise
Not
Relevant Actual
Relevant
Relevant 40 15 55
Not
10
Relevant

Predicted 50 80

28
3.2.3 Exercise

                     Predicted Relevant   Predicted Not Relevant   Total
Actual Relevant      40 (TP)              15 (FN)                  55
Actual Not Relevant  10 (FP)              TN?
Total                50                                            80

P = Precision = 40 / (40 + 10) = 40 / 50
R = Recall = 40 / (40 + 15) = 40 / 55
F Score = 2.P.R / (P+R)

29
3.2.3 Exercise

                     Predicted Relevant   Predicted Not Relevant   Total
Actual Relevant      40 (TP)              15 (FN)                  55
Actual Not Relevant  10 (FP)              15 (TN)                  25
Total                50                   30                       80

Accuracy = (TP+TN) / (TP + FP + FN + TN)
Here 80 − 55 = 25 records are not relevant and 10 of them were retrieved, so TN = 25 − 10 = 15.
Accuracy = (40 + 15) / 80 = 55 / 80 ≈ 0.69
(Restricted to the relevant documents alone, TP / (TP + FN) = 40 / 55, which equals the recall.)

30
May 2024
Draw the confusion matrix and calculate precision and recall of the given
data. (3)
Data Target Prediction
1 cat cat
2 cat dog
3 dog dog
4 dog dog
5 dog cat

31
