Chapter 02_DM tasks_Part I_Classification
Classification: Definition
• Classification is a data mining (machine learning) technique used to predict group
membership for data instances.
• Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class:
– Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model.
– Usually, the given data set is divided into training and test sets; the training set is used to build the model and the test set is used to validate it (see the code sketch below).
• For example, one may use classification to predict whether the weather on a
particular day will be “sunny”, “rainy” or “cloudy”.
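As a concrete illustration of the workflow just described (divide the data, build the model on the training set, validate on the test set), here is a minimal sketch using scikit-learn; the library choice, the toy iris dataset and all variable names are assumptions made for illustration, not part of the original slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Records whose class attribute is known
X, y = load_iris(return_X_y=True)

# Divide the data set into a training set (to build the model) and a test set (to validate it)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Find a model for the class attribute as a function of the other attributes
model = DecisionTreeClassifier().fit(X_train, y_train)

# Previously unseen records are assigned a class; the test set measures accuracy
print("Test-set accuracy:", accuracy_score(y_test, model.predict(X_test)))
```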
Supervised vs. Unsupervised Learning (1)
❑ Supervised learning (classification)
❑ Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the classes to which they belong
❑ New data is classified based on the model built from the training set
Training data with class labels (the labelled training instances are fed to a learning algorithm, which builds the model):

Outlook  | Temp | Humidity | Windy | Play Golf
Rainy    | Hot  | High     | False | No
Rainy    | Hot  | High     | True  | No
Overcast | Hot  | High     | False | Yes
Confusion Matrix & Performance Evaluation
                      PREDICTED CLASS
                      Class=Yes    Class=No
ACTUAL   Class=Yes    a (TP)       b (FN)
CLASS    Class=No     c (FP)       d (TN)

• Accuracy = (a + d) / (a + b + c + d)
• Other metrics for performance evaluation are Precision, Recall & F-Measure
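For concreteness, these measures can be computed directly from the four counts in the matrix; the counts below are made-up values used only for illustration:

```python
# a = TP, b = FN, c = FP, d = TN (made-up example counts)
a, b, c, d = 50, 10, 5, 35

accuracy  = (a + d) / (a + b + c + d)
precision = a / (a + c)   # of the records predicted Yes, the fraction that are truly Yes
recall    = a / (a + b)   # of the truly Yes records, the fraction that were found
f_measure = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f_measure)
```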
Classification methods
• Goal: predict the class C = f(x1, x2, …, xn)
• There are various classification methods. Popular classification techniques include the following.
– Decision tree classifier: divides the decision space into piecewise-constant regions.
Decision Tree Induction: Algorithm
❑ Decision tree performs classification by constructing a tree based on training instances.
❑ The tree is traversed for each test instance to find a leaf, and the class of the leaf is the
predicted class.
❑ Basic algorithm (a code sketch follows this list)
❑ Tree is constructed in a top-down, recursive, divide-and-conquer manner
❑ At the start, all the training examples are at the root
❑ At each node, the splitting attribute is selected on the basis of a heuristic or statistical measure (e.g., information gain, Gini index) computed from the training examples at that node
❑ Conditions for stopping partitioning
❑ All samples for a given node belong to the same class
❑ There are no remaining attributes for further partitioning – majority voting is employed
for classifying the leaf
❑ There are no samples left
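The basic algorithm above can be written as a short recursive function. The following is a minimal sketch, assuming records are Python dicts of attribute values and information gain is the selection measure; the function and variable names are illustrative, not from the slides:

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information (disorder) needed to classify the given labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(records, labels, attr):
    """Reduction in entropy obtained by splitting on attr."""
    total = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in records):
        subset = [lab for r, lab in zip(records, labels) if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def build_tree(records, labels, attributes):
    # Stopping condition: all samples at this node belong to the same class
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping condition: no remaining attributes -> majority voting for the leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute with the best heuristic score and partition on it
    best = max(attributes, key=lambda a: information_gain(records, labels, a))
    tree = {best: {}}
    for value in set(r[best] for r in records):
        idx = [i for i, r in enumerate(records) if r[best] == value]
        tree[best][value] = build_tree([records[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != best])
    return tree
```

Classifying a test instance then amounts to walking down the nested dictionaries until a leaf (class label) is reached.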
Attribute Selection Measure
• Information gain
–Select the attribute with the highest information gain
• First, compute the disorder using Entropy; the expected information needed to classify
objects into classes
• Second, measure the Information Gain: by how much the disorder of a set would be reduced by knowing the value of a particular attribute.
– With the information gain measure we want:
• a large gain
• equivalently: a small average disorder after the split
• GINI index
– An alternative to information gain that measures the impurity of a split in the classification task
– Select the attribute with the smallest GINI value
Entropy
• The Entropy measures the disorder of a set S containing a total of n examples, of which n+ are positive and n− are negative, and it is given by:
$$\mathrm{Entropy}(S) = -\frac{n_+}{n}\log_2\frac{n_+}{n} - \frac{n_-}{n}\log_2\frac{n_-}{n}$$
• Information Gain: measures the reduction in Entropy achieved by the split; choose the split that achieves the largest reduction (maximizes GAIN)
– Used in ID3 and C4.5
– Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure
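Putting the two steps together, the gain of an attribute A on a set S can be written out as follows (a standard formulation, included here because the slide's formula image is not reproduced):

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

For example, a set with 9 positive and 5 negative examples has

$$\mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940$$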
Decision Tree Induction: An Example
Outlook has the highest gain, therefore it will be the root node. We cannot yet decide on the Sunny and Rainy branches, because Sunny has 2 yes and 3 no and Rainy has 3 yes and 2 no; but we can decide on Overcast, because all of its examples are categorized as Yes.
For Sunny:
Attribute Selection by Information Gain
• Class P: buys_computer = “yes”
• Class N: buys_computer = “no”
• $\mathrm{Info}(D) = I(9, 5) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940$
• Compute the entropy for age: $\mathrm{Info}_{age}(D) = \tfrac{5}{14} I(2,3) + \tfrac{4}{14} I(4,0) + \tfrac{5}{14} I(3,2) = 0.694$
• Hence $\mathrm{Gain}(age) = \mathrm{Info}(D) - \mathrm{Info}_{age}(D) = 0.940 - 0.694 = 0.246$
• Similarly, the gains of income, student and credit_rating are computed; age gives the highest gain and is therefore chosen as the splitting attribute.
Output: A Decision Tree for “buys_computer”
[Decision tree: the root node tests age. The branch "<=30" leads to a test on student (no → buys_computer = no, yes → buys_computer = yes); the branch "31..40" leads directly to buys_computer = yes; the branch ">40" leads to a test on credit_rating (excellent → yes, fair → no), as captured by the rules below.]
Classification Rules
IF age = “<=30” & student = “no” THEN buys_computer = “no”
IF age = “<=30” & student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” & credit_rating = “excellent” THEN buys_computer = “yes”
IF age = “>40” & credit_rating = “fair” THEN buys_computer = “no”
Exercise 2: The problem of “Sunburn”
• You want to predict whether another person is likely to get sunburned if they go back to the beach. How can you do this?
• Data collected: make the prediction based on the observed properties of the people
Exercise 3: 'Is the customer Good, Doubtful or Poor?'
Customer ID | Debt   | Income | Marital Status | Risk
Abel        | High   | High   | Married        | Good
Ben         | Low    | High   | Married        | Doubtful
Candy       | Medium | Low    | Unmarried      | Poor
Dale        | High   | Low    | Married        | Poor
Ellen       | High   | Low    | Married        | Poor
Fred        | High   | Low    | Married        | Poor
George      | Low    | High   | Unmarried      | Doubtful
Harry       | Low    | Medium | Married        | Doubtful
Igor        | Low    | High   | Married        | Good
Jack        | High   | High   | Married        | Doubtful
Kate        | Low    | Low    | Married        | Poor
Lane        | Medium | High   | Unmarried      | Good
Mary        | High   | Low    | Unmarried      | Poor
Nancy       | Low    | Medium | Unmarried      | Doubtful
Othello     | Medium | High   | Unmarried      | Good
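One possible way to check an answer to this exercise is to encode the table and let a library build the tree. The sketch below uses pandas and scikit-learn; the library choice, the ordinal encoding of the nominal attributes and all names are assumptions made here for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# The exercise table, transcribed from the slide
data = pd.DataFrame({
    "Debt":   ["High","Low","Medium","High","High","High","Low","Low","Low","High","Low","Medium","High","Low","Medium"],
    "Income": ["High","High","Low","Low","Low","Low","High","Medium","High","High","Low","High","Low","Medium","High"],
    "Marital Status": ["Married","Married","Unmarried","Married","Married","Married","Unmarried","Married","Married","Married","Married","Unmarried","Unmarried","Unmarried","Unmarried"],
    "Risk":   ["Good","Doubtful","Poor","Poor","Poor","Poor","Doubtful","Doubtful","Good","Doubtful","Poor","Good","Poor","Doubtful","Good"],
})

# Categorical attributes must be encoded numerically for scikit-learn
enc = OrdinalEncoder()
X = enc.fit_transform(data[["Debt", "Income", "Marital Status"]])
y = data["Risk"]

# Build a tree using entropy (information gain) as the attribute selection measure
tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X, y)
print(export_text(tree, feature_names=["Debt", "Income", "Marital Status"]))
```

Note that the ordinal encoding turns the nominal attributes into numbers, so the library produces binary splits rather than the multiway splits used in the lecture examples.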
Pros and Cons of decision trees
Pros:
• Reasonable training time
• Fast application
• Easy to interpret
• Easy to implement
• Can handle a large number of features

Cons:
• Cannot handle complicated relationships between features
• Simple decision boundaries
• Problems with lots of missing data
Classification: Basic Concepts
• Classification: Basic Concepts
• Decision Tree Induction
• Bayes Classification Methods
• Lazy Learners (or learning from your neighbors)
• Linear Classifiers
• Model Evaluation and Selection
• Techniques to Improve Classification Accuracy
• Summary
Why Bayesian Classification?
• Provides practical learning algorithms
– Probabilistic learning: calculate explicit probabilities for hypotheses, e.g. Naïve Bayes
• Prior knowledge and observed data can be combined
– Incremental: Each training example can incrementally increase/decrease the
probability that a hypothesis is correct.
• It is a generative (model based) approach, which offers a useful conceptual
framework
– Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
– Any kind of object (e.g. sequences) can be classified, based on a probabilistic model specification
Bayesian Classifiers
• Approach:
– Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes' theorem:
$$P(C \mid A_1, A_2, \ldots, A_n) = \frac{P(A_1, A_2, \ldots, A_n \mid C)\,P(C)}{P(A_1, A_2, \ldots, A_n)}$$
– Choose the value of C that maximizes this posterior. The naïve Bayes classifier additionally assumes the attributes are conditionally independent given the class, so that $P(A_1, \ldots, A_n \mid C) = \prod_i P(A_i \mid C)$.
Play-tennis example
Based on the examples in the table, classify the following unseen sample X :
x=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=strong)
• That means: Play tennis or not?
• Working (9 of the 14 training examples are "yes", 5 are "no"):
– P(yes)·P(Sunny|yes)·P(Cool|yes)·P(High|yes)·P(Strong|yes) = (9/14)(2/9)(3/9)(3/9)(3/9) = 0.0053
– P(no)·P(Sunny|no)·P(Cool|no)·P(High|no)·P(Strong|no) = (5/14)(3/5)(1/5)(4/5)(3/5) = 0.0206
– Since 0.0206 > 0.0053, the sample is classified as: do not play tennis.
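The working above can be checked with a few lines of Python; the probabilities are read off the training table as in the slide, and everything else is illustrative:

```python
from math import prod

# Naive Bayes score for each class: prior * product of class-conditional probabilities
# for x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
score_yes = prod([9/14, 2/9, 3/9, 3/9, 3/9])   # P(yes) * prod_i P(attribute_i | yes)
score_no  = prod([5/14, 3/5, 1/5, 4/5, 3/5])   # P(no)  * prod_i P(attribute_i | no)

print(round(score_yes, 4), round(score_no, 4))             # 0.0053 0.0206
print("Play tennis = Yes" if score_yes > score_no else "Play tennis = No")
```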
Exercise: Naïve Bayes Classifier
A: attributes
M: mammals
N: non-mammals
Naive Bayesian Classifier
• Advantages
–Easy to implement
–Good results obtained in most of the cases
–Robust to isolated noise points
–Handle missing values by ignoring the instance during probability estimate calculations
–Robust to irrelevant attributes
• Disadvantages
–The class conditional independence assumption may not hold for some attributes, which leads to a loss of accuracy
–In practice, dependencies exist among variables
• E.g. hospitals: patients: profile: age, family history, etc. symptoms: fever, cough etc.
Disease: lung cancer, diabetes, etc.
• Dependencies among these cannot be modeled by Naïve Bayesian classifier
– How to deal with these dependencies? Bayesian Belief Networks
Neural Network
Brain and Machine
• The Brain
– Pattern Recognition
– Association
– Complexity
– Noise Tolerance
• The Machine
– Calculation
– Precision
– Logic
Neural Network classifier
• It is represented as a layered set of interconnected processors. These processing nodes are loosely modelled on the neurons of the brain. Each node has a weighted connection to several other nodes in adjacent layers. Individual nodes take the inputs received from connected nodes and use the weights to compute output values.
• The inputs are fed simultaneously into the input layer.
• The weighted outputs of these units are fed into hidden layer.
• The weighted outputs of the last hidden layer are inputs to units making up the
output layer.
Architecture of Neural network
• Neural networks are used to look for patterns in data, learn these patterns, and then
classify new patterns & make forecasts
• A network with only the input and output layers is called a single-layer neural network, whereas a multilayer neural network is a generalization with one or more hidden layers.
– A network containing two hidden layers is called a three-layer neural network, and so on.
• Each neuron in the network has three basic components:
1. A set of connecting links (synapses), each characterized by a weight
2. An adder function (linear combiner) for computing the weighted sum of the (real-valued) inputs: $\mathrm{net} = \sum_i w_i x_i - w_0$, where $w_0$ is the threshold weight carried by a fixed bias input of −1
3. An activation function (also called squashing function) for limiting the output behavior of the neuron: $y = f(\mathrm{net})$
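A single neuron therefore reduces to a few lines of code; the sketch below is an assumed illustration of the adder function followed by two common activation functions (a hard threshold and a sigmoid), using the same w0-as-threshold convention as the slides:

```python
import math

def adder(weights, inputs, threshold):
    """Linear combiner: weighted sum of the inputs minus the threshold w0."""
    return sum(w * x for w, x in zip(weights, inputs)) - threshold

def step(net):
    """Hard-limiting activation: fire (1) only if the net input is positive."""
    return 1 if net > 0 else 0

def sigmoid(net):
    """Smooth squashing activation, commonly used with back propagation."""
    return 1.0 / (1.0 + math.exp(-net))

# Example neuron with weights w1, w2 and threshold w0 (arbitrary illustrative values)
net = adder([0.72, 0.62], [1, 0], 0.42)
print(net, step(net), round(sigmoid(net), 3))
```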
Activation Functions
[Figures omitted: plots of the common activation functions used to limit a neuron's output.]
Two Topologies of neural network
• NN can be designed in a feed forward or recurrent manner
• In a feed-forward neural network, the connections between the units do not form a directed cycle.
– In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any), to the output nodes.
– In a recurrent network, by contrast, the connections may form cycles (feedback loops).
Training the neural network
• The purpose is to learn to generalize using a set of sample patterns where the
desired output is known.
• Back Propagation is the most commonly used method for training multilayer
feed forward NN.
– Back propagation learns by iteratively processing a set of training data (samples).
– For each sample, weights are modified to minimize the error between the desired
output and the actual output.
• After propagating an input through the network, the error is calculated and
the error is propagated back through the network while the weights are
adjusted in order to make the error smaller.
Training Algorithm
• The applied learning algorithm is as follows (a code sketch is given below)
–Initialize the weights and threshold to small random numbers.
–Present a vector x to the neuron inputs and calculate the output using the adder function.
–Update each weight: $w_i \leftarrow w_i + \eta\,(t - y)\,x_i$, where t is the target output and η is the learning rate (this is the rule applied in the worked example on the following slides).
–Repeat over the training vectors until the outputs match the targets.
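A minimal sketch of this loop, written to match the conventions of the worked example that follows (bias input fixed at −1, step activation, η = 0.1); the code itself is an illustrative assumption, not reproduced from the slides:

```python
def train(samples, w1, w2, w0, eta=0.1, epochs=10):
    for _ in range(epochs):
        for x1, x2, target in samples:
            net = w1 * x1 + w2 * x2 - w0        # adder function (the bias input is -1)
            y = 1 if net > 0 else 0             # step activation
            error = target - y
            w1 += eta * error * x1              # weight update rule
            w2 += eta * error * x2
            w0 += eta * error * (-1)            # the threshold is updated via the -1 bias input
    return w1, w2, w0

# The four training vectors of the following example: (x1, x2, target output)
samples = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]
print(train(samples, 0.92, 0.62, 0.22))         # converges to (0.52, 0.72, 0.52)
```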
ANN Training Example
• Given the following two inputs x1 and x2, find the equation (decision boundary) that separates the two classes.

Bias | 1st input (x1) | 2nd input (x2) | Target output
 -1  |       0        |       0        |      0
 -1  |       1        |       0        |      0
 -1  |       0        |       1        |      1
 -1  |       1        |       1        |      1

• Let us say we have the following initialization:
W1(0) = 0.92, W2(0) = 0.62, W0(0) = 0.22, η = 0.1
• Training – epochs 1 and 2 update the weights from these initial values to W1 = 0.72, W2 = 0.62, W0 = 0.42.
• Training – epoch 3:
y1 = 0.72*0 + 0.62*0 – 0.42 = –0.42 → y = 0 (correct)
y2 = 0.72*1 + 0.62*0 – 0.42 = 0.3 → y = 1 X (target is 0, so the weights are updated)
W1(3) = 0.72 + 0.1 * (0 – 1) * 1 = 0.62
W2(3) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W0(3) = 0.42 + 0.1 * (0 – 1) * (–1) = 0.52
y3 = 0.62*0 + 0.62*1 – 0.52 = 0.1 → y = 1 (correct)
y4 = 0.62*1 + 0.62*1 – 0.52 = 0.72 → y = 1 (correct)
ANN Training Example
• Training – epoch 4:
y1 = 0.62*0 + 0.62*0 – 0.52 = –0.52 → y = 0 (correct)
y2 = 0.62*1 + 0.62*0 – 0.52 = 0.10 → y = 1 X (target is 0, so the weights are updated)
W1(4) = 0.62 + 0.1 * (0 – 1) * 1 = 0.52
W2(4) = 0.62 + 0.1 * (0 – 1) * 0 = 0.62
W0(4) = 0.52 + 0.1 * (0 – 1) * (–1) = 0.62
y3 = 0.52*0 + 0.62*1 – 0.62 = 0 → y = 0 X (target is 1, so the weights are updated again)
W1(4) = 0.52 + 0.1 * (1 – 0) * 0 = 0.52
W2(4) = 0.62 + 0.1 * (1 – 0) * 1 = 0.72
W0(4) = 0.62 + 0.1 * (1 – 0) * (–1) = 0.52
y4 = 0.52*1 + 0.72*1 – 0.52 = 0.72 → y = 1 (correct)
• Finally, all four training vectors are classified correctly:
y1 = 0.52*0 + 0.72*0 – 0.52 = –0.52 → y = 0
y2 = 0.52*1 + 0.72*0 – 0.52 = 0.0 → y = 0
y3 = 0.52*0 + 0.72*1 – 0.52 = 0.2 → y = 1
y4 = 0.52*1 + 0.72*1 – 0.52 = 0.72 → y = 1
ANN Training Example
[Plot omitted: the four training points in the (x1, x2) plane, with the learned line separating the positive (+) examples from the negative (o) examples.]
Logical Functions
• McCulloch and Pitts: the basic Boolean functions AND, OR and NOT can each be implemented with a single artificial neuron (XOR cannot).
• With inputs a1, a2, a bias input a0 = −1 and threshold weight W0, the unit outputs 1 when W1·a1 + W2·a2 − W0 > 0:
– AND: W0 = 1.5, W1 = 1, W2 = 1
– OR: W0 = 0.5, W1 = 1, W2 = 1
– NOT: W0 = −0.5, W1 = −1
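These settings are easy to verify with a tiny threshold-unit simulation (an illustrative sketch, not from the slides):

```python
from itertools import product

def fires(w0, w1, w2, a1, a2=0):
    """Threshold unit: output 1 when w1*a1 + w2*a2 - w0 > 0 (the -1 bias input carries w0)."""
    return 1 if w1 * a1 + w2 * a2 - w0 > 0 else 0

for a1, a2 in product([0, 1], repeat=2):
    print(a1, a2, "AND:", fires(1.5, 1, 1, a1, a2), "OR:", fires(0.5, 1, 1, a1, a2))

for a1 in (0, 1):
    print(a1, "NOT:", fires(-0.5, -1, 0, a1))
```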
Pros and Cons of Neural Network
• Useful for learning complex data like handwriting, speech and image recognition
• Has a high tolerance to noisy and incomplete data

Pros:
+ Can learn more complicated class boundaries
+ Fast application
+ Can handle a large number of features

Cons:
– Slow training time (neural networks need a long time for training)
– Hard to interpret & understand the learned function (weights)
– Hard to implement: trial & error for choosing the number of nodes

Conclusion: use neural nets only if decision trees fail.