
Supervised and Unsupervised

Learning
Ciro Donalek
Ay/Bi 199 – April 2011
Summary
• KDD and Data Mining Tasks
• Finding the opmal approach
• Supervised Models
– Neural Networks
– Mul Layer Perceptron
– Decision Trees
• Unsupervised Models
– Dierent Types of Clustering
– Distances and Normalizaon
– Kmeans
– Self Organizing Maps
• Combining dierent models
– Commiee Machines
– Introducing a Priori Knowledge
– Sleeping Expert Framework
Knowledge Discovery in Databases
• KDD may be dened as: "The non trivial process of
idenfying valid, novel, potenally useful, and
ulmately understandable paerns in data".
• KDD is an interacve and iterave process involving
several steps.
You got your data: what’s next?

What kind of analysis do you need? Which model is more appropriate for it? …
Clean your data!
• Data preprocessing transforms the raw data
into a format that will be more easily and
effectively processed for the purpose of the
user. Use standard formats!
• Some tasks:
• sampling: selects a representative subset
from a large population of data;
• noise treatment;
• strategies to handle missing data: sometimes
your rows will be incomplete; not all
parameters are measured for all samples;
• normalization;
• feature extraction: pulls out specified data
that is significant in some particular context.
Missing Data
• Missing data are a part of almost all research, and we all have to
decide how to deal with them.
• Complete Case Analysis: use only rows with all the values.
• Available Case Analysis
• Substitution
– Mean Value: replace the missing value with the
mean value for that particular attribute
– Regression Substitution: replace the
missing value with a historical value from similar cases
– Matching Imputation: for each unit with a missing y,
find a unit with similar values of x in the observed
data and take its y value
– Maximum Likelihood, EM, etc.
• Some DM models can deal with missing data better than others.
• Which technique to adopt really depends on your data.
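As an illustration (not from the original slides), a minimal sketch of two of these strategies, complete case analysis and mean-value substitution, on an invented toy matrix:

import numpy as np

# Toy data matrix: rows are samples, columns are attributes; NaN marks missing values.
X = np.array([[5.1, 3.5, 1.4],
              [4.9, np.nan, 1.3],
              [6.2, 2.9, np.nan],
              [5.8, 2.7, 4.1]])

# Complete case analysis: keep only the rows where every attribute is present.
complete = X[~np.isnan(X).any(axis=1)]

# Mean-value substitution: replace each missing entry with the mean of
# the observed values for that attribute (column).
col_means = np.nanmean(X, axis=0)
imputed = np.where(np.isnan(X), col_means, X)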
Data Mining
• A crucial task within the KDD process.
• Data Mining is about automating the process of
searching for patterns in the data.
• In more detail, the most relevant DM tasks are:
– association
– sequence or path analysis
– clustering
– classification
– regression
– visualization
Finding Soluon via Purposes
• You have your data, what kind of analysis do you need?

• Regression
– predict new values based on the past, inference
– compute the new values for a dependent variable based on the
values of one or more measured aributes
• Classicaon:
– divide samples in classes
– use a trained set of previously labeled data
• Clustering
– paroning of a data set into subsets (clusters) so that data in
each subset ideally share some common characteriscs

• Classicaon is in a some way similar to the clustering, but requires


that the analyst know ahead of me how classes are dened.
Cluster Analysis

How many clusters do you expect?


Search for Outliers
Classicaon
• A data mining technique used to predict group membership for
data instances. There are two ways to assign a new value to a
given class.

• Crisp classification
– given an input, the classifier returns its label

• Probabilistic classification
– given an input, the classifier returns the probabilities that it
belongs to each class
– useful when some mistakes can be more
costly than others (give me only data >90%)
– winner-take-all and other rules
• assign the object to the class with the
highest probability (WTA)
• …but only if its probability is greater than 40%
(WTA with thresholds)
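A minimal sketch of the two rules above; the class names, probability vector and threshold are invented for illustration:

import numpy as np

def classify(probs, classes, threshold=None):
    """Winner-take-all: pick the class with the highest probability.
    With a threshold, refuse to decide ('unknown') when even the best
    class is not confident enough."""
    best = np.argmax(probs)
    if threshold is not None and probs[best] < threshold:
        return "unknown"
    return classes[best]

classes = ["star", "galaxy", "artifact"]        # hypothetical labels
probs = np.array([0.35, 0.45, 0.20])            # output of a probabilistic classifier

print(classify(probs, classes))                 # plain WTA -> 'galaxy'
print(classify(probs, classes, threshold=0.9))  # WTA with threshold -> 'unknown'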
Regression / Forecasting
• Data table statistical correlation
– mapping without any prior assumption on the functional
form of the data distribution;
– machine learning algorithms are well suited for this.
• Curve fitting
– find a well-defined and known
function underlying your data;
– theory / expertise can help.
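For the curve-fitting case, a standard tool is scipy.optimize.curve_fit; the exponential model and the synthetic data below are assumptions for illustration, not from the slides:

import numpy as np
from scipy.optimize import curve_fit

# Assumed model suggested by theory: an exponential decay with two parameters.
def model(x, a, b):
    return a * np.exp(-b * x)

# Synthetic noisy data standing in for real measurements.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)

# Least-squares fit of the model parameters to the data.
params, covariance = curve_fit(model, x, y)
print(params)  # should be close to (2.5, 1.3)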
Machine Learning
• To learn: to get knowledge of by study, experience,
or being taught.

• Types of Learning
• Supervised
• Unsupervised
Unsupervised Learning
• The model is not provided with the correct results
during the training.
• Can be used to cluster the input data in classes on
the basis of their statistical properties only.
• Cluster significance and labeling.
• The labeling can be carried out even if the labels are
only available for a small number of objects
representative of the desired classes.
Supervised Learning
• Training data includes both the input and the
desired results.
• For some examples the correct results (targets) are
known and are given in input to the model during
the learning process.
• The construcon of a proper training, validaon and
test set (Bok) is crucial.
• These methods are usually fast and accurate.
• Have to be able to generalize: give the correct
results when new data are given in input without
knowing a priori the target.
Generalizaon
• Refers to the ability to produce reasonable outputs
for inputs not encountered during the training.

In other words: NO PANIC when


"never seen before" data are given
in input!
A common problem: OVERFITTING
• Learns the “data” and not the underlying function.
• Performs well on the data used during the training
and poorly with new data.

How to avoid it: use proper subsets, early stopping.


Datasets
• Training set: a set of examples used for learning,
where the target value is known.
• Validaon set: a set of examples used to tune the
architecture of a classier and esmate the error.
• Test set: used only to assess the performances of a
classier. It is never used
during the training process
so that the error on the test
set provides an unbiased
esmate of the generalizaon
error.
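A common way to build the three sets with scikit-learn's train_test_split; the 60/20/20 proportions below are just an example, not prescribed by the slides:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)        # toy feature matrix
y = np.random.randint(0, 3, 100)  # toy labels

# First carve out the test set, then split the remainder into train/validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% equals 20% of the total, giving a 60/20/20 split.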
IRIS dataset
• IRIS
– consists of 3 classes, 50 instances each
– 4 numerical attributes (sepal and petal length and width
in cm)
– each class refers to a type of Iris plant (Setosa, Versicolor,
Virginica)
– the first class is linearly separable
from the other two, while the 2nd
and the 3rd are not linearly
separable
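The dataset ships with scikit-learn, so it can be loaded directly; a quick look at the numbers quoted above:

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)    # (150, 4): 50 instances per class, 4 attributes
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names) # sepal/petal length and width in cm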
Arfacts Dataset
• PQ Arfacts
– 2 main classes and 4 numerical aributes
– classes are: true objects, arfacts
Data Selecon
• “Garbage in, garbage out ”: training, validaon and
test data must be representave of the underlying
model
• All eventualies must be covered
• Unbalanced datasets
– since the network minimizes the overall error, the proporon
of types of data in the set is crical;
– inclusion of a loss matrix (Bishop,1995);
– oen, the best approach is to ensure even representaon of
dierent cases, then to interpret the network's decisions
accordingly.
Arcial Neural Network
An Arcial Neural Network is an
informaon processing paradigm
that is inspired by the way
biological nervous systems process
informaon:

“a large number of highly


interconnected simple processing
elements (neurons) working
together to solve specic
problems”
A simple arcial neuron
• The basic computaonal element is oen called a node or unit. It
receives input from some other units, or from an external source.
• Each input has an associated weight w, which can be modied so
as to model synapc learning.
• The unit computes some funcon of the weighted sum of its
inputs:
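A minimal sketch of such a unit, with the sigmoid chosen arbitrarily as the activation function f:

import numpy as np

def sigmoid(z):
    # One common choice for the activation function f.
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w):
    """Compute f(sum_i w_i * x_i): the unit's output for inputs x and weights w."""
    return sigmoid(np.dot(w, x))

x = np.array([0.5, -1.2, 3.0])  # inputs from other units or an external source
w = np.array([0.4, 0.1, -0.6])  # weights, modified during learning
print(neuron(x, w))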
Neural Networks
A Neural Network is usually structured into an input layer of neurons, one or
more hidden layers and one output layer.
Neurons belonging to adjacent layers are usually fully connected, and the
various types and architectures are identified both by the different topologies
adopted for the connections and by the choice of the activation function.
The values of the functions associated with the connections are called
“weights”.

The whole game of using NNs is in the fact
that, in order for the network to yield
appropriate outputs for given inputs, the
weights must be set to suitable values.

The way this is obtained allows a further
distinction among modes of operation.
Neural Networks: types
Feedforward: Single Layer Perceptron, MLP, ADALINE (Adaptive Linear
Neuron), RBF
Self-Organized: SOM (Kohonen Maps)
Recurrent: Simple Recurrent Network,
Hopfield Network.
Stochastic: Boltzmann machines, RBM.
Modular: Committee of Machines, ASNN
(Associative Neural Networks),
Ensembles.
Others: Instantaneously Trained, Spiking
(SNN), Dynamic, Cascades, NeuroFuzzy,
PPS, GTM.
Mul Layer Perceptron
• The MLP is one of the most used supervised model:
it consists of mulple layers of computaonal units,
usually interconnected in a feed‐forward way.
• Each neuron in one layer has direct connecons to
all the neurons of the subsequent layer.
Learning Process
• Back Propagaon
– the output values are compared with the target to compute the value
of some predened error funcon
– the error is then fedback through the network
– using this informaon, the algorithm adjusts the weights of each
connecon in order to reduce the value of the error funcon

Aer repeang this process for a suciently large number of training cycles,
the network will usually converge.
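As a sketch of the whole loop, an MLP trained by backpropagation on the Iris data with scikit-learn; the single hidden layer of 10 units and the iteration cap are arbitrary choices, not values from the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# One hidden layer of 10 units; the weights are adjusted by backpropagation
# until the error converges or max_iter training cycles are reached.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))  # accuracy on data never seen during training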
Hidden Units
• The best number of hidden units depends on:
– the number of inputs and outputs
– the number of training cases
– the amount of noise in the targets
– the complexity of the function to be learned
– the activation function

• Too few hidden units => high training and generalization error, due to
underfitting and high statistical bias.
• Too many hidden units => low training error but high generalization
error, due to overfitting and high variance.
• Rules of thumb don't usually work.
Acvaon and Error Funcons
Acvaon Funcons
Results: confusion matrix
Results: completeness and contaminaon

Exercise: compute completeness and contaminaon for the previous confusion matrix (test set)
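The confusion matrix itself was shown as an image. As commonly defined in this context, completeness = TP/(TP+FN) (the fraction of a true class that is recovered) and contamination = FP/(TP+FP) (the fraction of a predicted class that does not belong to it). A sketch on an invented two-class matrix:

import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[90, 10],   # true objects: 90 correct, 10 missed
               [ 5, 95]])  # artifacts: 5 mislabeled as objects, 95 correct

for k in range(cm.shape[0]):
    tp = cm[k, k]                   # correctly predicted members of class k
    fn = cm[k, :].sum() - tp        # members of class k predicted as something else
    fp = cm[:, k].sum() - tp        # other objects predicted as class k
    completeness = tp / (tp + fn)   # fraction of the true class recovered
    contamination = fp / (tp + fp)  # fraction of the predicted class that is wrong
    print(f"class {k}: completeness={completeness:.2f}, contamination={contamination:.2f}")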
Decision Trees
• Decision trees are another classification method.
• A decision tree is a set of simple rules, such as "if the
sepal length is less than 5.45, classify the specimen as
setosa."
• Decision trees are also nonparametric, because they do
not require any assumptions about the distribution of
the variables in each class.
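A sketch with scikit-learn, fitting a tree to the Iris data and printing the learned rules, which come out exactly in the "if sepal length < 5.45 ..." form quoted above (the depth limit is only to keep the printout short):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the tree as a set of simple if/else rules on the attributes.
print(export_text(tree, feature_names=iris.feature_names))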
Summary
• KDD and Data Mining Tasks
• Finding the opmal approach
• Supervised Models
– Neural Networks
– Mul Layer Perceptron
– Decision Trees
• Unsupervised Models
– Dierent Types of Clustering
– Distances and Normalizaon
– Kmeans
– Self Organizing Maps
• Combining dierent models
– Commiee Machines
– Introducing a Priori Knowledge
– Sleeping Expert Framework
Unsupervised Learning
• The model is not provided with the correct results
during the training.
• Can be used to cluster the input data in classes on
the basis of their statistical properties only.
• Cluster significance and labeling.
• The labeling can be carried out even if the labels are
only available for a small number of objects
representative of the desired classes.
Types of Clustering
• Types of clustering:
– HIERARCHICAL: finds successive clusters using previously
established clusters
• agglomerative (bottom-up): start with each element in a separate cluster
and merge them according to a given property
• divisive (top-down)
– PARTITIONAL: usually determines all clusters at once
Distances
• Determine the similarity between two clusters and
the shape of the clusters.
In case of strings…
• The Hamming distance between two strings of equal length is
the number of positions at which the corresponding symbols
are different.
– measures the minimum number of substitutions required to
change one string into the other
• The Levenshtein (edit) distance is a metric for measuring the
amount of difference between two sequences.
– is defined as the minimum number of edits needed to transform
one string into the other.

HD(1001001, 1000100) = 3
LD(BIOLOGY, BIOLOGIA) = 2:
BIOLOGY -> BIOLOGI (substitution)
BIOLOGI -> BIOLOGIA (insertion)
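Both distances are short to write down directly; this sketch reproduces the two examples above:

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    turning string a into string b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

print(hamming("1001001", "1000100"))       # 3
print(levenshtein("BIOLOGY", "BIOLOGIA"))  # 2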
Normalizaon
VAR: the mean of each aribute
of the transformed set of data
points is reduced to zero by
subtracng the mean of each
aribute from the values of the
aributes and dividing the result
by the standard deviaon of the
aribute.

RANGE (Min‐Max Normalizaon): subtracts the minimum value of an aribute from each value
of the aribute and then divides the dierence by the range of the aribute. It has the
advantage of preserving exactly all relaonship in the data, without adding any bias.

SOFTMAX: is a way of reducing the inuence of extreme values or outliers in the data without
removing them from the data set. It is useful when you have outlier data that you wish to
include in the data set while sll preserving the signicance of data within a standard deviaon
of the mean.
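A sketch of the three transformations for a single attribute; the slides do not pin down the exact SOFTMAX formula, so the version below (a logistic squashing of the z-score) is one common variant:

import numpy as np

def var_norm(x):
    """Zero-mean, unit-variance (z-score) normalization."""
    return (x - x.mean()) / x.std()

def range_norm(x):
    """Min-max normalization into [0, 1]; preserves all relationships in the data."""
    return (x - x.min()) / (x.max() - x.min())

def softmax_norm(x):
    """Logistic squashing of the z-score: outliers are pulled in smoothly,
    while values within about one standard deviation stay nearly linear."""
    return 1.0 / (1.0 + np.exp(-var_norm(x)))

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])  # note the outlier
print(range_norm(x))    # the outlier compresses everything else near 0
print(softmax_norm(x))  # the outlier saturates near 1 instead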
K-means
K-means: how it works
K-means: pros and cons
Learning K
• Find a balance between two variables: the number of
clusters (K) and the average variance of the clusters.
• Minimize both values
• As the number of clusters increases, the average
variance decreases (up to the trivial case of k=n and
variance=0).
• Some criteria:
– BIC (Bayesian Information Criterion)
– AIC (Akaike Information Criterion)
– Davies-Bouldin Index
– Confusion Matrix
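One common way to put the BIC criterion to work, sketched with scikit-learn: fit a Gaussian mixture for each candidate K and keep the K with the lowest score (the candidate range and the use of the Iris data are arbitrary choices):

from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)

# Lower BIC balances fit quality against the number of clusters,
# penalizing the trivial k=n, variance=0 solution.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 8)}
best_k = min(bics, key=bics.get)
print(best_k, bics[best_k])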
Self Organizing Maps
SOM topology
SOM Prototypes
SOM Training
Competitive and Cooperative Learning
SOM Update Rule
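The SOM slides were shown as images. The standard update rule moves the best-matching unit (BMU) and its map neighbors toward each input, w_i <- w_i + alpha(t) * theta(i, BMU, t) * (x - w_i), where the learning rate alpha and the neighborhood radius shrink over time. A minimal sketch under those standard definitions (the grid size and decay schedules are invented):

import numpy as np

def train_som(X, grid=(10, 10), n_epochs=20, alpha0=0.5, sigma0=3.0, seed=0):
    """One-file SOM: prototypes live on a 2D grid; each input pulls the BMU
    and its grid neighbors toward itself, with shrinking rate and radius."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))  # prototype vectors
    # Grid coordinates of every unit, used by the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for epoch in range(n_epochs):
        alpha = alpha0 * (1 - epoch / n_epochs)        # decaying learning rate
        sigma = sigma0 * (1 - epoch / n_epochs) + 0.5  # decaying neighborhood radius
        for x in X:
            # Competitive step: find the best-matching unit (nearest prototype).
            bmu = np.unravel_index(np.linalg.norm(W - x, axis=2).argmin(), (rows, cols))
            # Cooperative step: Gaussian neighborhood around the BMU on the grid.
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            theta = np.exp(-d2 / (2 * sigma ** 2))
            # Update rule: w_i += alpha * theta_i * (x - w_i)
            W += alpha * theta[:, :, None] * (x - W)
    return W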
Parameters
DM with SOM
SOM Labeling
Localizing Data
Cluster Structure
Cluster Structure - 2
Component Planes
Relave Importance
How accurate is your clustering
Trajectories
Combining Models
Commiee Machines
A priori knowledge
Sleeping Experts
