
Supervised and Unsupervised

Learning
Ciro Donalek
Ay/Bi 199 – April 2011
Summary
• KDD and Data Mining Tasks
• Finding the opmal approach
• Supervised Models
– Neural Networks
– Mul Layer Perceptron
– Decision Trees
• Unsupervised Models
– Dierent Types of Clustering
– Distances and Normalizaon
– Kmeans
– Self Organizing Maps
• Combining dierent models
– Commiee Machines
– Introducing a Priori Knowledge
– Sleeping Expert Framework
Knowledge Discovery in Databases
• KDD may be dened as: "The non trivial process of
idenfying valid, novel, potenally useful, and
ulmately understandable paerns in data".
• KDD is an interacve and iterave process involving
several steps.
You got your data: what’s next?

What kind of analysis do you need? Which model is more appropriate for it? …
Clean your data!
• Data preprocessing transforms the raw data
into a format that will be more easily and
effectively processed for the purpose of the
user. Use standard formats!
• Some tasks:
• sampling: selects a representative subset
from a large population of data;
• noise treatment;
• strategies to handle missing data: sometimes
your rows will be incomplete; not all
parameters are measured for all samples;
• normalization;
• feature extraction: pulls out specified data
that is significant in some particular context.
Missing Data
• Missing data are a part of almost all research, and we all have to
decide how to deal with them.
• Complete Case Analysis: use only rows with all the values.
• Available Case Analysis
• Substitution
– Mean Value: replace the missing value with the
mean value for that particular attribute
– Regression Substitution: replace the
missing value with a historical value from similar cases
– Matching Imputation: for each unit with a missing y,
find a unit with similar values of x in the observed
data and take its y value
– Maximum Likelihood, EM, etc.
• Some DM models can deal with missing data better than others.
• Which technique to adopt really depends on your data.
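As an illustration (not from the original slides), a minimal sketch of two of these strategies, complete case analysis and mean-value substitution, on an invented toy matrix:

import numpy as np

# Toy data matrix: rows are samples, columns are attributes; NaN marks missing values.
X = np.array([[5.1, 3.5, 1.4],
              [4.9, np.nan, 1.3],
              [6.2, 2.9, np.nan],
              [5.8, 2.7, 4.1]])

# Complete case analysis: keep only the rows where every attribute is present.
complete = X[~np.isnan(X).any(axis=1)]

# Mean-value substitution: replace each missing entry with the mean of
# the observed values for that attribute (column).
col_means = np.nanmean(X, axis=0)
imputed = np.where(np.isnan(X), col_means, X)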
Data Mining
• A crucial task within the KDD process.
• Data Mining is about automating the process of
searching for patterns in the data.
• In more detail, the most relevant DM tasks are:
– association
– sequence or path analysis
– clustering
– classification
– regression
– visualization
Finding Soluon via Purposes
• You have your data, what kind of analysis do you need?

• Regression
– predict new values based on the past, inference
– compute the new values for a dependent variable based on the
values of one or more measured aributes
• Classicaon:
– divide samples in classes
– use a trained set of previously labeled data
• Clustering
– paroning of a data set into subsets (clusters) so that data in
each subset ideally share some common characteriscs

• Classicaon is in a some way similar to the clustering, but requires


that the analyst know ahead of me how classes are dened.
Cluster Analysis

How many clusters do you expect?


Search for Outliers
Classicaon
• A data mining technique used to predict group membership for
data instances. There are two ways to assign a new value to a
given class.

• Crisp classification
– given an input, the classifier returns its label

• Probabilistic classification
– given an input, the classifier returns the probabilities that it
belongs to each class
– useful when some mistakes can be more
costly than others (give me only data >90%)
– winner-take-all and other rules
• assign the object to the class with the
highest probability (WTA)
• …but only if its probability is greater than 40%
(WTA with thresholds)
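A minimal sketch of the two rules above; the class names, probability vector and threshold are invented for illustration:

import numpy as np

def classify(probs, classes, threshold=None):
    """Winner-take-all: pick the class with the highest probability.
    With a threshold, refuse to decide ('unknown') when even the best
    class is not confident enough."""
    best = np.argmax(probs)
    if threshold is not None and probs[best] < threshold:
        return "unknown"
    return classes[best]

classes = ["star", "galaxy", "artifact"]        # hypothetical labels
probs = np.array([0.35, 0.45, 0.20])            # output of a probabilistic classifier

print(classify(probs, classes))                 # plain WTA -> 'galaxy'
print(classify(probs, classes, threshold=0.9))  # WTA with threshold -> 'unknown'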
Regression / Forecasting
• Data table statistical correlation
– mapping without any prior assumption on the functional
form of the data distribution;
– machine learning algorithms are well suited for this.
• Curve fitting
– find a well-defined and known
function underlying your data;
– theory / expertise can help.
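For the curve-fitting case, a standard tool is scipy.optimize.curve_fit; the exponential model and the synthetic data below are assumptions for illustration, not from the slides:

import numpy as np
from scipy.optimize import curve_fit

# Assumed model suggested by theory: an exponential decay with two parameters.
def model(x, a, b):
    return a * np.exp(-b * x)

# Synthetic noisy data standing in for real measurements.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = model(x, 2.5, 1.3) + 0.05 * rng.standard_normal(x.size)

# Least-squares fit of the model parameters to the data.
params, covariance = curve_fit(model, x, y)
print(params)  # should be close to (2.5, 1.3)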
Machine Learning
• To learn: to get knowledge of by study, experience,
or being taught.

• Types of Learning
• Supervised
• Unsupervised
Unsupervised Learning
• The model is not provided with the correct results
during the training.
• Can be used to cluster the input data in classes on
the basis of their statistical properties only.
• Cluster significance and labeling.
• The labeling can be carried out even if the labels are
only available for a small number of objects
representative of the desired classes.
Supervised Learning
• Training data includes both the input and the
desired results.
• For some examples the correct results (targets) are
known and are given in input to the model during
the learning process.
• The construcon of a proper training, validaon and
test set (Bok) is crucial.
• These methods are usually fast and accurate.
• Have to be able to generalize: give the correct
results when new data are given in input without
knowing a priori the target.
Generalizaon
• Refers to the ability to produce reasonable outputs
for inputs not encountered during the training.

In other words: NO PANIC when


"never seen before" data are given
in input!
A common problem: OVERFITTING
• Learns the “data” and not the underlying function.
• Performs well on the data used during the training
and poorly with new data.

How to avoid it: use proper subsets, early stopping.


Datasets
• Training set: a set of examples used for learning,
where the target value is known.
• Validaon set: a set of examples used to tune the
architecture of a classier and esmate the error.
• Test set: used only to assess the performances of a
classier. It is never used
during the training process
so that the error on the test
set provides an unbiased
esmate of the generalizaon
error.
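A common way to build the three sets with scikit-learn's train_test_split; the 60/20/20 proportions below are just an example, not prescribed by the slides:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)        # toy feature matrix
y = np.random.randint(0, 3, 100)  # toy labels

# First carve out the test set, then split the remainder into train/validation.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% equals 20% of the total, giving a 60/20/20 split.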
IRIS dataset
• IRIS
– consists of 3 classes, 50 instances each
– 4 numerical attributes (sepal and petal length and width
in cm)
– each class refers to a type of Iris plant (Setosa, Versicolor,
Virginica)
– the first class is linearly separable
from the other two, while the 2nd
and the 3rd are not linearly
separable
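The dataset ships with scikit-learn, so it can be loaded directly; a quick look at the numbers quoted above:

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)    # (150, 4): 50 instances per class, 4 attributes
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names) # sepal/petal length and width in cm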
Arfacts Dataset
• PQ Arfacts
– 2 main classes and 4 numerical aributes
– classes are: true objects, arfacts
Data Selecon
• “Garbage in, garbage out ”: training, validaon and
test data must be representave of the underlying
model
• All eventualies must be covered
• Unbalanced datasets
– since the network minimizes the overall error, the proporon
of types of data in the set is crical;
– inclusion of a loss matrix (Bishop,1995);
– oen, the best approach is to ensure even representaon of
dierent cases, then to interpret the network's decisions
accordingly.
Arcial Neural Network
An Arcial Neural Network is an
informaon processing paradigm
that is inspired by the way
biological nervous systems process
informaon:

“a large number of highly


interconnected simple processing
elements (neurons) working
together to solve specic
problems”
A simple arcial neuron
• The basic computaonal element is oen called a node or unit. It
receives input from some other units, or from an external source.
• Each input has an associated weight w, which can be modied so
as to model synapc learning.
• The unit computes some funcon of the weighted sum of its
inputs:
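A minimal sketch of such a unit, with the sigmoid chosen arbitrarily as the activation function f:

import numpy as np

def sigmoid(z):
    # One common choice for the activation function f.
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w):
    """Compute f(sum_i w_i * x_i): the unit's output for inputs x and weights w."""
    return sigmoid(np.dot(w, x))

x = np.array([0.5, -1.2, 3.0])  # inputs from other units or an external source
w = np.array([0.4, 0.1, -0.6])  # weights, modified during learning
print(neuron(x, w))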
Neural Networks
A Neural Network is usually structured into an input layer of neurons, one or
more hidden layers and one output layer.
Neurons belonging to adjacent layers are usually fully connected, and the
various types and architectures are identified both by the different topologies
adopted for the connections and by the choice of the activation function.
The values of the functions associated with the connections are called
“weights”.

The whole game of using NNs is in the fact
that, in order for the network to yield
appropriate outputs for given inputs, the
weights must be set to suitable values.

The way this is obtained allows a further
distinction among modes of operation.
Neural Networks: types
Feedforward: Single Layer Perceptron, MLP, ADALINE (Adaptive Linear
Neuron), RBF
Self-Organized: SOM (Kohonen Maps)
Recurrent: Simple Recurrent Network,
Hopfield Network.
Stochastic: Boltzmann machines, RBM.
Modular: Committee of Machines, ASNN
(Associative Neural Networks),
Ensembles.
Others: Instantaneously Trained, Spiking
(SNN), Dynamic, Cascades, NeuroFuzzy,
PPS, GTM.
Mul Layer Perceptron
• The MLP is one of the most used supervised model:
it consists of mulple layers of computaonal units,
usually interconnected in a feed‐forward way.
• Each neuron in one layer has direct connecons to
all the neurons of the subsequent layer.
Learning Process
• Back Propagaon
– the output values are compared with the target to compute the value
of some predened error funcon
– the error is then fedback through the network
– using this informaon, the algorithm adjusts the weights of each
connecon in order to reduce the value of the error funcon

Aer repeang this process for a suciently large number of training cycles,
the network will usually converge.
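As a sketch of the whole loop, an MLP trained by backpropagation on the Iris data with scikit-learn; the single hidden layer of 10 units and the iteration cap are arbitrary choices, not values from the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# One hidden layer of 10 units; the weights are adjusted by backpropagation
# until the error converges or max_iter training cycles are reached.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))  # accuracy on data never seen during training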
Hidden Units
• The best number of hidden units depends on:
– the number of inputs and outputs
– the number of training cases
– the amount of noise in the targets
– the complexity of the function to be learned
– the activation function

• Too few hidden units => high training and generalization error, due to
underfitting and high statistical bias.
• Too many hidden units => low training error but high generalization
error, due to overfitting and high variance.
• Rules of thumb don't usually work.
Acvaon and Error Funcons
Acvaon Funcons
Results: confusion matrix
Results: completeness and contaminaon

Exercise: compute completeness and contaminaon for the previous confusion matrix (test set)
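The confusion matrix itself was shown as an image. As commonly defined in this context, completeness = TP/(TP+FN) (the fraction of a true class that is recovered) and contamination = FP/(TP+FP) (the fraction of a predicted class that does not belong to it). A sketch on an invented two-class matrix:

import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[90, 10],   # true objects: 90 correct, 10 missed
               [ 5, 95]])  # artifacts: 5 mislabeled as objects, 95 correct

for k in range(cm.shape[0]):
    tp = cm[k, k]                   # correctly predicted members of class k
    fn = cm[k, :].sum() - tp        # members of class k predicted as something else
    fp = cm[:, k].sum() - tp        # other objects predicted as class k
    completeness = tp / (tp + fn)   # fraction of the true class recovered
    contamination = fp / (tp + fp)  # fraction of the predicted class that is wrong
    print(f"class {k}: completeness={completeness:.2f}, contamination={contamination:.2f}")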
Decision Trees
• Decision trees are another classification method.
• A decision tree is a set of simple rules, such as "if the
sepal length is less than 5.45, classify the specimen as
setosa."
• Decision trees are also nonparametric, because they do
not require any assumptions about the distribution of
the variables in each class.
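A sketch with scikit-learn, fitting a tree to the Iris data and printing the learned rules, which come out exactly in the "if sepal length < 5.45 ..." form quoted above (the depth limit is only to keep the printout short):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the tree as a set of simple if/else rules on the attributes.
print(export_text(tree, feature_names=iris.feature_names))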
Summary
• KDD and Data Mining Tasks
• Finding the opmal approach
• Supervised Models
– Neural Networks
– Mul Layer Perceptron
– Decision Trees
• Unsupervised Models
– Dierent Types of Clustering
– Distances and Normalizaon
– Kmeans
– Self Organizing Maps
• Combining dierent models
– Commiee Machines
– Introducing a Priori Knowledge
– Sleeping Expert Framework
Unsupervised Learning
• The model is not provided with the correct results
during the training.
• Can be used to cluster the input data in classes on
the basis of their statistical properties only.
• Cluster significance and labeling.
• The labeling can be carried out even if the labels are
only available for a small number of objects
representative of the desired classes.
Types of Clustering
• Types of clustering:
– HIERARCHICAL: finds successive clusters using previously
established clusters
• agglomerative (bottom-up): start with each element in a separate cluster
and merge them according to a given property
• divisive (top-down)
– PARTITIONAL: usually determines all clusters at once
Distances
• Determine the similarity between two clusters and
the shape of the clusters.
In case of strings…
• The Hamming distance between two strings of equal length is
the number of positions at which the corresponding symbols
are different.
– measures the minimum number of substitutions required to
change one string into the other
• The Levenshtein (edit) distance is a metric for measuring the
amount of difference between two sequences.
– is defined as the minimum number of edits needed to transform
one string into the other.

HD(1001001, 1000100) = 3
LD(BIOLOGY, BIOLOGIA) = 2:
BIOLOGY -> BIOLOGI (substitution)
BIOLOGI -> BIOLOGIA (insertion)
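Both distances are short to write down directly; this sketch reproduces the two examples above:

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    turning string a into string b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

print(hamming("1001001", "1000100"))       # 3
print(levenshtein("BIOLOGY", "BIOLOGIA"))  # 2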
Normalizaon
VAR: the mean of each aribute
of the transformed set of data
points is reduced to zero by
subtracng the mean of each
aribute from the values of the
aributes and dividing the result
by the standard deviaon of the
aribute.

RANGE (Min‐Max Normalizaon): subtracts the minimum value of an aribute from each value
of the aribute and then divides the dierence by the range of the aribute. It has the
advantage of preserving exactly all relaonship in the data, without adding any bias.

SOFTMAX: is a way of reducing the inuence of extreme values or outliers in the data without
removing them from the data set. It is useful when you have outlier data that you wish to
include in the data set while sll preserving the signicance of data within a standard deviaon
of the mean.
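A sketch of the three transformations for a single attribute; the slides do not pin down the exact SOFTMAX formula, so the version below (a logistic squashing of the z-score) is one common variant:

import numpy as np

def var_norm(x):
    """Zero-mean, unit-variance (z-score) normalization."""
    return (x - x.mean()) / x.std()

def range_norm(x):
    """Min-max normalization into [0, 1]; preserves all relationships in the data."""
    return (x - x.min()) / (x.max() - x.min())

def softmax_norm(x):
    """Logistic squashing of the z-score: outliers are pulled in smoothly,
    while values within about one standard deviation stay nearly linear."""
    return 1.0 / (1.0 + np.exp(-var_norm(x)))

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])  # note the outlier
print(range_norm(x))    # the outlier compresses everything else near 0
print(softmax_norm(x))  # the outlier saturates near 1 instead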
K-means
K-means: how it works
K-means: pros and cons
Learning K
• Find a balance between two variables: the number of
clusters (K) and the average variance of the clusters.
• Minimize both values
• As the number of clusters increases, the average
variance decreases (up to the trivial case of k=n and
variance=0).
• Some criteria:
– BIC (Bayesian Information Criterion)
– AIC (Akaike Information Criterion)
– Davies-Bouldin Index
– Confusion Matrix
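One common way to put the BIC criterion to work, sketched with scikit-learn: fit a Gaussian mixture for each candidate K and keep the K with the lowest score (the candidate range and the use of the Iris data are arbitrary choices):

from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)

# Lower BIC balances fit quality against the number of clusters,
# penalizing the trivial k=n, variance=0 solution.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 8)}
best_k = min(bics, key=bics.get)
print(best_k, bics[best_k])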
Self Organizing Maps
SOM topology
SOM Prototypes
SOM Training
Competitive and Cooperative Learning
SOM Update Rule
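The SOM slides were shown as images. The standard update rule moves the best-matching unit (BMU) and its map neighbors toward each input, w_i <- w_i + alpha(t) * theta(i, BMU, t) * (x - w_i), where the learning rate alpha and the neighborhood radius shrink over time. A minimal sketch under those standard definitions (the grid size and decay schedules are invented):

import numpy as np

def train_som(X, grid=(10, 10), n_epochs=20, alpha0=0.5, sigma0=3.0, seed=0):
    """One-file SOM: prototypes live on a 2D grid; each input pulls the BMU
    and its grid neighbors toward itself, with shrinking rate and radius."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))  # prototype vectors
    # Grid coordinates of every unit, used by the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for epoch in range(n_epochs):
        alpha = alpha0 * (1 - epoch / n_epochs)        # decaying learning rate
        sigma = sigma0 * (1 - epoch / n_epochs) + 0.5  # decaying neighborhood radius
        for x in X:
            # Competitive step: find the best-matching unit (nearest prototype).
            bmu = np.unravel_index(np.linalg.norm(W - x, axis=2).argmin(), (rows, cols))
            # Cooperative step: Gaussian neighborhood around the BMU on the grid.
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            theta = np.exp(-d2 / (2 * sigma ** 2))
            # Update rule: w_i += alpha * theta_i * (x - w_i)
            W += alpha * theta[:, :, None] * (x - W)
    return W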
Parameters
DM with SOM
SOM Labeling
Localizing Data
Cluster Structure
Cluster Structure - 2
Component Planes
Relave Importance
How accurate is your clustering
Trajectories
Combining Models
Commiee Machines
A priori knowledge
Sleeping Experts
