Module 4
Unsupervised Learning and Bayesian Learning

Unsupervised Learning
Unsupervised learning is a type of machine learning in which
models are trained on an unlabeled dataset and are allowed
to act on that data without any supervision.
Clustering approaches include hierarchical methods and density-based methods.
Unsupervised Learning algorithms:
K-means clustering
KNN (k-nearest neighbors)
Hierarchical clustering
Anomaly detection
Neural Networks
Principal Component Analysis
Independent Component Analysis
Apriori algorithm
Singular value decomposition
Advantages of Unsupervised Learning
Unsupervised learning can tackle more complex tasks than supervised
learning because it does not require labeled input data.
Unsupervised learning is often preferable because unlabeled data is much
easier to obtain than labeled data.
Agglomerative clustering repeatedly merges the two closest clusters until all
clusters are merged into a single cluster that contains the entire dataset.
Dendrogram
Agglomerative Clustering Algorithm
Starting Situation
Intermediate Situation
After Merging
How to Define Inter-Cluster Distance
MIN or Single Link: the distance between the closest pair of points, one from each cluster
MAX or Complete Link: the distance between the farthest pair of points, one from each cluster
Group Average or Average Link: the mean distance over all cross-cluster pairs of points
(a code sketch of these measures follows below)
Cluster Distance Measures
Example
Elbow method
The elbow method chooses K by running K-means for a range of K values, plotting the
within-cluster sum of squares (WCSS) against K, and picking the K at the "elbow",
where larger K stops giving a large reduction in WCSS. A sketch follows below.
Factors Affecting K-Means Results
K-means
[Figure: two panels on 0-5 axes, "Input Data" and "two circles, 2 clusters (K-means)",
showing K-means applied to the two-circles dataset.]
K-means Clustering
Bayesian Learning
Bayes theorem: P(h|D) = P(D|h) P(h) / P(D)
P(h|D) increases with P(h) and with P(D|h), according to Bayes theorem.
P(h|D) decreases as P(D) increases, because the more probable it is that D
will be observed independent of h, the less evidence D provides in support
of h.
Example
Consider a medical diagnosis problem in which there are two alternative
hypotheses
The patient has a particular form of cancer (denoted by cancer)
The patient does not (denoted by ¬ cancer)
The available data is from a particular laboratory with two possible outcomes: +
(positive) and - (negative)
A patient takes a lab test and the result comes back positive. The test returns a
correct positive result in only 98% of the cases in which the disease is actually
present, and a correct negative result in only 97% of the cases in which the disease is
not present. Furthermore, 0.008 of the entire population has this cancer.
Suppose a new patient is observed for whom the lab test returns a
positive (+) result.
Should we diagnose the patient as having cancer or not?
P(+|cancer) P(cancer) = 0.98 × 0.008 = 0.0078
P(+|¬cancer) P(¬cancer) = 0.03 × 0.992 = 0.0298
P(cancer|+) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21
P(¬cancer|+) = 0.0298 / (0.0078 + 0.0298) ≈ 0.79
The MAP hypothesis is ¬cancer: even after a positive test, the patient more
probably does not have cancer.
Bayes Theorem and Concept Learning
MAP Hypotheses and Consistent Learners
A MAP (maximum a posteriori) hypothesis is one that maximizes the posterior
P(h|D) over the hypothesis space H.
A learning algorithm is a consistent learner if it outputs a hypothesis that
commits zero errors over the training examples.
Every consistent learner outputs a MAP hypothesis if we assume a uniform
prior probability distribution over H (P(hi) = P(hj) for all i, j) and
deterministic, noise-free training data (P(D|h) = 1 if D and h are consistent,
and 0 otherwise).
Example:
Because FIND-S outputs a consistent hypothesis, it will output a MAP hypothesis
under the probability distributions P(h) and P(D|h) defined above.
Are there other probability distributions for P(h) and P(D|h) under which
FIND-S outputs MAP hypotheses? Yes.
Because FIND-S outputs a maximally specific hypothesis from the version
space, its output hypothesis will be a MAP hypothesis relative to any prior
probability distribution that favours more specific hypotheses.
Naive Bayes Classifier
Along with decision trees and neural networks, the naive Bayes classifier is
one of the most practical learning methods.
When to use
–Moderate or large training set available
–Attributes that describe instances are conditionally independent given
classification
Successful applications:
–Diagnosis
–Classifying text documents
Bayesian Belief Network
EM for Estimating k Means