Week 2 Characterization of Learning Problems: Nptel Video Course On Machine Learning
In unsupervised learning, the input data are NOT classified, i.e. the data-set
contains only input data and lacks concept labels. Unsupervised learning
algorithms therefore have to identify commonalities and structures in the
data-set and group the input based on similarity. They have to decide on an
optimal portfolio of concepts that best matches the data-set, and arrange the
data-set into subsets so that the grouping matches that portfolio of
concepts.
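The grouping step described above can be sketched with a small clustering routine. This is only an illustration, assuming k-means as the grouping method, 2-D points as the data-items, and a number of concepts k chosen in advance; none of these choices come from the course material, and the names kmeans, points, centroids and clusters are hypothetical.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters (plain k-means, illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data-items
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters


# Unlabelled examples: no concept labels are supplied anywhere.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Each centroid plays the role of one concept and each cluster is the subset of the data-set assigned to it. In this sketch k is fixed by the caller, whereas the text above also asks the algorithm to decide on the portfolio of concepts itself.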
Basic Distinctions:
Off-line (Batch) vs On-line (Incremental) learning
This distinction is relevant for supervised as well as for unsupervised
learning.
Off-line learning refers to situations where the system is not operating in a
real-time environment but handles pre-harvested data in static and complete batch form.
Most traditional machine learning algorithms are well adapted to off-line learning,
and the parallel access to the whole data-set gives full flexibility to use
data-items in any order and combination during the learning process.
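The contrast between the two modes can be illustrated with a deliberately tiny "model", the mean of a stream of numbers. This is a sketch under that assumption, not a method from the course: batch_mean and RunningMean are hypothetical names, and a real learner would update model parameters rather than a single average.

```python
def batch_mean(values):
    """Off-line (batch): the complete data-set is available at once."""
    return sum(values) / len(values)


class RunningMean:
    """On-line (incremental): update the estimate one data-item at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental update; earlier data-items are not stored.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


data = [2.0, 4.0, 6.0, 8.0]

offline = batch_mean(data)  # sees the whole batch in one call

online = RunningMean()
for x in data:  # sees the data-items one by one, in arrival order
    online.update(x)

print(offline, online.mean)
```

The batch version can revisit or reorder the data freely, while the incremental version only ever holds the current estimate and the latest data-item.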
Scenario 2: The presence of Noise
Learning a single concept off-line from
pre-classified positive examples
[Figure: Concept (Conc 1), Noise, Label 1]
Noise is a fundamental underlying
phenomenon that is present in all datasets.
Noise is a distortion in data that is unwanted
by the perceiver.
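One way to picture noise as a distortion is to simulate it. The sketch below assumes additive Gaussian noise on a single numeric feature, which is only one of many possible noise models and is not specified in the course material; true_value, noise_std and the other names are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

true_value = 10.0  # the undistorted quantity (assumed for illustration)
noise_std = 1.0    # spread of the assumed Gaussian noise

# Each observation is the true value plus an unwanted random distortion.
observed = [true_value + rng.gauss(0.0, noise_std) for _ in range(1000)]

mean_observed = sum(observed) / len(observed)
max_distortion = max(abs(x - true_value) for x in observed)

print("mean of noisy observations:", mean_observed)
print("largest single distortion:", max_distortion)
```

Individual observations can be far from the true value, while an average over many of them tends to be much closer, which is one reason learning from many examples helps in the presence of noise.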
Scenario 11: Unsupervised concept learning
Learning multiple concepts from
unsorted examples
[Figure: Concept (Conc 1, Concept 2, Concept 3)]
All aspects introduced in scenarios 1-10
are still relevant to consider.
Generalization
• Feature engineering
• Algorithm selection