The Ingredients of Machine Learning
The Ingredients of Machine Learning
Machine learning is the systematic study of algorithms and systems that improve their
knowledge or performance with experience
Machine learning is concerned with using the right features to build the right models that
achieve the right tasks.
1. Tasks:
2. Models
3. Features:
Note: Models lend the machine learning field diversity, but tasks and features give it unity
Tasks: The problems that can be solved with machine learning
Tasks for machine learning
The most common machine learning tasks are predictive, in the sense that they concern
predicting a target variable from features. .
Binary and multi-class classification: categorical target
Regression: numerical target
Clustering: hidden target
Descriptive tasks are concerned with exploiting underlying structure in the data
Measuring similarity
If our e-mails are described by word-occurrence features as in the text classification example,
the similarity of e-mails would be measured in terms of the words they have in common.
For instance, we could take the number of common words in two e-mails and divide it by the
number of words occurring in either e-mail (this measure is called the Jaccard coefficient).
Suppose that one e-mail contains 42 (different) words and another contains 112words, and
the two e-mails have 23 words in common, then their similarity would
We can then cluster our e-mails into groups, such that the average similarity of an e-mail to
the other e-mails in its group is much larger than the average similarity to e-mails from other
groups.
Imagine these represent ratings by six different people (in rows), on a scale of 0 to 3, of four
different films – say The Shawshank Redemption, The Usual Suspects, The Godfather, The Big
Lebowski, (in columns, from left to right). The Godfather seems to be the most popular of the
four with an average rating of 1.5,
and The Shawshank Redemption is the least appreciated with an average rating of 0.5.