Machine Learning
Machine Learning
It is very difficult to cater to all the decisions based on all possible inputs.
To tackle this problem, algorithms are developed. These algorithms build
knowledge from specific data and past experience with the principles of
statistics, probability theory, logic, combinatorial optimization, search,
reinforcement learning, and control theory.
The developed algorithms form the basis of various applications such as:
Vision processing
Language processing
Pattern recognition
Games
Data mining
Expert systems
Robotics
Machine learning is a vast area and it is quite beyond the scope of this
tutorial to cover all its features. There are several ways to implement
machine learning techniques, however the most commonly used ones
are supervisedand unsupervised learning.
Supervised Learning
Supervised learning deals with learning a function from available training
data. A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used for mapping new
examples. Common examples of supervised learning include:
voice recognition.
There are many supervised learning algorithms such as neural networks,
Support Vector Machines (SVMs), and Naive Bayes classifiers. Mahout
implements Naive Bayes classifier.
Unsupervised Learning
Unsupervised learning makes sense of unlabelled data without having
any predefined dataset for its training. Unsupervised learning is an
extremely powerful tool for analysing available data and look for patterns
and trends. It is most commonly used for clustering similar input into
logical groups. Common approaches to unsupervised learning include:
k-means
hierarchical clustering
Recommendation
Recommendation is a popular technique that provides close
recommendations based on user information such as previous purchases,
clicks, and ratings.
Amazon uses this technique to display a list of recommended items that you
might be interested in, drawing information from your past actions. There is
recommender engines that work behind Amazon to capture user behaviour
and recommend selected items based on your earlier actions.
Classification
Classification, also known as categorization, is a machine learning
technique that uses known data to determine how the new data should
be classified into a set of existing categories. Classification is a form of
supervised learning.
Mail service providers such as Yahoo! and Gmail use this technique to decide
whether a new mail should be classified as a spam. The categorization
algorithm trains itself by analyzing user habits of marking certain mails as
spams. Based on that, the classifier decides whether a future mail should be
deposited in your inbox or in the spams folder.
iTunes application uses classification to prepare playlists.
Clustering
Clustering is used to form groups or clusters of similar data based on
common characteristics. Clustering is a form of unsupervised learning.
The clustering engine goes through the input data completely and based
on the characteristics of the data, it will decide under which cluster it
should be grouped. Take a look at the following example.