Task The Problems That Can Be Solved With Machine Learning
Introduction
1.1 What Is Machine Learning?
Machine learning is programming computers to optimize a performance criterion using example
data or past experience. We have a model defined up to some parameters, and learning is the execution
of a computer program to optimize the parameters of the model using the training data or past
experience. The model may be predictive to make predictions in the future, or descriptive to gain
knowledge from data, or both.
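As a minimal sketch of this idea, the Python fragment below takes a straight line y = w*x + b as the model, mean squared error as the performance criterion, and gradient descent as the optimization procedure. The data points, learning rate, and number of steps are made-up assumptions for illustration; they do not come from the text.

```python
# Toy training data: (x, y) pairs lying roughly on y = 2x + 1 (invented for illustration).
data = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]


def mean_squared_error(w, b, points):
    """Performance criterion: average squared prediction error of y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)


def fit_line(points, learning_rate=0.02, steps=5000):
    """Optimize the parameters (w, b) of the model y = w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(data)
print("learned parameters:", round(w, 2), round(b, 2))
print("error before learning:", mean_squared_error(0.0, 0.0, data))
print("error after learning: ", mean_squared_error(w, b, data))
```

The model here is predictive: once w and b are learned, w*x + b can be evaluated for an x that was not in the training data.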
Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence,
coined the term “Machine Learning” in 1959 while at IBM.
He defined machine learning as “the field of study that gives computers the ability to learn without being
explicitly programmed.”
However, there is no universally accepted definition for machine learning. Different authors define the
term differently.
Definition of learning:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure
P, if its performance at tasks T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognising and classifying handwritten words within images
• Performance measure P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while observing a human driver
iii) A chess learning problem
• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
Definition: A computer program that learns from experience is called a machine learning program or simply a
learning program. Such a program is sometimes also referred to as a learner.
Task: The problems that can be solved with machine learning
A machine learning task is the type of prediction or inference being made, based on the problem or
question that is being asked, and the available data.
For example, the classification task assigns data to categories, and the clustering task groups data according
to similarity.
Machine learning tasks include:
• Binary classification
• Multiclass classification
• Regression Analysis
• Clustering
• Association Rule
Spam e-mail recognition was described in the Prologue. It constitutes a binary classification
task, which is easily the most common task in machine learning and which figures heavily
throughout the book. One obvious variation is to consider classification problems with more
than two classes.
For instance, we may want to distinguish different
kinds of ham e-mails, e.g., work-related e-mails and private messages. We could approach
this as a combination of two binary classification tasks: the first task is to distinguish between
spam and ham, and the second task is, among ham e-mails, to distinguish between work-related and private
ones.
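The combination of two binary tasks described above can be sketched as a simple cascade. In this Python fragment, is_spam and is_work are hypothetical hand-written stand-ins for two trained binary classifiers, and the keyword lists are invented for illustration; a real system would learn both classifiers from labeled e-mails.

```python
def is_spam(text):
    """Hypothetical stand-in for a trained spam/ham binary classifier."""
    words = set(text.lower().split())
    return bool(words & {"lottery", "winner", "prize"})


def is_work(text):
    """Hypothetical stand-in for a trained work/private classifier, applied to ham only."""
    words = set(text.lower().split())
    return bool(words & {"meeting", "deadline", "report"})


def classify_email(text):
    """Three-class decision built from two binary decisions applied in sequence."""
    if is_spam(text):          # first task: spam versus ham
        return "spam"
    if is_work(text):          # second task: among ham, work-related versus private
        return "work"
    return "private"


for message in ("You are a winner of the lottery",
                "Quarterly report due before the meeting",
                "Dinner on Saturday?"):
    print(message, "->", classify_email(message))
```

Note that the second classifier is only ever consulted for messages the first one has labeled as ham, so its training data would consist of ham e-mails only.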
Binary classification
A supervised machine learning task that is used to predict which of two classes (categories) an instance of
data belongs to. The input of a classification algorithm is a set of labeled examples, where each label is an
integer, either 0 or 1. The output of a binary classification algorithm is a classifier, which you can use to
predict the class of new unlabeled instances.
Examples of binary classification scenarios include:
• Understanding sentiment of Twitter comments as either "positive" or "negative".
• Diagnosing whether a patient has a certain disease or not.
• Making a decision to mark an email as "spam" or not.
• Determining if a photo contains a particular item or not, such as a dog or fruit.
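To make the input and output of a binary classification algorithm concrete, here is a minimal sketch that uses the perceptron as one possible learning algorithm (the text does not prescribe any particular one). The input is a list of examples labeled 0 or 1, and the output is a classifier function. The two features and all data values are assumptions made up for illustration.

```python
def train_perceptron(examples, epochs=20):
    """Learn a linear binary classifier from (features, label) pairs, label in {0, 1}."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # -1, 0, or +1; zero means no update
            bias += error
            weights = [w + error * x for w, x in zip(weights, features)]

    def classifier(features):
        """Predict the class (0 or 1) of a new, unlabeled instance."""
        return 1 if bias + sum(w * x for w, x in zip(weights, features)) > 0 else 0

    return classifier


# Hypothetical features per e-mail: (suspicious words, links); label 1 = spam, 0 = not spam.
examples = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
            ((3, 2), 1), ((4, 1), 1), ((2, 3), 1)]

classifier = train_perceptron(examples)
print(classifier((5, 2)))  # a new instance with many suspicious words and links
print(classifier((0, 0)))  # a new instance with none
```

The perceptron only finds a perfect separator when the two classes are linearly separable, as they are in this toy data set.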
Multiclass classification
A supervised machine learning task that is used to predict the class (category) of an instance of
data. The input of a classification algorithm is a set of labeled examples. Each label normally
starts as text. It is then run through the Term Transform, which converts it to the Key (numeric)
type. The output of a classification algorithm is a classifier, which you can use to predict the
class of new unlabeled instances.
Examples of multi-class classification scenarios include:
• Categorizing flights as "early", "on time", or "late".
• Understanding movie reviews as "positive", "neutral", or "negative".
• Categorizing hotel reviews as "location", "price", "cleanliness", etc.
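The conversion of text labels to numeric keys, followed by training, can be sketched in plain Python as follows. This is not the Term Transform itself; build_label_keys is a hypothetical helper that plays the same role, and a nearest-centroid rule is used here only as one simple choice of multiclass classifier. The single feature (arrival delay in minutes) and all data values are invented for illustration.

```python
def build_label_keys(labels):
    """Assign each distinct text label an integer key, in order of first appearance."""
    keys = {}
    for label in labels:
        if label not in keys:
            keys[label] = len(keys)
    return keys


def train_nearest_centroid(values, keyed_labels):
    """Return a classifier that picks the class whose mean feature value is closest."""
    sums, counts = {}, {}
    for value, key in zip(values, keyed_labels):
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    centroids = {key: sums[key] / counts[key] for key in sums}

    def classifier(value):
        return min(centroids, key=lambda key: abs(centroids[key] - value))

    return classifier


# Hypothetical training data: arrival delay in minutes and a text label per flight.
delays = [-12, -8, -1, 0, 2, 25, 40, 33]
labels = ["early", "early", "on time", "on time", "on time", "late", "late", "late"]

label_keys = build_label_keys(labels)                      # text label -> numeric key
key_to_label = {key: label for label, key in label_keys.items()}
classifier = train_nearest_centroid(delays, [label_keys[l] for l in labels])

for delay in (-15, 1, 30):
    print(delay, "->", key_to_label[classifier(delay)])
```

The classifier itself works only with numeric keys; key_to_label maps its predictions back to the original text labels.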
Regression Analysis
Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable
(often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates',
or 'features').
In machine learning, regression is a supervised task that is used to predict the value of the label from a set of related features.
The label can be of any real value and is not from a finite set of values as in classification tasks. Regression
algorithms model the dependency of the label on its related features to determine how the label will change
as the values of the features are varied. The input of a regression algorithm is a set of examples with labels of
known values. The output of a regression algorithm is a function, which you can use to predict the label value
for any new set of input features.
Examples of regression scenarios include:
• Predicting house prices based on house attributes such as number of bedrooms, location, or size.
• Predicting future stock prices based on historical data and current market trends.
• Predicting sales of a product based on advertising budgets.
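As a sketch of a regression algorithm whose input is a set of examples with known labels and whose output is a function, the fragment below fits a simple linear regression with one feature using the closed-form least-squares solution. The advertising budgets and sales figures are made up for illustration and are chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_simple_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; return a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(x):
        """Predict the label value for a new feature value x."""
        return intercept + slope * x

    return predict


# Hypothetical data: advertising budget (feature) and product sales (label).
budgets = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]   # constructed as 2 * budget + 5

predict = fit_simple_regression(budgets, sales)
print(predict(60))  # 125.0
```

Unlike a classifier, predict can return any real value, not just one of a finite set of classes.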
Clustering
An unsupervised machine learning task that is used to group instances of data into clusters that contain similar
characteristics. Clustering can also be used to identify relationships in a dataset that you might not logically
derive by browsing or simple observation. The inputs and outputs of a clustering algorithm depend on the
methodology chosen. You can take a distribution, centroid, connectivity, or density-based approach.
Examples of clustering scenarios include:
• Understanding segments of hotel guests based on habits and characteristics of hotel choices.
• Identifying customer segments and demographics to help build targeted advertising campaigns.
• Categorizing inventory based on manufacturing metrics.
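One centroid-based approach is k-means, sketched below in plain Python. The choice of k-means, the naive initialization (the first k points serve as the starting centroids), the fixed iteration count, and the two-dimensional data points are all assumptions made for illustration. No labels are supplied, which is what makes the task unsupervised.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (assignments, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive initialization: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignments, centroids


# Hypothetical guests described by two numeric characteristics.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

assignments, centroids = k_means(points, k=2)
print(assignments)
print(centroids)
```

The output is a cluster index per point plus the cluster centers; other methodologies, such as density-based ones, produce different kinds of output.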
Association Rule
Association rule learning is a rule-based machine learning method for discovering interesting relations between
variables in large databases. It is intended to identify strong rules discovered in databases using some measures of
interestingness.
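Two commonly used measures of interestingness are support and confidence; the text does not name specific measures, so treating these two as the measures is an assumption of this sketch. The fragment below computes both for a candidate rule over a small set of market-basket transactions that are made up for illustration.

```python
# Hypothetical transaction database: each transaction is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction also containing the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Candidate rule: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print("support:", rule_support)
print("confidence:", round(rule_confidence, 2))
```

A rule would typically be reported as strong only if both values exceed thresholds chosen by the analyst.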