
UNIT I INTRODUCTION TO MACHINE LEARNING

Types of Machine Learning, Supervised learning: Classification, Regression, Unsupervised
learning, Generative and Discriminative Models, Some basic concepts in machine learning, The
Machine Learning Process, Reinforcement Learning.

Definition of machine learning


Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence,
coined the term “Machine Learning” in 1959 while at IBM. He defined machine learning as “the
field of study that gives computers the ability to learn without being explicitly programmed.”
However, there is no universally accepted definition for machine learning. Different authors define the
term differently.
Two more definitions:
1. Machine learning is programming computers to optimize a performance criterion using example
data or past experience. We have a model defined up to some parameters, and learning is the execution
of a computer program to optimize the parameters of the model using the training data or past
experience. The model may be predictive, to make predictions in the future; descriptive, to gain
knowledge from data; or both.
2. The field of study known as machine learning is concerned with the question of how to construct
computer programs that automatically improve with experience.
Definition of learning
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognizing and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience: A sequence of images and steering commands recorded while observing a
human driver
iii) A chess learning problem
• Task T: Playing chess



• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
A computer program which learns from experience is called a machine learning program or simply a
learning program. Such a program is sometimes also referred to as a learner.

How do machines learn?


Basic components of learning process
The learning process, whether by a human or a machine, can be divided into four components, namely,
data storage, abstraction, generalization and evaluation. Figure illustrates the various components and
the steps involved in the learning process.

1. Data storage
Facilities for storing and retrieving huge amounts of data are an important component of the learning
process. Humans and computers alike utilize data storage as a foundation for advanced reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store
data and use cables and other technology to retrieve data.
2. Abstraction
The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This involves creating general
concepts about the data as a whole. The creation of knowledge involves application of known models
and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been trained, the
data is transformed into an abstract form that summarizes the original information.
3. Generalization
The third component of the learning process is known as generalisation.



The term generalization describes the process of turning the knowledge about stored data into a form
that can be utilized for future action. These actions are to be carried out on tasks that are similar, but
not identical, to those that have been seen before. In generalization, the goal is to discover those
properties of the data that will be most relevant to future tasks.
4. Evaluation
Evaluation is the last component of the learning process.
It is the process of giving feedback to the user to measure the utility of the learned knowledge.
This feedback is then utilised to effect improvements in the whole learning process.



Difference between Traditional and Machine Learning Programming
● In traditional programming, we feed input data and a well-written, tested program into a
machine to generate the output.
● In machine learning, input data along with the output is fed into the machine during the
learning phase, and it works out a program for itself.

Understanding data
The different types and forms of data that are encountered in the machine learning process are described below.
Unit of observation
By a unit of observation we mean the smallest entity with measured properties of interest for a study.
Examples
• A person, an object or a thing
• A time point
• A geographic region
• A measurement
Sometimes, units of observation are combined to form units such as person-years.
Examples and features
Datasets that store the units of observation and their properties can be imagined as collections of data
consisting of the following:
• Examples
 An “example” is an instance of the unit of observation for which properties have been recorded.
 An “example” is also referred to as an “instance”, or “case” or “record.” (It may be noted that
the word “example” has been used here in a technical sense.)
• Features



A “feature” is a recorded property or a characteristic of examples. It is also referred to as an “attribute”
or a “variable.”
Examples for “examples” and “features”
1. Cancer detection
Consider the problem of developing an algorithm for detecting cancer. In this study we note the
following.
(a) The units of observation are the patients.
(b) The examples are members of a sample of cancer patients.
(c) The following attributes of the patients may be chosen as the features:
 gender
 age
 blood pressure
 the findings of the pathology report after a biopsy
2. Pet selection
Suppose we want to predict the type of pet a person will choose.
(a) The units are the persons.
(b) The examples are members of a sample of persons who own pets.
(c) The features might include age, home region, family income, etc. of persons who own pets.

Figure Example for “examples” and “features” collected in a matrix format (data relates to automobiles and
their features)
Spam e-mail
Let it be required to build a learning algorithm to identify spam e-mail.
(a) The unit of observation could be an e-mail message.
(b) The examples would be specific messages.
(c) The features might consist of the words used in the messages.



Examples and features are generally collected in a “matrix format”.
Different forms of data
1. Numeric data
If a feature represents a characteristic measured in numbers, it is called a numeric feature.
2. Categorical or nominal
A categorical feature is an attribute that can take on one of a limited, and usually fixed, number of
possible values on the basis of some qualitative property. A categorical feature is also called a nominal
feature.
3. Ordinal data
This denotes a nominal variable with categories falling in an ordered list. Examples include clothing
sizes such as small, medium, and large, or a measurement of customer satisfaction on a scale from “not
at all happy” to “very happy.”
Examples
In the data given above, the features “year”, “price” and “mileage” are numeric, and the features
“model”, “color” and “transmission” are categorical.
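To make the matrix format concrete, here is a minimal sketch (using the pandas library) of an examples-by-features table for car data of this kind; the rows and values are hypothetical, for illustration only.

    # Each row is an example (a car); each column is a feature.
    # All values below are made up for illustration.
    import pandas as pd

    cars = pd.DataFrame({
        "model":        ["sedan", "hatchback", "suv"],   # categorical
        "year":         [2015, 2018, 2012],              # numeric
        "price":        [8500, 11200, 6900],             # numeric
        "mileage":      [60000, 35000, 90000],           # numeric
        "color":        ["red", "blue", "white"],        # categorical
        "transmission": ["manual", "auto", "manual"],    # categorical
    })

    print(cars.shape)   # (3, 6): 3 examples (rows) x 6 features (columns)
    print(cars.dtypes)  # numeric columns vs. object (categorical) columns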

1.1 TYPES OF MACHINE LEARNING

Based on the methods and way of learning, machine learning is divided into mainly four types,
which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-supervised learning
4. Reinforcement Learning



1. Supervised learning
Supervised learning is the subcategory of machine learning that focuses on learning a classification
or regression model, that is, learning from labeled training data (i.e., inputs that also contain the
desired outputs or targets; basically, “examples” of what we want to predict).

Figure: Illustration of a binary classification problem (plus, minus) and two feature variables (x1 and x2).



Figure: Illustration of a linear regression model with one feature variable (x1) and the target variable y. The
dashed line indicates the functional form of the linear regression model.

2. Unsupervised learning
In contrast to supervised learning, unsupervised learning is a branch of machine learning that is
concerned with unlabeled data. Common tasks in unsupervised learning are clustering analysis
(assigning group memberships) and dimensionality reduction (compressing data onto a lower-
dimensional subspace or manifold).

Figure Illustration of clustering, where the dashed lines indicate potential group membership assignments of
unlabeled data points.



3. Semi-Supervised Learning
Loosely speaking, semi-supervised learning can be described as a mix between supervised and unsupervised
learning. In semi-supervised learning tasks, some training examples contain outputs, but some do not. We then
use the labeled training subset to label the unlabeled portion of the training set, which we then also utilize for
model training.
 Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning.
 It represents the intermediate ground between Supervised (With Labelled training data) and
Unsupervised learning (with no labelled training data) algorithms and uses the combination of
labelled and unlabeled datasets during the training period.
 Although Semi-supervised learning is the middle ground between supervised and unsupervised
learning and operates on the data that consists of a few labels, it mostly consists of unlabeled
data.
 Labels are costly to obtain, so in practice an organization may have only a small number of
labeled examples.
 It differs from both supervised and unsupervised learning, which are defined by the presence
and absence of labels respectively.
To overcome the drawbacks of supervised learning and unsupervised learning algorithms, the
concept of Semi-supervised learning is introduced.
 The main aim of semi-supervised learning is to effectively use all the available data, rather than
only labelled data like in supervised learning.
 Initially, similar data is clustered along with an unsupervised learning algorithm, and further,
it helps to label the unlabeled data into labelled data.
 It is because labelled data is a comparatively more expensive acquisition than unlabeled data.
 We can imagine these algorithms with an example.
 Supervised learning is where a student is under the supervision of an instructor at home and
college. Further, if that student is self-analysing the same concept without any help from the
instructor, it comes under unsupervised learning.
 Under semi-supervised learning, the student has to revise the concept on his own after first
analyzing it under the guidance of an instructor at college.

4. Reinforcement learning
Reinforcement is the process of learning from rewards while performing a series of actions. In
reinforcement learning, we do not tell the learner or agent, for example, a (ro)bot, which action to take
but merely assign a reward to each action and/or the overall outcome. Instead of having a “correct/false”
label for each step, the learner must discover or learn a behavior that maximizes the reward for a series
of actions. In that sense, it is not a supervised setting and somewhat related to unsupervised learning;
however, reinforcement learning really is its own category of machine learning. Reinforcement
learning will not be covered further in this class.
Typical applications of reinforcement learning involve playing games (chess, Go, Atari video games)
and some form of robots, e.g., drones, warehouse robots, and more recently self-driving cars.

Figure Illustration of reinforcement learning

1.2 SUPERVISED LEARNING

Supervised learning
● Supervised learning is a type of machine learning that uses labeled data to train machine
learning models.
Working:
 Supervised learning algorithms take labeled inputs and map them to the known outputs, which
means you already know the target variable.
 Supervised learning methods need external supervision to train machine learning models;
hence the name “supervised.” They need guidance and additional information to return the
desired result.
 In supervised learning, models are trained using a labelled dataset, where the model learns about
each type of data. Once the training process is completed, the model is tested on the basis of
test data (a portion of the data held out from training), and then it predicts the output.



Figure : Supervised learning
Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and
polygons. Now the first step is that we need to train the model for each shape.
 If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
 If the given shape has three sides, then it will be labelled as a triangle.
 If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify the
shape. The machine is already trained on all types of shapes, and when it finds a new shape, it classifies
the shape on the basis of its number of sides and predicts the output.
Steps Involved in Supervised Learning:
 First, determine the type of training dataset.
 Collect/gather the labelled training data.
 Split the dataset into training, validation, and test sets.
 Determine the input features of the training dataset, which should carry enough information so
that the model can accurately predict the output.
 Determine a suitable algorithm for the model, such as a support vector machine, decision tree,
etc.
 Execute the algorithm on the training dataset. Sometimes we need validation sets to tune the
control parameters; these are subsets of the training data.
 Evaluate the accuracy of the model using the test set. If the model predicts the correct
outputs, the model is accurate. (A minimal workflow sketch is given below.)
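The steps above can be traced in a minimal sketch using scikit-learn; the dataset (Iris), the algorithm (a decision tree), and the 80/20 split ratio are illustrative choices, not prescribed by these notes.

    # A minimal supervised-learning workflow: collect labelled data,
    # split it, train a chosen algorithm, and evaluate on held-out data.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)          # labelled training data

    # Split into training and test sets (a validation set could be
    # carved out of the training portion in the same way).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    model = DecisionTreeClassifier()           # choose a suitable algorithm
    model.fit(X_train, y_train)                # execute it on the training set

    # Evaluate accuracy on the held-out test set.
    print(accuracy_score(y_test, model.predict(X_test)))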



Types of supervised Machine learning Algorithms:
Supervised learning can be further divided into two types of problems:

a. Regression: -
Regression algorithms are used when the output variable is a continuous numeric value and there is a
relationship between the input variables and the output variable. They are used for the prediction of
continuous quantities, as in weather forecasting, market trend analysis, etc.
Popular Regression algorithms which come under supervised learning:
 Linear Regression
 Regression Trees
 Non-Linear Regression
 Bayesian Linear Regression
 Polynomial Regression
b. Classification: -
Classification algorithms are used when the output variable is categorical, which means the output
falls into one of a set of classes such as Yes/No, Male/Female, True/False, etc.
Popular classification algorithms which come under supervised learning:
 Random Forest
 Decision Trees
 Logistic Regression
 Support vector Machines
Advantages of Supervised Learning: -
 With the help of supervised learning, the model can predict the output on the basis of prior
experiences.
 In supervised learning, we can have an exact idea about the classes of objects.
 Supervised learning model helps us to solve various real-world problems such as fraud
detection, spam filtering, etc.



Disadvantages of supervised Learning:
 Supervised learning models are not suitable for handling very complex tasks.
 Supervised learning cannot predict the correct output if the test data is very different from the
training dataset.
 Training can require a lot of computation time.
 In supervised learning, we need sufficient knowledge about the classes of objects.
Algorithms:
Some of the most popularly used supervised learning algorithms are:
● Linear Regression
● Logistic Regression
● Support Vector Machine
● K Nearest Neighbor
● Decision Tree
● Random Forest
● Naive Bayes
● Neural Networks
Some applications of Supervised Learning include:
1. Email spam detection, by teaching a model what mail is spam and what is not.
2. Speech recognition where you teach a machine to recognize your voice.
3. Object Recognition by showing a machine what an object looks like and having it pick
that object from among other objects.

1.3 CLASSIFICATION

The classification problem consists of taking input vectors and deciding which of N classes they belong
to, based on training from exemplars of each class.
Example 1: Credit scoring. Differentiating between low-risk and high-risk customers from their
income and savings.
In credit scoring , the bank calculates the risk given the amount of credit and the information about the
customer. The information about the customer includes data we have access to and is relevant in
calculating his or her financial capacity—namely, income, savings, collaterals, profession, age, past
financial history, and so forth. The bank has a record of past loans containing such customer data and
whether the loan was paid back or not. From this data of particular applications, the aim is to infer a



general rule coding the association between a customer’s attributes and his risk. That is, the machine
learning system fits a model to the past data to be able to calculate the risk for a new application and
then decides to accept or refuse it accordingly.

The most important point about the classification problem is that it is discrete—each example belongs
to precisely one class, and the set of classes covers the whole possible output space.
This is an example of a classification problem where there are two classes: low-risk and high-risk
customers. The information about a customer makes up the input to the classifier whose task is to
assign the input to one of the two classes. After training with the past data, a classification rule learned
may be of the form
IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
Discriminant is a function that separates the examples of different classes.
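As an illustration of such a learned rule, here is a minimal sketch in Python; the threshold values θ1 and θ2 are hypothetical numbers a learner might have estimated from past loan data.

    # The IF-THEN classification rule above as code; the thresholds
    # THETA1 and THETA2 are hypothetical, for illustration only.
    THETA1 = 30000   # income threshold (hypothetical)
    THETA2 = 10000   # savings threshold (hypothetical)

    def credit_risk(income, savings):
        """Discriminant learned from past loan data (illustrative only)."""
        if income > THETA1 and savings > THETA2:
            return "low-risk"
        return "high-risk"

    print(credit_risk(income=45000, savings=15000))  # low-risk
    print(credit_risk(income=20000, savings=5000))   # high-risk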

Example 2: Coin classification


We train the classifier to recognise all New Zealand coins, but what if a British coin is put into the
machine? In that case, the classifier will identify it as the New Zealand coin that is closest to it in
appearance, but this is not really what is wanted: rather, the classifier should identify that it is not one
of the coins it was trained on. This is called novelty detection. For now we’ll assume that we will not
receive inputs that we cannot classify accurately.
Let’s consider how to set up a coin classifier. When the coin is pushed into the slot, the machine takes
a few measurements of it. These could include the diameter, the weight, and possibly the shape, and
are the features that will generate our input vector. In this case, our input vector will have three
elements, each of which will be a number showing the measurement of that feature (choosing a number
to represent the shape would involve an encoding, for example that 1=circle, 2=hexagon, etc.). Of
course, there are many other features that we could measure. If our vending machine included an
atomic absorption spectroscope, then we could estimate the density of the material and its composition,
or if it had a camera, we could take a photograph of the coin and feed that image into the classifier.
The question of which features to choose is not always an easy one. We don’t want to use too many
inputs, because that will make the training of the classifier take longer (and also, as the number of input
dimensions grows, the number of datapoints required increases faster; this is known as the curse of
dimensionality), but we need to make sure that we can reliably separate the classes based on those
features.

FIGURE The New Zealand coins

FIGURE Left: A set of straight line decision boundaries for a classification problem. Right: An alternative set of
decision boundaries that separates the plusses from the lightning strikes better, but requires a line that isn’t straight.

For example, if we tried to separate coins based only on colour, we wouldn’t get very far, because
the 20 ¢ and 50 ¢ coins are both silver and the $1 and $2 coins both bronze. However, if we use colour
and diameter, we can do a pretty good job of the coin classification problem for NZ coins. There are
some features that are entirely useless. For example, knowing that the coin is circular doesn’t tell us
anything about NZ coins, which are all circular. In other countries, though, it could be very useful.



The methods of performing classification that we will see in this book are very different in the
ways that they learn about the solution; in essence, though, they all aim to do the same thing: find decision
boundaries that can be used to separate out the different classes. Given the features that are used as
inputs to the classifier, we need to identify some values of those features that will enable us to decide
which class the current input is in. Figure shows a set of 2D inputs with three different classes shown,
and two different decision boundaries; on the left they are straight lines, and are therefore simple, but
don’t categorise as well as the non-linear curve on the right. Now that we have seen these two types of
problem, let’s take a look at the whole process of machine learning from the practitioner’s viewpoint.
Classification: Applications
● Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
● Character recognition: Different handwriting styles.
● Speech recognition: Temporal dependency.
● Medical diagnosis: From symptoms to illnesses
● Biometrics: Recognition/authentication using physical and/or behavioral characteristics:
Face, iris, signature, etc.

1.4 REGRESSION

 In machine learning, a regression problem is the problem of predicting the value of a numeric
output variable based on observed values of input variables.
 The value of the output variable may be a number, such as an integer or a floating point value.
These are often quantities, such as amounts and sizes.
 The input variables may be discrete or real-valued.
Example: Consider the data on car prices.

Table Prices of used cars: example data for regression


Suppose we are required to estimate the price of a car aged 25 years with distance 53240 KM and
weight 1200 pounds. This is an example of a regression problem because we have to predict the value
of the numeric variable “Price”.

General approach
Let x denote the set of input variables and y the output variable. In machine learning, the general
approach to regression is to assume a model, that is, some mathematical relation between x and y,
involving some parameters θ, in the following form:

y = f(x, θ)

The function f(x, θ) is called the regression function. The machine learning algorithm optimizes the
parameters θ such that the approximation error is minimized; that is, the estimates of the values
of the dependent variable y are as close as possible to the correct values given in the training set.
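A minimal sketch of this approach, assuming a linear regression function f(x, θ) = θ0 + θ1·x and made-up data points:

    # Fit a linear regression function by least squares; the data
    # (car age vs. price) is made up for illustration.
    import numpy as np

    x = np.array([2, 5, 8, 12, 15], dtype=float)    # e.g., age of car (years)
    y = np.array([30, 24, 19, 12, 9], dtype=float)  # e.g., price (in 1000s)

    # Least squares chooses theta to minimise the approximation error
    # sum_i (y_i - f(x_i, theta))^2; polyfit returns [theta1, theta0].
    theta1, theta0 = np.polyfit(x, y, deg=1)

    def f(x_new):
        return theta0 + theta1 * x_new

    print(f(10.0))   # predicted price of a 10-year-old car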

Regression: Applications
 Loan Default Prediction
 House Price Prediction
 Stock Market Prediction
 Market Sales Forecasting
 Advertising
1.5 UNSUPERVISED LEARNING

Unsupervised learning
Unsupervised learning is a type of machine learning in which models are trained using an unlabeled
dataset and are allowed to act on that data without any supervision.
The goal of unsupervised learning is to find the underlying structure of the dataset, group the data
according to similarities, and represent the dataset in a compressed format.
Working of Unsupervised Learning
Working of unsupervised learning can be understood by the below diagram:
Here, we have taken unlabeled input data, which means it is not categorized and corresponding
outputs are also not given. This unlabeled input data is fed to the machine learning model in order
to train it. First, the model interprets the raw data to find hidden patterns in the data, and then a
suitable algorithm, such as k-means clustering, is applied. Once applied, the algorithm divides the
data objects into groups according to the similarities and differences between the objects.



Types of Unsupervised Learning Algorithm:
The unsupervised learning algorithm can be further categorized into two types of problems:

1. Clustering: Clustering is a method of grouping objects into clusters such that objects with the most
similarities remain in one group and have few or no similarities with the objects of another group.
Cluster analysis finds the commonalities between the data objects and categorizes them as per the
presence and absence of those commonalities. (A clustering sketch is given after this list.)
2. Association: An association rule is an unsupervised learning method which is used for finding
relationships between variables in a large database. It determines the sets of items that occur together
in the dataset. Association rules make marketing strategy more effective; for example, people who buy
item X (say, bread) also tend to purchase item Y (butter/jam). A typical example of association rule
mining is Market Basket Analysis.
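Here is a minimal clustering sketch using k-means from scikit-learn; the 2-D points and the choice of k = 2 are illustrative assumptions.

    # Group unlabeled 2-D points into k=2 clusters; the data is made up
    # so that two groups are visually apparent.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],    # one apparent group
                  [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]])   # another group

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # group membership of each unlabeled point
    print(kmeans.cluster_centers_)  # centre of each discovered cluster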
Unsupervised Learning algorithms:
 K-means clustering
 Hierarchical clustering
 Anomaly detection
 Principal Component Analysis
 Independent Component Analysis
 Apriori algorithm
 Singular value decomposition

Advantages of Unsupervised Learning: -


 Unsupervised learning is used for more complex tasks as compared to supervised learning
because, in unsupervised learning, we don't have labeled input data.
 Unsupervised learning is often preferable in practice, as unlabeled data is much easier to obtain
than labeled data.
Disadvantages of Unsupervised Learning: -
 Unsupervised learning is intrinsically more difficult than supervised learning as it does not have
corresponding output.
 The result of the unsupervised learning algorithm might be less accurate as input data is not
labeled, and algorithms do not know the exact output in advance.
Supervised learning Vs Unsupervised learning
In supervised learning, the aim is to learn a mapping from the input to an output whose correct values
are provided by a supervisor. In unsupervised learning, there is no such supervisor and we only have
input data.

1.6 GENERATIVE AND DISCRIMINATIVE MODELS

Machine learning models can be classified into two types: Discriminative and Generative.
 In simple words, a discriminative model makes predictions on unseen data based on
conditional probability and can be used either for classification or regression problem
statements.
 On the contrary, a generative model focuses on the distribution of a dataset to return a
probability for a given example.

They are related to known effects of causal direction, classification vs. inference learning, and
observational vs. feedback learning.
Problem Formulation
Suppose we are working on a classification problem where our task is to decide if an email is spam or
not spam based on the words present in a particular email. To solve this problem, we have a joint model
over:
● Labels: Y = y, and
● Features: X = {x1, x2, …, xn}
Therefore, the joint distribution of the model can be represented as
P(Y, X) = P(y, x1, x2, …, xn)
Now, our goal is to estimate the probability of spam email i.e., P(Y=1|X). Both generative and
discriminative models can solve this problem but in different ways.
The Approach of Generative Models
In the case of generative models, to find the conditional probability P(Y|X), they estimate the prior
probability P(Y) and the likelihood P(X|Y) with the help of the training data, and use Bayes'
theorem to calculate the posterior probability P(Y|X):

P(Y|X) = P(X|Y) P(Y) / P(X)

The Approach of Discriminative Models


In the case of discriminative models, to find the probability, they directly assume some functional form
for P(Y|X) and then estimate the parameters of P(Y|X) with the help of the training data.
1. Discriminative Models
The discriminative model refers to a class of models used in Statistical Classification, mainly used for
supervised machine learning. These types of models are also known as conditional models since they
learn the boundaries between classes or labels in a dataset.
Discriminative models focus on modeling the decision boundary between classes in a classification
problem. The goal is to learn a function that maps inputs to outputs indicating the class label
of the input. Maximum likelihood estimation is often used to estimate the parameters of the
discriminative model, such as the coefficients of a logistic regression model or the weights of a neural
network.
Discriminative models (just as in the literal meaning) separate classes instead of modeling how the
data is distributed, and they do not make strong assumptions about the data points. But these models are
not capable of generating new data points. Therefore, the ultimate objective of discriminative models
is to separate one class from another.



If we have some outliers present in the dataset, discriminative models work better compared to
generative models i.e., discriminative models are more robust to outliers. However, one major
drawback of these models is the misclassification problem, i.e., wrongly classifying a data point.

The Mathematics of Discriminative Models


Training discriminative classifiers or discriminant analysis involves estimating a function f: X -> Y,
or probability P(Y|X)
● Assume some functional form for the probability, such as P(Y|X)
● With the help of training data, we estimate the parameters of P(Y|X)
Examples of Discriminative Models
● Logistic regression
● Support vector machines (SVMs)
● Traditional neural networks
● Nearest neighbor
● Conditional Random Fields (CRFs)
● Decision Trees and Random Forest
2. Generative Models
Generative models are considered a class of statistical models that can generate new data instances.
These models are used in unsupervised machine learning as a means to perform tasks such as
● Probability and Likelihood estimation,
● Modeling data points
● To describe the phenomenon in data,
● To distinguish between classes based on these probabilities.
Since these models often rely on the Bayes theorem to find the joint probability, generative models
can tackle a more complex task than analogous discriminative models.



So, the Generative approach focuses on the distribution of individual classes in a dataset, and the
learning algorithms tend to model the underlying patterns or distribution of the data points (e.g.,
Gaussian). These models use the concept of joint probability and create instances where a given feature
(x) or input and the desired output or label (y) exist simultaneously.
These models use probability estimates and likelihood to model data points and differentiate between
different class labels present in a dataset. Unlike discriminative models, these models can also generate
new data points.
However, they also have a major drawback: if outliers are present in the dataset, they affect these
types of models to a significant extent.

The Mathematics of Generative Models


Training generative classifiers involves estimating a function f: X -> Y, or probability P(Y|X):
● Assume some functional form for the probabilities such as P(Y), P(X|Y)
● With the help of training data, we estimate the parameters of P(X|Y), P(Y)
● Use the Bayes theorem to calculate the posterior probability P(Y |X)
Examples of Generative Models
● Naïve Bayes
● Bayesian networks
● Markov random fields
● Hidden Markov Models (HMMs)
● Latent Dirichlet Allocation (LDA)
● Generative Adversarial Networks (GANs)
● Autoregressive Model
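To contrast the two approaches in code, here is a minimal sketch that trains a generative classifier (Gaussian Naive Bayes, which estimates P(X|Y) and P(Y)) and a discriminative one (logistic regression, which estimates P(Y|X) directly) on the same data; the synthetic dataset is an illustrative assumption.

    # Train a generative and a discriminative classifier on the same
    # synthetic data and compare their test accuracies.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    generative = GaussianNB().fit(X_train, y_train)              # models P(X|Y), P(Y)
    discriminative = LogisticRegression().fit(X_train, y_train)  # models P(Y|X)

    print(generative.score(X_test, y_test))
    print(discriminative.score(X_test, y_test))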



Difference between Discriminative and Generative Models
Let’s see some of the differences between the Discriminative and Generative Models.
a. Core Idea
Discriminative models draw boundaries in the data space, while generative models try to model how
data is placed throughout the space. A generative model explains how the data was generated, while a
discriminative model focuses on predicting the labels of the data.
b. Mathematical Intuition
In mathematical terms, discriminative machine learning trains a model, which is done by learning
parameters that maximize the conditional probability P(Y|X). On the other hand, a generative model
learns parameters by maximizing the joint probability of P(X, Y).
c. Applications
Discriminative models recognize existing data, i.e., discriminative modeling identifies tags and sorts
data and can be used to classify data, while generative modeling can produce new data.
Since these models use different approaches to machine learning, both are suited for specific tasks i.e.,
Generative models are useful for unsupervised learning tasks. In contrast, discriminative models are
useful for supervised learning tasks. GANs (Generative Adversarial Networks) can be thought of as a
competition between the generator, which is the generative component, and the discriminator, which
is the discriminative component; so basically, it is a generative vs. discriminative model.
d. Outliers
Outliers have a greater impact on generative models than on discriminative models.
e. Computational Cost
Discriminative models are computationally cheap as compared to generative models.
Comparison between Discriminative and Generative Models
Let’s see some of the comparisons based on the following criteria between Discriminative and
Generative Models:
a. Based on Performance
Generative models need less data to train compared with discriminative models, since generative
models are more biased: they make stronger assumptions (e.g., the assumption of conditional
independence).
b. Based on Missing Data
In general, if we have missing data in our dataset, then Generative models can work with these missing
data, while discriminative models can’t. This is because, in generative models, we can still estimate
the posterior by marginalizing the unseen variables. However, discriminative models usually require
all the features X to be observed.
c. Based on the Accuracy Score
If the assumption of conditional independence is violated, then generative models are less
accurate than discriminative models.
d. Based on Applications
Discriminative models are called “discriminative” since they are useful for discriminating Y’s label,
i.e., the target outcome, so they are mainly used for classification problems. In contrast, generative
models have more applications besides classification, such as sampling, Bayes learning, MAP inference, etc.

1.7 BASIC CONCEPTS IN MACHINE LEARNING

Terminologies used in machine learning:


 Training example: a sample from x including its output from the target function
 Target function: the mapping function f from x to f(x)
 Hypothesis: approximation of f, a candidate function.
 Concept: A boolean target function; positive examples and negative examples correspond to the 1/0
class values.
 Classifier: Learning program outputs a classifier that can be used to classify.
 Learner: Process that creates the classifier.
 Hypothesis space: set of possible approximations of f that the algorithm can create.
 Version space: subset of the hypothesis space that is consistent with the observed data.
1. Input representation
The general classification problem is concerned with assigning a class label to an unknown instance,
given instances with known label assignments. In a real-world problem, a given situation or an object
will have a large number of features, which may contribute to the assignment of the labels.
2. Hypothesis
In a binary classification problem, a hypothesis is a statement or a proposition purporting to explain a
given set of facts or observations.
3. Hypothesis space
The hypothesis space for a binary classification problem is the set of hypotheses for the problem that
might possibly be returned by the learning algorithm.
4. Consistency and satisfying : Let x be an example in a binary classification problem and let c(x)
denote the class label assigned to x (c(x) is 1 or 0). Let D be a set of training examples for the problem.
Let h be a hypothesis for the problem and h(x) be the class label assigned to x by the hypothesis h.



(a) We say that the hypothesis h is consistent with the set of training examples D if h(x) = c(x) for all
x ∈ D.
(b) We say that an example x satisfies the hypothesis h if h(x) = 1.
5. Ordering of hypotheses
Let X be the set of all possible examples for a binary classification problem and let h1 and h2 be two
hypotheses for the problem. Let S1 and S2 denote the sets of examples satisfying h1 and h2 respectively.

Figure: Hypothesis h1 is more general than hypothesis h2 if and only if S2 ⊆ S1

6. Version space
Consider a binary classification problem. Let D be a set of training examples and H a hypothesis
space for the problem. The version space for the problem with respect to the set D and the space H is
the set of hypotheses from H consistent with D; that is, it is the set

VS(H, D) = { h ∈ H : h(x) = c(x) for all x ∈ D }
7. Noise
Noise and its sources
Noise is any unwanted anomaly in the data. Noise may arise due to several factors:
1. There may be imprecision in recording the input attributes, which may shift the data points in
the input space.
2. There may be errors in labeling the data points, which may relabel positive instances as negative
and vice versa. This is sometimes called teacher noise.
3. There may be additional attributes, which we have not taken into account, that affect the label of an
instance. Such attributes may be hidden or latent in that they may be unobservable. The effect of these
neglected attributes is thus modeled as a random component and is included in “noise.”
8. Learning multiple classes
So far we have been discussing binary classification problems. In a general case there may be more
than two classes. Two methods are generally used to handle such cases. These methods are known
by the names “one-against-all" and “one-against-one”.
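A minimal sketch of the one-against-all (one-vs-rest) strategy using scikit-learn, in which one binary classifier is trained per class; the dataset (Iris, three classes) and the base classifier (a linear SVM) are illustrative choices.

    # One-against-all: fit one binary classifier per class, each
    # separating that class from all the others.
    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)
    clf = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)

    print(len(clf.estimators_))   # 3 binary classifiers, one per class
    print(clf.predict(X[:5]))     # each input goes to the class whose
                                  # binary classifier is most confident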



9. Model selection
In order to formulate a hypothesis for a problem, we have to choose some model and the term
“model selection” has been used to refer to the process of choosing a model. However, the term has
been used to indicate several things. In some contexts it has been used to indicate the process of
choosing one particular approach from among several different approaches. This may be choosing an
appropriate algorithm from a selection of possible algorithms, or choosing the sets of features to be
used for input, or choosing initial values for certain parameters. Sometimes “model selection” refers
to the process of picking a particular mathematical model from among different mathematical models
which all purport to describe the same data set. It has also been described as the process of choosing
the right inductive bias.
10. Inductive bias
In a learning problem we only have the data. But data by itself is not sufficient to find the solution. We
should make some extra assumptions to have a solution with the data we have. The set of assumptions
we make to have learning possible is called the inductive bias of the learning algorithm. One way we
introduce inductive bias is when we assume a hypothesis class.
11. Generalisation
How well a model trained on the training set predicts the right output for new instances is called
generalization.
Generalization refers to how well the concepts learned by a machine learning model apply to specific
examples not seen by the model when it was learning. The goal of a good machine learning model is
to generalize well from the training data to any data from the problem domain. This allows us to make
predictions in the future on data the model has never seen. Overfitting and underfitting are the two
biggest causes of poor performance of machine learning algorithms. The model selected should be the
one with the best generalisation; this is the case when both of the following problems are avoided
(a sketch contrasting them follows the definitions).
 Underfitting
Underfitting is the production of a machine learning model that is not complex enough to accurately
capture relationships between a dataset's features and a target variable.
 Overfitting
Overfitting is the production of an analysis which corresponds too closely or exactly to a particular set
of data, and may therefore fail to fit additional data or predict future observations reliably.
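A minimal sketch contrasting the two problems, assuming noisy quadratic data and polynomial models of varying degree (all numbers are illustrative):

    # Fit polynomials of degree 1 (too simple: underfits), 2 (about
    # right), and 9 (too flexible: overfits) to noisy quadratic data.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 20)
    y = x**2 + rng.normal(scale=1.0, size=x.shape)   # true relation + noise

    for degree in (1, 2, 9):
        coeffs = np.polyfit(x, y, deg=degree)
        y_hat = np.polyval(coeffs, x)
        train_error = np.mean((y - y_hat) ** 2)
        # Degree 1 underfits (high training error); degree 9 drives the
        # training error very low but would generalise poorly to new x.
        print(degree, round(train_error, 3))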



1.8 THE MACHINE LEARNING PROCESS

This section briefly examines the process by which machine learning algorithms can be selected, applied,
and evaluated for a problem.
Steps in Machine Learning Process
1. Data Collection and Preparation
2. Feature Selection
3. Algorithm Choice
4. Parameter and Model Selection
5. Training
6. Evaluation

Data Collection and Preparation


Collect a reasonably small dataset with all of the features that you believe might be useful, and
experiment with it before choosing the best features and collecting and analysing the full dataset.
Often the difficulty is that there is a large amount of data that might be relevant, but it is hard to collect,
either because it requires many measurements to be taken, or because they are in a variety of places
and formats, and merging it appropriately is difficult, as is ensuring that it is clean; that is, it does not
have significant errors, missing data, etc.
For supervised learning, target data is also needed, which can require the involvement of experts in the
relevant field and significant investments of time.
Finally, the quantity of data needs to be considered. Machine learning algorithms need significant
amounts of data, preferably without too much noise, but with increased dataset size comes increased
computational costs, and the sweet spot at which there is enough data without excessive computational
overhead is generally impossible to predict.
Feature Selection
An example of this part of the process was given above, when we looked at possible features that might
be useful for coin recognition. It consists of identifying the features that are most useful for the problem
under examination. This invariably requires prior knowledge of the problem and the data; our common
sense was used in the coins example above to identify some potentially useful features and to exclude
others.
As well as the identification of features that are useful for the learner, it is also necessary that the
features can be collected without significant expense or time, and that they are robust to noise and
other corruption of the data that may arise in the collection process.
Algorithm Choice
Given the dataset, the choice of an appropriate algorithm (or algorithms) is what this book should be
able to prepare you for, in that the knowledge of the underlying principles of each algorithm and
examples of their use is precisely what is required for this.
Parameter and Model Selection
For many of the algorithms there are parameters that have to be set manually, or that require
experimentation to identify appropriate values; a sketch of such experimentation is given below.
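Here is a minimal sketch of such a cross-validated grid search using scikit-learn; the algorithm (an SVM) and the candidate parameter values are illustrative assumptions.

    # Try each combination of candidate parameter values and keep the
    # one with the best cross-validated score.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    grid = GridSearchCV(
        SVC(),
        param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
        cv=5)
    grid.fit(X, y)

    print(grid.best_params_)   # parameter values chosen by experimentation
    print(grid.best_score_)    # cross-validated accuracy of that choice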
Training
Given the dataset, algorithm, and parameters, training should be simply the use of computational
resources in order to build a model of the data that can predict the outputs on new data.
Evaluation
Before a system can be deployed it needs to be tested and evaluated for accuracy on data that it was
not trained on. This can often include a comparison with human experts in the field, and the selection
of appropriate metrics for this comparison.



1.9 REINFORCEMENT LEARNING
Reinforcement learning
 Reinforcement learning works on a feedback-based process, in which an AI agent (a software
component) automatically explores its surroundings by trial and error, taking actions, learning
from experience, and improving its performance.
 The agent gets rewarded for each good action and punished for each bad action; hence the goal
of a reinforcement learning agent is to maximize the rewards.
 In reinforcement learning, there is no labelled data like supervised learning, and agents learn
from their experiences only.
 The reinforcement learning process is similar to that of a human being; for example, a child learns
various things by experiences in his day-to-day life.
 An example of reinforcement learning is to play a game, where the Game is the environment,
moves of an agent at each step define states, and the goal of the agent is to get a high score.
 The agent receives feedback in terms of punishments and rewards. Due to its way of working,
reinforcement learning is employed in different fields such as game theory, operations
research, information theory, and multi-agent systems.
 A reinforcement learning problem can be formalized using a Markov Decision Process (MDP).
In MDP, the agent constantly interacts with the environment and performs actions; at each
action, the environment responds and generates a new state.
The figure below shows how the agent, its states, and its actions are linked with each other and with the reward.
In reinforcement learning the algorithm gets feedback in the form of the reward about how well it is
doing. In contrast to supervised learning, where the algorithm is ‘taught’ the correct answer, the reward
function evaluates the current solution, but does not suggest how to improve it. We therefore need to
allow for rewards that don’t appear until long after the relevant actions have been taken. Sometimes
we distinguish between the immediate reward and the total expected reward into the future.

FIGURE The reinforcement learning cycle: the learning agent performs action a_t in state s_t and receives reward r_{t+1}
from the environment, ending up in state s_{t+1}.



Once the algorithm has decided on the reward, it needs to choose the action that should be performed
in the current state. This is known as the policy. This is done based on some combination of exploration
and exploitation (remember, reinforcement learning is basically a search method), which in this case
means deciding whether to take the action that gave the highest reward last time we were in this state,
or trying out a different action in the hope of finding something even better.
Advantages of Reinforcement Learning: -
 It helps in solving complex real-world problems which are difficult to solve by general
techniques.
 The learning model of RL is similar to the way human beings learn; hence highly accurate
results can be obtained.
 Helps in achieving long-term results.
Disadvantage of Reinforcement Learning: -
 RL algorithms are not preferred for simple problems.
 RL algorithms require huge data and computations.
 Too much reinforcement learning can lead to an overload of states which can weaken the
results.
 The curse of dimensionality limits reinforcement learning for real physical systems.
Some of the important reinforcement learning algorithms are:
1. Q-learning (sketched after this list)
2. Sarsa
3. Monte Carlo
4. Deep Q network
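As an illustration of the first of these, here is a minimal sketch of the Q-learning update rule on a toy problem; the number of states and actions, the rewards, and the learning parameters are illustrative assumptions.

    # One Q-learning step:
    #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    # on a toy 2-state, 2-action problem (all values illustrative).
    import numpy as np

    n_states, n_actions = 2, 2
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.9          # learning rate and discount factor

    def q_update(s, a, r, s_next):
        """Move Q(s, a) toward the observed reward-plus-future-value target."""
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])

    # Agent in state 0 takes action 1, receives reward +1, lands in state 1.
    q_update(s=0, a=1, r=1.0, s_next=1)
    print(Q)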

Applications
Reinforcement learning algorithms are widely used in the gaming industries to build games. It is also
used to train robots to do human tasks.

This is where the name ‘reinforcement learning’ comes from: the agent repeats actions that are
reinforced by rewards, just as you repeat actions that are reinforced by a feeling of satisfaction.

