Machine Learning
Machine Learning
This machine learning tutorial gives you an introduction to machine learning along with
the wide range of machine learning techniques such as Supervised, Unsupervised,
and Reinforcement learning. You will learn about regression and classification models,
clustering methods, hidden Markov models, and various sequential models.
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it
predicts the output.
The system creates a model using labeled data to understand the datasets and learn
about each data, once the training and processing are done then we test the model by
providing a sample data to check whether it is predicting the exact output or not.
The goal of supervised learning is to map input data with the output data. The
supervised learning is based on supervision, and it is the same as when a student learns
things in the supervision of the teacher. The example of supervised learning is spam
filtering.
o Classification
o Regression
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.
o Clustering
o Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning agent
gets a reward for each right action and gets a penalty for each wrong action. The agent
learns automatically with these feedbacks and improves its performance. In
reinforcement learning, the agent interacts with the environment and explores it. The
goal of an agent is to get the most reward points, and hence, it improves its
performance.
The robotic dog, which automatically learns the movement of his arms, is an example of
Reinforcement learning.
What is Bias?
In general, a machine learning model analyses the data, find patterns in it and make
predictions. While training, the model learns these patterns in the dataset and applies
them to test data for prediction. While making predictions, a difference occurs
between prediction values made by the model and actual values/expected
values, and this difference is known as bias errors or Errors due to bias. It can be
defined as an inability of machine learning algorithms such as Linear Regression to
capture the true relationship between the data points. Each algorithm begins with some
amount of bias because bias occurs from assumptions in the model, which makes the
target function simple to learn. A model has either:
o Low Bias: A low bias model will make fewer assumptions about the form of the target
function.
o High Bias: A model with a high bias makes more assumptions, and the model becomes
unable to capture the important features of our dataset. A high bias model also cannot
perform well on new data.
Generally, a linear algorithm has a high bias, as it makes them learn fast. The simpler the
algorithm, the higher the bias it has likely to be introduced. Whereas a nonlinear
algorithm often has low bias.
Some examples of machine learning algorithms with low bias are Decision Trees, k-
Nearest Neighbours and Support Vector Machines. At the same time, an algorithm
with high bias is Linear Regression, Linear Discriminant Analysis and Logistic
Regression.
Low variance means there is a small variation in the prediction of the target function
with changes in the training data set. At the same time, High variance shows a large
variation in the prediction of the target function with changes in the training dataset.
A model that shows high variance learns a lot and perform well with the training
dataset, and does not generalize well with the unseen dataset. As a result, such a model
gives good results with the training dataset but shows high error rates on the test
dataset.
Since, with high variance, the model learns too much from the dataset, it leads to
overfitting of the model. A model with high variance has the below problems:
Usually, nonlinear algorithms have a lot of flexibility to fit the model, have high variance.
Some examples of machine learning algorithms with low variance are, Linear
Regression, Logistic Regression, and Linear discriminant analysis. At the same time,
algorithms with high variance are decision tree, Support Vector Machine, and K-
nearest neighbours.
Linear Regression:
o Linear Regression is one of the most simple Machine learning algorithm that comes
under Supervised Learning technique and used for solving regression problems.
o It is used for predicting the continuous dependent variable with the help of independent
variables.
o The goal of the Linear regression is to find the best fit line that can accurately predict the
output for the continuous dependent variable.
o If single independent variable is used for prediction then it is called Simple Linear
Regression and if there are more than two independent variables then such regression is
called as Multiple Linear Regression.
o By finding the best fit line, algorithm establish the relationship between dependent
variable and independent variable. And the relationship should be of linear nature.
o The output for Linear regression should only be the continuous values such as price, age,
salary, etc. The relationship between the dependent variable and independent variable
can be shown in below image:
In above image the dependent variable is on Y-axis (salary) and independent variable is
on x-axis(experience). The regression line can be written as:
y= a0+a1x+ ε
Logistic Regression:
o Logistic regression is one of the most popular Machine learning algorithm that comes
under Supervised Learning techniques.
o It can be used for Classification as well as for Regression problems, but mainly used for
Classification problems.
o Logistic regression is used to predict the categorical dependent variable with the help of
independent variables.
o The output of Logistic Regression problem can be only between the 0 and 1.
o Logistic regression can be used where the probabilities between two classes is required.
Such as whether it will rain today or not, either 0 or 1, true or false etc.
o Logistic regression is based on the concept of Maximum Likelihood estimation.
According to this estimation, the observed data should be most probable.
o In logistic regression, we pass the weighted sum of inputs through an activation function
that can map values in between 0 and 1. Such activation function is known as sigmoid
function and the curve obtained is called as sigmoid curve or S-curve. Consider the
below image:
Linear regression is used to predict the Logistic Regression is used to predict the
continuous dependent variable using a given set categorical dependent variable using a given set
of independent variables. of independent variables.
Linear Regression is used for solving Regression Logistic regression is used for solving
problem. Classification problems.
In Linear regression, we predict the value of In logistic Regression, we predict the values of
continuous variables. categorical variables.
In linear regression, we find the best fit line, by In Logistic Regression, we find the S-curve by
which we can easily predict the output. which we can classify the samples.
Least square estimation method is used for Maximum likelihood estimation method is used
estimation of accuracy. for estimation of accuracy.
The output for Linear Regression must be a The output of Logistic Regression must be a
continuous value, such as price, age, etc. Categorical value such as 0 or 1, Yes or No, etc.
In Linear regression, it is required that relationship In Logistic regression, it is not required to have
between dependent variable and independent the linear relationship between the dependent
variable must be linear. and independent variable.
In linear regression, there may be collinearity In logistic regression, there should not be
between the independent variables. collinearity between the independent variable.