
Unit 3 Study Material

The document provides an overview of machine learning, focusing on supervised learning, its models, and applications. It explains the importance of machine learning in various fields, including self-driving cars, fraud detection, and recommendation systems. Additionally, it outlines the history of machine learning and its classification into supervised, unsupervised, and reinforcement learning methods.

Uploaded by

kavindharan40828
Copyright
© © All Rights Reserved

CS3491-Artificial Intelligence and Machine Learning

UNIT III SUPERVISED LEARNING 9

Introduction to machine learning – Linear Regression Models: Least squares, single & multiple variables, Bayesian linear regression, gradient descent, Linear Classification Models: Discriminant function – Probabilistic discriminative model - Logistic regression, Probabilistic generative model – Naive Bayes, Maximum margin classifier – Support vector machine, Decision Tree, Random forests.

At the end of the course, students will be able to
CO3: Build supervised learning models

1. Introduction to machine learning
What is Machine Learning?
Machine learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms which allow a computer to learn from data and past experience on its own.
• The term "machine learning" was first introduced by Arthur Samuel in 1959. It can be summarized as follows:
Machine learning enables a machine to automatically learn from data, improve its performance from experience, and make predictions without being explicitly programmed.
• With the help of sample historical data, known as training data, machine learning algorithms build a mathematical model that helps in making predictions or decisions without being explicitly programmed.
• Machine learning brings computer science and statistics together to create predictive models.
• Machine learning constructs or uses algorithms that learn from historical data. In general, the more relevant data we provide, the better the performance.
A machine has the ability to learn if it can improve its performance by gaining more data.

How does Machine Learning work?
A machine learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it.
• The accuracy of the predicted output depends on the amount of data: a large amount of data helps to build a better model, which predicts the output more accurately.
• Suppose we have a complex problem that requires some predictions. Instead of writing code for it, we feed the data to generic algorithms; with their help, the machine builds the logic from the data and predicts the output.
• Machine learning has changed our way of thinking about such problems. The block diagram below explains the working of a machine learning algorithm:
[Block diagram: working of a machine learning algorithm]

Features of Machine Learning:
1. Machine learning uses data to detect various patterns in a given dataset.
2. It can learn from past data and improve automatically.
3. It is a data-driven technology.
4. Machine learning is similar to data mining, as it also deals with huge amounts of data.

Need for Machine Learning
• The need for machine learning is increasing day by day. The reason is that it is capable of doing tasks that are too complex for a person to implement directly.
• As humans, we have limitations: we cannot process huge amounts of data manually. For this we need computer systems, and this is where machine learning makes things easier for us.

Department Of CSE Anjalai Ammal- Mahalingam Engineering College -614 403 Page No 1
• We can train machine learning algorithms by providing them with a large amount of data and letting them explore the data, construct models, and predict the required output automatically.
• The performance of a machine learning algorithm depends on the amount of data, and it can be measured by a cost function. With the help of machine learning, we can save both time and money.
The importance of machine learning is easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, and so on.
• Top companies such as Netflix and Amazon have built machine learning models that use a vast amount of data to analyze user interest and recommend products accordingly.
The following key points show the importance of machine learning:
1. Rapid growth in the production of data
2. Solving complex problems that are difficult for a human
3. Decision making in various sectors, including finance
4. Finding hidden patterns and extracting useful information from data.

Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

1) Supervised Learning
Supervised learning is a type of machine learning in which we provide sample labeled data to the machine learning system in order to train it, and on that basis it predicts the output.
• The system creates a model using labeled data to understand the dataset and learn about each example. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.
• The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, in the same way that a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.
Supervised learning can be grouped further into two categories of algorithms:
1. Classification
2. Regression

2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.
• The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision.
• The goal of unsupervised learning is to restructure the input data into new features or into groups of objects with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from a huge amount of data. It can be further classified into two categories of algorithms:
1. Clustering
2. Association

3) Reinforcement Learning
• Reinforcement learning is a feedback-based learning method in which a learning agent gets a reward for each right action and a penalty for each wrong action.
• The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it.
• The goal of the agent is to collect the most reward points, and hence it improves its performance.
A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.

History of Machine Learning
• Some 40-50 years ago, machine learning was science fiction, but today it is part of our daily life. Machine learning makes our day-to-day life easier, from self-driving cars to Amazon's virtual assistant "Alexa".
• However, the idea behind machine learning is old and has a long history. Some milestones in the history of machine learning are given below:

The early history of Machine Learning (Pre-1940):
• 1834: Charles Babbage, the father of the computer, conceived a device that could be programmed with punch cards. The machine was never built, but all modern computers rely on its logical structure.
• 1936: Alan Turing gave a theory of how a machine can determine and execute a set of instructions.

The era of stored program computers:
• 1945: ENIAC, the first electronic general-purpose computer, was completed; it had to be programmed manually. After that, stored-program computers such as EDSAC (1949) and EDVAC (1951) were built.
• 1943: A human neural network was modeled with an electrical circuit. In 1950, scientists started applying this idea and analyzed how human neurons might work.

Computing machinery and intelligence:
o 1950: Alan Turing published a seminal paper, "Computing Machinery and Intelligence," on the topic of artificial intelligence. In his paper, he asked, "Can machines think?"

Machine intelligence in Games:
o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an IBM computer to play checkers. It performed better the more it played.
o 1959: The term "Machine Learning" was first coined by Arthur Samuel.

The first "AI" winter:
o The period from 1974 to 1980 was a tough time for AI and ML researchers, and it came to be called the AI winter.
o During this period, machine translation failed to deliver and interest in AI declined, which led to reduced government funding for research.

Machine learning from theory to reality
o 1959: The first neural network was applied to a real-world problem: removing echoes over phone lines using an adaptive filter.
o 1985: Terry Sejnowski and Charles Rosenberg invented the neural network NETtalk, which was able to teach itself how to correctly pronounce 20,000 words in one week.
o 1997: IBM's Deep Blue computer won a chess match against the world chess champion Garry Kasparov, becoming the first computer to beat a human chess expert of that standing.

Machine learning in the 21st century
o 2006: Computer scientist Geoffrey Hinton gave neural net research a new name, "deep learning," and it has since become one of the most trending technologies.
o 2012: Google created a deep neural network which learned to recognize images of humans and cats in YouTube videos.
o 2014: The chatbot "Eugene Goostman" was reported to have cleared the Turing Test. It was the first chatbot to convince 33% of the human judges that it was not a machine.

o 2014: DeepFace was a deep neural network created by Facebook, which claimed that it could recognize a person with the same precision as a human.
o 2016: AlphaGo beat Lee Sedol, then the world's number two player, at the game of Go. In 2017 it beat the number one player, Ke Jie.
o 2017: Alphabet's Jigsaw team built an intelligent system that was able to learn about online trolling. It read millions of comments from different websites to learn to stop online trolling.

Machine Learning at present:
• Machine learning research has advanced greatly, and machine learning is present everywhere around us: in self-driving cars, Amazon Alexa, chatbots, recommender systems, and many more.
• It includes supervised, unsupervised, and reinforcement learning, with clustering, classification, decision tree, and SVM algorithms, etc.
• Modern machine learning models can be used for making various predictions, including weather prediction, disease prediction, stock market analysis, etc.

Applications of Machine learning
• Machine learning is a buzzword for today's technology, and it is growing very rapidly day by day.
• We use machine learning in our daily life, often without knowing it, in services such as Google Maps, Google Assistant, Alexa, etc. Below are some of the most trending real-world applications of machine learning:

1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to identify objects, persons, places, digital images, etc. A popular use case of image recognition and face detection is automatic friend tagging suggestion:
• Facebook provides a feature of automatic friend tagging suggestion. Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names, and the technology behind this is machine learning's face detection and recognition algorithm.
• It is based on the Facebook project named "Deep Face," which is responsible for face recognition and person identification in the picture.

2. Speech Recognition
While using Google, we get the option "Search by voice." It comes under speech recognition, which is a popular application of machine learning.
• Speech recognition is the process of converting voice instructions into text, and it is also known as "speech to text" or "computer speech recognition."
• At present, machine learning algorithms are widely used by various applications of speech recognition. Google Assistant, Siri, Cortana, and Alexa use speech recognition technology to follow voice instructions.

3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions.
• It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily congested, in two ways:
1. Real-time location of the vehicle from the Google Maps app and sensors
2. Average time taken on past days at the same time.
Everyone who uses Google Maps helps to make the app better. It takes information from the user and sends it back to its database to improve performance.

4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such as Amazon, Netflix, etc., for product recommendations to the user.
• Whenever we search for a product on Amazon, we start getting advertisements for the same product while surfing the internet on the same browser, and this is because of machine learning.
• Google understands user interest using various machine learning algorithms and suggests products as per customer interest.
• Similarly, when we use Netflix, we find recommendations for entertainment series, movies, etc., and this is also done with the help of machine learning.

5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning plays a significant role in self-driving cars.
• Tesla, a well-known car manufacturer, is working on self-driving cars.
• It uses machine learning methods to train the car models to detect people and objects while driving.

6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam. Important mail arrives in our inbox marked with the important symbol, spam goes to our spam box, and the technology behind this is machine learning. Below are some spam filters used by Gmail:
1. Content filter
2. Header filter
3. General blacklists filter
4. Rules-based filters
5. Permission filters
Some machine learning algorithms, such as Multi-Layer Perceptron, Decision Tree, and Naïve Bayes classifier, are used for email spam filtering and malware detection.

7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri.
• As the name suggests, they help us find information using our voice instructions.
• These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.
• These virtual assistants use machine learning algorithms as an important part.
• These assistants record our voice instructions, send them to a server in the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:
• Machine learning makes our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, a fraudulent transaction can take place in various ways, such as fake accounts, fake IDs, and stealing money in the middle of a transaction.
• To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into some hash values, and these values become the input for the next round. For each genuine transaction, there is a specific pattern which changes for a fraudulent transaction; hence the network detects it and makes our online transactions more secure.

9. Stock Market trading:
Machine learning is widely used in stock market trading. In the stock market, there is always a risk of ups and downs in shares, so a long short-term memory (LSTM) neural network is used for the prediction of stock market trends.

10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With it, medical technology is advancing quickly and can build 3D models that predict the exact position of lesions in the brain.
It helps in finding brain tumors and other brain-related diseases easily.

11. Automatic Language Translation:
• Nowadays, if we visit a new place and do not know the language, it is not a problem at all: machine learning helps us by converting the text into a language we know.
• Google's GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine translation system that translates text into our familiar language, and this is called automatic translation.
• The technology behind automatic translation is a sequence-to-sequence learning algorithm, which is used with image recognition and translates the text from one language to another.

Artificial Intelligence | Machine learning
--- | ---
Artificial intelligence is a technology which enables a machine to simulate human behavior. | Machine learning is a subset of AI which allows a machine to automatically learn from past data without being programmed explicitly.
The goal of AI is to make a smart computer system that, like humans, can solve complex problems. | The goal of ML is to allow machines to learn from data so that they can give accurate output.
In AI, we make intelligent systems to perform any task like a human. | In ML, we teach machines with data to perform a particular task and give an accurate result.
Machine learning and deep learning are the two main subsets of AI. | Deep learning is a main subset of machine learning.
AI has a very wide range of scope. | Machine learning has a limited scope.
AI is working to create an intelligent system which can perform various complex tasks. | Machine learning is working to create machines that can perform only those specific tasks for which they are trained.
An AI system is concerned with maximizing the chances of success. | Machine learning is mainly concerned with accuracy and patterns.
The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc. | The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI. | Machine learning can also be divided into mainly three types: supervised learning, unsupervised learning, and reinforcement learning.
It includes learning, reasoning, and self-correction. | It includes learning and self-correction when introduced to new data.
AI deals with structured, semi-structured, and unstructured data. | Machine learning deals with structured and semi-structured data.

1. Supervised Machine Learning
A training set of examples with the correct responses (targets) is provided and, based on this training set, the algorithm generalizes to respond correctly to all possible inputs.
• This is also called learning from exemplars. Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.
• In supervised learning, each example in the training set is a pair consisting of an input object (typically a vector) and an output value.
• A supervised learning algorithm analyzes the training data and produces a function, which can be used for mapping new examples. In the optimal case, the function will correctly determine the class labels for unseen instances.
• Both classification and regression problems are supervised learning problems. A wide range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.
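The idea that a supervised learner turns input-output pairs into a function for new inputs can be sketched in a few lines of Python. This is only an illustration: the training points, the labels "A" and "B", and the choice of a 1-nearest-neighbour rule are assumptions made for the sketch, not something prescribed by these notes.

```python
import math

# Hypothetical training set: each example is a pair (input vector, output label).
training_set = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]

def learn(examples):
    """Analyze the training pairs and return a function that maps new inputs to labels.

    The rule used here is 1-nearest neighbour: a new input receives the label
    of the closest training input (Euclidean distance).
    """
    def predict(x):
        _, nearest_label = min(examples, key=lambda pair: math.dist(pair[0], x))
        return nearest_label
    return predict

model = learn(training_set)
print(model((1.2, 1.4)))  # unseen input close to the "A" examples
print(model((5.5, 5.0)))  # unseen input close to the "B" examples
```

Any other learning rule could be substituted inside `learn`; the point is only that training produces a reusable mapping from inputs to outputs.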
Remarks
“Supervised learning” is so called because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
• We know the correct answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Example
Consider data regarding patients entering a clinic. The data consists of the gender and age of the patients, and each patient is labeled as “healthy” or “sick”.

• Supervised learning is a type of machine learning in which machines are trained using well-"labelled" training data and, on the basis of that data, predict the output.
• Labelled data means that some input data is already tagged with the correct output.
• In supervised learning, the training data provided to the machine works as the supervisor that teaches the machine to predict the output correctly.
• It applies the same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.

How Supervised Learning Works?
In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on test data (a held-out portion of the labelled dataset that was not used for training), and then it predicts the output.
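The train-then-test workflow described above can be sketched as follows. Everything specific here is an assumption made for illustration: the synthetic one-feature dataset, the 80/20 split, and the very simple "threshold" model stand in for whatever data and algorithm a real task would use.

```python
import random

random.seed(0)

# Hypothetical labelled dataset: one numeric feature x, label 1 if x > 5 else 0.
features = [random.uniform(0, 10) for _ in range(100)]
dataset = [(x, int(x > 5)) for x in features]

# Hold out 20% of the labelled data as a test set that is never used for training.
random.shuffle(dataset)
split = int(0.8 * len(dataset))
train_set, test_set = dataset[:split], dataset[split:]

def fit_threshold(examples):
    """Training: place the decision threshold midway between the two class means."""
    class0 = [x for x, label in examples if label == 0]
    class1 = [x for x, label in examples if label == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2

def predict(x, threshold):
    """Prediction: label 1 if the feature exceeds the learned threshold."""
    return int(x > threshold)

threshold = fit_threshold(train_set)

# Testing: compare predictions with the known labels of the held-out examples.
correct = sum(predict(x, threshold) == label for x, label in test_set)
accuracy = correct / len(test_set)
print(f"learned threshold = {threshold:.2f}, test accuracy = {accuracy:.2f}")
```

Because the test examples were not seen during training, the reported accuracy is an estimate of how the model behaves on new data.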
The working of supervised learning can be easily understood through the following example:

Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and polygons. The first step is to train the model for each shape.

o If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.

After training, we test the model using the test set, and the task of the model is to identify the shape.

The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.

Steps Involved in Supervised Learning:

• First, determine the type of training dataset.
• Collect/gather the labelled training data.
• Split the dataset into a training dataset, a test dataset, and a validation dataset.
• Determine the input features of the training dataset, which should carry enough information for the model to predict the output accurately.
• Determine a suitable algorithm for the model, such as a support vector machine, decision tree, etc.
• Execute the algorithm on the training dataset. Sometimes we need a validation set, a subset held out from the training data, to tune the control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.

Types of supervised Machine learning Algorithms

Supervised learning can be further divided into two types of problems: regression and classification.

1. Regression

Regression algorithms are used if there is a relationship between the input variable and the output variable. They are used for the prediction of continuous variables, such as weather forecasting, market trends, etc. Below are some popular regression algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which means the output falls into classes such as Yes-No, Male-Female, True-False, etc. (two classes in the simplest case, but there may be more). A typical example is spam filtering. Popular classification algorithms include:
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines

Advantages of Supervised learning:

1. With the help of supervised learning, the model can predict the output on the basis of prior experience.
2. In supervised learning, we can have an exact idea about the classes of objects.
3. Supervised learning models help us to solve various real-world problems such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:

1. Supervised learning models are not suitable for handling very complex tasks.
2. Supervised learning cannot predict the correct output if the test data is different from the training dataset.
3. Training requires a lot of computation time.
4. In supervised learning, we need enough knowledge about the classes of objects.

The main differences between supervised and unsupervised learning are given below:

Supervised Learning | Unsupervised Learning
--- | ---
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
A supervised learning model takes direct feedback to check whether it is predicting the correct output. | An unsupervised learning model does not take any feedback.
A supervised learning model predicts the output. | An unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into classification and regression problems. | Unsupervised learning can be classified into clustering and association problems.
Supervised learning can be used for cases where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used for cases where we have only input data and no corresponding output data.
A supervised learning model generally produces an accurate result. | An unsupervised learning model may give a less accurate result compared to supervised learning.
Supervised learning is not close to true artificial intelligence, as we first train the model on each data item, and only then can it predict the correct output. | Unsupervised learning is closer to true artificial intelligence, as it learns in the way a child learns daily routine things from experience.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes various algorithms such as clustering (for example, k-means) and the Apriori algorithm.

Note: Supervised and unsupervised learning are both machine learning methods, and the choice between them depends on the structure and volume of your dataset and on the use case of the problem.

2. Linear Regression Models

1. Linear regression is one of the easiest and most popular machine learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
2. The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression.
3. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.

The linear regression model provides a sloped straight line representing the relationship between the variables.
[Figure: data points with a sloped straight regression line through them]

Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε

Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error

The values of the x and y variables form the training dataset for the linear regression model representation.
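To make the roles of a0, a1 and ε concrete, the following sketch generates observations from the model y = a0 + a1x + ε and compares them with the value on the line. The coefficient values (2.0 and 0.5), the noise level, and the inputs are all hypothetical, chosen only for illustration.

```python
import random

random.seed(42)

# Hypothetical model parameters (assumed for this sketch).
a0 = 2.0   # intercept of the line
a1 = 0.5   # linear regression coefficient (slope)

def predict(x):
    """Value on the regression line for input x: a0 + a1 * x."""
    return a0 + a1 * x

# Simulated training data: each observed y is the line value plus a random error ε.
xs = [float(i) for i in range(10)]
ys = [predict(x) + random.gauss(0, 0.1) for x in xs]

for x, y in zip(xs[:3], ys[:3]):
    error = y - predict(x)   # this difference is the random error term ε
    print(f"x={x:.1f}  line={predict(x):.2f}  observed={y:.2f}  error={error:+.2f}")
```

In practice a0 and a1 are unknown and must be estimated from the (x, y) pairs; here they are fixed in advance so that the effect of the error term is visible.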

Linear Regression:
1. Linear regression is a statistical regression method which is used for
predictive analysis.
2. It is one of the simplest algorithms that works on regression
and shows the relationship between continuous variables.
3. It is used for solving the regression problem in machine learning.
4. Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence it is called linear
regression.
5. If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then such linear regression is called multiple linear regression.
6. The relationship between variables in the linear regression model can be
explained using the image below. Here we are predicting the salary of an
employee on the basis of the years of experience.
[Figure: salary plotted against years of experience]

o Below is the mathematical equation for Linear regression:
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.

Some popular applications of linear regression are:
o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
Types of Linear Regression
Linear regression can be further divided into two types of algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called
Simple Linear Regression.
o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.

Linear Regression Line

A straight line showing the relationship between the dependent and independent
variables is called a regression line. A regression line can show two types of
relationship:
1. Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable
increases on the X-axis, then such a relationship is termed a positive linear
relationship.
2. Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable
increases on the X-axis, then such a relationship is called a negative linear
relationship.

Finding the best fit line:
When working with linear regression, our main goal is to find the best fit line, which
means the error between predicted values and actual values should be minimized.
The best fit line will have the least error.

Different values for the weights or coefficients of the line (a0, a1) give a different
line of regression, so we need to calculate the best values for a0 and a1 to find the
best fit line. To calculate these we use a cost function.

Cost function-
● Different values for the weights or coefficients of the line (a0, a1) give
different lines of regression, and the cost function is used to estimate the
values of the coefficients for the best fit line.
● Minimizing the cost function optimizes the regression coefficients or weights.
The cost function measures how well a linear regression model is performing.
● We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function
is also known as the hypothesis function.
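The idea of scoring candidate lines with a cost function and then searching for the coefficients that minimize it can be sketched as follows. This is a minimal illustration, assuming mean squared error as the cost and plain batch gradient descent as the minimizer (both are covered in this unit). The data, the learning rate `lr`, the step count, and the function names are hypothetical choices, not values from the course material.

```python
def mse(a0, a1, xs, ys):
    """Mean squared error of the line y = a0 + a1*x on the data."""
    n = len(xs)
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys)) / n

def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize the MSE by repeatedly stepping against its gradient."""
    a0, a1 = 0.0, 0.0  # arbitrary starting point (could also be random)
    n = len(xs)
    for _ in range(steps):
        # Error of the current line on each point (prediction - actual).
        errs = [(a0 + a1 * x) - y for x, y in zip(xs, ys)]
        grad_a0 = 2.0 / n * sum(errs)
        grad_a1 = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
    return a0, a1

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

cost_bad = mse(0.0, 1.0, xs, ys)    # a poor candidate line, y = x
cost_good = mse(1.0, 2.0, xs, ys)   # the line the data was generated from
a0_fit, a1_fit = gradient_descent(xs, ys)
print(cost_bad, cost_good, a0_fit, a1_fit)
```

The poorer line gets the higher cost, and the descent loop moves (a0, a1) toward the pair with the lowest cost. A learning rate that is too large would make the updates diverge, so in practice it has to be tuned to the data.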


For Linear Regression, we use the Mean Squared Error (MSE) cost function,
which is the average of the squared errors between the predicted values and
actual values.

For the above linear equation, MSE can be calculated as:

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - (a_1 x_i + a_0)\right)^2$$

Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value

Residuals: The distance between the actual value and the predicted value is called the
residual. If the observed points are far from the regression line, then the residuals will
be large, and so the cost function will be high. If the scatter points are close to the
regression line, then the residuals will be small, and hence so will the cost function.

Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of
the cost function.
o A regression model uses gradient descent to update the coefficients of the
line by reducing the cost function.
o It starts from randomly selected coefficient values and then iteratively
updates them to reach the minimum of the cost function.

Model Performance:

The goodness of fit determines how well the line of regression fits the set of
observations. The process of finding the best model out of various models is
called optimization. It can be assessed by the method below:

1. R-squared method:
o R-squared is a statistical method that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and
independent variables on a scale of 0-100%.
o A high value of R-squared indicates a small difference between the
predicted values and actual values and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient of multiple
determination for multiple regression.
o It can be calculated from the formula below:

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

Assumptions of Linear Regression

1. Linear relationship between the features and target:
Linear regression assumes a linear relationship between the dependent and
independent variables.
2. Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables.
Due to multicollinearity, it may be difficult to find the true relationship
between the predictors and the target variable. In other words, it is difficult to
determine which predictor variable is affecting the target variable and which
is not. So, the model assumes either little or no multicollinearity between
the features or independent variables.
3. Homoscedasticity Assumption:
Homoscedasticity is a situation when the variance of the error term is the same for all the
values of the independent variables. With homoscedasticity, there should be no
clear pattern in the distribution of data in the scatter plot.
4. Normal distribution of error terms:
Linear regression assumes that the error term should follow the normal

distribution pattern. If error terms are not normally distributed, then
confidence intervals will become either too wide or too narrow, which may
cause difficulties in finding coefficients.
5. No autocorrelations:
The linear regression model assumes no autocorrelation in error terms. If
there is any correlation in the error term, it will drastically reduce
the accuracy of the model. Autocorrelation usually occurs if there is a
dependency between residual errors.

Multiple Linear Regression

● In Simple Linear Regression, a single independent/predictor (X)
variable is used to model the response variable (Y).
● But there may be various cases in which the response variable is affected by
more than one predictor variable; for such cases, the Multiple Linear
Regression algorithm is used.
● Moreover, Multiple Linear Regression is an extension of Simple Linear
Regression as it takes more than one predictor variable to predict the
response variable. We can define it as:

Multiple Linear Regression is one of the important regression algorithms which
models the linear relationship between a single dependent continuous variable and
more than one independent variable.
Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.
Some key points about MLR:
o For MLR, the dependent or target variable (Y) must be continuous/real,
but the predictor or independent variables may be of continuous or
categorical form.
o Each feature variable must model the linear relationship with the dependent
variable.
o MLR tries to fit a regression line through a multidimensional space of data
points.

MLR equation:
● In Multiple Linear Regression, the target variable (Y) is a linear combination
of multiple predictor variables x1, x2, x3, ..., xn.
● Since it is an enhancement of Simple Linear Regression, the same form
applies to the multiple linear regression equation, which becomes:

$$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$$

Where,
Y = Output/response variable
b0, b1, b2, b3, ..., bn = Coefficients of the model
x1, x2, x3, ..., xn = Independent/feature variables

Assumptions for Multiple Linear Regression:
o A linear relationship should exist between the target and predictor
variables.
o The regression residuals must be normally distributed.
o MLR assumes little or no multicollinearity (correlation between the
independent variables) in the data.
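A multiple linear regression with two predictors can be fitted in a few lines. The sketch below assumes NumPy and uses `np.linalg.lstsq` to estimate the coefficients. The engine-size, cylinder, and CO2 numbers are synthetic and noise-free, invented purely for illustration (they are not real emission data), and `true_b` is a hypothetical set of coefficients.

```python
import numpy as np

# Synthetic, noise-free data for illustration only (not real emission figures).
engine_size = np.array([1.0, 1.6, 2.0, 2.4, 3.0, 3.5])   # x1
cylinders = np.array([4.0, 4.0, 4.0, 6.0, 6.0, 8.0])     # x2
true_b = np.array([50.0, 30.0, 10.0])                    # hypothetical b0, b1, b2
co2 = true_b[0] + true_b[1] * engine_size + true_b[2] * cylinders

# Design matrix: a column of ones for b0, then one column per predictor.
X = np.column_stack([np.ones_like(engine_size), engine_size, cylinders])

# Least-squares estimate of the coefficients b0, b1, b2.
b_hat, _, rank, _ = np.linalg.lstsq(X, co2, rcond=None)

def predict(x1, x2, b=b_hat):
    """Y = b0 + b1*x1 + b2*x2 for the fitted coefficients."""
    return b[0] + b[1] * x1 + b[2] * x2

print(b_hat, predict(2.0, 4.0))
```

Because the synthetic targets contain no noise, the fit recovers the generating coefficients. With real data the estimates would only approximate the underlying relationship, and strongly correlated predictors would make them unstable, which is the multicollinearity concern listed above.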
Department Of CSE Anjalai Ammal- Mahalingam Engineering College -614 403 Page No 14
CS3491-Artificial Intelligence and Machine Learning

Example:
The advertising data set consists of the sales of a product in 200 different
markets, along with advertising budgets for three different media: TV,
radio, and newspaper. Here is what it looks like (first row shown; budgets in
thousands of dollars, sales in thousands of units):

TV      radio   newspaper   sales
230.1   37.8    69.2        22.1

The first row of the data says that the advertising budgets for TV, radio, and
newspaper were $230.1k, $37.8k, and $69.2k respectively, and the
corresponding number of units that were sold was 22.1k (or 22,100).
Multiple Linear Regression solves the problem by taking account of all the
variables in a single expression.

Hence, our Linear Regression model can now be expressed as:

Sales = b0 + b1*TV + b2*radio + b3*newspaper

4. Least Square Regression

Given a set of coordinates in the form of (X, Y), the task is to find the least squares
regression line that can be formed.
Regression Line: If our data shows a linear relationship between X and Y, then the
straight line which best describes the relationship is the regression line. It is the
straight line that comes closest to the points in the graph.

To find the line of best fit for N points:
Step 1: For each (x, y) point calculate x² and xy
Step 2: Sum all x, y, x² and xy, which gives us Σx, Σy, Σx² and Σxy (Σ means "sum
up")
Step 3: Calculate Slope m

$$m = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$


Step 4: Calculate y-intercept b:

$$b = \frac{\sum y - m\sum x}{N}$$

Step 5: Substitute the values in the final equation.
y = mx + b

Examples:
Sam recorded how many hours of sunshine there were and how many ice creams were sold at
the shop from Monday to Friday. Sam hears the weather forecast, which says "we
expect 8 hours of sun tomorrow". Predict how many ice creams Sam will sell
tomorrow.

Hours of        Ice Creams
Sunshine (x)    Sold (y)
2               4
3               5
5               7
7               10
9               15

Solution:
Let us find the best m (slope) and b (y-intercept) that suit the data
y = mx + b
Step 1: For each (x, y) calculate x² and xy:

x        y        x²        xy
2        4        4         8
3        5        9         15
5        7        25        35
7        10       49        70
9        15       81        135

Step 2: Sum x, y, x² and xy (gives us Σx, Σy, Σx² and Σxy):

x        y        x²        xy
2        4        4         8
3        5        9         15
5        7        25        35
7        10       49        70
9        15       81        135
Σx: 26   Σy: 41   Σx²: 168  Σxy: 263

Also N (number of data values) = 5
Step 3: Calculate Slope m:


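$$m = \frac{5 \times 263 - 26 \times 41}{5 \times 168 - 26^2} = \frac{1315 - 1066}{840 - 676} = \frac{249}{164} \approx 1.5183$$

Step 4: Calculate Intercept b:

$$b = \frac{41 - \frac{249}{164} \times 26}{5} \approx 0.3049$$

Step 5: Substitute the values in the equation of a line (assemble the equation of a
line):
y = mx + b
y = 1.518x + 0.305

Working:
x    y    y = 1.518x + 0.305    error
2    4    3.34                  −0.66
3    5    4.86                  −0.14
5    7    7.89                  0.89
7    10   10.93                 0.93
9    15   13.97                 −1.03

Here are the (x, y) points and the line y = 1.518x + 0.305 on a graph:
[Figure: the (x, y) points with the line y = 1.518x + 0.305]

Sam hears the weather forecast which says "we expect 8 hours of sun tomorrow", so
he uses the above equation to estimate that he will sell

y = 1.518 × 8 + 0.305 = 12.45 ice creams

Sam makes fresh waffle cone mixture for 13 ice creams.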
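The five steps can be checked with a short script. This is a minimal sketch in plain Python applied to Sam's sunshine and ice cream data; the function names are illustrative. It also computes R-squared for the fitted line, using the standard form 1 − (residual sum of squares / total sum of squares), as a goodness-of-fit check.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b from the summation formulas (Steps 1 to 4)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

def r_squared(xs, ys, m, b):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

hours = [2, 3, 5, 7, 9]     # hours of sunshine (x)
sold = [4, 5, 7, 10, 15]    # ice creams sold (y)

m, b = least_squares_line(hours, sold)
forecast = m * 8 + b        # prediction for 8 hours of sunshine
print(round(m, 3), round(b, 3), round(forecast, 2))  # 1.518 0.305 12.45
print(round(r_squared(hours, sold, m, b), 3))
```

The slope, intercept, and forecast agree with the hand calculation. The R-squared value is close to 1, which is consistent with the small errors in the working table.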
2. Bayesian linear regression
● In the Bayesian viewpoint, we formulate linear regression using probability
distributions rather than point estimates.
● The response, y, is not estimated as a single value, but is assumed to be
drawn from a probability distribution.
● The model for Bayesian linear regression with the response sampled from a
normal distribution is:

$$y \sim N(\beta^T X, \sigma^2 I)$$

1. The output, y, is generated from a normal (Gaussian) distribution characterized
by a mean and variance.
2. The mean for linear regression is the transpose of the weight matrix multiplied
by the predictor matrix.
3. The variance is the square of the standard deviation σ (multiplied by the identity
matrix because this is a multi-dimensional formulation of the model).
4. The aim of Bayesian linear regression is not to find the single "best" value of the
model parameters, but rather to determine the posterior distribution for the
model parameters.
5. Not only is the response generated from a probability distribution, but the model
parameters are assumed to come from a distribution as well.
6. The posterior probability of the model parameters is conditional upon the
training inputs and outputs:

$$P(\beta \mid y, X) = \frac{P(y \mid \beta, X)\, P(\beta \mid X)}{P(y \mid X)}$$

● Here, P(β|y, X) is the posterior probability distribution of the model parameters
given the inputs and outputs.
● This is equal to the likelihood of the data, P(y|β, X), multiplied by the prior
probability of the parameters and divided by a normalization constant.
● This is a simple expression of Bayes' theorem, the fundamental underpinning of
Bayesian inference:

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Normalization}}$$

● We have a posterior distribution for the model parameters that is proportional to
the likelihood of the data multiplied by the prior probability of the parameters.
● Here we can observe the two primary benefits of Bayesian linear regression.

1. Priors: if we have domain knowledge, or a guess for what the model parameters
should be, we can include them in our model, unlike in the frequentist approach
which assumes everything there is to know about the parameters comes from the
data.
2. If we don't have any estimates ahead of time, we can use non-informative
priors for the parameters such as a normal distribution.
3. Posterior: the result of performing Bayesian linear regression is a distribution of
possible model parameters based on the data and the prior.
4. This allows us to quantify our uncertainty about the model: if we have fewer data
points, the posterior distribution will be more spread out.
● As the amount of data points increases, the likelihood washes out the
prior, and in the case of infinite data, the outputs for the parameters
converge to the values obtained from OLS.
The formulation of model parameters as distributions encapsulates the Bayesian
worldview: we start out with an initial estimate, our prior, and as we gather more
evidence, our model becomes less wrong. Bayesian reasoning is a natural extension of
our intuition.
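The effect of the prior and of the amount of data on the posterior can be sketched numerically. The example below assumes NumPy, a Gaussian prior on the parameters, and a known noise standard deviation `sigma`. Under those assumptions the posterior is also Gaussian and has a closed form. The data are synthetic, and `beta_true`, the prior settings, and the function name `posterior` are hypothetical. The code stores one observation per row, so the mean is written `X @ beta` instead of the transpose form used in the text.

```python
import numpy as np

def posterior(X, y, sigma, prior_mean, prior_cov):
    """Gaussian posterior over beta for y ~ N(X @ beta, sigma^2 I)
    with a Gaussian prior N(prior_mean, prior_cov) and known sigma."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma**2)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma**2)
    return post_mean, post_cov

rng = np.random.default_rng(seed=0)
sigma = 1.0                            # assumed known noise standard deviation
beta_true = np.array([1.0, 2.0])       # hypothetical intercept and slope
x = rng.uniform(0.0, 10.0, size=500)
X = np.column_stack([np.ones_like(x), x])    # one observation per row
y = X @ beta_true + rng.normal(0.0, sigma, size=x.shape)

prior_mean = np.zeros(2)
prior_cov = 100.0 * np.eye(2)          # broad prior, weakly informative

# Posterior from the first 5 points versus all 500 points.
mean_small, cov_small = posterior(X[:5], y[:5], sigma, prior_mean, prior_cov)
mean_large, cov_large = posterior(X, y, sigma, prior_mean, prior_cov)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.trace(cov_small), np.trace(cov_large))
print(mean_large, beta_ols)
```

With only five points the posterior covariance is larger, reflecting more uncertainty about the parameters. With all the data the posterior narrows and its mean moves very close to the OLS estimate, matching the statement that the likelihood washes out the prior as data accumulates.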
