Advance ML - Unit 1

Uploaded by

bsmn027

Self-Learning Material


Program: MCA
Specialization: AI/ML
Semester: 3
Course Name: Advanced Machine Learning
Course Code: 21VMT6S305
Unit Name: MACHINE LEARNING – RECAP

Proprietary content. All rights reserved. Unauthorized use or distribution prohibited.


This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.
MACHINE LEARNING – RECAP

LINEAR ALGEBRA
SOME TERMINOLOGIES AND THEIR DEFINITIONS
WHAT IS MACHINE LEARNING AND ITS IMPORTANCE?
MACHINE LEARNING LIFECYCLE
TYPES OF MACHINE LEARNING ALGORITHMS
SUPERVISED LEARNING
UNSUPERVISED LEARNING
REINFORCEMENT LEARNING
LINEAR REGRESSION
ASSUMPTIONS OF LINEAR REGRESSION
POLYNOMIAL REGRESSION
RIDGE REGRESSION
LASSO REGRESSION
LOGISTIC REGRESSION
GENERALISED LINEAR MODELS (GLM)
ASSUMPTIONS OF GLM
COMPONENTS OF GLM

MACHINE LEARNING – RECAP
Linear Algebra

Machines understand only numbers, so data has to be represented numerically in a way that allows
machines to learn from it efficiently. In Machine Learning, learning involves finding the
parameters of a function that best fit the data. Linear Algebra is the mathematical foundation
that addresses both problems: representing data and carrying out the computations in ML models.
It deals with vectors, matrices and tensors.

Linear Algebra allows us to represent data as vectors and to perform matrix operations such as
the dot product.

For example, to find the similarity between two text documents, each document is represented as
a vector and the cosine similarity is calculated. This involves computing the dot product of the
two vectors and dividing it by the product of their magnitudes.
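
The cosine similarity calculation can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original material: the word-count vectors `doc_a`, `doc_b` and `doc_c` are hypothetical, and a real system would build them from a shared vocabulary.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical word-count vectors over a shared 4-word vocabulary
doc_a = [2, 1, 0, 1]
doc_b = [4, 2, 0, 2]   # same word proportions as doc_a, only longer
doc_c = [0, 0, 3, 0]   # shares no words with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)   # close to 1: same direction
sim_ac = cosine_similarity(doc_a, doc_c)   # 0: orthogonal vectors
```

Because only the direction of the vectors matters, a document and a longer document with the same word proportions are treated as maximally similar.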

In algorithms like PCA, Linear Algebra allows us to calculate the eigenvectors of the data's
covariance matrix and use them to reduce the dimensions of the data by keeping the important
components, which are linear combinations of the original features.
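
As a rough sketch of that idea, the snippet below computes principal components from the eigenvectors of the covariance matrix. The function name `pca` and the synthetic two-feature data are assumptions made for illustration; library implementations typically use a singular value decomposition instead.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto the eigenvectors of its covariance matrix
    that have the largest eigenvalues."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # largest variance first
    components = eigvecs[:, order[:n_components]]
    explained = eigvals[order[:n_components]] / eigvals.sum()
    return X_centered @ components, components, explained

# Synthetic data: the second feature is almost a multiple of the first
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

Z, components, explained = pca(X, n_components=1)
```

Here a single component captures nearly all of the variance, because the two original features carry almost the same information.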

Today, in the deep learning space, which involves problems like Image Classification or building
Natural Language Models such as a Question Answering system or a model that detects toxic
comments, we use tensors, which allow for vectorized operations to learn patterns. Tensors are
multi-dimensional arrays, and Linear Algebra provides the mathematical operations performed on
them.

In Recommendation Systems, users and items are represented as embeddings, which are dense
vectors. These capture information about the user and the item and allow us to recommend
personalized items to users.

Some Terminologies and their Definitions

Artificial Intelligence: AI gives machines the ability to imitate human behaviour. Work on AI
started as far back as 1956, but it was the availability of GPUs that accelerated the AI boom.

Machine Learning: Machine Learning is a subset of AI that uses algorithms to process data, learn
from it, and detect or predict patterns in it.

Deep Learning: Deep Learning is a subset of Machine Learning that employs Neural Networks
trained on data to make decisions. These methods are loosely inspired by the human brain. Deep
Learning is employed in problems like Image Classification, Natural Language Understanding,
Machine Translation etc.

What is Machine Learning and Its Importance?

Machine Learning is a subset of Artificial Intelligence that helps computer systems learn and
improve from experience by using large amounts of data to provide actionable insights, which
include predictions or patterns detected in data. In today's digital world, an estimated 2.5
quintillion bytes of data are generated every single day, and this number is set to increase
rapidly with the use of IoT (Internet of Things). Google processes 20 petabytes of data in a day.

With the rapid increase in both the volume and variety of data, together with affordable
computational power and high-speed internet, Machine Learning has become important for making
sense of data. These factors make it possible to build models that can analyse large and complex
data. In business, machine learning can help reduce costs, mitigate risks and improve user
experiences. In traditional institutions like banking, Machine Learning plays a key role in fraud
detection, in identifying customers who may default on a loan, in automatic cheque processing
etc. With increasing access to data and computational power, applications of machine learning can
be found across domains and in every facet of human life.

Some applications of Machine Learning are: Recommendation Systems, Google Search, Facebook
recommending posts or followers, and Fraud Detection.

The performance of any Machine Learning Model is dependent on two major aspects:
1. Quality of Input Data: The most common saying you will hear in the Machine Learning world
is “Garbage In, Garbage Out”. This simply means that if your data is messy, even the most
sophisticated machine learning algorithms will fail.

2. The choice of Model: Not every problem needs to be solved with a Neural Network or an
ensemble model. It is important to remember that a simpler model will often yield better
results. In some domains, such as Banking, it is also important that the model is
interpretable. In such cases the focus is more on interpretability than on improving the
score by 2% or 3%, so a complex model may not be acceptable. While choosing the model, it
is important to keep the business goals in mind.

Machine Learning Lifecycle

There are 5 basic steps involved in any machine learning task.

• Collecting the data: This step involves loading and reading the data from the data
sources. The data sources can include Excel files, CSV files, database tables, raw text
files etc.

• Preparing and Understanding the Data: The quality of any machine learning model
is dependent on the quality of data. Typically, in any Machine Learning project this is the
step where most of the time is spent. This involves identifying and handling outliers,
handling missing data, and creating features from data. It also involves performing
Exploratory Analysis, that is, understanding the data and the relationships between
variables.

• Training a Model: This step involves dividing the data into a training set and a
validation set. The validation set is not used for training but to check how well the model
will perform on unseen data. The training set is used to fit the model. Cross-validation
techniques are used to estimate the performance of a model during training.

• Evaluating a Model: This step uses the validation data to see how well the model
performs on unseen data. Metrics such as accuracy, precision, recall or F-score for
classification, and RMSE, MSE etc. for regression, are measured and the model is
validated. The metrics achieved during training are compared against the metrics achieved
on the validation set to check whether the model is underfitting or overfitting.

• Improving the Model: This step may involve choosing a different model, creating new
features that can help improve the quality of the model or, in some cases, collecting
more data.
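
The five steps can be sketched end to end on a toy regression problem. Everything here is an assumption made for illustration: the synthetic data stands in for a real data source, the 80/20 split is arbitrary, and a least-squares line stands in for whatever model is chosen.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Collect: synthetic data standing in for a real data source
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=100)

# 2. Prepare: add an intercept column as a (very simple) feature step
def add_intercept(features):
    return np.column_stack([np.ones(len(features)), features])

# 3. Train: shuffle, hold out 20% for validation, fit on the remaining 80%
idx = rng.permutation(len(X))
train_idx, val_idx = idx[:80], idx[80:]
beta, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# 4. Evaluate: compare the error on seen and unseen data
def rmse(features, target):
    predictions = add_intercept(features) @ beta
    return float(np.sqrt(np.mean((target - predictions) ** 2)))

train_rmse = rmse(X[train_idx], y[train_idx])
val_rmse = rmse(X[val_idx], y[val_idx])

# 5. Improve: a validation error much larger than the training error hints at
#    overfitting; both being large hints at underfitting.
```

In a real project the comparison in step 4 is what drives the next iteration of data preparation or model choice.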

The Machine Learning Lifecycle is not a straight line but a cycle that iterates between
improving the data, the model and the evaluation, and it is never really completed.

Following the cyclical approach is important because it focuses on using the model and its
results to refine your data.

Types of Machine Learning Algorithms

Supervised Learning

This is also known as predictive modelling and is used to predict future outcomes based on
historical data for which the outcome is already known (labelled data). This is a task-driven
approach. When the predicted outcome is a continuous variable it is called Regression, and when
it is categorical it is called Classification. Some examples of supervised learning are: Fraud
Detection, predicting which customers are most likely to churn, Image Classification and
Forecasting Sales.

Unsupervised Learning

This is a data-driven learning approach. In this method there is no predefined outcome
variable, and the goal is to identify patterns in data. One of the most commonly used methods of
unsupervised learning is clustering. Examples of Unsupervised Learning are: customer
segmentation, topic modelling, and identifying which products customers are most likely to buy
together.

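
To make clustering concrete, here is a small k-means sketch, one common clustering algorithm. It is an illustration under stated assumptions: the two synthetic groups of points are well separated, and the starting centroids are passed in explicitly (one point from each group) rather than chosen at random as is more usual.

```python
import numpy as np

def kmeans(X, init_centroids, n_iter=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # Distance from every point to every centroid: shape (n_points, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members) > 0:          # leave an empty cluster where it is
                centroids[c] = members.mean(axis=0)
    return labels, centroids

# Two hypothetical customer groups in a 2-feature space; no labels are given
rng = np.random.default_rng(0)
group_a = rng.uniform(0.0, 1.0, size=(30, 2))
group_b = rng.uniform(9.0, 10.0, size=(30, 2))
X = np.vstack([group_a, group_b])

labels, centroids = kmeans(X, init_centroids=X[[0, -1]])
```

The algorithm recovers the two groups purely from the positions of the points, which is what distinguishes it from a supervised method.
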
Reinforcement Learning

In this type of learning, machines are trained to take decisions that maximise a reward. The
machine learns by interacting with its environment: it takes actions, receives feedback, and
continuously applies what it has learned to improve later decisions. Self-driving cars are an
example of an application of Reinforcement Learning.

Linear Regression

Linear regression is a statistical model that explains the variation in a dependent variable y
in terms of one or more independent variables.

Linear Regression is a type of model in which the relationship between the dependent and
independent variables is assumed to be linear. The equation for Linear Regression is given as:

$$Y = c + mX + e$$

where Y is the dependent variable, X is the independent variable, 'c' is the intercept, 'm' is
the slope and 'e' is the error term.

Linear Regression generates a regression line, which is also called the line of best fit. A
linear regression model can have one dependent and one independent variable (simple linear
regression), or one dependent and multiple independent variables (multiple linear regression).

Plotting the line of best fit between a dependent and independent variable can help understand
the relationship between them – for example a positive slope indicates a positive correlation
and negative slope indicates a negative correlation between the variables.
Also, to understand how Y changes when X changes, we can use the equation:
$$Y = mX + c + e$$
If X increases by n units, then
$$X_1 = X + n$$
$$\Rightarrow Y_1 = m(X + n) + c + e$$
$$\Rightarrow Y_1 = mX + mn + c + e$$
$$\Rightarrow Y_1 = Y + mn$$

If we have multiple independent variables then the equation can be given as:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + e$$

where $\beta_0$ is the intercept, $\beta_1, \beta_2, \dots, \beta_n$ are the slopes for each
independent variable, $X_1, X_2, \dots, X_n$ are the independent variables and 'e' is the error
term.
In the case of Multiple Linear Regression, if $\beta_1$ is positive, it indicates a positive
relationship between $X_1$ and Y when the other independent variables are held constant.

If we increase $X_1$ by n units, how much does Y change by?

$$Y_1 = \beta_0 + \beta_1 (X_1 + n) + \beta_2 X_2 + \dots + \beta_n X_n + e$$
$$\Rightarrow Y_1 = Y + \beta_1 n$$

In a similar way, we can understand how various factors impact the outcome, that is, the
dependent variable, using Linear Regression. Also note that one of the key assumptions of
Multiple Linear Regression is that there is little or no correlation between the independent
variables.
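The coefficients of Linear Regression are estimated using OLS (Ordinary Least Squares), which
minimises the sum of squared differences between the observed and predicted values. The cost
function that we use to determine the coefficients is given by

$$\text{Cost Function} = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2$$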

In the above equation, i represents the i-th data observation, where n is the total number of
samples in the data. $y_i$ is the dependent variable for the i-th observation. j is the feature
number, where p is the total number of features or independent variables. $x_{ij}$ is the j-th
independent variable for the i-th row and $\beta_j$ is the coefficient associated with the j-th
feature.
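
The cost function translates almost directly into code. The sketch below is illustrative only: the helper names `ols_cost` and `fit_ols` are made up here, the tiny dataset is hypothetical, and the intercept is handled by assuming the first column of X is all ones, which the formula above does not spell out.

```python
import numpy as np

def ols_cost(X, y, beta):
    """Sum over observations i of (y_i - sum_j x_ij * beta_j) squared."""
    residuals = y - X @ beta
    return float(residuals @ residuals)

def fit_ols(X, y):
    """Coefficients that minimise the OLS cost (least-squares solution)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical data generated exactly by y = 1 + 2x; first column is the intercept
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

beta_hat = fit_ols(X, y)                                # close to [1, 2]
cost_at_fit = ols_cost(X, y, beta_hat)                  # close to 0
cost_elsewhere = ols_cost(X, y, np.array([0.0, 2.0]))   # each residual is 1
```

Any coefficient vector other than the fitted one gives a larger cost, which is what "best fit" means under OLS.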

Assumptions of Linear Regression

1. Linearity: The relationship between the dependent and independent variables is linear
in nature.
2. Homoscedasticity: The variance of the residuals (the error term) is constant. That is,
the spread of the errors does not change as the values of the independent variables (or
the predicted values) change.
3. Independence: Observations are independent of each other.
4. Normality: The residuals of Linear Regression should follow a normal distribution.
5. No Multicollinearity: There should be little or no correlation between the
independent variables. Multicollinearity can be tested using the Variance Inflation
Factor (VIF).
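
As an illustration of the multicollinearity check, the VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the other features. The sketch below computes it directly with NumPy; the function name `vif` and the synthetic features are assumptions, and statistical libraries provide their own implementations.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress feature j on the remaining features plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r_squared = 1.0 - (residuals @ residuals) / np.sum((target - target.mean()) ** 2)
        factors[j] = 1.0 / (1.0 - r_squared)
    return factors

# Synthetic features: x2 nearly duplicates x1, x3 is unrelated to both
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
```

A VIF close to 1 means a feature is not explained by the others; large values (a common rule of thumb is above 5 or 10) flag multicollinearity.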

Polynomial Regression
When there is a non-linear relationship between the dependent and independent variables, we can
use polynomial regression. It is like multiple linear regression, with powers of the independent
variable (such as $X^2$ and $X^3$) used as additional features, so that a curve is fitted instead
of a straight line.

[Figure: Polynomial Regression]
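
A minimal sketch of polynomial regression, assuming a single independent variable and a hypothetical dataset that follows an exact quadratic: the powers of x become columns of the design matrix, and the fit itself is ordinary least squares.

```python
import numpy as np

# Hypothetical data following y = 1 + 2x + 3x^2 exactly
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

def fit_polynomial(x, y, degree):
    """Least-squares fit using the columns 1, x, x^2, ..., x^degree."""
    X_poly = np.vander(x, N=degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
    sse = float(np.sum((y - X_poly @ beta) ** 2))
    return beta, sse

beta_line, sse_line = fit_polynomial(x, y, degree=1)   # straight line
beta_quad, sse_quad = fit_polynomial(x, y, degree=2)   # curve
```

The straight line leaves a clear error on this data while the degree-2 fit recovers the generating coefficients, although the model is still linear in its coefficients.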

Ridge Regression
Multiple Linear Regression does not work well when there is correlation between the independent
variables (that is, when there is multicollinearity). To handle such scenarios we can use Ridge
Regression, which adds L2 regularization to the cost function. For Ridge Regression, the cost
function becomes:

$$\text{Cost Function} = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

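In the cost function, when $\lambda = 0$ it becomes the same as the OLS cost function. As
$\lambda$ increases, the coefficients are shrunk more strongly towards zero, and if $\lambda$ is
too large it will cause underfitting. Ridge Regression is one of the techniques used to avoid
overfitting.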
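
The Ridge cost function has a closed-form minimiser, $(X^T X + \lambda I)^{-1} X^T y$, which the sketch below uses. The helper name `fit_ridge`, the synthetic data with two nearly identical features, and the choice of $\lambda = 10$ are illustrative assumptions; as in the formula above, no separate intercept is fitted.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimise sum((y - X beta)^2) + lam * sum(beta^2) in closed form."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data with two almost perfectly correlated features
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=100)

beta_ols = fit_ridge(X, y, lam=0.0)      # lambda = 0 reduces to OLS
beta_ridge = fit_ridge(X, y, lam=10.0)   # penalised coefficients
```

With correlated features the unpenalised coefficients tend to be unstable, while the L2 penalty pulls them towards smaller values of similar size.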

Lasso Regression
Lasso Regression is also a regularization technique that helps in reducing overfitting. In Lasso
we use L1 Regularization. The cost function for Lasso Regression is:

$$\text{Cost Function} = \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

The key difference between Lasso and Ridge regression is that Lasso can shrink the coefficients
of less important features exactly to zero (and hence helps with feature selection), whereas
Ridge only shrinks them towards zero.

Both Ridge and Lasso Regression help in reducing the complexity of the model, especially
when you have a large number of features, and hence reduce overfitting.
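
The feature-selection effect of Lasso can be seen in a small coordinate-descent sketch. This is one possible way to minimise the Lasso cost function, chosen here for brevity and not taken from the original material; the function names and the tiny dataset are hypothetical, and every column of X is assumed to be non-zero.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise sum((y - X beta)^2) + lam * sum(|beta|), one coefficient at a time."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Residual with feature j's current contribution added back
            residual_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual_j
            beta[j] = soft_threshold(rho, lam / 2.0) / (X[:, j] @ X[:, j])
    return beta

# Hypothetical data: feature 0 drives y strongly, feature 1 barely matters
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y = np.array([3.0, 0.1, 3.0, 0.1])

beta_unpenalised = lasso_coordinate_descent(X, y, lam=0.0)   # same as OLS
beta_lasso = lasso_coordinate_descent(X, y, lam=1.0)
```

With the penalty switched on, the weak feature's coefficient becomes exactly zero while the strong one is only shrunk, which is the behaviour that makes Lasso useful for feature selection.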

Logistic Regression

This is used when the target variable is categorical in nature, for example to predict whether
an email is spam or ham, or whether a tumor is malignant or benign. While the output of Linear
Regression is unbounded, Logistic Regression predicts a value between 0 and 1, which is a
probability score. Generally, Logistic Regression is used for Binary Classification; when there
are multiple classes, we use Multinomial or Ordered Logistic Regression.

Logistic Regression uses the sigmoid function, which maps any real-valued number to a value
between 0 and 1. (The sigmoid is the function that produces the prediction; the cost that is
usually minimised when fitting the coefficients is the log loss, also called cross-entropy.)

The predicted probability in Logistic Regression is given as:

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

As we can see, this is the sigmoid function applied to the Linear Regression equation
$\beta_0 + \beta_1 X$. Logistic Regression therefore returns a probability score between 0 and
1. To determine which class the data belongs to, we can set a probability threshold.

For example: if there are two classes, spam and ham (spam = 1 and ham = 0), the probability
threshold is 0.5 and the predicted probability is 0.48, then the data point belongs to ham. If
the predicted probability is greater than the probability threshold, then the data point belongs
to spam.
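
The thresholding step can be sketched as follows. The coefficient values passed to `predict_proba` are hypothetical, chosen only so that the example runs; in practice they are learned from data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, beta0, beta1):
    """Probability of class 1 (spam) for a single feature x."""
    return sigmoid(beta0 + beta1 * x)

def classify(prob, threshold=0.5):
    """Return 1 (spam) when the probability exceeds the threshold, else 0 (ham)."""
    return (np.asarray(prob) > threshold).astype(int)

# The example from the text: 0.48 is below the 0.5 threshold, so it is ham (0)
labels = classify(np.array([0.48, 0.73]))

# Hypothetical coefficients: larger x pushes the probability towards 1
probs = predict_proba(np.array([-4.0, 0.0, 4.0]), beta0=0.0, beta1=1.0)
```

Raising the threshold makes the model more conservative about predicting spam, which may matter when the two kinds of error have different costs.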

Generalised Linear Models (GLM)

In Linear Regression, the assumptions are that there is a linear relationship between the
dependent variable (y) and the independent variable (X), and that the error distribution is
normal. But in the real world there can be other relationships between y and X, such as an
exponential one. GLMs allow us to build a linear relationship between the response and the
predictors even though their underlying relationship is not linear. This is achieved by using a
“link” function, which connects the expected value of the response to a linear combination of
the predictors. In GLMs, the residuals are not assumed to follow a normal distribution.

If there is an exponential relationship between X and y as shown below, then we cannot use
Linear Regression, as the linearity assumption is violated.

Assumptions of GLM

1. Independence: Observations are independent of each other.
2. The distribution of the residuals need not be normal, but the response should follow a
distribution from the exponential family, such as binomial, Poisson, multinomial or normal.
3. The dependent variable need not have a linear relationship with the independent
variables.
Note: Logistic Regression also belongs to the family of GLMs.

Components of GLM

1. Linear Predictor
2. Link Function
3. Probability Distribution

If the distribution is a Poisson distribution, then the GLM equations become:

$$\ln \lambda_i = \beta_0 + \beta_1 X$$
$$y_i \sim \text{Poisson}(\lambda_i)$$

where ln is the link function, $\beta_0 + \beta_1 X$ is the linear predictor and
$\text{Poisson}(\lambda_i)$ is the probability distribution. Equivalently,

$$\lambda_i = \exp(\beta_0 + \beta_1 X)$$

For Poisson Regression the link function is the log function.
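
The three components can be seen side by side in a short sketch of the Poisson case. The coefficient values and the inputs are hypothetical, and the snippet only evaluates the model; it does not fit the coefficients.

```python
import math
import numpy as np

def linear_predictor(x, beta0, beta1):
    """Component 1: the linear combination beta0 + beta1 * x."""
    return beta0 + beta1 * x

def poisson_mean(x, beta0, beta1):
    """Component 2: invert the log link, so that ln(lambda) = beta0 + beta1 * x."""
    return np.exp(linear_predictor(x, beta0, beta1))

def poisson_pmf(k, lam):
    """Component 3: Poisson probability of observing the count k given mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

beta0, beta1 = 0.0, 0.5           # hypothetical coefficients
x = np.array([0.0, 2.0, 4.0])

lam = poisson_mean(x, beta0, beta1)        # exp(0), exp(1), exp(2)
p_zero = poisson_pmf(0, float(lam[0]))     # chance of a zero count when x = 0
```

The linear predictor can take any real value, while the exponential guarantees that the Poisson mean stays positive, which is why the log link suits count data.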

For Linear Regression, the probability distribution is normal and the link function is the
identity function:

$$\mu_i = \beta_0 + \beta_1 X, \qquad y_i \sim \mathcal{N}(\mu_i, \epsilon)$$

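For Logistic Regression, the probability distribution is Binomial (Bernoulli for a single
observation) and the link function is the logit function, whose inverse is the sigmoid function:

$$q_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}, \qquad y_i \sim \text{Bern}(q_i)$$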
