Unit 2

The document provides an overview of regression modeling and analysis, detailing its purpose in determining relationships between dependent and independent variables, and its applications in prediction and causal relationships. It covers various types of regression including linear, logistic, polynomial, and regularization techniques such as Ridge and Lasso regression. Additionally, it introduces concepts related to Bayesian modeling, multivariate analysis, and neural networks, highlighting their significance in data analysis and machine learning.

Uploaded by

ashmakhan8855
Copyright
© All Rights Reserved

Unit-2

Data Analysis
Regression Modelling/ Analysis

• Regression analysis is used to determine the relationship between a dataset’s dependent (goal) variable and its independent variables.

• It is widely used when the dependent and independent variables are linked in a linear or non-linear fashion, and the target variable takes continuous values.

• It is used for one of two purposes: predicting the value of the dependent variable when information about the independent variables is known, or estimating the effect of an independent variable on the dependent variable.

• Regression analysis approaches help in examining possible causal relationships between variables, modelling time series, and forecasting.

• Regression analysis, for example, is a standard way to examine the relationship between sales and advertising expenditures for a corporation.
Regression Analysis:

A statistical procedure used to find relationships among a set of variables.

y = mx + b
• y is the dependent variable
• x is the independent variable
• m is the slope of the line (how much y changes for a unit change in x)
• b is the intercept (the value of y when x is 0)
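As a minimal sketch of fitting y = mx + b, the snippet below estimates the slope and intercept by least squares with NumPy. The data points are hypothetical (chosen to lie exactly on y = 2x + 1) and are not taken from the slides.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit returns [slope, intercept] of the least-squares line
m, b = np.polyfit(x, y, 1)

# Predict the dependent variable for a new value of the independent variable
y_new = m * 5.0 + b
print(m, b, y_new)
```

With noisy real data the fitted m and b would only approximate the underlying trend.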
Two types of variables used:

• Dependent Variable (Y) (Target/ Outcome/ Response)
• Independent Variable (X) (Predictor/ Features/ Explanatory)

• Regression analysis (RA) is a way to find trends in data.

Regression can be classified by:
• Number of independent variables
• Type of dependent variable
• Shape of the regression line
Regression fits a line or curve through the datapoints on the target-predictor graph in such a way that the overall vertical distance between the datapoints and the regression line is minimized. The line generally does not pass through every datapoint.
r (Simple Correlation Coefficient, Pearson’s Correlation)

• It measures the nature and strength of the association between two variables of quantitative type.
• Sign of r (nature of association):
   Positive (direct relation): both variables increase or decrease together.
   Negative (indirect/inverse relation): one increases as the other decreases, and vice versa.
• Value of r (strength of association).
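The sign and magnitude of r can be illustrated with a short sketch. The function name `pearson_r` and the sample values are assumptions made for illustration; the formula is the standard one, the covariance of the two variables divided by the product of their standard deviations.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: sum of co-deviations over the product of deviation norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical data: a perfectly direct and a perfectly inverse relation
r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_pos, r_neg)
```

Values near +1 or -1 indicate a strong linear association, while values near 0 indicate a weak one.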
Linear Regression
•Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.

•If there is only one input variable (x), it is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.

Simple linear regression equation: y = A + Bx

Multiple linear regression equation (four input variables): y = A + B1x1 + B2x2 + B3x3 + B4x4
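The multiple linear regression equation can be fitted by ordinary least squares. The sketch below uses two input variables and hypothetical noise-free data generated from y = 1 + 2·x1 + 3·x2, so the recovered A, B1 and B2 can be checked by eye.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the intercept A is estimated with the slopes
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ coef = y
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
A, B1, B2 = coef
print(A, B1, B2)
```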
Logistic Regression

•Also known as the logit model.

•Used for classification and predictive analytics.

•Logistic regression is another supervised learning algorithm, used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format such as 0 or 1.

•It works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or not spam, etc.

•It works on the concept of probability.

•In logistic regression, instead of fitting a regression line, we fit an “S”-shaped logistic function whose output is bounded between the two limiting values 0 and 1.

•The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic Function (Sigmoid Function):

•The sigmoid function is a mathematical function used to map the predicted values to probabilities.

•It maps any real value into another value within the range 0 to 1.

•The output of logistic regression must lie between 0 and 1 and cannot go beyond these limits, so it forms a curve like the “S” form.

•The S-form curve is called the sigmoid function or the logistic function.
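The standard form of the sigmoid is 1 / (1 + e^(-z)). The sketch below evaluates it at a few hypothetical scores and turns the probabilities into class labels. The 0.5 cut-off is a common convention and is assumed here, not stated in the slides.

```python
import numpy as np

def sigmoid(z):
    """Map any real value z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores: strongly negative, neutral, strongly positive
probs = sigmoid(np.array([-5.0, 0.0, 5.0]))

# Assumed decision rule: predict class 1 when the probability is at least 0.5
labels = (probs >= 0.5).astype(int)
print(probs, labels)
```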
Types of Logistic Regression

1. Binomial --- only two possible types of dependent variable {E.g. 0 or 1, Pass or Fail}

2. Multinomial --- there can be 3 or more possible unordered types of dependent variable {E.g. “cat”, “dog”, “sheep”}

3. Ordinal --- there can be 3 or more possible ordered types of dependent variable {E.g. “Low”, “Medium”, “High”}
Polynomial Regression
• A type of regression that models a non-linear dataset using a linear model.
• The original features are transformed into polynomial features of a given degree and then modelled using a linear model, which means the datapoints are fitted with a polynomial curve.
• The equation for polynomial regression is derived from the linear regression equation: the linear regression equation $Y = b_0 + b_1 x$ is transformed into the polynomial regression equation $Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$.
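A minimal sketch of this idea: expand x into the polynomial features 1, x, x², then fit them with an ordinary linear least-squares solve. The data are hypothetical, generated from Y = 1 + 2x + 3x², and the degree is assumed known.

```python
import numpy as np

# Hypothetical non-linear data generated from Y = 1 + 2x + 3x^2
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

degree = 2
# Polynomial feature matrix with columns [1, x, x^2]
X_poly = np.vander(x, degree + 1, increasing=True)

# The model is still linear in the coefficients b0, b1, b2
b, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(b)
```

The model is non-linear in x but linear in the coefficients, which is why a linear solver suffices.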
Regularization in Machine Learning

• Refers to techniques used to calibrate machine learning models by minimizing an adjusted (penalized) loss function, mainly to prevent overfitting; the penalty strength is tuned so that the model does not underfit either.
Ridge Regression

•One of the more robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
•The amount of bias added to the model is known as the ridge regression penalty. The penalty term is lambda multiplied by the sum of the squared weights of the individual features.
•The equation (cost function to minimize), with $\beta_j$ denoting the weight of feature $j$:
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

•Ordinary linear or polynomial regression becomes unstable if there is high collinearity between the independent variables; ridge regression can be used to address such problems.
•It is also called L2 regularization.
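The effect of the L2 penalty can be sketched with the closed-form ridge solution w = (XᵀX + λI)⁻¹ Xᵀy. This sketch makes simplifying assumptions: there is no intercept term, every weight is penalized, and the data are hypothetical with two nearly collinear features.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights; lam = 0 reduces to ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical data with two almost identical (highly collinear) features
X = np.array([[1.0, 1.01],
              [2.0, 1.98],
              [3.0, 3.02],
              [4.0, 3.99]])
y = np.array([2.0, 4.0, 6.0, 8.0])

w_ols = ridge_fit(X, y, lam=0.0)    # unpenalized weights
w_ridge = ridge_fit(X, y, lam=1.0)  # penalized weights, shrunk towards zero
print(w_ols, w_ridge)
```

Increasing lambda shrinks the weight vector, trading a little bias for lower variance.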
Lasso Regression

•Stands for Least Absolute Shrinkage and Selection Operator.

•Aims at accurate prediction by limiting model complexity.

•The lasso regression model uses a shrinkage technique.

•Shrinkage means that values are shrunk towards a central point, similar to the concept of the mean; in lasso, the coefficient estimates are shrunk towards zero.
•The lasso regression algorithm favors simple, sparse models (i.e. models with fewer parameters), which makes it well suited to models or data showing high levels of multicollinearity, or to cases where we would like to automate parts of model selection such as variable selection or parameter elimination.

•It is also called L1 regularization.
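The way an L1 penalty produces sparse models can be sketched with coordinate descent and soft-thresholding. The objective assumed here is (1/(2n))·‖y − Xw‖² + λ‖w‖₁ with no intercept; the helper names and the tiny dataset are hypothetical.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become exactly 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, lam) / z
    return w

# Hypothetical data: y depends only on the first of two orthogonal features
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = 3.0 * X[:, 0]
w = lasso_cd(X, y, lam=0.5)
print(w)
```

The weight of the irrelevant feature is driven exactly to zero, which is the variable-selection behaviour described above, while the relevant weight is shrunk from 3 to 2.5.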


Support Vector Regression

•Support Vector Machine is a supervised learning algorithm that can be used for regression as well as classification problems. When it is used for regression problems, it is termed Support Vector Regression (SVR).

•The main goal of SVR is to fit as many datapoints as possible within the boundary lines around the hyperplane (best-fit line), so that the hyperplane passes close to a maximum number of datapoints.
Decision Tree Regression

•Decision Tree is a supervised learning algorithm that can be used for solving both classification and regression problems.
•It can handle both categorical and numerical data.

•Decision tree regression builds a tree-like structure in which each internal node represents a "test" on an attribute, each branch represents the result of the test, and each leaf node represents the final decision or result.
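The node/branch/leaf structure can be sketched with the smallest possible regression tree: a single test node (a "stump") with two leaves. The split criterion assumed here is the sum of squared errors, a common choice that the slides do not specify; the function names and data are hypothetical.

```python
import numpy as np

def fit_stump(x, y):
    """Find the threshold t for the test 'x <= t' that minimizes squared error."""
    best = None
    xs = np.unique(x)  # sorted distinct feature values
    for lo, hi in zip(xs[:-1], xs[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between neighbouring values
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return t, left_value, right_value

def predict_stump(stump, x):
    """Each leaf predicts the mean target of the training points that reached it."""
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

# Hypothetical data with two clearly separated groups
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
stump = fit_stump(x, y)
preds = predict_stump(stump, np.array([2.5, 11.5]))
print(stump, preds)
```

A full decision tree applies the same split search recursively inside each branch.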
Support Vector Machine

• Support Vector Machine (SVM) is one of the most popular supervised learning algorithms and is used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.

• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future. This best decision boundary is called a hyperplane.

• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider a diagram in which two different categories are classified using a decision boundary or hyperplane.
How SVM Works?

Linear SVM

Non-Linear SVM

SVM Kernel
• It is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one.

• It performs some extremely complex data transformations and then finds a way to separate the data based on the labels or outputs.
SVM Margin
Types of Kernel

1. Polynomial kernel: It is popular in image processing.

Equation is:

where d is the degree of the polynomial.

2. Gaussian kernel: It is a general-purpose kernel, used when there is no prior knowledge about the data.

Equation is:

3. Gaussian radial basis function (RBF): It is a general-purpose kernel, used when there is no prior knowledge about the data.

Equation is:

4. Laplace RBF kernel: It is a general-purpose kernel, used when there is no prior knowledge about the data.

Equation is:

5. Hyperbolic tangent kernel: We can use it in neural networks.

Equation is:

6. Sigmoid kernel: We can use it as a proxy for neural networks.

Equation is:

7. Bessel function of the first kind kernel: We can use it to remove the cross term in mathematical functions.

Equation is:

where j is the Bessel function of the first kind.

8. ANOVA radial basis kernel: We can use it in regression problems.

Equation is:

9. Linear splines kernel in one dimension: It is useful when dealing with large sparse data vectors and is often used in text categorization. The splines kernel also performs well in regression problems.

Equation is:
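Two of the kernels listed above can be sketched in code. The kernel equations did not survive in this copy of the slides, so the forms below are the commonly used textbook definitions and should be read as assumptions: polynomial kernel (x·y + c)^d and Gaussian RBF kernel exp(−γ‖x − y‖²). The parameter names c and gamma are likewise assumed.

```python
import numpy as np

def polynomial_kernel(x, y, d=2, c=1.0):
    """Assumed textbook form: (x . y + c)^d, with d the degree of the polynomial."""
    return (np.dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    """Assumed textbook form: exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

# Hypothetical input vectors
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

k_poly = polynomial_kernel(x, y)  # (1*3 + 2*4 + 1)^2
k_rbf_same = rbf_kernel(x, x)     # identical points give the maximum similarity
k_rbf_diff = rbf_kernel(x, y)     # similarity decays with distance
print(k_poly, k_rbf_same, k_rbf_diff)
```

Each kernel returns a similarity score that equals an inner product in some higher-dimensional space, without that space ever being constructed explicitly.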
Bayesian Modelling
Basics:

Statistics helps us quantify and measure uncertainty, and the concept of ‘probability’ was introduced for this purpose.

There are 3 different approaches available to determine the probability of an event.

1. Classical: The classical approach works well when we have well-defined, equally likely outcomes.
2. Frequentist: The frequentist definition requires us to imagine a hypothetical infinite sequence of repetitions of a particular event and then to look at the relative frequency of the event in that sequence.
3. Bayesian: The Bayesian perspective allows us to incorporate personal belief/opinion into the decision-making process.
Bayes Theorem
Bayes’ theorem can be expressed through the following mathematical equation:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

where $P(A)$ is the probability of event A occurring independent of any other event B.
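Bayes' theorem, P(A|B) = P(B|A)·P(A) / P(B), can be tried out numerically. In the sketch below all probabilities are hypothetical, and P(B) is obtained from the law of total probability, which the slides do not spell out.

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) from the likelihoods and the prior P(A)."""
    # Total probability of the evidence B
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: rare event A (prior 1%), evidence B seen 90% of the
# time when A holds and 5% of the time when it does not
posterior = bayes_posterior(p_b_given_a=0.9, p_a=0.01, p_b_given_not_a=0.05)
print(posterior)
```

Even with strong evidence, a small prior keeps the posterior modest, which is the kind of belief updating the Bayesian approach formalizes.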
Bayesian Belief Network

• A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
• It is also called a Bayes network, belief network, decision network, or Bayesian model.

A Bayesian network can be used for building models from data and expert opinion, and it consists of two parts:
•Directed acyclic graph
•Table of conditional probabilities
Joint probability distribution

If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.

P[x1, x2, x3, ..., xn] can be expanded with the chain rule as follows:
= P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn].

In general, for each variable Xi in the network, we can write:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and David and Sophia have both called Harry.

From the formula for the joint distribution, we can write the problem statement in the form of a probability distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045.
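The arithmetic of this problem can be checked with a few lines of code. The five probabilities are the ones given in the calculation above; the variable names are chosen here for readability.

```python
# Conditional probabilities taken from the worked problem
p_s_given_a = 0.75             # P(S | A): Sophia calls given the alarm
p_d_given_a = 0.91             # P(D | A): David calls given the alarm
p_a_given_not_b_not_e = 0.001  # P(A | not B, not E)
p_not_b = 0.998                # P(not B): no burglary
p_not_e = 0.999                # P(not E): no earthquake

# Joint probability as the product of each node's probability given its parents
joint = (p_s_given_a * p_d_given_a * p_a_given_not_b_not_e
         * p_not_b * p_not_e)
print(round(joint, 8))
```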
Tutorial
1. (a) Find the probability that the oil industry is at its peak when its stock price is low, the interest rate is depreciating, and the stock market is at a high peak.
Multivariate Analysis

• It is used to describe analysis of data where there are multiple variables or observations for
each unit or individual.
PCA (Principal Component Analysis)
Principal Component Analysis is an unsupervised learning algorithm used for dimensionality reduction in machine learning.

PCA tries to find a lower-dimensional surface onto which to project the high-dimensional data.

The PCA algorithm is based on mathematical concepts such as:

•Variance and covariance
•Eigenvalues and eigenvectors
PCA Algorithm-

The steps involved in the PCA algorithm are as follows-

Step-01: Get data.
Step-02: Compute the mean vector (µ).
Step-03: Subtract the mean from the given data.
Step-04: Calculate the covariance matrix.
Step-05: Calculate the eigenvectors and eigenvalues of the covariance matrix.
Step-06: Choose components and form a feature vector.
Step-07: Derive the new data set.
Feature   Example 1   Example 2   Example 3   Example 4
X1        4           8           13          7
X2        11          4           5           14
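Step 1: Calculate the mean.
Mean of X1 = (4 + 8 + 13 + 7) / 4 = 8; mean of X2 = (11 + 4 + 5 + 14) / 4 = 8.5.

[Figure: scatter plot of the given data points.]

Step 2: Calculation of the covariance matrix.
Using the sample covariance (dividing by n - 1 = 3): Cov(X1, X1) = 14, Cov(X1, X2) = Cov(X2, X1) = -11, Cov(X2, X2) = 23.

Step 3: Eigenvalues of the covariance matrix.
Solving λ^2 - 37λ + 201 = 0 gives λ1 ≈ 30.3849 and λ2 ≈ 6.6151.

Step 4: Computation of the eigenvectors.
To find the first principal component, we need only compute the eigenvector corresponding to the largest eigenvalue. In the present example, the largest eigenvalue is λ1, so we compute the eigenvector corresponding to λ1. Normalized to unit length, it is approximately (0.5574, -0.8303).

Step 5: Computation of the first principal component.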
Feature                     Example 1   Example 2   Example 3   Example 4
X1                          4           8           13          7
X2                          11          4           5           14
First Principal Component   -4.3052     3.7361      5.6928      -5.1238
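The worked example can be reproduced with NumPy as a check. The sketch assumes the sample covariance (division by n − 1) and fixes the eigenvector's sign so that its first entry is positive; an eigenvector is only defined up to sign, so this convention is a choice made here to match the table.

```python
import numpy as np

# Data from the example: rows are features X1 and X2, columns are the 4 examples
X = np.array([[4.0, 8.0, 13.0, 7.0],
              [11.0, 4.0, 5.0, 14.0]])

# Mean vector, then centring of the data
mean = X.mean(axis=1, keepdims=True)
centered = X - mean

# Sample covariance matrix (assumes division by n - 1)
cov = centered @ centered.T / (X.shape[1] - 1)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
e1 = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
if e1[0] < 0:        # fix the arbitrary sign so the first entry is positive
    e1 = -e1

# Projecting the centred data onto e1 gives the first principal component
pc1 = e1 @ centered
print(cov)
print(eigvals)
print(pc1)
```

The projected values agree with the first principal component row of the table to about three decimal places.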
Neural Network
The objective is to develop a system that performs various computational tasks faster than traditional systems.

These tasks include pattern recognition and classification, approximation, optimization, and data clustering.

Neural networks are parallel computing devices; they are basically an attempt to make a computer model of the brain.

Biological Neural Network   Artificial Neural Network
Dendrites                   Inputs
Cell nucleus                Nodes
Synapse                     Weights
Axon                        Output

Neural networks mimic the basic functioning of the human brain and are inspired by how the human brain interprets information. They are used for various real-time tasks because of their ability to perform computations quickly and respond fast.
Types of learnings in Neural networks:

1.Supervised Learning
2.Unsupervised Learning
3.Reinforcement Learning
Generalization in Neural Networks

• Whenever we train our own neural networks, we need to take care of the generalization of the neural network.

• This essentially means how good our model is at learning from the given data and applying the learnt information elsewhere.

• When training a neural network, some data is used for training, and some data is reserved for checking the performance of the neural network.

• If the neural network performs well on data it has not been trained on, we can say it has generalized well.

• The concept of learning from some data and correctly applying the gained knowledge to other data is called generalization.
Types of Learning Rule in NN
Hebbian Learning Rule (Unsupervised Learning)

•If two neighboring neurons are operating in the same phase at the same time, the weight between them should increase.
•For neurons operating in opposite phases, the weight between them should decrease.
•If there is no signal correlation, the weight does not change. The sign of the weight between two nodes depends on the signs of the inputs at those nodes.
•When the inputs of both nodes are either positive or negative, the result is a strong positive weight.
•If the input of one node is positive and that of the other is negative, the result is a strong negative weight.
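The sign behaviour described above can be sketched with the usual textbook form of the Hebbian update, in which the weight change is a learning rate times the product of the two node activations. The slides do not give this formula, so the function name, the learning rate and the activation values below are assumptions.

```python
def hebbian_delta(a_i, a_j, lr=0.1):
    """Assumed Hebbian update: weight change = learning rate * product of activations."""
    return lr * a_i * a_j

# Hypothetical activations illustrating each case from the list above
both_positive = hebbian_delta(1.0, 1.0)    # same phase: weight increases
both_negative = hebbian_delta(-1.0, -1.0)  # same phase: weight increases
opposite = hebbian_delta(1.0, -1.0)        # opposite phase: weight decreases
no_signal = hebbian_delta(0.0, 1.0)        # no correlation: no change
print(both_positive, both_negative, opposite, no_signal)
```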
Perceptron Learning Rule (Supervised Learning)

Because the rule is supervised, the error is calculated by comparing the desired/target output with the actual output.

If any difference is found, the connection weights must be changed.
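A minimal sketch of this rule, under some assumptions: a single perceptron with a step activation learns the logical AND function. The update w ← w + lr·(target − output)·x is the standard textbook form rather than something given in the slides, and the learning rate, epoch count and data are hypothetical.

```python
import numpy as np

def train_perceptron(X, targets, lr=1.0, epochs=20):
    """Adjust weights only when the actual output differs from the target."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, targets):
            output = 1 if x @ w + b > 0 else 0  # step activation
            error = t - output                  # zero when the output is correct
            w += lr * error * x
            b += lr * error
    return w, b

# Hypothetical training set: the logical AND function
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, targets)
preds = [1 if x @ w + b > 0 else 0 for x in X]
print(w, b, preds)
```

Because AND is linearly separable, the weights stop changing once every training example is classified correctly.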
Delta Learning Rule
Competitive Learning Rule
Outstar Learning Rule
Fuzzy Logic
Architecture of a Fuzzy Logic System
Fuzzification Module − It transforms the system inputs, which are crisp numbers, into fuzzy sets. It splits the input signal into five levels:

LP   x is Large Positive
MP   x is Medium Positive
S    x is Small
MN   x is Medium Negative
LN   x is Large Negative
Tutorial
2. Compute Principal Component using PCA algorithm on given data as:
X= 2, 3, 4
Y= 5, 6, 7
3.
