Unit 2
Data Analysis
Regression Modelling/Analysis
• Regression analysis is widely used when the dependent and independent variables are linked in a linear
or non-linear fashion, and the target variable takes continuous values.
• It is used for one of two purposes: predicting the value of the dependent variable when
the independent variables are known, or estimating the effect of an
independent variable on the dependent variable.
Regression
• It measures the nature and strength of the relationship between two quantitative variables, denoted by r.
• Positive (direct relation): both variables increase or decrease together.
• Negative (indirect/inverse relation): one variable increases as the other decreases, and vice versa.
Linear Regression
•Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence it is called linear
regression.
•If there is only one input variable (x), the model is
called simple linear regression. If there is more than one input variable,
it is called multiple linear regression.
Simple linear regression equation: y = A + Bx
Multiple linear regression equation: y = A + B1x1 + B2x2 + B3x3 + B4x4
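The simple linear regression equation y = A + Bx can be illustrated with a minimal least-squares sketch. The function name `fit_simple_linear_regression` and the data are hypothetical, chosen only for illustration; ordinary least squares is assumed as the fitting method, which the notes do not state explicitly.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (A, B) for y = A + B*x using ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    B = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    A = y_mean - B * x_mean
    return A, B

# Hypothetical data generated from y = 1 + 2x without noise.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
A, B = fit_simple_linear_regression(x, y)
print(A, B)
```

With noise-free data the fitted intercept and slope recover the values used to generate it; with real data they give the best-fit line in the least-squares sense.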
Logistic Regression
• Logistic regression works with a categorical target variable such as 0 or 1, Yes or No, True or False, Spam
or not spam, etc.
•The curve from the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
•The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
•It maps any real value to another value in the range 0 to 1. The output of
logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it
forms an "S"-shaped curve.
•The S-shaped curve is called the sigmoid function or the logistic function.
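The mapping from any real value to the range 0 to 1 can be sketched as follows. The helper `predict_label` and its 0.5 threshold are assumptions added for illustration, not something the notes specify.

```python
import math

def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    # Two branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_label(z, threshold=0.5):
    """Turn the probability into a 0/1 class (threshold is an assumption)."""
    return 1 if sigmoid(z) >= threshold else 0

for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z), predict_label(z))
```

Large negative inputs give values near 0, large positive inputs give values near 1, and an input of 0 gives exactly 0.5, which traces out the S-shaped curve.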
Types of Logistic Regression
• Lasso regression uses shrinkage: the data values are shrunk towards a central point, similar to the concept of the mean.
•The lasso regression algorithm encourages simple, sparse models (i.e. models with fewer
parameters), which makes it well suited to data showing high levels of multicollinearity or
to cases where we would like to automate parts of model selection, such as variable selection or
parameter elimination.
•Support Vector Machine is a supervised learning algorithm that can be used for
regression as well as classification problems. When it is used for regression
problems, it is termed Support Vector Regression (SVR).
•The main goal of SVR is to include the maximum number of data points within the
boundary lines, and the hyperplane (best-fit line) must contain the maximum
number of data points.
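The idea of keeping data points within the boundary lines is commonly formalized with an epsilon-insensitive loss: points inside a tube of half-width epsilon around the hyperplane incur no penalty. The sketch below assumes that formulation; the line f(x) = 2x, the data, and the value of `epsilon` are hypothetical.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube |y - f(x)| <= epsilon, linear outside it."""
    residual = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return np.maximum(0.0, residual - epsilon)

def count_inside_tube(y_true, y_pred, epsilon):
    """Number of data points lying within the boundary lines."""
    residual = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return int(np.sum(residual <= epsilon))

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 2.4, 3.9, 7.0])
y_pred = 2.0 * x          # hypothetical hyperplane f(x) = 2x
epsilon = 0.5             # hypothetical tube half-width

loss = epsilon_insensitive_loss(y, y_pred, epsilon)
inside = count_inside_tube(y, y_pred, epsilon)
print(loss, inside)
```

Only the last point lies outside the tube here, so it is the only one that contributes to the loss.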
Decision Tree Regression
•Decision Tree is a supervised learning algorithm which can be used for solving
both classification and regression problems.
•It can handle both categorical and numerical data.
•Decision tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test,
and each leaf node represents the final decision or result.
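The node/branch/leaf structure can be illustrated with the smallest possible regression tree: one test and two leaves. This is a sketch under stated assumptions: the split is chosen by minimizing squared error, each leaf predicts the mean of its targets, and the input has at least two distinct x values. The names `fit_stump` and `predict_stump` and the data are hypothetical.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a one-test tree: returns (threshold, left_value, right_value)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    xs = np.unique(x)  # sorted distinct values of the attribute
    for lo, hi in zip(xs[:-1], xs[1:]):
        threshold = (lo + hi) / 2.0          # candidate test: x <= threshold
        left, right = y[x <= threshold], y[x > threshold]
        # Sum of squared errors when each leaf predicts its mean.
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x_new):
    """Follow the branch chosen by the test and return the leaf value."""
    threshold, left_value, right_value = stump
    return left_value if x_new <= threshold else right_value

x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(x, y)
print(stump, predict_stump(stump, 2), predict_stump(stump, 11))
```

A full decision tree applies the same split search recursively inside each branch until a stopping condition is met.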
Support Vector Machine
• The goal of the SVM algorithm is to create the best line or decision boundary
that segregates n-dimensional space into classes, so that a new data point can
easily be placed in the correct category in the future. This best decision
boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed Support Vector Machine. Consider the diagram below, in which
two different categories are classified using a decision boundary or hyperplane.
How Does SVM Work?
Linear SVM
• It performs some extremely complex data transformations, then finds a way to separate the data
based on the labels or outputs.
SVM Margin
Types of Kernel
Equation is:
It is a general-purpose kernel; used when there is no prior knowledge about the data. Equation
is:
3. Gaussian radial basis function (RBF):
It is a general-purpose kernel; used when there is no prior knowledge about the data.
Equation is:
4. Laplace RBF kernel:
It is a general-purpose kernel; used when there is no prior knowledge about the data.
Equation is:
Equation is:
• Sigmoid kernel:
Equation is
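The kernel equations are not recoverable from the notes above, so the sketch below uses commonly cited textbook forms for the three kernels that are named: Gaussian RBF, Laplace RBF, and sigmoid. The parameter names `sigma`, `alpha`, and `c` and their default values are assumptions, and the exact parameterization in the original slides may differ.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF: exp(-||x - y||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def laplace_kernel(x, y, sigma=1.0):
    """Laplace RBF: exp(-||x - y|| / sigma)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.linalg.norm(diff) / sigma))

def sigmoid_kernel(x, y, alpha=1.0, c=0.0):
    """Sigmoid kernel: tanh(alpha * <x, y> + c)."""
    return float(np.tanh(alpha * np.dot(x, y) + c))

a, b = [0.0, 0.0], [1.0, 0.0]
print(rbf_kernel(a, b), laplace_kernel(a, b), sigmoid_kernel(a, b))
```

Both RBF kernels equal 1 for identical points and decay towards 0 as the points move apart, which is why they act as similarity measures.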
Statistics helps us quantify and measure uncertainty, and hence the
concept of ‘Probability’ was introduced. There are three ways to define probability:
1. Classical: The classical approach works well when we have well-defined, equally likely
outcomes.
2. Frequentist: The frequentist definition requires us to imagine a hypothetical infinite sequence of
a particular event and then look at the relative frequency in that hypothetical infinite
sequence.
3. Bayesian: The Bayesian perspective allows us to incorporate personal belief/opinion into the
decision-making process.
Bayes Theorem
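Bayes’ theorem can be expressed through the following mathematical equation:
P(A|B) = P(B|A) P(A) / P(B)
where P(A|B) is the probability of A given B, P(B|A) is the probability of B given A, and P(A) and P(B) are the probabilities of A and B on their own.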
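A small numerical sketch may help make Bayes' theorem concrete. The spam-filter scenario and all three input probabilities are hypothetical, chosen only for illustration; the function name `posterior` is likewise an assumption.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) from total probability."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: A = "email is spam", B = "email contains a given word".
p_spam = 0.2                 # P(A)
p_word_given_spam = 0.6      # P(B|A)
p_word_given_ham = 0.05      # P(B|not A)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(p_spam_given_word)
```

Observing the word raises the probability of spam from the prior of 0.2 to about 0.75, because the word is far more likely in spam than in non-spam.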
• A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional
dependencies using a directed acyclic graph.
• It is also called a Bayes network, belief network, decision network, or Bayesian model.
A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:
•Directed Acyclic Graph
•Table of conditional probabilities.
Joint probability distribution
If we have variables x1, x2, x3, ..., xn, then the probabilities of the different
combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
P[x1, x2, x3, ..., xn] can be written in the following way in terms of the joint
probability distribution:
P[x1, x2, x3, ..., xn] = P[x1 | x2, x3, ..., xn] P[x2, x3, ..., xn]
= P[x1 | x2, x3, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
In general, for each variable Xi, we can write the equation as:
P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake
has occurred, and David and Sophia both called Harry.
From the formula for the joint distribution, we can write the problem statement in the form of a probability
distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045
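The arithmetic for this problem can be checked with a short sketch. The variable names are hypothetical; the five probabilities are the values quoted in the problem, taken in the order of the factors.

```python
# Conditional probabilities quoted in the problem.
p_s_given_a = 0.75             # P(S|A): Sophia calls given the alarm
p_d_given_a = 0.91             # P(D|A): David calls given the alarm
p_a_given_not_b_not_e = 0.001  # P(A|not B, not E)
p_not_b = 0.998                # P(not B): no burglary
p_not_e = 0.999                # P(not E): no earthquake

# Each variable depends only on its parents in the network,
# so the joint probability is the product of these factors.
joint = (p_s_given_a * p_d_given_a * p_a_given_not_b_not_e
         * p_not_b * p_not_e)
print(round(joint, 8))
```

The product agrees with the value 0.00068045 when rounded to eight decimal places.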
Tutorial
1. (a) Find the probability that the oil industry is at its peak when its stock price
is low, the interest rate is depreciating, and the stock market is at a high
peak.
Multivariate Analysis
• It is used to describe analysis of data where there are multiple variables or observations for
each unit or individual.
PCA (Principal Component Analysis)
Principal Component Analysis is an unsupervised learning algorithm that is used for
dimensionality reduction in machine learning.
PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.
First Principal Components: -4.3052, 3.7361, 5.6928, -5.1238
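The projection step can be sketched with NumPy. This is a minimal illustration that assumes the usual recipe (center the data, compute the sample covariance with n - 1 in the denominator, take the eigenvector with the largest eigenvalue). The function name is hypothetical, and the small data set X = 2, 3, 4 and Y = 5, 6, 7 is used as example input.

```python
import numpy as np

def first_principal_component(data):
    """data has shape (n_samples, n_features)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)           # subtract each column mean
    cov = np.cov(centered, rowvar=False)          # sample covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending eigenvalues
    direction = eigenvectors[:, -1]               # largest-variance direction
    scores = centered @ direction                 # projected 1-D coordinates
    return eigenvalues[-1], direction, scores

X = [2, 3, 4]
Y = [5, 6, 7]
data = np.column_stack([X, Y])
variance, direction, scores = first_principal_component(data)
print(variance, direction, scores)
```

The sign of an eigenvector is arbitrary, so the direction and the projected values may come out with flipped signs; their magnitudes are what matter.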
Neural Network
The objective is to develop a system that performs various computational tasks faster than traditional
systems.
These tasks include pattern recognition and classification, approximation, optimization, and data
clustering.
Neural networks are parallel computing devices, which are basically an attempt to make a computer
model of the brain.
Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output
Neural networks mimic the basic functioning of the human brain and are inspired by how
the human brain interprets information. They solve various real-time tasks because of their
ability to perform computations quickly and respond fast.
Types of learning in neural networks:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Generalization in Neural Networks
• Whenever we train our own neural networks, we need to take care of something called the generalization of
the neural network.
• This essentially means how good our model is at learning from the given data and applying the learnt
information elsewhere.
• When training a neural network, there’s going to be some data that the neural network trains on, and there’s
going to be some data reserved for checking the performance of the neural network.
• If the neural network performs well on data it has not been trained on, we can say it has generalized
well on the given data.
• This concept of learning from some data and correctly applying the gained knowledge to other data is
called generalization.
Types of Learning Rule in NN
Hebbian Learning Rule (Unsupervised Learning)
•If two neighboring neurons operate in the same phase at the same time,
the weight between them should increase.
•For neurons operating in opposite phases, the weight between them should
decrease.
•If there is no signal correlation, the weight does not change.
•The sign of the weight between two nodes depends on the signs of the inputs to those nodes.
•When the inputs of both nodes are either positive or negative, the result is a strong
positive weight.
•If the input of one node is positive and that of the other is negative, a strong negative weight
results.
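The sign behavior described above can be sketched with the basic Hebbian update, in which the weight change is proportional to the product of the two signals. The function name, the learning rate of 0.1, and the input values are assumptions made for illustration.

```python
def hebbian_update(weight, x_i, x_j, learning_rate=0.1):
    """New weight = old weight + learning_rate * x_i * x_j."""
    return weight + learning_rate * x_i * x_j

w = 0.0
same_phase_pos = hebbian_update(w, 1.0, 1.0)     # both positive: weight increases
same_phase_neg = hebbian_update(w, -1.0, -1.0)   # both negative: weight increases
opposite_phase = hebbian_update(w, 1.0, -1.0)    # opposite signs: weight decreases
no_correlation = hebbian_update(w, 0.0, 1.0)     # one signal absent: no change
print(same_phase_pos, same_phase_neg, opposite_phase, no_correlation)
```

Because the update uses the product of the two inputs, matching signs always push the weight up and opposite signs always push it down.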
Perceptron Learning Rule (Supervised Learning)
Because this rule is supervised, the error is calculated by comparing the
desired/target output with the actual output.
If any difference is found, the connection weights must be changed.
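The compare-and-correct loop can be sketched as follows. This is a minimal illustration that assumes a step activation, a learning rate of 1, and a logical AND data set; none of these are specified in the notes, and the function names are hypothetical.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 if the weighted sum is non-negative."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0

def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Adjust weights whenever the actual output differs from the target."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                # Move each weight in the direction that reduces the error.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
    return weights, bias

# Hypothetical training set: logical AND of two binary inputs.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
outputs = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(weights, bias, outputs)
```

The weights stop changing once every training sample is classified correctly, which happens here because AND is linearly separable.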
Delta Learning Rule
Competitive Learning Rule
Outstar Learning Rule
Fuzzy Logic
Architecture of a Fuzzy Logic System
Fuzzification Module − It transforms the system inputs, which are crisp numbers, into fuzzy
sets. It splits the input signal into five steps such as −
LP    x is Large Positive
MP    x is Medium Positive
S     x is Small
MN    x is Medium Negative
LN    x is Large Negative
Tutorial
2. Compute the principal components using the PCA algorithm on the given data:
X= 2, 3, 4
Y= 5, 6, 7
3.