
UNIT 2

Regression Analysis in Machine Learning


Regression analysis is a statistical method for modelling the relationship between a
dependent (target) variable and one or more independent (predictor) variables. More
specifically, regression analysis helps us understand how the value of the dependent
variable changes in response to one independent variable while the other independent
variables are held fixed. It predicts continuous/real values such as temperature, age, salary,
price, etc.

We can understand the concept of regression analysis using the below example:

Example: Suppose there is a marketing company A that runs various advertisements every
year and earns sales from them. The list below shows the advertising spend of the company
in each of the last 5 years and the corresponding sales:

Now the company wants to spend $200 on advertising in the year 2019 and wants to know
the prediction of its sales for that year. To solve such prediction problems in machine
learning, we need regression analysis.
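As a quick illustration, here is a minimal sketch of this prediction with scikit-learn. The spend/sales figures are hypothetical, since the original table is not reproduced above; only the $200 query comes from the example.

```python
# A minimal sketch of the advertising example, assuming scikit-learn is available.
# The spend/sales figures below are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[90], [120], [150], [100], [130]])  # $ spent per year
sales = np.array([1000, 1300, 1800, 1200, 1380])         # corresponding sales

model = LinearRegression().fit(ad_spend, sales)
predicted_sales = model.predict([[200]])  # predict sales for a $200 spend
print(f"Predicted sales for $200 ad spend: {predicted_sales[0]:.0f}")
```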

Regression is a supervised learning technique that helps in finding the correlation between
variables and enables us to predict a continuous output variable based on one or more
predictor variables. It is mainly used for prediction, forecasting, time-series modelling, and
determining cause-and-effect relationships between variables.

In regression, we plot a graph between the variables that best fits the given data points;
using this plot, the machine learning model can make predictions about the data. In simple
words, "Regression shows a line or curve that passes through the data points on the target-
predictor graph in such a way that the vertical distance between the data points and the
regression line is minimum." The distance between the data points and the line tells whether
the model has captured a strong relationship or not.
Terminologies Related to Regression Analysis:

o Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the target
variable.
o Independent Variable: The factors that affect the dependent variable, or that
are used to predict its values, are called independent variables, also called
predictors.
o Outliers: An outlier is an observation with a very low or very high value in
comparison to the other observed values. An outlier can distort the results, so it
should be handled carefully.
o Multicollinearity: If the independent variables are highly correlated with each
other, the condition is called multicollinearity. It should not be present in the
dataset, because it creates problems when ranking the most influential
variables.
o Underfitting and Overfitting: If our algorithm works well on the training dataset
but not on the test dataset, the problem is called overfitting. And if our
algorithm does not perform well even on the training dataset, the problem is
called underfitting.
Why do we use Regression Analysis?
As mentioned above, regression analysis helps in the prediction of a continuous variable.
There are various real-world scenarios where we need future predictions, such as weather
conditions, sales figures, and marketing trends, and for such cases we need a technique
that can make predictions accurately. Regression analysis is such a technique: a statistical
method used in machine learning and data science. Below are some other reasons for
using regression analysis:

o Regression estimates the relationship between the target and the independent variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing regression, we can determine the most important factor, the
least important factor, and how each factor affects the others.
Types of Regression
There are various types of regression used in data science and machine learning.
Each type has its own importance in different scenarios, but at their core, all regression
methods analyze the effect of the independent variables on a dependent variable. Here we
discuss some important types of regression, given below:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression

Linear Regression:

o Linear regression is a statistical regression method which is used for predictive
analysis.
o It is one of the simplest and easiest algorithms; it works on regression and shows
the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-
axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple linear regression. And if
there is more than one input variable, it is called multiple linear regression.
o The relationship between the variables in a linear regression model can be
illustrated by predicting the salary of an employee on the basis of years of
experience, as in the sketch below.
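Here is a minimal sketch of this salary-vs-experience example, fitting the line directly with NumPy. The experience and salary values are hypothetical illustration data, not from the original notes.

```python
# Simple linear regression by least squares: salary vs. years of experience.
# The data below is made up for illustration.
import numpy as np

years_experience = np.array([1, 2, 3, 5, 8], dtype=float)
salary = np.array([40000, 48000, 55000, 70000, 95000], dtype=float)

# Fit a degree-1 polynomial (a straight line) to the data.
slope, intercept = np.polyfit(years_experience, salary, deg=1)
print(f"salary ~ {slope:.0f} * years + {intercept:.0f}")
print("predicted salary at 4 years:", slope * 4 + intercept)
```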
Logistic Regression:

o Logistic regression is another supervised learning algorithm which is used to solve
classification problems. In classification problems, we have a dependent variable in a
binary or discrete format, such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes
or No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression
algorithm in terms of how it is used.
o Logistic regression uses the sigmoid function (logistic function) to map predictions to
probabilities. This sigmoid function is used to model the data in logistic regression.
The function can be represented as:

f(x) = 1 / (1 + e^(-x))

o f(x) = output, between the values 0 and 1.
o x = input to the function.
o e = base of the natural logarithm.
When we provide input values (data) to the function, it gives an S-curve as follows:
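A minimal sketch of the S-curve described above, assuming Matplotlib is available:

```python
# Plot the sigmoid (logistic) function to show its S-shaped curve.
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    # Outputs always lie strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 200)
plt.plot(x, sigmoid(x))
plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Sigmoid function")
plt.show()
```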

Polynomial Regression:

o Polynomial regression is a type of regression which models a non-linear
dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the
value of x and the corresponding conditional values of y.
o Suppose there is a dataset whose data points lie in a non-linear fashion; in such a
case, linear regression will not fit those data points well. To cover such data points,
we need polynomial regression.
o In polynomial regression, the original features are transformed into polynomial
features of a given degree and then modelled using a linear model, which means
the data points are best fitted by a polynomial curve.
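Here is a minimal sketch of that transform-then-fit idea, assuming scikit-learn is available; the quadratic data is synthetic.

```python
# Polynomial regression: expand x into polynomial features, then fit an
# ordinary linear model on the expanded features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + np.random.normal(0, 0.3, 30)

x_poly = PolynomialFeatures(degree=2).fit_transform(x)  # columns [1, x, x^2]
model = LinearRegression().fit(x_poly, y)
print("R^2 on training data:", model.score(x_poly, y))
```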
Decision Tree Regression:

o A decision tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents a result of the test, and
each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the whole
dataset), which splits into left and right child nodes (subsets of the dataset). These
child nodes are further divided into their own children, themselves becoming the
parent nodes of those nodes. Consider the sketch below:
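A minimal sketch of decision tree regression on synthetic data, assuming scikit-learn is available:

```python
# Fit a shallow regression tree to noisy samples of a sine curve and query it.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

x = np.sort(np.random.uniform(0, 5, 40)).reshape(-1, 1)
y = np.sin(x).ravel()

tree = DecisionTreeRegressor(max_depth=3).fit(x, y)
print("prediction at x=2.5:", tree.predict([[2.5]])[0])
```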

Univariate data:
Univariate data refers to a type of data in which each observation or data point corresponds
to a single variable. In other words, it involves the measurement or observation of a single
characteristic or attribute for each individual or item in the dataset. Analyzing univariate
data is the simplest form of analysis in statistics.
Heights (in cm): 164, 167.3, 170, 174.2, 178, 180, 18

Suppose that the heights of seven students in a class are recorded (above table). There is
only one variable, height, and we are not dealing with any cause or relationship.
Key points in Univariate analysis:
1. No Relationships: Univariate analysis focuses solely on describing and summarizing
the distribution of the single variable. It does not explore relationships between
variables or attempt to identify causes.
2. Descriptive Statistics: Descriptive statistics, such as measures of central
tendency (mean, median, mode) and measures of dispersion (range, standard deviation),
are commonly used in the analysis of univariate data.
3. Visualization: Histograms, box plots, and other graphical representations are often used
to visually represent the distribution of the single variable.
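As a small sketch of the descriptive statistics mentioned in point 2, using the heights above (the last value in the source table appears truncated, so it is omitted here):

```python
# Univariate analysis: central tendency and dispersion of a single variable.
import numpy as np

heights = np.array([164, 167.3, 170, 174.2, 178, 180])
print("mean:", heights.mean())
print("median:", np.median(heights))
print("range:", heights.max() - heights.min())
print("standard deviation:", heights.std(ddof=1))  # sample std. dev.
```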
Bivariate data
Bivariate data involves two different variables, and the analysis of this type of data focuses
on understanding the relationship or association between these two variables. Example of
bivariate data can be temperature and ice cream sales in summer season.
UNIT 2

Temperature   Ice Cream Sales
20            2000
25            2500
35            5000

Suppose that temperature and ice cream sales are the two variables of a bivariate dataset
(above table). Here, the relationship is visible from the table: temperature and sales are
directly related, because as the temperature increases, the sales also increase.
Key points in Bivariate analysis:
1. Relationship Analysis: The primary goal of analyzing bivariate data is to understand
the relationship between the two variables. This relationship could be positive (both
variables increase together), negative (one variable increases while the other decreases),
or show no clear pattern.
2. Scatterplots: A common visualization tool for bivariate data is a scatterplot, where
each data point represents a pair of values for the two variables. Scatterplots help
visualize patterns and trends in the data.
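A minimal sketch of both key points, using the temperature/sales table above, assuming Matplotlib is available:

```python
# Bivariate analysis: correlation coefficient plus a scatterplot.
import numpy as np
import matplotlib.pyplot as plt

temperature = np.array([20, 25, 35])
sales = np.array([2000, 2500, 5000])

print("correlation:", np.corrcoef(temperature, sales)[0, 1])  # close to +1
plt.scatter(temperature, sales)
plt.xlabel("Temperature")
plt.ylabel("Ice Cream Sales")
plt.show()
```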
Multivariate data
Multivariate data refers to datasets where each observation or sample point consists of
multiple variables or features. These variables can represent different aspects,
characteristics, or measurements related to the observed phenomenon. When dealing with
three or more variables, the data is specifically categorized as multivariate.
An example of this type of data: suppose an advertiser wants to compare the popularity of
four advertisements on a website.
Advertisement   Gender   Click rate
Ad1             Male     80
Ad3             Female   55
Ad2             Female   123
Ad1             Male     66
Ad3             Male     35

The click rates can be measured for both men and women, and the relationships between
the variables can then be examined. Multivariate data is similar to bivariate data, but it
contains more than one dependent variable.
UNIT 2

Key points in Multivariate analysis:


1. Analysis Techniques: The way analysis is performed on this data depends on the goals
to be achieved. Some of the techniques are regression analysis, principal component
analysis, path analysis, factor analysis, and multivariate analysis of
variance (MANOVA), one of which is sketched below.
2. Goals of Analysis: The choice of analysis technique depends on the specific goals of
the study. For example, researchers may be interested in predicting one variable based
on others, identifying underlying factors that explain patterns, or comparing group
means across multiple variables.
3. Interpretation: Multivariate analysis allows for a more nuanced interpretation of
complex relationships within the data. It helps uncover patterns that may not be
apparent when examining variables individually.
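As a hedged illustration of one technique named in point 1 (principal component analysis), here is a minimal sketch on a small synthetic dataset with three correlated variables, assuming scikit-learn is available:

```python
# PCA on multivariate data: find the directions that explain the most variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))              # 50 observations, 3 variables
data[:, 2] = data[:, 0] + 0.1 * data[:, 2]   # make two variables correlated

pca = PCA(n_components=2).fit(data)
print("explained variance ratio:", pca.explained_variance_ratio_)
```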
There are lots of different tools, techniques, and methods that can be used to conduct this
analysis, such as software libraries, visualization tools, and statistical testing methods.
Here, however, we compare univariate, bivariate, and multivariate analysis.

Difference between Univariate, Bivariate and Multivariate data

o Scope: Univariate analysis summarizes only a single variable at a time; bivariate
analysis summarizes two variables; multivariate analysis summarizes more than
two variables.
o Causes and relationships: Univariate analysis does not deal with causes or
relationships. Bivariate analysis deals with causes and relationships between the
two variables. Multivariate analysis examines relationships among many variables
at once.
o Dependent variables: Univariate data does not contain any dependent variable;
bivariate data contains only one dependent variable; multivariate data is similar to
bivariate but contains more than two variables.
o Main purpose: The main purpose of univariate analysis is to describe; of bivariate
analysis, to explain; of multivariate analysis, to study the relationships among many
variables.
o Examples: Height alone (univariate); temperature and ice cream sales in the
summer season (bivariate); an advertiser comparing the popularity of four
advertisements on a website, where click rates are measured for both men and
women and the relationships between the variables are examined (multivariate).

1. Data Modelling
Data modelling is a very common term in software engineering and other IT
disciplines. It has many interpretations and definitions depending on the field in question.
In data science, data modelling is the process of finding the function by which the data was
generated. In this sense, data modelling is the goal of any data analysis task. For instance, if
you have a 2-D dataset and you find that the two variables are linearly correlated, you may
decide to model the data using linear regression.
2. Bayesian Data Modelling
Bayesian data modelling means modelling your data using Bayes' theorem. Let us revisit
Bayes' rule:

P(H|E) = P(E|H) × P(H) / P(E)

In the above equation, H is the hypothesis and E is the evidence. In the real world, however,
we read the Bayesian components differently: the evidence is usually expressed by data,
and the hypothesis reflects the expert's prior estimation of the posterior. Therefore, we can
rewrite Bayes' rule as:

P(θ|data) = P(data|θ) × P(θ) / P(data)

In the above definition we see the prior, the posterior, and the data, but what
about the parameter θ? θ is the set of coefficients that best describe the data. You may
think of θ as the slope and intercept of your linear regression equation, or the vector of
coefficients w in your polynomial regression function. As you can see in the above
equation, θ is the single missing quantity, and the goal of Bayesian modelling is to find it.
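As a concrete illustration (not from the original notes), here is a minimal grid-approximation sketch of finding θ with Bayes' rule, assuming SciPy is available; the coin-flip data is hypothetical.

```python
# Grid approximation of the posterior over a coin's heads-probability theta
# after observing 7 heads in 10 flips.
import numpy as np
from scipy.stats import binom

theta = np.linspace(0, 1, 101)              # candidate values of theta
prior = np.ones_like(theta) / len(theta)    # flat prior P(theta)
likelihood = binom.pmf(7, 10, theta)        # P(data | theta)

posterior = likelihood * prior
posterior /= posterior.sum()                # normalize by P(data)
print("posterior mode:", theta[np.argmax(posterior)])  # close to 0.7
```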

3. Bayesian Modelling & Probability Distributions

Bayes' rule is a probabilistic equation, where each term in it is expressed as a probability.
Therefore, modelling the prior and the likelihood must be done using probabilistic
functions. Here probability distributions arise as a concrete tool in Bayesian
modelling, as they provide a great variety of probabilistic functions suited to numerous kinds
of discrete and continuous variables.
In order to select a suitable distribution for your data, you should learn about the data
domain and gather information from previous studies of it. You may also ask an expert how
the data develops over time. If you have large amounts of data, you may visualize it,
try to detect patterns in how it evolves over time, and base your choice of probability
distribution on that.
Bayesian modeling is able to incorporate prior knowledge into the model. In environmental
health, this can be used to inform the model with information from previous studies, such as
the previously estimated toxicities of certain pollutants. This allows predictions to build on
previous work, all while taking into account the uncertainty of these associations.
A particularly powerful advantage is Bayesian modeling’s ability to incorporate uncertainty.
In environmental health, that may include uncertainty in the exposure, or prior knowledge
about the association with the outcome. This approach incorporates model uncertainty, which
can help estimate the probability of a hypothesis being correct. There are many other benefits,
too, such as its flexibility in dealing with missing data.
Finally, Bayesian modeling is a powerful tool for decision-making. It can be used to inform
policy decisions by providing a quantitative assessment of a variety of complex risks
associated with exposure to pollutants.
While Bayesian modeling is ascendant in environmental health sciences, particularly in the
last decade, the theory underlying it is anything but new. In fact, the originally-stated Bayes’
theorem, which describes how to update the probability of a hypothesis as new evidence
becomes available, is named for Reverend Thomas Bayes, an 18th-century statistician and
theologian, who first described the theorem in a paper published posthumously way back in
1763.
What is the Bayesian Model Selection?
Bayesian Model Selection is a probabilistic approach used in statistics and machine learning
to compare and choose between different statistical models. This method is based on the
principles of Bayesian statistics, which provide a systematic framework for updating beliefs
in light of new evidence.
Bayesian Inference
Bayesian inference is a statistical method for updating beliefs about unknown parameters
using observed data and prior knowledge. It’s based on Bayes’ theorem:
P(θ|D) = P(D|θ) × P(θ) / P(D)
Here,
 P(θ|D) is the posterior probability of the parameter θ given the data D.
 P(D|θ) is the likelihood of the data D given θ.
 P(θ) is the prior probability of θ.
 P(D) is the marginal likelihood of the data.
So basically, we update our belief about θ based on new evidence, the data D. The
likelihood P(D|θ) measures how probable the data is under given parameter values. The
prior P(θ) represents our initial belief about θ before seeing the data. We then
combine this with the likelihood to get the posterior P(θ|D), our updated belief after
observing the data.
Key Components of Bayesian Statistics
The key components of this framework are:
 Prior Probability (Prior): This represents the belief about the model before seeing
the data.
 Likelihood: The probability of the data given the model.
 Posterior Probability: The probability of the model given the data, obtained by
updating the prior with the likelihood using Bayes’ theorem.
Application of Bayesian Model Selection in Machine Learning
1. Model Comparison: Used to compare different machine learning models (e.g., linear
regression, neural networks, decision trees) to identify the model that best explains the
data.
2. Hyperparameter Tuning: Bayesian optimization can be used for hyperparameter
tuning by treating hyperparameters as random variables and optimizing their posterior
distribution.
3. Ensemble Methods: Bayesian model averaging combines multiple models by
weighting them according to their posterior probabilities, leading to more robust
predictions.
4. Feature Selection: Bayesian methods can be used for feature selection by comparing
models with different subsets of features.
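As a hedged illustration of model comparison (point 1 above), here is a minimal sketch comparing two hypothetical coin-flip models by their marginal likelihoods, assuming SciPy is available; the data (7 heads in 10 flips) is made up.

```python
# Bayesian model comparison: M1 says the coin is fair (theta = 0.5);
# M2 puts a uniform prior on theta. Compare P(M1|D) via Bayes' theorem.
from scipy.stats import binom
from scipy.integrate import quad

heads, flips = 7, 10

evidence_m1 = binom.pmf(heads, flips, 0.5)                         # P(D | M1)
evidence_m2 = quad(lambda t: binom.pmf(heads, flips, t), 0, 1)[0]  # P(D | M2)

prior_m1 = prior_m2 = 0.5  # equal prior belief in each model
posterior_m1 = (evidence_m1 * prior_m1
                / (evidence_m1 * prior_m1 + evidence_m2 * prior_m2))
print("P(M1 | D) =", round(posterior_m1, 3))
```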
Conclusion
Bayesian Model Selection offers a robust framework for dealing with the complexities
inherent in statistical model comparison. By effectively integrating prior knowledge and
assessing model plausibility through the lens of probability, it provides a powerful tool for
many scientific and engineering disciplines. As computational resources continue to improve,
its applicability and popularity are likely to grow, making it a cornerstone in the field of
statistical inference.
A Bayesian Belief Network is a graphical representation of the probabilistic relationships
among the random variables in a particular set. As a classifier, it relies on conditional
independence assumptions between attributes. Due to its use of joint probability, each
probability in a Bayesian Belief Network is derived based on a condition, P(attribute | parent),
i.e. the probability of an attribute being true given its parent attribute.
(Note: A classifier assigns data in a collection to desired categories.)
 Consider this example:
 Suppose an alarm ‘A’ (a node) is installed in the house of a person ‘gfg’. It rings upon
two events, burglary ‘B’ and fire ‘F’, which are the parent nodes of the alarm node.
The alarm node, in turn, is the parent of two person nodes, ‘P1’ and ‘P2’.
 Upon an instance of burglary or fire, ‘P1’ and ‘P2’ call the person ‘gfg’. But there are
a few caveats in this case: sometimes ‘P1’ may forget to call ‘gfg’ even after hearing
the alarm, as he has a tendency to forget things quickly. Similarly, ‘P2’ sometimes
fails to call ‘gfg’, as he can only hear the alarm from a certain distance.
Q) Find the probability that ‘P1’ is true (P1 has called ‘gfg’) and ‘P2’ is true (P2 has called
‘gfg’) when the alarm ‘A’ rang, but no burglary ‘B’ and no fire ‘F’ occurred.
=> P(P1, P2, A, ~B, ~F) [where P1, P2 and A are ‘true’ events and ‘~B’ and ‘~F’ are ‘false’
events]
[Note: The values mentioned below are neither calculated nor computed; they are
observed values.]
Burglary ‘B’ –
 P (B=T) = 0.001 (‘B’ is true, i.e. burglary has occurred)
 P (B=F) = 0.999 (‘B’ is false, i.e. burglary has not occurred)
Fire ‘F’ –
 P (F=T) = 0.002 (‘F’ is true, i.e. fire has occurred)
 P (F=F) = 0.998 (‘F’ is false, i.e. fire has not occurred)
Alarm ‘A’ –

B   F   P(A=T)   P(A=F)
T   T   0.95     0.05
T   F   0.94     0.06
F   T   0.29     0.71
F   F   0.001    0.999

 The alarm ‘A’ node can be ‘true’ or ‘false’ (i.e. it may have rung or may not have
rung). It has two parent nodes, burglary ‘B’ and fire ‘F’, which can be ‘true’ or ‘false’
(i.e. may have occurred or may not have occurred) depending upon different
conditions.
Person ‘P1’ –

A   P(P1=T)   P(P1=F)
T   0.95      0.05
F   0.05      0.95

 The person ‘P1’ node can be ‘true’ or ‘false’ (i.e. P1 may have called the person
‘gfg’ or not). It has one parent node, the alarm ‘A’, which can be ‘true’ or ‘false’ (i.e.
may have rung or may not have rung, upon burglary ‘B’ or fire ‘F’).
Person ‘P2’ –

A   P(P2=T)   P(P2=F)
T   0.80      0.20
F   0.01      0.99

 The person ‘P2’ node can be ‘true’ or ‘false’ (i.e. P2 may have called the person
‘gfg’ or not). It has one parent node, the alarm ‘A’, which can be ‘true’ or ‘false’ (i.e.
may have rung or may not have rung, upon burglary ‘B’ or fire ‘F’).
Solution: Considering the observed probabilities –
With respect to the question, P(P1, P2, A, ~B, ~F), we need the probability of ‘P1’, which
we find with regard to its parent node, the alarm ‘A’. Likewise, to get the probability of ‘P2’,
we find it with regard to its parent node, the alarm ‘A’. We find the probability of the alarm
node ‘A’ with regard to ‘~B’ and ‘~F’, since burglary ‘B’ and fire ‘F’ are the parent nodes
of ‘A’.
From the observed probabilities, we can deduce:
P(P1, P2, A, ~B, ~F)
= P(P1|A) * P(P2|A) * P(A|~B, ~F) * P(~B) * P(~F)
= 0.95 * 0.80 * 0.001 * 0.999 * 0.998
≈ 0.00076
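The same computation can be checked with a few lines of Python; the probabilities below are taken directly from the tables above.

```python
# Joint probability P(P1, P2, A, ~B, ~F) from the conditional probability tables.
p_not_b = 0.999                # P(~B): no burglary
p_not_f = 0.998                # P(~F): no fire
p_a_given_not_b_not_f = 0.001  # P(A=T | B=F, F=F)
p_p1_given_a = 0.95            # P(P1=T | A=T)
p_p2_given_a = 0.80            # P(P2=T | A=T)

joint = (p_p1_given_a * p_p2_given_a * p_a_given_not_b_not_f
         * p_not_b * p_not_f)
print(f"P(P1, P2, A, ~B, ~F) = {joint:.8f}")  # ~0.00075772, i.e. about 0.00076
```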
