
PA Notes

The document outlines the steps for developing a simple linear regression (SLR) model, emphasizing that the dependent variable (Y) must be numeric. It includes the regression equation, examples of interpreting coefficients, and discusses the importance of data correlation and homoscedasticity. Additionally, it briefly mentions logistic regression and the need for dummy coding when dealing with categorical variables.

Uploaded by

Vencel Patrick
Copyright
© All Rights Reserved

Formula for regression coefficient (y on x)
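
For reference, the standard least-squares expression for the regression coefficient of y on x can be written as below. The symbols b_yx, r, s_x and s_y (sample correlation and sample standard deviations) are notation assumed here, not taken from these notes; b_yx plays the role of the slope β1 in ŷ=β0+β1x.

```latex
b_{yx} \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \;=\; r \, \frac{s_y}{s_x}
```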

In a linear regression model, Y must be numeric (float or integer).

You cannot build a linear regression model if Y is an object (for example, text or category labels).
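
A minimal sketch of checking this before fitting, in plain Python. The helper name `is_numeric_target` and the sample values are hypothetical; in a pandas DataFrame the equivalent symptom would be a Y column whose dtype is `object`.

```python
def is_numeric_target(values):
    """Return True if every value is an int or float (booleans excluded)."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in values
    )

# Hypothetical columns, for illustration only.
salary = [250000.0, 310000.0, 275500.0]  # numeric: usable as Y
grade = ["A", "B", "A"]                  # object/text: not usable as Y

print(is_numeric_target(salary))
print(is_numeric_target(grade))
```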

Framework for SLR model development


Step 1 - Collect/extract data
Step 2 - Preprocess the data
Step 3 - Divide the data into training and validation data
Step 4 - Perform descriptive analysis
Step 5 - Define the functional form of regression
Step 6 - Estimate regression parameters
Step 7 - Perform regression model diagnostics
Step 8 - Validate the model using validation data
Step 9 - Decide on the model deployment
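
The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the data is made up (hours studied vs. marks), the 80/20 split and the RMSE measure are choices made here, and steps 4, 7 and 9 are left out.

```python
import random

def fit_slr(x, y):
    """Step 6: estimate beta0 and beta1 by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

def rmse(x, y, beta0, beta1):
    """Step 8: root mean squared error of the fitted line on a data set."""
    errors = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    return (sum(e ** 2 for e in errors) / len(errors)) ** 0.5

# Steps 1-2: collect and preprocess. Made-up data, already clean.
rng = random.Random(0)  # fixed seed so the run is repeatable
x_all = [float(h) for h in range(1, 21)]
y_all = [20 + 0.76 * h + rng.uniform(-1, 1) for h in x_all]

# Step 3: shuffle, then hold out 20% as validation data.
idx = list(range(len(x_all)))
rng.shuffle(idx)
train_idx, valid_idx = idx[:16], idx[16:]
x_train = [x_all[i] for i in train_idx]
y_train = [y_all[i] for i in train_idx]
x_valid = [x_all[i] for i in valid_idx]
y_valid = [y_all[i] for i in valid_idx]

# Steps 5-6: functional form y-hat = beta0 + beta1 * x, fitted on training data.
beta0, beta1 = fit_slr(x_train, y_train)

# Step 8: check the error on data the model has not seen.
valid_rmse = rmse(x_valid, y_valid, beta0, beta1)
print(f"beta0={beta0:.3f}, beta1={beta1:.3f}, validation RMSE={valid_rmse:.3f}")
```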
E = Yi - ŷi
Yi - actual value
ŷi - predicted value

ŷ=β0+β1x

The equation can be interpreted as follows: for every one percentage point increase in grade 10
marks, the salary of the MBA students increases by 3076.1774 on average.

E.g.
ŷ(marks) = 20 + 0.76(study hours)
For every 1-hour increase in study time, marks increase by 0.76 on average.

ŷ(yield) = 20 - 0.76(rainfall)
For every 1-unit increase in rainfall, yield decreases by 0.76 on average.
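
A small sketch that checks this reading of the slope using the two example equations above. The function names and the chosen x values (5 and 6) are assumptions for illustration.

```python
def predict_marks(study_hours):
    """y-hat(marks) = 20 + 0.76 * (study hours)."""
    return 20 + 0.76 * study_hours

def predict_yield(rainfall):
    """y-hat(yield) = 20 - 0.76 * (rainfall)."""
    return 20 - 0.76 * rainfall

# Raising x by one unit changes the prediction by exactly the slope,
# whatever the starting value of x is.
change_marks = predict_marks(6) - predict_marks(5)
change_yield = predict_yield(6) - predict_yield(5)

print(round(change_marks, 2))
print(round(change_yield, 2))
```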
SST = SSR + SSE
SST - total sum of squares (total variation in y)
SSR - sum of squares due to regression (variation explained by the model)
SSE - sum of squares of errors (unexplained variation)
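
The decomposition can be checked numerically. The sketch below fits a least-squares line to a small made-up data set and computes the three sums; the function name and the data are assumptions for illustration. The identity holds for a least-squares fit that includes an intercept.

```python
def sums_of_squares(x, y):
    """Fit y-hat = beta0 + beta1 * x by least squares; return (SST, SSR, SSE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    beta0 = y_bar - beta1 * x_bar
    y_hat = [beta0 + beta1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left in the errors
    return sst, ssr, sse

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, ssr, sse = sums_of_squares(x, y)
print(abs(sst - (ssr + sse)) < 1e-9)
```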
If the correlation comes out as 0.99, the model is most probably wrong, or the data set is an artificially created (fake) one.
As a rule of thumb, a proper data set will have a correlation of around 0.5-0.75.

R-square = 0.99
This means that 99% of the variation in y is explained by x (the explanatory variable you have used).
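
In simple linear regression, R-square equals the square of the correlation between x and y, which links this figure to the correlation discussed earlier. A minimal sketch on made-up data; the function names are assumptions.

```python
def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

def r_squared(x, y):
    """R-square = 1 - SSE/SST for a least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    beta0 = y_bar - beta1 * x_bar
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up data for illustration.
x = [2, 4, 5, 7, 9, 10]
y = [30, 41, 38, 52, 55, 61]

r = pearson_r(x, y)
r2 = r_squared(x, y)
print(abs(r2 - r ** 2) < 1e-9)
```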
Homoscedasticity (errors with a constant spread) is the preferred pattern when plotting predicted against actual values.
t-test - used to check whether the relationship between two variables is statistically significant
The right graph is homoscedastic.
In the case of the left graph, you need to treat the data (for example, by transforming a variable) in such a way that the pattern created is eliminated.
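
Without a plot, a crude numeric check is to compare the size of the residuals for small x and for large x. The sketch below is an informal variance-ratio check, not a formal test; the function names, the made-up data and the idea of splitting at the median of x are all assumptions made here.

```python
def residuals(x, y):
    """Residuals of a least-squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    beta0 = y_bar - beta1 * x_bar
    return [b - (beta0 + beta1 * a) for a, b in zip(x, y)]

def spread_ratio(x, y):
    """Mean squared residual in the larger-x half divided by the smaller-x half."""
    pairs = sorted(zip(x, residuals(x, y)))  # order the residuals by x
    half = len(pairs) // 2
    low = [e ** 2 for _, e in pairs[:half]]
    high = [e ** 2 for _, e in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

# Made-up data: same line, two different error patterns.
x = list(range(1, 21))
signs = [1 if a % 2 == 0 else -1 for a in x]
y_constant = [2 + 3 * a + s for a, s in zip(x, signs)]           # error size fixed
y_growing = [2 + 3 * a + 0.2 * a * s for a, s in zip(x, signs)]  # error grows with x

ratio_constant = spread_ratio(x, y_constant)
ratio_growing = spread_ratio(x, y_growing)
print(f"constant spread: {ratio_constant:.2f}, growing spread: {ratio_growing:.2f}")
```

A ratio close to 1 is consistent with homoscedastic errors; a ratio far from 1 suggests the spread changes with x.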
When x is categorical, dummy coding needs to be done (the categories are converted to 0/1 indicator variables) before x can be used in the regression.
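
A minimal sketch of dummy coding in plain Python. The function name, the `degree` variable and its categories are hypothetical. One category is kept as the baseline, so k categories give k-1 dummy columns; this avoids a column that is redundant with the intercept.

```python
def dummy_code(values, baseline=None):
    """Return (column_names, rows) with k-1 indicator columns for k categories."""
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]  # first category alphabetically is the reference
    kept = [c for c in categories if c != baseline]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows

# Hypothetical categorical x, for illustration only.
degree = ["Arts", "Commerce", "Science", "Arts"]

columns, coded = dummy_code(degree)
print(columns)
for original, row in zip(degree, coded):
    print(original, row)
```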
Logistic Regression
1. Classification
2. Discrete choices
3. Class probability
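
A short sketch of how a logistic model with one predictor turns x into a class probability and then into a class. The coefficients (beta0 = -4, beta1 = 1) and the 0.5 cutoff are hypothetical values chosen for illustration, not estimates from any data.

```python
import math

def class_probability(x, beta0, beta1):
    """P(Y = 1 | x) from the logistic (sigmoid) function."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, beta0, beta1, cutoff=0.5):
    """Assign class 1 when the class probability reaches the cutoff, else 0."""
    return 1 if class_probability(x, beta0, beta1) >= cutoff else 0

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -4.0, 1.0

for x in (2, 4, 6):
    p = class_probability(x, beta0, beta1)
    print(x, round(p, 3), classify(x, beta0, beta1))
```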
Covered up - ‘the model’
Covered up - ‘as positive’
Techniques to be used:
- Regression analysis
- Time series analysis
