PA Notes
ŷ = β₀ + β₁x (β₀: intercept, β₁: slope, i.e. the change in ŷ for a one-unit increase in x)
The equation can be interpreted as follows (MBA salary regressed on grade 10 marks, with β₁ = 3076.1774): for every one percentage-point increase in grade 10 marks, the salary of the MBA students increases by 3076.1774 on average.
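The coefficients are usually estimated by ordinary least squares: β₁ = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and β₀ = ȳ - β₁x̄. A minimal Python sketch, assuming a small hypothetical data set (points lying exactly on ŷ = 20 + 0.76x, not the MBA salary data); the function name fit_simple_ols is illustrative only:

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for y-hat = b0 + b1*x using ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on y = 20 + 0.76x
hours = [1, 2, 3, 4, 5]
marks = [20 + 0.76 * h for h in hours]
b0, b1 = fit_simple_ols(hours, marks)
print(round(b0, 4), round(b1, 4))  # 20.0 0.76
```

Because the points lie exactly on a line, least squares recovers that line; with real data the points scatter around it and β₀, β₁ are the best-fitting compromise.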
E.g.
ŷ(marks) = 20 + 0.76(study hours)
For every 1-hour increase in study hours, marks increase by 0.76 on average.
ŷ(yield) = 20 - 0.76(rainfall)
For every 1-unit increase in rainfall, yield decreases by 0.76 on average.
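Because the model is linear, the change in ŷ for a one-unit increase in x equals the slope at every value of x; its sign gives the direction. A quick Python check of the two example equations (the input values 5, 6, 100 and 101 are arbitrary; predict is a hypothetical helper):

```python
def predict(b0, b1, x):
    """Prediction from a simple linear regression y-hat = b0 + b1*x."""
    return b0 + b1 * x

# Marks vs. study hours: y-hat = 20 + 0.76 * hours
marks_change = predict(20, 0.76, 6) - predict(20, 0.76, 5)
# Yield vs. rainfall: y-hat = 20 - 0.76 * rainfall
yield_change = predict(20, -0.76, 101) - predict(20, -0.76, 100)

print(round(marks_change, 2), round(yield_change, 2))  # 0.76 -0.76
```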
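SST = SSR + SSE
SST - total sum of squares, Σ(y - ȳ)²: the total variation in y
SSR - regression sum of squares, Σ(ŷ - ȳ)²: the variation explained by the regression
SSE - error (residual) sum of squares, Σ(y - ŷ)²: the variation left unexplained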
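With SST = Σ(y - ȳ)², SSR = Σ(ŷ - ȳ)² and SSE = Σ(y - ŷ)², the identity SST = SSR + SSE holds for any least-squares line that includes an intercept. A minimal Python check on a small hypothetical data set (illustration only, not real data):

```python
def sums_of_squares(x, y):
    """Fit y-hat = b0 + b1*x by least squares and return (SST, SSR, SSE)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left in the errors
    return sst, ssr, sse

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = sums_of_squares(x, y)
print(sst, round(ssr, 4), round(sse, 4))  # 6.0 3.6 2.4
```

Here the fitted line is ŷ = 2.2 + 0.6x, and 3.6 + 2.4 = 6.0.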
If the correlation comes out as 0.99, the model is most probably wrong or the data set is fabricated (fake).
A genuine data set will typically have a correlation of around 0.5 to 0.75.
R² = 0.99 (R² = SSR/SST)
This means that 99% of the variation in y is explained by x (the explanatory variable used).
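R² can be computed as SSR/SST, and in simple linear regression (one x) it also equals the square of the correlation r, which links it to the correlation rule of thumb above. A minimal Python sketch on the same kind of small hypothetical data set:

```python
import math

def r_and_r_squared(x, y):
    """Return Pearson's r and R-squared (SSR/SST) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)  # this is SST
    r = sxy / math.sqrt(sxx * syy)
    # Fit the least-squares line and measure the variation it explains
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    return r, ssr / syy

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, r2 = r_and_r_squared(x, y)
print(round(r, 4), round(r2, 4), round(r ** 2, 4))  # 0.7746 0.6 0.6
```

So a correlation of about 0.77 corresponds to x explaining 60% of the variation in y.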
When plotting predicted against actual values, homoscedasticity (errors with a constant spread across the range of values) is preferred.
t-test - used to test whether there is a significant relationship between two variables (in regression, whether the slope β₁ differs significantly from 0)
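For the slope in simple linear regression, the test statistic is t = β₁ / SE(β₁), where SE(β₁) = √(SSE / (n - 2) / Σ(x - x̄)²), with n - 2 degrees of freedom. A minimal Python sketch on a hypothetical data set; the critical value 3.182 (two-sided, 5% level, df = 3) is typed in from a t-table rather than computed:

```python
import math

def slope_t_statistic(x, y):
    """t statistic for H0: beta1 = 0 in y-hat = b0 + b1*x; returns (t, degrees of freedom)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b1 / se_b1, n - 2

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t, df = slope_t_statistic(x, y)
T_CRIT_DF3 = 3.182  # two-sided 5% critical value for df = 3, from a t-table
print(round(t, 4), df, abs(t) > T_CRIT_DF3)  # 2.1213 3 False
```

With only five points, t ≈ 2.12 is below 3.182, so this relationship would not be called significant at the 5% level.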
The right graph is homoscedastic.
In the case of the left graph, the data needs to be treated (for example, transformed) in such a way that the pattern is eliminated.
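Without the plots, one rough numeric check is to compare the spread of the residuals for small and large x: roughly equal spread suggests homoscedasticity, while spread that grows suggests a funnel pattern that needs treatment. A minimal Python sketch, assuming two hypothetical data sets whose errors are constructed so that least squares recovers ŷ = 1 + 2x exactly and the residuals equal those errors; the ratio is a crude heuristic, not a formal test:

```python
def residual_spread_ratio(x, y):
    """Fit y-hat = b0 + b1*x by least squares, then return
    (mean |residual| in the upper half of x) / (mean |residual| in the lower half).
    Crude heuristic only; assumes x is sorted and len(x) is even."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    abs_res = [abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)]
    half = n // 2
    return (sum(abs_res[half:]) / half) / (sum(abs_res[:half]) / half)

x = list(range(1, 9))
# Hypothetical errors: each set sums to 0 and is uncorrelated with x
even_errors = [1, -1, -1, 1, 1, -1, -1, 1]            # constant spread
funnel_errors = [0.5, -0.5, -0.5, 0.5, 2, -2, -2, 2]  # spread grows with x
homo = [1 + 2 * xi + e for xi, e in zip(x, even_errors)]
hetero = [1 + 2 * xi + e for xi, e in zip(x, funnel_errors)]
print(round(residual_spread_ratio(x, homo), 2),
      round(residual_spread_ratio(x, hetero), 2))  # 1.0 4.0
```

A ratio near 1 matches the homoscedastic case; a ratio well above 1 matches the funnel-shaped case.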
When x is categorical, dummy coding needs to be done (one 0/1 variable per category, leaving one category out as the reference).
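Dummy coding turns a categorical column with k categories into k - 1 columns of 0/1 values; the omitted category is the reference that the intercept represents. A minimal Python sketch; the function dummy_code and the region values are hypothetical, and choosing the first category in sorted order as the default reference is an assumption:

```python
def dummy_code(values, reference=None):
    """Turn a categorical column into 0/1 dummy columns, dropping one reference category."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # default: first category in sorted order
    kept = [c for c in categories if c != reference]
    # One 0/1 column per non-reference category
    return {c: [1 if v == c else 0 for v in values] for c in kept}

# Hypothetical categorical predictor
region = ["North", "South", "East", "South", "North"]
dummies = dummy_code(region)
print(dummies)
# {'North': [1, 0, 0, 0, 1], 'South': [0, 1, 0, 1, 0]}
```

A row of all zeros ("East" here) means the reference category.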
Logistic Regression
1. Classification
2. Discrete choices
3. Class probability
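These uses fit together: logistic regression models the class probability P(y = 1 | x) = 1 / (1 + e^-(β₀ + β₁x)), and a cutoff turns that probability into a discrete classification. A minimal Python sketch, assuming a single predictor, hypothetical coefficients B0 and B1 (not fitted to any data), and a 0.5 threshold (a common default, not stated in these notes):

```python
import math

def predict_proba(b0, b1, x):
    """Logistic regression class probability: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def classify(b0, b1, x, threshold=0.5):
    """Label a case as positive (1) if its class probability reaches the threshold."""
    return 1 if predict_proba(b0, b1, x) >= threshold else 0

# Hypothetical coefficients, not fitted to any real data
B0, B1 = -4.0, 1.0
for x in (2, 4, 6):
    print(x, round(predict_proba(B0, B1, x), 3), classify(B0, B1, x))
# 2 0.119 0
# 4 0.5 1
# 6 0.881 1
```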
Covered up - ‘the model’
Covered up - ‘as positive’
Techniques to be used:
• Regression analysis
• Time series analysis