Ria Stats Regression Analysis
REGRESSION ANALYSIS
DEFINITION AND TYPES:
Regression analysis deals with determining the relation between two or more variables. In
regression analysis, we estimate the values of parameters involved in the regression equation.
The idea of regression was introduced by Francis Galton (1822-1911) in his paper published in
1886.
For example, consider income-expenditure data of the form (Xi, Yi), i = 1,2,…,n, where Xi and Yi
represent the income and expenditure, respectively, of the i-th household.
The aim will be to develop a regression model of the form ‘Y = a+bX’. Once we develop the
regression model between income and expenditure, for a given level of income Xi, expenditure
can be predicted. The regression equation will also help to understand how expenditure is
related to income. Regression has a wide range of applications: it is used in finance,
economics, psychology, and agricultural science, to name a few. Regression can be of
various types; I’ll be talking about simple linear regression.
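As a rough illustration of developing the model 'Y = a+bX' and using it for prediction, here is a
minimal sketch in Python. It assumes ordinary least squares as the estimation method (the text
above does not name one), and the income and expenditure figures are made up for illustration
only. The names fit_simple_linear and predict are hypothetical helpers, not part of any library.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Return (a, b) for the least-squares line Y = a + bX (assumed method)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares of X about their means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Made-up income and expenditure values for five households.
income = [20, 30, 40, 50, 60]
expenditure = [18, 25, 33, 38, 46]

a, b = fit_simple_linear(income, expenditure)


def predict(x_new):
    """Predicted expenditure for a given income under the fitted line."""
    return a + b * x_new


print(f"a = {a:.2f}, b = {b:.2f}, predicted Y at X = 45: {predict(45):.2f}")
```

With these made-up numbers the fitted slope is positive, so predicted expenditure rises with
income.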
In the example of income-expenditure data (Xi, Yi), one can expect a positive slope b in the
developed regression model, since expenditure tends to rise with income. Similarly, one can
expect a negative slope for a regression model where Xi and Yi represent the price and sales of
a certain product.
A linear pattern can easily be identified in the data by plotting a scatter diagram.
For example, let the population regression model be given by: Yi = α + βXi + εi, i = 1,2,…,n,
where α is the intercept, β is the slope, and εi is the error term.
It should be noted here that the values of the independent variable (Xi’s) are treated as fixed,
whereas the dependent variable (Yi) is assumed to be random. Corresponding to one given value of
Xi, we can have multiple values of Yi. Therefore, Yi has a probability distribution. The error
term εi is also a random variable, and for a given value of Xi we can have different errors.
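The point that Yi is random while Xi is fixed can be seen in a small simulation. The sketch below
is only an illustration under stated assumptions: the values of ALPHA, BETA and SIGMA are
arbitrary, the errors are drawn from a normal distribution with mean zero, and draw_y is a
hypothetical helper name.

```python
import random
from statistics import mean

# Assumed (arbitrary) population parameters for illustration.
ALPHA, BETA, SIGMA = 5.0, 0.7, 2.0


def draw_y(x, rng):
    """One draw of Y = alpha + beta*x + error, with a normal error term."""
    return ALPHA + BETA * x + rng.gauss(0.0, SIGMA)


rng = random.Random(42)   # seeded so the run is repeatable
x_fixed = 40.0            # one fixed value of X

# Repeated observations of Y at the same X differ because the error differs.
ys = [draw_y(x_fixed, rng) for _ in range(10_000)]

expected = ALPHA + BETA * x_fixed  # mean of Y at this X under the model
print("first three draws:", [round(v, 2) for v in ys[:3]])
print("average of draws:", round(mean(ys), 2), "model mean:", expected)
```

The individual draws differ from one another, while their average settles close to α + βXi for
the chosen Xi.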
3. Assumptions of a simple linear regression model
The important assumptions of a simple linear regression model are given below:
(i) Linearity: The first assumption of the model is that there exists a linear relationship
between the variables.
(ii) Independence: The second assumption states that the residuals or error terms are
independent of each other.
(iii) Normality: The error terms εi follow a normal distribution with mean zero and a constant
variance.
(iv) Homoscedasticity: The error terms have equal variances for each value of X. This property
of equal variances is called homoscedasticity.
Before we develop a regression model, we should check that none of the assumptions is violated.
Any violation of the above assumptions may impact the accuracy of the model.
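Some of these assumptions can be examined roughly from the residuals of a fitted line. The sketch
below is an informal check, not a formal test: it assumes a least-squares fit, uses made-up data,
and compares the residual variance in the lower and upper halves of X as a crude indication of
equal spread. The helper names are hypothetical. Normality and independence would need further
tools (for example a normal probability plot), which are not shown here.

```python
from statistics import mean, pvariance


def fit_simple_linear(x, y):
    """Return (a, b) for the least-squares line Y = a + bX (assumed method)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(x, y):
    """Observed Y minus fitted Y for each data point."""
    a, b = fit_simple_linear(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up data, listed in increasing order of X.
x = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
y = [19, 21, 26, 28, 33, 35, 40, 41, 47, 49]

res = residuals(x, y)

# With an intercept in the model, least-squares residuals sum to (nearly) zero.
print("sum of residuals:", round(sum(res), 10))

# Crude homoscedasticity check: compare residual spread for low X and high X.
half = len(x) // 2
var_low = pvariance(res[:half])
var_high = pvariance(res[half:])
ratio = max(var_low, var_high) / min(var_low, var_high)
print("variance ratio (larger / smaller):", round(ratio, 2))
```

A ratio far above 1 would suggest that the spread of the errors changes with X, which would call
the equal-variance assumption into question. What counts as "far" is a judgment call here.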