Statistics, Econometrics
Revision to Some Important Topics
Measurement scales of variables
1. Ratio scale - for a variable taking two values a and b, both the ratio (a/b) and the difference (a - b) are meaningful quantities.
2. Interval scale - the difference between, say, two time periods (2000 - 1995) is meaningful, but the ratio of two time periods is not.
3. Ordinal scale - variables that satisfy the property of natural ordering belong to this category, such as a grading system (A, B, C) or income classes (upper, middle, lower).
4. Nominal scale - variables in this category do not satisfy the properties of the ratio scale. Gender (male, female) and marital status (married, unmarried, divorced, separated) belong to this category.
Coefficient of determination
The overall goodness of fit of the regression model is measured by the coefficient of determination, r². It tells us what proportion of the variation in the dependent variable, or regressand, is explained by the explanatory variable, or regressor.
The coefficient of determination is the square of the correlation (r) between the predicted y scores and the actual y scores. This r² lies between 0 and 1; the closer it is to 1, the better the fit.
With simple linear regression, the coefficient of determination is also equal to the square of the correlation between the x and y scores.
An r² of 0 means that the dependent variable cannot be predicted from the independent variable.
An r² of 1 means the dependent variable can be predicted without error from the independent variable.
An r² between 0 and 1 indicates the extent to which the dependent variable is predictable. An r² of 0.10 means that 10 percent of the variance in Y is predictable from X; an r² of 0.20 means that 20 percent is predictable; and so on.
A concept related to the coefficient of determination is the coefficient of correlation, r. It is a measure of linear association between two variables and it lies between -1 and +1.
Coefficient of determination, r²
TSS = Total sum of squares
ESS = Explained sum of squares
RSS = Residual sum of squares
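From these definitions, the standard decomposition and formula (stated here for reference) are:
TSS = ESS + RSS, and r² = ESS/TSS = 1 - RSS/TSS.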
R² and Adjusted R²
R-squared, or R², measures the degree to which your input variables explain the variation of your output / predicted variable. So, if R² is 0.8, it means 80% of the variation in the output variable is explained by the input variables. In simple terms, the higher the R², the more variation is explained by your input variables and hence the better your model.
However, the problem with R² is that it will either stay the same or increase with the addition of more variables, even if they have no relationship with the output variable. This is where Adjusted R² comes to help. Adjusted R² penalizes you for adding variables that do not improve your existing model.
Hence, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R² to judge the goodness of the model. If you have only one input variable, R² and Adjusted R² would be exactly the same.
Typically, the more non-significant variables you add to the model, the more the gap between R² and Adjusted R² increases.
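For reference (a standard formula, added here; it is not in the original notes), with n observations and k regressors excluding the intercept:
Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1).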
Simultaneous Equations Models
A system describing the joint dependence of variables is called a system of simultaneous
equations or simultaneous equations model.
In contrast to single-equation models, in simultaneous-equation models more than one
dependent, or endogenous, variable is involved, necessitating as many equations as the
number of endogenous variables.
A unique feature of simultaneous-equation models is that the endogenous variable (i.e.,
regressand) in one equation may appear as an explanatory variable (i.e., regressor) in another
equation of the system.
As a consequence, such an endogenous explanatory variable becomes stochastic and is
usually correlated with the disturbance term of the equation in which it appears as an
explanatory variable.
In cases where the regressor is correlated with the disturbance term, applying OLS to estimate the parameters of a regression equation will give biased and inconsistent estimates.
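A standard textbook illustration (a hypothetical demand-supply pair, added here for concreteness) makes the point:
Demand: Q_t = α0 + α1 P_t + u_t
Supply: Q_t = β0 + β1 P_t + v_t
Price P_t and quantity Q_t are jointly determined, so P_t is correlated with both u_t and v_t; estimating either equation alone by OLS therefore gives biased and inconsistent estimates.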
Simultaneous Equation Bias
The violation of the OLS assumption E(u_i X_i) = 0, i.e., that the regressor is uncorrelated with the disturbance term, creates simultaneous equation bias. This creates the following problems:
i. The problem of identification of the parameters of individual relationships
ii. Problems of estimation arise
iii. The OLS estimates are biased and inconsistent
Methods to Estimate Parameters
Limited Information Maximum Likelihood (LIML) - applicable when the equation is overidentified
- The estimates are biased in small samples but consistent
Full Information Maximum Likelihood (FIML) - for small samples the estimates are biased
Three Stage Least Squares (3SLS) - applicable to an overidentified system
- Estimates are biased in small samples but consistent
- More efficient than 2SLS (a 2SLS sketch follows below)
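Since the notes compare 3SLS with two-stage least squares (2SLS), here is a minimal 2SLS sketch in Python using only NumPy. The simulated data, variable names, and instrument are illustrative assumptions, not taken from the notes.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500

    # Simulated system: the regressor P is endogenous, z is an exogenous instrument.
    z = rng.normal(size=n)                        # instrument (e.g., an exogenous shifter)
    u = rng.normal(size=n)                        # structural disturbance
    P = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor, correlated with u
    Q = 2.0 + 1.5 * P + u                         # structural equation of interest

    # Stage 1: regress the endogenous regressor on the instrument (plus a constant).
    X1 = np.column_stack([np.ones(n), z])
    b1, *_ = np.linalg.lstsq(X1, P, rcond=None)
    P_hat = X1 @ b1                               # fitted (exogenous) part of P

    # Stage 2: regress Q on the fitted values from stage 1.
    X2 = np.column_stack([np.ones(n), P_hat])
    b2, *_ = np.linalg.lstsq(X2, Q, rcond=None)
    print("2SLS slope estimate:", b2[1])          # close to the true value 1.5

    # Plain OLS on the original P is biased because P is correlated with u.
    b_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), P]), Q, rcond=None)
    print("OLS slope estimate:", b_ols[1])        # noticeably above 1.5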
Identification
Identification is the problem of finding unique estimates of the structural parameters from the reduced-form coefficients.
A model is said to be identified if it has a unique statistical form, enabling unique estimates of its parameters from the sample.
If the model is not identified, its parameters cannot be estimated.
In econometric theory there are two possible situations of identifiability:
1. Equation underidentified: an equation is underidentified if its statistical form is not unique.
2. Equation identified: if an equation has a unique statistical form, we say it is identified.
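A commonly used check, not spelled out in these notes, is the order condition: if K is the number of predetermined variables in the whole system, k the number of predetermined variables included in the equation, and m the number of endogenous variables included in it, identification requires K - k ≥ m - 1 (exact identification when K - k = m - 1, overidentification when K - k > m - 1).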
Tests of Stationarity
The Unit Root Test:
Yt = ρYt−1 + ut ……(1)
If ρ = 1, equation (1) becomes a random walk model without drift, which is a non-stationary stochastic process.
Subtracting Yt−1 from both sides gives:
ΔYt = (ρ − 1)Yt−1 + ut
ΔYt = δYt−1 + ut, where δ = ρ − 1
When δ = 0, ρ = 1, that is, a unit root is present.
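In practice the null hypothesis δ = 0 (i.e., ρ = 1) is tested with the (Augmented) Dickey-Fuller test. A minimal sketch with statsmodels, run here on a simulated random walk rather than real data:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=500))   # random walk: non-stationary by construction

    stat, pvalue, *_ = adfuller(y)        # regresses ΔY_t on Y_{t-1} (plus lagged differences)
    print("ADF statistic:", stat)
    print("p-value:", pvalue)             # large p-value: cannot reject the unit root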
Methods of Forecasting
The most important use of time series analysis is forecasting.
Two popularly used methods of forecasting are:
1. Box-Jenkins (BJ) methodology: technically known as the ARIMA methodology (a sketch follows below)
2. Vector Autoregression (VAR)
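A minimal sketch of an ARIMA forecast with statsmodels; the series is simulated and the order (1, 1, 1) is an arbitrary illustrative choice (in the Box-Jenkins approach the order is identified from the data):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    y = np.cumsum(rng.normal(size=200))   # illustrative non-stationary series

    model = ARIMA(y, order=(1, 1, 1))     # AR(1), one difference, MA(1)
    result = model.fit()
    print(result.forecast(steps=5))       # forecasts for the next 5 periods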
Parametric Test v/s Nonparametric Test
A statistical test in which specific assumptions are made about the population parameters is known as a parametric test.
A statistical test used in the case of non-metric (categorical or ranked) variables is called a nonparametric test.
In a parametric test, the test statistic is based on an assumed distribution. In a nonparametric test, on the other hand, the test statistic does not rely on a specific population distribution.
In general, the measure of central tendency used in parametric tests is the mean, while in nonparametric tests it is the median.
In a parametric test, there is complete information about the population. Conversely, in a nonparametric test, there is no information about the population.
Parametric tests apply to metric variables only, whereas nonparametric tests apply to both variables and attributes.
For measuring the degree of association between two quantitative variables, Pearson's coefficient of correlation is used in the parametric case, while Spearman's rank correlation is used in the nonparametric case.
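A minimal sketch comparing the two correlation measures with SciPy, on simulated data (names and values are illustrative):

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2 * x + rng.normal(size=100)   # roughly linear relationship

    r, p_r = pearsonr(x, y)            # parametric: measures linear association
    rho, p_rho = spearmanr(x, y)       # nonparametric: rank-based association
    print("Pearson r:", r)
    print("Spearman rho:", rho)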
Type I and type II errors
No hypothesis test is 100% certain. Because the test is based on probabilities, there is always
a chance of making an incorrect conclusion. When you do a hypothesis test, two types of
errors are possible: type I and type II.
Type I error
A type I error occurs when the null hypothesis is true and you reject it, that is, when we reject the null hypothesis even though it is true.
The probability of making a type I error is α, which is the level of significance you set for your hypothesis test.
Type II error
A type II error occurs when the null hypothesis is false and you fail to reject it, that is, when we accept the null hypothesis even though it is false.
The probability of making a type II error is β; the power of the test is 1 − β.
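A minimal simulation sketch (simulated data, illustrative only) showing that when the null hypothesis is true, the proportion of rejections is close to the chosen α:

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(0)
    alpha, trials, rejections = 0.05, 2000, 0
    for _ in range(trials):
        sample = rng.normal(loc=0, size=30)      # null hypothesis (mean = 0) is true
        _, p = ttest_1samp(sample, popmean=0)
        if p < alpha:
            rejections += 1                      # rejecting a true null is a type I error
    print("Empirical type I error rate:", rejections / trials)   # approximately 0.05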
Level of Significance
This refers to the probability of rejecting a true null hypothesis that we are willing to tolerate when accepting or rejecting a particular hypothesis.
Most hypothesis tests fix it at 5%, which means that (when the null hypothesis is true) the decision would be correct 95% of the time. Sometimes it is fixed at 1%, so the decision would be correct 99% of the time. It is denoted by α (the probability of a type I error).
If no level of significance is given, we take α = 0.05.
Critical region or rejection region
The values of the test statistic that lead to rejection of the null hypothesis H0 form a region known as the rejection region or critical region.
Those that lead to acceptance of H0 form a region called the acceptance region.
One-tailed test and two-tailed test
A test in which the alternative hypothesis is expressed with the symbol (<) or (>) is called a one-tailed test.
A test of any statistical hypothesis where the alternative is written with the symbol (≠) is called a two-tailed test.
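For example (a standard illustration, added here), with a standard normal test statistic and α = 0.05, the two-tailed critical values are ±1.96, so the rejection region is |Z| > 1.96; for a right-tailed test the critical value is 1.645, so the rejection region is Z > 1.645.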
QUESTIONS FOR THE DAY
1. Which of the following relationships between the three means is correct?
A. H.M. > G.M. > A.M.    B. A.M. = G.M. ≠ H.M.
C. A.M. > G.M. > H.M.    D. G.M. > A.M. > H.M.
2. Variance is
A. Fourth moment B. Third moment
C. Second moment D. First moment
4. The concept of standard deviation is due to
A. Karl Pearson B. Poisson
C. Samuelson D. Student
7. The values of the mean and standard deviation of the standard normal distribution are
A. 0 and 0    B. 1 and 0
C. 1 and 1    D. 0 and 1
Answers:
1. C 2. C 3. C 4. A 5. B 6. B 7. D 8. A