Ecotrix

The document defines key statistical concepts like outliers, average, population, sample, mode, median, expected value, types of errors, degrees of freedom, and more. It also discusses econometric models, including deterministic parts of models, data types, estimation and testing, and diagnostic model evaluation.


 Outliers: cannot simply be selected/ identified from the collected data (their presence calls the methodology into question)

 Average: can't be used for nominal and ordinal data; used for ratio-scale data and
interval-scale data (the variable has to be continuous)
 Population: unobservable, unknown
 Sample: observable, known
 Look up: the government's free ration scheme, India's budget size
 Mode: qualitative and discrete
 Median:
 BLUE (Best Linear Unbiased Estimator): among the mode, median and mean, the mean is the BLUE of the population mean, since it is unbiased with the smallest variance among linear unbiased estimators
 Expected value: mean of sampling distribution
 Type 1 error: rejecting the null hypothesis when it is true for the population mean
 Type 2 error: accepting the null hypothesis when it is not true for the population mean
 Covariance and correlation: explain linear relationship
 Degrees of freedom meaning***
 Sample mean is different than population mean ⇒ may indicate sampling bias
 Why GLS?
 Independent/ explanatory variables, dependent variables, predictor, exogenous
variables, endogenous variables
 Regression analysis: estimation using estimators, elasticity, hypothesis testing,
forecasting/ prediction/ simulation (taking the trend ahead according to the model)
 Econometric models:
1. Deterministic part of model:
a. linear or other relationship among variables
b. variables to be included: depends on the theory being studied, controlling
for other variables influencing the dependent variable, the expected
relationship, taking paradoxes into account
c. Causal relationship between variables: does independent variable
cause the dependent variable.
d. How is the error term to be included?: additive, multiplicative,
probability distribution followed by error term. Why should the error
terms be normally distributed?

2. Data:
a. Experimental vs observational data (sample surveys- NSSO, NFHS)
b. Cross sectional data (census, surveys by government)
c. Time series data
d. Panel data (cross-sectional data repeated for the same ID or household
at different points in time)
e. RCTs: to examine CAUSATION, treatment and control group (blind and
double-blind experiments)
f. Pooled cross sectional data: different IDs, same parameters, different
points of time (e.g., NSSO / NFHS over the years)

3. Estimation and Testing:


a. Estimators: algorithms used to calculate the values of a model's parameters
from a sample of data. Point estimators: MLE, OLS. Interval estimators: Bayesian
methods. They provide an estimate of the population parameters based on
sample data.
b. OLS: minimizing the sum of squared residuals (a minimal R sketch appears after this outline)
c. MLE: for 0/1 data (logit, probit models); maximising the likelihood

4. Diagnostic Evaluation of Model:


a. Heteroscedasticity (non-constant variance of the error term)
b. Multicollinearity (linear relationship between independent variables)
c. Omitted variable bias: a variable important in determining the effect
was omitted (e.g., in a demand function when price is not available; when
income is not available, a proxy is used: monthly per capita consumption
expenditure)
d. Autocorrelation: in time series data, the previous error term affects the next
error term, so the estimates are not reliable
e. Problem of functional form
f. Considering these problems, revise model/ dataset/ estimation
method

5. Assessment of Validity:
a. Internal
b. External

6. Causation vs Correlation: (ref- Joshua Angrist: Mostly harmless econometrics)


a. Causation: strictly speaking, cannot be established by econometrics and is not its area of study
b. Correlation
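
A minimal R sketch (not from the notes; the data are simulated and the variable names are made up) tying the outline together: OLS estimation with lm(), the hypothesis tests reported by summary(), and prediction:

set.seed(1)
income <- rnorm(100, mean = 500, sd = 50)            # hypothetical explanatory variable
expend <- 20 + 0.6 * income + rnorm(100, sd = 10)    # hypothetical dependent variable

model <- lm(expend ~ income)    # OLS: minimises the sum of squared residuals
summary(model)                  # coefficient estimates, t-tests, R-squared
predict(model, newdata = data.frame(income = c(450, 550)))   # forecasting/ prediction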

R by Neeraj Hatekar
 Matrix:

R syntax: matrix(data, nrow, ncol, byrow, dimnames)

data: e.g. c(1:36)

nrow: number of rows
ncol: number of columns
byrow: TRUE or FALSE; byrow = TRUE fills row-wise, else column-wise
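
For instance, a minimal sketch (my own example, not from the notes):

m <- matrix(c(1:36), nrow = 6, byrow = TRUE)   # 6 x 6 matrix, filled row by row
m
dim(m)   # returns 6 6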

 Array: used for multiple explanatory variables, e.g. education w.r.t. age, and gender w.r.t.
age

array(data, dim, dimnames)

e.g. myarray <- array(c(1:16), dim = c(4, 4, 2))

myarray
 Dataframe (like an Excel table)
1. Column names should be non-empty
2. Each column should contain the same number of data items
3. Data stored can be of numeric, factor or character type
4. Row names should be unique

e.g. emp_id=c(100:104)

emp_name=c("john","henry","adam","ron","gary")

dept=c("sales","finance","marketing","HR","R&d")

emp.data <- data.frame(emp_id, emp_name, dept)


emp.data
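
A quick follow-on sketch (my own lines, reusing emp.data from above) showing how to inspect and access it:

str(emp.data)        # structure: one line per column with its type
emp.data$emp_name    # a single column as a vector
emp.data[2, ]        # the second row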

 Data Operators:
1. Addition: a+b
2. Subtraction: a-b
3. Multiplication: a*b
4. Division: a/b
5. Modulus: a%%b (remainder)
6. Exponent: a^b
7. Floor Division: a%/%b (quotient)
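
For example (made-up values):

a <- 7
b <- 3
a %% b    # 1  (remainder)
a %/% b   # 2  (integer quotient)
a ^ b     # 343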

 Relational Operators:
1. Equal to: == (not an assignment operator)
2. Not equal to: a!=b
3. Greater than: a>b
4. Less than: a<b
5. Greater than equal to: a>=b
6. Less than equal to: a<=b

 Logical Operators:
1. a & b: TRUE if both elements are TRUE
2. a | b: TRUE if at least one of the elements is TRUE
3. !a: gives the opposite logical value
4. &&, ||: compare only the first elements of the datasets/ vectors
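
A small sketch (my own example); note that recent R versions require single values for && and ||, so the last line uses scalars:

x <- c(TRUE, FALSE)
y <- c(TRUE, TRUE)
x & y           # element-wise: TRUE FALSE
x | y           # element-wise: TRUE TRUE
!x              # FALSE TRUE
TRUE && FALSE   # scalar comparison: FALSE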

 Conditional Statements:
1. If
2. Else if
3. Else

e.g.

a = 7
b = 7

if (a > b) {
  print("a is greater than b")
} else if (a < b) {
  print("a is less than b")
} else {
  print("both numbers are equal")
}

 Loops: (C++: for, repeat and while loops)

1. Repeat loop: repeats a statement or group of statements until a break condition is
met. It is an exit-controlled loop: the code is first executed and only then is the
condition checked to determine whether control should stay inside the loop or
exit from it.
2. While loop: helps to repeat a statement or group of statements while a given condition
is true. It is an entry-controlled loop: the condition is checked first, and only if the
condition is satisfied is control delivered inside the loop to execute the code.
3. For loop: used to repeat a statement or group of statements a fixed number of times;
here we need to initialise a counter or sequence, and we know beforehand how many
times the code needs to be executed. Otherwise the execution is similar to
the while loop.

e.g.

1) FOR LOOP

for (x in 1:10){

print(x)
}

2) FOR LOOP

data<-c(1,2,3,4,5)

for(x in data){

print(x)
}
3) REPEAT LOOP
x=2

repeat {

x=x^2

print(x)

if (x>100){

break

}
}

4) REPEAT LOOP

x=2

repeat {

x=x^2

if (x>100){

print(x)

break

}
}

5)WHILE LOOP: FIBONACCI

num=1

sumn=0

n=1

print(sumn)

print(num)

while(n<11){
c=sumn+num

print(c)

sumn=num

num=c

n=n+1
}

 MEAN, MEDIAN, MODE

1. Mean:

v <- c(1,2,3,4,5,6,7,8,9,10)

add = 0

for (x in v){
  add = add + x
}

mean = add / length(v)
print(mean)

 Now, instead of using the length() command, initialise n = 0 as a counter for the
observations in v:

v <- c(1,2,3,4,5,6,7,8,9,10)

add = 0
n = 0

for (x in v){
  add = add + x
  n = n + 1
}

mean = add / n
print(mean)
2. MEDIAN
Sort the data in ascending order; if the number of observations is even, the median is
the average of the two middle values, else (odd) it is the middle value.

data<-c(1,2,3,5,4,6,7,8,9,10)

data=sort(data)

if(length(data)%%2==0){

median=(data[length(data)/2]+data[(length(data)/2)+1])/2

}else{

median= data[(length(data)+1)/2]

}
print(median)

3. MODE
Tabulate the data, calculate the frequencies, and take the value(s) with the maximum frequency.

data<-c(5,10,15,5,7,10)

y= table(data)

y;
names(y)[which(y==max(y))];

 Inbuilt functions: max, length, sort, etc.
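
As a cross-check (my own lines, redefining the vector so the snippet runs on its own):

v <- c(1,2,3,4,5,6,7,8,9,10)
mean(v)      # built-in mean
median(v)    # built-in median
sort(v); max(v); length(v)
# note: base R's mode() reports an object's storage type, not the statistical mode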


 Function:
Syntax:
function_name <- function(arg1, arg2, ...){
  # code fragments
  return(value)   # or leave the value as the last expression in the body
}

e.g.

productVect <- function(a){
  res <- 1
  for (e in a){
    res <- res * e
  }
  return(res)
}

A <- c(1:5)
print(productVect(A))   # 120

B <- c(1:10)
print(productVect(B))   # 3628800

 Importing and exporting data:


- R works most easily with datasets stored as text files, where values are separated/
delimited by commas, tabs or spaces.
- R provides several related functions to read data stored as files.
- Use read.csv() to read in data stored as CSV and read.delim() to read in text data
delimited by other characters such as tabs or spaces.
- For read.delim(), specify the delimiter in the sep= argument.
- Both read.csv() and read.delim() assume the first row of the text file is the row of
variable names. If this is not true, use the argument header = FALSE.
- Packages: readxl for Excel files; haven for Stata, SAS and SPSS files.
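
A small sketch of these calls (the file names are hypothetical):

hh <- read.csv("household_survey.csv")                          # comma-separated, header in first row
prices <- read.delim("prices.txt", sep = "\t", header = FALSE)  # tab-delimited, no header row
# library(readxl); wages <- read_excel("wages.xlsx")            # Excel files via the readxl package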
