Basics
This page describes some (but not yet all) basic terms and concepts in statistics and regression
analysis.
Random variable
A random variable (or stochastic variable) is a variable that can take a random value from a set of
possible values. Each value has an associated probability, which gives the likelihood of that value
occurring.
Probability distribution
Normal distribution
Expected value
The expected value or population mean (µ) is the average value we would expect to find from a
random variable if we repeated an experiment an infinite number of times. In theory, the result is
the same as the average or arithmetic mean value, i.e. the sum of all values divided by the number
of values, although it’s calculated a bit differently.
Definition
The expected value is the sum of all possible values for a random variable, each value multiplied by
its probability of occurrence.
E(X) = x1*p1 + x2*p2 + ... + xn*pn
Where x1, ..., xn are the possible values of X and p1, ..., pn are their probabilities of occurrence.
Example
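A quick illustration (a standard textbook one, not from the course data): for a fair six-sided die, each of the values 1-6 has probability 1/6, so
E(X) = 1*(1/6) + 2*(1/6) + 3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6) = 3.5
Note that 3.5 is not itself a possible outcome; the expected value is a long-run average.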
Variance
Variance is a measure of the spread of values in a random variable. The larger the variance, the
greater the spread of values. For example, the two numbers 0 and 40 have a larger variance than 10
and 30, because they are more spread apart. In general, zero variance means that the values are
identical.
Definition
The variance of a random variable is the expected value of the squared deviation from the mean:
Var(X) = E[(X − µ)^2]
where µ is the mean, which is the same as the expected value of X, i.e. µ = E(X)
Example (simplified)
Let’s say we have eight data points with the values 2, 4, 4, 4, 5, 5, 7, 9.
µ = (2+4+4+4+5+5+7+9) / 8 = 5
For each value, we take its deviation from the mean and square it:
(x_i − µ)^2
(2-5)^2 = 9
(4-5)^2 = 1
(4-5)^2 = 1
(4-5)^2 = 1
(5-5)^2 = 0
(5-5)^2 = 0
(7-5)^2 = 4
(9-5)^2 = 16
We then take the mean (expected value) of these squared deviations to get the variance:
Var(X) = (9+1+1+1+0+0+4+16) / 8 = 32 / 8 = 4
σ = sqrt(4) = 2
This means that the variance of X is the standard deviation (σ) of X squared: Var(X) = σ^2
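As a check, a minimal Stata sketch of the same calculation (the eight values are entered by hand; summarize reports the sample variance, which divides by n-1, so it is rescaled to the population variance used above):
* enter the eight example values by hand
clear
input x
2
4
4
4
5
5
7
9
end
summarize x
* r(Var) is the sample variance (divides by n-1); rescale to the population variance
display "population variance = " r(Var)*(r(N)-1)/r(N)
display "standard deviation  = " sqrt(r(Var)*(r(N)-1)/r(N))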
Standard deviation
The standard deviation (σ) is another way of expressing the variance, calculated as the square root
of the variance.
Definition
σ = sqrt(Var(x))
σ^2 = Var(x)
Covariance
Covariance is a measure of the relationship between two random variables: how much the two
variables vary together, and in what direction. The covariance is positive when the variables move in the
same direction, and negative when they move in opposite directions. Zero covariance means there is
no linear relationship between them.
Covariance is measured in the units of the variables, making it hard to compare across
variables. Correlation fixes this by standardizing the values, giving us a fixed range of -1 to 1.
Definition
Cov(x, y) = E[(x − µ_x)(y − µ_y)], where µ_x = E(x) and µ_y = E(y)
If we take the covariance of a variable with itself, this simply equals its variance:
Cov(x, x) = Var(x)
Correlation
Correlation is a measure of the relationship between two random variables: how much the two
variables vary together, and in what direction. It is the same idea as the covariance, except it uses a
standardized range of values between -1 and 1, while covariance is measured in the units of the
variables. A value of 1 means a perfect positive relationship, -1 a perfect negative relationship, and 0
means no linear relationship at all.
Note that correlation does not imply causation. Just because there is a statistical relationship
between two things does not mean that one causes the other, only that they seem to occur at
roughly the same time. This also holds for covariance.
Definition
Corr(x, y) = Cov(x, y) / sqrt(Var(x) * Var(y))
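A small Stata sketch on simulated (hypothetical) data, showing the covariance in the variables' own units and the unit-free correlation:
* simulate two related variables
clear
set seed 1
set obs 200
gen x = rnormal()
gen y = 0.8*x + rnormal()
correlate x y, covariance   // covariance matrix, in the units of x and y
correlate x y               // correlation matrix, unit-free, between -1 and 1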
Data types
- Cross-sectional data: Data on many subjects at a single point in time. The subjects could be
individuals, firms, countries, regions, or something else. Example: the income of households
in Sweden in 2009.
- Time series data: Data on a single subject over time. Examples: the daily profit & loss of a
specific company over time, or the inflation rate of a country over many years.
- Panel data: Data on many subjects over time. A mix between cross-sectional and time series
data. Panel data is said to be “multi-dimensional” while the others are “one-dimensional”.
Examples: the income of many households in Sweden over time, the daily profit of multiple
companies over time.
Exogenous - The variable is determined completely outside the model and does not depend on any of the
variables in the model (not even the residual). That is what we want for the explanatory variables.
Endogenous - The variable depends on at least one of the other variables in the model.
Logarithms:
log(y) and log(x) (both in logs): the coefficient is an elasticity, since d log(y) / d log(x) = (dy/y) / (dx/x).
log(y) and x in levels: when x increases by 1 unit, y increases by approximately (100*coef)%. For example,
a coefficient of 0.05 means roughly a 5% increase in y. This is an approximation, which becomes less
exact as the coefficient gets larger.
If y can only be 0 or 1, the expected value of y can be interpreted as the probability that y is equal to 1.
Therefore, a multiple linear regression model with a binary dependent variable is called the linear
probability model (LPM).
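A minimal Stata sketch of an LPM on simulated data (the data-generating process below is purely hypothetical; robust standard errors are used because the LPM is heteroskedastic by construction):
* LPM: OLS with a binary dependent variable
clear
set seed 2
set obs 500
gen x = rnormal()
gen y = runiform() < invlogit(-0.5 + x)   // hypothetical binary outcome
regress y x, vce(robust)                  // fitted values approximate P(y = 1 | x)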
Time-series data
Logarithmic form common to eliminate scale effects.
Dummy variables often used to identify an event or to isolate a shock. Also for capturing seasonality.
Index numbers (e.g. CPI) often used as independent variables.
Static model:
Static Phillips curve:
inf_t = B0 + B1*unem_t + u_t
Inflation and unemployment in the same year.
The difference from a cross-sectional model is that the subscript i is replaced with t. A static model only
estimates immediate effects on the dependent variable, i.e. effects that take place in the same year.
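A minimal Stata sketch of the static Phillips curve (assuming an annual dataset with variables named year, inf and unem, e.g. Wooldridge's PHILLIPS data, is already in memory):
tsset year
regress inf unem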
Shortcomings of distributed lag models: The higher the number of lags you use, the more data you lose,
because the first observations in the sample have no values for the lagged variables and drop out of the estimation.
Example:
How the interest rate at time t is impacted by inflation at times t, t-1 and t-2. After running the regression we
get the estimated equation:
int_t = 1.6 + 0.48*inf_t - 0.15*inf_t-1 + 0.32*inf_t-2
Impact propensity/multiplier:
Impact propensity is 0.48.
Long-run propensity/multiplier
Long-run propensity is 0.48-0.15+0.32 = 0.65
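A Stata sketch of estimating such a model with lag operators and getting the long-run propensity (the variable names year, irate and inf are assumed; the data must be tsset first):
tsset year
regress irate inf L.inf L2.inf
display "impact propensity = " _b[inf]
lincom inf + L.inf + L2.inf   // long-run propensity, with a standard error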
Trends:
Many economic time series display a trending behavior over time, which might be important to
incorporate in our model. Two series might seem related just because they follow the same trend.
Danger of ignoring trends: Omitted variable bias.
Example: The average growth rate in GDP per capita for Sweden during 1971-2012 is 1.7%. Hence, if we put
y_t = log(gdp_per_capita) in a linear trend model y_t = a0 + a1*t + u_t, we would expect a1 ≈ 0.017.
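A minimal Stata sketch of adding a linear time trend (variable names y, x and year are assumed; including t guards against finding a relationship that is only due to common trends):
sort year
gen t = _n          // linear time trend: 1, 2, 3, ...
regress y x t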
Seasonality:
For example: If we suspect seasonality each quarter:
y_t = B0 + y1*Q2_t + y2*Q3_t + y3*Q4_t + b1*x_t,1 + b2*x_t,2 + u_t
where Q2, Q3 and Q4 are dummy variables for quarters 2-4 (quarter 1 is the base category). If there is no
seasonality we would find that all y (the dummy coefficients) = 0, which can be tested with an F-test.
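A Stata sketch of that test (assuming variables y, x1, x2 and a quarter variable coded 1-4 are in memory):
regress y i.quarter x1 x2
testparm i.quarter        // F-test of H0: all seasonal dummy coefficients are zero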
Autocorrelation function (ACF):
ACF for lag one (i.e. one time unit back):
corr(r_t, r_t-1) = cov(r_t, r_t-1) / sqrt(var(r_t) * var(r_t-1)) = cov(r_t, r_t-1) / var(r_t) = ACF(1)
(the last step uses var(r_t) = var(r_t-1), which holds under stationarity)
The ACF should typically decrease at larger time gaps, i.e. ACF(1) is larger than ACF(2). If ACF(1) is small we
have less dependency between time periods t and t-1.
E.g.
ac rus, lags(12)
ac dyus, lags(12)
The grey area is the non-rejection area, where we cannot reject that there is no dependency between
time periods, i.e. there may be no dependency. Outside the area we reject the null of no dependency,
i.e. there is evidence of some kind of dependency.
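To get the ACF values numerically rather than as a plot (assuming the same tsset data and the rus series from the example above):
corrgram rus, lags(12)    // ACF, PACF and Ljung-Box Q statistics for each lag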
In financial markets, if markets are efficient, we have zero arbitrage, so returns should show (close to) zero predictability.
We can replace, for example, TS.3 with a weaker assumption (which one?).
For example, if we want to control for a demand shock but cannot isolate the demand shock itself,
we can use an instrumental variable instead.
Good instrument (Z):
1) Relevance: the instrument contains some information with predictive power for the endogenous variable:
corr(Z, lfare) ≠ 0
corr(bmktshr, lfare) > 0
2) Validity: the instrument is uncorrelated with the error term:
corr(Z, E) = 0
corr(bmktshr, E) = 0
Step 1) Predict the variable we want to replace using the new instrumental variable.
Step 2) Replace the variable with the predicted variable in the original OLS regression
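Done by hand in Stata (using the same airfare variables as in the example below; the coefficients match 2SLS, but the standard errors from the second regression are not the correct 2SLS standard errors):
* step 1: first stage - regress the endogenous variable on the instrument and the exogenous controls
regress lfare bmktshr ldist ldist2
predict lfare_hat, xb
* step 2: use the prediction in place of lfare in the original regression
regress lpassen lfare_hat ldist ldist2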
Stata example:
Perform steps 1 and 2 with one command, instrumenting lfare with bmktshr:
ivregress 2sls lpassen (lfare=bmktshr) ldist ldist2, first