
AdvStats - W1 - Descriptive Stats

This document provides an overview of descriptive statistics concepts including: 1) Measures of central tendency like the mean and median are used to describe the central or typical value in a data set. Measures of dispersion like variance and standard deviation describe how concentrated data values are around the central value. 2) Correlation and regression analyze relationships between variables and form the basis of analyzing effects. The line of best fit minimizes the sum of squared residuals to best capture the linear relationship between variables. 3) Ordinary least squares regression chooses coefficients a and b that generate the regression line Zi = a + bXi which best predicts the dependent variable Y based on the independent variable X.

Uploaded by

Pedro Fernandez


Advanced Statistics

Descriptive Statistics

Economics
University of Manchester

Numerical data summaries
LOCATION
‘central value’: mean and median
DISPERSION/SPREAD
how concentrated data values are around the central location: variance and standard deviation
CORRELATION AND REGRESSION
Relationships between variables
Building block of analysis of “effects”

Notation for Variables
VARIABLE
A “variable” is simply a label, with description, for an event of
interest (X,Y,Z, etc)

For example:
“Let X = your weekly expenditure on food”
“Let P = the price of a litre of petrol”
“Let Y = your household income”
X1 = ....... ; X2 = ....... ; X3 = ....... ; X4 = ....... ; etc.

$X_i$: in general, denotes the i-th observation on the variable X.

Note the use of the subscript i.
If we have $X_1, X_2, \dots, X_n$, these form a (random) sample of n observations on X.

Summation: “adding up”

We use the following definitions and notation to signify summing up:

Greek capital “SIGMA”, $\Sigma$: $\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$

Dummy SUBSCRIPT: $\sum_{i=1}^{n} X_i = \sum_{j=1}^{n} X_j$

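As a minimal illustration of the summation notation, the Python sketch below adds up a small sample twice, once with index i and once with index j. The data values are hypothetical and serve only to show that renaming the dummy subscript leaves the sum unchanged.

```python
# Hypothetical sample of n = 4 observations on X (illustrative values only).
X = [12.0, 7.5, 9.0, 11.5]
n = len(X)

# Sum over the index i. Python counts from 0, so X[0] plays the role of X_1.
total_i = sum(X[i] for i in range(n))

# The same sum with the dummy index renamed to j.
total_j = sum(X[j] for j in range(n))

print(total_i, total_j)  # 40.0 40.0
```
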
Measures of location
Suppose we have sample data: $X_1, X_2, \dots, X_n$

Sample (arithmetic) mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Weighted mean sometimes used (e.g., price indices):

$\sum_{i=1}^{n} w_i X_i = w_1 X_1 + w_2 X_2 + \dots + w_n X_n$, with $\sum_{i=1}^{n} w_i = 1$

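The two measures of location can be sketched in a few lines of Python. The function names, the prices and the weights below are assumptions made for illustration; only the formulas come from the slide.

```python
def mean(xs):
    """Sample arithmetic mean: (1/n) times the sum of the observations."""
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """Weighted mean: sum of w_i * X_i, where the weights must sum to 1."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    if abs(sum(ws) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(ws, xs))


# Hypothetical prices and expenditure-share weights (illustrative only).
prices = [2.0, 4.0, 10.0]
weights = [0.5, 0.25, 0.25]

print(mean(prices))                    # unweighted average of the prices
print(weighted_mean(prices, weights))  # 4.5
```

With equal weights of 1/n the weighted mean reduces to the ordinary sample mean.
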
Measures of dispersion I
(Mean) deviation: $x_i = X_i - \bar{X}$
Mean Squared Deviation (MSD):

$\frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n} \left[ (X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2 \right] = \left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 \right) - \bar{X}^2$

• Note: MSD ≥ 0

• Manipulations in Worked Exercises Q1
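The equality between the definition of the MSD and its shortcut form can be checked numerically. The sketch below is only a sanity check on made-up data, not the algebra from the worked exercise; the function names are assumptions.

```python
def msd(xs):
    """Mean squared deviation: average of the squared deviations from the mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n


def msd_shortcut(xs):
    """Shortcut form: mean of the squares minus the square of the mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(x * x for x in xs) / n - xbar ** 2


# Hypothetical sample (illustrative values only).
data = [2.0, 4.0, 4.0, 6.0]

print(msd(data))           # 2.0
print(msd_shortcut(data))  # 2.0
```
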

Measures of dispersion II
Sample variance:

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \sum_{i=1}^{n} x_i^2$

Sample standard deviation: $s = +\sqrt{s^2}$

Note:
1 Divisor of n – 1, not n, for statistical reasons
2 Both $s^2$ and s ≥ 0
3 Sample mean and s are measured in the units of the original data
4 Variance is measured in squared units

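A short Python sketch of the sample variance and standard deviation follows. The data are hypothetical. The comparison uses the standard-library `statistics` module, where `variance` divides by n - 1 and `pvariance` divides by n (the latter matches the MSD).

```python
import math
import statistics


def sample_variance(xs):
    """Sample variance s^2 with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Hypothetical sample (illustrative values only).
data = [2.0, 4.0, 4.0, 6.0]

s2 = sample_variance(data)
s = math.sqrt(s2)  # positive square root, in the units of the original data

print(s2, s)
print(statistics.variance(data))   # divisor n - 1, agrees with s2
print(statistics.pvariance(data))  # divisor n, agrees with the MSD
```
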
Regression
Summarise the data/scatter with line of best fit
Data (Xi, Yi), i = 1, 2, …, n
LINEAR relationship between Y and X?
What is the equation of the line of BEST FIT?

[Figure: scatter plot “Salary against education”; vertical axis: Salary (000s), horizontal axis: Years of FE]

Line of best fit I

Consider the following scatter of points $(X_i, Y_i)$, $i = 1, 2, \dots, 6$.

Line of best fit II

Draw any straight line through the points, depicted by the green line Z = a + bX, where a and b define the intercept and slope, respectively.

Line of best fit III

Z = a + bX

Choose a particular value of X in the sample and get the corresponding Z value from the line that has been drawn.

Now compare this Z value with the actual Y value in the sample associated with the chosen X.

Line of best fit IV

Z = a + bX

The difference is called a residual.

Residual = Y - Z
         = Y - a - bX

Line of best fit V

Z = a + bX

Can construct such a residual for each X value in the sample.

Then square the residual values.

Sum these squares to get the “sum of squared residuals” associated with the line Z = a + bX.

Line of best fit VI

Z = a + bX

This sum of squared residuals is defined by:

$\sum_{i=1}^{n} (Y_i - Z_i)^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2$

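The three steps (form a residual for each observation, square it, add the squares up) can be sketched in Python as follows. The data points and the candidate line Z = 1 + 1X are hypothetical; the line is an arbitrary choice, not the line of best fit.

```python
def residuals(a, b, xs, ys):
    """Residuals Y_i - Z_i for the line Z = a + b*X."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(a, b, xs, ys):
    """Square each residual and add the squares up."""
    return sum(e ** 2 for e in residuals(a, b, xs, ys))


# Hypothetical sample of n = 4 points (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

print(residuals(1.0, 1.0, xs, ys))              # [0.0, 0.0, 1.0, 1.0]
print(sum_squared_residuals(1.0, 1.0, xs, ys))  # 2.0
```
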
Line of best fit VII

Z = a + bX
W = c + dX

Could repeat this for any other line; e.g., W = c + dX.

This will give rise to a different set of residuals:

Residual = Y - W
         = Y - c - dX

Line of best fit VIII

Z = a + bX
W = c + dX

And, therefore, a sum of squared residuals:

$\sum_{i=1}^{n} (Y_i - W_i)^2 = \sum_{i=1}^{n} (Y_i - c - dX_i)^2$

Line of best fit IX

Z = a + bX
W = c + dX

Which line is better?

Choose the line which has the smallest sum of squared residuals. That is, choose Z = a + bX if

$\sum_{i=1}^{n} (Y_i - a - bX_i)^2 < \sum_{i=1}^{n} (Y_i - c - dX_i)^2$

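The comparison of two candidate lines can be sketched as below. The data and both lines are hypothetical, and `better_line` is a helper name introduced here for illustration.

```python
def ssr(intercept, slope, xs, ys):
    """Sum of squared residuals for the line intercept + slope * X."""
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def better_line(line_z, line_w, xs, ys):
    """Return the (intercept, slope) pair with the smaller SSR.

    If the two sums are equal, line_w is returned.
    """
    return line_z if ssr(*line_z, xs, ys) < ssr(*line_w, xs, ys) else line_w


# Hypothetical sample (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

line_z = (1.0, 1.0)  # Z = 1 + 1.0 X
line_w = (0.0, 1.5)  # W = 0 + 1.5 X

print(ssr(*line_z, xs, ys))                 # 2.0
print(ssr(*line_w, xs, ys))                 # 0.5
print(better_line(line_z, line_w, xs, ys))  # (0.0, 1.5)
```

On these made-up points W has the smaller sum of squared residuals, so W would be preferred to Z.
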
Line of best fit: summary

Choose that line (i.e., choose a and b) which minimises

$\sum_{i=1}^{n} (Y_i - a - bX_i)^2 = \sum_{i=1}^{n} Y_i^2 + n a^2 + b^2 \sum_{i=1}^{n} X_i^2 - 2a \sum_{i=1}^{n} Y_i - 2b \sum_{i=1}^{n} Y_i X_i + 2ab \sum_{i=1}^{n} X_i$

This is called Ordinary Least Squares Regression or, “regressing Y on X”.

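The minimisation can be made concrete by setting the two partial derivatives of the sum of squared residuals to zero. The sketch below is the standard textbook route and is not taken from the worked exercises; the shorthand S(a, b) is introduced here for convenience.

```latex
% S(a, b): shorthand (introduced here) for the sum of squared residuals.
S(a,b) = \sum_{i=1}^{n} (Y_i - a - bX_i)^2

% First-order condition with respect to the intercept a:
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} (Y_i - a - bX_i) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} Y_i = na + b \sum_{i=1}^{n} X_i

% First-order condition with respect to the slope b:
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} X_i (Y_i - a - bX_i) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} X_i Y_i = a \sum_{i=1}^{n} X_i + b \sum_{i=1}^{n} X_i^2
```

These are two linear equations in the two unknowns a and b. They have a unique solution provided the $X_i$ are not all equal.
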
The regression equation
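Regression line (line of best fit) has the mathematical form:

$Z_i = \hat{Y}_i = a + bX_i$

The intercept and slope are given by (using calculus):

$a = \bar{Y} - b\bar{X}$,

$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \sum_{i=1}^{n} w_i y_i$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the sample means, and $w_i = x_i / \sum_{j=1}^{n} x_j^2$.

Derived in Worked Exercises Q2.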
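The intercept and slope formulas translate directly into code. The following is a minimal sketch on hypothetical data; `ols_fit` is a name chosen here, not something defined in the slides.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# Hypothetical sample (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

a, b = ols_fit(xs, ys)
print(a, b)  # approximately 0.5 and 1.4
```
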

Sample correlation
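Gives an INDEX of the LINEAR relationship between observed Y and X.
Defined as

$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{+\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2}} = b \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}}$

• NB: $-1 \le r \le 1$

[Figure: three scatter plots of Y against X, with panels labelled “0 < r < 1”, “r [symbol missing] 1” and “r = 0”]
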
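A short Python sketch of the sample correlation is given below, together with a numerical check of its link to the regression slope b. The data are hypothetical and the function name is an assumption.

```python
import math


def correlation(xs, ys):
    """Sample correlation r between X and Y."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

r = correlation(xs, ys)

# Link to the regression slope: r = b * sqrt(sum x_i^2 / sum y_i^2).
# For these data sum x_i y_i = 7, sum x_i^2 = 5 and sum y_i^2 = 10.
b = 7.0 / 5.0
r_from_slope = b * math.sqrt(5.0 / 10.0)

print(r, r_from_slope)  # both approximately 0.99
```
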
Regression and correlation:
calculations and interpretation
These calculations are illustrated in the last four slides. In practice, use software (Excel, for example).

EXAMPLE: Data on salary (£000) and education (years of FE):

r = 0.83
Line of best fit: $\hat{Y}_i = 16.3 + 2.86 X_i$

The regression line has a positive slope (higher salary with more years of FE) and fits fairly well.

Salary is (on average) estimated to be higher by £2,860 for each additional year of FE.

Salary is estimated to be £16,300 when there is no FE ($X_i = 0$).

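The interpretation of the fitted line can be reproduced with simple arithmetic. The coefficients below are the ones reported on the slide; the choice of 3 and 4 years of FE is a hypothetical input used only for illustration.

```python
INTERCEPT = 16.3  # estimated salary (£000) with no FE
SLOPE = 2.86      # estimated change in salary (£000) per additional year of FE


def predicted_salary(years_fe):
    """Fitted salary in £000 from the line Y-hat = 16.3 + 2.86 X."""
    return INTERCEPT + SLOPE * years_fe


print(predicted_salary(0))  # 16.3, i.e. £16,300 with no FE
print(predicted_salary(3))  # roughly 24.88, i.e. about £24,880

# One extra year of FE changes the fitted salary by the slope, about £2,860.
print(predicted_salary(4) - predicted_salary(3))
```
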
Example
67 industrial firms, cross-section data:
CEO salary (in 1990, thousand US$)
Firm sales (in 1990, million US$)

[Figure: scatter plot “CEO Salary versus Firm Sales”; vertical axis: Salary, horizontal axis: Sales]

Example continued
r = 0.53
Regression line:

$\hat{Y}_i = 930.3 + 0.025 X_i$

$X_i$: firm sales (million US$)
$Y_i$: CEO salary (thousand US$)
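Because X and Y are measured in different units, it helps to keep the units explicit when reading the slope. The sketch below uses the reported coefficients; the sales figure of 20,000 (million US$) is a hypothetical input within the range of the plotted data.

```python
INTERCEPT = 930.3  # thousand US$
SLOPE = 0.025      # thousand US$ of salary per million US$ of sales


def predicted_ceo_salary(sales_million_usd):
    """Fitted CEO salary in thousand US$ for firm sales in million US$."""
    return INTERCEPT + SLOPE * sales_million_usd


# Hypothetical firm with sales of 20,000 million US$.
print(predicted_ceo_salary(20000.0))  # roughly 1430.3 thousand US$

# Slope converted to dollars: each extra million US$ of sales goes with
# an estimated salary that is about 25 US$ higher.
slope_in_usd = SLOPE * 1000.0
print(slope_in_usd)
```
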

