StatiF 1 Slides

Uploaded by

ishanschneider00


STATISTICS I

WHAT IS STATISTICS?

Statistics is a way to get information from data

[Diagram: cycle with steps numbered 1 to 4, linking theory (knowledge; population/model, described by parameters) and practice (information; sample/data, described by statistics)]

Important concepts:
• population = group of all items of interest
• parameter = descriptive measure of a population
• sample = data drawn from the population [subset]
• statistic = descriptive measure of a sample


Descriptive statistics (sample data → information)

• collection: surveys → representative sample
• summarise (data reduction): percentages, averages, variances
• presentation: tables, figures and graphs

Probability theory (population → sample)

• what is the probability that we shall observe a particular outcome?
• parameters are assumed to be known
• deductive statistics (general → specific)

Inferential statistics (statistics → parameters)

• what can be said after we have observed a sample?
• parameters are unknown (although constant)!
• inductive statistics (specific → general)
TYPES OF VARIABLES

Variable:
• some characteristic of a population or sample
• it varies per item/object

"At the Olympic Games of 2002, skater nr 14, Jochem Uytdehaage, finished first at the 10 km in
12:58:92" [new world record]

QUALITATIVE (categorical)
* arithmetic calculations are meaningless
- nominal (labels/names [Latin: nomen])
  * no numbers, no ordering, only encoding
  - e.g. nr. 14 in the example above
  - e.g. colour, nationality, political preference
- ordinal (labels with natural ordering/rank)
  * no numbers, although they are ordered
  * no measure for differences
  - e.g. first place in the example above
  - e.g. preferences, quality of scientific journals

QUANTITATIVE (numerical)
* arithmetic calculations are valid
- discrete (can only assume a limited number of values)
  - e.g. year 2002 in the example above
  - e.g. number of kids, number of correct answers (also as proportion!)
- continuous (can take on any value)
  - e.g. time 12:58:92 in the example above
  - e.g. weight: 76.8 or 76.823 or 76.823195 kg

The type of variable determines which statistical techniques are valid!
Example 1: watching TV (n=50)
Data: {C, C, P, P, C, C, P, C, C, P, P, P, C, P, P, P, C, C, C, C, C, P, I, C, P,
C, C, I, P, C, C, C, P, P, I, C, P, I, P, C, P, C, P, C, P, C, C, P, C, I, P, P, C}
Frequency Table
Channel          Frequency   Relative frequency (%)
Public           20          40
Commercial       25          50
International     5          10

[Pie chart: Public 40%, Commercial 50%, International 10%]
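The relative frequencies in the table can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the frequency table, and the names `frequencies` and `relative_pct` are illustrative choices, not part of the slides.

```python
# Frequencies copied from the frequency table of Example 1.
frequencies = {"Public": 20, "Commercial": 25, "International": 5}

# Total number of observations n.
n = sum(frequencies.values())

# Relative frequency in percent: 100 * f_i / n.
relative_pct = {channel: 100 * f / n for channel, f in frequencies.items()}

for channel, f in frequencies.items():
    print(f"{channel:<14}{f:>4}{relative_pct[channel]:>7.0f}%")
```

For nominal data such as the TV channel, counts and percentages are the only meaningful summaries; an average channel would make no sense.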
Example 2: Statement “Statistics is fun” (n=16)
5 categories: strongly disagree, disagree, neutral, agree, strongly agree

Answer              Frequency
strongly disagree   1
disagree            3
neutral             3
agree               5
strongly agree      4

[Bar chart: Opinion about "Statistics is fun"; x-axis: Opinion (strongly disagree to strongly agree), y-axis: Frequency (bars of height 1, 3, 3, 5, 4)]
DESCRIPTIVE STATISTICS

DISCRETE: X="number of kids" (of 50 employees)

Raw data: {2, 1, 3, 0, 1, 3, 4, 3, 0, 2, 2, 0, 2, 1, 5, 2, 0, 2, 0,
2, 0, 1, 0, 1, 2, 4, 0, 0, 3, 0, 0, 3, 2, 1, 5, 0, 1, 0,
2, 2, 1, 2, 3, 1, 4, 2, 0, 0, 1, 5}

i       x_i   f_i   rf_i   f_i x_i   rf_i x_i
1       0     15    0.30    0        0.00
2       1     10    0.20   10        0.20
3       2     13    0.26   26        0.52
4       3      6    0.12   18        0.36
5       4      3    0.06   12        0.24
6       5      3    0.06   15        0.30
total         50    1.00   81        1.62

$f_i$: frequency of the $i$-th class
$rf_i = f_i / n$: relative frequency
Measures of Central Location (sample):

Mean/Average (arithmetic and unweighted): $\bar{x} = \frac{1}{n} \sum_{i=1}^{k} f_i x_i$
- $\bar{x} = \frac{0 + 10 + 26 + 18 + 12 + 15}{50} = \frac{81}{50} = 1.62$

Median: $M$ = the middle value if $n$ is odd, the mean of the two middle values if $n$ is even
- $M = \frac{1\ (\text{25th obs.}) + 2\ (\text{26th obs.})}{2} = 1.5$

$M$ is robust: suppose a typo (5 → 50), then $\bar{x} = \frac{81 - 5 + 50}{50} = 2.52$, while $M$ remains 1.5

Mode [French] = most frequent observation/class
- mode = 0
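The three location measures for the "number of kids" data can be checked with Python's standard `statistics` module. This is a minimal sketch that rebuilds the 50 observations from the frequency table; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Frequency table for X = "number of kids" (values x_i and frequencies f_i).
values = [0, 1, 2, 3, 4, 5]
freqs = [15, 10, 13, 6, 3, 3]

# Expand the table into the individual (sorted) observations.
data = [x for x, f in zip(values, freqs) for _ in range(f)]
n = len(data)

# Mean from the frequency table: (1/n) * sum of f_i * x_i.
x_bar = sum(f * x for x, f in zip(values, freqs)) / n
m = median(data)   # n is even, so this averages the 25th and 26th observation
mo = mode(data)    # most frequent value

# Robustness check: record the largest observation (a 5) as 50 by mistake.
data_typo = data.copy()
data_typo[-1] = 50
x_bar_typo = mean(data_typo)
m_typo = median(data_typo)

print(x_bar, m, mo)        # mean, median, mode of the original data
print(x_bar_typo, m_typo)  # mean shifts, median does not
```

The single typo moves the mean from 1.62 to 2.52 but leaves the median at 1.5, which is what "robust" means here.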
Histogram

The sum of deviations is always zero: $\sum_{i=1}^{k} f_i (x_i - \bar{x}) = 0$
Types of histograms

[Three relative-frequency histograms]
• symmetric: $\bar{x} = M$
• skewed to the right (positively skewed): $\bar{x} > M$
• skewed to the left (negatively skewed): $\bar{x} < M$
Comparison Location Measures

• mean/average: $\bar{x}$
  - a lot of theory available & efficient use of the data
  - sensitive to extreme observations

• median: $M$
  - less sensitive to extreme observations
  - less efficient

• mode:
  - used infrequently
  - sometimes it is the only measure available (e.g. nominal data)
Geometric Mean [read yourself]:
Location measure for % change (e.g. returns):

Ex. Wordonline (Tiscali)

prices: 40, 50, 35, 21

$r_1 = \frac{50 - 40}{40} = 25\%$, $r_2 = -30\%$, $r_3 = -40\%$

$(1 + r_g)^3 = (1 + r_1)(1 + r_2)(1 + r_3)$
$r_g = \sqrt[3]{(1 + r_1)(1 + r_2)(1 + r_3)} - 1 = \sqrt[3]{0.525} - 1 \approx -0.193$
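The geometric mean return can be verified numerically. The sketch below derives the period returns from the four prices and then solves for the constant return per period that reproduces the final price; `returns`, `growth` and `r_g` are illustrative names.

```python
# Prices from the example.
prices = [40, 50, 35, 21]

# Period returns r_t = (p_t - p_{t-1}) / p_{t-1}: 25%, -30%, -40%.
returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

# Total growth factor (1 + r_1)(1 + r_2)(1 + r_3), which equals 21 / 40.
growth = 1.0
for r in returns:
    growth *= 1 + r

# Geometric mean return: the constant return per period with the same total growth.
r_g = growth ** (1 / len(returns)) - 1

# Arithmetic mean of the returns, for comparison only.
r_arith = sum(returns) / len(returns)

print(f"geometric mean return:  {r_g:.3f}")
print(f"arithmetic mean return: {r_arith:.3f}")
```

Compounding the geometric mean return three times takes the price from 40 to 21, whereas the arithmetic mean of -15% would end at about 24.6, which is why the geometric mean is the appropriate location measure for percentage changes.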
Measures of Variability (sample):

Range = largest - smallest observation
- R = 5 (max.) - 0 (min.) = 5

Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{k} f_i \times (x_i - \bar{x})^2$ ← "mean squared deviations"
- $s^2 = \frac{15 \times (0 - 1.62)^2 + \dots + 3 \times (5 - 1.62)^2}{50 - 1} \approx 2.2$

Standard deviation: $s = \sqrt{s^2}$ (crude approximation: R/4)
- $s = \sqrt{2.2} \approx 1.48$ (check: R/4 = 5/4 = 1.25)
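As a check on the arithmetic, the variability measures for the "number of kids" data can be computed directly from the frequency table. This is a sketch with illustrative variable names; it uses the sample variance with divisor n - 1, as in the formula above.

```python
from math import sqrt

# Frequency table for X = "number of kids".
values = [0, 1, 2, 3, 4, 5]
freqs = [15, 10, 13, 6, 3, 3]

n = sum(freqs)
x_bar = sum(f * x for x, f in zip(values, freqs)) / n

# Range: largest minus smallest observed value (every listed value occurs here).
data_range = max(values) - min(values)

# Sample variance: sum of f_i * (x_i - x_bar)^2, divided by n - 1.
sum_sq_dev = sum(f * (x - x_bar) ** 2 for x, f in zip(values, freqs))
variance = sum_sq_dev / (n - 1)
std_dev = sqrt(variance)

print(f"R = {data_range}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
print(f"crude approximation R/4 = {data_range / 4}")
```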
CONTINUOUS DATA: X = weight (kg) of 199 individuals
- $crf_i = \frac{\#\,\text{observations} \le x}{n}$: cumulative relative frequency of the $i$-th class

i       from .. till ..   m_i   f_i   rf_i   crf_i
1       50-60              55    10   0.05   0.05
2       60-70              65    38   0.19   0.24
3       70-80              75    71   0.36   0.60
4       80-90              85    48   0.24   0.84
5       90-100             95    26   0.13   0.97
6       100-110           105     6   0.03   1.00
total                           199   1.00

Average (suppose observations are evenly spread out within each class): $\bar{x} \approx \frac{1}{n} \sum_{i=1}^{k} f_i \times m_i$
- $\bar{x} \approx \frac{1}{199} (10 \times 55 + 38 \times 65 + \dots + 6 \times 105) = \frac{15{,}525}{199} \approx 78.0$ (kg)
Median class:
- 70-80
Modal class:
- 70-80

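Variance (approximation): $s^2 \approx \frac{1}{n-1} \sum_{i=1}^{k} f_i \times (m_i - \bar{x})^2$
- $s^2 \approx \frac{10 \times (55 - 78)^2 + \dots + 6 \times (105 - 78)^2}{199 - 1} = \frac{26{,}591}{198} \approx 134$ (kg)$^2$

Standard deviation: $s = \sqrt{s^2}$
- $s \approx \sqrt{134} \approx 11.6$ (kg)

Coefficient of variation (unit-less relative variability): $cv = \frac{s}{\bar{x}}$
- $cv = \frac{11.6}{78} \approx 14.9\%$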
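The grouped-data approximations for the 199 weights can be reproduced as follows. This sketch treats every observation as if it sat at its class midpoint, which is the assumption behind the formulas above; the variable names are illustrative. It uses the unrounded mean, so the sum of squares differs slightly from the 26,591 obtained with the rounded mean of 78.

```python
from math import sqrt

# Class midpoints m_i and frequencies f_i for the 199 weights (kg).
midpoints = [55, 65, 75, 85, 95, 105]
freqs = [10, 38, 71, 48, 26, 6]

n = sum(freqs)

# Approximate mean: (1/n) * sum of f_i * m_i.
x_bar = sum(f * m for m, f in zip(midpoints, freqs)) / n

# Approximate sample variance: sum of f_i * (m_i - x_bar)^2, divided by n - 1.
variance = sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
std_dev = sqrt(variance)

# Coefficient of variation: unit-less ratio s / x_bar.
cv = std_dev / x_bar

print(f"mean = {x_bar:.1f} kg, s^2 = {variance:.0f} kg^2, "
      f"s = {std_dev:.1f} kg, cv = {cv:.1%}")
```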
(Relative frequency) histogram

[Histogram: x-axis Weight (50 to 110), y-axis rel. freq. (0 to 0.4)]

What is the total (shaded) area of the bars, if each bar has a width of 1 unit?
Ogive / Cumulative frequency polygon

[Ogive: x-axis Weight (50 to 110), y-axis cum. rel. freq. (0 to 1); labelled points: 5%, 24%, 60%, 84%, 97%, 100%]

What is the median weight?
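One way to read the median off an ogive is linear interpolation within the class where the cumulative frequency crosses 50%. The sketch below does this numerically. It assumes class bounds of 50 to 110 in steps of 10 (inferred from the midpoints and the axis) and an even spread of observations within each class; the helper `grouped_quantile` is a hypothetical name introduced for illustration.

```python
# Assumed class bounds (kg) and the class frequencies for the 199 weights.
bounds = [50, 60, 70, 80, 90, 100, 110]
freqs = [10, 38, 71, 48, 26, 6]


def grouped_quantile(p):
    """Weight at cumulative relative frequency p, interpolated along the ogive."""
    target = p * sum(freqs)  # cumulative count we are looking for
    cum = 0                  # cumulative count below the current class
    for lower, upper, f in zip(bounds, bounds[1:], freqs):
        if cum + f >= target:
            # Move into the class in proportion to the count still needed.
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    return bounds[-1]


median_weight = grouped_quantile(0.5)
print(f"interpolated median = {median_weight:.1f} kg")
```

Under these assumptions the median lies in the class 70-80, in agreement with the median class identified earlier, at roughly 77 kg.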
Interpreting the Standard Deviation

Empirical rule: if the histogram is bell-shaped, then

• $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the observations
• $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the observations
• $(\bar{x} - 3s, \bar{x} + 3s)$ contains almost all of the observations

[Figure: bell curve with axis marks at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$]

Example: see the 199 weights with $\bar{x} = 78$ and $s = 11.6$

• $(\bar{x} - s, \bar{x} + s) = (78 - 11.6, 78 + 11.6) = (66.4, 89.6)$
• $(\bar{x} - 2s, \bar{x} + 2s) = (78 - 23.2, 78 + 23.2) = (54.8, 101.2)$
• $(\bar{x} - 3s, \bar{x} + 3s) = (78 - 34.8, 78 + 34.8) = (43.2, 112.8)$

What % has a weight between $(66.4, 89.6) \approx (65, 90)$? Answer?

→ Empirical rule seems to yield an accurate approximation
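The question above can be explored numerically from the grouped weight data. The following sketch assumes class bounds of 50 to 110 in steps of 10 and an even spread of observations within each class; `share_between` is a hypothetical helper, not something defined in the slides.

```python
# Assumed class bounds (kg) and the class frequencies for the 199 weights.
bounds = [50, 60, 70, 80, 90, 100, 110]
freqs = [10, 38, 71, 48, 26, 6]


def share_between(lo, hi):
    """Approximate share of observations in (lo, hi), spreading each class evenly."""
    count = 0.0
    for lower, upper, f in zip(bounds, bounds[1:], freqs):
        # Length of the part of this class that falls inside (lo, hi).
        overlap = max(0.0, min(hi, upper) - max(lo, lower))
        count += f * overlap / (upper - lower)
    return count / sum(freqs)


x_bar, s = 78.0, 11.6
for k in (1, 2, 3):
    share = share_between(x_bar - k * s, x_bar + k * s)
    print(f"within {k} s of the mean: {share:.1%}")

print(f"between 65 and 90 kg: {share_between(65, 90):.1%}")
```

Under these assumptions the shares come out close to the 68% and 95% of the empirical rule, and all observations fall within three standard deviations.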
PERCENTILES AND BOXPLOT

$p$-th percentile = value such that
(i) at most p% of the data is smaller
(ii) at most (100 - p)% of the data is greater

Location: $L_p = (n + 1) \frac{p}{100}$

Q1 (first quartile) = 25th percentile
Q2 (second quartile) = 50th percentile (median)
Q3 (third quartile) = 75th percentile

[Diagram: Q1, Q2 and Q3 split the ordered data into four parts of 25% each]

Interquartile range (IQR) = Q3 - Q1
Ex. 25 test marks (0-100) with $\bar{x} = 47.72$
23 34 42 52 58
27 37 42 53 63
30 39 42 55 66
33 40 48 57 77
33 40 48 58 96

Q1 = 35.5 [26·25/100 = 6.5: 34 + 0.5·(37 - 34)]
Q2 = 42 [26·50/100 = 13th observation]
Q3 = 57.5 [26·75/100 = 19.5: 57 + 0.5·(58 - 57)]

IQR = 57.5 - 35.5 = 22 (compare with s)

Outliers: observations less than $Q_1 - 1.5 \times IQR$
          observations greater than $Q_3 + 1.5 \times IQR$

In our example: $35.5 - 1.5 \times 22 = 2.5$ and $57.5 + 1.5 \times 22 = 90.5$, so 96 is an outlier
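The quartiles and the outlier check for the 25 test marks can be reproduced with the location rule $L_p = (n + 1) p / 100$. The sketch below is one possible implementation; the function name `percentile` and the other variable names are illustrative, and other software may use slightly different interpolation rules.

```python
# The 25 test marks, sorted in ascending order.
marks = sorted([
    23, 27, 30, 33, 33, 34, 37, 39, 40, 40, 42, 42, 42,
    48, 48, 52, 53, 55, 57, 58, 58, 63, 66, 77, 96,
])


def percentile(sorted_data, p):
    """p-th percentile using the location L_p = (n + 1) * p / 100."""
    n = len(sorted_data)
    loc = (n + 1) * p / 100  # 1-based position, possibly fractional
    k = int(loc)             # observation just below the location
    frac = loc - k           # how far to move towards the next observation
    if k < 1:
        return sorted_data[0]
    if k >= n:
        return sorted_data[-1]
    return sorted_data[k - 1] + frac * (sorted_data[k] - sorted_data[k - 1])


q1, q2, q3 = (percentile(marks, p) for p in (25, 50, 75))
iqr = q3 - q1

# Fences: anything outside (Q1 - 1.5 IQR, Q3 + 1.5 IQR) counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in marks if x < lower_fence or x > upper_fence]

# Whiskers end at the most extreme observations that are not outliers.
inside = [x for x in marks if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers, whiskers)
```

The last line also gives the whisker ends of a boxplot for these data: the smallest and largest marks that lie inside the fences.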
BOXPLOT

1. Box: from Q1 till Q3 with a line at Q2

2. Whiskers: extend to the most extreme values that are not outliers, i.e. the most extreme values in
($Q_1 - 1.5 \times IQR$, $Q_3 + 1.5 \times IQR$)

3. Outside the whiskers: separate marker symbols for outliers

[Boxplot; axis from 0 to 120]
LINEAR RELATIONSHIP BETWEEN 2 VARIABLES

[Scatter diagrams of Y against X (X from -3 to 3), one per case]
• perfect positive relation: Correlation = 1.0
• positive relation: Correlation = 0.5
• strong positive relation: Correlation = 0.9
• strong negative relation: Correlation = -0.9
• no (linear) relationship: Correlation = 0.0 (two examples)

Karl Pearson
Born: March 27, 1857 (London)
Died: April 27, 1936
Noted for: Pearson's correlation coefficient
Numerical Example: Advertising versus Sales

X = "advertising" (100,000 €)   Y = "sales" (1,000,000 €)

X      Y
2      3
3      6
4.5    8
5.5    10
7      11

$\bar{x} = 4.4$ & $s_X = 1.98$; $\bar{y} = 7.6$ & $s_Y = 3.21$

Sample covariance (dependent on unit of measurement):

$cov(X, Y) = s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{5-1} \times 24.8 = 6.2$

Sample correlation (independent of unit of measurement):

$r_{XY} = \frac{cov(X, Y)}{s_X s_Y} = \frac{6.2}{1.98 \times 3.21} \approx 0.98$

Theorem: $-1 \le r_{XY} \le 1$
i       $x_i - \bar{x}$   $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$
1       -2.4              -4.6              11.04
2       -1.4              -1.6               2.24
3        0.1               0.4               0.04
4        1.1               2.4               2.64
5        2.6               3.4               8.84
Total                                       24.8
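The covariance and correlation of the advertising example can be recomputed from the five data pairs. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from math import sqrt

# Advertising (x, in 100,000 EUR) and sales (y, in 1,000,000 EUR).
x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of cross products of the deviations (the "Total" of the deviation table).
sum_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample covariance and sample standard deviations, all with divisor n - 1.
cov_xy = sum_products / (n - 1)
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Pearson correlation: covariance scaled by both standard deviations.
r_xy = cov_xy / (s_x * s_y)

print(f"cov = {cov_xy:.2f}, s_x = {s_x:.2f}, s_y = {s_y:.2f}, r = {r_xy:.3f}")
```

Rescaling either variable (for example expressing advertising in euros instead of 100,000 €) changes the covariance but leaves the correlation unchanged.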
Regression according to Least Squares Method

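• $y_i = \underbrace{b_0 + b_1 x_i}_{\hat{y}_i} + \underbrace{\text{residual}}_{r_i}$

so that

actual value ($y_i$) = predicted/fitted value ($\hat{y}_i$) + residual ($r_i$)

• choose $b_0$ and $b_1$ such that $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} r_i^2$ is minimal

Theorem: $b_1 = \frac{cov(X, Y)}{s_X^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$

$b_1 = \frac{6.2}{1.98^2} \approx 1.58$ & $b_0 = 7.6 - 1.58 \times 4.4 \approx 0.65$ ⇒ $\hat{y}_i = 0.65 + 1.58 x_i$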
Scatter diagram:

[Scatter plot "Sales versus Advertising": x-axis Advertising (0 to 8), y-axis Sales (0 to 12); the five data points with fitted line y = 1.5796x + 0.6497]

Forecast ($x = 6$): $\hat{y} = 0.65 + 1.58 \times 6 = 10.13$
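The least-squares line and the forecast can be reproduced from the data with the formulas $b_1 = cov(X, Y) / s_X^2$ and $b_0 = \bar{y} - b_1 \bar{x}$. The sketch below is an illustration with hypothetical names (`predict`, `residuals`); it keeps full precision, so the coefficients match the fitted line printed in the scatter diagram.

```python
# Advertising (x, in 100,000 EUR) and sales (y, in 1,000,000 EUR).
x = [2, 3, 4.5, 5.5, 7]
y = [3, 6, 8, 10, 11]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and sample variance of x (both with divisor n - 1).
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Least-squares slope and intercept.
b1 = cov_xy / var_x
b0 = y_bar - b1 * x_bar


def predict(x_new):
    """Fitted value on the least-squares line."""
    return b0 + b1 * x_new


# Residuals r_i = y_i - y_hat_i; with an intercept they sum to zero.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

print(f"y_hat = {b0:.4f} + {b1:.4f} x")
print(f"forecast at x = 6: {predict(6):.2f}")
```

The forecast at x = 6 corresponds to advertising of 600,000 € and predicted sales of about 10.13 million €; since x = 6 lies inside the observed range of 2 to 7, this is an interpolation rather than an extrapolation.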
