
Lecture 2 Slides With Q&A 20242025

Lecture 2 covers descriptive statistics, including measures of location and spread, percentiles, and the relationship between two variables. It introduces principles of probability, including basic concepts, events, and axioms of probability theory. The lecture also discusses covariance, correlation coefficients, and methods for analyzing grouped data.



Slides Probability Theory and Statistics 1

Lecture 2

Joris Marée
Contents Lecture 2:

Descriptive statistics (Ch. 1 continued):


• Recap measures of location and spread
• Grouped data (& measures of location and spread)
• Percentiles, IQR & Boxplot
• Linear relation between two variables
• Covariance and correlation coefficient
• (The geometric mean)

Principles of Probability (§2.1 - §2.4):


• Introduction & basic concepts
• Events/sets (union & intersection)
• Probability definitions
• Axioms of Probability Theory (Kolmogorov)
• Elementary rules of probability
Sample $(x_1, \dots, x_n)$ vs. population $(x_1, \dots, x_N)$:

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$    Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, or equivalently $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$

Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, or $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$

Sample standard deviation: $s = \sqrt{s^2}$    Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample coefficient of variation: $cv = \frac{s}{\bar{x}}$    Population coefficient of variation: $CV = \frac{\sigma}{\mu}$
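As a quick check of the sample-side formulas, here is a minimal plain-Python sketch (standard library only; the helper name `sample_stats` is ours, not from the course). Both variance formulas are computed to show they agree:

```python
import math

def sample_stats(xs):
    """Sample mean, variance (n-1 denominator), standard deviation, coefficient of variation."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(var)
    return mean, var, sd, sd / mean

# Data from the spread example on a later slide:
data = [4, 11, 16, 22, 27, 33, 60, 78, 121, 268]
mean, var, sd, cv = sample_stats(data)

# The alternative formula s^2 = (sum(x_i^2) - n*xbar^2) / (n-1) gives the same value:
var_alt = (sum(x * x for x in data) - len(data) * mean ** 2) / (len(data) - 1)
```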

Properties of the sample mean:

• The sum of deviations around the mean is always zero: $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$
• It can be thought of as the value $a$ that minimises the sum of squared deviations: $\bar{x} = \arg\min_a \sum_{i=1}^{n}(x_i - a)^2$

MEASURES OF SPREAD/DISPERSION (sample):

Ex.: 4 11 16 22 27 33 60 78 121 268

Range: R = largest minus smallest observation

Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$  ("mean quadratic deviation")

Alternative formula: $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$

Standard deviation: $s = \sqrt{s^2}$

Coefficient of variation (unit-less): $cv = \frac{s}{\bar{x}}$

Interpretation of the standard deviation

Empirical rule of thumb: if the histogram is bell-shaped, then:
• $(\bar{x} - s,\ \bar{x} + s)$ contains roughly two-thirds (about 68%) of the observations
• $(\bar{x} - 2s,\ \bar{x} + 2s)$ contains about 95% of the observations
• $(\bar{x} - 3s,\ \bar{x} + 3s)$ contains almost all (about 99.7%) of the observations

Grouped data (example): weights

Class limits | fi
50-60 | 10
60-70 | 38
70-80 | 72
80-90 | 48
90-100 | 26
100-110 | 6

So how do we determine the sample mean and sample variance now? With k classes, class midpoints $m_i$ and class frequencies $f_i$:

$\bar{x} \approx \frac{1}{n}\sum_{i=1}^{k} f_i m_i$  and  $s^2 \approx \frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{x})^2$

Grouped data: example, car insurance

Class | premium (in €1000) | frequency fi | midpoint mi
1 | 0-1 | 8 | 0.5
2 | 1-2 | 9 | 1.5
3 | 2-3 | 3 | 2.5

k = 3 classes, n = 20.

$\bar{x} \approx \frac{1}{n}\sum_{i=1}^{k} m_i f_i = \frac{1}{20}(0.5 \cdot 8 + 1.5 \cdot 9 + 2.5 \cdot 3) = 1.25$  (× €1000)

$s^2 \approx \frac{1}{n-1}\sum_{i=1}^{k}(m_i - \bar{x})^2 f_i = \frac{1}{19}\left[(0.5-1.25)^2 \cdot 8 + (1.5-1.25)^2 \cdot 9 + (2.5-1.25)^2 \cdot 3\right] \approx 0.513$  (× €1000)²
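The grouped-data approximation can be sketched the same way; `grouped_stats` is an illustrative helper name, applied here to the car-insurance example from the slide:

```python
def grouped_stats(midpoints, freqs):
    """Approximate sample mean and variance from grouped data
    (class midpoints m_i and class frequencies f_i)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
    return mean, var

# Car insurance example: k = 3 classes, n = 20
mean_g, var_g = grouped_stats([0.5, 1.5, 2.5], [8, 9, 3])
```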

PERCENTILES AND BOXPLOT

p-th percentile = value such that (i) at most p% of the data is smaller and (ii) at most (100-p)% of the data is greater.

Location: $L_p = (n+1)\cdot\frac{p}{100}$

Interquartile range (IQR) = $Q_3 - Q_1$

Interpretation: the length of the interval that contains the central 50% of the observations (a measure of spread which is less sensitive to extreme values than R or s).
MEASURES OF SPREAD/DISPERSION (sample):
Ex.: 4 11 16 22 27 33 60 78 121 268

Outlier: an extraordinarily large or small observation (different criteria are possible; a common one: observations smaller than $Q_1 - 1.5 \cdot IQR$ or greater than $Q_3 + 1.5 \cdot IQR$).

Boxplot: graphical presentation showing the quartiles and the total range of values, incl. outliers:

1. Box: from $Q_1$ to $Q_3$, with a line at $Q_2$
2. Outliers are depicted by * or o
3. Lines (whiskers) extend from the smallest to the largest observation that is not an outlier

[Boxplot of the example data, scale 0-200: outlier marked at 268, whiskers run from 4 to 121]

Ex.: 25 test marks (0-100) with $\bar{x} = 47.72$ (sorted column-wise):

23 34 42 52 58
27 37 42 53 63
30 39 42 55 66
33 40 48 57 77
33 40 48 58 96

$Q_1$ = 35.5  [$L_{25}$ = 26·25/100 = 6.5, so 34 + 0.5·(37 - 34)]
$Q_2$ = 42    [$L_{50}$ = 26·50/100 = 13, the 13th observation]
$Q_3$ = 57.5  [$L_{75}$ = 26·75/100 = 19.5, so 57 + 0.5·(58 - 57)]

IQR = 57.5 - 35.5 = 22 (compare: s ≈ 16.5)

Fences: 35.5 - 1.5·22 = 2.5 and 57.5 + 1.5·22 = 90.5, so 96 is an outlier.

[Boxplot of the marks, scale 0-120]
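The location rule $L_p = (n+1)\cdot p/100$ with linear interpolation can be sketched as follows. The helper name `percentile` is ours; note that NumPy's default percentile method uses a different location rule, so results may differ slightly:

```python
def percentile(sorted_xs, p):
    """p-th percentile using the L_p = (n+1)*p/100 location rule
    with linear interpolation between neighbouring observations."""
    n = len(sorted_xs)
    loc = (n + 1) * p / 100
    k = int(loc)          # integer part of the location
    frac = loc - k        # fractional part used for interpolation
    if k < 1:
        return sorted_xs[0]
    if k >= n:
        return sorted_xs[-1]
    return sorted_xs[k - 1] + frac * (sorted_xs[k] - sorted_xs[k - 1])

marks = sorted([23, 34, 42, 52, 58, 27, 37, 42, 53, 63, 30, 39, 42, 55, 66,
                33, 40, 48, 57, 77, 33, 40, 48, 58, 96])
q1, q2, q3 = percentile(marks, 25), percentile(marks, 50), percentile(marks, 75)
iqr = q3 - q1
```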

LINEAR RELATION BETWEEN TWO VARIABLES

Ex.: effectiveness of advertising.
X = ad expenditure (in €1000); Y = sales (in €100000); i = company

Sample:

i | xi | yi
1 | 13 | 10
2 | 18 | 18
3 | 22 | 31
4 | 32 | 20
5 | 40 | 36

[Scatter plot: sales (y-axis) against advertising expenditure (x-axis)]

A scatter plot visualises the relationship between X and Y:
- linear or non-linear / positive or negative / weak or strong
- spread in X and Y / outliers / other peculiarities

Covariance (sample): $s_{XY} = \mathrm{cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$

i | xi | yi | xi - x̄ | yi - ȳ | (xi - x̄)(yi - ȳ)
1 | 13 | 10 | -12 | -13 | +156
2 | 18 | 18 | -7 | -5 | +35
3 | 22 | 31 | -3 | 8 | -24
4 | 32 | 20 | 7 | -3 | -21
5 | 40 | 36 | 15 | 13 | +195
Tot. | 125 | 115 | 0 | 0 | +341

$\bar{x} = \frac{125}{5} = 25$, $\bar{y} = \frac{115}{5} = 23$

Ex.: $s_{XY} = \mathrm{cov}(X,Y) = \frac{1}{4} \cdot 341 = 85.25$

The sign of the covariance only tells us whether the linear relation is positive or negative. Its size tells nothing about the strength of the relation!

Coefficient of correlation (sample):

$r_{XY} = \frac{\mathrm{cov}(X,Y)}{s_X s_Y} = \frac{s_{XY}}{s_X s_Y}$

Always: $-1 \le r_{XY} \le 1$

Advertisement example: $s_X^2 = 119$, $s_Y^2 = 109$ (use one of the variance formulas above), $s_{XY} = 85.25$.

Then: $r_{XY} = \frac{\mathrm{cov}(X,Y)}{s_X s_Y} = \frac{85.25}{\sqrt{119}\cdot\sqrt{109}} \approx 0.75$

This coefficient is normalised and independent of the units of measurement. It expresses both the direction and the strength of the linear relation. Note: it does not say anything about the slope in the scatter plot.
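Covariance and correlation from the last two slides can be sketched in a few lines; `cov_corr` is an illustrative helper name, applied to the advertising sample:

```python
import math

def cov_corr(xs, ys):
    """Sample covariance s_XY and correlation coefficient r_XY (n-1 denominators)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sxy, sxy / (sx * sy)

# Advertising example from the slides:
sxy, r = cov_corr([13, 18, 22, 32, 40], [10, 18, 31, 20, 36])
```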

[Six example scatter plots:
- r = 1.0: perfect positive linear relation
- r = 0.5: positive relation
- r = 0.9: strong positive relation
- r = -0.9: strong negative relation
- r = 0.0: no relation
- r = 0.0: no linear relation (a clear pattern, but not a linear one)]

If the complete population is observed, we can determine the population covariance and the population coefficient of correlation:

Covariance (pop.): $\sigma_{XY} = \mathrm{COV}(X,Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_X)(y_i - \mu_Y)$

Coefficient of correlation (pop.): $\rho_{XY} = \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$, with $-1 \le \rho_{XY} \le 1$

Sometimes another measure for the "mean" is needed.

Example: initial amount $100. First year: return 80%. Second year: return -10%. The amount grows to $100 · 1.80 · 0.90 = $162, which corresponds to a constant yearly return of $\sqrt{1.80 \cdot 0.90} - 1 \approx 27.3\%$, not to the arithmetic mean of 35%: the geometric mean!
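A minimal sketch of the geometric mean return (the function name is ours), applied to the $100 example:

```python
def geometric_mean_return(returns):
    """Constant per-period return equivalent to a sequence of returns (as fractions)."""
    growth = 1.0
    for r in returns:
        growth *= (1 + r)                  # total growth factor over all periods
    return growth ** (1 / len(returns)) - 1

g = geometric_mean_return([0.80, -0.10])   # 80% then -10%
final = 100 * (1 + g) ** 2                 # same final amount as 100 * 1.80 * 0.90
```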
Comparison of mean and median

[Three histograms of relative frequencies:]
- symmetric: $\bar{x} = M$
- skewed to the right (positively skewed): $\bar{x} > M$
- skewed to the left (negatively skewed): $\bar{x} < M$

PROBABILITY THEORY

Population → (probability theory) → Sample
Population ← (inferential statistics) ← Sample

Basic concepts:
- Random experiment: an experiment with an uncertain outcome
- Sample space S: the set of all possible, mutually disjoint, outcomes; either finite (notation S = {e1, …, ek} with k possible outcomes) or infinite
- Elementary event: a single outcome
- Event: a subset of the sample space (notation A, B, etc.)

Examples (1. experiment, 2. sample space, 3. elementary events, 4. event):

i. 1. Toss a coin
   2. S = {H, T}
   3. E1 = {H}, E2 = {T}
   4. A = "Tails" = {T}

ii. 1. Toss 2 coins
   2. S = {HH, HT, TH, TT}
   3. {HH}, {HT}, {TH}, {TT}
   4. A = "at least 1× Heads" = {HH, HT, TH}

iii. 1. Throw a die
   2. S = {1, 2, 3, 4, 5, 6}
   3. {1}, …, {6}
   4. A = "even number of dots" = {2, 4, 6}

iv. 1. Age (in years)
   2. S = {0, 1, 2, …}
   3. {0}, {1}, {2}, …
   4. A = "Grown up" = {18, 19, 20, …}

Venn diagram: a rectangle represents the sample space S, while other shapes represent events:

- A and B share no outcomes: A and B are called disjoint, or mutually exclusive; their intersection is empty. E.g. draw a card from a deck of cards: A: diamonds (♦), B: spades (♠).
- A and B do share outcomes: their intersection "A ∩ B" contains all outcomes which are both in A and in B. E.g. draw a card from a deck of cards: A: queen, B: diamonds.

The union "A ∪ B" contains all outcomes belonging to A, to B, or to both. In the diagram, A ∪ B is the totality of all coloured areas.

The union "A ∪ B" contains all outcomes belonging to A, to B, or to both. E.g. draw a card from a deck of cards: A: queen, B: diamonds.

$A^c = \bar{A}$ = "not A" = complement of A (everything in S outside A).

PROBABILITY DEFINITIONS

Event A occurs if the outcome belongs to set A; P(A) = probability of A occurring.

1. Classical definition (Laplace): $P(A) = \frac{\#\text{ elements in } A}{\#\text{ elements in } S}$

Ex. fair die: P("2 dots") = 1/6. Only valid if all outcomes are equally likely!

2. Relative frequency definition: $P(A) = \lim_{n \to \infty}\frac{\#\text{ times } A \text{ occurs}}{\#\text{ experiments } (= n)}$

Ex. toss a coin 10000 (= n) times → 5067 H(eads): P("Head") ≈ 5067/10000 = 0.5067
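The relative frequency definition can be illustrated with a small simulation using only the standard library (the helper name `relative_frequency` is ours); the estimate approaches 0.5 as n grows:

```python
import random

def relative_frequency(event, experiment, n):
    """Estimate P(event) by the relative frequency over n repetitions of an experiment."""
    hits = sum(1 for _ in range(n) if event(experiment()))
    return hits / n

random.seed(42)  # fixed seed so the run is reproducible
p_head = relative_frequency(lambda outcome: outcome == "H",
                            lambda: random.choice(["H", "T"]),
                            100_000)
```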

AXIOMS OF PROBABILITY THEORY (KOLMOGOROV, 1933)

See P(·) as a function ("probability set function", or just "probability") assigning to each subset A of the sample space S (belonging to some random experiment) a real value, such that:

1. P(A) ≥ 0 for any event A.
2. P(S) = 1 (the probability that an outcome falls within S is 1).
3. If A1, A2, … is any sequence of mutually exclusive events, then:
   $P(A_1 \cup A_2 \cup A_3 \cup \dots) = \sum_{i=1}^{\infty} P(A_i)$

ELEMENTARY RULES OF PROBABILITY

► $P(\varnothing) = 0$

► If A and B are disjoint, then $P(A \cup B) = P(A) + P(B)$.

► In general: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ [addition rule].

► $P(A) = 1 - P(A^c)$ [complement rule], where $A^c = \bar{A}$ = "not A" = complement of A.

E.g. P("odd number of dots if you throw a fair die") = 1 - P("even") = 1 - 3/6 = 1/2.

Addition rule example. Experiment: draw one card from a deck of 52 cards (2-10, J/Q/K/A in each of 4 suits, so 13 cards per suit): A: queen, B: diamonds.

$P(A) = \frac{4}{52} = \frac{1}{13}$,  $P(B) = \frac{13}{52} = \frac{1}{4}$,  $P(A \cap B) = \frac{1}{52}$ (only one of the 52 cards is the queen of diamonds),

so $P(A \cup B) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$.
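The card computation can be verified by brute-force enumeration of a 52-card deck; all names below are illustrative:

```python
from fractions import Fraction

# Build a 52-card deck and verify the addition rule for A: queen, B: diamonds.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]

def p(event):
    """Classical (Laplace) probability: favourable outcomes over total outcomes."""
    return Fraction(sum(1 for c in deck if event(c)), len(deck))

A = lambda card: card[0] == "Q"            # queen
B = lambda card: card[1] == "diamonds"     # diamonds
p_union = p(lambda c: A(c) or B(c))
```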
► $P(C) = 1 - P(C^c)$ [complement rule]

Ex. Experiment: draw one card from a deck. Event C: the card is no queen and no diamond. Then $C^c$ = "queen or diamond", so $P(C^c) = \frac{4}{13}$ (see previous slide) and

$P(C) = 1 - P(C^c) = 1 - \frac{4}{13} = \frac{9}{13}$

Ex. Experiment: throw a die 3 times. Find the probability that at least one six is thrown:

P(at least 1 six) = 1 - P(no six) = 1 - (5/6)·(5/6)·(5/6) = 1 - 125/216 = 91/216
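Likewise, the dice result can be checked by enumerating all 6³ = 216 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

# Verify P(at least one six in 3 throws) = 1 - (5/6)^3 = 91/216 by full enumeration.
outcomes = list(product(range(1, 7), repeat=3))       # all 216 triples of die faces
favourable = sum(1 for o in outcomes if 6 in o)       # triples containing at least one six
p_at_least_one_six = Fraction(favourable, len(outcomes))
```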



Extra: De Morgan's laws:

$(A \cup B)^c = A^c \cap B^c$  and  $(A \cap B)^c = A^c \cup B^c$

[Two Venn diagrams illustrating the laws]
Next lecture

More on Probability Theory:
Conditional Probabilities & Independence (Skills)
