ST Topic 1

The document provides an introduction to statistics, covering organizational issues, basic concepts, descriptive statistics, probability, and random variables. It outlines evaluation methods, literature references, and various statistical methods including descriptive and inferential statistics. Key topics include types of variables, population vs sample, parameter vs statistic, and graphical representations of data.


Statistics

Topic 1: Introduction

Based on documents by Audra Virbickaitė and Jesús María Pinar Pérez


Organizational issues

1 Organizational issues

2 Statistics: basic concepts

3 Descriptive Statistics

4 Probability

5 Random variables
Discrete random variables
Continuous random variables

6 Probability Distributions
Bernoulli and Binomial Distribution
Normal distribution: approximations
Poisson Distribution

7 Continuous probability distributions


Uniform distribution
Exponential distribution

Statistics 2 / 129
Organizational issues

Organizational issues

E-mail: [email protected]
Office hours: Tuesdays and Wednesdays 11:00 – 13:00 (send an email
beforehand!)
Evaluation:
Continuous evaluation 40%
- 20% Midterm date: 1-4. Topics 1 to 5.
- 10% R practices 12/2 - 26/2 - 26/3- 6/5.
- 10 % Exercises in class 12/2 - 26/2 - 9/4 - 22/4 - 30/4.
Final exam 60% 19-5-2025 9am
- Covers all topics
- Minimum grade is 5!

Statistics 3 / 129
Organizational issues

Literature

Newbold, Paul. Statistics for business and economics. Pearson.


Peña, Daniel. Fundamentos de estadística. Alianza editorial.
Kerns, G.J. Introduction to probability and statistics using R. Lulu.com.
Levin, Richard I., and David S. Rubin. Statistics for management. Pearson.

Statistics 4 / 129
Statistics: basic concepts

What is Statistics?

Statistics is the science that allows us to analyze a set of observed data in order
to learn about the characteristics of the real phenomenon that generated them.
- If we manage to understand the phenomenon that generates the data, we will
be able to anticipate what the next observations will look like, and we may
learn to provoke or avoid them.
- It can then be said that statistics is a learning tool based on observation,
since it helps us draw generalizable conclusions from a set of observed data.

Statistics 6 / 129
Statistics: basic concepts

Statistical method/branches of statistics

Descriptive: study of a set of data whose conclusions or results are limited to
that set. It deals essentially with the methods of data collection, description,
visualization and summary; data can be summarized numerically or graphically.
It uses the deductive method.
Inferential: study of a data set whose conclusions or results can be
generalized beyond this data set. It uses the inductive method and is
dedicated to the generation of models, inferences, estimates and predictions
associated with the phenomena in question, taking into account the
uncertainty in the observations.

Statistics 7 / 129
Statistics: basic concepts

Deterministic vs random

Deterministic: based on natural laws. A phenomenon is deterministic when it
has only one possible response.
- Deterministic experiment: repeating it under identical conditions yields the
same result.
Random: based on the laws of probability. A phenomenon is random when,
even knowing the possibilities that may arise, it is not possible to be sure
what the final result will be.
- Random experiment: it is not possible to predict the result.

Statistics 8 / 129
Statistics: basic concepts

Types of variables

Qualitative (categorical)
- Nominal. Gender, country of birth, hair color...
- Ordinal. Level of education, satisfaction level, a place in a race...
Quantitative (numerical)
- Discrete. Number of kids, number of students enrolled in a class, number of
Microsoft stocks in an investor’s portfolio...
- Continuous. Height, weight, time...

Statistics 9 / 129
Statistics: basic concepts

Types of variables: examples

- What is your body temperature in degrees Fahrenheit /Celsius?


- How old are you (exactly / in years)?
- Do you have a car?
- What brand is your (your family’s) car?
- How satisfied are you with the service? (very, medium, not satisfied)
- How much does it cost?
- How many stars does the hotel have?
- What is your profession?
- What is your level of English? (beginner, intermediate, fluent, native)

Statistics 10 / 129
Statistics: basic concepts

Population vs sample

A population is the complete set of all items that interest an investigator.

Population size, N, can be very large or even infinite.


A sample is an observed subset (or portion) of a population with sample size
given by n.

Our eventual aim is to make statements about the population based on a sample
(a sample needs to be representative).

Statistics 11 / 129
Statistics: basic concepts

Parameter vs statistic

An experiment is any procedure for obtaining data, given experimental


conditions.

Statistics 12 / 129
Statistics: basic concepts

Parameter vs statistic

An experiment is any procedure for obtaining data, given experimental


conditions.
A parameter is a numerical measure that describes a specific characteristic of
a population.

Statistics 12 / 129
Statistics: basic concepts

Parameter vs statistic

An experiment is any procedure for obtaining data, given experimental


conditions.
A parameter is a numerical measure that describes a specific characteristic of
a population.
A statistic is a numerical measure that describes a specific characteristic of a
sample.

Statistics 12 / 129
Statistics: basic concepts

Parameter vs statistic

An experiment is any procedure for obtaining data, given experimental


conditions.
A parameter is a numerical measure that describes a specific characteristic of
a population.
A statistic is a numerical measure that describes a specific characteristic of a
sample.

We form opinions and make decisions about a population parameter (fixed, but
unobservable) based on a sample statistic (observable, but random).

Statistics 12 / 129
Descriptive Statistics

Frequency distribution table

For example, the following dataset contains Life Expectancy at birth for 200
countries: 1 = short (< 70), 2 = medium (70 − 75), and 3 = long (> 75)

Statistics 14 / 129
Descriptive Statistics

Frequency distribution table

A frequency distribution is a table used to organize data. It can be used for all
types of data.
xi ni fi Ni Fi

x1 n1 f1 N1 F1
... ... ... ... ...
xl nl fl Nl Fl

ni is called the absolute frequency
fi is called the relative frequency
Ni is called the cumulative (accumulated) absolute frequency
Fi is called the cumulative (accumulated) relative frequency

Statistics 15 / 129
Descriptive Statistics

Frequency distribution: example

Countries dataset contains Life Expectancy at birth for 200 countries: 1 = short
(< 70), 2 = medium (70-75), and 3 = long (> 75).

Life Expectancy ni fi Ni Fi
Short n1 = 58
Medium n2 = 52
Long n3 = 90
Total

Statistics 16 / 129
Descriptive Statistics

Frequency table: example


Example 2

15 people were asked how many times they had gone to the cinema in the last year.
The data collected were as follows: 2, 1, 0, 4, 3, 2, 1, 1, 3, 2, 3, 0, 2, 2, 1.

xi ni fi Ni Fi
0 2
1 4
2 5
3 3
4 1
Total
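As a check on the table above, all four frequency columns can be computed directly. A minimal Python sketch follows (the course's R practices would use table() and cumsum() analogously):

```python
from collections import Counter

# Cinema visits reported by the 15 respondents
data = [2, 1, 0, 4, 3, 2, 1, 1, 3, 2, 3, 0, 2, 2, 1]
n = len(data)

counts = Counter(data)  # absolute frequencies n_i
table = []
N_i = 0                 # running cumulative absolute frequency
for x in sorted(counts):
    n_i = counts[x]
    N_i += n_i
    # f_i = n_i / n and F_i = N_i / n are the relative frequencies
    table.append((x, n_i, round(n_i / n, 3), N_i, round(N_i / n, 3)))

for row in table:
    print(row)
```

The last row must always satisfy Ni = n and Fi = 1.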

Statistics 17 / 129
Descriptive Statistics

Graphical representation

The use of graphics is essential both in a first part of a statistical analysis (when we
are describing our data set) and in the results part (when we want to efficiently
show our main results or findings).
Graphs present the principal characteristics of the data at a glance.
Graphs are easy to read and understand, even non-specialists can interpret
statistical information presented in graphs.
The choice of the type of the graph depends mainly on two things: (i) type
of data (qualitative vs quantitative) and (ii) the objective or goal of
transmitting certain information.
It is extremely easy to lie (misinform) with graphs (either on purpose or by
mistake).

Statistics 18 / 129
Descriptive Statistics

Bar chart
The bar chart is just a visual representation of the frequency table. In a bar chart
the height of a rectangle represents each frequency (either absolute or relative).
There is no need for the bars to touch. Used for nominal, ordinal and discrete
data.

Statistics 19 / 129
Descriptive Statistics

Pie chart

The pie chart is another visual representation of the frequency table. In a pie chart
the whole pie represents 100% and each slice represents the percentage in each
category. Used for nominal, ordinal and discrete data.

Pie charts are often used in the news and newspapers; however, they present
certain disadvantages compared to the bar chart.

Statistics 20 / 129
Descriptive Statistics

Histogram

Continuous data, grouped into intervals, can be represented using histograms
(similar to a bar chart, only without any spaces between the bars).
The area of each rectangle has to be proportional to the absolute frequency.

Statistics 21 / 129
Descriptive Statistics

Box-plot

Quantitative data (discrete or continuous) is also usually represented using a
box-and-whisker plot.

(Figure: box plots of two example variables, “Page views” on a 500–2000 scale and “Grade” on a 0–8 scale.)

Statistics 22 / 129
Descriptive Statistics

Numerical summary statistics

We already saw how to summarize data (via frequency distribution tables) and
present a first impression of the data (graphical representation).
We can also communicate the most important information using numerical
summary statistics.
These quantities, calculated from the sample, are estimators of the
population parameters.

Statistics 23 / 129
Descriptive Statistics

Numerical summary statistics

Numerical summary statistics can be grouped into:
Measures of location
Central location: mean, median, mode
Non-central location: quantiles
Measures of dispersion (a.k.a. variability)
Range
IQR
Variance, standard deviation
Coefficient of variation
Measures of shape
Skewness
Kurtosis

Statistics 24 / 129
Descriptive Statistics

Arithmetic Mean
The arithmetic mean (or simply mean) of a set of data is the sum of the data
values divided by the number of observations.
Population mean (if we have the entire population, of size N, available):

µ = (x1 + x2 + . . . + xN)/N

where µ is the parameter.


Given n observations x1, x2, . . . , xn in a random sample, the sample mean or
average is calculated as:

x̄ = (x1 + x2 + . . . + xn)/n

Statistics 25 / 129
Descriptive Statistics

Arithmetic Mean

Arithmetic mean is the most intuitive and commonly used measure for
central tendency.
It can be applied for quantitative data only.
It has a major disadvantage: it is not robust to outliers (atypical data point).

Statistics 26 / 129
Descriptive Statistics

Properties of the arithmetic mean

Call Y = a + b · X a function of X, where a and b are some real numbers.
Then Y is a linear function of X.
If we know the mean of X, then the mean of Y is just:

ȳ = a + b · x̄

Statistics 27 / 129
Descriptive Statistics

Median

The median is the middle observation of a set of observations that are
ordered in increasing order.
If the sample size, n, is an even number, the median is the average of the two
middle observations.
The median can be found at the 0.5(n + 1)th ordered position. If that
position is fractional (ends in .5), take the mean of the two middle
observations.
In many cases the median is preferred because it is robust to outliers.
It is less sensitive than the mean to asymmetries. For that reason, in skewed
distributions, the mean is shifted further toward the tail of the distribution
than the median.

Statistics 28 / 129
Descriptive Statistics

Median: example

Consider a data set of 5 observations: number of phone calls per hour in a
call center.
30 15 50 20 30
Find the median.
Change the data set to include an outlier, 2 phone calls.
30 2 50 20 30
Find the median.
The median is robust to outliers.
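Both medians can be verified with a few lines of Python; this is a sketch of the 0.5(n + 1) rule above (R's median() behaves the same way):

```python
def median(values):
    # Middle value at ordered position 0.5 * (n + 1);
    # for even n, average the two middle observations.
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2

calls = [30, 15, 50, 20, 30]              # sorted: 15 20 30 30 50
calls_with_outlier = [30, 2, 50, 20, 30]  # sorted: 2 20 30 30 50

print(median(calls), median(calls_with_outlier))  # both are 30
```

Replacing 15 by the outlier 2 leaves the median unchanged, which is the robustness claim above.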

Statistics 29 / 129
Descriptive Statistics

Mode

The mode, if one exists, is the most frequently occurring value.


A distribution with one mode is called unimodal; with two modes, it is called
bimodal; and with more than two modes, the distribution is said to be
multimodal.
The mode is most commonly used with qualitative data.

Statistics 30 / 129
Descriptive Statistics

Mode: example

Consider a data set of 5 observations: number of phone calls per hour in a
call center.
30 15 50 20 30
Find the mode.
Change the data set to include an outlier, 2 phone calls.
30 2 50 20 30
Find the mode.
The mode is robust to outliers.

Statistics 31 / 129
Descriptive Statistics

Mean, median, mode: the shape of the distribution

Comparison of the mean, median and mode also helps to describe the shape
of the distribution.

Statistics 32 / 129
Descriptive Statistics

Mean, median mode: example

A sample of number of bottles of water sold per hour in a certain store:

72 80 67 70 63 75 75 65 82 84 85 60
Given that the sum of the 12 observations is Σ xi = 878, describe the central
tendency of the data.

Statistics 33 / 129
Descriptive Statistics

Mean, median mode: example

Mean:
x̄ = 878/12 = 73.17 bottles
Median. First sort the observations in increasing order:

60 63 65 67 70 72 75 75 80 82 84 85

Then find the 1/2(12 + 1) = 6.5th position, which is between the 6th and
the 7th: (72 + 75)/2 = 73.5
The mode is 75.
Since Mo > Me > x̄ , the distribution is skewed to the left (negatively
skewed).
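The three measures for the bottled-water sample can be reproduced with Python's statistics module (in R: mean(), median(), and a frequency table for the mode):

```python
import statistics

bottles = [72, 80, 67, 70, 63, 75, 75, 65, 82, 84, 85, 60]

mean = sum(bottles) / len(bottles)   # 878 / 12
median = statistics.median(bottles)  # average of the 6th and 7th ordered values
mode = statistics.mode(bottles)      # most frequent value

# mode (75) > median (73.5) > mean (73.17): skewed to the left
print(round(mean, 2), median, mode)
```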

Statistics 34 / 129
Descriptive Statistics

Percentiles, Deciles and Quartiles

Percentiles and quartiles are measures that indicate a value's position relative
to the entire dataset.
For example, you are told that you scored in the 92nd percentile on your
mathematics exam. It means that approximately 92% of the students who
took this exam scored the same as or lower than you, and approximately 8%
scored higher than you.

Statistics 35 / 129
Descriptive Statistics

Percentiles, Deciles and Quartiles

Percentiles separate an ordered data set into 100 parts. For example, in
number of siblings P15 = 2 means that 15% of a sample (or population)
have 2 siblings or less.
Deciles separate an ordered data set into 10 parts. For example, in income
D4 = 2000EUR means that 40% of a sample (or population) have income of
2000EUR or less.
Quartiles separate an ordered data set into 4 parts. For example, in exam
grade Q3 = 8.6 means that 75% of students have grade of 8.6 or less.

Convert deciles and quartiles to percentiles and use the percentile formula to find
the position of the r-th percentile:

k = (r/100) · (n + 1)

Statistics 36 / 129
Descriptive Statistics

Percentiles, Deciles and Quartiles: example

Consider the demand for water bottles dataset:

60 63 65 67 70 72 75 75 80 82 84 85 86

Find P15 , D7 and Q3 .

Statistics 37 / 129
Descriptive Statistics

Percentiles, Deciles and Quartiles: example


We may calculate any r-th percentile as follows:
1 Sort the data in increasing order.
2 Calculate k = (r/100) · (n + 1).
3 The element at the k-th position in the sorted data is the r-th percentile.
4 Note that the position k may well be fractional (i.e., it has us search
between two values). When this is the case, we interpolate.

P15: k = 0.15 · 14 = 2.1, so P15 = 0.9 · 63 + 0.1 · 65 = 63.2.
D7: k = 0.7 · 14 = 9.8, so D7 = 0.2 · 80 + 0.8 · 82 = 81.6.
Q3: k = 0.75 · 14 = 10.5, so Q3 = (82 + 84)/2 = 83.
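The interpolation rule is easy to implement; a Python sketch follows (R's quantile(x, probs, type = 6) uses the same (n + 1)p position):

```python
def percentile(values, r):
    # Position k = (r / 100) * (n + 1); interpolate when k is fractional.
    s = sorted(values)
    k = r / 100 * (len(s) + 1)
    i = int(k)      # 1-based position of the lower neighbour
    frac = k - i
    if frac == 0:
        return s[i - 1]
    return (1 - frac) * s[i - 1] + frac * s[i]

data = [60, 63, 65, 67, 70, 72, 75, 75, 80, 82, 84, 85, 86]
print(percentile(data, 15), percentile(data, 70), percentile(data, 75))
```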
Statistics 38 / 129
Descriptive Statistics

Box-and-Whisker Plot

A box-and-whisker plot is a graph that describes the shape of a distribution
in terms of the five-number summary: xmin, Q1, Median, Q3, xmax.
By looking at the box plot we can infer the spread, central tendency and
skewness of the data, and also discover whether there are outliers.
Outlier: unusually large or small data point, defined as any observation
outside the limits:

LL = Q1 − 1.5 × (Q3 − Q1 )
UL = Q3 + 1.5 × (Q3 − Q1 )

Statistics 39 / 129
Descriptive Statistics

Box-and-Whisker Plot

Statistics 40 / 129
Descriptive Statistics

Box-and-Whisker Plot and symmetry

By looking at the box plot we can also determine whether the data is
symmetric or not.

Statistics 41 / 129
Descriptive Statistics

Box-and-Whisker Plot: example

Gilotti’s Pizzeria has 4 locations in one large metropolitan area. Daily sales
(in hundreds of dollars) from a random sample of 10 weekdays from each of
the 4 locations are shown in the box plots below.
(Figure: side-by-side box plots of daily sales for Location1, Location2, Location3 and Location4; vertical axis 0–25.)

Which location has the largest range (spread) of sales? Largest median?

Statistics 42 / 129
Descriptive Statistics

Variability

What is variability?
Sample A: 1 2 3 34
Sample B: 8 9 10 13

Sample means are the same x̄A = x̄B = 10, but is the spread (a.k.a. variability,
dispersion, variation) the same?

Statistics 43 / 129
Descriptive Statistics

Range and IQR

Range is the difference between the largest and smallest observations. It
measures the total spread of the data:

Re = xmax − xmin

Not robust to outliers; also affected by the units of measurement.


The interquartile range (IQR) measures the spread in the middle 50% of the
data:

IQR = Q3 − Q1 ≡ P75 − P25

Robust to outliers, though still affected by the units of measurement.


If IQR increases, the dispersion also increases and the concentration of data
around the mean decreases.

Statistics 44 / 129
Descriptive Statistics

Range and IQR: example

Consider the demand for water bottles dataset:

60 63 65 67 70 72 75 75 80 82 84 85 86

xmin = 60, xmax = 86.


Q1 = (65 + 67)/2 = 66.
Q3 = (82 + 84)/2 = 83.

Range: Re = 86 − 60 = 26.
Interquartile range: IQR = 83 − 66 = 17.
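A Python sketch that reproduces these numbers and also computes the box-plot outlier fences LL and UL from the earlier slide (assuming the quartile positions 0.25(n + 1) and 0.75(n + 1), as used above):

```python
data = [60, 63, 65, 67, 70, 72, 75, 75, 80, 82, 84, 85, 86]
s = sorted(data)

def at_position(p):
    # Value at ordered position k = p * (n + 1), interpolating if fractional
    k = p * (len(s) + 1)
    i = int(k)
    frac = k - i
    return s[i - 1] if frac == 0 else (1 - frac) * s[i - 1] + frac * s[i]

q1, q3 = at_position(0.25), at_position(0.75)  # 66.0 and 83.0
data_range = s[-1] - s[0]                      # Re = x_max - x_min
iqr = q3 - q1                                  # IQR = Q3 - Q1
lower_fence = q1 - 1.5 * iqr                   # LL
upper_fence = q3 + 1.5 * iqr                   # UL

print(data_range, iqr, lower_fence, upper_fence)
```

With the outlier version of the data (36 instead of 60), the range jumps to 50 while the IQR stays at 17, and 36 < LL = 40.5 flags it as an outlier.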

Statistics 45 / 129
Descriptive Statistics

Range and IQR: example with outliers

Consider the demand for water bottles dataset:

36 63 65 67 70 72 75 75 80 82 84 85 86

xmin = 36, xmax = 86


Q1 = (65 + 67)/2 = 66
Q3 = (82 + 84)/2 = 83

Range: Re = 86 − 36 = 50
Interquartile range: IQR = 83 − 66 = 17

Statistics 46 / 129
Descriptive Statistics

Variance and Standard Deviation

Although range and interquartile range measure the spread of data, both
measures take into account only two of the data values.
We need a measure that would present an average of the distance between
each of the data values and the mean.
Such measure is called variance and the square root of variance is called
standard deviation.

Statistics 47 / 129
Descriptive Statistics

Variance and Standard Deviation


The population variance σ² is the sum of the squared differences between each
observation and the population mean, divided by the population size N:

σ² = (1/N) Σ (xi − µ)² = (1/N) Σ xi² − µ².

Given n observations x1, x2, . . . , xn in a random sample, the sample variance
(sometimes called a quasi-variance) s² is calculated as:

s² = (1/(n − 1)) Σ (xi − x̄)² = (n/(n − 1)) · σ̂²,

where σ̂² denotes the variance of the sample computed with divisor n.

The standard deviation is the square root of the variance:

σ = √σ²    s = √s²
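These formulas can be checked against the earlier Sample A and Sample B (same mean, very different spread); a minimal Python sketch (in R, var() and sd() also use the n − 1 divisor):

```python
import math

def sample_variance(xs):
    # Quasi-variance: squared deviations from the mean, divided by n - 1
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

sample_a = [1, 2, 3, 34]   # mean 10, large spread
sample_b = [8, 9, 10, 13]  # mean 10, small spread

var_a, var_b = sample_variance(sample_a), sample_variance(sample_b)
sd_a = math.sqrt(var_a)    # standard deviation: back in the original units

print(round(var_a, 2), round(var_b, 2), round(sd_a, 2))
```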
Statistics 48 / 129
Descriptive Statistics

Shape of a Distribution

There are several ways to describe the shape of a distribution.
The most commonly discussed aspects of the shape are:
- Skewness (symmetry)
- Kurtosis (tails)

Statistics 49 / 129
Descriptive Statistics

Shape of a Distribution: skewness

Skewness is a measure of the asymmetry. The symmetry (or the lack of) can be
described by:
1 Looking at a histogram or a bar chart or a box plot.
2 Comparing the mean, the median and the mode:
- If mean=median=mode the distribution is symmetric.
- If mean>median>mode the distributions is skewed to the right (positively).
- If mean<median<mode the distributions is skewed to the left (negatively).
3 Calculating the coefficient of skewness. Most commonly used is Fisher’s
(implemented in R and Excel) coefficient.
These approaches usually (but not always) lead to the same conclusion.

Statistics 50 / 129
Descriptive Statistics

Shape of a Distribution: skewness

Fisher’s coefficient of skewness:

Usually implemented in statistical packages; also used in Excel
(COEFICIENTE.ASIMETRIA) and in R (moments::skewness).

AF = Σ (xi − µ)³ / (n · σ³)

- If AF = 0, then the distribution is symmetric.


- If AF > 0, then the distribution is positively skewed (asymmetric to the right).
- If AF < 0, then the distribution is negatively skewed (asymmetric to the left).
- A coefficient of asymmetry greater than 1 in absolute value can be considered
high.
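The coefficient is a one-liner once the mean and the divisor-n standard deviation are available; a Python sketch of the same formula used by moments::skewness in R:

```python
def fisher_skewness(xs):
    # AF = sum((x - mean)^3) / (n * sigma^3), with sigma using divisor n
    n = len(xs)
    m = sum(xs) / n
    sigma = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum((x - m) ** 3 for x in xs) / (n * sigma ** 3)

print(fisher_skewness([1, 2, 3, 4, 5]))     # symmetric sample: 0
print(fisher_skewness([1, 1, 2, 2, 3, 9]))  # long right tail: positive
```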

Statistics 51 / 129
Descriptive Statistics

Shape of a Distribution: example 1

A random sample of 55 observations of variable “Page views”.

(Figure: histogram and box plot of the “Page views” sample; values range from 0 to about 2500.)

Statistics 52 / 129
Descriptive Statistics

Shape of a Distribution: example 2

A random sample of 55 observations of variable “Grade”.

(Figure: histogram and box plot of the “Grade” sample; values range from 0 to 10.)

Statistics 53 / 129
Descriptive Statistics

Shape of a Distribution: kurtosis

If we zoom in on the tails...

Thin tails (platykurtic)


Normal tails (mesokurtic)
Thick tails (leptokurtic)

Statistics 56 / 129
Descriptive Statistics

Shape of a Distribution: kurtosis

Kurtosis is a measure of "tailedness": higher kurtosis corresponds to a greater
probability of extreme events (outliers).
It is usually hard to see with the naked eye (unlike skewness). It can be found
by calculating Pearson's coefficient of kurtosis:

K = Σ (xi − µ)⁴ / (n · σ⁴)

- If K = 3, the distribution has tails that are neither thick nor thin, just like
the Normal distribution. Mesokurtic.
- If K > 3, the distribution has thick/fat tails compared to the Normal
distribution. Leptokurtic.
- If K < 3, the distribution has thin tails compared to the Normal
distribution. Platykurtic.
It is common to express the coefficient of kurtosis in terms of the so-called
excess kurtosis: K − 3.
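The same template gives Pearson's coefficient (in R: moments::kurtosis); a small sketch with a deliberately thin-tailed sample:

```python
def pearson_kurtosis(xs):
    # K = sum((x - mean)^4) / (n * sigma^4); K = 3 matches Normal-like tails
    n = len(xs)
    m = sum(xs) / n
    sigma2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / (n * sigma2 ** 2)

k = pearson_kurtosis([1, 2, 3, 4, 5])
print(k, k - 3)  # K = 1.7 < 3: platykurtic, negative excess kurtosis
```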

Statistics 57 / 129
Probability

Probability

Probability is used to describe and quantify uncertainty.
Everything in this world has some level of uncertainty.
Our objective is to understand probabilities, how they can be determined and
how they can be used.

Statistics 59 / 129
Probability

Random Experiment

A random experiment is a process leading to two or more possible outcomes,
without knowing exactly which outcome will occur.
Examples of a random experiment:
- A coin is tossed: two possible outcomes: head or tails. But we don’t know
which outcome will occur.
- The number of persons admitted to a hospital emergency room.
- The grade a student will obtain in a final exam.
- Gender of your firstborn.
- Rolling a die.

Statistics 60 / 129
Probability

Sample Space

The possible outcomes from a random experiment are called the basic
outcomes, and the set of all basic outcomes is called the sample space. We
use the symbol S to denote the sample space.
Sample spaces can be finite or infinite, formed of qualitative or quantitative
outcomes.
Examples of a sample spaces:
- A coin is tossed, S = {H, T }. Basic outcomes: H or T. The set of all basic
outcomes is the sample space S = {H, T }.
- The number of persons admitted to a hospital emergency room,
S = {0, 1, 2, 3, . . .}.
- Rolling a die, S = {1, 2, 3, 4, 5, 6}.
- Height of a student, S = (1.40, 2.20).
- Name of a student, S = {Adele, Peter , Emily , John...}.

Statistics 61 / 129
Probability

Event

An event, E, is any subset of basic outcomes (elements) from the sample
space S.
The null event represents the absence of a basic outcome and it is denoted
by ∅. It is sometimes called an empty set.
Examples of events:
- A coin is tossed, S = {H, T }, event: observe heads E = {H}. For example, if
we get H the event E occurs.
- The number of persons admitted to a hospital emergency room,
S = {0, 1, 2, 3, . . .}, event: 3 or less people are admitted to the ER,
E = {0, 1, 2, 3}. For example, if 2 people are admitted to the ER then the
event E occurs.
- Rolling a die, S = {1, 2, 3, 4, 5, 6}, event: observe an odd number
E = {1, 3, 5}.
- Rolling a die, S = {1, 2, 3, 4, 5, 6}, event: observe a seven E = ∅.

Statistics 62 / 129
Probability

Sample space/universal set (S, Ω)



Event (a subset)

Basic outcome (element)
Example:
Random experiment: Rolling a die.
Sample space, S = {1, 2, 3, 4, 5, 6}.
Event, observe a prime number,
E = {2, 3, 5}. Notation E ⊂ S.
Basic outcome, observe a 2, which is an
element of S. Notation: 2 ∈ S.

Statistics 63 / 129
Probability

Probability

Probability quantifies the uncertainty of a certain event: how likely the event
is to occur.
We consider three definitions of probability (but there are more):
I. Classical probability.
II. Relative frequency probability.
III. Subjective probability.

Statistics 64 / 129
Probability

I. Classical probability

Classical probability is the theoretical proportion of times that an event will
occur:

P(A) = N_A / N,

where N_A is the number of basic outcomes in A and N is the total number of
equally likely basic outcomes.
For example, we roll a die once. What is the probability to observe an odd
number?

P(Odd) = 3/6 = 0.5,

because S = {1, 2, 3, 4, 5, 6} (the sample space) and A = {1, 3, 5} (event A:
observe an odd number).

Statistics 65 / 129
Probability

II. Relative Frequency

The relative frequency probability is the limit of the proportion of times that
event A occurs in a large number of trials:

fr(A) = nA/n,   P(A) = lim (n→+∞) fr(A)

For example, we roll a die n = 10 times. What is the probability to observe
an odd number?
- We observed a one 2 times, a three 1 time and a five 1 time.
- Then the probability of an odd number is:

P(Odd) = (2 + 1 + 1)/10 = 0.4

What about n = 100?
- We observed a one 15 times, a three 16 times and a five 17 times:

P(Odd) = (15 + 16 + 17)/100 = 48/100 = 0.48

Statistics 66 / 129
Probability

Relative Frequency (cont.)

We can calculate what would happen if we rolled a die 1000, 10 000 and an
infinite amount of times....
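We can simulate this convergence; a Python sketch assuming a fair die via random.randint (the seed only makes the run reproducible):

```python
import random

random.seed(7)  # reproducible run; any seed shows the same pattern

def relative_freq_odd(n):
    # Proportion of odd faces in n simulated rolls of a fair die
    odd = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 1)
    return odd / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_freq_odd(n))  # approaches the classical value 0.5
```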

Statistics 67 / 129
Probability

III. Subjective Probability

Subjective probability expresses an individual's degree of belief about the
chance that an event will occur. Such probabilities are also called Bayesian
probabilities.
For example, in subjective probability setting we can say ”I am 25% sure it
will rain tomorrow”. There is no need to have a repeated experiment (as in
frequentist) or to know the exact proportion (as in classical) to attach
personal uncertainty to events that repeat only once.
Subjective probability is most intuitive and used most often in decision
making.

Statistics 68 / 129
Probability

Probability Postulates

Independently of the definition of probability, all probabilities need to satisfy
the following postulates, a.k.a. the Kolmogorov axioms:
If A is any event in the sample space, S, then

0 ≤ P(A) ≤ 1.

Let O1 and O2 be two mutually exclusive sets (that is, O1 ∩ O2 = ∅), then

P(O1 ∪ O2 ) = P(O1 ) + P(O2 ).

In particular, let A be an event in S and let Oi denote the basic outcomes of
the event A. Then,

P(A) = Σ P(Oi) = P(O1) + P(O2) + · · ·

P(S) = 1.

Statistics 69 / 129
Probability

Complement and addition Rule


Let A be an event and Ā its complement. Then the complement rule is as
follows:

P(Ā) = 1 − P(A).

Let A and B be two events. Using the addition rule of probabilities, the
probability of their union is as follows:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Statistics 70 / 129
Probability

Conditional Probability

Let A and B be two events.


The conditional probability of event A, given that event B has occurred, is
given by:

P(A|B) = P(A ∩ B) / P(B).

The conditional probability of event B, given that event A has occurred, is
given by:

P(B|A) = P(A ∩ B) / P(A).

Statistics 71 / 129
Probability

Conditional probability: example 1


We roll a die once and call event A - observe an odd number, and event B -
observe a number larger than 3. Find the probability of an odd number given that
a number larger than 3 was observed.
In other words, we know that a number larger than 3 was observed (event B
has happened), i.e. we have rolled one of {4, 5, 6}. We then need the
probability that the number rolled is also odd, i.e. that it lies in A = {1, 3, 5}.
Formally, calculate the necessary probabilities given the events A = {1, 3, 5}
and B = {4, 5, 6}.

P(A) = 1/2 P(B) = 1/2 P(A ∩ B) = 1/6,

because A ∩ B = {5}.
Then apply the rule of conditional probability:

P(A|B) = P(A ∩ B)/P(B) = (1/6)/(1/2) = 1/3.

Statistics 72 / 129
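The same conditional probability can be reproduced by enumeration. A short Python sketch (illustrative, not part of the slides, which use R):

```python
from fractions import Fraction

S = set(range(1, 7))                  # sample space of one die roll
A = {o for o in S if o % 2 == 1}      # odd number: {1, 3, 5}
B = {o for o in S if o > 3}           # larger than 3: {4, 5, 6}

def prob(event):
    return Fraction(len(event), len(S))

# P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 1/3
```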
Probability

Statistical Independence

Statistical independence is a special case for which the conditional probability


of A, given B, is the same as the unconditional probability of A:

P(A|B) = P(A) and P(B|A) = P(B)

Let A and B be two events. These events are said to be statistically


independent if and only if

P(A ∩ B) = P(A) · P(B)

In fact, A and B are statistically independent if and only if P(A|B) = P(A)


and P(B|A) = P(B) but, applying the multiplication rule, this is equivalent to

P(A ∩ B) = P(A|B) · P(B) = P(A) · P(B).

Statistics 73 / 129
Probability

Statistical Independence: example 1

In a certain country, 30% of people have blond hair. In the same country, 25% of
the population have blue eyes. Finally, 15% of the population have blond hair and
blue eyes. Are the events "a person has blond hair" and "a person has blue eyes"
statistically independent?
Call event Blond - "a person has blond hair" and event Blue - "a person has
blue eyes." Write down the given information:

P(Blond) = 0.3 P(Blue) = 0.25 P(Blond ∩ Blue) = 0.15

If the events were independent, then P(Blond ∩ Blue) = P(Blond) · P(Blue):

P(Blond) · P(Blue) = 0.3 × 0.25 = 0.075 ̸= 0.15 = P(Blond ∩ Blue)

The events are not independent.

Statistics 74 / 129
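The independence check is a one-line comparison of P(Blond) · P(Blue) against P(Blond ∩ Blue). A sketch in Python (not from the slides):

```python
# numbers from the example
p_blond, p_blue, p_both = 0.30, 0.25, 0.15

# under independence we would need P(Blond ∩ Blue) = P(Blond) · P(Blue)
product_rule = p_blond * p_blue          # 0.075
independent = abs(product_rule - p_both) < 1e-12
print(product_rule, independent)
```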
Probability

Law of Total Probability


Let A be an event and let B1 , B2 , B3 , . . . be mutually exclusive and collectively
exhaustive events. Then the law of total probability (LTP) states that

P(A) = P(A ∩ B1 ) + P(A ∩ B2 ) + P(A ∩ B3 ) + . . .


= P(A|B1 ) · P(B1 ) + P(A|B2 ) · P(B2 ) + P(A|B3 ) · P(B3 ) + . . .

Graphically: [Venn diagram of A partitioned by B1 , B2 , B3 ; figure omitted]

Statistics 75 / 129
Probability

Law of Total Probability: example 1

We have the following information about three hospitals: hospital A
accepts 60% of all the patients in the city, hospital B accepts 30% of all the
patients, and the rest of the patients are accepted by a smaller hospital C .
The cure rates for a certain disease for the hospitals are:
- 95% in hospital A,
- 90% in hospital B,
- and 80% in hospital C .

What is the (total) probability of getting cured?

Statistics 76 / 129
Probability

Law of Total Probability: example 1

Apply LTP:

P(Cured) = P(Cured ∩ A) + P(Cured ∩ B) + P(Cured ∩ C )


= P(Cured|A)P(A) + P(Cured|B)P(B) + P(Cured|C )P(C )
= 0.95 · 0.60 + 0.90 · 0.30 + 0.80 · 0.10 = 0.92

Statistics 77 / 129
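The LTP computation above is just a weighted average of the cure rates. A minimal Python sketch (not from the slides):

```python
# hospital market shares and cure rates from the example
shares = {"A": 0.60, "B": 0.30, "C": 0.10}
cure   = {"A": 0.95, "B": 0.90, "C": 0.80}

# law of total probability: P(Cured) = sum over hospitals of P(Cured|H) * P(H)
p_cured = sum(cure[h] * shares[h] for h in shares)
print(round(p_cured, 2))  # 0.92
```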
Probability

Bayes’ Theorem

Let A and B be two events. Then Bayes’ theorem states that

P(A|B) = P(B|A) · P(A) / P(B)

and

P(B|A) = P(A|B) · P(B) / P(A).

Statistics 78 / 129
Probability

Bayes’ Theorem: example 1

We know that the probability of having COVID-19 is 15%. We also know that the
probability of testing positive in a PCR test for a person who has COVID-19 is
90%, while in 2% of cases people test positive even though they don’t have
the virus (a false positive). Given that a person tested positive, what is the
probability that she has COVID-19?

Statistics 79 / 129
Probability

Bayes’ Theorem: example 1, solution

First, we give a name to each event:


P is the event of testing positive,
C is the event of having COVID.
Second, we write down all the available information:
P(C ) = 0.15, so P(C̄ ) = 1 − P(C ) = 1 − 0.15 = 0.85, that is the probability
of not having COVID.
P(P|C ) = 0.90,
P(P|C̄ ) = 0.02.
Apply Bayes’ theorem:

P(C |P) = P(P|C ) · P(C ) / P(P).

Statistics 80 / 129
Probability

Bayes’ Theorem: example 1, solution

To apply Bayes’ theorem, we need to compute P(P). We can get it by
applying the LTP.
Since the events C and C̄ are mutually exclusive and collectively exhaustive,
we have that
P(P) = P(P ∩ C ) + P(P ∩ C̄ )
= P(P|C ) · P(C ) + P(P|C̄ ) · P(C̄ )
= 0.90 · 0.15 + 0.02 · 0.85 = 0.152.
Hence
P(C |P) = P(P|C ) · P(C ) / P(P) = (0.90 · 0.15)/0.152 = 0.135/0.152 ≈ 0.89.

Statistics 81 / 129
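The whole Bayes computation fits in a few lines. A Python sketch of the COVID example (illustrative; the slides use R for such calculations):

```python
p_c = 0.15           # P(C): probability of having COVID-19
p_pos_c = 0.90       # P(P|C): true-positive rate of the test
p_pos_not_c = 0.02   # P(P|not C): false-positive rate

# law of total probability: P(P) = P(P|C)P(C) + P(P|not C)P(not C)
p_pos = p_pos_c * p_c + p_pos_not_c * (1 - p_c)

# Bayes' theorem: P(C|P) = P(P|C)P(C) / P(P)
p_c_pos = p_pos_c * p_c / p_pos
print(round(p_pos, 3), round(p_c_pos, 2))  # 0.152 0.89
```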
Random variables

1 Organizational issues

2 Statistics: basic concepts

3 Descriptive Statistics

4 Probability

5 Random variables
Discrete random variables
Continuous random variables

6 Probability Distributions
Bernoulli and Binomial Distribution
Normal distribution: approximations
Poisson Distribution

7 Continuous probability distributions


Uniform distribution
Exponential distribution

Statistics 82 / 129
Random variables

Random variable
A random variable is a function that maps from the sample space to the real
numbers.
We conduct a random experiment and after learning the outcome s in S (the
sample space of the random experiment) we calculate a number X . That is,
to each outcome s in the sample space we associate a number

X (s) = x .

Statistics 83 / 129
Random variables

For the random variable to be fully defined, we need to know two major
ingredients:
- The complete sample space (what are the possible events).
- The probabilities of each of the events. The rule of how to assign the
probabilities is sometimes called the probability model.

Statistics 84 / 129
Random variables

Random variable: example

Random experiment: toss 3 coins.


Define the random variable X - number of heads.
The possible outcomes are SX = {0, 1, 2, 3}.
A complete enumeration of the value of X for each point in the sample space
S = {HHH, HHT , HTH, THH, TTH, THT , HTT , TTT } is

s ∈ S HHH HHT HTH THH TTH THT HTT TTT


X(s) 3 2 2 2 1 1 1 0
Assuming that all eight points have the same probability, by simply counting
in the above display we see that

x ∈ SX 0 1 2 3
P(X = x ) 1/8 3/8 3/8 1/8
For example, P(X = 1) = P({HTT , THT , TTH}) = 3/8.

Statistics 85 / 129
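The enumeration in this example can be automated. A Python sketch building the PMF of X = number of heads (not from the slides):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# the 8 equally likely outcomes of tossing 3 coins
outcomes = list(product("HT", repeat=3))
heads = Counter(s.count("H") for s in outcomes)   # X(s) for each outcome s

# PMF: P(X = x) = (number of outcomes with x heads) / 8
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(heads.items())}
for x, p in pmf.items():
    print(x, p)   # probabilities 1/8, 3/8, 3/8, 1/8
```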
Random variables Discrete random variables

1 Organizational issues

2 Statistics: basic concepts

3 Descriptive Statistics

4 Probability

5 Random variables
Discrete random variables
Continuous random variables

6 Probability Distributions
Bernoulli and Binomial Distribution
Normal distribution: approximations
Poisson Distribution

7 Continuous probability distributions


Uniform distribution
Exponential distribution

Statistics 86 / 129
Random variables Discrete random variables

Discrete random variables

Random variables (which can be only quantitative, there are no qualitative random
variables) are divided into discrete and continuous. Examples of discrete random
variables:
Number of heads after tossing a coin three times (0,1,2,3).
A result of rolling a die (1,2,3,4,5,6).
Number of clients that come into the store per day.
Number of siblings.
Number of car accidents per week in an autonomous community.

Statistics 87 / 129
Random variables Discrete random variables

Probability mass function (a.k.a. probability distribution)


The probability mass function (PMF), P(x ), of a discrete random variable X repre-
sents the probability that X takes the value x , as a function of x . In other words,
it assigns probabilities to each possible outcome

P(x ) = P(X = x ), for all values of x

Distinguish between X (upper case X) and x (lower case x)! X is the random
variable; x is a given value.
Example: we can write P(X = 7) and read “what is the probability that random
variable X is equal to the value 7”?

For example, the question “what is the probability that a store has exactly 20
customers enter in the next hour?” can be addressed using the probability mass
function as follows. First, let X be a random variable that represents the number
of customers that enter the store in the next hour. Then, express the probability
as P(X = 20).

Statistics 88 / 129
Random variables Discrete random variables

Probability mass function (a.k.a. probability distribution)

The PMF can be written down in a table format. For example, consider a
random variable X - the number we observe after rolling a die once. Then
the PMF of X is given by:
x 1 2 3 4 5 6
P(X = x ) 1/6 1/6 1/6 1/6 1/6 1/6

Properties of PMF Let X be a discrete random variable with n outcomes


x1 , x2 , · · · , xn . Then, the probability mass function P(x ) of random variable
X has to satisfy the following:
- P(xi ) = P(X = xi ) for every outcome x1 , · · · , xn .
- 0 ≤ P(xi ) ≤ 1 for any value xi .
- The individual probabilities sum up to 1, that is: Σ_{i=1}^{n} P(xi ) = 1.

Statistics 89 / 129
Random variables Discrete random variables

Cumulative Probability distribution


The cumulative probability distribution, F (x ), of a random variable X ,
represents the probability that X does not exceed the value x :
F (x ) = P(X ≤ x )
The cumulative probability distribution can be written down in a table. For
example, consider a random variable X - the number we observe after rolling
a die once. Then the probability distribution function and the cumulative
probability distribution of X are given by:
x 1 2 3 4 5 6
P(x ) 1/6 1/6 1/6 1/6 1/6 1/6
F (x ) 1/6 2/6 3/6 4/6 5/6 1

Properties of CPD Let X be a discrete random variable with a cumulative


distribution function F (x ). Then:
- 0 ≤ F (x ) ≤ 1 for any value x .
- lim_{x→−∞} F (x ) = 0 and lim_{x→+∞} F (x ) = 1.
- If x1 and x2 are two numbers such that x1 < x2 , then
F (x1 ) ≤ F (x2 )
Statistics 90 / 129
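The CDF row of such a table is just the running sum of the PMF row. A Python sketch for the die example (not from the slides):

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6          # fair die: P(X = x) = 1/6 for each face

# F(x) = P(X <= x): accumulate the probabilities
cdf = list(accumulate(pmf))
for x, F in zip(values, cdf):
    print(x, F)   # running totals 1/6, 2/6, ..., up to 1
```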
Random variables Discrete random variables

Expected Value

The mean, a.k.a. expected value, E [X ], of a discrete random variable X is


defined as
E [X ] ≡ µ = Σx x · P(x )

We define the variance of a random variable as the weighted average of
the squares of its possible deviations from the mean:

V [X ] ≡ E [(X − µ)²] ≡ σ² = Σx (x − µ)² · P(x ) = Σx x² · P(x ) − µ² = E [X²] − E [X ]²

The standard deviation, σ, is the square root of the variance.

Statistics 91 / 129
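For the fair-die PMF used earlier, the mean and variance follow directly from these formulas. A Python sketch (not from the slides):

```python
from fractions import Fraction

values = range(1, 7)
p = Fraction(1, 6)                    # fair die

mu = sum(x * p for x in values)       # E[X] = sum of x * P(x)
ex2 = sum(x**2 * p for x in values)   # E[X^2]
var = ex2 - mu**2                     # V[X] = E[X^2] - E[X]^2
print(mu, var)  # 7/2 35/12
```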
Random variables Continuous random variables

1 Organizational issues

2 Statistics: basic concepts

3 Descriptive Statistics

4 Probability

5 Random variables
Discrete random variables
Continuous random variables

6 Probability Distributions
Bernoulli and Binomial Distribution
Normal distribution: approximations
Poisson Distribution

7 Continuous probability distributions


Uniform distribution
Exponential distribution

Statistics 92 / 129
Random variables Continuous random variables

Continuous random variables

In continuous random variable case, the number of possible values is infinite:


- For example, how many different heights we can find in an interval
(150, 210)cm?
- How many different numbers we can choose from an interval (5, 6)?
Therefore, in a continuous variable case it does not make sense to talk about
the probability of being equal to a certain value.
That probability will always be equal to zero, i.e.

P(X = 5.67956392) = 0.

Statistics 93 / 129
Random variables Continuous random variables

Continuous random variables


Discrete random variable → Continuous random variable
Probability (mass) function (PMF) → Probability density function (PDF)
Cumulative distribution function (CDF) → Cumulative distribution function (CDF)
Sums (Σ) → Integrals (∫)

[Figure: bar-plot PMF and step CDF for the discrete case beside smooth PDF and CDF curves for the continuous case; axis ticks omitted]

Statistics 94 / 129
Random variables Continuous random variables

Probability density function


Let X be a continuous random variable, and let x be any number lying in the
range of values for the random variable. The probability density function
(PDF) of the random variable is a function f (x ) that describes the behavior
of that variable. It can be represented graphically:
[Figure: a bell-shaped density curve; axis ticks omitted]

Statistics 95 / 129
Random variables Continuous random variables

Probability density function: properties

The probability density function (PDF) f (x ) has the following properties:


- f (x ) ≥ 0 for all values of x .
- The area under the probability density function, f (x ), over all values of the
random variable X within its range, is equal to 1:
∫_{−∞}^{+∞} f (x ) dx = 1.
- P(X = x ) = ∫_{x}^{x} f (t) dt = 0.
- P(a < X < b) = P(a ≤ X ≤ b) = ∫_{a}^{b} f (x ) dx .

Statistics 96 / 129
Random variables Continuous random variables

Cumulative distribution function

The definition of the cumulative distribution function (CDF) F (x ) of a


continuous random variable X , is the same as in the discrete variable case. It
represents the probability that X does not exceed the value x :

F (x ) = P(X ≤ x )

The CDF is obtained by "accumulating" the probabilities. Only now we do
not have the discrete probabilities, but rather the density function.
Therefore, the sum is replaced with the integral:

F (x ) = P(X ≤ x ) = ∫_{−∞}^{x} f (t) dt.

This gives us the relationship between the PDF and the CDF:

f (x ) = F ′(x ).

Statistics 97 / 129
Random variables Continuous random variables

Cumulative distribution function graphically:

[Figure: PDF (density curve) and CDF (S-shaped curve rising from 0 to 1) side by side; axis ticks omitted]

Statistics 98 / 129
Random variables Continuous random variables

Cumulative distribution function

The properties for the CDF are as follows:


- lim_{x→−∞} F (x ) = 0.
- lim_{x→∞} F (x ) = 1.
- P(a < X < b) = ∫_{a}^{b} f (x ) dx = F (b) − F (a).

- F (x ) is a non-decreasing continuous function.

Statistics 99 / 129
Random variables Continuous random variables

Mean and variance

Remember, the mean and variance for a discrete random variable are given
by:

E [X ] ≡ µ = Σx x · P(x ), V [X ] ≡ σ² = Σx x² · P(x ) − µ² = E [X²] − E [X ]²

For a continuous random variable the sum is replaced by the integral:

E [X ] ≡ µ = ∫ x · f (x ) dx , V [X ] ≡ σ² = ∫ x² · f (x ) dx − µ² = E [X²] − E [X ]²

Statistics 100 / 129


Random variables Continuous random variables

Mean and variance properties

Let α, β be two real numbers and X , Y be two independent random


variables. Then:

E [αX + βY ] = αE [X ] + βE [Y ].
V [αX + βY ] = α²V [X ] + β²V [Y ].

This can be generalized: for n independent random variables X1 , · · · , Xn and
n real numbers α1 , · · · , αn ,

E [Σ_{i=1}^{n} αi Xi ] = Σ_{i=1}^{n} αi E [Xi ]

V [Σ_{i=1}^{n} αi Xi ] = Σ_{i=1}^{n} αi² V [Xi ].

Statistics 101 / 129


Probability Distributions Bernoulli and Binomial Distribution

1 Organizational issues

2 Statistics: basic concepts

3 Descriptive Statistics

4 Probability

5 Random variables
Discrete random variables
Continuous random variables

6 Probability Distributions
Bernoulli and Binomial Distribution
Normal distribution: approximations
Poisson Distribution

7 Continuous probability distributions


Uniform distribution
Exponential distribution

Statistics 102 / 129


Probability Distributions Bernoulli and Binomial Distribution

Bernoulli Distribution

Before we get to the binomial distribution, we need to introduce Bernoulli random


variables. Consider a single experiment that has only two possible outcomes:
Success which happens with probability p; and
Failure which happens with probability q = 1 − p.

Statistics 103 / 129


Probability Distributions Bernoulli and Binomial Distribution

Bernoulli Distribution

Define a random variable X based on that single experiment:

X = 0 if the experiment failed; X = 1 if the experiment succeeded.

For the Bernoulli distribution, we have
PMF:
P(X = 0) = q = 1 − p; P(X = 1) = p.
CDF:
F (x ) = 0 for x < 0; 1 − p for 0 ≤ x < 1; 1 for x ≥ 1.

Statistics 104 / 129


Probability Distributions Bernoulli and Binomial Distribution

Bernoulli Distribution

Example
An urn contains 40 black and 10 red balls. You pick at random one ball from the
urn. Let X be the number of black balls you pick from the urn. What is the PMF
of X ?

Solution: X is a Bernoulli distributed random variable with PMF:


P(X = 0) = 10/50 = 0.2.
P(X = 1) = 40/50 = 0.8.

Statistics 105 / 129


Probability Distributions Bernoulli and Binomial Distribution

Binomial Distribution

What if we consider more than one experiment? What if, say, we picked 5 balls and
wanted to get 3 black ones? This is where binomially distributed random variables
come into play. The setup is simple:
n independent experiments/trials.
Each experiment ends in a success with probability p and a failure with
probability 1 − p;
That is, each trial is a Bernoulli random variable.
Let X be the number of successes. Then X is a binomial random variable.

Statistics 106 / 129


Probability Distributions Bernoulli and Binomial Distribution

Binomial Distribution

Call Binomial distributed random variable X - number of successes out of n


trials where the probability of success is p:

X ∼ binom(n, p).

Then the probability mass function is given by:

P(x ) = P(X = x ) = C (n, x ) · p^x · (1 − p)^{n−x} , where C (n, x ) = n!/(x !(n − x )!)

The mean and variance are:

E [X ] = np and V [X ] = np(1 − p)

Statistics 107 / 129


Probability Distributions Bernoulli and Binomial Distribution

Binomial distribution: example


Suppose the probability of making a sale is 0.4: P(sale) = p = 0.4. The sales rep
contacts four potential customers. What are the probabilities of making 0, 1, 2, 3, 4
sales?
Call r.v. X - number of sales out of n trials where the probability of success is
p. Then X ∼ binom(n = 4, p = 0.4) and

P(x ) = C (4, x ) · (0.4)^x · (0.6)^{4−x}

Find the probabilities (PMF):
P(X = 0) = (4!/(0!4!)) · 0.4^0 · 0.6^4 = 0.6^4 = 0.1296.
P(X = 1) = (4!/(1!3!)) · 0.4^1 · 0.6^3 = 4 · 0.4 · 0.6^3 = 0.3456.
P(X = 2) = (4!/(2!2!)) · 0.4^2 · 0.6^2 = 6 · 0.4^2 · 0.6^2 = 0.3456.
P(X = 3) = (4!/(3!1!)) · 0.4^3 · 0.6^1 = 4 · 0.4^3 · 0.6 = 0.1536.
P(X = 4) = (4!/(4!0!)) · 0.4^4 · 0.6^0 = 0.4^4 = 0.0256.

Statistics 108 / 129
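The five probabilities can be generated in one go with math.comb. A Python sketch mirroring the slide's dbinom-style computation (illustrative, not from the slides):

```python
from math import comb

n, p = 4, 0.4   # four customers, P(sale) = 0.4

# binomial PMF: P(X = x) = C(n, x) * p^x * (1-p)^(n-x)
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
print([round(v, 4) for v in pmf])  # [0.1296, 0.3456, 0.3456, 0.1536, 0.0256]
```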


Probability Distributions Bernoulli and Binomial Distribution

Binomial distribution: example (cont.)

We have calculated:

x 0 1 2 3 4
P(x ) 0.1296 0.3456 0.3456 0.1536 0.0256
F (x ) 0.1296 0.4752 0.8208 0.9744 1.0000

Note that P(0) + P(1) + P(2) + P(3) + P(4) = 1.


What is the probability of making at most one sale?
P(X ≤ 1) = F (1) = P(0) + P(1) = 0.1296 + 0.3456 = 0.4752.
What is the probability of making between two and four sales (inclusive)?
P(2 ≤ X ≤ 4) = P(2) + P(3) + P(4) = F (4) − F (1) = 0.5248.
In R we use dbinom() and pbinom() commands.
dbinom is for the PMF and pbinom is for the CDF.

Statistics 109 / 129


Probability Distributions Bernoulli and Binomial Distribution

Binomial distribution: example (cont.)

We can also graph the probability distribution function (PMF) and the cumulative
distribution function (CDF).

[Figure: PMF (bars at x = 0, . . . , 4) and CDF (step function) of X ∼ binom(4, 0.4); axis ticks omitted]

Statistics 110 / 129


Probability Distributions Bernoulli and Binomial Distribution

Binomial distribution: example (cont.)


X ∼ binom(n = 4, p = 0.4)
We have calculated:
x 0 1 2 3 4
P(x ) 0.1296 0.3456 0.3456 0.1536 0.0256
F (x ) 0.1296 0.4752 0.8208 0.9744 1.0000

Mean and variance:


E [X ] ≡ µX = np = 4 · 0.4 = 1.6 sales
V [X ] ≡ σX² = np(1 − p) = 4 · 0.4 · 0.6 = 0.96 sales²

Statistics 111 / 129


Probability Distributions Normal distribution: approximations

Approximating other distributions


For convenience, and under certain conditions, some discrete distributions
can be approximated using a Normal distribution.
Binomial: when np ≥ 5 and n(1 − p) ≥ 5, the Binomial distribution can be
approximated by the Normal distribution with the Binomial mean and
variance: µ = np and σ² = np(1 − p).
Before proceeding, we must apply a continuity correction: an adjustment that
is made when a discrete distribution (like the Binomial) is approximated by
a continuous one (like the Normal).

Statistics 112 / 129


Probability Distributions Normal distribution: approximations

Approximating other distributions: example


Let X ∼ binom(n = 10, p = 0.5). So that np = 5 and n(1 − p) = 5.
Find P(X ≤ 2) using the Binomial distribution and the Normal
approximation.
Using the binomial distribution, we have that
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2).
We use R for the calculations
dbinom(0,10,0.5)+dbinom(1,10,0.5)+dbinom(2,10,0.5)
=0.0009765625+0.009765625+0.04394531=0.0546875.
Using the normal approximation, µ = np = 5 and σ² = np(1 − p) = 5 · 0.5 = 2.5.
So that, with the continuity correction, P(X ≤ 2) ≈ P(X < 2.5).
P((X − 5)/√2.5 < (2.5 − 5)/√2.5) = P(Z < −1.5811) = 1 − P(Z < 1.5811) = 1 − 0.9429 = 0.0571.
2.5 2.5
We use R for the calculations
pnorm(2.5,5,sqrt(2.5))=0.05692315.

Statistics 113 / 129
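Both the exact tail and the continuity-corrected normal approximation can be reproduced without R. A Python sketch using the error function for the normal CDF (not from the slides):

```python
from math import comb, erf, sqrt

n, p = 10, 0.5

# exact binomial tail: P(X <= 2)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))

# normal approximation with continuity correction: P(X <= 2) is approx P(Y < 2.5)
mu, var = n * p, n * p * (1 - p)          # 5 and 2.5

def norm_cdf(x, mu, sigma):
    # standard identity: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

approx = norm_cdf(2.5, mu, sqrt(var))
print(round(exact, 7), round(approx, 4))  # 0.0546875 and roughly 0.0569
```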


Probability Distributions Normal distribution: approximations

Approximating other distributions: example


[Figure: PMF of Binomial(n=10, p=0.5) (bars) overlaid with the density of Normal(mu=5, var=2.5); axis ticks omitted]

Statistics 114 / 129


Probability Distributions Poisson Distribution

Poisson Distribution

As in the Binomial distribution, the Poisson distribution models a
number of occurrences, and the occurrences are assumed to be independent.
Unlike in the Binomial distribution, there is no upper limit on the
number of occurrences.
Poisson distributed random variables are commonly used to model the
number of events that happen in a given interval.
Poisson distributed random variables are commonly used to model the
number of events that happen in a given interval.
For example, some variables that might be modelled using the Poisson
distribution:
- Number of delivery trucks to arrive at a warehouse in an hour.
- Number of car accidents in a city per day.
- Number of insurance claims per month.
- Number of kids a woman has in a lifetime.

Statistics 115 / 129


Probability Distributions Poisson Distribution

Poisson distribution

Call Poisson distributed random variable X - number of occurrences during a


certain period of time, with the average rate λ:

X ∼ Poisson(λ)

Then the probability mass function is given by:

P(x ) = e^{−λ} · λ^x / x ! , where e ≈ 2.71828
The mean and variance are:

E [X ] = λ and V [X ] = λ

a property known as equidispersion.

Statistics 116 / 129


Probability Distributions Poisson Distribution

Poisson distribution: example

Suppose that in a given city there are, on average, 3 car accidents per day. What
is the probability that there will be 0 accidents tomorrow? What about 1, 2, 3, 4,
5, 100, 1000... accidents?
To find the probability of each outcome just apply the formula
P(X = x ) = P(x ) = e −λ · λx /x !, where λ = 3

P(X = 0) = e^{−3} · 3^0/0! ≈ 0.05
P(X = 1) = e^{−3} · 3^1/1! ≈ 0.15
P(X = 2) = e^{−3} · 3^2/2! ≈ 0.22
P(X = 3) = e^{−3} · 3^3/3! ≈ 0.22
P(X = 4) = e^{−3} · 3^4/4! ≈ 0.17
...
P(X = 100) = e^{−3} · 3^100/100! ≈ 2.7494 · 10^{−112}
P(X = 1000) = e^{−3} · 3^1000/1000! ≈ 0
Statistics 117 / 129
Probability Distributions Poisson Distribution

Poisson distribution: example (cont.)

We have calculated:
x 0 1 2 3 4 5 6 ...
P(x ) 0.05 0.15 0.22 0.22 0.17 0.10 0.05 ...
F (x ) 0.05 0.20 0.42 0.65 0.82 0.92 0.97 ...
What is the probability of having at most 5 accidents?
What is the probability of having 2 accidents or more?
Given that X ∼ Poisson(λ = 3):

E [X ] = λ = 3 accidents per day


V [X ] = λ = 3 accidents²

In R we use dpois() and ppois() commands.

Statistics 118 / 129
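The two questions on this slide reduce to sums of Poisson probabilities. A Python sketch of the dpois/ppois-style calls (illustrative, not from the slides):

```python
from math import exp, factorial

lam = 3  # average of 3 accidents per day

def pois_pmf(x, lam):
    # P(X = x) = e^{-lam} * lam^x / x!
    return exp(-lam) * lam**x / factorial(x)

p_at_most_5 = sum(pois_pmf(x, lam) for x in range(6))    # P(X <= 5)
p_two_or_more = 1 - pois_pmf(0, lam) - pois_pmf(1, lam)  # P(X >= 2)
print(round(p_at_most_5, 2), round(p_two_or_more, 2))  # 0.92 0.8
```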


Probability Distributions Poisson Distribution

Poisson distribution: example (cont.)


We can also graph the probability mass function and the cumulative distribution
function.

[Figure: PMF (bars) and CDF (step function) of X ∼ Poisson(λ = 3); axis ticks omitted]

Statistics 119 / 129


Probability Distributions Poisson Distribution

Poisson and binomial distribution


Recall the binomial distribution and its probability mass function:

P(X = x ) = C (n, x ) · p^x · (1 − p)^{n−x}

Fix the expected number of successes, λ = np, so that p = λ/n, and let the
number of trials grow without bound (n → ∞). Substituting into the
probability mass function:

lim_{n→∞} P(X = x ) = lim_{n→∞} [n!/(x ! (n − x )!)] · (λ/n)^x · (1 − λ/n)^{n−x} = e^{−λ} λ^x / x !

In general, if n > 30 and p < 0.1, then approximately

Binom(n, p) ≈ Poisson(np).

Statistics 120 / 129
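The limit can be observed numerically: hold λ = np fixed and increase n. A Python sketch (not from the slides; λ = 2 and x = 3 are illustrative choices):

```python
from math import comb, exp, factorial

lam, x = 2.0, 3   # fixed mean number of successes, evaluate P(X = 3)

poisson_p = exp(-lam) * lam**x / factorial(x)
for n in (10, 100, 10_000):
    p = lam / n
    binom_p = comb(n, x) * p**x * (1 - p)**(n - x)
    print(n, round(binom_p, 5))   # approaches the Poisson value as n grows
print("Poisson:", round(poisson_p, 5))
```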


Continuous probability distributions

1 Organizational issues

2 Statistics: basic concepts

3 Descriptive Statistics

4 Probability

5 Random variables
Discrete random variables
Continuous random variables

6 Probability Distributions
Bernoulli and Binomial Distribution
Normal distribution: approximations
Poisson Distribution

7 Continuous probability distributions


Uniform distribution
Exponential distribution

Statistics 121 / 129


Continuous probability distributions Uniform distribution

Uniform distribution

One of the simplest distributions; it can be either discrete (for example,
rolling a die) or continuous.
We use the uniform distribution when probability is spread evenly over a
certain interval.
For example, spinning a needle on a 360◦ circle.

Statistics 122 / 129


Continuous probability distributions Uniform distribution

Uniform distribution

If a r.v. X is uniformly distributed, i.e. X ∼ U(a, b), then its PDF is given by:

f (x ) = 1/(b − a) for x ∈ (a, b); 0 otherwise.

Its CDF is given by:

F (x ) = 0 for x < a; (x − a)/(b − a) for x ∈ (a, b); 1 for x > b.

The mean and variance are given by:

E [X ] = (a + b)/2, V [X ] = (b − a)²/12

Statistics 123 / 129
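These closed-form moments are easy to sanity-check by simulation. A minimal Python sketch (the interval U(2, 8) and all variable names are our own illustration, not from the slides):

```python
import random

a, b = 2.0, 8.0
mean_theory = (a + b) / 2        # (a + b)/2 = 5.0
var_theory = (b - a) ** 2 / 12   # (b - a)^2/12 = 3.0

# Draw many uniform samples and compute empirical moments.
random.seed(42)
draws = [random.uniform(a, b) for _ in range(200_000)]
mean_mc = sum(draws) / len(draws)
var_mc = sum((d - mean_mc) ** 2 for d in draws) / len(draws)
```

Up to Monte Carlo error, mean_mc and var_mc match the theoretical values.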



Uniform distribution: example

We know that daily demand for gasoline (in thousands of liters) at a gas station is uniformly distributed, X ∼ U(5, 15).
Draw the PDF.
Calculate the mean.
What is the probability that the demand is 7 thousand liters or less?
In R we use the dunif() and punif() commands.
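A sketch of these computations in Python (dunif and punif below are hand-rolled stand-ins for the R commands, built from the U(a, b) formulas above):

```python
# Daily demand X ~ U(5, 15), in thousands of liters.
a, b = 5.0, 15.0

def dunif(x, lo, hi):
    """Density of U(lo, hi) at x (what R's dunif computes)."""
    return 1.0 / (hi - lo) if lo < x < hi else 0.0

def punif(x, lo, hi):
    """CDF of U(lo, hi) at x (what R's punif computes)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

mean_x = (a + b) / 2          # E[X] = 10 thousand liters
p_at_most_7 = punif(7, a, b)  # (7 - 5) / 10 = 0.2
```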



Continuous probability distributions Exponential distribution

Exponential distribution

One of the simplest continuous distributions, used for continuous data that takes positive values only.
The exponential distribution is usually used to model the amount of time (duration) until some specific event occurs: the length, in minutes, of a phone call; waiting times at a doctor’s office; the lifetime of a car battery in months. It can also model the amount of money customers spend in one trip to the supermarket, the amount of change a person carries in his/her pockets, etc.
The exponential distribution also describes the time between events in a Poisson process.


Let X be a r.v. exponentially distributed, i.e. X ∼ Exp(λ); then its PDF is

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$


Its CDF is given by: if x < 0 then F(x) = 0. If x ≥ 0, then

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{0}^{x} \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_{0}^{x} = 1 - e^{-\lambda x}.$$

Hence

$$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$
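The integration step can be double-checked numerically, e.g. with a midpoint Riemann sum. A Python sketch (λ = 0.05 is just an illustrative rate; the function names are our own):

```python
from math import exp

lam = 0.05

def pdf(t):
    """Exponential density lam * e^(-lam * t) for t >= 0."""
    return lam * exp(-lam * t)

def cdf_closed(x):
    """Closed form 1 - e^(-lam * x)."""
    return 1.0 - exp(-lam * x)

def cdf_numeric(x, steps=100_000):
    """Midpoint Riemann sum of the PDF over [0, x]."""
    h = x / steps
    return h * sum(pdf((i + 0.5) * h) for i in range(steps))
```

For any x the numeric integral and the closed form agree to many decimal places.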



Graphically, the PDF decays exponentially from its maximum at x = 0, while the CDF rises from 0 toward 1.

[Figure: PDF and CDF of the exponential distribution, plotted for x from 0 to 120]

In R we use the dexp() and pexp() commands.

The mean and variance are given by:

$$E[X] = \frac{1}{\lambda}, \qquad V[X] = \frac{1}{\lambda^{2}}$$



Exponential distribution: example

We know that the average waiting time at a doctor’s office is 20 minutes. Assume that X, the waiting time, follows an exponential distribution X ∼ Exp(λ). What is the value of λ? What is the PDF of the random variable X?

Since E[X] = 1/λ = 20, then λ = 1/20 = 0.05.


Hence the PDF is given by:

$$f(x) = \begin{cases} 0.05\, e^{-0.05x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$


Exponential distribution: example (cont.)

We know that the waiting time at a doctor’s office follows an exponential distribution X ∼ Exp(λ = 0.05). Its CDF is given by

$$F(x) = P(X \leq x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}$$


What is the probability of waiting less than 10 minutes?

$$P(X < 10) = F(10) = 1 - e^{-0.05 \cdot 10} = 1 - e^{-0.5} \approx 0.3935$$


What is the probability of waiting more than 1 hour?

$$P(X > 60) = 1 - F(60) = 1 - (1 - e^{-0.05 \cdot 60}) = e^{-3} \approx 0.0498$$


In R we can use the commands pexp(10, 0.05) = 0.3935 and 1 - pexp(60, 0.05) = 0.0498.
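The same numbers can be reproduced directly from the CDF formula. A Python sketch (pexp below is a hand-rolled stand-in for the R command of the same name):

```python
from math import exp

lam = 0.05  # rate, per minute

def pexp(x, rate):
    """CDF of Exp(rate): P(X <= x) (what R's pexp computes)."""
    return 1.0 - exp(-rate * x) if x >= 0 else 0.0

p_less_10_min = pexp(10, lam)        # 1 - e^(-0.5) ≈ 0.3935
p_more_1_hour = 1.0 - pexp(60, lam)  # e^(-3) ≈ 0.0498
```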
