Probability Chap 1

This document discusses key concepts in probability and statistics. It covers: 1. Probability is used to model random phenomena, while statistics describes and interprets observed random data. This course will use probability to model statistics problems. 2. Key probability concepts covered include distributions, correlation, and statistical inference methods. 3. Statistics analyzes data to draw conclusions about populations. It uses measures like the mean, median, and mode to describe "typical" values in data. 4. The mean is the average value, the median is the middle value, and the mode is the most frequent value. These and other statistical techniques help analyze limited data to learn about entire populations.

Uploaded by

Sahil

Course content

Probability is a field of mathematics that investigates the behaviour of mathematically defined random phenomena.
Statistics attempts to describe, model and interpret the behaviour of observed random phenomena.
In this course, we will learn probability in order to use it as a modelling device in statistics.
Learning outcomes

After passing the course, the student knows:
1 the basic concepts and rules of probability
2 the basic properties of one- and two-dimensional discrete and continuous probability distributions
3 common one- and two-dimensional discrete and continuous probability distributions, and how to apply them to simple random phenomena
4 the basic properties of the bivariate normal distribution
5 the basic methods for collecting and describing statistical data
6 how to apply basic methods of estimation and testing in simple problems of statistical inference
7 the basic concepts of statistical dependence, correlation and
What is statistics?

Statistics is a collection of tools for studying uncertain data.
The observed data itself is not statistics. Statistics consists of the conclusions we can draw from our observations, and the techniques used to draw these conclusions.
It is applicable whenever quantifiable data is available.
Terminology

Population is the set that contains all possible objects of a statistical experiment.
Unit is an element of the population.
Sample is a subset of the population.
Observation is an observed value of a variable attached to each unit in the sample.
Statistical data is the collection of all observations.
Why statistics?

We want to learn something about an entire population, but cannot afford to collect (or store) all the data we would want.
We want to draw conclusions that are as strong as possible from limited data.
Perhaps counterintuitively, to get a useful sample, we want to know as little as possible about the sample in advance, i.e. the sample should be selected randomly.
What is “typical” anyway?

Assume we have a data set $S = \{x_1, \ldots, x_n\}$ of $n$ numerical observations.
Three different notions: mean, median and mode.
Mean is the “average” value: $\bar{x} = \frac{x_1 + \cdots + x_n}{n}$.
Median is the “center” value: order the sample such that $x_1 \le x_2 \le \cdots \le x_n$.
If $n = 2k - 1$ is odd, then the median is $x_k$.
If $n = 2k$ is even, then the median is the average of $x_k$ and $x_{k+1}$.
Mode is the most frequent value. (It might not be unique.)
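The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `mean`, `median` and `modes` and the sample `[5, 1, 3, 1]` are chosen here for the example only.

```python
from collections import Counter

def mean(xs):
    """Arithmetic mean: the sum of the observations divided by their count."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the ordered sample.

    For odd n the middle element is returned; for even n the average
    of the two middle elements is returned.
    """
    s = sorted(xs)
    n = len(s)
    k = n // 2
    if n % 2 == 1:
        return s[k]               # 0-based index of the middle element
    return (s[k - 1] + s[k]) / 2  # average of x_k and x_{k+1} (1-based)

def modes(xs):
    """All most frequent values, since the mode need not be unique."""
    counts = Counter(xs)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

sample = [5, 1, 3, 1]  # hypothetical data with an even number of observations
print(mean(sample), median(sample), modes(sample))
```

Returning a list from `modes` is a design choice made for this sketch so that ties are visible instead of being silently resolved.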
What is “typical” anyway?

Example
$S = \{-8, 0, 1, 1, 2, 2, 2\}$
Mean $= \frac{-8+0+1+1+2+2+2}{7} = 0$, Median $= 1$, Mode $= 2$

Example
$S = \{-16, 1, 1, 2, 3, 4, 5\}$
Mean $= \frac{-16+1+1+2+3+4+5}{7} = 0$, Median $= 2$, Mode $= 1$

Example
$S = \{-8, -1, -1, 1, 2, 3, 4\}$
Mean $= \frac{-8-1-1+1+2+3+4}{7} = 0$, Median $= 1$, Mode $= -1$
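The three examples can be checked with Python's standard `statistics` module. This is only a sketch for verification; the variable names `examples` and `summary` are introduced here and do not come from the slides. `statistics.multimode` (Python 3.8 or later) returns a list because the mode may not be unique.

```python
import statistics

# The three example data sets from the slides.
examples = [
    [-8, 0, 1, 1, 2, 2, 2],
    [-16, 1, 1, 2, 3, 4, 5],
    [-8, -1, -1, 1, 2, 3, 4],
]

# For each sample: (mean, median, list of modes).
summary = [
    (statistics.mean(s), statistics.median(s), statistics.multimode(s))
    for s in examples
]

for s, (m, med, mo) in zip(examples, summary):
    print(s, "mean =", m, "median =", med, "mode =", mo)
```

All three samples share the mean 0 while their medians and modes differ, which is the point of the examples: the three notions of "typical" need not agree.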
Mean (or average) value

If $x_i = a + b y_i$, then $\bar{x} = a + b\bar{y}$.
The average of $\{100, 400, -200, 1000\}$ can be computed as $100 \cdot \frac{1+4-2+10}{4}$.
The average of $\{127, 99, 82, 104\}$ can be computed as $100 + \frac{27-1-18+4}{4}$.
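The rule that the mean follows a linear change of variable can be checked numerically on the two data sets above. The snippet below is an illustrative sketch; the helper `mean` and the variable names are assumptions made for this example.

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

# Scaling: x_i = 100 * y_i, so the mean of x is 100 times the mean of y.
x_scaled = [100, 400, -200, 1000]
y_scaled = [v / 100 for v in x_scaled]       # 1, 4, -2, 10
direct_scaled = mean(x_scaled)
via_rule_scaled = 100 * mean(y_scaled)

# Shifting: x_i = 100 + y_i, so the mean of x is 100 plus the mean of y.
x_shifted = [127, 99, 82, 104]
y_shifted = [v - 100 for v in x_shifted]     # 27, -1, -18, 4
direct_shifted = mean(x_shifted)
via_rule_shifted = 100 + mean(y_shifted)

print(direct_scaled, via_rule_scaled)    # both routes give the same mean
print(direct_shifted, via_rule_shifted)  # both routes give the same mean
```

Working with the small numbers 1, 4, −2, 10 or 27, −1, −18, 4 is easier by hand, which is why the rule is useful in practice.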
Mean (or average) value

If a sample is composed of several smaller samples, then the mean of the whole sample can be computed as a weighted average of the means of the smaller samples.
Let the sample $x$ consist of $r$ parts $x_1, x_2, \ldots, x_r$, where $x_i$ consists of $n_i$ units and $n_1 + \cdots + n_r = N$.
If $\bar{x}_i$ denotes the mean of the $i$:th part, then
$\bar{x} = \frac{n_1}{N}\bar{x}_1 + \cdots + \frac{n_r}{N}\bar{x}_r$.
This is not the same as the mean of the averages, because larger samples must be given larger weight.
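A small numerical sketch of the weighted-average formula follows. The two sub-samples in `parts` are made-up data, and `combined_mean` is a name introduced here for illustration.

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def combined_mean(parts):
    """Mean of the pooled sample, computed from the part means.

    Each part mean is weighted by n_i / N, the share of units in that part.
    """
    total = sum(len(p) for p in parts)
    return sum(len(p) / total * mean(p) for p in parts)

# Hypothetical sample split into parts of sizes 3 and 1.
parts = [[1, 2, 3], [10]]
whole = [x for p in parts for x in p]

weighted = combined_mean(parts)                            # (3/4)*2 + (1/4)*10
pooled = mean(whole)                                       # mean of all four units
mean_of_means = sum(mean(p) for p in parts) / len(parts)   # ignores part sizes

print(weighted, pooled, mean_of_means)
```

With these numbers the weighted average and the pooled mean agree, while the unweighted mean of the part means is larger because the single value 10 is given as much weight as the three other units together.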
Median value

The median is useful when we want to ignore outliers.
If we want to understand the typical standard of living in a developing country, it is useful to compare the median income, rather than the mean income, to the poverty line.
The median does not require that the data can be meaningfully added and subtracted, only that the data can be ordered.
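Both points can be illustrated with a short sketch. The income figures and the rating scale below are invented for this example and are not taken from the slides.

```python
import statistics

# Hypothetical incomes: six similar values and one extreme outlier.
incomes = [200, 250, 300, 350, 400, 450, 20000]
mean_income = statistics.mean(incomes)      # pulled far upward by the outlier
median_income = statistics.median(incomes)  # unaffected by how large the outlier is

# Ordinal data: the values can be ordered but not added.
levels = ["low", "medium", "high"]          # assumed ordering of the categories
ratings = ["high", "low", "medium", "medium", "low"]
ordered = sorted(ratings, key=levels.index)
median_rating = ordered[len(ordered) // 2]  # middle element, n is odd here

print(mean_income, median_income, median_rating)
```

Here the mean income exceeds every income except the outlier, whereas the median stays among the typical values. For the ratings, only the ordering given by `levels` is used; no arithmetic on the categories is needed.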
Mode

The mode is useful even for qualitative data.
For example, the mode of the data set {bus, car, bicycle, pedestrian, pedestrian, car, pedestrian} is pedestrian, but the mean and median of this data set are meaningless.
The mode requires that the observations be grouped into not too many sets of feasible outcomes.
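For qualitative data such as the transport example above, the mode can be found by counting occurrences. This is a minimal sketch using `collections.Counter` from the Python standard library; the variable names are chosen for this example.

```python
from collections import Counter

# The transport-mode data set from the example above.
observations = ["bus", "car", "bicycle", "pedestrian",
                "pedestrian", "car", "pedestrian"]

counts = Counter(observations)              # frequency of each category
mode, frequency = counts.most_common(1)[0]  # most frequent category and its count

print(mode, frequency)
```

Note that `most_common(1)` reports a single category even when several are tied for the highest count, so ties would need to be checked separately.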
Measures of Variability: Variance
The sample variance $s^2$ is, roughly, the average squared deviation of the observations from the mean; the sum of squared deviations is divided by $n - 1$ rather than $n$.
The idea is that it measures the spread of the data about the mean.
It is difficult to interpret because it is in squared units. It cannot be negative, and it is zero only when all data points are equal.

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

The sample standard deviation $s$ is the positive square root of the variance,

$s = \sqrt{s^2}$
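The two formulas can be sketched directly in Python. The function names `sample_variance` and `sample_std` are introduced here for illustration; the data set is the first example from the "What is typical" slides, and the result is compared with `statistics.variance`, which also uses the $n - 1$ divisor.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

data = [-8, 0, 1, 1, 2, 2, 2]   # mean is 0, so the deviations are the values themselves

s2 = sample_variance(data)      # (64 + 0 + 1 + 1 + 4 + 4 + 4) / 6
s = sample_std(data)
reference = statistics.variance(data)

print(s2, s, reference)
```

The standard deviation is in the same units as the data, which makes it easier to interpret than the variance.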

