C4 Descriptive Statistics
Descriptive Statistics
[Diagram: descriptive statistics covered in this section, including Median, Mode, Standard Deviation, and Kurtosis]
Measures of Position:
Position statistics measure the central tendency of the data.
Central tendency refers to where the data is centered.
You may have calculated an average of some kind.
Although "average" is used loosely, several different
statistics can describe the average of a data set:
• Mean.
• Median.
• Mode.
Measures of center
Central tendency- In any distribution, the majority of the observations pile
up, or cluster, around a particular region.
Median- the observation that divides the ordered data set in half.
Mode- the value in the data set that occurs with the greatest frequency.
Outlier- an observation that falls far from the rest of the data. The mean is
strongly influenced by outliers.
Mean:
The total of all the values divided by the size of the data set.
It is the most commonly used statistic of position.
It is easy to understand and calculate.
It works well when the distribution is symmetric and there are
no outliers.
The mean of a sample is denoted by ‘x-bar’.
The mean of a population is denoted by ‘μ’.
[Figure: the mean marked on a number line from 0 to 9]
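As a minimal sketch of the definition above, the mean can be computed in Python by dividing the total of the values by the size of the data set. The sample values are made up for illustration and do not come from the slides.

```python
from statistics import mean

# Illustrative sample (hypothetical values, not from the slides)
data = [2, 4, 4, 5, 7, 8]

# Mean = total of all the values / size of the data set
x_bar = sum(data) / len(data)
print(x_bar)  # 5.0

# The standard library gives the same result
print(mean(data) == x_bar)  # True
```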
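Median:
The middle value of the ordered data: half of the data values are
above it and half are below it.
With an even number of values, the median is the average of the two middle values.

Data set A (10 values): 23, 33, 34, 36, 38, 40, 41, 41, 44, 45
Median = (38 + 40) / 2 = 39
Data set B (9 values): 12, 30, 31, 37, 38, 40, 41, 41, 44
Median = 38 (the fifth value)

Example
Data: 1, 2, 1, 1, 3, 4, 100
Mean = 16
Median = 2
Mode = 1

Assume 100 is an outlier and remove it:
Mean = 2
Median = 1.5
Mode = 1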
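The figures in this example can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are chosen here for illustration.

```python
from statistics import mean, median, mode

# Example data from the slide, including the outlier 100
data = [1, 2, 1, 1, 3, 4, 100]
mean_all = mean(data)      # 16
median_all = median(data)  # 2
mode_all = mode(data)      # 1

# Remove the outlier and recompute
trimmed = [x for x in data if x != 100]
mean_trimmed = mean(trimmed)      # 2
median_trimmed = median(trimmed)  # 1.5
mode_trimmed = mode(trimmed)      # 1

# Even-sized data set: the median averages the two middle values
set_a = [23, 33, 34, 36, 38, 40, 41, 41, 44, 45]
median_a = median(set_a)  # 39.0
```

Removing a single value moves the mean from 16 to 2, while the median only moves from 2 to 1.5 and the mode does not change.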
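Why can the mean and median be different?
The mean is pulled toward extreme values, while the median depends only on
the middle of the ordered data, so skewed data or outliers move them apart.
[Figure: the median and the mean marked at different positions on a number line from 0 to 9]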
Mode:
The value that occurs the most often in a data set.
It is rarely used as a measure of central tendency.
It is more useful for distinguishing between unimodal and
multimodal distributions.
• A multimodal distribution has more than one peak.
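One way to tell unimodal from multimodal data in Python is `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency. The two data sets below are hypothetical.

```python
from statistics import multimode

# Hypothetical data sets for illustration
unimodal = [1, 2, 2, 2, 3, 4]
bimodal = [1, 1, 1, 2, 3, 5, 5, 5, 6]

modes_uni = multimode(unimodal)
modes_bi = multimode(bimodal)

print(modes_uni)  # [2]
print(modes_bi)   # [1, 5]
```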
Normal distribution
Bell shaped symmetric distribution.
Why is it important?
Many things are normally distributed, or very close to it.
It is easy to work with mathematically.
Most inferential statistical methods make use of properties of
the normal distribution.
Standard deviation:
s = \sqrt{ \frac{\sum (x - \bar{x})^2}{n - 1} }
s = standard deviation
\bar{x} (x-bar) = mean
x = values of the data set
n = size of the data set
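The sample standard deviation can be computed step by step from these symbols. The following is a minimal sketch on made-up data, compared against `statistics.stdev`, which also uses the n - 1 denominator.

```python
from math import sqrt
from statistics import mean, stdev

# Illustrative sample (hypothetical values, not from the slides)
data = [2, 4, 4, 5, 7, 8]

n = len(data)        # size of the data set
x_bar = mean(data)   # sample mean

# Sum of squared deviations from the mean
squared_deviations = [(x - x_bar) ** 2 for x in data]

# Sample standard deviation with the n - 1 denominator
s = sqrt(sum(squared_deviations) / (n - 1))

print(round(s, 4))           # 2.1909
print(round(stdev(data), 4)) # 2.1909
```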
Range:
The difference between the highest and the lowest values.
The simplest measure of variability.
Often denoted by ‘R’.
It is good enough in many practical cases.
It does not make full use of the available data.
It can be misleading when the data is skewed or in the presence
of outliers.
• Just one outlier will increase the range dramatically.
[Figure: the range marked on a number line from 0 to 9]
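A short sketch of how one outlier changes the range, using hypothetical values:

```python
# Illustrative sample (hypothetical values, not from the slides)
data = [2, 4, 4, 5, 7, 8]

# Range = highest value - lowest value
data_range = max(data) - min(data)
print(data_range)  # 6

# Adding a single outlier inflates the range
with_outlier = data + [100]
range_with_outlier = max(with_outlier) - min(with_outlier)
print(range_with_outlier)  # 98
```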
Measures of Shape:
Data can be plotted as a histogram to get a general idea of
its shape, or distribution.
The shape can reveal a lot of information about the data.
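As a rough sketch, a text histogram can be built with `collections.Counter`, printing one X per observation. The data is hypothetical, and a plotting library would normally be used for real work.

```python
from collections import Counter

# Illustrative sample (hypothetical values, not from the slides)
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# Count how often each value occurs (missing values count as 0)
counts = Counter(data)

# One row per value between the minimum and the maximum
lines = []
for value in range(min(data), max(data) + 1):
    lines.append(f"{value}: " + "X" * counts[value])

print("\n".join(lines))
```

The rows for 6, 7, and 8 are empty, which makes the isolated value 9 stand out as a possible outlier.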
Shape
Skewness- Lack of symmetry in a distribution. It can be read from the
frequency distribution.
Properties-
Mean, median & mode fall at different points.
The curve is not symmetrical but stretched more to one side.
[Figure: two frequency distributions, one with positive skew (+), SK > 0, and one with negative skew (-), SK < 0]
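The sign of the skewness can be illustrated with a small sketch. The slides do not show which formula they use, so this assumes the moment-based coefficient (third central moment divided by the 1.5 power of the second central moment). The function name and data are hypothetical.

```python
def skewness(values):
    """Moment-based skewness (an assumed definition, not taken from the slides)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets
right_tailed = [1, 1, 2, 2, 3, 10]   # long tail to the right
left_tailed = [1, 8, 9, 9, 10, 10]   # long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # True: positive skew
print(skewness(left_tailed) < 0)   # True: negative skew
print(skewness(symmetric))         # 0.0
```

Other software may apply a sample-size correction, so the magnitude can differ slightly from this sketch even though the sign agrees.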
Kurtosis:
Measures the degree of flatness (or peakedness) of the shape.
When the data values are clustered around the middle, then the
distribution is more peaked.
• A greater kurtosis value.
When the data values are spread around more evenly, then the
distribution is flatter.
• A smaller kurtosis value.
[Figure: three distribution shapes: (-) Platykurtic, (0) Mesokurtic, (+) Leptokurtic]
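The sign convention in the figure (negative, zero, positive) matches excess kurtosis, which subtracts 3 so that a normal distribution scores 0. The sketch below assumes the moment-based definition, since the slides do not give a formula. The function name and data are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis (an assumed definition, not from the slides)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data sets
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # spread evenly
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]  # clustered around the middle

print(excess_kurtosis(flat) < 0)    # True: platykurtic
print(excess_kurtosis(peaked) > 0)  # True: leptokurtic
```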
Further Information:
Variance is a measure of the variation around the mean.
It measures how far a set of data points are spread out from
their mean.
The units are the square of the units used for the original data.
• For example, a variable measured in meters will have a variance
measured in meters squared.
It is the square of the standard deviation: Variance = s^2
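A brief sketch of the relationship between variance and standard deviation, using hypothetical heights in meters:

```python
from statistics import stdev, variance

# Hypothetical heights measured in meters
heights_m = [1.60, 1.65, 1.70, 1.75, 1.80]

s = stdev(heights_m)      # standard deviation, in meters
s2 = variance(heights_m)  # variance, in meters squared

# The variance equals the square of the standard deviation
print(abs(s2 - s ** 2) < 1e-12)  # True
```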
Some Formulas
Mean/Average: \bar{x} = \frac{\sum x}{n}
Standard Error: SE = \frac{s}{\sqrt{n}}
Skewness =
Standard Error of the Mean vs Standard Deviation
The standard error (SE) of a statistic is the
approximate standard deviation of its sampling
distribution.
The mean and standard deviation are descriptive statistics,
whereas the standard error of the mean describes the
random sampling process.
The standard error of the sample mean is an estimate of
how far the sample mean is likely to be from the population
mean, whereas the standard deviation of the sample is the
degree to which individuals within the sample differ from the
sample mean.
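The difference can be sketched numerically. This assumes the usual estimate of the standard error of the mean, the sample standard deviation divided by the square root of the sample size. The data is hypothetical.

```python
from math import sqrt
from statistics import stdev

# Illustrative sample (hypothetical values, not from the slides)
data = [2, 4, 4, 5, 7, 8]

n = len(data)
s = stdev(data)   # spread of individual values around the sample mean
se = s / sqrt(n)  # estimated spread of the sample mean itself

print(round(s, 4))   # 2.1909
print(round(se, 4))  # 0.8944

# With the same s, four times as many observations halves the standard error
se_larger_sample = s / sqrt(4 * n)
```

The standard deviation describes the data, so it does not shrink systematically as more data is collected. The standard error describes the precision of the mean, so it does.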