
C4 Descriptive Statistics

The document discusses descriptive statistics which are methods used to describe characteristics of a data set. Descriptive statistics help make sense of data and allow conclusions to be drawn. Key aspects covered include measures of central tendency (mean, median, mode), measures of spread (range, standard deviation), and measures of shape (skewness, kurtosis). Graphical displays like histograms are also used to describe data.

Uploaded by

NAVANEETH

Descriptive Statistics

Dr. Linta Rose

Descriptive Statistics

• Methods of describing the characteristics of a data set.
• Useful because they allow you to make sense of the data.
• Help in exploring the data and drawing conclusions from it in order to make rational decisions.
• Include calculating things such as the average of the data, its spread and the shape it produces.
Descriptive Statistics

• Descriptive statistics involves describing, summarizing and organizing the data so it can be easily understood.
• Graphical displays are often used along with the quantitative measures to enable clarity of communication.
Describing data
• Qualitative data: variables that yield non-numerical data.
– E.g. education, marital status, eye colour.
– Frequency: the number of observations falling into a particular class/category of the qualitative variable.
– Frequency distribution: a table listing all classes and their frequencies.
– Graphical representation: pie chart, bar graph.


Describing data
• Quantitative data:
– Can be presented by a frequency distribution.
– If a discrete variable has many different values, or if the variable is continuous, the data can be grouped into classes/categories.
– Class intervals (bins): together they cover the range between the minimum and maximum values.
– Class limits: the end points of a class interval.
– Class frequency: the number of observations in the data that belong to each class interval.
– Usually presented as a histogram or a bar graph.
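
The grouping of quantitative data into class intervals can be sketched in Python as follows. This is a minimal illustration, assuming equal-width classes; the function name `frequency_distribution` and the sample values are hypothetical and not taken from the slides.

```python
def frequency_distribution(values, num_classes):
    """Group values into equal-width class intervals and count each class.

    Returns a list of (lower_limit, upper_limit, class_frequency) tuples.
    Assumption: equal-width classes; the maximum value is placed in the
    last class so that every observation is counted exactly once.
    """
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("all values are equal; no range to divide into classes")
    width = (highest - lowest) / num_classes
    counts = [0] * num_classes
    for v in values:
        index = min(int((v - lowest) / width), num_classes - 1)
        counts[index] += 1
    return [
        (lowest + i * width, lowest + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


# Hypothetical data, used only to show the shape of the result.
observations = [1, 2, 2, 3, 5, 8, 9, 10]
table = frequency_distribution(observations, 3)
for lower, upper, freq in table:
    print(f"{lower:5.1f} to {upper:5.1f}: {freq}")
```

The class frequencies are what a histogram would draw as bar heights.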


Frequency Distribution and Histogram
Descriptive Statistics
• When analyzing a graphical display, you can draw conclusions based on several characteristics of the graph.
• You may ask questions such as:
– Where is the approximate middle, or center, of the graph?
– How spread out are the data values on the graph?
– What is the overall shape of the graph?
– Does it have any interesting patterns?
Descriptive Statistics
The following measures are used to describe a data set:
• Measures of position (also referred to as central tendency or location measures).
• Measures of spread (also referred to as variability or dispersion measures).
• Measures of shape.
Descriptive Statistics
• If assignable causes of variation are affecting the process, we will see changes in:
– Position.
– Spread.
– Shape.
– Any combination of the three.
Properties of Numerical Data & Measures

Central tendency    Dispersion            Shape
Mean                Range                 Skewness
Median              Standard deviation    Kurtosis
Mode
Descriptive Statistics
Measures of Position:
• Position statistics measure the central tendency of the data.
• Central tendency refers to where the data is centered.
• You may have calculated an average of some kind.
• Despite the common use of "average", there are different statistics by which we can describe the average of a data set:
– Mean.
– Median.
– Mode.
Measures of center
• Central tendency: in any distribution, the majority of the observations pile up, or cluster, in a particular region.
• Mean: the sum of the observed values in the data divided by the number of observations.
• Median: the observation that divides the ordered data set in half.
• Mode: the value of the data set that occurs with the greatest frequency.
• Mean and median can be applied only to quantitative data.
• Mode can be used for either qualitative or quantitative data.
• Outlier: an observation that falls far from the rest of the data. The mean is highly influenced by outliers.
Descriptive Statistics
Mean:
• The total of all the values divided by the size of the data set.
• It is the most commonly used statistic of position.
• It is easy to understand and calculate.
• It works well when the distribution is symmetric and there are no outliers.
• The mean of a sample is denoted by ‘x-bar’.
• The mean of a population is denoted by ‘μ’.

[Figure: number line from 0 to 9 with the mean marked]
Descriptive Statistics
Median:
• The middle value, where exactly half of the data values are above it and half are below it.
• Less widely used than the mean.
• A useful statistic due to its robustness.
• It can reduce the effect of outliers.
• Often used when the data is nonsymmetrical.
• Ensure that the values are ordered before calculation.
• With an even number of values, the median is the mean of the two middle values.

[Figure: number line from 0 to 9 with the median and the mean marked]
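
The ordering step and the even-count rule for the median can be sketched in Python as follows. The function name `median_of` and the sample lists are illustrative only.

```python
def median_of(values):
    """Return the median of a data set.

    The values are sorted first; with an odd count the middle value is
    returned, with an even count the mean of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_of([7, 1, 5]))     # odd count: middle of 1, 5, 7
print(median_of([7, 1, 5, 3]))  # even count: mean of 3 and 5
```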
Descriptive Statistics
Median Calculation:

Example 1 (ordered values, in two columns as on the slide):
23  12
33  30
34  31
36  37
38  38
40  40
41  41
41  41
44  44
45
Median = (38 + 40) / 2 = 39

Example 2: 1, 2, 1, 1, 3, 4, 100
Mean = 16
Median = 2
Mode = 1

Assume 100 is an outlier and remove it:
Mean = 2
Median = 1.5
Mode = 1
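
The outlier example (1, 2, 1, 1, 3, 4, 100) can be checked with Python's standard `statistics` module. This is a small sketch; the helper name `summarize` is made up for the illustration.

```python
from statistics import mean, median, mode

data = [1, 2, 1, 1, 3, 4, 100]
without_outlier = [x for x in data if x != 100]  # treat 100 as the outlier


def summarize(values):
    """Return the three measures of central tendency for a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }


summary_all = summarize(data)
summary_trimmed = summarize(without_outlier)
print(summary_all)      # mean 16, median 2, mode 1
print(summary_trimmed)  # mean 2, median 1.5, mode 1
```

Removing the single outlier moves the mean from 16 to 2, while the median only moves from 2 to 1.5 and the mode does not change.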
Descriptive Statistics
• Why can the mean and median be different?

[Figure: number line from 0 to 9 with the median and the mean marked]
Descriptive Statistics
Mode:
• The value that occurs the most often in a data set.
• It is rarely used as a central tendency measure.
• It is more useful to distinguish between unimodal and multimodal distributions.
– Multimodal: when the data has more than one peak.
Normal distribution
• Bell-shaped, symmetric distribution.
• Why is it important?
– Many things are normally distributed, or very close to it.
– It is easy to work with mathematically.
– Most inferential statistical methods make use of properties of the normal distribution.
• Mean = Median = Mode


Descriptive Statistics
Measures of Spread:
• The spread refers to how the data deviates from the position measure.
• It gives an indication of the amount of variation in the process.
– An important indicator of quality.
– Used to control process variability and improve quality.
• All manufacturing and transactional processes are variable to some degree.
• There are different statistics by which we can describe the spread of a data set:
– Range.
– Standard deviation.
• Range: the difference between the largest and the smallest observed values in the data set.
– Because only these two values are used, a great deal of information is ignored.
• Standard deviation: a kind of average of the deviations of the observed values from the mean of the variable (the square root of the average squared deviation).
– It is defined using the sample mean, and its value is strongly affected by a few extreme observations.
• Variance: the square of the standard deviation.


Descriptive Statistics
Standard Deviation:
• Roughly, the average distance of the data points from their own mean.
• A low standard deviation indicates that the data points are clustered around the mean.
• A large standard deviation indicates that they are widely scattered around the mean.
• The standard deviation of a sample is denoted by ‘s’.
• The standard deviation of a population is denoted by ‘σ’.
Descriptive Statistics
Standard Deviation:
• Perceived as difficult to understand because it is not easy to picture what it is.
• It is, however, a more robust measure of variability than the range.
• Standard deviation is computed as follows:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

s = standard deviation
$\bar{x}$ (x-bar) = mean
x = values of the data set
n = size of the data set
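
The sample standard deviation formula (sum of squared deviations from x-bar, divided by n − 1, then square-rooted) can be sketched in Python as follows. The function name `sample_std` and the data are illustrative; the result is compared against `statistics.stdev` from the standard library, which uses the same n − 1 divisor.

```python
import math
from statistics import stdev


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    x_bar = sum(values) / n                                  # sample mean
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_std(data)
variance = s ** 2  # the variance is the square of the standard deviation
print(round(s, 3), round(variance, 3))
```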
Descriptive Statistics
Range:
• The difference between the highest and the lowest values.
• The simplest measure of variability.
• Often denoted by ‘R’.
• It is good enough in many practical cases.
• It does not make full use of the available data.
• It can be misleading when the data is skewed or in the presence of outliers.
– Just one outlier will increase the range dramatically.

[Figure: number line from 0 to 9 with the range marked]
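
The sensitivity of the range to a single outlier can be seen in a short Python sketch. The function name `data_range` and the values are made up for the illustration.

```python
def data_range(values):
    """Range R: the highest value minus the lowest value."""
    return max(values) - min(values)


base = [3, 4, 4, 5, 6]
with_outlier = base + [60]  # one hypothetical extreme observation

r_base = data_range(base)
r_outlier = data_range(with_outlier)
print(r_base, r_outlier)
```

Only the two extreme values enter the calculation, so adding one far-away point changes R completely while the other observations play no role.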
Descriptive Statistics
Measures of Shape:
• Data can be plotted into a histogram to have a general idea of its shape, or distribution.
• The shape can reveal a lot of information about the data.
Shape
• Skewness: lack of symmetry in a distribution. It can be interpreted from the frequency distribution.
• Properties:
– Mean, median and mode fall at different points.
– The curve is not symmetrical but stretched more to one side.
• A distribution may be positively or negatively skewed. The limits for the (Pearson's) coefficient of skewness are ±3.
• Kurtosis: convexity of a curve.
– Gives an idea of the flatness/peakedness of the curve.
– Gives an idea of how much weight lies in the tails of the distribution.
Descriptive Statistics
Measures of Shape:
• A distribution may be symmetrical or nonsymmetrical.
• In a symmetrical distribution, the two sides of the distribution are a mirror image of each other.
• Examples of symmetrical distributions include:
– Uniform.
– Normal.
– Camel-back.
Descriptive Statistics
Measures of Shape:
• The shape helps identify which descriptive statistic is more appropriate to use in a given situation.
• If the data is symmetrical, then we may use the mean or the median to measure the central tendency, as they are almost equal.
• If the data is skewed, then the median is a more appropriate measure of the central tendency.
• Two common statistics that measure the shape of the data:
– Skewness.
– Kurtosis.
Descriptive Statistics
Skewness:
• Describes whether the data is distributed symmetrically around the mean.
• A skewness value of zero indicates perfect symmetry.
• A negative value implies left-skewed data.
• A positive value implies right-skewed data.

[Figure: two histograms, one positively skewed (SK > 0) and one negatively skewed (SK < 0)]
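
The sign convention for skewness can be illustrated with a short Python sketch. The slides do not show which skewness formula is intended, so this assumes the moment coefficient of skewness (third central moment divided by the second central moment to the power 1.5); the function name and the data sets are illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition).

    m2 and m3 are the second and third central moments, each divided by n.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]           # long tail to the right
left_skewed = [-x for x in right_skewed]     # mirror image, tail to the left

print(skewness(symmetric))     # zero for a symmetric data set
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```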
Descriptive Statistics
Kurtosis:
• Measures the degree of flatness (or peakedness) of the shape.
• When the data values are clustered around the middle, the distribution is more peaked.
– A greater kurtosis value.
• When the data values are spread around more evenly, the distribution is flatter.
– A smaller kurtosis value.

[Figure: three distributions labelled (−) platykurtic, (0) mesokurtic and (+) leptokurtic]
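
The platykurtic / leptokurtic distinction can be illustrated with a short Python sketch. The slides do not give a formula, so this assumes excess kurtosis (fourth central moment divided by the squared second central moment, minus 3), which is zero for a normal distribution; the function name and the data sets are illustrative.

```python
import math


def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (an assumed definition).

    Negative values suggest a flatter (platykurtic) shape, positive
    values a more peaked, heavier-tailed (leptokurtic) shape.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3


evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # flat shape
clustered = [5, 5, 5, 5, 5, 5, 5, 5, 1, 9]        # most values at the middle

print(excess_kurtosis(evenly_spread))  # negative
print(excess_kurtosis(clustered))      # positive
```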
Descriptive Statistics
Further Information:
• Variance is a measure of the variation around the mean.
• It measures how far a set of data points are spread out from their mean.
• The units are the square of the units used for the original data.
– For example, a variable measured in meters will have a variance measured in meters squared.
• It is the square of the standard deviation: Variance = s²
Some Formulas
[Formulas shown as images on the slide and not preserved in the text: Mean/Average, Standard Error, Skewness]
Standard Error of Means vs Standard Deviation
• The standard error (SE) of a statistic is the approximate standard deviation of that statistic's sampling distribution.
• The mean and standard deviation are descriptive statistics, whereas the standard error of the mean describes the random sampling process.
• The standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean.
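
The difference between the two quantities can be sketched in Python. The slides' own formula is not preserved, so this assumes the usual estimate of the standard error of the mean, SE = s / √n, where s is the sample standard deviation; the function name and the sample are illustrative.

```python
import math
from statistics import stdev


def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n) (assumed formula)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two values are needed")
    return stdev(sample) / math.sqrt(n)


# Hypothetical sample of 8 observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
s = stdev(sample)                     # spread of individuals around the sample mean
se = standard_error_of_mean(sample)   # uncertainty of the sample mean itself
print(round(s, 3), round(se, 3))
```

Under this formula the standard error shrinks as the sample size grows, while the standard deviation does not depend on n in the same way.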
