C4 Descriptive Statistics

The document discusses descriptive statistics which are methods used to describe characteristics of a data set. Descriptive statistics help make sense of data and allow conclusions to be drawn. Key aspects covered include measures of central tendency (mean, median, mode), measures of spread (range, standard deviation), and measures of shape (skewness, kurtosis). Graphical displays like histograms are also used to describe data.

Uploaded by

NAVANEETH


Descriptive Statistics

Dr. Linta Rose

[email protected]
Descriptive Statistics

 Methods of describing the characteristics of a data set.

 Useful because they allow you to make sense of the data.
 Help you explore the data and draw conclusions about it in order
to make rational decisions.
 Includes calculating things such as the average of the data, its
spread and the shape it produces.
Descriptive Statistics

 Descriptive statistics involves describing, summarizing and
organizing the data so it can be easily understood.
 Graphical displays are often used along with the quantitative
measures to enable clarity of communication.
Describing data
• Qualitative data-
a variable which yields non-numerical data.

– E.g.- education, marital status, eye colour

– Frequency- number of observations falling into a particular class/
category of the qualitative variable.

– Frequency distribution- table listing all classes & their frequencies.

– Graphical representation- Pie chart, Bar graph.
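
A frequency distribution for a qualitative variable can be sketched in a few lines of Python. The eye-colour observations below are made up for illustration and are not taken from the slides.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable (eye colour).
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "brown"]

# Frequency: number of observations falling into each class.
frequency = Counter(eye_colour)

# Frequency distribution: a table of every class with its frequency,
# listed from the most common class to the least common.
for colour, count in frequency.most_common():
    print(f"{colour:<6} {count}")
```

The same counts are what a bar graph or pie chart of this variable would display.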

Describing data
• Quantitative data-
– Can be presented by a frequency distribution.
– If a discrete variable has many different values, or if the variable
is continuous, then the data can be grouped into classes/
categories.

– Class intervals / BINS- together cover the range between the minimum &
maximum values.
– Class limits- end points of a class interval.
– Class frequency- number of observations in the data that belong to
each class interval.

– Usually presented as a Histogram or a Bar graph.
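
Grouping quantitative data into class intervals might look like the sketch below. The measurements, the choice of five equal-width classes, and the rule of placing the maximum value in the last class are all assumptions made for illustration.

```python
# Hypothetical continuous measurements.
values = [2.1, 3.7, 4.4, 5.0, 5.8, 6.3, 7.9, 8.2, 9.6, 11.5]

low, high = min(values), max(values)
n_classes = 5                        # assumed number of class intervals
width = (high - low) / n_classes     # equal-width classes

# Class frequency: count the observations that fall in each interval.
class_frequency = [0] * n_classes
for v in values:
    # The maximum value would land one past the end, so clamp it
    # into the last class.
    index = min(int((v - low) / width), n_classes - 1)
    class_frequency[index] += 1

# Text histogram: one row per class, one '#' per observation.
for i, count in enumerate(class_frequency):
    lower = low + i * width
    upper = lower + width
    print(f"{lower:5.2f} to {upper:5.2f}: {'#' * count}")
```

Changing `n_classes` changes the class limits and therefore the apparent shape of the histogram, which is why the number of bins is usually chosen with some care.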

Frequency Distribution and Histogram
Descriptive Statistics
 When analyzing a graphical display, you can draw conclusions
based on several characteristics of the graph.
 You may ask questions such as:
• Where is the approximate middle, or center, of the graph?
• How spread out are the data values on the graph?
• What is the overall shape of the graph?
• Does it have any interesting patterns?
Descriptive Statistics
The following measures are used to describe a data set:
 Measures of position (also referred to as central tendency or
location measures).
 Measures of spread (also referred to as variability or dispersion
measures).
 Measures of shape.
Descriptive Statistics
 If assignable causes of variation are affecting the process, we
will see changes in:
• Position.
• Spread.
• Shape.
• Any combination of the three.
Properties of Numerical Data & Measures

Central tendency    Dispersion            Shape
Mean                Range                 Skewness
Median              Standard Deviation    Kurtosis
Mode
Descriptive Statistics
Measures of Position:
 Position statistics measure the central tendency of the data.
 Central tendency refers to where the data is centered.
 You may have calculated an average of some kind.
 Despite the common use of average, there are different
statistics by which we can describe the average of a data set:
• Mean.
• Median.
• Mode.
Measures of center
 Central tendency- In any distribution, the majority of the observations
pile up, or cluster, around a particular region.

 Mean- sum of the observed values in a data set divided by the
number of observations.

 Median- observation that divides the ordered data set into two halves.

 Mode- value of the data set which occurs with the greatest frequency.

 Mean & Median can be applied only to Quantitative data.

 Mode can be used for either Qualitative or Quantitative data.

 Outlier- observation that falls far from the rest of the data. The mean
is strongly influenced by outliers.
Descriptive Statistics
Mean:
 The total of all the values divided by the size of the data set.
 It is the most commonly used statistic of position.
 It is easy to understand and calculate.
 It works well when the distribution is symmetric and there are
no outliers.
 The mean of a sample is denoted by ‘x-bar’.
 The mean of a population is denoted by ‘μ’.

[Figure: number line from 0 to 9 with the mean marked]
Descriptive Statistics
Median:
 The middle value where exactly half of the data values are
above it and half are below it.
 Less widely used.
 A useful statistic due to its robustness.
 It can reduce the effect of outliers.
 Often used when the data is nonsymmetrical.
 Ensure that the values are ordered before calculation.
 With an even number of values, the median is the mean of the
two middle values.
[Figure: number line from 0 to 9 with the median and the mean marked]
Descriptive Statistics
Median Calculation:

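Even number of values (10):
23, 33, 34, 36, 38, 40, 41, 41, 44, 45
Median = (38 + 40) / 2 = 39

Odd number of values (9):
12, 30, 31, 37, 38, 40, 41, 41, 44
Median = 38 (the middle value)

Example
1, 2, 1, 1, 3, 4, 100
Mean = 16
Median = 2
Mode = 1

Assume 100 is an outlier and remove it:
Mean = 2
Median = 1.5
Mode = 1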
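
The figures in this worked example can be checked with Python's standard `statistics` module. This is only a verification sketch: the variable names are illustrative, and the two longer lists are the two columns of values from the slide.

```python
import statistics

# Even and odd numbers of values (the two columns on the slide).
even_count = [23, 33, 34, 36, 38, 40, 41, 41, 44, 45]
odd_count = [12, 30, 31, 37, 38, 40, 41, 41, 44]

print(statistics.median(even_count))  # mean of the two middle values, 38 and 40
print(statistics.median(odd_count))   # the single middle value

# Small data set containing one extreme value.
data = [1, 2, 1, 1, 3, 4, 100]
print(statistics.mean(data), statistics.median(data), statistics.mode(data))

# Treat 100 as an outlier, drop it, and recompute.
trimmed = [x for x in data if x != 100]
print(statistics.mean(trimmed), statistics.median(trimmed), statistics.mode(trimmed))
```

Removing the outlier moves the mean from 16 to 2, while the median only moves from 2 to 1.5 and the mode does not change at all.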
Descriptive Statistics
 Why can the mean and median be different?

[Figure: number line from 0 to 9 with the median and the mean marked at different positions]
Descriptive Statistics
Mode:
 The value that occurs the most often in a data set.
 It is rarely used as a central tendency measure.
 It is more useful for distinguishing between unimodal and
multimodal distributions.
• A distribution is multimodal when the data has more than one peak.
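
As a rough illustration, `statistics.multimode` returns every value tied for the highest frequency. It is only a simple proxy for "more than one peak", since it looks at exact ties rather than at the shape of a histogram. The data sets below are hypothetical.

```python
from statistics import multimode

unimodal = [1, 2, 2, 2, 3, 4]
bimodal = [1, 1, 1, 2, 3, 5, 5, 5]

print(multimode(unimodal))  # a single most frequent value
print(multimode(bimodal))   # two values tied for most frequent

# The mode also applies to qualitative data.
marital_status = ["single", "married", "married", "widowed"]
print(multimode(marital_status))
```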
Normal distribution
 Bell shaped symmetric distribution.
 Why is it important?
 Many things are normally distributed, or very close to it.
 It is easy to work with mathematically
 Most inferential statistical methods make use of properties of
the normal distribution.

 Mean = Median = Mode
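
The equality of the three measures can be seen with `statistics.NormalDist`, which models an ideal normal distribution. The parameters below are arbitrary values chosen for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical normal distribution: mean 5.0, standard deviation 1.5.
dist = NormalDist(mu=5.0, sigma=1.5)

# For a normal distribution all three measures of position coincide.
print(dist.mean, dist.median, dist.mode)

# Symmetry: the density is the same at equal distances on either
# side of the mean.
left = dist.pdf(5.0 - 2.0)
right = dist.pdf(5.0 + 2.0)
print(math.isclose(left, right))
```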

Descriptive Statistics
Measures of Spread:
 The Spread refers to how the data deviates from the position
measure.
 It gives an indication of the amount of variation in the process.
• An important indicator of quality.
• Used to control process variability and improve quality.
 All manufacturing and transactional
processes are variable to some degree.
 There are different statistics by which
we can describe the spread of a data set:
• Range.
• Standard deviation.
 Range- difference between the largest observed value in the data set
and the smallest one.
 So, when only the range is considered, a great deal of information
is ignored.

 Standard deviation- a kind of average deviation of the observed values
from the mean of the variable (the square root of the average squared
deviation).
 It is defined using the sample mean, and its value is strongly
affected by a few extreme observations.

 Variance- square of standard deviation

Descriptive Statistics
Standard Deviation:
 The average distance of the data points from their own mean.
 A low standard deviation indicates that the data points are
clustered around the mean.
 A large standard deviation indicates that they are widely
scattered around the mean.
 The standard deviation of a sample is
denoted by ‘s’.
 The standard deviation of a population
is denoted by ‘σ’.
Descriptive Statistics
Standard Deviation:
 Perceived as difficult to understand because it is not easy to
picture what it is.
 It is, however, a more reliable measure of variability than the
range, because it uses every value in the data set.
 Standard deviation is computed as follows:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

s = standard deviation
x̄ (x-bar) = mean
x = values of the data set
n = size of the data set
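
A minimal sketch of the sample standard deviation, computed step by step and then compared with `statistics.stdev`, which also divides by n - 1. The function name `sample_std` and the data are illustrative.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n                                # x-bar
    squared_dev = sum((x - mean) ** 2 for x in values)    # sum of (x - x-bar)^2
    return math.sqrt(squared_dev / (n - 1))               # divide by n - 1, take the root

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
s = sample_std(data)
print(round(s, 4))

# The hand-written version agrees with the standard library.
print(math.isclose(s, statistics.stdev(data)))
```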
Descriptive Statistics
Range:
 The difference between the highest and the lowest values.
 The simplest measure of variability.
 Often denoted by ‘R’.
 It is good enough in many practical cases.
 It does not make full use of the available data.
 It can be misleading when the data is skewed or in the presence
of outliers.
• Just one outlier will increase
the range dramatically.
[Figure: number line from 0 to 9 with the range marked]
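
The sensitivity of the range to a single outlier is easy to demonstrate. The helper name `value_range` is illustrative, and the values reuse the small data set from the median example.

```python
def value_range(values):
    """Range R: the highest value minus the lowest value."""
    return max(values) - min(values)

data = [1, 2, 1, 1, 3, 4]
with_outlier = data + [100]

print(value_range(data))          # range without the outlier
print(value_range(with_outlier))  # one extreme value dominates the range
```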
Descriptive Statistics
Measures of Shape:
 Data can be plotted into a histogram to have a general idea of
its shape, or distribution.
 The shape can reveal a lot of information about the data.
Shape
 Skewness- Lack of symmetry in distribution. It can be interpreted from
frequency distribution.

 Properties-
 Mean, median & mode fall at different points.
 Curve is not symmetrical but stretched more to one side.

 Distribution may be positively or negatively skewed. The limits for
the coefficient of skewness are ± 3.

 Kurtosis- convexity of a curve.

 Gives an idea about the flatness/ peakedness of the curve.
 Gives an idea about how much weight lies in the tails of the
distribution.
Descriptive Statistics
Measures of Shape:
 It may be symmetrical or nonsymmetrical.
 In a symmetrical distribution, the two sides of the distribution
are a mirror image of each other.
 Examples of symmetrical distributions include:
• Uniform.
• Normal.
• Camel-back.
Descriptive Statistics
Measures of Shape:
 The shape helps identify which descriptive statistic is more
appropriate to use in a given situation.
 If the data is symmetrical, then we may use the mean or median
to measure the central tendency as they are almost equal.
 If the data is skewed, then the median will be a more
appropriate measure of the central tendency.
 Two common statistics that measure the shape of the data:
• Skewness.
• Kurtosis.
Descriptive Statistics
Skewness:
 Describes whether the data is distributed symmetrically around
the mean.
 A skewness value of zero indicates perfect symmetry.
 A negative value implies left-skewed data.
 A positive value implies right-skewed data.
[Figure: two dot plots, labelled (+) SK > 0 and (-) SK < 0]
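
The sign convention can be illustrated with the moment coefficient of skewness (third central moment divided by the 1.5 power of the second). This is one common definition and is an assumption here, since the slides do not show which formula they use. The data sets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]            # long tail to the right
left_skewed = [-x for x in right_skewed]      # mirror image, long tail to the left

print(skewness(symmetric))     # zero for perfectly symmetric data
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```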
Descriptive Statistics
Kurtosis:
 Measures the degree of flatness (or peakedness) of the shape.
 When the data values are clustered around the middle, then the
distribution is more peaked.
• A greater kurtosis value.
 When the data values are spread around more evenly, then the
distribution is flatter.
• A smaller kurtosis value.

[Figure: three dot plots, labelled (-) Platykurtic, (0) Mesokurtic, (+) Leptokurtic]
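
The (-), (0) and (+) labels suggest excess kurtosis, which is measured relative to the normal distribution. That reading is an assumption, as is the moment-based formula in the sketch below (fourth central moment over the squared second moment, minus 3). The data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # values spread evenly
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]    # values clustered around the middle

print(excess_kurtosis(flat))    # negative: platykurtic
print(excess_kurtosis(peaked))  # positive: leptokurtic
```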
Descriptive Statistics
Further Information:
 Variance is a measure of the variation around the mean.
 It measures how far a set of data points are spread out from
their mean.
 The units are the square of the units used for the original data.
• For example, a variable measured in meters will have a variance
measured in meters squared.
 It is the square of the standard deviation. Variance = s²
Some Formulas
Mean/Average

Standard Error
Skewness =
Standard Error of Means vs Standard Deviation
 The standard error (SE) of a statistic is the standard deviation
of that statistic's sampling distribution, usually estimated from
a single sample.
 The mean and standard deviation are descriptive statistics,
whereas the standard error of the mean describes the
random sampling process.
 The standard error of the sample mean is an estimate of
how far the sample mean is likely to be from the population
mean, whereas the standard deviation of the sample is the
degree to which individuals within the sample differ from the
sample mean.
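
The distinction can be made concrete with the usual estimate of the standard error of the mean, s / √n. That formula is assumed here because the slide's own formula did not survive. The sample values are illustrative.

```python
import math
import statistics

sample = [23, 33, 34, 36, 38, 40, 41, 41, 44, 45]   # hypothetical sample
n = len(sample)

s = statistics.stdev(sample)   # spread of individuals around the sample mean
sem = s / math.sqrt(n)         # estimated spread of the sample mean itself

print(round(s, 3), round(sem, 3))
```

Because the standard error divides by √n, it shrinks as the sample grows, whereas the standard deviation settles towards the spread of the underlying population.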
