Basic Statistical Description of Data

The document introduces three common measures of central tendency: mean, median, and mode. It provides definitions and examples to explain how to calculate each measure. The mean is the average value and can be impacted by outliers. The median is the middle value when data is sorted. The mode is the most frequent value. It also introduces several measures of dispersion, including range, quartiles, interquartile range, standard deviation, and variance. These measures help describe how spread out the values in a data set are from the central tendency.

Uploaded by

Tarika Saij

Introduction to Measuring the Central Tendency:

Mean, median, and mode are the three most common measures of central tendency. They are descriptive statistics
that summarize the data through a single value (the central value) representing the center point of the
data.

1. Mean
• Mean is the most commonly used measure of central tendency.
• Mean is equal to the sum of all the values divided by the total number of values.
• Mean is also known as the arithmetic average.
• Mean includes all the values in the data.
• Mean is impacted by outlier (extreme) values.
• Mean cannot be used for categorical data.

Practice Example
There are 15 students in a preschool and their age in months is given below. Calculate the mean age of the students.

Mean = (24+37+38+38+36+39+40+37+38+41+40+36+37+37+39 ) /15

Mean = 557 / 15 = 37.13
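The calculation above can be checked with a minimal Python sketch. The variable names (`ages`, `manual_mean`, `library_mean`) are illustrative and not part of the original notes; the values are the 15 ages from the formula.

```python
from statistics import mean

# Ages in months of the 15 preschool students, copied from the formula above.
ages = [24, 37, 38, 38, 36, 39, 40, 37, 38, 41, 40, 36, 37, 37, 39]

# Mean = sum of all values divided by the number of values.
manual_mean = sum(ages) / len(ages)

# The standard library's mean() should agree with the manual calculation.
library_mean = mean(ages)

print(sum(ages), len(ages))    # 557 15
print(round(manual_mean, 2))   # 37.13
```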

Interpretation: On average, the children in this sample are sent to preschool at around 37 months of age.
Histogram: A histogram is a commonly used chart for depicting the distribution of a numerical variable. The histogram plot of
the age of the students is shown below:

From the histogram plot, we can observe that:

• Most of the data points are distributed around the mean age (37).
• The age value 24 is a potential outlier. (Outliers are values that are extreme and lie far away from the
central tendency.)
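As a rough stand-in for the histogram, the following sketch prints a text histogram of the same ages. The bin width of one month and the `#` markers are assumptions made for illustration; the original plot may have used different bins.

```python
from collections import Counter

# Ages in months of the 15 preschool students.
ages = [24, 37, 38, 38, 36, 39, 40, 37, 38, 41, 40, 36, 37, 37, 39]

# Count how often each age occurs (one bin per month, an assumed bin width).
counts = Counter(ages)

# Print one row per month between the minimum and maximum so that gaps,
# such as the empty stretch between 24 and 36, stay visible.
for age in range(min(ages), max(ages) + 1):
    print(f"{age:>3} | {'#' * counts[age]}")
```

The long run of empty rows between 24 and 36 is what makes 24 stand out as a potential outlier.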

1.1 Truncated Mean or Trimmed Mean

• A truncated (trimmed) mean is the mean obtained after trimming off values at the high and low extremes.
• Example: In a 5% trimmed mean, the mean is computed after removing the highest 5% and the lowest 5% of the values from
the data sample.

Practice Example
Let us remove the highest 5% and the lowest 5% of the values from the data above. Trimming 5% of 15 values means removing
0.75 observations, which is rounded up to 1 observation from each extreme. The sorted data is shown below:

24 36 36 37 37 37 37 38 38 38 39 39 40 40 41

The values 24 and 41 at the two extremes are trimmed.

Trimmed Mean = (36+36+37+37+37+37+38+38+38+39+39+40+40 ) /13

Trimmed Mean = 37.85
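The trimming steps can be sketched as below. The helper `trimmed_mean` is a hypothetical name, and it rounds the number of trimmed observations up, following the example above (0.75 becomes 1); other tools may round down instead, so results can differ.

```python
import math

def trimmed_mean(values, proportion):
    """Mean after dropping ceil(proportion * n) values from each extreme."""
    ordered = sorted(values)
    # Rounding up mirrors the worked example: 0.05 * 15 = 0.75 -> 1.
    k = math.ceil(proportion * len(ordered))
    if 2 * k >= len(ordered):
        raise ValueError("trimming would remove every observation")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

ages = [24, 37, 38, 38, 36, 39, 40, 37, 38, 41, 40, 36, 37, 37, 39]
result = trimmed_mean(ages, 0.05)
print(round(result, 2))  # 37.85
```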

1.2 Weighted Mean

• In a simple mean, we give equal weight to each value.
• However, there may be instances where some observations must be given more weight
than others in computing the mean; the result is called the weighted mean.
• The weighted mean is calculated as:

Weighted Mean = (w1*x1 + w2*x2 + ... + wn*xn) / (w1 + w2 + ... + wn)

where xi is the i-th value and wi is its weight.

Example
The sample data of the 15 students could also have been presented as a frequency table, as shown below. To compute the mean age, we
weight each age value by its frequency of occurrence; the mean so computed is the weighted mean.

Age 24 36 37 38 39 40 41

Frequency 1 2 4 3 2 2 1

Weighted Mean = ((24*1) + (36*2) + (37*4) + (38*3) + (39*2) + (40*2) + (41*1)) / (1+2+4+3+2+2+1)
Weighted Mean = 557 / 15 = 37.13
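A short sketch of the same weighted mean, using the frequencies as weights. The list names are illustrative only.

```python
# Distinct age values and how often each occurs (the frequency table above).
age_values = [24, 36, 37, 38, 39, 40, 41]
frequencies = [1, 2, 4, 3, 2, 2, 1]

# Numerator: each value multiplied by its weight, then summed.
weighted_sum = sum(a * f for a, f in zip(age_values, frequencies))

# Denominator: the sum of the weights.
total_weight = sum(frequencies)

weighted_mean = weighted_sum / total_weight
print(weighted_sum, total_weight, round(weighted_mean, 2))  # 557 15 37.13
```

Because the weights here are frequencies, the result matches the simple mean of the 15 raw values.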

2. Median
• Median is the middle value of the data when the observations are sorted (in ascending or descending order).
• When the data is sorted, the median splits it into two equal halves (upper and lower
halves).
• The percentile rank of the median = 50%
• When sorted,
o If the number of observations (n) is odd, then the median is the value of the middle observation at
position (n + 1) / 2.
o Else, if the number of observations (n) is even, then the median is the mean of the two middle-most
values at positions n/2 and (n/2) + 1.

Example

There are 15 students in a preschool and their age in months is given below. Calculate the median:

• To find the median, first we sort the values in ascending (or descending) order:
24 36 36 37 37 37 37 38 38 38 39 39 40 40 41
• As n = 15 (n is odd), the median is the value at the 8th position [(15 + 1)/2 = 8].
• The value at the 8th position is 38, therefore Median = 38

Interpretation
At least half of the students in the preschool are aged 38 months or younger, and at least half are aged 38 months or older.
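The odd and even rules for the median can be sketched as follows. The function name `median_by_position` is hypothetical, and the four-value list used for the even case is a made-up illustration, not data from the notes.

```python
def median_by_position(values):
    """Median using the position rules for odd and even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value at position (n + 1) / 2, counting from 1.
        return ordered[mid]
    # Even n: the mean of the values at positions n/2 and (n/2) + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [24, 37, 38, 38, 36, 39, 40, 37, 38, 41, 40, 36, 37, 37, 39]
print(median_by_position(ages))          # 38
print(median_by_position([1, 2, 3, 4]))  # 2.5
```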

3. Mode
• The most frequently occurring value in the data is called the mode.
• We can use the mode as the measure of central tendency for both categorical and numerical variables.
• A data distribution can have more than one mode.

Example

The age in months of 15 students from a preschool is given in the table below. Compute the mode.
Let’s create a frequency distribution table for the data.

Value 24 36 37 38 39 40 41

Frequency 1 2 4 3 2 2 1

• The value 37 appears the maximum number of times (four times) in the data distribution.
• Hence, Mode = 37
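The frequency table and the mode can be reproduced with a small sketch based on `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

# Ages in months of the 15 preschool students.
ages = [24, 37, 38, 38, 36, 39, 40, 37, 38, 41, 40, 36, 37, 37, 39]

# Frequency distribution: value -> number of occurrences.
frequency = Counter(ages)
for value in sorted(frequency):
    print(value, frequency[value])

# The mode is the value with the highest frequency.
mode_value, mode_count = frequency.most_common(1)[0]
print(mode_value, mode_count)  # 37 4
```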

Types of Mode

• Unimodal: There is only one mode in the data distribution, e.g., x = 1,2,2,3 (mode = 2).
• Bimodal: There are two modes in the data distribution, e.g., x = 1,2,2,3,3,4 (mode = [2,3]).
• Trimodal: There are three modes in the data distribution, e.g., x = 1,2,2,3,3,4,4,5 (mode = [2,3,4]).
• Multimodal: There are more than three modes in the data distribution, e.g., x = 1,2,2,3,3,4,4,5,5,6 (mode =
[2,3,4,5]). More generally, any distribution with two or more modes is also called multimodal.
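The four examples above can be checked with `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. The dictionary labels are just names chosen for this sketch.

```python
from statistics import multimode

# The example data sets from the list above.
samples = {
    "unimodal": [1, 2, 2, 3],
    "bimodal": [1, 2, 2, 3, 3, 4],
    "trimodal": [1, 2, 2, 3, 3, 4, 4, 5],
    "multimodal": [1, 2, 2, 3, 3, 4, 4, 5, 5, 6],
}

# multimode() lists all modes in the order they first appear in the data.
modes = {label: multimode(values) for label, values in samples.items()}

for label, found in modes.items():
    print(label, found, len(found))
```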

Measuring the dispersion of data:

4. Range

In statistics, the range is one of the most common measures of dispersion. It is the difference between the largest and
the smallest observation in the data distribution. The range has the same unit as the data variable.
Formula: For the values of X, the range is

Range = Largest Value of X – Smallest Value of X

Example: The sample age data (in years) of the 15 students of the Data Science Executive Course is given below.
Calculate the range of the age of the students.

22 23 25 27 28 35 32 28 30 40 24 26 27 29 31

Solution: Sort the values in ascending order. The difference between the maximum and the minimum is the range.

22 23 24 25 26 27 27 28 28 29 30 31 32 35 40

Range = 40 – 22 = 18 (i.e., the maximum observed spread in the data is 18 years)
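A minimal sketch of the range calculation follows. It assumes the 15 ages are the ones listed in the mean calculation of the standard deviation example in this document; the variable names are illustrative.

```python
# Ages in years of the 15 course students (assumed to be the same sample
# that the standard deviation example uses).
ages_years = [22, 23, 25, 27, 28, 35, 32, 28, 30, 40, 24, 26, 27, 29, 31]

# Range = largest value minus smallest value; sorting is not strictly needed.
smallest = min(ages_years)
largest = max(ages_years)
data_range = largest - smallest

print(largest, smallest, data_range)  # 40 22 18
```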

5. Quartiles

Quartiles divide the rank-ordered data distribution into four equal parts. The three values that separate the parts are called the
first, second, and third quartiles.
• First Quartile (Q1): It is the median of the lower half of the data distribution (25th percentile)
• Second Quartile (Q2): It is the median of the entire data distribution (50th percentile)
• Third Quartile (Q3): It is the median of the upper half of the data distribution (75th percentile)

Example
We will use the small start-up example having 10 employees as discussed earlier. The monthly salary of the
employees is given in the table below. Find the quartiles and inter-quartile range of the salary.

Emp. No. 1 2 3 4 5 6 7 8 9 10
Monthly Salary (k) 90 80 18 18 17 16 16 16 15 14

Second Quartile
Let us first calculate the second quartile (median).
• Sort the values in ascending order: 14, 15, 16, 16, 16, 17, 18, 18, 80, 90

• The number of observations is n = 10 (even), therefore Q2 is the mean of the (n/2)th and ((n/2) + 1)th observations.
• Q2 (median) = (1/2) * (5th observation + 6th observation)
Q2 = (16 + 17) / 2
Q2 = 16.5

First Quartile
Now, let’s calculate the first quartile (Q1).
• Q2 is the median. It splits the dataset into the upper and lower halves of the distribution.
• Q1 is the median of the lower half of the distribution (14, 15, 16, 16, 16). The number of observations is 5, which is
odd. As such, Q1 is the value at the 3rd position, (n + 1) / 2.
• Q1 = value of the 3rd observation
Q1 = 16

Third Quartile
• Q3 is the median of the upper half of the distribution (17, 18, 18, 80, 90). The number of observations in the upper half
is also 5. As such, Q3 is the value at the 3rd position in the upper half of the data.
• Therefore Q3 = 18
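The median-of-halves procedure used above can be sketched as follows. The function name `quartiles_by_halves` is hypothetical, and leaving the middle value out of both halves when n is odd is an assumed convention that the notes do not specify.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, upper half."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

salaries = [90, 80, 18, 18, 17, 16, 16, 16, 15, 14]
q1, q2, q3 = quartiles_by_halves(salaries)
print(q1, q2, q3)  # 16 16.5 18
```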

6. Interquartile Range
Interquartile Range (IQR) is the range of the middle 50% of the values in the data distribution. It is the difference
between the third quartile (Q3) and the first quartile (Q1).
• Formula:

IQR = Q3 – Q1

Interquartile Range (IQR) – Example
• The three quartiles that divide the salary data into four equal parts are:
• Q1 = 16; Q2 = 16.5; Q3 = 18

• IQR = Q3 – Q1 = 18 – 16
• IQR = 2
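As a cross-check, the sketch below computes the IQR with `statistics.quantiles` (Python 3.8+). The choice of `method="inclusive"` is an assumption; on this data it happens to reproduce the quartiles found by hand, whereas the default `"exclusive"` method interpolates differently and may give other values.

```python
from statistics import quantiles

# Monthly salaries (k) of the 10 employees.
salaries = [90, 80, 18, 18, 17, 16, 16, 16, 15, 14]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(salaries, n=4, method="inclusive")

# IQR = Q3 - Q1, the spread of the middle 50% of the values.
iqr = q3 - q1
print(q1, q3, iqr)  # 16.0 18.0 2.0
```

Note that the two extreme salaries (80 and 90) do not affect the IQR here, unlike the range.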

7. Standard Deviation
Standard deviation is often denoted by SD, the Greek letter σ, or the Latin letter ‘s’. SD or σ is used
for the population standard deviation and ‘s’ is used for the sample standard deviation.
• Extreme values and outliers will impact the standard deviation.
• Standard deviation can be zero (if all the values in the variable are the same).

Formula

Population: σ = sqrt( Σ (xi – μ)² / N ), where μ is the population mean and N is the population size
Sample: s = sqrt( Σ (xi – x̄)² / (n – 1) ), where x̄ is the sample mean and n is the sample size

Standard Deviation – Example

We will now take another example to explain the calculation of standard deviation.
Calculate the standard deviation for the sample age data of the 15 students from Data Science Certification Program
as given below.
Calculations
STEP – 1: Calculate the Mean
• The mean age of the 15 Data Science Executive Course students is,
• Mean (Age) = (22 + 23 + 25 + 27 + 28 + 35 + 32 + 28 + 30 + 40 + 24 + 26 + 27 + 29 + 31) / 15 = 427 / 15
• Mean (Age) = 28.47

STEP – 2: Calculate the Standard Deviation
• Let X be the age of the sample of 15 Data Science Executive Course students. Then the standard deviation of sample X is,
s = sqrt( Σ (xi – x̄)² / (n – 1) )
• Let us calculate the standard deviation.
• The total number of observations, n = 15. Hence,
s = sqrt( 311.73 / 14 ) = sqrt( 22.27 ) = 4.72 (using the sample formula with n – 1 in the denominator)
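The two steps can be sketched in Python as below. Since the notes do not show which denominator the original calculation used, the sketch computes both the sample version (dividing by n − 1) and the population version (dividing by n); the variable names are illustrative.

```python
import math

# Ages in years of the 15 students, from the mean calculation above.
ages = [22, 23, 25, 27, 28, 35, 32, 28, 30, 40, 24, 26, 27, 29, 31]

# STEP 1: the mean.
n = len(ages)
mean_age = sum(ages) / n

# STEP 2: sum of squared deviations from the mean.
squared_deviations = sum((x - mean_age) ** 2 for x in ages)

# Sample standard deviation 's' divides by n - 1 ...
sample_sd = math.sqrt(squared_deviations / (n - 1))
# ... while the population standard deviation divides by n.
population_sd = math.sqrt(squared_deviations / n)

print(round(mean_age, 2))       # 28.47
print(round(sample_sd, 2))      # 4.72
print(round(population_sd, 2))  # 4.56
```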

8. Variance
Variance is the square of the standard deviation. Being a squared term, it is non-negative.
Standard deviation is often preferred over variance because it is expressed in the same units as the data and can therefore be compared directly with the mean.
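The relationship between variance and standard deviation can be illustrated with the standard library. This sketch assumes the sample versions (`variance` and `stdev`, which divide by n − 1) and reuses the 15 student ages from the standard deviation example.

```python
from statistics import stdev, variance

# Ages in years of the 15 students from the standard deviation example.
ages = [22, 23, 25, 27, 28, 35, 32, 28, 30, 40, 24, 26, 27, 29, 31]

# Sample variance (in squared years) and sample standard deviation (in years).
sample_variance = variance(ages)
sample_sd = stdev(ages)

print(round(sample_variance, 2))  # 22.27
print(round(sample_sd, 2))        # 4.72
```

The variance is in squared years, which is why the standard deviation (in years) is easier to set beside the mean of 28.47 years.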
Graphic display of basic statistical description of data:

Bar Plot
Variable Type: Only one categorical variable, or one categorical variable and one continuous measure.
Description: A bar plot is a chart that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. Visually represents frequency distribution.

Stacked Bar Plot
Variable Type: Two or more categorical variables.
Description: A stacked bar chart, also known as a stacked bar graph, is a graph that is used to break down a category by another category and compare parts of a whole. Each bar in the chart represents one category as a whole, and segments in the bar represent different parts or categories of that whole. Visually represents cross-tabulation data.

Histogram
Variable Type: Only one continuous variable.
Description: A histogram is an approximate representation of the distribution of numerical data. It is created by converting a continuous variable into a categorical one by binning/bucketing it.

Distribution Plot (Density Plot)
Variable Type: Only one continuous variable.
Description: It is a smoothed version of the histogram. Visually shows skewness in data.

Box Plot (Box and Whisker Plot)
Variable Type: Only one continuous variable, or one continuous and one categorical variable.
Description: The box plot is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. Quickly helps find outliers in data.

Line Plot
Variable Type: One of the dimensions has to be time and the second dimension a continuous variable.
Description: A line plot is a type of chart that displays information as a series of data points called ‘markers’ connected by straight line segments. Visually shows trends in time series data.

Scatter Plot
Variable Type: Two continuous variables.
Description: A graph in which the values of two variables are plotted along two axes. The pattern of the resulting points on the plot visually depicts the existence of correlation between the two variables. Quickly helps find correlation.

Pie Chart
Variable Type: One categorical variable associated with a continuous measure.
Description: A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportions. Quickly helps compare parts of a whole.
