Data Collection and Display

This document discusses methods for collecting and displaying data. It begins by reviewing learning outcomes from a previous topic on data collection and presentation. It then discusses measures of central tendency like mean, median, and mode to summarize data. It explains how to calculate the mean and introduces the concept of grouped data. It describes how frequency distributions can be used to represent grouped data and how the mean can be estimated from grouped data by using midpoints of intervals. The document emphasizes that grouping data simplifies large datasets but may reduce accuracy, so choosing appropriate interval widths is important.

Uploaded by

kemigishapatience369

Data collection/display

Theme: Data and probability.

Topic: Data collection/Display.
Competency: The learner collects and represents different sorts of data.

Introduction.
Since the dawn of time, human beings have asked fundamental questions: Who are we? Why are
we here? Are we alone in the universe? Do aliens exist? Why haven't aliens visited Earth yet? Is there life
after death? We cannot answer any of these questions here, so in this topic we will instead delve into the
various statistical measures and diagrams used to summarise and display data.
Recall that about two years ago we covered a topic called data collection and presentation, in which we
achieved the following learning outcomes:
The learner should be able to:

• understand the differences between types of data. (k, u)

• collect and represent simple data from the local environment using a tally chart, bar chart (bars do
not touch), pie chart and line graph. (k, u, s, v/a)
In this topic, we will aim to achieve the following learning outcomes.
The learner should be able to:

• understand mode, mean and median as measures of location/central tendency, and know how to
find them and when to use them. (k, u, s)
• understand range as a measure of dispersion/spread and know how to find it. (k, u, s)
• draw and use frequency tables for ungrouped data. (u, s)
• draw and use frequency tables for grouped data. (u, s)
• estimate measures of location and dispersion for grouped data. (u, s)
• calculate the mean using an assumed mean. (u, s)
• draw a histogram with equal class intervals and use it to estimate the mode. (u, s)
• draw a cumulative frequency curve (ogive) and use it to estimate the median. (u, s, v/a)

Measures of central tendency.

Measures of central tendency are statistical measures that give us a single value to describe the center or
typical value of a dataset. The three main measures of central tendency are the mean, median, and mode.
These measures are essential for summarizing data and understanding its central characteristics. They
have various applications in different fields, from understanding the average income in a population to
analyzing test scores in education or sales figures in business.
Mean
The mean is the average of a set of numbers. To find the mean, you add up all the values in the dataset
and then divide by the total number of values. For example, consider the set of numbers 5, 8, 10, 12
and 15. The mean is calculated as
$$\frac{5 + 8 + 10 + 12 + 15}{5} = \frac{50}{5} = 10.$$

However, it's important to note that the mean can be influenced by extreme values (outliers) in the dataset,
which can skew its value, making it less representative of the typical value in some cases.
In general, the mean of the numbers $x_1, x_2, x_3, \ldots, x_n$ is denoted by $\bar{x}$ and is given by:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
A shorthand way of writing $x_1 + x_2 + x_3 + \cdots + x_n$ is:
$$\sum_{i=1}^{n} x_i$$
which is read as "the summation from $i = 1$ up to $n$ of $x_i$".

The symbol $\Sigma$ (the Greek capital letter sigma) is used to denote 'the sum of'.

So we can write $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$.

This notation is rather cumbersome, so the subscript $i$ is usually omitted.

For discrete data, $\bar{x} = \dfrac{\sum x}{n}$.

For discrete data in a frequency distribution, $\bar{x} = \dfrac{\sum fx}{\sum f}$.
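
The two mean formulas can be tried out in a few lines of Python. This is only a minimal sketch: the function names and the small frequency table are made up for illustration and are not taken from this handout.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def mean_from_frequencies(freq_table):
    """Mean of a frequency distribution: sum(f * x) / sum(f).

    freq_table maps each value x to its frequency f.
    """
    total_fx = sum(x * f for x, f in freq_table.items())
    total_f = sum(freq_table.values())
    return total_fx / total_f


# The five numbers used in the worked example in the text.
print(mean([5, 8, 10, 12, 15]))  # 10.0

# Hypothetical frequency table (value -> frequency), for illustration only.
illustrative_table = {0: 3, 1: 5, 2: 2}
print(mean_from_frequencies(illustrative_table))  # 0.9
```

In the second call, the sum of fx is 0×3 + 1×5 + 2×2 = 9 and the sum of f is 10, which gives 0.9.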

Learner’s activity.
1. To obtain grade A, Ben must achieve an average of at least 70 in five tests. If his average mark for
four tests is 68, what is the lowest mark he can score in his fifth test and still obtain grade A?
2. The members of an orchestra were asked how many instruments they could play and the
following results were obtained.

2 5 2 4 1 1 1 2 1 3

3 2 1 2 1 1 2 4 3 2

1 2 3 1 4 2 3 1 1 2
Find the mean number of instruments played.

Grouping data
Grouped data refers to a method of organizing and presenting numerical information in classes or
intervals. It's often used when dealing with large sets of data to make it more manageable and
easier to analyze. Instead of presenting individual values, grouped data organizes values into
ranges or groups, allowing for a more concise representation of the dataset.

For instance, imagine you have a dataset of ages of people in a town. Instead of listing each
individual age, you might group the ages into intervals like 0-10, 11-20, 21-30, and so on. Here
the lower class limits are 0, 11, 21 and so on and the upper class limits are 10, 20, 30 and so on.
This grouping helps in drawing conclusions and insights without overwhelming detail.

Grouped data is commonly represented using frequency distributions. A frequency distribution
displays the number of observations within each group or interval. This information can be
presented through tables, histograms, or frequency polygons.

When working with grouped data, it's crucial to consider the width or size of the intervals. The
choice of interval width can impact the insights drawn from the data. If intervals are too wide,
valuable information might be lost. Conversely, if intervals are too narrow, the data might become
too detailed, complicating analysis.

Analyzing grouped data involves various statistical measures and techniques. Measures such as
the mean, median, mode, and standard deviation can still be approximated from grouped data
using assumptions about the distribution within each interval. However, this approximation might
introduce some level of error compared to analyzing raw, individual data points.

Grouped data simplifies the representation of large datasets, making it easier to interpret and draw
conclusions. Still, it's essential to strike a balance between simplification and retaining enough
detail to ensure accurate analysis and decision-making.

Mean of grouped data

Once grouped, exact statistics (mean, median, mode, range) cannot be determined because when
data is grouped into intervals the actual values of the readings are no longer known. Therefore, we
can only calculate an estimate of the mean of grouped data. To do this we take the midpoint of the
interval as the representative of all the readings in the interval.
$$\text{midpoint of interval} = \frac{\text{upper class limit} + \text{lower class limit}}{2}$$
or
$$\text{midpoint of interval} = \frac{\text{upper class boundary} + \text{lower class boundary}}{2}$$
It is apparent but nevertheless noteworthy that:
$$\text{class width} = \text{upper class boundary} - \text{lower class boundary}$$
Class boundaries are used in statistics to define the limits of each interval or class in grouped
data. Unlike class limits, which represent the actual data values, class boundaries are located
midway between the upper limit of one class and the lower limit of the next. The purpose of class
boundaries is to avoid ambiguity and ensure clarity in defining the intervals.

Here's an example:

Let's say you have the following grouped data representing the heights (cm) of a group of people:

150 - 160
161 - 170
171 - 180
In this case, the class boundaries would be determined by taking the midpoint between the upper
limit of one class and the lower limit of the next. For the first interval (150 - 160) and the second
interval (161 - 170), the class boundaries would be:
Class 1: 149.5 - 160.5
Class 2: 160.5 - 170.5
This way, the class boundaries help in avoiding confusion about which values belong to which
interval. The use of class boundaries becomes particularly important when dealing with
continuous data, as it ensures that each data point falls unambiguously into a specific interval.
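
The class boundaries and midpoints described above, and the estimated mean they lead to, can be sketched in Python. The helper names are hypothetical, the code assumes the same gap between every pair of adjacent classes, and the frequencies are invented for illustration because the height example gives none.

```python
def class_boundaries(limits):
    """Turn class limits [(lower, upper), ...] into class boundaries.

    Each boundary lies midway between the upper limit of one class and the
    lower limit of the next. Assumes the gap is the same between all classes.
    """
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]


def midpoint(lower, upper):
    """Midpoint of an interval: (upper + lower) / 2."""
    return (lower + upper) / 2


def estimated_mean(limits, frequencies):
    """Estimate the mean of grouped data as sum(f * midpoint) / sum(f)."""
    midpoints = [midpoint(lower, upper) for lower, upper in limits]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


heights = [(150, 160), (161, 170), (171, 180)]  # class limits from the text
print(class_boundaries(heights))
# [(149.5, 160.5), (160.5, 170.5), (170.5, 180.5)]

made_up_frequencies = [4, 10, 6]  # hypothetical, for illustration only
print(estimated_mean(heights, made_up_frequencies))  # 166.4
```

The midpoints here are 155, 165.5 and 175.5, so the estimate is (4×155 + 10×165.5 + 6×175.5) ÷ 20 = 166.4.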

Calculating mean using assumed mean.

Calculating the mean using the assumed mean is a method employed in statistics to determine the
average of a set of values. The assumed mean serves as a starting point for the calculation,
simplifying the process and making it more efficient.
Assumed Mean Method Steps:
• Choose an Assumed Mean: Select a value that is reasonably close to the expected mean
of the dataset. This value simplifies calculations by reducing the arithmetic involved in
finding the deviations of individual data points from the mean.
• Calculate Deviations: Find the differences between each data point and the chosen
assumed mean.
• Calculate Mean: Use the formula to find the mean based on the assumed mean and the
deviations.
The formula for calculating the mean of $n$ observations using the assumed mean is as follows:
$$\text{mean} = \text{assumed mean} + \frac{\sum (x - \text{assumed mean})}{n}$$
or
$$\text{mean} = A + \frac{\sum d}{n}$$
where $A$ is the assumed mean and $d = x - A$; $d$ is called the deviation.
For data in a grouped frequency distribution, the formula below is used:
$$\text{mean} = A + \frac{\sum fd}{\sum f}$$
where $d = x - A$ and $x$ is the midpoint of the interval.
Advantages and Considerations:
• Simplification: Reduces computational effort, especially with large datasets.
• Speed: Faster calculations, particularly useful when doing manual calculations.
• Choice of Assumed Value: The result does not depend on the value chosen for the assumed
mean, because A + (∑d)/n is algebraically equal to (∑x)/n for any A. Choosing a value close
to the actual mean simply keeps the deviations small and the arithmetic easy.
The assumed mean method is a handy technique for finding the mean of a dataset quickly.
It is particularly beneficial when the values are large or the dataset is extensive,
saving time and effort in manual calculations. For grouped data the answer is still an estimate,
but only because midpoints stand in for the actual readings, not because of the choice of
assumed mean.
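
The assumed-mean formulas can be checked numerically. The Python sketch below uses hypothetical function names, reuses the exam scores (85, 78, 92, 88, 95) that appear in this handout, and adds a small made-up grouped table. It suggests that different assumed means lead to the same answer, with only the size of the deviations changing.

```python
def mean_by_assumed_mean(values, assumed):
    """mean = A + sum(d) / n, where d = x - A."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)


def grouped_mean_by_assumed_mean(midpoints, frequencies, assumed):
    """mean = A + sum(f * d) / sum(f), where d = midpoint - A."""
    total_fd = sum(f * (x - assumed) for x, f in zip(midpoints, frequencies))
    return assumed + total_fd / sum(frequencies)


scores = [85, 78, 92, 88, 95]
for assumed in (80, 88, 90):
    # Prints 87.6 for every choice of assumed mean.
    print(assumed, f"{mean_by_assumed_mean(scores, assumed):.1f}")

# Hypothetical grouped data: class midpoints and their frequencies.
made_up_midpoints = [5, 15, 25]
made_up_frequencies = [2, 5, 3]
print(grouped_mean_by_assumed_mean(made_up_midpoints, made_up_frequencies, 15))  # 16.0
```

With A = 80 the deviations are 5, −2, 12, 8 and 15, which sum to 38, so the mean is 80 + 38 ÷ 5 = 87.6, the same as 438 ÷ 5.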

Here are some applications of the mean:

1. Descriptive Statistics: The mean is often used as a descriptive measure to summarize a
dataset. For instance, in a set of exam scores (85, 78, 92, 88, 95), the mean score is calculated
by adding all the scores together and dividing by the number of scores:
$$\frac{85 + 78 + 92 + 88 + 95}{5} = \frac{438}{5} = 87.6.$$
2. Population Analysis: It's utilized to understand characteristics of populations. For example,
the mean income of a population can provide insights into its economic status. If you have the
incomes of all individuals in a town, summing them up and dividing by the number of people
gives you the mean income.
3. Sample Analysis: In statistical sampling, the mean is used to estimate population parameters.
For instance, a researcher might collect data from a sample of voters to estimate the average
age of voters in a country. The mean age of the sample can provide an estimate of the
population mean.
4. Comparative Analysis: Mean is employed to compare different groups. For instance,
comparing the average monthly sales between different regions or the average test scores of
students in different schools can be done using means.
5. Time-Series Analysis: Mean is used to understand trends over time. In financial analysis, the
moving average, which is a type of mean, is employed to understand stock price trends over a
period.
6. Quality Control: In manufacturing, mean measurements of parts or products are used to
ensure consistency and quality. For example, a company producing screws might measure the
mean length of screws to ensure they meet specific standards.
It's important to note that while the mean is a valuable statistic, it can be influenced by extreme
values (outliers), potentially skewing the interpretation of the data. For instance, in a dataset of
salaries where most employees earn around $50,000, but a few executives earn millions, the mean
salary would be skewed higher due to these outliers.

Therefore, it's often helpful to use other measures of central tendency (like the median or mode)
alongside the mean to gain a more comprehensive understanding of the dataset.

Learner’s activity

1. Thirty bulbs were life-tested and their lifespans, to the nearest hour, are as follows:

167 171 179 167 171 165 175 179 169 171
177 169 171 177 173 165 175 167 174 177
172 164 175 179 179 174 174 168 171 168
a) Find the mean lifespan by dividing the sum of the lifespans by 30.
b) Estimate the mean lifespan by grouping the data using class intervals 164 – 166, 167 – 169, and
so on.
c) Comment on your answers in (a) and (b) above.

2. The following table shows the distribution of marks of some students who took part in a
science quiz.

Marks Tally class boundaries Frequency

56 – 60 //// //
61 – 65 //// //
66 – 70 ////
71 – 75 //// ////
76 – 80 ////
81 – 85 ////
86 – 90 //
91 – 95 ///
96 – 100 ///

a) Copy and complete the table.

b) Calculate the mean mark.

3. The lengths, in mm, of 48 rubber tree leaves are given below.

137 152 127 147 141 157 132 153 166 147 136 134
146 142 162 169 149 135 166 157 141 146 147 148
163 133 148 150 136 127 162 152 143 138 142 153
145 154 144 126 139 126 158 147 136 144 159 161
a) Construct a grouped frequency distribution table with intervals of class width 5mm starting with
the interval 125−129.
b) Calculate the mean length.

4. In an examination taken by 400 students, the scores were as shown in the following
distribution table. Find the mean mark.

Marks Frequency

1 – 10 8
11 – 20 14
21 – 30 32
31 – 40 56
41 – 50 102
51 – 60 80
61 – 70 54
71 – 80 30
81 – 90 16
91 – 100 8

5. The marks scored in an IQ test by 500 six year old children are given in the following table:
Marks Number of children
60−79 81
80 − 99 103
100 − 119 127
120 − 139 99
140 − 159 90

Using an assumed mean of 110, calculate the mean mark.

Mode.
In statistics, the mode is the value that appears most frequently in a data set. Unlike the mean (average) or
median (middle value), the mode is the most frequently occurring observation. A dataset can have one
mode (unimodal), two modes (bimodal), or more than two modes (multimodal).
Let's consider a few examples to illustrate the concept of mode:
Example 1: Unimodal Data
Suppose you have the following set of exam scores:
85, 92, 75, 92, 68, 92, 75, 80, 85, 92, 75, 92, 68, 92, 75, 80.
In this case, the mode is 92 because it appears more frequently (six times) than any other score.
Example 2: Bimodal Data
Consider a dataset of 14 daily temperature readings (°F):
68, 72, 70, 72, 68, 75, 78, 68, 72, 70, 72, 68, 75, 78.
Here, both 68°F and 72°F appear four times, more often than any other value, making the dataset bimodal.

Example 3: Multimodal Data

Imagine a survey asking people about their favorite colors: Red, Blue, Green, Blue, Yellow, Red, Red,
Green, Blue, Red, Blue, Green, Blue, Yellow, Red, Red, Green, Blue
In this data, Red and Blue each appear six times, Green four times and Yellow twice, so the modes are
Red and Blue. Had Green also appeared six times, there would be three modes and the dataset would be
multimodal.
Example 4: No Mode (No clear highest frequency)
Consider a dataset of ages in a classroom:
15, 16, 17, 18, 19, 20, 15, 16, 17, 18, 19, 20.
In this scenario, there's no mode since each value appears exactly twice, so no value has a higher
frequency than the others.
The mode is particularly useful when dealing with categorical or discrete data, such as colors, names, or
categories, but it can be applied to continuous data as well, though less commonly due to the potential for
each value to be unique.
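
For a small dataset, a tally confirms the mode directly. The Python sketch below uses `collections.Counter` from the standard library on the data from Examples 1, 2 and 4. The helper name `modes`, and the convention of returning an empty list when every value occurs equally often, are assumptions made for illustration.

```python
from collections import Counter


def modes(data):
    """Return (list of modes, highest frequency).

    The list is empty when more than one distinct value exists and all of
    them occur equally often, i.e. there is no mode.
    """
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return [], top
    return sorted(value for value, c in counts.items() if c == top), top


exam_scores = [85, 92, 75, 92, 68, 92, 75, 80, 85, 92, 75, 92, 68, 92, 75, 80]
temperatures = [68, 72, 70, 72, 68, 75, 78, 68, 72, 70, 72, 68, 75, 78]
ages = [15, 16, 17, 18, 19, 20, 15, 16, 17, 18, 19, 20]

print(modes(exam_scores))   # ([92], 6)
print(modes(temperatures))  # ([68, 72], 4)
print(modes(ages))          # ([], 2)
```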
Mode is particularly useful in various fields for different purposes:

• Descriptive Statistics: Mode helps describe the central tendency of a dataset, especially in
scenarios where identifying the most common value is crucial. For example:
In a survey asking people their favorite color, the mode would indicate the color most preferred by the
respondents.
In a classroom, the mode of test scores could identify the most common score achieved by students.

• Business and Economics: Modes are used in analyzing sales data, market trends, and consumer
behavior.
Identifying the most popular product sold in a month can help businesses strategize and manage their
inventory efficiently.
In income distribution studies, the mode income can indicate the salary range where most individuals fall.

• Healthcare and Medicine: In medical research, the mode can be used to analyze patient data.

Identifying the most prevalent blood type within a specific population or region aids in planning blood
donation drives and medical interventions.
In clinical trials, researchers might use the mode to identify the most common side effect experienced by
patients taking a particular medication.

• Education and Psychology: The mode is used to understand student performance or behavioral
trends.

In a study of learning styles, identifying the mode learning method preferred by students can assist
educators in designing effective teaching strategies.
In psychological studies, the mode might help identify the most common behavior among participants in
certain situations.

• Transportation and Urban Planning: Modes are utilized to analyze commuting patterns and urban
development.
Identifying the mode of transportation used by commuters in a city helps urban planners allocate
resources for public transportation.
Studying the mode of travel in different regions can assist in developing infrastructure tailored to specific
transportation needs.
Calculating the mode of grouped data.
Calculating the mode of grouped data involves determining the most frequently occurring class interval or
category in a given dataset. Grouped data is usually presented in the form of a frequency distribution
table, where data is organized into intervals or classes, and the frequency of each class is specified. The
modal class/interval is the class/interval with the highest frequency.
To find the mode of grouped data, you can use the following formula:

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) \times c$$

Where:
$L$: lower class boundary of the modal class
$d_1$: difference between the highest frequency and the frequency of the class before it
$d_2$: difference between the highest frequency and the frequency of the class after it
$c$: class width of the modal class/interval

This formula may also be written as:

$$\text{Mode} = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times c$$

Where:
$L$ is the lower class boundary of the modal class,
$f_1$ is the frequency of the modal class,
$f_0$ is the frequency of the class before the modal class,
$f_2$ is the frequency of the class after the modal class, and
$c$ is the class width of the modal class.
It's essential to understand that the mode obtained from this formula is an approximation. Due to the
grouping of data, the precise values within each class interval are not known. As a result, the mode is only
an estimate, providing a reasonable approximation of the most probable value within the modal class.
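
The grouped-mode formula can be sketched in Python as follows. The data is the leaf-length table from the histogram section of this handout (classes 25-30 up to 55-60, frequencies 1, 4, 8, 10, 8, 7, 2). The function name is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption, not something the text specifies.

```python
def grouped_mode(boundaries, frequencies):
    """Estimate the mode of grouped data: L + d1 / (d1 + d2) * c.

    boundaries: list of (lower, upper) class boundaries, equal widths assumed.
    frequencies: class frequencies in the same order.
    If the modal class is the first or last class, the missing neighbour is
    treated as having frequency 0 (an assumption).
    """
    m = frequencies.index(max(frequencies))  # first class with the top frequency
    f1 = frequencies[m]
    f0 = frequencies[m - 1] if m > 0 else 0
    f2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    lower, upper = boundaries[m]
    d1, d2 = f1 - f0, f1 - f2
    return lower + d1 / (d1 + d2) * (upper - lower)


leaf_classes = [(25, 30), (30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
leaf_frequencies = [1, 4, 8, 10, 8, 7, 2]
print(grouped_mode(leaf_classes, leaf_frequencies))  # 42.5
```

Here the modal class is 40-45, so L = 40, d1 = 10 − 8 = 2, d2 = 10 − 8 = 2 and c = 5, which gives 40 + (2 ÷ 4) × 5 = 42.5.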
Learner’s activity
1. The table below shows the volume of petrol (litres) William used for his bike in the last 20 days.

Volume of petrol (litres) Number of days
1−5 1
6 − 10 4
11 − 15 10
16 − 20 5

Calculate the modal volume of petrol used for the 20 days.

2. A survey has been conducted by a group of students on 20 households in a locality as shown in
the following frequency distribution table. Find the mode for the given data.

Size of Family 1-3 3-5 5-7 7-9 9-11

No. of Families 7 8 2 2 1

3. The information on the observed lifetimes (in hours) of 225 electrical components is given in
the following frequency table. Find the modal lifetime of the electrical components.

lifetime(in hours) 0-20 20-40 40-60 60-80 80-100 100-200

Frequency 10 35 52 61 38 29

4. The following distribution table shows the number of runs scored by some top batsmen of the
world in one-day international cricket matches. Find the mode of the given data.

Runs scored by Top Batsmen Number of Batsmen

3000 – 4000 4

4000 – 5000 18

5000 – 6000 9

6000 – 7000 7

7000 – 8000 6

8000 – 9000 3

9000 – 10000 1

10000- 11000 1

Estimating the mode of grouped data (equal class widths) using a histogram.

A histogram or frequency histogram is a pictorial representation of the numerical data with rectangular
bars. Like a bar graph, the height of each bar depicts the frequency of the data values. A histogram differs
from a bar graph in that the bars are drawn with no space in between them.

To construct a histogram using the grouped data, we plot class boundaries on the x-axis and frequencies
on the y-axis. Each bar represents a class interval, and the height of the bar corresponds to the frequency
of that interval.

Note:

It's important to note that estimating the mode using a histogram is a straightforward method but provides
only an approximation. The exact values within each class interval are unknown due to the grouping of
data, and the mode obtained is based on visual inspection. While this method is more accessible than
using formulas, users should be aware that the mode is still an estimate and might not precisely represent
the most frequent value in the dataset.

Consider the example below:

Given below is the table showing the approximate lengths, in mm, of 40 leaves taken from different parts
of a certain species.

Length (mm) 25-30 30-35 35-40 40-45 45-50 50-55 55-60

Number of leaves 1 4 8 10 8 7 2
Represent the data in the form of a histogram.

Estimating the mode of grouped data using a histogram involves visually identifying the class interval
(bar) with the highest frequency. While this method is more intuitive than using a formula, it still provides
an estimate rather than an exact value due to the nature of grouped data. Here's a step-by-step guide for
estimating the mode using a histogram:

• Draw a histogram of the data and identify the modal class. The modal class is the class with the
highest bar.
• Draw a straight line connecting the top-left corner of the tallest bar to the top-left corner of the
bar representing the frequency of the following class.
• Draw a straight line connecting the top-right corner of the tallest bar to the top-right corner of the
bar representing the frequency of the class immediately before.
• From the point of intersection of these lines, draw a vertical line down to the 𝑥-axis. This value is
the estimate for the mode.
An example of this is given below. Here, the mode of the frequency distribution has been estimated
graphically as 27.

[Figure: histogram with class boundaries on the horizontal axis, showing the graphical estimate of the mode.]
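
The graphical construction in the steps above can also be checked by calculation. The Python sketch below finds where the two straight lines cross for the leaf-length table in this section (modal class 40-45 with frequency 10, and a frequency of 8 on either side). The function names are hypothetical, and the line-intersection formula is the standard one for two non-parallel lines.

```python
def intersection_x(p1, p2, p3, p4):
    """x-coordinate where the line through p1, p2 meets the line through p3, p4.

    Points are (x, y) pairs; the two lines are assumed not to be parallel.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denominator = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    numerator = (x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)
    return numerator / denominator


def histogram_mode(lower, upper, f_before, f_modal, f_after):
    """Estimate the mode from the corners of the three bars around the modal class."""
    # Top-left corner of the tallest bar to top-left corner of the following bar.
    line_a = ((lower, f_modal), (upper, f_after))
    # Top-right corner of the tallest bar to top-right corner of the preceding bar.
    line_b = ((upper, f_modal), (lower, f_before))
    return intersection_x(*line_a, *line_b)


print(histogram_mode(40, 45, f_before=8, f_modal=10, f_after=8))  # 42.5
```

The lines cross at the same value that the formula L + d1/(d1 + d2) × c gives for this data, since both describe the same construction.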
Consider the examples below:

1. The speeds, in kilometres per hour, of cars driving on a road were recorded in the table below and
are represented in the histogram.

Speed (km/h) 50– 60– 70– 80– 90–

Frequency 8 15 17 25 7

2. The table represents the time taken by a group of people to travel to work.

Time (𝑡 minutes) 5–9 10–14 15–19 20–24

Frequency 10 15 7 3

Represent this data on a histogram and hence estimate the mode.

Learner’s activity.
1. The table shows the distribution of ages of 100 people attending a school concert. Represent this
data on a histogram and hence estimate the mode.

Ages (years) 0-20 20-40 40-60 60-80 80-100

Frequency 22 35 31 10 2

2. The table shows the results of a survey on the weekly pocket money of 100 sixteen-year-olds.
Draw a histogram for this data and use it to estimate mode.
Weekly earnings ($) 20-30 30-40 40-50 50-60 60-70 70-80

Frequency 45 20 11 11 10 3

3. The table shows the distribution of the average marks of 40 children in the end-of-year
examinations. Draw a histogram to represent the frequency table hence estimate the modal mark.

Average marks 0-19 20-39 40-59 60-79 80-99

Frequency 2 4 16 15 3

4. In a survey, the length of the ring finger on the right hand of a sample of adults was measured (to
the nearest mm). Draw a histogram to represent the frequency table hence estimate mode.

Length (mm) 45-55 55-65 65-75 75-85 85-95

Frequency 4 10 47 32 7

5. Bags of clips were weighed to the nearest gram. Draw a histogram to represent the frequency
table hence estimate mode.
Mass (g) 50-59 60-69 70-79 80-89 90-99

Frequency 2 6 22 12 8

Median.

Any data set can be partitioned as follows:

The median is a statistical measure of central tendency that represents the middle value of a dataset when
it is ordered in either ascending or descending numerical order. Unlike the mean, which is the average of
all values, the median is less sensitive to extreme values and provides a more robust measure of central
location.

Median of discrete data

Calculating the median of ungrouped data involves finding the middle value when the data set is arranged
in either ascending or descending order. If the number of observations is odd, the median is the middle
value. If the number of observations is even, the median is the average of the two middle values.

Here are the steps to calculate the median for ungrouped data:

Step 1: Arrange the Data

Order the data set in either ascending or descending order. This step is crucial for identifying the middle
values.

Step 2: Determine the Number of Observations

Count the total number of observations ($n$) in the data set. This will help determine whether the data set
has an odd or even number of elements.

Step 3: Find the Median

If $n$ is odd, the median (M) is the middle value directly:

$$M = \text{value at position } \frac{n+1}{2}$$

If $n$ is even, the median (M) is the average of the two middle values:

$$M = \frac{\text{value at position } \frac{n}{2} + \text{value at position } \left(\frac{n}{2} + 1\right)}{2}$$

Consider the data set: 6, 2, 8, 4, 5, 9, 1, 7, 3.

Step 1: Arrange the data in ascending order: 1, 2, 3, 4, 5, 6, 7, 8, 9.

Step 2: Determine the number of observations (n) which is 9.

Step 3: Since n is odd, the median is the value at position $\frac{9+1}{2} = 5$. Therefore, the median is
the fifth value in the ordered list, which is 5.

So, the median of the given ungrouped data set is 5.

Calculating the median for ungrouped data is a straightforward process when the data set is relatively
small. However, for larger datasets or grouped data, additional techniques may be employed to simplify
the calculation.
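
The three steps can be written as a short Python function. This is a minimal sketch with a hypothetical function name. Python lists are indexed from 0, so position (n + 1)/2 in the text corresponds to index n // 2 in the code. The even-length dataset is made up for illustration.

```python
def median(values):
    """Median of ungrouped data, following the three steps in the text."""
    ordered = sorted(values)      # Step 1: arrange the data
    n = len(ordered)              # Step 2: count the observations
    middle = n // 2
    if n % 2 == 1:                # Step 3, n odd: the single middle value
        return ordered[middle]
    # Step 3, n even: average of the values at positions n/2 and n/2 + 1
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([6, 2, 8, 4, 5, 9, 1, 7, 3]))  # 5
print(median([4, 1, 3, 2]))                 # 2.5 (made-up even-length example)
```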

For discrete data in an ungrouped frequency distribution, a cumulative frequency column is constructed
for purposes of finding the median.

Cumulative frequency represents the running total of frequencies up to a certain point in a dataset.
Cumulative frequency is particularly useful for understanding the distribution of values and identifying
patterns, especially when dealing with grouped data.
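
A cumulative frequency column is just a running total, and it can be used to locate the median of an ungrouped frequency distribution. The Python sketch below uses `itertools.accumulate` from the standard library. The function names and the small table are hypothetical and for illustration only.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Running total of the frequencies, one entry per row of the table."""
    return list(accumulate(frequencies))


def median_from_frequency_table(values, frequencies):
    """Median of discrete data given as sorted values with their frequencies."""
    cumulative = cumulative_frequencies(frequencies)
    n = cumulative[-1]

    def value_at(position):
        # The value at a 1-based position is the first value whose
        # cumulative frequency reaches that position.
        for value, running_total in zip(values, cumulative):
            if position <= running_total:
                return value

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


made_up_values = [1, 2, 3, 4]
made_up_frequencies = [2, 5, 3, 1]
print(cumulative_frequencies(made_up_frequencies))                        # [2, 7, 10, 11]
print(median_from_frequency_table(made_up_values, made_up_frequencies))   # 2
```

With 11 observations the median is the 6th value. The cumulative frequency first reaches 6 in the row for the value 2, so the median is 2.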

Consider the example below:

Create a cumulative frequency table for the following information, which represents the number of hours
per week that Arjun plays indoor games:

Therefore, Arjun spends a total of 20 hours a week playing indoor games.

Learner’s activity

1. Find the Median of 3, 23, 13, 11, 15, 5, 4 and 2.

2. Find the Median of 69, 66, 67, 69, 64, 63, 65, 68 and 72.
3. The table below shows the masses of bolts bought by a mechanic. Calculate the median mass.

Mass (g) 98 99 100 101 102 103 104

Number of bolts 8 11 54 20 17 6 4

4. The shoe size of 155 people was recorded and the raw data was presented in the form of the
following frequency table:
Size of shoe 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0

Frequency 10 18 22 25 40 15 10 8 7

Obtain the median size of the shoe.

5. The time taken, in minutes, for a group of children to complete a puzzle is recorded.

Time taken ( in min ) 5 6 7 8 9

Number of children 8 4 3 10 3

Find the median time taken by the group of children to complete the puzzle.

6. What is the median of the first 50 natural numbers?

7. What is the median of the first 10 prime numbers?

8. What is the median of $\frac{1}{2}$, $\frac{2}{3}$, $\frac{3}{4}$, $\frac{1}{6}$ and $\frac{7}{12}$?

9. The numbers 3, 7, 13, 14, 16, 19, 20 and 𝑥 are arranged in ascending order. If the mean of the
numbers is equal to the median, find the value of 𝑥.

10. The median of a set of eight numbers is 4.5. Given that seven of the numbers are 7, 2, 13, 4, 8, 2
and 1, find the eighth number.

11. If 10, 13, 15, 18, (𝑥+1), (𝑥+3), 30, 32, 35 and 41 are the observations in the ascending order with
median 24, find the value of 𝑥.

12. The temperatures, taken at midnight, of six consecutive nights in Bengaluru are given as follows:
22° C, 24° C, 26° C, 20° C, 23° C, 22° C.
i. State the median temperature.
ii. If 22° C is added to the above set of data, what will be the new median temperature?
13. As part of the school's Earth Day celebration, 100 students each sowed 5 seeds into each of 100
planters. One week later, the number of seeds germinating in each planter was recorded and the
results are given in the table.

Number of seeds germinating 0 1 2 3 4 5

Number of planters 10 20 30 25 10 5
i. Write down the total number of seeds that were sown.
ii. Find the fraction of the seeds that did not germinate.
iii. Calculate the median of the distribution.

14. The number of errors in the first draft of Tina's thesis is shown in the table.

Number of errors 0 1 2 3 4 5 6

Number of pages 11 3 10 7 4 3 2
i. How many pages does the thesis contain?
ii. Find the percentage of pages with fewer than 2 errors.
iii. Calculate the median number of errors made by Tina.
15. The number of magazines read by a group of women in a week is recorded.

Number of magazines 0 1 2 3

Number of women 5 2 1 𝑥

If the median of the distribution is 2, find the value of 𝑥.

Calculating the median of grouped data.

Calculating the median for grouped data involves determining the middle value or midpoint of a
distribution where the data is grouped into intervals or classes. This is common in cases where the
exact values are not available, and data is presented in intervals with corresponding frequencies.
Steps involved:
1. Calculate the Cumulative Frequency:
Add a column for cumulative frequency to the frequency distribution table. The cumulative
frequency (F) is the sum of the frequencies up to and including a particular class interval.
2. Identify the Median Class:
Determine the class interval that contains the median. This is the first class whose
cumulative frequency is greater than or equal to half of the total frequency, $\frac{N}{2}$, where $N$ is the
total frequency, i.e. $N = \sum f$.
3. Apply the Median Formula:
The median can be estimated within the median class using the formula:
$$\text{Median} = L + \left(\frac{\frac{N}{2} - F_b}{f_m}\right) \times c$$
Where:
$L$: lower class boundary of the median class
$N$: total number of observations ($\sum f$)
$F_b$: cumulative frequency of the class before the median class
$f_m$: frequency of the median class
$c$: class width of the median class
4. Interpret the Result:
The calculated value represents the estimated median for the grouped data. It is the point
within the median class where half of the data lies below and half above.

It's important to note that this method provides an approximation of the median for grouped
data since the exact value might not be known. However, it gives a meaningful estimate
based on the available information.
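
The steps above can be sketched in Python. The data is the leaf-length table from the histogram section of this handout (40 leaves, classes 25-30 up to 55-60). The function name is hypothetical, and the median class is taken to be the first class whose cumulative frequency reaches N/2.

```python
from itertools import accumulate


def grouped_median(boundaries, frequencies):
    """Estimate the median of grouped data: L + ((N/2 - Fb) / fm) * c."""
    cumulative = list(accumulate(frequencies))          # step 1
    half = cumulative[-1] / 2                           # N / 2
    # Step 2: the median class is the first class whose cumulative
    # frequency is at least N / 2.
    m = next(i for i, total in enumerate(cumulative) if total >= half)
    lower, upper = boundaries[m]
    before = cumulative[m - 1] if m > 0 else 0          # Fb
    # Step 3: apply the formula (multiplying before dividing keeps the
    # arithmetic exact for this example).
    return lower + (half - before) * (upper - lower) / frequencies[m]


leaf_classes = [(25, 30), (30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
leaf_frequencies = [1, 4, 8, 10, 8, 7, 2]
print(grouped_median(leaf_classes, leaf_frequencies))  # 43.5
```

The cumulative frequencies are 1, 5, 13, 23, 31, 38 and 40, so N/2 = 20 falls in the class 40-45. With L = 40, Fb = 13, fm = 10 and c = 5, the estimate is 40 + (7 ÷ 10) × 5 = 43.5.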
Learner’s activity
1. The following frequency distribution shows the number of points scored per game by 60
basketball players. Calculate the median.

2. Find the median of the following frequency distribution.

Marks 0-10 10-30 30-60 60-80 80-90

Number of students 8 20 36 24 12

3. Compute the median from the following data.


Mid-points 5 15 25 35 45 55

Frequency 7 10 23 51 6 3

4. Find the median for the following frequency distribution.

Class 10-19 20-29 30-39 40-49 50-59 60-69 70-79

Frequency 2 4 8 9 4 2 1

5. Find the median for the following frequency distribution.

Mid points 75 95 115 135 155 175 195

Frequency 4 5 13 20 14 8 4

6. The median of the following data is 52.5. Find the value of 𝑓1 and 𝑓2 if the total frequency is 100.

Class 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100

Frequency 2 𝑓1 9 12 17 𝑓2 15 9 7 4


Estimating the median of grouped data using a cumulative frequency curve (ogive)

Estimating the median of grouped data using a cumulative frequency curve, or ogive, involves creating a
graphical representation of the cumulative frequencies and then identifying the point on the curve
corresponding to the median. This method is useful when the exact values are not available, and the data
is presented in intervals or classes.

Data organized as a cumulative frequency distribution may be represented graphically by a cumulative
frequency graph. The technique for drawing cumulative frequency polygons and cumulative frequency
curves (ogives) is essentially the same. We plot the cumulative frequencies on the 𝑦-axis against the
upper class boundaries on the 𝑥-axis. The only difference is that the cumulative frequency polygon is
obtained by joining the points with straight line segments, while the cumulative frequency curve is
obtained by joining the points with a smooth freehand curve.


Here's a step-by-step guide:

1. Calculate Cumulative Frequencies:

Add a column for cumulative frequency to the frequency distribution table. Calculate the
cumulative frequencies, representing the running total of frequencies as you move down the
table.

2. Create the Cumulative Frequency Curve:

Plot the cumulative frequencies on a graph. The x-axis represents the class boundaries, and
the y-axis represents the cumulative frequencies. Plot each cumulative frequency against the
upper class boundary of its class, starting from a cumulative frequency of 0 at the lower
boundary of the first class. Connect the points with a smooth curve.

3. Locate the Median:

For n observations, locate the value n/2 (that is, ∑𝑓/2) on the y-axis. Draw a horizontal line
from this value across to the cumulative frequency curve. The point where the line meets the
curve corresponds to the median.

4. Read the Median Value:

From the point on the curve that lies level with n/2 on the y-axis, draw a vertical line down to
the x-axis. The value where this line meets the x-axis is the estimated median for the
grouped data.

This graphical method provides a visual and approximate way to estimate the median for grouped data. It
is particularly useful when a cumulative frequency curve is available, and it offers insights into the central
tendency of the dataset based on the graphical representation of cumulative frequencies.


Consider the following examples:

1. The cumulative frequency table below shows the marks of senior four students in an
examination graded out of 70.

Scores Class boundaries Frequency Cumulative frequency

20-24 19.5-24.5 1 1

25-29 24.5-29.5 2 3

30-34 29.5-34.5 4 7

35-39 34.5-39.5 8 15

40-44 39.5-44.5 11 26

45-49 44.5-49.5 9 35

50-54 49.5-54.5 7 42

55-59 54.5-59.5 4 46

60-64 59.5-64.5 3 49

65-69 64.5-69.5 1 50

Represent this information on a cumulative frequency curve.


2. Draw the ogive for the data given below and use it to determine the median income.

Monthly income ( ₹) 600-700 700-800 800-900 900-1000 1000-1100 1100-1200 1200-1300

Number of employees 40 68 86 120 90 40 26

The cumulative frequency distribution for the given distribution is given below.


Income (₹) Number of employees Cumulative frequency

600-700 40 40

700-800 68 108

800-900 86 194

900-1000 120 314

1000-1100 90 404

1100-1200 40 444

1200-1300 26 470
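
The points to plot for this ogive, and the median read off from it, can be sketched in Python as follows. This is only an illustrative sketch: the function names `ogive_points` and `read_median` are made up for this example, and neighbouring points are joined with straight lines (a cumulative frequency polygon), so a value read from a smooth freehand curve may differ slightly.

```python
def ogive_points(first_lower_boundary, upper_boundaries, frequencies):
    """Return the (boundary, cumulative frequency) points to plot."""
    points = [(first_lower_boundary, 0)]   # the graph starts at a cumulative frequency of 0
    running_total = 0
    for upper, freq in zip(upper_boundaries, frequencies):
        running_total += freq
        points.append((upper, running_total))
    return points


def read_median(points):
    """Find the x-value level with N/2 on the y-axis, using straight-line joins."""
    half = points[-1][1] / 2               # N/2, where N is the last cumulative frequency
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 >= half and y1 > y0:
            # Go across from N/2 to this segment, then down to the x-axis.
            return x0 + (half - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("the total frequency must be positive")


# Monthly income data from the example: classes 600-700 up to 1200-1300.
uppers = [700, 800, 900, 1000, 1100, 1200, 1300]
freqs = [40, 68, 86, 120, 90, 40, 26]

points = ogive_points(600, uppers, freqs)
median_income = read_median(points)
print(points)
print(round(median_income, 2))  # prints 934.17
```

With N = 470, the line drawn across from N/2 = 235 meets the graph between the points (900, 194) and (1000, 314), giving an estimated median income of about ₹934.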


Learner’s activity.

1. Draw an ogive for the following frequency distribution of test marks for a group of 32
students and hence estimate the median mark.
Marks (%) 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100

Number of students 1 2 4 7 5 8 2 2 1 0

2. The table below shows the frequency distribution of test marks for 120 students.

Marks (%) 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100

Number of students 1 6 8 15 17 24 22 15 9 3

(a) Draw a cumulative frequency curve for this data and hence estimate the median.
(b) Use your curve in (a) to find the number of students that scored above 75%.
(c) Find the interquartile range.

3. The results for the long jump at school sports day are recorded.

Distance (cm) 170-180 180-190 190-200 200-210 210-220 220-230 230-240 240-250

Number of students 2 6 9 7 15 8 8 2

Draw a cumulative frequency curve and hence obtain the median distance.

4. The temperature in °C recorded over a 60-day period is shown in the table below:

Temperature ( in °C ) 0-3 3-6 6-9 9-12 12-15 15-18 18-21 21-24

Number of days 1 6 16 18 16 1 1 1

Draw an ogive and obtain the median temperature.

5. The local gym conducted a survey on the age distribution of its 800 members.

Age (years ) 10-20 20-30 30-40 40-50 50-60 60-70 70-80

Number of members 120 180 210 170 60 40 20

Draw a cumulative frequency curve for this data and hence determine the median.


Here are some practical uses of the median in everyday situations:

Income and Wealth Distribution:

The median income is often used to measure the economic well-being of a population. It is less affected
by extremely high or low incomes, providing a more accurate representation of the typical earning level.

Real Estate:

In the real estate market, the median home price is frequently reported to give a sense of the typical cost
of housing in a specific area. This helps potential buyers and sellers understand the market without being
skewed by exceptionally expensive or inexpensive properties.

Education:

The median score on standardized tests is used to evaluate the performance of students. It offers a
measure that is less sensitive to extreme scores, giving a more realistic indication of the typical student's
achievement level.

Healthcare:

In medical research, the median is often employed to describe the central tendency of patient
characteristics, such as age or recovery time. This helps researchers avoid the influence of outliers and
provides a more representative measure.

Demographics:

When studying demographics, the median age is a useful metric. It gives a better indication of the typical
age in a population, minimizing the impact of outliers that might be present in mean age calculations.

Traffic Analysis:

In transportation planning, the median travel time or distance can be used to represent the typical
commuting experience. This is beneficial for understanding the average conditions without being overly
influenced by extreme cases.


Survival Analysis:

In medical and social sciences, the median survival time is often used to describe the time until an event
of interest (e.g., death or failure) occurs. It provides a more robust estimate, especially when the survival
data is skewed.

Customer Reviews:

In e-commerce, the median rating of a product can be more informative than the mean, as it is less
sensitive to a few exceptionally high or low reviews. This gives a better representation of the typical
customer experience.

Salary and Wage Analysis:

Median earnings are commonly used to analyze salary and wage data. Unlike the mean, the median
provides a better sense of the middle ground, making it valuable for understanding income distribution.

In these and many other scenarios, the median is preferred when the dataset is skewed or contains
outliers, as it provides a more robust measure of central tendency that reflects the typical value in a
dataset.

SEE A VIDEO ABOUT USING A SCIENTIFIC CALCULATOR TO CALCULATE THE VARIOUS MEASURES FOR A GIVEN DATA SET.

Measures of dispersion/variability/spread.

Measures of dispersion are statistical metrics that quantify the spread or variability of a dataset. These
measures provide insights into how individual data points are distributed around the central tendency,
such as the mean or median. Common measures of dispersion include the range, interquartile range,
variance, and standard deviation.

It is apparent, but nevertheless noteworthy, that the measures of central tendency discussed earlier
cannot be used to measure the spread of a data set.

For example, each of these sets of numbers has mean 7 but the spread of each set is different.

(a) 7, 7, 7, 7, 7
(b) 4, 6, 6.5, 7.2, 11.3


(c) −193, −46, 28, 69, 177

Range

The range is the simplest measure of dispersion and is calculated as the difference between the maximum
and minimum values in a dataset.

Range = Highest value − lowest value

While easy to compute, the range is sensitive to extreme values and may not provide a robust
representation of variability in the presence of outliers.
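
For the three sets with mean 7 listed above, the range can be checked with a short Python sketch. The helper name `value_range` is made up for this example.

```python
from statistics import mean

# The three data sets from the text; each has mean 7.
data_sets = {
    "a": [7, 7, 7, 7, 7],
    "b": [4, 6, 6.5, 7.2, 11.3],
    "c": [-193, -46, 28, 69, 177],
}


def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


for label, values in data_sets.items():
    print(label, round(mean(values), 2), round(value_range(values), 2))
```

The means agree, but the ranges are 0, 7.3 and 370, which shows how differently the three sets are spread.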

Interquartile range

The interquartile range is a more robust measure that focuses on the middle 50% of the data. It is
calculated as the difference between the upper quartile (𝑄3) and the lower quartile (𝑄1).

Interquartile range = 𝑄3 − 𝑄1

The semi-interquartile range (SIQR) is a measure of dispersion derived from the interquartile range
(IQR). While the IQR is the spread of the central 50% of the data, the SIQR is half of that spread, so it
indicates roughly how far the quartiles lie from the median. Like the IQR, it is particularly useful when
the distribution of data is skewed or contains outliers.

\[ \text{Semi-interquartile range} = \frac{Q_3 - Q_1}{2} \]

The IQR is less affected by extreme values, making it useful for datasets with outliers.
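
For grouped data, the quartiles can be estimated in the same way as the median, by looking for the values level with N/4 and 3N/4 instead of N/2. The Python sketch below makes that assumption. The function name `grouped_quantile` is made up for this example, and the data are the senior four marks from the cumulative frequency table earlier in this topic.

```python
def grouped_quantile(fraction, lower_boundaries, frequencies, width):
    """Estimate the value below which `fraction` of the data lies (equal class widths)."""
    target = fraction * sum(frequencies)   # N/4 for Q1, 3N/4 for Q3
    cumulative_before = 0
    for lower, freq in zip(lower_boundaries, frequencies):
        if freq > 0 and cumulative_before + freq >= target:
            # Interpolate inside the first class whose cumulative frequency reaches the target.
            return lower + (target - cumulative_before) / freq * width
        cumulative_before += freq
    raise ValueError("frequencies must contain at least one positive value")


# Senior four marks: lower class boundaries 19.5, 24.5, ..., 64.5, each class of width 5.
lower = [19.5 + 5 * i for i in range(10)]
freqs = [1, 2, 4, 8, 11, 9, 7, 4, 3, 1]

q1 = grouped_quantile(0.25, lower, freqs, 5)
q3 = grouped_quantile(0.75, lower, freqs, 5)
iqr = q3 - q1
siqr = iqr / 2
print(q1, q3, iqr, siqr)
```

With N = 50, this gives 𝑄1 ≈ 37.94 and 𝑄3 ≈ 51.29, so the interquartile range is about 13.35 and the semi-interquartile range about 6.67. Values read from a freehand ogive may differ slightly.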

We shall discuss more sophisticated measures of variability like variance and standard deviation in senior
five.

Here are some example applications of the range:

Quality Control in Manufacturing:

In manufacturing processes, the range is often used to monitor the consistency and precision of products.
A narrow range indicates that the manufacturing process is producing items with similar specifications,
while a wide range may suggest variations that need to be addressed.

Educational Assessments:


In education, the range of scores on tests and exams can provide insights into the diversity of performance
among students. Teachers and educational institutions use the range to assess the spread of scores and
identify areas where additional support may be needed.

Weather and Climate Analysis:

Meteorologists use the range of temperatures to describe the variability in weather conditions. A larger
temperature range within a day or across seasons indicates more significant fluctuations, affecting climate
patterns and helping in weather prediction.

Financial Analysis:

In finance, the range is employed to assess the volatility of financial instruments, such as stocks or
currencies. Traders and investors use the range to understand the potential risk and return associated with
particular assets.

Sports Performance Analysis:

Coaches and analysts in sports use the range of performance metrics like scores, running times, or
distances covered to evaluate the consistency of athletes. A narrow range in performance may suggest a
high level of skill and stability.

Real Estate:

In the real estate market, the range of property prices in a particular area provides information about the
variability of housing costs. It helps potential buyers and sellers understand the market dynamics and
make informed decisions.

Health and Biomedical Research:

Medical researchers use the range in various studies, such as clinical trials or health surveys. For example,
the range of blood pressure measurements can indicate the variability in a population, contributing to the
understanding of health conditions.

Supply Chain Management:

Businesses involved in supply chain management use the range to evaluate the variability in delivery
times, production rates, or inventory levels. Understanding the range helps in optimizing processes and
ensuring a more efficient supply chain.


Environmental Monitoring:

Ecologists and environmental scientists use the range to analyze ecological data, such as species diversity
or pollutant concentrations. A wide range in these variables may indicate ecosystem instability or
environmental stress.

Human Resources:

In HR analytics, the range of salaries within a company can provide insights into pay equity and
compensation structures. It helps organizations identify potential disparities and make data-driven
decisions to ensure fair and competitive compensation.

Understanding the range of data is crucial in various fields, as it provides a quick and straightforward
measure of the spread or variability in a dataset, allowing for informed decision-making and analysis.

END.
