Module6 Statistical Tools

The document discusses statistical tools used to analyze and draw conclusions from data. It introduces key terms like population, sample, descriptive statistics, and inferential statistics. Descriptive statistics are used to describe data without wider conclusions, while inferential statistics are used to make inferences about a larger population based on a sample. The document outlines common sampling methods like random sampling, systematic sampling, stratified sampling, and cluster sampling. It also discusses ways to organize raw data through frequency distributions and grouped frequency distributions.

Uploaded by

reigoraoul56
MATHEMATICS IN THE MODERN WORLD

MODULE

The Statistical Tools


Overview
Welcome to Statistics!
In this module, we will discuss the different statistical tools that help you develop
your skills to solve problems involving data. This will teach you the basic techniques
primarily used in research and in daily life activities.

Learning Outcomes
By the end of the module, students will be able to:

■ Use a variety of statistical tools to process and manage numerical data
■ Advocate the use of statistical data in making important decisions
■ Solve problems involving data.

References:
[1] Sobecki, D. (2019). Math in Our World (4th Ed.). McGraw-Hill Education

Activity 7.1 Share your Statistical Suave


Instructions
Share and Discuss.
Have you done Statistical Analysis before?
Write a post to describe your experience.
1. Use these questions to guide you as you're reflecting on your experience:
a. What kind of analysis was it?
b. Where was the study conducted?
c. How was the study life-changing?
2. Share your experience in 4 or 5 sentences on
our discussion board.

Your discussion post will be graded using the rubric below:

Answer a Question Rubric

Criteria: Length (Points: 3 pts)
3 pts: Learners wrote 4-5 sentences about their experience.
2 pts: Learners wrote 2-3 sentences about their experience.
1 pt: Learners wrote 1 sentence about their experience.

Criteria: Description and Grammar (Points: 3 pts)
4 pts: Learner was able to discuss the experience in a clear, concise manner.
2 pts: Learner was not able to convey the experience well, and had some grammar misuse.
1 pt: Learner was not able to discuss the experience well, with many grammar errors.

Criteria: Relevance of Experience (Points: 4 pts)
4 pts: Learner wrote an experience that answered the questions asked.
2 pts: Learner wrote an experience that does not really answer the guide questions.
1 pt: Learner wrote an experience that is not relevant to the question.

Terms to Remember
Data are measurements or observations that are gathered for an event under study.
Statistics is the branch of mathematics that involves collecting, organizing,
summarizing, and presenting data and drawing general conclusions from the data.
A population consists of all subjects under study.
A sample is a representative subgroup or subset of a population.
There are two main branches of statistics: descriptive and inferential.
1. Statistical techniques used to describe data are called descriptive statistics.
This is based on collecting, organizing, and reporting data without using the data
to draw any wide-ranging conclusions.
For example, a researcher might be interested in the average age of the
full-time students on your campus and how many credit hours they’re scheduled
for this term.
2. Statistical techniques used to make inferences are called inferential statistics.
This is based on studying characteristics of a sample within a larger population
and using them to draw conclusions about the entire population.
For example, the Bureau of Labor Statistics estimates the number of
people in the United States who are unemployed every month. Since it would be
impossible to survey everyone, the bureau picks a sample of adults to see what
percentage are unemployed. Then they use that information to estimate the
unemployment rate for the entire population.

Sampling Methods
We will study four basic sampling methods:
1. In order to obtain a random sample, each subject of the population must have an
equal chance of being selected.
● Ask the school registrar to give him a list of 50 students whose student ID
numbers end in 4.
2. A systematic sample is taken by numbering each member of the population and
then selecting every kth member, where k is a natural number. When using
systematic sampling, it’s important that the starting number is selected at
random.
● Look at the campus directory and choose the first name at the top of each
page.
3. When a population is divided into groups where the members of each group have
similar characteristics and members from each group are chosen at random, the
result is called a stratified sample.
● Survey the five people sitting closest to the door in each of his five
classes.
4. When an existing group of subjects that represent the population is used for a
sample, it is called a cluster sample.
● Stop by the union before 9 A.M. class and ask everyone sitting at a table
how late they were up studying the night before.
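
The four methods can be sketched in Python. This is only an illustration under stated assumptions: the population of 100 student ID numbers, the four "year level" groups, and all function names are made up for the sketch and do not come from the module.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sample: every subject has an equal chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sample: random starting point, then every kth member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_group, rng):
    """Stratified sample: members chosen at random from every group."""
    return {name: rng.sample(members, n_per_group)
            for name, members in strata.items()}

def cluster_sample(clusters, rng):
    """Cluster sample: one existing group is used as the whole sample."""
    name = rng.choice(sorted(clusters))
    return name, clusters[name]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical student ID numbers 1..100
# Hypothetical groups: four year levels of 25 students each
groups = {f"Year {i + 1}": population[i * 25:(i + 1) * 25] for i in range(4)}

random_ids = simple_random_sample(population, 10, rng)
systematic_ids = systematic_sample(population, 10, rng)
stratified_ids = stratified_sample(groups, 3, rng)
cluster_name, cluster_ids = cluster_sample(groups, rng)

print(random_ids, systematic_ids, stratified_ids, cluster_name, sep="\n")
```

Note that the systematic sample draws its starting point at random, as the definition requires.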

Frequency Distribution
The data collected for a statistical study are called raw data. In order to describe
situations and draw conclusions, we need to organize the data in a meaningful way.
Two methods that we will use are frequency distributions and stem and leaf plots.

The first type of frequency distribution we will investigate is the categorical
frequency distribution. This is used when the data are categorical rather than numerical.
Twenty-five volunteers for a medical research study were given a blood test to
obtain their blood types. The data follow. Construct a frequency distribution for the data.

A B B AB O O O B AB B

B B O A O AB A O B A

A O O O AB

CONSTRUCTING A FREQUENCY DISTRIBUTION TABLE

Step 1 Make a table with all categories represented.
Step 2 Tally the data using the second column.
Step 3 Count the tallies and place the numbers in the third column.
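
As a check on the hand tally, the three steps can be sketched in Python for the blood type data. The variable names and the printed layout are choices made for this sketch, not part of the module.

```python
from collections import Counter

# The 25 blood types from the example, in the order listed
blood_types = ("A B B AB O O O B AB B "
               "B B O A O AB A O B A "
               "A O O O AB").split()

frequency = Counter(blood_types)  # counts each category

# Step 1: one row per category; Step 2: tally marks; Step 3: the count
print(f"{'Type':<6}{'Tally':<12}{'Frequency':>9}")
for blood_type in ["A", "B", "AB", "O"]:
    count = frequency[blood_type]
    print(f"{blood_type:<6}{'|' * count:<12}{count:>9}")
```

The frequencies should add up to 25, the number of volunteers.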

Grouped Frequency Distribution


Another type of frequency distribution that can be constructed uses numerical
data and is called a grouped frequency distribution. In a grouped frequency
distribution, the numerical data are divided into classes.
For example, if you gathered data on the weights of people in your class, there’s
a decent chance that no two people have the exact same weight. So it would be
reasonable to group people into weight ranges, like 100–119 pounds, 120–139 pounds,
and so forth.
When deciding on classes for a grouped frequency distribution, here are some
guidelines:
1. Try to keep the number of classes between 5 and 15.
2. Make sure the classes do not overlap.
3. Don’t leave out any numbers between the lowest and highest, even if nothing
falls into a particular class.
4. Make sure the range of numbers included in a class is the same for each one.
5. The beginning and ending values have to be chosen based on how the data
values are rounded.

These data represent the record high temperatures for each of the 50 states in
degrees Fahrenheit. Construct a grouped frequency distribution for the data.

112 100 128 120 134 118 106 110 109 112

100 118 117 116 118 121 114 114 105 109

107 112 114 115 118 117 118 125 106 110

122 108 110 121 113 120 119 111 104 113

120 113 120 117 105 110 118 112 114 115

Step 1 Subtract the lowest value from the highest value: 134 − 100 = 34.
Step 2 If we use a range of 5 degrees, that will give us seven classes, since the entire
range (34 degrees) divided by 5 is 6.8.
Step 3 Start with the lowest value and add 5 to get the lower class limits: 100, 105, 110,
115, 120, 125, 130. Notice that all of the data are rounded to the nearest whole number,
so that’s reflected in our choices of class limits.
Step 4 Set up the classes. To find the upper limit for each, subtract one from the next
lower limit. For example, the first class runs from 100 to 105 − 1 = 104.
Step 5 Tally the data and record the frequencies. It’s a really good idea to cross out
each data value as you tally it up, and an even better idea to make sure that all of the
frequencies add up to the total number of data values.
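
The same procedure can be sketched in Python. The class width of 5 comes from the example; the variable names are assumptions of this sketch. The number of classes is computed with integer division plus one, which gives seven here and also guarantees that the highest value always lands in the last class.

```python
# Record high temperatures for the 50 states (degrees Fahrenheit)
temps = [112, 100, 128, 120, 134, 118, 106, 110, 109, 112,
         100, 118, 117, 116, 118, 121, 114, 114, 105, 109,
         107, 112, 114, 115, 118, 117, 118, 125, 106, 110,
         122, 108, 110, 121, 113, 120, 119, 111, 104, 113,
         120, 113, 120, 117, 105, 110, 118, 112, 114, 115]

width = 5
low = min(temps)                                 # 100
num_classes = (max(temps) - low) // width + 1    # 34 // 5 + 1 = 7

# Lower limits step by the width; each upper limit is one below the next lower limit
classes = [(low + i * width, low + (i + 1) * width - 1) for i in range(num_classes)]

freq = {c: 0 for c in classes}
for t in temps:
    freq[classes[(t - low) // width]] += 1       # tally the value into its class

for (lower, upper), count in freq.items():
    print(f"{lower}-{upper}: {count}")
print("Total:", sum(freq.values()))
```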

Stem and Leaf Plots


Another way to organize data is to use a stem and leaf plot (sometimes called a
stem plot).
Each data value or number is separated into two parts. The very last digit is
called the leaf, and what comes before is called the stem.
For a two-digit number such as 53, 5 is the stem, and 3 is the leaf. For the
number 72, the stem is 7, and the leaf is 2. For a three-digit number such as 138, the
first two digits, 13, are used as the stem, and the third digit, 8, is used as the leaf. For
values rounded to the tenths place, like 8.4, you can use the value to the left of the
decimal place as stem and the tenths place as leaf.

The data below are the July 2015 unemployment rates for each state. Draw a
stem and leaf plot illustrating these data.

6.2 6.6 6.3 5.4 6.1 4.2 5.3 4.9 5.3 5.9

3.5 4.2 5.6 4.6 3.7 4.6 5.2 6.0 4.5 5.1

4.7 5.1 4.0 6.3 5.6 4.1 2.8 6.8 3.6 5.7

6.7 5.2 5.9 2.9 4.7 4.6 6.1 5.4 5.6 6.0

3.7 5.7 4.1 3.7 3.6 4.5 5.3 7.6 4.5 4.0

Step 1: Use the whole number parts as stems and the tenths place as leaves.
Step 2: Write the appropriate leaf next to the matching stem.
Step 3: Put a key at the bottom of the plot to clarify.

Key: 6|2 means 6.2
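
A short Python sketch of these steps for the unemployment data follows. Sorting the leaves within each stem is a choice made here for readability; the steps above do not require it.

```python
from collections import defaultdict

# July 2015 unemployment rates from the example
rates = [6.2, 6.6, 6.3, 5.4, 6.1, 4.2, 5.3, 4.9, 5.3, 5.9,
         3.5, 4.2, 5.6, 4.6, 3.7, 4.6, 5.2, 6.0, 4.5, 5.1,
         4.7, 5.1, 4.0, 6.3, 5.6, 4.1, 2.8, 6.8, 3.6, 5.7,
         6.7, 5.2, 5.9, 2.9, 4.7, 4.6, 6.1, 5.4, 5.6, 6.0,
         3.7, 5.7, 4.1, 3.7, 3.6, 4.5, 5.3, 7.6, 4.5, 4.0]

plot = defaultdict(list)
for rate in rates:
    # Work in whole tenths to avoid floating-point surprises:
    # 6.2 -> 62 -> stem 6, leaf 2
    stem, leaf = divmod(round(rate * 10), 10)
    plot[stem].append(leaf)

for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in sorted(plot[stem]))
    print(f"{stem} | {leaves}")
print("Key: 6|2 means 6.2")
```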



It is also possible to make a grouped stem and leaf plot. It follows the same rules as the
grouped frequency distribution, and the previous example can be used to build one.

Presentation of Data
When data are representative of certain categories, rather than numerical, we
often use bar graphs or circle graphs (commonly known as pie charts) to illustrate the
data.

A pie chart, also called a circle graph, is a circle that is divided into sections in
proportion to the frequencies corresponding to the categories.
The purpose of a pie chart is to show the relationship of the parts to the whole by
visually comparing the size of the sections.

The marketing firm Deloitte Retail conducted a survey of grocery shoppers. The
frequency distribution below represents the responses to the survey question “How
often do you bring your own bags when grocery shopping?” Draw a pie chart to
represent the data.

Response Frequency
Always 10
Never 39
Frequently 19
Occasionally 32
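
Before drawing, each frequency has to be converted to a share of the circle. The sketch below assumes the usual approach of multiplying each category's proportion by 360 degrees; the variable names are made up for the illustration.

```python
# Responses to "How often do you bring your own bags when grocery shopping?"
responses = {"Always": 10, "Never": 39, "Frequently": 19, "Occasionally": 32}

total = sum(responses.values())
sectors = {}
for response, count in responses.items():
    share = count / total
    # (percent of the whole, central angle of the sector in degrees)
    sectors[response] = (round(share * 100, 1), round(share * 360, 1))

for response, (percent, degrees) in sectors.items():
    print(f"{response:<13}{percent:>6}%{degrees:>8} degrees")
```

The angles should add up to 360 degrees, which is a quick check before drawing the sectors.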

A bar graph is used to compare amounts or percentages using either vertical or
horizontal bars of various lengths which correspond to the amounts or percentages.

While a pie chart is used to compare parts to a whole, a bar graph is used for
comparing parts to other parts.

Let’s revisit the marketing firm’s survey of 100 grocery shoppers. Suppose the
frequency distribution below represents the responses to the survey question,
“Which grocery stores have you shopped at in the last month?”

Store Frequency
Publix 50
Trader Joe’s 40
Fareway 28
Aldi 33
Other 65

A time series graph can be drawn for data collected over a period of time. This
type of graph is used primarily to show trends, like prices rising or falling, for the time
period.
There are three types of trends. Secular trends are viewed over a long period of
time, such as yearly. Cyclical trends show oscillating patterns. Seasonal trends show
the values of a commodity for shorter periods of the year, such as fall, winter, spring,
and summer.
Politicians often talk about violent crime in the United States in apocalyptic terms,
but what does the data say? The table shows the number of violent crimes committed
per 100,000 citizens every five years from 1980 to 2015. Draw a time series graph for
the data and use it to discuss trends in violent crime.

Year 1980 1985 1990 1995 2000 2005 2010 2015

Violent crimes per 100,000 citizens 596.6 556.6 729.6 684.5 506.5 469.0 404.5 372.6
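
One way to read the trend before (or instead of) drawing the graph is to compute the change between consecutive five-year points. This Python sketch only restates the table; the variable names are assumptions of the sketch.

```python
years = [1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015]
rates = [596.6, 556.6, 729.6, 684.5, 506.5, 469.0, 404.5, 372.6]

# Change in the rate from each listed year to the next
changes = [(years[i], years[i + 1], round(rates[i + 1] - rates[i], 1))
           for i in range(len(rates) - 1)]

peak_year = years[rates.index(max(rates))]

for start, end, change in changes:
    direction = "rose" if change > 0 else "fell"
    print(f"{start}-{end}: {direction} by {abs(change)}")
print("Peak year in the table:", peak_year)
```

In these data the rate peaks in 1990 and falls in every five-year period after that.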

Presentation of Grouped Data


When data are organized into grouped frequency distributions, two types of
graphs are commonly used to represent them: histograms and frequency polygons.

Steps in Making a Histogram

Step 1 Write the scale for the frequencies on the vertical axis and the class limits on the
horizontal axis. Make sure that your labeling on the vertical axis starts at zero.
Step 2 Draw vertical bars with heights that correspond to the frequencies for each class.

A histogram is similar to a vertical bar graph in that the heights of the bars
correspond to frequencies. The difference is that class limits are placed on the
horizontal axis, rather than categories.

We analyzed and organized data representing the record high temperature in every
state; now we present it graphically.

Class Frequency
100-104 3
105-109 8
110-114 16
115-119 13
120-124 7
125-129 2
130-134 1

A frequency polygon is similar to a histogram, but instead of bars, a series
of line segments is drawn connecting the midpoints of the classes. The heights of
those points match the heights of the bars in a histogram.

Steps in Making a Frequency Polygon


Step 1 Find the midpoints for each class.
This is accomplished by adding the upper and lower limits and dividing by 2. For
the first two classes, we get:
\[ \frac{100 + 104}{2} = 102, \qquad \frac{105 + 109}{2} = 107 \]
The remaining midpoints are 112, 117, 122, 127, and 132.
Step 2 Write the scale for the frequencies on the vertical axis (making sure to start at
zero), and label a scale on the horizontal axis so that all midpoints will be included.
Step 3 Plot points at the midpoints with heights matching the frequencies for each
class, then connect those points with straight lines.
Step 4 Finish the graph by drawing a line back to the horizontal axis at the beginning
and end. The horizontal distance to the axis should equal the distance between the
midpoints. In this case, that distance is 5, so we extend back to 97 and forward to 137.
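
The points of the polygon can be computed with a few lines of Python. This is a sketch of Steps 1, 3, and 4 only (it produces the coordinates, not the drawing), and the variable names are made up for the illustration.

```python
classes = [(100, 104), (105, 109), (110, 114), (115, 119),
           (120, 124), (125, 129), (130, 134)]
frequencies = [3, 8, 16, 13, 7, 2, 1]

# Step 1: midpoint = (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Step 4: close the polygon one midpoint-distance before and after the data
step = midpoints[1] - midpoints[0]
points = ([(midpoints[0] - step, 0)]
          + list(zip(midpoints, frequencies))
          + [(midpoints[-1] + step, 0)])

for x, y in points:
    print(x, y)
```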

Activity 7.2 Data Organization


Instructions
There are 40 students in a class.
Using their scores in a quiz, given below, answer the questions.

121 101 129 121 135 119 107 111 110 113

102 120 119 118 120 123 116 116 107 111

105 110 112 113 116 115 116 123 104 108

108 109 119 116 122 118 112 115 106 109

The quiz is not timed, so you can pause it and resume at any time.
If you cancel the quiz, your answers are discarded and they are not counted as a submission.

Measures of Central Tendency


In casual terms, average means the most typical case, or the center of the
distribution. Measures of average are also called measures of central tendency, and
include the mean, median, mode, and midrange.

The Greek letter Σ (sigma) is used to represent the sum of a list of numbers. If
we use the letter X to represent data values, then ΣX means to find the sum of all
values in a data set.
The mean is the sum of the values in a data set divided by the number of values.
If X1, X2, X3, …, Xn are the data values, we use "X-bar" to stand for the mean, and
\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\Sigma X}{n} \]

Here’s the salary list for Vandelay Industries in 1000s of dollars:


The company advertises that its average employee makes almost $150,000 per year.
Is the company’s claim technically truthful? Do you think it’s deceiving? Explain.

Employee Jerry Kramer Newman George Elaine Susan Tim Estelle Frank

Salary 58 65 944 20 52 51 53 55 50

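Using the mean:
\[ \bar{X} = \frac{58 + 65 + 944 + 20 + 52 + 51 + 53 + 55 + 50}{9} = \frac{1348}{9} \approx 149.8 \]
The result of 149.8 tells us that the mean salary is 149.8 thousand dollars, or $149,800.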
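
The same calculation in Python, as a quick check. Salaries are kept in thousands of dollars, as in the table; the dictionary layout is a choice made for this sketch.

```python
# Vandelay Industries salaries, in thousands of dollars
salaries = {"Jerry": 58, "Kramer": 65, "Newman": 944, "George": 20, "Elaine": 52,
            "Susan": 51, "Tim": 53, "Estelle": 55, "Frank": 50}

total = sum(salaries.values())        # sum of all the data values
mean_salary = total / len(salaries)   # divided by the number of values

print(f"Sum = {total}, n = {len(salaries)}, mean = {mean_salary:.1f} thousand dollars")
```

Only one of the nine employees earns more than the mean, which is why the claim is technically true but misleading.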

The Median
In short, the median of a data set is the value in the middle if all values are
arranged in order. The median will either be a specific data value in the set, or will fall in
between two values.

Steps in Computing the Median of a Data Set


Step 1 Arrange the data in order, from smallest to largest. Actually, largest to smallest
will work, too. Whatever makes you happy.
Step 2 If the number of data values is odd, the median is the value in the exact middle
of the list. If the number of data values is even, the median is the mean of the two
middle data values.

Or we can use (n + 1)/2, where n is the number of data values, to find the position of the median in the ordered list.

Find the median salary for Vandelay Industries. How does it compare to the mean?

Employee George Frank Susan Elaine Tim Estelle Jerry Kramer Newman

Salary 20 50 51 52 53 55 58 65 944

There are nine salaries listed, and where I come from, nine is odd. So the
median will be the salary right in the middle: there will be four salaries less and four
more. That makes it the fifth salary on the list, which is $53,000. This is a whole lot less
than the mean of $149,800, and in fact is a much more reasonable measure of average
for these data. Find the mean and median if Newman’s salary is left out. What can you
conclude?

Employee George Frank Susan Elaine Tim Estelle Jerry Kramer

Salary 20 50 51 52 53 55 58 65

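Now there are eight salaries, so we’ll need to find the mean of the two in the
middle, which are 52,000 and 53,000. It would be nice if you could just figure out that
the mean is halfway in between, but for the sake of completeness:
\[ \text{Median} = \frac{52{,}000 + 53{,}000}{2} = 52{,}500 \]
The new mean is
\[ \bar{X} = \frac{20 + 50 + 51 + 52 + 53 + 55 + 58 + 65}{8} = \frac{404}{8} = 50.5 \]
thousand dollars, or $50,500.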

Now that’s interesting. The median was almost unaffected by throwing away the
largest value, but the mean changed dramatically, to say the least. This is exactly why
the mean was a poor measure of average for this data set: the one very large value has
a great impact on the mean, but not so much on the median.
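
The comparison can be reproduced with Python's standard `statistics` module. Salaries are in thousands of dollars; the variable names are assumptions of this sketch.

```python
from statistics import mean, median

# Salaries in thousands of dollars, already in order; Newman's 944 is last
salaries = [20, 50, 51, 52, 53, 55, 58, 65, 944]
without_newman = salaries[:-1]

# Mean and median for the full list and for the list with the largest value removed
summary = {
    "all nine": (mean(salaries), median(salaries)),
    "without Newman": (mean(without_newman), median(without_newman)),
}

for label, (avg, mid) in summary.items():
    print(f"{label}: mean = {avg:.1f}, median = {mid}")
```

Dropping one value moves the mean from about 149.8 to 50.5, while the median only moves from 53 to 52.5.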

The Midrange
The advantage of the midrange is that it’s very quick and easy to calculate.
The disadvantage is that it totally ignores most of the data values, so it’s not a
particularly reliable measure.

Finding the Midrange for a Data Set
\[ \text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2} \]

Find the midrange of all salaries at Vandelay Industries. Is it meaningful in this case?

Employee George Frank Susan Elaine Tim Estelle Jerry Kramer Newman

Salary 20 50 51 52 53 55 58 65 944

Wow. The midrange is a whopping $482,000, which is meaningful in that it
emphasizes how big Newman’s salary is, but as a measure of average it’s not good for
much.
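
A small Python sketch of the midrange, taken as the mean of the lowest and highest values. The function name `midrange` is made up for the illustration, and salaries are in thousands of dollars.

```python
def midrange(values):
    """Mean of the lowest and highest values; every other value is ignored."""
    return (min(values) + max(values)) / 2

salaries = [20, 50, 51, 52, 53, 55, 58, 65, 944]

print(midrange(salaries))        # with Newman's salary included
print(midrange(salaries[:-1]))   # with Newman's salary left out
```

Removing the single largest salary changes the midrange from 482 to 42.5, which shows how sensitive it is to extreme values.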

The Mode
The mode is sometimes said to be the most typical case.
The value that occurs most often in a data set is called the mode.
A data set can have more than one mode or no mode at all.
These data represent the duration (in days) of the final 20 U.S. space shuttle voyages.

Find the mode.

11, 12, 13, 12, 15, 12, 15, 13, 15, 12, 12, 15, 13, 10, 13, 15, 11, 12, 15, 12

If we construct a frequency distribution, it will be easy to find the mode: it’s
simply the value with the greatest frequency. The frequency distribution for the data is
shown below, and the mode is 12.

Days Frequency

10 1

11 2

12 7

13 4

15 6

The number of Atlantic hurricanes for each of the years from 1997–2016 is
shown in the list.
Find the mode, and describe what that tells you.
3, 10, 8, 8, 9, 4, 7, 9, 15, 5, 6, 8, 3, 12, 7, 10, 2, 6, 4, 7

This time, we’ll find the mode without making a frequency distribution. Instead,
we can just work down the list, counting the number of occurrences for each number of
hurricanes. It turns out that there are two numbers that appear three times, while no
others appear more than twice. Those numbers are 7 and 8, so this data set has two
modes. This means that over that 20-year span, the most common numbers of Atlantic
hurricanes in a year were 7 and 8.
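
Both mode examples can be checked in Python. The sketch uses `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency, so it handles data sets with more than one mode.

```python
from collections import Counter
from statistics import multimode

shuttle_days = [11, 12, 13, 12, 15, 12, 15, 13, 15, 12,
                12, 15, 13, 10, 13, 15, 11, 12, 15, 12]
hurricanes = [3, 10, 8, 8, 9, 4, 7, 9, 15, 5,
              6, 8, 3, 12, 7, 10, 2, 6, 4, 7]

# Frequency distribution for the shuttle data, in order of the values
shuttle_freq = dict(sorted(Counter(shuttle_days).items()))
shuttle_modes = multimode(shuttle_days)
hurricane_modes = sorted(multimode(hurricanes))

print(shuttle_freq)
print("Shuttle mode(s):", shuttle_modes)
print("Hurricane mode(s):", hurricane_modes)
```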

Comparing the Measures of Central Tendency

Measure: Mean
Definition: Most popular, arithmetic mean
Symbol: x-bar
Strengths:
• Unique – there’s exactly one mean for any data set
• Factors in all values in the set
• Easy to understand
Weaknesses:
• Can be adversely affected by one or two unusually high or low values
• Can be time-consuming to calculate for large data sets

Measure: Median
Definition: Middle value
Symbol: Md
Strengths:
• Divides a data set neatly into two groups
• Not affected by one or two extreme values
Weaknesses:
• Can ignore the effects of large or small values even if they are important to consider

Measure: Mode
Definition: Most frequent value
Symbol: Mo
Formula (ungrouped): value / score with highest frequency
Strengths:
• Very easy to find
• Describes the most typical case
• Can be used with categorical data like candidate preference, choice of major, etc.
Weaknesses:
• May not exist for a data set
• May not be unique
• Can be very different from mean and median if the most typical case happens to be
near the low or high end of the range

Measure: Midrange
Definition: The mean of highest and lowest value.
Strengths:
• Very quick and easy to compute
• Provides a simple look at average
Weaknesses:
• Dramatically affected by extremely high or low values in the data set
• Ignores all but two values in the set

Measures of Variation
In this section we will study measures of variation, which will help to describe how
the data within a set vary. The three most commonly used measures of variation are
range, variance, and standard deviation.
The range of a data set is the difference between the highest and lowest values
in the set.
Range = Highest value – Lowest value

The first list below is the weights of the dogs in the first picture, and the second is
the weights of the dogs in the second picture.
Find the range for each list, then describe any observations you can make
based on the results.
1st: 70, 73, 58, 60 2nd: 30, 85, 40, 125, 42, 75, 60, 55

For the first list,


Range=73-58=15 lbs
For the second list,
Range=125-30=95 lbs

The ranges are very different. This is reflective of the fact that there’s a lot more
variation in size among the dogs in the second picture.

The Variance and the Standard Deviation


If most of the values are similar, but there’s just one unusually high value, the
range will make it look like there’s a lot more variation than there actually is.
For this reason, we will next define variance and standard deviation, which are
much more reliable measures of variation.

Procedure for Finding the Variance and Standard Deviation


Step 1: Find the mean.
Step 2: Subtract the mean from each data value in the data set.
Step 3: Square the differences.
Step 4: Find the sum of the squares.
Step 5: Divide the sum by n − 1 to get the variance, where n is the number of data
values.
Step 6: Take the square root of the variance to get the standard deviation.

Find the variance and standard deviation for the weights of the eight dogs in the
second picture at the beginning of this section.
The weights are listed again for reference.
30, 85, 40, 125, 42, 75, 60, 55

Step 1 Find the mean weight.
\[ \bar{X} = \frac{30 + 85 + 40 + 125 + 42 + 75 + 60 + 55}{8} = \frac{512}{8} = 64 \]

Step 2 Subtract the mean from each data value.
30 - 64 = -34, 85 - 64 = 21, 40 - 64 = -24, 125 - 64 = 61,
42 - 64 = -22, 75 - 64 = 11, 60 - 64 = -4, 55 - 64 = -9

Step 3 Square each result.
(-34)² = 1,156, (21)² = 441, (-24)² = 576, (61)² = 3,721,
(-22)² = 484, (11)² = 121, (-4)² = 16, (-9)² = 81

Step 4 Find the sum of the squares.
1,156 + 441 + 576 + 3,721 + 484 + 121 + 16 + 81 = 6,596

Step 5 Divide the sum by n − 1 to get the variance, where n is the sample size. In this
case, n is 8, so n − 1 = 7.
\[ s^2 = \frac{6{,}596}{7} \approx 942.3 \]

Step 6 Take the square root of the variance to get standard deviation.
\[ s = \sqrt{942.3} \approx 30.7 \text{ lb} \]

You can also organize the work in a table:

Data (X) X − mean (X − mean)²

30 -34 1,156

85 21 441

40 -24 576

125 61 3,721

42 -22 484

75 11 121

60 -4 16

55 -9 81
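
The six steps translate directly into Python. This sketch mirrors the hand calculation for the eight dog weights and then compares the result with `statistics.variance` and `statistics.stdev`, which also divide by n − 1; the variable names are choices made for the illustration.

```python
from math import sqrt
from statistics import stdev, variance

weights = [30, 85, 40, 125, 42, 75, 60, 55]   # pounds
n = len(weights)

mean_weight = sum(weights) / n                    # Step 1: the mean
deviations = [x - mean_weight for x in weights]   # Step 2: value minus mean
squares = [d ** 2 for d in deviations]            # Step 3: square each difference
sum_of_squares = sum(squares)                     # Step 4: add the squares
sample_variance = sum_of_squares / (n - 1)        # Step 5: divide by n - 1
sample_sd = sqrt(sample_variance)                 # Step 6: square root

print(f"mean = {mean_weight}, sum of squares = {sum_of_squares}")
print(f"variance = {sample_variance:.1f}, standard deviation = {sample_sd:.1f}")

# The library functions use the same n - 1 definition
print(f"library check: {variance(weights):.1f}, {stdev(weights):.1f}")
```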

Understanding the Standard Deviation


To understand the significance of standard deviation, we’ll look at the process
one step at a time.
Step 1 Compute the mean. Variation is a measure of how far the data vary from
the mean, so it makes sense to begin there.
Step 2 Subtract the mean from each data value. In this step, we are literally
calculating how far away from the mean each data value is. The problem is that since
some are greater than the mean and some less, their sum will always add up to zero.
(Try it!) So that doesn’t help much.
Step 3 Square the differences. This solves the problem of those differences
adding to zero: when we square them, none of them is negative.
Step 4 Add the squares. In the next two steps, we’re getting an approximate
average of the squares of the individual variations from the mean. First we add them,
then…
Step 5 Divide the sum by n − 1. It seems like dividing by the number of values (n)
here is a good idea, but it turns out that when we’re using a sample from a larger
population to compute mean and variance, dividing by n − 1 makes the sample variance
more likely to be a true reflection of the population variance. In any case, at this point
we have an approximate average of the squares of the individual variations from the
mean.
Step 6 Take the square root of the variance. This “undoes” the square we did in Step
3. It will return the units of our answer to the units of the original data, giving us a good
measure of how far the typical data value varies from the mean.
The sample variance for a data set is an approximate average of the square of
the distance between each value and the mean.
If X represents individual values, "X-bar" is the mean, and n is the number of
values:
\[ s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1} \]

The sample standard deviation (s) is the square root of the variance. It provides
an approximate average of the distances between data values and the mean.
\[ s = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}} \]

Measures of Position
A percentile, or percentile rank, of a data value indicates the percent of data
values in a set that are below that particular value.
Suppose you score 77 on a test in a class of 10 people, with the 10 scores listed
below. What was your percentile rank?
93 82 64 75 98 52 77 88 90 71
The ordered list is 52, 64, 71, 75, 77, 82, 88, 90, 93, 98

Now if you focus on 77, you can see that there are exactly four scores lower than
yours. Since there were 10 scores total, that means that 4/10 or 40% of the scores were
lower than yours.
As for the percentile rank? That’s just to see if you’re paying attention. The
definition of percentile rank is the percentage of data values that are lower than a given
value, so we say that a score of 77 is at the 40th percentile.

Finding a Percentile Rank


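The percentile rank for a given data point in a set with n total values can be
calculated using this formula:
\[ \text{Percentile rank} = \frac{\text{number of data values below the given data point}}{n} \cdot 100\% \]
The result should be rounded to the nearest whole number.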
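
A minimal Python sketch of this calculation, assuming the definition used in this section (the percent of data values that are below the given value). The function name `percentile_rank` is made up for the illustration.

```python
def percentile_rank(value, data):
    """Percent of the data values that are below `value`, as a whole number."""
    below = sum(1 for x in data if x < value)
    # Note: Python's round() sends exact .5 cases to the nearest even integer
    return round(below * 100 / len(data))

scores = [93, 82, 64, 75, 98, 52, 77, 88, 90, 71]

print(percentile_rank(77, scores))   # four of the ten scores are below 77
```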

Finding a Data Value Corresponding to a Given Percentile


The number of words in each of the last 10 presidential inaugural addresses is
listed below. Find the length that corresponds to the 30th percentile.
1,433 2,096 2,395 2,071 1,592 2,155 1,598 2,320 2,561 2,427

Step 1 We’re asked to find the number on the list that has 30% of the numbers
below it. There are 10 numbers, and 30% of 10 is 3.
Step 2 Arrange the data in order from smallest to largest, and find the value that
has 3 values below it.
1,433, 1,592, 1,598, 2,071, 2,096, 2,155, 2,320, 2,395, 2,427, 2,561
The 30th percentile is the speech that consists of 2,071 words.

Using Percentiles to Compare Data from Different Sets


Two students are competing for one remaining spot in a law school class. Miguel
ranked 51st in a graduating class of 1,700, while Dustin ranked 27th in a class of 540.
Which student’s percentile rank within his class was higher? What can you conclude?
Miguel ranked 51st out of 1,700, so there were 1,700 - 51 = 1,649 students ranked
below him. His percentile rank is
\[ \frac{1{,}649}{1{,}700} \cdot 100\% = 97\% \]
Dustin had 540 - 27 = 513 students ranked below him, so his percentile rank is
\[ \frac{513}{540} \cdot 100\% = 95\% \]
Both are excellent students, but Miguel’s percentile rank is higher even though he was
51st and Dustin was 27th.
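
The comparison can be sketched in Python by converting each class rank to a percentile rank. The helper name `rank_to_percentile` is an assumption of this sketch, and it treats everyone ranked after the student as being below him.

```python
def rank_to_percentile(rank, class_size):
    """Percentile rank for a student ranked `rank`-th in a class of `class_size`."""
    below = class_size - rank            # students ranked below this student
    return round(below * 100 / class_size)

miguel = rank_to_percentile(51, 1700)
dustin = rank_to_percentile(27, 540)

print(f"Miguel: {miguel}th percentile, Dustin: {dustin}th percentile")
```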

Quartiles
A quartile divides a data set into quarters.
The second quartile is the same as the median, and divides a data set into an
upper half and a lower half.
The first quartile is the median of the lower half, and the third quartile is the
median of the upper half.

We use the symbols Q1, Q2, and Q3 for the first, second, and third quartiles,
respectively.
● Q1 (first quartile): 25% of data values are less than this, and 75% are greater
than it.
● Q2 (second quartile): 50% of data values are less than this, and 50% are greater
than it.
● Q3 (third quartile): 75% of data values are less than this, and 25% are greater
than it.

The data below are the percentages of total electricity generated that comes from
nuclear power for the nations with the 12 largest economies in the world, listed by size
of economy. Find the quartiles and describe what they mean.
19.5 2.4 0 15.8 17.2 76.9 2.9 0 3.7 18.6 16.8 0

First, we’ll find the median, which is also Q2 (the second quartile).
To do so, put the data values in order, from least to greatest.
0, 0, 0, 2.4, 2.9, 3.7, 15.8, 16.8, 17.2, 18.6, 19.5, 76.9

There’s an even number of values, so the median is the number halfway in between the
two values in the middle, which are 3.7 and 15.8.
So Q2 = 9.75%.
This divides the distribution into two halves, with six values in the lower half and six in
the upper half.

Next, we’ll find the median of the lower half: this is Q1 (the first quartile). The two values
in the middle of the bottom half are 0 and 2.4, and halfway between them is 1.2.
So Q1 = 1.2%.
Finally, we’ll find the median of the upper half. The two values in the middle of the upper
six values are 17.2 and 18.6, and 17.9 is halfway between. So Q3 = 17.9%.
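
The median-of-halves method used here can be sketched in Python. The function name `quartiles` is made up for the illustration, and leaving the middle value out of both halves when the number of values is odd is an assumption of this sketch (the example has an even number of values, so it does not arise). Library routines such as `statistics.quantiles` use different conventions and may give slightly different answers.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]      # values below the median position
    upper = ordered[-half:]     # values above the median position
    return median(lower), median(ordered), median(upper)

nuclear = [19.5, 2.4, 0, 15.8, 17.2, 76.9, 2.9, 0, 3.7, 18.6, 16.8, 0]
q1, q2, q3 = quartiles(nuclear)

print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}")
```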

Now let’s summarize.


The first quartile is 1.2%, so nations that get less than 1.2% of their power from
nuclear are in the bottom fourth among the world’s largest economies.
The second quartile is 9.75%, which tells us that nations that get less than 9.75%
of their energy from nuclear are in the bottom half.
The third quartile is 17.9%, so if a nation gets more than 17.9% of its energy
from nuclear, it’s in the top fourth in terms of nuclear generation among the world’s
largest economies.

Box Plot
One of the most useful applications of quartiles is using them to draw a box plot
(sometimes called a box and whisker plot). This is a graphical way to evaluate the
spread of a data set. In particular, a box plot makes it easy to identify data points that
are outliers—those that appear to be aberrational in some way.

First, we’ll need to define a new term. The distance between the first and third
quartiles for a data set is called the interquartile range, or IQR.
That is, IQR= Q3 − Q1
Data values are considered to be outliers if they're more than 1.5 times the IQR
below Q1, or above Q3.

Below is an example of a box plot for a group of test scores.

There’s a lot of information about the data set here.



You can see what all of the quartiles are (at least approximately): Q1 is about 68,
Q2 is about 78, and Q3 is about 89.
The interquartile range is the distance between 89 and 68, which is 21.
The lowest score was 34 and the highest was 99.
Looking a little deeper, the box shows us that the middle half of all scores (which
are the values between Q1 and Q3) fall between 68 and 89.

Draw a box plot for the nuclear power data, then use it to answer some questions
about the data.
(a) What does the box plot tell us about the data set?
(b) Find any outliers in the data set.

Step 1 Find the quartiles. We know that the quartiles are Q1 = 1.2, Q2 = 9.75, and
Q3 = 17.9.
Step 2 Draw a number line that begins before the lowest value in the data set
and ends after the highest value. Locate the lowest and highest data values and draw
short vertical lines above the line at those locations.
Step 3 Draw a rectangular box over the number line, beginning at Q1 and ending
at Q3. Then draw a vertical line through the box at Q2. Then draw horizontal lines from
the lowest and highest values to the edges of the box.

(a) The position of the box shows that most of the values in the data set are on the low
end of the distribution.
Half of the values are between 1.2 and 17.9, and three fourths are less than 17.9.

(b) To decide if there are any outliers, we’ll first need to multiply the interquartile range
by 1.5.
The IQR is 17.9 - 1.2 = 16.7, and 1.5 · IQR = 25.05.
Next, we subtract this number from the first quartile and add it to the third:
Q1 -25.05 = 1.2-25.05=-23.85
Q3 +25.05 = 17.9+25.05=42.95
There are no negative data values, so there can’t be any less than -23.85, but there’s
definitely a value greater than 42.95. Looking back at the data, there’s one outlier: the
maximum value of 76.9%.
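
The outlier check can be sketched in Python using the quartiles found earlier (Q1 = 1.2 and Q3 = 17.9). The names `lower_fence` and `upper_fence` are labels chosen for this sketch; the module itself does not name the two cutoffs.

```python
nuclear = [19.5, 2.4, 0, 15.8, 17.2, 76.9, 2.9, 0, 3.7, 18.6, 16.8, 0]
q1, q3 = 1.2, 17.9               # quartiles from the worked example

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # anything below this is an outlier
upper_fence = q3 + 1.5 * iqr     # anything above this is an outlier

outliers = [x for x in nuclear if x < lower_fence or x > upper_fence]

print(f"IQR = {iqr:.1f}, cutoffs = {lower_fence:.2f} and {upper_fence:.2f}")
print("Outliers:", outliers)
```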

Activity 7.3 Descriptive Statistics


Instructions
There are 40 students in a class.
Using their scores in a quiz, given below, answer the questions.

121 101 129 121 135 119 107 111 110 113

102 120 119 118 120 123 116 116 107 111

105 110 112 113 116 115 116 123 104 108

108 109 119 116 122 118 112 115 106 109

The quiz is not timed, so you can pause it and resume at any time.
If you cancel the quiz, your answers are discarded and they are not counted as a submission.

Normal Distribution
Instructions

80 85 83 88

80 85 83 88

81 85 83 88

81 87 83 89

81 87 85 89

82 87 85 89

82 87 85 90

82 87 85 90

83 88

Use the data above to answer the questions

The quiz is not timed, so you can pause it and resume at any time.
If you cancel the quiz, your answers are discarded and they are not counted as a submission

Normal Distribution Notes


Normal_Distribution.xlsx
Normal_Distribution.docx
