0% found this document useful (0 votes)
14 views56 pages

3.3.1 Data Summarization

The document explains the concepts of parameters and statistics, highlighting the differences between measures derived from entire populations versus samples. It details various measures of central tendency, including mean, median, mode, and midrange, along with their definitions, uses, and examples. Additionally, it covers measures of variation such as range, variance, and standard deviation, as well as measures of position like percentiles and quartiles.

Uploaded by

qpenelopegrothe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views56 pages

3.3.1 Data Summarization

The document explains the concepts of parameters and statistics, highlighting the differences between measures derived from entire populations versus samples. It details various measures of central tendency, including mean, median, mode, and midrange, along with their definitions, uses, and examples. Additionally, it covers measures of variation such as range, variance, and standard deviation, as well as measures of position like percentiles and quartiles.

Uploaded by

qpenelopegrothe
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 56

Describing Datasets with

Numerical Measures
Measures found by using all the data values in the population are called
parameters. Measures obtained by using the data values from samples are
called statistics; hence, the average of the sales from a sample of
representatives is a statistic, and the average of sales obtained from the
entire population is a parameter.
A statistic is a characteristic or measure obtained by using the data
values from a
sample.
A parameter is a characteristic or measure obtained by using all the
data values
from a specific population.

General Rounding Rule In statistics the basic rounding rule is that when
computations are done in the calculation, rounding should not be done
until the final answer is calculated.
Measures of Central Tendency
Mean
Definition: The mean is the arithmetic average of a set of values. It is calculated
by summing all the values and dividing by the number of values.

Usage: The mean is useful for datasets without extreme outliers, as it provides a
good overall estimate of the data. It's widely used in various fields, including
economics (e.g., average income), education (e.g., average test scores), and
research.
Mean
Police Incidents
The number of calls that a local police department responded to for a
sample of 9 months is shown. Find the mean.

475,447, 440, 761, 993, 1052, 783, 671,621

Solution:
σ 𝑥 475 + 447 + 440, +761 + 993 + 1052 + 783, +671 + 621
𝑋ത = =
𝑛 9
6243
= ≈ 693.7
9
Hence, the mean number of incidents per month to which the police
responded is 693. 7
Median
Definition: The median is the middle value of a dataset when the values are
arranged in ascending or descending order. If there is an even number of
observations, the median is the average of the two middle numbers.

Usage: The median is particularly useful for skewed distributions or datasets with
outliers, as it is less affected by extreme values. For example, in income data, a
few very high incomes can skew the mean, while the median provides a better
indication of the "typical" income.
Median
The median is the halfway point in a data set. Before you can find
this point, the data must be arranged in ascending or increasing
order. When the data set is ordered, it is called a data array.
Finding the Median
Step 1 Arrange the data values in ascending order.
Step 2 Determine the number of values in the data set.
Step 3 a. If n is odd, select the middle data value as the median.

෥=𝒙
𝒙 𝒏+𝟏
𝟐
b. If n is even, find the mean of the two middle values. That is, add
them and divide the sum by 2.
𝒙 𝒏 +𝒙 𝒏
𝟐 𝟐 +𝟏
෥=
𝒙
𝟐
Example:
Police Officers Killed
The number of police officers killed in the line of duty over the last 11 years is
shown. Find the median.
177, 153, 122, 141, 189, 155, 162, 165, 149, 157, 240

Tornadoes in the United States


The number of tornadoes that have occurred in the United States over an 8-
year period follows. Find the median.
684, 784, 656, 702, 856, 1133, 1132, 1303
Mode
Definition: The mode is the value that appears most frequently in a
dataset.

Usage: The mode is particularly useful for categorical data where


we want to know which is the most common category. For example,
in a survey of favorite ice cream flavors, the mode would indicate the
most popular flavor among respondents.
Mode
The value that occurs most often in a data set is called the mode.
A data set that has only one value that occurs with the greatest
frequency is said to be unimodal. If a data set has two values that
occur with the same greatest frequency, both values are considered
to be the mode and the data set is said to be bimodal. If a data set
has more than two values that occur with the same greatest
frequency, each value is used as the mode, and the data set is said
to be multimodal.
Example:
NFL Signing Bonuses
Find the mode of the signing bonuses of eight NFL players for a specific
year. The bonuses in millions of dollars are
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10

Tornadoes in the United States


The number of accidental deaths due to firearms for a six-year period is
shown. Find the mode
649, 789, 642, 613, 610, 600
Midrange
The symbol MR is used for the midrange. The midrange is defined as
the sum of the lowest and highest values in the data set, divided by
2nge.

𝑙𝑜𝑤𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 + ℎ𝑖𝑔ℎ𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒


𝑀𝑅 =
2
Example:
Bank Failures
The number of bank failures for a recent five-year period is shown. Find the
midrange.
3, 30, 148, 157, 71
NFL Signing Bonuses
Find the midrange of data for the NFL signing bonuses in Example 3–6. The
bonuses in millions of dollars are
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
Weighted Mean
Find the weighted mean of a variable X by multiplying each value by its
corresponding weight and dividing the sum of the products by the sum of
the weights.
𝑤1 𝑋1 + 𝑤2 𝑋2 + ⋯ + 𝑤𝑛 𝑋𝑛 σ 𝑤𝑋
𝑋ത = =
𝑤1 + 𝑤2 + ⋯ + 𝑤𝑛 σ𝑤
where
𝑤1 + 𝑤𝑛 + ⋯ + 𝑤𝑛 are the weights 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 are the values.
Example:
Grade Point Average
A student received an A in English Composition I (3 credits), a C in Introduction to
Psychology (3 credits), a B in Biology I (4 credits), and a D in Physical Education (2
credits). Assuming A = 4 grade points, B =3 grade points, C= 2 grade points and D=1
grade points.
Example:

The grade point average is 2. 7


Properties and Uses of Central Tendency
The Mean
1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken
from the same population and all three measures are computed for
these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and not necessarily one of the data
values.
Properties and Uses of Central Tendency
The Mean
5. The mean cannot be computed for the data in a frequency distribution
that has an open-ended class.
6. The mean is affected by extremely high or low values, called outliers,
and may not be the appropriate average to use in these situations.
Properties and Uses of Central Tendency
The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data
values fall into the upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or
extremely low values.
Properties and Uses of Central Tendency
The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.
Measures of Variation
Range
The range is the highest value minus the lowest value. The symbol R is
used for the range.

𝑅 = ℎ𝑖𝑔ℎ𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒 − 𝑙𝑜𝑤𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒


Example:
Bank Failures
Find the ranges for the paints
Population Variance
Population Standard Deviation
Example:
Comparison of Outdoor Paint
Find the variance and standard deviation for the data set for brand A paint.
The number of months brand A lasted before fading was
10, 60, 50, 30, 40, 20
Sample Variance
Definition: Variance measures the average squared deviation of each
data point from the mean of the dataset. It quantifies how much the
values in a dataset differ from the mean.

Usage: Variance is useful for understanding the degree of spread in a


dataset. A higher variance indicates that the data points are more
spread out from the mean, while a lower variance indicates that they
are closer to the mean. Variance is often used in fields like finance,
quality control, and research to assess risk and variability.
Sample Variance
Sample Standard Deviation
Standard deviation (SD) measures how spread out the values in a
dataset are around the mean (average). A low standard deviation
indicates that the data points tend to be close to the mean, while a
high standard deviation indicates that the data points are spread out
over a wider range of values.
Sample Standard Deviation
Example:
Teacher Strikes
The number of public-school teacher strikes in Pennsylvania for a random
sample of school years is shown. Find the sample variance and the sample
standard deviation.
9 ,10, 14, 7, 8, 3
Coefficient of Variation
Example:
Sales of Automobiles
The mean of the number of sales of cars over a 3-month period is 87, and
the standard deviation is 5. The mean of the commissions is $5225, and
the standard deviation is $773. Compare the variations of the two.
Example:
Sales of Automobiles
Measures of Position/Location
Introduction:
In addition to measures of central tendency and measures of variation,
there are measures of position or location. These measures include
standard scores, percentiles, deciles, and quartiles. They are used to
locate the relative position of a data value in the data set. For example, if a
value is located at the 80th percentile, it means that 80% of the values fall
below it in the distribution and 20% of the values fall above it
Standard Scores
Example:
Test Scores
A student scored 65 on a calculus test that had a mean of 50 and a
standard deviation of 10; she scored 30 on a history test with a mean of 25
and a standard deviation of 5. Compare her relative positions on the two
tests.
Example:
Test Scores
Find the z score for each test, and state which is higher.
Percentiles
Percentiles divide the data set into 100 equal groups.
Percentiles are symbolized by
𝑃1 , 𝑃2 , 𝑃3 , . . , 𝑃99
and divide the distribution into 100 groups.
Percentiles
Example:
Test Scores
A teacher gives a 20-point test to 10 students. The scores are shown here.
Find the percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Quartiles
Quartiles divide the distribution into four equal groups, denoted
by Q1, Q2, Q3.
Quartiles
Quartiles
Example:
Test Scores
Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
Example:
Test Scores
Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
Interquartile Range (IQR)
Deciles
Deciles divide the distribution into 10 groups.
Deciles
Deciles divide the distribution into 10 groups.
THANK YOU!

You might also like