
Chapter 2 Handout Jan 30

This document discusses key concepts for describing quantitative distributions with numbers, including: - Population and sample parameters and statistics - Measures of center such as mean, median, and mode - Measures of spread such as range and standard deviation - Using boxplots to visualize data distributions through five number summaries and identify outliers - Choosing appropriate measures based on data type, distribution, and presence of outliers

Uploaded by

Information Me

MA 218 - Chapter 2: Describing Quantitative Distributions with Numbers


• Population: The complete set of people or things being studied

• Sample: The subset of the population from which information is actually obtained

• Parameter: A numerical characteristic of a population. (Usually not measured and/or
not measurable)

• Statistic: A numerical summary of a sample (a number that summarizes the raw
data)

• Determine whether the underlined value is a parameter or a statistic.

1. Following the 2014 national midterm election, 18% of the governors of the 50
United States were female.

2. The average score for a class of 28 students taking a calculus midterm was 72%.

3. In a national survey of 1300 high school students, 32% of respondents reported
that someone had bullied them at school.

4. Ty Cobb is one of Major League Baseball’s greatest hitters of all time, with a career
batting average of 0.366.

5. Only 12 men have walked on the moon. The average time these men spent on
the moon was 43.92 hours.

6. A study of 6076 adults in public rest rooms found that 23% did not wash their
hands before exiting.

7. Interviews of 100 adults 18 years or older, conducted nationwide, found that 44%
could state the minimum age requirement for the office of U.S. president.
• The mean of a variable is computed by adding all values of the variable in the data
set and dividing by the number of observations. If a population has N observations
and a sample has n, then
– Population mean: $\mu = \dfrac{x_1 + x_2 + \cdots + x_N}{N}$
– Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
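
As a small illustration of the two formulas, here is a Python sketch. The scores are made-up values, not the data from the handout's tables, and the helper name `mean` is chosen only for this example.

```python
# Hypothetical population of N = 10 test scores (made-up values for illustration).
population = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]

def mean(values):
    """Add all values and divide by the number of observations."""
    return sum(values) / len(values)

mu = mean(population)        # population mean: uses all N values
sample = [82, 71, 74, 94]    # a hypothetical sample of n = 4 of those scores
x_bar = mean(sample)         # sample mean: uses only the n sampled values

print(mu, x_bar)
```

The same function computes both quantities; what changes is whether it is applied to every observation in the population or only to the sampled ones.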

• The median of a variable is the value that lies in the middle of the data when arranged
in ascending order.

– If n (or N) is odd, the median is the value exactly in the middle: that is, at the
$\frac{n+1}{2}$ position.
– If n (or N) is even, the median is the average of the two middle values: those at
the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions.
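
The two cases can be sketched in Python as follows. The function name and the sample lists are illustrative assumptions; note that Python indexes from 0 while the positions in the rule count from 1.

```python
def median(values):
    """Median by position: sort, then take the middle value(s)."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 counting from 1, i.e. index (n - 1) // 2.
        return data[(n - 1) // 2]
    # Even n: average the values at positions n/2 and n/2 + 1 (counting from 1).
    return (data[n // 2 - 1] + data[n // 2]) / 2

print(median([7, 3, 9, 1, 5]))       # sorted: 1 3 5 7 9, middle value is 5
print(median([7, 3, 9, 1, 5, 11]))   # sorted: 1 3 5 7 9 11, average of 5 and 7
```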

• The mode of a variable is the most frequent observation of the variable that occurs in
the data set. If no observation occurs more than once, we say the data have no mode.
(A set of data can have no mode, one mode, or more than one mode.)
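
A minimal Python sketch of this definition, covering the no-mode, one-mode, and several-modes cases. The function name `modes` is hypothetical, and the sketch assumes a non-empty data set.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []    # no observation occurs more than once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 3]))         # no mode
print(modes([1, 2, 2, 3]))      # one mode
print(modes([1, 1, 2, 2, 3]))   # more than one mode
```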

• The range of a variable is the difference between the largest and the smallest data
value.

– Range = R = largest data value - smallest data value

• A numerical summary of data is said to be resistant if extreme values (very large or
small) relative to the data do not affect its value substantially.
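
To see resistance in action, the following Python sketch compares the mean, median, and range of a small made-up data set before and after one extreme value is added. The numbers and the helper `data_range` are assumptions for illustration only.

```python
from statistics import mean, median

scores = [70, 72, 75, 78, 80]     # hypothetical data
with_extreme = scores + [200]     # the same data plus one extreme value

def data_range(values):
    """Range = largest data value - smallest data value."""
    return max(values) - min(values)

for label, data in [("original", scores), ("with extreme value", with_extreme)]:
    print(label, round(mean(data), 2), median(data), data_range(data))
```

In this example the mean rises from 75 to about 95.83 and the range from 10 to 130, while the median moves only from 75 to 76.5, which suggests the median is resistant and the mean and range are not.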

• Use the data in Table 1 to answer the questions below.

1. Compute the population mean, µ (i.e., the average of all 10 test scores).

2. Suppose you use a random number generator to choose a sample of 4 students,
and your sample consists of {Michelle, Jennifer, Dave, and Justine}. Compute
the sample mean, x̄, for this sample.

3. Calculate the median of the set of student scores in Table 1.

4. Remove Justine’s score from the set of student scores in Table 1 and calculate the
new median.

• The data in the table below represent the birth weights (in pounds) of 25 randomly
selected babies.

1. Find the mean and median birth weight using technology.

The mean x̄ =

The median Med=

2. Calculate the range for the data.

3. Find the mode of the birth weight data set.

4. The histogram below shows the birth weight data. Use it to determine the shape
of the data: is it skewed left, skewed right, or bell shaped (normal)?

• The sample variance is given by

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

• Sample standard deviation: $s = \sqrt{s^2}$
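
The two formulas translate directly into a short Python sketch. The data values are made up for illustration, and the function names are hypothetical; the sketch assumes at least two observations so that n - 1 is not zero.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample; its mean is 5
print(sample_variance(data))      # squared deviations sum to 32, so s^2 = 32/7
print(sample_std(data))
```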

• Example:

• Use technology to calculate the standard deviation of the birth weight data.

• Mean, median, standard deviation, and distribution shapes

• Quartiles divide data sets into fourths, or four equal parts.

– Q1 has 25% of the values below it.


– Q2 has 50% of the values below it.
– Q3 has 75% of the values below it.

• To find the quartiles:

– Arrange the data in ascending order.


– Determine the median, M (which is also the second quartile, Q2).
– Divide the data set into halves: the observations below M and the observations
above M. The first quartile, Q1, is the median of the bottom half of the data and
the third quartile, Q3, is the median of the top half of the data.
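
The steps above can be sketched in Python as shown here. The function name `quartiles` and the example lists are assumptions; the sketch follows the handout's halves method, leaving the median out of both halves when n is odd, and needs at least two observations. Statistical software often uses other quartile conventions, so results from a calculator or library may differ slightly.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    data = sorted(values)            # arrange the data in ascending order
    n = len(data)
    half = n // 2
    lower = data[:half]              # observations below the median position
    # For odd n the middle observation is M itself, so skip it.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))     # odd n
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))    # even n
```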

• The interquartile range, IQR, gives the spread of the middle 50% of the values or
observations.

– $\text{IQR} = Q_3 - Q_1$

• Checking for outliers by using quartiles

– Determine the first and third quartiles of the data.


– Compute the interquartile range.
– Determine the fences. Fences serve as cutoff points for determining outliers.

Lower fence $= Q_1 - 1.5(\text{IQR})$

Upper fence $= Q_3 + 1.5(\text{IQR})$


– If a data value is less than the lower fence or greater than the upper fence, it is
considered an outlier.
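
A short Python sketch of this outlier check follows. It takes the quartiles as given inputs; the data set and function names are made up for illustration, with Q1 = 3 and Q3 = 11 obtained by the halves method described earlier.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Values below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [1, 3, 5, 7, 9, 11, 40]    # hypothetical data with Q1 = 3, Q3 = 11
print(fences(3, 11))              # IQR = 8, so the fences are 3 - 12 and 11 + 12
print(outliers(data, 3, 11))
```

Here only 40 falls outside the fences, so it is the single value flagged as an outlier.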

• The five-number summary of a set of data consists of

– xmin (the smallest data value)


– Q1
– M = Q2
– Q3
– xmax (the largest data value)

• Drawing a Boxplot

– Find the five-number summary.


– Determine the lower and upper fences.
– Draw a number line long enough to include the maximum and minimum values.
Insert vertical lines at Q1, M, and Q3. Enclose these vertical lines in a box.
– Label the lower and upper fences.
– Draw a line from Q1 to the smallest data value that is larger than the lower fence.
Draw a line from Q3 to the largest data value that is smaller than the upper fence.
These are called whiskers.
– Any data values less than the lower fence or greater than the upper fence are
outliers and are marked with an asterisk (*).
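
Before drawing by hand, the whisker ends and the points to mark with an asterisk can be worked out numerically. The Python sketch below assumes the quartiles are already known; the function name, the dictionary keys, and the data are hypothetical. It treats a value exactly equal to a fence as inside the fences, since such a value is not an outlier under the rule above.

```python
def boxplot_parts(values, q1, q3):
    """Whisker ends and outliers for a boxplot, given precomputed quartiles."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Values on or between the fences; the whiskers reach the extremes of these.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "whisker_low": min(inside),     # left whisker runs from Q1 to this value
        "whisker_high": max(inside),    # right whisker runs from Q3 to this value
        "outliers": sorted(x for x in values if x < lower_fence or x > upper_fence),
    }

# Hypothetical data; Q1 = 16 and Q3 = 24 by the halves method, so IQR = 8.
data = [2, 15, 16, 18, 20, 21, 23, 24, 26, 45]
print(boxplot_parts(data, 16, 24))
```

With fences at 4 and 36, the whiskers stop at 15 and 26, and the values 2 and 45 would be plotted as asterisks.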

• Use the data in Table 2 to answer the questions below.

1. Find the five-number summary for the data.

2. Compute the interquartile range for the data.

3. Find the upper and lower fences. Are there any outliers?

4. Create a boxplot for the data.

• The results of an experiment in which researchers placed colored boards at random
locations in a field and then counted the number of beetles attracted to each board in
a 48-hour period appear below.

1. Which board color attracted the most beetles?

2. Which two colors show virtually no difference in their medians?

3. Which color had the smallest spread in the middle 50% of its data?

• Shapes of boxplots

• Choosing measures

– Some measures (such as the mean, the standard deviation, and the range) are
affected by outliers.
– When data are quantitative, the distribution is reasonably symmetric, and there
are no outliers, use the mean and standard deviation.
– When data are quantitative, the distribution is skewed, and outliers are present,
use the five-number summary.
– When the most frequent observation is desired or the data are qualitative, use the mode.
