Module 3 (2)

The document discusses measures of center, specifically the mean and median, and measures of variation, including standard deviation and interquartile range (IQR). It provides examples using LeBron James's scoring data and explores how to visualize and analyze employee salary data through graphs and descriptive statistics. Additionally, it addresses the impact of outliers on the mean and median in reporting test scores.
Copyright
© All Rights Reserved

Describing Data with Graphs and Numbers

Measures of Center: Measures of center are numerical values that describe the middle, or typical
value, of a set of data. The two that we will focus on are the mean and the median.

1. Mean: The mean of a set of n observations is simply the sum of the observations divided by the
number of observations, n.

2. Median: The median of a set of observations, ordered from smallest to largest, is a value such that
at least half of the observations are less than or equal to that value and at least half the
observations are greater than or equal to that value.
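
The two definitions above can be sketched in a few lines of Python. The data values are hypothetical and chosen only for illustration. When n is even, the sketch averages the two middle values, which is a common convention and one of the values allowed by the definition of the median given here.

```python
from statistics import mean, median

# Hypothetical observations, used only to illustrate the definitions.
odd_count = [3, 9, 4, 7, 2]
even_count = [3, 9, 4, 7, 2, 11]


def mean_by_definition(values):
    """Sum of the observations divided by the number of observations, n."""
    return sum(values) / len(values)


def median_by_definition(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean_by_definition(odd_count), median_by_definition(odd_count))
print(mean_by_definition(even_count), median_by_definition(even_count))

# The standard library functions follow the same conventions.
print(mean(even_count), median(even_count))
```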

Measures of Variation or Spread: Measures of variation include the IQR and standard deviation. These
numerical summaries describe the amount of spread that is found among the data, with larger values
indicating more variability.

a. Standard Deviation: Standard deviation is a measure of the spread of the observations from the
mean. It is the square root of an average of the squared deviations of the observations from the
mean. We can think of the standard deviation as approximately an average distance of the
observations from the mean.

b. IQR: The IQR measures the spread of the middle 50% of the data. It is defined as the difference
between the 3rd quartile (Q3) and the 1st quartile (Q1). These quartiles are also called the 75th and
25th percentiles, respectively. IQR = Q3 – Q1.
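
As a rough illustration of both measures, the sketch below computes them for a small hypothetical data set. Two assumptions are made that the text above does not specify: the "average" of the squared deviations divides by n - 1 (the usual sample standard deviation), and the quartiles use the default method of Python's statistics.quantiles. Other software may use a different quartile rule and report slightly different values for Q1 and Q3.

```python
import math
from statistics import quantiles, stdev

# Hypothetical observations, used only for illustration.
data = [2, 4, 4, 5, 7, 8, 12]


def sample_std(values):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


def iqr(values):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (default method)."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


print(sample_std(data))  # agrees with statistics.stdev(data)
print(iqr(data))
```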

Mean and Median


We are interested in analyzing LeBron James's scoring output by game. We have his scoring output for 5
games, which we have arranged from lowest to highest: 6, 24, 28, 34, 36. There are two measures of
center we could report.

Which measure would be better to report?   Median / Mean

Which is the most likely value for the mean?   8 / 12 / 26

What is the median?   24 / 28 / 34
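
One way to check answers to the questions above is to compute both measures directly from the five scores. The second half of the sketch drops the 6-point game, purely as an illustration, to show how much a single low score pulls the mean compared with the median.

```python
from statistics import mean, median

# The five game scores given in the text, ordered from lowest to highest.
points = [6, 24, 28, 34, 36]

print(mean(points))    # 25.6
print(median(points))  # 28

# Illustration only: the same measures without the unusually low game.
without_low_game = points[1:]
print(mean(without_low_game))    # 30.5
print(median(without_low_game))  # 31.0
```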

Visualizing and Exploring a Data Set


In this activity, you will learn how to create graphs and obtain descriptive statistics for a data set.

Task: The data set contains information on employees at a company. Explore possible questions this data
could be used to address. Create appropriate graphs and obtain descriptive statistics for current salary,
and discuss the results.
• You can see the variables in the data set and their values. The first variable you should see is Last.
  What is the second variable present in the data set?
  What type of variable is it?
  What is the eighth variable present in the data set?
  What type of variable is it?
• Create a histogram for salary.
  Draw a quick sketch of the histogram and describe what the histogram shows about the
  distribution of salaries.
• You would like to compare the distribution of salary for different cities. Generate histograms again.
  Compare and contrast the distribution of salary for the groups. Can we use the same descriptions
  for both histograms?
• Obtain a boxplot for salary. Make a quick sketch of this boxplot, and describe what the boxplot
  shows about the distribution of salary. What do the various lines on the boxplot represent?
• As we did with histograms, we can also use side-by-side boxplots to compare the distributions.
  How does the distribution for current salary compare for the different cities?
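
To make the lines on a boxplot concrete, here is a minimal sketch that computes the values a typical boxplot draws. The salary figures are hypothetical (in thousands of dollars) and are not taken from the employee data set. The 1.5 × IQR rule for flagging outliers is a common convention and is assumed here; the software used for this activity may draw its whiskers differently.

```python
from statistics import quantiles

# Hypothetical salaries in thousands of dollars, used only for illustration.
salaries = [31, 34, 35, 38, 40, 42, 45, 47, 52, 58, 95]


def boxplot_summary(values):
    """Return the values a typical boxplot displays, using the 1.5 * IQR rule."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4)  # box edges and the line inside the box
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations still inside the fences.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


for name, value in boxplot_summary(salaries).items():
    print(name, value)
```

In this hypothetical data, the salary of 95 falls above the upper fence, so it would be plotted as a separate point rather than included in the whisker.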

Numerical summaries may also be obtained for any quantitative variable. Complete the details below:

Mean: ____   Median: ____   Standard Deviation: ____

Q1: ____   Q3: ____   IQR = Q3 - Q1 = ____

Min: ____   Max: ____   Range = Max - Min = ____

Which Measure of Center to Report?


Mark is a Stats 250 GSI who would like to report a measure of center for scores on the first exam. The
mean score for his lab section was 77.46 points and the median was 84 points. One of Mark’s students did
not take the exam and received a zero. Since Mark knows this score will not count against the student, he
removes the score of zero from his data.

How will the mean test score change if the grade of 0 is not included?

If there is an outlier test score of 20 points, which measure of center would you recommend that Mark
report?
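
The reasoning behind both questions can be sketched numerically. The text does not give the number of students in Mark's section or the individual scores, so the section sizes and the score list below are hypothetical. The idea is that removing a score of 0 leaves the total unchanged while dividing by one fewer observation, and that a single low outlier moves the mean much more than the median.

```python
from statistics import mean, median


def mean_after_dropping_zero(mean_with_zero, n):
    """Dropping a 0 keeps the total the same but divides by n - 1 instead of n."""
    total = mean_with_zero * n
    return total / (n - 1)


# Hypothetical section sizes; the actual number of students is not given.
for n in (10, 13, 25):
    print(n, round(mean_after_dropping_zero(77.46, n), 2))

# Hypothetical scores with one low outlier of 20 points.
scores = [20, 78, 80, 84, 85, 88, 91]
scores_without_outlier = [s for s in scores if s != 20]

print(mean(scores), median(scores))
print(mean(scores_without_outlier), median(scores_without_outlier))
```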
