Module 3 (2)
Measures of Center: Measures of center are numerical values that describe the middle, or typical
value, of a set of data. The two that we will focus on are the mean and the median.
1. Mean: The mean of a set of n observations is simply the sum of the observations divided by the
number of observations, n.
2. Median: The median of a set of observations, ordered from smallest to largest, is a value such that
at least half of the observations are less than or equal to that value and at least half the
observations are greater than or equal to that value.
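The two definitions above can be sketched in a few lines of Python. This is only an illustration: the salary values are invented and are not taken from the employee data set, and the function names mean_by_hand and median_by_hand are hypothetical. When the number of observations is even, any value between the two middle observations satisfies the definition of the median; the sketch follows the usual convention of reporting their midpoint.

```python
from statistics import mean, median

# Hypothetical salaries (in dollars), invented for illustration only.
salaries = [32000, 35000, 38000, 41000, 120000]


def mean_by_hand(values):
    """Sum of the observations divided by the number of observations, n."""
    return sum(values) / len(values)


def median_by_hand(values):
    """Middle value of the ordered observations.

    For an even number of observations, return the midpoint of the
    two middle values (a common convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(mean_by_hand(salaries))    # 53200.0
print(median_by_hand(salaries))  # 38000

# The standard library gives the same answers.
print(mean(salaries), median(salaries))
```

In this made-up list, the single large salary pulls the mean well above the median, while the median stays at the middle observation.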
Measures of Variation or Spread: Measures of variation include the IQR and standard deviation. These
numerical summaries describe the amount of spread that is found among the data, with larger values
indicating more variability.
a. Standard Deviation: Standard deviation is a measure of the spread of the observations from the
mean. It is the square root of an average of the squared deviations of the observations from the
mean (for a sample, the sum of the squared deviations is divided by n - 1 rather than n). We can
think of the standard deviation as roughly the typical distance of the observations from the mean.
b. IQR: The IQR measures the spread of the middle 50% of the data. It is defined as the difference
between the 3rd quartile (Q3) and the 1st quartile (Q1). These quartiles are also called the 75th and
25th percentiles, respectively. IQR = Q3 – Q1.
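As a rough sketch of how these two measures of spread are computed, the Python below works through both. The scores are invented for illustration, and the function names sample_sd and iqr are hypothetical. The standard deviation here is the sample version, which divides by n - 1. Quartiles are taken from the standard library's statistics.quantiles with its default method; statistical software packages use slightly different quartile conventions, so the Q1 and Q3 reported by your software may differ a little from these.

```python
import math
from statistics import quantiles, stdev

# Hypothetical observations, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]


def sample_sd(values):
    """Square root of the sum of squared deviations from the mean,
    divided by n - 1 (the sample standard deviation)."""
    n = len(values)
    center = sum(values) / n
    squared_deviations = [(x - center) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


def iqr(values):
    """IQR = Q3 - Q1, using the default quartile method of
    statistics.quantiles (other software may use other conventions)."""
    q1, _q2, q3 = quantiles(values, n=4)
    return q3 - q1


print(sample_sd(scores))  # agrees with statistics.stdev(scores)
print(stdev(scores))
print(iqr(scores))
```

Larger values of either measure indicate more variability. Because the IQR depends only on the middle 50% of the data, it is not affected by a few extreme observations in the way the standard deviation is.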
Task: The data set contains information on employees at a company. Explore possible questions this data
could be used to address. Create appropriate graphs and obtain descriptive statistics for current salary,
and discuss the results.
You can see the variables in the data set and their values. The first variable you should see is Last.
What is the second variable present in the data set?
What type of variable is it?
What is the eighth variable present in the data set?
What type of variable is it?
Create a histogram for salary.
Draw a quick sketch of the histogram and describe what the histogram shows about the
distribution of salaries.
You would like to compare the distribution of salary for different cities. Generate the histograms
again, this time one for each city. Compare and contrast the distribution of salary for the groups –
can we use the same descriptions for both histograms?
Obtain a boxplot for salary. Make a quick sketch of this boxplot, and describe what the boxplot
shows about the distribution of salary. What do the various lines on the boxplot represent?
As we did with histograms, we can also use side-by-side boxplots to compare the distributions.
How does the distribution for current salary compare for the different cities?
Numerical summaries may also be obtained for any quantitative variable. Complete the details below:
How will the mean test score change if the grade of 0 is not included?
If there is an outlier test score of 20 points, which measure of center would you recommend that Mark
report?