Data Analysis
Example:
A teacher asked 12 students how many pets they owned. The results are shown in the table
above. What is the average number of pets owned by the students?
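The table for this example is not reproduced here, so the sketch below uses hypothetical pet counts for 12 students (an assumption, not the original data). It only illustrates how an average (mean) is computed: add the values, then divide by how many there are.

```python
# Hypothetical pet counts for 12 students; the original table is not available.
pets = [0, 1, 1, 2, 0, 3, 1, 2, 4, 0, 1, 3]

# Mean = sum of the values divided by the number of values.
mean_pets = sum(pets) / len(pets)
print(mean_pets)  # 1.5
```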
Median: The median is the middle value when the data are ordered from least to greatest.
• If the number of values is odd, the median is the middle value.
• If the number of values is even, the median is the average of the two middle values.
Example: 2, 5, 6, 7, 7, 10
What is the median of the data set above?
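The two rules above can be sketched as a short function. The name `median` is chosen here for illustration; the data set is the one from the example.

```python
def median(values):
    """Return the middle value of the data, ordered from least to greatest."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 5, 6, 7, 7, 10]))  # 6.5
```

With six values, the two middle values are 6 and 7, so the median is their average.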
Mode: The mode is the value that appears most frequently in a data set.
A data set can have no mode if no value appears more than any other; a data set can also
have more than one mode.
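A minimal sketch of this definition follows. The function name `modes` and the choice to return an empty list for "no mode" are assumptions made for illustration, and the function expects a non-empty data set.

```python
from collections import Counter

def modes(values):
    """Return every value that appears most often, or [] if there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all tie, no value appears more than any other.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 5, 6, 7, 7, 10]))  # [7]
print(modes([1, 2, 3, 4]))         # []
print(modes([1, 1, 2, 2, 3]))      # [1, 2]
```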
Example:
The table above shows the items Stevie bought from a garage sale and their prices.
What is the mean price of the items Stevie bought?
What is the median price of the items Stevie bought?
What is the mode of the prices?
Range: The range is the difference between the maximum and minimum values.
It measures the total spread of the data. A larger range indicates a greater spread in the data.
Standard deviation: Standard deviation is a measure of the typical distance between the
values in a data set and the mean.
It measures the typical spread from the mean; larger standard deviations indicate greater
spread in the data.
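To compare the two measures of spread, the sketch below uses two hypothetical data sets with the same mean of 5. It assumes the population standard deviation (`statistics.pstdev`), which is based on squared distances from the mean rather than a plain average of distances, but it ranks these two data sets the same way.

```python
from statistics import pstdev

# Two hypothetical data sets with the same mean (5) but different spread.
tight = [4, 5, 5, 6]
wide = [1, 3, 7, 9]

def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

print(data_range(tight), data_range(wide))  # 2 8
print(pstdev(tight) < pstdev(wide))         # True
```

The values in `wide` sit farther from the mean, so both its range and its standard deviation are larger.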
Example:
Of the two dot plots shown above, which one has a greater standard deviation?
Outlier: An outlier is a value in a data set that significantly differs from other values.
Effect on the range and standard deviation
• Including outliers increases the spread of the data, leading to a larger range and standard
deviation.
• Removing outliers decreases the spread of the data, leading to a smaller range and standard
deviation.
Effect on the mean
• If a very large outlier is removed, the mean of the remaining values will decrease.
• If a very small outlier is removed, the mean of the remaining values will increase.
Effect on the median
• If a very large outlier is removed, the median of the remaining values will either decrease
or remain the same.
• If a very small outlier is removed, the median of the remaining values will either increase
or remain the same.
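These effects can be checked on a small hypothetical data set with one very large outlier (40). The data are an assumption chosen for illustration, not taken from any figure in this section.

```python
from statistics import mean, median

# Hypothetical data with one very large outlier (40).
values = [3, 4, 5, 5, 6, 40]
trimmed = [v for v in values if v != 40]

def data_range(data):
    return max(data) - min(data)

print(mean(values), mean(trimmed))              # 10.5 4.6
print(median(values), median(trimmed))          # 5.0 5
print(data_range(values), data_range(trimmed))  # 37 3
```

Removing the large outlier lowers the mean and the range, while the median happens to stay the same here.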
Example:
The dot plot above shows the height in inches of 20 elementary school students.
If the shortest student is removed from the data set and the summary statistics are
recalculated, how would they compare to the summary statistics for all 20 students?
The mean height of the 19 remaining students would be ____ that of all 20 students.
The median height of the 19 remaining students would be ____ that of all 20 students.
The range of the heights of the 19 remaining students would be ____ that of all 20 students.
1. Ned runs a soybean farm and recorded the yields for 175 different one-acre sections. The
results are shown in the graph above. Which of the following could be the median yield of
Ned's soybean acres?
A. 44 bushels B. 48 bushels C. 52 bushels D. 56 bushels
2. The minimum value of a data set consisting of 15 positive integers is 29. A new data set
consisting of 16 positive integers is created by including 22 in the original data set. Which of
the following measures must be 7 greater for the new data set than for the original data set?
A. The mean B. The median C. The range D. The standard deviation
Histograms use bars to represent the frequency at which a range of values occurs.
Histograms are useful because it's often impractical to list every possible value independently.
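As a rough illustration of grouping values into ranges, the sketch below counts hypothetical test scores in bins of width 10 and prints a text histogram. The scores and the bin width are assumptions.

```python
from collections import Counter

# Hypothetical test scores; not taken from any figure in this section.
scores = [52, 67, 71, 74, 78, 81, 83, 85, 88, 90, 94, 97]
bin_width = 10

# Map each score to the lower edge of its bin, e.g. 74 -> 70.
bins = Counter((s // bin_width) * bin_width for s in scores)

# One row per bin; the number of '#' marks is the frequency (bar height).
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```

Each row plays the role of one bar: the 80-89 bin holds four scores, so its bar is the tallest.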
Line graphs usually show how quantities change over time.
Key phrases in line graph translation problems and how to interpret them:
A scatterplot displays data about two variables as a set of points in the xy-plane. Each axis
of the plane usually represents a variable in a real-world scenario.
While each point in a scatterplot represents a specific observation, the line of best
fit describes the general trend based on all of the points.
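One common way to find a line of best fit is least squares. The text does not say which method is used, so treat this as an assumption. The sketch below fits a line to five hypothetical points.

```python
# Hypothetical observations (x, y); not taken from any figure in this section.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares slope: how x and y vary together, divided by how x varies.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
# The fitted line passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

No single point has to lie on the line y = 0.6x + 2.2; it summarizes the overall trend of all five.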
The scatterplot above shows the relative housing cost and the population density for several
large US cities in the year 2005. The equation of the line of best fit is y = 0.0125x + 61.
① The constant 61 means that when the population density is 0 people per square mile of
land area, the relative housing cost is ____.
② The coefficient 0.0125 means that as the population density increases by 1,000 people
per square mile of land area, the relative housing cost increases by ____ of the national average
cost.
③ According to the graph, the predicted relative housing cost for a population density
of 15,000 people per square mile of land area is approximately ____ of the national average cost.
④ According to the equation of the line of best fit, the predicted relative housing cost for a
population density of 5,000 people per square mile of land area is ____ of the national
average cost.
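The arithmetic behind items ①, ② and ④ can be sketched by evaluating the stated equation directly. The function name `predicted_cost` is hypothetical, and the results are in whatever units the y-axis of the scatterplot uses. Item ③ depends on reading the graph, so it is not covered here.

```python
def predicted_cost(density):
    """Line of best fit from the text: y = 0.0125x + 61."""
    return 0.0125 * density + 61

# Item 1: the y-intercept is the prediction at a density of 0.
print(predicted_cost(0))  # 61.0

# Item 2: change in the prediction when density rises by 1,000.
print(round(predicted_cost(1000) - predicted_cost(0), 1))  # 12.5

# Item 4: prediction at a density of 5,000.
print(round(predicted_cost(5000), 1))  # 123.5
```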
Once we determine the correct type of equation to use, we can write the equation by using
our knowledge of linear and exponential equations.
• Using y = mx + b to represent a linear equation:
m is the number repeatedly added, the rate of change, or the slope of the line when the
equation is graphed in the xy-plane.
b is the initial value, or the y-intercept of the line when the equation is graphed in the
xy-plane.
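To contrast the two equation types, the sketch below evaluates a linear model y = mx + b next to an exponential model. The exponential form y = a * r^x, with initial value a and repeated multiplier r, is an assumption, since the text does not write it out. All numbers are hypothetical.

```python
def linear(x, m, b):
    """y = mx + b: start at b and add m for each unit increase in x."""
    return m * x + b

def exponential(x, a, r):
    """y = a * r**x: start at a and multiply by r for each unit increase in x."""
    return a * r ** x

# Hypothetical values: both models start at 10.
print([linear(x, 3, 10) for x in range(4)])       # [10, 13, 16, 19]
print([exponential(x, 10, 2) for x in range(4)])  # [10, 20, 40, 80]
```

The linear values change by a constant difference (3), while the exponential values change by a constant factor (2).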
Example: Match each of the four scenarios below to its appropriate description.
Descriptions:
• Increasing linear
• Decreasing linear
• Increasing exponential
• Decreasing exponential
Scenarios:
• The population of a village decreases by 2.3% each year.
• Jorge has $200 for lodging while traveling. He pays $40 per day staying at a hostel.
• A plant's height increases by 5 centimeters per day.
• For a savings account, 1.25% of the current value is added to the value of the account
each successive year.