Data Management
Gathering and Organizing Data
- categorical data
- measurement scales
- variables
Measures of Central Tendency
- mean
- median
- mode
Measures of Relative Position
- percentiles
- quartiles
- deciles
Measures of Variation
- range
- quartile deviation
- interquartile range
- mean deviation
- variance
- standard deviation
LEARNING OUTCOMES
1. Use a variety of statistical tools to process and manage
numerical data:
a. Gathering and Organizing Data
b. Measures of Central Tendency
c. Measures of Position
d. Measures of Variation/Dispersion
e. Correlation
Quantitative data is numerical. It’s used to define information that can be counted.
Some examples of quantitative data include distance, speed, height, length and weight.
It’s easy to remember the difference between qualitative and quantitative data, as one
refers to qualities, and the other refers to quantities.
Discrete data consists of whole-number counts that can’t be broken into fractions or decimals. Examples
of discrete data include the number of pets someone has – one can have two dogs but not two-and-a-half dogs. The
number of wins someone’s favourite team gets is also a form of discrete data because a team can’t have a half win – it’s
either a win, a loss, or a draw.
Continuous data describes values that can be broken down into different parts, units, fractions and decimals.
Continuous data points, such as height and weight, can be measured. Time can also be broken down – by half a second
or half an hour. Temperature is another example of continuous data.
An average that uses the exact value of each entry is the mean (sometimes
called the arithmetic mean). To compute the mean, we add the values of all the
entries and then divide by the number of entries.
MEASURES OF CENTRAL TENDENCY
MODE
Count the letters in each word of this sentence and give the mode. The
numbers of letters in the words of the sentence are
5 3 7 2 4 4 2 4 8 3 4 3 4
Scanning the data, we see that 4 is the mode because more words have 4
letters than any other number. For larger data sets, it is useful to order (sort)
the data before scanning them for the mode.
Not every data set has a mode. For example, if Professor Fair gives equal
numbers of A’s, B’s, C’s, D’s, and F’s, then there is no modal grade.
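The mode search above can be sketched in Python with the standard library's `statistics.multimode`, which returns every value tied for the highest count. Treating an all-way tie as "no mode" follows the Professor Fair example; the three-of-each grade counts are a hypothetical illustration.

```python
from statistics import multimode

# Letter counts for the words of the example sentence.
letter_counts = [5, 3, 7, 2, 4, 4, 2, 4, 8, 3, 4, 3, 4]

# multimode returns a list of all values that share the highest frequency.
modes = multimode(letter_counts)
print(modes)  # [4]

# Hypothetical grade sheet with equal numbers of each grade (three apiece).
grades = ["A", "B", "C", "D", "F"] * 3
tied = multimode(grades)

# If every distinct value ties, we say the data set has no mode.
has_mode = len(tied) < len(set(grades))
print(has_mode)  # False
```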
UNIMODAL - having one mode
MEDIAN
What do barbecue-flavored potato chips cost? According to Consumer
Reports, Volume 66, No. 5, the prices per ounce in cents of the rated chips
are
19 19 27 28 18 35
(a) To find the median, we first order the data, and then note that there is
an even number of entries. So the median is the mean of the two
middle values.
18 19 19 27 28 35
Median = (19 + 27) / 2 = 23 cents per ounce
(b) According to Consumer Reports, the brand with the lowest overall
taste rating costs 35 cents per ounce. Eliminate that brand, and find the
median price per ounce for the remaining barbecue-flavored chips. Again
order the data. Note that there is an odd number of entries, so the median
is simply the middle value.
18 19 19 27 28
Median = 19 cents per ounce
(c) One ounce of potato chips is considered a small serving. Is it reasonable
to budget about $10.45 to serve the barbecue-flavored chips to 55 people?
Yes, since the median price of the chips is 19 cents per small serving.
This budget for chips assumes that there is plenty of other food!
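Parts (a) through (c) of the potato chip example can be checked with a short Python sketch. It relies on `statistics.median`, which averages the two middle values when the number of entries is even; the variable names are illustrative.

```python
from statistics import median

prices = [19, 19, 27, 28, 18, 35]  # cents per ounce

# (a) Even number of entries: the median is the mean of the two middle values.
median_all = median(prices)
print(median_all)  # 23.0

# (b) Drop the 35-cent brand (the largest value); an odd number of entries remains.
remaining = sorted(prices)[:-1]
median_remaining = median(remaining)
print(median_remaining)  # 19

# (c) Budget in dollars for 55 one-ounce servings at the median price.
budget = 55 * median_remaining / 100
print(budget)  # 10.45
```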
Belleview College must make a report to the budget committee about the average
credit hour load a full-time student carries. (A 12-credit-hour load is the minimum
requirement for full-time status. For the same tuition, students may take up to 20 credit
hours.) A random sample of 40 students yielded the following information (in credit
hours):
17 12 14 17 13 16 18 20 13 12
12 17 16 15 14 12 12 13 17 14
15 12 15 16 12 18 20 19 12 15
18 14 16 17 15 19 12 13 12 15
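For a sample of this size, all three averages are easier to obtain by machine. A minimal sketch using Python's standard `statistics` module on the 40 credit-hour values listed above:

```python
from statistics import mean, median, mode

credit_hours = [
    17, 12, 14, 17, 13, 16, 18, 20, 13, 12,
    12, 17, 16, 15, 14, 12, 12, 13, 17, 14,
    15, 12, 15, 16, 12, 18, 20, 19, 12, 15,
    18, 14, 16, 17, 15, 19, 12, 13, 12, 15,
]

print(mean(credit_hours))    # 14.975
print(median(credit_hours))  # 15.0
print(mode(credit_hours))    # 12
```

The mode (12) sits at the minimum full-time load, while the mean and median are both close to 15 credit hours.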
MEAN
To graduate, Linda needs at least a B in biology. She did not do very well on her
first three tests; however, she did well on the last four. Here are her scores:
58 67 60 84 93 98 100
Compute the mean and determine if Linda’s grade will be a B (80 to 89 average)
or a C (70 to 79 average).
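Linda's average can be worked out directly from the definition of the mean (sum of the entries divided by the number of entries). A small sketch, with the grade bands taken from the problem statement:

```python
scores = [58, 67, 60, 84, 93, 98, 100]

# Mean = (sum of all entries) / (number of entries).
average = sum(scores) / len(scores)
print(average)  # 80.0

# Grade bands from the problem: B is 80 to 89, C is 70 to 79.
if 80 <= average <= 89:
    grade = "B"
elif 70 <= average <= 79:
    grade = "C"
else:
    grade = "other"
print(grade)  # B
```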
When we compute the mean, we sum the given data. There is a convenient
notation to indicate the sum. Let x represent any value in the data set. Then the
notation
∑x (read “the sum of all given x values”)
means that we are to sum all the data values. In other words, we are to sum all
the entries in the distribution. The summation symbol ∑ means sum the
following and is capital sigma, the S of the Greek alphabet.
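The mean of a sample distribution of x values is denoted by x̄
(read “x bar”). If your data comprise the entire population, we use the symbol μ
(lowercase Greek letter mu, pronounced “mew”) to represent the mean.
Sample mean: x̄ = ∑x / n
Population mean: μ = ∑x / N
where n is the number of entries in the sample and N is the number of entries in the population.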
A resistant measure is one that is not influenced by extremely high or low data
values. The mean is not a resistant measure of center because we can make the
mean as large as we want by changing the size of only one data value. The
median, on the other hand, is more resistant. However, a disadvantage of the
median is that it is not sensitive to the specific size of a data value.
A measure of center that is more resistant than the mean but still sensitive to
specific data values is the trimmed mean. A trimmed mean is the mean of the
data values left after “trimming” a specified percentage of the smallest and
largest data values from the data set. Usually a 5% trimmed mean is used. This
implies that we trim the lowest 5% of the data as well as the highest 5% of the
data. A similar procedure is used for a 10% trimmed mean.
Barron’s Profiles of American Colleges, 19th Edition, lists
average class size for introductory lecture courses at each of
the profiled institutions. A sample of 20 colleges and
universities in California showed class sizes for introductory
lecture courses to be
14 20 20 20 20
23 25 30 30 30
35 35 35 40 40
42 50 50 80 80
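The trimming procedure described earlier can be sketched as follows for these class sizes. The function name `trimmed_mean` and the rule of rounding the trim count to the nearest whole entry are assumptions made for illustration; textbooks and software differ on how to handle fractional counts.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the entries from each end."""
    ordered = sorted(values)
    # Assumption: round percent% of n to the nearest whole number of entries.
    k = round(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

class_sizes = [14, 20, 20, 20, 20, 23, 25, 30, 30, 30,
               35, 35, 35, 40, 40, 42, 50, 50, 80, 80]

plain_mean = sum(class_sizes) / len(class_sizes)
five_percent = trimmed_mean(class_sizes, 5)  # drops one value from each end

print(plain_mean)              # 35.95
print(round(five_percent, 2))  # 34.72
```

With 20 entries, 5% is one entry, so the smallest value (14) and one of the largest (80) are removed before averaging. The trimmed mean is lower than the plain mean because the remaining 80 has less pull.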
WEIGHTED AVERAGE
Sometimes we wish to average numbers, but we want to
assign more importance, or weight, to some of the numbers.
For instance, suppose your professor tells you that your grade
will be based on a midterm and a final exam, each of which is
based on 100 possible points. However, the final exam will be
worth 60% of the grade and the midterm only 40%. How
could you determine an average score that would reflect these
different weights? The average you need is the weighted
average.
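A weighted average multiplies each value by its weight, sums the products, and divides by the sum of the weights: weighted average = ∑xw / ∑w. The sketch below applies this to the 40%/60% grading scheme; the two exam scores are hypothetical values chosen only for illustration.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical scores: 83 on the midterm, 95 on the final.
scores = [83, 95]
weights = [0.40, 0.60]  # midterm counts 40%, final counts 60%

result = weighted_average(scores, weights)
print(round(result, 1))  # 90.2
```

Because the final exam carries more weight, the result is closer to 95 than the unweighted mean of 89 would be.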
EXERCISES
1. How hot does it get in Death Valley? The following data are
taken from a study conducted by the National Park System,
of which Death Valley is a unit. The ground temperatures
(°F) were taken from May to November in the vicinity of
Furnace Creek.
2. How large is a wolf pack? The following information is
from a random sample of winter wolf packs in regions of
Alaska, Minnesota, Michigan, Wisconsin, Canada, and
Finland (Source: The Wolf, by L. D. Mech, University of
Minnesota Press). Winter pack size:
13 10 7 5 7 7 2 4 3
2 3 15 4 4 2 8 7 8
Compute the mean, median, and mode for the size of winter
wolf packs.
3. How expensive is Maui? If you want a vacation rental condominium (up to four
people), visit the Brase/Brase statistics site at
http://math.college.hmco.com/students, find the link to Maui, and then search for
accommodations. The Maui News gave the following costs in dollars per day for a
random sample of condominiums located throughout the island of Maui.
ANSWERS
1. Mean 167.3 °F;
Median 171 °F;
Mode 178 °F.
2. Mean 6.2;
Median 6;
Mode 7.
QUARTILES
Quartiles are the score points that divide a distribution into four
equal parts. The three (3) main quartiles are denoted by 𝑄₁, 𝑄₂
and 𝑄₃. 𝑄₁ is read as 1st quartile or the lower quartile, 𝑄₂ as 2nd
quartile or the median, and 𝑄₃ as 3rd quartile or the upper quartile.
The difference between the upper quartile (𝑄₃) and the
lower quartile (𝑄₁) is called the interquartile range (I.R. = 𝑄₃ − 𝑄₁).
Because 25% of the distribution falls below 𝑄₁ and 25% falls above 𝑄₃, the middle 50% of
the distribution falls within the interquartile range.
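One common way to locate the quartiles is to take the median of the whole data set as 𝑄₂, then the medians of the lower and upper halves as 𝑄₁ and 𝑄₃. The sketch below follows that convention (an assumption; other software interpolates and may give slightly different values) and reuses the 20 class sizes from the trimmed-mean example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # entries below the median position
    upper = ordered[n - half:]   # entries above the median position
    return median(lower), median(ordered), median(upper)

class_sizes = [14, 20, 20, 20, 20, 23, 25, 30, 30, 30,
               35, 35, 35, 40, 40, 42, 50, 50, 80, 80]

q1, q2, q3 = quartiles(class_sizes)
iqr = q3 - q1  # interquartile range

print(q1, q2, q3)  # 21.5 32.5 41.0
print(iqr)         # 19.5
```

When the number of entries is odd, the middle entry is left out of both halves.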
DECILES
Deciles are the nine score points that divide a distribution into ten equal parts.
These deciles are denoted as 𝐷₁,𝐷₂,𝐷₃, … ,𝐷₉ , such that 10% of the data fall
below 𝐷₁, 20% of the data fall below 𝐷₂, and so on. Also, 𝐷₁ is read as 1st
decile, 𝐷₂ as 2nd decile, and so on.
PERCENTILES
Percentiles are the ninety-nine score points that divide a
distribution into one hundred equal parts, so that each part
represents 1% of the data set. These values are denoted by 𝑃₁, 𝑃₂,
…, 𝑃₉₉, such that 1% of the data fall below 𝑃₁, 2% of the data
fall below 𝑃₂, and so on. Also, 𝑃₁ is read as “first percentile,”
𝑃₂ as “second percentile,” and so on.
EXAMPLE
Given a set of scores of 15 students in their English
Quiz:
Notice that the least value in the data is 2 and the greatest value in
the data is 15.
The middle value in the data is 8, which is the 2nd quartile (𝑄₂), the 5th decile (𝐷₅), or the
50th percentile (𝑃₅₀).
The lower quartile is the value that is between the median and the least value in the
data set. So, the lower quartile (𝑄₁) or the 25th percentile (𝑃₂₅) is 5.
The upper quartile (𝑄₃) is the value that is between the middle value and the greatest
value in the data set. So, the upper quartile (𝑄₃) or the 75th percentile (𝑃₇₅) is 11.
EXERCISES
MEASURES OF VARIATION/DISPERSION
FOCUS POINTS
RANGE
The range is the simplest measure of variability. It is the
difference between the largest value and the smallest value.
R=H–L
where R = Range, H = Highest value, L = Lowest value
The following are the daily wages of 8 workers in each of
two garment factories, Factory A and Factory B. Find the range of wages in
pesos (Php).
Factory A: 400, 450, 520, 380, 482, 495, 575, 450.
Factory B: 450, 400, 450, 480, 450, 450, 400, 672
The workers of both factories have the same mean wage, Php 469.
Finding the range of wages: Range = Highest wage – Lowest wage
Range A = 575 – 380 = 195
Range B = 672 – 400 = 272
The range is not a stable measure of variability because its value can fluctuate
greatly even with a change in just a single value, either the highest or lowest.
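As a quick check, the range formula R = H – L can be applied to the wages exactly as listed for the two factories. The helper name `data_range` is an illustrative choice.

```python
factory_a = [400, 450, 520, 380, 482, 495, 575, 450]
factory_b = [450, 400, 450, 480, 450, 450, 400, 672]

def data_range(values):
    """R = H - L: highest value minus lowest value."""
    return max(values) - min(values)

range_a = data_range(factory_a)
range_b = data_range(factory_b)
print(range_a)  # 195
print(range_b)  # 272

# Both factories share the same mean wage, yet their ranges differ.
mean_a = sum(factory_a) / len(factory_a)
mean_b = sum(factory_b) / len(factory_b)
print(mean_a, mean_b)  # 469.0 469.0
```

Factory B's larger range comes almost entirely from the single wage of 672, which illustrates how one extreme value can dominate this measure.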
AVERAGE/MEAN DEVIATION
Find the average deviation of the following data:
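The average (mean) deviation is the mean of the absolute differences between each value and the mean: M.D. = ∑|x − x̄| / n. As a minimal sketch, the function below applies that definition to the Factory A and Factory B wages from the RANGE example; the function name is an illustrative choice.

```python
def mean_deviation(values):
    """Average absolute distance of the entries from their mean."""
    center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

factory_a = [400, 450, 520, 380, 482, 495, 575, 450]
factory_b = [450, 400, 450, 480, 450, 450, 400, 672]

md_a = mean_deviation(factory_a)
md_b = mean_deviation(factory_b)
print(md_a)  # 49.0
print(md_b)  # 53.5
```

Unlike the range, the mean deviation uses every value, so the two factories look far more alike on this measure than their ranges suggest.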
VARIANCE
Variance is not only useful; it can be computed with ease, and it can also be broken
into two or more component sums of squares that yield useful information.
The variance (σ²) of a data set is equal to 1/N times the sum of the squares of the values, minus the
square of their mean: σ² = ∑x² / N − μ². It is the square of the standard deviation.
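The sum-of-squares shortcut and the definition of variance (the mean of the squared deviations from the mean) give the same number. The sketch below checks this on the Factory A wages from the RANGE example, treating them as a whole population (an assumption), and compares both results with the standard library's `statistics.pvariance`.

```python
from statistics import pvariance

factory_a = [400, 450, 520, 380, 482, 495, 575, 450]
n = len(factory_a)
mu = sum(factory_a) / n

# Definition: mean of the squared deviations from the mean.
by_definition = sum((x - mu) ** 2 for x in factory_a) / n

# Shortcut: (1/N) * (sum of the squares) minus the square of the mean.
by_shortcut = sum(x ** 2 for x in factory_a) / n - mu ** 2

print(by_definition, by_shortcut, pvariance(factory_a))
# 3510.75 3510.75 3510.75
```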
STANDARD DEVIATION
To compute the standard deviation of ungrouped data, we use the formula:
σ = √( ∑(x − μ)² / N )
where μ is the mean and N is the number of values.
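A minimal sketch of the standard deviation for ungrouped data, assuming the population form (divide by N) and using the factory wages from the RANGE example. The sample form, which divides by n − 1 instead, is shown through `statistics.stdev` for comparison.

```python
from math import sqrt
from statistics import pstdev, stdev

factory_a = [400, 450, 520, 380, 482, 495, 575, 450]
factory_b = [450, 400, 450, 480, 450, 450, 400, 672]

def population_sd(values):
    """Square root of the population variance (divide by N)."""
    mu = sum(values) / len(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

sd_a = population_sd(factory_a)
sd_b = population_sd(factory_b)
print(round(sd_a, 2))  # 59.25
print(round(sd_b, 2))  # 80.85

# Sample standard deviation divides by n - 1, so it is slightly larger.
print(round(stdev(factory_a), 2))  # 63.34
```

Both factories have the same mean wage, but Factory B's wages are more spread out around it.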
QUARTILE DEVIATION
Quartile deviation is a statistic that measures the spread in the middle
of the data. Quartile deviation is also referred to as the semi-interquartile
range and is half of the difference between the third quartile and the first
quartile: Q.D. = (𝑄₃ − 𝑄₁) / 2.
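As a short illustration, the quartile deviation can be computed from the quartiles quoted in the English quiz example earlier (𝑄₁ = 5 and 𝑄₃ = 11). The function name is an illustrative choice.

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: half of (Q3 - Q1)."""
    return (q3 - q1) / 2

# Quartiles quoted in the English quiz example: Q1 = 5, Q3 = 11.
qd = quartile_deviation(5, 11)
print(qd)  # 3.0
```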
CORRELATION
➢ A correlation coefficient greater than zero indicates a positive relationship, while a value
less than zero signifies a negative relationship.
➢ A value of zero indicates no linear relationship between the two variables being compared.
➢ Calculating the correlation coefficient by hand is time-consuming, so data are often entered into
a calculator, computer, or statistics program to find the coefficient.
The images below illustrate what the relationships might look like at different degrees of
strength (for different values of r).
For a correlation coefficient of zero, the points have no direction, the shape is almost round,
and a line does not fit the points on the graph.
As the absolute value of the correlation coefficient increases, the observations group closer together in a linear
shape.
The line is difficult to detect when the relationship is weak (e.g., r = -0.3), but becomes
clearer as the relationship becomes stronger (e.g., r = -0.99).
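The coefficient r discussed here is usually the Pearson correlation coefficient. A minimal sketch of the computation is shown below, assuming that definition; the function name and the small data sets are hypothetical and were chosen only to show a positive, a negative, and a zero result.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data, chosen only to show the three cases.
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises
r_zero = pearson_r(x, [4, 1, 0, 1, 4])   # U-shaped, no linear trend

print(r_pos, r_neg, r_zero)  # 1.0 -1.0 0.0
```

The third data set follows a clear curve yet gives r = 0, which is why a zero coefficient is read as "no linear relationship" rather than "no relationship at all."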
QUIZ