Data Management

This document discusses statistical concepts related to data management, including:
- Gathering and organizing categorical and measurement-scale data, as well as variables
- Common measures of central tendency (mean, median, mode) and how they can be affected by outliers
- Measures of relative position such as percentiles, quartiles, and deciles
- Measures of variation or dispersion such as range, quartile deviation, interquartile range, mean deviation, variance, and standard deviation
- Examples of how to calculate and interpret these statistical measures
DATA MANAGEMENT

Gathering and Organizing Data
- categorical data
- measurement scales
- variables

Measures of Central Tendency
- mean
- median
- mode

Measures of Relative Position
- percentiles
- quartiles
- deciles

Measures of Variation
- range
- quartile deviation
- interquartile range
- mean deviation
- variance
- standard deviation
LEARNING OUTCOMES
1. Use a variety of statistical tools to process and manage
numerical data:
a. Gathering and Organizing Data
b. Measures of Central Tendency
c. Measures of Position
d. Measures of Variation/Dispersion
e. Correlation

2. Advocate the use of statistical data in making
important decisions.
GATHERING and ORGANIZING DATA
Categorical Data (nominal & ordinal scales)
Continuous Data (ratio & interval scales)

Qualitative data refers to information about qualities, or information that cannot be
measured. It’s usually descriptive and textual. Examples include someone’s eye colour
or the type of car they drive. In surveys, it’s often used to categorise ‘yes’ or ‘no’
answers.

Quantitative data is numerical. It’s used to describe information that can be counted or measured.
Some examples of quantitative data include distance, speed, height, length and weight.
It’s easy to remember the difference between qualitative and quantitative data, as one
refers to qualities, and the other refers to quantities.
Discrete data is a whole number that can’t be divided or broken into individual parts, fractions or decimals. Examples
of discrete data include the number of pets someone has – one can have two dogs but not two-and-a-half dogs. The
number of wins someone’s favourite team gets is also a form of discrete data because a team can’t have a half win – it’s
either a win, a loss, or a draw.
Continuous data describes values that can be broken down into different parts, units, fractions and decimals.
Continuous data points, such as height and weight, can be measured. Time can also be broken down – by half a second
or half an hour. Temperature is another example of continuous data.

Discrete versus continuous


There’s an easy way to remember the difference between the two types of quantitative data: data is considered discrete
if it can be counted and is continuous if it can be measured. Someone can count students, tickets purchased and books,
while one measures height, distance and temperature.
EXAMPLES
DISCRETE
- Number of planets around the Sun
- Number of stars in space (a count, so discrete, even though it is far too large to tally)
- Number of students in a class
- Number of students present
- Number of red marbles in a jar
- Number of heads when flipping three coins
- Students’ grade level
CONTINUOUS
- Height or weight of the students in a particular class
- Time it takes to get to school
- Distance traveled between classes
MEASUREMENT SCALES/LEVELS OF MEASUREMENT
Nominal Scale
A nominal scale is the 1st level of measurement scale in which the numbers serve as “tags”
or “labels” to classify or identify the objects. A nominal scale usually deals with the non-
numeric variables or the numbers that do not have any value.
Examples
M - Male
F - Female
Here the letters are used only as tags: the answer to a question about gender is recorded as either M or F.
Ordinal Scale
The ordinal scale is the 2nd level of measurement that reports the ordering and ranking of data without establishing
the degree of variation between them. Ordinal represents the “order.” Ordinal data is known as qualitative data or
categorical data. It can be grouped, named and also ranked.
Examples:
Ranking of school students – 1st, 2nd, 3rd, etc.
Ratings in restaurants
Evaluating the frequency of occurrences:
- Very often
- Often
- Not often
- Not at all
Assessing the degree of agreement:
- Totally agree
- Agree
- Neutral
- Disagree
- Totally disagree
Interval Scale
The interval scale is the 3rd level of measurement scale. It is defined as a
quantitative measurement scale in which the difference between two
values is meaningful. In other words, the values are measured on a scale with
equal intervals, but the zero point is arbitrary and does not mean that the quantity is absent.
Examples:
temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850).
Ratio Scale
The ratio scale is the 4th level of measurement scale, which is quantitative. Like the interval scale,
it allows researchers to compare differences or intervals. Its unique feature is
a true zero point (origin), so ratios between values are also meaningful.
Examples:
What is your weight in kg?
Less than 55 kg
55 – 75 kg
76 – 85 kg
86 – 95 kg
More than 95 kg
MEASURES OF CENTRAL TENDENCY
FOCUS POINTS
• Compute mean, median, and mode from raw data.
• Interpret what mean, median, and mode tell you.
• Explain how mean, median, and mode can be affected by extreme data
values.
• What is a trimmed mean? How do you compute it?
• Compute a weighted average.
MEASURES OF CENTRAL TENDENCY
The average price of an ounce of gold is $420. The Zippy car averages 39
miles per gallon on the highway. A survey showed the average shoe size for
women is size 8.

In each of the preceding statements, one number is used to describe the
entire sample or population. Such a number is called an average. There are
many ways to compute averages, but we will study only three of the major
ones.

The easiest average to compute is the mode.


MEASURES OF CENTRAL TENDENCY
The mode of a data set is the value that occurs most frequently.

The median is the central value of an ordered distribution.

An average that uses the exact value of each entry is the mean (sometimes
called the arithmetic mean). To compute the mean, we add the values of all the
entries and then divide by the number of entries.
MEASURES OF CENTRAL TENDENCY

MODE
Count the letters in each word of this sentence and give the mode. The
numbers of letters in the words of the sentence are
5 3 7 2 4 4 2 4 8 3 4 3 4

Scanning the data, we see that 4 is the mode because more words have 4
letters than any other number. For larger data sets, it is useful to order—or sort
—the data before scanning them for the mode.

Not every data set has a mode. For example, if Professor Fair gives equal
numbers of A’s, B’s, C’s, D’s, and F’s, then there is no modal grade.
MEASURES OF CENTRAL TENDENCY

MODE
UNIMODAL - having one mode

BIMODAL - having 2 modes

MULTIMODAL - having 3 or more modes
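
The mode rules above can be sketched in a few lines of Python. This is only an illustration: the helper name `modes` is an assumption made for this example, and it returns an empty list when every value is equally frequent, following the "no modal grade" case described for Professor Fair.

```python
from collections import Counter

# Letter counts for each word of the sentence on the MODE slide.
letter_counts = [5, 3, 7, 2, 4, 4, 2, 4, 8, 3, 4, 3, 4]

def modes(values):
    """Return all values tied for the highest frequency ([] if every value ties)."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes(letter_counts))       # [4]  -> unimodal
print(modes([1, 1, 2, 2, 3]))     # [1, 2]  -> bimodal
print(modes(list("ABCDF") * 3))   # []  -> equal numbers of each grade, no mode
```

A result with one entry corresponds to a unimodal data set, two entries to a bimodal one, and three or more to a multimodal one.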


MEASURES OF CENTRAL TENDENCY

MEDIAN
What do barbecue-flavored potato chips cost? According to Consumer
Reports, Volume 66, No. 5, the prices per ounce in cents of the rated chips
are
19 19 27 28 18 35

(a) To find the median, we first order the data, and then note that there are
an even number of entries. So the median is constructed using the two
middle values.
18 19 19 27 28 35
Median = (19 + 27) / 2 = 23 cents per ounce
MEASURES OF CENTRAL TENDENCY

MEDIAN
(b) According to Consumer Reports, the brand with the lowest overall
taste rating costs 35 cents per ounce. Eliminate that brand, and find the
median price per ounce for the remaining barbecue-flavored chips. Again
order the data. Note that there are an odd number of entries, so the median
is simply the middle value.
18 19 19 27 28
MEASURES OF CENTRAL TENDENCY

MEDIAN
(c) One ounce of potato chips is considered a small serving. Is it reasonable
to budget about $10.45 to serve the barbecue-flavored chips to 55 people?

Yes, since the median price of the chips is 19 cents per small serving.
This budget for chips assumes that there is plenty of other food!
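
Parts (a) to (c) of the potato chip example can be checked with Python's standard `statistics` module. This is a small sketch; the variable names are chosen for this illustration only.

```python
from statistics import median

prices = [19, 19, 27, 28, 18, 35]   # cents per ounce, from the slide

# (a) Even number of entries: the median is the mean of the two middle values.
median_all = median(prices)          # sorted: 18 19 19 27 28 35 -> (19 + 27) / 2

# (b) Drop the 35-cent brand; an odd count leaves a single middle value.
remaining = [p for p in prices if p != 35]
median_remaining = median(remaining)

# (c) Cost in dollars of 55 one-ounce servings at the median price.
budget = 55 * median_remaining / 100

print(median_all, median_remaining, budget)   # 23.0 19 10.45
```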
MEASURES OF CENTRAL TENDENCY

MEDIAN
Belleview College must make a report to the budget committee about the average
credit hour load a full-time student carries. (A 12-credit-hour load is the minimum
requirement for full-time status. For the same tuition, students may take up to 20 credit
hours.) A random sample of 40 students yielded the following information (in credit
hours):

17 12 14 17 13 16 18 20 13 12
12 17 16 15 14 12 12 13 17 14
15 12 15 16 12 18 20 19 12 15
18 14 16 17 15 19 12 13 12 15
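
The worked solution for this sample did not survive extraction, so the following is only a sketch of how the three averages could be computed for the 40 credit-hour values using Python's `statistics` module.

```python
from statistics import mean, median, mode

credit_hours = [
    17, 12, 14, 17, 13, 16, 18, 20, 13, 12,
    12, 17, 16, 15, 14, 12, 12, 13, 17, 14,
    15, 12, 15, 16, 12, 18, 20, 19, 12, 15,
    18, 14, 16, 17, 15, 19, 12, 13, 12, 15,
]

print(round(mean(credit_hours), 1))   # 15.0 (exact value 599 / 40 = 14.975)
print(median(credit_hours))           # 15.0
print(mode(credit_hours))             # 12
```

The mean and median agree at about 15 credit hours, while the mode sits at the 12-hour minimum for full-time status, so the three averages describe the sample differently.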
MEASURES OF CENTRAL TENDENCY

MEAN
To graduate, Linda needs at least a B in biology. She did not do very well on her
first three tests; however, she did well on the last four. Here are her scores:

58 67 60 84 93 98 100

Compute the mean and determine if Linda’s grade will be a B (80 to 89 average)
or a C (70 to 79 average).
MEASURES OF CENTRAL TENDENCY

MEAN
Mean = (58 + 67 + 60 + 84 + 93 + 98 + 100) / 7 = 560 / 7 = 80

Since the average is 80, Linda will get the needed B.


MEASURES OF CENTRAL TENDENCY

MEAN
When we compute the mean, we sum the given data. There is a convenient
notation to indicate the sum. Let x represent any value in the data set. Then the
notation
∑x (read “the sum of all given x values”)
means that we are to sum all the data values. In other words, we are to sum all
the entries in the distribution. The summation symbol ∑ means “sum the
following” and is capital sigma, the S of the Greek alphabet.
The mean of a sample distribution of x values is denoted by x̄
(read “x bar”). If your data comprise the entire population, we use the symbol μ
(lowercase Greek letter mu, pronounced “mew”) to represent the mean.
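MEASURES OF CENTRAL TENDENCY

MEAN
Sample mean: x̄ = ∑x / n, where n is the number of data values in the sample.
Population mean: μ = ∑x / N, where N is the number of data values in the population.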
MEASURES OF CENTRAL TENDENCY

MEAN
A resistant measure is one that is not influenced by extremely high or low data
values. The mean is not a resistant measure of center because we can make the
mean as large as we want by changing the size of only one data value. The
median, on the other hand, is more resistant. However, a disadvantage of the
median is that it is not sensitive to the specific size of a data value.

A measure of center that is more resistant than the mean but still sensitive to
specific data values is the trimmed mean. A trimmed mean is the mean of the
data values left after “trimming” a specified percentage of the smallest and
largest data values from the data set. Usually a 5% trimmed mean is used. This
implies that we trim the lowest 5% of the data as well as the highest 5% of the
data. A similar procedure is used for a 10% trimmed mean.
MEASURES OF CENTRAL TENDENCY

MEAN
Barron’s Profiles of American Colleges, 19th Edition, lists
average class size for introductory lecture courses at each of
the profiled institutions. A sample of 20 colleges and
universities in California showed class sizes for introductory
lecture courses to be
14 20 20 20 20
23 25 30 30 30
35 35 35 40 40
42 50 50 80 80
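
The trimming procedure described earlier can be sketched on this sample as follows. The function name `trimmed_mean` is an assumption for this illustration, and so is the choice to round the number of trimmed entries down; textbooks differ on that detail when the percentage does not give a whole number.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the entries from each end of the sorted data."""
    ordered = sorted(values)
    # Entries to trim from each end, rounded down (an assumed convention).
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

class_sizes = [14, 20, 20, 20, 20, 23, 25, 30, 30, 30,
               35, 35, 35, 40, 40, 42, 50, 50, 80, 80]

print(round(sum(class_sizes) / len(class_sizes), 2))   # 35.95 (ordinary mean)
print(round(trimmed_mean(class_sizes, 5), 2))          # 34.72 (14 and one 80 dropped)
```

With 20 values, 5% is one entry per end, so the smallest value (14) and one of the largest (80) are removed before averaging the remaining 18.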
MEASURES OF CENTRAL TENDENCY

WEIGHTED AVERAGE
Sometimes we wish to average numbers, but we want to
assign more importance, or weight, to some of the numbers.
For instance, suppose your professor tells you that your grade
will be based on a midterm and a final exam, each of which is
based on 100 possible points. However, the final exam will be
worth 60% of the grade and the midterm only 40%. How
could you determine an average score that would reflect these
different weights? The average you need is the weighted
average.
MEASURES OF CENTRAL TENDENCY

WEIGHTED AVERAGE

Suppose your midterm test score is 83 and your final exam


score is 95. Using weights of 40% for the midterm and 60%
for the final exam, compute the weighted average of your
scores. If the minimum average for an A is 90, will you earn
an A?
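
One way to work this example is to multiply each score by its weight, add the products, and divide by the sum of the weights. The sketch below does that; the function name `weighted_average` is chosen for this illustration.

```python
def weighted_average(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

scores = [83, 95]         # midterm, final exam
weights = [0.40, 0.60]    # 40% and 60% of the grade

average = weighted_average(scores, weights)
print(round(average, 1))  # 90.2
print(average >= 90)      # True
```

Since 0.40(83) + 0.60(95) = 33.2 + 57 = 90.2, the weighted average clears the minimum of 90 for an A.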
MEASURES OF CENTRAL TENDENCY

EXERCISES
1. How hot does it get in Death Valley? The following data are
taken from a study conducted by the National Park System,
of which Death Valley is a unit. The ground temperatures
(°F) were taken from May to November in the vicinity of
Furnace Creek.

146 152 168 174 180 178 179

180 178 178 168 165 152 144


MEASURES OF CENTRAL TENDENCY

EXERCISES
2. How large is a wolf pack? The following information is
from a random sample of winter wolf packs in regions of
Alaska, Minnesota, Michigan, Wisconsin, Canada, and
Finland (Source: The Wolf, by L. D. Mech, University of
Minnesota Press). Winter pack size:

13 10 7 5 7 7 2 4 3
2 3 15 4 4 2 8 7 8

Compute the mean, median, and mode for the size of winter
wolf packs
MEASURES OF CENTRAL TENDENCY

EXERCISES
3. How expensive is Maui? If you want a vacation rental condominium (up to four
people), visit the Brase/Brase statistics site at
http://math.college.hmco.com/students, find the link to Maui, and then search for
accommodations. The Maui News gave the following costs in dollars per day for a
random sample of condominiums located throughout the island of Maui.

89 50 68 60 375 55 500 71 40 350


60 50 250 45 45 125 235 65 60 130

Compute a 5% trimmed mean for the data


MEASURES OF CENTRAL TENDENCY

ANSWERS
1. Mean 167.3 °F;
Median 171 °F;
Mode 178 °F.

2. Mean 6.2;
Median 6;
Mode 7.

3. Trimmed mean $121.28; yes.
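
The numerical answers above can be reproduced with a short script. This is a sketch using Python's `statistics` module; for exercise 3 it assumes that 5% of 20 values means dropping exactly one value from each end.

```python
from statistics import mean, median, mode

death_valley = [146, 152, 168, 174, 180, 178, 179,
                180, 178, 178, 168, 165, 152, 144]
wolf_packs = [13, 10, 7, 5, 7, 7, 2, 4, 3,
              2, 3, 15, 4, 4, 2, 8, 7, 8]
maui = [89, 50, 68, 60, 375, 55, 500, 71, 40, 350,
        60, 50, 250, 45, 45, 125, 235, 65, 60, 130]

# Exercises 1 and 2: mean (one decimal place), median, mode.
for name, data in [("Death Valley", death_valley), ("Wolf packs", wolf_packs)]:
    print(name, round(mean(data), 1), median(data), mode(data))

# Exercise 3: drop the single lowest (40) and highest (500) cost, then average.
trimmed = sorted(maui)[1:-1]
print("Maui 5% trimmed mean", round(mean(trimmed), 2))
```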


MEASURES OF POSITION
FOCUS POINTS

Illustrate the following measures of position:


a. quartiles;
b. deciles and;
c. percentiles
MEASURES OF POSITION

QUARTILES
Quartiles are the score points that divide a distribution into four
equal parts. The three (3) quartiles are denoted by 𝑄₁, 𝑄₂
and 𝑄₃. 𝑄₁ is read as 1st quartile or the lower quartile, 𝑄₂ as 2nd
quartile or the median, and 𝑄₃ as 3rd quartile or the upper quartile.
The difference between the upper quartile (𝑄₃) and the
lower quartile (𝑄₁) is called the interquartile range (I.R. = 𝑄₃ − 𝑄₁).
Because 25% of the data fall below 𝑄₁ and 25% fall above 𝑄₃, 50% of
the distribution falls within the interquartile range.
DECILES
Deciles are the nine score points that divide a distribution into ten equal parts.
These deciles are denoted as 𝐷₁, 𝐷₂, 𝐷₃, …, 𝐷₉, such that 10% of the data fall
below 𝐷₁, 20% of the data fall below 𝐷₂, and so on. Also, 𝐷₁ is read as 1st
decile, 𝐷₂ as 2nd decile, and so on.
PERCENTILES
Percentiles are the ninety-nine score points that divide a
distribution into one hundred equal parts, so that each part
represents 1% of the data set. These values are denoted by 𝑃₁, 𝑃₂,
…, 𝑃₉₉, such that 1% of the data fall below 𝑃₁, 2% of the data
fall below 𝑃₂, and so on. Also, 𝑃₁ is read as “first percentile,”
𝑃₂ as “second percentile,” and so on.
EXAMPLE
Given a set of scores of 15 students in their English
Quiz:

15, 2, 8, 3, 11, 8, 8, 6, 9, 5, 7, 5, 12, 11 and 14.

Find the lower quartile (𝑄₁) or the 25th percentile (𝑃₂₅), the
upper quartile (𝑄₃) or the 75th percentile (𝑃₇₅), and the 5th
decile (𝐷₅) or the 50th percentile (𝑃₅₀) or the 2nd
quartile (𝑄₂) of the data.
SOLUTION
Scores are sorted in ascending order:
2, 3, 5, 5, 6, 7, 8, 8, 8, 9, 11, 11, 12, 14, 15

Notice that the least value in the data is 2 and the greatest value in
the data is 15.
The middle value in the data is 8, which is the 2nd quartile (𝑄₂) or 5th decile (𝐷₅) or the
50th percentile (𝑃₅₀).
The lower quartile is the value that is midway between the median and the least value in the
data set. So, the lower quartile (𝑄₁) or the 25th percentile (𝑃₂₅) is 5.
The upper quartile (𝑄₃) is the value that is midway between the middle value and the greatest
value in the data set. So, the upper quartile (𝑄₃) or the 75th percentile (𝑃₇₅) is 11.
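
The solution above treats each quartile as the median of one half of the ordered scores. A minimal Python sketch of that method follows; note that software packages use several slightly different quartile conventions, so other tools may give different values on other data sets.

```python
from statistics import median

scores = [15, 2, 8, 3, 11, 8, 8, 6, 9, 5, 7, 5, 12, 11, 14]
ordered = sorted(scores)
n = len(ordered)

q2 = median(ordered)                   # middle value of all 15 scores
lower_half = ordered[: n // 2]         # the 7 scores below the middle position
upper_half = ordered[(n + 1) // 2 :]   # the 7 scores above the middle position
q1 = median(lower_half)
q3 = median(upper_half)

print(q1, q2, q3)   # 5 8 11
```

Here `q2` is also the 5th decile and the 50th percentile, `q1` is the 25th percentile, and `q3` is the 75th percentile.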
EXERCISES
MEASURES OF VARIATION/DISPERSION
FOCUS POINTS

Illustrate the following measures of variation:


range
quartile deviation
interquartile range
mean deviation
variance
standard deviation
MEASURES OF VARIATION/DISPERSION
Measures other than the mean may provide additional information
about the same data. These are the measures of dispersion.
Measures of dispersion or variability refer to the spread of the values
about the mean. These are important quantities used by statisticians in
evaluation. Smaller dispersion of scores arising from the comparison often
indicates more consistency and more reliability.
The most commonly used measures of dispersion are the range, the
average deviation, the standard deviation, and the variance.
MEASURES OF VARIATION/DISPERSION

RANGE
The range is the simplest measure of variability. It is the
difference between the largest value and the smallest value.
R=H–L
where R = Range, H = Highest value, L = Lowest value

Test scores of 10, 8, 9, 7, 5, and 3 will give us a range of 7, from
10 – 3 = 7.
MEASURES OF VARIATION/DISPERSION

RANGE
The following are the daily wages, in pesos (Php), of 8 workers at each of
two garment factories, Factory A and Factory B. Find the range of wages.
Factory A: 400, 450, 520, 380, 482, 495, 575, 450
Factory B: 450, 400, 450, 480, 450, 450, 400, 672
Workers of both factories have mean wage = 469
Finding the range of wages: Range = Highest wage – Lowest wage
Range A = 575 – 380 = 195
Range B = 672 – 400 = 272
This example shows that the range is not a stable measure of variability, because its value can fluctuate
greatly even with a change in just a single value, either the highest or lowest.
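
The factory comparison can be checked with a few lines of Python. This is a sketch; the function name `value_range` is chosen for this illustration.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

factory_a = [400, 450, 520, 380, 482, 495, 575, 450]
factory_b = [450, 400, 450, 480, 450, 450, 400, 672]

for name, wages in [("A", factory_a), ("B", factory_b)]:
    print(name, sum(wages) / len(wages), value_range(wages))
# A 469.0 195
# B 469.0 272
```

Both factories share the same mean wage, yet Factory B's range is much larger because of the single wage of 672.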
MEASURES OF VARIATION/DISPERSION

AVERAGE/MEAN DEVIATION
Find the average deviation of the following data:

12, 17, 13, 18, 18, 15, 14, 17, 11
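
The worked solution for this example was lost in extraction. The sketch below assumes the usual definition of the average (mean) deviation: the mean of the absolute differences between each value and the mean of the data.

```python
data = [12, 17, 13, 18, 18, 15, 14, 17, 11]

mean = sum(data) / len(data)
# Average of the absolute deviations from the mean (assumed definition).
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

print(mean, round(mean_deviation, 2))   # 15.0 2.22
```

The absolute deviations are 3, 2, 2, 3, 3, 0, 1, 2, and 4, which sum to 20, so the average deviation is 20 / 9, or about 2.22.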


MEASURES OF VARIATION/DISPERSION

VARIANCE
Variance is not only useful, it can be computed with ease and it can also be broken
into two or more component sums of squares that yield useful information.
The variance (σ²) of a set of N data values is equal to 1/N times the sum of their squares, minus the
square of their mean: σ² = (∑x²)/N − μ². It is the square of the standard deviation.
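
As an illustration, the sketch below computes the variance in two ways: with the shortcut form stated above (mean of the squares minus the square of the mean) and as the average squared deviation from the mean. It reuses the data from the average deviation example and treats them as a whole population (dividing by N), which is an assumption of this sketch.

```python
from statistics import pvariance

data = [12, 17, 13, 18, 18, 15, 14, 17, 11]
n = len(data)
mu = sum(data) / n

shortcut = sum(x * x for x in data) / n - mu ** 2    # (1/N) * sum of squares - mean^2
definition = sum((x - mu) ** 2 for x in data) / n    # average squared deviation

# All three agree: 2081/9 - 225 = 56/9, about 6.2222.
print(round(shortcut, 4), round(definition, 4), round(pvariance(data), 4))
```

The shortcut form is convenient by hand, but on a computer it can lose precision when the mean is large relative to the spread, so the deviation form is usually the safer choice.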
MEASURES OF VARIATION/DISPERSION

STANDARD DEVIATION
To compute for the standard deviation of an ungrouped data, we use the formula:
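
The formula itself did not survive extraction. The sketch below assumes the population form, the square root of the variance with division by N, to stay consistent with the 1/N variance in this deck. The sample form, which divides by n − 1, is shown for comparison. The data are reused from the average deviation example.

```python
import math
from statistics import pstdev, stdev

data = [12, 17, 13, 18, 18, 15, 14, 17, 11]
n = len(data)
mu = sum(data) / n

# Population standard deviation: square root of the average squared deviation.
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

print(round(sigma, 3))          # 2.494
print(round(pstdev(data), 3))   # 2.494 (library population version)
print(round(stdev(data), 3))    # 2.646 (sample version, divides by n - 1)
```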
MEASURES OF VARIATION/DISPERSION

QUARTILE DEVIATION
Quartile deviation is a statistic that measures the deviation in the middle
of the data. Quartile deviation is also referred to as the semi interquartile
range and is half of the difference between the third quartile and the first
quartile value.

Q.D. = (Q3 - Q1)/2
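
As a sketch, the formula can be applied to the 15 English quiz scores from the measures of position example, where the quartiles were found as medians of the lower and upper halves of the ordered data.

```python
from statistics import median

scores = [15, 2, 8, 3, 11, 8, 8, 6, 9, 5, 7, 5, 12, 11, 14]
ordered = sorted(scores)
n = len(ordered)

q1 = median(ordered[: n // 2])          # lower quartile: 5
q3 = median(ordered[(n + 1) // 2 :])    # upper quartile: 11

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2

print(interquartile_range, quartile_deviation)   # 6 3.0
```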


CORRELATION
KEY POINTS
➢ Correlation coefficients are used to measure the strength of the linear relationship
between two variables.

➢ A correlation coefficient greater than zero indicates a positive relationship while a value
less than zero signifies a negative relationship.

➢ A value of zero indicates no linear relationship between the two variables being compared.

➢ A negative correlation, or inverse correlation, is a key concept in the creation of
diversified portfolios that can better withstand portfolio volatility.

➢ Calculating the correlation coefficient by hand is time-consuming, so data are often plugged into
a calculator, computer, or statistics program to find the coefficient.
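
The deck does not state which coefficient it means. The sketch below assumes the Pearson correlation coefficient r, the usual measure of linear relationship. The function name `pearson_r` and the study-hours data are hypothetical, made up for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations, and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours_studied = [1, 2, 3, 4, 5]        # hypothetical data
quiz_scores = [52, 60, 61, 70, 78]     # hypothetical data

print(round(pearson_r(hours_studied, quiz_scores), 2))   # 0.98, a strong positive relationship
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 2))         # -1.0, a perfect negative relationship
```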
CORRELATION
The images below illustrate what the relationships might look like at different degrees of
strength (for different values of r).
CORRELATION
For a correlation coefficient of zero, the points have no direction, the shape is almost round,
and no line fits the points on the graph.

As the magnitude of the correlation coefficient increases, the observations group closer together in a linear
shape.

The line is difficult to detect when the relationship is weak (e.g., r = -0.3), but becomes
clearer as the relationship becomes stronger (e.g., r = -0.99).