LESSON 4 Data MGT
DATA MANAGEMENT
Overview:
The lesson begins with students engaging in a review of various measures
of central tendency. Following the review, students are given cases where these
measures are calculated. Students are also asked to examine both strengths and
limitations of these measures. Some time will be devoted to having students
discuss questions with a partner before reporting to the class. Students will be
assessed on their ability to calculate these measures and on whether they
recognize how these measures respond to changes in data values.
Peace Concept:
Develop critical thinking and a positive attitude toward data management.
Learning Outcomes:
At the end of this lesson, the students can:
1. Compute the measures of central tendency and variability of various data sets.
2. Perform data analysis on quantitative data using descriptive and inferential
statistics.
3. Show how Descriptive Statistics is used to describe data and how Inferential
Statistics is used to reach conclusions from the analysis of the obtained data.
4. Determine the most appropriate statistical tool and procedure in treating
statistical data.
5. Identify the characteristics of the normal curve and compute probabilities
involving the normal distribution.
6. Differentiate between correlation and regression analysis and determine which one
is more fitting to answer various questions.
Materials Needed:
Duration: 3 hours
LEARNING CONTENT:
A. MEAN:
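For ungrouped data, the mean is found by adding all the values and dividing
by the total number of values:
$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
where ∑ means "the sum of" (in other words, add them up) and n is the number of values.
For example, the mean of the six values 75, 85, 83, 76, 89 and 80 is
$$\bar{x} = \frac{75 + 85 + 83 + 76 + 89 + 80}{6} = \frac{488}{6} = 81.33$$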
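The ungrouped-mean computation above can be checked with a short Python sketch. The function name `mean` is our own choice for illustration; Python's standard `statistics.mean` returns the same value.

```python
# Mean of ungrouped data: add all the values, then divide by how many there are.
scores = [75, 85, 83, 76, 89, 80]


def mean(values):
    """Return the arithmetic mean (sum of the values divided by their count)."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


x_bar = mean(scores)
print(sum(scores), len(scores), round(x_bar, 2))  # 488 6 81.33
```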
For a population, the Greek letter µ (mu) is used for the mean, and N denotes the
number of values in the population:
$$\mu = \frac{\sum x}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$$
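2. Weighted Mean:
When the values do not count equally, each value x is multiplied by its weight w,
and the sum of the products is divided by the sum of the weights:
$$\bar{x}_w = \frac{\sum wx}{\sum w}$$
Solution:
Therefore, the weighted average is 2.7.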
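As a sketch of the weighted mean, the Python snippet below multiplies each value by its weight and divides by the total weight. The grades and units are hypothetical illustration data; they are not the data behind the 2.7 result above, which is not reproduced in this lesson.

```python
# Weighted mean: sum of (weight * value) divided by the sum of the weights.
# The grades and units below are HYPOTHETICAL, chosen only for illustration.
grades = [2.0, 3.0, 2.5]
units = [3, 3, 2]  # weights, e.g. the number of units of each course


def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


print(weighted_mean(grades, units))  # 2.5
```

Here the products are 6.0, 9.0 and 5.0, so the weighted mean is 20 / 8 = 2.5.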
Grouped data is data that has been bundled together in categories or presented
in a frequency distribution. There are two methods for computing its mean: one
based on the class midpoints and the unit-deviation method.
Steps
1. Set up the class midpoint (mp) column.
2. Multiply the frequency of each class (f) by the
corresponding class midpoint (mp).
3. Add the results in step 2 to get ∑ f·mp.
4. Add the frequency column to get ∑ f.
5. Substitute the data in the formula $\bar{x} = \frac{\sum f \cdot mp}{\sum f}$ and solve.
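The five midpoint steps above can be sketched in Python as follows. The age table is hypothetical, since the lesson's own frequency table is not reproduced here; each class is written as (lower limit, upper limit, frequency).

```python
# Mean of grouped data using class midpoints (Steps 1 to 5 above).
# HYPOTHETICAL frequency distribution of ages: (lower limit, upper limit, frequency).
age_table = [(20, 24, 5), (25, 29, 8), (30, 34, 12), (35, 39, 10), (40, 44, 5)]


def grouped_mean_midpoint(table):
    """Return sum(f * mp) / sum(f) for a list of (lower, upper, f) classes."""
    sum_f = 0
    sum_f_mp = 0.0
    for lower, upper, f in table:
        mp = (lower + upper) / 2   # Step 1: class midpoint
        sum_f_mp += f * mp         # Steps 2 and 3: accumulate f * mp
        sum_f += f                 # Step 4: accumulate f
    if sum_f == 0:
        raise ValueError("total frequency must be positive")
    return sum_f_mp / sum_f        # Step 5: substitute in the formula


print(grouped_mean_midpoint(age_table))  # 32.25
```

For this table, ∑ f·mp = 1290 and ∑ f = 40, so the mean is 1290 / 40 = 32.25.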
Solution
Steps:
Example 4: Find the mean age using the unit-deviation method.
Solution:
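Step 4. Determine the class size (i), the number of values in one class interval,
or equivalently the difference between consecutive lower class limits. For example,
the class 60 – 64 contains the five values 60, 61, 62, 63 and 64, so i = 5.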
Step 5.
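The worked solution for Example 4 survives only in part. As a hedged sketch, assuming the standard coded formula x̄ = A + (∑fd / ∑f) · i, where A is the midpoint of a chosen origin class, d is each class's position relative to that class (…, −1, 0, 1, …) and i is the class size, the snippet below applies the method to a hypothetical age table with equal class sizes. It checks that every choice of origin class gives the same mean as the class-midpoint method.

```python
# Mean of grouped data by the unit-deviation (coded) method, ASSUMING the usual formula
#   x_bar = A + (sum(f * d) / sum(f)) * i
# where A is the midpoint of a chosen origin class, d is each class's position
# relative to that class (..., -2, -1, 0, 1, 2, ...) and i is the class size.
# HYPOTHETICAL table with equal class sizes: (lower limit, upper limit, frequency).
age_table = [(20, 24, 5), (25, 29, 8), (30, 34, 12), (35, 39, 10), (40, 44, 5)]


def grouped_mean_unit_deviation(table, origin):
    """Return the coded-method mean, taking table[origin] as the assumed-mean class."""
    if len(table) < 2:
        raise ValueError("need at least two classes to determine the class size")
    i = table[1][0] - table[0][0]          # class size: difference of lower limits
    lower, upper, _ = table[origin]
    A = (lower + upper) / 2                # assumed mean: midpoint of the origin class
    sum_f = sum(f for _, _, f in table)
    sum_fd = sum(f * (k - origin) for k, (_, _, f) in enumerate(table))
    return A + (sum_fd / sum_f) * i


# Any origin class gives the same mean as the midpoint method (32.25 here).
for origin in range(len(age_table)):
    print(origin, round(grouped_mean_unit_deviation(age_table, origin), 2))
```

Taking the class with the highest frequency (30 – 34) as the origin keeps the deviations d small, which is why that choice is convenient by hand.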
B. MEDIAN
The median (Md) is the number that falls in the middle position once the
data has been arranged from lowest to highest. It is not affected by
outliers. Outliers are extreme values that are substantially larger or smaller than
the other values. They may be bad data, for example values that were encoded
incorrectly, and therefore need to be verified and investigated.
Find the median of the scores: 12, 4, 8, 17, 13, 15, 6, 14, 10.
Solution: Arrange the scores in order: 4, 6, 8, 10, 12, 13, 14, 15, 17.
Since n = 9 is odd, the median is the middle, (n + 1)/2 = 5th, score: Md = 12.
Interpretation: Of the 9 scores, 4 scores are below 12 and the other 4 scores
are above 12.
The number of cloudy days for the top 10 cloudiest cities is shown.
Find the median.
209, 223, 211, 227, 213, 240, 240, 211, 229, 212
Solution:
Arrange the data in order:
209, 211, 211, 212, 213, 223, 227, 229, 240, 240
Since n = 10 is even, the median is the mean of the two middle (5th and 6th) values:
Md = (213 + 223)/2 = 218
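Both median examples can be reproduced with the Python sketch below, which sorts the data and then takes the middle value (n odd) or the mean of the two middle values (n even). The function name is our own; `statistics.median` in Python's standard library follows the same rule.

```python
def median(values):
    """Middle value of the ordered data (n odd) or mean of the two middle values (n even)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1) / 2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the (n/2)-th and (n/2 + 1)-th values


scores = [12, 4, 8, 17, 13, 15, 6, 14, 10]
cloudy_days = [209, 223, 211, 227, 213, 240, 240, 211, 229, 212]
print(median(scores))       # 12
print(median(cloudy_days))  # 218.0
```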
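To compute the median of grouped data, the following formula is used:
$$Md = LB + \left(\frac{\frac{n}{2} - cf_{<md}}{f_{md}}\right) i$$
where LB is the lower class boundary of the median class, n = ∑f is the total
frequency, i is the class size,
f_md is the frequency of the median class (the class that contains the median value), and
cf<md is the less-than cumulative frequency of the class just before the median class.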
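A minimal Python sketch of the grouped-median formula Md = LB + ((n/2 − cf<md) / f_md) · i follows. It locates the median class as the first class whose cumulative frequency reaches n/2. The age table is hypothetical, and the code assumes equal class sizes and integer class limits, so that the lower class boundary is the lower limit minus 0.5.

```python
# Median of grouped data:  Md = LB + ((n/2 - cf_before) / f_md) * i
# ASSUMPTIONS: classes are listed in increasing order with equal size and
# integer limits, so the lower class boundary is (lower limit - 0.5).
# HYPOTHETICAL table: (lower limit, upper limit, frequency).
age_table = [(20, 24, 5), (25, 29, 8), (30, 34, 12), (35, 39, 10), (40, 44, 5)]


def grouped_median(table):
    n = sum(f for _, _, f in table)
    if n == 0 or len(table) < 2:
        raise ValueError("need a non-empty table with at least two classes")
    half = n / 2
    i = table[1][0] - table[0][0]       # class size
    cf_before = 0                       # less-than cumulative frequency so far
    for lower, _, f_md in table:
        if cf_before + f_md >= half:    # first class whose cumulative frequency reaches n/2
            lb = lower - 0.5            # lower class boundary of the median class
            return lb + (half - cf_before) / f_md * i
        cf_before += f_md
    raise ValueError("frequencies are inconsistent")


print(round(grouped_median(age_table), 2))  # 32.42
```

Here n/2 = 20, the median class is 30 – 34 (cumulative frequency 25), cf<md = 13, f_md = 12 and LB = 29.5, so Md = 29.5 + (7/12)(5) ≈ 32.42.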
Solution
Step 4.
C. MODE
The mode is the value that occurs most often in the data set (if there is such a value).
A data set that has only one value that occurs with the greatest frequency is said
to be unimodal. If a data set has two values that occur with the same greatest
frequency, both values are considered modes, and the data set is said to be
bimodal. If the data set has more than two values that occur with the same
greatest frequency, each of these values is a mode, and the data set is said to
be multimodal. When no data value occurs more than once, the data set is said
to have no mode.
Step 1. Select the measure that appears most often in the set.
Step 3. If every measure appears the same number of times, then the
set of data has no mode.
Example 8: Find the mode of the ages of some college students:
18, 20, 19, 21, 25, 18, 19, 17, 20, 21, 20, 18, 20, 18
Solution: The numbers 18 and 20 each occur four times, more often than any other value.
Thus, 18 and 20 are the modes (bimodal).
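The mode rules in this section can be sketched in Python by counting how often each value occurs. This follows the definition paragraph (no mode when no value occurs more than once); the function name is our own. Python's `statistics.multimode` also returns the most frequent values in order of first appearance.

```python
from collections import Counter


def find_modes(values):
    """Return (list of modes, label) following the definitions in this section."""
    if not values:
        raise ValueError("empty data set")
    counts = Counter(values)            # value -> number of occurrences
    highest = max(counts.values())
    if highest == 1:                    # no value occurs more than once
        return [], "no mode"
    modes = [v for v, c in counts.items() if c == highest]
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label


ages = [18, 20, 19, 21, 25, 18, 19, 17, 20, 21, 20, 18, 20, 18]
print(find_modes(ages))  # ([18, 20], 'bimodal')
```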
Example 10: Using the data from Example 7, the mode of the ages of the 90
employees in a government agency is the midpoint of the class interval with the
highest frequency, 45 – 49, which is (45 + 49)/2 = 47 years.
Who’s representing?
Example 12: Sonya's Kitchen received an invitation for one person to a food
exposition. All seven members of the service crew are very eager to go. To be fair
to all, Sonya decided to choose the person whose age is the middle (median) age of
her seven crew members. She made a list as shown below:
Answer:
a. 18, 18, 18, 19, 20, 21, 47 b. 19 c. Yes
d. 3 are younger than this age and 3 are older than this age.
e. The cashier
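The crew ages in answer (a) show why the middle age represents the group better than the mean. In the Python sketch below, the single age 47 pulls the mean up to 23, an age no crew member has, while the median stays at 19 with three members on each side.

```python
from statistics import median

crew_ages = [18, 18, 18, 19, 20, 21, 47]

mean_age = sum(crew_ages) / len(crew_ages)            # 161 / 7
median_age = median(crew_ages)                        # middle (4th) value
younger = [a for a in crew_ages if a < median_age]
older = [a for a in crew_ages if a > median_age]

print(mean_age)                  # 23.0
print(median_age)                # 19
print(len(younger), len(older))  # 3 3
```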
Learning Activity:
There are 34 families living in your neighborhood. The monthly household incomes
of the families are shown below. Find the mean, median and mode of the data set.
Prelim Exam Scores of 50 College Students in GEC104
1. Find the mean, median, and mode of the prelim exam scores of 50 college
students in GEC104 (see data on item 1) and interpret the results.
A. RANGE
The range measures the distance between the largest and the smallest
values and, as such, gives an idea of the spread of the data. However, the range
does not use the concept of deviation. It is affected by outliers and does not
consider all the values in the data set. Thus, it is not a very useful measure of
variability.
The following are the daily wages of 8 factory workers in each of two garment
factories, Factory A and Factory B. Find the range of wages in pesos (Php).
Factory A: 400, 450, 520, 380, 482, 495, 575, 450
Factory B: 450, 400, 450, 480, 450, 450, 400, 672
Computing the mean wages of the workers, both factories have a mean wage of Php 469.
Finding the range of wages: Range (R) = Highest wage - Lowest wage
Range A : RA = 575 – 380 = 195
Range B : RB = 672 – 400 = 272
Comparing the two ranges, you will note that the wages of workers in Factory B
have a higher range than the wages of workers in Factory A. These ranges suggest
that the wages of workers in Factory B are more scattered than the wages of
workers in Factory A.
Look closely at the wages of workers in Factory B. You will see that, except for
672, the highest wage, the wages of the workers are more consistent than the
wages in A. Without the highest wage of 672, the range would be 480 – 400 = 80,
whereas if you exclude the highest wage of 575 in A, the range would be
520 – 380 = 140.
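The factory comparison can be reproduced with the Python sketch below. It computes each factory's mean wage, its range, and its range after dropping the single highest wage, showing how much one value changes the range while the means stay equal.

```python
factory_a = [400, 450, 520, 380, 482, 495, 575, 450]
factory_b = [450, 400, 450, 480, 450, 450, 400, 672]


def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


for name, wages in [("A", factory_a), ("B", factory_b)]:
    mean_wage = sum(wages) / len(wages)
    without_highest = sorted(wages)[:-1]   # drop one copy of the highest wage
    print(name, mean_wage, value_range(wages), value_range(without_highest))
# A 469.0 195 140
# B 469.0 272 80
```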
Can you now say that the wages of workers in Factory B are more scattered or
variable than the wages of workers in Factory A?
This example shows that the range is not a stable measure of variability because its
value can fluctuate greatly with a change in just a single value, either the
highest or the lowest.
The mean absolute deviation (M.A.D.) utilizes deviations of the data values from the
mean in its computation. It is the average of the absolute deviations from the mean,
computed using the formula:
$$\text{M.A.D.} = \frac{\sum |x - \bar{x}|}{n}$$
Where: M.A.D. = mean absolute deviation
Σ = symbol for summation
n = total number of scores
x = individual score
x̄ = mean of all scores
Example 1: Find the mean absolute deviation of the ages of the 11 students in
Math 31.
18, 18, 19, 20, 21, 23, 24, 24, 25, 26, 28
Step 1. Find the mean age:
$$\bar{x} = \frac{\sum x}{n} = \frac{18 + 18 + 19 + \cdots + 28}{11} = \frac{246}{11} \approx 22.4 \text{ years}$$
Step 2. Subtract the mean from each value.
x     x - x̄                  |x - x̄|
18    18 - 22.4 = -4.4       4.4
18    18 - 22.4 = -4.4       4.4
19    19 - 22.4 = -3.4       3.4
20    20 - 22.4 = -2.4       2.4
21    21 - 22.4 = -1.4       1.4
23    23 - 22.4 = 0.6        0.6
24    24 - 22.4 = 1.6        1.6
24    24 - 22.4 = 1.6        1.6
25    25 - 22.4 = 2.6        2.6
26    26 - 22.4 = 3.6        3.6
28    28 - 22.4 = 5.6        5.6
                             Σ|x - x̄| = 31.6
Step 3. Take the absolute values of all deviations from the mean to make all the
results of step 2 positive.
Step 4. Add the |x - x̄| values.
Step 5. Substitute the values in the formula and solve.
$$\text{M.A.D.} = \frac{\sum |x - \bar{x}|}{n} = \frac{31.6}{11} \approx 2.87$$
Interpretation: On average, the ages of the 11 students deviate from the mean age by 2.87 years.
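The mean absolute deviation steps can be checked with the Python sketch below. Dividing Σ|x − x̄| = 31.6 by 11 gives about 2.87 when the mean is rounded to 22.4, as in the hand solution; using the unrounded mean 246/11 gives about 2.88. The optional `center` parameter is our own addition so both versions can be shown.

```python
ages = [18, 18, 19, 20, 21, 23, 24, 24, 25, 26, 28]


def mean_absolute_deviation(values, center=None):
    """Average of |x - center|; center defaults to the exact mean of the values."""
    n = len(values)
    if n == 0:
        raise ValueError("empty data set")
    if center is None:
        center = sum(values) / n
    return sum(abs(x - center) for x in values) / n


print(round(mean_absolute_deviation(ages, center=22.4), 2))  # 2.87 (mean rounded to 22.4)
print(round(mean_absolute_deviation(ages), 2))               # 2.88 (unrounded mean 246/11)
```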
The variance and standard deviation are the most common and useful
measures of variability. These two measures provide information about how the
data vary about the mean. The variance is a measure of variation which
considers the position of each observation relative to the mean, while the
standard deviation is the square root of the variance.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Note: When the sum of squared deviations is divided by n, the resulting sample
variance is biased (it tends to underestimate the population variance); thus, for
the sample variance, the sum of squared deviations is divided by (n - 1).
Example: Compute the variance and the standard deviation of the
following measurements and interpret the results: 5, 7, 9, 10, 12, 13, 15, 17
Solution:
Step 1    Step 2            Step 3
x         x - x̄             (x - x̄)²
5         5 - 11 = -6       (-6)(-6) = 36
7         7 - 11 = -4       16
9         9 - 11 = -2       4
10        10 - 11 = -1      1
12        12 - 11 = 1       1
13        13 - 11 = 2       4
15        15 - 11 = 4       16
17        17 - 11 = 6       36
Σx = 88                     Σ(x - x̄)² = 114
x̄ = Σx / n = 88 / 8 = 11
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{114}{7} \approx 16.29 \qquad s = \sqrt{\frac{114}{7}} \approx 4.04$$
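The variance example can be verified with the Python sketch below. It follows the table step by step, then compares the result with Python's `statistics.variance` and `statistics.stdev` (which divide by n − 1) and `statistics.pvariance` (which divides by n, the biased version mentioned in the note).

```python
import math
from statistics import pvariance, stdev, variance

data = [5, 7, 9, 10, 12, 13, 15, 17]
n = len(data)
x_bar = sum(data) / n                        # Step 1: mean = 88 / 8
ss = sum((x - x_bar) ** 2 for x in data)     # Steps 2 and 3: sum of squared deviations
s2 = ss / (n - 1)                            # sample variance (divide by n - 1)
s = math.sqrt(s2)                            # sample standard deviation

print(x_bar, ss, round(s2, 2), round(s, 2))  # 11.0 114.0 16.29 4.04
print(ss / n)                                # 14.25, the biased version (divide by n)

# The standard library uses the same conventions.
assert math.isclose(s2, variance(data))
assert math.isclose(s, stdev(data))
assert math.isclose(ss / n, pvariance(data))
```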
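A normal distribution is completely determined by two parameters, μ and
δ, the mean and standard deviation, respectively. The equation for the normal
distribution is
$$f(x) = \frac{1}{\delta\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\delta}\right)^2}$$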
where f(x) is the height of the curve for a given value x, μ is the population mean,
δ is the population standard deviation, e = 2.71828… and π = 3.14159…. The normal
curve has most of its area (about 99.7%) within the range μ - 3δ to μ + 3δ.
Figure 4.2a shows the area under the normal curve. The total area under the curve
is 1, and the area under the curve between any two points can be interpreted as the
probability of occurrence of a value between those two points.
When a set of scores is normally distributed, 34.13% of the area under the
normal curve lies between the mean μ and a score equal to μ + 1δ
(that is, a score one standard deviation above the mean). The same
percentage of the area is found on the other side of the curve, below the
mean, since the curve is symmetrical.
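The 34.13% figure and the "most of the area within ±3δ" statement can be checked numerically from the equation for f(x). The Python sketch below evaluates the normal curve and integrates it with composite Simpson's rule, a numerical method we chose for illustration; the mean 85 and standard deviation 15 are arbitrary, since any values give the same areas. In the code, `sigma` stands for the lesson's δ.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height f(x) of the normal curve with mean mu and standard deviation sigma (δ)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def area_under_curve(a, b, mu, sigma, steps=1000):
    """Approximate the area under f(x) from a to b with composite Simpson's rule."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    h = (b - a) / steps
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * normal_pdf(a + k * h, mu, sigma)
    return total * h / 3


mu, sigma = 85, 15
print(round(area_under_curve(mu, mu + sigma, mu, sigma), 4))                  # 0.3413
print(round(area_under_curve(mu - sigma, mu, mu, sigma), 4))                  # 0.3413
print(round(area_under_curve(mu - 3 * sigma, mu + 3 * sigma, mu, sigma), 4))  # 0.9973
```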
The Standard Normal Distribution
To find the probability that X takes on values greater than or equal to
some particular value x, we first transform the x-value into the corresponding
z-score, and the following equality holds:
$$P(X \ge x) = P\left(Z \ge \frac{x - \mu}{\delta}\right) = P(Z \ge z)$$
Example 1. For each of the following, use the table of Areas Under the Normal
Curve. Given a standard normal distribution, find the area under the curve that
lies a) between z = 0 and z = 1.34
From the table, the area between z = 0 and z = 1.34 is 0.4099.
Thus, P(0 < z < 1.34) = 0.4099, or about a 41% probability.
= 0.4099 + 0.3051
= 0.7150
(Figure: normal curve with x = 85 and x = 97 marked, corresponding to z = 0 and z = 0.8.)
z = (x - μ) / δ = (97 - 85) / 15 = 0.8
This corresponds to the area from z = 0 to z = 0.8, which is 0.2881; thus, the
cumulative area is 0.5 + 0.2881 = 0.7881, which is equivalent to 78.81%.
Therefore, the percentile rank is 78.81%.
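Instead of reading the table of Areas Under the Normal Curve, these areas can be computed from the error function in Python's `math` module, using Φ(z) = ½[1 + erf(z/√2)] for the area to the left of z. The sketch below reproduces the table values used above to four decimal places; the function names `phi` and `z_score` are our own.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z (cumulative probability)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def z_score(x, mu, sigma):
    """Transform an x-value into the corresponding z-score."""
    return (x - mu) / sigma


# Example 1(a): area between z = 0 and z = 1.34
print(round(phi(1.34) - phi(0), 4))   # 0.4099

# Percentile rank of x = 97 when mu = 85 and sigma = 15
z = z_score(97, 85, 15)
print(round(z, 2), round(phi(z), 4))  # 0.8 0.7881

# Upper tail: P(X >= 97) = P(Z >= 0.8)
print(round(1 - phi(z), 4))           # 0.2119
```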
1. Given a standard normal distribution, find the area under the curve that lies
a. between z = -0.46 and z = 2.21
b. to the left of z = -0.6
c. to the right of z = 1.96