Data Management
• Organization of Data
When conducting statistical research, an investigation, or a study, the researcher must
gather data for the particular variable under investigation. To describe situations, make
conclusions, and draw inferences about events, the researcher must organize the data
gathered in some meaningful way. The easiest and most widely used way of organizing data is
to construct a frequency distribution. A frequency distribution is a grouping of the data
into non-overlapping classes (categories), showing the number of observations in each
class.
After organizing the data, the next move of the researcher is to present the data so that
they can be understood easily by those who will benefit from reading the study. The
most useful method of presenting data is by constructing graphs and charts. There are
a number of ways to plot graphs and charts, and each one has a specific purpose.
• This section discusses how to organize data by constructing a frequency
distribution and how to present data by constructing graphs and charts. Before
we start constructing a frequency distribution, we must define some
terms that are essential to a deeper understanding of the nature of the data
displayed in a frequency distribution.
Raw Data is the data collected in its original form.
Range is the difference between the highest value and the lowest value in a distribution.
Frequency Distribution is the organization of data in tabular form, using mutually exclusive classes
and showing the number of observations in each.
Class Limits (or Apparent Limits) are the highest and lowest values describing a class.
Class Boundaries (or Real Limits) are the upper and lower values of a class in a grouped frequency
distribution. They have one more decimal place than the class limits and end with the
digit 5.
Interval (or width) is the distance between the lower class boundary and the upper class boundary.
It is denoted by the symbol i.
Frequency (f) is the number of values in a specific class of a frequency distribution.
Percentage is obtained by multiplying the relative frequency by 100 %.
Cumulative Frequency (cf) is the sum of the frequencies accumulated up to the upper
boundary of a class in a frequency distribution.
Midpoint is the point halfway between the class limits of each class and is
representative of data within that class.
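The definitions of class boundaries, interval, and midpoint can be illustrated with a short Python sketch. The class limits 10 and 19 and the `unit` parameter (the precision of the recorded data) are hypothetical values chosen only for illustration; they do not come from the text.

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary, interval, midpoint) for one class.

    `unit` is the precision of the recorded data (1 for whole numbers).
    It is an assumption of this sketch, not a term defined in the text.
    """
    half = unit / 2
    lower_boundary = lower_limit - half          # one more decimal place, ends in 5
    upper_boundary = upper_limit + half
    interval = upper_boundary - lower_boundary   # the width i
    midpoint = (lower_limit + upper_limit) / 2   # halfway between the class limits
    return lower_boundary, upper_boundary, interval, midpoint


# Hypothetical class with limits 10 and 19
print(class_summary(10, 19))  # (9.5, 19.5, 10.0, 14.5)
```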
Example 1. Twenty applicants were given a performance evaluation appraisal. The data set is:
High      High      High      Low       Average
Average   Low       Average   Average   Average
Low       Average   Average   High      High
Low       Low       Average   High      High
Step 1: List the classes.
High
Average
Low
Step 2: Tally the data for each class.
Class     Tally
High      IIII - II
Average   IIII - III
Low       IIII
Step 3: Convert the tallied data into numerical frequencies.
Class     Tally        Frequency   Percentage
High      IIII - II    7
Average   IIII - III   8
Low       IIII         5
Step 4: Determine the percentage. The percentage is computed using the formula:
Percentage = (f / n) × 100 %
where f = frequency of the class and n = total number of values.
The cumulative frequency can be found by adding the frequency in each class to the
total frequencies of the classes preceding that class; for example, 7 + 8 = 15.
The relative frequency can be found by dividing each frequency by the total
frequency; for example, 7 / 20 = 0.35.
Class     Tally        Frequency   Cumulative   Relative    Percentage   Found by
                                   Frequency    Frequency
High      IIII - II    7           7            0.35        35           (7 / 20) × 100
Average   IIII - III   8           15           0.40        40           (8 / 20) × 100
Low       IIII         5           20           0.25        25           (5 / 20) × 100
Total                  20                                   100
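The four steps above can also be sketched in Python. This is only an illustration under stated assumptions: the function name `frequency_table` and the tuple layout of each row are choices made for this sketch, not part of the original procedure.

```python
from collections import Counter

# The 20 performance ratings from Example 1
ratings = [
    "High", "High", "High", "Low", "Average",
    "Average", "Low", "Average", "Average", "Average",
    "Low", "Average", "Average", "High", "High",
    "Low", "Low", "Average", "High", "High",
]


def frequency_table(data, classes):
    """Return rows of (class, f, cf, relative frequency, percentage)."""
    counts = Counter(data)          # tally each class
    n = len(data)                   # total number of values
    rows = []
    cumulative = 0
    for name in classes:
        f = counts[name]
        cumulative += f             # add the frequencies of the preceding classes
        rows.append((name, f, cumulative, f / n, f * 100 / n))
    return rows


for row in frequency_table(ratings, ["High", "Average", "Low"]):
    print(row)
```

The order of the `classes` argument controls the order in which the cumulative frequency is accumulated, so it should match the order of the rows in the table.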
A. Mean
The arithmetic mean, often simply called the mean, is the most frequently used measure of central
tendency. The mean is the only common measure in which all values play an equal role; that is, to
determine its value you need to consider all the values of the given data set. The mean is
appropriate for determining the central tendency of interval or ratio data.
The symbol x̄, called “x bar”, is used to represent the mean of a sample and the symbol μ, called
“mu”, is used to denote the mean of a population.
Properties of Mean
1. A set of data has only one mean.
2. Mean can be applied for interval and ratio data.
3. All values in the data set are included in computing the mean.
4. The mean is very useful in comparing two or more data sets.
5. Mean is affected by extremely small or large values in a data set.
6. Mean is most appropriate for symmetrical data.
Sample Mean: x̄ = ∑x / n          Population Mean: μ = ∑x / N
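For example, for the ages of the 9 middle-management employees of a certain company (53, 45, 59, 48, 54, 46, 51, 58, and 55):
μ = ∑x / N = 469 / 9 = 52.11
The population mean age of the middle-management employees is 52.11 years.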
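As a quick check of the formula, the mean can be computed in Python. The ages used here are the nine middle-management ages listed in the mode examples of this section; treating them as the data behind μ = 52.11 is an assumption of this sketch, although the result agrees.

```python
def mean(values):
    """Arithmetic mean: the sum of all values divided by how many there are."""
    return sum(values) / len(values)


ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
print(sum(ages))             # 469
print(round(mean(ages), 2))  # 52.11
```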
B. Median
The median is the midpoint of the data array. A data set that has been ordered, whether ascending or descending, is called a data
array. The median is an appropriate measure of central tendency for data that are ordinal or above, but it is most valuable for
ordinal data.
Properties of Median
1. The median is unique; there is only one median for a set of data.
2. The median is found by arranging the set of data from lowest to highest (or highest to lowest) and getting the value of the
middle observation.
3. The median is not affected by extremely small or large values.
4. The median can be applied to ordinal, interval, and ratio data.
5. The median is most appropriate for skewed data.
To determine the value of the median for ungrouped data, we need to consider two rules:
1. If n is odd, the median is the middle ranked value.
2. If n is even, the median is the average of the two middle ranked values.
Median (Rank Value) = (n + 1) / 2
For n = 9:
Median (Rank Value) = (n + 1) / 2 = (9 + 1) / 2 = 10 / 2 = 5
The median is the 5th value in the data array. Hence, the median age is 53 years.
Example no. 2: The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700, P670, P860, and P480. Find the
median daily rate of the employees.
Solution:
Step 1: Arrange the data in order.
P420, P480, P500, P550, P560, P670, P700, P860
Step 2: Select the middle rank value.
Median (Rank Value) = (n + 1) / 2 = (8 + 1) / 2 = 9 / 2 = 4.5
The median is at rank 4.5. Since the middle point falls between P550 and P560 (the 4th and 5th values), we can determine the median of the data set by getting the average of the two values.
Median = (550 + 560) / 2 = 1,110 / 2 = 555
Therefore, the median daily rate is P555.
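The two rules for the median (odd n and even n) can be sketched in Python as follows. The function name `median` and the variable names are choices made for this illustration; the ages are the nine middle-management ages listed in the mode examples of this section.

```python
def median(values):
    """Median of ungrouped data using the (n + 1) / 2 rank rule."""
    ordered = sorted(values)          # build the data array
    n = len(ordered)
    if n % 2 == 1:
        rank = (n + 1) // 2           # n is odd: a single middle rank
        return ordered[rank - 1]      # ranks start at 1, list indexes at 0
    lower = ordered[n // 2 - 1]       # n is even: the rank falls between
    upper = ordered[n // 2]           # these two middle values
    return (lower + upper) / 2


ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
rates = [550, 420, 560, 500, 700, 670, 860, 480]
print(median(ages))   # 53
print(median(rates))  # 555.0
```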
C. Mode
The mode is the value in a data set that appears most frequently. Like the median and unlike the mean,
the mode is not affected by extreme values in a data set. A data set may not contain any mode if none of the values is
"most typical". A data set that has only one value occurring with the greatest frequency is said to be unimodal. If the
data set has two values with the same greatest frequency, both values are considered modes and the data set is
bimodal. If a data set has more than two modes, then the data set is said to be multimodal. There are some cases
in which all values in a data set have the same frequency. When this occurs, the data set is said to have no mode.
Properties of mode
1. The mode is found by locating the most frequently occurring value.
2. The mode is the easiest average to compute.
3. There can be more than one mode or even no mode in any given data set.
4. Mode is not affected by the extreme small or large values.
5. Mode can be applied for nominal, ordinal, interval and ratio data.
Example 1: The following data represent the total unit sales for smartphones from a sample of 10 Communication Centers for the
month of August: 15, 17, 10, 12, 13, 10, 14, 10, 8, and 9. Find the mode.
Solution:
The ordered array for these data is 8, 9, 10, 10, 10, 12, 13, 14, 15, 17.
Because 10 appears 3 times, more often than any other value, the mode is 10.
Example 2: An operations manager in charge of a company's manufacturing keeps track of the number of LED
televisions manufactured in a day. The following data represent the number of LED televisions manufactured over the past three
weeks: 20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, and 20. Find the mode of the given data set.
Solution:
The ordered array for these data is 18, 19, 20, 20, 20, 20, 21, 22, 25, 25, 25, 25, 26, 27, 28, 29, 29, 30.
There are two modes 20 and 25, since each of these values occurs four times.
Example 3: Find the mode of the ages of 9 middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54,
46, 51, 58, and 55.
Solution:
The ordered array for these data is 45, 46, 48, 51, 53, 54, 55, 58, 59.
There is no mode, since every value in the data set occurs with the same frequency (once).
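The three examples above (one mode, two modes, and no mode) can be reproduced with a short Python sketch. The function name `modes` and the convention of returning an empty list for "no mode" are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all share the same frequency, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


sales = [15, 17, 10, 12, 13, 10, 14, 10, 8, 9]
televisions = [20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, 20]
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]

print(modes(sales))        # [10]
print(modes(televisions))  # [20, 25]
print(modes(ages))         # []
```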
D. Weighted Mean
The weighted mean is particularly useful when various classes or groups contribute differently to the total. The
weighted mean is found by multiplying each value by its corresponding weight, adding the products, and dividing by the sum of the weights.
x̄w = (X1W1 + X2W2 + ... + XnWn) / (W1 + W2 + ... + Wn)
where W = the corresponding weight
X = the value of any particular observation or measurement
Example 1: At the Mathematics Department of San Sebastian College there are 18 instructors, 12 assistant professors, 7
associate professors, and 3 professors. Their monthly salaries are P30,500, P33,700, P38,600, and P45,000, respectively. What is
the weighted mean salary?
Solution:
Let W1 = 18          W2 = 12          W3 = 7           W4 = 3
    X1 = 30,500      X2 = 33,700      X3 = 38,600      X4 = 45,000
x̄w = (X1W1 + X2W2 + X3W3 + X4W4) / (W1 + W2 + W3 + W4)
   = (30,500(18) + 33,700(12) + 38,600(7) + 45,000(3)) / (18 + 12 + 7 + 3)
   = 1,358,600 / 40
   = 33,965
Therefore, the weighted mean salary is P33,965.
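The weighted mean formula can be checked with a small Python sketch using the salary data from Example 1. The function name `weighted_mean` and the list names are choices made for this illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    total_weight = sum(weights)
    weighted_sum = sum(x * w for x, w in zip(values, weights))
    return weighted_sum / total_weight


salaries = [30500, 33700, 38600, 45000]   # X1 to X4
staff_counts = [18, 12, 7, 3]             # W1 to W4

print(weighted_mean(salaries, staff_counts))  # 33965.0
```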
Step 2: Subtract the mean from each value in the data set. Here x̄ = ∑x / n = 4,740 / 8 = 592.5.
Step 3: Square each deviation (x – x̄), then get the sum. For example, (-42.5)² = 1,806.25.
x        x – x̄       (x – x̄)²
550      -42.5       1,806.25
420      -172.5      29,756.25
560      -32.5       1,056.25
500      -92.5       8,556.25
700      107.5       11,556.25
670      77.5        6,006.25
860      267.5       71,556.25
480      -112.5      12,656.25
∑x = 4,740      ∑(x – x̄) = 0      ∑(x – x̄)² = 142,950
Step 4: Solve for the variance and the standard deviation. The standard deviation is obtained by taking the square root of
the variance.
s² = ∑(x – x̄)² / (n – 1) = 142,950 / (8 – 1) = 20,421.43
s = √[∑(x – x̄)² / (n – 1)] = √[142,950 / (8 – 1)] = √20,421.43 = 142.90
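The steps for the sample variance and standard deviation can be sketched in Python. The data used here are the eight daily rates from the median example (P550, P420, and so on); using them is an assumption of this sketch, based on their total of 4,740 matching ∑x above.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n                                  # the mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]  # (x - x_bar)^2
    return sum(squared_deviations) / (n - 1)


def sample_std(values):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


rates = [550, 420, 560, 500, 700, 670, 860, 480]
print(round(sample_variance(rates), 2))  # 20421.43
print(round(sample_std(rates), 2))       # 142.9
```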
Qk = k(N + 1) / 4
Q1 = 1(N + 1) / 4 = 1(9 + 1) / 4 = 10 / 4 = 2.5
Q2 = 2(N + 1) / 4 = 2(9 + 1) / 4 = 2(10) / 4 = 5
Q3 = 3(N + 1) / 4 = 3(9 + 1) / 4 = 3(10) / 4 = 7.5
Step 3: Identify the first, second, and third quartile values in the data set.
45, 46, 48, 51, 53, 54, 55, 58, 59
Q1 = (46 + 48) / 2 = 94 / 2 = 47
Q2 = 53 (the 5th value)
Q3 = (55 + 58) / 2 = 113 / 2 = 56.5
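The rank formula Qk = k(N + 1) / 4 can be sketched in Python as follows. When the rank is not a whole number, this sketch interpolates linearly between the two neighbouring values. That is an assumption: for a rank ending in .5 it gives the same result as averaging the two values, as done above, but the text does not say how other fractional ranks are handled.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the rank k(N + 1) / 4."""
    ordered = sorted(values)
    n = len(ordered)
    rank = k * (n + 1) / 4
    if rank < 1 or rank > n:
        raise ValueError("rank falls outside the data; too few values")
    position = int(rank)              # rank of the lower neighbouring value
    fraction = rank - position
    if fraction == 0:
        return ordered[position - 1]  # the rank is a whole number
    lower = ordered[position - 1]
    upper = ordered[position]
    # Assumed rule: linear interpolation between the two neighbours
    return lower + fraction * (upper - lower)


ages = [45, 46, 48, 51, 53, 54, 55, 58, 59]
print(quartile(ages, 1))  # 47.0
print(quartile(ages, 2))  # 53
print(quartile(ages, 3))  # 56.5
```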
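Qk = k(N + 1) / 4, where N = 14 and k = 3 (the 3rd quartile)
Q3 = 3(14 + 1) / 4 = 3(15) / 4 = 45 / 4 = 11.25 (rank value)
Step 2: Arrange the data in order.
2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9, 9
The 11th value in the ordered data is 7, so the rank value 11.25 falls between the 11th and 12th values (7 and 8).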
2. A college professor administered a unit exam to one of his classes and found that the majority of the items were
too easy. The scores are 45, 39, 40, 48, 35, 37, 36, 37, 40, 44, 41, 49, 29, 32, 36, 37, 41, 40, 36, 39, 30, 25, 43, and 50.
Calculate the mean.
3. A time-study analyst observed a packaging operation and collected the following times (in seconds) required for the
operation to fill packages of a fixed volume box: 11, 12, 15, 18, 13, 18, 16, 14, 12, and 17. Find the range, variance and
standard deviation.
4. In a certain shopping mall, restaurants charge P140, P195, P125, P150, P200, P165, P175, P190, P230, and P180 for
a regular dinner. Find the first quartile, second quartile, and third quartile.
Take Note: Answers should be handwritten on long bond paper.
Answers should be sent to my Gmail account at [email protected], including your name, course, and
section.
• Make sure that you send a clear picture of your answers. A blurred picture will not be accepted as your quiz.