Data Management

This document discusses data management and organization. It explains that researchers must organize collected data in a meaningful way, such as by constructing a frequency distribution. A frequency distribution groups data into categories and shows the number of observations in each category. The document also notes that after organizing data, researchers should present it in graphs and charts for easy understanding. Finally, it provides definitions and examples related to constructing frequency distributions, including raw data, range, class limits, frequency, percentage, and cumulative frequency.
• Organization of Data
When conducting statistical research, an investigation, or a study, the researcher must
gather data for the particular variable under investigation. To describe situations, make
conclusions, and draw inferences about events, the researcher must organize the data
gathered in some meaningful way. The easiest and most widely used way of organizing data is
to construct a frequency distribution. A frequency distribution is a grouping of the data
into categories showing the number of observations in each of the non-overlapping
classes.
After organizing the data, the researcher's next move is to present the data so
they can be understood easily by those who will benefit from reading the study. The
most useful method of presenting data is by constructing graphs and charts. There are
a number of ways to plot graphs and charts, and each one has a specific purpose.
• This section discusses how to organize data by constructing a frequency
distribution and how to present data by constructing graphs and charts. Before
we construct a frequency distribution, we must define some terms that are
essential to understanding the nature of the data displayed in a frequency
distribution.
Raw Data is the data collected in its original form.
Range is the difference between the highest value and the lowest value in a distribution.
Frequency Distribution is the organization of data in tabular form, using mutually exclusive classes
and showing the number of observations in each.
Class Limits (or Apparent Limits) are the highest and lowest values describing a class.
Class Boundaries (or Real Limits) are the upper and lower values of a class in a grouped frequency
distribution; they have one more decimal place than the class limits and end with the
digit 5.
Interval (or Width) is the distance between the lower class boundary and the upper class boundary;
it is denoted by the symbol i.
Frequency (f) is the number of values in a specific class of a frequency distribution.
Percentage is obtained by multiplying the relative frequency by 100%.
Cumulative Frequency (cf) is the sum of the frequencies accumulated up to the upper
boundary of a class in a frequency distribution.
Midpoint is the point halfway between the class limits of each class and is
representative of the data within that class.

Example 1. Twenty applicants were given a performance evaluation appraisal. The data set is:

High      High      High      Low       Average
Average   Low       Average   Average   Average
Low       Average   Average   High      High
Low       Low       Average   High      High

Construct a frequency distribution for the data.


Solution:
Step 1: Construct a table.

Class      Tally         Frequency   Percentage
High
Average
Low

Step 2: Tally the raw data.

Class      Tally         Frequency   Percentage
High       IIII - II
Average    IIII - III
Low        IIII

Step 3: Convert the tallied data into numerical frequencies. (Each IIII group stands for a bundle of five tally marks.)

Class      Tally         Frequency   Percentage
High       IIII - II     7
Average    IIII - III    8
Low        IIII          5
Step 4: Determine the percentage. The percentage is computed using the formula:

Percentage = (f / n) × 100%

where f = frequency of the class and n = total number of values.

The cumulative frequency is found by adding the frequency of each class to the total frequency of the
classes preceding it; for example, 7 + 8 = 15. The relative frequency is found by dividing each frequency
by the total frequency; for example, 7 / 20 = 0.35.

Class      Tally         Frequency   Cumulative   Relative    Percentage   Found by
                                     Frequency    Frequency
High       IIII - II     7           7            0.35        35           (7 / 20) × 100
Average    IIII - III    8           15           0.40        40           (8 / 20) × 100
Low        IIII          5           20           0.25        25           (5 / 20) × 100
Total                    20                                   100

In this sample, the Average rating is the most common: 8 of the 20 applicants (40%) received it.
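
The four steps above can also be checked with a short program. The following is a minimal Python sketch, not part of the original lesson; the names `ratings` and `frequency_table` are chosen here only for illustration. It counts each class, accumulates the cumulative frequency, and computes the relative frequency and percentage from f and n.

```python
from collections import Counter

# Raw data from Example 1 (20 performance ratings).
ratings = [
    "High", "High", "High", "Low", "Average",
    "Average", "Low", "Average", "Average", "Average",
    "Low", "Average", "Average", "High", "High",
    "Low", "Low", "Average", "High", "High",
]


def frequency_table(data, classes):
    """Return rows of (class, f, cumulative f, relative f, percentage)."""
    counts = Counter(data)      # tally the raw data
    n = len(data)               # total number of values
    rows = []
    cumulative = 0
    for label in classes:
        f = counts[label]
        cumulative += f         # add f to the total of the preceding classes
        rows.append((label, f, cumulative, f / n, f * 100 / n))
    return rows


table = frequency_table(ratings, ["High", "Average", "Low"])
for label, f, cf, rel, pct in table:
    print(f"{label:<8} {f:>3} {cf:>3} {rel:>6.2f} {pct:>6.1f}")
```

The class order is passed in explicitly so that the cumulative frequency follows the same High, Average, Low order used in the table.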


Measure of Central Tendency
Any data set can be characterized by measuring its central tendency. A measure of central tendency,
commonly referred to as an average, is a single value that represents a data set. Its purpose is to
locate the center of a data set. This chapter discusses three different measures of central tendency: the
mean, the median, and the mode. We will illustrate how to calculate each of these measures for
ungrouped and grouped data. Measures of central tendency for grouped sample data and grouped
population data are also included in the discussion.

A. Mean
The arithmetic mean, often simply called the mean, is the most frequently used measure of central
tendency. The mean is the only common measure in which all values play an equal role; that is, to
determine its value you need to consider all the values of the given data set. The mean is
appropriate for determining the central tendency of interval or ratio data.
The symbol x̄, called “x bar”, is used to represent the mean of a sample, and the symbol μ, called
“mu”, is used to denote the mean of a population.
Properties of Mean
1. A set of data has only one mean.
2. The mean can be applied to interval and ratio data.
3. All values in the data set are included in computing the mean.
4. The mean is very useful in comparing two or more data sets.
5. The mean is affected by extremely small or large values in a data set.
6. The mean is most appropriate for symmetrical data.

Mean = (Sum of all values) / (Number of values)

Sample Mean: x̄ = ∑x / n          Population Mean: μ = ∑x / N

Where: x̄ = sample mean (it is read “x bar”)
μ = population mean (it is read “mu”)
x = the value of any particular observation or measurement
∑x = the sum of all x’s
n = total number of values in the sample
N = total number of values in the population
Example 1: The daily salaries of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700, P670,
P860, and P480. Find the mean daily rate of the employees.
Solution:
x̄ = ∑x / n = (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8) / n
x̄ = (550 + 420 + 560 + 500 + 700 + 670 + 860 + 480) / 8 = 4,740 / 8
x̄ = 592.50
The sample mean daily salary of the employees is P592.50.
Example 2: Find the population mean of the ages of 9 middle-management employees of a certain company. The
ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
Solution:
μ = ∑x / N = (x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9) / N
μ = (53 + 45 + 59 + 48 + 54 + 46 + 51 + 58 + 55) / 9 = 469 / 9
μ = 52.11
The population mean age of the middle-management employees is 52.11 years.
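
As a quick check of the two mean examples, here is a minimal Python sketch (the variable names are illustrative only and do not come from the lesson). It applies the formula directly, sum of all values divided by the number of values, and compares the result with `statistics.mean` from the Python standard library.

```python
from statistics import mean

salaries = [550, 420, 560, 500, 700, 670, 860, 480]   # sample, n = 8
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]           # population, N = 9

# Mean = sum of all values / number of values
sample_mean = sum(salaries) / len(salaries)
population_mean = sum(ages) / len(ages)

print(sample_mean)                 # 592.5
print(round(population_mean, 2))   # 52.11
print(mean(salaries))              # same value from the standard library
```

The computation is identical for a sample and a population; only the symbol (x̄ or μ) and the interpretation differ.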
B. Median
The median is the midpoint of the data array. When a data set is ordered, whether ascending or descending, it is called a data
array. The median is an appropriate measure of central tendency for data that are ordinal or above, but it is most valuable for
ordinal data.
Properties of Median
1. The median is unique; there is only one median for a set of data.
2. The median is found by arranging the set of data from lowest to highest (or highest to lowest) and getting the value of the
middle observation.
3. The median is not affected by extremely small or large values.
4. The median can be applied to ordinal, interval, and ratio data.
5. The median is most appropriate for skewed data.
To determine the value of the median for ungrouped data, we need to consider two rules:
1. If n is odd, the median is the middle-ranked value.
2. If n is even, the median is the average of the two middle-ranked values.

Median (Ranked Value) = (n + 1) / 2

Note that n is the population or sample size.


Example 1: Find the median of the ages of 9 middle-management employees of a certain company.
The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
Solution:
Step 1: Arrange the data in order.
45, 46, 48, 51, 53, 54, 55, 58, 59
Step 2: Select the middle rank value.

Median (Rank Value) = (n + 1) / 2 = (9 + 1) / 2 = 10 / 2 = 5

Step 3: Identify the median in the data set.

45, 46, 48, 51, [53], 54, 55, 58, 59     (the 5th value is 53)

Hence, the median age is 53 years.
Example 2: The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700, P670, P860, and P480. Find the
median daily rate of the employees.
Solution:
Step 1: Arrange the data in order.
P420, P480, P500, P550, P560, P670, P700, P860
Step 2: Select the middle rank value.

Median (Rank Value) = (n + 1) / 2 = (8 + 1) / 2 = 9 / 2 = 4.5

Step 3: Identify the median in the data set.

P420, P480, P500, [P550, P560], P670, P700, P860     (the 4.5th position lies between the 4th and 5th values)

Since the middle point falls between P550 and P560, we determine the median of the data set by taking the average of the two values.

Median = (550 + 560) / 2 = 1,110 / 2 = 555

Therefore, the median daily rate is P555.
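
The rank rule used in both median examples can be sketched in Python as follows. This is an illustrative sketch only; the function name `median_by_rank` is made up here. It sorts the data, computes the rank (n + 1) / 2, and either picks the middle value (odd n) or averages the two middle values (even n).

```python
def median_by_rank(values):
    """Median of ungrouped data using the rank (n + 1) / 2."""
    ordered = sorted(values)            # Step 1: arrange the data in order
    n = len(ordered)
    rank = (n + 1) / 2                  # Step 2: middle rank value (1-based)
    lower = ordered[int(rank) - 1]      # value at the whole-number part of the rank
    if rank.is_integer():               # odd n: the rank points at one value
        return lower
    upper = ordered[int(rank)]          # even n: the rank falls between two values
    return (lower + upper) / 2          # Step 3: average the two middle values


ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
rates = [550, 420, 560, 500, 700, 670, 860, 480]

print(median_by_rank(ages))    # 53
print(median_by_rank(rates))   # 555.0
```

The sketch assumes a non-empty list; an empty list has no median and would raise an error.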
C. Mode
The mode is the value in a data set that appears most frequently. Like the median and unlike the mean,
the mode is not affected by extreme values in a data set. A data set may not contain any mode if none of the values is
“most typical”. A data set in which only one value occurs with the greatest frequency is said to be unimodal. If the
data set has two values with the same greatest frequency, both values are considered modes and the data set is
bimodal. If a data set has more than two modes, it is said to be multimodal. In some cases
all the values in a data set occur with the same frequency. When this happens, the data set is said to have no mode.
Properties of Mode
1. The mode is found by locating the most frequently occurring value.
2. The mode is the easiest average to compute.
3. There can be more than one mode, or even no mode, in any given data set.
4. The mode is not affected by extremely small or large values.
5. The mode can be applied to nominal, ordinal, interval, and ratio data.
Example 1: The following data represent the total unit sales for smartphones from a sample of 10 Communication Centers for the
month of August: 15, 17, 10, 12, 13, 10, 14, 10, 8, and 9. Find the mode.
Solution:
The ordered array for these data is 8, 9, 10, 10, 10, 12, 13, 14, 15, 17.
Because 10 appears three times, more often than any other value, the mode is 10.

Example 2: An operations manager in charge of a company’s manufacturing keeps track of the number of LED
televisions manufactured each day. The following data represent the number of LED televisions manufactured over the past three
weeks: 20, 18, 19, 25, 20, 21, 20, 25, 30, 29, 28, 29, 25, 25, 27, 26, 22, and 20. Find the mode of the given data set.
Solution:
The ordered array for these data is 18, 19, 20, 20, 20, 20, 21, 22, 25, 25, 25, 25, 26, 27, 28, 29, 29, 30.
There are two modes, 20 and 25, since each of these values occurs four times.

Example 3: Find the mode of the ages of 9 middle-management employees of a certain company. The ages are 53, 45, 59, 48, 54,
46, 51, 58, and 55.
Solution:
The ordered array for these data is 45, 46, 48, 51, 53, 54, 55, 58, 59.
There is no mode, since every value in the data set occurs with the same frequency.
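
The three mode examples can be reproduced with a short Python sketch. The function name `modes` is hypothetical, and the rule that a data set whose values all share the same frequency has no mode is coded explicitly to follow the convention stated in this section (other references and libraries may report every value as a mode in that case).

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)            # frequency of each distinct value
    highest = max(counts.values())
    # Convention from this section: if every value occurs equally often,
    # the data set has no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


sales = [15, 17, 10, 12, 13, 10, 14, 10, 8, 9]
televisions = [20, 18, 19, 25, 20, 21, 20, 25, 30,
               29, 28, 29, 25, 25, 27, 26, 22, 20]
ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]

print(modes(sales))         # [10]      unimodal
print(modes(televisions))   # [20, 25]  bimodal
print(modes(ages))          # []        no mode
```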
D. Weighted Mean
The weighted mean is particularly useful when various classes or groups contribute differently to the total. The
weighted mean is found by multiplying each value by its corresponding weight, adding the products, and dividing by the sum of the weights.

x̄w = (X1W1 + X2W2 + X3W3 + … + XnWn) / (W1 + W2 + W3 + … + Wn)

Where: x̄w = weighted mean
Wi = the weight corresponding to the value Xi
Xi = the value of any particular observation or measurement

Example 1: At the Mathematics Department of San Sebastian College there are 18 instructors, 12 assistant professors, 7
associate professors, and 3 professors. Their monthly salaries are P30,500, P33,700, P38,600, and P45,000, respectively. What is
the weighted mean salary?
Solution:
Let W1 = 18        W2 = 12        W3 = 7         W4 = 3
    X1 = 30,500    X2 = 33,700    X3 = 38,600    X4 = 45,000

x̄w = (X1W1 + X2W2 + X3W3 + X4W4) / (W1 + W2 + W3 + W4)

x̄w = [30,500(18) + 33,700(12) + 38,600(7) + 45,000(3)] / (18 + 12 + 7 + 3) = 1,358,600 / 40 = 33,965

The weighted mean salary is P33,965.
Example 2: Riana’s first-quarter grades are shown in the table below. Use the weighted mean formula to find Riana’s GPA for the first
quarter.

Subjects   English   Mathematics   Filipino   Science   P.E.   Religion
Grades     90        87            88         93        95     96
Units      3         3             3          3         2      1

Solution:
Let W1 = 3     W2 = 3     W3 = 3     W4 = 3     W5 = 2     W6 = 1
    X1 = 90    X2 = 87    X3 = 88    X4 = 93    X5 = 95    X6 = 96

x̄w = (X1W1 + X2W2 + X3W3 + X4W4 + X5W5 + X6W6) / (W1 + W2 + W3 + W4 + W5 + W6)

x̄w = [90(3) + 87(3) + 88(3) + 93(3) + 95(2) + 96(1)] / (3 + 3 + 3 + 3 + 2 + 1) = 1,360 / 15 = 90.67

The weighted GPA for the first quarter is 90.67.
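
The weighted mean formula translates directly into code. Below is a minimal Python sketch, assuming the values and weights are given as two lists of equal length; the function name `weighted_mean` is illustrative and not part of the lesson.

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(x * w for x, w in zip(values, weights))
    return total / sum(weights)


# Example 1: monthly salaries weighted by the number of faculty in each rank.
salary = weighted_mean([30_500, 33_700, 38_600, 45_000], [18, 12, 7, 3])

# Example 2: grades weighted by the number of units per subject.
gpa = weighted_mean([90, 87, 88, 93, 95, 96], [3, 3, 3, 3, 2, 1])

print(salary)          # 33965.0
print(round(gpa, 2))   # 90.67
```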
Measure of Dispersion
Another important characteristic of a data set is how it is distributed, or how far each element is from some measure of central tendency
(average). There are several ways to measure the variability of the data. The most common and most important is the standard
deviation, which provides an average distance of each element from the mean, but several others are also important and are therefore discussed
here. The standard deviation is a statistical measure that gives a good indication of volatility: it measures how widely values are dispersed from the
average. Dispersion is the difference between the actual value and the average value.
A. Range
Probably the simplest and easiest measure of dispersion to determine is the range. The range is the difference between the highest value
and the lowest value in the data set. The range has two advantages: (i) it is easy to compute and (ii) it is easy to understand. On the
other hand, it also has two disadvantages: it can be distorted by a single extreme value (or outlier), and only two values are used in the calculation.
Example 1. The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700, P670, P860, and P480. Find the range.
Solution:
Step 1: Determine the highest value and lowest value in the data set.
Highest Value (HV) = P860          Lowest Value (LV) = P420
Step 2: Solve for the range.
Range = Highest Value (HV) - Lowest Value (LV)
Range = P860 - P420 = P440
The range of the daily rates is P440.
B. Variance and Standard Deviation
One of the most widely used measures of dispersion is the standard deviation. The more spread apart
the data, the higher the deviation. The standard deviation is calculated as the square root of the variance. In
finance, the standard deviation is applied to the annual rate of return of an investment to measure the
investment’s volatility. The standard deviation is also known as historical volatility and is used by investors as
a gauge for the amount of expected volatility.
The variance is a measure of the dispersion of a set of data points around their mean value. It is the
average of the squared deviations from the mean. Volatility is a measure of
risk, so this statistic can help determine the risk an investor might take on when purchasing a specific
security.
Sample Variance and Sample Standard Deviation for Ungrouped Data

s² = ∑(x - x̄)² / (n - 1)          s = √[ ∑(x - x̄)² / (n - 1) ]

where: s² = sample variance
s = sample standard deviation
x̄ = sample mean
n = sample size
x = the value of any particular observation or measurement
Example 2: The daily rates of a sample of eight employees at GMS Inc. are P550, P420, P560, P500, P700, P670, P860, and P480. Find the
variance and standard deviation.
Solution:
Step 1: Compute the mean of the data set.

x̄ = ∑x / n = (550 + 420 + 560 + 500 + 700 + 670 + 860 + 480) / 8 = 4,740 / 8 = 592.50

Step 2: Subtract the mean from each value in the data set.
Step 3: Square each deviation (x - x̄), then get the sum. For example, (-42.5)² = 1,806.25.

x       x - x̄      (x - x̄)²
550     -42.5      1,806.25
420     -172.5     29,756.25
560     -32.5      1,056.25
500     -92.5      8,556.25
700     107.5      11,556.25
670     77.5       6,006.25
860     267.5      71,556.25
480     -112.5     12,656.25

∑x = 4,740     ∑(x - x̄) = 0     ∑(x - x̄)² = 142,950

Step 4: Solve for the variance and the standard deviation. The standard deviation is obtained by simply extracting the square root of
the variance.

s² = ∑(x - x̄)² / (n - 1) = 142,950 / (8 - 1) = 20,421.43

s = √[ ∑(x - x̄)² / (n - 1) ] = √[ 142,950 / (8 - 1) ] = √20,421.43 = 142.90

Hence, the variance is 20,421.43 (in squared pesos) and the standard deviation is P142.90.
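
The range, sample variance, and sample standard deviation of the daily rates can be verified with the following Python sketch. It is only an illustration of the formulas in this section; the function name `sample_variance` is made up here, while `statistics.variance` and `statistics.stdev` are the standard-library functions for the sample (n - 1) versions and serve as a cross-check.

```python
from math import sqrt
from statistics import stdev, variance

rates = [550, 420, 560, 500, 700, 670, 860, 480]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n                                   # Step 1: mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]   # Steps 2 and 3
    return sum(squared_deviations) / (n - 1)                  # Step 4


data_range = max(rates) - min(rates)   # highest value minus lowest value
s2 = sample_variance(rates)
s = sqrt(s2)                           # standard deviation = square root of variance

print(data_range)     # 440
print(round(s2, 2))   # 20421.43
print(round(s, 2))    # 142.9

# Cross-check against the standard library (also uses n - 1).
print(round(variance(rates), 2), round(stdev(rates), 2))
```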


Measure of Relative Position
When presenting or analysing a data set, it is sometimes helpful to group subjects into several
equal groups. For example, to create four equal groups we need the values that split the data such
that 25% of the observations are in each group. The cut-off points are called quartiles, and there are
three (3) of them (the middle one also being called the median). The general term for such cut-off
points is quantiles; other values likely to be encountered are deciles, which split the data into 10 parts,
and percentiles, which split the data into 100 parts (also called centiles). Values such as quartiles
can also be expressed as percentiles; for example, the lowest quartile is also the 25th percentile, and
the median is the 50th percentile or the 5th decile.
A. Quartiles

Qk = k(N + 1) / 4

where: Qk = position (rank) of the k-th quartile in the ordered data
k = quartile number (1, 2, or 3)
N = number of values in the data set
Example 1: Find the first, second, and third quartiles of the ages of 9 middle-management employees of a
certain company. The ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
Solution:
Step 1: Arrange the data in order.
45, 46, 48, 51, 53, 54, 55, 58, 59
Step 2: Find the positions of the first, second, and third quartiles using the formula.

Q1 position = 1(N + 1) / 4 = 1(9 + 1) / 4 = 10 / 4 = 2.5
Q2 position = 2(N + 1) / 4 = 2(9 + 1) / 4 = 2(10) / 4 = 5
Q3 position = 3(N + 1) / 4 = 3(9 + 1) / 4 = 3(10) / 4 = 7.5

Step 3: Identify the first, second, and third quartile values in the data set.
45, 46, 48, 51, 53, 54, 55, 58, 59
The 2.5th position lies between 46 and 48, the 5th value is 53, and the 7.5th position lies between 55 and 58.

Since the 2.5th position falls between 46 and 48 and the 7.5th position falls between 55 and 58, we determine the first and third
quartiles of the data set by taking the average of the two values in each case.

Q1 = (46 + 48) / 2 = 94 / 2 = 47          Q3 = (55 + 58) / 2 = 113 / 2 = 56.5

Therefore, Q1 = 47, Q2 = 53, and Q3 = 56.5.

Example 2: Find the 3rd quartile of the data 9, 3, 5, 4, 8, 6, 7, 9, 5, 5, 6, 3, 2, 7.
Solution:
Step 1: Solve for the position Qk, where N = 14 and k = 3 (the 3rd quartile).

Qk = k(N + 1) / 4
Q3 position = 3(14 + 1) / 4 = 3(15) / 4 = 45 / 4 = 11.25 (rank value)

Step 2: Arrange the data in order.
2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 9, 9
The 11th value is 7 and the 12th value is 8.

Step 3: Interpolate between the 11th and 12th values to solve for Q3.

Q3 = 11th value + 0.25 (12th value - 11th value)
Q3 = 7 + 0.25 (8 - 7)
Q3 = 7 + 0.25 (1)
Q3 = 7.25
Therefore, Q3 = 7.25.
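
Both quartile examples follow the same procedure: compute the position k(N + 1) / 4, then interpolate between the two neighboring ordered values when the position is not a whole number. The Python sketch below illustrates that procedure; the function name `quartile` and the clamping of positions that fall outside the data are assumptions added here for very small data sets. The last lines compare the result with `statistics.quantiles` using its "exclusive" method, which is based on the same (N + 1) positions.

```python
from statistics import quantiles


def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the position k(N + 1) / 4."""
    ordered = sorted(values)            # arrange the data in order
    n = len(ordered)
    position = k * (n + 1) / 4          # rank value, counted from 1
    whole = int(position)               # e.g. 11 for position 11.25
    fraction = position - whole         # e.g. 0.25 for position 11.25
    # Assumption: clamp positions that fall outside the data.
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    lower = ordered[whole - 1]          # value at the whole-number rank
    upper = ordered[whole]              # next value in the ordered data
    return lower + fraction * (upper - lower)


ages = [53, 45, 59, 48, 54, 46, 51, 58, 55]
scores = [9, 3, 5, 4, 8, 6, 7, 9, 5, 5, 6, 3, 2, 7]

print([quartile(ages, k) for k in (1, 2, 3)])   # [47.0, 53.0, 56.5]
print(quartile(scores, 3))                      # 7.25

# Cross-check with the standard library.
print(quantiles(ages, n=4, method="exclusive"))
```

Averaging the two neighboring values, as in Example 1, is the special case of this interpolation in which the fractional part of the position is 0.5.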
Quiz no. 7
1. A sample of forty-two (42) college students was considered for a study, and the students were categorized according to year level. The data set is:
Senior Freshman Freshman Sophomore Freshman Junior Junior

Junior Sophomore Sophomore Freshman Sophomore Freshman Junior

Sophomore Freshman Sophomore Senior Senior Senior Sophomore

Freshman Freshman Senior Sophomore Freshman Sophomore Junior

Sophomore Junior Freshman Freshman Sophomore Junior Junior

Freshman Freshman Sophomore Senior Junior Freshman Freshman


Construct a frequency distribution for the data.
Solution:

Class      Tally      Frequency      Cumulative Frequency      Relative Frequency      Percentage



Total                                                                                  ----
2. A college professor administered a unit exam to one of his classes and found that the majority of the items were
too easy. The scores are 45, 39, 40, 48, 35, 37, 36, 37, 40, 44, 41, 49, 29, 32, 36, 37, 41, 40, 36, 39, 30, 25, 43, and 50.
Calculate the mean.
3. A time-study analyst observed a packaging operation and collected the following times (in seconds) required for the
operation to fill packages of a fixed volume box: 11, 12, 15, 18, 13, 18, 16, 14, 12, and 17. Find the range, variance and
standard deviation.
4. In a certain shopping mall, restaurants charge P140, P195, P125, P150, P200, P165, P175, P190, P230, and P180 for
a regular dinner. Find the first quartile, second quartile, and third quartile.

Take note: Answers should be handwritten on long bond paper.
Answers should be sent to my Gmail account @ [email protected], including your name, course, and
section.
• Make sure that you send a clear picture of your answers; a blurred picture will not be accepted as your quiz.

• Deadline: Nov. 21, 2021
