Module in Mathematics As A Tool Data Management
COURSE: BSIT AT – 1A, 1C, 1D, 1E, ET – 1A, 1B, 1C, 1D, ELT – 1A
DESCRIPTION: This course deals with nature of mathematics, appreciation of its practical,
intellectual, and aesthetic dimensions, and application of mathematical tools in daily
life. The course begins with an introduction to the nature of mathematics as an
exploration of patterns (in nature and the environment) and as an application of
inductive and deductive reasoning. By exploring these topics, students are
encouraged to go beyond the typical understanding of mathematics as merely a set
of formulas and to see it as a source of aesthetics in patterns of nature, for example,
and a rich language in itself (and of science) governed by logic and reasoning. The course
then proceeds to survey ways in which mathematics provides a tool for
understanding and dealing with various aspects of present-day living, such as
managing personal finances, making social choices, appreciating geometric designs,
understanding codes used in data transmission and security, and dividing limited
resources fairly. These aspects will provide opportunities for actually doing
mathematics in a broad range of exercises that bring out the various dimensions of
mathematics as a way of knowing, and test the students’ understanding and
capacity.
TOPIC: DATA MANAGEMENT
Week #: 10 – 12
Hours: 9 hours
“It is easy to lie with statistics. It is hard to tell the truth without statistics.” - Andrejs Dunkel
Introduction
Data management is a process by which information is acquired and processed to ensure its
accessibility and reliability for users. One of the most important tools in processing and managing such
information is statistics. Statistics is utilized in most areas of human endeavor. It is usually used in
education, research, business, agriculture, and other fields and everyday life activities.
Definition 1: Statistics is a science that deals with the collection, organization, presentation, analysis,
and interpretation of data to give more meaningful information.
Data or the pieces of information may be collected by conducting a survey, interview, observation,
and experiment. The data gathered can be properly organized and presented graphically by a line graph,
bar graph or pictograph or with the aid of a statistical table known as frequency distribution table (FDT). A
concise and meaningful conclusion is obtained from the analysis and interpretation of data. Relevant
information can be deduced from the analysis of numerical descriptions and predictions may be made
based on a small group to project the whole population. The work of statistics offers a wide area of
concern. Thus, statistics is subdivided into two branches, namely: descriptive statistics and inferential
statistics.
Definition 2: Descriptive statistics refers to the collection, organization, summary, and presentation of
data while inferential statistics deals with the interpretation and analysis of data where the conclusion
is drawn based on a subset of the population.
In descriptive statistics, a set of data is simply described without drawing any inferences or
implications. The data is merely summarized and discussed in a clear, concise and informative manner. In
inferential statistics, information or inferences concerning a large group known as population is provided
based on the study of a representative group or selected members in the population which are identified as
sample. Calculating the average rating of a class of 40 students in GE 4 illustrates the descriptive statistics
while determining the performance of the same class based on the performance of 10 randomly selected
members in the class exhibits inferential statistics.
BASIC TERMS
All rights reserved
No part of this module may be reproduced or transmitted in any form or by any means, electronic or mechanical without
permission from the author or copyright holder.
Mathematics in the Modern World | Mathematics as a Tool: Data Management 3
Some of the basic terminologies and notations involved in statistics are the following:
a) Population – a collection or set of things or objects under consideration.
b) Sample – a subset or representative group of the population.
c) Data – the information gathered in a study.
Statistical data are classified according to their sources, namely: primary data or secondary data.
A variable may be quantitative or qualitative, and a quantitative variable is further classified as
discrete or continuous.
Levels of Measurement
A more detailed distinction, termed the levels of measurement, is used by some researchers in
examining the information that is collected. It is classified as follows:
1. Nominal Measurement – numbers or symbols are used to code or classify each element in the
population. Note that the assigned numbers have no numerical meaning.
Examples: gender, educational background, employment status
2. Ordinal Measurement – uses numerical categories that express a meaningful order. There is no
indication of distance between positions. The numbers become meaningful because they reveal
whether one class or category is more or less than the other. Categories are ranked according to
the order of their value on the property like first, second, third; oldest, next oldest, youngest.
Example: rank in beauty contest
3. Interval Measurement – has equal intervals. There is significance to the distance between any two
values. It tells us that one unit differs by a certain amount of the property from another unit. It has no
absolute zero.
Example: Aptitude test, temperature
4. Ratio Measurement – A variable measured at this level not only includes the concepts of order and
interval, but also includes the idea of ’nothingness’, or absolute zero.
Example: Measurement of height, weight, ages
Remark: The scale of measurement depends mainly on the method of measurement and not on the
property being measured. For instance, the weight of a pack of milk measured in kilograms is on a ratio
scale, but if the packs are labelled as small, medium, or large, the weight is measured on an ordinal
scale.
One way of summarizing a data set is to describe it using descriptive measures.
Among the most commonly used descriptive measures are the measures of central
tendency and the measures of dispersion.
Definition 3: A measure of central tendency (or central location) is a single value that is used to
identify the “center” of the data set or set of observations.
The three measures of central tendency are the mean, median and mode where the mean is the
most familiar measure of the “center”.
Definition 4: The mean, also known as the arithmetic average, is the sum of all the observed values
divided by the number of observations $n$. That is, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.
The mean of the population is symbolized by the lowercase Greek letter “mu”, $\mu$, while
the mean of the sample is represented by $\bar{x}$ (x-bar).
Example 1: The scores of five students who are selected randomly in a class of GE 4 are as follows: 44,
37, 41, 35 and 32. Find their average score.
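As a quick check of Example 1, the definition of the mean can be applied directly. The following is a minimal Python sketch; the variable names are illustrative and are not part of the module.

```python
# Scores of the five randomly selected students in Example 1.
scores = [44, 37, 41, 35, 32]

# Mean = sum of all observed values divided by the number of observations.
total = sum(scores)                # 189
mean_score = total / len(scores)

print(mean_score)                  # 37.8
```

The sum of the scores is 189, so the average score is 189 / 5 = 37.8.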
The means of subgroups can be combined to come up with the group mean, known as the weighted
mean. This can be calculated using the formula
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
where
$x_i$ is the $i$th observation,
$f_i$ is the frequency or weight of each observation, and
$n$ is the number of observations, so that $\sum_{i=1}^{n} f_i$ is the total of the frequencies.
Example 2: If the final examination of a class in statistics is given the weight 2, the average of the quizzes the
weight 3, and a project report the weight 1, what would be the mean grade of a student who got the grades
90, 85 and 87, respectively?
Solution:
$$\bar{x} = \frac{2(90)+3(85)+1(87)}{2+3+1} = \frac{522}{6} = 87$$
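The weighted mean can also be sketched in a few lines of Python. This is only an illustration of the formula, assuming the grades and weights are supplied as two lists of equal length; the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(f_i * x_i) / sum(f_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(f * x for x, f in zip(values, weights)) / total_weight


# Example 2: final examination, average of quizzes, project report.
grades = [90, 85, 87]
weights = [2, 3, 1]
print(weighted_mean(grades, weights))   # 87.0
```

When all the weights are equal, the result reduces to the ordinary arithmetic mean.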
Remarks:
Definition 5: The median is a single value which divides an array of observations into two equal parts.
The median of a data set consisting of an odd number of observations is the middlemost value
in the list. That is, $\tilde{x} = x_{(n+1)/2}$, where $n$ is the number of observations. If $n$ is even, the median is the average of
the two middlemost values. It can be computed as $\tilde{x} = \frac{m_1 + m_2}{2}$, where $m_1$ and $m_2$ are the two middlemost
values. Take note that the observations are first arranged in an array form (from lowest to highest) before
getting the median value.
Example 1: The number of books owned by the eleven children are as follows: 5, 2, 4, 6, 5, 10, 7, 6, 9, 8,
6. What is the median?
Solution: Arrange the data in an array form: 2, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10. Since the list contains 11
numbers then the median is the middlemost value (6th number) which is 6.
Example 2: Compute the median of the data set: 2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7
Forming an array, we have 2.5, 2.5, 3.5, 3.7, 4.0, 5.8, 7.1, 8.2. There are $n = 8$ values; hence, the median is
calculated as $\tilde{x} = \frac{x_4 + x_5}{2} = \frac{3.7 + 4.0}{2} = 3.85$.
Remarks:
1. The median value may not be an actual observation in the data set.
2. The median is a positional value, hence, it is not affected by the presence of extreme observations.
3. When the data is qualitative, the median is not a possible measure so describe the center by determining
the mode.
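The positional rule for the median (sort first, then take the middle value or the average of the two middle values) can be sketched as follows. This is a minimal illustration; the function name `median` and the variable names are assumptions.

```python
def median(observations):
    """Median by the positional rule: sort, then take the middle value(s)."""
    if not observations:
        raise ValueError("the median of an empty data set is undefined")
    array = sorted(observations)      # arrange from lowest to highest
    n = len(array)
    middle = n // 2
    if n % 2 == 1:
        return array[middle]          # odd n: the ((n + 1) / 2)-th value
    # even n: average of the two middlemost values
    return (array[middle - 1] + array[middle]) / 2


books = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]                # Example 1
values = [2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7]         # Example 2
print(median(books))                # 6
print(round(median(values), 2))     # 3.85
```

Because only the middle positions are used, replacing the largest value with a much larger one leaves the result unchanged, which matches Remark 2.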
Definition 6: The mode is the observation that occurs most frequently in the given data set.
Example 1: Determine the mode of each of the following data sets.
a) set A: 36, 36, 12, 29, 35, 45, 50, 45, 45, 53
b) set B: 8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10
c) set C: 39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50
d) set D: 2, 9, 8, 12, 5, 13, 6, 10
Solution:
The mode of set A is 45 because 45 occurs most frequently in the list. Both 6 and 11 occur most
often in set B; therefore, set B has two modes, 6 and 11. The modes of set C are 25, 37 and 45
since these numbers have the highest frequency. Each element in set D has the same number of
occurrences; thus, the data set has no mode.
The distribution of data may be classified as unimodal, bimodal, trimodal or multimodal distribution
depending upon the number of modal values in the given data set. In the above example, set A is
unimodal, set B is bimodal and set C is trimodal.
Example 2: What is the modal color of the shirt worn by the students if the data gathered were as follows:
white, gray, gray, black, white, red, red, gray, black, white, white, red, gray, red, gray, black, red, red, gray,
gray, black?
Solution:
Since gray has the highest frequency, it follows that the modal color of the shirt worn by the
students is gray.
Remarks:
1. The mode can be used for both quantitative and qualitative data.
2. It is very much affected by the method of grouping.
3. It is determined by the frequency and not by the values of the observations.
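Counting frequencies is easy to automate. The sketch below follows the convention used in the examples above: a data set in which every value occurs equally often is reported as having no mode. The function name `modes` is hypothetical, and returning an empty list for "no mode" is an assumption of this sketch.

```python
from collections import Counter


def modes(observations):
    """Return the sorted list of modal values, or [] when there is no mode."""
    if not observations:
        return []
    counts = Counter(observations)
    highest = max(counts.values())
    # Convention from the examples: if several distinct values all occur
    # equally often (as in set D), the data set has no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


set_a = [36, 36, 12, 29, 35, 45, 50, 45, 45, 53]
set_b = [8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10]
set_c = [39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50]
set_d = [2, 9, 8, 12, 5, 13, 6, 10]
shirts = ["white", "gray", "gray", "black", "white", "red", "red", "gray",
          "black", "white", "white", "red", "gray", "red", "gray", "black",
          "red", "red", "gray", "gray", "black"]

print(modes(set_a))    # [45]
print(modes(set_b))    # [6, 11]
print(modes(set_c))    # [25, 37, 45]
print(modes(set_d))    # []
print(modes(shirts))   # ['gray']
```

The same function handles the qualitative shirt-color data, which illustrates Remark 1: the mode depends only on frequencies, not on the values themselves.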
MEASURE OF DISPERSION
In some cases, describing the data using the measures of central tendency alone is not enough to
provide sufficient information concerning a population or sample. It should be supplemented by an
analysis of how the individual elements of the population or sample tend to cluster around the central
tendency. Thus, an analysis of the variability of the observations may be applied.
The most commonly used measures of dispersion are the range, variance, and standard deviation.
The range is the simplest and easiest measure to compute, but it gives only a rough estimate of the
dispersion.
Definition 8: The range, R, is the difference between the highest value (H) and lowest value (L) in the
data set. That is, R = H – L.
Example 1. Compare the performances of the three students based on their ratings (in percent) in the 5
long tests.
Solution:
In terms of the measure of central tendency, the students perform equally since they have the same
average rating of 80%. However, looking at the variability of their ratings, Student A has the highest range
compared to the other students. This shows that the scores of Student A are more dispersed than those of the others.
The rating of Student A is fluctuating while that of Student B is uniformly distributed. On the other hand,
Student C has a range equal to zero, so his ratings are all concentrated at the mean, indicating that the
distribution has no spread.
Example 2. The average daily allowances (in pesos) of 12 college students studying at University Y are
112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25 and 113. Find the range.
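Solution:
The highest value is H = 165.5 and the lowest value is L = 99.75, so R = H – L = 165.5 – 99.75 = 65.75 pesos.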
Remarks:
1. The larger the value of the range, the more dispersed the observations are.
2. The range considers only the extreme values or observations in the data set.
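The range of the allowances in Example 2 can be verified with a short Python sketch. The variable names are illustrative only.

```python
# Average daily allowances (in pesos) of the 12 students in Example 2.
allowances = [112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25, 113]

highest = max(allowances)           # H
lowest = min(allowances)            # L
data_range = highest - lowest       # R = H - L

print(highest, lowest, data_range)  # 165.5 99.75 65.75
```

Only two of the twelve observations enter the computation, which is why the range is regarded as a rough measure of dispersion.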
A more reliable measure in describing the spread of a set of observations is the standard deviation.
Most researchers use this measure in the treatment of data. The computation includes all the values in the
data set.
Definition 9: The standard deviation is the positive square root of the variance. The variance is the
average of the squared deviations of every observation from the mean.
The standard deviation and variance can be obtained from a population or a sample, but most of their
applications utilize the sample rather than the population, since the latter requires complete enumeration.
The unit of the variance is a squared unit while that of the standard deviation is the same as the unit of the
data set. The following symbols are used to designate these measures for a population and a sample.
The variance and standard deviation of a population are calculated by using the formulas below.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}.$$
Example 1: The following are the scores of a student in all her long exams in Calculus: 83, 80, 89, 78, and
70. Calculate the standard deviation.
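Solution:
$x_i$ | $x_i - \mu$ | $(x_i - \mu)^2$
83 | 3 | 9
80 | 0 | 0
89 | 9 | 81
78 | -2 | 4
70 | -10 | 100
Total: $\sum x_i = 400$, $\sum (x_i - \mu)^2 = 194$
$N = 5$, $\mu = 400/5 = 80$
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{194}{5} = 38.8, \qquad \sigma = \sqrt{38.8} \approx 6.23$$
The result indicates that, on average, the percentage scores of the student tend to deviate from the
mean by about 6.23 units.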
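The tabular computation above can be reproduced with a short Python sketch of the population formulas. The function name `population_variance` is hypothetical; Python's standard `statistics` module offers `pvariance` and `pstdev` for the same purpose.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


scores = [83, 80, 89, 78, 70]            # long exam scores in Calculus
variance = population_variance(scores)
std_dev = math.sqrt(variance)            # positive square root of the variance

print(variance)                          # 38.8
print(round(std_dev, 2))                 # 6.23
```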
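$x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$
10 | -6 | 36
12 | -4 | 16
14 | -2 | 4
15 | -1 | 1
17 | 1 | 1
18 | 2 | 4
18 | 2 | 4
24 | 8 | 64
Total: $\sum x_i = 128$, $\sum (x_i - \bar{x})^2 = 130$
$n = 8$, $\bar{x} = 128/8 = 16$
Dividing the sum of squared deviations by $n - 1$, the sample variance is $s^2 = \frac{130}{8 - 1} \approx 18.57$ while the
sample standard deviation is $s = \sqrt{s^2} \approx 4.31$. What can you infer from this?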
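The quoted values 18.57 and 4.31 are consistent with dividing by n − 1 (the sample formula) rather than by n. A minimal sketch using Python's standard `statistics` module shows the difference; the variable names are illustrative.

```python
import statistics

# The eight sample observations from the table above.
sample = [10, 12, 14, 15, 17, 18, 18, 24]

sample_variance = statistics.variance(sample)         # divides by n - 1
sample_std_dev = statistics.stdev(sample)
population_variance = statistics.pvariance(sample)    # divides by n, for contrast

print(round(sample_variance, 2))     # 18.57
print(round(sample_std_dev, 2))      # 4.31
print(population_variance)           # 16.25
```

Treating the same eight values as a whole population would give 130 / 8 = 16.25, so it matters which formula is applied.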
Remarks:
A large standard deviation indicates that, on average, the data values are far from
the mean, while a small standard deviation shows that, on average, the data values are
close to the mean.
A statistical tool that is significant in identifying the position of an observation relative to the other
elements in a given data set is the measure of relative position.
Definition 10: A measure of relative position is a statistical measure that provides the specific location
of an observation relative to the other values when the data are in ranked order.
This measure divides the data set into subgroups such that a specific portion of the data set
belongs to the lower bracket and the remaining on the higher bracket. Percentiles, deciles, and quartiles
are among the most commonly used measures of relative position.
Definition 11:
The percentile, denoted by $P_i$, is a value that divides an array of observations into 100 equal
parts such that $i\%$ of all the observations lie below $P_i$.
The quartile, denoted by $Q_i$, is a value that divides an array of observations into four equal parts
such that $(i \times 25)\%$ of all the observations lie below $Q_i$.
The decile, denoted by $D_i$, is a value that divides an array of observations into ten equal parts
such that $(i \times 10)\%$ of all the observations lie below $D_i$.
In determining the desired measure, the data must first be arranged in an increasing pattern. The
entire set of observations in a percentile contains 99 partitions which are located at P1, P2, P3, … , and P99
where 1% of the total observations are lower than P1 and the remaining 99% are higher than P1 , 2% of the
total observations are found below P2 and 98% are above it, and so on.
Analogous to this, quartiles have the subdivisions described by $Q_1$ (the first quartile, which has 25%
of the observations falling below it and the remaining 75% above it), $Q_2$ (the second quartile, which is equal
to the median and has 50% of the observations below it), and $Q_3$ (the third quartile, which has 75% of the total
observations below it and the remaining 25% above it).
Definition 12: Formula for the Percentile
The location of the percentile $P_i$ in ungrouped data consisting of $n$ observations can be
computed as $P_i = \frac{n \cdot i}{100}$.
The portions of deciles are the 1st decile ($D_1$), 2nd decile ($D_2$), … , and 9th decile ($D_9$). The lowest
decile $D_1$ corresponds to a value in the set wherein 10% of the whole observations are located below $D_1$,
the second decile $D_2$ corresponds to a value in which 20% of the entire observations are lower than $D_2$, … ,
and so on up to the last decile $D_9$, which has a value positioned at the top such that 90% of all the
observations are located below the value corresponding to $D_9$.
Remarks:
1. The quartile and decile can be determined by solving for the equivalent percentile.
Example 1: Joy was told that relative to the other scores on a long exam in Statistics, her score was the
95th percentile. This means that at least 95% of those who took the test had scores less than or equal to
Joy's score.
Example 2: Given the following data set: 25, 5, 6, 12, 8, 16, 17, 22, 20, 9. Compute the following:
a) 20th percentile, b) 56th percentile, c) first quartile, d) 2nd quartile, e) 3rd decile, and f) seventh decile.
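Solutions:
Arrange the data in an array form: 5, 6, 8, 9, 12, 16, 17, 20, 22, 25.
a) 20th percentile
$n = 10$, $i = 20$
$P_{20} = \frac{10(20)}{100} = \frac{200}{100} = 2$ (location of the 20th percentile)
This means that the 20th percentile is the second score from the lowest. So, $P_{20} = 6$.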
b) 56th percentile
$P_{56} = \frac{10(56)}{100} = \frac{560}{100} = 5.6 \approx 6$
When the result is not exact, round it to the nearest whole number. The 56th percentile is
approximately described by the 6th value in the data set. Thus, $P_{56} = 16$.
Note: Interpolation may be applied to find an exact value corresponding to the 56th percentile.
A location of 5.6 means that the 56th percentile is between the 5th and 6th values. To interpolate, multiply the
difference of the 5th and 6th values by the decimal part, then add the result to the 5th value. That is,
$(16 - 12) \times 0.6 = 2.4$. So, $P_{56} = 12 + 2.4 = 14.4$, which is the exact value.
c) first quartile
$P_{25} = \frac{10(25)}{100} = \frac{10}{4} = 2.5$
$P_{25}$ is located halfway between the 2nd and 3rd values in the list, which are 6 and 8. So, $Q_1 = P_{25} = 7$.
d) 2nd quartile
Note that $Q_2$ has the same value as the median. Solving for the median gives $Md = \frac{12 + 16}{2} = 14$. So,
$Q_2 = 14$.
e) 3rd decile
$P_{30} = \frac{10(30)}{100} = 3$ (3rd value from the lowest). Therefore, $D_3 = 8$.
f) seventh decile
$P_{70} = \frac{10(70)}{100} = 7$ (7th number in the list). Therefore, $D_7 = 17$.
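The location formula and the interpolation step used in these solutions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a standard library routine: the function name `percentile` is hypothetical, `i` is assumed to be a whole number from 1 to 99, and returning the lowest value when the location falls before the 1st observation is a choice made only for this sketch.

```python
def percentile(observations, i):
    """Value of the i-th percentile using the location n * i / 100."""
    if not observations:
        raise ValueError("the data set must not be empty")
    if not 1 <= i <= 99:
        raise ValueError("i must be a whole number from 1 to 99")
    array = sorted(observations)             # arrange in increasing order
    n = len(array)
    # location = whole + remainder / 100, kept in integers to stay exact
    whole, remainder = divmod(n * i, 100)
    if whole == 0:
        return array[0]                      # assumption: clamp to the lowest value
    if remainder == 0:
        return array[whole - 1]              # exact location: take that value
    lower, upper = array[whole - 1], array[whole]
    # interpolate: difference of neighbours times the decimal part
    return lower + (upper - lower) * remainder / 100


data = [25, 5, 6, 12, 8, 16, 17, 22, 20, 9]
print(percentile(data, 20))             # 6
print(round(percentile(data, 56), 1))   # 14.4
print(percentile(data, 25))             # 7.0
print(percentile(data, 30))             # 8
print(percentile(data, 70))             # 17
```

The sketch always interpolates, so it returns 14.4 for the 56th percentile rather than the rounded answer 16. The second quartile is not computed with it, since the solution above obtains $Q_2$ from the median rule instead of the location formula.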
Do These!
Instruction: Solve the following problems on one whole sheet of paper.
1. Company ABC awards its top ten most outstanding workers every year. The ages
of the top ten awardees for the year 2018 are 47, 53, 36, 60, 30, 28, 42, 43, 38 and 52. Determine the
mean, median and mode of the ages. Show your solution.
2. The average height of four basketball players is 74 inches. If the heights of three of the players are 69
inches, 72 inches and 78 inches, what is the height of the fourth player?
3. A class of 20 college students was interviewed to determine the number of books owned by
the students. The data gathered are as follows: 4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10,
10, and 12. Treating the data as a population, calculate the standard deviation.
4. A class of 20 college students was interviewed to determine the number of books owned by
the students. The data gathered are as follows: 4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10,
10, and 12.