Module-4 Mathematics in The Modern World
MODULE 1
DATA MANAGEMENT
INTRODUCTION
OBJECTIVES
Lesson 1
Statistics has several meanings. It is frequently used to refer to recorded data such
as the number of traffic accidents, the size of enrollment, or the number of patients
visiting a clinic. Statistics is also used to denote characteristics calculated for a set of
data – for example, mean, standard deviation, or correlation. In another context,
statistics refers to statistical methodology and theory.
In short, STATISTICS is a body of techniques and procedures dealing with the
collection, organization/presentation, analysis, and interpretation of information that can
be stated numerically.
Steps in Statistical Investigation
1. Defining the problem.
2. Collection of data – refers to the process of obtaining numerical measurements.
3. Tabulation and presentation of data – refers to the organization of data in tables,
graphs or charts, so that logical conclusions can be derived from them.
4. Analysis of data – pertains to the process of deriving from the given data relevant
information from which numerical descriptions can be formulated.
5. Interpretation of data – refers to the task of drawing conclusions from the analyzed
data.
Branches of Statistics
Descriptive Statistics – deals with the collection, enumeration, classification and
graphical representation of data and computation of values to describe the characteristics
of data. An example is the census conducted by the NSO, in which all residents are
requested to provide information such as age, sex, and marital status. The data obtained
in such census can then be compiled and arranged into tables and graphs that describe the
characteristics of the population at a given time.
Inferential Statistics – is concerned with reaching conclusions about large groups of
data by studying the characteristics of samples drawn from the population, and making
inferences on previously formulated hypotheses. An example is an opinion poll such as the
SWS survey, which attempts to draw inferences as to the outcome of an election. In such
a poll, a sample of individuals (frequently fewer than 2000) is selected, their preferences
are tabulated, and inferences are made as to how more than 80 million persons would
vote if an election were held that day.
II. Categorize each of the following as either nominal, ordinal, interval or ratio
measurements.
11. Color of the eyes
12. Rank in the military
13. Students rating in College Admission Test
14. Basketball Scores
15. Major field of Specialization
16. Emotional Quotient
17. Bank Deposits
18. Level of Commitment of Mayors
19. Tuition Fee
20. Classification of Municipalities.
Collection of Data
Types of Data:
1. Primary Data – refers to data or information gathered firsthand, directly
from an original source. Examples: autobiographies, diaries, first-person
accounts.
2. Secondary Data – refers to information taken from published or unpublished
data previously gathered by other individuals or agencies. Examples:
books, magazines, journals, newspapers, theses and dissertations.
Presentation of Data
Methods
1. Textual form – is used in presenting data in a paragraph or narrative form.
2. Tabular form – a very effective and efficient means of organizing and summarizing
data because a lot of information can be seen from a single table and it makes
comparison of figures quick under each category. Data appears in rows and
columns.
Definition of Terms:
1. Class interval and class limits – a symbol defining a class, such as 60-62, is
called a class interval. The end numbers 60 and 62 are called class limits; the
smaller number 60 is the lower class limit and the larger number 62 is the
upper class limit. The terms class (category) and class interval are often used
interchangeably, although the class interval is actually a symbol for the class.
2. Class boundaries – more precise expressions of the class limits, obtained for
whole-number limits by extending each limit by 0.5 (for example, 59.5-62.5
for the class 60-62). They are called the true limits; a class boundary is
situated midway between the upper limit of one interval and the lower limit
of the next.
3. Class frequency – refers to the number of observations belonging to a class
interval.
4. Class mark/midpoint (X) – the midpoint of the class, obtained by adding the
lower and upper limits and dividing the sum by 2.
5. Class size, width or length – difference between the upper class boundary and
the lower class boundary of an interval.
Sturges' Formula:
$k = 1 + 3.3 \log N$
Where: k = number of classes, N = number of scores/cases, and log is the common (base-10) logarithm
Example 1
The following are the scores of 50 students in the midterm examination
of Mathematics in the Modern World (MMW). Construct a frequency
distribution table.
50 85 91 54 62 72 68 70 79 90
58 35 52 61 93 98 60 62 76 99
64 78 49 88 73 51 69 80 93 89
68 98 66 96 55 77 57 61 70 92
46 73 83 91 79 53 62 59 82 93
Solution
Step 1: Find the largest and smallest values and compute for the Range.
Lowest Score = 35
Highest Score = 99
Range (R) = 99 – 35 = 64
Step 2: Compute for the number of classes and class width.
Number of classes
N = 50
k =1+3.3 log N
k =1+3.3 log 50=6.6 ≈ 7 Classes
Class width
$$c = \frac{R}{k} = \frac{64}{7} \approx 9$$
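The arithmetic in Steps 1 and 2 can be checked with a short script. This is only a sketch: the function name `sturges_classes` and the variable names are illustrative, and rounding to the nearest whole number is assumed because it matches the values printed in the example.

```python
import math

def sturges_classes(n_cases: int) -> float:
    """Unrounded number of classes from k = 1 + 3.3 log10(N)."""
    return 1 + 3.3 * math.log10(n_cases)

lowest, highest, n_cases = 35, 99, 50   # values from Example 1

data_range = highest - lowest           # R = 99 - 35
k_raw = sturges_classes(n_cases)        # about 6.6
k = round(k_raw)                        # rounded to 7 classes
c = round(data_range / k)               # 64 / 7 is about 9.14, rounded to 9

print(f"R = {data_range}, k = {k_raw:.1f} -> {k}, c = {c}")
```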
Step 3: Organize the class intervals. Use the lowest score as the lower
limit of the lowest class. Add c to each lower limit to get the lower limit
of the next class.
Class interval
35-43
44-52
53-61
62-70
71-79
80-88
89-97
98-106
Step 4 and 5: Tally each score to the category of class interval it belongs.
Summarize under column f (frequency).
Class interval   Tally            f
35-43            |                1
44-52            |||||            5
53-61            ||||| ||||       9
62-70            ||||| |||||      10
71-79            ||||| |||        8
80-88            |||||            5
89-97            ||||| ||||       9
98-106           |||              3
N = 50
Step 6: Compute the midpoint for each class interval and put it under
column X.
Class interval f X
35-43 1 39
44-52 5 48
53-61 9 57
62-70 10 66
71-79 8 75
80-88 5 84
89-97 9 93
98-106 3 102
N = 50
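Steps 3 to 6 can be reproduced programmatically. The sketch below assumes the class width of 9 from Step 2 and starts the first class at the lowest score, as the example does; the helper name `frequency_table` is illustrative only.

```python
# Scores of the 50 students from Example 1.
scores = [
    50, 85, 91, 54, 62, 72, 68, 70, 79, 90,
    58, 35, 52, 61, 93, 98, 60, 62, 76, 99,
    64, 78, 49, 88, 73, 51, 69, 80, 93, 89,
    68, 98, 66, 96, 55, 77, 57, 61, 70, 92,
    46, 73, 83, 91, 79, 53, 62, 59, 82, 93,
]
class_width = 9  # c from Step 2

def frequency_table(values, width):
    """Return (lower limit, upper limit, frequency, midpoint) for each class."""
    rows = []
    lower = min(values)
    while lower <= max(values):
        upper = lower + width - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, freq, (lower + upper) / 2))
        lower += width
    return rows

table = frequency_table(scores, class_width)
for lower, upper, freq, midpoint in table:
    print(f"{lower}-{upper}\t{freq}\t{midpoint:g}")
print("N =", sum(row[2] for row in table))
```

Because each class covers whole-number limits, the upper limit is the lower limit plus c − 1, which is why the last class (98-106) is needed to hold the highest score of 99.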
3. Graphical form
Data can also be presented in graphical form. This form is the most
effective means of organizing and presenting statistical data because the
important relationships are brought out more clearly and creatively in
virtually solid and colorful figures.
Types of Graph
a. Histogram – consists of an abscissa which depicts the class boundaries and a
perpendicular ordinate which shows the frequency of observations.
Steps: Prepare the x and y axis. Mark the x-axis representing the
class boundaries, and y-axis the frequencies. The bases of the bars
are plotted on the x-axis where the width of the base corresponds to
the real limits or class boundaries of the class interval. The center
of the base falls on the midpoint of the class interval.
[Figure: histogram. x-axis: Class Boundaries (29.5-39.5, 39.5-49.5, 49.5-59.5, 59.5-69.5, 69.5-79.5); y-axis: Frequency]
[Figure: graph of Frequency (y-axis) against Class Midpoints (x-axis: 34.5, 44.5, 54.5, 64.5, 74.5)]
Example 2: Using the frequency distribution table below, answer the questions
that follow:
Answers:
1. 10% of the students obtained a score within 80-88.
2. 15 students have scored lower than 62.
3. The percentage of the students who have passed the Statistics
examination is 34%.
Lesson 2
MEAN ($\bar{x}$)
- The mean is the average of the set of scores. By far, the most common
measure of central tendency in statistics is the mean. It is the most
sensitive measure of central tendency.
The population mean: $\mu = \frac{\sum x}{N}$
Where: $\mu$ = population mean
$\sum x$ = the sum of all the scores
$N$ = total number of cases
Example 3: Consider the scores of ten people who took a make up quiz in Algebra.
12 14 16 10 5 8 18 7 10 4
The sum of the scores is $\sum x = 104$, so the mean score is
$$\bar{x} = \frac{\sum x}{n} = \frac{104}{10} = 10.4$$
Weighted Arithmetic Mean – can be expressed as the sum of the values
multiplied by their corresponding weights divided by the total weight.
The formula is: $\bar{x} = \frac{\sum f x}{\sum f}$
Where: $f$ = weight or frequency of each item
$x$ = value of each item
Example 4: The final grades of a student at the end of semester are the following:
Subjects Grades (x) Units (f)
GECC 101 – Art Appreciation 85.00 3
GECC 104 – Ethics 88.00 3
HMGT 102 – Kitchen Essentials 84.00 3
Tour 103 – Quality Service Management 89.00 3
GECC 103 – Mathematics in the Modern World 87.00 3
Tour 104 – Philippine Tourism Culture 90.00 3
Phed 102 – Individual/Dual Sports 92.00 2
$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{1753}{20} = 87.65$$
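As a check on Example 4, the weighted mean can be computed directly from the grades and units in the table. The function name `weighted_mean` is an illustrative choice, not part of the module.

```python
# (grade x, units f) pairs from the table in Example 4.
grades_and_units = [
    (85.0, 3), (88.0, 3), (84.0, 3), (89.0, 3),
    (87.0, 3), (90.0, 3), (92.0, 2),
]

def weighted_mean(pairs):
    """Sum of value * weight divided by the total weight."""
    total_weight = sum(f for _, f in pairs)
    weighted_sum = sum(x * f for x, f in pairs)
    return weighted_sum / total_weight

sum_fx = sum(x * f for x, f in grades_and_units)   # 1753
sum_f = sum(f for _, f in grades_and_units)        # 20
mean_grade = weighted_mean(grades_and_units)

print(sum_fx, sum_f, round(mean_grade, 2))
```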
b. $\mu = \frac{\sum fX}{N}$ (for population)
Solution
$$\bar{x} = \frac{\sum fX}{n} = \frac{3580}{50} = 71.60$$
MEDIAN ($\tilde{x}$)
- Is a measure of central tendency that occupies the middle position in an
array of values. It is the number that divides the bottom 50% of the data
from the top 50%.
$$\tilde{x} = L_B + \left(\frac{\frac{n}{2} - cf_p}{f_{md}}\right) i$$
Where: $L_B$ = lower boundary of the median class
(the median class is the class interval where $\frac{n}{2}$ is found)
$cf_p$ = cumulative frequency (<cf) before the median class
$f_{md}$ = frequency of the median class
$i$ = size (width) of the class interval
Since n is odd:
$$\tilde{x} = x_{\frac{n+1}{2}} = x_{\frac{9+1}{2}} = x_{\frac{10}{2}} = x_5 = 34$$
Find n/2 = 50/2 = 25. Find 25 under <cf column to find the median
class. The median class lies at 70-76.
$$\tilde{x} = L_B + \left(\frac{\frac{n}{2} - cf_p}{f_{md}}\right) i$$
$$\tilde{x} = 69.5 + \left(\frac{25 - 20}{15}\right) \times 7 = 71.83$$
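The substitution above can be verified with a small helper. This is a sketch only: the function name `grouped_median` and its parameter names are illustrative, and the inputs are the values used in the worked solution (lower boundary 69.5, n = 50, cumulative frequency 20 before the median class, median-class frequency 15, class width 7).

```python
def grouped_median(lower_boundary, n, cf_before, f_median, width):
    """Median of grouped data: L_B + ((n/2 - cf_p) / f_md) * i."""
    return lower_boundary + ((n / 2 - cf_before) / f_median) * width

median = grouped_median(lower_boundary=69.5, n=50, cf_before=20,
                        f_median=15, width=7)
print(round(median, 2))
```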
MODE ($\hat{x}$)
$d_1 = 15 - 11 = 4$
$d_2 = 15 - 10 = 5$
$$\hat{x} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) i$$
1. Given the following scores: 7, 13, 8, 5, 9, 12, 15, 22, 10, and 9 find the mean,
median and mode.
2. Using the learning activity in lesson 1, compute the mean, median and mode of
the set of scores.
Lesson 3
MEASURES OF DISPERSION
Variance of a Population
The average of the squares of the distances from the population mean. It is
the sum of the squares of the deviations from the mean divided by the
population size. The units on the variance are the units of the population
squared.
Variance of a Population Formula
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Variance of a Sample
Unbiased estimator of a population variance. Instead of dividing by the
population size, the sum of the squares of the deviations from the sample
mean is divided by one less than the sample size. The units on the variance
are the units of the population squared.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Standard Deviation
The square root of the variance. The population standard deviation is the
square root of the population variance and the sample standard deviation is
the square root of the sample variance. The units of the standard deviation
are the same as the units of the population/sample.
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Where: $\sigma$ = standard deviation of a population
$X$ = values of observations in the population
$\mu$ = population mean
$N$ = total number of observations in the population
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Where: $s$ = standard deviation of a sample
$x$ = values of observations in the sample
$\bar{x}$ = sample mean
$n$ = total number of observations in the sample
Example 11: Compute for the variance and standard deviation of the following
sample data:
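$x$      $x - \bar{x}$      $(x - \bar{x})^2$
22       -5                 25
24       -3                 9
26       -1                 1
28       1                  1
30       3                  9
32       5                  25
$\sum x = 162$              $\sum (x - \bar{x})^2 = 70$
$$\bar{x} = \frac{\sum x}{n} = \frac{162}{6} = 27$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{70}{6 - 1} = \frac{70}{5} = 14.00$$
$$s = \sqrt{14.00} = 3.74$$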
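Example 11 can be reproduced step by step in Python and cross-checked against the standard library's `statistics` module, whose `variance` and `stdev` functions use the same n − 1 divisor. Variable names here are illustrative.

```python
import math
import statistics

data = [22, 24, 26, 28, 30, 32]   # sample data from Example 11

n = len(data)
mean = sum(data) / n                                    # 162 / 6
squared_deviations = [(x - mean) ** 2 for x in data]    # 25, 9, 1, 1, 9, 25
sample_variance = sum(squared_deviations) / (n - 1)     # 70 / 5
sample_sd = math.sqrt(sample_variance)

print(mean, sample_variance, round(sample_sd, 2))

# Cross-check with the standard library (also divides by n - 1).
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```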
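$$s^2 = \frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n-1)}$$
$$s = \sqrt{\frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n-1)}}$$
Where: $s^2$ = variance of a sample
$s$ = standard deviation of a sample
$x$ = values of observations in the sample
$n$ = total number of observations in the sample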
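$$s^2 = \frac{n\left(\sum fX^2\right) - \left(\sum fX\right)^2}{n(n-1)}$$
$$s = \sqrt{\frac{n\left(\sum fX^2\right) - \left(\sum fX\right)^2}{n(n-1)}}$$
Where: $s^2$ = sample variance
$s$ = sample standard deviation
$X$ = midpoint or class mark
$f$ = frequency in a class
$n$ = total number of observations in the sample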
Height (in inches)   f    X    fX    X²     fX²
45 – 49              3    47   141   2209   6627
50 – 54              4    52   208   2704   10816
55 – 59              6    57   342   3249   19494
60 – 64              7    62   434   3844   26908
65 – 69              10   67   670   4489   44890
70 – 74              7    72   504   5184   36288
75 – 79              6    77   462   5929   35574
80 – 84              4    82   328   6724   26896
85 – 89              3    87   261   7569   22707
N = 50               ΣfX = 3350             ΣfX² = 230200
$$s^2 = \frac{n\left(\sum fX^2\right) - \left(\sum fX\right)^2}{n(n-1)} = \frac{50(230200) - 3350^2}{50(49)} = \frac{287500}{2450} \approx 117.35$$
$$s = \sqrt{117.35} \approx 10.83$$
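The column totals and the grouped-data variance for the height table can be recomputed from the class midpoints and frequencies. This is a sketch; the variable names are illustrative and the formula is the computational formula for grouped data given in this lesson.

```python
import math

# (class midpoint X, frequency f) pairs from the height table.
classes = [
    (47, 3), (52, 4), (57, 6), (62, 7), (67, 10),
    (72, 7), (77, 6), (82, 4), (87, 3),
]

n = sum(f for _, f in classes)               # total number of observations
sum_fx = sum(f * x for x, f in classes)      # sum of f * X
sum_fx2 = sum(f * x * x for x, f in classes) # sum of f * X^2

variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
std_dev = math.sqrt(variance)

print(n, sum_fx, sum_fx2)
print(round(variance, 2), round(std_dev, 2))
```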
Chebyshev’s Theorem
Chebyshev's theorem describes where the scores in a frequency distribution
lie relative to the mean: for any data set and any $k > 1$, at least
$1 - \frac{1}{k^2}$ of the values lie within $k$ standard deviations of the mean.
Example 13: If the mean score of the students enrolled in an English class is 66
points with a standard deviation of 5 points, at least what percentage of the scores
must lie between 46 and 86?
Solution: $\bar{x} - k(s) = 46$
$66 - k(5) = 46$
$5k = 20 \rightarrow k = 4$
$$1 - \frac{1}{k^2} = 1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 0.9375 \text{ or } 93.75\%$$
∴ At least 93.75% of the data lie between 46 and 86.
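The same bound can be computed in code. The sketch below assumes the interval is symmetric about the mean, as in Example 13, and uses the standard Chebyshev bound 1 − 1/k²; the function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(mean, sd, low, high):
    """Minimum fraction of data within [low, high], an interval centered on the mean."""
    k = min(mean - low, high - mean) / sd   # number of standard deviations
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

fraction = chebyshev_lower_bound(mean=66, sd=5, low=46, high=86)
print(f"At least {fraction:.2%} of the scores lie between 46 and 86")
```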
1. Given the following scores: 7, 13, 8, 5, 9, 12, 15, 22, 10, and 9 find the range,
variance and standard deviation.
2. Using the learning activity in lesson 1, compute the range, variance and
standard deviation of the set of scores.
Lesson 4
Example 14: Veniz scored 55 on a mathematics test that had a mean of 45 and a
standard deviation of 10. On an English test with a mean of 56 and a standard
deviation of 12, she had scored 70. Compare her relative positions on the two
tests.
Solution: Convert her scores on the two tests to standard scores:
For Mathematics:
$$z = \frac{X - \mu}{\sigma} = \frac{55 - 45}{10} = 1.00$$
For English:
$$z = \frac{X - \mu}{\sigma} = \frac{70 - 56}{12} = 1.17$$
Since the standard score for English is larger, her relative position in English is
higher than her relative position in Mathematics.
Example 15: Suppose that the mean of a test is 122 and the standard deviation s is
24. If Josie earns a score of 146 on the test, her deviation from the mean is
146 − 122 = 24. Dividing Josie's deviation of 24 by the s of the test gives her a
z of 1.00. If Edlyn's score is 110, then what is Edlyn's z-score?
$$z = \frac{110 - 122}{24} = -0.50$$
Example 16: Two equivalent intelligence tests are given to similar groups, but the
tests are designed with different scales. The statistics for the tests are listed below.
Which is better: a score of 145 on Test I or a score of 60 on Test II?
Test I        Test II
Mean = 100    Mean = 40
s = 15        s = 5
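The z-score comparisons in Examples 14 to 16 all apply the same formula, so a single helper covers them. This is a sketch with illustrative names; for Example 16 it simply applies z = (X − mean) / s to the two scores given.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Example 14: Veniz's Mathematics and English scores.
z_math = z_score(55, 45, 10)
z_english = z_score(70, 56, 12)

# Example 15: Edlyn's score.
z_edlyn = z_score(110, 122, 24)

# Example 16: 145 on Test I versus 60 on Test II.
z_test_1 = z_score(145, 100, 15)
z_test_2 = z_score(60, 40, 5)

print(round(z_math, 2), round(z_english, 2), round(z_edlyn, 2))
print(z_test_1, z_test_2)
```

Under this calculation the score of 60 on Test II has the larger z-score, so it is the better relative position.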
PERCENTILE
A percentile is a measure indicating the value below which a given percentage
of observations in a group of observations fall. For example, the 80th percentile is
the value below which 80% of the observations may be found.
Example 17: On an examination given to 4,500 students, Jedd's score of 340 was
higher than the scores of 2,898 students who took the examination. What is the
percentile of Jedd's score?
Solution:
$$\text{Percentile} = \frac{2898}{4500} \times 100 = 0.644 \times 100 \approx 64$$
QUARTILE
Refers to the value that divides the distribution into four (4) equal parts.
Q1 – refers to the value of the distribution that falls on the first one fourth
of the distribution arranged in magnitude.
Box-and-Whisker Plots
A box-and-whisker plot or boxplot is a diagram based on the five-number
summary of a data set. The five-number summary of a data set consists of the five
numbers determined by computing the minimum, Q1 or the 1st Quartile, median, Q3
or the 3rd Quartile, and maximum value of the data set.
To construct a box-and-whisker plot, first draw an equal interval scale on
which to make the box plot. The boxplot is a visual representation of the
distribution of the data. Greater distances in the diagram should correspond to
greater distances between numeric values.
Using the equal interval scale, draw a rectangular box with one end at Q1
and the other end at Q3. Then draw a segment across the box at the median value.
Finally, draw two segments extending from the ends of the box, one to the
minimum value and one to the maximum value (these segments are called the
"whiskers").
Example 19: Draw a box-and-whisker plot for the data set 16, 18, 21, 21, 22, 22,
23, 24, 25, 26, 28, 30, 30.
Solution:
1. Find/Compute for the five-number summary:
Minimum =16, Q1 = 21, Median = 23, Q3 = 27 and Maximum = 30.
2. Plot the values.
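The five-number summary in Example 19 can be computed as follows. Quartile conventions vary between textbooks and software; this sketch assumes the convention that reproduces the example's values, where Q1 and Q3 are the medians of the lower and upper halves with the overall median excluded when n is odd. The function name is illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (
        data[0],
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
        data[-1],
    )

data_set = [16, 18, 21, 21, 22, 22, 23, 24, 25, 26, 28, 30, 30]
summary = five_number_summary(data_set)
print(summary)
```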
Lesson 5
$\left(\frac{x}{\sigma}\right)$ deviations along the baseline. The sigma units are also called Z-scores.
6. Two parameters are used to describe the curve. One is the mean, which is
equal to zero (μ = 0), and the other is the standard deviation, which is
equal to 1 (σ = 1).
7. Standard deviations or Z-scores departing away from the μ towards the right
of the curve or above the mean are expressed in positive values while the
scores departing from the mean to the left of the curve or below the mean
are in negative values.
Tables and calculators are used to determine the area under the normal
curve. The following table of Areas under the Normal Curve will help. Since the
normal curve is symmetrical, the areas for negative and positive z-scores of the
same magnitude are the same.
Example 20: Find the area under the standard normal curve for the following z-
scores and draw and shade the corresponding area on the curve.
a. Between z = 0 and z = 0.50
Solution: Using the table, the area between the mean and a z-score of 0.50
corresponds to .1915. Hence, the area is 0.1915 or 19.15%.
Solution: Using the table, the area between z = -1.50 and the mean (0)
is .4332, and the area from the mean to z = 0.50 is .1915. Hence, the total area is
equal to the sum of 0.4332 and 0.1915, which is 0.6247 or 62.47%.
Solution: Using the table, the area between a z-score of -2.4 and the mean
is .4918. Hence, the area is 0.4918 or 49.18%.
Solution: Using the table, the area from the mean to z = 2.3 is 0.4893. The
total area to the left of z is 0.9893 or 98.93%.
Solution: Using the table, the area from the mean to a z-score of 1.0 is
0.3413. The total area to the right of z = 1.0 is 0.1587 or 15.87%.
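The table lookups in Example 20 can be cross-checked with Python's standard library, which provides the standard normal cumulative distribution through `statistics.NormalDist` (Python 3.8 or later). The helper name `area_between` is illustrative; the z-values are the ones quoted in the solutions above.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

area_0_to_050 = area_between(0, 0.50)          # mean to z = 0.50
area_m150_to_050 = area_between(-1.50, 0.50)   # z = -1.50 to z = 0.50
area_m24_to_0 = area_between(-2.4, 0)          # z = -2.4 to the mean
area_left_of_23 = std_normal.cdf(2.3)          # everything left of z = 2.3
area_right_of_10 = 1 - std_normal.cdf(1.0)     # everything right of z = 1.0

for area in (area_0_to_050, area_m150_to_050, area_m24_to_0,
             area_left_of_23, area_right_of_10):
    print(round(area, 4))
```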
Lesson 6
Linear Regression
In practice, a relationship is often found to exist between two (or more) variables,
and one wants to express this relationship in mathematical form by finding an
equation connecting the variables. To do this, one should collect data showing the
corresponding values of the variables. The next step is to plot the points on the
rectangular coordinate system. The resulting graph is sometimes called the scatter
plot or scatter diagram.
Scatterplot
An effective way to see a relationship in data is to display the information
as a scatter plot. It shows how two variables relate to each other by showing how
closely the data points fit to a line. If the variables are correlated, the points will
fall along a line or curve. The better the correlation, the tighter the points will
hug the line.
A simple scatterplot can be used to (a) determine whether a relationship is
linear, (b) detect outliers and (c) graphically present a relationship. For example,
determining whether a relationship is linear (or not) is an important assumption if
you are analyzing your data using a Correlation and Regression.
Various types of correlation can be interpreted through the patterns
displayed on Scatterplots. These are: positive (values increase together), negative
(one value decreases as the other increases), null (no correlation). The strength of
the correlation can be determined by how closely packed the points are to each
other on the graph. Points that end up far outside the general cluster of points are
known as outliers.
Source: https://datavizcatalogue.com/methods/scatterplot.html
A linear regression line has an equation of the form Y =aX +b, where X is
the explanatory variable and Y is the dependent variable. a is the slope of the line
and b is the intercept (the value of y when x = 0)
Step 2: Sum all x, y, x² and xy, which gives us Σx, Σy, Σx² and Σxy
(Σ means "sum up")
Example 22: The table below shows some data for the first ten (10) years of a
certain Manufacturing and Canning company, Marina. Each row in the table shows
Marina’s sales for a year, and the amount spent on advertising in that year.
Calculate the regression equation for the data using advertising as the explanatory
variable.
Year   Advertising (... pesos)   Sales (... pesos)
1 18 665
2 23 758
3 25 823
4 28 1078
5 30 1199
6 33 1301
7 39 1472
8 47 1500
9 52 1604
10 61 1699
Solution:
Year   X     Y      XY       X²
1      18    665    11970    324
2      23    758    17434    529
3      25    823    20575    625
4      28    1078   30184    784
5      30    1199   35970    900
6      33    1301   42933    1089
7      39    1472   57408    1521
8      47    1500   70500    2209
9      52    1604   83408    2704
10     61    1699   103639   3721
ΣX = 356   ΣY = 12099   ΣXY = 474021   ΣX² = 14406
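To compute for the slope a:
$$a = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$
$$a = \frac{10(474021) - (356)(12099)}{10(14406) - (356)^2}$$
$$a = 24.992$$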
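The sums and the slope for Example 22 can be reproduced from the raw table. The slope uses the formula above. The intercept formula b = (Σy − aΣx) / n is the standard least-squares expression and is an assumption here, since the module's own intercept step is not shown; it does agree with the regression line quoted later in this lesson. Variable names are illustrative.

```python
# Advertising (X) and sales (Y) for the ten years in Example 22.
advertising = [18, 23, 25, 28, 30, 33, 39, 47, 52, 61]
sales = [665, 758, 823, 1078, 1199, 1301, 1472, 1500, 1604, 1699]

n = len(advertising)
sum_x = sum(advertising)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertising, sales))
sum_x2 = sum(x * x for x in advertising)

# Slope: a = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept (standard least-squares form, assumed): b = (Sy - a*Sx) / n
b = (sum_y - a * sum_x) / n

print(sum_x, sum_y, sum_xy, sum_x2)
print(f"Y = {a:.3f}X + {b:.3f}")
```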
4. A new dialogue box for Regression will appear. Select the Y Range; this is
the response variable (also called the dependent variable). Select the X
Range; these are the explanatory variables (also called independent
variables), and their columns must be adjacent to each other. Check Labels.
Click in the Output Range box and select any vacant cell in the worksheet.
Check Residuals. Click OK.
R Square: the closer its value is to 1, the better the regression line
fits the data.
Coefficients
The regression line is: y = 24.992x + 320.175. In other words, for each unit
increase in advertising, sales increase by 24.992 units; the intercept 320.175 is
the predicted sales when advertising is zero. This is important information.
*The same example was used in performing the regression analysis in MS Excel, there might be a
slight difference in the final answer due to manual computation and rounding off data.
Correlation
Correlation is a bivariate analysis that measures the strength of association
between two variables and the direction of the relationship.
The main result of a correlation is called the correlation coefficient (or "r").
It ranges from -1.0 to +1.0.
When the value of the correlation coefficient lies around ± 1, then it is said
to be a perfect degree of association between the two variables. The closer r is to
+1 or –1, the stronger the correlation. The direction of the relationship is simply
the + (indicating a positive relationship between the variables) or - (indicating a
negative relationship between the variables) sign of the correlation.
Interpreting Correlation
Pearson Correlation
$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2} \cdot \sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$
Where:
r = Pearson r correlation coefficient
n = number of values in each data set
∑xy = sum of the products of paired scores
∑x = sum of x scores
∑y = sum of y scores
∑x² = sum of squared x scores
∑y² = sum of squared y scores
$$r = \frac{20(4937.6) - (1308)(75.1)}{\sqrt{20(85912) - (1308)^2} \cdot \sqrt{20(285.45) - (75.1)^2}}$$
$$r = \frac{98752 - 98230.8}{\sqrt{7376} \cdot \sqrt{68.99}} = \frac{521.2}{713.35} = 0.731$$
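The substitution above can be checked from the summary totals alone. In this sketch the function name `pearson_r_from_sums` is illustrative, and the totals are the ones appearing in the worked computation (n = 20, Σx = 1308, Σy = 75.1, Σxy = 4937.6, Σx² = 85912, Σy² = 285.45).

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r computed from the summary totals of paired scores."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r_from_sums(n=20, sum_x=1308, sum_y=75.1,
                        sum_xy=4937.6, sum_x2=85912, sum_y2=285.45)
print(round(r, 3))
```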
Interpretation: The r coefficient of 0.731 indicates a strong positive
relationship between height and self-esteem level. In this data set, shorter people
tend to have lower self-esteem and taller people tend to have higher self-esteem.
1. The following are data on 12 individuals' daily sodium intake and their
systolic blood pressure readings.
Person Sodium BP
1 6.8 154
2 7.0 167
3 6.9 162
4 7.2 175
5 7.3 190
6 7.0 158
7 7.0 166
8 7.5 195
9 7.3 189
10 7.1 186
11 6.5 148
12 6.4 140
a. Calculate the value of r and the regression equation for the data.
b. Test the hypothesis at 0.05 level of significance.
c. What would be a likely blood pressure for a person with sodium of
6.3? How about sodium of 7.6?