Module 4
Learning Objectives:
At the end of this lesson, the students should be able to:
a. differentiate samples from population;
b. classify data as either quantitative or qualitative;
c. determine whether a variable is discrete or continuous; and
d. identify the levels of measurement.
Learning Focus:
Statistics is a branch of applied mathematics that deals with gathering, organizing, presenting,
analysing and interpreting collected data. Statistics is the study of data, from its rawest form to its
relevance to daily life. Data are everywhere; they are observable or measurable. With the daily
advancement of technology, data can be accessed anywhere and by anyone. When data are correct, valid
analysis and interpretation can be generated to produce valuable information.
There are many classifications of data. Different kinds of data are collected, analysed and
interpreted. Being able to differentiate them is the first thing that must be considered when organizing
data. Data (Asaad, 2004) are the quantities (numbers) or qualities (attributes) measured or
observed that are to be collected and/or analysed. A collection of data is called a data set.
Divisions of Statistics
1. Descriptive Statistics – involves collecting, organizing, describing, summarizing and
presenting gathered data in a meaningful and informative way; it is based on easily verifiable
facts.
2. Inferential Statistics – refers to the process of drawing conclusions and making decisions about the
population based on evidence obtained from a sample, using the techniques of descriptive
statistics. The backbone of inferential statistics is descriptive statistics. Inferential statistics
includes estimation and hypothesis testing.
1 GEC 4
Aldersgate College Mathematics in the Modern World
College of Arts, Sciences and Education
5. Is there a significant difference between the mean GPAs of CA, HRM, CDA and HRIM
students?
Categories of Data:
Categorical Data – data measured on the nominal and ordinal scales; these are analysed using
non-parametric statistics.
Continuous Data – data measured on the interval and ratio scales; these are analysed using parametric statistics.
Levels of Measurement
Nominal scale – consists of a finite set of possible values having no particular
order. It classifies qualitative data into two or more categories. It is the lowest level of
measurement. Some examples include gender, mode of transportation, nationality, occupation,
civil status, books in the library and courses in college.
Ordinal scale – a set of possible values having a specific order or rank. Some examples
include pain level, social status, attitude towards a subject, winners in a science quiz bee
and levels of anxiety.
Interval scale – involves quantitative data that are ranked and for which differences are meaningful.
Interval scales are measured on a continuum, and differences between any two numbers on the
scale are of known size. There is no true zero point for this level of measurement. Some
examples include Celsius temperature, IQ scores and calendar years.
Ratio scale – the ratio level of measurement has all the characteristics of the interval
level of measurement and also has a true zero (0) value. It is the highest level of
measurement. Some examples include weight, the time it takes to do a math project, the
number of absences of students in a class, tons of garbage, number of arrests, income and age.
Definition
Population - refers to the entire collection of people, objects, places or things under study.
Parameter - is any numerical value which describes a population.
Example: There are 8,756 students enrolled in Nursing
N = 8,756 is a parameter
Sample - is a small portion or part of a population; it represents the population in a
research study.
Statistic - is any numerical value which describes a sample.
Example: Of the 8,756 students enrolled in Nursing, 2,893 are male
n = 2,893 is a statistic
Definition
Data - are facts, or a set of information gathered or under study.
Quantitative Data - are numerical in nature, so meaningful arithmetic can be done on them.
They involve numbers and can be obtained by counting or measuring.
Example: age, weekly allowance, monthly salary
Qualitative Data - are attributes which cannot be subjected to meaningful arithmetic.
These are attributes or characteristics such as sex, educational attainment, feelings or opinions.
Example: gender, size of T-shirt, brand of car
Definition
Quantitative or numerical data gathered about the population or sample can be further
classified as either discrete or continuous.
Discrete Data - assume exact values only and can be obtained by counting.
Example: number of students, score in an examination, number of books on a shelf
Continuous Data - assume infinitely many values within a specified interval and can be obtained by
measurement.
Example: height of a PBA player, length of waistline
Definition
Constant - is a characteristic or property of a population or sample which makes the members
similar to each other.
Example: Gender in a class of all boys is constant
Variable - is a characteristic or property of population or sample which makes the members
different from each other.
Example: Gender in a coed school is variable
Researchers are not interested in constants since they do not make the subjects of research
different from one another. They are specifically interested in variables.
Definition
In statistics, variables can also be classified as either independent or dependent.
Dependent. A variable which is affected by another variable.
Example: test scores
Independent. A variable which affects the dependent variable.
Example: number of hours spent in studying
Learning Activity:
Activity 16:
A. Determine the level of measurement of the following:
1. Civil status of a man. Nominal Scale
2. Students’ scores on the final examinations. Interval Scale
3. The citizenship of a person. Nominal Scale
4. The time spent in the internet café of a student. Ratio Scale
5. The classification of students by state of birth. Nominal Scale
6. The rating given by the students to his professor. Ordinal Scale
7. Rank of faculty. Ordinal Scale
8. Temperature in Baguio last December. Interval Scale
9. Colour of the eye. Nominal Scale
10. Number of typewriters in a room. Ratio Scale
Activity 17:
Direction: For each of the following research titles, give the target population (the respondents) and identify a
possible sample (taken from the target population):
1. The attributes of the most likeable professors according to students
Population: Aldersgate College Students
Sample: Room
2. A survey on the most popular TV game show in Metro Manila
Population: Metro Manila
Sample: Barangay
3. The opinions of Catholic parishioners about divorce
Population: Catholic People
Sample: Church
4. The study habits of private and public high school students in selected schools in Metro Manila
Population: Public and Private School Students
Sample: School in Metro Manila
5. The degree of parents’ satisfaction regarding the quality of education their children get from
Catholic colleges and universities in the Philippines.
Population: Parents
Sample: Philippines
Learning Focus:
In research, a larger sample generally gives more reliable results. Slovin’s
formula is therefore only a guide for obtaining the minimum number of samples. You can take more than what
the formula suggests but not fewer.
Another important consideration in survey research is the type of sampling done. Once you know how
to compute the appropriate sample size, your next concern is how to select the sample from the
population. This activity is referred to as sampling.
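As a rough illustration of how Slovin’s formula, n = N / (1 + N e^2), turns a population size and a margin of error into a sample size, here is a minimal Python sketch. The function name `slovin_sample_size` and the values N = 1,000 and e = 5% are hypothetical and chosen only for illustration. The result is rounded up, on the assumption that the computed value is a minimum.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e**2), rounded up to a whole respondent."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    # Round up so the sample is never smaller than the formula suggests.
    return math.ceil(n)


# Hypothetical survey: N = 1,000 and e = 5% (illustrative values only).
print(slovin_sample_size(1000, 0.05))  # 286
```

Here 1,000 / (1 + 1,000 × 0.0025) = 1,000 / 3.5 ≈ 285.71, which is rounded up to 286 respondents.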
Sampling Techniques
[Figure: Schematic diagram of the two types of sampling techniques]
A. Probability Sampling: Samples are chosen in such a way that each member of the population has a
known, though not necessarily equal, chance of being included in the sample.
Another process that could be used is equal allocation. This procedure chooses
equal sample sizes from the different subgroups or strata.
4. Cluster Sampling. Cluster sampling is sometimes called area sampling because it is usually
applied when the population is large.
In this technique, groups or clusters instead of individuals are randomly chosen.
B. Non-Probability Sampling: Each member of the population does not have a known chance of being
included in the sample. Instead, personal judgment plays a very important role in the selection.
Now that you already know how to get the acceptable number of samples from the target
population, your next step is to focus on how to gather the information or data which you need from
your samples or research subjects.
The following is a diagram of the four popular data-gathering techniques and their advantages and
disadvantages.
[Figure: Data Gathering Techniques]
Data Gathering Techniques
a) Direct or Interview: In this method, the researcher has direct contact with the respondents.
Example: a researcher interviews respondents regarding their stand or view on a particular
issue.
d) The Experimental Method. This method of gathering data is used to find out cause and effect
relationships.
Example: the researcher wants to know if ELEMSTA Online will increase the academic
performance of the students. He/she has to do the following: Get two ELEMSTA classes of
equal intelligence. Give an ordinary classroom lecture to one group while enrolling the other group
online. At the end of the term, give the same test to both groups, compare their scores and, with
the use of some statistical tools, find out if their academic performances are significantly
different.
Learning Activity:
Activity 18:
A. Solve for the sample size (n), using Slovin’s formula, $n = \dfrac{N}{1 + N e^{2}}$, with your complete
solutions. (15 points)
N = 10,000 and e = 5% n=
N = 20,000 and e = 4% n=
N=15,000 and e = 3% n=
N=25,000 and e = 2% n=
N=30,000 and e = 1% n=
B. Classify each sample as random, systematic, stratified, or cluster.
1. School supervisors are selected using random numbers to determine common characteristics of excellent teachers. Answer: Random
3. In a province, municipal health officers of the 16 towns were asked to answer questions on the recent flu epidemic. Answer: Stratified
5. All salesladies of the ladies department of three big department stores in a city are interviewed about customer preferences. Answer: Cluster
7. Every fifth car is checked for smoke belching. Answer: Systematic
8. A dean decided to take the same proportion of male and female instructors in his college to determine the teaching method they frequently employed. Answer: Stratified
10. Every fiftieth product is checked for damages. Answer: Systematic
11. Students are selected using random numbers in order to determine their favorite telenovela. Answer: Random
13. Police officers in a city are divided into two groups according to gender. Twenty are selected from each group and are interviewed to determine the crimes most frequently committed by minors. Answer: Stratified
16. Every tenth female shopper is asked what products she bought at the health and beauty shop. Answer: Systematic
18. In a city, all doctors of two hospitals were asked to answer a questionnaire on the most common fatal illness. Answer: Cluster
C. Determine what data gathering technique was portrayed in each of the following situations.
1. A researcher interviews respondents regarding their stand or view on a particular issue. Answer: Direct
2. A researcher makes a survey regarding the opinion of CSB students on the implementation of the dress code. Answer: Direct
4. If a researcher wants to know the number of registered cars, s/he just has to go to the Land Transportation Office. Answer: Registration Method
6. An agriculturist treats his crops with different fertilizers, and then waits to see which crop yields the greater harvest. Answer: Experimental Method
7. A TV network uses texting and telephone calls in collecting data regarding viewers’ choice for Miss Philippines. Answer: Registration Method
9. The COMELEC held a general registration for all qualified voters. Answer: Registration Method
10. Before a patient undergoes medical treatment, a nurse does a pre-treatment chat with the patient. Answer: Direct
12. To determine the level of mathematical skills of the third year students of San Mariano High School, the mathematics head teacher administered a 30-item problem solving exam to all the junior classes. Answer: Direct
15. Births and deaths are required to be registered at the National Census. Answer: Registration Method
16. Health and disease specialists are working to find out the cause of the spread of swine flu in the Philippines. Answer: Experimental Method
Learning Objectives:
At the end of the lesson, students should be able to:
a. organize data in tables;
b. solve for the statistical data;
c. represent tables by graphs;
d. read and interpret tables and graphs; and
e. develop orderliness and neatness in presenting data.
Learning Focus:
The data gathered shall be presented, analyzed and interpreted in a way that can be easily understood by
the reader. Data may be presented in textual, tabular or graphical form, or a combination of these.
b. Tabular Form – This is a table that shows data arranged into different classes, and the number
of cases which fall into each class. This form provides numerical facts in a more concise and
systematic manner. Statistical tables are constructed to facilitate the analysis of relationships.
Example:
SUMMARY OF ENROLMENT 2002-2003
Year Level     Boys    Girls   Total
First Year     480     501     981
Second Year    420     465     885
Third Year     306     323     629
Fourth Year    273     290     563
Total          1479    1579    3058
c. Graphical Form – This form is the most effective means of organizing and presenting statistical
data because the important relationships are brought out more clearly and creatively in visually
solid and colourful figures.
Frequency Distribution
Definition of Terms:
Less than Cumulative Frequency (<cf) – accumulated from the lowest class to the highest
class.
Greater Than Cumulative Frequency (>cf) – accumulated from the highest class to the
lowest class.
h. Class width – is obtained by dividing the range by the number of classes.
i. Class Mark or Class Midpoint – obtained by taking the average of the upper and lower class
limits.
Example:
Organize a frequency table according to the classification of 10,000 registered voters by
Political Affiliation. KBL (4,500), Liberal (2,700), Nacionalista (1,800), Independent (1,000)
Solution:
Distribution of Registered Voters by Political Affiliation
Political Party    f             %
KBL                4,500         45
Liberal            2,700         27
Nacionalista       1,800         18
Independent        1,000         10
Total              n = 10,000    100
Example:
In a survey to determine the number of pets in the households of 27 occupants of a subdivision,
the following data were obtained.
1 2 3 2 1 0 2 2 3
2 1 5 3 2 4 1 1 4
0 1 2 4 1 3 1 0 3
Construct the appropriate frequency distribution for the given data.
Solution:
Distribution of Number of Household Pets of the Twenty-Seven Occupants in a Subdivision
No. of Pets   Tally       Frequency (f)   Percentage (%)   <cf   >cf
0             III         3               11               3     27
1             IIIII-III   8               30               11    24
2             IIIII-II    7               26               18    16
3             IIIII       5               18               23    9
4             III         3               11               26    4
5             I           1               4                27    1
Total                     N = 27          100
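The tally above can also be reproduced programmatically. The following is a minimal Python sketch, not part of the original module, that counts each value and accumulates the less than and greater than cumulative frequencies. The function name `frequency_table` is a hypothetical choice for illustration.

```python
from collections import Counter

# Pet counts for the 27 households, copied from the example above.
pets = [1, 2, 3, 2, 1, 0, 2, 2, 3,
        2, 1, 5, 3, 2, 4, 1, 1, 4,
        0, 1, 2, 4, 1, 3, 1, 0, 3]


def frequency_table(data):
    """Return rows of (value, f, percent, less_than_cf, greater_than_cf)."""
    counts = Counter(data)
    n = len(data)
    rows = []
    running = 0
    for value in sorted(counts):
        f = counts[value]
        greater_cf = n - running   # cases at or above this value
        running += f               # cases at or below this value
        rows.append((value, f, 100 * f / n, running, greater_cf))
    return rows


for value, f, pct, less_cf, greater_cf in frequency_table(pets):
    print(f"{value:>3} {f:>3} {pct:6.1f} {less_cf:>4} {greater_cf:>4}")
```

The frequencies and both cumulative columns agree with the table; the percentages are printed to one decimal place, whereas the table rounds them to whole numbers.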
12. Compute the less than or greater than cumulative frequency (<cf or >cf).
Example:
Solution:
1. H = 77, L = 31, R = 77 - 31 = 46
2. The desired number of class intervals is 10; the class interval size is 46/10 = 4.6, rounded off to 5.
3. Start with 30, which is a multiple of 5.
4. Form the intervals 30-34, 35-39, and so on, until the interval 75-79, which contains the highest
score.
5. Form the tally sheet and give a summary of the frequencies.
6. The class marks are 32, 37, 42 and so on.
7. The class boundaries are 29.5-34.5, 34.5-39.5, 39.5-44.5 and so on.
8. The relative frequencies are (3/54)100 = 5.55, 7.40, 14.81 and so on.
9. The less than cumulative frequencies (<cf) are 3, 7, 15 and so on.
10. The greater than cumulative frequencies (>cf) are 54, 51, 47 and so on.
Frequency Distribution of the Mathematics Scores of 54 Students in a High School Senior Class
Class Interval (c.i)   Frequency (f)   Class Mark (x)   Class Boundaries (c.b)   Relative Frequency (%)   <cf   >cf
30-34                  3               32               29.5-34.5                5.55                     3     54
35-39                  4               37               34.5-39.5                7.40                     7     51
40-44                  8               42               39.5-44.5                14.81                    15    47
45-49                  11              47               44.5-49.5                20.37                    26    39
50-54                  9               52               49.5-54.5                16.66                    35    28
55-59                  7               57               54.5-59.5                12.96                    42    19
60-64                  5               62               59.5-64.5                9.25                     47    12
65-69                  2               67               64.5-69.5                3.70                     49    7
70-74                  4               72               69.5-74.5                7.40                     53    5
75-79                  1               77               74.5-79.5                1.85                     54    1
Some readers find a graphical presentation of data easier to comprehend than data
presented in tabular form. A graph adds life and beauty to one’s work, but more than this, it helps
facilitate comparison and interpretation without going through the numerical data.
Types of Graph
Bar Chart. A graph represented by either vertical or horizontal rectangles whose bases
represent the categories or class intervals and whose heights represent the frequencies. It is used for discrete
variables.
Histogram. A graph represented by vertical or horizontal rectangles whose bases span the
class boundaries, centered on the class marks, and whose heights are the frequencies. It is used for continuous variables.
Frequency Polygon. This is the line version of the histogram. It is a line graph whose points are
plotted at the class marks, with heights equal to the frequencies. It is used for continuous variables.
The less than and greater than ogives. The less than ogive is constructed by plotting the <cf
against the upper class boundaries. The greater than ogive is constructed by plotting
the >cf against the lower class boundaries. The graphs are used to estimate the number of cases
falling below or above any given value.
Pie chart. A circle graph showing the proportion of each class, through the relative or
percentage frequency.
Multiple Graph. This is a combination of several graphs used to compare the features or
behaviours of two or more groups. The most appropriate for this is either a bar graph or a line graph.
Learning Activity:
A. For each of the following class intervals, give the midpoint or the class mark(x) and the class
width(i). (20 points)
Class interval(ci) Class width( i ) Class Mark(x) Class Boundaries(c.b)
a. 4 — 8
b. 35 — 44
c. 110 —120
d. (-5) — (-1)
e. (-3) — (1)
Frequency Distribution
A. Randomly selected customers were asked about the use of a certain toothbrush. Below were
their responses. Tabulate the following data through a frequency table then interpret.
C. Below are monthly data on sales of a department store. What type of graph will best represent
the data? Draw the graph and give a brief explanation.
Month        Sales (in thousand pesos)   Sales Percentage
January      200                         3.2
February     400                         6.4
March        600                         9.6
April        500                         8
May          750                         12
June         750                         12
July         450                         7.2
August       400                         6.4
September    350                         5.6
October      300                         4.8
November     550                         8.8
December     1,000                       16
Total        6,250                       100
Learning Focus:
Averages occur regularly in our daily life, and the average is an important tool in statistics. A well-chosen
average consists of a single number about which a given set of data is centred. There are several
different types of averages, sometimes called measures of central tendency. Measures of Central
Tendency are numerical descriptive measures which indicate or locate the center of the distribution of a set
of data. These include the mean, median and mode.
Properties of Mean
a) The sum of the deviations of all measurements in a set from the mean is 0.
b) It can be calculated for any set of numerical data, so it always exists.
c) A set of numerical data has one and only one mean.
d) It lends itself to higher statistical treatment.
e) It is the most reliable since it takes into account every item in the set of data.
f) It is greatly affected by extreme or deviant values.
g) It is used only if the data are interval or ratio and when normally distributed.
Example:
A researcher collects data on the ages of recipients of doctoral degree in science and
engineering, and his study yields the following:
37 41 37 33 24 27 28 43 44 36
Solution:
The mean is determined by adding the ages and then dividing the sum by the total number of
recipients.
Let $\bar{x}$ = mean
$\bar{x} = \dfrac{\sum x}{n}$
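To carry the formula through for the ten ages listed in this example, here is a minimal Python sketch (not part of the original module) that sums the values and divides by their count, then cross-checks the result with the standard library's `statistics.mean`.

```python
from statistics import mean

# Ages of the ten doctoral degree recipients from the example above.
ages = [37, 41, 37, 33, 24, 27, 28, 43, 44, 36]

total = sum(ages)       # sum of x = 350
n = len(ages)           # n = 10
x_bar = total / n       # mean = sum of x divided by n

print(total, n, x_bar)  # 350 10 35.0
```

The sum of the ages is 350, so the mean age is 350 / 10 = 35.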
The formula for finding the weighted mean for ungrouped data:
The weighted mean: $\bar{x} = \dfrac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n}{w_1 + w_2 + \dots + w_n}$
A very good example of weighted mean is the computation of your term Grade Point Average (GPA). In
this case, weight is the number of units.
Example 1. Get the Grade Point Average (GPA) of a particular student whose grades are as follows:
Subject       Grade (x)   Unit (w)
Statistics    4.0         3
English       2.0         3
Accounting    1.0         5
P.E.          1.5         2
$\bar{x} = \dfrac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n}{w_1 + w_2 + \dots + w_n}$
$\bar{x} = \dfrac{4(3) + 2(3) + 1(5) + 1.5(2)}{3 + 3 + 5 + 2}$
$\bar{x} = \dfrac{12 + 6 + 5 + 3}{13}$
$\bar{x} = \dfrac{26}{13} = 2$
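The same GPA computation can be sketched in Python. This is an illustrative sketch only; the function name `weighted_mean` is a hypothetical helper, and the grades and units are copied from the example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of x*w divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Grades (x) and units (w) for Statistics, English, Accounting and P.E.
grades = [4.0, 2.0, 1.0, 1.5]
units = [3, 3, 5, 2]

print(weighted_mean(grades, units))  # 2.0
```

Because Accounting carries 5 units, its grade of 1.0 pulls the GPA further than the 2-unit P.E. grade does.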
Median:
The median is the midpoint of the data array. Before finding this value, the data must be arranged
in order, from least to greatest or vice versa. The median will either be a specific value or will fall
between two values.
The middle value, when a set of data is arranged either in ascending or descending order, is
called the median.
It is the value in a distribution of scores that separates the top half from the bottom half. If there is an
odd number of cases, it is the middle observation when all of them are ranked according to
size. It is that value which has 50% of the remaining cases above it and 50% below it. If there is
an even number of cases, it is the arithmetic mean of the two middle values.
Properties of median:
a) It is the score or class in a distribution below which 50% of the scores fall and above which
another 50% lie.
b) It is not affected by extreme or deviant values.
c) It is appropriate to use when there are extreme or deviant values.
d) It is used when data are ordinal.
e) It exists for both quantitative and qualitative (ordinal) data.
If n is odd, $\tilde{x}$ = the $\left[\dfrac{n + 1}{2}\right]^{\text{th}}$ item in the distribution, or simply the middle value.
If n is even, $\tilde{x} = \dfrac{m_1 + m_2}{2}$, where $m_1$ and $m_2$ are the two middle values.
Example 1:
Seven mothers were selected and given a blood pressure check. Their systolic pressures were
recorded below.
135 121 119 116 130 121 131
Find their median.
Solution:
Arrange the data in order.
116 119 121 121 130 131 135
Example 2:
Find the median of the following weights in kilos.
101, 107, 115, 120, 111, 105
Solution:
Arrange the given data in ascending order (smallest to largest).
101, 105, 107, 111, 115, 120
Since there is no single middle value, we get the sum of the 2 middle values
and divide by 2.
Let $m_1 = 107$ and $m_2 = 111$.
$\tilde{x} = \dfrac{m_1 + m_2}{2} = \dfrac{107 + 111}{2} = \dfrac{218}{2} = 109$
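Both median examples can be checked with a short Python sketch. This is an illustration rather than part of the original module, and the helper name `median_of` is hypothetical. It sorts the data, takes the middle item when n is odd, and averages the two middle items when n is even.

```python
def median_of(data):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                         # the ((n + 1) / 2)th item
    return (ordered[middle - 1] + ordered[middle]) / 2  # (m1 + m2) / 2


systolic = [135, 121, 119, 116, 130, 121, 131]  # Example 1, n = 7 (odd)
weights_kg = [101, 107, 115, 120, 111, 105]     # Example 2, n = 6 (even)

print(median_of(systolic))    # 121
print(median_of(weights_kg))  # 109.0
```

For Example 1 the fourth value of the ordered list, 121, is the median; for Example 2 the result matches the 109 obtained above.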
Properties of Mode
a) It is used when you want to find the value which occurs most often.
b) It is a quick approximation of the average.
c) It is an inspection average.
d) It is the most unreliable among the three measures of central tendency because its value is
undefined for some sets of observations.
Example:
1) The following are the descriptive evaluations of 5 teachers: VS, S, VS, VS and S. Mode: VS
2) The ages of five students are: 17, 18, 23, 20 and 19. Mode: None
3) The grades of five students are: 4.0, 3.5, 4.0, 3.5 and 1.0. Mode: 3.5 and 4.0
4) The weights of five persons in pounds are: 117, 218, 233, 120 and 117. Mode: 117
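The four cases above (one mode, no mode, two modes, one mode) can be reproduced with a small Python sketch. The helper `modes` is a hypothetical function written for this illustration. It follows the convention used in these examples that a data set in which no value repeats has no mode, which is an assumption about the intended rule.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes(["VS", "S", "VS", "VS", "S"]))   # ['VS']
print(modes([17, 18, 23, 20, 19]))           # []
print(modes([4.0, 3.5, 4.0, 3.5, 1.0]))      # [3.5, 4.0]
print(modes([117, 218, 233, 120, 117]))      # [117]
```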
Example:
Below are the post test scores of the 45 pupil respondents after their exposure to an
Individually Guided Instruction.
Find the mean, median and the mode of the given data.
D4: =D3/D2
D5: =median(A2:A46)
D6: =mode(A2:A46)
Sample mean ($\bar{x}$): $\bar{x} = \dfrac{\sum f X_m}{n}$
Example:
Below is the frequency distribution of the scores of 40 students. The steps are to get the class mark ($X_m$),
then get the product of the class mark and the frequency ($f X_m$).
Class interval   f        Xm      f·Xm
16-23            1        19.5    19.5
24-31            3        27.5    82.5
32-39            6        35.5    213.0
40-47            12       43.5    522.0
48-55            10       51.5    515.0
56-63            8        59.5    476.0
                 n = 40           ∑ f·Xm = 1828
$\bar{x} = \dfrac{\sum f X_m}{n} = \dfrac{1828}{40} = 45.7$
Therefore, the mean is 45.7.
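The grouped-mean computation can be sketched in Python as follows. This is an illustration only; the function name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are assumptions made for the sketch.

```python
# (lower limit, upper limit, frequency) for each class in the example above.
classes = [(16, 23, 1), (24, 31, 3), (32, 39, 6),
           (40, 47, 12), (48, 55, 10), (56, 63, 8)]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * class mark, divided by n."""
    n = sum(f for _, _, f in classes)
    # The class mark is the average of the lower and upper class limits.
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


print(round(grouped_mean(classes), 2))  # 45.7
```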
Example 1:
Step 2: Determine the number of measurement/ data (n), by doing the following:
a. Use sum formula to add all the frequencies (f)
b. Then press enter.
Median:
For grouped data, add a column for the less than cumulative frequency (<cf). Use the following formula:
$\text{Median} = L + \left[\dfrac{\frac{n}{2} - {<}cf}{f_m}\right] i$
where
L – is the lower class boundary of the median class
<cf – is the less than cf of the class just above (preceding) the median class
fm – is the frequency of the median class
i – is the class width
Example: Given a grouped frequency distribution of the age of patients in the ICU of XYZ Hospital.
Determine the median age of the patients.
The Age Distribution of Patients in the ICU of XYZ Hospital
Class interval   f        Xm      <cf
16-23            1        19.5    1
24-31            3        27.5    4
32-39            6        35.5    10
40-47            12       43.5    22   ← Median Class
48-55            10       51.5    32
56-63            8        59.5    40
                 N = 40
Solution:
Step 1: Determine $\frac{n}{2}$: $\frac{n}{2} = \frac{40}{2} = 20$.
Step 2: Look for $\frac{n}{2}$ in the column of the <cf.
Referring to the preceding table, you have the following pieces of information:
a) The median class is (40-47) because $\frac{n}{2}$ = 20, which is less than 22 in the <cf column but greater than the preceding <cf of 10.
b) L is the lower class boundary of the median class, which is 39.5.
c) <cf is the less than cumulative frequency of the class just above the median class, which is 10.
d) i is the class width, which is 8.
e) fm is the frequency of the median class, which is 12.
$\text{Median} = L + \left[\dfrac{\frac{n}{2} - {<}cf}{f_m}\right] i = 39.5 + \left[\dfrac{\frac{40}{2} - 10}{12}\right] 8 = 46.17$
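The steps for the grouped median can be sketched in Python. This is a minimal illustration, not the module's own procedure. The function name `grouped_median` is hypothetical, and the sketch assumes integer class limits, so that the lower class boundary is the lower limit minus 0.5 and the class width is upper limit minus lower limit plus 1.

```python
# (lower limit, upper limit, frequency) for the ICU age distribution above.
classes = [(16, 23, 1), (24, 31, 3), (32, 39, 6),
           (40, 47, 12), (48, 55, 10), (56, 63, 8)]


def grouped_median(classes):
    """Median = L + ((n/2 - <cf) / fm) * i, assuming integer class limits."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_before = 0                      # <cf of the classes before the current one
    for lower, upper, f in classes:
        if cf_before + f >= half:      # first class whose <cf reaches n/2
            boundary = lower - 0.5     # lower class boundary L
            width = upper - lower + 1  # class width i
            return boundary + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("classes must contain at least one observation")


print(round(grouped_median(classes), 2))  # 46.17
```

The loop stops at the class 40-47, where the cumulative frequency first reaches 20, and then applies the formula with L = 39.5, <cf = 10, fm = 12 and i = 8.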
Example 1:
Step 1: Determine the number of measurement/data (n).
Use sum formula to add all the frequencies (f)
Where:
fm = 9
<cf = 26
i=5
L = 49.5
n = 54
Solution:
Substitution:
$\text{Median} = 49.5 + \left[\dfrac{\frac{54}{2} - 26}{9}\right] 5$
$= 49.5 + \left[\dfrac{27 - 26}{9}\right] 5$
$= 49.5 + \left[\dfrac{1}{9}\right] 5$
$= 49.5 + [0.1111]\,5$
$= 49.5 + 0.556$
Median $= 50.056$
Mode:
$Mo = L_{mo} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i$
where:
$\Delta_1$ – difference between the highest frequency and the frequency above it (the preceding class)
$\Delta_2$ – difference between the highest frequency and the frequency below it (the following class)
Lmo – lower class boundary of the modal class
i – class width
Example: Given a grouped frequency distribution of the age of patients in the ICU of XYZ Hospital.
Determine the modal age of the patients.
Given:
a) Modal class is (40-47) because it is the class with the highest frequency.
b) Lmo is the lower class boundary of the modal class which is 39.5
c) The class width i is 8
d) ∆ 1= 12-6 = 6 and ∆ 2= 12 -10 = 2
Solution:
$Mo = L_{mo} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i = 39.5 + \left[\dfrac{6}{6 + 2}\right] 8 = 45.5$
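The grouped-mode formula can be sketched in Python for the same ICU age distribution. The function name `grouped_mode` is hypothetical. The sketch assumes integer class limits (boundary = lower limit minus 0.5) and a single modal class, and it treats a missing neighbouring class as having frequency 0.

```python
# (lower limit, upper limit, frequency) for the ICU age distribution.
classes = [(16, 23, 1), (24, 31, 3), (32, 39, 6),
           (40, 47, 12), (48, 55, 10), (56, 63, 8)]


def grouped_mode(classes):
    """Mo = Lmo + (d1 / (d1 + d2)) * i for the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                          # position of the modal class
    before = freqs[k - 1] if k > 0 else 0                # frequency of the class above it
    after = freqs[k + 1] if k < len(freqs) - 1 else 0    # frequency of the class below it
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    lower, upper, _ = classes[k]
    return (lower - 0.5) + d1 / (d1 + d2) * (upper - lower + 1)


print(grouped_mode(classes))  # 45.5
```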
For the sample data the modal class is class interval 45 – 49 (with the highest
frequency of 11)
∆ 1=3
∆ 2=2
i=5
Substitution:
$Mo = 44.5 + \left[\dfrac{3}{3 + 2}\right] 5$
$= 44.5 + \left[\dfrac{3}{5}\right] 5$
$= 44.5 + [0.6]\,5$
$= 44.5 + 3$
Mode = 47.5
Learning Activity:
Activity 19:
Mean of Ungrouped and Weighted Data
1. Find the mean (to two decimal places) of each of the following sets of data and answer the questions
which follow:
Table I
Number of Typing Errors Committed by Secretary A in the 24 Chapters of Book X
12 26 42 38 35 37
42 30 59 23 57 40
46 42 18 40 21 57
28 58 42 64 55 43
Mean:
Table II
Number of Typing Errors Committed by Secretary B in the 24 Chapters of Book X
43 27 43 40 42 22
13 20 33 54 41 28
22 53 28 23 22 28
32 22 22 64 57 57
Mean:
Learning Focus:
Measures of Variation
The previous section focused on averages or measures of central tendency. The averages are
supposed to be central scores of a given set of data. However, not all features of a given data set may
be reflected by the averages. For example, two different groups of 5 students are given identical
quizzes in Math. The data below represent their scores.
Group 1   Group 2
14        5
13        19
18        18
14        14
11        14
Range
The range is the simplest measure of variation to calculate. It is just the difference between the
largest and the smallest value in a given data set. For group 1, the range is 18-11 = 7. The range for
group 2 is 19 – 5 = 14. A much larger range suggests greater variation or dispersion.
The range has a disadvantage of being influenced by extreme values called outliers. Another is
that it is based on two values only. All the other values in the set are being ignored.
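A short Python sketch makes the point concrete for the two quiz groups: their means are identical, yet their ranges differ. The helper name `value_range` is a hypothetical choice for this illustration.

```python
group_1 = [14, 13, 18, 14, 11]
group_2 = [5, 19, 18, 14, 14]


def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    mean_score = sum(scores) / len(scores)
    print(name, "mean =", mean_score, "range =", value_range(scores))
# Group 1 mean = 14.0 range = 7
# Group 2 mean = 14.0 range = 14
```

Both groups average 14, so the mean alone cannot distinguish them; the range shows that the scores in Group 2 are more spread out.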
The standard deviation is the positive square root of the variance. The variance calculated from
population data is denoted by $\sigma^2$ (sigma squared) and the standard deviation by $\sigma$. The basic
formulas are:
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$ and $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
where N = population size and $\mu$ = population mean
Example:
The final exam scores of 5 students were 80, 88, 92, 90 and 85. Determine the variance and
standard deviation.
Solution:
Find the mean (μ).
$$\mu = \frac{\sum x}{N} = \frac{88+80+92+90+85}{5} = \frac{435}{5} = 87$$
Subtract the mean from each individual score (x − μ).
Score   (x − μ)
88      1
80      -7
92      5
90      3
85      -2
Square each of the differences, (x − μ)².
Score   (x − μ)   (x − μ)²
88      1         1
80      -7        49
92      5         25
90      3         9
85      -2        4
Get the sum of (x − μ)².
$$\sum (x-\mu)^2 = 1+49+25+9+4 = 88$$
$$\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{88}{5} = 17.6 \rightarrow \text{variance}$$
$$\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}} = \sqrt{\frac{88}{5}} = \sqrt{17.6} \approx 4.2 \rightarrow \text{standard deviation}$$
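The steps of this solution (mean, deviations, squared deviations, sum, divide by N) can be sketched in Python. This is a minimal illustration that treats the five scores as the whole population; the function name `population_variance` is an assumption made for the example.

```python
import math

# Final exam scores from the example.
scores = [88, 80, 92, 90, 85]


def population_variance(values):
    """Mean of the squared deviations from the population mean."""
    mu = sum(values) / len(values)
    squared_deviations = [(x - mu) ** 2 for x in values]
    return sum(squared_deviations) / len(values)


variance = population_variance(scores)
std_dev = math.sqrt(variance)  # positive square root of the variance

print("variance =", round(variance, 2))
print("standard deviation =", round(std_dev, 2))
```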
The following formulas are shortcut formulas for computing the variance and standard
deviation of a sample. They are algebraically equivalent to the definitional formulas for the sample
variance and sample standard deviation, which divide by n − 1 rather than N. They save time because
they avoid the repeated subtracting and squaring required by the original formulas. These shortcut
formulas will be used mostly in this book.
Shortcut Formulas
$$s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} \qquad s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$$
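Example: The weights of nine basketball players are recorded as follows (in pounds). Find the
variance and standard deviation.
206 215 305 297 265 282 301 255 261
Solution:
$$s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} = \frac{9(643391) - (2387)^2}{9(8)} = \frac{5790519 - 5697769}{72} = \frac{92750}{72} = 1288.19$$
$$s = \sqrt{s^2} = \sqrt{1288.19} = 35.89$$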
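The shortcut computation for the basketball players can be sketched as follows. The function name `sample_variance_shortcut` is illustrative; the formula is the one with n(n − 1) in the denominator, as implied by the division by 72 in the solution.

```python
import math

# Weights (in pounds) of the nine basketball players.
weights = [206, 215, 305, 297, 265, 282, 301, 255, 261]


def sample_variance_shortcut(values):
    """Sample variance via [n * sum(x^2) - (sum x)^2] / [n * (n - 1)]."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


s2 = sample_variance_shortcut(weights)
s = math.sqrt(s2)

print("s^2 =", round(s2, 2))
print("s   =", round(s, 2))
```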
Example 1A :
For 108 randomly selected high school students, the following IQ frequency distribution was
obtained.
Class Limits Frequency
90-98 6
99-107 22
108-116 43
117-125 28
126-134 9
Find the variance and standard deviation.
Solution:
Step 1: Make a table. Find the midpoints of each class. Multiply the midpoints by the frequency for each
class.
Class Limits Frequency xm f . xm
90-98 6 94 564
99-107 22 103 2266
108-116 43 112 4816
117-125 28 121 3388
126-134 9 130 1170
Step 2: Multiply the frequency by the square of the midpoint for each class.
Class Limits Frequency xm f·xm xm² f·xm²
Step 3: Find the sums of columns 2, 4 and 6. Substitute them in the formula for s².
$$s = \sqrt{s^2} = \sqrt{82.25} = 9.07$$
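Steps 1 to 3 can be sketched in Python. This assumes the shortcut formula is applied with the class midpoints weighted by frequency, that is, s² = [n∑f·xm² − (∑f·xm)²] / [n(n − 1)]; the variable names are illustrative.

```python
import math

# (class limits, frequency) pairs from the IQ frequency distribution.
classes = [((90, 98), 6), ((99, 107), 22), ((108, 116), 43),
           ((117, 125), 28), ((126, 134), 9)]

n = sum(f for _, f in classes)   # column 2: total frequency
sum_fx = 0                       # column 4: sum of f * xm
sum_fx2 = 0                      # column 6: sum of f * xm^2

for (lower, upper), f in classes:
    xm = (lower + upper) / 2     # class midpoint
    sum_fx += f * xm
    sum_fx2 += f * xm ** 2

# Assumed grouped-data form of the shortcut formula.
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(s2)

print("n =", n, "sum f*xm =", sum_fx, "sum f*xm^2 =", sum_fx2)
print("s^2 =", round(s2, 2), "s =", round(s, 2))
```

Under these assumptions the variance comes out at about 82.26, which agrees with the 82.25 quoted above up to rounding, and the standard deviation rounds to 9.07.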
Example 1:
An IQ test has a mean of 105 and a standard deviation of 20. Find the corresponding z score
for each IQ.
a. 88 b. 122 c. 110
Solution:
a. $z = \frac{x-\bar{x}}{s} = \frac{88-105}{20} = -0.85$
b. $z = \frac{x-\bar{x}}{s} = \frac{122-105}{20} = 0.85$
c. $z = \frac{x-\bar{x}}{s} = \frac{110-105}{20} = 0.25$
Example 2:
Which of the following exam grades has a better relative position?
A grade of 43 on an Algebra test with a mean of 40 and s = 3
Or
A grade of 75 on a Geometry test with a mean of 72 and s = 5?
Solution:
For a grade of 43: $z = \frac{x-\bar{x}}{s} = \frac{43-40}{3} = 1$
For a grade of 75: $z = \frac{x-\bar{x}}{s} = \frac{75-72}{5} = 0.6$
Since the z score for the Algebra test is larger, the position in the Algebra test is higher than the
position in the Geometry test.
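Both z-score examples can be reproduced with one small function. This is a minimal sketch; the name `z_score` and its parameter names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Example 1: IQ test with mean 105 and standard deviation 20.
iq_z = [z_score(x, 105, 20) for x in (88, 122, 110)]
print("IQ z-scores:", iq_z)

# Example 2: compare relative positions on two different tests.
algebra_z = z_score(43, 40, 3)
geometry_z = z_score(75, 72, 5)
better = "Algebra" if algebra_z > geometry_z else "Geometry"
print("Algebra z =", algebra_z, "Geometry z =", geometry_z, "-> better position:", better)
```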
When the data are arranged in order of magnitude (increasing or decreasing), the positions of the
quartiles are:
Q1 = 0.25(n + 1)   Q2 = 0.50(n + 1)   Q3 = 0.75(n + 1)
Example:
Find Q1, Q2 and Q3 of the following set of data.
19, 12, 16, 0, 14, 9, 6, 1, 12, 13, 10, 19, 7, 5, 8
Solution: Arrange the data from lowest to highest.
0, 1, 5, 6, 7, 8, 9, 10, 12, 12, 13, 14, 16, 19, 19
Using the formulas:
Q1 = 0.25(n + 1) = 0.25(15 + 1) = 0.25(16) = 4 → 4th data: 6
Q2 = 0.50(n + 1) = 0.50(15 + 1) = 0.50(16) = 8 → 8th data: 10
Q3 = 0.75(n + 1) = 0.75(15 + 1) = 0.75(16) = 12 → 12th data: 14
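The position formulas can be sketched in Python as below. The function name `quartile` is illustrative, and the sketch only handles the case shown in the example, where the position is a whole number; the text here does not say what to do otherwise, so the function raises an error in that case.

```python
data = [19, 12, 16, 0, 14, 9, 6, 1, 12, 13, 10, 19, 7, 5, 8]


def quartile(values, q):
    """Return the q-th quartile (q = 1, 2 or 3) using position = (q/4)(n + 1)."""
    ordered = sorted(values)
    position = (q / 4) * (len(ordered) + 1)
    if position != int(position):
        # Not covered by the worked example, so it is left unhandled here.
        raise ValueError("position is not a whole number")
    return ordered[int(position) - 1]  # positions are counted from 1


q1, q2, q3 = (quartile(data, q) for q in (1, 2, 3))
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
```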
Percentiles:
Percentiles are position measures used in educational and health-related fields to indicate the
position of an individual in a group. They are symbolized by P1, P2, P3, ..., P99 and divide the
distribution into 100 groups.
The percentile corresponding to a given value x is computed by using the formula:
$$\text{percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100\%$$
Example 1: Find the percentile rank of a test score of 49 in the data set.
12, 28, 35, 42, 47, 49, 50
Solution: Arrange the data in order from lowest to highest. Then substitute in the formula.
Let x = 49. Five values (12, 28, 35, 42 and 47) are below 49.
$$\text{percentile} = \frac{(\text{number of values below } 49) + 0.5}{\text{total number of values}} \times 100\% = \frac{5 + 0.5}{7} \times 100\% = \frac{5.5}{7} \times 100\% = 78.57\%$$
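The percentile rank formula translates directly into code. The sketch below uses the data of Example 1; the function name `percentile_rank` is illustrative.

```python
test_scores = [12, 28, 35, 42, 47, 49, 50]


def percentile_rank(values, x):
    """((number of values below x) + 0.5) / (total number of values) * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100


rank_49 = percentile_rank(test_scores, 49)
print("percentile rank of 49 =", round(rank_49, 2), "%")
```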
The next example illustrates how to find the value corresponding to a given percentile.
Example 2: The following are scores in a Statistics test:
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Find the value corresponding to the 25th percentile.
Solution: Arrange the data set from lowest to highest. Compute $c = \frac{np}{100}$, where n is the total
number of values and p is the percentile.
$$c = \frac{np}{100} = \frac{(10)(25)}{100} = 2.5$$
Since c is not a whole number, round it up to the next whole number; in this case, c = 3.
Therefore, the 25th percentile is the 3rd value, which is 5.
Deciles:
Deciles divide the distribution into tenths, or 10 equal parts. A data set has nine deciles, which are
denoted by D1, D2, D3, ..., D9. The first decile, D1, is the number that divides the bottom
10% of the data from the top 90%. To obtain the deciles, divide the data set into tenths and then
determine the numbers dividing the tenths.
Note that the second quartile, fifth decile, and fiftieth percentile of a data set are all the same
and all equal to the median.
Median = Q2 = D5 = P50.
Similarly, Q1 = P25, D1 = P10 and Q3 = P75.
Example 1:
Find the value corresponding to the 60th percentile for the given data set.
80 68 53 58 76 73 85 88 91 79
Solution: Arrange the data from lowest to highest.
53 58 68 73 76 79 80 85 88 91
Using the formula:
$$c = \frac{np}{100} = \frac{(10)(60)}{100} = 6$$
Since the value of c is a whole number, use the value halfway between the 6th and 7th values
(positions c and c + 1) when counting from the lowest value.
53 58 68 73 76 79 80 85 88 91
The value halfway between 79 and 80 is $\frac{79+80}{2} = 79.5$. Hence, 79.5 corresponds to the 60th
percentile.
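The two cases of the c = np/100 rule (round up when c is not whole, average the c-th and (c + 1)-th values when it is) can be sketched together. The function name `percentile_value` is illustrative, and the sketch assumes 0 < p < 100.

```python
import math


def percentile_value(values, p):
    """Value at the p-th percentile using c = n * p / 100."""
    ordered = sorted(values)
    c = len(ordered) * p / 100
    if c == int(c):
        c = int(c)
        # Whole number: halfway between the c-th and (c + 1)-th values.
        return (ordered[c - 1] + ordered[c]) / 2
    # Not a whole number: round up and take that value.
    return ordered[math.ceil(c) - 1]


statistics_scores = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
other_scores = [80, 68, 53, 58, 76, 73, 85, 88, 91, 79]

p25 = percentile_value(statistics_scores, 25)
p60 = percentile_value(other_scores, 60)
print("25th percentile =", p25)
print("60th percentile =", p60)
```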
Grouped Data:
For grouped data, the quartiles, deciles or percentiles can be determined using the following
formula:
$$L + \left(\frac{kn - cf}{f}\right)(w)$$
where k is equal to $\frac{i}{4}$ for quartiles, $\frac{i}{10}$ for deciles, and $\frac{i}{100}$ for percentiles.
For instance, if we are looking for the 3rd quartile, Q3, then i = 3. Thus, $k = \frac{3}{4}$. Or if we are
interested in the 70th percentile, P70, then i = 70. Thus, $k = \frac{70}{100}$.
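The grouped-data formula can be sketched in Python under some stated assumptions. The symbols are not defined in the text above, so this sketch assumes their usual meanings: L is the lower class boundary of the class containing the kn-th value, n is the total frequency, cf is the cumulative frequency before that class, f is the frequency of that class, and w is the class width. It also assumes integer class limits (so the lower boundary is the lower limit minus 0.5) and equal class widths. The IQ distribution from Example 1A is reused purely as illustrative input.

```python
# (lower limit, upper limit, frequency), reused from Example 1A for illustration.
iq_classes = [(90, 98, 6), (99, 107, 22), (108, 116, 43),
              (117, 125, 28), (126, 134, 9)]


def grouped_quantile(classes, k):
    """L + ((k*n - cf) / f) * w, with the symbol meanings assumed above."""
    n = sum(f for _, _, f in classes)
    target = k * n
    width = classes[1][0] - classes[0][0]  # assumes equal class widths
    cf = 0                                 # cumulative frequency before the class
    for lower, _upper, f in classes:
        if cf + f >= target:
            boundary = lower - 0.5         # assumed lower class boundary
            return boundary + (target - cf) / f * width
        cf += f
    raise ValueError("k must be between 0 and 1")


q3 = grouped_quantile(iq_classes, 3 / 4)
d4 = grouped_quantile(iq_classes, 4 / 10)
p70 = grouped_quantile(iq_classes, 70 / 100)
print("Q3 =", round(q3, 2), "D4 =", round(d4, 2), "P70 =", round(p70, 2))
```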
Example: Find the third quartile, 4th decile and 70th percentile for the given frequency distribution below.
Solution:
Activity No.22:
The following data give the hours worked last week by 30 employees of a company.
42 45 40 38 35 47 40 27 39 43
48 53 23 51 42 48 40 36 51 40
40 34 21 40 31 34 16 39 41 36
a. c
Learning Objectives:
At the end of the lesson, the students should be able to:
Learning Focus:
Most data sets have a central value around which the observations are either narrowly or widely
spread out. Drawing a bell-shaped curve over a histogram of such data gives the normal distribution,
also called the Gaussian distribution after the mathematician Carl Friedrich Gauss.
Probability is the branch of mathematics that studies the possible outcomes of given events
together with the outcomes' relative likelihoods and distributions. In common usage, the word
"probability" is used to mean the chance that a particular event (or set of events) will occur expressed
on a linear scale from 0 (impossibility) to 1 (certainty), also expressed as a percentage between 0 and
100%. The analysis of events governed by probability is called statistics.
There are several competing interpretations of the actual "meaning" of probabilities.
Frequentists view probability simply as a measure of the frequency of outcomes (the more conventional
interpretation), while Bayesians treat probability more subjectively as a statistical procedure that
endeavors to estimate parameters of an underlying distribution based on the observed distribution.
A properly normalized function that assigns a probability "density" to each possible outcome
within some interval is called a probability density function (or probability distribution function), and its
cumulative value (integral for a continuous distribution or sum for a discrete distribution) is called
a distribution function (or cumulative distribution function).
Probability is simply how likely something is to happen. Whenever we are unsure about the
outcome of an event, we can talk about the probabilities of certain outcomes, that is, how likely they
are.
$$\text{Probability of an event} = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}}$$
$$P(A) = \frac{\text{number of ways } A \text{ can happen}}{\text{total number of outcomes}}$$
Examples:
1. The best example for understanding probability is flipping a coin:
a. What is the probability of flipping a HEAD?
Solution:
There are two possible outcomes: heads or tails.
$$P(H) = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}} = \frac{1}{2} = 0.5$$
2. Rolling a Die
There are six different outcomes when rolling a die. Therefore, the total number of possible outcomes
is six (6).
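The counting rule for probability can be sketched with exact fractions. The function name `probability` is illustrative, and the "even number on a die" event is a hypothetical extra case added only to show the six-outcome sample space in use.

```python
from fractions import Fraction


def probability(favorable, sample_space):
    """(number of ways the event can happen) / (total number of outcomes)."""
    return Fraction(len(favorable), len(sample_space))


coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_head = probability(["H"], coin)
# Hypothetical event for illustration: rolling an even number.
p_even = probability([face for face in die if face % 2 == 0], die)

print("P(H) =", p_head, "=", float(p_head))
print("P(even) =", p_even, "=", float(p_even))
```

Using `Fraction` keeps the answer in the same "ways over outcomes" form as the formula before converting it to a decimal.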
Normal Distribution is a continuous probability distribution. This means that it is generally used
with interval or ratio data. A histogram of the data gives a quick visual check: if a bell-shaped curve
drawn over the histogram follows the bars closely, the distribution is approximately normal. A
bell-shaped curve has one central peak. The rest of the data are on either side of the center,
tapering off toward the extremes.
The normal distribution is the most important probability distribution in statistics because it fits
many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores
follow the normal distribution. It is also known as the Gaussian distribution and the bell curve.
The normal distribution is a probability function that describes how the values of a variable are
distributed. It is a symmetric distribution where most of the observations cluster around the central peak
and the probabilities for values further away from the mean taper off equally in both directions. Extreme
values in both tails of the distribution are similarly unlikely.
Standard Normal Distribution. This is a distribution of a normal random variable with mean zero and
standard deviation equal to 1.
Gaussian distribution is another name for a normal distribution.
3. Exactly half of the values are to the left of center and exactly half the values are to the right.
4. The total area under the curve is 1.
5. The tails of the normal curve are asymptotic to the horizontal axis.
6. It is determined by the population mean μ and population standard deviation σ. The mean
controls the center and the standard deviation controls the spread of the distribution.
Mean
The mean is the central tendency of the distribution. It defines the location of the peak
for normal distributions. Most values cluster around the mean. On a graph, changing the mean
shifts the entire curve left or right on the X-axis.
Standard deviation
The standard deviation is a measure of variability. It defines the width of the normal
distribution. The standard deviation determines how far away from the mean the values tend to
fall. It represents the typical distance between the observations and the average.
On a graph, changing the standard deviation either tightens or spreads out the width of
the distribution along the X-axis. Larger standard deviations produce distributions that are more
spread out.
When you have narrow distributions, the probabilities are higher that values won’t fall
far from the mean. As you increase the spread of the distribution, the likelihood that
observations will be further away from the mean also increases.
The probability that X is less than A equals the area under the normal curve bounded by A and
minus infinity (as indicated by the shaded area in the figure above).
The empirical rule describes the percentage of the data that fall within specific numbers of standard
deviations from the mean for bell-shaped curves.
Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the
following “rule”.
About 68% of the area under the curve falls within 1 standard deviation of the mean.
About 95% of the area under the curve falls within 2 standard deviations of the mean.
About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
These statements are known as the empirical rule or the 68−95−99.7 rule. Clearly, given a normal
distribution, most outcomes will be within 3 standard deviations of the mean.
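The three percentages can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8) and computes the area between −k and +k standard deviations on the standard normal curve.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean.
areas = {}
for k in (1, 2, 3):
    areas[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} sd: {areas[k] * 100:.1f}%")
```

Because every normal curve can be rescaled to the standard normal, the same areas apply whatever the mean and standard deviation are.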
As stated earlier, the area for a z-value is the same, whether z is positive or negative. Hence, the
area of −z is equal to the area of +z.
library checkouts per hour, you can count 21 or 22 books, but nothing in between.
Continuous probability distributions are also known as probability density functions. You know
that you have a continuous distribution if the variable can assume an infinite number of values between
any two values. Continuous variables are often measurements on a scale, such as height, weight, and
temperature.
Standard scores are a great way to understand where a specific observation falls relative to the
entire distribution. They also allow you to take observations drawn from normally distributed populations
that have different means and standard deviations and place them on a standard scale. This standard
scale enables you to compare observations that would otherwise be difficult.
This process is called standardization, and it allows you to compare observations and
calculate probabilities across different populations. In other words, it permits you to compare apples to
oranges. Isn’t statistics great!
To standardize your data, you need to convert the raw measurements into Z-scores.
To calculate the standard score for an observation, take the raw measurement, subtract the
mean, and divide by the standard deviation. Mathematically, the formula for that process is the
following:
$$z = \frac{x-\mu}{\sigma}$$
where
z = z-score
x = raw value of the measurement of interest
μ = population mean
σ = population standard deviation
μ and σ are the parameters of the population from which the observation was drawn.
After you standardize your data, you can place them within the standard normal distribution. In
this manner, standardization allows you to compare different types of observations based on where
each observation falls within its own distribution.
How can we use the standard normal distribution in solving problems on probability?
Suppose we literally want to compare apples to oranges. Specifically, let’s compare their
weights. Imagine that we have an apple that weighs 110 grams and an orange that weighs 100 grams.
If we compare the raw values, it’s easy to see that the apple weighs more than the orange. However,
let’s compare their standard scores. To do this, we’ll need to know the properties of the weight
distributions for apples and oranges. Assume that the weights of apples and oranges follow a normal
distribution with the following parameter values:
Apples Oranges
Standard Deviation 15 25
Examples:
1. Given a normal distribution with mean = 50 and sd = 10, find the probability that X assumes
a value between 45 and 62.
Solution:
Transform the values x1 = 45 and x2 = 62 to z values.
$$z_1 = \frac{x_1-\mu}{sd} = \frac{45-50}{10} = \frac{-5}{10} = -0.5$$
$$z_2 = \frac{x_2-\mu}{sd} = \frac{62-50}{10} = \frac{12}{10} = 1.2$$
2. Given a normal distribution with mean = 300 and sd = 50, find the probability that X assumes
a value greater than 362.
Solution:
$$z = \frac{x-\mu}{sd} = \frac{362-300}{50} = 1.24$$
3. Zig merchandise sells Christmas light bulbs whose length of life is normally distributed with a
mean of 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns
between 778 and 834 hours.
Solution:
$$z_1 = \frac{x_1-\mu}{sd} = \frac{778-800}{40} = -0.55 \qquad z_2 = \frac{x_2-\mu}{sd} = \frac{834-800}{40} = 0.85$$
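Once the z values are known, the probabilities are areas under the normal curve. The sketch below computes them for the three examples with `NormalDist` from Python's standard `statistics` module instead of a printed table; the helper names are illustrative, and for the third example the mean of 800 hours is taken from the substitution 778 − 800 in the solution.

```python
from statistics import NormalDist


def prob_between(mean, sd, low, high):
    """P(low < X < high): area to the left of high minus area to the left of low."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


def prob_above(mean, sd, x):
    """P(X > x): total area (1) minus the area to the left of x."""
    return 1 - NormalDist(mean, sd).cdf(x)


p1 = prob_between(50, 10, 45, 62)     # Example 1: z from -0.5 to 1.2
p2 = prob_above(300, 50, 362)         # Example 2: z greater than 1.24
p3 = prob_between(800, 40, 778, 834)  # Example 3: z from -0.55 to 0.85

for label, p in (("Example 1", p1), ("Example 2", p2), ("Example 3", p3)):
    print(label, "probability =", round(p, 4))
```

The results (about 0.576, 0.107 and 0.511) should match what a standard normal table gives for the same z values, up to the table's rounding.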
The normal distribution is a probability distribution. As with any probability distribution, the
proportion of the area that falls under the curve between two points on a probability distribution plot
indicates the probability that a value will fall within that interval.
There are different versions of the standard normal curve table. In this version, the Z column
contains values of the standard normal distribution; the second column contains the area below Z.
Since the distribution has a mean of 0 and a standard deviation of 1, the Z value is equal to the
number of standard deviations below (or above) the mean.
For example:
A Z-score of 2.5 represents a value of 2.5 standard deviations above the mean. The area to the
left of a Z value of 2.5 is 0.9938
Example:
The table gives the proportion to the left of a chosen Z-value of up to 2 decimal places. To read
the table, find the Z score in the left column Z. If your score contains 2 decimal places, use the columns
to the right. For example, if you are looking for a Z score of 0.75, you will look at the intersection of 0.7
(Z column) and the column 0.05 (0.7+0.05=0.75).
To express a probability as a percentage, multiply it by 100. Example: 0.7734 would be expressed
as 77.34%.
Examples:
Finding the percentage of values to the left of a Z score.
1. In a standard normal distribution, what percentage of values will be less than 1.28?
a. Draw a diagram: you are looking for the percentage of the graph to the left of 1.28.
b. Use the standard normal table to find the value to the left of 1.28.
c. The value is 0.89973, which means that the percentage of values less than 1.28 is 89.97%.
2. Finding the
3. Finding the percentage of values between the mean and a particular Z-score.
What percentage of values are between 0 and 1.28?
a. First draw a diagram. In this case, you are looking for values between the mean (0) and 1.28.
b. Since we can’t find areas between two values in the standard normal table, we will use the
information we know about the values that are to the left of 1.28:
89.97% of values are below 1.28.
The curve is symmetrical, which means that 50% of values lie above the mean and
50% of values lie below the mean.
89.97%-50%=39.97%
5. Finding Z-scores and raw scores from percentages using the normal curve table. The table can
also be used to find the Z-scores and raw scores from specific percentages.
To find the Z-score from the percentage 90%, we look for the closest value in the
table: 0.8997. Working backwards, we see that this figure corresponds to a Z-score of 1.28.
This Z-score can then be converted to a raw score using the mean and the standard deviation of
the distribution.
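The table lookups in these examples can be cross-checked in Python. The sketch below uses `NormalDist` from the standard `statistics` module: `cdf` plays the role of reading the area to the left of a Z-score, and `inv_cdf` plays the role of working backwards from an area to a Z-score. The `raw_score` helper simply rearranges z = (x − μ)/σ; the mean of 50 and standard deviation of 10 passed to it are illustrative values, not part of this example.

```python
from statistics import NormalDist

z_table = NormalDist()  # standard normal: mean 0, standard deviation 1

area_left = z_table.cdf(1.28)      # area to the left of z = 1.28
area_mean_to_z = area_left - 0.5   # area between the mean (0) and z = 1.28
z_for_90 = z_table.inv_cdf(0.90)   # z-score with 90% of values below it


def raw_score(z, mean, sd):
    """Convert a z-score back to a raw score: x = mean + z * sd."""
    return mean + z * sd


print("area left of 1.28     =", round(area_left, 4))
print("area from 0 to 1.28   =", round(area_mean_to_z, 4))
print("z for 90%             =", round(z_for_90, 2))
print("raw score (mean 50, sd 10) =", round(raw_score(1.28, 50, 10), 1))
```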
The notations P(a < z < b), P(z < a) and P(z > a) will be used, and their meanings are as follows:
P(a < z < b) is read as “the probability or area of z between a and b”.
P(z < a) is read as “the probability or area of z less than a or to the left of a”.
P(z > a) is read as “the probability or area of z greater than a or to the right of a”.
Note that the symbols ≤ and ≥ give the same areas as < and >. To find the areas, the Table of Areas
under the Normal Curve will be used.
Using the table of areas between the mean and z, the area for z = -0.46 is 0.1772 and the area for
z = 0.52 is 0.1985.
Learning Activity:
1. 0.99
2. -0.52
3. 0.66
4. 1.87
5. -2.58
6. 3.16
7. -0.12
1. P(0 < z < 1.44)
2. P(−2.81 < z < 0)
3. P(z < −0.73)
4. P(z > 2.92)
5. P(−3.10 < z < 1.90)
6. P(1.13 < z < 1.39)
A scatterplot can identify several different types of relationships between two variables.
A relationship has no correlation when the points on a scatterplot do not show any direction or pattern.
A relationship is non-linear when the points on a scatterplot follow a pattern but not a straight line.
A relationship is linear when the points on a scatterplot follow a somewhat straight line pattern. This is
the relationship that we will examine.
Linear relationships can be either positive or negative. Positive relationships have points that incline
upwards to the right. As x values increase, y values increase. As x values decrease, y values decrease.
For example, when studying plants, height typically increases as diameter increases.
Correlation coefficients are computed to measure the relationship, and the most widely used measure
of correlation is the Pearson Product Moment Correlation Coefficient, or simply Pearson r:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
where
x = the observed data for the independent variable
y = the observed data for the dependent variable
n = the sample size
∑x = the summation of the x values
∑y = the summation of the y values
∑xy = the summation of the products of the x and y values
∑x² = the summation of the square of each of the x values
∑y² = the summation of the square of each of the y values
Examples:
A study was conducted to investigate the relationship existing between the grade in Statistics
and the grade in Computer subject. A random sample of 10 computer students in a certain college was
taken and the data are as follows:
Student A B C D E F G H I J
Statistics 75 83 80 77 89 78 92 86 93 84
Computer 78 87 78 76 92 81 89 89 91 84
Is there a relationship between the performance of the students in Statistics and Computer
subjects?
Student x y xy x2 y2
A 75 78 5850 5625 6084
B 83 87 7221 6889 7569
C 80 78 6240 6400 6084
D 77 76 5852 5929 5776
E 89 92 8188 7921 8464
F 78 81 6318 6084 6561
G 92 89 8188 8464 7921
H 86 89 7654 7396 7921
I 93 91 8463 8649 8281
J 84 84 7056 7056 7056
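n = 10   ∑x = 837   ∑y = 845   ∑xy = 71030   ∑x² = 70413   ∑y² = 71717
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{10(71030) - (837)(845)}{\sqrt{\left[10(70413) - (837)^2\right]\left[10(71717) - (845)^2\right]}}$$
$$r = \frac{3035}{\sqrt{(3561)(3145)}} = 0.906906226 \approx 0.91$$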
Therefore, there exists a very high positive relationship between the performance of the students
in Statistics and Computer.
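The computation of Pearson r for this example can be sketched in Python. The sums follow the columns of the table; the function name `pearson_r` is illustrative.

```python
import math

# Grades of students A to J.
statistics_grades = [75, 83, 80, 77, 89, 78, 92, 86, 93, 84]
computer_grades = [78, 87, 78, 76, 92, 81, 89, 89, 91, 84]


def pearson_r(x, y):
    """Pearson r from the sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(statistics_grades, computer_grades)
print("r =", round(r, 2))
```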
A simple linear regression model is a mathematical equation that allows us to predict a response for a
given predictor value. This is used in the process of prediction. Prediction is calculating scores of the
criterion variable (ŷ) on the basis of the knowledge of the predictor (x). One example is the prediction
of the job performance of an applicant using information available at the time of his application.
The regression equation is ŷ = a + bx, where a is the y-intercept and b is the slope. The y-intercept
is the predicted value for the response (y) when x = 0. The slope describes the change in y for each
one-unit change in x.
The values of a and b can be obtained by using the following:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$a = \mathrm{Mn}_y - b\,\mathrm{Mn}_x$$
where Mn_x and Mn_y are the means of x and y.
Example:
Given the following data on correlation between the grade in Statistics and Computer, what
would be the predicted grade of a student in Computer who has grade of 85 in Statistics and what
regression equation could be used?
Student A B C D E F G H I J
Statistics 75 83 80 77 89 78 92 86 93 84
Computer 78 87 78 76 92 81 89 89 91 84
Solution:
Student x y xy x² y²
A 75 78 5850 5625 6084
B 83 87 7221 6889 7569
C 80 78 6240 6400 6084
D 77 76 5852 5929 5776
E 89 92 8188 7921 8464
F 78 81 6318 6084 6561
G 92 89 8188 8464 7921
H 86 89 7654 7396 7921
I 93 91 8463 8649 8281
J 84 84 7056 7056 7056
n = 10   ∑x = 837   ∑y = 845   ∑xy = 71030   ∑x² = 70413   ∑y² = 71717
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(71030) - (837)(845)}{10(70413) - (837)^2} = \frac{3035}{3561} = 0.85$$
$$\mathrm{Mn}_x = \frac{\sum x}{n} = \frac{837}{10} = 83.7$$
$$\mathrm{Mn}_y = \frac{\sum y}{n} = \frac{845}{10} = 84.5$$
$$a = \mathrm{Mn}_y - b\,\mathrm{Mn}_x = 84.5 - (0.85)(83.7) = 13.36$$
The regression equation is $\hat{y} = a + bx = 13.36 + (0.85)x$.
If the grade of a student in Statistics (x) is 85, the predicted Computer grade is:
$$\hat{y} = 13.36 + (0.85)(85) = 85.61 \text{ or } 86$$
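The slope, intercept and prediction can be sketched in Python. The function name `fit_line` is illustrative. Unlike the hand computation, the sketch does not round b to 0.85 before computing a, so its intercept comes out at about 13.16 rather than 13.36; the predicted grade still rounds to 85.61.

```python
statistics_grades = [75, 83, 80, 77, 89, 78, 92, 86, 93, 84]
computer_grades = [78, 87, 78, 76, 92, 81, 89, 89, 91, 84]


def fit_line(x, y):
    """Return (a, b) for the regression line y_hat = a + b * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * (sum_x / n)                               # intercept
    return a, b


a, b = fit_line(statistics_grades, computer_grades)
predicted = a + b * 85  # predicted Computer grade for a Statistics grade of 85

print("b =", round(b, 2), "a =", round(a, 2))
print("predicted Computer grade =", round(predicted, 2))
```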
Learning Activity:
A. Determine the relationship between family monthly income and the grades of the students. Show
your complete solutions.
Student        A      B      C      D      E      F      G
Family Income  30,000 21,000 45,000 54,000 86,000 34,000 49,000
Grades         1.25   1.75   3.0    2.75   3.0    2.25   2.5