Module 4

The document discusses key concepts in statistics including populations, samples, variables, and levels of measurement. It defines populations as large collections of objects or things, and samples as small portions that represent populations. Variables can be classified as quantitative or qualitative, discrete or continuous, dependent or independent. Levels of measurement include nominal, ordinal, interval, and ratio scales. The document provides examples to illustrate each concept.

Uploaded by

Ann Bombita
Copyright
© © All Rights Reserved

Aldersgate College Mathematics in the Modern World

College of Arts, Sciences and Education

Module 4: Mathematics as a Tool

Lesson 1: Definition of terms related to Statistics

Learning Objectives:
At the end of this lesson, the students should be able to:
a. differentiate samples from population;
b. classify data as either quantitative or qualitative;
c. determine whether a variable is discrete or continuous; and
d. identify the level of measurement.

Learning Focus:
Statistics is a branch of applied mathematics that deals with gathering, organizing, presenting,
analysing and interpreting data. Statistics is the study of data, from its rawest form to its
relevance to daily life. Data is everywhere; it is observable or measurable. As technology advances,
data can be accessed anywhere and by anyone. When the data are correct, valid analysis and
interpretation can produce valuable information.
There are many classifications of data. Different kinds of data are collected, analysed and
interpreted, and being able to tell them apart is the first thing to consider when organizing
data. Data (Asaad, 2004) are the quantities (numbers) or qualities (attributes) measured or
observed that are to be collected and/or analysed. A collection of data is called a data set.

Division of Statistics
1. Descriptive statistics – it involves the collecting, organizing, describing, summarizing and
presenting of gathered data in a meaningful and informative way; it is based on easily verifiable
facts.

Descriptive statistics can answer questions such as:

1. How many students are interested in taking Statistics online?
2. What are the highest and the lowest scores obtained by applicants in a test?
3. What are the characteristics of the most likable professors according to students?
4. Who performed better in the entrance examination?
5. What proportion of XYZ college students likes Mathematics?

2. Inferential Statistics – it refers to the process of drawing conclusions and making decisions about the
population based on evidence obtained from a sample, using the techniques of descriptive
statistics. The backbone of inferential statistics is descriptive statistics. Inferential statistics
includes estimation and hypothesis testing.

Inferential statistics can answer questions like:


1. Is there a significant difference in the academic performance of male and female students
in Statistics?
2. Is there a significant difference between the proportions of students who are interested in
taking Statistics online and those who are not?
3. Is there a significant correlation between educational and job performance ratings?
4. Is there a significant difference between the weights of 20 students before and after six
months of attending aerobics?

5. Is there a significant difference between the mean GPAs of CA, HRM, CDA and HRIM
students?

Categories of Data:
• Categorical Data – data measured on the nominal and ordinal scales; they are analysed with
non-parametric statistics.
• Continuous Data – data measured on the interval and ratio scales; they are analysed with parametric statistics.

Levels of Measurement
• Nominal scale – consists of a finite set of possible values having no particular
order. It classifies qualitative data into two or more categories and is the lowest level of
measurement. Examples include gender, mode of transportation, nationality, occupation,
civil status, books in the library and courses in college.
• Ordinal scale – a set of possible values having a specific order or rank. Examples
include pain level, social status, attitude towards a subject, winners in a science quiz bee
and levels of anxiety.
• Interval scale – involves quantitative data that can be ranked and for which differences are meaningful.
Interval scales are measured on a continuum, and the difference between any two numbers on the
scale is of known size. There is no true zero point at this level of measurement, so ratios are not
meaningful. An example is temperature in degrees Celsius: 0 °C does not mean the absence of temperature.
• Ratio scale – has all the characteristics of the interval
level of measurement and also a true zero (0) value. It is the highest level of
measurement. Examples include weight, the time it takes to do a math project, the
number of absences of students in a class, tons of garbage, number of arrests, income and age.

Definition
• Population - refers to the entire collection of objects, persons, places or things under study.
• Parameter - is any numerical value which describes a population.
Example: There are 8,756 students enrolled in Nursing
N = 8,756 is a parameter
• Sample - is a small portion or part of a population; it represents the population in a
research study.
• Statistic - is any numerical value which describes a sample.
Example: Of the 8,756 students enrolled in Nursing, 2,893 are male
n = 2,893 is a statistic
Definition
• Data - are facts, or a set of information, gathered or under study.
• Quantitative Data - are numerical in nature, so meaningful arithmetic can be done on them.
They involve numbers and can be obtained by counting or measuring.
Example: age, weekly allowance, monthly salary
• Qualitative Data - are attributes which cannot be subjected to meaningful arithmetic.
These are attributes or characteristics such as sex, educational attainment, feelings or opinions.
Example: gender, size of T-shirt, brand of car

Definition
Quantitative or numerical data gathered about the population or sample can be further
classified as either discrete or continuous.
• Discrete Data - assume exact values only and can be obtained by counting.
Example: number of students, score in an examination, number of books on a shelf
• Continuous Data - assume infinitely many values within a specified interval and can be obtained by
measurement.
Example: height of a PBA player, length of waistline

Definition
• Constant - is a characteristic or property of a population or sample which makes the members
similar to each other.
Example: Gender in a class of all boys is a constant
• Variable - is a characteristic or property of a population or sample which makes the members
different from each other.
Example: Gender in a coed school is a variable
Researchers are not interested in constants since they do not make the subjects of research
different from one another. They are specifically interested in variables.

Definition
In statistics, variables can also be classified as either independent or dependent.
• Dependent. A variable which is affected by another variable.
Example: test scores
• Independent. A variable which affects the dependent variable.
Example: number of hours spent in studying

Learning Activity:

Activity 16:
A. Determine the level of measurement of the following:
1. Civil status of a man. Nominal Scale
2. Students’ scores on the final examinations. Interval Scale
3. The citizenship of a person. Nominal Scale
4. The time spent in the internet café by a student. Ratio Scale
5. The classification of students by state of birth. Nominal Scale
6. The rating given by a student to his professor. Ordinal Scale
7. Rank of faculty. Ordinal Scale
8. Temperature in Baguio last December. Interval Scale
9. Colour of the eye. Nominal Scale
10. Number of typewriters in a room. Ratio Scale

B. Indicate if the item is Discrete (D) or Continuous (C). D/C


11. Amount of milk in the cup. C
12. Height of babies from five to 11 months. C
13. Number of students in Section 1. D
14. Number of car accidents within 1 day. D
15. Size of shoes of PBA players. D
16. The height of the tower. C
17. The distance from the house to the playground. C
18. The length of the dress to be used in a play. C

C. Indicate if the item is Dependent (D) or Independent (I). D/I


19. Length of time to finish college education. D
20. Age I
21. Monthly savings I
22. Number of subject failed D
23. A person’s weight I
24. Price of commodity I
25. Gender D

Activity 17:
Direction: For each of the following research titles, give the target population (the respondents) and identify a
possible sample (which should be taken from the target population):
1. The attributes of the most likeable professors according to students
Population: Aldersgate College Students
Sample: Room
2. A survey on the most popular TV game show in Metro Manila
Population: Metro Manila
Sample: Barangay
3. The opinions of Catholic parishioners about divorce
Population: Catholic People
Sample: Church
4. The study habits of private and public high school students in selected schools in Metro Manila
Population: Public and Private School Students
Sample: School in Metro Manila
5. The degree of parents’ satisfaction regarding the quality of education their children get from
Catholic colleges and universities in the Philippines.
Population: Parents
Sample: Philippines

Lesson II: Sampling and Data Gathering Techniques

Learning Focus:

The Slovin’s formula


In doing research, if the population is too big to handle, studying a substantial number of samples is
acceptable. One way of determining the number of samples is by using Slovin’s formula. However, it must be
emphasized that if accuracy of results is needed, studying the whole population or increasing the
number of samples above what is acceptable will give better results.
Slovin’s formula:
$$n = \frac{N}{1 + N e^{2}}$$
Where: n is the sample size
N is the population size
e is the margin of error
The “e” in the formula is called the margin of error. Survey results are often reported in the
newspapers like this: “The SWS said it surveyed 1,200 adults. The results have a ±3% margin of error,” as in
the Most Pinoys Believe in Love at First Sight Survey. Or: “The Fox News/Opinion Dynamics Poll was
conducted in the US in January 2003 with a national sample of 911 youth respondents and a margin of
error of 10%.”
The margin of error is a value which quantifies possible sampling errors. Sampling error
means that the results in the sample differ from those of the target population because of the “luck of
the draw.”
Determining Sample Size
Example 1: Find n if N = 10,000 and e = 5%
$$n = \frac{N}{1 + N e^{2}} = \frac{10{,}000}{1 + (10{,}000)(0.05)^{2}} = \frac{10{,}000}{1 + 25} \approx 385$$
Example 2: Find n if N = 10,000 and e = 1%
$$n = \frac{N}{1 + N e^{2}} = \frac{10{,}000}{1 + (10{,}000)(0.01)^{2}} = \frac{10{,}000}{1 + 1} = 5{,}000$$
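The two examples above can be checked with a short Python sketch. The function name `slovin_sample_size` is made up for this illustration, and rounding the result up to the next whole respondent is an assumption of the sketch (it agrees with 384.6 being reported as 385 in Example 1).

```python
from fractions import Fraction
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Return n = N / (1 + N * e^2), rounded up to a whole respondent."""
    # Recover an exact fraction from the float (0.05 -> 1/20) so the
    # arithmetic below has no floating-point drift.
    e = Fraction(margin_of_error).limit_denominator(1_000_000)
    n = Fraction(population_size) / (1 + population_size * e ** 2)
    # Rounding up is an assumption of this sketch, not part of the formula.
    return math.ceil(n)


print(slovin_sample_size(10_000, 0.05))  # Example 1 -> 385
print(slovin_sample_size(10_000, 0.01))  # Example 2 -> 5000
```

The margin of error is passed as a decimal (5% is 0.05), matching the substitution done by hand in the examples.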

In research, the more samples you have, the better the result you will get. Slovin’s
formula is therefore just a guide for obtaining the number of samples: you can take more than what is suggested
by the formula but not fewer.
Another important consideration in survey research is the type of sampling done. Since we already know how
to compute the appropriate sample size, the next concern is how to select the sample from the
population. This activity is referred to as sampling.
Sampling Techniques
Schematic diagram of the two types of sampling techniques

Types of Sampling Techniques
   Non-Probability Sampling: Convenience, Quota, Purposive
   Probability Sampling: Simple, Systematic, Stratified, Cluster

Sampling Techniques
A. Probability Sampling: Samples are chosen in such a way that each member of the population has
a known, though not necessarily equal, chance of being included in the sample.

Types of Probability Sampling


1. Simple Random Sampling – Samples are chosen at random, with every member of the population
having an equal probability or chance of being included in the sample.
a. Lottery: This needs a complete listing of the members of the population. You write the
names or codes on pieces of paper or cards, place them in a large container, then
randomly draw the desired number of samples. The process is relatively easy for a small
population but relatively complicated and time-consuming for a large population.
b. Generation of random numbers/digits: This is a better and perhaps more efficient
method for selecting a simple random sample. Computers and even calculators
can be used to generate random digits. The randomly produced digits can be used to
pick your samples. However, a complete listing of the members of the population is
needed in this type of random selection.
2. Systematic Sampling – Samples are chosen following a rule set by the
researcher. This involves choosing every kth member of the population, with $k = N/n$, beginning from
a randomly chosen starting point.
3. Stratified Random Sampling – This method is used when the population is too big to handle,
so dividing N into subgroups, called strata, is necessary. Samples from each stratum are then randomly
selected, but consideration must be given to the sizes of the random samples to be selected
from the subgroups.
A process that can be used is proportional allocation. This procedure chooses sample
sizes proportional to the sizes of the different subgroups or strata.

Another process that could be used is equal allocation. This procedure chooses
equal sample sizes from the different subgroups or strata.
4. Cluster Sampling – Cluster sampling is sometimes called area sampling because it is usually
applied when the population is large.
In this technique, groups or clusters instead of individuals are randomly chosen.
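
The first three probability sampling techniques can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the population of members numbered 1 to 100 and the strata sizes are all hypothetical, and the largest-remainder rule used to settle rounding in proportional allocation is one possible choice that the lesson does not prescribe.

```python
import random


def simple_random_sample(members, n, rng):
    """Draw n distinct members, each equally likely to be chosen."""
    return rng.sample(members, n)


def systematic_sample(members, n, rng):
    """Take every k-th member, with k = N // n, from a random start."""
    k = len(members) // n
    start = rng.randrange(k)  # random start within the first k members
    return [members[start + i * k] for i in range(n)]


def proportional_allocation(strata_sizes, n):
    """Split a total sample of n across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    allocation, remainders = {}, {}
    for name, size in strata_sizes.items():
        allocation[name], remainders[name] = divmod(n * size, total)
    # Hand out the units lost to rounding down, largest remainder first.
    shortfall = n - sum(allocation.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:shortfall]:
        allocation[name] += 1
    return allocation


rng = random.Random(2024)         # fixed seed so that runs are repeatable
population = list(range(1, 101))  # hypothetical members numbered 1 to 100

print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(proportional_allocation({"first": 500, "second": 300, "third": 200}, 100))
```

Both random selections need the complete listing of the population, as noted for the lottery and random-number methods.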

B. Non-Probability Sampling: Each member of the population does not have a known chance of being
included in the sample. Instead, personal judgment plays a very important role in the selection.

Types of Non-Probability Sampling


1. Convenience Sampling – This type is used because of the convenience it offers to the
researcher.
2. Quota Sampling – This is very similar to stratified random sampling. The only difference is
that the members of the sample are selected randomly in stratified sampling but not in quota sampling.
3. Purposive sampling- Choosing the respondents on the basis of predetermined criteria set by
the researcher.

Now that you know how to get an acceptable number of samples from the target
population, the next step is to gather the information or data you need from
your samples, that is, from the subjects of your research.
Following is a summary of the four popular data-gathering techniques and their advantages and
disadvantages.
Data Gathering Techniques

DIRECT or INTERVIEW METHOD
• Clarification can be done easily
• Body gestures can be observed
• Costly and time-consuming
• Needs a lot of effort

INDIRECT or QUESTIONNAIRE METHOD
• Saves time, effort and money
• Easy to tabulate
• Large number of samples can be reached
• Problem on retrieval
• Limited to what is included in the questionnaire

REGISTRATION METHOD
• Most reliable because it is governed by law
• Data are limited to what is listed in the documents

EXPERIMENTAL METHOD
• Can go beyond plain description
• Time-consuming
• Lots of threats to internal and external validity

Data Gathering Techniques
a) Direct or Interview: In this method, the researcher has direct contact with the respondents.
Example: a researcher interviews respondents regarding their stand or view on a particular
issue.

b) Indirect or Questionnaire: The researcher gives or distributes the questionnaire to the
respondents either by personal delivery or by mail.
These are some of the characteristics/features of a good questionnaire:
1. It should contain a short letter to the respondents which includes:
a. The purpose of the survey
b. An assurance of confidentiality
c. The name of the researcher or writer of the questionnaire
2. There is a descriptive title/name of the questionnaire
3. It is designed to achieve objectives.
4. The directions are clear.
5. It is designed for easy tabulation
6. It avoids use of double negatives
7. It also avoids double-barrelled questions.
8. It phrases questions well for all respondents.

c) The Registration Method: This method of data gathering is governed by law.


Example: If a researcher wants to know the number of registered cars, s/he has to go to the Land
Transportation Office; the list of registered voters in the Philippines is found at the COMELEC.

d) The Experimental Method. This method of gathering data is used to find out cause and effect
relationships.
Example: A researcher wants to know if ELEMSTA Online will increase the academic
performance of the students. He/she has to do the following: get two ELEMSTA classes of
equal intelligence; give ordinary classroom lectures to one group while enrolling the other group
online; at the end of the term, give the same test to both groups, compare their scores and, with
the use of some statistical tools, find out if their academic performances are significantly
different.

Learning Activity:
Activity 18:
A. Solve for the sample size (n) using Slovin’s formula, $n = \frac{N}{1 + N e^{2}}$, with your complete
solutions. (15 points)
N = 10,000 and e = 5% n=
N = 20,000 and e = 4% n=
N=15,000 and e = 3% n=
N=25,000 and e = 2% n=
N=30,000 and e = 1% n=
B. Classify each sample as random, systematic, stratified, or cluster.
1. School supervisors are selected using random numbers to determine common
characteristics of excellent teachers.
Answer: Random
3. In a province, municipal health officers of the 16 towns were asked to answer
questions on the recent flu epidemic.
Answer: Stratified
5. All salesladies of the ladies department of three big department stores in a city
are interviewed about customer preferences.
Answer: Cluster
7. Every fifth car is checked for smoke belching.
Answer: Systematic
8. A dean decided to take the same proportion of male and female instructors in
his college to determine the teaching method they frequently employed.
Answer: Stratified
10. Every fiftieth product is checked for damages.
Answer: Systematic
11. Students are selected using random numbers in order to determine their
favorite telenovela.
Answer: Random
13. Police officers in a city are divided into two groups according to gender.
Twenty are selected from each group and are interviewed to determine the
crimes most frequently committed by minors.
Answer: Stratified
16. Every tenth female shopper is asked what products she bought at the health
and beauty shop.
Answer: Systematic
18. In a city, all doctors of two hospitals were asked to answer a questionnaire on
the most common fatal illness.
Answer: Cluster
C. Determine what data gathering technique was portrayed in each of the following situations.
1. A researcher interviews respondents regarding their stand or view on a
particular issue.
Answer: Direct
2. A researcher makes a survey regarding the opinion of CSB students on
the implementation of the dress code.
Answer: Direct
4. If a researcher wants to know the number of registered cars, s/he just
has to go to the Land Transportation Office.
Answer: Registration Method
6. An agriculturist treats his crops with different fertilizers, and then waits to see which
crop yields the greater harvest.
Answer: Experimental Method
7. A TV network uses texting and telephone calls in collecting data regarding
viewers’ choice for Miss Philippines.
Answer: Indirect
9. The COMELEC held a general registration for all qualified voters.
Answer: Registration Method
10. Before a patient undergoes some medical treatment, a nurse does a
pre-treatment chat with the patient.
Answer: Direct
12. To determine the level of mathematical skills of the third year students of
San Mariano High School, the mathematics head teacher administered a
30-item problem solving exam to all the junior classes.
Answer: Direct
15. Births and deaths are required to be registered at the National Census.
Answer: Registration Method
16. Health and disease specialists are working to find out the cause of the
spread of swine flu in the Philippines.
Answer: Experimental Method

Lesson III: Data Presentation

Learning Objectives:
At the end of the lesson, students should be able to:
a. organize data in tables;
b. solve for the statistical data;
c. represent tables by graphs;
d. read and interpret tables and graphs; and
e. develop orderliness and neatness in presenting data.

Learning Focus:
The data gathered should be presented, analyzed and interpreted in a way that can be easily understood by
the reader. Data may be presented in textual, tabular or graphical form, or in a combination of these.

Manner of Presenting Data


a. Textual Form – This form combines text and numerical facts in a statistical
report. It is used only when the data to be presented are few, because too
many data in running text are difficult to understand.
Data presented in paragraphs or in sentences are said to be in textual form. This includes
enumerating important characteristics, emphasizing the most significant features and
highlighting the most striking attributes of the data. However, readers may get bored if data are
presented in plain text, so tables and graphs are often used.
Example:
The numbers of pupils in Hatred Academy at the Elementary level
in 2002-2003 were Grade I – 232; Grade II – 340; Grade III – 342;
Grade IV – 445; Grade V – 448; and Grade VI – 503.

b. Tabular Form – This is a table that shows data arranged into different classes, and the number
of cases which fall into each class. This form provides numerical facts in a more concise and
systematic manner. Statistical tables are constructed to facilitate the analysis of relationships.
Example:
SUMMARY OF ENROLMENT 2002-2003
Year Level      Boys    Girls   Total
First Year       480      501     981
Second Year      420      465     885
Third Year       306      323     629
Fourth Year      273      290     563
Total           1479     1579    3058

c. Graphical Form – This form is the most effective means of organizing and presenting statistical
data because the important relationships are brought out more clearly and creatively in visually
solid and colourful figures.

A. TABULAR PRESENTATION OF DATA

Frequency Distribution

Definition of Terms:

a. Raw Data – data in their original form


b. Grouped Data – data organized and summarized in tables
c. Class Frequency – the number of individuals belonging to each class or category
d. Frequency Distribution – is a tabular arrangement of data by classes together with their
corresponding frequencies.
• Categorical Frequency Distribution – organized data of nominal and ordinal scales.
• Ungrouped Frequency Distribution – organized data of interval and ratio scales. This
method is more appropriate when the range, or the difference between the highest
value and the lowest value in the set of data, is small.
• Grouped Frequency Distribution – organized data of interval and ratio scales. This method
is more appropriate when the range, or the difference between the highest value and the
lowest value in the set of data, is large.
e. Range – the difference between the highest and lowest value
f. Class Boundaries – are obtained by subtracting 0.5 from the lower class limit and adding 0.5 to
the upper class limit.
g. Cumulative Frequency – is used to determine the accumulated frequencies either from the
lowest class up to and including that of the specific class or vice-versa.

• Less than Cumulative Frequency (<cf) – accumulated from the lowest class to the highest
class.
• Greater than Cumulative Frequency (>cf) – accumulated from the highest class to the
lowest class.
h. Class Width – is obtained by dividing the range by the number of classes.
i. Class Mark or Class Midpoint – is obtained by taking the average of the upper and lower class
limits.
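
As a small illustration of definitions f and i, the sketch below computes the class mark and class boundaries of one class from its limits. The function name `class_parts` is hypothetical, and the sketch assumes whole-number class limits, which is what the 0.5 adjustment presupposes. It also reports the width of the finished class as the distance between its boundaries, which is a complementary way of reading off the width once the classes exist.

```python
def class_parts(lower_limit, upper_limit):
    """Return (class mark, class boundaries, width) for whole-number limits."""
    class_mark = (lower_limit + upper_limit) / 2   # average of the two limits
    lower_boundary = lower_limit - 0.5             # subtract 0.5 from the lower limit
    upper_boundary = upper_limit + 0.5             # add 0.5 to the upper limit
    width = upper_boundary - lower_boundary        # size of the finished class
    return class_mark, (lower_boundary, upper_boundary), width


print(class_parts(30, 34))  # (32.0, (29.5, 34.5), 5.0)
```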

Steps in Constructing a Categorical Frequency Distribution


1. Make a table indicating the following heading: categories, tally, frequency and percent.
2. Tally the data in the second column.
3. Count the tallies and put the result in the third column.
4. Compute the percentage values and put the results in the last column. Percent values are
computed as follows:
$$\% = \frac{f}{n} \times 100$$
where: f – class frequency
n – total number of values
5. Determine the totals for the last two columns.

Example:
Organize a frequency table according to the classification of 10,000 registered voters by
Political Affiliation. KBL (4,500), Liberal (2,700), Nacionalista (1,800), Independent (1,000)
Solution:
Distribution of Registered Voters by Political Affiliation
Political Party      f             %
KBL                  4,500         45
Liberal              2,700         27
Nacionalista         1,800         18
Independent          1,000         10
Total                n = 10,000    100
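
The percent column of this table can be reproduced with a few lines of Python. This is a minimal sketch; the function name `percent_distribution` is made up for the illustration, and the counts are the ones given in the example.

```python
def percent_distribution(frequencies):
    """Map each category to its share of the total, in percent (f / n x 100)."""
    n = sum(frequencies.values())
    return {category: 100 * f / n for category, f in frequencies.items()}


voters = {"KBL": 4500, "Liberal": 2700, "Nacionalista": 1800, "Independent": 1000}
print(percent_distribution(voters))
# {'KBL': 45.0, 'Liberal': 27.0, 'Nacionalista': 18.0, 'Independent': 10.0}
```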

Steps in Constructing an Ungrouped Frequency Distribution


1. Determine the range.
2. Construct a table having the following headings: Class, Tally, Frequency, Percentage and
Cumulative Frequency (< or >).
3. Tally the raw data under the second column.
4. Complete the frequency column.

5. Construct the column for class boundaries.


6. Complete the cumulative frequency column.

Example:
In a survey to determine the number of pets in the households of the 27 occupants in a subdivision,
the following data were obtained.
1 2 3 2 1 0 2 2 3
2 1 5 3 2 4 1 1 4
0 1 2 4 1 3 1 0 3
Construct the appropriate frequency distribution for the given data.

Solution:
Distribution of Number of Household Pets of the Twenty-Seven Occupants in a Subdivision
No. of Pets   Tally       Frequency (f)   Percentage (%)   <cf   >cf
0             III         3               11               3     27
1             IIIII-III   8               30               11    24
2             IIIII-II    7               26               18    16
3             IIIII       5               18               23    9
4             III         3               11               26    4
5             I           1               4                27    1
Total                     N = 27          100
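
The frequency and cumulative frequency columns can be checked with the following Python sketch, which uses the 27 values from the example. The function name `ungrouped_distribution` is hypothetical. Percentages are rounded here to one decimal place, so they differ slightly from the whole-number percentages in the table.

```python
from collections import Counter
from itertools import accumulate

pets = [1, 2, 3, 2, 1, 0, 2, 2, 3,
        2, 1, 5, 3, 2, 4, 1, 1, 4,
        0, 1, 2, 4, 1, 3, 1, 0, 3]


def ungrouped_distribution(values):
    """Return rows of (class, f, percent, <cf, >cf), one per distinct value."""
    counts = Counter(values)
    classes = sorted(counts)
    freqs = [counts[c] for c in classes]
    n = len(values)
    less_cf = list(accumulate(freqs))                      # from the lowest class up
    greater_cf = list(accumulate(reversed(freqs)))[::-1]   # from the highest class down
    return [(c, f, round(100 * f / n, 1), lcf, gcf)
            for c, f, lcf, gcf in zip(classes, freqs, less_cf, greater_cf)]


for row in ungrouped_distribution(pets):
    print(row)
```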

Steps in Constructing a Grouped Frequency Distribution


1. Determine the highest(H) and the lowest score(L).
2. Compute the range. The range (H - L) is the difference between the highest and the lowest score.
3. Divide the range by 10 or 15 to determine the acceptable size of the interval or class width.
4. Select a value for the lowest class limit. The smallest value may be the lowest class limit or any
number less than the smallest value to make computations more convenient. Add the class
width to the lowest limit to get the lower limit of the next class. Keep adding until the desired
number of classes is obtained.
5. Subtract 1 from the next lower limit to determine the upper limit of the said class. Add the class
width to the upper limit to determine the upper limit of the next class until the last class.
6. Organize the class interval (c.i).
7. Tally each score to the category of class interval it belongs to.
8. Count the tally column and summarize it under column ( f ).
9. Compute the midpoint or class mark (x).
10. Determine the class boundaries (c.b.).
11. Compute the relative frequency distribution (%).
12. Compute the less than or greater than cumulative frequency (<cf or >cf).

Example:

Prepare a frequency distribution of the Mathematics scores of 54 students in a high
school senior class.
71 77 68 64 55 50 45 40 35
31 33 36 40 45 50 55 63 70
72 74 66 63 61 60 56 50 46
41 38 34 39 41 46 50 56 57
51 46 42 46 51 58 59 52 47
43 44 47 53 48 48 49 50 42

Solution:
1. H = 77, L = 31, R = 77 - 31 = 46
2. The desired number of class intervals is 10; the class interval size is 46/10 = 4.6, rounded off to 5.
3. Start with 30, which is a multiple of 5.
4. Form the intervals 30-34, 35-39, and so on until the interval 75-79, which contains the highest
score.
5. Form the tally sheet and give a summary of the frequencies.
6. The class marks are 32, 37, 42 and so on.
7. The class boundaries are 29.5-34.5, 34.5-39.5, 39.5-44.5 and so on.
8. The relative frequencies are (3/54)(100) = 5.56, 7.41, 14.81 and so on.
9. The less than cumulative frequencies (<cf) are 3, 7, 15 and so on.
10. The greater than cumulative frequencies (>cf) are 54, 51, 47 and so on.

Frequency Distribution of the Mathematics Scores of 54 Students in a High School Senior Class
Class            Frequency   Class Mark   Class Boundaries   Relative
Interval (c.i)   (f)         (x)          (c.b)              Frequency (%)   <cf   >cf
30-34            3           32           29.5-34.5          5.56            3     54
35-39            4           37           34.5-39.5          7.41            7     51
40-44            8           42           39.5-44.5          14.81           15    47
45-49            11          47           44.5-49.5          20.37           26    39
50-54            9           52           49.5-54.5          16.67           35    28
55-59            7           57           54.5-59.5          12.96           42    19
60-64            5           62           59.5-64.5          9.26            47    12
65-69            2           67           64.5-69.5          3.70            49    7
70-74            4           72           69.5-74.5          7.41            53    5
75-79            1           77           74.5-79.5          1.85            54    1
Total            N = 54                                      100
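
The table can be rebuilt from the raw scores with a short Python sketch. The function name `grouped_frequencies` is hypothetical; the starting limit (30), class width (5) and number of classes (10) are the values chosen in the solution above. Relative frequencies are rounded to two decimal places.

```python
scores = [71, 77, 68, 64, 55, 50, 45, 40, 35,
          31, 33, 36, 40, 45, 50, 55, 63, 70,
          72, 74, 66, 63, 61, 60, 56, 50, 46,
          41, 38, 34, 39, 41, 46, 50, 56, 57,
          51, 46, 42, 46, 51, 58, 59, 52, 47,
          43, 44, 47, 53, 48, 48, 49, 50, 42]


def grouped_frequencies(values, lowest_limit, width, n_classes):
    """Return (lower limit, upper limit, frequency) for each class interval."""
    rows = []
    for i in range(n_classes):
        lower = lowest_limit + i * width
        upper = lower + width - 1          # next lower limit minus 1
        f = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, f))
    return rows


table = grouped_frequencies(scores, lowest_limit=30, width=5, n_classes=10)
n = len(scores)
less_cf = 0
for lower, upper, f in table:
    less_cf += f                   # accumulate from the lowest class
    greater_cf = n - less_cf + f   # this class and every class above it
    print(f"{lower}-{upper}", f, (lower + upper) / 2,
          f"{lower - 0.5}-{upper + 0.5}", round(100 * f / n, 2),
          less_cf, greater_cf)
```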

B. GRAPHICAL PRESENTATION OF DATA

Some readers find a graphical presentation of data easier to comprehend than when data are
presented in tabular form. A graph adds life and beauty to one’s work, but more than this, helps
facilitate comparison and interpretation without going through the numerical data.

Types of Graph
Bar Chart. A graph made of either vertical or horizontal rectangles whose bases
represent the categories or classes and whose heights (or lengths) represent the frequencies. It is used for
discrete variables.

Histogram. A graph made of adjoining rectangles whose bases span the
class boundaries (centered on the class marks) and whose heights are the frequencies. It is used for continuous variables.

Frequency Polygon. This is the line version of the histogram. It is a line graph in which the
frequencies are plotted against the class marks and the points are joined by line segments. It is used for
continuous variables.

The less than and greater than ogives. The less than ogive is constructed by plotting the <cf
against the upper class boundaries. The greater than ogive is constructed by plotting
the >cf against the lower class boundaries. The graphs are used to estimate the number of cases
falling below or above any given value.
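
The points to be plotted for the two ogives can be listed with a small Python sketch. It uses the class limits and frequencies of the Mathematics scores example; the function name `ogive_points` is made up for the illustration, and the sketch only computes the coordinates, leaving the drawing to any plotting tool.

```python
from itertools import accumulate

lower_limits = list(range(30, 80, 5))          # 30, 35, ..., 75
frequencies = [3, 4, 8, 11, 9, 7, 5, 2, 4, 1]
width = 5


def ogive_points(lower_limits, frequencies, width):
    """Return the (boundary, cumulative frequency) points of both ogives."""
    less_cf = list(accumulate(frequencies))
    greater_cf = list(accumulate(reversed(frequencies)))[::-1]
    # Less than ogive: <cf against the upper class boundaries.
    less_than = [(low + width - 0.5, cf) for low, cf in zip(lower_limits, less_cf)]
    # Greater than ogive: >cf against the lower class boundaries.
    greater_than = [(low - 0.5, cf) for low, cf in zip(lower_limits, greater_cf)]
    return less_than, greater_than


less_than, greater_than = ogive_points(lower_limits, frequencies, width)
print(less_than[:3])     # [(34.5, 3), (39.5, 7), (44.5, 15)]
print(greater_than[:3])  # [(29.5, 54), (34.5, 51), (39.5, 47)]
```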

Pie chart. A circle graph showing the proportion of each class, through the relative or
percentage frequency.

Multiple Graph. This is a combination of several graphs used to compare the features or
behaviours of two or more groups. The most appropriate graph for this is either a bar or a line graph.

Learning Activity:

Group Activity no. 2:


Form groups of 5 members each using your social media accounts (Messenger, email, etc.)
to solve the activities given below. Show your complete solution for each item. No solutions,
no points.

Parts of a Frequency Distribution

A. For each of the following class intervals, give the class width (i), the midpoint or class mark (x),
and the class boundaries (c.b.). (20 points)
Class interval (ci)    Class width (i)    Class mark (x)    Class boundaries (c.b.)
a. 4 — 8
b. 35 — 44
c. 110 —120
d. (-5) — (-1)
e. (-3) — (1)

B. Supply the missing data.

Class interval(ci) Class width( i ) Midpoint (x)


a. 4 — ________ 5 ________
b. 12 — ________ 7 ________
c. 29 — ________ 10 ________
d. 5001 — ________ 2,000 ________
e. 18.25 — ________ 1.5 ________
f. ________ — 15 3 ________
g. ________ — 89 10 ________
h. ________ — 121 25 ________
i. ________ — 7 6 ________
j. ________ — 301 100 ________
k. ________ — ________ 5 6
l. ________ — ________ 7 15
m. ________ — ________ 9 204
n. ________ — ________ 6 30.5
o. ________ — ________ 12 56.5

Group Activity No. 3


Form groups of 5 members each using your social media accounts (Messenger, email, etc.)
to do the activities given below.

Frequency Distribution

A. Randomly selected customers were asked about the use of a certain toothbrush. Below were
their responses. Tabulate the following data through a frequency table then interpret.

Slightly likable (62)


Somewhat unlikable (130)
Strongly likable (148)
Slightly unlikable (100)
Strongly unlikable (103)
Somewhat likable (92)
B. The number of tricycle passengers per day at a specific paradahan (tricycle terminal) over one
month is as follows. Construct a frequency distribution table for the given data.

305 402 300 275 395 500


675 299 389 472 581 642
746 826 915 583 762 468


466 531 284 761 953 846


800 588 278 386 842 758

C. Below are monthly data on sales of a department store. What type of graph will best represent
the data? Draw the graph and give a brief explanation.
Month    Sales (in thousand pesos)    Sales Percentage
January 200 3.2
February 400 6.4
March 600 9.6
April 500 8
May 750 12
June 750 12
July 450 7.2
August 400 6.4
September 350 5.6
October 300 4.8
November 550 8.8
December 1,000 16
Total 6250 100

Lesson IV: Measures of Central Tendency

Learning Focus:
Averages occur regularly in our daily life, and the average is an important tool in statistics. A
well-chosen average is a single number about which a given set of data is centered. There are several
different types of averages, sometimes called measures of central tendency. Measures of central
tendency are numerical descriptive measures which indicate or locate the center of the distribution of a
set of data. These include the mean, median and mode.

Measures of Central Tendency for the Ungrouped Data


Mean
• the arithmetic average in layman’s terms. The mean of a set of data is the sum of all the
measurements divided by the number of measurements contained in the set of data.
• the point on the scale above which and below which the sums of the deviations are equal;
• the sum of the values of the observations divided by the number of values.

Properties of Mean
a) The sum of the deviations of all measurements in a set from the mean is 0.
b) It can be calculated for any set of numerical data, so it always exists.
c) A set of numerical data has one and only one mean.
d) It lends itself to higher statistical treatment.
e) It is the most reliable since it takes into account every item in the set of data.
f) It is greatly affected by extreme or deviant values.
g) It is used only if the data are interval or ratio and when normally distributed.

The formula for finding the mean for ungrouped data:

Population mean ($\mu$): $\mu = \dfrac{\sum x}{N}$

Sample mean ($\bar{x}$): $\bar{x} = \dfrac{\sum x}{n}$

Where N – the total number of observations in the population
n – the total number of observations in the sample

Example:
A researcher collects data on the ages of recipients of doctoral degree in science and
engineering, and his study yields the following:
37 41 37 33 24 27 28 43 44 36

Determine the average age of the recipients.


Where n = 10

Solution:
The mean is determined by getting the sum of the ages and then dividing by the total number of
recipients.
Let $\bar{x}$ = mean

$\bar{x} = \dfrac{\sum x}{n}$

$\bar{x} = \dfrac{37+37+24+28+43+44+36+41+33+27}{10} = \dfrac{350}{10} = 35$

Therefore, the average age is 35 years old.
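
The same computation can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for this example, and `statistics.mean` from the Python standard library is used as a cross-check.

```python
from statistics import mean

# Ages of the ten doctoral-degree recipients from the example above.
ages = [37, 41, 37, 33, 24, 27, 28, 43, 44, 36]

# Sample mean: sum of the observations divided by the number of observations.
manual_mean = sum(ages) / len(ages)

# The standard library performs the same computation.
library_mean = mean(ages)

print(manual_mean)   # 35.0
print(library_mean)  # 35
```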

The formula for finding the weighted mean for ungrouped data:

The weighted mean: $\bar{x} = \dfrac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n}{w_1 + w_2 + \dots + w_n}$

A very good example of weighted mean is the computation of your term Grade Point Average (GPA). In
this case, weight is the number of units.

Example 1. Get the Grade Point Average (GPA) of a particular student whose grades are as follows:
Subject       Grade (x)    Units (w)
Statistics    4.0          3
English       2.0          3
Accounting    1.0          5
P.E.          1.5          2

$\bar{x} = \dfrac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n}{w_1 + w_2 + \dots + w_n}$

$\bar{x} = \dfrac{4(3) + 2(3) + 1(5) + 1.5(2)}{3 + 3 + 5 + 2}$

$\bar{x} = \dfrac{12 + 6 + 5 + 3}{13}$

$\bar{x} = \dfrac{26}{13} = 2$
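
As a rough sketch, the weighted mean can be written as a small Python function. The function name `weighted_mean` and the variable names are assumptions made for this illustration; the data are the grades and units from the GPA example.

```python
def weighted_mean(values, weights):
    """Return sum(x_i * w_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

# Grades (x) and units (w) for Statistics, English, Accounting and P.E.
grades = [4.0, 2.0, 1.0, 1.5]
units = [3, 3, 5, 2]

gpa = weighted_mean(grades, units)
print(gpa)  # 2.0
```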

Median:

The median is the midpoint of the data array. Before finding this value, the data must be arranged
in order, from least to greatest or vice versa. The median will either be a specific value or will fall
between two values.
• The middle value, when a set of data is arranged either in ascending or descending order, is
called the median.
• The value in a distribution of scores that separates the top half from the bottom half. If there is an
odd number of cases, it is the middle observation when all of them are ranked according to
size. It is that value which has 50% of the remaining cases above it and 50% below it. If there is
an even number of cases, it is the arithmetic mean of the two middle values.

Properties of median:
a) It is the score or class in a distribution below which 50% of the scores fall and above which
another 50% lie.
b) It is not affected by extreme or deviant values.
c) It is appropriate to use when there are extreme or deviant values.
d) It is used when data are ordinal.
e) It exists for both quantitative and qualitative (ordinal) data.

Steps in Determining the Median

To get the median:

For ungrouped data, just arrange the data in order of magnitude.

• If n is odd, Median = the $\left(\dfrac{n+1}{2}\right)^{\text{th}}$ item in the distribution, or simply the middle value.

• If n is even, Median = $\dfrac{m_1 + m_2}{2}$, where $m_1$ and $m_2$ are the two middle values.
Example 1:

Seven mothers were selected and given a blood pressure check. Their systolic pressures are
recorded below.
135 121 119 116 130 121 131
Find the median.
Solution:
Arrange the data in order.
116 119 121 121 130 131 135

Select the middle value, which is the 4th of the 7 values.

116 119 121 [121] 130 131 135

Therefore, the median is 121.

Example 2:
Find the median of the following weights in kilos.
101, 107, 115, 120, 111, 105
Solution:
Arrange the given data in ascending order (smallest to largest).
101, 105, 107, 111, 115, 120
Since there is no single middle value, get the sum of the two middle values and divide by 2.
Let $m_1 = 107$ and $m_2 = 111$.

Median $= \dfrac{m_1 + m_2}{2} = \dfrac{107 + 111}{2} = \dfrac{218}{2} = 109$

Therefore, the median for the data is 109.
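
The odd and even cases can be sketched in Python as follows. The function name `median_ungrouped` is an assumption for this illustration; the two data sets are those of Examples 1 and 2.

```python
def median_ungrouped(data):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("data must not be empty")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th item, i.e. index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: average of the two middle values m1 and m2.
    return (ordered[middle - 1] + ordered[middle]) / 2

systolic = [135, 121, 119, 116, 130, 121, 131]   # Example 1 (n = 7)
weights_kg = [101, 107, 115, 120, 111, 105]      # Example 2 (n = 6)

print(median_ungrouped(systolic))    # 121
print(median_ungrouped(weights_kg))  # 109.0
```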


Mode
The mean and the median are good representatives of the center of the distribution. But if you
want a quick approximation of the center, you need not rely on the previous two measures. You can
use the third measure of central tendency, which is the mode. It is the value that occurs most often
in the data set. A data set can have one mode, more than one mode, or none at all.
For ungrouped data, no computation is needed.

Properties of Mode

a) It is used when you want to find the value which occurs most often.
b) It is a quick approximation of the average.
c) It is an inspection average.
d) It is the most unreliable among the three measures of central tendency because its value is
undefined in some observations.

Example:

1) The following are the descriptive evaluations of 5 teachers: VS, S, VS, VS and S. Mode: VS
2) The ages of five students are: 17, 18, 23, 20 and 19. Mode: None
3) The grades of five students are: 4.0, 3.5, 4.0, 3.5 and 1.0. Mode: 3.5 and 4.0
4) The weights of five persons in pounds are: 117, 218, 233, 120 and 117. Mode: 117
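
The four cases above can be reproduced with a short Python sketch. The helper name `modes` is made up for this illustration, and it follows the convention used in these examples that a data set in which no value repeats has no mode.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s) in sorted order, or [] if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once: treated here as "no mode".
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(modes(["VS", "S", "VS", "VS", "S"]))   # ['VS']
print(modes([17, 18, 23, 20, 19]))           # []
print(modes([4.0, 3.5, 4.0, 3.5, 1.0]))      # [3.5, 4.0]
print(modes([117, 218, 233, 120, 117]))      # [117]
```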

How to compute for the measures of central tendency of ungrouped data using Microsoft Excel?

Example:
Below are the post test scores of the 45 pupil respondents after their exposure to an
Individually Guided Instruction.

Find the mean, median and the mode of the given data.

1. Enter the following labels


A1: Scores
2. Modify column widths for column A. Instead of selecting the best fit option, indicate precisely
the column width desired. Follow the steps below.
Step 1: Open the FORMAT menu.
Step 2: Select the COLUMN option.
Step 3: Select the “AutoFit Column Width” option.
3. Bold the text in cell A1. Select this cell and click on the BOLD button.
4. Enter the given data in the cells under column A.
5. Center the data in column A. Highlight Column A and click on the CENTER button.
6. Enter the following labels.
C2: n=
C3: Sum =
C4: Mean =
C5: Median =
C6: Mode =
7. To arrange the data values in ascending or descending order, highlight the cell range A2 to
A46. Click the “Sort & Filter” button, then select “Sort Smallest to Largest”.
8. Enter the following formulas. Press the ENTER key after each one.
D2: =COUNTA(A2:A46)
D3: =SUM(A2:A46)
D4: =D3/D2
D5: =MEDIAN(A2:A46)
D6: =MODE(A2:A46)

9. Your worksheet should look like the one below.


Measures of Central Tendency for the Grouped Data


Mean:
The formula for finding the mean for grouped data:

Population mean ($\mu$): $\mu = \dfrac{\sum f X_m}{N}$

Sample mean ($\bar{x}$): $\bar{x} = \dfrac{\sum f X_m}{n}$

Where:
$X_m$ – class mark
f – frequency
n – number of observations in the sample
N – number of observations in the population

Example:
Below is the frequency distribution of the scores of 40 students. The steps are to get the class mark
of each class, then get the product of the class mark and the frequency ($f X_m$).

Class interval    f       Xm      fXm
16-23             1       19.5    19.5
24-31             3       27.5    82.5
32-39             6       35.5    213.0
40-47             12      43.5    522.0
48-55             10      51.5    515.0
56-63             8       59.5    476.0
                  N=40            ∑fXm = 1828

Substituting into the formula, you have:

$\mu = \dfrac{\sum f X_m}{N} = \dfrac{1828}{40} = 45.7$
Therefore, the mean is 45.7.
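
A minimal Python sketch of the same procedure is given below. The function name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are choices made for this illustration only.

```python
def grouped_mean(classes):
    """Mean of grouped data given (lower limit, upper limit, frequency) tuples."""
    total_f = sum(f for _, _, f in classes)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    # The class mark is the midpoint of the lower and upper class limits.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

# Frequency distribution of the scores of 40 students from the example above.
scores = [
    (16, 23, 1),
    (24, 31, 3),
    (32, 39, 6),
    (40, 47, 12),
    (48, 55, 10),
    (56, 63, 8),
]

print(round(grouped_mean(scores), 2))  # 45.7
```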


Mean using Microsoft Excel


Mean

Example 1:

Below is the distribution of the Mathematics scores of 54 students in a high school senior class.

Determine the mean, median and mode of the given data.


Step 1: Enter the given data in the cells.

Step 2: Determine the number of measurement/ data (n), by doing the following:
a. Use sum formula to add all the frequencies (f)
b. Then press enter.


Step 3: Determine the upper and lower limits.

Step 4: Determine the midpoint (Xm) of each interval.

The midpoint is the sum of the upper and lower limits divided by 2.

Median:
For grouped data, add a column for the less than cumulative frequency (<cf). Use the following formula:

Median $= L + \left[\dfrac{\frac{n}{2} - \text{<cf}}{f_m}\right] i$

where
L – is the lower class boundary of the median class
<cf – is the less than cumulative frequency of the class immediately above (preceding) the median class
i – is the class width
fm – is the frequency of the median class

Example: Given a grouped frequency distribution of the ages of patients in the ICU of XYZ Hospital,
determine the median age of the patients.
The Age Distribution of Patients in the ICU of XYZ Hospital
Class interval    f       Xm      <cf
16-23             1       19.5    1
24-31             3       27.5    4
32-39             6       35.5    10
40-47             12      43.5    22    ← Median Class
48-55             10      51.5    32
56-63             8       59.5    40
                  N=40

Solution:
Step 1: Determine $\dfrac{n}{2}$.
Step 2: Look for $\dfrac{n}{2}$ in the column of the <cf.
Referring to the preceding table, you have the following pieces of information:
a) The median class is (40-47) because $\dfrac{n}{2} = 20$, and 22 is the first value in the <cf column that reaches 20.
b) L is the lower class boundary of the median class, which is 39.5.
c) <cf is the less than cumulative frequency above that of the median class, which is 10.
d) i is the class width, which is 8.
e) fm is the frequency of the median class, which is 12.

Step 3: Substitute the corresponding values into the formula, then simplify.

Median $= L + \left[\dfrac{\frac{n}{2} - \text{<cf}}{f_m}\right] i = 39.5 + \left[\dfrac{\frac{40}{2} - 10}{12}\right] 8 = 46.17$

Thus, the median age of the patients is 46.17 ≈ 46.
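
The steps above can be sketched in Python. This is an illustration under stated assumptions: the function name `grouped_median` is made up, the class limits are assumed to be whole numbers (so each lower boundary is the lower limit minus 0.5), and each class is given as (lower limit, upper limit, frequency).

```python
def grouped_median(classes):
    """Median of grouped data: L + ((n/2 - <cf) / fm) * i."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # <cf of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            boundary = lower - 0.5     # L, assuming whole-number limits
            width = upper - lower + 1  # i, number of scores in the class
            return boundary + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("classes must contain at least one observation")

# Age distribution of patients in the ICU of XYZ Hospital.
icu_ages = [
    (16, 23, 1),
    (24, 31, 3),
    (32, 39, 6),
    (40, 47, 12),
    (48, 55, 10),
    (56, 63, 8),
]

print(round(grouped_median(icu_ages), 2))  # 46.17
```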


Median using Microsoft Excel


Median:

Example 1:
Step 1: Determine the number of measurement/data (n).
Use sum formula to add all the frequencies (f)

Step 2: Add a column for less than cumulative frequency (<cf).


Step 3: Divide n by 2. So, n / 2 = 27


Step 4: Look for n / 2 in the column of less than Cumulative Frequency <cf.

Step 5: Add a column for lower class boundaries (L)


The lower class boundary in each class interval is lower limit minus 0.5.


Step 6: Determine the class width (i).


The class width is the number of scores in each interval
The class width in this example is 5.

Step 7: Use the formula to solve for the median.

Where:
fm = 9
<cf = 26
i = 5
L = 49.5
n = 54
Solution:

Substitution: Median $= 49.5 + \left[\dfrac{\frac{54}{2} - 26}{9}\right] 5$

$= 49.5 + \left[\dfrac{27 - 26}{9}\right] 5$

$= 49.5 + \left[\dfrac{1}{9}\right] 5$

$= 49.5 + [0.111]\,5$
$= 49.5 + 0.556$
Median $= 50.056$

Mode:

$Mo = L_{mo} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i$

where:
Δ1 – difference between the highest frequency and the frequency above it
Δ2 – difference between the highest frequency and the frequency below it
Lmo – lower class boundary of the modal class
i – class width

Example: Given a grouped frequency distribution of the ages of patients in the ICU of XYZ Hospital,
determine the modal age of the patients.

The Age Distribution of Patients in the ICU of XYZ Hospital

Class interval    f       Xm      <cf
16-23             1       19.5    1
24-31             3       27.5    4
32-39             6       35.5    10
40-47             12      43.5    22    ← Modal Class
48-55             10      51.5    32
56-63             8       59.5    40
                  N=40


Given:
a) The modal class is (40-47) because it is the class with the highest frequency.
b) Lmo is the lower class boundary of the modal class, which is 39.5.
c) The class width i is 8.
d) Δ1 = 12 − 6 = 6 and Δ2 = 12 − 10 = 2

Solution:
$Mo = L_{mo} + \left[\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right] i = 39.5 + \left[\dfrac{6}{6 + 2}\right] 8 = 45.5$

Thus, the modal age of the patients is 45.5.
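
A hedged Python sketch of the grouped-mode formula follows. The function name `grouped_mode` is made up; the sketch assumes whole-number class limits, a single modal class, and a neighbouring frequency of 0 when the modal class is the first or last class (an assumption not stated in the lesson).

```python
def grouped_mode(classes):
    """Mode of grouped data: Lmo + (d1 / (d1 + d2)) * i."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # position of the (first) modal class
    lower, upper, f = classes[k]
    before = freqs[k - 1] if k > 0 else 0               # frequency above it
    after = freqs[k + 1] if k < len(freqs) - 1 else 0   # frequency below it
    d1 = f - before
    d2 = f - after
    if d1 + d2 == 0:
        raise ValueError("mode is undefined for this distribution")
    boundary = lower - 0.5     # Lmo, assuming whole-number limits
    width = upper - lower + 1  # i
    return boundary + d1 / (d1 + d2) * width

# Age distribution of patients in the ICU of XYZ Hospital.
icu_ages = [
    (16, 23, 1),
    (24, 31, 3),
    (32, 39, 6),
    (40, 47, 12),
    (48, 55, 10),
    (56, 63, 8),
]

print(grouped_mode(icu_ages))  # 45.5
```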

Mode using Microsoft Excel


Mode

Step 1: Determine the modal class.


The modal class is the class interval with the highest frequency.


For the sample data the modal class is class interval 45 – 49 (with the highest
frequency of 11)

Step 2: Use the formula to determine the mode.

Note: Δ1 = d1 and Δ2 = d2

For this example:
Lmo = 44.5
Δ1 = 3
Δ2 = 2
i = 5

Substitution:

$Mo = 44.5 + \left[\dfrac{3}{3 + 2}\right] 5$

$= 44.5 + \left[\dfrac{3}{5}\right] 5$

$= 44.5 + [0.6]\,5$
$= 44.5 + 3$
Mode = 47.5

Learning Activity:

Activity 19:
Mean of Ungrouped and Weighted Data
1. Find the mean (to two decimal places) of each of the following sets of data and answer the
questions which follow:
Table I
Number of Typing Errors Committed by Secretary A in the 24 Chapters of Book X
12 26 42 38 35 37
42 30 59 23 57 40
46 42 18 40 21 57
28 58 42 64 55 43
Mean:

Table II
Number of Typing Errors Committed by Secretary B in the 24 Chapters of Book X
43 27 43 40 42 22
13 20 33 54 41 28
22 53 28 23 22 28
32 22 22 64 57 57
Mean:

a) Who shows poor typing skills? Why?
b) Who has better typing skills? Why?
2. In one book sale, the following were reported:
10,000 books were sold at P10.00
8,000 books were sold at P15.00
5,000 books were sold at P18.00
4,000 books were sold at P20.00
2,000 books were sold at P8.00
Find the mean price ($\bar{x}$):

Group Activity No.4:


Mean, Median and Mode of Grouped Data
A. Find the mean of the data given below (use your calculator). Remember that you need
additional columns for the midpoint, the lower and higher limits, and the product of the frequency and
the class mark (midpoint). Show your complete solution in computing the mean. (50 points)

The Frequency Distribution for the Scores of 50 Students in a 45-item Test

Class interval    Frequency (f)    Midpoint (xm)    Lower limit    Higher limit    f·xm
12-14             2
15-17             4
18-20             5
21-23             6
24-26             8
27-29             10
30-32             7
33-35             4
36-38             3
39-41             1
                  N=
Activity 20:
Among your classmates, ask 5 of them the question, “How many friends do you currently have
on Facebook, Instagram and Twitter?” Use the table below to organize the data:
Facebook Instagram Twitter
Classmate 1 1300 200 300
Classmate 2 500 150 40
Classmate 3 80 50 47
Classmate 4 450 187 37
Classmate 5 145 168 74
Determine the mean, median and mode for each social networking site.


Lesson V: Measures of Variation

Learning Focus:
Measures of Variation
The previous section focused on averages or measures of central tendency. The averages are
supposed to be central scores of a given set of data. However, not all features of a given data set may
be reflected by the averages. For example, two different groups of 5 students are given identical
quizzes in Math. The following data below represents their scores.
Group 1 Group 2
14 5
13 19
18 18
14 14
11 14


The averages of each group are as follows.


Group 1 Group 2
Mean 14 14
Median 14 14
Mode 14 14
Midrange 14.5 12
Apart from the midrange, these two sets of averages show no difference. But intuitively, the two groups
show an obvious difference: Group 2 has more widely scattered data than Group 1. This characteristic,
called variability, is not reflected by the averages. The three basic measures of variation are the range,
the variance, and the standard deviation.

Range
The range is the simplest measure of variation to calculate. It is just the difference between the
largest and the smallest value in a given data set. For group 1, the range is 18-11 = 7. The range for
group 2 is 19 – 5 = 14. A much larger range suggests greater variation or dispersion.
The range has a disadvantage of being influenced by extreme values called outliers. Another is
that it is based on two values only. All the other values in the set are being ignored.

Standard Deviation and Variance


The standard deviation is the most commonly used measure of variation. The standard
deviation indicates how closely the values of a given data set are clustered around the mean. A lower
value of the standard deviation means that the values of that given data set are spread over a smaller
range around the mean. On the other hand, a large value of the standard deviation means that values
of the data set are spread over a larger range around the mean.

The standard deviation is the positive square root of the variance. The variance calculated from
population data is denoted by $\sigma^2$ (sigma squared) and the standard deviation by $\sigma$. The basic
formulas are:

$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$ and $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

where N = number of observations in the population
μ = population mean

Example :
The final exam scores of 5 students were 80, 88, 92, 90 and 85. Determine the variance and
standard deviation.

Solution:
Find the mean ($\mu$).

$\mu = \dfrac{\sum x}{N} = \dfrac{88 + 80 + 92 + 90 + 85}{5} = \dfrac{435}{5} = 87$

Subtract the mean from each individual score, $(x - \mu)$.
Score    (x − μ)
88       1
80       -7
92       5
90       3
85       -2

Square each of the differences, $(x - \mu)^2$.
Score    (x − μ)    (x − μ)²
88       1          1
80       -7         49
92       5          25
90       3          9
85       -2         4

Get the sum of $(x - \mu)^2$.
$\sum (x - \mu)^2 = 1 + 49 + 25 + 9 + 4 = 88$

Divide the sum by N = 5.

$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{88}{5} = 17.6$ → variance

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} = \sqrt{\dfrac{88}{5}} = \sqrt{17.6} \approx 4.2$ → standard deviation
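
The same steps can be followed in Python. This is a small illustrative sketch with made-up variable names; `statistics.pvariance` and `statistics.pstdev` from the standard library are used only as a cross-check of the population formulas.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Final exam scores of the 5 students (treated as a population).
scores = [80, 88, 92, 90, 85]

mu = sum(scores) / len(scores)                     # population mean
squared_devs = [(x - mu) ** 2 for x in scores]     # (x - mu)^2 for each score
variance = sum(squared_devs) / len(scores)         # sigma squared
std_dev = sqrt(variance)                           # sigma

print(mu)                    # 87.0
print(round(variance, 1))    # 17.6
print(round(std_dev, 1))     # 4.2
```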

Variance and Standard Deviation for the Sample or Unbiased Estimator


When computing the variance for a sample, one might use the formula $\dfrac{\sum (x - \bar{x})^2}{n}$, where $\bar{x}$ is
the sample mean and n is the sample size. This formula produces what is called a biased estimate of the
population variance: its expected value differs from the population parameter.
When the population is very large and the sample is small, the computed variance tends to underestimate
the population variance. Instead, divide by n – 1 to yield a slightly larger value and an unbiased
estimate of the population variance.
The unbiased estimator of the population variance is a statistic whose expected value equals the
population variance.

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

where $\bar{x}$ – sample mean; n – sample size

The following formulas are shortcut formulas for computing the variance and standard
deviation. These are mathematically equivalent to the preceding formulas. They save time when
repeated subtracting and squaring occur in the original formulas. These shortcut formulas will be used
mostly in this book.

Shortcut Formulas
$s^2 = \dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$ and $s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$

Example: The weights of nine basketball players are recorded as follows (in pounds). Find the
sample variance and standard deviation.
206 215 305 297 265 282 301 255 261
Solution

Step 1: Find the sum of the values.

$\sum x = 206 + 215 + 305 + 297 + 265 + 282 + 301 + 255 + 261 = 2387$
Step 2: Square each value and find the sum.

$\sum x^2 = 206^2 + 215^2 + 305^2 + 297^2 + 265^2 + 282^2 + 301^2 + 255^2 + 261^2 = 643391$

Step 3: Substitute in the formula.

$s^2 = \dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)} = \dfrac{9(643391) - (2387)^2}{9(9 - 1)}$

$= \dfrac{5790519 - 5697769}{72} = \dfrac{92750}{72} = 1288.19$

$s = \sqrt{s^2} = \sqrt{1288.19} = 35.89$
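
As a sketch, the shortcut formula can be coded directly and compared with the definition-based sample variance. The function name `sample_variance_shortcut` is an assumption for this example; `statistics.variance` (which divides by n − 1) serves as the cross-check.

```python
from math import sqrt
from statistics import variance

def sample_variance_shortcut(data):
    """Sample variance via (n * sum(x^2) - (sum(x))^2) / (n * (n - 1))."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Weights of the nine basketball players, in pounds.
weights_lb = [206, 215, 305, 297, 265, 282, 301, 255, 261]

s2 = sample_variance_shortcut(weights_lb)
s = sqrt(s2)

print(round(s2, 2))  # 1288.19
print(round(s, 2))   # 35.89
```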

Variance and Standard Deviation for Grouped Data


The procedure is similar to that of finding the mean for grouped data, and it uses the midpoint
of each class.

$s^2 = \dfrac{n \sum f x_m^2 - \left(\sum f x_m\right)^2}{n(n - 1)}$

Example 1A :
For 108 randomly selected high school students, the following IQ frequency distribution was
obtained.
Class Limits Frequency
90-98 6
99-107 22
108-116 43
117-125 28
126-134 9
Find the variance and standard deviation.
Solution:


Step 1: Make a table. Find the midpoint of each class. Multiply the midpoint by the frequency for each
class.
Class Limits    Frequency    xm     f·xm
90-98           6            94     564
99-107          22           103    2266
108-116         43           112    4816
117-125         28           121    3388
126-134         9            130    1170

Step 2: Multiply the frequency by the square of the midpoint for each class.
Class Limits    Frequency    xm     f·xm    xm²       f·xm²
90-98           6            94     564     8 836     53 016
99-107          22           103    2266    10 609    233 398
108-116         43           112    4816    12 544    539 392
117-125         28           121    3388    14 641    409 948
126-134         9            130    1170    16 900    152 100

Step 3: Find the sums of columns 2, 4 and 6. Substitute in the formula for s².

Class Limits    Frequency    xm     f·xm             xm²       f·xm²
90-98           6            94     564              8 836     53 016
99-107          22           103    2266             10 609    233 398
108-116         43           112    4816             12 544    539 392
117-125         28           121    3388             14 641    409 948
126-134         9            130    1170             16 900    152 100
Sum             n = 108             ∑f·xm = 12204              ∑f·xm² = 1 387 854

$s^2 = \dfrac{n \sum f x_m^2 - \left(\sum f x_m\right)^2}{n(n - 1)} = \dfrac{108(1387854) - (12204)^2}{108(108 - 1)} = \dfrac{950616}{11556} \approx 82.26$

$s = \sqrt{s^2} = \sqrt{82.26} \approx 9.07$
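
The grouped computation can be verified with a short Python sketch. The function name `grouped_sample_variance` and the (midpoint, frequency) layout are choices made for this illustration. Carrying full precision, the variance comes out at about 82.26 and the standard deviation at about 9.07.

```python
from math import sqrt

def grouped_sample_variance(classes):
    """Sample variance of grouped data from (midpoint, frequency) pairs."""
    n = sum(f for _, f in classes)
    if n < 2:
        raise ValueError("at least two observations are required")
    sum_fx = sum(f * x for x, f in classes)
    sum_fx2 = sum(f * x * x for x, f in classes)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

# Midpoints and frequencies of the IQ distribution of 108 students.
iq_classes = [(94, 6), (103, 22), (112, 43), (121, 28), (130, 9)]

s2 = grouped_sample_variance(iq_classes)
s = sqrt(s2)

print(round(s2, 2))  # 82.26
print(round(s, 2))   # 9.07
```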


Activity No. 21:


Find the variance and standard deviation of the following frequency distribution with your
complete solutions.

Class Boundaries    Frequency    xm    f·xm       xm²    f·xm²
16.5 – 18.5         5
18.5 – 20.5         40
20.5 – 22.5         70
22.5 – 24.5         47
24.5 – 26.5         6
26.5 – 28.5         2
28.5 – 30.5         2
                    n =                ∑f·xm =           ∑f·xm² =

Lesson VI: Measures of Relative Position


Measures of Position
There are times when we want to know the position of a value relative to the other observations
in a data set. For instance, you took a 100 – item test. You might want to know how your score of 88
compares to the scores of the others.

Standard Scores or Z – Scores


A z – score measures the distance between an observation and the mean, measured in units
of standard deviation. Suppose that a student got a grade of 78 in her Math test and 55 in her Science
test. The scores cannot be compared directly since the exams may not be equivalent in terms of
number of questions, value of each question and so on. But the relative position of the scores can be
made using the z – scores.
The standard score is obtained by subtracting the mean from the value/observation and
dividing the result by the standard deviation. The formula is

$z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}} = \dfrac{x - \bar{x}}{s}$

If the z-score is positive, the score is above the mean. If the z-score is 0, the score is the same
as the mean. If the z-score is negative, the score is below the mean.


Example 1:
An IQ test has a mean of 105 and a standard deviation of 20. Find the corresponding z score
for each IQ.
a. 88 b. 122 c. 110
Solution:
a. $z = \dfrac{x - \bar{x}}{s} = \dfrac{88 - 105}{20} = -0.85$
b. $z = \dfrac{x - \bar{x}}{s} = \dfrac{122 - 105}{20} = 0.85$
c. $z = \dfrac{x - \bar{x}}{s} = \dfrac{110 - 105}{20} = 0.25$
Example 2:
Which of the following exam grades has a better relative position?
A grade of 43 on an Algebra test with a mean of 40 and s = 3
Or
A grade of 75 on a Geometry test with a mean of 72 and s = 5?

Solution:
For a grade of 43: $z = \dfrac{x - \bar{x}}{s} = \dfrac{43 - 40}{3} = 1$
For a grade of 75: $z = \dfrac{x - \bar{x}}{s} = \dfrac{75 - 72}{5} = 0.6$
Since the z-score for the Algebra test is larger, the relative position in the Algebra test is higher
than the position in the Geometry test.
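
Both z-score examples can be reproduced with a minimal Python sketch. The function name `z_score` and the variable names are made up for this illustration.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Example 1: IQ test with mean 105 and standard deviation 20.
for iq in (88, 122, 110):
    print(iq, round(z_score(iq, 105, 20), 2))

# Example 2: compare relative positions on two different tests.
z_algebra = z_score(43, 40, 3)
z_geometry = z_score(75, 72, 5)
better = "Algebra" if z_algebra > z_geometry else "Geometry"
print(better)  # Algebra
```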

Quartiles, Percentiles and Deciles


A quartile is a measure of relative standing. Let $x_1, x_2, \dots, x_n$ be a set of n measurements
arranged in order of magnitude. The first quartile, $Q_1$, is the value of x that exceeds one-fourth of the
measurements and is less than the remaining three-fourths. The second quartile, $Q_2$, is the median. The
third quartile, $Q_3$, is the value of x that exceeds three-fourths of the measurements and is less than
one-fourth.

When the measurements are arranged in order of magnitude, the positions of the quartiles are:
Position of $Q_1 = 0.25(n + 1)$    Position of $Q_2 = 0.50(n + 1)$    Position of $Q_3 = 0.75(n + 1)$
Example:
Find $Q_1$, $Q_2$, and $Q_3$ of the following set of data.
19, 12, 16, 0, 14, 9, 6, 1, 12, 13, 10, 19, 7, 5, 8
Solution: Arrange the data from lowest to highest.
0, 1, 5, 6, 7, 8, 9, 10, 12, 12, 13, 14, 16, 19, 19
Using the formula:
$Q_1 = 0.25(n + 1) = 0.25(15 + 1) = 0.25(16) = 4$ → 4th data value: 6
$Q_2 = 0.50(n + 1) = 0.50(15 + 1) = 0.50(16) = 8$ → 8th data value: 10
$Q_3 = 0.75(n + 1) = 0.75(15 + 1) = 0.75(16) = 12$ → 12th data value: 14
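
The position rule can be sketched in Python as shown below. The function name `quartile_by_position` is made up. The lesson only covers positions that are whole numbers; the linear interpolation used when the position falls between two values is an added assumption, not part of the lesson.

```python
def quartile_by_position(data, q):
    """Return the q-th quartile (q = 1, 2, 3) using position q * (n + 1) / 4."""
    ordered = sorted(data)
    n = len(ordered)
    position = q * (n + 1) / 4   # 1-based position in the ordered data
    lower = int(position)        # whole part of the position
    fraction = position - lower  # fractional part (0 when the position is whole)
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Assumption: interpolate linearly between neighbours for fractional positions.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [19, 12, 16, 0, 14, 9, 6, 1, 12, 13, 10, 19, 7, 5, 8]

print(quartile_by_position(data, 1))  # 6.0
print(quartile_by_position(data, 2))  # 10.0
print(quartile_by_position(data, 3))  # 14.0
```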

Percentiles:
Percentiles are position measures used in educational and health-related fields to indicate the
position of an individual in a group. They are symbolized by $P_1, P_2, P_3, \dots, P_{99}$ and divide the
distribution into 100 groups.
The percentile corresponding to a given value x is computed by using the formula:

$\text{percentile} = \dfrac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100\%$
Example 1: Find the percentile rank of a test score of 49 in the data set.
12, 28, 35, 42, 47, 49, 50
Solution: Arrange the data in order from lowest to highest. Then substitute in the formula.

Let x = 49

$\text{percentile} = \dfrac{(\text{number of values below } 49) + 0.5}{\text{total number of values}} \times 100\%$

$\text{percentile} = \dfrac{5 + 0.5}{7} \times 100\%$

$\text{percentile} = \dfrac{5.5}{7} \times 100\% = 78.57\%$
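
The percentile-rank formula translates directly into Python. The function name `percentile_rank` is an assumption made for this short sketch.

```python
def percentile_rank(data, x):
    """Percentile rank of x: (number of values below x + 0.5) / total * 100."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for value in data if value < x)
    return (below + 0.5) / len(data) * 100

test_scores = [12, 28, 35, 42, 47, 49, 50]

print(round(percentile_rank(test_scores, 49), 2))  # 78.57
```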

The next example illustrates how to find a value corresponding to a given percentile.
Example 2: The following are scores in a Statistics test:
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Find the value corresponding to the 25th percentile.
Solution: Arrange the data set from lowest to highest. Compute $c = \dfrac{np}{100}$, where n is the total
number of values and p is the percentile.

$c = \dfrac{np}{100} = \dfrac{(10)(25)}{100} = 2.5$

Since c is not a whole number, round it up to the next whole number; in this case, c = 3.
Therefore, the 25th percentile is the 3rd value, which is 5.

Deciles:
Deciles divide the distribution into tenths or 10 equal parts. A data set has nine deciles, which are
denoted by $D_1, D_2, D_3, \dots, D_9$. Basically, the first decile, $D_1$, is the number that divides the bottom
10% of the data from the top 90%. To obtain the deciles, divide the data set into tenths and then
determine the numbers dividing the tenths.
Note that the second quartile, fifth decile, and fiftieth percentile of a data set are all the same
and all equal to the median.
Median $= Q_2 = D_5 = P_{50}$.
Similarly, $Q_1 = P_{25}$, $D_1 = P_{10}$ and $Q_3 = P_{75}$.

Example 1:
Find the value corresponding to the 60th percentile for the given data set.
80 68 53 58 76 73 85 88 91 79
Solution: Arrange the data from lowest to highest.
53 58 68 73 76 79 80 85 88 91
Using the formula:

$c = \dfrac{np}{100} = \dfrac{(10)(60)}{100} = 6$

Since the value of c is a whole number, use the value halfway between the 6th and 7th values
when counting from the lowest value.
53 58 68 73 76 [79 80] 85 88 91

The value halfway between 79 and 80 is $\dfrac{79 + 80}{2} = 79.5$. Hence, 79.5 corresponds to the 60th
percentile.
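
The two rules for locating a percentile (round c up when it is not whole, take the halfway value when it is whole) can be sketched as one Python function. The name `value_at_percentile` is made up, and the sketch assumes p is a whole number from 1 to 99.

```python
def value_at_percentile(data, p):
    """Value at the p-th percentile using c = n * p / 100 (p an integer, 1..99)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(n * p, 100)  # c = whole + remainder / 100
    if remainder != 0:
        # c is not whole: round up, so take the (whole + 1)th value.
        return ordered[whole]
    # c is whole: take the value halfway between the c-th and (c + 1)th values.
    return (ordered[whole - 1] + ordered[whole]) / 2

stat_scores = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
exam_scores = [80, 68, 53, 58, 76, 73, 85, 88, 91, 79]

print(value_at_percentile(stat_scores, 25))  # 5
print(value_at_percentile(exam_scores, 60))  # 79.5
```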

Grouped Data:
For grouped data, the quartiles, deciles or percentiles can be determined using the following
formula.

$L + \left(\dfrac{kn - cf}{f}\right) w$

where k is equal to $\dfrac{i}{4}$ for quartiles, $\dfrac{i}{10}$ for deciles, and $\dfrac{i}{100}$ for percentiles;

i – ith quartile, decile or percentile
L – lower boundary of the quartile, decile or percentile class
n – total number of observations
w – class width
cf – cumulative frequency of the class preceding the quartile, decile or percentile class
f – frequency of the quartile, decile or percentile class

For instance, if we are looking for the 3rd quartile, $Q_3$, then i = 3. Thus, $k = \dfrac{3}{4}$. Or if we are
interested in the 70th percentile, $P_{70}$, then i = 70. Thus, $k = \dfrac{70}{100}$.

Example: Find the third quartile, 4th decile and 70th percentile for the given frequency distribution below.

Class Boundaries Frequency cf


52.5 – 63.5 6 6
63.5-74.5 12 18
74.5-85.5 25 43
85.5-96.5 28 71
96.5-107.5 14 85
107.5-118.5 5 90

Solution:
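
As a hedged illustration of how the formula above applies to this table, the Python sketch below computes the three requested values. The function name `grouped_quantile` and its parameters (`i` for the order and `parts` for 4, 10 or 100) are assumptions made for this example; the figures it prints come from applying the formula to the table, not from a worked solution in the module.

```python
def grouped_quantile(classes, i, parts):
    """i-th quantile of grouped data: L + ((k*n - cf) / f) * w with k = i / parts.

    classes: (lower boundary, upper boundary, frequency) tuples in ascending order.
    parts: 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = i * n / parts  # k * n
    cf = 0                  # cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if f > 0 and cf + f >= target:
            width = upper - lower
            return lower + (target - cf) / f * width
        cf += f
    raise ValueError("classes must contain at least one observation")

table = [
    (52.5, 63.5, 6),
    (63.5, 74.5, 12),
    (74.5, 85.5, 25),
    (85.5, 96.5, 28),
    (96.5, 107.5, 14),
    (107.5, 118.5, 5),
]

q3 = grouped_quantile(table, 3, 4)      # third quartile
d4 = grouped_quantile(table, 4, 10)     # 4th decile
p70 = grouped_quantile(table, 70, 100)  # 70th percentile

print(round(q3, 2), round(d4, 2), round(p70, 2))  # 95.12 82.42 93.36
```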


Activity No.22:

The following data give the hours worked last week by 30 employees of a company.

42 45 40 38 35 47 40 27 39 43

48 53 23 51 42 48 40 36 51 40

40 34 21 40 31 34 16 39 41 36

a. c

Lesson VII: Probability and Normal Distribution

Learning Objectives:
At the end of the lesson, the students should be able to:


a. Identify the properties of a normal distribution;


b. Determine normal and non-normal distributions;
c. Find the areas under the normal curve; and
d. Apply the properties of the normal distribution to real-world problems.

Learning Focus:
Most datasets have a central value, around which the data are either narrowly or widely spread.
Drawing a bell-shaped curve over a histogram of such data gives the normal distribution, also called
the Gaussian distribution after the mathematician Carl Friedrich Gauss.

Probability is the branch of mathematics that studies the possible outcomes of given events
together with the outcomes' relative likelihoods and distributions. In common usage, the word
"probability" is used to mean the chance that a particular event (or set of events) will occur expressed
on a linear scale from 0 (impossibility) to 1 (certainty), also expressed as a percentage between 0 and
100%. The analysis of events governed by probability is called statistics.
There are several competing interpretations of the actual "meaning" of probabilities.
Frequentists view probability simply as a measure of the frequency of outcomes (the more conventional
interpretation), while Bayesians treat probability more subjectively as a statistical procedure that
endeavors to estimate parameters of an underlying distribution based on the observed distribution.
A properly normalized function that assigns a probability "density" to each possible outcome
within some interval is called a probability density function (or probability distribution function), and its
cumulative value (integral for a continuous distribution or sum for a discrete distribution) is called
a distribution function (or cumulative distribution function).
Put simply, probability is how likely something is to happen. Whenever we’re unsure about the
outcome of an event, we can talk about the probabilities of certain outcomes, that is, how likely they are.
$$\text{Probability of an event} = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}}$$
$$P(A) = \frac{\text{number of ways } A \text{ can happen}}{\text{total number of outcomes}}$$

Examples:
1. The best example for understanding probability is flipping a coin:
a. What is the probability of flipping a HEAD?
Solution:
There are two possible outcomes: heads or tails. Only one of them is a head.
$$P(H) = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}} = \frac{1}{2} = 0.5$$

2. Rolling a Die


A die has six faces, so the total number of possible outcomes is six (6).

a. What is the probability of rolling a One?
$$P(1) = \frac{\text{number of ways it can happen}}{\text{total number of outcomes}} = \frac{1}{6} \approx 0.167$$
b. What is the probability of rolling a One or Six?
$$P(1 \text{ or } 6) = \frac{2}{6} = \frac{1}{3} \approx 0.33$$
c. What is the probability of rolling an even number? (That is, rolling a Two, Four or Six, so
there are 3 favorable outcomes.)
$$P(\text{even number}) = \frac{3}{6} = \frac{1}{2} = 0.5$$
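
The coin and die examples all use the same ratio, so they can be checked with a short Python sketch. The helper name `probability` is made up for this illustration, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Classical probability: favorable outcomes / total outcomes,
    assuming all outcomes are equally likely."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))


coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_head = probability({"H"}, coin)          # flipping a head
p_one = probability({1}, die)              # rolling a One
p_one_or_six = probability({1, 6}, die)    # rolling a One or Six
p_even = probability({2, 4, 6}, die)       # rolling an even number
print(p_head, p_one, p_one_or_six, p_even)  # 1/2 1/6 1/3 1/2
```

Using `Fraction` keeps the answers as exact ratios such as 1/6 instead of rounded decimals.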

The normal distribution is a continuous probability distribution, so it is generally used with
interval or ratio data. A histogram of the data gives a good first picture of the distribution: overlaying
a bell-shaped curve on the histogram helps judge whether the data are approximately normal. A bell-shaped
curve has one central peak, with the rest of the data on either side of the center, tapering off
toward the extremes.
The normal distribution is the most important probability distribution in statistics because it fits
many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores
approximately follow the normal distribution. It is also known as the Gaussian distribution and the bell curve.
The normal distribution is a probability function that describes how the values of a variable are
distributed. It is a symmetric distribution where most of the observations cluster around the central peak
and the probabilities for values further away from the mean taper off equally in both directions. Extreme
values in both tails of the distribution are similarly unlikely.

Standard Normal Distribution. This is a distribution of a normal random variable with mean zero and
standard deviation equal to 1.
Gaussian distribution is another name for a normal distribution.

1. In statistics, the normal distribution is called the normal curve.


2. In the social sciences, it’s called the bell curve (because of its shape).
3. In physics, it’s called the Gaussian distribution.

Properties of a normal distribution


1. The mean, mode and median are all equal.
2. The curve is symmetric at the center (i.e. around the mean).


3. Exactly half of the values are to the left of center and exactly half the values are to the right.
4. The total area under the curve is 1.
5. The tails of the normal curve are asymptotic to the horizontal axis.
6. It is determined by the population mean μ and population standard deviation σ. The mean
controls the center and the standard deviation controls the spread of the distribution.

Parameters of the Normal Distribution


As with any probability distribution, the parameters for the normal distribution define its shape
and probabilities entirely. The normal distribution has two parameters, the mean and standard
deviation. The normal distribution does not have just one form. Instead, the shape changes based on
the parameter values, as shown in the graphs below.

Mean
The mean is the central tendency of the distribution. It defines the location of the peak
for normal distributions. Most values cluster around the mean. On a graph, changing the mean
shifts the entire curve left or right on the X-axis.

Standard deviation
The standard deviation is a measure of variability. It defines the width of the normal
distribution. The standard deviation determines how far away from the mean the values tend to
fall. It represents the typical distance between the observations and the average.

On a graph, changing the standard deviation either tightens or spreads out the width of
the distribution along the X-axis. Larger standard deviations produce distributions that are more
spread out.

When you have narrow distributions, the probabilities are higher that values won’t fall
far from the mean. As you increase the spread of the distribution, the likelihood that
observations will be further away from the mean also increases.


Population parameters versus sample estimates


The mean and standard deviation are parameter values that apply to entire populations. For
the normal distribution, statisticians signify the parameters by using the Greek symbol μ (mu) for
the population mean and σ (sigma) for the population standard deviation.
Unfortunately, population parameters are usually unknown because it’s generally impossible to
measure an entire population. However, you can use random samples to calculate estimates of these
parameters. Statisticians represent sample estimates of these parameters using x̅ for the sample mean
and s for the sample standard deviation.

Probability and Normal Curve


According to Harvey Berman (2019), the normal distribution is a continuous probability
distribution. This has several implications for probability. The probability that a normal random variable
X equals any particular value is 0.
The probability that X is greater than A equals the area under the normal curve bounded by A
and plus infinity (as indicated by the non-shaded area in the figure below).

The probability that X is less than A equals the area under the normal curve bounded by A and
minus infinity (as indicated by the shaded area in the figure above).

The Empirical Rule for the Normal Distribution


When you have normally distributed data, the standard deviation becomes particularly
valuable. You can use it to determine the proportion of the values that fall within a specified number of
standard deviations from the mean. For example, in a normal distribution, 68% of the observations fall
within +/- 1 standard deviation from the mean. This property is part of the Empirical Rule, which


describes the percentage of the data that fall within specific numbers of standard deviations from the
mean for bell-shaped curves.
Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the
following “rule”.

About 68% of the area under the curve falls within 1 standard deviation of the mean.
About 95% of the area under the curve falls within 2 standard deviations of the mean.
About 99.7% of the area under the curve falls within 3 standard deviations of the mean.

Together, these points are known as the empirical rule or the 68−95−99.7 rule. Clearly, given a normal
distribution, nearly all outcomes (about 99.7%) will be within 3 standard deviations of the mean.
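
The three percentages can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module. The helper name `area_within` and the second distribution (mean 100, standard deviation 15) are made up for this illustration.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    # Same result whatever the mean and standard deviation are
    print(k, round(area_within(k), 4), round(area_within(k, mu=100, sigma=15), 4))
```

The areas come out to about 0.6827, 0.9545 and 0.9973 for both distributions, which is why the rule holds for every normal curve.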

It was stated earlier that the normal curve is symmetric about the mean. Because of this, the
area of a z-value is the same whether the z-value is positive or negative. Hence, the area of −z is equal to the
area of +z.

Discrete Probability Functions


Discrete probability functions (Frost,2019) are also known as probability mass functions and can
assume a discrete number of values. For example, coin tosses and counts of events are discrete
functions. These are discrete distributions because there are no in-between values. For example, you
can have only heads or tails in a coin toss. Similarly, if you’re counting the number of books that a


library checks out per hour, you can count 21 or 22 books, but nothing in between.

Continuous probability distributions are also known as probability density functions. You know
that you have a continuous distribution if the variable can assume an infinite number of values between
any two values. Continuous variables are often measurements on a scale, such as height, weight, and
temperature.

Standard Normal Distribution and Standard Scores (Z-scores)


As we’ve seen above, the normal distribution has many different shapes depending on the
parameter values. However, the standard normal distribution is a special case of the normal distribution
where the mean is zero and the standard deviation is 1. This distribution is also known as the Z-
distribution.
A value on the standard normal distribution is known as a standard score or a Z-score. A
standard score represents the number of standard deviations above or below the mean that a specific
observation falls. For example, a standard score of 1.5 indicates that the observation is 1.5 standard
deviations above the mean. On the other hand, a negative score represents a value below the average.
The mean has a Z-score of 0.
Suppose you weigh an apple and it weighs 110 grams. There’s no way to tell from the weight
alone how this apple compares to other apples. However, as you’ll see, after you calculate its Z-score,
you know where it falls relative to other apples.

Standardization: How to Calculate Z-scores

Standard scores are a great way to understand where a specific observation falls relative to the
entire distribution. They also allow you to take observations drawn from normally distributed populations
that have different means and standard deviations and place them on a standard scale. This standard
scale enables you to compare observations that would otherwise be difficult.
This process is called standardization, and it allows you to compare observations and
calculate probabilities across different populations. In other words, it permits you to compare apples to
oranges. Isn’t statistics great!

To standardize your data, you need to convert the raw measurements into Z-scores.

To calculate the standard score for an observation, take the raw measurement, subtract the
mean, and divide by the standard deviation. Mathematically, the formula for that process is the
following:

$$z = \frac{x - \mu}{\sigma}$$
where
z = the Z-score
x = the raw value of the measurement of interest
μ = the population mean
σ = the population standard deviation
Mu and sigma are the parameters of the population from which the observation was drawn.
After you standardize your data, you can place them within the standard normal distribution. In
this manner, standardization allows you to compare different types of observations based on where
each observation falls within its own distribution.


How can we use the standard normal distribution in solving problems on probability?

Suppose we literally want to compare apples to oranges. Specifically, let’s compare their
weights. Imagine that we have an apple that weighs 110 grams and an orange that weighs 100 grams.
If we compare the raw values, it’s easy to see that the apple weighs more than the orange. However,
let’s compare their standard scores. To do this, we’ll need to know the properties of the weight
distributions for apples and oranges. Assume that the weights of apples and oranges follow a normal
distribution with the following parameter values:
                        Apples    Oranges
Mean weight (grams)     100       140
Standard deviation      15        25

Now we’ll calculate the Z-scores:

o Apple = (110-100) / 15 = 0.667


o Orange = (100-140) / 25 = -1.6
The Z-score for the apple (0.667) is positive, which means that our apple weighs more than the
average apple. It’s not an extreme value by any means, but it is above average for apples. On the other
hand, the orange has a fairly negative Z-score (-1.6). It’s pretty far below the mean weight for oranges.
I’ve placed these Z-values in the standard normal distribution below.
While our apple weighs more than our orange, we are comparing a somewhat heavier than
average apple to a downright puny orange! Using Z-scores, we’ve learned how each fruit fits within its
own distribution and how they compare to each other.
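
The apple and orange comparison can be reproduced with a short Python sketch. The helper name `z_score` is made up for this illustration. The last two lines go one step further than the text and estimate what share of each fruit is lighter than ours, assuming the weights really are normally distributed.

```python
from statistics import NormalDist

# Weight distributions from the table above (grams)
apples = NormalDist(mu=100, sigma=15)
oranges = NormalDist(mu=140, sigma=25)


def z_score(x, dist):
    """Standardize a raw value: subtract the mean, divide by the standard deviation."""
    return (x - dist.mean) / dist.stdev


z_apple = z_score(110, apples)
z_orange = z_score(100, oranges)
print(round(z_apple, 3), round(z_orange, 3))   # 0.667 -1.6

# Share of apples lighter than 110 g and of oranges lighter than 100 g
print(round(apples.cdf(110), 2), round(oranges.cdf(100), 2))   # 0.75 0.05
```

Roughly three quarters of apples are lighter than our apple, while only about 5% of oranges are lighter than our orange.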

Examples:
1. Given a normal distribution with mean = 50 and sd = 10, find the probability that X assumes
a value between 45 and 62.
Solution:
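Transform the values x1 = 45 and x2 = 62 to z values.
$$z_1 = \frac{x_1 - \mu}{sd} = \frac{45 - 50}{10} = \frac{-5}{10} = -0.5$$
$$z_2 = \frac{x_2 - \mu}{sd} = \frac{62 - 50}{10} = \frac{12}{10} = 1.2$$
The required probability is the area under the standard normal curve between z = -0.5 and z = 1.2.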

2. Given a normal distribution with mean = 300 and sd = 50, find the probability that X assumes
a value greater than 362.
Solution:


$$z = \frac{x - \mu}{sd} = \frac{362 - 300}{50} = 1.24$$

3. Zig merchandise sells Christmas light bulbs that have a length of life that is normally
distributed with a mean of 800 hours and a standard deviation of 40 hours. Find the probability that a
bulb burns between 778 and 834 hours.
Solution:
$$z_1 = \frac{x_1 - \mu}{sd} = \frac{778 - 800}{40} = -0.55$$
$$z_2 = \frac{x_2 - \mu}{sd} = \frac{834 - 800}{40} = 0.85$$
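
The three examples stop at the z values. As a hedged sketch, the probabilities themselves can be computed with `NormalDist` from Python's standard `statistics` module instead of a printed table. The helper names are made up for this illustration, and the mean of 800 hours in Example 3 is taken from the worked z computation (778 − 800).

```python
from statistics import NormalDist


def prob_between(mean, sd, low, high):
    """P(low < X < high) for a normal variable X."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


def prob_above(mean, sd, x):
    """P(X > x) for a normal variable X."""
    return 1 - NormalDist(mean, sd).cdf(x)


p1 = prob_between(50, 10, 45, 62)      # Example 1: P(-0.5 < z < 1.2)
p2 = prob_above(300, 50, 362)          # Example 2: P(z > 1.24)
p3 = prob_between(800, 40, 778, 834)   # Example 3: P(-0.55 < z < 0.85), mean assumed to be 800
print(round(p1, 3), round(p2, 3), round(p3, 3))
```

Under these assumptions the probabilities are about 0.576, 0.107 and 0.511. Values read from a printed table may differ slightly in the last digit because of rounding.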

Finding Areas under the Curve of a Normal Distribution

The normal distribution is a probability distribution. As with any probability distribution, the
proportion of the area that falls under the curve between two points on a probability distribution plot
indicates the probability that a value will fall within that interval.

Standard Normal Curve Table


The Standard Normal Curve table is used to calculate the precise percentage of scores between
the mean (Z-score of 0) and any other Z-score. It can be used to determine:
• The proportion of scores above or below a particular Z-score
• The proportion of scores between the mean and a particular Z-score
• The proportion of scores between two Z-scores.

The standard normal curve table is used:
• By converting raw scores to Z-scores and then finding the probabilities, and
• To determine a Z-score for a particular proportion of scores under the normal curve.

Calculating percentage of scores above or below a Z-score

Summary of steps for determining the percentage above or below a Z-score:
• Draw a normal curve: indicate where the Z-score falls and shade the area you are trying to find.
• Use the table to find the percentage to the left of the Z-score. If we need to find a percentage to the
right of the Z-score, or between two Z-scores, we will have to do some extra calculations.

There are different versions of the standard normal curve table. In this version, the Z column
contains values of the standard normal distribution; the second column contains the area below Z.
Since the distribution has a mean of 0 and a standard deviation of 1, the Z column is equal to the
number of standard deviations below (or above) the mean.
For example:
A Z-score of 2.5 represents a value 2.5 standard deviations above the mean. The area to the
left of a Z value of 2.5 is 0.9938.


How to use a standard normal curve table?


Once the scores of a distribution have been converted into standard or Z-scores, a normal
distribution table can be used to calculate percentages and probabilities. Since the normal distribution
is a continuous distribution, the probability that x is greater than or less than a particular value can be
found.
A normal curve table gives the precise percentage of scores between the mean (Z-score = 0)
and any other Z-score. The normal curve table can be used to:
• Calculate the proportion of scores above or below a particular Z-score
• Calculate the proportion of scores between the mean and a particular Z-score
• Calculate the proportion of scores between two Z-scores

Example:
The table gives the proportion to the left of a chosen Z-value of up to 2 decimal places. To read
the table, find the Z score in the left column Z. If your score contains 2 decimal places, use the columns
to the right. For example, if you are looking for a Z score of 0.75, you will look at the intersection of 0.7
(Z column) and the column 0.05 (0.7+0.05=0.75).

To express a proportion from the table as a percentage, simply multiply it by 100. Example: 0.7734
would be expressed as 77.34%.

Examples:
Finding the percentage of values to the left of a Z score.
1. In a standard normal distribution, what percentage of values will be less than 1.28?
a. Draw a diagram: you are looking for the percentage of the graph to the left of 1.28.
b. Use the standard normal table to find the value to the left of 1.28.
c. The value is 0.89973, which means that the percentage of values less than 1.28 is 89.97%.


2. Finding the percentage of values to the right of a Z-value.
In a standard normal distribution, what percentage of values will be above 1.28?
a. Draw a diagram: in this example, you are looking for the percentage of values to the right of
1.28.
b. As the table only gives us values to the left of a Z-score, we will use the percentage of
values to the left of 1.28 that we found in the previous example:
• We know that 89.97% of values are below 1.28.
• To calculate the percentage of values above 1.28: 100% − 89.97% = 10.03%


3. Finding the percentage of values between the mean and a particular Z-score.
What percentage of values are between 0 and 1.28?
a. First draw a diagram: in this case, you are looking for values between the mean (0) and 1.28.
b. Since we can’t find areas between two values directly in the standard normal table, we will use the
information we know about the values that are to the left of 1.28:
• 89.97% of values are below 1.28.
• The curve is symmetrical, which means that 50% of values lie above the mean and
50% of values lie below the mean.
• 89.97% − 50% = 39.97%

4. Finding the percentage of values between two Z-scores.


What percentage of values will lie between -1.28 and 1.28?
a. Draw a diagram: in this example, you are looking for the percentage between a negative
and a positive score.
b. The curve is symmetrical. This means that the area between 0 and 1.28 is the same as the
area between 0 and -1.28.
c. The percentage of values between 0 and 1.28 is 39.97% (found in example 3). Thus, we
will have to multiply 39.97% x 2 = 79.94%.
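
Examples 1 to 4 can be cross-checked in Python, using `NormalDist` from the standard `statistics` module in place of the printed table. This is a sketch for checking table work, and the variable names are made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()     # mean 0, standard deviation 1
z = 1.28

below = standard_normal.cdf(z)     # Example 1: area to the left of z
above = 1 - below                  # Example 2: area to the right of z
mean_to_z = below - 0.5            # Example 3: area between the mean and z
between = standard_normal.cdf(z) - standard_normal.cdf(-z)   # Example 4: between -z and z

for label, area in [("below", below), ("above", above),
                    ("mean to z", mean_to_z), ("between", between)]:
    print(f"{label}: {area:.4f}")
```

The first three areas agree with the table values (0.8997, 0.1003 and 0.3997). The last one can differ from the hand computation in the final digit, because doubling the already rounded 39.97% carries the rounding error along.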


5. Finding Z-scores and raw scores from percentages using the normal curve table. The table can
also be used to find the Z-scores and raw scores from specific percentages.
To find the Z-score from the percentage 90%, we look for the closest percentage in the
table: 0.8997. Working backwards, we see that this figure corresponds to a Z-score of 1.28.
This Z-score can then be converted to a raw score using the mean and the standard deviation of
the distribution.
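
Working backwards from a percentage can also be sketched in Python. `NormalDist.inv_cdf` from the standard `statistics` module returns the Z-score for a given area to the left. The helper `raw_score` and the distribution with mean 50 and standard deviation 10 are made up for this illustration. The conversion x = mean + z × sd is the Z-score formula rearranged.

```python
from statistics import NormalDist

# Z-score that has 90% of the area to its left
z_90 = NormalDist().inv_cdf(0.90)
print(round(z_90, 2))   # 1.28


def raw_score(z, mean, sd):
    """Convert a Z-score back to a raw score: x = mean + z * sd."""
    return mean + z * sd


# Hypothetical distribution: mean 50, standard deviation 10
cutoff = raw_score(z_90, mean=50, sd=10)
print(round(cutoff, 1))   # 62.8
```

In this hypothetical distribution, about 90% of the values fall below a raw score of roughly 62.8.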

The notations P(a < z < b), P(z < a) and P(z > a) will be used, and their meanings are as follows:
• P(a < z < b) is read as “the probability or area of z between a and b”.
• P(z < a) is read as “the probability or area of z less than a or to the left of a”.
• P(z > a) is read as “the probability or area of z greater than a or to the right of a”.

Note that the symbols ≤ and ≥ give the same areas as < and >, because the probability that a continuous
variable equals any single value is 0. To find the areas, the Table of Areas under the Normal Curve will be used.

Table of Areas under the Normal Curve


Using the table, which gives the area between the mean and z, the area of z = -0.46 is 0.1772 and the area of z = 0.52 is 0.1985.
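
The two table readings above can be reproduced with a short Python sketch. It assumes that this table lists the area between the mean and z, which is what the two quoted values correspond to. The function name `area_mean_to_z` is made up for this illustration.

```python
from statistics import NormalDist


def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (0) and z.
    By symmetry, the result is the same for -z and +z."""
    return abs(NormalDist().cdf(z) - 0.5)


print(round(area_mean_to_z(-0.46), 4))   # 0.1772
print(round(area_mean_to_z(0.52), 4))    # 0.1985
```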

Learning Activity:


A. Consider the following problems:


1. The time for a major exam to be completed is normally distributed with an average of 55 minutes
and a standard deviation of 9 minutes. If 92% of the students completed the exam, when should
the test be terminated?
2. The average travel time from your residence to your school is 35 minutes with a standard
deviation of 10 minutes. If you want to be 99% certain that you will not be late for your first 8:00
am class, what is the latest time you should leave home? Assume that travel time is normally
distributed.

B. Find the areas of each of the following z-scores:

1. 0.99
2. -0.52
3. 0.66
4. 1.87
5. -2.58
6. 3.16
7. -0.12

C. Calculate the probabilities of the following:

1. P(0 < z < 1.44)
2. P(−2.81 < z < 0)
3. P(z < −0.73)
4. P(z > 2.92)
5. P(−3.10 < z < 1.90)
6. P(1.13 < z < 1.39)

D. Solve for the following:


1. Assume that the time a student stays in school is normally distributed with a mean of 5 hours
and a standard deviation of 0.5 hours. Every day, Ian stays in school for 5.5 hours. What
proportion of students stay less than 5.5 hours?
2. Assume that family income is normally distributed with a mean of Php30,000 and a standard deviation
of Php10,000. If the poverty level is Php10,000, find the percentage of the population that lives in
poverty.
3. In 2018, the braking distance of Toyota Camry cars on a wet surface follows a normal
distribution. Its mean is 122 feet with a standard deviation of 20 feet. What is the probability that
a randomly selected Toyota Camry will have a braking distance of more than 130 feet?


Lesson VIII: Linear Regression and Correlation


Correlation (Diane Keirnan,2014) refers to the statistical association between two variables. A
correlation exists between two variables when one of them is related to the other in some way. A
scatterplot is the best place to start. A scatterplot (or scatter diagram) is a graph of the paired (x,y)
sample data with a horizontal x-axis and a vertical y-axis. Each individual (x,y) pair is plotted as a single
point.

A scatterplot can identify several different types of relationships between two variables.

A relationship has no correlation when the points on a scatterplot do not show any direction or pattern.

A relationship is non-linear when the points on a scatterplot follow a pattern but not a straight line.

A relationship is linear when the points on a scatterplot follow a somewhat straight line pattern. This is
the relationship that we will examine.

Linear relationships can be either positive or negative. Positive relationships have points that incline
upwards to the right. As x values increase, y values increase. As x values decrease, y values decrease.
For example, when studying plants, height typically increases as diameter increases. Negative relationships
have points that decline downwards to the right. As x values increase, y values decrease.

The strength of a linear relationship is measured by a correlation coefficient. The most widely used
measure of correlation is the Pearson Product Moment Correlation Coefficient, or simply Pearson r:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
where
x = the observed data for the independent variable
y = the observed data for the dependent variable
n = the sample size
∑x = the summation of the x values
∑y = the summation of the y values
∑xy = the summation of the products of the paired x and y values
∑x² = the summation of the squares of the x values
∑y² = the summation of the squares of the y values

Examples:


A study was conducted to investigate the relationship existing between the grade in Statistics
and the grade in Computer subject. A random sample of 10 computer students in a certain college were
taken and the data are as follows:
Student A B C D E F G H I J
Statistics 75 83 80 77 89 78 92 86 93 84
Computer 78 87 78 76 92 81 89 89 91 84

Is there a relationship between the performance of the students in Statistics and Computer
subjects?
Student   x     y     xy      x²      y²
A         75    78    5850    5625    6084
B         83    87    7221    6889    7569
C         80    78    6240    6400    6084
D         77    76    5852    5929    5776
E         89    92    8188    7921    8464
F         78    81    6318    6084    6561
G         92    89    8188    8464    7921
H         86    89    7654    7396    7921
I         93    91    8463    8649    8281
J         84    84    7056    7056    7056
n = 10    ∑x = 837    ∑y = 845    ∑xy = 71030    ∑x² = 70413    ∑y² = 71717
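$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
$$r = \frac{10(71030) - (837)(845)}{\sqrt{[10(70413) - (837)^2][10(71717) - (845)^2]}}$$
$$r = \frac{3035}{\sqrt{(3561)(3145)}} = 0.906906226 \approx 0.91$$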
Therefore, there exists a very high positive relationship between the performance of the students in
Statistics and Computer.
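
The computation of Pearson r above can be verified with a short Python sketch that follows the same formula, building the sums ∑x, ∑y, ∑xy, ∑x² and ∑y² from the raw grades. The function and variable names are made up for this illustration.

```python
from math import sqrt

statistics_grades = [75, 83, 80, 77, 89, 78, 92, 86, 93, 84]   # x
computer_grades = [78, 87, 78, 76, 92, 81, 89, 89, 91, 84]     # y


def pearson_r(xs, ys):
    """Pearson r from the raw sums, as in the formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(statistics_grades, computer_grades)
print(round(r, 2))   # 0.91
```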

A simple linear regression model is a mathematical equation that allows us to predict a response for a
given predictor value. It is used in the process of prediction, which means calculating scores of the
criterion variable (y′) on the basis of knowledge of the predictor (x). One example is the prediction
of the job performance of an applicant using information available at the time of his application.

Linear regression can be computed using the equation
$$y' = a + bx$$
which is called the least squares line or the simple regression line, where
a = the y-intercept
b = the slope
x = the predictor variable, and
y′ = the estimate of the mean value of the response variable for any value of the predictor variable.

The y-intercept is the predicted value for the response (y) when x = 0. The slope describes the change
in y for each one unit change in x.
The values of a and b can be obtained by using the following:
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$a = Mn_y - b\,Mn_x$$


where $Mn_y$ = the mean of the y values
$Mn_x$ = the mean of the x values

Example:
Given the following data on the grades in Statistics and Computer, what regression equation
could be used, and what would be the predicted Computer grade of a student who has a grade of 85 in
Statistics?
Student A B C D E F G H I J
Statistics 75 83 80 77 89 78 92 86 93 84
Computer 78 87 78 76 92 81 89 89 91 84

Solution:

Student   x     y     xy      x²      y²
A         75    78    5850    5625    6084
B         83    87    7221    6889    7569
C         80    78    6240    6400    6084
D         77    76    5852    5929    5776
E         89    92    8188    7921    8464
F         78    81    6318    6084    6561
G         92    89    8188    8464    7921
H         86    89    7654    7396    7921
I         93    91    8463    8649    8281
J         84    84    7056    7056    7056
n = 10    ∑x = 837    ∑y = 845    ∑xy = 71030    ∑x² = 70413    ∑y² = 71717

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{10(71030) - (837)(845)}{10(70413) - (837)^2} = \frac{3035}{3561} \approx 0.85$$
$$Mn_x = \frac{\sum x}{n} = \frac{837}{10} = 83.7$$
$$Mn_y = \frac{\sum y}{n} = \frac{845}{10} = 84.5$$
$$a = Mn_y - b\,Mn_x = 84.5 - (0.85)(83.7) \approx 13.36$$
The regression equation is $y' = a + bx = 13.36 + (0.85)x$.
If the grade of a student in Statistics (x) is 85, the predicted Computer grade is:
$$y' = 13.36 + (0.85)(85) = 85.61 \text{ or } 86$$
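
The regression computation can be checked with a short Python sketch that applies the same formulas for b and a. The function and variable names are made up for this illustration. Unlike the hand solution, the sketch does not round b to 0.85 before computing a, so its intercept differs slightly.

```python
statistics_grades = [75, 83, 80, 77, 89, 78, 92, 86, 93, 84]   # x, the predictor
computer_grades = [78, 87, 78, 76, 92, 81, 89, 89, 91, 84]     # y, the response


def least_squares_line(xs, ys):
    """Return (a, b) for the line y' = a + b*x using the formulas above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    a = sum_y / n - b * (sum_x / n)                                # y-intercept
    return a, b


a, b = least_squares_line(statistics_grades, computer_grades)
predicted = a + b * 85   # predicted Computer grade for a Statistics grade of 85
print(round(a, 2), round(b, 4), round(predicted, 2))
```

With the unrounded slope (about 0.8523) the intercept is about 13.16 instead of 13.36, but the predicted grade is still about 85.61, or 86 when rounded to a whole grade.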

Learning Activity:

A. Determine the relationship between family monthly income and the grades of the students. Show
your complete solutions.


Student         A        B        C        D        E        F        G
Family Income   30,000   21,000   45,000   54,000   86,000   34,000   49,000
Grades          1.25     1.75     3.0      2.75     3.0      2.25     2.5
