Module Stat
WEEK 1-2. Random Variables and Probability Distribution
Learning Objectives
1. To illustrate random variables.
2. To classify random variables as discrete or continuous.
3. To find the possible values of a random variable.
4. To differentiate descriptive and inferential statistics.
5. To discuss some applications of statistics.
Discussion
It is also important to know two significant terms in statistics: the parameter and the
statistic. The two terms are analogous, except that a parameter describes a whole population
while a statistic describes a sample drawn from that population.
A statistic is a value computed from a portion of a population. It gives an estimate of how a
certain population might behave based on the sample considered. Think of it like this: if you have only
part of the information, what you have is a statistic. If you are sure about only ¼ of the people in a given
institution or place, then what you know are statistics.
A parameter is constant, since the entire population is surveyed to find it. A statistic, in contrast,
varies with the sample. This is one reason why determining the sample size and choosing the individuals
who make up the sample are crucial: both affect the accuracy of the statistic. For example, if
you want to survey which presidential candidate has the greater chance of winning, and
the sample you use comes entirely from the territory of candidate A, the statistic
you get will tend to favor candidate A. This result may differ greatly from
the real standing when the whole population is considered.
Classification of Data
Qualitative versus Quantitative Data
Data can be classified as either qualitative or quantitative. The two differ in that
qualitative data are a measure of “types” and may be represented by
characteristics, names, or symbols, whereas quantitative data are a measure of “values” or “counts”
and are expressed as numbers. For quantitative data, operations such as addition and averaging make
sense. Basically, qualitative data answer the question “what” while quantitative data answer the question
“how many”.
Example No. 1. Below are the scores of 7 Pharmacy students during their first quiz in Statistics.
Raw Data                        Array Data (arranged in ascending order)
21, 22, 19, 28, 24, 22, 25      19, 21, 22, 22, 24, 25, 28
Example No. 2. The heights, in centimeters, of the varsity basketball players are summarized below:
Raw Data                        Array Data (arranged in ascending order)
170, 155, 156, 190, 168, 174    155, 156, 168, 170, 174, 190
Classification of Variables
In a study, the individuals or subjects are the people or objects to be studied. The variables, on
the other hand, are the characteristics of the individual to be observed or measured.
Example 1. A researcher wants to conduct a study on the performance of male athletes in the
university in their games. Identify the individuals and the variables.
Individuals or Subjects                Variables
All male athletes in the university    Winning and losing records in their games
According to Continuity of Values
1. Continuous Variables – variables that can be expressed in decimals.
Examples: Price of commodities, grades, height
Levels of Measurements
1. Nominal Scale
- Data that consist of names, labels, or categories only
- The data cannot be arranged in an ordering scheme
- Numbers or symbols are used to classify an object or person and identify the group to which they belong.
Examples: Gender (male or female)
Nationality (Filipino, American, Japanese)
2. Ordinal Scale
- Data contain the properties of nominal level
- The data can be arranged in an ordering scheme or ranked
- The difference between the values of the data cannot be determined. The interval is
meaningless
Examples: Ranks in a contest (1st runner up, 2nd runner up, 3rd runner up, etc.)
Military Ranks (General, Colonel, etc.)
Performance ranks (good, better, best)
3. Interval Scale
- Data contain the properties of ordinal level
- Data values can be ranked
- The differences between data values are of known size
- The interval between the values has meaning
- The “zero” does not imply the absence of the characteristic
- Ratios of data values are meaningless
4. Ratio Scale
- Data contain the properties of interval level
- The “zero” indicates the absence of the characteristics under consideration
- The ratio of data values has meaning
Examples: Height in meters, weight in kilograms or pounds
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 1.1
Classification of Data
Identify whether the data are qualitative or quantitative.
______________1. Monthly salary of a government employee
______________2. Course taken by NU freshmen students
______________3. Employee status of a worker
______________4. Student number
______________5. Height of a basketball player in inches
______________6. Color of a person’s eye
______________7. Scores of Pharmacy students in a Statistics quiz
______________8. Size of a family in a certain community
______________9. Price of commodities in peso
______________10. Volume of water in a bottle
Exercise 1.2
Level of Measurements
Determine the level of measurements stated below:
1. Monthly salary of a government employee
2. Course taken by NU freshmen students
3. Employee status of a worker
4. Student number
5. Height of a basketball player in inches
6. Color of a person’s eye
7. Scores of Pharmacy students in a Statistics quiz
8. Size of a family in a certain community
9. Price of commodities in peso
10. Volume of water in a bottle
Discussion:
Sources of Data
Sources of Data can be classified into 2 types. Statistical sources refer to data that are gathered for
some official purposes and incorporate censuses and officially administered surveys. Non-statistical
sources refer to the collection of data for other administrative purposes or for the private sector.
1. Internal Source
When data are collected from reports and records of the organization itself, it is known as the
internal source.
For example, a company publishes its ‘Annual Report’ on Profit and Loss, Total Sales, Loans,
Wages etc.
2. External Source
When data are collected from outside the organization, it is known as the external source.
For example, if a Tour and Travels Company obtains information on ‘Karnataka Tourism’ from
Karnataka Transport Corporation, it would be known as external sources of data.
Types of Data
A) Primary Data
For example the Address of a person taken from the Telephone Directory or Phone number of
a company taken from ‘Just Dial’.
The most commonly used methods are: published literature sources, surveys (email and mail),
interviews (telephone, face-to-face or focus group), observations, documents and records, and
experiments.
1. Literature sources
This involves the collection of data from already published text available in the public domain. Literature
sources can include: textbooks, government or private companies’ reports, newspapers, magazines,
online published papers and articles.
This method of data collection is referred to as secondary data collection. In comparison to primary
data collection, this is inexpensive and not time consuming.
2. Surveys
A survey is another method of gathering information for research purposes. Information is gathered
through questionnaires, mostly based on individual or group experiences regarding a particular
phenomenon.
There are several ways by which this information can be collected. Most notable ways are: web-based
questionnaire and paper-based questionnaire (printed form). The results of this method of data
collection are generally easy to analyze.
3. Interviews
Interview is a qualitative method of data collection whose results are based on intensive engagement
with respondents about a particular study. Usually, interviews are used in order to collect in-depth
responses from the professionals being interviewed.
4. Observations
Controlled observation is when the researcher uses a standardized procedure of observing participants
or the environment. Natural observation is when participants are being observed in their natural
conditions. Participant observation is where the researcher becomes part of the group being studied.
5. Documents and records
This is the process of examining existing documents and records of an organization for tracking
changes over a period of time. Records can be tracked by examining call logs, email logs, databases,
minutes of meetings, staff reports, information logs, etc.
For instance, an organization may want to understand why there are many negative reviews and
complaints from customers about its products or services. In this case, the organization will look into
records of its products or services and recorded interactions of employees with customers.
6. Experiments
Experimental research is a research method in which the causal relationship between two variables is
examined. One of the variables is manipulated, and the other is measured. These two
variables are classified as the independent and dependent variables, respectively.
In experimental research, data are mostly collected based on the cause and effect of the two variables
being studied. This type of research is common among medical researchers, and it uses a quantitative
research approach.
$$n = \frac{N}{1 + N e^2}$$
Where n = sample size
N = population
e = margin of error
Solution: $$n = \frac{N}{1 + N e^2} = \frac{10{,}000}{1 + (10{,}000)(0.10^2)} = 99.01 \approx 99 \text{ students}$$
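The sample-size formula above (often called Slovin's formula) can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_size` is an illustrative choice, not something defined in this module.

```python
def sample_size(population, margin_of_error):
    """Return n = N / (1 + N * e**2), left unrounded."""
    return population / (1 + population * margin_of_error ** 2)


# Population of 10,000 with a 10% margin of error, as in the solution above.
n = sample_size(10_000, 0.10)
print(round(n))  # 99
```

Rounding is left to the caller so that the same function can be reused with whatever rounding rule the instructor prefers.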
Sampling Technique
In the collection of data, the number of samples and the nature of the samples to be chosen are
critical for the study to yield reliable results. In this part of the chapter, we will discuss the different ways
of choosing samples.
1. Probability Sampling
This sampling technique is also called simple random sampling. In this
technique, the samples are picked at random, so the selection is
free of bias. Each member of the population has an equal chance of being picked
as part of the sample. Good examples of this kind of sampling are lotteries and raffles.
$$k = \frac{\text{Population size}}{\text{Sample size}} = \frac{N}{n}$$
Solution: $$k = \frac{N}{n} = \frac{10{,}000}{99} = 101.01 \approx 101$$
Thus, every 101st member of the population was picked.
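The interval computation above can be sketched in Python as follows. The helper names and the numbering of members from 1 to 10,000 are assumptions made for illustration; starting at the kth member is also an assumption, since the text only says that every 101st member is picked.

```python
def sampling_interval(population, sample):
    """Return k = N / n rounded to the nearest whole number."""
    return round(population / sample)


def systematic_sample(members, k):
    """Pick every kth member, starting with the kth one."""
    return members[k - 1::k]


k = sampling_interval(10_000, 99)
members = list(range(1, 10_001))      # hypothetical member numbers 1..10,000
picked = systematic_sample(members, k)
print(k, len(picked), picked[:3])     # 101 99 [101, 202, 303]
```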
Example: In a certain study, 200 samples are taken from a population of
50,000 individuals. The population is divided into strata based on their schools.
Using stratified sampling, we have:
Strata    Distribution of Population    Percentage from the Population    Sample Units Per Stratum
UP        10,000                        20%                               40
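The row above follows from proportional allocation: each stratum receives the same share of the sample that it has of the population. Below is a minimal sketch. Only the UP row comes from the example; the "Other schools" stratum is a hypothetical placeholder for the remaining 40,000 individuals.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Return the number of sample units per stratum (proportional share)."""
    total = sum(strata_sizes.values())
    return {name: sample_size * size / total
            for name, size in strata_sizes.items()}


strata = {
    "UP": 10_000,
    "Other schools": 40_000,   # hypothetical: remainder of the 50,000
}
allocation = proportional_allocation(strata, 200)
print(allocation["UP"])  # 40.0
```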
3. Cluster Sampling
4. Non-random Sampling
In this technique, not all the population has equal chance to be selected. The
selection is influenced by the goal of the researcher. There are three forms of
non-random sampling:
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 2.1
Sampling Data
Solve for what is asked in each question.
1. How many sample units must be obtained from 7,000 employees, if 10% margin of
error is used?
2. Find the precision of a certain study if a sample of 200 cars is used over a population of
15,300 cars.
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 2.2
Sampling Technique
2. Determine the number of samples that can be selected from a population of 17,500 people if
every 301st person is taken as part of the sample.
WEEK 7- 9 LESSONS: PRESENTATION OF DATA
Learning Objectives:
The students should be able to:
Discussion:
These are:
1. Textual Presentation
This is the presentation of data in paragraph form. It does not necessarily
mean that the presentation consists of words only; figures can also be used as part
of the presentation.
2. Tabular Presentation
Example: The table below shows the distribution of male and female students taking
up computer courses at a Manila College. The population is 250 students.
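Gender    Frequency
Male      75
Female    175
$$\text{Percentage (\%)} = \frac{F}{N} \times 100\%$$
$$\text{Male Percentage (\%)} = \frac{75}{250} \times 100\% = 30\%$$
$$\text{Female Percentage (\%)} = \frac{175}{250} \times 100\% = 70\%$$
Gender    Frequency    Percentage (%)
Male      75           30%
Female    175          70%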
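The percentage column can be reproduced with a short Python sketch. The function name `percentages` is illustrative only.

```python
def percentages(frequencies):
    """Return each category's share of the total, in percent (F / N x 100)."""
    total = sum(frequencies.values())
    return {category: f * 100 / total for category, f in frequencies.items()}


shares = percentages({"Male": 75, "Female": 175})
print(shares)  # {'Male': 30.0, 'Female': 70.0}
```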
3. Graphical Presentation
There are different types of graphs such as line graph, bar graph, pictograph, pie
chart, etc.
a. Line Graph. It shows relationship between two or more sets of quantities. In this
technique, the values are plotted using dots which are called “markers” to be
connected together by line segments.
Month: Weight in kg
May: 73
b. Bar Graph
It is the graphical technique in which each value in the data is represented by
rectangular bars. The length of the bars indicates the measure of a certain value
while its width has a fixed size.
Example of Bar Graph.
c. Pictograph
This is a graphical technique that expresses its meaning through its pictorial
resemblance to a physical object. Each object used in pictograph stands for a
corresponding measure.
Example of pictograph.
d. Pie Chart
This is the type of graphical presentation in which a circle (or sometimes a
cylinder) is divided into several partitions with each partition characterizing the
categories of the data.
Example of pie chart.
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 3.1
1. ACMY Publishing has 1235 employees in which 13 are contractual, 515 are probationary, and
the rest are regular employees.
2. In a university in Manila, 350 freshmen are enrolled under the College of Nursing, 600 under
the College of Computer Studies, 475 under the College of Engineering, and 312 under the
College of Education.
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 3.2
Graphical Presentation
1. ACMY Publishing has 1235 employees in which 13 are contractual, 515 are probationary, and
the rest are regular employees.
2. In a university in Manila, 350 freshmen are enrolled under the College of Nursing, 600 under
the College of Computer Studies, 475 under the College of Engineering, and 312 under the
College of Education.
WEEK 10-12 LESSONS: FREQUENCY DISTRIBUTION TABLE
Learning Objectives:
The students should be able to:
Discussion:
Now, let us discuss the step by step procedure in constructing a frequency distribution table.
Example No. 1: The data shown are the scores of 30 students in statistics exam.
Construct a FDT.
47 65 81 65 68 55
56 69 61 75 71 67
61 87 50 74 49 66
49 89 77 75 79 85
68 90 57 63 54 90
Definition: Range is the difference between the highest and the lowest score.
R = Highest Score – Lowest Score
R = 90 – 47 = 43
Definition: Class Interval (CI) is a grouping or category defined by a lower limit and an
upper limit. The ideal number of CIs is between 5 and 15.
Definition: Class Size (i) is the difference between two successive lower class limits. To
get i, divide the range by the number of class intervals (9 in this example):
i = 43 ÷ 9 = 4.78 ≈ 5
Basically, an FDT consists of four fundamental columns as shown in the given table
above. These are the class interval, class boundary, class frequency, and class mark. In the
next part, each of these will be discussed further.
Step No. 5. Make the class intervals. Start with the lowest score until the highest is reached.
The lower limit of the first class interval is the lowest value found in the summary
of data, while the lower limit of the next class interval can be calculated using the formula
below.
However, the upper limit in a given interval can be calculated using the formula
below:
Class    Class Interval
1        47 – 51
2        52 – 56
3        57 – 61
4        62 – 66
5        67 – 71
6        72 – 76
7        77 – 81
8        82 – 86
9        87 – 91
Class    Class Interval    Class Boundary
1        47 – 51           46.5 – 51.5
2        52 – 56           51.5 – 56.5
3        57 – 61           56.5 – 61.5
4        62 – 66           61.5 – 66.5
5        67 – 71           66.5 – 71.5
6        72 – 76           71.5 – 76.5
7        77 – 81           76.5 – 81.5
8        82 – 86           81.5 – 86.5
9        87 – 91           86.5 – 91.5
Step No. 7: Determine the Class Frequency (f).
Class    Class Interval    Class Boundary    Class Frequency (f)
1        47 – 51           46.5 – 51.5       4
2        52 – 56           51.5 – 56.5       3
3        57 – 61           56.5 – 61.5       3
4        62 – 66           61.5 – 66.5       4
5        67 – 71           66.5 – 71.5       5
6        72 – 76           71.5 – 76.5       3
7        77 – 81           76.5 – 81.5       3
8        82 – 86           81.5 – 86.5       1
9        87 – 91           86.5 – 91.5       4
Class    Class Interval    Class Boundary    Class Frequency (f)    Class Mark
1        47 – 51           46.5 – 51.5       4                      49
2        52 – 56           51.5 – 56.5       3                      54
3        57 – 61           56.5 – 61.5       3                      59
4        62 – 66           61.5 – 66.5       4                      64
5        67 – 71           66.5 – 71.5       5                      69
6        72 – 76           71.5 – 76.5       3                      74
7        77 – 81           76.5 – 81.5       3                      79
8        82 – 86           81.5 – 86.5       1                      84
9        87 – 91           86.5 – 91.5       4                      89
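The construction steps for Example No. 1 can be sketched in Python as follows. The function name and the row layout are illustrative assumptions. Rounding the class size up is also an assumption: it matches the rounding of 4.78 to 5 in the example, but the module does not state a general rounding rule.

```python
import math


def frequency_table(data, num_classes):
    """Build FDT rows: class interval, class boundary, frequency, class mark."""
    lowest, highest = min(data), max(data)
    # Class size i = range / number of classes, rounded up (assumption).
    size = max(1, math.ceil((highest - lowest) / num_classes))
    rows = []
    lower = lowest
    while lower <= highest:
        upper = lower + size - 1
        rows.append({
            "interval": (lower, upper),
            "boundary": (lower - 0.5, upper + 0.5),
            "frequency": sum(lower <= x <= upper for x in data),
            "mark": (lower + upper) / 2,
        })
        lower += size
    return rows


scores = [47, 65, 81, 65, 68, 55, 56, 69, 61, 75, 71, 67, 61, 87, 50,
          74, 49, 66, 49, 89, 77, 75, 79, 85, 68, 90, 57, 63, 54, 90]
for row in frequency_table(scores, 9):
    print(row["interval"], row["boundary"], row["frequency"], row["mark"])
```

When the range divides evenly by the number of classes, this sketch produces one more class than requested so that the highest score is still covered.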
Example No. 2. Construct an FDT given the number of people who visited a shop over the past 18 days.
28 31 32 17 25 33
8 54 31 24 15 16
22 51 28 48 49 11
R = 54 – 8 = 46.
i = 46 ÷ 6 = 7.67 ≈ 8
Class Interval Class Boundary Class Frequency Class Mark
1 8 – 15
2 16 – 23
3 24 – 31
4 32 – 39
5 40 – 47
6 48 – 55
Step No. 6: Determine the Class Boundary.
1 8 – 15 7.5 – 15.5
2 16 – 23 15.5 – 23.5
3 24 – 31 23.5 – 31.5
4 32 – 39 31.5 – 39.5
5 40 – 47 39.5 – 47.5
6 48 – 55 47.5 – 55.5
1 8 – 15 7.5 – 15.5 3
2 16 – 23 15.5 – 23.5 3
3 24 – 31 23.5 – 31.5 6
4 32 – 39 31.5 – 39.5 2
5 40 – 47 39.5 – 47.5 0
6 48 – 55 47.5 – 55.5 4
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 4.1
1. In the exams given to the top 40 students, the following scores were obtained:
33 54 59 43 31 51 29 64 55 35
31 31 46 61 35 44 57 29 61 59
65 48 42 33 37 42 57 56 31 49
31 64 63 63 51 45 34 62 32 30
2. Jake listed all the scores that he got in all of his Math exams since the first term and are
summarized as follows:
12 15 22 12 10 38 21 16 24 15 18 22
20 13 22 33 35 21 22 11 15 18 21 27
Learning Objectives:
Discuss:
Mean
In this chapter, we discuss the three fundamental measures of central tendency. A measure of
central tendency is a value that describes the point around which a set of data tends to cluster. The three
measures of central tendency are the mean, median, and mode.
Mean is the average of a set of data and is denoted by the symbol $\bar{X}$. It is equal to the
sum of all the values in the data (Σx) divided by the total number of elements in the data (N), as
summarized by the formula below:
$$\bar{X} = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_{n-1} + x_n}{N} = \frac{\Sigma x}{N}$$
The above formula is applicable only to ungrouped data. Ungrouped data are sets of values
not grouped into class intervals, while grouped data are a summary of values grouped into
a number of class intervals, in which the frequency of values falling in each class interval is counted. In
getting the mean for grouped data, the given table must be completed.
Σf = N =_______ Σfx =_______
After completing the table shown, the given formula can be used to calculate the mean
of grouped data, commonly called the “weighted mean.”
Example No. 1: Find the mean of all the grades of James in his Engineering course if his grades are
as follows:
Mathematical Analysis 90
World Literature 78
Discrete Math 92
Calculus-based Physics 89
Engineering Economy 96
Solution:
N    1     2     3     4     5     Σx
X    90    78    92    89    96    445
$$\bar{X} = \frac{\Sigma x}{N} = \frac{445}{5} = 89$$
Example No. 2: The results of the scores in Mathematics test during the Teacher’s Board
Examination are summarized by the table below. Find the mean of the score of all the examinees.
Class Interval Class Frequency
(f)
10 – 20 5
21 – 31 10
32 – 42 11
43 – 53 7
54 – 64 23
65 – 75 56
76 – 86 6
87 – 97 8
98 – 108 4
Σf = N = ______
Solution:
Class Interval    Class Frequency (f)    Class Mark (x)    fx
10 – 20           5                      15                75
21 – 31           10                     26                260
32 – 42           11                     37                407
43 – 53           7                      48                336
54 – 64           23                     59                1357
65 – 75           56                     70                3920
76 – 86           6                      81                486
87 – 97           8                      92                736
98 – 108          4                      103               412
Σf = N = 130                                               Σfx = 7989
$$\bar{X} = \frac{\Sigma fx}{N} = \frac{7989}{130} = 61.45$$
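Both mean computations in this part can be sketched in Python. The function name `grouped_mean` and the tuple layout `(lower limit, upper limit, frequency)` are illustrative assumptions; `statistics.mean` is part of the Python standard library.

```python
import statistics


def grouped_mean(classes):
    """Mean of grouped data: sum of f * class mark, divided by sum of f."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


# Example No. 1: ungrouped grades.
grades = [90, 78, 92, 89, 96]
print(statistics.mean(grades))           # 89

# Example No. 2: (lower limit, upper limit, frequency) per class interval.
exam = [(10, 20, 5), (21, 31, 10), (32, 42, 11), (43, 53, 7), (54, 64, 23),
        (65, 75, 56), (76, 86, 6), (87, 97, 8), (98, 108, 4)]
print(round(grouped_mean(exam), 2))      # 61.45
```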
Median
Another important measure of central tendency is the median. It is usually denoted by $\tilde{X}$. By
definition, the median is the middle value when all the elements in a set of data are arranged in
ascending order. If the number of elements in a given set of data is odd, then there is exactly one
element in the middle when the data are arrayed in ascending order. However, if the number of
elements is even, there are two elements in the middle, and the median
is equal to the average of these two middle elements.
Example 1. Find the median of the measured height of all the athletes of the YMA University
given below:
Solution:
Step No. 1: Identify the location of the median element using the given formula:
$$M = \frac{n + 1}{2}$$
Where: n = total number of elements in a set of data.
So, $$M = \frac{9 + 1}{2} = 5$$
Therefore, the 5th element is the median element of the given data.
Step No. 3: Take the 5th element in the arrayed data as the median (as in Step No. 1).
$$\tilde{X} = 195$$
Example 2: Find the median of the following raw data:
5 11 3 6 9 11 6 12 16 4 6 10
Solution:
Step No. 1: Identify the location of the median element using the given formula:
$$M = \frac{n + 1}{2}$$
$$M = \frac{12 + 1}{2} = 6.5$$
Therefore, the 6th and 7th elements are the median elements of the given data.
3 4 5 6 6 6 9 10 11 11 12 16
Step No. 3: Take the average of the 6th and 7th elements in the arrayed data as the median
(as in Step No. 1).
$$\tilde{X} = \frac{6 + 9}{2} = 7.5$$
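The location rule M = (n + 1) / 2 can be sketched in Python as follows. The function name `median_by_location` is illustrative; Python's standard `statistics.median` gives the same results.

```python
def median_by_location(values):
    """Median of ungrouped data using the location M = (n + 1) / 2."""
    ordered = sorted(values)                 # array the data in ascending order
    location = (len(ordered) + 1) / 2        # 1-based position of the median
    if location.is_integer():
        return ordered[int(location) - 1]    # odd n: one middle element
    below = ordered[int(location) - 1]       # even n: e.g. the 6th element
    above = ordered[int(location)]           #         and the 7th element
    return (below + above) / 2


raw = [5, 11, 3, 6, 9, 11, 6, 12, 16, 4, 6, 10]
print(median_by_location(raw))  # 7.5
```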
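Arranging the data in ascending order is easy for ungrouped data, but it is
no longer possible for grouped data. The frequency distribution table is therefore needed in the
calculation of the median of grouped data.
$$\tilde{X} = \tilde{X}_{LB} + i\left(\frac{\frac{N}{2} - {<}cfb}{f_m}\right)$$
Where: $\tilde{X}_{LB}$ = Lower class boundary of the median class
i = Class size
<cfb = Less than cumulative frequency before the median class
$f_m$ = Class frequency of the median class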
The median class is the class interval that contains the Mth element with respect to the less than
cumulative frequency. For grouped data, M is calculated by dividing the total frequency by
2, that is, M = N/2. Let us consider the given FDT below:
Class Interval    Class Frequency (f)    < Cumulative Frequency
10 – 20           5                      5
21 – 31           10                     15
32 – 42           11                     26
43 – 53           7                      33
54 – 64           23                     56
65 – 75           55                     111
76 – 86           7                      118
87 – 97           8                      126
98 – 108          4                      130
Σf = N = 130
If we solve for M:
$$M = \frac{N}{2} = \frac{130}{2} = 65$$
Since M is 65, therefore, the median class is the class interval 65 – 75 since it includes
elements 57 to 111 when arranged in ascending order, unlike class interval 54 – 64 which includes
only elements 34 to 56. Now that we know our median class, we can now get the values needed for
the formula of median for grouped data.
54 – 64 23 56
65 – 75 55 111
76 – 86 7 118
1. Lower class boundary of the median class ($\tilde{X}_{LB}$).
Recall: Lower class boundary = Lower class limit – 0.5.
Upper class boundary = Upper class limit + 0.5.
So,
$$\tilde{X}_{LB} = 65 - 0.5 = 64.5$$
2. Class Size
i = Upper class limit – Lower class limit +1
i = 75 – 65 +1 = 11
Therefore,
$$\tilde{X} = \tilde{X}_{LB} + i\left(\frac{\frac{N}{2} - {<}cfb}{f_m}\right) = 64.5 + 11\left(\frac{\frac{130}{2} - 56}{55}\right) = 66.3$$
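The grouped-median computation above can be sketched in Python. The function name and the tuple layout `(lower limit, upper limit, frequency)` are illustrative assumptions; the arithmetic follows the formula and the FDT used in this example.

```python
def grouped_median(classes):
    """Median of grouped data: LB + i * ((N/2 - <cfb) / fm)."""
    n = sum(f for _, _, f in classes)
    half = n / 2                          # M = N / 2
    cum_before = 0                        # <cfb: cumulative frequency so far
    for lower, upper, f in classes:
        if f > 0 and cum_before + f >= half:   # this is the median class
            boundary = lower - 0.5             # lower class boundary
            size = upper - lower + 1           # class size i
            return boundary + size * (half - cum_before) / f
        cum_before += f
    raise ValueError("empty frequency table")


fdt = [(10, 20, 5), (21, 31, 10), (32, 42, 11), (43, 53, 7), (54, 64, 23),
       (65, 75, 55), (76, 86, 7), (87, 97, 8), (98, 108, 4)]
print(round(grouped_median(fdt), 2))  # 66.3
```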
Example 2.
47 – 51 4
52 – 56 3
57 – 61 3
62 – 66 4
67 – 71 5
72 – 76 3
77 – 81 3
82 – 86 1
87 - 91 4
Class Interval    Class Frequency (f)    < Cumulative Frequency
47 – 51           4                      4
52 – 56           3                      7
57 – 61           3                      10
62 – 66           4                      14
67 – 71           5                      19
72 – 76           3                      22
77 – 81           3                      25
82 – 86           1                      26
87 – 91           4                      30
Σf = N = 30
If we solve for M:
$$M = \frac{N}{2} = \frac{30}{2} = 15$$
Since M is 15, the median class is the class interval 67 – 71, which contains the 15th
element. Now that we know the median class, we can get the values needed for the formula of the median for grouped data.
1. Lower class boundary of the median class ($\tilde{X}_{LB}$)
Recall: Lower class boundary = Lower class limit – 0.5
Upper class boundary = Upper class limit + 0.5
So,
$$\tilde{X}_{LB} = 67 - 0.5 = 66.5$$
2. Class Size
i = Upper class limit – Lower class limit + 1
i = 71 – 67 + 1 = 5
Therefore:
$$\tilde{X} = \tilde{X}_{LB} + i\left(\frac{\frac{N}{2} - {<}cfb}{f_m}\right) = 66.5 + 5\left(\frac{\frac{30}{2} - 14}{5}\right) = 67.5$$
Mode
The third measure of central tendency is the mode. The mode is defined as the element in a set of
data that has the highest frequency. In ungrouped data, the mode can be identified simply by
inspection. The mode is denoted by $\hat{X}$.
Example 1. What is the mode of the measured height of all the athletes of the YMA University
given below:
Solution:
Step No. 1: List down all the values and their corresponding frequencies.
Value    Frequency
181      1
211      1
195      2
189      2
200      1
206      1
188      1
N = 9
Step No. 2: Take the value that has the highest frequency.
$$\hat{X} = 195 \text{ and } 189$$
The mode of grouped data cannot be easily determined by just looking at the class
frequency, since the values are grouped per class interval and the elements inside a class interval
cannot be directly determined.
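$$\hat{X} = X_{LB} + i\left(\frac{f_m - f_{mb}}{2f_m - f_{ma} - f_{mb}}\right)$$
Where: $X_{LB}$ = Lower class boundary of the modal class
i = class size
$f_m$ = class frequency of the modal class
$f_{mb}$ = class frequency of the class before the modal class
$f_{ma}$ = class frequency of the class after the modal class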
The modal class is the class interval with the highest class frequency. Let us consider the
given FDT below:
32 – 42           11
43 – 53           7
54 – 64           23
65 – 75           55
76 – 86           7
87 – 97           8
98 – 108          4
Σf = N = 130
Lower class boundary of the modal class ($X_{LB}$):
$$X_{LB} = 65 - 0.5 = 64.5$$
$$\hat{X} = X_{LB} + i\left(\frac{f_m - f_{mb}}{2f_m - f_{ma} - f_{mb}}\right)$$
$$\hat{X} = 64.5 + 11\left(\frac{55 - 23}{2(55) - 7 - 23}\right)$$
$$\hat{X} = 68.9$$
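The grouped-mode computation can be sketched in Python as follows. The function name and tuple layout are illustrative assumptions, and the table used is the full nine-class FDT from the median discussion. Treating a missing neighboring class as frequency 0, and picking the first class when several tie for the highest frequency, are assumptions of this sketch.

```python
def grouped_mode(classes):
    """Mode of grouped data: LB + i * ((fm - fmb) / (2*fm - fma - fmb))."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                        # modal class (first if tied)
    fm = freqs[m]
    fmb = freqs[m - 1] if m > 0 else 0                 # class before the modal class
    fma = freqs[m + 1] if m + 1 < len(freqs) else 0    # class after the modal class
    lower, upper, _ = classes[m]
    boundary = lower - 0.5                             # lower class boundary
    size = upper - lower + 1                           # class size i
    return boundary + size * (fm - fmb) / (2 * fm - fma - fmb)


fdt_mode = [(10, 20, 5), (21, 31, 10), (32, 42, 11), (43, 53, 7), (54, 64, 23),
            (65, 75, 55), (76, 86, 7), (87, 97, 8), (98, 108, 4)]
print(round(grouped_mode(fdt_mode), 2))  # 68.9
```

The denominator is zero when the modal class and both neighbors share the same frequency; the sketch does not handle that case.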
Example No. 1: The data shown are the scores of 30 students in Statistics exam. Find the median
and mode scores of the 30 students if 9 class intervals shall be used in grouping the data.
47 65 81 65 68 55
56 69 61 75 71 67
61 87 50 74 49 66
49 89 77 75 79 85
68 90 57 63 54 90
Solution:
$$\tilde{X} = X_{LB} + i\left(\frac{\frac{N}{2} - {<}cfb}{f_m}\right)$$
$$\tilde{X} = 66.5 + 5\left(\frac{\frac{30}{2} - 14}{5}\right)$$
$$\tilde{X} = 67.5$$
Solving for the Mode
The modal class is 67 – 71 since it has the highest frequency of 5.
X LB = 66.5
i = 71 – 67 + 1 = 5
fm = 5
fma = 3
fmb = 4
$$\hat{X} = 66.5 + 5\left(\frac{5 - 4}{2(5) - 3 - 4}\right)$$
$$\hat{X} = 68.17$$
Quantiles
Unlike the mean, median, and mode, which generally describe the center of a distribution, percentiles,
quartiles, and deciles characterize a specific location in the distribution.
For example, suppose we want to know the score of the student ranked 30th in a class of 100. We cannot
use the median formula, since it gives the student in the middle of the ranking. We need to use another measure
of location, which can be either a percentile or a decile.
Quartiles are three values that split the data into four equal parts. The three quartiles are named
$Q_1$, $Q_2$, and $Q_3$, in which $Q_1$ is the (25%)th of the data, $Q_2$ is the (50%)th of the data or the median of the
data, and $Q_3$ is the (75%)th of the data.
Deciles are nine values that split the data into 10 equal parts. Each decile represents a multiple of
10% of the total data. $D_1$, $D_2$, $D_3$, and so on until $D_9$ represent the (10%)th until the (90%)th of the data.
Percentiles are 99 values that split the data into 100 equal parts. Each percentile represents a
multiple of 1% of the total data. $P_1$, $P_2$, $P_3$, and so on until $P_{99}$ represent the (1%)th until the (99%)th of
the data.
$$L_k = \frac{kN}{q}$$
Where: $L_k$ = location of the kth quantile element of the data.
The formula for getting the value of the kth quantile (whether percentile, quartile, or decile) is
summarized below:
$$q_k = X_{LB} + i\left(\frac{\frac{kN}{q} - {<}cfb}{f_{qk}}\right)$$
Where: $X_{LB}$ = Lower boundary of the kth quantile class
i = class size
q = type of quantile (4 for quartiles, 10 for deciles, 100 for percentiles)
$f_{qk}$ = the class frequency of the kth quantile class
Example: The data shown are the scores of 30 students in Statistics exam. Find the P23 and D7 score of the 30
students if 9 class intervals shall be used in grouping the data.
47 65 81 65 68 55
56 69 61 75 71 67
61 87 50 74 49 66
49 89 77 75 79 85
68 90 57 63 54 90
Solution:
Class Interval    Class Frequency (f)    Less than Cumulative Frequency
47 – 51           4                      4
52 – 56           3                      7
57 – 61           3                      10
62 – 66           4                      14
67 – 71           5                      19
72 – 76           3                      22
77 – 81           3                      25
82 – 86           1                      26
87 – 91           4                      30
Σf = N = 30
$$L_k = \frac{kN}{q}$$
Location of $P_{23}$: $$L_{23} = \frac{23(30)}{100} = 6.9$$
Since $P_{23}$ is located at the 6.9th element, the $P_{23}$ class is 52 – 56.
5. Less than cumulative frequency before the kth quantile class (<cfb). By looking at the table:
<cfb = 4
$$q_k = X_{LB} + i\left(\frac{\frac{kN}{q} - {<}cfb}{f_{qk}}\right)$$
$$P_{23} = 51.5 + 5\left(\frac{\frac{23(30)}{100} - 4}{3}\right)$$
$$P_{23} = 56.33$$
Solving for D7
$$L_k = \frac{kN}{q}$$
Location of $D_7$: $$L_7 = \frac{7(30)}{10} = 21$$
Since $D_7$ is located at the 21st element, the $D_7$ class is 72 – 76.
1. Lower boundary of the kth quantile class ($X_{LB}$)
$$X_{LB} = 72 - 0.5 = 71.5$$
5. Less than cumulative frequency before the k th quantile class (<cfb). By looking at the table:
<cfb = 19
$$q_k = X_{LB} + i\left(\frac{\frac{kN}{q} - {<}cfb}{f_{qk}}\right)$$
$$D_7 = 71.5 + 5\left(\frac{\frac{7(30)}{10} - 19}{3}\right)$$
$$D_7 = 74.83$$
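The quantile formula can be sketched as one Python function that covers percentiles, deciles, and quartiles. The function name and the tuple layout `(lower limit, upper limit, frequency)` are illustrative assumptions.

```python
def grouped_quantile(classes, k, q):
    """kth quantile of grouped data: LB + i * ((k*N/q - <cfb) / f_qk).

    q is 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    location = k * n / q                  # L_k = kN / q
    cum_before = 0                        # <cfb
    for lower, upper, f in classes:
        if f > 0 and cum_before + f >= location:   # kth quantile class
            boundary = lower - 0.5
            size = upper - lower + 1
            return boundary + size * (location - cum_before) / f
        cum_before += f
    raise ValueError("location falls outside the table")


exam_fdt = [(47, 51, 4), (52, 56, 3), (57, 61, 3), (62, 66, 4), (67, 71, 5),
            (72, 76, 3), (77, 81, 3), (82, 86, 1), (87, 91, 4)]
print(round(grouped_quantile(exam_fdt, 23, 100), 2))  # P23 -> 56.33
print(round(grouped_quantile(exam_fdt, 7, 10), 2))    # D7  -> 74.83
```

Calling the same function with `q=4` gives the quartiles, and with `k=2, q=4` it reduces to the grouped median.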
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 5.1
Mean, Median and Mode
Find the mean, median and mode of the following:
1. In the exams given to the top 40 students, the following scores were obtained:
33 54 59 43 31 51 29 64 55 35
31 31 46 61 35 44 57 29 61 59
65 48 42 33 37 42 57 56 31 49
31 64 63 63 51 45 34 62 32 30
2. Jake listed all scores that he got in all of his Math exams since the first term and are
summarized as follows:
12 15 22 12 10 38 21 16 24 15 18 22
20 13 22 33 35 21 22 11 15 18 21 27
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 5.2
Quantiles
Find the quantiles asked in each of the following:
1. In the exams given to the top 40 students, the following scores were obtained:
33 54 59 43 31 51 29 64 55 35
31 31 46 61 35 44 57 29 61 59
65 48 42 33 37 42 57 56 31 49
31 64 63 63 51 45 34 62 32 30
a. P23
b. D9
c. Q1
d. P83
2. Jake listed all scores that he got in all of his Math exams since the first term and are
summarized as follows:
12 15 22 12 10 38 21 16 24 15 18 22
20 13 22 33 35 21 22 11 15 18 21 27
a. D4
b. D3
c. Q2
d. P63
Learning Objectives:
The students should be able to:
1. Discuss the importance of a measure of variation,
2. Calculate and interpret the various measures of variation
3. Define Biostatistics
4. Discuss the Bioinformatics advances in databases, data mining, and biological interpretation.
Discussion:
Range
In the previous chapter, we discussed the different measures that determine the central
tendency of data. In this chapter, we will discuss the different measures that describe how spread out or
scattered a set of data is, of which the range is the simplest. The range is the difference between the highest
and the lowest value in a set of data and is given by the formula below:
R = Maximum Value – Minimum Value
Mean Absolute Deviation
$$MAD = \frac{\Sigma |X - \bar{X}|}{N}$$
Where: X = a value from the data
For grouped data, MAD can be calculated using the formula below:
$$MAD = \frac{\Sigma f|X - \bar{X}|}{N}$$
Where: X = the class mark per class interval
Standard Deviation
Standard deviation is another measure of variation that describes how scattered the data are
with respect to the mean of the given data. It is usually denoted by σ (sigma). Standard deviation is
the most commonly used measure of variation, especially in real-life applications, since it is considered the
most accurate of these measures. The standard deviation of ungrouped data can be calculated using the
formula below:
$$\sigma = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N}}$$
For grouped data, standard deviation can be calculated using the formula below:
$$\sigma = \sqrt{\frac{\Sigma f(X - \bar{X})^2}{N}}$$
Where: X = the class mark per class interval (for grouped data)
X = the individual value (for ungrouped data)
Variance
Variance is closely related to standard deviation. They both measure the spread of data with
respect to the mean. However, variance is equal to the square of the standard deviation, which is why it is
usually denoted by $\sigma^2$. The formula for the variance of ungrouped data is:
$$\sigma^2 = \frac{\Sigma (X - \bar{X})^2}{N}$$
However, for grouped data the formula is as follows:
$$\sigma^2 = \frac{\Sigma f(X - \bar{X})^2}{N}$$
Example No. 1: The data shown are the scores of 30 students in Statistics exam. Find the range,
quartile deviation, mean absolute deviation, standard deviation, and variance of the given data if 9
class intervals shall be used in grouping the data.
47 65 81 65 68 55
56 69 61 75 71 67
61 87 50 74 49 66
49 89 77 75 79 85
68 90 57 63 54 90
Solution:
R = 90 – 47 = 43
47 – 51 4 4
52 – 56 3 7
57 – 61 3 10
62 – 66 4 14
67 – 71 5 19
72 - 76 3 22
77 – 81 3 25
82 – 86 1 26
87 – 91 4 30
Σf = N = 30
Note that:
$$ QD = \frac{Q_3 - Q_1}{2} $$
Therefore, we first need to find Q1 and Q3. Solving for Q1:
$$ \text{Position of } Q_1 = \frac{1(30)}{4} = 7.5 $$
Since Q1 is located at the 7.5th element, the Q1 class is 57 – 61.
4. Class frequency of the kth quantile class (f_qk). By looking at the table:
f_qk = 3
5. Less than cumulative frequency before the kth quantile class (<cfb). By looking at the table:
<cfb = 7
$$ q_k = X_{LB} + i\left(\frac{\frac{kN}{q} - {<}cfb}{f_{q_k}}\right) $$
$$ Q_1 = 56.5 + 5\left(\frac{\frac{1(30)}{4} - 7}{3}\right) $$
$$ Q_1 = 57.33 $$
Solving for Q3:
$$ L_k = \frac{kN}{q} $$
$$ \text{Position of } Q_3 = \frac{3(30)}{4} = 22.5 $$
Since Q3 is located at the 22.5th element, the Q3 class is 77 – 81.
5. Less than cumulative frequency before the kth quantile class (<cfb). By looking at the table:
<cfb = 22
$$ q_k = X_{LB} + i\left(\frac{\frac{kN}{q} - {<}cfb}{f_{q_k}}\right) $$
$$ Q_3 = 76.5 + 5\left(\frac{\frac{3(30)}{4} - 22}{3}\right) $$
$$ Q_3 = 77.33 $$
So,
$$ QD = \frac{Q_3 - Q_1}{2} $$
$$ QD = \frac{77.33 - 57.33}{2} $$
$$ QD = 10 $$
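The quartile computation can be sketched as a small Python function. It is an illustration, not a definitive implementation: the function name `grouped_quantile` is hypothetical, and the lower class boundary is assumed to be the lower class limit minus 0.5, which holds for whole-number scores such as these.

```python
def grouped_quantile(k, q, table, class_size):
    """Interpolated k-th q-quantile of grouped data.

    table is a list of (lower class limit, frequency) pairs in ascending order.
    """
    n = sum(f for _, f in table)
    position = k * n / q                  # location kN/q of the quantile
    cf_before = 0                         # <cfb: cumulative frequency before the class
    for lower_limit, f in table:
        if cf_before + f >= position:     # the quantile falls in this class
            lower_boundary = lower_limit - 0.5
            return lower_boundary + class_size * (position - cf_before) / f
        cf_before += f
    raise ValueError("position lies beyond the last class")


# Frequency table from the example: (lower class limit, frequency).
table = [(47, 4), (52, 3), (57, 3), (62, 4), (67, 5),
         (72, 3), (77, 3), (82, 1), (87, 4)]

q1 = grouped_quantile(1, 4, table, class_size=5)
q3 = grouped_quantile(3, 4, table, class_size=5)
qd = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(qd, 2))
```

Because the function takes k and q as parameters, the same sketch also covers deciles (q = 10) and percentiles (q = 100).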
Solving for the Mean Absolute Deviation
Class Interval    f     X     fX
47 – 51           4    49    196
52 – 56           3    54    162
57 – 61           3    59    177
62 – 66           4    64    256
67 – 71           5    69    345
72 – 76           3    74    222
77 – 81           3    79    237
82 – 86           1    84     84
87 – 91           4    89    356
Σf = N = 30                  ΣfX = 2035
Therefore, the mean is:
$$ \bar{X} = \frac{\sum fX}{N} = \frac{2035}{30} = 67.83 $$
Class Interval    f     X    |X − X̄|    f|X − X̄|
47 – 51           4    49     18.83      75.32
52 – 56           3    54     13.83      41.49
57 – 61           3    59      8.83      26.49
62 – 66           4    64      3.83      15.32
67 – 71           5    69      1.17       5.85
72 – 76           3    74      6.17      18.51
77 – 81           3    79     11.17      33.51
82 – 86           1    84     16.17      16.17
87 – 91           4    89     21.17      84.68
Σf = N = 30                              Σf|X − X̄| = 317.34
$$ MAD = \frac{\sum f\,|X - \bar{X}|}{N} $$
$$ MAD = \frac{317.34}{30} $$
$$ MAD = 10.58 $$
Solving for the Standard Deviation
Using the table constructed while solving for the mean absolute deviation:
$$ \sigma = \sqrt{\frac{\sum f\,(X - \bar{X})^2}{N}} $$
$$ \sigma = \sqrt{\frac{4834.17}{30}} $$
$$ \sigma = 12.69 $$
Solving for the Variance
$$ \sigma^2 = \frac{\sum f\,(X - \bar{X})^2}{N} $$
$$ \sigma^2 = \frac{4834.17}{30} $$
$$ \sigma^2 = 161.14 $$
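As a cross-check on the hand computation, the grouped mean, mean absolute deviation, variance, and standard deviation can be recomputed in Python from the class marks and frequencies of the example. This sketch keeps full precision in the intermediate steps, so its output may differ slightly in the last digit from a computation that rounds the mean first. The variable names are illustrative.

```python
import math

# Class marks and frequencies from the example's frequency table.
class_marks = [49, 54, 59, 64, 69, 74, 79, 84, 89]
frequencies = [4, 3, 3, 4, 5, 3, 3, 1, 4]

n = sum(frequencies)
sum_fx = sum(f * x for x, f in zip(class_marks, frequencies))
mean = sum_fx / n

# Sum of f * |X - mean| and sum of f * (X - mean)^2.
sum_abs = sum(f * abs(x - mean) for x, f in zip(class_marks, frequencies))
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, frequencies))

mad = sum_abs / n
variance = sum_sq / n
sd = math.sqrt(variance)

print(f"N = {n}, sum of fX = {sum_fx}, mean = {mean:.2f}")
print(f"MAD = {mad:.2f}")
print(f"Standard deviation = {sd:.2f}")
print(f"Variance = {variance:.2f}")
```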
Biostatistics
Biostatistics is the development and application of statistical methods to a wide range of topics
in biology. It encompasses the design of biological experiments, the collection and analysis of data
from those experiments, and the interpretation of the results.
Statistics in Social Work
An Introduction to Practical Applications
Understanding statistical concepts is essential for social work professionals. It is key to
understanding research and reaching evidence-based decisions in your own practice, but that is only
the beginning. If you understand statistics, you can determine the best interventions for your clients.
You can use new tools to monitor and evaluate the progress of your client or team. You can recognize
biased systems masked by complex models and the appearance of scientific neutrality. For social
workers, statistics are not just math; they are a critical practice tool.
This concise and approachable introduction to statistics limits its coverage to the concepts
most relevant to social workers. Statistics in Social Work guides students through concepts and
procedures from descriptive statistics and correlation to hypothesis testing and inferential statistics.
Besides presenting key concepts, it focuses on real-world examples that students will encounter in
social work practice. Using concrete illustrations from a variety of potential concentrations and
populations, Amy Batchelor creates clear connections between theory and practice and
demonstrates the important contributions statistics can make to evidence-based and rigorous social
work practice.
“This is an excellent introduction to statistics for both students and practitioners in social work – it
demystifies terms and procedures and uses real-world examples to help the reader to see the
everyday applicability of statistical knowledge, whether in practice or in study.”
- John Devaney, coauthor of Quantitative Research Methods for Social Work
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
Exercise 6
Measure of Deviation
Find the measures of variation (Range, QD, MAD, SD, and Variance) of the following:
1. In the exams given to the top 40 students, the following scores were obtained:
33 54 59 43 31 51 29 64 55 35
31 31 46 61 35 44 57 29 61 59
65 48 42 33 37 42 57 56 31 49
31 64 63 63 51 45 34 62 32 30
3. Jake listed all the scores he got in his Math exams since the first term. They are
summarized as follows:
12 15 22 12 10 38 21 16 24 15 18 22
20 13 22 33 35 21 22 11 15 18 21 27
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
I. Learning Insight
Name:__________________________________________________Score:____________
Name of Teacher:_________________________________________Date:_____________
Course/Year/Section:______________________________________
II. Reflection