Part1 141104090445 Conversion Gate01
INTRODUCTION TO
STATISTICS
CHAPTER I INTRODUCTION TO STATISTICS
Nature
The word statistics is derived from the Latin word status, which means “state”. Statistics
refers both to the actual numbers derived from data and to a method of analysis.
Definition
Statistics is a branch of mathematics concerned with the methods for
collecting, organizing, presenting, analyzing, and interpreting quantitative data to aid in
drawing valid conclusions and in decision making.
TYPES OF STATISTICS
Descriptive statistics
A type of statistics that focuses on collecting, summarizing, and presenting a
mass of data so as to yield meaningful information.
Examples:
1. A math teacher wants to determine the percentage of students who passed the
examination.
2. Lula, a bowler, wants to find her bowling average for the past 10 games.
Inferential statistics
A type of statistics that deals with making generalizations and analyzing sample
data to draw conclusions about a population. It is the process of obtaining
information about a large group from the study of a smaller group.
Examples:
1. Kim, a basketball player, wants to estimate his chance of winning the MVP
award based on his current season averages and the averages of his opponents.
2. A manager would like to predict, based on the previous year’s sales, the sales
performance of the company for the next six years.
1. Population consists of all elements, such as events, objects, and individuals, whose
characteristics are being studied.
Example:
The researcher would like to determine the number of male BSE students in
CvSU-Imus campus.
2. Variable is a characteristic of an item or individual that will be analyzed using statistics.
Variables are usually denoted by any capital letter.
4. Parameter - a numerical measure that describes a characteristic of a population.
5. Statistic - a numerical measure that describes a characteristic of a sample.
TYPES OF VARIABLE
Qualitative variable - a variable that cannot be measured numerically but can be classified
into different categories.
Examples: names, gender, hair color, subjects enrolled in a semester, species.
Discrete quantitative variable – a variable that results from either a finite number of possible
values or a countable number of possible values. In other words, a discrete variable can
assume only certain values, with no intermediate values.
Examples: number of patients, number of cars sold, number of books
LEVELS OF MEASUREMENT
The level of measurement of data determines the algebraic operations that can be
performed and the statistical tools that can be applied to the data set.
LEVEL 1. Nominal is characterized by data that consist of names, labels, or categories
only.
Examples: gender, marital status, employment, religion, address, degree program
LEVEL 2. Ordinal involves data that may be arranged in some order, but differences
between data values either cannot be determined or are meaningless.
The data measured can be ordered or ranked.
Examples: grades of the students, military rank, job position, year level
LEVEL 3. Interval has precise differences between measures, but there is no
true zero.
Examples: temperature, IQ score
LEVEL 4. Ratio is the interval level modified to include an inherent zero starting point.
For values at this level, both differences and ratios are meaningful.
Examples: weekly allowance, area, and volume
Some Commonly Used Symbols in Statistics
∑ capital Greek letter sigma denotes summation (the sum of)
f small letter f denotes frequency
F capital letter F denotes cumulative frequency
n small letter n denotes sample size
i small letter i denotes interval
N capital letter N denotes population size
X capital letter X denotes the independent variable
Y capital letter Y denotes the dependent variable
𝑥̅ x-bar denotes the mean of the sample
µ small Greek letter mu denotes the population mean
CHAPTER II PRESENTATION AND GRAPHS
The mass or bulk of data collected from a population or sample, based on
observations from a primary or secondary source, is still considered raw data.
Raw data are data recorded in the sequence in which they are collected,
before they are processed or ranked. Decisions cannot be derived easily if the
results are not yet organized, because raw data do not give a clear idea or presentation of
what has been gathered.
Graphical presentation
- It is an interesting and effective means of organizing and presenting
statistical data. Graphs tell a story with visuals rather than with words or numbers and can
help readers or students understand and interpret the findings of the data collected.
Kinds of graphs commonly used in data presentation:
Example
Line Graph
– It shows the relationship between two sets of quantities. It is drawn by plotting
the values of x along the horizontal axis and the values of y along the vertical axis. Line
segments then connect the plotted points.
Example
This line graph shows the midday temperature over a period of 7 days.
You can see at a glance that the temperature was at its highest on Monday and that it
started to fall in the middle of the week before rising again at the end of the week.
Bar Graph
It compares the numerical values of a given item over a period. It is composed of
bars, rectangles, or rectangular prisms of equal width. It can be drawn vertically or
horizontally, as a single or paired bar graph, and its scale begins at zero.
For example
CHAPTER III FREQUENCY DISTRIBUTION
UNGROUPED DATA
It lists all the data values in tabular form and tells how many times each value occurred.
Example 1 :
Consider the following data obtained when a die was tossed 40 times.
2 5 4 6 3 6 1 6
1 2 5 4 6 3 2 1
5 6 6 1 2 4 4 4
4 2 1 3 5 6 4 6
5 2 3 6 3 2 3 1
Construct a frequency table
Solution :
Create a three-column table. In the first column, put the numbers on the die. In the
second column, put the tally of the times each number appeared. In the last column, put the
frequency with which each value occurred.
To get the range of the data we have the highest value and the lowest value of the dice, 6
and 1 respectively. Thus,
R = Highest Value – Lowest Value
R=6–1
R=5
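The tally and the range can also be checked by machine. The following is a minimal Python sketch, assuming the 40 tosses listed in Example 1; the variable names (tosses, frequency, data_range) are illustrative only.

```python
from collections import Counter

# The 40 die tosses from Example 1, entered row by row.
tosses = [
    2, 5, 4, 6, 3, 6, 1, 6,
    1, 2, 5, 4, 6, 3, 2, 1,
    5, 6, 6, 1, 2, 4, 4, 4,
    4, 2, 1, 3, 5, 6, 4, 6,
    5, 2, 3, 6, 3, 2, 3, 1,
]

# Count how many times each face appeared.
frequency = Counter(tosses)
for face in sorted(frequency):
    print(face, frequency[face])

# Range = highest value - lowest value.
data_range = max(tosses) - min(tosses)
print("Range:", data_range)
```

The printed frequencies should add up to 40, the number of tosses.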
GROUPED DATA
Data that are organized and arranged in the form of a frequency distribution.
Example 2:
Ms. Hautea teaches math to 70 students. She has a list of the number of days each
student was absent in her class for the entire year.
13 8 5 6 3 4 1 1 2 9
2 9 10 11 18 20 16 15 17 9
4 2 1 8 7 7 2 19 11 10
5 8 6 4 7 19 22 5 10 13
14 11 14 14 2 3 8 9 9 11
26 5 15 12 1 6 5 9 6 15
2 5 6 8 7 7 6 6 1 1
Construct a frequency table.
Solution:
STEP 1: Look for the range of the data.
Range = Highest Value – Lowest Value
R = 26 – 1
R = 25
Class limits are the smallest and the largest values that can be placed in a given class.
The lower limit is the lowest possible value in the class, while the upper limit is the
highest possible value in the class. Here 1, 6, 11, 16, 21, and 26 are the lower limits,
while 5, 10, 15, 20, 25, and 30 are the upper limits.
CLASS
1–5
6 – 10
11 – 15
16 – 20
21 – 25
26 – 30
Number of Absences   Frequency   Class Boundaries   Class Mark
( CLASS )                        LTCB - UTCB        (x)
1 – 5                23          0.5 - 5.5          3
6 – 10               25          5.5 - 10.5         8
11 – 15              14          10.5 - 15.5        13
16 – 20              6           15.5 - 20.5        18
21 – 25              1           20.5 - 25.5        23
26 – 30              1           25.5 - 30.5        28
SUM                  70
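The class boundaries and class marks in the table follow directly from the class limits. Here is a small Python sketch under one stated assumption: the data are whole numbers, so each boundary lies 0.5 beyond the corresponding limit. The names used are illustrative.

```python
# Class limits (lower, upper) taken from the table above.
class_limits = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]

# Assumption: whole-number data, so boundaries sit 0.5 outside the limits.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in class_limits]

# Class mark = midpoint of the lower and upper class limits.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

for limits, bounds, mark in zip(class_limits, boundaries, class_marks):
    print(limits, bounds, mark)
```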
STEP 6: Cumulative Frequency Distribution
Relative Frequency = Frequency (number of observations or occurrences) / Total Frequency
Number of Absences   Frequency   Less than Cumulative   Relative    Percentage
( CLASS )                        Frequency ( <CF )      Frequency   Frequency
1 – 5                23          23                     0.33        33%
6 – 10               25          48                     0.36        36%
11 – 15              14          62                     0.2         20%
16 – 20              6           68                     0.09        9%
21 – 25              1           69                     0.01        1%
26 – 30              1           70                     0.01        1%
SUM                  70                                             100%
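The less than cumulative frequency is a running total of the frequencies, and each relative value is a frequency divided by the total. A minimal Python sketch, assuming the six class frequencies from the table above (variable names are illustrative):

```python
from itertools import accumulate

# Frequencies of the six classes, in the order shown in the table.
frequencies = [23, 25, 14, 6, 1, 1]
total = sum(frequencies)

# Less than cumulative frequency: running total of the frequencies.
less_than_cf = list(accumulate(frequencies))

# Relative frequency (rounded to 2 places) and its percentage form.
relative = [round(f / total, 2) for f in frequencies]
percentage = [round(100 * f / total) for f in frequencies]

print(less_than_cf)
print(relative)
print(percentage)
```

The last cumulative value should equal the total number of students, 70.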
CHAPTER IV MEASURES OF CENTRAL TENDENCY
Summation Notation
The following are the weights of 5 people (in kilograms):
40, 45, 30, 50, 55
We can use symbols such as X1 = 40, X2 = 45, and so on, where X stands for the weight and
the subscript for the person it represents. We can write the sum of the weights as:
X1 + X2 + X3 + X4 + X5
We can use the summation symbol Σ to write the sum:
\[ \sum_{i=1}^{5} X_i \]
where i = 1 indicates the subscript of the first term in the summation and the number on top,
5, indicates the subscript of the last term of the summation.
Example 1:
If X1 = 4, X2 = 5, X3 = 9, X4 = 6, X5 = 7, find
\[ \sum_{i=1}^{5} X_i \]
Solution:
\[ \sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5 \]
\[ = 4 + 5 + 9 + 6 + 7 \]
\[ = 31 \]
Example 2:
If X1 = 13, X2 = 15, X3 = 19, find
\[ \sum_{i=1}^{3} (X_i - 1) \]
Solution:
\[ \sum_{i=1}^{3} (X_i - 1) = (X_1 - 1) + (X_2 - 1) + (X_3 - 1) \]
\[ = (13 - 1) + (15 - 1) + (19 - 1) \]
\[ = 12 + 14 + 18 \]
\[ = 44 \]
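Both summation examples translate directly into code, since a summation is a loop that adds one term per subscript. A short Python sketch using the values from Examples 1 and 2 (the list names are illustrative):

```python
# Example 1: X1..X5 = 4, 5, 9, 6, 7
x_example_1 = [4, 5, 9, 6, 7]
total_1 = sum(x_example_1)

# Example 2: X1..X3 = 13, 15, 19, and each term is (Xi - 1)
x_example_2 = [13, 15, 19]
total_2 = sum(xi - 1 for xi in x_example_2)

print(total_1, total_2)
```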
Rules on Summation Notation
Rule No. 1: The summation notation is distributive over addition.
\[ \sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i \]
Rule No. 2: If c is a constant, then
\[ \sum_{i=1}^{n} cX_i = c \sum_{i=1}^{n} X_i \]
Rule No. 3: If c is a constant, then
\[ \sum_{i=1}^{n} c = nc \]
Examples:
Use the rules on summation to write out the expansion of the given expression.
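1.
\[ \sum_{i=1}^{3} (2X_i + 3) = \sum_{i=1}^{3} 2X_i + \sum_{i=1}^{3} 3 \quad \text{(Rule No. 1)} \]
\[ = \sum_{i=1}^{3} 2X_i + 3(3) \quad \text{(Rule No. 3)} \]
\[ = \sum_{i=1}^{3} 2X_i + 9 \]
\[ = 2(X_1 + X_2 + X_3) + 9 \quad \text{(Rule No. 2)} \]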
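The three rules can be spot-checked numerically. The sketch below uses made-up values for X, Y, and c (they do not come from the text) and compares both sides of each rule, then checks the expansion of the expression in item 1.

```python
# Hypothetical values chosen only for illustration.
x = [2, 5, 7]
y = [1, 4, 6]
c = 3
n = len(x)

# Rule 1: the sum of (Xi + Yi) equals the sum of Xi plus the sum of Yi.
rule_1 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

# Rule 2: a constant factor can be moved outside the summation.
rule_2 = sum(c * xi for xi in x) == c * sum(x)

# Rule 3: summing a constant n times gives n * c.
rule_3 = sum(c for _ in range(n)) == n * c

# Item 1: the sum of (2Xi + 3) for i = 1..3 equals 2(X1 + X2 + X3) + 9.
expanded = sum(2 * xi + 3 for xi in x)
simplified = 2 * sum(x) + 9

print(rule_1, rule_2, rule_3, expanded == simplified)
```

A numeric check of this kind illustrates the rules for one set of values; it is not a proof.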
A. Mean
1. Ungrouped Data
Formula:
\[ \bar{x} = \frac{\sum_{i=1}^{n} X_i}{n} \]
where:
\bar{x} = sample mean
n = total number of items in the sample
X_i = the ith observed value
Σ = summation notation
Example 1:
Find the mean score of 7 students whose quiz scores are 85, 90, 87, 83, 80, 75, and 85.
Solution:
\[ \bar{x} = \frac{85 + 90 + 87 + 83 + 80 + 75 + 85}{7} = \frac{585}{7} = 83.57 \]
Example 2:
What is the mean age of a group of 5 students whose ages are 12, 15, 19, 20, and 18?
Solution:
\[ \bar{x} = \frac{12 + 15 + 19 + 20 + 18}{5} = \frac{84}{5} = 16.8 \]
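The same two means can be computed in a few lines of Python. This is a sketch using the scores and ages from Examples 1 and 2; statistics.mean is part of the Python standard library.

```python
from statistics import mean

quiz_scores = [85, 90, 87, 83, 80, 75, 85]
ages = [12, 15, 19, 20, 18]

# Mean = sum of the observed values divided by the number of values.
quiz_mean = sum(quiz_scores) / len(quiz_scores)

# The standard library gives the same result.
age_mean = mean(ages)

print(round(quiz_mean, 2), age_mean)
```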
2. Grouped Data
Formula:
\[ \bar{x} = \frac{\sum f_i X_i}{n} \]
where: n = total number of observations
X_i = the class mark of the ith class interval
f_i = the frequency of the ith class interval
Example 1:
Find the mean score of 50 students in periodic exam in Math.
Class Interval   Frequency of Scores (f)   Class Mark (x)   fx
95-99            3                         97               291
90-94            10                        92               920
85-89            15                        87               1305
80-84            10                        82               820
75-79            5                         77               385
70-74            5                         72               360
65-69            2                         67               134
                 n=50                                       Σfx=4215
Solution:
We can find the class mark of each class by adding its lower and upper class limits and
dividing by two. Each fx value is the frequency multiplied by the class mark. We can now
find Σfx by adding all of the fx values. Then, substitute into the formula:
\[ \bar{x} = \frac{\sum f_i X_i}{n} = \frac{4215}{50} = 84.3 \]
Thus, the mean score of the 50 students in the periodic exam is 84.3.
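The grouped-data mean can be reproduced step by step in Python. The sketch below assumes the class limits and frequencies from the table in this example; the variable names are illustrative.

```python
# Class limits (lower, upper) and frequencies from the table.
class_limits = [(95, 99), (90, 94), (85, 89), (80, 84),
                (75, 79), (70, 74), (65, 69)]
frequencies = [3, 10, 15, 10, 5, 5, 2]

# Class mark = (lower limit + upper limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# n is the total frequency; sum_fx adds frequency times class mark.
n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))

grouped_mean = sum_fx / n
print(n, sum_fx, grouped_mean)
```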
B. Median
- the middlemost number, which divides the values of the observations into halves
- it is referred to as the counting average
1. Ungrouped Data
To find the median, we must arrange the data in increasing or decreasing order of
magnitude. Then, take the middle value if the number of observations is odd, or the
arithmetic average of the two middle values if it is even.
Example 1:
Find the median of the following test scores: 25, 16, 12, 23, and 18
Solution:
Arrange the data: 12, 16, 18, 23,25
There are 5 scores, so the middle score is the 3rd one. Thus, the median of the set of
scores is 18.
Example 2
Find the median of the following set of scores in Math:
4, 6, 9, 7, 3, 10
Solution:
Arrange the data: 3, 4, 6, 7, 9, 10
The middle scores are 6 and 7, so we use their arithmetic average to find the median. Thus,
\[ \frac{6 + 7}{2} = 6.5 \]
The median is 6.5.
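The procedure for ungrouped data (sort, then take the middle value or the average of the two middle values) can be sketched in Python as follows. The function name median is illustrative; the standard library also provides statistics.median.

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)          # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([25, 16, 12, 23, 18]))   # scores from Example 1
print(median([4, 6, 9, 7, 3, 10]))    # scores from Example 2
```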
2. Grouped Data
Formula:
\[ Md = L_l + \left( \frac{\frac{n}{2} - cf_b}{f} \right) ci \]
n=60
Solution:
To obtain the median class:
First, solve for n/2 = 60/2 = 30, so we look for the 30th item.
Then, locate the class whose less than cumulative frequency (<cf) is the first to equal
or exceed 30.
The median class is the 75-79 class interval. We can now substitute into the formula.
\[ Md = L_l + \left( \frac{\frac{n}{2} - cf_b}{f} \right) ci \]
\[ = 74.5 + \left( \frac{\frac{60}{2} - 22}{22} \right) 5 \]
\[ = 74.5 + \left( \frac{30 - 22}{22} \right) 5 \]
\[ = 74.5 + 1.82 \]
\[ Md = 76.32 \]
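The substitution can be checked with a small helper. This is a sketch, not a general-purpose routine: the function and parameter names are assumptions, and the arguments are read off the worked solution (lower boundary 74.5, n = 60, cumulative frequency before the median class 22, median class frequency 22, class size 5).

```python
def grouped_median(lower_boundary, n, cf_before, f, class_size):
    """Median of grouped data: Ll + ((n/2 - cfb) / f) * ci."""
    return lower_boundary + ((n / 2 - cf_before) / f) * class_size

md = grouped_median(lower_boundary=74.5, n=60, cf_before=22, f=22, class_size=5)
print(round(md, 2))
```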
C. Mode
-is the number that occurs most frequently
It is possible to have more than one mode. If there are two modes, it is
called bimodal. If three, it is called trimodal.
1. Ungrouped Data
Example 1:
Find the mode of the following set of items. 3, 7, 9, 8, 7, and 2
Solution:
The number that occurs most frequently is 7. Among the 6 numbers, 7 occurs twice, so it is
the mode. The set is unimodal.
Example 2:
Determine the mode of the following: 15, 20, 10, 20, 10, and 25
Solution:
The numbers that occur most frequently are 10 and 20. The set is bimodal.
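Because a data set may have more than one mode, a helper that returns every most-frequent value is convenient. A minimal Python sketch (the function name modes is illustrative):

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, sorted."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, count in counts.items() if count == highest)

print(modes([3, 7, 9, 8, 7, 2]))         # Example 1: one mode
print(modes([15, 20, 10, 20, 10, 25]))   # Example 2: two modes
```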
2. Grouped Data
\[ \text{Mode} = \hat{x} = X_{lb} + \left( \frac{d_1}{d_1 + d_2} \right) c \]
Example 1:
Classes   Frequency (f)   Less than cumulative frequency (<cf)
1-8       4               4
9-16      5               9
17-24     12              21
25-32     15              36
33-40     9               45
41-48     10              55
          n=55
Solution:
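The modal class is 25-32, since it is the class interval with the highest frequency. We can
substitute into the formula:
\[ \hat{x} = X_{lb} + \left( \frac{d_1}{d_1 + d_2} \right) c \]
\[ = 24.5 + \left( \frac{15 - 12}{(15 - 12) + (15 - 9)} \right) 8 \]
\[ = 24.5 + \left( \frac{3}{3 + 6} \right) 8 \]
\[ = 24.5 + 2.67 \]
\[ = 27.17 \]
Thus, the mode is 27.17.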
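The grouped-data mode can be checked the same way. In this sketch the function and parameter names are assumptions; as in the worked solution, d1 is taken as the modal class frequency minus the frequency of the class before it, and d2 as the modal class frequency minus the frequency of the class after it.

```python
def grouped_mode(lower_boundary, f_modal, f_before, f_after, class_size):
    """Mode of grouped data: Xlb + (d1 / (d1 + d2)) * c."""
    d1 = f_modal - f_before   # difference with the preceding class
    d2 = f_modal - f_after    # difference with the following class
    return lower_boundary + (d1 / (d1 + d2)) * class_size

# Modal class 25-32: lower boundary 24.5, f = 15, neighbours 12 and 9, size 8.
mode_value = grouped_mode(24.5, f_modal=15, f_before=12, f_after=9, class_size=8)
print(round(mode_value, 2))
```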
CHAPTER V Measures of Location or Fractiles
A. Percentiles
- are values that divide a set of observations into 100 equal parts
P1, read as first percentile, is the value below which 1% of the values fall
P99, read as ninety-ninth percentile, is the value below which 99% of the values fall
To compute for the ith percentile:
\[ P_i = \text{the value of the } \frac{i(n+1)}{100}\text{th observation in the array} \]
Example:
The following were the scores of 8 students in a quiz. Find the 70th percentile.
4, 3, 5, 6, 8, 6, 7, 9
Solution:
First, we need to arrange the data from lowest to highest. That is,
3, 4, 5, 6, 6, 7, 8, 9
Then, substitute:
\[ P_{70} = \frac{70(8+1)}{100}\text{th observation} = 6.3\text{th observation, rounded up to the 7th observation} \]
Therefore, the 70th percentile is 8, which is interpreted as: 70% of the scores are below 8.
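The position rule can be sketched in Python. One assumption is made loudly here: when i(n+1)/100 is not a whole number, the position is rounded up, which mirrors the worked example (6.3 becomes the 7th observation); other textbooks interpolate instead. The function name fractile and its parts parameter are illustrative.

```python
def fractile(values, i, parts=100):
    """Value at position i(n+1)/parts in the sorted array.

    Assumption: a fractional position is rounded up, as in the
    worked example; this is not the only convention in use.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Ceiling of i(n+1)/parts using integer arithmetic only.
    position = -((-i * (n + 1)) // parts)
    # Keep the position inside the array (1-based).
    position = max(1, min(position, n))
    return ordered[position - 1]

scores = [4, 3, 5, 6, 8, 6, 7, 9]
p70 = fractile(scores, 70)
print(p70)
```

Passing parts=10 or parts=4 applies the same position rule to deciles or quartiles.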
Approximating the ith Percentile from a Frequency Distribution
Formula:
\[ P_i = LCB_{P_i} + c \left( \frac{\frac{in}{100} - {<}cf_{P_i - 1}}{f_{P_i}} \right) \]
where:
The P_i th class is the class where the <cf is equal to, or exceeds for the first time, in/100
LCB_{P_i} = the lower class boundary of the P_i th class
c = class size of the P_i th class
f_{P_i} = frequency of the P_i th class
<cf_{P_i - 1} = less than cumulative frequency of the class preceding the P_i th class
Example: (Refer to the example on the scores of 100 students in an achievement test).
Find the 40th percentile.
Score Frequency(f) <cf
60-64 10 10
65-69 8 18
70-74 12 30
75-79 25 55
80-84 20 75
85-89 25 100
Total n=100
Solution:
The P40 class is the class containing the (40 × 100)/100 th observation, which is the
40th observation.
Thus, from the above FDT, it is found in the 75-79 class interval. We now substitute
into the formula.
\[ P_{40} = 74.5 + 5 \left( \frac{40 - 30}{25} \right) \]
\[ = 74.5 + 5(0.4) \]
\[ = 74.5 + 2 \]
\[ = 76.5 \]
Therefore, forty percent of the scores in the achievement test are below 76.5.
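The class search and the substitution can be combined in one routine. The sketch below is written under stated assumptions: the function and parameter names are illustrative, each class is given as (lower class boundary, frequency), and the lower boundaries are taken as 0.5 below the lower class limits of the table.

```python
def grouped_fractile(classes, i, parts, class_size):
    """Approximate the i-th fractile from a frequency distribution.

    classes: list of (lower_class_boundary, frequency), lowest class first.
    parts:   100 for percentiles, 10 for deciles, 4 for quartiles.
    """
    n = sum(f for _, f in classes)
    target = i * n / parts            # position in/parts
    cf_before = 0                     # <cf of the preceding class
    for lower_boundary, f in classes:
        # First class whose <cf equals or exceeds the target position.
        if cf_before + f >= target:
            return lower_boundary + class_size * (target - cf_before) / f
        cf_before += f
    raise ValueError("target position lies beyond the last class")

# Scores of 100 students: (lower class boundary, frequency).
scores = [(59.5, 10), (64.5, 8), (69.5, 12), (74.5, 25), (79.5, 20), (84.5, 25)]
p40 = grouped_fractile(scores, i=40, parts=100, class_size=5)
print(p40)
```

The same function with parts=4 or parts=10 covers quartiles and deciles, since only the divisor in the position changes.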
B. Deciles
- are values that divide the array into 10 equal parts.
D1, read as first decile, is the value below which 10% of the values fall
D9, read as ninth decile, is the value below which 90% of the values fall
To compute for the ith decile:
Example:
From the given set of scores in a quiz, find the 5th decile, or D5.
5, 3, 7, 6, 4, 8, 9
Solution:
First, arrange the data from lowest to highest.
Thus, 80% of the scores are below 85.5.
C. Quartiles
- are values that divide the array into 4 equal parts.
Q1, read as first quartile, is the value below which 25% of the values fall
Q2, read as second quartile, is the value below which 50% of the values fall
Q3, read as third quartile, is the value below which 75% of the values fall
Example: From the given set of scores in a quiz, find the 2nd quartile.
2, 5, 7, 8, 9
Solution:
We need to use the formula for the position of the ith quartile, the i(n+1)/4 th observation. That is,
\[ Q_2 = \frac{2(5+1)}{4}\text{th observation} = \frac{12}{4}\text{th observation} = \text{3rd observation} \]
Therefore, the 2nd quartile is 7. This implies that 50% of the scores in the quiz are below
7.
Approximating the ith Quartile from a Frequency Distribution
Formula to be used:
\[ Q_i = LCB_{Q_i} + c \left( \frac{\frac{in}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}} \right) \]
where:
The Q_i th class is the class where the <cf is equal to, or exceeds for the first time, in/4
LCB_{Q_i} = the lower class boundary of the Q_i th class
c = class size of the Q_i th class
f_{Q_i} = frequency of the Q_i th class
<cf_{Q_i - 1} = less than cumulative frequency of the class preceding the Q_i th class
Example: (Refer to the example on the scores of 100 students in an achievement test)
Find the 3rd Quartile.
Score Frequency(f) <cf
60-64 10 10
65-69 8 18
70-74 12 30
75-79 25 55
80-84 20 75
85-89 25 100
Total n=100
Solution:
The third quartile class is the class containing the (3 × 100)/4 th observation, which is the
75th observation.
The said observation is found in the 80-84 class interval.
We will use the formula, that is,
\[ Q_3 = 79.5 + 5 \left( \frac{75 - 55}{20} \right) \]
\[ = 79.5 + 5(1) \]
\[ = 79.5 + 5 \]
\[ = 84.5 \]
Thus, 75% of the scores are below 84.5.
ASSESSMENT
TEST I
A. Write the following expressions in summation notation.
1. X7 +X8 +X9 +X10
2. (X3 -2) + (X4 -2) + (X5 -2) + (X6 -2)
3. 2X1 +2X2 +2X3
4. (X1 +4) + (X2 +4) + (X3 +4)
5. \( 1^2 + 2^2 + 3^2 + 4^2 \)
B. Write the following without summation signs and simplify if possible.
1. \( \sum_{i=3}^{5} 6 \)
2. \( \sum_{i=4}^{10} i \)
3. \( \sum_{i=1}^{3} 2X_i \)
4. \( \sum_{i=1}^{4} X_i Y_i \)
5. \( \sum_{i=1}^{3} X_i + \sum_{i=1}^{2} (Y_i - 2) \)
n=35
E. Find the median of the following test scores.
1. 15, 13, 19, 14, 20
2. 5, 6, 11, 19, 20, 25, 30, 40
3. 38, 28, 54, 44, 17, 20
4. 8, 11, 17, 19, 21, 27, 30
5. 2, 7, 6, 4, 5
Class interval f
46-50 7
41-45 18
36-40 6
31-35 12
26-30 5
21-25 2
n=50
I. Solve the following.
The following were the scores of 6 students in a quiz:
6, 5, 10, 8, 3, 4
1. Find the 40th percentile
2. Find the 6th decile
3. Find the 2nd quartile
4. Find the 60th percentile
5. Find the 3rd quartile
TEST II
a. 21, 35, 45, 25, 31, 54, 47, 39, 40, 28 ____________
b. Jenny took 10 Abstract Algebra tests in the first semester. What is the range of her
test scores if her scores are 85, 80, 75, 90, 95, 79, 87, 73, 83, and 92?
2.) Find the range of grouped data in the given set below.
3.) Fill in the table below. Solve for the Quartile Deviation, Average and Standard