Module 1 Introduction to Statistics
1. Definition of Statistics
In its plural sense, it refers to the data itself or to some numerical computations
derived from a set of data that are systematically collected and analyzed.
In its singular sense, it refers to the scientific discipline consisting of the theory
and methods for processing collections of quantitative and qualitative data
useful when making decisions in the face of uncertainty.
2. Learning the methods of statistics enables us to develop a way of thinking that helps us to:
describe or characterize persons, objects, situations, and some phenomena with
some reliability;
make assessments and comparisons in an objective manner;
make evidence-based decisions.
ii. Inferential Statistics – methods concerned with the analysis of a subset of data
leading to predictions or inferences about the entire set of data, that is, methods that
generalize results beyond the data collected, provided that the data collected are
a part (sample) of a larger set of items (population).
5. Examples of Descriptive Statistics
Total number of CMU students who belong to the College of Education.
The CMU registrar cited statistics showing a decrease in the number of CMU students
during the past five years.
7. Key Definitions
Universe – is the set of all entities under study, that is, the collection of things or
observational units under study.
Variable – is a characteristic observed or measured on every unit of the universe.
Population - is the set of all possible values of the variable.
Sample – is a subset of the population.
Parameters – are numerical measures that describe the population or universe of
interest.
Statistics – are numerical measures of a sample.
Frame – a listing of all the elements in a population.
Census – the process in which information is gathered for all units in the population.
Sample survey or sampling – the process in which information is obtained from only a part
of the population.
8. Illustration:
Universe (U): Senior High School (SHS) Students of Central Mindanao University
of the School Year 2017-2018, Second Semester
Populations: a) Gender of SHS students of Central Mindanao University of the school year
2017-2018 second semester
b) Height of SHS students of Central Mindanao University of the school year
2017-2018 second semester
Statistics: Average height, maximum height of a sample of 20 CMU SHS students of the
school year 2017-2018, second semester
Parameters: Average height, maximum height of all CMU SHS students of the school year
2017-2018, second semester
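To make the difference between parameters and statistics concrete, the sketch below simulates a hypothetical universe of SHS student heights and draws a sample of 20 from it. The population size (500), the normal distribution with mean 160 cm and standard deviation 8 cm, and the use of simple random sampling without replacement are all assumptions for illustration; none of these numbers are actual CMU data.

```python
import random
import statistics

# Hypothetical universe: heights (cm) of N = 500 SHS students.
# Simulated for illustration only; these are NOT actual CMU data.
random.seed(2018)
population = [round(random.gauss(160, 8), 1) for _ in range(500)]

# Parameters: numerical measures computed from the whole population (a census).
param_mean = statistics.mean(population)
param_max = max(population)

# Statistics: the same measures computed from a sample of n = 20 students,
# drawn here by simple random sampling without replacement (an assumption).
sample = random.sample(population, 20)
stat_mean = statistics.mean(sample)
stat_max = max(sample)

print(f"Parameters: mean = {param_mean:.1f} cm, max = {param_max:.1f} cm")
print(f"Statistics: mean = {stat_mean:.1f} cm, max = {stat_max:.1f} cm")
```

Because the sample is a subset of the population, the sample maximum can never exceed the population maximum, while the sample mean will generally differ somewhat from the population mean.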
The observational units in U are the enrolled SHS students of CMU of the School Year 2017-2018,
second semester, denoted by $U_1, U_2, \ldots, U_N$. For every student there are observed
characteristics such as gender, height, blood type and home address. Observed characteristics
are referred to as variables. Consider the example below with N observational units and M
variables.
                        Variables
SHS    Gender    Skin Color    Height    ...    Mth Variable
U1     Male      Brown         160 cm    ...    ...
U2     Female    Brown         148 cm    ...    ...
U3     Female    Black         168 cm    ...    ...
...    ...       ...           ...       ...    ...
Qualitative variables – These are variables that yield observations by which individuals can
be categorized according to some characteristic or quality.
- e.g., gender, marital status and blood type; they are expressed in categories
Quantitative variables – These are variables that yield observations that can be measured.
- e.g., weight, height, systolic blood pressure and body mass index.
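The three rows of the student table above can show how the two kinds of variables are handled differently. In the sketch below (plain Python; the dictionary field names are chosen only for illustration), the qualitative variable gender can only be tallied by category, while the quantitative variable height can be averaged.

```python
from collections import Counter
from statistics import mean

# The three observational units shown in the table above.
students = [
    {"unit": "U1", "gender": "Male", "skin_color": "Brown", "height_cm": 160},
    {"unit": "U2", "gender": "Female", "skin_color": "Brown", "height_cm": 148},
    {"unit": "U3", "gender": "Female", "skin_color": "Black", "height_cm": 168},
]

# Qualitative variable: we can only count how many units fall in each category.
gender_counts = Counter(s["gender"] for s in students)

# Quantitative variable: arithmetic such as averaging is meaningful.
mean_height = mean(s["height_cm"] for s in students)

print(gender_counts)              # Counter({'Female': 2, 'Male': 1})
print(f"{mean_height:.2f} cm")    # 158.67 cm
```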
Quantitative data are classified as either discrete or continuous.
Discrete data – This refers to any data that can be counted, e.g., the number of
patients in a hospital or the number of students with Type A blood.
Continuous data – This refers to any data that can be measured, e.g., systolic
blood pressure, weight and height. These data can take any of infinitely many
possible values corresponding to points on a continuous scale, with no gaps or
interruptions.
Note: Arithmetic operations on quantitative data have a meaningful physical interpretation.
A variable that takes numerical values is not necessarily quantitative. For example, the
sum of two zip codes, or the difference between your cellular phone number and your
seatmate's, does not make sense. The issue is whether performing arithmetic operations
on the data would make any sense. The figure below illustrates the classification of
data collected on particular variables.
VARIABLES
├── Qualitative
└── Quantitative
    ├── Discrete
    └── Continuous
10. Levels of Measurement or Measurement Scales
Measurement is the assignment of numbers to objects or events according to a
predetermined set of rules. For instance, to measure a person's weight in kilograms,
we may assign the number 50 to a person and say that the person's weight is 50
kilograms. Determining the level of measurement of a set of data is important
because it helps determine which statistical inference test should be used to
analyze the data. There are four types of measurement scales: nominal, ordinal, interval
and ratio. They differ in the properties of numbers (identity, order, additivity and
absolute zero) that they possess.
- Identity – the property that enables a person to distinguish one number from
another. Numbers are recognized by the shapes in which they are written.
- Order – the property that numbers or observations can be arranged in a sequence. For
any integers A and B, we can determine whether $A < B$, $A = B$, or $A > B$.
- Additivity – the property that allows us to add two or more numbers. For any real
numbers A, B, C and D, because of the equality of scale, we can determine whether
$A - B < C - D$, $A - B = C - D$, or $A - B > C - D$.
- Absolute zero property means that there is a level at which there is nothing of the
characteristic being measured.
Nominal scale – the lowest level of measurement and is most often used with
variables that are qualitative in nature, rather than quantitative.
- Examples: gender, eye color, smoking status and nationality.
- Data in the nominal scale possess only the property of identity. Thus, numbers
or observations are used only to classify. For example, for the variable gender,
if 1 is assigned to male and 2 to female, this does not mean that female is
greater than or better than male.
Ordinal scale – data in this case possess the properties of identity and order.
- We can rank-order the objects as to whether they possess more, less or the same
amount of the variable being measured. Thus, we can determine whether
$A = B$, $A \neq B$, $A > B$, or $A < B$.
- We still cannot determine how much greater or less A is than B in the attribute
being measured.
- Examples: level of educational attainment, military ranks.
Interval scale – Data at this level possess the properties of identity, order and
additivity but do not have the absolute zero property.
- Examples: temperature and intelligence score.
Ratio scale – Data at this level possess the properties of identity, order, additivity
(equality of scale) and absolute zero.
- Examples: weight and height of persons.
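One way to summarize the four scales is to record which properties each one possesses and derive the comparisons that are meaningful at that level. The helper `meaningful_operations` below is a hypothetical name, not a library function. The link between an absolute zero and meaningful ratios (e.g., a 60 kg person weighs twice as much as a 30 kg person) is standard, but it is an addition not stated explicitly above.

```python
# Properties possessed by each level of measurement, as listed above.
SCALES = {
    "nominal": {"identity"},
    "ordinal": {"identity", "order"},
    "interval": {"identity", "order", "additivity"},
    "ratio": {"identity", "order", "additivity", "absolute zero"},
}


def meaningful_operations(scale: str) -> list[str]:
    """Return the comparisons that make sense at the given scale (hypothetical helper)."""
    props = SCALES[scale]
    ops = []
    if "identity" in props:
        ops.append("equal / not equal")
    if "order" in props:
        ops.append("greater than / less than")
    if "additivity" in props:
        ops.append("compare differences (A - B vs C - D)")
    if "absolute zero" in props:
        # Assumption beyond the text: a true zero makes ratios meaningful.
        ops.append("ratios (A is twice B)")
    return ops


for name in SCALES:
    print(f"{name:8}: {', '.join(meaningful_operations(name))}")
```

Each scale inherits every operation of the scales below it, which mirrors the way the properties accumulate from nominal to ratio.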
11. Index, Subscript, Notation
In statistics, we usually deal with groups of data that result from measuring one or more
variables. The data are often derived from samples and occasionally from populations, but
in either case it is useful to let symbols stand for the variables measured in the study.
Statistics books usually use the Roman letter $X$, and sometimes $Y$, to stand for the
variable(s) measured.
The number of observations is represented by $N$ and $n$ for a population and a
sample, respectively. Let the symbol $x_i$ (read "x sub i") denote any of the $N$ or $n$ values,
so that $x_1, x_2, x_3, \ldots, x_n$ are the $n$ values assumed by a variable $X$. The letter $i$ in $x_i$, which
stands for any of the numbers $1, 2, 3, \ldots, n$, is called a subscript, or index. Any letter other
than $i$, such as $j$, $k$, $v$, $q$ or $r$, could have been used as well.
Summation symbol – This is a compact way of writing the sum of a set of data
values. The expression $\sum_{i=1}^{n} x_i$ is defined as
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n.$$
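The definition translates directly into code. The helper `summation` below is a hypothetical name; it keeps the 1-based indexing of the notation, so `summation(x, 2, 4)` means $x_2 + x_3 + x_4$. The values in `x` are made up for illustration.

```python
def summation(values, start=1, end=None):
    """Compute sum_{i=start}^{end} x_i using the 1-based indices of the notation."""
    if end is None:
        end = len(values)  # default upper limit is n, the number of values
    return sum(values[i - 1] for i in range(start, end + 1))


x = [3, 5, 2, 8, 4]  # hypothetical values x_1, ..., x_5

print(summation(x))        # x_1 + x_2 + ... + x_5 = 22
print(summation(x, 2, 4))  # x_2 + x_3 + x_4 = 15
```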
Example 1. Consider the age of a sample of six children as shown in the table below
12. Rules of Summation
1. $\sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i$
2. $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$, where $c$ is any given constant.
3. $\sum_{i=1}^{n} c = nc$, where $c$ is any given constant.
c. $\sum_{i=1}^{4} (x_i + y_i)^2$.
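The three rules of summation can be checked numerically on small made-up lists (rule 1 is read here with plus signs). A check like this illustrates the rules for one set of values; it does not prove them in general.

```python
x = [2, 4, 6, 8]  # hypothetical values
y = [1, 3, 5, 7]
z = [0, 1, 0, 1]
c = 5
n = len(x)

# Rule 1: the sum of (x_i + y_i + z_i) equals sum x_i + sum y_i + sum z_i.
rule1_lhs = sum(xi + yi + zi for xi, yi, zi in zip(x, y, z))
rule1_rhs = sum(x) + sum(y) + sum(z)

# Rule 2: a constant factor can be moved outside the summation.
rule2_lhs = sum(c * xi for xi in x)
rule2_rhs = c * sum(x)

# Rule 3: adding the constant c to itself n times gives n * c.
rule3_lhs = sum(c for _ in range(n))
rule3_rhs = n * c

print(rule1_lhs == rule1_rhs, rule2_lhs == rule2_rhs, rule3_lhs == rule3_rhs)  # True True True
```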
The Factorial Symbol ! – This is a compact way of writing the product of a sequence of
positive integers. The symbol $n!$ is defined as
$$n! = 1 \times 2 \times 3 \times \cdots \times n.$$
- $n!$ is the product of all positive integers less than or equal to $n$.
- $0! = 1$ by convention.
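The definition of $n!$ as a running product can be sketched with a simple loop. Starting the product at 1 (the empty product) is what makes $0! = 1$ come out naturally. The hand-written `factorial` below is for illustration; Python's built-in `math.factorial` computes the same values.

```python
import math


def factorial(n: int) -> int:
    """Return n! = 1 * 2 * ... * n, with 0! = 1 by convention."""
    if n < 0:
        raise ValueError("n! is defined only for non-negative integers")
    result = 1  # empty product, so factorial(0) returns 1
    for k in range(2, n + 1):
        result *= k
    return result


print(factorial(5))                         # 120
print(factorial(0))                         # 1
print(factorial(10) == math.factorial(10))  # True
```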
1.2. Exercises/Problems
2. For the universe you gave as your answer to problem #1 above, define at least 3 populations.
4. Investigate the following problems and determine which is more appropriate to use,
descriptive or inferential statistics:
a. The CMU Math Department would like to know the number of BS Mathematics students
interested in the newly revised curriculum of the BS Mathematics program.
b. A biology student studies the mercury content of fishes in Pulangi River and found
that the average mercury content is 400 units.
c. The CMU Office of Student Affairs would like to predict the number of students who
would like to stay in the University's dormitories. However, the enrolment
period was only a week before classes start, so the office randomly selected 100
students and used the results as an estimate.
d. Do girls learn to speak at an earlier age than boys?
6. Fill in the missing words in the quote: "Inferential statistics is defined as drawing
conclusions about ____________ based on ____________ computed from the
_____________."
7. A random sample of 100 commuter students in CMU was selected and several variables
were recorded for each student. Which of the following is NOT CORRECT?
a. Their average allowance per month is a continuous variable.
b. Socioeconomic status was coded as 1=low income, 2=middle income, 3=high
income and is an interval scaled variable.
c. The primary language used at home is an ordinal scale variable.
8. The College of Agriculture obtained the following data representing the one-week growth
in centimeters of 24 newly planted soybean plants:
1.3 4.9 3.9 0.8 4.1 1.1 3.1 2.2 2.4 2.4 1.8
1.8 2.4 3.9 1.8 3.9 3.9 4.1 3.9 2.4 4.0 4.2
3.7 1.6
2
24
c. Evaluate i 1
24 1
10. Write each of the following as a summation; that is, in the compact notation:
a. $z_1 + z_2 + z_3 + z_4 + z_5 + z_6$
b. $z_2 + z_3 + z_4 + z_5 + z_6$
c. $x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4 + x_5 f_5$
d. $x_1^2 + x_2^2 + x_3^2 + x_4^2$
e. $2z_2 + 2z_3 + 2z_4 + 2z_5 + 2z_6$
f. $(x_1 + y_1) + (x_2 + y_2) + (x_3 + y_3)$
g. $(x_4 + 3) + (x_5 + 3) + (x_6 + 3) + (x_7 + 3)$
s
11. tat
i 1
Statistics 24 Exercises
2. The College of Agriculture obtained the following data representing the one-week growth
in centimeters of 24 newly planted soybean plants:
2.3 3.9 3.9 1.8 4.1 1.1 3.1 2.2 2.4 2.4 1.8
1.8 2.4 3.9 2.8 3.9 3.9 4.1 3.9 2.4 4.0 4.2
3.7 1.6
i 1
2
24
xi
xi i 1
24
2
24
c. Evaluate i 1
24 1
Answer:
a.
b.
c.