Module 1 Introduction to Statistics

The document provides an introduction to statistics, defining it as both a collection of data and a scientific discipline for analyzing that data. It outlines the importance of statistical methods for making informed decisions and categorizes statistics into descriptive and inferential types. Additionally, it explains key concepts, measurement scales, and types of variables, along with examples and exercises to reinforce understanding.

Uploaded by

James john Binoy
Copyright
© All Rights Reserved
STAT 22

UNIT 1. INTRODUCTION TO STATISTICS


1.1. Things to know

1. Definition of Statistics
• In its plural sense, it refers to the data itself or to some numerical computations
derived from a set of data that are systematically collected and analyzed.
• In its singular sense, it refers to the scientific discipline consisting of the theory
and methods for processing collections of quantitative and qualitative data
useful when making decisions in the face of uncertainty.

Statistics as a science is basically concerned with understanding the structures
in a data set. As such, statisticians are involved with methods of data collection, data
organization, and analysis, as well as interpretation of the results.
However, uncovering patterns embedded under the backdrop of uncertainty involves
not just science but also art.

2. Learning the methods of statistics enables us to develop a way of thinking that helps us
in many ways:
• describe or characterize persons, objects, situations, and phenomena with
some reliability;
• make assessments and comparisons in an objective manner;
• make evidence-based decisions.

3. Some Applications of Statistics

• Determining the level of patients' satisfaction with the nursing care administered
by student nurses of Central Mindanao University at Maramag Provincial
Hospital.
• Determining the distribution of the number of text messages sent per day by
CMU SHS students enrolled in Statistics and Probability.
• Comparing the exam results in Statistics and Probability between male and
female CMU SHS students.
• Determining the relationship between faculty status and work commitment.
• Predicting the number of first-year CMU students for the next school year.

4. Major Categories of Statistics


i. Descriptive Statistics – methods concerned with collecting, describing, and
analyzing a set of data without drawing conclusions (or inferences) beyond
the data.

ii. Inferential Statistics – methods concerned with the analysis of a subset of data
leading to predictions or inferences about the entire set of data, that is, to
generalize results beyond the data collected provided that the data collected is
a part (sample) of a large set of items (population).

5. Examples of Descriptive Statistics
• Total number of CMU students who belong to the College of Education.

• The CMU registrar cited statistics showing a decrease in the number of CMU
students during the past five years.

6. Example of Inferential Statistics

• A new teaching strategy designed to improve the performance of SHS students
in Mathematics was tested on randomly selected SHS students. Results show
that the new strategy is effective in improving the performance of students in
Mathematics.

7. Key Definitions
Universe – is the set of all entities under study, that is, the collection of things or
observational units under study.
Variable – is a characteristic observed or measured on every unit of the universe.
Population - is the set of all possible values of the variable.
Sample – is a subset of the population.
Parameters – are numerical measures that describe the population or universe of
interest.
Statistics – are numerical measures of a sample.
Frame – a listing of all the elements in a population.
Census – the process in which information is gathered for all units in the population.
Sample survey or sampling – the process in which information is gathered from only a
part of the population.

8. Illustration:

Universe (U): Senior High School (SHS) Students of Central Mindanao University
of the School Year 2017-2018, Second Semester

Populations: a) Gender of SHS students of Central Mindanao University of the school year
2017-2018 second semester
b) Height of SHS students of Central Mindanao University of the school year
2017-2018 second semester

Statistics: Average height, maximum height of a sample of 20 CMU SHS students of the
school year 2017-2018, second semester

Parameters: Average height, maximum height of all CMU SHS students of the school year
2017-2018, second semester

The observational units are the enrolled SHS students of CMU for the School Year
2018-2019, second semester, denoted by $U_1, U_2, \ldots, U_N$. An observed characteristic
is recorded for every student, such as gender, height, blood type, and home address.
Observed characteristics are referred to as variables. Consider the example below with
$N$ observational units and $M$ variables.

SHS       Variables
          Gender    Skin color   Height    ...   M-th variable
$U_1$     Male      Brown        160 cm    ...   ...
$U_2$     Female    Brown        148 cm    ...   ...
$U_3$     Female    Black        168 cm    ...   ...
...       ...       ...          ...       ...   ...
$U_N$     Male      Brown        152 cm    ...   ...

“A statistic is to a sample as a parameter is to a population”.
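
The contrast between a parameter and a statistic can be sketched in Python. The heights below are hypothetical values made up for illustration, and the universe is kept to ten students so that both quantities can be computed. The names `population`, `sample`, `mu`, and `x_bar` are assumptions of this sketch, not notation from the module.

```python
import random
import statistics

# Hypothetical heights (cm) for a small universe of N = 10 students.
population = [160, 148, 168, 152, 155, 171, 163, 158, 149, 166]

# Parameters: numerical measures computed from the whole population.
mu = statistics.mean(population)
max_height = max(population)

# Statistics: the same measures computed from a sample (a subset).
random.seed(1)  # fixed seed so the sketch is repeatable
sample = random.sample(population, k=4)
x_bar = statistics.mean(sample)
sample_max = max(sample)

print(f"parameters: mean = {mu}, max = {max_height}")
print(f"statistics: mean = {x_bar}, max = {sample_max}")
```

A different sample would generally give different statistics, while the parameters stay fixed for a given population.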

9. Types of Variables and Data


The building blocks of statistical science are data. Specific characteristics (e.g., age,
height, and weight) that we want to assess for a certain population are referred to as
variables. Variables may be categorized further as qualitative and quantitative variables.

Qualitative variables – These are variables that yield observations by which individuals can
be categorized according to some characteristic or quality.
- e.g., gender, marital status and blood type; they are expressed in categories

Quantitative variables – These are variables that yield observations that can be measured.
- e.g., weight, height, systolic blood pressure and body mass index.

Constant – This is a characteristic or variable that assumes only one value.

Data collected on particular variables are classified as either qualitative or quantitative.


Qualitative data (e.g., gender, marital status and blood type) are data obtained on
particular variables that are usually expressed in categories. Quantitative data are expressed
in numbers (e.g., weight, height, systolic blood pressure and body mass index); data
collected in these cases are measured or counted.

Quantitative data are classified as either discrete or continuous.
• Discrete data – This refers to any data that can be counted, e.g., number of
patients in a hospital, number of students with Type A blood.
• Continuous data – This refers to any data that can be measured, e.g., systolic
blood pressure, weight and height. These data result from infinitely many
possible values that can be associated with points on a continuous scale in such a
way that there are no gaps or interruptions.

Note: Arithmetical operations on quantitative data have some physical interpretation.
Some variables take numerical values without being quantitative. For example,
the sum of two zip codes, or the difference between your cellular phone number
and your seatmate's, does not make sense. The issue is whether performing
arithmetical operations on the data would be meaningful. Figure 1.1.1 illustrates
the classification of data collected on particular variables.

VARIABLES
    Qualitative
    Quantitative
        Discrete
        Continuous

Figure 1.1.1 Classification of Data on Particular Variables
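
The classification matters in practice because it determines which summaries make sense. The following minimal Python sketch uses hypothetical records (blood type and weight are taken from the examples in the text, but the values are made up): categories are tallied, counts are discrete, and measured values can be averaged.

```python
from collections import Counter
import statistics

# Hypothetical records: (blood type, weight in kg).
records = [("A", 52.5), ("O", 61.0), ("B", 48.2), ("O", 57.3), ("A", 66.0)]

blood_types = [bt for bt, _ in records]  # qualitative: categories
weights = [w for _, w in records]        # quantitative, continuous: measured

type_counts = Counter(blood_types)       # qualitative data are tallied
n_type_a = type_counts["A"]              # quantitative, discrete: a count
mean_weight = statistics.mean(weights)   # arithmetic is meaningful here

print(type_counts)
print(n_type_a, mean_weight)
```

Averaging the blood types would be meaningless, which is the same point the note on zip codes makes.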

10. Levels of Measurement or Measurement Scales
Measurement is the assignment of numbers to objects or events according to a
predetermined set of rules. For instance, to measure a person's weight in
kilograms, we may assign the number 50 to that person and say that the person's weight
is 50 kilograms. Determining the level of measurement of a set of data is important
because it helps determine which statistical test should be used to
analyze the data. There are four types of measurement scales: nominal, ordinal, interval
and ratio scales. They differ in the properties of numbers (identity, order, additivity,
absolute zero) that they possess.

- Identity – the property that enables a person to distinguish one number from
another. Numbers are recognized by the shapes in which they are written.
- Order – the property that numbers or observations can be arranged in a sequence.
For any integers $A$ and $B$, we can determine whether $A < B$, $A = B$, or $B < A$.
- Additivity – the property that allows us to add two or more numbers. For any real
numbers $A$, $B$, $C$, and $D$, because of the equality of scale, we can determine whether
$A + B = C + D$, $A + B < C + D$, or $A + B > C + D$.
- Absolute zero – the property that there is a level at which there is nothing of the
characteristic being measured.

• Nominal scale – the lowest level of measurement, most often used with
variables that are qualitative in nature rather than quantitative.
- Examples: gender, eye color, smoking status and nationality.
- Data in the nominal scale possess only the property of identity. Thus, numbers
or observations are used only to classify. For example, in the variable gender,
if 1 is assigned to male and 2 to female, it does not mean that
female is better than male.

• Ordinal scale – data at this level possess the properties of identity and order.
- We can rank-order the objects as to whether they possess more, less or the same
amount of the variable being measured. Thus, we can determine whether
$A = B$, $A \neq B$, $A > B$, or $A < B$.
- We still cannot determine how much greater or less A is than B in the attribute
being measured.
- Examples: level of educational attainment, military ranks.

• Interval scale – data at this level possess the properties of identity, order and
additivity but do not have the absolute zero property.
- Examples: temperature and intelligence score.

• Ratio scale – data at this level possess the properties of identity, order, equality
of scale and absolute zero.
- Examples: weight and height of persons.
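
One way to keep the four scales straight is to record which properties each one possesses. The Python sketch below is only a lookup table built from the descriptions above. It assumes that "equality of scale" in the ratio scale is the same property the text calls additivity, and the names `SCALE_PROPERTIES` and `supports` are hypothetical.

```python
# Properties possessed by each measurement scale, as described in the text.
# Assumption: "equality of scale" is treated as the additivity property.
SCALE_PROPERTIES = {
    "nominal": {"identity"},
    "ordinal": {"identity", "order"},
    "interval": {"identity", "order", "additivity"},
    "ratio": {"identity", "order", "additivity", "absolute zero"},
}


def supports(scale: str, prop: str) -> bool:
    """Return True if data on the given scale possess the given property."""
    return prop in SCALE_PROPERTIES[scale]


print(supports("ratio", "absolute zero"))     # True
print(supports("interval", "absolute zero"))  # False
print(supports("ordinal", "additivity"))      # False
```

Each scale possesses every property of the scales below it, which is why the sets grow from nominal to ratio.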

11. Index, Subscript, Notation
In statistics, we usually deal with groups of data that result from measuring one or more
variables. The data are often derived from samples and occasionally from populations, but
in either case it is useful to let symbols stand for the variables measured in the study.
Usually, statistics books use the Roman letter $X$, and sometimes $Y$, to stand for the
variable(s) measured.
The number of observations is represented by $N$ for a population and $n$ for a
sample. Let the symbol $x_i$ (read "x sub i") denote any of the $N$ or $n$ values;
$x_1, x_2, x_3, \ldots, x_n$ constitute the $n$ values assumed by a variable $X$. The letter $i$ in $x_i$,
which stands for any of the numbers 1, 2, 3, ..., $n$, is called a subscript, or index. Any letter
other than $i$, such as $j$, $k$, $v$, $q$ or $r$, could have been used as well.

• Summation symbol ($\Sigma$) - This is a compact way of writing the sum of a set of data
values:
- $\sum_{i=1}^{n} x_i$ is defined as
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

Example 1. Consider the age of a sample of six children as shown in the table below

Table 1.1.1: Ages of Six Children


Child Number    Age symbol    Age (year)
1               $x_1$         8
2               $x_2$         10
3               $x_3$         7
4               $x_4$         6
5               $x_5$         10
6               $x_6$         12
a. Find the sum of their ages in compact form.
b. Solve the following mathematical expressions in step-by-step manner:
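b.1 $\left(\sum_{i=1}^{6} x_i\right)^2$    b.2 $\sum_{i=1}^{6} x_i^2$    b.3 $\sum_{i=1}^{6} x_i^{\,i\,?\,1}$ (the operator marked "?" in the exponent is illegible in the source)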

12. Rules of Summation
1. $\sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i$
2. $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$, where $c$ is any given constant.
3. $\sum_{i=1}^{n} c = nc$, where $c$ is any given constant.

Example 2. Let $y_1 = 1$, $y_2 = 1$, $y_3 = 5$, $y_4 = 4$, $y_5 = 7$ and $y_6 = 6$. Let the $x_i$'s be as
defined in Example 1. Find the following in a step-by-step manner:
a. $\sum_{i=1}^{6} x_i y_i$
b. $\sum_{i=1}^{6} (x_i^2 + y_i^2)$
c. $\sum_{i=1}^{4} (x_i + y_i)^2$

• The Factorial Symbol (!) - This is a compact way of writing the product of a sequence of
positive integers. The symbol $n!$ is defined as
$n! = 1 \times 2 \times 3 \times \cdots \times n$.
- $n!$ is the product of all positive integers less than or equal to $n$.
- $0! = 1$ by convention.

Example 3. Solve for n !.


a) n = 7 b) n = 8 c) n = 9 d) n = 10
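
The definition of the factorial translates directly into a loop. The Python sketch below writes the product out by hand so that it mirrors the definition, then compares the result with the standard library's `math.factorial`. The function name `factorial` is a choice made for this sketch.

```python
import math


def factorial(n: int) -> int:
    """Product of all positive integers less than or equal to n; 0! = 1."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


for n in (7, 8, 9, 10):
    # Cross-check the hand-written loop against the standard library.
    assert factorial(n) == math.factorial(n)
    print(n, factorial(n))
```

For the values in Example 3 this prints 5040, 40320, 362880, and 3628800. Because the loop body never runs for n = 0 or n = 1, the function returns 1 in both cases, which matches the convention that 0! = 1.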

1.2. Exercises/Problems

1. Give an example of a universe.

2. For a given universe, your answer to problem #1 above, define at least 3 populations.

3. Through the given populations in question #2, answer the following:


a. Determine whether the variable of interest in the specified population is discrete or
continuous variable.
b. Determine the level of measurement of the data obtained considering the specific
variable of interest.

4. Investigate the following problems and determine what is more appropriate to use –
descriptive or inferential statistics:
a. CMU Math Department would like to know the number of BS Mathematics students
interested in the newly revised curriculum of the BS Mathematics program.
b. A biology student studies the mercury content of fishes in Pulangi River and found
that the average mercury content is 400 units.
c. CMU Office of Student Affairs would like to predict the number of students who
would like to stay at the University’s dormitories. However, the enrolment
period was a week before the classes start so the said office randomly selected 100
students and the results were used as an estimate.
d. Do girls learn to speak at an earlier age than boys?

5. Which of the following statements best describes statistical inference?


a. A decision, estimate, prediction, or generalization about the sample based on
information contained in a population. The population parameters are estimated
using the sample.
b. A statement made about a sample based on the measurements in that sample.
Statistical inference helps us draw conclusion about the unknown population
characteristics based on the sample.
c. A decision, estimate, prediction or generalization about the population based on
information contained in a sample.

6. Fill in the missing words to the quote: “Inferential statistics is defined as drawing
conclusions about ____________ based on ____________ computed from the
_____________.”

7. A random sample of 100 commuter students in CMU was selected and several variables
were recorded for each student. Which of the following is NOT CORRECT?
a. Their average allowance per month is a continuous variable.
b. Socioeconomic status was coded as 1=low income, 2=middle income, 3=high
income and is an interval scaled variable.
c. The primary language used at home is an ordinal scale variable.

8. The College of Agriculture obtained the following data representing the one week growth
in centimeters of 24 newly planted soybean plants:
1.3 4.9 3.9 0.8 4.1 1.1 3.1 2.2 2.4 2.4 1.8
1.8 2.4 3.9 1.8 3.9 3.9 4.1 3.9 2.4 4.0 4.2
3.7 1.6

Obtain the following:


a. Sum of the one week growth in centimeters of 24 newly planted soybean plants
in compact form.
b. Let X be the one week growth of soybean plants. Find
$\sum_{i=1}^{24} x_i^2$ and $\left(\sum_{i=1}^{24} x_i\right)^2$.
c. Evaluate $\dfrac{\sum_{i=1}^{24} x_i^2 - \dfrac{\left(\sum_{i=1}^{24} x_i\right)^2}{24}}{24 - 1}$.

10. Write each of the following as a summation; that is, in the compact $\Sigma$ notation:
a. $z_1 + z_2 + z_3 + z_4 + z_5 + z_6$    b. $z_2 + z_3 + z_4 + z_5 + z_6$
c. $x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4 + x_5 f_5$    d. $x_1^2 + x_2^2 + x_3^2 + x_4^2$
e. $2z_2 + 2z_3 + 2z_4 + 2z_5 + 2z_6$    f. $(x_1 + y_1) + (x_2 + y_2) + (x_3 + y_3)$
g. $(x_4 + 3) + (x_5 + 3) + (x_6 + 3) + (x_7 + 3)$
11. $\sum_{i=1}^{s} tat =$

12. Solve for n! .


a) n = 2 b) n = 3 c) n = 4 d) n = 0

13. Determine for each of the following whether it is true or false:


a. $19! = 19 \times 18 \times 17 \times 16 \times 15!$    d. $6! + 3! = 9!$
b. $\dfrac{5!}{3! \times 4} = 5$    e. $\dfrac{9!}{7!\,2!} = 36$

Statistics 24 Exercises

Name: _________________________ Schedule: ________________


Section: ________________________ Score: __________________

1. Identify each of the following as a qualitative or quantitative variable. If quantitative,
classify it as discrete or continuous. Also, indicate the appropriate level of measurement
for each.
________ a. Car ownership (answers the question: Do you own a car?)
________ b. Citizenship
________ c. Tuition fees
________ d. Color of the skin
________ e. Air temperature of the Musuan Peak measured in degree Celsius.
________ f. Religion

2. The College of Agriculture obtained the following data representing the one week growth
in centimeters of 24 newly planted soybean plants:
2.3 3.9 3.9 1.8 4.1 1.1 3.1 2.2 2.4 2.4 1.8
1.8 2.4 3.9 2.8 3.9 3.9 4.1 3.9 2.4 4.0 4.2
3.7 1.6

Obtain the following:


a. Sum of the one week growth in centimeters of 24 newly planted soybean plants
in compact form.
b. Let X be the one week growth of soybean plants. Find
$\sum_{i=1}^{24} x_i^2$ and $\left(\sum_{i=1}^{24} x_i\right)^2$.
c. Evaluate $\dfrac{\sum_{i=1}^{24} x_i^2 - \dfrac{\left(\sum_{i=1}^{24} x_i\right)^2}{24}}{24 - 1}$.

Answer:

a.

b.

c.
