
Module 4 Data Management Introduction To Statistics


GEC 3

Mathematics in
the Modern
World

Course
Modules

Week 7
MODULE 4
Data Management: Introduction to Statistics

4.1 Introduction
When we hear the word Statistics, the first thing that comes to mind is a set
of numerical figures, such as your monthly allowance, the number of hours you
spend in school, the number of hours you spend on Facebook, your vital
statistics, etc.
However, the study of statistics is not limited to knowing and memorizing
numerical figures. This module will give us a better understanding of what
Statistics is about. Discussion on how some of its processes are done is also
included.
4.2 Learning Outcomes
After finishing this module, you are expected to:

1. discuss the importance of statistics in your field of study;


2. compare and contrast descriptive statistics and inferential
statistics;
3. define data;
4. identify different types of data as well as their level of measurement;
5. identify appropriate data collection methods based on needed data; and
6. identify appropriate data presentation type for a set of data.

4.3 What You Need to Know


The following definition will give the meaning of the study of Statistics:

DEFINITION 5.1 (Statistics)

Statistics is the branch of science that deals with the collection, presentation,
organization, analysis, and interpretation of data.

Why are all processes involved in Statistics important? Statistics has the
ability to provide us with tools we need to convert raw data into information that
we can use to make sensible decisions and intelligent choices.

People from various fields of interest need to obtain information to answer
different types of problems. Nowadays, we do this by performing a statistical
inquiry. This will allow us to answer problems with clearer understanding of a
particular collection of information.

DEFINITION 5.2 (population)

The population is the collection of all elements under consideration in a statistical
inquiry. The sample is a subset of the population.

Usually, the population of interest may be so large that it becomes too
expensive and time-consuming to collect data from every element of the
population. Thus, we have no other option but to get the data we need from only
a subset of the population. We use the term sample to refer to this subset of the
population.
In any statistical inquiry, we study certain characteristics or attributes of
the elements in the population, which we call variables. Just like in algebra, we
denote variables with letters of the English alphabet. We refer to these
characteristics as variables because their realized values may vary for the
different elements in the sample or population.

DEFINITION 5.3 (variable, observation, and data)

The variable is a characteristic or attribute of the elements in a collection that can


assume different values for the different elements. An observation is a realized value
of a variable. Data is the collection of observations.

Example 1. Assigning the population and sample


If we define our population to be the set of all students of ISU for SY
2020-2021, one possible sample is the set of first-year students of ISU for
SY 2020-2021.

Example 2. Below are illustrations of variables together with their possible
values.

Variable Possible Observations

S sex of a student Male, Female

E employment status of an employee Temporary, Permanent, Contractual

I monthly income of a person in pesos

N number of children in a household
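
The relationship among variable, observation, and data in Definition 5.3 can be sketched in a few lines of Python. This is only an illustration: the three employee records and their values are made up, and the keys E and I simply reuse the variable letters from the table above.

```python
# Hypothetical records: each dictionary is one element (an employee) of a sample.
# The keys E and I are variables; each stored value is an observation.
employees = [
    {"E": "Permanent", "I": 25000},
    {"E": "Contractual", "I": 14000},
    {"E": "Temporary", "I": 18000},
]

# The data for a variable is the collection of its observations.
status_data = [employee["E"] for employee in employees]
income_data = [employee["I"] for employee in employees]

print(status_data)  # ['Permanent', 'Contractual', 'Temporary']
print(income_data)  # [25000, 14000, 18000]
```

The observations differ from one element to the next, which is why E and I are called variables.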

Example 3. Identifying population and variables of interest.


The research division of a certain pharmaceutical company is investigating
the effectiveness of a new diet pill in reducing weight on female patients.
Population: set of all female patients who will use the diet pill
Variable of interest: weight before taking the pill, weight after taking the pill

Regardless of whether we are using data collected from the population or
from the sample, it would be difficult to understand what all these numeric figures
convey. To give meaning to these numbers, it is necessary to summarize and
condense the information contained in this collection of observations into a
single numeric figure that describes a particular feature of the whole collection.
We call this single numeric figure a summary measure.

DEFINITION 5.4 (parameter and statistic)

The parameter is a summary measure describing a specific characteristic of the


population. The statistic is a summary measure describing a specific characteristic
of the sample.

Example 4.
A summary measure that we are familiar with is the proportion. The
proportion is the quotient obtained when we divide the magnitude of a part by
the magnitude of the whole. Suppose that among the 35 students in a class, 28
claimed that they own a cellular phone. Taking the class as our population, we
can now compute the proportion p of students in the population with cellular
phones: p = 28/35 = 0.80.

The proportion of students in our population with cellular phones is an
example of a parameter because it is a summary measure describing a
characteristic of the population.
Suppose we take a sample of 10 students from this class. Among the 10
students in the sample, 7 own cellular phones. From the sample alone, we cannot
compute the proportion of students in the population with cellular phones, but
we can compute p̂ (read as “p hat”), where p̂ is the proportion of students in the
sample with cellular phones, as follows:

p̂ = 7/10 = 0.70

The proportion of students in our sample is an example of a statistic because


it is a summary measure describing a characteristic of the sample.
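
The two computations in Example 4 can be reproduced with a short Python sketch. The function name `proportion` and the variable names are our own choices for illustration; only the counts (28 of 35 and 7 of 10) come from the example.

```python
def proportion(part, whole):
    """Return the proportion: the magnitude of the part divided by the whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole

# Population: all 35 students in the class, 28 of whom own a cellular phone.
p = proportion(28, 35)      # a parameter, because it describes the population

# Sample: 10 students from the class, 7 of whom own a cellular phone.
p_hat = proportion(7, 10)   # a statistic, because it describes the sample

print(f"parameter p     = {p:.2f}")
print(f"statistic p_hat = {p_hat:.2f}")
```

The same formula is used in both cases; what changes is whether the counts come from the whole population or only from a sample.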

4.3.1 Major Areas of Applied Statistics


There are two major fields of Statistics. These are applied statistics and
theoretical or mathematical statistics. Applied statistics is concerned with
procedures and techniques used in the collection, presentation, organization,
analysis, and interpretation of data.
We study applied statistics in order to learn how to select and properly
implement the most appropriate statistical methods to answer research
problems. On the other hand, mathematical statistics is concerned with the
development of the mathematical foundations of the methods used in applied
statistics.
There are two major areas of interest in applied statistics. These are
descriptive statistics and inferential statistics.

DEFINITION 5.5 (Descriptive Statistics)

Descriptive Statistics includes all the techniques used in organizing, summarizing,


and presenting data on hand. It is concerned with summary calculations such as
averages and percentages, and with the construction of graphs, charts, and tables.

We use methods in descriptive statistics to summarize and describe the


features of the data on hand. The data on hand may have come from all the
elements of the population so that the analysis using descriptive statistics will
allow us to describe the population. The data on hand may also come from the

elements of a selected sample. In this case, the analysis using descriptive
statistics will only allow us to describe the sample. The methods used in
descriptive statistics will not allow us to generalize about the population using
sample data.
Example 5. Below is an illustration of application and restriction of descriptive
statistics.
Given the daily sales performance for a product for the previous year, we
can draw a line chart or a column chart to emphasize the upward/downward
movement of the series. Likewise, we can use descriptive statistics to calculate
a quantity index per quarter to compare the sales per quarter for the previous
year.
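
A minimal sketch of the kind of summary described in Example 5 is given below. The quarterly figures are invented for illustration, and the index used here (each quarter's units as a percentage of the first quarter's units) is only one simple way to build a quantity index, not necessarily the one the example has in mind.

```python
from statistics import mean

# Hypothetical units sold per quarter in the previous year.
quarterly_units = {"Q1": 1200, "Q2": 1500, "Q3": 900, "Q4": 1800}

# Descriptive summaries of the data on hand.
total_units = sum(quarterly_units.values())
average_units = mean(list(quarterly_units.values()))

# Simple quantity index with the first quarter as the base (Q1 = 100).
base = quarterly_units["Q1"]
quantity_index = {
    quarter: 100 * units / base for quarter, units in quarterly_units.items()
}

print("total units:", total_units)
print("average units per quarter:", average_units)
for quarter, index in quantity_index.items():
    print(f"{quarter}: index = {index:.0f}")
```

These figures describe only the year that was recorded; by themselves they say nothing about sales in other years.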

DEFINITION 5.6. (Inferential Statistics)

Inferential Statistics includes all the techniques used in analyzing the sample data
that will lead to generalizations about the population from which the sample came.
It consists of performing hypothesis testing, determining relationships among variables,
and making predictions.

In inferential statistics, we do not simply describe the sample data. Rather,


we use the sample data to form conclusions about the population. Since the
sample is only a subset of the population, we arrive at the conclusions about the
population using inferential statistics under conditions of uncertainty. It should
be clear that whatever conclusions we make using inferential statistics are always
subject to some error.
Example 6. Below is an application of inferential statistics.
To determine if reforestation is effective, we can take a representative portion
of denuded forests and use inferential statistics to draw conclusions about the
effect of reforestation in all denuded forests.
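
The statement that conclusions drawn from a sample are always subject to some error can be illustrated with a small simulation. The sketch below reuses the class of 35 students from Example 4 as the population; the seed, the number of repetitions, and all names in the code are arbitrary choices made for illustration.

```python
import random

# Population from Example 4: 35 students, 28 of whom own a cellular phone.
# 1 means the student owns a phone, 0 means the student does not.
population = [1] * 28 + [0] * 7
p = sum(population) / len(population)   # the parameter

rng = random.Random(2020)   # fixed, arbitrary seed so the run is repeatable

def sample_proportion(sample_size):
    """Draw a random sample without replacement and return its proportion."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Repeat the sampling many times to see how much the statistic varies.
estimates = [sample_proportion(10) for _ in range(1000)]

print(f"population proportion: {p:.2f}")
print(f"smallest estimate:     {min(estimates):.2f}")
print(f"largest estimate:      {max(estimates):.2f}")
print(f"average estimate:      {sum(estimates) / len(estimates):.2f}")
```

Individual samples give different estimates, some of them far from the population value, even though the estimates are centered near it. In practice we see only one sample, which is why a generalization about the population carries uncertainty.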

4.3.2 Collection of Data

Data Collection is the process of gathering and measuring information on


variables of interest, in an established systematic fashion that enables one to
answer stated research questions, test hypotheses, and evaluate outcomes. The
data collection component of research is common to all fields of study including
physical and social sciences, humanities, business, etc. While methods vary by
discipline, the emphasis on ensuring accurate and honest collection remains the
same.

4.3.2.1 Quantitative and Qualitative Variables or Data
In doing a report or research, initially, we have to define the variables
relevant to the data. There are two major classifications of variables: qualitative
and quantitative.

1. Qualitative Variables are nonnumeric variables and cannot be


measured.

Examples include gender, religious affiliation, and ethnicity.

2. Quantitative Variables are numerical variables and can be measured.

Examples include balance in your checking account, number of


children in your family.

Some quantitative variables can take on only specific or isolated values


along a scale, for example, the number of children in the family may be 1, 2, 3,
or any other whole number but it can never be 1.25 or 0.5. Thus, this variable
has values that can only be obtained through the process of counting and is
referred to as a discrete or discontinuous variable.
Specifically, quantitative variables can be ordered and ranked. They can be
classified into two groups:

1. Discrete variables take values that are obtained by counting. The results
are whole numbers. For example, the number of students in the room.

2. Continuous variables take values that are obtained by measuring. The


results can be any value between two specific values. For example, if
you take the height of each student in a room, you could get any
number between two reasonable amounts. So height is a continuous
variable.

4.3.2.2 Levels of Measurement


Variables can also be classified according to the level of measurement.
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio.

1. Nominal Data. In this case, numbers are used to represent an item or


characteristic. Examples include: names, gender, religious affiliation,
civil status, college majors. Note that such data should not be treated as
numerical, since relative size has no meaning.

2. Ordinal or Rank Data. In this set, numbers can be ordered or ranked,
but a specific difference in the levels cannot be determined. For
example, the performance rating can be represented by numbers as
illustrated below:

Because order in this set is considered, we know that Outstanding


is higher than Very Satisfactory or Very Satisfactory is higher than
Satisfactory, etc., but there is no exact difference between any two of
them. For example, the grades of Outstanding and Very Satisfactory
may be close together or far apart, so the
exact difference cannot be determined.

3. Interval Data. In this set, numbers can be ordered and have an exact
difference between any two units, but there is no meaningful zero or starting
point. For example, temperature in degrees Celsius is interval data: values can
be ordered and there is an exact difference between two degrees, but zero
is not a starting point, since there can be temperatures below
zero.

4. Ratio Data. This is the highest level of measurement and allows for
all basic arithmetic operations, including division and multiplication.
Data at this level can be ordered, have exact differences between units,
and have a meaningful zero. Things that are counted are usually ratio
level, for example, business data such as cost, revenue, and profit.
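
The four levels can be told apart by asking three yes-or-no questions taken from the descriptions above: can the values be ordered, is there an exact difference between units, and is there a meaningful zero? The function below is a sketch of that decision rule; its name and parameters are our own, and it assumes the questions are answered in that order.

```python
def level_of_measurement(ordered, exact_differences, meaningful_zero):
    """Classify a variable using the three properties described above."""
    if not ordered:
        return "nominal"
    if not exact_differences:
        return "ordinal"
    if not meaningful_zero:
        return "interval"
    return "ratio"

# Civil status: categories only, no ordering.
print(level_of_measurement(False, False, False))   # nominal
# Performance rating: ordered, but differences are not exact.
print(level_of_measurement(True, False, False))    # ordinal
# Temperature in degrees Celsius: exact differences, no meaningful zero.
print(level_of_measurement(True, True, False))     # interval
# Revenue: ordered, exact differences, and a meaningful zero.
print(level_of_measurement(True, True, True))      # ratio
```

Each level keeps the properties of the levels before it and adds one more, which is why ratio data is described as the highest level.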

4.3.2.3 Importance of Accurate and Appropriate Data Collection

Regardless of the field of study or preference for defining data (quantitative,


qualitative), accurate data collection is essential to maintaining the integrity of a
report or research. Both the selection of appropriate data collection instruments
(existing, modified, or newly developed) and clearly defined instructions for their
correct use reduce the likelihood of errors occurring.

In the case where data are not properly gathered, the consequences are as
follows:

1. inability to answer research questions accurately

2. inability to repeat and validate the study
3. distorted findings resulting in wasted resources
4. misleading other researchers to pursue unproductive ways of
investigation
5. compromising decisions for public policy
6. causing harm to human participants and animal subjects

4.3.2.4 Data Collection Methods


We discuss the most widely used methods for collecting data. These include the
use of documented data, surveys, experiments, and observations.
4.3.2.4a Use of Documented Data
Sometimes information is difficult to gather or measure personally. If
information is already available for use, then it would be more practical to use
documented data in gathering needed information.
One can obtain documented data from previous studies of individuals,
written reports of government and nongovernment agencies, periodicals, and
others.
Example 7.
The Philippine Statistics Authority is a major collector of data for
government needs. It provides the public with basic data on various subject
matters. A few of these are household income and expenditure,
employment, and others.

DEFINITION 5.7. (primary data, secondary data)

Primary data are data documented by a primary source. The data collectors themselves
documented this data.

Secondary data are data documented by a secondary source. An individual/agency,


other than the data collectors, documented this data.

Example 8. The following agencies can provide primary data:

a. Central Bank (CB) is a primary source of data on banking and finance.


b. Philippine Statistics Authority (PSA) is a primary source of data on
population, housing, and establishments.

c. Bureau of Agricultural Statistics (BAS) is a primary source of data on
agriculture and livestock.
d. The University Registrar’s Office is a primary source of student records.
Example 9. The following are examples of secondary data.

a. The United Nations’ compiled data for its yearbook, which were originally
gathered by government statistical agencies of different countries.
b. A medical researcher’s documented data for his research paper, which
were originally collected by the Department of Health.
c. The documented data of a student for his thesis, which were originally
collected by the Department of Labor and Employment.

4.3.2.4b Surveys
DEFINITION 5.8. (survey, census, sample survey)

The survey is a method of collecting data on the variable of interest by asking people
questions. When the data come from asking all the people in the population, the survey is
called a census. On the other hand, when the data come from asking a sample of people
from a well-defined population, it is called a sample survey.

The interviewees are the respondents of the survey. A questionnaire which


contains all the questions that each respondent will have to answer is used.
Usually, respondents are selected objectively by employing a probability sampling
procedure. By following an objective method of selecting the sample, the
reliability of generalizations about the population under study can be assessed.
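
One objective way of selecting respondents is simple random sampling, in which every member of the list of eligible people has the same chance of being chosen. The module does not name a specific procedure, so the sketch below is only an example: the sampling frame of 500 voter labels, the sample size of 25, and the seed are all hypothetical.

```python
import random

# Hypothetical sampling frame: a list of everyone eligible to be surveyed.
sampling_frame = [f"voter-{number:03d}" for number in range(1, 501)]

rng = random.Random(7)   # fixed, arbitrary seed so the selection is repeatable

# Simple random sample: 25 respondents drawn without replacement.
respondents = rng.sample(sampling_frame, 25)

print(len(respondents), "respondents selected")
print(respondents[:5])
```

Because the selection is left to chance rather than to the interviewer's judgment, no respondent is favored over another.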
Example 10.
Pulse Asia conducted a sample survey on voter response to political ads in
the May 2016 election. Its respondents were selected registered voters who
intend to vote in the 2016 election.
There are various methods of communicating with the respondents in a
survey. Some of the most commonly used methods are personal interviews,
telephone interviews, self-administered questionnaires, online surveys, and
focus group discussions.
4.3.2.4c Experiments
DEFINITION 5.9. (experiment)

The experiment is a method of collecting data where there is direct human


intervention on the conditions that may affect the values of the variable of
interest.

In an experiment, there are different types of variables:

• The explanatory variable is the factor under study.


• The response variable is the observation which the researcher uses
for comparison after conducting the experiment using the factor
under study.
• The extraneous variables are those that the researcher believes may
have an effect on the response variable.
We consider the classical mongo experiment. We randomly selected mongo
seeds and planted them in two pots; we exposed one pot to sunlight and kept
the other away from it. Both pots had the same soil type. We watered the pots
at the same time using the same amount of water. A few weeks later, we observed
the heights of the mongo plants.
In this experiment, the objective is to determine the effect of sunlight on the
height of a mongo plant. The explanatory variable is the amount of sunlight.
Categories for the explanatory variable are called “treatments” or factor levels.
The response variable is the height of the mongo plant and the extraneous
variables are identified to be the soil type and amount of water.
The extraneous variables are usually controlled by making sure that the two
groups receive the same levels or amounts. The use of a randomization
mechanism in assigning the treatments, together with control of the identified
extraneous variables, makes the experiment a more effective method of data
collection for establishing cause and effect.
Example 11.
The school administration wishes to determine which of the two methods is
more effective in training new student leaders. They randomly assigned twenty
student leaders to training method 1 and twenty student leaders to training
method 2. After one month of training, they administered a standardized
achievement test to the two groups and compared their scores.
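
The random assignment in Example 11 can be sketched as follows. The leader labels and the seed are hypothetical; the only details taken from the example are the two training methods and the twenty student leaders assigned to each.

```python
import random

# Hypothetical labels for the forty new student leaders.
leaders = [f"leader-{number:02d}" for number in range(1, 41)]

rng = random.Random(11)   # fixed, arbitrary seed so the assignment is repeatable

# Shuffle a copy of the list, then split it in half:
# the first twenty get training method 1, the rest get training method 2.
shuffled = leaders[:]
rng.shuffle(shuffled)
method_1 = shuffled[:20]
method_2 = shuffled[20:]

print("Training method 1:", sorted(method_1))
print("Training method 2:", sorted(method_2))
```

Shuffling before splitting gives every leader the same chance of landing in either group, so the groups are not formed by anyone's preference.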
4.3.2.4d Observation
DEFINITION 5.10. (observation method)

The observation method is a method of collecting data on the phenomenon of


interest by recording the observations made about the phenomenon as it actually
happens.

The observation method is useful in studying the reactions and behavior of


individuals or groups of persons/objects in a given situation or environment as

it happens. For example, a researcher may use the observation method to study
the behavior patterns in panic situations like a big fire, the landslide in Itogon,
Benguet, or the destruction of structures when Typhoon Yolanda hit Tacloban
City.
It is also practical to use the observation method when the subjects under study
cannot express their sentiments or are unable to speak. For example,
researchers often use the observation method to study the behavior of animals
in the wild, or the behavior of newborn babies in the nursery.

The table below shows the comparison of survey, experiment, and


observation methods.
                                                  Data Collection Method
Aspect                                 Survey              Experiment           Observation

Assessing the reliability of           Generally possible  Sometimes difficult  Oftentimes difficult
generalizations about a
well-defined population

Ability to establish cause-and-effect  Poor                Superior             Poor

Realism of data                        Realistic           Least realistic      Most realistic

4.3.3 Presentation of Data


After data collection, we need to organize and analyze the data. After
organizing and analyzing the data, we present the results in forms that
reveal the important information we obtained from the data.
There are three ways to present the information from our data. These
include textual, tabular, and graphical presentations.
4.3.3.1 Textual Presentation
Textual presentation of data incorporates important figures in a paragraph
of text. In this type of presentation, we insert important data figures or summary
measures within the paragraph of text to support our conclusions.
Textual presentation allows us to direct the reader’s interest to vital information
we want to highlight. Summary measures like the minimum, maximum, total, and
percentages are just a few of the figures that may be included in a textual
presentation.
It is necessary to select the most important figures we want to focus on.
Whenever we use textual presentation, we must always provide our readers with
additional discussion about the relevance of the figures in our presentation.

Example 12. Here is an illustration of textual presentation.
Excerpts taken from the Isabela Covid-19 Case Updates.
“As of 4PM today, the Department of Health reports a total number of COVID-19
cases at 290,190, after 3,475 newly-confirmed cases were added to the list of
COVID-19 patients.

DOH likewise announces 400 recoveries. This brings the total number of recoveries
to 230,233.
Twenty-eight duplicates were removed from the total case count. Of these, 19 were
recovered cases.
Moreover, 13 cases previously reported as recovered were reclassified as death
(12) and active (1) cases after final validation.”

From the illustration given, the paragraphs showed and highlighted only the
most important figures. Few numbers were included and minute details or a
large quantity of data were not presented. If we want to refer to other details of
the data, then it would be more appropriate to use tabular presentation.

4.3.3.2 Tabular Presentation


Tabular presentation of data arranges figures in a systematic manner in
rows and columns. It is the most common method of data presentation. We can
use it for various purposes such as description, comparison, and in showing
relationships between two or more variables of interest.
In tabular presentation, we arrange the data figures or summary measures
in rows and columns for easy reading. Tables should be simple and easy to
understand. Each row and column must have an appropriate label.
Three types of tabular presentation will be discussed in this module namely,
leader work, text tabulation, and the formal statistical table.

4.3.3.2a Leader Work


Leader work has the simplest layout among all three types of tables. It
contains no table title or column headings and has no table borders. We
incorporate this type within a paragraph presenting one or two columns of
figures as supporting data.

Example 13.
The population in the Philippines for the census years 1975 to 2000 is as
follows:

1975 42,070,660
1980 48,098,460
1990 60,703,206
1995 68,616,536
2000 76,498,735
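
A short Python sketch shows how the two columns of Example 13 could be lined up and what a simple summary of them looks like. The column widths and variable names are our own choices; the census figures are the ones listed above.

```python
# Census year and population, as listed in Example 13.
census = [
    (1975, 42_070_660),
    (1980, 48_098_460),
    (1990, 60_703_206),
    (1995, 68_616_536),
    (2000, 76_498_735),
]

# Leader work: two aligned columns, with no title, headings, or borders.
lines = [f"{year}    {population:>12,}" for year, population in census]
print("\n".join(lines))

# Increase in population from each census to the next.
increases = [
    (later[0], later[1] - earlier[1])
    for earlier, later in zip(census, census[1:])
]
for year, increase in increases:
    print(f"increase up to {year}: {increase:,}")
```

Note that the census years are not evenly spaced, so the increases cover periods of different lengths and should not be compared directly.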

4.3.3.2b Text Tabulation


The format of text tabulation is a little bit more complex than leader work.
It already has column headings and table borders, making it easier to
understand than leader work. This type does not have table title and table
number. Thus, it still needs introductory description for reader comprehension.
Example 14.
The distribution of cellular subscribers per telephone operator as of
December 2003 is as follows:
Telephone Operator Number of Subscribers

SMART 10,080,112
GLOBE 8,800,000
PILTEL 2,867,085
EXTELCOM 29,896
Total 22,509,560

4.3.3.2c Formal Statistical Table


The formal statistical table is the most complex type of table since it has all
the different parts like the table number, table title, head note, box head, stub
head, column headings, and so on. It is a stand-alone table and can be easily
understood even without a description.

The following presents the different parts of a formal statistical table:

Heading (located on top of the table of figures)
   Table number   number that identifies the position of the table in a
                  sequence
   Table title    states in telegraphic form the subject, data
                  classification, and place and period covered by the
                  figures in the table
   Head note      appears below the title but above the top cross rule of
                  the table and provides additional information about the
                  table

Box head
   Spanner head   caption or label describing two or more column heads
   Column head    label that describes the figures in a column
   Panel          set of column heads under the same spanner head

Stub (located at the left side of the table)
   Row caption    label that describes the figures in a row
   Center head    label describing a set of row captions
   Stub head      caption or label that describes all of the center heads
                  and row captions and is located at the first row

Field             collection of figures in the table
Line              row of figures
Column            column of figures
Cell              contains the intersection of a row caption and a column
                  heading

Footnote          a descriptive statement about a particular part of the
                  table or the whole table, located at the bottom of the
                  table
Source note       gives the name of the agency that collected the data

Example 15.
Below is an example of a formal statistical table.

4.3.3.3 Graphical Presentation
Graphical presentation of data portrays numerical figures or relationships
among variables in pictorial form. Some statistical charts used in this type of
presentation are given in the following table:
Type of Chart and Description

Line Chart
• Useful for presenting historical data
• Effective in showing movement of a series over time
• Appropriate when comparing two or more time series data and trends
  over time

Column Chart
• Compare amounts in a time series data
• Emphasis is on difference in magnitude
• For time series data, columns are arranged on the horizontal axis

Horizontal Bar Chart
• Appropriate when we wish to show the distribution of categorical data
• Used to compare magnitudes for different categories of a qualitative
  variable

Pie Chart
• Circle divided into several sections
• Each section indicates the proportion of each component

Pictograph
• Like a horizontal bar chart that uses symbols or pictures instead of bars
• The purpose is to get the attention of the readers
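
As a rough illustration of a pictograph, the sketch below prints one symbol for every one million subscribers, using the subscriber figures from the text tabulation example in 4.3.3.2b. The symbol, the unit of one million, and the rounding to the nearest whole symbol are our own choices for illustration.

```python
# Number of subscribers per operator, as listed in the text tabulation example.
subscribers = {
    "SMART": 10_080_112,
    "GLOBE": 8_800_000,
    "PILTEL": 2_867_085,
    "EXTELCOM": 29_896,
}

UNIT = 1_000_000   # one symbol stands for one million subscribers
SYMBOL = "#"

def pictograph_row(label, value):
    """Return one row: the label followed by one symbol per unit (rounded)."""
    count = round(value / UNIT)
    return f"{label:<10}{SYMBOL * count}"

for operator, value in subscribers.items():
    print(pictograph_row(operator, value))
print(f"Each {SYMBOL} stands for about {UNIT:,} subscribers")
```

The smallest operator gets no symbol at all because its count rounds to zero, which shows a limitation of pictographs: they draw attention to large differences but hide small values.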

4.4 Supplementary Learning Content


Importance of Knowledge in Statistics
Data is an important part of an inquiry and Statistical knowledge is essential
in carrying out the different steps from the proper methods of data collection and
correct data analyses to effective data presentation. A deep understanding of
statistics is necessary for the following reasons:

1. It enables anyone to become a better and more effective problem solver.

2. It provides procedures to gather data systematically and logically for


the advancement of knowledge.

3. It helps in organizing questions and testing theories.

4. It assists in describing and understanding the relationship between


variables that are often important in decision-making.

5. Knowledge of the statistical process can help us measure current


change and improve the forecasting process in predicting the future with
accuracy.
Role of Statistics in Data Analysis
The following list gives the roles that Statistics plays in data analysis:

1. To organize the numbers derived from measuring a trait or a variable.

2. To describe and interpret the distribution of data, relationships


between variables, hypothesis being tested or parameters being
predicted or estimated.

3. To help the researcher in making credible decisions based on
quantitative data or arguments.

4. To cope with changes by forecasting the future based on data on hand.

5. To provide a plausible foundation for building new learning or teaching


theory in education.

4.5 Supplementary Learning Resources


Excel Charts & Graphs: Learn the Basics for a Quick Start by Leila
Gharani
https://www.youtube.com/watch?v=DAU0qqh_I-A

Creating a Table in Word from Skillsoft YouTube


youtube.com/watch?v=koDeGamrxV4
4.6 Flexible Teaching-Learning Modality
Remote (asynchronous)

Module, exercises, problem sets, powerpoint lessons

