MMW - Midterm - Modules - DATA MANAGEMENT

This document provides an overview of data management and the key elements of statistics. It discusses defining data and variables, and identifies the typical components of a statistical study: (1) formulating questions, (2) collecting data, (3) organizing and summarizing data, and (4) making conclusions. It also differentiates between qualitative and quantitative data, including discrete and continuous quantitative data. An example is provided on calculating college completion rates to illustrate applying the statistical process.

Uploaded by

krazy uwu

MODULE 05

DATA
MANAGEMENT

LEARNING OBJECTIVES

1. Define statistics, data and variables.


2. Identify and use the components of statistics in given topics.
3. Discuss the role and importance of statistics in the decision-making process.
4. Discuss the role of computer science in statistics and vice versa.
5. Explain the typical research process.
6. Apply the scales of measurement.

LOADING YOUR KNOWLEDGE


Say you would like to find out the perceptions of your fellow students and
their parents regarding the mode of teaching delivery in this new normal
setup. What would you do first? How can you tally the total responses of those
you have asked? Without knowing it, you are conducting a simple survey in which
you collect and collate responses to reach a conclusion. As such, you are
working on statistics.
You are about to learn to apply statistical tools correctly, interpret the
findings appropriately and get an idea of the possibilities for analyzing
research questions using statistics. It is neither possible nor
worthwhile to learn all statistical methods in a single course with limited
time and a constrained environment. This course will be successful if it
enables you to improve your knowledge of statistical methods on your own. It
gives you a solid grounding in some tools for statistical analysis and
shows you how to apply them correctly.

LESSON 01
The Elements, Components and Role of Statistics

LET’S PROCESS
Statistics is defined as a mathematical body of science that pertains to the collection,
organization and analysis, interpretation or explanation, and presentation of data, in such a
way that meaningful conclusions can be drawn from them. It is a crucial process behind how
we make discoveries in science, make decisions based on data, and make predictions. In
simpler terms, statistics are numbers with a context or underlying meaning.

We will refer to information collected from people or objects as data. Strictly speaking, data is
the plural form and a single piece of information is called a datum, although data is now
commonly used to represent one or more pieces of information. Data is defined as a collection
of facts, such as numbers, words, measurements, observations or just descriptions of things.
Data can be either qualitative or quantitative.

Moreover, data are individual pieces of factual information recorded and used for the
purpose of analysis. They are the raw information from which statistics are created. Statistics are
the results of data analysis: its interpretation and presentation. These types of statistics are
referred to as 'statistical data'.

Why do we care about data? We collect data to help us learn about our world. We start
off with some questions about something we wish to learn about, and we find appropriate data
to help us answer these questions.

1. Qualitative data is descriptive information (it describes something).


2. Quantitative data is numerical information (numbers).

Quantitative data can be Discrete or Continuous:

• Discrete data can only take certain values (like whole numbers)
• Continuous data can take any value (within a range)

Put simply: Discrete data is counted; Continuous data is measured.

Example: What do we know about Arrow the Dog?

Source: www.mathsisfun.com

Qualitative:

• He is brown and black


• He has long hair
• He has lots of energy

Quantitative:

• Discrete:
  • He has 4 legs
  • He has 2 brothers
• Continuous:
  • He weighs 25.5 kg
  • He is 565 mm tall

Note: To help you remember think "Quantitative is Quantity"
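
The Arrow the Dog example can also be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the `classify` helper are names made up for this sketch, and using the Python type (text, whole number, decimal number) to decide the data type is a rule of thumb that works here because counted values are stored as whole numbers and measured values as decimals.

```python
# Facts about Arrow the Dog from the lists above, stored as a dictionary.
# Counted values are stored as int and measured values as float (an assumption
# of this sketch, not a general rule about data).
arrow = {
    "color": "brown and black",  # descriptive
    "hair": "long",              # descriptive
    "legs": 4,                   # counted
    "brothers": 2,               # counted
    "weight_kg": 25.5,           # measured
    "height_mm": 565.0,          # measured
}


def classify(value):
    """Label a value as qualitative, discrete, or continuous (rule of thumb)."""
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    raise TypeError(f"unsupported value: {value!r}")


labels = {name: classify(value) for name, value in arrow.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```

Note that the label depends on how the value was recorded: a height typed in as the whole number 565 would be labelled discrete by this helper, even though height is measured.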


Components of Statistics
You’ve seen that statistics are numerical data that we work with. But statistics has a
second deeper meaning -- it's the science of using these data to learn about our world and
make conclusions.

There are four basic components of statistics which are:

1. FORMULATING QUESTIONS: First, state some questions or problems that you would
like to address by collecting relevant data.
2. COLLECTING DATA: Second, specify effective ways of collecting data that are useful in
answering the questions of interest.
3. ORGANIZING & SUMMARIZING: Next, organize and summarize the collected data to
learn about its general features.
4. MAKING CONCLUSIONS: Last, use the data to make conclusions. (It turns out that
probability or chance plays an important role in decision-making.)

Any statistical study reported in the media will have these four components. At the
beginning, there will be some questions that motivated the researcher to study a problem. If
there were no questions, then there would be no reason to proceed further into a statistical
study. Second, the researcher will collect data that he or she believes will be useful in
answering the question. You will see that data can be collected or found from many
sources. Next, the researcher organizes the data in some useful way and makes graphs and/or
calculations that are helpful in answering the main questions. Finally, the researcher has to
use the graphs and calculations to address the questions of interest. It is possible that the
data are insufficient or inconclusive for answering the questions, and perhaps a new statistical
study will be undertaken.
For example, suppose you want to study the completion rates of the different
programs of QSU for the past five years. You can work on it by following the above-mentioned
components.

Formulating Questions

You can get started by looking at some of the completion rate data. Most likely, those
who enrolled in college had their own ambitions and dreams and were determined to finish
college. However, some failed to finish or pursue their careers, and some
ended up dropping out of or discontinuing their course or program. As a result,
the number of those who graduate does not match the number enrolled during the very
first semester of the program.
Basically, you can ask,

1. How many from those who enrolled in each program were able to graduate?
2. How many of these students finish the program or their course within the prescribed
time?
3. Who are those people who didn’t make it till graduation?
4. What might have contributed to their non-completion of the program?
5. Which program has the highest or the lowest completion rate?
Collecting Data

In order to find answers to your queries, you need to identify where to find data.
Possibilities include but are not limited to:

• Graduation programs’ invitation


• Deans/Program Heads/Program Chairperson
• Registrar and Guidance Offices
• Graduates
Note: Always observe the ethical procedures and the laws governing the acquisition of data.

The possible sources of data listed earlier are intended only for the given topic above.
There are, however, other sources of data suitable for each topic you intend to study. Other
ways to collect data include direct observation, interviews and document analysis. Take
note that when you collect data by asking people for responses, you are simply doing a survey.
Organizing & Summarizing
Once you have all the data, you can organize them by extracting the information that answers
each of the questions you have formulated. For example:

• number of enrolled in each program


• number of graduates in each program
• number of graduates for each program within the prescribed time
• profile of the enrollees during the first semester of the program
Note: Organizing, summarizing and interpreting data can be done through the use of SPSS, which
will be discussed in detail in MODULE 2 for this course.

Drawing Conclusions
With the available data you organized and summarized, you can now work on your
conclusions by answering each of the questions you formulated.

• Answer to Question 1 ---- number of graduates in each program
• Answers to Questions 2 & 5 ---- number of enrolled in each program and number of
graduates for each program within the prescribed time
• Answers to Questions 3 & 4 ---- profile of the enrollees during the first semester of the
program

An analysis of the results of your study can help you trace the roots or causes of certain
completion rate issues and concerns, which can be used as a basis in reviewing and enhancing
each program of QSU.
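
As a rough sketch of the organizing, summarizing and concluding steps for this example, the snippet below computes completion rates per program. The program names and all counts are invented for illustration (they are not real QSU data), and the completion rate is assumed here to be the number of graduates divided by the number who enrolled, expressed as a percentage.

```python
# Hypothetical counts per program (NOT real QSU data).
programs = {
    "Program A": {"enrolled": 120, "graduated": 90, "on_time": 60},
    "Program B": {"enrolled": 80, "graduated": 40, "on_time": 36},
    "Program C": {"enrolled": 50, "graduated": 45, "on_time": 40},
}


def completion_rate(finished, enrolled):
    """Percentage of enrolled students who finished (assumed definition)."""
    return 100.0 * finished / enrolled


# Questions 1 and 5: overall completion rate per program.
rates = {
    name: completion_rate(c["graduated"], c["enrolled"])
    for name, c in programs.items()
}

# Question 2: completion rate within the prescribed time.
on_time_rates = {
    name: completion_rate(c["on_time"], c["enrolled"])
    for name, c in programs.items()
}

# Question 5: programs with the highest and lowest completion rate.
highest = max(rates, key=rates.get)
lowest = min(rates, key=rates.get)

for name in programs:
    print(f"{name}: {rates[name]:.1f}% overall, {on_time_rates[name]:.1f}% on time")
print("Highest:", highest, "| Lowest:", lowest)
```

Questions 3 and 4 (who did not finish, and why) cannot be answered from counts alone; they need the enrollee profiles mentioned above.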
In the example you worked on, you have collected different types of information from
each graduate. The object that you are collecting information from is called the observational
unit. In this case, each graduate is an observational unit. The different types of information
you collect for each graduate are called variables. Here, some variables are the names of the
programs, the number of graduates for each program, the percentage of completion rate within the
prescribed time and the percentage of completion rate per program. There are two distinct
types of variables depending on how the variable is recorded. The name of the program is an
example of a categorical variable: a variable whose values can be grouped into different
categories. The percentage of completion rate from each program is a quantitative variable:
a variable whose values are numerical and refer to the quantity or size of something.
For our second example, suppose we record the current status of learning readiness
under the new normal setting of the Bachelor of Science in Computer Science students.
Here each student would be the observational unit. Learning readiness and addresses of
students would be categorical variables, and the number of those having gadgets and
connectivity and those who have none would be quantitative variables.
Sometimes it can be difficult to tell if a collected “number” is a categorical or a
quantitative variable. Nevertheless, when we collect data, it is important to recognize if a given
data value represents a categorical or quantitative variable. Our exploration of data will depend
on its type. The way we explore categorical data will be fundamentally different from our
treatment of quantitative data.
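
To illustrate how the two kinds of variables are explored differently, here is a minimal sketch using only the Python standard library. The readiness labels and gadget counts are made-up responses from five hypothetical students: the categorical variable is summarized by counting how often each category occurs, while the quantitative variable is summarized with an average.

```python
from collections import Counter
from statistics import mean

# Made-up responses from five hypothetical students.
readiness = ["ready", "not ready", "ready", "somewhat ready", "ready"]  # categorical
gadgets_owned = [2, 0, 1, 1, 3]                                         # quantitative

# Categorical data: count how many observations fall in each category.
readiness_counts = Counter(readiness)

# Quantitative data: arithmetic such as an average is meaningful.
average_gadgets = mean(gadgets_owned)

print(readiness_counts.most_common())
print(f"Average number of gadgets: {average_gadgets:.1f}")
```

Taking the "average" of the readiness labels would make no sense, which is why recognizing the type of a variable comes before choosing how to summarize it.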
Statistics represent a common method of presenting information helping us to
understand what the data are telling us. Statistical knowledge helps you use the proper
methods to collect the data, employ the correct analyses, and effectively present the results.
Basically, it helps us understand the world a little bit better through numbers and other
quantitative information. Below are but some examples of the importance and uses of statistics
in our daily lives and the events happening around us.

The Role of Statistics

1. Statistics teaches people to use a limited sample to make intelligent and accurate
conclusions about a greater population. The goal of virtually all quantitative research
studies is to identify and describe relationships among constructs. The use of tables,
graphs, and charts plays a vital role in presenting the data being used to draw these
conclusions.

2. Statistics is one of the most important disciplines to provide tools and methods to find
structure in and to give deeper insight into data, and the most important discipline to
analyze and quantify uncertainty. Data are collected in a very systematic manner and
conclusions are drawn based on the data.

3. At a basic level, statistical techniques allow us to aggregate and summarize data in
order for researchers to draw conclusions from their study.

SELF–CHECK!
Now that you have been introduced to the elements, components and importance of
statistics, let us gauge how far you have gone in understanding the
discussions. Please write your answers in your big notebook for the course.
Complete the following statements by filling in the blanks with the appropriate
word/s.

a) _____(1)________ is defined as a mathematical body of science that pertains to
the ____(2)_______, _______(3)______ and analysis, interpretation or
_________(4)__, and presentation of ____(5)____, in such a way that meaningful
conclusions can be drawn from them.
b) _____(6)_______ is defined as the collection of facts, such as _____(7)____,
words, measurements, ______(8)_________or just descriptions of things which
can either be ______(9)_______ or ______(10)_______.
c) Qualitative data is ______(11)________ information while quantitative data is
_______(12)________ information.
d) The third component of statistics is ________(13)___________.
e) The object that you are collecting information from is called the
_____(14)_________

LET’S TRY THIS!


Now that you were able to test your knowledge on the important concepts and
features of statistics, try doing the following activities confidently.
Activity 1 (Identifying quantitative and qualitative data)

As you walk, or in the car or at home, look around and ask yourself questions about the
world around you. Then write down 5 of those questions then identify the qualitative and
quantitative aspects of each.

Examples:
1. How many trees are there?
2. How many houses are there?
3. How many people are busy working? How many are bystanders?
4. Which stores/outlets are selling the most?

Use the matrix below in doing your activity. Referring to the questions above, you
should be able to extract information from the given observational unit.

Question | Qualitative Data | Quantitative Data

COMPILE YOUR KNOWLEDGE


To help you refresh your knowledge on the lesson, here are a few pointers for you to
remember:
1. Statistics deals with the collection and analysis of data.
2. Data plays an important role as a unit of information in statistics.
3. Data is either qualitative or quantitative.
4. Formulating questions, collecting data, organizing and summarizing information
and drawing conclusions thereafter are the basic components of statistics.
5. Statistics is used for data mining, speech recognition, vision and image
analysis, data compression, artificial intelligence, and network and traffic
modeling. A statistical background is essential for understanding algorithms
and statistical properties that form the backbone of computer science.

DEBUG YOUR SKILLS


Quiz 1 (Working on a given topic)
Select one of the topics below, then elaborate on it by following the
process involved in the four components of statistics. Each component
you discuss will be assessed based on the criteria
found in rubric #2.
1. Flexible Learning
2. Frontliners
3. Covid-19

ANSWER KEY TO SELF–CHECK

1. Statistics
2. collection
3. organization
4. explanation
5. data
6. Data
7. numbers
8. observations
9. qualitative
10. quantitative
11. descriptive
12. numerical
13. Organizing and summarizing
14. observational unit

LESSON 02
The Research Process

LET’S PROCESS
As cited by Calderon and Gonzales (1993), research in general is a systematic,
refined, careful, critical and disciplined inquiry varying in method directed to the
clarification and/or resolution of a problem.

RESEARCH…
• starts with a problem,
• collects data or facts,
• analyzes and interprets these critically and
• reaches a decision based on actual evidence.

• A research method is a process through which you are going to move the reader from
questions, to data, to findings, and to conclusion. Findings become unconvincing if
process is poor and research cannot be replicated if the process is unclear.

Thus, the research process involves identifying, locating, assessing, and
analyzing the information you need to support your research question, and then
developing and expressing your ideas.

The typical quantitative study involves a series of steps, one of which is statistical
analysis.

Source: Abraham S. Fischler School of Education, Nova Southeastern University


Step 1: Formulate the Research Questions


• Research questions reflect the problem that the researcher wants to investigate.
• Research questions can be formulated based on theories, past research, previous
experience, or the practical need to make data-driven decisions in a work environment.
• Research questions are vitally important because they, in large part, dictate what type
of statistical analysis is needed as well as what type of research design may be employed.

Examples of Research Questions:


1. How is financial need related to retention after the freshman year of college?
2. What types of advertising campaigns produce the highest rates of inquiries among
prospective students of QSU?
3. How do males and females differ with respect to statistics self-efficacy?
4. How does a body image curriculum improve body image in college females?
5. Which mode of learning delivery is preferred by the majority in the new normal?
6. What is the perception of parents regarding the opening of classes in the new
normal?
While research questions are fairly general, hypotheses are specific predictions about
the results, made prior to data collection.
Examples of Hypotheses:
1. As financial need increases, the likelihood of retention decreases.
2. Personalized letters result in more inquiries than brochures.
3. Males have higher levels of self-efficacy than females.
4. Body image will improve as a result of the new curriculum.
5. Blended learning is best for the new normal.
6. Parents are in favor of opening classes on October 5, 2020.
Step 2: Operationalize & Choose Measures
1. Many variables of interest in education and society are abstract concepts that cannot
be directly measured.
2. This doesn’t preclude us from studying these things, but requires that we clearly
define the specific behaviors that are related to the concept of interest.
Measuring Abstract Concepts
1. How does one measure retention, inquiry rate, statistics self-efficacy, body image,
preference and perception?
2. The process of defining variables and choosing a reliable and accurate measurement
tool is called operationalizing your variables.
3. Good measurement is vital to the trustworthiness of your results!
Step 3: Choose a Research Design
This step involves developing a plan for collecting the data we need which will serve as
the “blueprint” of the study. This is called research design, and includes things such as:
1. Who will participate in the study?
2. Who will receive the intervention?
3. Will there be a “control group”?
4. Will data be collected longitudinally?
5. What instrument will be used to collect data?
6. What type of data will be collected?
The choice of design impacts the validity of your final results. Threats to validity can be
internal, which are problems associated with the experimental procedures or experiences of
participants, or external, which are problems that affect the generalizability of the results.
Step 4: Analyze the Data
Once the data have been collected, the results must be organized and summarized so
that we can answer the research questions which is the very purpose of statistics. The choice
of analysis at this stage depends entirely on two prior steps which are:
1. The research questions
2. How the variable is measured
Step 5: Draw Conclusions
After analyzing the data, we can make judgments about our initial research questions
and hypotheses. Are these results consistent with previous studies? The conclusions drawn
from a study may provide a starting point for new research.
Despite the anxiety usually associated with statistics, data analysis is a relatively small
piece of the larger research process. There is a misconception that the trustworthiness of
statistics is independent of the research process itself. This is absolutely incorrect! A statistical
analysis can in no way compensate for a poorly designed study.

SELF–CHECK!
Can you recall some fundamental concepts of research? Consider doing the
following activity.

I. Supply the missing correct word/s in the paragraph below about research.

Research in general is a systematic, ________________, careful, ____________
and disciplined _____________ varying in method directed to the clarification and/or
resolution of a problem. Research always starts with a _________________. The
research process involves ________________, locating, _______________, and
________________ the _____________ you need to support your research question, and
then developing and expressing your ideas.

II. Enumerate the five research processes.

LET’S TRY THIS!


Identify the research process involved in the following statements. Please write
the name of the whole process. Example: Draw conclusions

1. Ten female respondents answered the survey with a YES or NO, while 15 male
respondents answered either YES or NO.
2. Competencies of graduates do not match the industry's preferences.
3. What is the level of awareness of students with the QSU VMGO?
4. Descriptive statistics will be used in analyzing the data.
5. The respondents of the study will be the BSCS students.

COMPILE YOUR KNOWLEDGE


Always REMEMBER!

Research observes a process which starts with a problem, then collects
data or facts to be analyzed which, if interpreted critically, will help you reach a sound
conclusion and eventually arrive at a sound decision.

iCONNECT
Go over scholarly articles on Google Scholar (scholar.google.com) or www.tandfonline.com for any
flexible learning or IT-related topics. This will help you become more familiar with the
processes of research.

DEBUG YOUR SKILLS
Using the knowledge you gained about the research process, provide the details of
the following topics by following the steps of the research process.

1. Mobile gaming and academic performance of students.


2. Workers and quarantine health and safety protocols.

ANSWER KEY TO SELF–CHECK


Research in general is a systematic, refined, careful, critical and disciplined
inquiry varying in method directed to the clarification and/or resolution of a
problem. Research always starts with a problem. The research process involves
identifying, locating, assessing, and analyzing the information you need to
support your research question, and then developing and expressing your ideas.

LESSON 03
Statistical Terminology

LET’S PROCESS
We could spend the whole term or semester defining the different
terminologies you will come across as you study statistics. Instead, this lesson will
focus on the basic terms used in statistics: population, sample, parameter and
variable.

Population
➢ It is the entire set of individuals or objects that you are interested in studying.
➢ The group that you want to generalize your results to.
➢ Populations vary in size, but they are usually quite large, which is why it is usually not feasible to
collect data from the entire population.

Sample
➢ It is a subset of individuals selected from the population.
➢ In the best case, the sample will be representative of the population.
➢ The characteristics of the individuals in the sample will mirror those in the population.

Parameter
➢ It is a quantitative characteristic of the population that you’re interested in estimating
or testing (such as a population mean or proportion).
➢ These are generally unknown, and must be estimated from a sample.
➢ The sample estimate is called a statistic.
Examples: retention rate, average level of self-efficacy, body image, preference and perception
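
The relationship between a parameter and a statistic can be sketched as follows. Everything here is an assumption made for illustration: the "population" is simply the whole numbers 1 to 100 standing in for scores, the sample size of 30 is arbitrary, and the random seed is fixed only so the run is repeatable. In a real study the population mean would be unknown, which is exactly why it has to be estimated from a sample.

```python
import random
from statistics import mean

# Stand-in population: 100 hypothetical scores (the whole numbers 1 to 100).
population = list(range(1, 101))

# Parameter: a characteristic of the whole population (usually unknown in practice).
parameter = mean(population)

# Sample: a subset of individuals selected at random from the population.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 30)

# Statistic: the same characteristic computed from the sample only.
statistic = mean(sample)

print(f"Population mean (parameter): {parameter}")
print(f"Sample mean (statistic):     {statistic:.2f}")
```

The sample mean will generally differ somewhat from the population mean; a representative sample keeps that difference small.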

Variables
➢ A characteristic that takes on different values for different individuals in a sample.
Example:
▪ Retention (yes/no)
▪ Inquiry about QSU programs (yes/no)
▪ Self-efficacy (score on self-efficacy questionnaire)
▪ Body image (score on body image questionnaire)
▪ Flexible learning (percentage of preferred mode of teaching delivery in a survey)
▪ School Opening (number of favorable for school opening in a survey)
➢ Any characteristic, number, or quantity that can be measured or counted. It is also
called a data item.

Examples: age, sex, business income and expenses, country of birth, capital
expenditure, class grades, eye color and vehicles.

➢ Its value may vary between data units in a population, and may change in value over
time.

Common Variable Types:


1. INDEPENDENT VARIABLES (IV)
➢ The “explanatory” variable
➢ The variable that attempts to explain or is purported to cause differences in a
second variable.
➢ The “intervention” in experimental designs
Example: Does a new curriculum improve body image? The curriculum is the IV.
This means that whatever happens to body image has no effect on the curriculum.

2. DEPENDENT VARIABLES (DV)


➢ The “outcome” variable
➢ The variable that is thought to be influenced by the independent variable.
Example: Does a new curriculum improve body image? Body image is the DV. This
means that the improvement in body image depends on the curriculum.

3. CONFOUNDING VARIABLES
➢ represent unwanted sources of influence on the DV
➢ sometimes referred to as “nuisance” variables.
Example: Does a new curriculum improve body image?
Such things as heredity, family background, previous counseling experiences,
etc. can also impact the DV.

How to Control Confounding Variables


Typically, researchers are interested in excluding, or controlling for, the effects of
confounding variables. This is generally not a statistical issue, but is accomplished by
the research design. Certain types of designs (e.g., experiments) better control the
effects of confounding variables. If an experiment or an equivalent control group is not
possible.

iCONNECT
To further your understanding of dependent and independent variables, you
may go over the article by Market Research Guy, included here as Appendix A, or, if you
are online, visit:
https://www.mymarketresearchmethods.com/dependent-independent-variables-whats-difference/

SELF–CHECK!
Now that you’re through with the lesson, let us try checking your knowledge on
the following:
1. The four (4) basic terms used in statistics
2. It is the entire set of individuals or objects that you are interested in
studying.
3. It is a quantitative characteristic of the population that you’re interested in
estimating or testing (such as a population mean or proportion).
4. It is a subset of individuals selected from the population.
5. Any characteristics, number, or quantity that can be measured or counted. It
is also called a data item.
6. The common variable types

LET’S TRY THIS!
Activity 1 (Identifying IVs and DVs)

State what is the IV and DV in the following:


1. Do students prefer learning statistics online or face to face?
2. How do students who have never had statistics compare to students who have
previously had statistics in terms of their anxiety levels?

COMPILE YOUR KNOWLEDGE


1. There are four common terms basically used in statistics: population,
which is the entire set of individuals or objects of the study;
sample, which is a subset of individuals selected from the
population; parameter, a quantitative characteristic of the population; and
variable, a characteristic that takes on different values for
different individuals in a sample.
2. The common types of variables are dependent variable (DV), independent
variable (IV) and the confounding variable.

DEBUG YOUR SKILLS

Identify the IV and the DV in the following research titles:


1. Administrator’s Role in Teacher’s Professional Development
2. Feeding Program and the Children’s Weight
3. Students’ Academic Performance and Teachers’ Methodology
4. Improve Mathematical Ability and Reading Comprehension
5. The Effects of Sleeping on Students’ Test Scores

LESSON 04
Scales of Measurement

LET’S PROCESS
For any given variable that you are interested in, there may be a variety of
measurement scales that can be used. Variable measurement is the second factor
that influences the choice of statistical procedure. For example, annual income can be asked as an open-ended amount or as a choice among ranges:
What is your annual income? _________
What is your annual income?
a. 10,000-20,000
b. 20,000-30,000
c. 30,000-40,000
d. 40,000-50,000
e. 50,000 or above
Scales of measurement can be nominal, ordinal, interval or ratio. In nominal scale,
observations fall into different categories or groups and differences among categories are
qualitative, not quantitative. Examples are gender, ethnicity, counseling method (cognitive vs.
humanistic), retention (retained vs. not retained).
On the other hand, class standing, letter grades (A,B,C,D,F) and Likert-scale survey
responses (SD, D, N, A, SA) are examples of ordinal scale. In this scale, categories can be rank
ordered in terms of amount or magnitude. Also, categories possess an inherent order, but the
amount of difference between categories is unknown.
In interval scale, categories are ordered, but now the intervals for each category are
exactly the same size. That is, the distance between measurement points represents equal
magnitudes (e.g., the distance between point A and B is the same as the distance between B
and C). Examples of this scale include the Fahrenheit scale for measuring temperature, the
chronological scale of dates (1997 A.D.) and standard scores (z-scores).
Moreover, ratio scale has the same properties as the interval scale, but with an additional
feature. Ratio scale has an absolute 0 point which permits the use of ratios (e.g., A is “twice
as large” as B). Examples of this scale are number of children, weight, height, annual income,
etc.
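
The four scales can be summarized by which properties each one has, and the Fahrenheit example shows why ratios need an absolute zero. The sketch below is illustrative only: the `SCALES` table and the helper names are made up for this example, and the conversion to kelvin is used because the Kelvin scale has an absolute zero.

```python
# Properties of each scale of measurement, as described in the text above.
SCALES = {
    "nominal":  {"ordered": False, "equal_intervals": False, "absolute_zero": False},
    "ordinal":  {"ordered": True,  "equal_intervals": False, "absolute_zero": False},
    "interval": {"ordered": True,  "equal_intervals": True,  "absolute_zero": False},
    "ratio":    {"ordered": True,  "equal_intervals": True,  "absolute_zero": True},
}


def allows_ratios(scale):
    """Statements like 'A is twice as large as B' need an absolute zero."""
    return SCALES[scale]["absolute_zero"]


def fahrenheit_to_kelvin(degrees_f):
    """Convert a Fahrenheit temperature to kelvin (a ratio scale)."""
    return (degrees_f - 32) * 5 / 9 + 273.15


# On the Fahrenheit (interval) scale, 80 looks like "twice" 40 ...
ratio_f = 80 / 40
# ... but on a scale with an absolute zero the two temperatures are much closer.
ratio_k = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)

print("Ratios meaningful on interval scale?", allows_ratios("interval"))
print("Ratios meaningful on ratio scale?   ", allows_ratios("ratio"))
print(f"80/40 in Fahrenheit: {ratio_f:.2f}, same temperatures in kelvin: {ratio_k:.2f}")
```

Differences, by contrast, are meaningful on both interval and ratio scales: the gap between 40 and 80 degrees Fahrenheit is the same size as the gap between 80 and 120.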

There are different ways variables can be described according to the ways they can be
studied, measured and presented. In practice, it is not usually necessary to make such fine
distinctions between measurement scales; two categories, categorical and continuous, are
usually sufficient.

Level of Measurement

In practice, the four levels of measurement can usually be classified as follows:

Source: abs.gov.au/websitedbs

Numeric variables have values that describe a measurable quantity as a number, like
'how many' or 'how much'. Therefore, numeric variables are quantitative variables. Numeric
variables may be further described as either continuous or discrete:

➢ A continuous variable is a numeric variable. Observations can take any value within a
certain range of real numbers. The value given to an observation for a continuous variable
can include values as small as the instrument of measurement allows. Continuous
variables are generally preferable because a wider range of statistical procedures can be
applied. Continuous variables yield values that fall on a numeric continuum, and can
(theoretically) take on an infinite number of values. Examples of continuous variables
include height, time, age, and temperature.

What is the level of measurement of:
▪ Temperature (°C)?
▪ Color?
▪ Income of professional basketball players?
▪ Degree of importance (1 = not important, 5 = very important)?

➢ A discrete variable is a numeric variable. Observations can take a value based on a
count from a set of distinct whole values. A discrete variable cannot take the value of a
fraction between one value and the next closest value. Examples of discrete variables
include the number of registered cars, number of business locations, and number of
children in a family, all of which are measured in whole units (e.g., 1, 2, 3 cars).

Note: The data collected for a numeric variable are quantitative data.

Categorical variables have values that describe a 'quality' or 'characteristic' of a data
unit, like 'what type' or 'which category'. Categorical variables fall into mutually exclusive
(in one category or in another) and exhaustive (include all possible options)
categories. Therefore, categorical variables are qualitative variables and tend to be
represented by a non-numeric value. Categorical variables consist of separate, indivisible
categories.

➢ An ordinal variable is a categorical variable. Observations can take a value that can be
logically ordered or ranked. The categories associated with ordinal variables can be
ranked higher or lower than another, but do not necessarily establish a numeric
difference between each category. Examples of ordinal categorical variables include
academic grades (e.g., A, B, C), clothing size (e.g., small, medium, large, extra large) and
attitudes (e.g., strongly agree, agree, disagree, strongly disagree).

➢ A nominal variable is a categorical variable. Observations can take a value that is not
able to be organized in a logical sequence. Examples of nominal categorical variables
include sex, business type, eye color, religion and brand.

Note: The data collected for a categorical variable are qualitative data.

SELF–CHECK!
Let’s review some basic aspects of the lesson.
1. List the four (4) scales of measurement.
2. What are the ways in which variables can be presented?

LET’S TRY THIS!


Let’s do the following exercises:

A. Indicate which scale or level of measurement is involved in each of the following scenarios.


1. The teacher of a class of third graders records the height of each student.
2. The teacher of a class of third graders records the percentage that each student got
correct on the last science test.
3. A film critic lists the top 50 greatest movies of all time.
4. A meteorologist compiles a list of temperatures in degrees Celsius for the month of
May.
5. The roster of a basketball team lists the jersey numbers for each of the players.
6. Which of these is NOT an example of a nominal scale?
a. Numbers on a football jersey
b. Numbers on pool balls
c. Gender
d. Exam grades

COMPILE YOUR KNOWLEDGE


The following pointers can help you remember the basic concepts of the lesson.

Scale of Measurement      Simple Hints

Nominal Scale Data        Qualitative/Categorical
                          Examples: Names, Gender, Colors, Labels, etc.
                          Order does not matter

Ordinal Scale Data        Ranking/Placement
                          The order matters
                          Differences cannot be measured

Interval Scale Data       The order matters
                          Differences can be measured (except ratios)
                          No true “0” starting point

Ratio Scale Data          The order matters
                          Differences are measurable
                          Contains a true “0” starting point

DEBUG YOUR SKILLS


Choose the appropriate scale for the following.

1. As part of the requirements for admission to the university, students need to
take the English Proficiency Test (EPT), whose scores can range from 75 to 150 with
a population mean of 500 and a standard deviation of 100.
2. Children in an elementary school were evaluated for their reading readiness
through the PHIL-IRI.
3. During an interview, the survivors of an earthquake are asked to state “yes” or
“no” as to whether they experienced Post-Traumatic Stress Disorder (PTSD). The
number “0” is assigned to “no” and the number “1” is assigned to “yes”.
4. A certain university wants to know the dormitory preference of the students. The
administrators assigned a rank to each dorm based on applications received.
5. A researcher uses the customer temperatures recorded in the logbook of a
supermart to compare the temperatures of older customers and younger
customers.

ANSWER KEY TO SELF–CHECK


1. Ordinal, nominal, interval, ratio
2. Numeric and categorical

MODULE 06
WORKING WITH
DATA ON SPSS

LEARNING OBJECTIVES

1. Become familiar with the SPSS window
2. Identify the functions of the different menus of SPSS
3. Identify and use the appropriate statistical tools for data analysis and
interpretation using SPSS

LOADING YOUR KNOWLEDGE


The SPSS software package was created for the management and
statistical analysis of social science data. Most top research agencies use SPSS to
analyze survey data and mine text data so that they can get the most out of their
research projects.
Working on data is a complex and time-consuming process, but this
software can easily handle and operate information with the help of some
techniques. These techniques are used to analyze, transform, and produce a
characteristic pattern between different data variables. The output can be
obtained through graphical representation so that a user can easily understand
the result.

INITIALIZE YOUR KNOWLEDGE


Begin by installing SPSS 23 on your computer.
The following set of instructions will walk you through installing IBM SPSS
Statistics on your computer. Version 23 is recommended if you are running Windows 10. If
you are using an older Operating System, you may use Version 21.

We suggest that you obtain the SPSS license code before you begin downloading
SPSS. Obtain the SPSS license code from this link:
http://ezp.waldenulibrary.org/limited/spsslicense.html
You will need to enter your Walden user name and password if you are not already logged
into the Library or Blackboard. Copy and paste the code into a Word document so that you
have it available when prompted to enter it at the end of the installation sequence. You can
always enter the code later; however, having it on hand during the installation is much easier.
The SPSS Statistics installation links for Windows are given below. Use the link that matches
your operating system:
SPSS v23 Windows 32-bit install
- http://mym.cdn.laureatemedia.com/2dett4d/software/IBM/SPSS/v23/Windows/32-bit/SPSSSC_32-BIT_23.0_MW_ML.zip
SPSS v23 Windows 64-bit install
- http://mym.cdn.laureatemedia.com/2dett4d/software/IBM/SPSS/v23/Windows/64-bit/SPSSSC_64-BIT_23.0_MW_ML.zip

Not sure whether you have a 32-bit or 64-bit Windows Operating System? Access
Microsoft’s page for further clarification. This installation requires at least 1GB of free space
on your computer. Because of the large size of the installation file, it is recommended that you
use a DSL or better internet connection. Even with a strong internet connection, the
installation may take 30 minutes or longer. While the tool is installing, you may
continue to work within other applications on your computer.
Does your course require SPSS AMOS? If yes, access the install link and license code.
If you are using a Mac operating system, you may follow the instructions which can be found
on page 2 of Appendix B.

LET’S PROCESS
What is SPSS?
Statistical Package for the Social Sciences (SPSS) is used by various kinds of
researchers for complex statistical data analysis. The software package was created for
the management and statistical analysis of social science data. It was originally
launched in 1968 by SPSS Inc., and was later acquired by IBM in 2009. Most users
refer to it as SPSS, though it is officially named IBM SPSS Statistics. As the world
standard for social science data analysis, SPSS is widely valued for its straightforward,
English-like command language and its thorough user manual.
Basically, SPSS first stores and organizes the provided data, then it compiles the
data set to produce suitable output. SPSS is designed in such a way that it can handle
a large set of variable data formats.
SPSS is used by market researchers, health researchers, survey companies,
government entities, education researchers, marketing organizations, data miners, and
many more for the processing and analyzing of survey data. Most top research agencies
use SPSS to analyze survey data and mine text data so that they can get the most out
of their research projects.

What are the Core Functions of SPSS?


SPSS offers four programs that assist researchers with their complex data analysis
needs.
1. Statistics Program. This program provides a plethora of basic statistical
functions, some of which include frequencies, cross tabulation, and bivariate
statistics.
2. Modeler Program. This program enables researchers to build and validate
predictive models using advanced statistical procedures.
3. Text Analytics for Surveys Program. This program helps survey administrators
uncover powerful insights from responses to open-ended survey questions.
4. Visualization Designer. This program allows researchers to use their data
to create a wide variety of visuals like density charts and radial boxplots with ease.

In addition to what has been mentioned above, SPSS also:

➢ provides solutions for data management, which allow researchers to perform
case selection, create derived data, and perform file reshaping.
➢ offers data documentation, which allows researchers to store a metadata
dictionary. This dictionary acts as a centralized repository of information
pertaining to the data, such as meaning, relationships to other data, origin, usage,
and format.

SPSS can help with a number of statistical methods, including:
➢ Descriptive statistics, including methodologies such as frequencies, cross tabulation,
and descriptive ratio statistics.
➢ Bivariate statistics, including methodologies such as analysis of variance (ANOVA),
means, correlation, and nonparametric tests.
➢ Numeric outcome prediction, such as linear regression.
➢ Prediction for identifying groups, including methodologies such as cluster
analysis and factor analysis.

Why use SPSS?
SPSS is an extremely powerful tool for manipulating and deciphering survey
data. Exporting survey data to SPSS’s proprietary .SAV format makes the process of pulling,
manipulating, and analyzing data clean and easy. By so doing, SPSS will automatically set up
and import designated variable names, variable types, titles, and value labels, meaning that
minimal legwork is required from researchers. Once survey data is exported to SPSS, the
opportunities for statistical analysis are practically endless. In short, when you need a flexible,
customizable way to get super granular on even the most complex data sets, use SPSS. This
gives you more time to do what you do best and identify trends, develop predictive models, and
draw informed conclusions.

The SPSS Navigation Window


For detailed instructions on the functions and features of the SPSS window and how
to use it, please refer to APPENDIX C, which is found at the end of the course
modules. For further help with this task, you can reach me through Zoom (you should have
created your Zoom account so that you can access the link I will be sending you), through
video conferencing via Messenger, or by phone call. Be sure you have your SPSS window
open at the time of your call.

SELF–CHECK!
1. SPSS stands for ________________ ____________________ _______________ __________.
2. It was originally launched in __________ and was acquired by _________ in 2009.
3. SPSS is used by (give at least three users).
4. SPSS is a powerful tool for _________________ and ______________ of survey data.
5. The core functions of SPSS include:

LET’S TRY THIS!


The following exercises form part of the Using SPSS course. Save your activity in
your folder with your file name.

Exercise 1: Defining variables and entering data


You are responsible for collecting data from a clinical trial of Drug X. Drug X is postulated to
affect blood levels of a certain hormone (hormone H), so levels of the hormone will be measured
before and after treatment with X. In addition to the hormone data, five other pieces of
information will be collected from each participant in the trial.
1. Switch to the Variable View and define the seven variables listed in the table below. Use
numeric variables in SPSS for categorical data.

Variable name Data type


Surname
Gender Categorical (categories are Female and Male)
Age Continuous
Income Continuous
Smoker Categorical (categories are Smoker and Non-smoker)
Hbefore Continuous
Hafter Continuous
2. The table below shows the data which you have collected from five patients who took part in a pilot
study for the clinical trial. Switch to the Data Editor and enter the data on these five patients.
Remember to use numeric codes where necessary.

Surname          Gender   Age   Income   Smoker       Hbefore   Hafter

ROBBINS          Female   32    46000    Non-smoker   94.58     88.79
MCGREGOR         Male     33    58000    Non-smoker   106.12    78.25
KUMAR            Male     38    47000    Smoker       88.11     102.45
ALLINSON-HENRY   Female   51    55000    Non-smoker   83.62     63.82
OLDER            Male     44    28000    Non-smoker   72.31     77.50
Save the data as pilotgroup.sav.
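
If it helps to see the numeric coding outside SPSS, here is a minimal Python sketch of the same five pilot records. The code values are assumptions chosen to match later exercises (1 = Female and 2 = Male as in Exercise 2; 1 = Smoker and 0 = Non-smoker as in Exercise 6). The dictionary names are hypothetical and are not part of SPSS.

```python
# Assumed numeric codes; SPSS would store these as value labels in Variable View.
GENDER_CODES = {"Female": 1, "Male": 2}
SMOKER_CODES = {"Non-smoker": 0, "Smoker": 1}

# The five pilot patients from the table in Exercise 1.
pilot_rows = [
    ("ROBBINS", "Female", 32, 46000, "Non-smoker", 94.58, 88.79),
    ("MCGREGOR", "Male", 33, 58000, "Non-smoker", 106.12, 78.25),
    ("KUMAR", "Male", 38, 47000, "Smoker", 88.11, 102.45),
    ("ALLINSON-HENRY", "Female", 51, 55000, "Non-smoker", 83.62, 63.82),
    ("OLDER", "Male", 44, 28000, "Non-smoker", 72.31, 77.50),
]

# Replace each category label with its numeric code, leaving other fields unchanged.
coded = [
    {
        "surname": surname,
        "gender": GENDER_CODES[gender],
        "age": age,
        "income": income,
        "smoker": SMOKER_CODES[smoker],
        "hbefore": hbefore,
        "hafter": hafter,
    }
    for surname, gender, age, income, smoker, hbefore, hafter in pilot_rows
]

for row in coded:
    print(row)
```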

Exercise 2: Variable and value labels


Open the data file medicaltrialX.sav in SPSS.
1. Create the following labels for the variables:
income Household income
smoker Smoker or non-smoker
hbefore Blood levels of H before treatment
hafter Blood levels of H after treatment
2. Create the following value labels for the gender variable:
1 = Female 2 = Male
3. Save the file with the new definitions.

Exercise 3: Missing data


Some of the data in medicaltrialX.sav contains missing values.
1. Inspect the data in your data sheet and spot any missing values. Take note of which
variables have missing values.
2. For each of the variables identified above, decide on an appropriate coding for missing
values. For example, for a numeric variable which cannot be negative, -1 might be
used. For text data, an X might be used.
3. In Variable View, specify missing values for each variable identified in step 1
above.
4. After specifying the missing values, return to the Data Editor and change any blank
cells to the appropriate missing value code.
5. Run the Frequencies command which shows that the missing values have been coded.
(Hint: Use the menu option Analyse | Descriptive statistics | Frequencies, move all the
variables into the right-hand box, then click OK.)
6. Close the output window (do not save the output). Save the changed data file, giving it
the new name fixed.sav .
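
The idea behind steps 2 to 5 can be sketched in Python as follows. The rows are hypothetical (medicaltrialX.sav is not reproduced here), `None` stands for a blank cell, and the missing-value codes follow the suggestion in step 2 (-1 for numeric variables that cannot be negative, X for text).

```python
# Assumed missing-value code for each variable, following the suggestion in step 2.
MISSING_CODES = {"surname": "X", "age": -1, "income": -1}

# Hypothetical rows; None represents a blank cell in the data sheet.
rows = [
    {"surname": "A", "age": 32, "income": 46000},
    {"surname": None, "age": None, "income": 58000},
    {"surname": "C", "age": 38, "income": None},
]

def code_missing(row):
    """Return a copy of the row with every blank cell replaced by its missing-value code."""
    return {
        name: MISSING_CODES[name] if value is None else value
        for name, value in row.items()
    }

coded_rows = [code_missing(row) for row in rows]

# Count how many cases carry the missing-value code for each variable.
missing_counts = {
    name: sum(1 for row in coded_rows if row[name] == code)
    for name, code in MISSING_CODES.items()
}
print(missing_counts)
```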

Exercise 4: Importing an Excel file


1. Start a new, blank data sheet.
2. Import the file results.xls into SPSS, using the column headings as variable names.
3. Switch to Variable View. Add a new variable called SEN, and give it the label “Special
Educational Needs”.
4. The SEX variable (as in an earlier exercise) uses values of 1 and 2 to indicate “Female”
and “Male” respectively. Enter the value labels for the SEX variable.
5. Save the file to results2.sav.

Exercise 5: Sort Cases and Select Cases


Load the data file medicaltrialX.sav.
1. Sort the data in order of age (oldest first).
2. Sort the file in order of smoker within gender (i.e. gender is the primary ordering).
3. Select all the males in the group.
4. Select “all cases” again.

5. Now select the patients whose hormone levels were greater after the treatment than
they were before.

There is no need to save the data file at this point.


If you have extra time, see if you can select all the smokers.
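
As a cross-check on what Sort Cases and Select Cases do, the following Python sketch applies the same operations to the five pilot rows from Exercise 1. These rows stand in for medicaltrialX.sav, which is not reproduced here, so the results will differ from those of the full file.

```python
# Pilot rows from Exercise 1, used here as a stand-in for medicaltrialX.sav.
patients = [
    {"surname": "ROBBINS", "gender": "Female", "age": 32, "smoker": "Non-smoker", "hbefore": 94.58, "hafter": 88.79},
    {"surname": "MCGREGOR", "gender": "Male", "age": 33, "smoker": "Non-smoker", "hbefore": 106.12, "hafter": 78.25},
    {"surname": "KUMAR", "gender": "Male", "age": 38, "smoker": "Smoker", "hbefore": 88.11, "hafter": 102.45},
    {"surname": "ALLINSON-HENRY", "gender": "Female", "age": 51, "smoker": "Non-smoker", "hbefore": 83.62, "hafter": 63.82},
    {"surname": "OLDER", "gender": "Male", "age": 44, "smoker": "Non-smoker", "hbefore": 72.31, "hafter": 77.50},
]

# 1. Sort by age, oldest first.
oldest_first = sorted(patients, key=lambda p: p["age"], reverse=True)

# 2. Sort by smoker within gender: gender is the primary key, smoker the secondary key.
smoker_within_gender = sorted(patients, key=lambda p: (p["gender"], p["smoker"]))

# 3. Select only the males.
males = [p for p in patients if p["gender"] == "Male"]

# 5. Select the patients whose hormone level was greater after treatment than before.
hormone_increased = [p for p in patients if p["hafter"] > p["hbefore"]]

print([p["surname"] for p in oldest_first])
print([p["surname"] for p in smoker_within_gender])
print([p["surname"] for p in males])
print([p["surname"] for p in hormone_increased])
```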

Exercise 6: Recoding variables


Helpful hint:
If you still have a subset of the cases selected (from the previous exercise), make sure you
select all cases before you proceed.

We need to recode some of the data in medicaltrialX.sav (or fixed.sav). The information in the
smoker column is coded as text (Y and N). It would be better to code it as numeric data (e.g. 1
and 0), and to use labels to indicate the meaning of these numbers.
1. Use Recode to convert the smoker information into a new variable called smoker1, so
that 'Y' becomes 1 and 'N' becomes 0. (Define the label for this new variable as
"Smoker or non-smoker")
2. Create value labels for the smoker1 variable so that 1 is displayed as 'Smoker', and 0
is displayed as 'Non-smoker'.
3. Check that the recode has worked properly. If it has, then delete the old variable
smoker from your data sheet.
4. Using Recode, create a new variable incband (with label "Income band") to categorise
the household income: up to $25,000 as band 1, between $25,000 and $40,000 as
band 2, and more than $40,000 as band 3.
5. Save the data file if you are happy with the results of this exercise.
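
The recoding rules in this exercise can be sketched in Python as shown below. The function names are hypothetical. The sketch assumes that an income of exactly $25,000 falls in band 1 and exactly $40,000 falls in band 2, since the exercise does not say how the boundaries are handled. The incomes are those of the five pilot patients from Exercise 1.

```python
def recode_smoker(value):
    """Recode the text values 'Y' and 'N' as 1 and 0."""
    return {"Y": 1, "N": 0}[value]

def income_band(income):
    """Assign an income band; the handling of the exact boundaries is an assumption."""
    if income <= 25000:
        return 1
    if income <= 40000:
        return 2
    return 3

# Value labels for the recoded smoker variable, as in step 2.
SMOKER_LABELS = {1: "Smoker", 0: "Non-smoker"}

pilot_incomes = [46000, 58000, 47000, 55000, 28000]
bands = [income_band(income) for income in pilot_incomes]

print(bands)
print(SMOKER_LABELS[recode_smoker("Y")], SMOKER_LABELS[recode_smoker("N")])
```

Checking a few boundary values by hand (for example 25,000, 25,001 and 40,000) is a quick way to confirm that a recode behaves as intended before deleting the original variable.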

Exercise 7: Computing variables


Open the data file results2.sav created in Exercise 4. This shows the exam scores for a class of
high-school students.
1. Each student has a percentage mark for Maths, English, and History. Compute a new
variable, named total, to calculate their total score out of 300.
2. We wish to compute the average (mean) mark over the three tests for each student.
Compute a new variable, average, calculating this information.
3. Save the file.
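
The two computed variables amount to a sum and a mean per student. A minimal Python sketch follows, using hypothetical marks because the contents of the results file are not shown in this module.

```python
# Hypothetical percentage marks; the real values are in the results data file.
students = [
    {"maths": 70, "english": 65, "history": 80},
    {"maths": 55, "english": 90, "history": 62},
]

for student in students:
    # total: score out of 300 across the three tests.
    student["total"] = student["maths"] + student["english"] + student["history"]
    # average: mean mark over the three tests.
    student["average"] = student["total"] / 3

for student in students:
    print(student["total"], round(student["average"], 2))
```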

Exercise 8: Creating and saving output


This exercise uses the file results2.sav created in Exercise 4.
1. Use the Case Summaries command to produce a listing of your data.
2. Save the SPSS output in a file called result1.spo.
3. Produce a printout of the SPSS Viewer window. (Or use Print preview if you don't have
access to a printer.)

Exercise 9: Frequencies command


This exercise uses the medicaltrialX data, and the incband variable as calculated earlier. If
you do not have this, you can load the data file medicaltrialX-part2.sav.
1. Create a Frequencies command with a bar chart to find the following:
• The numbers within each of the three incband classifications.
• How many subjects are male, and how many female.
2. Create a Frequencies command for the two hormone level variables (i.e. hbefore and
hafter). Include the following in the output:
• Do not display the actual frequency tables.
• A histogram with a normal curve superimposed for each
variable.
• Produce values for the mean, mode and median.

Objects from the SPSS Viewer can be copied and pasted into other applications (e.g.
Microsoft Word or Excel). If you have time, open a new document in Microsoft Word. Copy
and paste one of the histograms which you have just produced into the Word document.
Save it with the title histogram.doc. Do not save the SPSS Viewer output.

Exercise 10: Descriptives command


Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
1. Use the Descriptives command to display the default information for age and income.
• How old is the oldest participant?
• And the youngest?
• What is the average household income of the participants?
2. Produce a Descriptives command for the variables hbefore and hafter. This time use
the Options to include the skewness and range in the output.
• Which of the two measurements has the largest range of
readings? What is the range?
• Use the Help menu to find out what skewness means. Using
this information, which of the two measurements do you think
is closest to being normally distributed? Explain why.

Exercise 11: Crosstabulations command


This exercise uses the incband column as defined earlier. If you do not have this variable in
your data, you can load the data file medicaltrialX-part2.sav.
1. Run a Crosstabs command for the variables incband and gender, including the
following information:
• Each cell of the table should list the observed values, the
expected values, and the unstandardised residuals.
• Also run the chi-square test.
2. Run a second Crosstabs command for the same variables. This time do not run the
chi-square test, but include the following within each cell of the table:
• The row, column and total percentages.
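
To make the terms in this exercise concrete, the Python sketch below computes expected values, unstandardised residuals and the Pearson chi-square statistic for a crosstabulation. The counts are hypothetical and are not taken from medicaltrialX.sav. Each expected value is (row total × column total) / grand total.

```python
# Hypothetical observed counts: income band (rows) by gender (columns).
observed = {
    1: {"Female": 10, "Male": 10},
    2: {"Female": 20, "Male": 10},
    3: {"Female": 10, "Male": 20},
}
genders = ["Female", "Male"]

row_totals = {band: sum(cells.values()) for band, cells in observed.items()}
col_totals = {g: sum(observed[band][g] for band in observed) for g in genders}
n = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    band: {g: row_totals[band] * col_totals[g] / n for g in genders}
    for band in observed
}

# Unstandardised residual: observed minus expected.
residuals = {
    band: {g: observed[band][g] - expected[band][g] for g in genders}
    for band in observed
}

# Pearson chi-square statistic and its degrees of freedom.
chi_square = sum(
    (observed[band][g] - expected[band][g]) ** 2 / expected[band][g]
    for band in observed
    for g in genders
)
degrees_of_freedom = (len(observed) - 1) * (len(genders) - 1)

print(expected)
print(residuals)
print(round(chi_square, 3), degrees_of_freedom)
```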

Exercise 12: Means command


Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
1. Build a Means command where the dependent variables are the two sets of hormone
level measurements (i.e. hbefore and hafter) and the independent variable is incband.
Answer the following questions:
• Which group achieves the highest average hormone level, and is
this before or after treatment?
• Which group has the lowest mean hormone level before the
treatment?
• Which group shows the most varied hormone levels before the
treatment? And after? (Hint: look at the standard deviations.)
2. Build a Means command similar to the previous one. This time make gender the
independent variable. Answer the following question:
• Which gender showed the largest increase in blood hormone
levels, on average?
3. Build another Means command which will again analyse the results for each subject,
but this time looking at the results for each gender within each incband group. (Hint:
enter one category variable in the independent list and click on the Next button before
inserting the second category variable.)
• Which group of men show the highest mean level of H before
treatment? What is that mean level?
• In the highest income category, which gender shows the
highest mean level of H after treatment?

Exercise 13: T Tests
Use the medicaltrialX data you have been working on - or you may wish to load the data file
medicaltrialX-part2.sav.
1. Perform a T Test to show whether there is a significant difference in the initial hormone
levels (hbefore) between men and women. Is there?
2. Perform a T Test across all the cases, to decide whether there is a significant difference
in the mean hormone concentrations before and after the treatment.
3. Now use the Select cases function to select only the women in the study, and repeat
step (2). What do you find?
If you have time, repeat the test, selecting men instead of women.
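
For step 2, the quantity behind a paired-samples T Test can be sketched in Python. The five pilot rows from Exercise 1 are used as stand-in data, so the result is only illustrative. The sketch computes the t statistic alone. The p-value, which SPSS reports, requires the t distribution with n - 1 degrees of freedom and is not computed here.

```python
import math
import statistics

# Pilot hormone levels from Exercise 1 (before and after treatment, same patients).
hbefore = [94.58, 106.12, 88.11, 83.62, 72.31]
hafter = [88.79, 78.25, 102.45, 63.82, 77.50]

# Paired differences: after minus before, one per patient.
differences = [after - before for before, after in zip(hbefore, hafter)]

n = len(differences)
mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)  # sample standard deviation (n - 1)

# Paired t statistic: mean difference divided by its standard error.
standard_error = sd_difference / math.sqrt(n)
t_statistic = mean_difference / standard_error
degrees_of_freedom = n - 1

print(round(mean_difference, 3), round(t_statistic, 3), degrees_of_freedom)
```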

Exercise 14: Correlation


Before we test for correlations, we need to calculate a new variable. Refer to the Computing
new data section in the course workbook if you need to. Use the medicaltrialX data you have
been working on — or you may wish to load the data file medicaltrialX-part2.sav.
1. We need to calculate the change in blood levels of hormone H using the two values we
already have (the blood levels before and after). Create a new variable dh using the
Compute function to calculate the "after" level minus the "before" level.

Now to perform the correlations.


2. Measure the strength of association between income and age with a correlation
coefficient and its significance.
3. Measure the strength of association between age and dh with a correlation coefficient
and its significance.
4. Produce a scatter plot of dh against age (placing dh on the vertical axis), and:
• Include a title.
• Produce a fit line on the graph (Hint: Double-click on the chart
to edit it, then click on one of the data points to select the data
points, and from the menus choose Chart | Add chart
element | Fit line at total).
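
The new variable dh and the correlation coefficient can be checked by hand with a short Python sketch. It uses the five pilot rows from Exercise 1 as stand-in data and a hypothetical helper, `pearson_r`, that applies the usual Pearson formula. The significance test that SPSS reports alongside the coefficient is not computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Pilot data from Exercise 1.
ages = [32, 33, 38, 51, 44]
incomes = [46000, 58000, 47000, 55000, 28000]
hbefore = [94.58, 106.12, 88.11, 83.62, 72.31]
hafter = [88.79, 78.25, 102.45, 63.82, 77.50]

# dh is the "after" level minus the "before" level.
dh = [after - before for before, after in zip(hbefore, hafter)]

r_income_age = pearson_r(incomes, ages)
r_age_dh = pearson_r(ages, dh)

print([round(d, 2) for d in dh])
print(round(r_income_age, 3), round(r_age_dh, 3))
```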

Exercise 15: Regression


Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
Perform a linear regression analysis, analysing the dependence of dh upon age (i.e. the same
variables as in the previous exercise), drawing some conclusions about the regression line
produced.
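
The regression line in this exercise is an ordinary least-squares fit of dh on age. A minimal Python sketch follows, again using the five pilot rows from Exercise 1 as stand-in data. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Pilot data from Exercise 1.
ages = [32, 33, 38, 51, 44]
hbefore = [94.58, 106.12, 88.11, 83.62, 72.31]
hafter = [88.79, 78.25, 102.45, 63.82, 77.50]
dh = [after - before for before, after in zip(hbefore, hafter)]

slope, intercept = fit_line(ages, dh)

# Predicted dh for each age, and the residuals (observed minus predicted).
predicted = [intercept + slope * age for age in ages]
residuals = [observed - fitted for observed, fitted in zip(dh, predicted)]

print(round(slope, 4), round(intercept, 4))
```

The slope estimates the change in dh for each additional year of age. With only five cases, any conclusion drawn from this line would be very tentative.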

Exercise 16: Graphical plots


Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
1. Produce a pie chart showing the number of people that fall into each of the three
income bands.
• Include a title in the output.
• Produce labels for the frequency of each income band
and the percentage of the population.
2. Produce a clustered bar chart summarising the mean "before" and "after"
measurements separately, for men and for women (hint: gender is the category
variable).

COMPILE YOUR KNOWLEDGE


With its core functions, SPSS is an extremely powerful tool for manipulating
and deciphering survey data. It makes the process of pulling, manipulating, and
analyzing data clean and easy. It automatically sets up and imports designated variable
names, variable types, titles, and value labels, meaning that minimal legwork is required from
researchers. This gives you more time to do what you do best: identify trends, develop
predictive models, and draw informed conclusions.

DEBUG YOUR SKILLS


Using/Observing the steps you practiced in the “Let’s Try This” section of this
module, do the activities found in the APPENDIX C of this course module.

ANSWER KEY TO SELF–CHECK

1. Statistical Package for the Social Sciences
2. 1968, IBM
3. market researchers, health researchers, survey companies, government
entities, education researchers, marketing organizations, data miners
4. manipulating, deciphering
5. Statistics Program, Modeler Program, Text Analytics for Surveys Program
and Visualization Designer

MODULE 07
DESCRIPTIVE
STATISTICS

LEARNING OBJECTIVES

1. Define statistics, data and variables.


2. Identify and use the components of statistics in given topics.
3. Discuss the role and importance of statistics in decision making
process
4. Discuss the role of computer science in statistics and vice versa.
5. Explain the typical research process
6. Apply the scales of measurement

Purposes of Descriptive Statistics

1. Provide basic information about variables in a dataset
2. Highlight potential relationships between variables

LESSON 01
Measures of Central Tendency

LET’S PROCESS
The study of statistics can be categorized into two main branches. These
branches are descriptive statistics and inferential statistics.
To collect data for any statistical study, a population must first be defined.
A 'population' is the group that has been designated as the source of the data.
The data is the information collected from the population. A population does not
necessarily refer to people. A population could be a group of people, measurements of
rainfall in a particular area, or a batch of batteries.
Descriptive statistics give information that describes the data in some manner.
For example, suppose a pet shop sells cats, dogs, birds and fish. If 100 pets are sold,
and 40 out of the 100 were dogs, then one description of the data on the pets sold would
be that 40% were dogs.

This same pet shop may conduct a study on the number of fish sold each day for
one month and determine that an average of 10 fish were sold each day. The average is
an example of descriptive statistics.
Some other measurements in descriptive statistics answer questions such as
'How widely dispersed is this data?', 'Are there a lot of different values?', 'Are many of
the values the same?', 'What value is in the middle of this data?' and 'Where does a
particular data value stand with respect to the other values in the data set?'
A graphical representation of data is another method of descriptive statistics.
Examples of this visual representation are histograms, bar graphs and pie graphs, to
name a few. Using these methods, the data is described by compiling it into a graph,
table or other visual representation.
This provides a quick method to make comparisons between different data sets
and to spot the smallest and largest values and trends or changes over a period of time.
If the pet shop owner wanted to know what type of pet was purchased most in the
summer, a graph might be a good medium to compare the number of each type of pet
sold and the months of the year.
Three measures of central tendency
i. Mean
➢ The mean is simply the arithmetic average.
➢ The mean would be the amount that each individual would get if
we took the total and divided it up equally among everyone in the
sample
➢ Alternatively, the mean can be viewed as the balancing point in
the distribution of scores (i.e., the distances for the scores above
and below the mean cancel out)
ii. Median
➢ The median is the score that splits the distribution exactly in half
➢ 50% of the scores fall above the median and 50% fall below
➢ The median is also known as the 50th percentile, because it is the
score below which 50% of the scores fall
A desirable characteristic of the median is that it is not affected by extreme
scores. Thus, the median is not distorted by skewed distributions.
Example: Sample 1: 18, 19, 20, 22, 24
Sample 2: 18, 19, 20, 22, 47
Both samples have a median of 20, but the single extreme score of 47 raises the
mean from 20.6 in Sample 1 to 25.2 in Sample 2.
iii. Mode
➢ The mode is simply the most common score.
➢ There is no formula for the mode
➢ When using a frequency distribution, the mode is simply the score
(or interval) that has the highest frequency value
➢ When using a histogram, the mode is the score (or interval) that
corresponds to the tallest bar
Unfortunately, no single measure of central tendency works best in all circumstances.
Nor will they necessarily give you the same answer.
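
The three measures can be checked with Python's standard `statistics` module. The sketch below uses Sample 1 and Sample 2 from the median example. The list of quiz scores used for the mode is hypothetical, since neither sample contains a repeated score.

```python
import statistics

sample_1 = [18, 19, 20, 22, 24]
sample_2 = [18, 19, 20, 22, 47]  # same as Sample 1 except for one extreme score

# The mean is pulled toward the extreme score; the median is not.
mean_1, median_1 = statistics.mean(sample_1), statistics.median(sample_1)
mean_2, median_2 = statistics.mean(sample_2), statistics.median(sample_2)

# Hypothetical scores with a repeated value, used to illustrate the mode.
quiz_scores = [3, 4, 4, 5, 4, 2]
most_common_score = statistics.mode(quiz_scores)

print(mean_1, median_1)   # 20.6 20
print(mean_2, median_2)   # 25.2 20
print(most_common_score)  # 4
```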

Distribution Shape and Central Tendency


In a normal distribution, the mean, median, and mode will be approximately
equal.
In a skewed distribution, the mode will be the peak, the mean will be pulled
toward the tail, and the median will fall in the middle.

Choosing the Proper Statistic


For Continuous data
➢ Always report the mean
➢ If data are substantially skewed, it is appropriate to use the median as well

For Categorical Data


➢ For nominal data you can only use the mode
➢ For ordinal data the median is appropriate (although people often use the
mean)

LESSON 02
Measures of Variability

LET’S PROCESS
The fluctuation of scores about a central tendency is called “variability.”
We can use measures of variability to compare two sets of scores.
Although the means may be the same, the distribution may be different.

Measure of Variability
1. Range
➢ Range is the distance between two extreme scores.
➢ It informs us about the dispersion of our distribution.
➢ The larger the range the larger the dispersion from the mean value.
➢ Although the mean of the scores of two distributions can be identical their
ranges may be different.
Drawbacks to the Range
The range is a good preliminary measure, but a single extreme value can influence it
significantly. Because the range is calculated from only the highest and lowest values, it
doesn't tell us anything about the variability of the values in between.
2. Standard Deviation
➢ Defined as the variability of the scores around the mean
➢ Each score in a distribution varies from the mean by a greater or lesser
amount, except when the score is the same as the mean.
➢ Deviations from the mean can be noted as either positive or negative deviations
from the mean.
➢ The average of these deviations would equal zero, which is why the deviations are
squared before they are averaged.

3. Variance
➢ The variance and the closely-related standard deviation are measures of how
spread out a distribution is.
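
The following Python sketch illustrates these measures on two hypothetical sets of scores that share the same mean but differ in spread. It uses the sample formulas from the standard `statistics` module, which divide by n - 1. The standard deviation is the square root of the variance.

```python
import math
import statistics

# Hypothetical scores: both groups have a mean of 20, but group_b is more spread out.
group_a = [18, 19, 20, 21, 22]
group_b = [10, 15, 20, 25, 30]

def describe(scores):
    """Return the range, sample variance and sample standard deviation of the scores."""
    mean = statistics.mean(scores)
    deviations = [score - mean for score in scores]
    # Deviations from the mean always sum to zero, so they are squared before averaging.
    assert abs(sum(deviations)) < 1e-9
    score_range = max(scores) - min(scores)
    variance = statistics.variance(scores)  # sum of squared deviations / (n - 1)
    std_dev = statistics.stdev(scores)      # square root of the variance
    return score_range, variance, std_dev

range_a, variance_a, sd_a = describe(group_a)
range_b, variance_b, sd_b = describe(group_b)

print(range_a, variance_a, round(sd_a, 3))
print(range_b, variance_b, round(sd_b, 3))
```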

LESSON 03
Frequency Distribution

LET’S PROCESS
After collecting data, researchers are faced with pages of unorganized numbers,
stacks of survey responses, etc. The goal of descriptive statistics is to aggregate the
individual scores (data) in a way that can be readily summarized. A frequency
distribution table can be used to get a “picture” of how scores were distributed.

➢ A frequency distribution displays the number (or percent) of individuals that
obtained a particular score or fell in a particular category.
➢ As such, these tables provide a picture of where people respond across the
range of the measurement scale.
➢ One goal is to determine where the majority of respondents were located.

When to Use Frequency Tables


➢ Frequency distributions and tables can be used to answer all descriptive research
questions.
➢ It is important to always examine frequency distributions on the independent
variable (IV) and dependent variable (DV) when answering comparative and
relationship questions.

Three Components of a Frequency Distribution Table


1. Frequency is the number of individuals that obtained a particular score (or response).
2. Percent is the corresponding percentage of individuals that obtained a particular
score.
3. Cumulative Percent is the percentage of individuals that fell at or below a particular
score (not relevant for nominal variables).
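
As a rough sketch of how these three components fit together, the following Python builds a frequency table by hand. The ages are hypothetical values invented for illustration, and frequency_table is an assumed helper name, not an SPSS function.

```python
from collections import Counter


def frequency_table(scores):
    """Return rows of (score, frequency, percent, cumulative percent)."""
    counts = Counter(scores)
    total = len(scores)
    rows = []
    cumulative = 0
    # Sort the scores so cumulative percent means "at or below this score".
    for value in sorted(counts):
        cumulative += counts[value]
        percent = 100 * counts[value] / total
        cumulative_percent = 100 * cumulative / total
        rows.append((value, counts[value], percent, cumulative_percent))
    return rows


# Hypothetical ages of ten students (not the module's data).
ages = [19, 20, 20, 21, 21, 21, 22, 22, 25, 30]

print(f"{'Age':>4} {'Freq':>5} {'Percent':>8} {'Cum. %':>7}")
for value, freq, pct, cum_pct in frequency_table(ages):
    print(f"{value:>4} {freq:>5} {pct:>8.1f} {cum_pct:>7.1f}")
```

Sorting only makes sense when the scores have an order, which is why cumulative percent is not reported for nominal variables.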
Example:
1. What are the ages of students in an online course?
2. Are students likely to recommend the course to others?

Step 1: Input the Data into SPSS


Step 2: Run the Frequencies
Analyze > Descriptive Statistics > Frequencies
Move variables to the variable box (select the variables and click on the arrow).
Click OK.

Frequency distribution showing the ages of students who took the online course.

Frequency table of student responses when asked whether or not they would
recommend the online course to others.

Most would recommend the course.

LET’S TRY THIS!


Running the Descriptive Statistics
1. Are there differences in the anxiety levels of students who have had statistics before
versus students who have never had statistics?

Step 1: Input the data into SPSS


Step 2: Run the descriptive statistics
Analyze > Descriptive Statistics > Frequencies
Anxiety = Dependent List
Stats History = Independent List
Click Options
• Move Median over
• Move Minimum over
• Move Maximum over
Click Continue
Click OK
Step 3: Create a Histogram for Anxiety with a normal curve option
Graphs > Legacy Dialogs > Histogram
Variable = anxiety
Check the “Display normal curve” check box
Click OK
Step 4: Write up the results
➢ Descriptive statistics revealed that students who had previous experience
with statistics (M = 57.00, SD = 16.43) had lower anxiety at the beginning of
the semester than students who did not have any previous experience with
statistics (M = 84.00, SD = 11.40).
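
For readers without SPSS, a comparable group summary can be sketched in Python. The scores below are hypothetical: they were chosen only so that each group's mean and standard deviation round to the values reported above, and they are not the module's data set. The group labels and the summarize helper are likewise assumptions made for the example.

```python
import statistics

# Hypothetical anxiety scores, invented so that the summaries round to the
# reported M and SD. They are NOT the data used in the module.
anxiety_by_history = {
    "had statistics before": [45, 45, 45, 75, 75],
    "never had statistics": [70, 80, 80, 90, 100],
}


def summarize(scores):
    """Return the descriptive statistics requested in Step 2."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample SD (divides by n - 1)
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
    }


summaries = {group: summarize(s) for group, s in anxiety_by_history.items()}

for group, s in summaries.items():
    print(
        f"{group}: M = {s['mean']:.2f}, SD = {s['sd']:.2f}, "
        f"Mdn = {s['median']}, Min = {s['min']}, Max = {s['max']}, n = {s['n']}"
    )
```

A summary like this only describes the two groups. Deciding whether the difference is larger than chance would require an inferential test, which is outside this lesson.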
