Stat Basics-of-Research
A variable is a characteristic or
condition that can change or take on
different values. Ex?
The values of the variable are data
Continuous variables have
Real Limits
To define the units for a continuous
variable, a researcher must use real
limits which are boundaries located
exactly half-way between adjacent
categories. Ex. 1.5 to 2.5
1 2 3 4
Types of Variables (3)
Univariate
Bivariate
Multivariate
Types of Variables (3)
Univariate, bivariate, and multivariate
are terms used to describe the number of
variables being analyzed in a statistical
analysis.
• Univariate: Analyzes one variable at a
time
• Bivariate: Analyzes two variables at a
time
• Multivariate: Analyzes more than two
variables at a time
Types of Variables (3)
Examples
• Univariate: Analyzing the length of iris
flower sepals in a dataset
• Bivariate: Analyzing the relationship
between temperature and ice cream
sales
• Multivariate: Analyzing how the
popularity of advertisements on a website
depends on age, gender, and location
Types of Variables (4)
Independent Variables
Dependent Variables
Note: In experiments (IV-DV):
Is there an effect of the IV on the DV?
In correlation:
Is there a relationship between the IV and the DV?
In differences:
Is there a difference in the DV when grouped by the IV?
Measuring Variables
Researchers must observe the
variables and record their
observations. This requires that the
variables be measured.
The process of measuring a variable
requires a set of categories called a
scale of measurement and a process
that classifies each individual into one
category.
4 Types of Measurement Scales
(NOIR)
1.nominal scale
2.ordinal scale
3.interval scale
4.ratio scale
4 Types of Measurement Scales
1. A nominal scale is an unordered set
of categories identified only by
name. Nominal measurements only
permit you to determine whether two
individuals are the same or different.
Examples:
1=male, 2=female
1=public, 2=private
1=accountancy, 2=nursing, 3=criminology
4 Types of Measurement Scales
2. An ordinal scale is an ordered set of
categories. Ordinal measurements tell
you the direction of the difference between
two individuals, but they do not indicate
how large the difference is.
4 Types of Measurement Scales
4. A ratio scale is an interval scale where a
value of zero indicates none of the
variable. Ratio measurements identify
the direction and magnitude of
differences and allow ratio comparisons
of measurements.
Review of Definitions
Variable - any characteristic of an individual or entity. A variable can
take different values for different individuals. Variables can be
categorical or quantitative: Discrete or Continuous.
• Nominal - Categorical variables with no inherent order or ranking sequence,
such as names or classes (e.g., gender). Values may be numerals, but they carry no
numerical meaning (e.g., I, II, III). The only operation that can be applied to nominal
variables is enumeration (counting).
• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe.
Can be compared for equality, or greater or less, but not how much greater or less.
• Interval - Values of the variable are ordered as in Ordinal, and additionally,
differences between values are meaningful, however, the scale is not absolutely
anchored. Calendar dates and temperatures on the Fahrenheit scale are examples.
Addition and subtraction, but not multiplication and division are meaningful
operations.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary
zero point, e.g. age, weight, temperature (Kelvin). Addition, subtraction,
multiplication, and division are all meaningful operations.
QUIZ: SELFIE
List 15 sentences that explain ideas
about basic concepts in statistics
discussed in class. Each sentence must
contain a basic statistical concept.
Underline the concept.
Ex.
1. a variable is a characteristic that can
have different values.
QUIZ: Multiple choice
1. The characteristics of a sample are
called statistics, while the characteristics
of a population are called ______.
a. inferential statistic
b. variable
c. descriptive statistic
d. parameter
MODULE 2
BASIC CONCEPTS
IN RESEARCH
BASIC CONCEPTS IN research
What are the ways of knowing?
What is research?
What is Basic & Applied Research?
What are the 3 basic purposes of
research?
What are the stages of research?
What is the traditional thesis format?
What is the IMRAD format of research?
Ways of Knowing
Five ways we can know something
Personal experience
Tradition
Experts and authorities
Logic
Inductive
Deductive
The scientific method (Research)
What is Research?
Research is creative work undertaken
systematically to increase the stock of
knowledge (of humanity, culture and society),
and the use of this knowledge to devise new
applications (OECD).
Ex. Thesis/Dissertation:
Theory Building
Theory Testing
Theory Expansion
Basic vs Applied
APPLIED
Seeks to increase understanding in order to
address the needs of an individual, group or
organization.
Ex. Culminating Project/Thesis
Assessment/Diagnosis
Program Development
Material Development
Evaluation Research
Case Study
Action Research
Purpose of Research
Description
-Document a particular phenomenon
-Usually involves survey, naturalistic observations, case
studies, review of records (archives), etc
Explanation
– Establish causality
– Usually involves experimental methods
Prediction
• To forecast the variables, events and behaviors
associated with or resulting from the phenomenon
• Correlational studies
Stages of conducting research:
Research report format
Traditional Thesis Format
Chap 1: THE PROBLEM & ITS BACKGROUND
Rationale
Statement of the Problem
Hypotheses
Theoretical/Conceptual/Analytical
Framework
Assumptions
Significance of the Study
Scope & Delimitations
Definition of Terms
Research report format
Traditional Thesis Format
Chap II: REVIEW OF RELATED
LITERATURE AND STUDIES
Data Collection
Types of Data
Methods of Data Collection
Sampling
Sampling Techniques
Sampling Size
Types of Data:
Quantitative and Qualitative Data
IPDET © 2009 39
Qualitative Approach
Data that deal with description
Data that can be observed or self-
reported, but not always precisely
measured
Less structured, easier to develop
Can provide “rich data” — detailed and
widely applicable
Is challenging to analyze
Is labor intensive to collect
Usually generates longer reports
Which Data? Quali-Quanti?
Obtrusive vs. Unobtrusive Methods
Obtrusive: data collection methods that directly obtain information from
those being evaluated
e.g. interviews, surveys, focus groups
Unobtrusive: data collection methods that do not collect information
directly from evaluees
e.g., document analysis, GoogleEarth, observation at a distance,
trash of the stars
What method shall I use?
It all depends…
Sample:
A subset of the population
Target Population:
The population to be studied/ to which the
investigator wants to generalize his results
Types of sampling strategies:
Probability:
• Why? Generalize to population.
Some examples:
– Simple random sample
– Stratified sample
– Cluster sample
– Systematic sample
Nonprobability:
• Why? Generalizability not as important. Want to focus on “right cases.”
Some examples:
– Quota sample
– “Purposeful” sample
– “Convenience” or “opportunity” sample
• Random sampling
– Each subject has a known probability of
being selected
• Allows application of statistical sampling
theory to results to:
– Generalise
– Test hypotheses
• Ensure
– Representativeness
– Precision
Stratified sampling
Multi-stage sampling
Cluster sampling
Simple random sampling
Table of random numbers
684257954125632140
582032154785962024
362333254789120325
985263017424503686
Systematic sampling
Sampling fraction
Ratio between sample size and population
size
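A minimal sketch of systematic sampling, assuming the usual procedure of choosing a random start and then taking every k-th unit, where k is the inverse of the sampling fraction. The function name `systematic_sample` and the population of 100 numbered units are illustrative assumptions, not part of the slides.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start in the first interval, then take every k-th unit."""
    interval = len(population) // sample_size  # k = population size / sample size
    if interval < 1:
        raise ValueError("sample_size cannot exceed the population size")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position within the first interval
    return [population[start + i * interval] for i in range(sample_size)]

units = list(range(1, 101))          # hypothetical population of 100 units
sample = systematic_sample(units, 10, seed=1)
sampling_fraction = len(sample) / len(units)  # ratio of sample size to population size
print(sample, sampling_fraction)
```

Here the sampling fraction is 10/100, so every 10th unit is taken after the random start.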
Systematic sampling
Cluster sampling
Cluster: a group of sampling units close to each
other i.e. crowding together in the same area or
neighborhood
Cluster sampling
[Figure: a population divided into five clusters, Section 1 to Section 5]
Cluster Sampling
Divide the population
into groups (called
clusters), randomly
select some of the
groups, and then
collect data from ALL
members of the
selected groups
Used extensively by
government and
private research
organizations
Examples:
Exit Polls
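The cluster sampling procedure described above can be sketched as follows. The section names and member labels are made-up illustrations, and `cluster_sample` is a hypothetical helper, not a library function.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep ALL members of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: five sections with four members each.
sections = {
    f"Section {i}": [f"S{i}-{j}" for j in range(1, 5)]
    for i in range(1, 6)
}
chosen, members = cluster_sample(sections, 2, seed=0)
print(chosen)
print(members)
```

Note that randomness enters only at the cluster level; no sampling happens inside a selected cluster.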
Stratified sampling
Divide into groups and randomly sample from
each group (Male & Female) (Year level)
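A minimal sketch of stratified sampling, assuming an equal number of units is drawn from each group. The student records and the helper name `stratified_sample` are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(group, per_stratum) for name, group in strata.items()}

# Hypothetical population: 30 male and 30 female students, identified by number.
students = [("Male", i) for i in range(30)] + [("Female", i) for i in range(30)]
sample = stratified_sample(students, lambda s: s[0], per_stratum=5, seed=0)
for stratum, drawn in sample.items():
    print(stratum, drawn)
```

Unlike cluster sampling, every group is represented and the random draw happens inside each group.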
Multi-stage sampling
Ex. Randomly sample 20 of 80 provinces
From each of the 20 provinces, randomly
select 5 towns
From each town, select 10 barangays
Sampling Size
Depends on Type of Research:
If Survey,
Slovin's Formula is used to calculate
the sample size (n) given the population size
(N) and a margin of error (e). It is computed as
n = N / (1 + Ne^2).
If a sample is taken from a
population, a formula must be used to take
into account confidence levels and margins of
error.
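As a worked illustration of Slovin's formula, the sketch below computes n for a hypothetical population of 1,000 and a 5% margin of error. Rounding up to the next whole respondent is an assumption made here; the slide does not state a rounding rule.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up (rounding is an assumption)."""
    if population_size <= 0 or not 0 < margin_of_error < 1:
        raise ValueError("need N > 0 and 0 < e < 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# Hypothetical survey: N = 1000, e = 0.05, so n = 1000 / 3.5 = 285.7...
print(slovin_sample_size(1000, 0.05))  # 286
```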
Sample Size: Recommended
If Experiment- 15 per cell
If Interview- 15
If case study, 1 to 5
If correlation, more than 30
MODULE 4
ORGANIZING AND
PRESENTING DATA
ORGANIZING AND PRESENTING DATA
Distribution of data
Frequency distribution
Cumulative frequency distribution
Presenting Categorical variables
Bar Graph
Pie chart
Presenting Numerical data
Histogram
Boxplot
Pareto chart
Ogive
Distribution
Distribution - (of a variable) tells us what values the variable
takes and how often it takes these values.
• Unimodal - having a single peak
• Bimodal - having two distinct peaks
• Multimodal-having several peaks
• Unimodal Symmetric - left and right half are mirror images.
• The normal distribution is the best-known example
Frequency Distribution
Consider a data set of 26 children of ages 1-6 years. Then the
frequency distribution of variable ‘age’ can be tabulated as
follows:
Frequency Distribution of Age
Age 1 2 3 4 5 6
Frequency 5 3 7 5 4 2
Grouped Frequency Distribution of Age:
Age Group 1-2 3-4 5-6
Frequency 8 12 6
Cumulative Frequency
Cumulative frequency of the data on the previous page:
Age 1 2 3 4 5 6
Frequency 5 3 7 5 4 2
Cumulative Frequency 5 8 15 20 24 26
Age Group 1-2 3-4 5-6
Cumulative Frequency 8 20 26
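The frequency and cumulative frequency tables can be reproduced with a short sketch. The list of 26 raw ages is rebuilt here from the tabulated counts, since the slides give only the table.

```python
from collections import Counter
from itertools import accumulate

# 26 ages rebuilt from the frequency table (the raw data are not on the slide).
ages = [1] * 5 + [2] * 3 + [3] * 7 + [4] * 5 + [5] * 4 + [6] * 2

freq = Counter(ages)
values = sorted(freq)
counts = [freq[v] for v in values]
cumulative = list(accumulate(counts))  # running total of the counts

# Grouped version with the age groups used on the slide.
groups = {"1-2": (1, 2), "3-4": (3, 4), "5-6": (5, 6)}
grouped = {
    label: sum(freq[a] for a in range(low, high + 1))
    for label, (low, high) in groups.items()
}
grouped_cumulative = list(accumulate(grouped.values()))

print(counts, cumulative)           # [5, 3, 7, 5, 4, 2] [5, 8, 15, 20, 24, 26]
print(grouped, grouped_cumulative)  # {'1-2': 8, '3-4': 12, '5-6': 6} [8, 20, 26]
```

The last cumulative frequency always equals the number of observations, 26 in this case.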
Data Presentation
Two types of statistical presentation of data - graphical and numerical.
Graphical Presentation: We look for the overall pattern and for striking
deviations from that pattern. The overall pattern is usually described by the
shape, center, and spread of the data. An individual value that falls
outside the overall pattern is called an outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
Data Presentation –Categorical
Variable
Bar Diagram: Lists the categories and presents the percent or count of
individuals who fall in each category.
[Figure: bar diagram of count by Treatment Group (1, 2, 3); y-axis 0 to 30]
Treatment Group   Frequency   Proportion        Percent (%)
1                 15          (15/60) = 0.250   25.0
2                 25          (25/60) = 0.417   41.7
3                 20          (20/60) = 0.333   33.3
Total             60          1.00              100
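A small sketch of how the proportion and percent columns follow from the group counts 15, 25, and 20. Only the counts come from the slide; the variable names are illustrative.

```python
# Counts per treatment group, as given on the slide.
counts = {1: 15, 2: 25, 3: 20}
total = sum(counts.values())

# For each group: (count, proportion rounded to 3 places, percent rounded to 1 place).
table = {
    group: (count, round(count / total, 3), round(100 * count / total, 1))
    for group, count in counts.items()
}

for group, (count, proportion, percent) in table.items():
    print(group, count, proportion, percent)
print("Total", total)
```

Group 2 has the largest share (25/60, about 41.7%), and the percentages add up to 100.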
Data Presentation –Categorical
Variable
Pie Chart: Lists the categories and presents the percent or count of
individuals who fall in each category.
Treatment Group   Frequency   Proportion        Percent (%)
1                 15          (15/60) = 0.250   25.0
2                 25          (25/60) = 0.417   41.7
3                 20          (20/60) = 0.333   33.3
[Figure: pie chart of the three treatment groups]
Figure 3: Age Distribution (y-axis: Number of Subjects)
Mean 90.41666667
Standard Error 3.902649518
Median 84
Mode 84
Pie chart
Table: AKA frequency distributions; good if more than 20 observations
Bar chart: good if more than 20 observations
Distributions
The distribution of scores or values can also be
displayed using Box and Whiskers Plots and Histograms
Continuous to Categorical
It is possible to take
continuous data
(such as hemoglobin
levels) and turn it
into categorical data
by grouping values
together. Then we
can calculate
frequencies and
percentages for each
group.
Continuous to Categorical
[Figure: Distribution of Glasgow Coma Scale Scores]
Even though this is continuous data, it is being treated as “nominal”
as it is broken down into groups or categories.
Tip: It is usually better to collect continuous data and then break it
down into categories for data analysis as opposed to collecting data
that fits into preconceived categories.
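Grouping continuous values into categories and then computing frequencies and percentages can be sketched as below. The hemoglobin readings and the cut points are made-up illustrations, not clinical thresholds or data from the slides.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical category boundaries in g/dL (illustrative only).
cut_points = [10.0, 12.0, 14.0]
labels = ["<10", "10-<12", "12-<14", ">=14"]

def categorize(value):
    """Return the label of the interval that contains the value."""
    return labels[bisect_right(cut_points, value)]

# Hypothetical continuous measurements.
hemoglobin = [9.2, 11.5, 12.0, 13.8, 14.1, 10.4, 12.9, 15.0]

freq = Counter(categorize(v) for v in hemoglobin)
percent = {label: 100 * freq[label] / len(hemoglobin) for label in labels}

for label in labels:
    print(label, freq[label], percent[label])
```

Because the raw values are kept, the cut points can be changed later without collecting new data, which is the point of the tip above.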
Ordinal Level Data
Frequencies and percentages can be computed
for ordinal data
– Examples: Likert Scales (Strongly Disagree to Strongly
Agree); High School/Some College/College
Graduate/Graduate School
Interval/Ratio Data
We can compute frequencies and percentages
for interval and ratio level data as well
– Examples: Age, Temperature, Height, Weight,
Many Clinical Serum Levels
Distribution of Injury Severity
Score in a population of patients
Interval/Ratio Distributions
The distribution of interval/ratio data often
forms a “bell shaped” curve.
– Many phenomena in life are normally
distributed (age, height, weight, IQ).
• MAKE A FREQUENCY DISTRIBUTION AND
GRAPH OF THE GIVEN DATA
Averages and
measures of position
(Measures of Central
Tendency)
Numerical Presentation
A fundamental concept in summary statistics is that of a
central value for a set of observations and the extent to
which the central value characterizes the whole set of data.
03/30/25 106
Methods of Central value (or
measurement)
Center measurement is a summary measure of the overall level of
a dataset
Mean: The sum of all observations divided
by the number of observations. The mean of 20, 30, 40 is
(20+30+40)/3 = 30.
Notation: Let x_1, x_2, ..., x_n be n observations of a variable
x. Then the mean of this variable is
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Compute: Mean
Given Numbers:
7 26 54 82 32 26 51
Total up the numbers
7+26+54+82+32+26+51 = 278
Divide the total by n (the number of
values)
(278 / 7 = 39.71)
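The two steps of the mean computation, totalling and dividing by n, can be checked with a few lines using the seven given numbers.

```python
values = [7, 26, 54, 82, 32, 26, 51]

total = sum(values)   # step 1: total up the numbers
n = len(values)       # number of values
mean = total / n      # step 2: divide the total by n

print(total, n, round(mean, 2))  # 278 7 39.71
```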
Methods of Center Measurement
Median: The middle value in an ordered sequence
of observations.
Mode: The value that occurs most often.
Given: 3,3,4,5,5,6,6,6,6,6,7,7,7,8,8,9
Mode = 6
Raw Data: 2, 0, 1, 0, 2, 2, 1, 0, 0, 2, 1, 3,
3, 3, 2, 1, 3, 5, 1, 1, 0, 3, 1, 2, 2
Do not attempt to find the middle value of a
raw data set. It must first be sorted.
Median
Raw Data: 2, 0, 1, 0, 2, 2, 1, 0, 0, 2, 1, 3, 3, 3, 2, 1, 3, 5, 1, 1, 0, 3, 1, 2,
2
Sorted Data: 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 5
Median
Ranked: 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 3, 3, 3, 3, 3, 5
Median is 2
Median (even number of observations)
When the two middle values are 1 and 2, the median is their average:
Median: (1 + 2) / 2 = 1.5
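A short sketch of the median procedure: sort first, then take the middle value, or average the two middle values when the count is even. The 25 raw values are the ones from the slides; the four-value list is a hypothetical even-count example.

```python
from statistics import median

raw = [2, 0, 1, 0, 2, 2, 1, 0, 0, 2, 1, 3,
       3, 3, 2, 1, 3, 5, 1, 1, 0, 3, 1, 2, 2]

ranked = sorted(raw)                 # the data must be sorted first
middle = ranked[len(ranked) // 2]    # 25 values, so the 13th value is the middle one
print(middle, median(raw))           # 2 2

# Hypothetical even-count example: the two middle values are 1 and 2.
even_count = [0, 1, 2, 3]
print(median(even_count))            # 1.5
```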
Effects of outliers on
Mean and Median
The median is less sensitive to outliers (extreme
scores) than the mean and thus a better measure
than the mean for highly skewed distributions, e.g.
family income.
For example mean of 20, 30, 40, and 990 is
(20+30+40+990)/4 =270. The median of these four
observations is (30+40)/2 =35. Here 3 observations
out of 4 lie between 20-40.
So, the mean 270 really fails to give a realistic
picture of the major part of the data. It is influenced
by extreme value 990.
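The effect of the extreme value 990 can be seen by computing both measures with and without it. This sketch uses Python's standard `statistics` module on the four observations from the example.

```python
from statistics import mean, median

data = [20, 30, 40, 990]
without_outlier = [x for x in data if x != 990]

print(mean(data), median(data))                        # 270 35.0
print(mean(without_outlier), median(without_outlier))  # 30 30
```

Dropping the outlier moves the mean from 270 to 30 but moves the median only from 35 to 30.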
Measures of Central Tendency – And Outliers
Real World Use
When there is an outlier, your reporting options are to report:
(1) Median, or
Variance:
$$S^2 = \frac{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$
Variance of 5, 7, 3? Mean is (5+7+3)/3 = 5 and the variance is
$$\frac{(5-5)^2 + (3-5)^2 + (7-5)^2}{3 - 1} = 4$$
Standard Deviation: Square root of the variance. The standard
deviation of the above example is 2.
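The variance and standard deviation of 5, 7, 3 can be verified step by step. The sketch computes the sample variance by hand (dividing by n - 1) and compares it with Python's `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor.

```python
from statistics import mean, stdev, variance

data = [5, 7, 3]
x_bar = mean(data)                                    # 5

squared_deviations = [(x - x_bar) ** 2 for x in data] # [0, 4, 4]
s2 = sum(squared_deviations) / (len(data) - 1)        # 8 / 2
s = s2 ** 0.5                                         # square root of the variance

print(s2, s)                                          # 4.0 2.0
print(variance(data), stdev(data))
```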
Methods of Variability Measurement
Quartiles: Data can be divided into four regions that cover the total
range of observed values. Cut points for these regions are known as
quartiles.
In notation, the q-th quartile of a data set is the ((n+1)/4)·q-th observation of the
ordered data, where q is the desired quartile and n is the number of
observations in the data.
The first quartile (Q1) is the first 25% of the data. The second quartile
(Q2) is between the 25th and 50th percentage points in the data. The
upper bound of Q2 is the median. The third quartile (Q3) is the 25% of
the data lying between the median and the 75% cut point in the data.
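A minimal sketch of the ((n+1)/4)·q position rule for quartile cut points. Interpolating linearly when the position is not a whole number is an assumption added here, since the slide does not say how to handle that case; the data are the seven numbers from the mean example.

```python
def quartile(sorted_data, q):
    """Value at position ((n + 1) / 4) * q, counting from 1.

    Linear interpolation between neighbours is an assumption for
    positions that are not whole numbers.
    """
    n = len(sorted_data)
    position = (n + 1) / 4 * q
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part, 0 when the position is whole
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + fraction * (above - below)

data = sorted([7, 26, 54, 82, 32, 26, 51])   # n = 7, so positions are 2, 4, 6
q1, q2, q3 = (quartile(data, q) for q in (1, 2, 3))
print(q1, q2, q3)                            # 26.0 32.0 54.0
```

With n = 7 the positions are whole numbers, so Q1, Q2 (the median), and Q3 are simply the 2nd, 4th, and 6th ordered observations.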
Box Plot: A box plot is a graph of the five number summary. The
central box spans the quartiles. A line within the box marks the
median. Lines extending above and below the box mark the
smallest and the largest observations (i.e., the range). Outlying
samples may be additionally plotted outside the range.
Boxplot
[Figure: Distribution of Age in Month; boxplot marking min, q1, median, q3, and max; y-axis 0 to 160]
Choosing a Summary
The five number summary is usually better than the mean and standard
deviation for describing a skewed distribution or a distribution with
extreme outliers.
[Figure: three histograms of sample means; x-axis 0 to 20]
n = 2: mean of sample means = 10, SD of sample means = 4.16
n = 5: mean of sample means = 10, SD of sample means = 2.41
n = 15: mean of sample means = 10, SD of sample means = 0.87
Summary of the Variable ‘Age’ in
the given data set
Mean 90.41666667
Median 84
Mode 84
Standard Deviation 30.22979318
Sample Variance 913.8403955
Kurtosis -1.183899591
Skewness 0.389872725
Range 95
Minimum 48
Maximum 143
Sum 5425
Count 60
[Figure: Histogram of Age; x-axis: Age in Month (40 to 160); y-axis: Number of Subjects (0 to 10)]
Class Summary (First Part)
So far we have learned-
Statistics and data presentation/data summarization
Any questions ?
USING STATISTICAL SOFTWARE
Statistical Software: Which to Use?
There are many software packages for performing statistical analysis and visualization
of data.
Some of them are SAS (Statistical Analysis System), S-plus, R, Matlab,
Minitab, BMDP, Stata, SPSS, StatXact, Statistica, LISREL, JMP, GLIM, HIL,
MS Excel, etc. We will discuss MS Excel and SPSS in brief.
http://www.galaxy.gmu.edu/papers/astr1.html
http://ourworld.compuserve.com/homepages/Rainer_Wuerlaender/statsoft.htm#archiv
http://www.R-project.org
Microsoft Excel
A Spreadsheet Application. It features calculation, graphing tools,
pivot tables and a macro programming language called VBA (Visual
Basic for Applications).
There are many versions of MS-Excel. Excel XP, Excel 2003, Excel 2007
are capable of performing a number of statistical analyses.
Worksheet: Consists of a grid of cells with numbered rows down the
page and alphabetically titled columns across the page. Each cell is referenced by
its coordinates. For example, A3 is used to refer to the cell in column A and row 3.
B10:B20 is used to refer to the range of cells in column B and rows 10 through 20.
Microsoft Excel
Opening a document: File ⇒ Open (from an existing workbook). Change the
directory area or drive to look for files in other locations.
Creating a new workbook: File ⇒ New ⇒ Blank Document
Saving a file: File ⇒ Save
Selecting more than one cell: Click on a cell (e.g. A1), then hold the Shift key
and click on another (e.g. D4) to select the cells between A1 and D4, or click on a
cell and drag the mouse across the desired range.
Creating Formulas: 1. Click the cell in which you want to enter the
formula, 2. Type = (an equal sign), 3. Click the Function Button (fx), 4.
Select the formula you want and step through the on-screen
instructions.
Microsoft Excel
Entering Date and Time: Dates are stored as MM/DD/YYYY. No need to
enter in that format. For example, Excel will recognize jan 9 or jan-9 as
1/9/2007 and jan 9, 1999 as 1/9/1999. To enter today’s date, press Ctrl and ;
together. Use a or p to indicate am or pm. For example, 8:30 p is interpreted
as 8:30 pm. To enter current time, press Ctrl and : together.
Copy and Paste all cells in a Sheet: Ctrl+A for selecting, Ctrl +C for copying
and Ctrl+V for Pasting.
EDIT used to copy and paste data values; used to find data in a
file; insert variables and cases; OPTIONS allows the user to
set general preferences as well as the setup for the
Navigator, Charts, etc.
VIEW user can change toolbars; value labels can be seen in cells
instead of data values
EDIT undo and redo a pivot, select a table or table body (e.g., to
change the font)
Click on the toolbar (but not on one of the pushbuttons) and then drag the toolbar to
its new location
• Customize a toolbar
You will now have an SPSS data file containing the former tab-delimited data. You
simply need to add variable and value labels and define missing values.
1. Open the data file (from the menus, click on FILE ⇒ OPEN ⇒ DATA) of
interest.
DESCRIPTIVES
PASTE SPECIAL
4. Select Formatted Text (RTF) and then click on OK
5. Enlarge the graph to a desired size by dragging one or more of the black squares
along the perimeter (if the black squares are not visible, click once on the graph).
Statistical Package
for the Social Sciences (SPSS)
BASIC STATISTICAL PROCEDURES: CROSSTABS
1. From the ANALYZE pull-down menu, click on DESCRIPTIVE STATISTICS ⇒
CROSSTABS.
2. The CROSSTABS Dialog Box will then open.
3. From the variable selection box on the left click on a variable you wish to
designate as the Row variable. The values (codes) for the Row variable make up
the rows of the crosstabs table. Click on the arrow (>) button for Row(s). Next,
click on a different variable you wish to designate as the Column variable. The
values (codes) for the Column variable make up the columns of the crosstabs
table. Click on the arrow (>) button for Column(s).
4. You can specify more than one variable in the Row(s) and/or Column(s). A
cross table will be generated for each combination of Row and Column variables
Statistical Package
for the Social Sciences (SPSS)
Limitations: SPSS users have less control over data manipulation and
statistical output than other statistical packages such as SAS, Stata etc.
Factorial Formula for Permutations
$${}_{n}P_{r} = \frac{n!}{(n-r)!}$$
Factorial Formula for Combinations
$${}_{n}C_{r} = \frac{{}_{n}P_{r}}{r!} = \frac{n!}{r!\,(n-r)!}$$
10.3 – Using Permutations and Combinations
Evaluate each problem.
a) 5P3 = 5 · 4 · 3 = 60
b) 5C3 = 60 / 3! = 10
c) 6P6 = 720
d) 6C6 = 1
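The factorial formulas can be checked directly on the four problems. The helper names `n_p_r` and `n_c_r` are illustrative; `math.perm` and `math.comb` (available in Python 3.8 and later) are used only as a cross-check.

```python
from math import comb, factorial, perm

def n_p_r(n, r):
    """Permutations: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

def n_c_r(n, r):
    """Combinations: n! / (r! * (n - r)!)"""
    return factorial(n) // (factorial(r) * factorial(n - r))

answers = {
    "5P3": n_p_r(5, 3),
    "5C3": n_c_r(5, 3),
    "6P6": n_p_r(6, 6),
    "6C6": n_c_r(6, 6),
}
print(answers)  # {'5P3': 60, '5C3': 10, '6P6': 720, '6C6': 1}

# Cross-check against the standard library.
assert answers["5P3"] == perm(5, 3) and answers["5C3"] == comb(5, 3)
```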
10.3 – Using Permutations and Combinations
How many ways can you select two letters followed by
three digits for an ID if repeats are not allowed?
Two parts:
1. Determine the set of two letters: 26P2 = 26 · 25 = 650
2. Determine the set of three digits: 10P3 = 10 · 9 · 8 = 720
Total: 650 · 720 = 468,000
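The ID count combines two permutation counts with the multiplication principle, which a few lines can confirm. This sketch relies on `math.perm` from Python 3.8 or later.

```python
from math import perm

letter_choices = perm(26, 2)   # ordered pairs of distinct letters: 26 * 25
digit_choices = perm(10, 3)    # ordered triples of distinct digits: 10 * 9 * 8
total_ids = letter_choices * digit_choices

print(letter_choices, digit_choices, total_ids)  # 650 720 468000
```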
10.3 – Using Permutations and Combinations
A common form of poker involves hands (sets) of five cards each,
dealt from a deck consisting of 52 different cards. How many
different 5-card hands are possible?
Hint: Repetitions are not allowed and order is not important.
52C5 = 2,598,960 5-card hands
10.3 – Using Permutations and Combinations
Find the number of different subsets of size 3 in the set:
{m, a, t, h, r, o, c, k, s}.
9C3 = 84 different subsets
Find the number of arrangements of size 3 in the set:
{m, a, t, h, r, o, c, k, s}.
9P3 = 9 · 8 · 7 = 504 arrangements
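The difference between subsets (order ignored) and arrangements (order counted) can be seen by listing them explicitly for the nine letters. This sketch enumerates them with `itertools` instead of using the formulas.

```python
from itertools import combinations, permutations

letters = "mathrocks"  # the nine distinct letters m, a, t, h, r, o, c, k, s

subsets = list(combinations(letters, 3))       # order does not matter
arrangements = list(permutations(letters, 3))  # order matters

print(len(subsets), len(arrangements))         # 84 504
print(subsets[0], arrangements[:2])
```

Each subset of 3 letters can be ordered in 3! = 6 ways, which is why 504 = 84 · 6.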
10.3 – Using Permutations and Combinations
Guidelines on Which Method to Use
8. PROBABILITY
Basic
Probability
Concepts
Male, Female
Definition
Complement ==> sometimes, we want to know
the probability that an event will not happen; an
event opposite to the event of interest is called
a complementary event.
If A is an event, its complement is written
A^C or Ā.
Example: The complement of the event “male”
is the event “female”.
P(A) + P(A^C) = 1
Views of Probability:
1-Subjective:
1- Multiplication rule
Independent events: P(A ∩ B) = P(A) P(B)
Dependence and
the modified multiplication rule
P(A ∩ B) = P(A) P(B|A)
2- Addition rule
A and B are mutually exclusive
The occurrence of one event precludes
the occurrence of the other
Addition Rule
P(A ∪ B) = P(A) + P(B)
Modified Addition Rule
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
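Both forms of the addition rule can be checked by counting outcomes. The single roll of a fair die and the event names below are illustrative assumptions, not examples from the slides; exact fractions avoid rounding issues.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # hypothetical experiment: one roll of a fair die

def prob(event):
    """Probability of an event as favourable outcomes over all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

rolls_one = {1}
even = {2, 4, 6}
greater_than_3 = {4, 5, 6}

# Mutually exclusive events (no shared outcomes): P(A or B) = P(A) + P(B)
exclusive = prob(rolls_one | even)
print(exclusive, prob(rolls_one) + prob(even))   # 2/3 2/3

# Overlapping events: P(A or B) = P(A) + P(B) - P(A and B)
overlapping = prob(even | greater_than_3)
modified = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)
print(overlapping, modified)                     # 2/3 2/3
```

Without subtracting P(A ∩ B), the outcomes 4 and 6 would be counted twice in the overlapping case.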