STA02 Lab Prelim Module 1
MODULE 1: INTRODUCTION TO STATISTICAL METHODOLOGY
STATISTICAL ANALYSIS
ARJAY T. ALTOVAR
CAS-MNSD INSTRUCTOR
Statistics
Statistics is the science of collecting, organizing, summarizing,
analyzing, and drawing conclusions from data.
24/09/2023
Population:
▪ All students in SLSU Lucban, Quezon
▪ All residents of Quezon Province
Sample:
▪ Sampled students in SLSU Lucban, Quezon
▪ Selected residents in Quezon Province
Reminders in using Population/Census Data
• In answering objectives like determining significant difference, correlation, or effect, statistical tests like hypothesis tests are no longer needed.
• You can answer your objectives by using descriptive statistics and analyzing them.
Sampling Method
SAMPLING TECHNIQUES
Statistical Inference
Variables and Data Types
Nominal
▪ Names and Proper Names
▪ Religion
▪ Gender
▪ Civil Status (Single, Married, Widowed, etc.)
Ordinal
▪ Size (Small, Medium, Large)
▪ Level of Agreement (Strongly Agree, Agree, Disagree, Strongly Disagree)
▪ Age Structure (Children, Early Working Age, Prime Working Age, etc.)
Interval (distance is meaningful and equal, no true zero)
▪ Temperature
▪ IQ Scores
▪ Aptitude Test Scores
▪ Rating preference on a scale of 1-10
Ratio (division is defined between two values)
▪ Age
▪ Weight (in kg, in lbs, etc.)
▪ Height (in cm, in ft, etc.)
▪ Exam Scores
Reminder on
Variables and Data Types
Topic Outline
I. Frequency Distribution Table
I. Frequency Distribution
The number of classes, k, is set by the researcher and is usually between 5 and 20. You may
use this formula if you do not know what number to set as your number of classes:
To determine the class limits, you must first determine the class interval,
or class width, which can be calculated as:
Class width = Range / k
The range can be computed as:
Range = Highest value - Lowest value
If the computed class width has decimal places, round up the number (e.g., 6.7 becomes 7).
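The steps above (range, class width rounded up, class limits, tally) can be sketched in Python. This is only a minimal illustration: the `scores` list is hypothetical sample data, k is supplied by the caller, the data are assumed to be whole numbers, and the extra widening step for an exact division is an assumption added here so that the highest value always falls in the last class.

```python
import math

def frequency_table(data, k):
    """Return (lower limit, upper limit, frequency) rows for k classes.

    Assumes whole-number data, so class limits are inclusive integers.
    """
    low, high = min(data), max(data)
    data_range = high - low                 # Range = highest - lowest
    width = math.ceil(data_range / k)       # round up, e.g. 6.6 -> 7
    if width * k <= data_range:
        width += 1                          # assumption: widen so the maximum is covered
    rows = []
    lower = low
    for _ in range(k):
        upper = lower + width - 1           # inclusive upper class limit
        count = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, count))
        lower = upper + 1
    return rows

# Hypothetical scores of 20 students (not from the module).
scores = [12, 15, 17, 18, 21, 22, 22, 25, 27, 28,
          30, 31, 33, 35, 36, 38, 40, 41, 44, 45]

for lower, upper, count in frequency_table(scores, k=5):
    print(f"{lower}-{upper}: {count}")
```

Here the range is 45 - 12 = 33, so the class width is 33 / 5 = 6.6, rounded up to 7.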
In practice…
• Researchers present or summarize collected data using tables and graphs.
• Other measures are also used to make discussion of results much more insightful.
Descriptive Summary Measures
• Measures of Central Tendency
• Measures of Variation
• Measures of Position
Descriptive Summary Measures: Measure of Central Tendency
Appropriate Average
▪ Mean is most appropriate for any kind of numerical data.
▪ Median can be used when data are at the ordinal level of measurement.
▪ Mode is always appropriate for categorical data.
Inappropriate Averages
• Reporting a median for unordered qualitative data with nominal measurement.
• Reporting a mean for ANY categorical data.
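As a rough illustration of matching the average to the data type, the sketch below uses Python's standard `statistics` module. All three data sets are hypothetical, and the numeric coding of the agreement levels (1 to 4) is an assumption made only so the ordinal responses can be sorted.

```python
from statistics import mean, median_low, mode

# Numerical data: the mean is appropriate.
ages = [17, 18, 18, 19, 18]
print("Mean age:", mean(ages))

# Ordinal data: the median is appropriate. Responses are coded by rank,
# and median_low always returns a value that actually occurs in the data.
labels = {1: "Strongly Disagree", 2: "Disagree", 3: "Agree", 4: "Strongly Agree"}
responses = [1, 2, 2, 3, 4, 4, 4]
print("Median response:", labels[median_low(responses)])

# Nominal data: only the mode is appropriate.
civil_status = ["Single", "Married", "Single", "Widowed", "Single"]
print("Modal civil status:", mode(civil_status))
```

Taking a mean of `civil_status` would be meaningless, which is the point of the "Inappropriate Averages" list.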
MEASURES OF VARIATION
Preview
▪ Averages are important, but they tell only part of the story.
▪ When summarizing a set of data, we specify not only measures of central tendency, such as the mean, but also measures of dispersion.
▪ Standard Deviation is the measure of variation that is most appropriate for any numerical data.
▪ Coefficient of Variation is appropriately used when you have two datasets with different units of measurement that you want to compare directly.
▪ Measures of variation are essentially NON-EXISTENT for categorical data (nominal).
▪ For categorical data, it is most appropriate to describe variation by identifying the extreme scores (highest and lowest).
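To see why the coefficient of variation suits comparisons across different units, here is a minimal sketch. It assumes the usual definition (standard deviation divided by the mean, times 100) and the sample standard deviation; the height and weight values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

# Hypothetical measurements in different units.
heights_cm = [150, 160, 165, 170, 175]
weights_kg = [45, 55, 60, 70, 80]

print("Std. Dev. of height (cm):", round(stdev(heights_cm), 2))
print("Std. Dev. of weight (kg):", round(stdev(weights_kg), 2))
print("CV of height (%):", round(coefficient_of_variation(heights_cm), 2))
print("CV of weight (%):", round(coefficient_of_variation(weights_kg), 2))
```

The two standard deviations are in centimeters and kilograms and cannot be compared directly, while the two CV values are unit-free percentages.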
Range Example
Other Formula
Illustration continued…
Measures of variation for each objective:
1. Age: Std. Dev. = 1.31 years
2. Quiz 1: Std. Dev. = 5.24
   Make-up Quiz 1: Std. Dev. = 6.52
   Prelim: Std. Dev. = 16.71
   Midterm: Std. Dev. = 9.31
Interpretation for each measure of variation:
1. Age: On average, the respondents' ages vary by about 1.31 years around the mean (mean = 18 years old).
2. Template: On average, the scores of the students in Quiz 1/Make-up Quiz 1/Prelim/Midterm vary by about 5.24/6.52/16.71/9.31 around the mean.
2. Median: =MEDIAN()
3. Mean: =AVERAGE()
4. Variance: =VAR.P(), =VAR.S()
5. Standard Deviation: =STDEV.P(), =STDEV.S()
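For readers working outside a spreadsheet, the functions listed above have close counterparts in Python's standard `statistics` module. The sketch below is only an illustration on hypothetical data; the pairing with the spreadsheet functions is noted in the comments.

```python
from statistics import mean, median, pstdev, pvariance, stdev, variance

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Median:", median(data))                # like =MEDIAN()
print("Mean:", mean(data))                    # like =AVERAGE()
print("Population variance:", pvariance(data))  # like =VAR.P(), divides by n
print("Sample variance:", variance(data))       # like =VAR.S(), divides by n - 1
print("Population std. dev.:", pstdev(data))    # like =STDEV.P()
print("Sample std. dev.:", stdev(data))         # like =STDEV.S()
```

The population versions treat the data as a complete census, while the sample versions divide by n - 1.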
Topic Outline
III. Outlier
Step 2:
II. Interquartile Range
Interquartile Range
Definition
The interquartile range (IQR) is the difference
between 𝑸𝟑 and 𝑸𝟏 and is the range of the middle
50% of the data. It is used in finding outliers.
IQR = 𝑸𝟑 - 𝑸𝟏
III. Outliers
Outlier
Definition
An outlier is an extremely high or extremely low
data value when compared with the rest of the data
values.
Example:
Check the following data set for outliers.
5, 6, 12, 13, 15, 18, 22, 50
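One way to check this data set is sketched below. Two assumptions are made that the text does not state: the quartiles are taken as the medians of the lower and upper halves of the sorted data, and a value is flagged when it lies more than 1.5 × IQR below 𝑸𝟏 or above 𝑸𝟑 (a common convention). The function names are illustrative only.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed method).

    Expects at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]       # leaves out the middle value when n is odd
    return median(lower), median(upper)

def find_outliers(values, k=1.5):
    """Values beyond Q1 - k*IQR or Q3 + k*IQR (k = 1.5 is a convention)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [5, 6, 12, 13, 15, 18, 22, 50]
q1, q3 = quartiles(data)
print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
print("Outliers:", find_outliers(data))
```

Under these assumptions 𝑸𝟏 = 9 and 𝑸𝟑 = 20, so IQR = 11 and the fences are 9 - 16.5 = -7.5 and 20 + 16.5 = 36.5. Only 50 falls outside them.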
Definition
A boxplot is a graph of a data set obtained by drawing a
horizontal line from the minimum value to 𝑸𝟏, a
horizontal line from 𝑸𝟑 to the maximum value, and a box whose
vertical sides pass through 𝑸𝟏 and 𝑸𝟑, with a
vertical line inside the box passing through the
median.
IV. Exploratory Data Analysis (EDA)
Example:
The number of diseased persons found in 10 cities in the Philippines is 89, 47, 164, 296, 30, 215, 138, 78, 48, 39. Construct a boxplot for the data.
Box-Plot Interpretation:
1. If the median is near the center of the box, the distribution is approximately symmetric.
2. If the median falls to the left of the center of the box, the distribution is positively skewed.
3. If the median falls to the right of the center, the distribution is negatively skewed.
4. If the lines are about the same length, the distribution is approximately symmetric.
5. If the right line is longer than the left line, the distribution is positively skewed.
6. If the left line is longer than the right line, the distribution is negatively skewed.
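The five values needed to draw the boxplot for the example, and the comparisons used in the interpretation rules, can be computed as in the sketch below. It assumes the quartiles are the medians of the lower and upper halves of the sorted data; other quartile conventions can give slightly different values. The function name is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (quartiles as medians of halves)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

cases = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
low, q1, med, q3, high = five_number_summary(cases)
print("Min, Q1, Median, Q3, Max:", low, q1, med, q3, high)

# Rules 1-3: compare the median with the center of the box.
box_center = (q1 + q3) / 2
print("Median left of box center:", med < box_center)

# Rules 4-6: compare the lengths of the two lines.
left_line, right_line = q1 - low, high - q3
print("Right line longer than left line:", right_line > left_line)
```

With this method the summary is 30, 47, 83.5, 164, 296. The median lies to the left of the box center (105.5) and the right line (132) is longer than the left line (17), so both rules point to a positively skewed distribution.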
IV. Exploratory Data Analysis (EDA)
Distributional Shapes