
STA02 Lab Prelim Module 1

The document provides an introduction to statistical methodology, covering key concepts such as descriptive and inferential statistics, data collection methods, and sampling techniques. It emphasizes the importance of understanding variables, data types, and levels of measurement in statistical analysis. Additionally, it outlines measures of central tendency and variation, including examples and appropriate applications for different types of data.

24/09/2023

MODULE 1:
INTRODUCTION TO STATISTICAL METHODOLOGY
STATISTICAL ANALYSIS
ARJAY T. ALTOVAR
CAS-MNSD INSTRUCTOR

Introduction to Statistical Methodology:
Statistics, Population, Sample and Sampling Techniques

Statistics
Statistics is the science of collecting, organizing, summarizing, analyzing, and drawing conclusions from data.

Descriptive Statistics
are used to describe the basic features of the data and provide simple summaries about the sample and the measures.

Inferential Statistics
consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

METHOD OF DATA COLLECTION
▪ Direct observation
▪ Experiment
▪ Questionnaire/Survey
▪ Data from institutions/agencies
▪ Published studies/articles
▪ Business Reports

Population or Sample?
• Scientific problems concern large groups or populations about which inferences are made.
• A subset of the population is called a sample; it is used to make inferences about the entire population.


Population
▪ All students in SLSU Lucban, Quezon
▪ All residents of Quezon Province

Sample
▪ Sampled students in SLSU Lucban, Quezon
▪ Selected residents in Quezon Province

Reminders in using Population/Census Data
• In answering objectives like determining significant difference, correlation, or effect, statistical tests such as hypothesis tests are no longer needed.
• You can answer your objectives by using descriptive statistics and analyzing them.

[Figure: Population and Sample, linked by Sampling Method and Statistical Inference]

SAMPLING TECHNIQUES
▪ Simple Random Sampling
▪ Systematic Random Sampling
▪ Stratified Sampling
▪ Cluster Sampling

▪ Convenience Sampling
▪ Purposive Sampling
▪ Referral/Snowball Sampling
▪ Quota Sampling

When is a sample suitable for inference?
• A sample is suitable for inference when it is randomly selected.
• The selection itself is crucial for inference to prevent sampling bias.
• If the sample is random, statistical tests can be performed, and conclusions about the population can be inferred.

SAMPLING TECHNIQUES
▪ Random basis of selection
▪ Used for conclusive research
▪ Can make statistical inferences
▪ The hypothesis is tested

▪ Subjective basis of selection
▪ Used for exploratory research
▪ Produces a biased result
▪ Can make analytical inferences
▪ The hypothesis is generated
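
As a rough illustration of a random basis of selection, the sketch below draws a simple random sample and a systematic random sample from a sampling frame. The frame of 100 student IDs, the sample size of 10, and the function names are assumptions made for this example only; they do not come from the slides.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Every unit has the same chance of selection (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th unit from the ordered frame."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]


# Hypothetical sampling frame: student IDs 1 to 100
students = list(range(1, 101))

srs = simple_random_sample(students, 10, seed=1)
sys_sample = systematic_sample(students, 10, seed=1)

print("Simple random sample:", sorted(srs))
print("Systematic sample:   ", sys_sample)
```

Fixing the seed only makes the illustration repeatable; in an actual study the selection would not be tied to a chosen seed.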


Introduction to Statistical Methodology:
Variables and Data Types

Variables

Random Variable
▪ Variables whose values are determined by chance

Qualitative Variable
▪ Variables that can be placed into distinct categories, according to some characteristic or attribute. Also known as Categorical Variables

Quantitative Variable
▪ Numerical variables that can be ordered or ranked. Also known as Numerical Variables
▪ Has two types: Discrete and Continuous

LEVELS OF MEASUREMENT
▪ Nominal
▪ Ordinal
▪ Interval
▪ Ratio

LEVELS OF MEASUREMENT

Nominal
▪ Names and Proper Names
▪ Religion
▪ Gender
▪ Civil Status (Single, Married, Widowed, etc.)

Ordinal
▪ Size (Small, Medium, Large)
▪ Level of Agreement (Strongly Agree, Agree, Disagree, Strongly Disagree)
▪ Age Structure (Children, Early Working Age, Prime Working Age, etc.)

Interval
▪ (Distance is meaningful and equal, no true zero)
▪ Temperature
▪ IQ Scores
▪ Aptitude Test Scores
▪ Rating preference on a scale of 1-10

Ratio
▪ (Division is defined between two values)
▪ Age
▪ Weight (in kgs, in lbs, etc.)
▪ Height (in cm, in ft, etc.)
▪ Exam Scores


Reminder on Variables and Data Types

• It is important to remember that the appropriate type of statistical test depends on the measurement scale of the data to be analyzed.

MODULE 2:
DESCRIPTIVE STATISTICS

Topic Outline
I. Frequency Distribution Table
II. Measure of Central Tendency
A. Mean
B. Median
C. Mode
III. Measure of Variation
A. Range
B. Variance
C. Standard Deviation
D. Coefficient of Variation

I. Frequency Distribution

Frequency Distribution Table

A frequency distribution is the organization of raw data in table form, using classes and frequencies.

Two most used types of frequency distributions:
▪ Categorical Frequency Distribution
▪ Grouped Frequency Distribution


I. Frequency Distribution Table

Categorical Frequency Distribution

Plants/Quadrats  Frequency  Relative Frequency  Cumulative Frequency  Relative Cumulative Frequency
0                268
1                316
2                135
3                61
4                15
5                3
6                1
7                1
SUM

Grouped Frequency Distribution

Class Limits  Class Boundaries  Tally       Frequency  Classmark  <cf  >cf  Relative Frequency
24-30                           III
31-37                           I
38-44                           IIIII
45-51                           IIIII-IIII
52-58                           IIIII-I
59-65                           I

Distribution of the number of hours that AA batteries lasted

I. Frequency Distribution

The number of classes, k, is set by the researcher and is usually between 5 and 20. You may use this formula if you do not know what number to set as your number of classes:

To determine the class limits, you must first determine the class interval or class width, which can be calculated as:

The range can be computed as:

If the computed value has decimal places, round up the number (e.g., 6.7 becomes 7).
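
The formulas themselves did not survive in this text, so the following is only a sketch under stated assumptions: the range is taken as highest value minus lowest value, the class width as range divided by k (rounded up), and Sturges' rule (k = 1 + 3.322 log n) is used as one common rule of thumb for k. Sturges' rule may not be the formula the slide intended. The inputs 24, 65 and n = 25 are illustrative values, not the raw battery data.

```python
import math


def sturges_k(n):
    """Suggested number of classes by Sturges' rule (an assumed rule of thumb)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def class_width(lowest, highest, k):
    """Return (range, class width); the width is rounded up to a whole number."""
    data_range = highest - lowest
    width = math.ceil(data_range / k)
    return data_range, width


k = sturges_k(25)                        # 1 + 3.322 * log10(25) = 5.64..., rounded up
data_range, width = class_width(24, 65, k)
print("k =", k, "| range =", data_range, "| class width =", width)
```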

Descriptive Summary Measures

In practice…
• Researchers present or summarize collected data using tables and graphs.
• Other measures are also used to make the discussion of results much more insightful:
  • Measures of Central Tendency
  • Measures of Variation
  • Measures of Position

Descriptive Summary Measures: Measure of Central Tendency


Averages
▪ Consists of numbers (or words) about which the data are, in some sense, centered.
▪ Measures of Central Tendency/Averages typically used are the Mean, Median, and Mode.
▪ There are several types, yielding numbers or words that attempt to describe:
  ▪ Most generally
  ▪ The middle
  ▪ Typical value

Which Average?
• For numerical data: Mean and/or Median
• For categorical data: Mode or Median

Appropriate Average
▪ Mean is most appropriate for any kind of Numerical Data
▪ Median can be used when data are considered Ordinal level of measurement
▪ Mode is always appropriate for Categorical Data.

Inappropriate Averages
• Reporting a median for unordered qualitative data with nominal measurement
• Reporting a mean for ANY categorical data.

LEVELS OF MEASUREMENT AND AVERAGES

Illustration
What measures of central tendency can be used for Objectives 1, 2 and 3?

Respondent  Sex     Age  Quiz 1  Make-up Quiz 1  Prelim Exam  Midterm Exam  Estimated
                         Scores  Scores          Scores       Scores        Screen Time
1           Female  17   14      19              67           51            6
2           Female  17   12      27              77           54            5
3           Male    17   11      26              79           38            4
4           Male    18   8       24              78           42            4
5           Male    17   13      23              65           54            5
6           Female  18   11      32              82           60            5
7           Female  16   18      24              82           60            3
8           Female  18   17      24              77           69            5
9           Male    19   25      20              88           58            5
10          Female  20   23      21              75           58            5
11          Male    17   23      31              69           36            6
12          Male    20   24      29              67           50            3
13          Female  17   16      33              85           58            3
14          Female  19   19      33              87           68            3

Note: Table presents the data for the first 15 respondents
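
As a small sketch of matching the average to the data type, the code below reports a mode for Sex (categorical) and a mean and median for Age (numerical), using only the 14 rows visible in the table above. Because the full dataset is not shown, these values are not expected to match the full-data results reported for the illustration.

```python
import statistics

# Sex and Age columns for the 14 respondents shown in the table
sex = ["Female", "Female", "Male", "Male", "Male", "Female", "Female",
       "Female", "Male", "Female", "Male", "Male", "Female", "Female"]
age = [17, 17, 17, 18, 17, 18, 16, 18, 19, 20, 17, 20, 17, 19]

sex_mode = statistics.mode(sex)       # categorical (nominal): report the mode
age_mean = statistics.mean(age)       # numerical: report the mean ...
age_median = statistics.median(age)   # ... and/or the median

print("Sex mode:", sex_mode)
print("Age mean:", round(age_mean, 2), "| Age median:", age_median)
```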


Illustration continued

Averages for each objective:
1. Sex: Mode = Female (34 counts)
   Age: Mean = 18.33 years old
2. Quiz 1: Mean = 18.9
   Make-up Quiz 1: Mean = 24.07
   Prelim: Mean = 62.52
   Midterm: Mean = 56.08
3. Screen Time: Mean = 4.23 hours

Interpretation for each average:
1. Sex: The majority of the respondents are female.
   Age: On average, the respondents are around 18 years of age.
2. Template: The average score of the students is 18.9 for Quiz 1, 24.07 for Make-up Quiz 1, and so on.
3. Screen Time: On average, the selected students spend about 4 hours on cell phones and laptops.

II. Measure of Central Tendency

Mean of Ungrouped Data

Mean Example

Mean of Grouped Data

Mean of Grouped Data Example

Class Limits  Frequency  Classmark  Fx
24-30         3          27         81
31-37         1          34         34
38-44         5          41         205
45-51         9          48         432
52-58         6          55         330
59-65         1          62         62
n = 25                   SUM        1144
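
The table's totals can be checked with a short sketch. It assumes the usual grouped-data formula, mean = (sum of frequency × class mark) / n, since the formula slide is not reproduced in this text; the variable names are illustrative.

```python
# (class limits, frequency, class mark) rows from the grouped table above
classes = [
    ("24-30", 3, 27),
    ("31-37", 1, 34),
    ("38-44", 5, 41),
    ("45-51", 9, 48),
    ("52-58", 6, 55),
    ("59-65", 1, 62),
]

n = sum(f for _, f, _ in classes)            # total frequency
sum_fx = sum(f * x for _, f, x in classes)   # sum of frequency times class mark
grouped_mean = sum_fx / n

print("n =", n, "| sum of Fx =", sum_fx, "| grouped mean =", grouped_mean)
```

With n = 25 and a sum of 1144, the grouped mean comes out to 45.76 hours.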


Median

Median Example

The table below gives the circumferences at chest height (CCH) (in cm) and their corresponding depths for 15 sugar maples, Acer saccharum, measured in a forest in southeastern Ohio.

Mode

Mode Example

The table below gives the circumferences at chest height (CCH) (in cm) and their corresponding depths for 15 sugar maples, Acer saccharum, measured in a forest in southeastern Ohio.

Measure of Variation

MEASURES OF VARIATION

Preview
▪ Averages are important, but they tell only part of the story.
▪ When summarizing a set of data, we not only specify measures of central tendency, such as the mean, but also measures of dispersion.

▪ Standard Deviation is the measure of variation that is most appropriate to any numerical data.
▪ Coefficient of Variation is appropriately used when you have two datasets with different units of measurement that you want to compare directly.

▪ Measures of variation are essentially NON-EXISTENT for categorical data (nominal).
▪ For categorical data, it is most appropriate to describe variation by identifying extreme scores (highest and lowest).


III. Measure of Variation

Range

Example

Determine the range in the given sample data set below:

3, 18, 9, 11, 15, 17, 20, 1, 7, 15, 2, 8

Copyright 2018: Mathematics in the Modern World by Winston S. Sirug, Ph.D.
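
A minimal sketch of this example, assuming the usual definition of the range as the highest value minus the lowest value (the formula slide is not reproduced in this text):

```python
data = [3, 18, 9, 11, 15, 17, 20, 1, 7, 15, 2, 8]

highest = max(data)
lowest = min(data)
data_range = highest - lowest   # range = highest value - lowest value

print("Highest:", highest, "| Lowest:", lowest, "| Range:", data_range)
```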

Variance and Standard Deviation

Variance and Standard Deviation Example

Find the variance and standard deviation of Jollibee stocks in a week:

135 134 133 131 132
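
The example does not say whether the five prices should be treated as a sample or as a population, so the sketch below computes both versions with Python's standard `statistics` module. Which version the slide intends is an assumption left open here.

```python
import statistics

prices = [135, 134, 133, 131, 132]   # Jollibee stock prices over one week

mean_price = statistics.mean(prices)

# Sample versions divide the sum of squared deviations by n - 1
sample_var = statistics.variance(prices)
sample_sd = statistics.stdev(prices)

# Population versions divide by n
pop_var = statistics.pvariance(prices)
pop_sd = statistics.pstdev(prices)

print("Mean:", mean_price)
print("Sample variance:", sample_var, "| Sample SD:", round(sample_sd, 4))
print("Population variance:", pop_var, "| Population SD:", round(pop_sd, 4))
```

The deviations from the mean of 133 are 2, 1, 0, -2, -1, so the sum of squared deviations is 10: dividing by n - 1 = 4 gives 2.5, and dividing by n = 5 gives 2.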

Other Formula

Interpretation of Standard Deviation

Coefficient of Variation


Coefficient of Variation Example

If Sample A has a mean of 100 liters and a standard deviation of 10 liters and Sample B has a mean of 300 pounds and a standard deviation of 20 pounds, which sample has greater variation? Notice how Sample A and Sample B have different units.

Illustration
What measures of variation can be used for Objectives 1, 2 and 3?

Respondent  Sex     Age  Quiz 1  Make-up Quiz 1  Prelim Exam  Midterm Exam  Estimated
                         Scores  Scores          Scores       Scores        Screen Time
1           Female  17   14      19              67           51            6
2           Female  17   12      27              77           54            5
3           Male    17   11      26              79           38            4
4           Male    18   8       24              78           42            4
5           Male    17   13      23              65           54            5
6           Female  18   11      32              82           60            5
7           Female  16   18      24              82           60            3
8           Female  18   17      24              77           69            5
9           Male    19   25      20              88           58            5
10          Female  20   23      21              75           58            5
11          Male    17   23      31              69           36            6
12          Male    20   24      29              67           50            3
13          Female  17   16      33              85           58            3
14          Female  19   19      33              87           68            3

Note: Table presents the data for the first 15 respondents
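
Returning to the coefficient of variation example above (Sample A in liters, Sample B in pounds), here is a short sketch assuming the usual definition, CV = (standard deviation / mean) × 100%. The formula slide is not reproduced in this text, and the function name is illustrative.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation expressed as a percentage of the mean (unit-free)."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(mean=100, std_dev=10)   # Sample A, liters
cv_b = coefficient_of_variation(mean=300, std_dev=20)   # Sample B, pounds

print("CV of Sample A:", round(cv_a, 2), "%")
print("CV of Sample B:", round(cv_b, 2), "%")
print("Greater relative variation:", "Sample A" if cv_a > cv_b else "Sample B")
```

Because the units cancel, the two percentages can be compared directly: Sample A (10%) varies more, relative to its mean, than Sample B (about 6.67%).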

Illustration continued

Measures of variation for each objective:
1. Age: Std. Dev. = 1.31 years
2. Quiz 1: Std. Dev. = 5.24
   Make-up Quiz 1: Std. Dev. = 6.52
   Prelim: Std. Dev. = 16.71
   Midterm: Std. Dev. = 9.31
3. Screen Time: Std. Dev. = 1.79 hours

Interpretation for each measure of variation:
1. Age: On average, the respondents' ages vary by about 1.31 years around the mean (mean = 18 years old).
2. Template: On average, the students' scores in Quiz 1/Make-up Quiz 1/Prelim/Midterm vary by about 5.24/6.52/16.71/9.31 around the mean.
3. Screen Time: On average, the selected students' screen time varies by about 1.79 hours around its mean (4.23 hours).

EXCEL Functions

Using Data Analysis ToolPak

To enable Data Analysis ToolPak in your EXCEL:
FILE > OPTIONS > ADD-INS > Click “Go” > Select “Analysis ToolPak” and “Analysis ToolPak VBA” > Click “OK”

Descriptive Summary Measures    EXCEL Functions
1. Mode                         1. =MODE.MULT(), =MODE.SNGL()
2. Median                       2. =MEDIAN()
3. Mean                         3. =AVERAGE()
4. Variance                     4. =VAR.P(), =VAR.S()
5. Standard Deviation           5. =STDEV.P(), =STDEV.S()
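
For readers working outside Excel, the sketch below lists rough counterparts of these functions in Python's standard `statistics` module. The pairing in the comments is an approximate correspondence, not a claim that the two tools behave identically in every edge case; the data are the values from the range example.

```python
import statistics

data = [3, 18, 9, 11, 15, 17, 20, 1, 7, 15, 2, 8]   # values from the range example

summary = {
    "mode (all)": statistics.multimode(data),            # cf. =MODE.MULT()
    "mode (single)": statistics.mode(data),              # cf. =MODE.SNGL()
    "median": statistics.median(data),                   # cf. =MEDIAN()
    "mean": statistics.mean(data),                       # cf. =AVERAGE()
    "population variance": statistics.pvariance(data),   # cf. =VAR.P()
    "sample variance": statistics.variance(data),        # cf. =VAR.S()
    "population std dev": statistics.pstdev(data),       # cf. =STDEV.P()
    "sample std dev": statistics.stdev(data),            # cf. =STDEV.S()
}

for name, value in summary.items():
    print(name, "=", value)
```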


References

UPLB INSTAT Lecture Materials
UP Diliman School of Statistics Lecture Materials
Bluman, A. G. (2012). Elementary Statistics: A Step-by-Step Approach. 8th edition. ISBN 978-0-07-338610-2
Walpole, R. E., et al. (2012). Probability and Statistics for Engineers and Scientists. 9th edition.
Gonick, L. & Smith, W. (1993). The Cartoon Guide to Statistics. ISBN 10:0-06-273102-5
PennState Eberly College of Science. Applied Statistics. https://online.stat.psu.edu/stat500/

*Most images attached are from Microsoft online pictures

Measure of Location/Position

Topic Outline
I. Measure of Location
A. Percentile
B. Decile
C. Quartiles
II. Interquartile Range
III. Outlier
IV. Exploratory Data Analysis

I. Measure of Location

Definition
Percentiles divide the data into 100 equal groups; deciles into 10 equal groups; and quartiles into 4 equal groups. They are denoted by P_k, D_k, and Q_k respectively.

Finding a data value corresponding to a given percentile, decile, quartile rank:

Step 1: Arrange the data in order from lowest to highest.

Step 2:


Finding a data value corresponding to a given percentile, decile, quartile rank (continued):

Step 3A: If c is not a whole number, round up to the next whole number. Count over to the number that corresponds to the rounded-up value, starting at the lowest value.

Step 3B: If c is a whole number, use the value halfway between the cth and (c + 1)st values when counting from the lowest value. Add the two values, then divide by 2.

Example:
Find the data values corresponding to the 70th percentile, 6th decile, and 3rd quartile of the following set of data:
98 87 102 106 92 99 110 85 88 86 94 94 101 87 115
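
The steps above can be sketched as follows. The Step 2 formula is missing from this text, so the code assumes the common textbook form c = n × p / 100, where n is the number of values and p is the percentile. Deciles and quartiles are converted to percentiles first (6th decile = 60th percentile, 3rd quartile = 75th percentile). The function name is illustrative.

```python
import math


def value_at_percentile(data, p):
    """Locate the data value at the p-th percentile (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)                   # Step 1: lowest to highest
    n = len(ordered)
    c = n * p / 100                          # Step 2 (assumed formula)
    if c != int(c):                          # Step 3A: c is not a whole number
        return ordered[math.ceil(c) - 1]     # round up, count from the lowest value
    c = int(c)                               # Step 3B: c is a whole number
    return (ordered[c - 1] + ordered[c]) / 2


data = [98, 87, 102, 106, 92, 99, 110, 85, 88, 86, 94, 94, 101, 87, 115]

p70 = value_at_percentile(data, 70)   # 70th percentile
d6 = value_at_percentile(data, 60)    # 6th decile = 60th percentile
q3 = value_at_percentile(data, 75)    # 3rd quartile = 75th percentile

print("P70 =", p70, "| D6 =", d6, "| Q3 =", q3)
```

Under this assumed formula, c is 10.5, 9 and 11.25 for the three cases, which leads to the 11th value (101), the midpoint of the 9th and 10th values (98.5), and the 12th value (102).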

I. Measure of Location

Finding the corresponding rank given a data point:

Step 1: Arrange the data in order from lowest to highest.

Step 2:

Example:
Find the percentile rank for 98, decile rank for 88, and quartile rank for 102 in the given set of data below:
98 87 102 106 92 99 110 85 88 86 95 94 101 87 115

II. Interquartile Range

Interquartile Range

Definition
The interquartile range (IQR) is the difference between Q3 and Q1 and is the range of the middle 50% of the data. It is used in finding outliers.

IQR = Q3 - Q1


III. Outliers

Outlier

Definition
An outlier is an extremely high or extremely low data value when compared with the rest of the data values.

Example:
Check the following data set for outliers.
5, 6, 12, 13, 15, 18, 22, 50
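
One way to carry out this check is sketched below. It assumes the commonly used 1.5 × IQR rule (values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged) and takes Q1 and Q3 as the medians of the lower and upper halves of the ordered data. The rule itself is not reproduced in this text, so treat both choices as assumptions.

```python
import statistics


def quartiles_by_halves(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])


def find_outliers(data, multiplier=1.5):
    """Flag values outside Q1 - m*IQR and Q3 + m*IQR (assumed 1.5 x IQR rule)."""
    q1, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    flagged = [x for x in data if x < lower_fence or x > upper_fence]
    return flagged, (q1, q3, iqr, lower_fence, upper_fence)


data = [5, 6, 12, 13, 15, 18, 22, 50]
outliers, (q1, q3, iqr, lower_fence, upper_fence) = find_outliers(data)

print("Q1 =", q1, "| Q3 =", q3, "| IQR =", iqr)
print("Fences:", lower_fence, "to", upper_fence)
print("Outliers:", outliers)
```

Here Q1 = 9 and Q3 = 20, so IQR = 11 and the fences are -7.5 and 36.5; the value 50 falls outside them.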

IV. Exploratory Data Analysis (EDA)

Exploratory Data Analysis

Definition
A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum value to Q1, a horizontal line from Q3 to the maximum value, and a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median.


IV. Exploratory Data Analysis (EDA)

Example:
The number of diseased persons found in 10 cities in the Philippines is 89, 47, 164, 296, 30, 215, 138, 78, 48, 39. Construct a boxplot for the data.

Box-Plot Interpretation:
1. If the median is near the center of the box, the distribution is approximately symmetric.
2. If the median falls to the left of the center of the box, the distribution is positively skewed.
3. If the median falls to the right of the center, the distribution is negatively skewed.
4. If the lines are about the same length, the distribution is approximately symmetric.
5. If the right line is longer than the left line, the distribution is positively skewed.
6. If the left line is longer than the right line, the distribution is negatively skewed.
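
For the boxplot example above, the values needed to draw the plot can be computed as a five-number summary. This sketch assumes Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data; other quartile conventions can give slightly different values. The function name is illustrative.

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum (quartiles as medians of the halves)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return ordered[0], q1, statistics.median(ordered), q3, ordered[-1]


cases = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
minimum, q1, median, q3, maximum = five_number_summary(cases)

box_center = (q1 + q3) / 2    # midpoint of the box
left_line = q1 - minimum      # length of the left line (whisker)
right_line = maximum - q3     # length of the right line (whisker)

print("Five-number summary:", minimum, q1, median, q3, maximum)
print("Median vs. box center:", median, "vs.", box_center)
print("Left line:", left_line, "| Right line:", right_line)
```

The median (83.5) lies to the left of the box center (105.5), and the right line (132) is much longer than the left line (17), so by interpretation rules 2 and 5 the distribution appears positively skewed.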

IV. Exploratory Data Analysis (EDA)

Distributional Shapes
