
Introduction

This document provides an introduction to statistics, including definitions, scope, types of variables, and levels of measurement. It defines statistics as the science of collecting, organizing, analyzing, and interpreting data. Descriptive statistics involves summarizing and presenting data, while inferential statistics involves using samples to draw conclusions about populations. Variables can be qualitative, taking non-numerical values, or quantitative, having actual numerical units. Measurement assigns numbers to observations in a way that preserves their relationships, and can be nominal, ordinal, interval, or ratio.

Uploaded by

Jeff Yams
Copyright
© All Rights Reserved

INTRODUCTION TO STATISTICS

INTRODUCTION
 Definition of Statistics
 Scope of Statistics
 Kinds of Variables
 Levels of Measurement
 Types of Data
STATISTICS

Statistics is the science of collecting, organizing,


analyzing, and interpreting data in order to make
decisions.

Data consists of information coming from observations, counts,


measurements, or responses.

A population is the collection of all outcomes, responses, measurements, or counts that are of interest.

A sample is a subset of a population.


Definition of Statistics

STATISTICS is the art and science of the
• Collection
• Organization
• Presentation
• Analysis and Interpretation
of numeric or quantitative data.

X Houses are built of hollow blocks, wood, etc.
• How many houses were built of hollow blocks
• Proportion of houses built of wood
Parameter & Statistic

A parameter is a numerical measure that describes a


characteristic of a population.

A statistic is a numerical measure that describes a


characteristic of a sample.

Parameter → Population
Statistic → Sample
Parameter & Statistic

EXAMPLES

Parameter

The population mean of the electricity bills of the


residents of a certain city is Php 1500.00

Statistic
The sample mean of the electricity bills of 20
residents of a certain city is Php 1500.00
Parameter & Statistic

A sociologist wants to estimate the proportion of adults with children under the age of 18 that eat dinner together 7 nights a week. A simple random sample of 1122 adults with children under the age of 18 was obtained, and 337 of those adults reported eating dinner together with their families 7 nights a week.
Parameter
The proportion of adults with kids under 18 who ate together
7 nights a week.
Statistic
337/1122 = 0.300, the proportion in the sample who ate
together.
Parameter & Statistic

An education official wants to estimate the proportion of adults


aged 18 or older who had read at least one book during the
previous year. A random sample of 1006 adults aged 18 or older
is obtained, and 835 of those adults had read at least one book
during the previous year.

Parameter
The proportion of adults 18 or older who read a book in the
previous year.

Statistic
835/1006 = 0.830, the proportion who read a book in the
sample.
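The two statistics in the examples above are plain sample proportions. A minimal Python sketch of the arithmetic follows; the function name `sample_proportion` is an illustrative assumption, not something from the slides.

```python
def sample_proportion(successes: int, sample_size: int) -> float:
    """Return the sample proportion (a statistic): successes / sample_size."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return successes / sample_size


# Counts taken from the two examples in the slides
dinner = sample_proportion(337, 1122)   # adults who ate dinner together
readers = sample_proportion(835, 1006)  # adults who read at least one book

print(f"{dinner:.3f}")   # 0.300
print(f"{readers:.3f}")  # 0.830
```

Each value describes only the sample. The corresponding parameter, the proportion in the whole population, remains unknown and is what the statistic is used to estimate.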
Scope of Statistics
The study of statistics has two major branches:
descriptive statistics and inferential statistics.
Statistics
• Descriptive statistics: involves the organization, summarization, and display of data.
• Inferential statistics: involves using a sample to draw conclusions about a population.
Scope of Statistics

DESCRIPTIVE STATISTICS
• Gathering: Survey, Observation, Use of Existing Records, Experimental
• Classification (Data Organization): Raw Data, Array, Frequency Distribution Table (Single Value, Grouping)
• Presentation: Textual, Tabular, Graphical
• Summarizing a collection of values (computing measures): Central Tendency, Variability, Percentages/Ratio/Proportions, Others (Quantiles/Fractiles)
Descriptive Statistics

It describes the important characteristics/properties of the data using measures of central tendency such as the mean, median, and mode, and measures of dispersion such as the range, standard deviation, and variance.

Data can be summarized and represented accurately using charts, tables, and graphs.
Descriptive Statistics

Example:

We have marks of 1000 students and we may be


interested in the overall performance of those students
and the distribution as well as the spread of marks.

Descriptive statistics provides us with the tools to describe our data in the most understandable and appropriate way.
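As a rough illustration of the measures mentioned for the marks example, the sketch below summarizes a small set of marks with Python's standard `statistics` module. The ten marks are hypothetical stand-ins for the 1000 students, and the data are treated as a complete population, so the population variance and standard deviation are used.

```python
import statistics

# Hypothetical marks for ten students (illustrative only, not from the slides)
marks = [72, 85, 90, 66, 85, 78, 94, 59, 85, 70]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    # Measures of dispersion (marks treated as the whole population)
    "range": max(marks) - min(marks),
    "variance": statistics.pvariance(marks),
    "std_dev": statistics.pstdev(marks),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If the marks were only a sample from a larger group, `statistics.variance` and `statistics.stdev` would be the usual choice instead.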
Scope of Statistics

INFERENTIAL STATISTICS

A method or technique that uses a small portion of the total set of data in order to draw conclusions or judgments regarding the entire set.
Scope of Statistics

Statistical Inference

• Predict life span of bulbs
• Compare effectiveness of two reducing diets

• Probability theory
• Risks/odds methods
VARIABLES
A variable is a characteristic of a unit of
observation or subject that can take on
different values for different units/subjects
or for the same unit/subject at different
periods.
About Variable

Variable: HEIGHT
Attributes/characteristics: Small (5’2), Medium (5’6), Tall (5’8)
Kinds of Variables

Qualitative Variable

A qualitative variable takes on


non-numerical values.
It simply describes which class or category
the observations fall, thus also known as
categorical data.
Kinds of Variables

Qualitative Variables

Sex: Male, Female
Hair color: Black, Blonde, Brown
Religion: Catholic, Protestant, INC
Occupation: Teacher, Doctor, Engineer
Nationality: Filipino, American, Hispanic
Kinds of Variables

Quantitative Variable
A quantitative variable may take any value from a given set of values. It has actual units of measure.
Examples: Height, Ages, Family Size
Kinds of Variables

Quantitative Variables

• Discrete
Number of overweight persons
0, 1, 2, 3, …
• Continuous
Weight in kilograms
65.6 kg, 55.34 kg, 100 kg, ¾ kg, …
Exercises:

For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.

1. A badminton player wants to know


his average score for the past 10
games.

Answer:
Descriptive Statistics

2. A car manufacturer wishes to estimate the


average lifetime of batteries by testing a
sample of 50 batteries.
Answer:
Inferential Statistics

3. Janine wants to determine the variability of her six


exam scores in Physics

Answer:
Descriptive Statistics

4. A shipping company wishes to estimate the number of


passengers traveling via their ships next year using their
data on the number of passengers in the past three years.

Answer:

Inferential Statistics

5. A politician wants to determine the total number


of votes his rival obtained in the past election based
on his copies of the tally sheet of electoral returns.

Answer:
Descriptive Statistics
Levels of Measurement

MEASUREMENT
is a set of rules for assigning numbers to attributes of observations.

It is structured in such a way that the existing relationship between the observations is preserved in the numbers assigned to them.
About Measurement

Levels of Measurement

 Nominal

 Ordinal

 Interval

 Ratio
Levels of Measurement

Nominal scale

 Is the simplest scale of


measurement where a value or unit
of data is assigned to one of at
least two qualitative classes or
categories.
Levels of Measurement

SEX

MALE FEMALE MALE

1 2 1

RULE: Identification
LEVEL: Nominal
Levels of Measurement

Examples of nominal scales:
• The psychiatric system of diagnostic groups: Schizophrenic, Paranoid, Manic-depressive, Psychoneurotic
• Jersey numbers: 28, 16
• Automobile license plates: NJB020401, NUU112900
• Employment classification: 1 - Educator, 2 - Construction worker, 3 - Manufacturing worker, 4 - Lawyer, 5 - Doctor, 6 - Others
Levels of Measurement

Nominal scale
Conditions:

1. Exhaustive – every value or unit


of data can be assigned to a
category.

2. Mutually exclusive – it is not


possible to assign a value to more
than one category because the
categories do not overlap.
Levels of Measurement

Ordinal scale

 It involves placement of values or


codes in some rank order to create
an ordinal scale variable.
 The relationship between observations
takes on the form of “greater than” and
“less than” or “higher than” and “lower
than”
Levels of Measurement

EDUCATION

ELEM HS COLLEGE

1 2 3

Rule: MAGNITUDE
Level: ORDINAL
Levels of Measurement

Nominal                          Ordinal
qualitative                      qualitative
Exhaustive/Mutually exclusive    Exhaustive/Mutually exclusive
equal in value                   ranked
Team A = Team B = Team C         1st place > 2nd place > 3rd place
Levels of Measurement

Examples of ordinal scales:
• Social Class: Lower, Middle, Upper
• Academic Grades: A, B, C, D, E, F
• Intensity of attitude: Strongly agree, agree, neutral, disagree, strongly disagree
• Quality of Service: (5) Excellent, (4) Very Satisfactory, (3) Satisfactory, (2) Needs Improvement, (1) Poor
Levels of Measurement

Interval scale

• The assignment of numbers to observations is based not only on the order in which they possess a certain attribute but also indicates exactly how much of the attribute they possess.

• In this measurement we can determine how many units of difference there are from one rank to the next.
Levels of Measurement

GRADES IN STATISTICS

80 85 90

80 85 90

Rule: INTERVAL
Level: INTERVAL
Levels of Measurement

Interval scale
 zero point has no meaning

Example:

Celsius -18 0 10 30 100


Fahrenheit 0 32 50 86 212
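The temperature table above can be used to check why ratios are not meaningful on an interval scale. The sketch below uses the standard conversion F = C × 9/5 + 32; the function name is an illustrative assumption. Note that -18 °C converts to exactly -0.4 °F, which the table rounds to 0.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


celsius = [-18, 0, 10, 30, 100]
fahrenheit = [celsius_to_fahrenheit(c) for c in celsius]
print(fahrenheit)

# Ratios depend on the arbitrary zero point, so they disagree between scales:
ratio_c = 30 / 10                                                    # 3.0
ratio_f = celsius_to_fahrenheit(30) / celsius_to_fahrenheit(10)      # 1.72
print(ratio_c, ratio_f)

# Ratios of differences do not depend on the zero point, so they agree:
diff_ratio_c = (30 - 10) / (10 - 0)
diff_ratio_f = (fahrenheit[3] - fahrenheit[2]) / (fahrenheit[2] - fahrenheit[1])
print(diff_ratio_c, diff_ratio_f)
```

"30 °C is three times as hot as 10 °C" is therefore not a valid statement, while "the gap from 10 °C to 30 °C is twice the gap from 0 °C to 10 °C" holds on both scales.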
Levels of Measurement

Ratio Measurement

• Has all the features of an interval scale.
• Requires an absolute, fixed, and non-arbitrary zero point.
• The ratio of two numbers is meaningful.
Levels of Measurement

WEEKLY INCOME

P 2,000 P 2,500 P0

2,000 2,500 0

Rule: ABSOLUTE ZERO


Level: RATIO
Levels of Measurement

Ratio Measurement

Examples of ratio measurements:
• Weight: 2 kg, 40 lbs, ounce
• Age: 90, 80, 85
• Height, Time, Volume
• Years of school completed
• Per capita GNP
• Number of children born
• Weeks of unemployment
• Years in present job
• Travel time to work (minutes)
Levels of Measurement

COMPARATIVE SUMMARY

Level       Property added at this level
Nominal     Attributes are labels only
Ordinal     Attributes can be ordered
Interval    Distance between attributes is meaningful
Ratio       Absolute zero
Levels of Measurement

IMPORTANCE OF UNDERSTANDING THE


LEVELS OF MEASUREMENT

1. Helps you decide how to interpret


the data.
2. Helps you decide what statistical
analysis is appropriate on the
values that were assigned.
Types of Data

2 Types

 PRIMARY DATA
 SECONDARY DATA
Types of Data

PRIMARY DATA

• Any set of data or information that is directly collected from the source (informants, respondents, or records).
• Government statistical agencies are given the responsibility to collect, publish, and disseminate statistical series.
Types of Data

SECONDARY DATA

• Data are provided directly by an organization or government agency in a convenient form, such as a written report.

• Data that are processed and re-processed by individuals or entities from sources other than the primary source of information.
DATA
CLASSIFICATION
Types of Data
Data sets can consist of two types of data:
qualitative data and quantitative data.
Data
• Qualitative data: consists of attributes, labels, or nonnumerical entries.
• Quantitative data: consists of numerical measurements or counts.
Qualitative and Quantitative Data

 Example:
 The grade point averages of five students are listed in the table. Which data are qualitative data and which are quantitative data?

Student    GPA
Sally      3.22
Bob        3.98
Cindy      2.75
Mark       2.24
Kathy      3.84

The student names are qualitative data; the GPAs are quantitative data.
Qualitative and Quantitative Data

Identify which represent qualitative variables and which represent quantitative variables.

1. hair color
2. height, weight
3. time in the 100 yard dash
4. religion
5. number of items sold to a shopper
6. political party, profession
Levels of Measurement
The level of measurement determines which statistical
calculations are meaningful. The four levels of
measurement are: nominal, ordinal, interval, and ratio.

Levels of measurement, from lowest to highest:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Nominal Level of Measurement

Data at the nominal level of measurement are qualitative only.

Nominal: Categorized using names, labels, or qualities. No mathematical computations can be made at this level.

Examples:
• Colors in the US flag
• Names of students in your class
• Textbooks you are using this semester
Ordinal Level of Measurement
Data at the ordinal level of measurement are qualitative or quantitative.

Ordinal: Arranged in order, but differences between data entries are not meaningful.

Examples:
• Class standings: freshman, sophomore, junior, senior
• Numbers on the back of each player’s shirt
• Top 50 songs played on the radio
Interval Level of Measurement

Data at the interval level of measurement are quantitative. A zero entry simply represents a position on a scale; the entry is not an inherent zero.

Interval: Arranged in order, and the differences between data entries can be calculated.

Examples:
• Temperatures
• Years on a timeline
• Atlanta Braves World Series victories
Ratio Level of Measurement
Data at the ratio level of measurement are similar to the interval level, but a zero entry is meaningful.

Ratio: A ratio of two data values can be formed so one data value can be expressed as a multiple of another.

Examples:
• Ages
• Grade point averages
• Weights
Summary of Levels of Measurement

Level of       Put data in    Arrange data    Subtract       Determine if one data value
measurement    categories     in order        data values    is a multiple of another
Nominal        Yes            No              No             No
Ordinal        Yes            Yes             No             No
Interval       Yes            Yes             Yes            No
Ratio          Yes            Yes             Yes            Yes
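The summary table has a cumulative pattern: each level allows every operation of the levels below it plus one more. One way to encode that, as a small sketch with illustrative names (`LEVELS`, `OPERATIONS`, and `is_meaningful` are assumptions, not part of the slides):

```python
# Levels and operations listed in the same order as the summary table.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
OPERATIONS = ["categorize", "order", "subtract", "ratio"]


def is_meaningful(level: str, operation: str) -> bool:
    """Return True if the operation is meaningful at the given level.

    An operation is allowed when its position in OPERATIONS is not
    beyond the position of the level in LEVELS.
    """
    return OPERATIONS.index(operation) <= LEVELS.index(level)


# Rebuild the Yes/No table row by row
for level in LEVELS:
    row = ["Yes" if is_meaningful(level, op) else "No" for op in OPERATIONS]
    print(f"{level:<10}", *row)
```

A check like this can guard an analysis: for example, refusing to compute a ratio of two values when the variable is only at the interval level.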
STATISTICS APPLIED
TO RESEARCH
SAMPLING DESIGN: Basic
Concepts and Procedure

The goal in sampling is to obtain individuals for a study in such a way that accurate information about the population can be obtained.
Reason for Sampling

• It is important that the individuals included in a sample represent a cross section of individuals in the population.
• If the sample is not representative, it is biased: you cannot generalize to the population from your statistical data.
Definition:

Sampling technique/Sampling
Strategies
It is a plan you set forth to be sure that the
sample you use in your research study
represents the population from which you
drew your sample.
Definition:
Sampling Frame
This is the list of the elements in your
population and from this your sample is drawn.

Sampling Bias
This involves problems in your sampling,
which reveals that your sample is not
representative of your population.
Selection Bias

1. Deliberately or purposively selecting a “representative” sample.
2. Misspecifying the target population.
3. Failing to include all of the target population in the sampling frame, called undercoverage.
4. Including population units in the sampling frame that are not in the target population, called overcoverage.
Selection Bias

5. Having multiplicity of listings in the sampling


frame.
6. Substituting a convenient member of a population
for a designated member who is not readily
available.
7. Failing to obtain responses from all of the chosen
sample. (Nonresponse)
8. Allowing the sample to consist entirely of
volunteers.
Advantages of Sampling Over Complete Enumeration

• Less Labor
• Reduced Cost
• Greater Speed
• Greater Scope
• Greater Efficiency and Accuracy
• Convenience
• Ethical Considerations
Two Types of Samples

1. Probability Sample
2. Non-Probability Sample
Probability Samples

• Samples are obtained using some objective chance mechanism, thus involving randomization.
• They require the use of a complete listing of the elements of the universe, called the sampling frame.
Probability Samples

• The probabilities of selection are known.


• They are generally referred to as random
samples.
• They allow drawing of valid generalizations
about the universe/ population.
Non-Probability Samples

• Samples are obtained haphazardly,


selected purposively or are taken as
volunteers.
• The probabilities of selection are
unknown.
• They should not be used for statistical
inference.
Sampling Procedure

 Identify the population.


 Determine if population is accessible.
 Select a sampling method.
 Choose a sample that is representative of
the population.
 Ask the question, can I generalize to the
general population from the accessible
population?
Basic Sampling Techniques of
Probability Sampling
• Simple Random Sampling
• Systematic Random
Sampling
• Stratified Random Sampling
• Cluster Sampling
• Multi-stage Sampling
Simple Random Sampling

• Most basic method of drawing a probability sample
• Assigns equal probabilities of selection to each possible sample
• Results in a simple random sample
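A minimal sketch of drawing a simple random sample in Python, assuming the sampling frame is available as a list. `random.sample` draws without replacement, so every possible sample of size n is equally likely. The function name, the `seed` parameter, and the frame of student numbers are illustrative assumptions.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(frame), n)


# Hypothetical sampling frame: student numbers 1 to 500
frame = range(1, 501)
srs = simple_random_sample(frame, 50, seed=2024)
print(sorted(srs))
```

Fixing the seed is only for reproducibility in demonstrations; in an actual study the draw would normally be left unseeded or documented.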
Simple Random Sampling
Advantage:
It is very simple and easy to use.
Disadvantage:
It can be difficult to gain access to a list of a larger population, and the method can be time consuming and expensive.
When to Use:
This is preferable if the population is not widely spread geographically. It is also more appropriate if the population is more or less homogeneous with respect to the characteristics of interest.
Systematic Random Sampling

• It is obtained by selecting every kth individual from the population.
• The first individual selected corresponds to a random number between 1 and k.
Systematic Random Sampling
Obtaining a Systematic Random Sample
Advantage:
Drawing of the sample is easy. It is easy to
administer in the field, and the sample is spread
evenly over the population.
Disadvantage:
May give poor precision when unsuspected
periodicity is present in the population.
When to Use:
This is advisable to use if the ordering of the population is essentially random and when stratification with numerous data is used.
Example:

We want to select a sample of 50 students from 500 students. Under this method, every kth item is picked from the sampling frame, where k = N/n = 500/50 = 10.
Solution:

We start the sample at a random number i between 1 and k and take every kth unit after it. Suppose the random number i is 5; then we select 5, 15, 25, 35, 45, . . .
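The procedure in this example can be sketched as follows, assuming the sampling interval is k = N/n (here 500/50 = 10) and positions in the frame are counted from 1. The function name and its `start` and `seed` parameters are illustrative assumptions.

```python
import random


def systematic_sample(frame, n, start=None, seed=None):
    """Select every kth unit from the frame after a random start in 1..k."""
    frame = list(frame)
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start, inclusive
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Positions are 1-based: start, start + k, start + 2k, ...
    return [frame[pos - 1] for pos in range(start, start + k * n, k)]


# 500 students numbered 1 to 500, sample of 50, random start fixed at 5
students = range(1, 501)
selected = systematic_sample(students, 50, start=5)
print(selected[:5])   # [5, 15, 25, 35, 45]
print(len(selected))  # 50
```

With a random start of 5 and k = 10, the selected positions are 5, 15, 25, and so on up to 495.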
Stratified Random Sampling

• It is obtained by separating the population


into non-overlapping groups called strata
and then obtaining a simple random
sample from each stratum.
• The individuals within each stratum
should be homogeneous (or similar) in
some way.
Stratified Random Sampling
Advantage:
The selection of units using a stratified procedure
adds greater precision because it improves the
potential for the units to be more evenly spread
over the population.
Disadvantage:
Values of the stratification variable may not be easily available for all units in the population, especially if the characteristic of interest is homogeneous. It is possible that there are no representatives in one or two strata. Also, transportation costs can be high if the population covers a wide geographic area.
Polytechnic University of the Philippines
When to Use:

We need to have information in the sampling


frame that can be used to form the strata. For
each group, we need to know how many and
which members of the population belong to that
group. When such information is available, it is
easy to use stratified random sampling.
Example:

A sample of 50 students is to be drawn


from a population consisting of 500
students belonging to two institutions A
and B. The number of students in the
institution A is 200 and the institution B
is 300. How will you draw the sample
using proportional allocation?
Solution:
There are two strata in this case.

Given: N1 = 200, N2 = 300, N = 500, n = 50

If n1 and n2 are the sample sizes, then

$$n_1 = \frac{n}{N} N_1 = \frac{50}{500}(200) = 20$$

$$n_2 = \frac{n}{N} N_2 = \frac{50}{500}(300) = 30$$

The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be selected by simple random sampling.
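The proportional allocation in this solution, followed by a simple random sample within each stratum, can be sketched as below. The function names and the student identifiers (`A-1`, `B-1`, ...) are hypothetical. The allocation is rounded to whole units, which is exact in this example but in general may need adjusting so the stratum sizes still add up to n.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of n across strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())  # N
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}


# Institutions A and B from the example, with hypothetical student IDs
strata = {
    "A": [f"A-{i}" for i in range(1, 201)],  # N1 = 200
    "B": [f"B-{i}" for i in range(1, 301)],  # N2 = 300
}

allocation = proportional_allocation({"A": 200, "B": 300}, 50)
print(allocation)  # {'A': 20, 'B': 30}

sample = stratified_sample(strata, 50, seed=7)
print({name: len(units) for name, units in sample.items()})
```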
Cluster Sampling

• You take the sample from naturally


occurring groups in your population.
• The clusters are constructed such that the
sampling units are heterogeneous within
the cluster and homogeneous among the
clusters.
Cluster Sampling
Obtaining a Cluster Sample

1.Divide the population into non-overlapping clusters.


2.Number the clusters in the population from 1 to N.

3.Select n distinct numbers from 1 to N using a


randomization mechanism. The selected clusters are the
clusters associated with the selected numbers.
4.The sample will consist of all the elements in the
selected clusters.
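The four steps above can be sketched in Python as follows. This is a one-stage cluster sample in which every element of a selected cluster is included. The function name and the cluster contents (sections and student labels) are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters at random and keep all of their elements."""
    rng = random.Random(seed)
    # Steps 2 and 3: list the clusters, then pick n distinct ones at random
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 4: the sample is every element in the selected clusters
    elements = [unit for name in chosen for unit in clusters[name]]
    return chosen, elements


# Step 1: a population already divided into non-overlapping clusters
clusters = {
    "Section 1": ["s01", "s02", "s03"],
    "Section 2": ["s04", "s05"],
    "Section 3": ["s06", "s07", "s08", "s09"],
    "Section 4": ["s10", "s11"],
}

chosen, elements = cluster_sample(clusters, 2, seed=3)
print(chosen)
print(elements)
```

Only the list of clusters is needed before sampling; the individual units are enumerated only for the clusters that were selected.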
Advantage:
There is no need to come up with a list of units in the population; all that is needed is a list of the clusters. It is also less costly since the elements are physically closer together.
Disadvantage:
In actual field applications, adjacent households tend to have more similar characteristics than households that are far apart.
When to Use:
This is preferable to use if the population can be grouped into clusters where individual population elements are known to be different with respect to the characteristics under study.
Example:
A researcher wants to survey academic performance of
high school students in MIMAROPA.

1. The researcher can divide the entire population into different clusters (Mindoro, Marinduque, Romblon, and Palawan). There are 4 clusters.

2. Then the researcher selects a number of clusters, depending on the research, through simple or systematic random sampling.
3. Then, from the selected clusters, the researcher can either include all the high school students as subjects or select a number of subjects from each cluster.
Multi-Stage Sampling

Selection of the sample is done in two or


more steps or stages, with sampling units
varying in each stage.
Multi-Stage Sampling
Obtaining a Multi-Stage Sample
1.Organize the sampling process into stages
where the unit of analysis is systematically
grouped.

2.Select a sampling technique for each


stage.
3.Systematically apply the sampling
technique to each stage until the unit of
Advantage:

Transportation costs are greatly reduced since there is some form of clustering among the ultimate or final samples; i.e., they are in the same lower-stage units.
Disadvantage:
Due to the fact that multi-stage sampling cuts out
portions of the population from the study, the study’s
findings can never be 100 percent representative of the
population.

When to Use:
If the population covers a wide area.
Example:

https://research-methodology.net/sampling-in-primary-data-collection/multi-stage-sampling/
Example:

A researcher wants to survey academic performance of high


school students in MIMAROPA.
1. The researcher can divide the entire population into different clusters (Mindoro, Marinduque, Romblon, and Palawan). There are 4 clusters.
2. Then the researcher selects a number of clusters, depending on the research, through simple or systematic random sampling.
3. Then, from the selected clusters, the researcher can either include all the high school students as subjects or select a number of subjects from each cluster.
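A two-stage version of this example can be sketched as below: clusters are selected first, then students are sampled within each selected cluster. The function name, the student identifiers, and the cluster sizes are hypothetical; only the four cluster names come from the example.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        # Never ask for more units than the cluster contains
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical frames of high school students in each MIMAROPA cluster
mimaropa = {
    "Mindoro": [f"MIN-{i}" for i in range(1, 41)],
    "Marinduque": [f"MAR-{i}" for i in range(1, 21)],
    "Romblon": [f"ROM-{i}" for i in range(1, 26)],
    "Palawan": [f"PAL-{i}" for i in range(1, 51)],
}

stage_two = two_stage_sample(mimaropa, n_clusters=2, n_per_cluster=10, seed=11)
for cluster_name, students in stage_two.items():
    print(cluster_name, students)
```

Including every student from the selected clusters instead of sampling within them would turn this back into a one-stage cluster sample.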
References:
http://www.economicsdiscussion.net/statistics/sampling/advantages-of-sampling-over-complete-enumeration-in-statistics/11980
http://www.natco1.org/research/files/SamplingStrategies.pdf
https://data36.com/statistical-bias-types-explained/

Statistics: Informed Decisions Using Data by Michael Sullivan III, Fifth Edition
Sampling: Design and Analysis by Sharon L. Lohr, Second Edition
UP Mathematics – Ms. Katrina D. Elizon
