
Engineering Introduction To Statistics 2024

Uploaded by Malack Chagwa
Copyright © All Rights Reserved

PROBABILITY AND STATISTICS

INTRODUCTION TO STATISTICS

WEEK 1
Introduction
• In simple daily language, the word statistics
is synonymous with numerical information or
facts.
• For example, the number of goals scored by a
football player in a season, the median salary of
employees in a company, the average number
of deaths due to COVID-19 in a year, or the
percentage of households affected by floods.
Definition
• In this usage, the term statistics refers to
numerical facts such as frequencies, averages
and percentages that help us understand a
variety of situations.
• These statistics are usually informative and
time-saving because they condense large
quantities of information into a few simple
figures.
Statistics as a field of Study (Definition)
• However, in this course we will look at
Statistics in a broader sense: as a field of study.
• As a field of study, Statistics is the art or
science of collecting, organizing, and
summarizing data for analysis and prediction
in order to make effective decisions.
WHY STUDY STATISTICS
• Methods and techniques in statistics allow
engineers of all kinds to
i. make consistent machines or products,
ii. detect problems,
iii. understand phenomena subject to
variation,
iv. and predict the behaviour of systems.
TWO MAJOR BRANCHES/TYPES OF STATISTICS

•1. Descriptive Statistics


•2. Inferential Statistics
Descriptive Statistics
• Involves collection, organization, summarization, and presenting
of data in an informative way.
• Masses of unorganized data, such as a population census
or the weekly earnings of thousands of bankers, are of little
value as they are.
• However, statistical techniques are available to organize this
type of data into a meaningful form, such as frequency
distributions and charts.
• Specific measures of central location, such as the mean, mode
and median, describe the central value of a group of numerical
data.
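To make the three measures of central location concrete, here is a minimal sketch using Python's standard `statistics` module. The data are hypothetical (goals scored by one player in nine matches), chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data: goals scored by one player in nine matches (illustrative only)
goals = [0, 1, 2, 1, 3, 1, 0, 4, 6]

print("mean:", mean(goals))      # sum of values / number of values
print("median:", median(goals))  # middle value once the data are sorted
print("mode:", mode(goals))      # most frequent value
```

Here the mean (2) is larger than the median and the mode (both 1) because the single high value, 6, pulls the mean upward. The median and mode are not affected by it in the same way.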
Descriptive statistics (Cont’d)
• A number of statistical measures, such as the
variance and standard deviation, are used to
describe how closely the data cluster about an average.
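The sketch below computes the variance and standard deviation for the same hypothetical goals data, again with Python's `statistics` module. The slides do not distinguish between the two common forms, but the module provides both. The population versions divide the sum of squared deviations by N. The sample versions divide it by n - 1.

```python
from statistics import pvariance, pstdev, variance, stdev

# Same hypothetical data as before; the mean is 2
goals = [0, 1, 2, 1, 3, 1, 0, 4, 6]

# Squared deviations from the mean sum to 32
print("population variance:", pvariance(goals))  # 32 / 9
print("population std dev:", pstdev(goals))      # square root of the above
print("sample variance:", variance(goals))       # 32 / (9 - 1) = 4
print("sample std dev:", stdev(goals))           # square root of 4 = 2
```

A small standard deviation means the values sit close to the mean. A large one means they are spread out.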
Inferential Statistics
• Generally, inferential statistics are methods that
use data to make estimates, decisions and
predictions.
• However, this definition is somewhat incomplete.
• Let's complete it by first looking at
populations and samples.
Inferential statistics (cont’d)
• A population consists of all subjects (human or
otherwise) that are being studied.
• A sample is a group of subjects selected from a
population.
• In the majority of cases in statistics, we study data
from a sample instead of a population because…
Inferential statistics (Cont’d)
1. Reduced time and costs
2. Destructive nature of some studies
3. It is sometimes impossible to check all elements
in the population
Inferential statistics (Cont’d)
• Therefore we use sampled data to make generalizations
(inferences) about the population. The definition of
inferential statistics then changes to…
• Inferential statistics are methods that use sample data
to make estimates, decisions, predictions, or other
generalizations about a population.
Inferential Statistics (Cont’d)
• Parameter vs Statistic
• A parameter is a summary measure describing a
specific characteristic of a population. It is computed
using population data, e.g. the population mean (𝜇, read
as mu).
• A statistic is a summary measure describing a
specific characteristic of a sample. It is computed
using sample data, e.g. the sample mean (𝑥̅, read as
x-bar).
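The difference between a parameter and a statistic can be shown with a simulation. This is a sketch only. The "population" of 1,000 bulb lifetimes is randomly generated for illustration, and the sample size of 50 is an arbitrary choice.

```python
import random
from statistics import mean

# Hypothetical population: lifetimes (hours) of 1,000 bulbs, simulated for illustration
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.gauss(1200, 100) for _ in range(1000)]

mu = mean(population)                # parameter: computed from every unit in the population
sample = rng.sample(population, 50)  # a simple random sample of 50 units
x_bar = mean(sample)                 # statistic: computed from the sampled units only

print(f"population mean (mu)  = {mu:.1f}")
print(f"sample mean (x-bar)   = {x_bar:.1f}")
```

The sample mean will usually be close to the population mean, but not equal to it. Inferential statistics deals with how far such a statistic can be trusted as an estimate of the parameter.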
SAMPLING TECHNIQUES

• There are two main groups of sampling
techniques:
1. Probabilistic sampling techniques
2. Non-probabilistic sampling techniques
Probabilistic techniques
• Probability sampling methods are methods in which
each unit in the population has a known
probability of being selected.
• All probabilistic sampling methods require a
sampling frame.
• A sampling frame is a complete list of all the
units (people, households, geographical areas,
etc.) of the population of interest.
Probabilistic techniques (cont’d)
• There are many probabilistic methods; below
are the ones we'll look at:
• Simple Random Sampling (SRS)
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
• Multi-stage Sampling
Simple Random Sampling (SRS)
• In this method, each unit in the population has
an equal chance of being selected.
• The most common ways of collecting a simple
random sample are through
❖Numbered papers in a bowl
❖Randomly generated numbers
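The "randomly generated numbers" approach can be sketched in Python with `random.Random.sample`, which draws without replacement so that every unit has the same chance of selection. The frame of units numbered 1 to 400 and the function name `simple_random_sample` are hypothetical choices for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance of selection."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1 to 400
frame = list(range(1, 401))
sample = simple_random_sample(frame, 100, seed=1)
print(sorted(sample)[:10])  # first few selected unit numbers
```

This is the electronic version of drawing numbered papers from a bowl without putting them back.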
Simple Random Sampling (SRS) (Cont’d)
• ADV
❖ Generally accepted as unbiased, even by the
layman.

• DISADV
❖Expensive, and sometimes unrealistic, when the
sampling frame isn't readily available and the population
is large
❖Risk that characteristics/attributes significant to the
study are underrepresented
Systematic Sampling
• This method selects every Kth unit in the
population for the sample, where the interval K
depends on the required sample size.
• STEPS
1. Number the units on your frame from 1
to N (where N is the total population size).
2. Determine the sampling interval (K) by dividing
the number of units in the population by the
desired sample size.
Systematic Sampling (Cont’d)
3. Select a number between 1 and K at
random. This number is called the
random start, and it is the first
number included in your sample.
4. Select every Kth unit after that first
number.
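Steps 1 to 4 can be sketched as a short Python function. To keep the sketch simple, it assumes that N is an exact multiple of the sample size n, so that K = N / n is a whole number. The sketch rejects any other case rather than guessing a rounding rule. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample of size n from a frame whose units are numbered 1..N."""
    N = len(frame)                      # step 1: units are numbered 1..N
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                          # step 2: sampling interval K = N / n
    rng = random.Random(seed)
    start = rng.randint(1, k)           # step 3: random start between 1 and K inclusive
    # step 4: take units start, start + K, start + 2K, ... up to N
    return [frame[i - 1] for i in range(start, N + 1, k)]

frame = list(range(1, 401))            # hypothetical frame, N = 400
sample = systematic_sample(frame, 100, seed=3)
print(sample[:5])
```

Only the random start is chosen by chance. Once it is fixed, every other selected unit follows from it.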
Systematic Sampling (Cont’d)
• DISCUSSION
1. Suppose we want to take a sample of 100
units out of 400 (where N = 400).
i. How would you go about it using SRS?
ii. How would you go about it using Systematic
sampling?
iii. Compare and contrast the results of the
methods within this particular context
Systematic sampling (Cont’d)
• ADV
❖Ideal for quality assurance on production lines.
❖Fairly easy when a sampling frame is available.

• DISADV
❖ Expensive when sampling frame isn’t readily
available.
❖A periodic pattern in the population that coincides with the
sampling interval could lead to an unrepresentative sample.
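The periodicity problem can be illustrated with a small hypothetical scenario that is not taken from the slides. Four machines feed a production line in strict rotation, and only machine 4 produces defective items. With an interval of K = 4, the random start decides which single machine gets inspected.

```python
# Hypothetical production line: four machines feed the line in strict rotation
# (item 1 from machine 1, item 2 from machine 2, ..., item 5 from machine 1 again),
# and only machine 4 produces defective items.
N, K = 400, 4
defective = {i: (i - 1) % 4 + 1 == 4 for i in range(1, N + 1)}
true_rate = sum(defective.values()) / N  # 100 defective items out of 400

def estimated_rate(start):
    """Defect rate observed in a systematic sample with the given random start and interval K."""
    sample = range(start, N + 1, K)
    return sum(defective[i] for i in sample) / len(sample)

for start in range(1, K + 1):
    print(f"start={start}: estimate {estimated_rate(start):.2f}, true rate {true_rate:.2f}")
```

Every possible start gives an estimate of either 0.00 or 1.00, while the true rate is 0.25. The pattern in the line lines up exactly with the sampling interval.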
Stratified Sampling
• When using stratified sampling, the
population is divided into homogeneous,
mutually exclusive groups called strata, and
then independent samples are selected from
each stratum.
• A population can be stratified by any variable
for which a value is available for all units on the
sampling frame prior to sampling (e.g. age,
gender, and income).
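Here is a minimal sketch of stratified sampling in Python. It groups the frame into strata and then draws an independent simple random sample from each stratum. The slides do not specify how many units to take per stratum. This sketch assumes proportional allocation, meaning the same sampling fraction in every stratum. The frame, the `stratified_sample` name and the 10% fraction are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Independent simple random sample from each stratum.

    Assumes proportional allocation: the same sampling fraction in every
    stratum, rounded to a whole number of units (at least 1 per stratum).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)  # each unit falls in exactly one stratum
    sample = {}
    for name, units in strata.items():
        n_h = max(1, round(fraction * len(units)))
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical frame: 300 employees whose gender is known before sampling
frame = [(emp_id, "F" if emp_id <= 120 else "M") for emp_id in range(1, 301)]
result = stratified_sample(frame, stratum_of=lambda u: u[1], fraction=0.1, seed=7)
for name, units in result.items():
    print(name, len(units))
```

Because every stratum is sampled, neither group can be left out by chance, which is the main advantage listed on the next slide.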
Stratified sampling (Cont’d)
• ADV
❖Ensures that all subgroups important to the study
are represented.
❖Ensures an adequate sample size for subgroups of
interest in the population.

• DISADV
❖Expensive if sampling frame isn’t readily available.
❖Selection of important strata may be subjective.
Cluster Sampling
• Cluster sampling divides the population into groups or
clusters. A number of clusters are selected randomly to
represent the total population, and then all units within
selected clusters are included in the sample.
• No units from non-selected clusters are included in the
sample. They are represented by those from selected
clusters. This differs from stratified sampling, where some
units are selected from each stratum.
• Examples of clusters are factories, schools and
geographic areas such as electoral subdivisions.
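In the following sketch, clusters are chosen at random and then every unit in each chosen cluster is kept. The schools and pupils are hypothetical. The clusters deliberately differ in size, so the final sample size depends on which clusters happen to be drawn.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters and include every unit in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)           # clusters picked at random
    units = [u for c in chosen for u in clusters[c]]   # all units of the chosen clusters
    return chosen, units

# Hypothetical clusters: six schools with different numbers of pupils
schools = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
    "C": ["C1", "C2", "C3", "C4"],
    "D": ["D1"],
    "E": ["E1", "E2", "E3"],
    "F": ["F1", "F2"],
}
chosen, sample = cluster_sample(schools, m=2, seed=5)
print(chosen, len(sample))
```

Units in the non-selected schools never appear in the sample. Depending on the draw, the sample could contain anywhere from 3 to 7 pupils.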
Cluster Sampling (cont’d)

• ADV
❖It is cost effective
❖Clusters are easy to create even when an
extensive sampling frame isn’t available
• DISADV
❖There is no control over the final sample
size
Multistage Sampling
• Multi-stage sampling is like cluster sampling,
except that it involves selecting a sample within
each selected cluster, rather than including all
units from the selected clusters.
• This type of sampling requires at least two stages.
In the first stage, large clusters are identified and
selected. In the second stage, units are selected
from within the selected clusters using any of the
probability sampling methods.
Multistage Sampling (Cont’d)
• In this context, the clusters are referred
to as primary sampling units (PSU) and
units within clusters are referred to as
secondary sampling units (SSU). When
there are more than two stages, tertiary
sampling units (TSU) are selected within
SSUs, and the process continues until
there is a final sample.
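Here is a two-stage sketch in Python. In stage 1, a simple random sample of PSUs (clusters) is drawn. In stage 2, a simple random sample of SSUs is drawn within each selected PSU. The slides allow any probability method at the second stage, and using SRS at both stages is just one assumed choice. The districts, households and function name are hypothetical.

```python
import random

def two_stage_sample(clusters, m, n_per_psu, seed=None):
    """Stage 1: SRS of m PSUs. Stage 2: SRS of up to n_per_psu SSUs inside each selected PSU."""
    rng = random.Random(seed)
    psus = rng.sample(sorted(clusters), m)
    return {
        psu: rng.sample(clusters[psu], min(n_per_psu, len(clusters[psu])))
        for psu in psus
    }

# Hypothetical PSUs: five districts, each listing 20 households (the SSUs)
districts = {f"D{d}": [f"D{d}-H{h}" for h in range(1, 21)] for d in range(1, 6)}
sample = two_stage_sample(districts, m=2, n_per_psu=5, seed=11)
for psu, ssus in sample.items():
    print(psu, ssus)
```

Unlike cluster sampling, only some of the units inside each selected cluster end up in the final sample.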
Multistage Sampling (Cont’d)
• ADV
❖It is cost effective
❖Clusters are easy to create even when an
extensive sampling frame isn’t available
• DISADV
❖The sample is not as concentrated as in cluster
sampling
Non-Probabilistic Sampling
• Non-probability sampling is a method of selecting units
from a population using a subjective (i.e. non-random)
method.
• There are many non-probabilistic methods; below are the
ones we'll look at:
• Convenience Sampling
• Volunteer Sampling
• Judgement Sampling
• Quota Sampling
• Snowball Sampling
Convenience Sampling
• Units are selected in an arbitrary manner with
little or no planning involved. Convenience
sampling assumes that the population units are
all alike, so any unit may be chosen for the
sample.
• An example of convenience sampling is the vox
pop survey where the interviewer selects any
person who happens to walk by.
Convenience Sampling (Cont’d)
• ADV
❖Quick and convenient
❖Inexpensive
• DISADV
❖Noncoverage (undercoverage) bias
Volunteer Sampling
• In this method, the respondents are all
volunteers. Generally, volunteers must be screened
so as to get a set of characteristics suitable for the
purposes of the survey (e.g. individuals with a
particular disease).
Volunteer Sampling (Cont’d)
• An example of volunteer sampling is callers to a
radio or television show, when an issue is
discussed and listeners are invited to call in to
express their opinions. Only the people who care
strongly enough about the subject one way or
another tend to respond. Volunteer sampling is
often used to select individuals for focus groups
or in-depth interviews (i.e. for qualitative testing,
where no attempt is made to generalize to the
whole population).
Volunteer Sampling (Cont’d)
•ADV
❖Quick and convenient
❖Inexpensive
❖Reduces respondent burden
•DISADV
❖Self-selection bias
Judgement Sampling
• With this method, sampling is done based on
previous ideas of population composition and
behaviour. An expert with knowledge of the
population decides which units in the population
should be sampled. In other words, the expert
purposely selects what is considered to be a
representative sample.
Judgement Sampling (Cont’d)
•ADV
❖Quick and convenient
❖Inexpensive
❖Useful for exploratory studies
•DISADV
❖Selection bias
Quota Sampling
•This is one of the most common forms of
non-probability sampling. Sampling is
done until a specific number of units
(quotas) for various subpopulations have
been selected. Quota sampling is a
means for satisfying sample size
objectives for the subpopulations.
Quota Sampling (Cont’d)
• Quota sampling is somewhat similar to stratified
sampling, which is probability sampling, in that
similar units are grouped together. However, it
differs in how the units are selected. In probability
sampling, the units are selected randomly, while in
quota sampling a non-random method is used; it
is usually left up to the interviewer to decide who
is sampled.
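The following sketch shows how quota selection proceeds. People are taken in whatever order the interviewer happens to reach them, which is not random. Each person is accepted only if the quota for their subgroup is still open, and selection stops once all quotas are full. The respondents, age groups and quota sizes are hypothetical.

```python
def quota_sample(respondents, group_of, quotas):
    """Accept respondents in the order they are reached until every quota is filled."""
    filled = {g: [] for g in quotas}
    for r in respondents:                      # order chosen by the interviewer, not by chance
        g = group_of(r)
        if g in filled and len(filled[g]) < quotas[g]:
            filled[g].append(r)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break                              # stop once every quota is met
    return filled

# Hypothetical stream of people an interviewer meets, tagged by age group
people = [("P1", "18-34"), ("P2", "18-34"), ("P3", "35+"), ("P4", "18-34"),
          ("P5", "18-34"), ("P6", "35+"), ("P7", "35+"), ("P8", "18-34")]
result = quota_sample(people, group_of=lambda p: p[1], quotas={"18-34": 3, "35+": 2})
print({g: [p[0] for p in members] for g, members in result.items()})
```

The quotas match the target proportions, but P7 and P8 never had a chance of selection because the people reached first filled the quotas. This is the key difference from stratified sampling, where selection within each group is random.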
Quota Sampling (Cont’d)
• Market researchers often use quota sampling
(particularly for telephone surveys) instead of
stratified sampling to survey individuals with
particular socio-economic profiles. This is because
compared with stratified sampling, quota
sampling is relatively inexpensive and easy to
administer and has the desirable property of
satisfying population proportions.
Quota Sampling (Cont’d)
•ADV
❖Quick and convenient
❖Inexpensive
❖Satisfies population proportions
•DISADV
❖Does not take self-selection bias into
consideration
Snowball Sampling
• Suppose a researcher wishes to find rare
individuals in the population, and already knows
of the existence of some of these individuals and
how to contact them. One approach is to contact
those individuals and simply ask them if they
know anyone like themselves, then contact those
people, etc. The sample grows like a snowball
rolling down a hill to hopefully include virtually
everybody with that characteristic.
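The snowball process works like a breadth-first walk through a referral network. You start from the known members, ask each person whom they know, and contact anyone new. Below is a sketch under that framing. The referral network is hypothetical, and `max_size` is an optional cap added for illustration.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size=None):
    """Breadth-first snowball: start from known members, then contact the people they name."""
    sample, queue = [], deque(seeds)
    contacted = set(seeds)
    while queue and (max_size is None or len(sample) < max_size):
        person = queue.popleft()
        sample.append(person)
        for named in referrals.get(person, []):  # "do you know anyone like yourself?"
            if named not in contacted:
                contacted.add(named)
                queue.append(named)
    return sample

# Hypothetical referral network among members of a rare population
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
    "G": ["H"],  # nobody in the chain knows G or H
}
print(snowball_sample(referrals, seeds=["A"]))
```

Starting from A, the sample grows to A through F, but G and H are never reached. This is why the process only "hopefully" includes everybody with the characteristic.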
Snowball Sampling (Cont’d)
• Snowball sampling is useful for rare or hard-to-reach
populations, such as people with disabilities, homeless
people and drug users, and for other people who may
not belong to an organised group, such as musicians,
painters or poets, and so are not readily identified on
a survey list frame.
Snowball Sampling (Cont’d)
•ADV
❖Inexpensive
•DISADV
❖Survivorship bias
Reading Resources
• Bluman, A. G. (2009). Elementary Statistics: A
Step by Step Approach. 8th Edition. New York:
McGraw Hill. [Chapter 1 and Chapter 14]
• https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch13/prob/5214899-eng.htm
• https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch13/nonprob/5214898-eng.htm
• https://www.masterclass.com/articles/sampling-bias
Thank You.
