STA108 Chapter 1 - Introduction To Statistics
Statistics & Probability
Chapter 1
Introduction to Statistics
1.1 What is Statistics?
STATISTICS is the study of COLLECTION, ORGANIZATION,
ANALYSIS, INTERPRETATION AND PRESENTATION of data.
Statistical methods help us to make scientific and intelligent
decisions. For instance, sample data are used to forecast sales
and profit.
1.2.1 Descriptive Statistics
Consists of methods of
i) collecting data,
ii) characterizing data,
iii) presenting data by using tables and graphs, and
iv) computing summary measures,
without attempting to infer anything that goes beyond the data themselves.
Purpose: Describe data
1.2.2 Inferential Statistics
b) SAMPLE
- a portion of the population selected for study.
c) SAMPLE SURVEY
- a process for collecting data on a sample of observations which are selected from the population of interest using a probability-based sample design.
d) CENSUS
- a survey that includes every member of the population.
e) PARAMETER
- summary measure about population.
f) STATISTIC
- summary measure about sample.
Pilot study – a study done before the actual fieldwork is carried out.
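The difference between a parameter and a statistic can be illustrated with a small sketch. The income figures below are hypothetical and chosen only for illustration; the seed is fixed so the run is repeatable.

```python
import random
from statistics import mean

# Hypothetical population: monthly incomes (RM) of 12 households.
population = [2100, 3500, 1800, 4200, 2900, 5100,
              2600, 3300, 4700, 1500, 3900, 2800]

# Parameter: a summary measure computed from every member (a census).
population_mean = mean(population)

# Statistic: the same summary measure computed from a sample only.
rng = random.Random(1)              # fixed seed, an arbitrary choice
sample = rng.sample(population, 4)  # a sample of 4 households
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes from one sample to the next.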
Example:
1.4 Basic Terms
1.6 Types of Variable
1.6.1 Quantitative Variable (numerical)
- a variable that can be measured numerically.
E.g.: number of children, total income
A quantitative variable may be classified as either a discrete variable or a continuous variable.
a) Discrete variable
A variable with a countable number of possible values, obtained by counting.
It can assume only certain values, with no intermediate values.
E.g.: the number of cars sold, the number of books read
b) Continuous variable
A variable that can assume any numerical value over a certain interval or intervals.
E.g.: the time taken to finish a test, the weight of athletes
Example:
Windows is a computer software product made by Microsoft Corporation. In designing
Windows 10, Microsoft telephoned thousands of users of Windows Vista (old version)
and asked them how the product could be improved. Assume customers were asked
the following questions:
i. Are you the most frequent user of Windows in your household?
ii. Are the tutorial instructions that accompany Windows helpful?
iii. When using a printer, do you most frequently use a laser printer or another
type of printer?
iv. If the speed of Windows could be changed, which one of the following
would you prefer: slower, unchanged, or faster?
v. How many people in your household have used Windows at least once?
Each of these questions defines a variable of interest to the company. Classify the
data generated for each variable as quantitative or qualitative. Justify your
classification.
1.7.1 Nominal Scale
A scale in which the number or letter assigned to objects serves as a label for identification.
Categories (descriptions or names) only.
The order of the answers is not important.
Examples:
Gender: Male (M), Female (F)
Payment: Cash (1), Credit Card (2), Cheque (3)
Language: BM (1), Mandarin (2), English (3)
1.7.3 Interval Scale
A scale that not only arranges objects according to their magnitude, but also distinguishes this ordered arrangement in units of equal interval, so that differences between values are meaningful.
Classifies quantitative data.
Does not have a true zero point – zero is arbitrary.
Examples:
Temperature: 45°C, 47°C, 50°C, 100°C (100°C is not twice as hot as 50°C)
Year born, etc.
Standardized exam scores (IQ test scores), aptitude test scores
Summary for scale of measurement
1.8 Sampling
Sampling – the process of choosing a sample of elements from a total
population of elements.
Sampling Process:
1) Defining the population
2) Establishing the sample frame
3) Specifying the sampling method
4) Determining the sample size
5) Selecting the sample
Sampling unit – a single element or a group of elements subject to
selection in the sample.
Sampling Frame
An up-to-date list of all units in the population.
It is prepared so that samples can be selected randomly from the
population.
Example:
- List of all villages in Kedah
- List of all workers in ABC factory
- List of all students’ names in UiTM
- List of all names and addresses of subscribers in a telephone
directory.
A sampling frame is essential for selecting a sample of units from the population.
Probability Sampling
Non-Probability Sampling
1.8.2 Probability Sampling
Each individual in the population has an equal chance of being selected for the sample.
Because selection is left to chance, bias from the researcher's choice is avoided.
It is used when the objective of the study is to make descriptive
statements about the sample or to make inferences about the
population.
A sampling frame listing the elements or sampling units is needed to produce a random
sample.
E.g.: telephone directories, list of registered small industries in
Kuching.
1.8.2.1 Simple Random Sampling
Two common procedures for selecting a sample.
a) Lottery Method
i- List down the members of the population and number them.
ii- Assign numbers on pieces of cards representing each unit of the population.
iii- Put the cards in a box and draw one card at a time without replacement until the required number of samples is selected.
iv- The units of the population are drawn; list down the selected samples.
Example: Selecting 10 students out of 30 students.
i- List down all the students' names randomly and number them.
ii- Assign numbers on pieces of cards representing each student (30 cards).
iii- Put the cards in a box and draw one card at a time without replacement until you have 10 cards.
iv- The units of the population are drawn; list down the selected samples.
1) #20 = Ali
2) #16 = Abu
3) #7 = Siti
….10) #26 = Sara
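The lottery method can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `lottery_draw`, the placeholder student names and the seed are all assumptions made for the example. `random.sample` draws without replacement, which corresponds to drawing cards from the box one at a time.

```python
import random

def lottery_draw(names, n, seed=None):
    """Draw n members without replacement, as in the lottery method."""
    # Steps i-ii: number the members 1..N (one "card" per member).
    cards = {number: name for number, name in enumerate(names, start=1)}
    rng = random.Random(seed)
    # Step iii: draw n cards from the box without replacement.
    drawn = rng.sample(sorted(cards), n)
    # Step iv: list the selected members in the order drawn.
    return [(number, cards[number]) for number in drawn]

students = [f"Student {i}" for i in range(1, 31)]  # hypothetical names
selected = lottery_draw(students, 10, seed=42)     # seed is arbitrary
for order, (number, name) in enumerate(selected, start=1):
    print(f"{order}) #{number} = {name}")
```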
b) Random Number Table
Example: Selecting 10 students out of 30 students.
i) List down the students randomly and number them from 01 to 30.
ii) Choose a starting point in the table at random, e.g. 10005. Form 2-digit numbers
starting from 10005, i.e. 10, 00, 50, 82, 16, 25, 90, 60, 24,……….
iii) Keep only the numbers from 01 to 30 and skip any repeats. Numbers selected: 10, 16, 25, 24,….. (should select 10)
iv) The units of the population are drawn; list down the selected samples.
1) #10 = Ali
2) #16 = Abu
3) #25 = Siti
….10) #19 = Sara
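The reading of two-digit numbers in the example above can be sketched as follows. The function name `select_from_digits` is an assumption for illustration. The digit string contains only the nine two-digit numbers quoted in the example, so it yields four selections; a real table would be read further until 10 numbers are obtained.

```python
def select_from_digits(digits, population_size, sample_size, width=2):
    """Read fixed-width numbers from a digit stream, keeping valid new ones."""
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Keep numbers that identify a unit (1..N) and were not drawn before.
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# The two-digit numbers quoted in the example: 10, 00, 50, 82, 16, 25, 90, 60, 24
digits = "100050821625906024"
picked = select_from_digits(digits, population_size=30, sample_size=10)
print(picked)  # [10, 16, 25, 24]
```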
1.8.2.2 Systematic Sampling
Example: There are a total of N = 500 primary school canteen operators in
the Klang Valley in 2016 who are registered with the Ministry of Education.
We require a sample of n = 25 operators for a particular study.
Step 1: Make sure the list is in random order with respect to the variable under study,
e.g., the names of the operators are sorted alphabetically.
Then number each operator from 1 to 500 for identification
purposes.
Step 2: Divide the operators into intervals, each containing k
operators. Here
k = population size / sample size = 500 / 25 = 20
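A short sketch of the interval calculation follows. The text stops at Step 2, so the remaining steps in the code (a random start between 1 and k, then every k-th operator) are the usual continuation of systematic sampling and are an assumption here. The function name and seed are also illustrative.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the interval k and the selected unit numbers."""
    # Step 2: sampling interval k = population size / sample size.
    k = population_size // sample_size
    rng = random.Random(seed)
    # Assumed next step: pick a random start among the first k units.
    start = rng.randint(1, k)
    # Assumed final step: take every k-th unit after the start.
    return k, [start + i * k for i in range(sample_size)]

k, chosen = systematic_sample(500, 25, seed=7)  # seed is arbitrary
print("k =", k)
print(chosen)
```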
1.8.2.3 Stratified Sampling
Divide the population elements into non-overlapping groups,
called strata, and then select a sample from each stratum.
Within a stratum, elements are homogeneous.
Between strata, elements are heterogeneous.
Appropriate when there is a large variation within the
population.
E.g.: To select a sample from the population of a city when we want
households with different income levels to be represented in the sample,
we separate the population into different income level groups.
The sizes of the samples selected from the different strata are
proportionate to the sizes of the subpopulations in these strata.
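Proportionate allocation can be sketched as below. The stratum sizes are hypothetical, and the largest-remainder rule used to settle rounding is one reasonable choice, not something the text specifies.

```python
def proportionate_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size."""
    total = sum(strata_sizes.values())
    # Exact (possibly fractional) share for each stratum.
    exact = {name: sample_size * size / total
             for name, size in strata_sizes.items()}
    # Round every share down first.
    allocation = {name: int(share) for name, share in exact.items()}
    # Hand out the units lost to rounding, largest remainder first.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Hypothetical city: number of households in each income stratum.
households = {"low income": 5000, "middle income": 3000, "high income": 2000}
allocation = proportionate_allocation(households, 200)
print(allocation)  # {'low income': 100, 'middle income': 60, 'high income': 40}
```

A simple random sample of the allocated size would then be drawn within each stratum.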
Pros
- Accurate and effective representation of all subgroups
- Characteristics of each stratum can be estimated, and comparisons made
Cons
- Requires accurate information on the proportion for each stratum
- Stratified list is costly to prepare
1.8.2.4 Cluster Sampling
1.8.2.5 Multistage Sampling
Example: Suppose we want to study the average monthly
income of Petronas pump stations. We know Malaysia has 14
states and we only need 100 pump stations throughout
Malaysia.
Steps:
1) In this case, states are clusters.
2) Select 4 states randomly (using SRS).
3) Choose 5 districts from each selected state (using SRS).
4) Select 5 pump stations from each selected district to make up our sample of 100
(4 states x 5 districts x 5 pump stations).
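The three stages above can be sketched as follows. The sampling frame is entirely hypothetical: the text gives no district or station counts, so 8 districts per state and 10 stations per district are assumed, and the names are placeholders.

```python
import random

rng = random.Random(2024)  # fixed seed, an arbitrary choice

# Hypothetical frame: 14 states, each with 8 districts of 10 stations.
frame = {
    f"State {s}": {
        f"District {s}.{d}": [f"Station {s}.{d}.{p}" for p in range(1, 11)]
        for d in range(1, 9)
    }
    for s in range(1, 15)
}

sample = []
for state in rng.sample(sorted(frame), 4):                # stage 1: 4 states
    for district in rng.sample(sorted(frame[state]), 5):  # stage 2: 5 districts per state
        # Stage 3: 5 pump stations per selected district.
        sample.extend(rng.sample(frame[state][district], 5))

print(len(sample))  # 100
```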
1.8.3.1 Convenience/Chunk Sampling
Use the elements that are most readily available to obtain
results quickly.
E.g.: An opinion poll may be conducted in a few
hours by collecting information from certain
shoppers at a single shopping mall.
Advantages: (i) the least expensive
(ii) the least time consuming
(iii) does not require a sampling frame
Disadvantages: not representative of the
population; introduces bias in the researcher's
classification of samples
Recommended for: pretesting questionnaires,
generating ideas, insights or hypotheses.
1.8.3.2 Judgmental Sampling
Members are selected based on the judgement or expertise of the
researcher, who believes that these elements are
representative of the population of interest.
E.g.: If a researcher wants to find out what it takes for women managers to
make it to the top, the potential respondents would be those in top
management positions.
Advantages:
(i) the least expensive
(ii) the least time consuming
(iii) sample guaranteed to meet
specific objectives
Disadvantages: introduces bias in the researcher's classification of samples
1.8.3.3 Quota Sampling
Similar to convenience sampling, except that the number of members
allocated to each group of respondents is based on the
population statistics or the researcher's choices.
Advantages: quick and cheap to organise
Disadvantages: not as representative of the population
as a whole as other sampling methods
E.g.: In a city, there are 48% men and 52% women. If a
total of 1000 residents need to be interviewed, then a
quota of 480 men and 520 women will form the sample.
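The quota arithmetic in this example can be sketched as below. The function names and the small list of arriving respondents are hypothetical. `fill_quotas` accepts whoever turns up, in order, until each group's quota is full, which is where the method resembles convenience sampling.

```python
def quotas(total_interviews, population_percentages):
    """Quota per group = the group's population share of the interview total."""
    return {group: total_interviews * pct // 100
            for group, pct in population_percentages.items()}

def fill_quotas(arrivals, quota):
    """Accept respondents in arrival order until each group's quota is full."""
    counts = {group: 0 for group in quota}
    accepted = []
    for person, group in arrivals:
        if group in counts and counts[group] < quota[group]:
            counts[group] += 1
            accepted.append(person)
    return accepted

city_quota = quotas(1000, {"men": 48, "women": 52})
print(city_quota)  # {'men': 480, 'women': 520}

# Hypothetical arrivals, checked against a tiny quota for illustration.
arrivals = [("R1", "men"), ("R2", "men"), ("R3", "women"), ("R4", "men")]
print(fill_quotas(arrivals, {"men": 2, "women": 1}))  # ['R1', 'R2', 'R3']
```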
1.9 Data Collection Methods
Face-to-face Interview/Personal Interview
Direct Questionnaire/Mail/E-mail Questionnaire/Online Survey
1.10 QUESTIONNAIRE DESIGN
A questionnaire is a set of questions designed to generate the data for achieving the objectives of a research project.
It must be able to provide two-way communication between the respondent and the researcher to ensure the accuracy of the data collected.
A good questionnaire must be well designed:
- Introduce the questionnaire
- Keep the questionnaire as short as possible
- Provide clear and understandable instructions
- Space out the questions
- Ask short, simple, and clearly worded questions
- Start with demographic/general questions to help respondents get started comfortably
- Use dichotomous and multiple choice questions
- Avoid using double-barreled questions (questions that include two questions in one)
- Use open-ended questions cautiously
- Allow space on the right of the page for coding
- Pretest the questionnaire before use