
STA108 Chapter 1 - Introduction To Statistics

This document is an introduction to statistics chapter from a textbook. It discusses key topics in statistics including [1] what statistics is, its theoretical and applied aspects, [2] descriptive and inferential statistics, [3] key terms like population, sample, parameter and statistic, [4] quantitative and qualitative variables and different scales of measurement. The chapter aims to explain basic statistical concepts and their practical applications.

Uploaded by

2022473608

STA108

Statistics &
Probability
Chapter 1
Introduction to Statistics

At the end of this chapter, students should be able to explain basic terms in statistics and their practical applications.

1.1 What is Statistics?
• STATISTICS is the study of the COLLECTION, ORGANIZATION, ANALYSIS, INTERPRETATION and PRESENTATION of data.
• Statistical methods help us to make scientific and intelligent decisions. For instance, sample data are used to forecast sales and profit.

Statistics has 2 aspects:
• Theoretical or Mathematical Statistics
  - deals with the development, derivation, and proof of statistical theorems, formulas, rules and laws.
• Applied Statistics
  - involves the application of those theorems, formulas, rules, and laws to solve real-world problems.

1.2.1 Descriptive Statistics
• Consists of methods of
  i) collecting data,
  ii) characterizing data,
  iii) presenting data by using tables and graphs, and
  iv) summarizing data by using summary measures,
  without attempting to infer anything that goes beyond the data themselves.
• Purpose: Describe data
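
As a minimal sketch of descriptive statistics, the Python snippet below summarizes a small data set without inferring anything beyond it. The sales figures are hypothetical and are not taken from the chapter.

```python
import statistics

# Hypothetical data set: units sold by 8 shops in one month.
sales = [12, 15, 11, 15, 20, 18, 15, 14]

# Summary measures that describe only the data at hand.
summary = {
    "n": len(sales),
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "mode": statistics.mode(sales),
    "min": min(sales),
    "max": max(sales),
    "stdev": statistics.stdev(sales),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each value describes these 8 shops only. Nothing is claimed about other shops.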

1.2.2 Inferential Statistics

• Methods that use sample data to draw conclusions or inferences about the characteristics of a population.
• Includes:
  - estimation
  - hypothesis testing
• Purpose: Make generalizations/decisions about population characteristics.
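
Estimation can be illustrated with a short Python sketch. It assumes a hypothetical population of household incomes and uses the normal approximation (z = 1.96) for an approximate 95% confidence interval. This is one common choice, not a method prescribed by the chapter.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: monthly incomes of 10,000 households.
population = [random.gauss(3000, 800) for _ in range(10_000)]

# In practice only a sample is observed.
sample = random.sample(population, 100)

n = len(sample)
sample_mean = statistics.mean(sample)                      # point estimate
standard_error = statistics.stdev(sample) / math.sqrt(n)

# Approximate 95% confidence interval for the population mean.
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"point estimate: {sample_mean:.1f}")
print(f"approximate 95% CI: ({lower:.1f}, {upper:.1f})")
```

The interval is a statement about the population mean, made from the sample alone.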


1.3 Key Terms

a) POPULATION
• Consists of all elements (individuals, items or objects) whose characteristics are being studied.
• Target population: the population that is being studied.
Examples:
• All companies in Kuching.
• All undergraduates in UiTM Samarahan.
• All AirAsia stewardesses who have attended the Basic Grooming course.

b) SAMPLE
- a portion of the population selected for study.

c) SAMPLE SURVEY
- a process for collecting data on a sample of observations which are selected from the population of interest using a probability-based sample design.

d) CENSUS
- a survey that includes every member of the population.

e) PARAMETER
- summary measure about population.

f) STATISTIC
- summary measure about sample.
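
The difference between a parameter and a statistic can be sketched in Python as follows. The population of test marks is hypothetical and is generated only for illustration.

```python
import random
import statistics

random.seed(7)

# Hypothetical population: test marks of all 500 students in a course.
population = [random.randint(40, 100) for _ in range(500)]

# Census: every member is measured, so the summary measure is a PARAMETER.
parameter = statistics.mean(population)

# Sample: only 50 members are measured, so the summary measure is a STATISTIC.
sample = random.sample(population, 50)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The parameter is fixed for a given population, while the statistic changes from one sample to another.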

1.4 Basic Terms

• Element or Member – a specific subject or object (e.g. a person, firm, or item) about which the information is collected.
• Variable – a characteristic or attribute of interest that assumes different values for different elements.
• Observation or measurement – the value of a variable for an element.
• Data set – a collection of observations on one or more variables.
• Pilot study – a study done before the actual fieldwork is carried out.


1.5 Sources of Data

Primary Data
• First-hand data gathered from primary sources.
• Eg.: data from experiments, surveys, observation
Pros:
+ more accurate & consistent with research objectives
+ will be able to explain how data is collected & limitation of their use
Cons:
- costly (time, manpower, money)
- inconvenient

Secondary Data
• Collected by other parties.
• Sources: journals, newspapers, internet etc.
Pros:
+ easily accessible (convenient), with variety
+ save cost, time & manpower
Cons:
- lack of accuracy & not reliable, transcription error
- data does not meet specific needs

1.6 Types of Variable
1.6.1 Quantitative Variable (numerical)
• A variable that can be measured numerically.
• Eg: number of children, total income
• May be classified as either a discrete variable or a continuous variable.

a) Discrete variable
• A variable whose values are obtained by counting.
• Can assume only certain values with no intermediate values.
• Eg: the number of cars sold, number of books read

b) Continuous variable
• A variable which can assume any numerical value over a certain interval or intervals.
• Eg: the time taken to finish a test, weight of athletes

1.6.2 Qualitative Variable (categorical)
• A variable that cannot be measured numerically but can be divided into different categories.
• Eg: Gender is measured as ‘male’ or ‘female’.
• Opinion is measured as ‘strongly disagree’, ‘disagree’, ‘neutral’, ‘agree’ or ‘strongly agree’.

Example:
Windows is a computer software product made by Microsoft Corporation. In designing Windows 10, Microsoft telephoned thousands of users of Windows Vista (old version) and asked them how the product could be improved. Assume customers were asked the following questions:
i. Are you the most frequent user of Windows in your household?
ii. Are the tutorial instructions that accompany Windows helpful?
iii. When using a printer, do you most frequently use a laser printer or another type of printer?
iv. If the speed of Windows could be changed, which one of the following would you prefer: slower, unchanged, or faster?
v. How many people in your household have used Windows at least once?

Each of these questions defines a variable of interest to the company. Classify the data generated for each variable as quantitative or qualitative. Justify your classification.

1.7 Scales of Measurement

• The variable’s scale indicates the level of precision with which the data have been measured.
• This classification has implications for the type of analysis that can be performed on the variable.

1.7.1 Nominal Scale
• A scale in which the numbers or letters assigned to objects serve as labels for identification.
• Categories (descriptions or names) only.
• The order of the answers is not important.
Examples:
Gender: Male (M), Female (F)
Payment: Cash (1), Credit Card (2), Cheque (3)
Language: BM (1), Mandarin (2), English (3)

1.7.2 Ordinal Scale
• A scale that arranges objects or alternatives according to their magnitude.
• Classifies qualitative data.
• Implies ordering/ranking.
• Normally we use ranks to give ordering to this type of data.
Examples:
Rating of the services: Excellent, Good, Fair, Poor
Education level: Secondary School, Undergraduate, Master, PhD
Student’s Grade: A, B, C, D, ...

1.7.3 Interval Scale
• A scale that not only arranges objects according to their magnitude, but also distinguishes this ordered arrangement in units of equal interval, so that differences between values are meaningful.
• Classifies quantitative data.
• Does not have a true zero point – zero is arbitrary.
Examples:
Temperature: 45°C, 47°C, 50°C, 100°C (100°C is not twice as hot as 50°C)
Year born etc.
Standardized exam scores (IQ test scores)/Aptitude test scores

1.7.4 Ratio Scale
• A scale that arranges objects according to their magnitude and distinguishes this ordered arrangement in units of equal interval.
• Classifies quantitative data.
• There is an inherent zero point – it has a true zero.
Examples:
Length of the steel rod: 3 cm, 4 cm, 50 cm
Age, Sales, Income, Cost
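
The four scales differ in which calculations are meaningful. The Python sketch below illustrates this with hypothetical observations modelled on the examples given for each scale. The pairing of scale and summary measure (mode, median, difference, ratio) is the usual convention, stated here as an assumption rather than quoted from the chapter.

```python
import statistics

# Nominal: labels only, so the mode (most frequent category) is meaningful.
payment = ["Cash", "Credit Card", "Cash", "Cheque", "Cash"]
most_common_payment = statistics.mode(payment)

# Ordinal: categories have an order, so a median rank is meaningful.
rating_order = ["Poor", "Fair", "Good", "Excellent"]
ratings = ["Good", "Fair", "Excellent", "Good", "Poor"]
ranks = sorted(rating_order.index(r) for r in ratings)
median_rating = rating_order[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not (zero is arbitrary).
celsius_a, celsius_b = 50, 100
difference = celsius_b - celsius_a
# On the Kelvin scale, which has a true zero, the ratio is far from 2.
kelvin_ratio = (celsius_b + 273.15) / (celsius_a + 273.15)

# Ratio: a true zero exists, so ratios are meaningful.
rod_a_cm, rod_b_cm = 4, 50
length_ratio = rod_b_cm / rod_a_cm

print(most_common_payment, median_rating, difference,
      round(kelvin_ratio, 2), length_ratio)
```

The temperature lines show why 100°C is not twice as hot as 50°C: the ratio depends on where zero is placed.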

Example:
Windows is a computer software product made by Microsoft Corporation. In designing Windows 10, Microsoft telephoned thousands of users of Windows Vista (old version) and asked them how the product could be improved. Assume customers were asked the following questions:
i. Are you the most frequent user of Windows in your household?
ii. Are the tutorial instructions that accompany Windows helpful?
iii. When using a printer, do you most frequently use a laser printer or another type of printer?
iv. If the speed of Windows could be changed, which one of the following would you prefer: slower, unchanged, or faster?
v. How many people in your household have used Windows at least once?
Classify the data generated for each variable as nominal, ordinal, interval or ratio. Justify your classification.

1.8 Sampling
• Sampling – the process of choosing a sample of elements from a total population of elements.
• Random sampling – a sampling procedure in which the selection of the sample is based on chance and every element has a known, non-zero probability of being selected.
• Advantages of sampling as compared to a census (the whole population) are:
(a) easy to handle
(b) saves time & cost
(c) convenient

Sampling Process:
Defining the population → Establishing sample frame → Determining sample size → Specifying sampling method → Selecting the sample

• Sampling unit – a single element or a group of elements subject to selection in the sample.
• Sampling error – statistical error that arises because a sample, rather than the whole population, is observed; it is large when the selected sample is NOT representative of the population as a whole.
• Non-sampling error – any error or inaccuracy caused by factors other than sampling.

Sampling Frame
• An up-to-date list of all units of the population.
• It is prepared so that samples can be selected randomly from a population.
• Examples:
- List of all villages in Kedah
- List of all workers in ABC factory
- List of all students’ names in UiTM
- List of all names and addresses of subscribers in a telephone directory.
• Essential for selecting a sample of units from the population.

1.8.1 Type of Sampling

Probability Sampling
1) Simple random sampling
2) Systematic sampling
3) Cluster sampling
4) Stratified sampling
5) Multistage sampling
- A complete sampling frame is a must!
- Assures every unit in the population has an equal chance to be selected.

Non-Probability Sampling
1) Convenience/Chunk sampling
2) Judgmental sampling
3) Snowball sampling
4) Quota sampling
- A complete sampling frame is not needed.

1.8.2 Probability Sampling
• Each and every individual in the population has a known, non-zero chance of being selected for the sample (an equal chance under simple random sampling).
• The element of bias in selection is avoided.
• Used when the objective of the study is to make descriptive statements about the sample or to make inferences about the population.
• A list of elements or sampling units (a sampling frame) is needed to produce a random sample.
Eg. Telephone directories, list of registered small industries in Kuching.

1.8.2.1 Simple Random Sampling

• Each population element has an equal chance of being selected.
• Selecting one subject does not affect selecting others.
• May use a random number table, lottery or draw.
• The existence of a complete and updated sampling frame is a must.
• Target population has to be homogeneous.
Pros:
- Samples selected have least bias
- Easy to implement
- Minimizes possible classification errors
Cons:
- Requires a complete sampling frame that is sometimes difficult to get
- Very unrepresentative sample if population is not homogeneous
- Higher cost and time consuming (respondents may be widespread)

• Two common procedures in selecting a sample.
a) Lottery Method
i- List down the members of the population and number them.
ii- Assign numbers on pieces of cards representing each unit of the population.
iii- Put the cards in a box and draw one card at a time without replacement until the required number of samples is selected.
iv- List down the units of the population that were drawn as the selected sample.

Example: Selecting 10 students out of 30 students.
i- List down all the students’ names randomly and number them.
ii- Assign numbers on pieces of cards representing each student (30 cards).
iii- Put the cards in a box and draw one card at a time without replacement until you have 10 cards.
iv- List down the students that were drawn as the selected sample.
1) #20 = Ali
2) #16 = Abu
3) #7 = Siti
…. 10) #26 = Sara

b) Random Number Table Method
Let’s say N = 4000, n = 400
i- List down the members of the population and number them.
ii- Choose a starting position at random, then form numbers with the required number of digits from the starting position. The numbers can be read in any direction.
iii- Select numbers until the required number of samples is selected.
iv- List down the units of the population that were drawn as the selected sample.

• Example: Selecting 10 students out of 30 students.
i) List down the students randomly and number them.
ii) Choose a starting point at random, e.g. 10005. Form 2-digit numbers starting from 10005, i.e. 10, 00, 50, 82, 16, 25, 90, 60, 24, ……….
iii) Keep only the numbers from 01 to 30 and skip the rest. Numbers selected: 10, 16, 25, 24, ….. (should select 10)
iv) List down the students that were drawn as the selected sample.
1) #10 = Ali
2) #16 = Abu
3) #25 = Siti
…. 10) #19 = Sara
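
Both procedures can be sketched in Python for the case of selecting 10 students out of 30. The student labels are hypothetical placeholders, and `random.sample` stands in for drawing cards from a box. The digit string is the sequence 10, 00, 50, 82, 16, 25, 90, 60, 24 from the random number table example, so only the first four selections are reproduced.

```python
import random

random.seed(2024)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 30 students numbered 1 to 30.
frame = {number: f"Student {number}" for number in range(1, 31)}

# Lottery method: draw 10 numbers without replacement.
lottery_sample = random.sample(sorted(frame), 10)

# Random number table method: read two-digit numbers from a row of digits,
# skipping numbers outside 01-30 and numbers that were already selected.
digits = "100050821625906024"
table_sample = []
for i in range(0, len(digits) - 1, 2):
    number = int(digits[i:i + 2])
    if 1 <= number <= 30 and number not in table_sample:
        table_sample.append(number)

print("lottery:", [frame[n] for n in lottery_sample])
print("table:  ", table_sample)
```

In a real study the row of digits would continue until 10 valid numbers have been collected.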

1.8.2.2 Systematic Sampling

• Similar to SRS, but not fully randomized, since only the first sampling unit is selected randomly while the rest follow at intervals of k.

• Example: Suppose there are a total of N = 500 primary school canteen operators in the Klang Valley in 2016 who are registered with the Ministry of Education. We require a sample of n = 25 operators for a particular study.
• Step 1: Make sure the list is in random order with respect to the characteristic being studied, e.g. the names of the operators are sorted alphabetically. Then number each operator from 1 to 500 for identification purposes.
• Step 2: Divide the operators into intervals, each containing k operators. Here
  $k = \frac{\text{population size}}{\text{sample size}} = \frac{500}{25} = 20$
- For every 20 operators we select only one to represent that interval.

• Step 3: For the first interval only, select r at random. Let's say r = 7. Therefore the operator with identification number 7 will be the first sample operator selected for the study. The operators selected in the remaining intervals will depend on this first selection.
• Step 4: Move forward k = 20 units from each selected identification number to reach the next one.
• Step 5: The remaining selections will be operators with the following identification numbers: 27, 47, 67, 87, 107, …, 487. (Last selection = N - k + r)
1) #7 = Ali
2) #27 = Siti
3) #47
…. 25) #487 = Raju

Pros:
- Simple & easy to draw samples
- Less costly than SRS
- Samples can be chosen faster
Cons:
- May result in systematic bias
- Very unrepresentative sample if population is not homogeneous
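
The canteen operator example can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n, as it is for N = 500 and n = 25.

```python
import random


def systematic_sample(N, n, r=None):
    """Return the 1-based identification numbers chosen by systematic sampling."""
    k = N // n                      # sampling interval
    if r is None:
        r = random.randint(1, k)    # random start within the first interval
    return [r + i * k for i in range(n)]


# Reproduce the example: N = 500, n = 25, random start r = 7.
selected = systematic_sample(N=500, n=25, r=7)
print(selected[:5], "...", selected[-1])
```

With r = 7 the last selection is 500 - 20 + 7 = 487, which agrees with the formula N - k + r.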
1.8.2.3 Stratified Sampling
• Divide the population elements into non-overlapping groups, called strata, and then select one sample from each of these strata.
• Within a stratum, elements are homogeneous.
• Between strata, elements are heterogeneous.
• Appropriate when there is a large variation within the population.
Eg: To select a sample from the population of a city where we want households with different income levels to be represented in the sample, we separate the population into different income level groups.
• The sizes of the samples selected from different strata are proportionate to the sizes of the subpopulations in these strata.

Steps:
1) Divide the population into homogeneous strata.
2) List down all elements in each stratum.
3) Calculate the proportion for each stratum.
   No. of samples: Group A = 3, Group B = 4, Group C = 3
4) For each stratum, elements are taken randomly using the SRS method.
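
The steps above can be sketched in Python with proportional allocation. The strata sizes (30, 40 and 30) are hypothetical, chosen only so that a total sample of 10 gives the sample sizes 3, 4 and 3 shown for Groups A, B and C.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Steps 1 and 2: hypothetical strata with all of their elements listed.
strata = {
    "Group A": [f"A{i}" for i in range(1, 31)],   # 30 elements
    "Group B": [f"B{i}" for i in range(1, 41)],   # 40 elements
    "Group C": [f"C{i}" for i in range(1, 31)],   # 30 elements
}
total_sample_size = 10
population_size = sum(len(members) for members in strata.values())

allocation = {}
sample = {}
for name, members in strata.items():
    # Step 3: stratum share of the population times the total sample size.
    allocation[name] = round(total_sample_size * len(members) / population_size)
    # Step 4: simple random sampling within the stratum.
    sample[name] = random.sample(members, allocation[name])

print(allocation)
print(sample)
```

When the shares do not divide evenly, rounding can make the stratum sample sizes add up to slightly more or less than the intended total, so an adjustment rule would be needed.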

Pros:
- Accurate and effective representation of all subgroups
- Characteristics of each stratum can be estimated, and comparisons made
Cons:
- Requires accurate information on the proportion for each stratum
- Stratified list costly to prepare

1.8.2.4 Cluster Sampling

• Divide the population into clusters, then randomly select clusters (using the SRS method). Finally, either all elements or a random sample of elements is taken from each of the selected clusters.
• Clusters are assumed to be similar to one another and, hence, each is representative of the population.
• Advantages:
- It can be applied to a large study area
- Less costly and practical
• Disadvantages:
- Higher sampling error
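
A minimal Python sketch of cluster sampling follows. The 8 clusters of 20 households each, and the choice of 3 clusters and 5 households per cluster, are hypothetical values used only for illustration.

```python
import random

random.seed(5)  # fixed seed so the sketch is reproducible

# Hypothetical population: 8 clusters (e.g. villages) of 20 households each.
clusters = {
    f"Cluster {c}": [f"C{c}-H{h}" for h in range(1, 21)]
    for c in range(1, 9)
}

# Select 3 clusters by simple random sampling.
chosen_clusters = random.sample(sorted(clusters), 3)

# Option a: take all elements of the chosen clusters.
all_elements = [unit for name in chosen_clusters for unit in clusters[name]]

# Option b: take a random sample of 5 elements from each chosen cluster.
sub_sample = [unit for name in chosen_clusters
              for unit in random.sample(clusters[name], 5)]

print(chosen_clusters)
print(len(all_elements), len(sub_sample))
```

Only the chosen clusters need to be listed in full, which is why the method suits large study areas.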

1.8.2.5 Multistage Sampling

• An extension of cluster sampling.
• Covers large geographical areas.
• It is used when the initial clusters are too complex to handle. They are broken down further into smaller clusters of a more manageable size.

• Example: Suppose we want to study the average monthly income of Petronas pump stations. We know Malaysia has 14 states and we only need 100 pump stations throughout Malaysia.
• Steps:
1) In this case, states are clusters.
2) Select 4 states randomly (using SRS).
3) Choose 5 districts from each selected state (using SRS).
4) Select 5 pump stations from each selected district to make up our sample of 100 (4 states x 5 districts x 5 pump stations).
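
The pump station example can be sketched in Python as three nested stages of simple random sampling. The frame is hypothetical: the number of districts per state (8) and stations per district (12) are assumptions, since the example only fixes 14 states and the 4 x 5 x 5 design.

```python
import random

random.seed(11)  # fixed seed so the sketch is reproducible

# Hypothetical frame: 14 states, 8 districts per state, 12 stations per district.
frame = {
    f"State {s}": {
        f"S{s}-District {d}": [f"S{s}-D{d}-Station {p}" for p in range(1, 13)]
        for d in range(1, 9)
    }
    for s in range(1, 15)
}

sample = []
for state in random.sample(sorted(frame), 4):                 # stage 1: 4 states
    districts = frame[state]
    for district in random.sample(sorted(districts), 5):      # stage 2: 5 districts
        sample.extend(random.sample(districts[district], 5))  # stage 3: 5 stations

print(len(sample))
```

Only the selected states and districts need a detailed list of stations, which keeps the frame manageable.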

1.8.3 Non-Probability Sampling
• A sample where not all elements of the population of interest have an equal chance of being chosen.
• The element of bias exists.
• Unfavourable because it will affect the reliability and accuracy of the study.
• The conclusion made is restricted to the sample only; it cannot be extended to the actual population.

1.8.3.1 Convenience/Chunk Sampling
• Use the elements that are most readily available to obtain the results quickly.
Eg.: An opinion poll may be conducted in a few hours by collecting information from certain shoppers at a single shopping mall.
• Advantages:
(i) the least expensive
(ii) the least time consuming
(iii) does not require a sampling frame
• Disadvantages: not representative of the population; introduces bias in the researcher’s selection of samples
• Recommended for: pretesting questionnaires, generating ideas, insights or hypotheses.

1.8.3.2 Judgmental Sampling
• Members are selected based on the judgement or expertise of the researcher, who believes that these elements are representative of the population of interest.
• Eg.: If a researcher wants to find out what it takes for women managers to make it to the top, the potential respondents would be those in top management positions.
• Advantages:
(i) the least expensive
(ii) the least time consuming
(iii) sample guaranteed to meet specific objectives
• Disadvantages: introduces bias in the researcher’s selection of samples

1.8.3.3 Quota Sampling
• Similar to convenience sampling, except that the number of members allocated to each group of respondents is based on the population statistics or the researcher’s choices.
• Advantages: quick and cheap to organise
• Disadvantages: not as representative of the population as a whole as other sampling methods
• Eg.: In a city, there are 48% men and 52% women. If a total of 1000 residents needs to be interviewed, then a quota of 480 men and 520 women will form the sample.
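
The quota calculation in the example can be sketched in Python. The function name `quota_allocation` is hypothetical. The sketch only computes the quotas and does not model how interviewers fill them.

```python
def quota_allocation(population_shares, total_interviews):
    """Split the interviews across groups according to population shares."""
    return {group: round(share * total_interviews)
            for group, share in population_shares.items()}


# 48% men and 52% women, with 1000 residents to be interviewed.
quotas = quota_allocation({"men": 0.48, "women": 0.52}, 1000)
print(quotas)
```

Interviewers then fill each quota with whichever respondents are available, which is why the result remains a non-probability sample.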

1.8.3.4 Snowball Sampling

• An initial group of respondents is selected, usually at random. Then, respondents are asked to recommend others who belong to the target population of interest.
• Probability sampling is used to select the initial respondents. The final sample is a non-probability sample because the referrals will have demographic and psychographic characteristics more similar to the persons referring them than would occur by chance.
• Advantages: can estimate rare characteristics
• Disadvantages: not as representative of the population as a whole as other sampling methods

1.9 Data Collection Methods
Face-to-face Interview/Personal Interview
- approach a selected person, ask questions from the questionnaire and record the answers.

Advantages:
- can clarify any doubts from the respondent
- can note specific reactions from the respondent
- respondent will respond spontaneously
- incorrect information can be detected
Disadvantages:
- expensive
- interviewer’s gestures can affect the response obtained
- errors in recording the response
- interviewer needs to be supervised

Telephone Interview
Advantages:
- less expensive
- easy to monitor (as long as the specified interview procedure is being followed)
Disadvantages:
- lower response rate
- cannot provide any help
- fewer questions may be asked
- restricted to individuals who can be reached by phone

Direct Questionnaire/Mail/E-mail Questionnaire/Online Survey
Advantages:
- low cost
- need not monitor the interviewers
- enough time is given to the respondents to answer before the specified deadline
- no gestures from the interviewers to affect the response
Disadvantages:
- lower rate of response
- cannot provide any help
- those who do answer the questionnaire may not always be the ones to whom the questionnaire is addressed (non-representative sample)
Direct Observation
- used in many studies that do not involve measurements on individuals.
- Eg. Observing what Malaysians do while waiting to be called in an outpatient government clinic.

Advantages:
- not affected by respondents
Disadvantages:
- inconsistencies among the observers

1.10 QUESTIONNAIRE DESIGN
• A questionnaire is a set of questions designed to generate the data for achieving the objectives of a research project.
• It must be able to provide two-way communication between the respondent and the researcher to ensure the accuracy of the data collected.

A good questionnaire must be well designed:
• Introduce the questionnaire
• Keep the questionnaire as short as possible
• Provide clear and understandable instructions
• Space out the questions
• Ask short, simple, and clearly worded questions
• Start with demographic/general questions to help respondents get started comfortably
• Use dichotomous and multiple choice questions
• Avoid using double-barreled questions (questions that include two questions in one)
• Use open-ended questions cautiously
• Allow space on the right of the page for coding
• It is useful to pretest a questionnaire