
STA116 Chapter 1 - Descriptive Statistics (Part A)

The document provides an introduction to key concepts in statistics including descriptive statistics, inferential statistics, population and sample, types of variables, scale of measurement, sources of data, and sampling. It defines important terms and provides examples to illustrate statistical concepts.

Uploaded by

Aensya

STA116
INTRODUCTION TO STATISTICS & PROBABILITY
Chapter 1: Descriptive Statistics (Part A)

AT THE END OF THIS CHAPTER, STUDENTS SHOULD BE ABLE TO EXPLAIN BASIC TERMS IN STATISTICS AND ITS PRACTICAL APPLICATION.
1.1 WHAT IS STATISTICS?
STATISTICS is the study of COLLECTION, ORGANIZATION,
ANALYSIS, INTERPRETATION AND PRESENTATION of data.
Statistical methods help us to make scientific and intelligent
decisions. For instance, sample data are used to forecast sales
and profit.
Statistics has 2 aspects:
Theoretical or Mathematical Statistics
•deals with the development, derivation, and proof of statistical
theorems, formulas, rules and laws.
Applied Statistics
•involves the application of those theorems, formulas, rules, and laws
to solve real-world problems.
1.2.1 DESCRIPTIVE STATISTICS
- Consists of methods of
i) collecting data,
ii) characterizing data,
iii) presenting data by using tables,
graphs, and
iv) summary measures
without attempting to infer anything that goes
beyond the data themselves.
- Purpose : Describe data
1.2.2 INFERENTIAL STATISTICS
- Methods used to draw conclusions or
inferences about characteristics of a
population based on sample data.
Includes:
• estimation
• hypothesis testing
- Purpose: Make generalizations/decisions
about population characteristics.
1.3 KEY TERMS
a) POPULATION
consists of all elements – individuals, items or objects
whose characteristics are being studied.
Target population: the population that is being
studied
Examples:
• All companies in Kuching.
• All undergraduates in UiTM Samarahan.
• All AirAsia stewardesses who have attended Basic
Grooming course.
b) SAMPLE
- a portion of the population selected for study.

c) SAMPLE SURVEY
- a process for collecting data on a sample of observations which are selected from the population of interest using a probability-based sample design.
d) CENSUS
- a survey that includes every member of the population.

e) PARAMETER
- summary measure about population.

f) STATISTIC
- summary measure about sample.
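
The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population below is made-up data (hypothetical household incomes), and the sample size of 50 is an arbitrary choice for illustration only.

```python
import random
import statistics

# Hypothetical population: monthly incomes (RM) of 1,000 households (made-up data).
rng = random.Random(42)
population = [rng.randint(1500, 9000) for _ in range(1000)]

# Parameter: a summary measure computed from every element of the population.
population_mean = statistics.mean(population)

# Statistic: the same summary measure computed from a sample only.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean):     {sample_mean:.2f}")
```

The two values are usually close but not identical; a census would be needed to obtain the parameter exactly.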
1.4 BASIC TERMS
Element or Member – a specific subject or object (e.g. a person, firm, or
item) about which the information is collected.

Variable – a characteristic or attribute of interest that assumes different values for different elements.

Observation or measurement – the value of a variable for an element.

Data set – is a collection of observations on one or more variables.

Pilot study – a study done before the actual fieldwork is carried out.
The advantage of pilot study (2 advantages)
1.5 SOURCES OF DATA
Data sources: Primary Data and Secondary Data

Primary Data
• First-hand data gathered from primary sources, raw data
Eg.: data from experiments, survey, observation
Pros:
+ more accurate & consistent with research objectives
+ will be able to explain how data is collected & limitation of their use
Cons:
- costly (time, manpower, money)
- inconvenient

Secondary Data
• Collected by other parties.
Sources: journals, newspapers, internet etc.
Pros:
+ easily accessible (convenient), variety
+ save cost, time & manpower
Cons:
- lack of accuracy & not reliable, transcription error
- data does not meet specific needs
1.6 TYPES OF VARIABLE
1.6.1 Qualitative Variable (categorical)
- Variable that cannot be measured numerically but can be
divided into different categories.
E.g.:
- Gender is measured as ‘male’ or ‘female’.
- Opinion is measured as ‘strongly disagree’, ‘disagree’, ‘neutral’,
‘agree’ or ‘strongly agree’.
1.6.2 Quantitative Variable (numerical)
- is a variable that can be measured numerically.
E.g.: number of children, total income
- may be classified as either a discrete variable or a continuous variable.

a) Discrete variable
A variable whose values are countable and are obtained by counting.
Can assume only certain values with no intermediate values.
Eg: the number of cars sold, number of books read

b) Continuous variable
A variable which can assume any numerical value over a certain interval or intervals.
Eg: the time taken to finish a test, weight of athletes
Example:
Windows is a computer software product made by Microsoft Corporation. In designing
Windows 11, Microsoft surveyed thousands of users of Windows 10 and asked them
how the product could be improved. Assume customers were asked the following
questions:
i. Are you the most frequent user of Windows in your household? (Response:
Yes/No)
Variable: Whether the respondent is the most frequent user of Windows in the household
Type: Qualitative (QL)
ii. Are the tutorial instructions that accompany Windows helpful? (Response: Very
helpful/Helpful/Average/Little bit helpful/Not helpful)
iii. When using a printer, do you most frequently use a laser printer or another
type of printer?
iv. If the speed of Windows could be changed, which one of the following
would you prefer? (Response: In certain speed unit)
v. How many people in your household have used Windows at least once?
Each of these questions defines a variable of interest to the company. Classify the
data generated for each variable as quantitative discrete/continuous or qualitative.
Justify your classification.
1.7 SCALE/LEVEL OF MEASUREMENT
The variable’s scale indicates the precision with which the data have been
measured.
This classification has implications as to the type of analysis that can be
performed on the variable.
NOMINAL SCALE
• A scale in which the numbers or letters assigned to objects serve as labels for identification.
• Categories (descriptions or names) only.
• The order of the answer categories is not important.
Examples:
Gender: Male (M), Female (F)
Payment: Cash (1), Credit Card (2), Cheque (3)
Language: BM (1), Mandarin (2), English (3)
ORDINAL SCALE
• A scale that arranges objects or alternatives according to their magnitude.
• Classifies qualitative data.
• Implies ordering/ranking.
• Normally we use ranks to give ordering to this type of data.
Examples:
Rating of the services: Excellent, Good, Fair, Poor
Education level: Secondary School, Undergraduate, Master, PhD
Student’s Grade: A, B, C, D, ...
INTERVAL SCALE
• A scale that not only arranges objects according to their magnitude but also distinguishes this ordered arrangement in units of equal interval.
• Classifies quantitative data.
• Does not have a true zero point; zero is arbitrary.
Examples:
Temperature: 45°C, 47°C, 50°C, 100°C (100°C is not twice as hot as 50°C)
Year born etc.
Standardized exam scores (IQ test scores)/Aptitude test scores
RATIO SCALE
• A scale that arranges objects according to their magnitude and distinguishes this ordered arrangement in units of equal interval.
• Classifies quantitative data.
• There is an inherent zero point (it has a true zero).
Examples:
Length of the steel rod: 3 cm, 4 cm, 50 cm
Age
Sales
Income
Cost
SUMMARY FOR SCALE OF MEASUREMENT
1.7 SCALE OF MEASUREMENT
Example:
Windows is a computer software product made by Microsoft Corporation. In
designing Windows 10, Microsoft telephoned thousands of users of Windows Vista
(old version) and asked them how the product could be improved. Assume customers
were asked the following questions:
i. Are you the most frequent user of Windows in your household?
ii. Are the tutorial instructions that accompany Windows helpful?
iii. When using a printer, do you most frequently use a laser printer or another
type of printer?
iv. If the speed of the windows could be changed, which one of the following
would you prefer: slower, unchanged, or faster?
v. How many people in your household have used Windows at least once?
Classify the data generated for each variable into nominal, ordinal, interval, ratio.
Justify your classification.
1.8 SAMPLING
Sampling – the process of choosing a sample of elements from a total
population of elements.

Random sampling – a procedure of sampling from a population in which the selection of the sample is based on chance and every element has a known, non-zero probability of being selected.
Advantages of sampling as compared to census (the whole population)
are:
(a) easy to handle
(b) save time & cost
(c) convenient

Sampling Process:
1) Defining the population
2) Establishing the sample frame
3) Determining the sample size
4) Specifying the sampling method
5) Selecting the sample
Sampling unit – a single element or a group of elements subject to
selection in the sample.

Sampling error – statistical error that results when an analyst selects a sample that is NOT representative of the population as a whole.

Non-sampling error – any error or inaccuracy caused by factors other than sampling error.
SAMPLING FRAME
An up-to-date list of all units in the population.
It is prepared so that samples can be selected randomly from a
population.
Example:
- List of all villages in Kedah
- List of all workers in ABC factory
- List of all students’ names in UiTM
- List of all names and addresses of subscribers in telephone
directory.
A sampling frame is essential for selecting a sample of units from the population.
1.8.1 TYPE OF SAMPLING
Types of sampling: Probability Sampling and Non-Probability Sampling

Probability Sampling
1) Simple random sampling
2) Systematic sampling
3) Cluster sampling
4) Stratified sampling
5) Multistage sampling
- A complete sampling frame is a must!
- Assures every unit in the population has an equal chance to be selected.

Non-Probability Sampling
1) Convenience/Chunk sampling
2) Judgmental sampling
3) Snowball sampling
4) Quota sampling
- A complete sampling frame is not needed
1.8.2 PROBABILITY SAMPLING
- each and every individual in the population has a known, non-zero chance
of being selected for the sample (an equal chance under simple random sampling).
- the selection is based on chance, so the researcher's bias does not enter the selection.
- is used when the objective of the study is to make descriptive
statements about the sample or to make inferences about the
population.
- a list of elements or sampling units (a sampling frame) is needed to produce a random
sample.
Eg. Telephone directories, list of registered small industries in
Kuching.
1.8.2.1 SIMPLE RANDOM SAMPLING
- Each population element has an equal chance of being selected.
- Selecting one subject does not affect selecting others
- May use random number table, lottery or draw.
- The existence of a complete and updated sampling frame is a must.
- Target population has to be homogeneous

Pros
- Samples selected have least bias
- Easy to implement
- Minimizes possible classification errors
Cons
- Requires a complete sampling frame that is sometimes difficult to get
- Very unrepresentative sample if population is not homogeneous
- Higher cost and time consuming (respondents may be widespread)
Two common procedures in selecting sample.
a) Lottery Method
i- List down the members of the population and number them.
ii- Assign numbers on pieces of cards representing each unit of population.
iii- Put the cards in a box and draw one card at a time without replacement until the required number of samples are selected.
iv- List down the units of population drawn as the selected sample.

Example: Selecting 10 students out of 30 students.
i- List down all the students' names randomly and number them.
ii- Assign numbers on pieces of cards representing each student (30 cards).
iii- Put the cards in a box and draw one card at a time without replacement until you have 10 cards.
iv- List down the units of population drawn as the selected sample.
1) #20 = Ali
2) #16 = Abu
3) #7 = Siti
…
10) #26 = Sara
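
The lottery method above can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `lottery_draw`, the placeholder student names and the seed are all assumptions made for the example.

```python
import random

def lottery_draw(population, n, seed=None):
    """Draw n units without replacement, one card at a time."""
    rng = random.Random(seed)
    # Steps i-ii: number the units 1..N; each number is one card in the box.
    box = list(range(1, len(population) + 1))
    drawn = []
    # Step iii: draw one card at a time; a drawn card is not put back.
    for _ in range(n):
        card = box.pop(rng.randrange(len(box)))
        drawn.append(card)
    # Step iv: list the selected units by their card number.
    return [(card, population[card - 1]) for card in drawn]

# Hypothetical class list of 30 students (placeholder names).
students = [f"Student {i}" for i in range(1, 31)]
selected = lottery_draw(students, 10, seed=1)
for card, name in selected:
    print(f"#{card} = {name}")
```

Because each card is removed from the box once drawn, no student can be selected twice.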
b) Random Number Table Method
Let’s say N = 4000, n = 400
i- List down the members of the population and number them.
ii- Choose a starting position at random, then form numbers with the required number of digits from the starting position. The numbers can be read in any direction.
iii- Select numbers until the required number of samples are selected.
iv- List down the units of population drawn as the selected sample.
Example: Selecting 10 students out of 30 students.
i) List down the students randomly and number them.
ii) Choose a starting point at random, e.g. 10005. Form 2-digit numbers starting from
10005, i.e. 10, 00, 50, 82, 16, 25, 90, 60, 24, ...
iii) Keep only the numbers from 01 to 30, skipping repeats. Numbers selected: 10, 16, 25, 24, ... (should select 10)
iv) List down the units of population drawn as the selected sample.
1) #10 = Ali
2) #16 = Abu
3) #25 = Siti
…
10) #19 = Sara
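
The reading rule used in this example (form 2-digit numbers, keep those that fall within the population numbering, skip repeats) can be sketched in Python. The function name `select_from_digits` is hypothetical, and the digit string contains only the digits quoted in the example, so it yields just the first four selections; a real table would continue until 10 numbers are found.

```python
def select_from_digits(digits, population_size, n, width=2):
    """Read width-digit numbers from a digit string and keep valid, unused ones."""
    selected = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Skip numbers outside 1..N and numbers that were already selected.
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
        if len(selected) == n:
            break
    return selected

# Digits quoted in the example: 10, 00, 50, 82, 16, 25, 90, 60, 24
digits = "100050821625906024"
print(select_from_digits(digits, population_size=30, n=10))
# prints [10, 16, 25, 24]
```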
1.8.2.2 SYSTEMATIC SAMPLING
Similar to SRS, but not fully randomized since only the first sampling unit is
selected randomly while the rest follow at intervals of k.
Example: Suppose there are a total of N = 500 primary school canteen operators in
the Klang Valley in 2016 who are registered with the Ministry of Education. We
require a sample of n = 25 operators for a particular study.
Step 1: Make sure the list is random, e.g.
the names of the operators are sorted alphabetically.
Then number each operator from 1 to 500 for identification
purposes.
Step 2: Divide the operators into intervals, each containing k operators. Here
$k = \dfrac{\text{population size}}{\text{sample size}} = \dfrac{500}{25} = 20$
- For every 20 operators we select only one to represent that interval.
Step 3: For the first interval only, select a number r at random between 1 and k. Let's say r = 7. Therefore the operator with identification number 7 will be the first operator selected for the study. The operators selected in the remaining intervals depend on this first selection.
Step 4: Add k = 20 to each selected identification number to obtain the next one.
Step 5: The remaining selections will be the operators with the following identification numbers: 27, 47, 67, 87, 107, …, 487.
(Last selection = N - k + r = 500 - 20 + 7 = 487)

1) #7 = Ali
2) #27 = Siti
3) #47
…
25) #487 = Raju

Pros
- Simple & easy to draw samples
- Less costly than SRS
- Samples can be chosen faster
Cons
- May result in systematic bias
- Very unrepresentative sample if population is not homogeneous
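
The canteen operator example can be reproduced with a short Python sketch. The function name `systematic_sample` is an assumption for illustration, and the sketch assumes N is an exact multiple of n, as it is in the example (500 / 25 = 20).

```python
import random

def systematic_sample(N, n, r=None, seed=None):
    """Return the identification numbers chosen by systematic sampling."""
    k = N // n  # sampling interval
    if r is None:
        # Random start within the first interval only.
        r = random.Random(seed).randint(1, k)
    # Every k-th unit after the random start is selected.
    return [r + i * k for i in range(n)]

ids = systematic_sample(N=500, n=25, r=7)
print(ids[:5], "...", ids[-1])
# prints [7, 27, 47, 67, 87] ... 487
```

Passing `r=None` lets the function pick the random start itself, which is what would be done in practice.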
1.8.2.3 STRATIFIED SAMPLING
- Divide the population elements into non-overlapping groups,
called strata, and then a sample is selected from each
stratum.
- Within each stratum, elements are homogeneous.
- Between strata, elements are heterogeneous.
- Appropriate when there is a large variation within the population.
E.g.: To select a sample from the population of a city where we want
households with different income levels to all be represented in
the sample, we separate the population into different
income level groups.
- The sizes of the samples selected from different strata are
proportionate to the sizes of the subpopulations in these strata.
Steps:
1) Divide the population into
homogeneous strata.
2) List down all elements in
each stratum.
3) Calculate the proportion for
each stratum.
No. of samples:
Group A = 3
Group B = 4
Group C = 3
4) For each stratum, elements are
taken randomly using SRS
method.
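
The steps above can be sketched in Python using proportional allocation. The stratum sizes (30, 40 and 30 units) are hypothetical; they were chosen only because they reproduce the allocation of 3, 4 and 3 shown for Groups A, B and C when the total sample size is 10. The function names are also assumptions.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Sample size per stratum, proportional to the stratum's share of the population."""
    total = sum(strata_sizes.values())
    # Note: rounding may not add up to n exactly for arbitrary sizes.
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

# Hypothetical strata (made-up unit labels).
strata = {
    "A": [f"A{i}" for i in range(1, 31)],  # 30 units
    "B": [f"B{i}" for i in range(1, 41)],  # 40 units
    "C": [f"C{i}" for i in range(1, 31)],  # 30 units
}
allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 10)
print(allocation)  # {'A': 3, 'B': 4, 'C': 3}
sample = stratified_sample(strata, n=10, seed=0)
```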
Pros
- Accurate and effective representation of all subgroups
- Characteristics of each stratum can be estimated, and comparisons made
Cons
- Requires accurate information on proportion for each stratum
- Stratified list costly to prepare
1.8.2.4 CLUSTER SAMPLING
- Divide population into clusters, then randomly select
clusters (using SRS method). Finally, a random sample of
elements or all elements from each of the selected
clusters is selected.
- All clusters are similar and, hence, representative of the
population.
- Advantages:
  - It can be applied to a large study area
  - Less costly and practical
- Disadvantages:
  - Higher sampling error
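
A minimal Python sketch of cluster sampling follows, for the case where all elements of each selected cluster are taken. The villages and households are made-up data, and the function name `cluster_sample` is an assumption.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters (SRS on clusters) and keep all their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 8 villages (clusters) with 5 households each.
clusters = {
    f"Village {v}": [f"V{v}-H{h}" for h in range(1, 6)]
    for v in range(1, 9)
}
sample = cluster_sample(clusters, n_clusters=3, seed=0)
for village, households in sample.items():
    print(village, households)
```

Only the list of clusters is needed up front; elements are listed only for the clusters that are selected, which is why the method is practical for a large study area.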
1.8.2.5 MULTISTAGE SAMPLING
- An extension of cluster sampling.
- Covering large geographical areas.
- It is used when the initial clusters are too complex to handle.
They are broken down further into smaller clusters of a more
manageable size.
Example: Suppose we want to study the average monthly income
of Petronas pump stations. Malaysia has 13 states and 3 federal territories, and
we only need 100 pump stations throughout Malaysia.
Steps:
1) In this case, states are clusters.
2) Select 4 states randomly (using SRS).
3) Choose 5 districts from each selected state (using SRS).
4) Select 5 pump stations from each selected district (using SRS) to make up our sample of 100
(4 states x 5 districts x 5 pump stations)
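
The three stages of this example can be sketched in Python. The sampling frame below is entirely made up (the numbers of states, districts and stations are arbitrary placeholders, not real figures); only the 4 x 5 x 5 selection follows the steps above.

```python
import random

rng = random.Random(0)

# Hypothetical frame with made-up counts:
# 12 states, each with 8 districts, each with 10 pump stations.
frame = {
    f"State {s}": {
        f"S{s}-District {d}": [f"S{s}-D{d}-Station {p}" for p in range(1, 11)]
        for d in range(1, 9)
    }
    for s in range(1, 13)
}

sample = []
# Stage 1: select 4 states by SRS.
for state in rng.sample(sorted(frame), 4):
    # Stage 2: select 5 districts from each selected state by SRS.
    for district in rng.sample(sorted(frame[state]), 5):
        # Stage 3: select 5 pump stations from each selected district by SRS.
        sample.extend(rng.sample(frame[state][district], 5))

print(len(sample))  # 4 x 5 x 5 = 100
```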
1.8.3 NON-PROBABILITY SAMPLING
- A sample where not all elements of the population of
interest have an equal chance of being chosen.

- The element of bias exists.

- Unfavourable because it will affect the reliability and accuracy of the study.

- Conclusions made are restricted to the sample only; they cannot be extended to the actual population.
1.8.3.1 CONVENIENCE/CHUNK SAMPLING
- Use the elements most readily available to obtain results
quickly.
- Eg.: An opinion poll may be conducted in a few
hours by collecting information from certain
shoppers at a single shopping mall.
- Advantages:
(i) the least expensive
(ii) the least time consuming
(iii) does not require a sampling frame
- Disadvantages: not representative of the
population, introduces bias in the researcher’s
classification of samples
- Recommended for: pretesting questionnaires,
generating ideas, insights or hypotheses.
1.8.3.2 JUDGMENTAL SAMPLING
- Members are selected based on the judgement or expertise of the
researcher, who believes that these elements are
representative of the population of interest.
- Eg.: If a researcher wants to find out what it takes for women managers to make it
to the top, the potential respondents would be those in top management
positions.
- Advantages:
(i) the least expensive
(ii) the least time consuming
(iii) sample guaranteed to meet specific objectives
- Disadvantages: Introduces bias in the researcher’s classification of samples
1.8.3.3 QUOTA SAMPLING
- Similar to convenience sampling except that the number of members
allocated to each group of respondents is based on the
population statistics or the researcher’s choices.
- Advantages: quick and cheap to organise
- Disadvantages: not as representative of the population as
a whole as other sampling methods
- Eg.: In a city, there are 48% men and 52% women. If a
total of 1000 residents needs to be interviewed, then a
quota of 480 men and 520 women will form the sample.
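
The quota calculation in this example, and the way respondents are accepted until each quota is full, can be sketched in Python. The function names and the stream of respondents are hypothetical and serve only to illustrate the idea.

```python
def quotas(percentages, total):
    """Number of respondents required per group, from population percentages."""
    return {group: round(total * pct / 100) for group, pct in percentages.items()}

def fill_quotas(respondents, quota):
    """Accept respondents in the order they are met until every quota is full."""
    counts = {group: 0 for group in quota}
    accepted = []
    for name, group in respondents:
        if counts.get(group, 0) < quota.get(group, 0):
            accepted.append((name, group))
            counts[group] += 1
        if counts == quota:
            break
    return accepted

quota = quotas({"men": 48, "women": 52}, 1000)
print(quota)  # {'men': 480, 'women': 520}

# Hypothetical small run: a quota of 2 men and 3 women from passers-by.
passers_by = [("P1", "men"), ("P2", "men"), ("P3", "men"),
              ("P4", "women"), ("P5", "women"), ("P6", "women")]
print(fill_quotas(passers_by, {"men": 2, "women": 3}))
```

The third man (P3) is turned away because the quota for men is already full, which shows why the selection is not random.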
1.8.3.4 SNOWBALL SAMPLING
- An initial group of respondents is selected, usually at random. Then,
respondents are asked to recommend others who belong to the target
population of interest.
- Probability sampling is used to select the initial respondents. The final
sample is a nonprobability sample because the referrals will have
demographic and psychographic characteristics more similar to the persons
referring them than would occur by chance.
- Advantages: can estimate rare characteristics
- Disadvantages: not as representative of the population as a whole as other sampling methods
1.9 DATA COLLECTION METHODS
Face-to-face Interview/Personal Interview
- approach a selected person, ask questions from the questionnaire and record the answers.

Advantages
- can clarify any doubts from the respondent
- can note specific reactions from respondent
- respondent will respond spontaneously
- incorrect information can be detected
Disadvantages
- expensive
- interviewer’s gestures can affect the response obtained
- errors in recording the response
- interviewer needs to be supervised
Telephone Interview
Advantages
- less expensive
- easy to monitor (as long as specified interview procedure is being followed)
Disadvantages
- lower response rate
- cannot provide any help
- fewer questions may be asked
- restrict ourselves to individuals who can be reached by phone
Direct Questionnaire/Mail/E-mail Questionnaire/Online Survey
Advantages
- low cost
- need not monitor the interviewers
- enough time is given to the respondents to answer before the specified deadline
- no gestures from the interviewers to affect the response obtained
Disadvantages
- lower rate of response
- cannot provide any help
- those who do answer the questionnaire may not always be the ones to whom the questionnaire is addressed (non-representative sample)
Direct Observation
- used in many studies that do not involve measurements on individuals.
- Eg. Observing what Malaysians do while waiting to be called in an
outpatient government clinic.

Advantages
- not affected by respondents
Disadvantages
- inconsistencies among the observers
1.10 QUESTIONNAIRE DESIGN
• A questionnaire is a set of questions designed to generate the data for achieving the objectives of a research project.
• It must be able to provide two-way communication between the respondent and the researcher to ensure the accuracy of the data collected.

A good questionnaire must be well designed:
• Introduce the questionnaire
• Keep the questionnaire as short as possible
• Provide clear and understandable instructions
• Space out the questions
• Ask short, simple, and clearly worded questions
• Start with demographic/general questions to help respondents get started comfortably
• Use dichotomous and multiple choice questions
• Avoid using double-barreled questions (questions that include two questions in one)
• Use open-ended questions cautiously
• Allow space on the right of the page for coding
• It is useful to pretest a questionnaire
