
Tutorial Introduction To Statistics

Statistics is the science of collecting, organizing, and analyzing numerical data. It is used in many fields such as medicine, psychology, business, and government. There are two main types of statistics: 1. Descriptive statistics involves gathering and presenting quantitative information about data that have been collected. It summarizes data from a sample but does not make broader generalizations. 2. Inferential statistics uses sample data to draw conclusions or make predictions about a larger population. It involves methods that help decision-makers use sample results to generalize to the overall population. Examples include determining whether a new product will be successful, or analyzing unemployment and inflation data to inform economic decisions. Statistics has become an important tool across many disciplines.

Chapter 1 What Is Statistics

Content
1.1 Introduction
1.2 What Is Statistics?
1.3 Types Of Statistics
1.4 Population Versus Sample
1.5 Types (Or Sources) Of Data
1.6 Types Of Variable
1.7 Scales Of Measurement
1.8 Sampling Techniques
  1. Random (Or Probabilistic) Sample Or Non-Random (Non–Probabilistic) Sample
  2. Systematic Sampling (Interval Random Sampling, Quasi Random Sampling)
  3. Stratified Sampling
  4. Cluster Sampling (Area Sampling, One Stage)
  5. Multistage Sampling
1.9 Data Collection Methods
1.10 Designing A Questionnaire
Practice Exercises

When you have completed this chapter, you will be able to:
• understand what statistics is;
• understand, recognize and distinguish the basic terms used in statistics;
• have an idea of how to carry out a statistical project.

1.1 Introduction
People have been recording and using data for a very long time; for example, the governments of ancient Babylonia, Egypt and Rome gathered detailed records of population and resources. The word statistics comes from the Italian word statista (meaning "statesman"). It was first used by Gottfried Achenwall (1719-1772), a professor at Marburg and Göttingen. Dr. E.A.W. Zimmermann introduced the word statistics to England. Its use was popularized by Sir John Sinclair in his work "Statistical Account of Scotland 1791-1799".

Today, statistics has become an important tool in the work of many academic disciplines such as medicine, psychology, education, sociology, engineering and physics, just to name a few. Statistics is also important in many aspects of society such as business, industry and government. Because statistics is used in so many areas of our lives, it has become very desirable to understand and practice statistical thinking, even if you do not use statistical methods directly.

The correct use of statistical techniques enables the decision maker to draw useful conclusions from a set of data. Virtually every area of serious scientific inquiry can benefit from statistical analysis. For example:
1. Marketing research – statistics is of invaluable assistance in determining whether a new product is likely to prove successful.
2. Each month, government statistical offices release the latest numerical information on unemployment and inflation. Economists and financial advisors, as well as policy makers in government and business, study these data in order to make informed decisions.
3. Medical research – researchers concerned about the effectiveness of a new drug rely on statistics as evidence of the success of their research efforts.

Statistics in the dictionary:
a. Webster's New World Dictionary:
1. facts or data of a numerical kind, assembled, classified, and tabulated so as to present significant information about a given subject.
2. the science of assembling, classifying, and tabulating such facts or data.
b. Merriam-Webster's Collegiate Dictionary:
1. a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.
2. a collection of quantitative data.


1.2 What is statistics?
The word statistics means different things to different people in everyday life. As a field of study, statistics is the science of
i. collecting,
ii. organizing,
iii. presenting,
iv. analyzing, and
v. interpreting numerical data
for the purpose of assisting in making more effective decisions. The decision-making process must not be based on personal opinion or belief.

Statistics will enable you to become a proficient data producer and an efficient data user. It extends to almost every realm of human endeavor. Statistics is a powerful method for getting answers from data, and it is sometimes the best way to persuade others that your conclusions are correct.

1.3 Types of Statistics
Statistics can be divided into two categories: Descriptive Statistics and Inferential Statistics.

1. Descriptive Statistics - The process of collecting, compiling, summarizing and presenting data in graphical forms such as charts, graphs and tables, or in numerical forms such as averages and percentages derived from them, so that one can evaluate the data set easily.

Descriptive statistics include a large variety of methods for summarizing or describing a set of numbers. These methods may involve computational or graphical analysis. For example, price index numbers are a descriptive statistic. Measures of central tendency and dispersion are also descriptive statistics, because they describe the nature of the data collected.

Example of descriptive statistics:
(i) The percentage growth of Malaysia's population from one decade to the next.
(ii) The use of pictorial displays, e.g. bar charts and pie charts.
(iii) The average income of the 104 families in our company is RM28,673 per annum.

In descriptive statistics, our objective is to describe the properties of a group of scores or data that we have "in hand," i.e., data that are accessible to us in that we can write them down on paper or type them into a spreadsheet. In descriptive statistics we are not interested in other data that were not gathered but might have been; that is the subject of inferential statistics.

2. Inferential Statistics - A decision, estimate, prediction, or generalization about a population based on a sample. It consists of methods that use sample results to help make decisions or predictions about a population.

Example of inferential statistics:
(i) Based on a sample survey by a lecturer at a higher learning institution, only 45% of diploma graduates further their studies in a Bachelor's program at a local IPTA.
(ii) When the Department of Labor (Jabatan Buruh) uses the average income of a sample of several hundred workers to estimate the average income of all 3 million workers, it is using a simple form of inferential statistics.
(iii) A sample of 512 families from a district indicates, with 95% confidence, that the average family income in the district is between RM2,518 and RM2,932.

Comparison between descriptive statistics and inferential statistics
Descriptive statistics:
1. describe the data set.
2. are concerned with describing and summarizing a sample.
Inferential statistics:
1. use the data to draw conclusions about the population.
2. are concerned with going beyond the sample to make predictions about the population from which the sample is drawn.

1.4 Population versus sample
In statistics, a population is the entire collection of all observations of interest to the researcher. It consists of all elements, individuals, items or objects whose characteristics are being studied. A population can be finite or infinite.

A sample is a portion, part or subset of the population of interest.

As an illustration, suppose we want to estimate a characteristic of a population, such as the average weight of all 30-year-old men in Perlis. If 100 men are selected at random from each district, then the selected men form a sample.
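The ideas of population, sample, descriptive statistics and inferential statistics can be tied together in a short Python sketch. This is a minimal illustration only, assuming a hypothetical population of 5,000 body weights generated at random (not real data from Perlis). The 1.96 multiplier is the usual large-sample normal approximation for a 95% confidence interval, used here as one simple example of inference.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical population: weights (kg) of 5,000 men. In a real study these
# values are unknown; they are simulated here only so the parameter can be shown.
population = [rng.gauss(70, 8) for _ in range(5000)]
mu = statistics.mean(population)            # parameter: fixed, usually unknown

# Sample: a subset of the population, drawn by simple random sampling.
sample = rng.sample(population, 100)

# Descriptive statistics: summarize the data "in hand".
n = len(sample)
x_bar = statistics.mean(sample)             # statistic: sample mean
s = statistics.stdev(sample)                # sample standard deviation

# Inferential statistics: approximate 95% confidence interval for the
# population mean, x_bar +/- 1.96 * s / sqrt(n) (large-sample normal approximation).
margin = 1.96 * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"population mean mu (parameter): {mu:.2f} kg")
print(f"sample mean x_bar (statistic):  {x_bar:.2f} kg")
print(f"sampling error x_bar - mu:      {x_bar - mu:+.2f} kg")
print(f"approximate 95% CI for mu:      {lower:.2f} to {upper:.2f} kg")
```

In practice only the sample would be available; the simulated population is there so the sampling error (statistic minus parameter) can be seen directly.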


Statistical terms related to population and sample:
1. A parameter is a numerical characteristic or measure of a population; it is a fixed and usually unknown quantity. For example, the percentage (%) of voters, p, in Kedah who think the Government is doing a good job of controlling unemployment among fresh university graduates.
2. A census is a study of the entire population. Data are gathered on every member of the population.
3. A sample survey is a study of some selected portion of the population.
4. A statistic is a numerical characteristic of the sample data, such as the mean, proportion or variance, that provides an estimate of the corresponding population parameter.
5. An element (experimental unit) is an object (person or thing) on which measurements are taken.
6. A pilot study is a pretest or trial run on a small number of elements (respondents) before conducting the actual survey. Its objectives are:
i. to improve the questionnaire;
ii. to identify problems that may occur during the survey;
iii. to predict the cost, time and workforce needed.

1.5 TYPES (OR SOURCES) OF DATA
Data are the values of the characteristics of the elements under study. We collect data for the following reasons:
1. Obtain Input to a Research Study
2. Measure Performance
3. Assist in Formulating Decision Alternatives
4. Satisfy Curiosity
5. Knowledge for the Sake of Knowledge

There are two classifications of data: primary data and secondary data.

1. Primary data (or raw data)
Data gathered by the researcher directly from respondents are primary data. They are exhaustive and exclusive. Data that are not arranged or organized in any manner are called a set of raw data. Normally, researchers collect primary data through surveys, experiments or observation.

Advantages and disadvantages of primary data
Advantages:
1. More accurate, reliable and up-to-date.
2. If the data needed by decision makers are not available from other sources (secondary data), primary data have to be gathered.
3. Primary sources usually explain how the data were gathered and what limitations exist to their use.
4. Usually satisfies the objectives of a research study.
Disadvantages:
1. Very costly.
2. Time consuming.
3. Requires a lot of manpower.

2. Secondary data
Secondary data are primary data that have been collected, processed and published for the use of other people. There are various ways to obtain secondary data, including:
i. newspapers, magazines and books;
ii. economic reports (e.g. Laporan Bank Negara);
iii. statistical abstracts;
iv. annual reports of companies;
v. the Statistics Department;
vi. online sources;
vii. other sources.

Advantages and disadvantages of secondary data
Advantages:
1. Require less time.
2. Require less effort.
3. Inexpensive data source.
Disadvantages:
1. May contain errors in printing and in transcription from the primary sources.
2. The conditions under which the data were collected and summarized are not known.

1.6 TYPES OF VARIABLE
1. Definition of a variable
Measurement, the assigning of numbers or codes according to prior-set rules, is central to science, the scientific method, and statistical analyses.


The statistical operations that can be performed depend on the nature of the data collected and the type of measurement used. This is crucial to make sure that no assumptions of a statistical analysis are violated and to avoid drawing misleading conclusions.

A variable is a measurement that can vary, i.e. take more than one value, during a study. A variable is also defined as a characteristic (distinguishing feature) of the elements in a population or a sample under study. Thus, variables represent the general "thing" being measured and not any specific value or code.

Examples are:
i. If the DIB Course Tutor is interested in the percentage spent by his students after receiving PTPTN, the variable is the amount of money spent.
ii. In a study concerning the income of wage earners among all Northern Region UiTM Branch Campus graduates, the variable is income.
iii. If a researcher measures the weight of 30 subjects, then weight is a variable.
iv. Examples of variables for humans are height, weight, number of siblings, sex, marital status and eye color.

2. Types of variable (Quantitative variable and Qualitative variable)
i. Quantitative (numerical) variable: a variable in numerical form (it can be measured numerically), such as income, age, height and weight. It is measurable or countable. The data collected on a quantitative variable are referred to as quantitative data. They are always numeric and indicate either how much or how many.
There are two types of quantitative variable:
a. Discrete variable
A variable that can take countable values, which form a finite (or countably infinite) set of numbers. It can assume only certain values, with no intermediate values. Examples are:
i. the number of students in the library;
ii. the number of cars sold by Proton weekly;
iii. household size (the Statistics Department collects data on household size and publishes the information in Current Population Reports).
b. Continuous variable
A continuous variable can assume any value over a certain range, and we cannot count these values. Continuous quantitative data are measured and recorded to some degree of accuracy, so the values are approximate. Examples are distance, weights and heights of individuals, temperature in a room and liters of gasoline pumped.

ii. Qualitative (attribute) or categorical variable
A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories is called a qualitative or categorical variable. It is a non-numerical valued variable. The data collected on such a variable are called qualitative data. Such data are inherently discrete, in that there are a finite number of possible categories into which each observation may fall. Examples are:
i. Color of eyes: blue, green, brown etc.
ii. Exam result: pass or fail.
iii. Socio-economic status: low, middle or high.
iv. Blood type: humans are classified as having one of four blood types, A, B, AB or O.
v. Level of education.
vi. Makes of a computer.
vii. Mobile phone operators.
viii. Types of occupation.

Remark
Because qualitative data always have a limited number of alternative values, such variables are also described as discrete. All qualitative data are discrete, while some numeric data are discrete and some are continuous.

1.7 SCALES OF MEASUREMENT
Statistics deals with measurements that are either quantitative or qualitative. The measurements are the actual values (numbers or codes) of a variable. Variables differ in "how well" they can be measured, i.e., in how much measurable information their measurement scale can provide. Variables can be classified on the basis of their level of measurement. The way we classify variables greatly affects how we can use them in our analysis.

There are four generally used scales of measurement, listed here from weakest to strongest.

1. Nominal – classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.
Nominal measurement consists of assigning items to groups or categories. No quantitative information is conveyed and no ordering of the items is implied. Nominal scales are therefore qualitative rather than quantitative. Religious preference, race, and sex are all examples of nominal scales. Frequency distributions are usually used to analyze data measured on a nominal scale. The main statistic computed is the mode. Variables measured on a nominal scale are often referred to as categorical or qualitative variables.
For example, all we can say is that two individuals are different in terms of variable A (e.g., they are of different races), but we cannot say which one "has more" of the quality represented by the variable. Typical examples of nominal variables are gender, race, color, city, etc.

2. Ordinal – classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
Measurements with ordinal scales are ordered in the sense that higher numbers represent higher values. However, the intervals between the numbers are not necessarily equal. For example:
i. On a five-point rating scale measuring attitudes toward gun control, the difference between a rating of 2 and a rating of 3 may not represent the same difference as the difference between a rating of 4 and a rating of 5. There is no "true" zero point for ordinal scales since the zero point is chosen arbitrarily. The lowest point on the rating scale in the example was arbitrarily chosen to be 1. It could just as well have been 0 or -5.
ii. Exam results.
iii. Socio-economic status.
iv. During a taste test of 4 colas, cola C was ranked number 1, cola B was ranked number 2, cola A was ranked number 3, and cola D was ranked number 4.

3. Interval – ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
On interval measurement scales, one unit on the scale represents the same magnitude of the trait or characteristic being measured across the whole range of the scale. Interval scales do not have a "true" zero point, however, and therefore it is not possible to make statements about how many times higher one score is than another. For example:
i. If anxiety were measured on an interval scale, then a difference between a score of 10 and a score of 11 would represent the same difference in anxiety as would a difference between a score of 50 and a score of 51. For the anxiety scale, it would not be valid to say that a person with a score of 30 was twice as anxious as a person with a score of 15.
ii. The Celsius scale for temperature. Equal differences on this scale represent equal differences in temperature, but a temperature of 30 degrees is not twice as warm as one of 15 degrees.
iii. IQ – there is a meaningful difference between 109 and 110.
Many statistical techniques can be applied to interval data. The measure of central tendency is the arithmetic mean, and the measures of dispersion are the range, the standard deviation and the variance.

4. Ratio – possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
The ratio scale is the most flexible; consequently, all descriptive and inferential techniques are applicable. It has the characteristics of the nominal, ordinal and interval scales plus a true zero point (e.g. weight and height). Examples are:
i. money;
ii. heights and weights of UiTM volleyball players;
iii. the Kelvin scale of temperature. This scale has an absolute zero. Thus, a temperature of 300 Kelvin is twice as high as a temperature of 150 Kelvin.

Example of measurement scales (for comparison)
Nominal-level data: zip code; gender (male, female); eye color (blue, brown, green, hazel); political affiliation; religious affiliation.
Ordinal-level data: grade; judging; rating scale; ranking of items or persons; socio-economic status.
Interval-level data: scores (MUET, IQ, EQ); temperature.
Ratio-level data: height; weight; time; money (salary, wages); age.
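The consequences of the measurement scale can be checked with Python's standard `statistics` module. The sketch below compares a Celsius ratio (interval scale) with the same temperatures in Kelvin (ratio scale), then picks an average suited to each scale. The data values are made up for illustration, and the helper `central_tendency` is hypothetical. Using the median for ordinal data (via `median_low`, so the result is always an observed category) is common practice and is labelled here as an assumption.

```python
import statistics

# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful.
c_warm, c_cool = 30.0, 15.0
print(c_warm / c_cool)             # 2.0, yet 30 degrees C is not "twice as warm" as 15

# Ratio scale: Kelvin has an absolute zero, so the same two temperatures
# can be compared as a ratio once converted.
k_warm, k_cool = c_warm + 273.15, c_cool + 273.15
print(round(k_warm / k_cool, 3))   # 1.052


def central_tendency(values, scale):
    """Hypothetical helper: return an average suited to the measurement scale."""
    if scale == "nominal":
        return statistics.mode(values)         # most frequent category
    if scale == "ordinal":
        return statistics.median_low(values)   # middle rank, always an observed value
    if scale in ("interval", "ratio"):
        return statistics.mean(values)         # relies on equal intervals
    raise ValueError(f"unknown scale: {scale!r}")


# Hypothetical observations on three different scales.
eye_color = ["brown", "blue", "brown", "green", "brown"]   # nominal
rating = [2, 3, 3, 4, 5]                                   # ordinal, 1-5 rating scale
height_cm = [160.0, 172.5, 168.0, 181.0, 175.5]            # ratio

print(central_tendency(eye_color, "nominal"))   # brown
print(central_tendency(rating, "ordinal"))      # 3
print(central_tendency(height_cm, "ratio"))     # 171.4
```

Asking for the mean of the eye colors would fail in Python, which mirrors the point above: arithmetic that the scale does not support gives no meaningful answer.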


1.8 SAMPLING TECHNIQUES

1. Sampling
In practice, a census is rarely taken because it is very expensive and time consuming. Furthermore, in many cases it is impossible to identify each member of the target population. What you want to do is select a sample of the population and make inferences from that sample, i.e. generalize from the sample to the population.

2. Advantages of sampling
i. It cuts the cost of research. The more data to be collected and handled, the higher the cost; sampling therefore results in fewer expenses.
ii. It cuts the time (period) of research. Sampling can produce adequate information in a shorter period.
iii. Accuracy of sample results. With proper sampling techniques, a small sample can provide information that is almost as accurate as that resulting from a census.
iv. Suitable for destructive tests. Such tests are often employed in testing product quality, where the product is destroyed. For example:
• testing of items produced, such as cars and cables;
• testing of cholesterol in blood.
v. Reduction of non-sampling errors (errors that arise from the survey itself, e.g. non-response from respondents, respondents giving false information, errors in entering the data, etc.).

Basic statistical terms in sampling
1. A sampling frame is a list of all the elements in the population under study (e.g. I.C. number, name and address).
2. A sampling unit is an element listed in the frame.
3. The procedure for selecting the sample is known as the sample survey design.
4. Sampling error is the difference between a sample statistic and its corresponding population parameter.
5. Non-sampling error:
i. occurs in collecting, recording and tabulating data (coding and data-entry errors, faulty measuring devices);
ii. is a case of human mistakes, for example response errors (non-response or giving false information from respondents);
iii. arises from an inadequate sampling frame;
iv. arises from field errors.

Sampling Techniques – The design of the sample
1. Random (or Probabilistic) sample or Non-Random (Non-Probabilistic) sample
Depending on how a sample is drawn, it may be a random sample or a non-random sample. A random sample is a sample drawn in such a way that each member of the population has some chance of being selected in the sample. In a non-random sample, some members of the population may not have any chance of being included in the sample.

2. Random (or Probabilistic) sample
The choice of probability sampling method depends on:
i. the availability of a sampling frame;
ii. the spread of the population;
iii. the estimated cost of surveying members of the population;
iv. the analysis of the data.
A probability sample design should consider two important objectives:
i. to minimize the sampling error of the estimates for the most important survey variables;
ii. at the same time, to minimize the time and cost of conducting the survey.
We are going to discuss the following probabilistic sampling methods:

1. Simple Random Sampling (SRS)
SRS is used when true random sampling is essential. A list of all elements of the population, i.e. the sampling frame, is needed. The target population must have the same characteristics (homogeneous).
A simple random sample can be selected using random numbers or by drawing tickets.
i. The lottery method – every unit of the population is identified by a numbered disc or slip. The discs or slips are well mixed and then the required number of samples is chosen.
ii. Random tables – these are tables produced for sampling, in which random numbers are given for populations.
Example of SRS:
i. Using the lottery (lucky draw) method


You want to select 40 students from a class of 150:
a. List the 150 names.
b. Then, write the numbers from 1 to 150 on individual slips of paper. Put all the slips in a box and mix them thoroughly.
c. Next, draw one slip randomly from the box. Repeat this forty times, without returning drawn slips to the box. The forty names corresponding to the numbers drawn make up the simple random sample.

ii. Using a random table
Our sample size is 15 and the population is 758 households. The procedure to choose 15 households out of 758 is:
a. List all 758 households.
b. Next, refer to the Table of Random Numbers. We can start anywhere in the table; let's pick line 3, column 30. Three digits are read at a time since 758 has three digits. Then scan downward, recording every number from 001 to 758 and skipping numbers outside this range or already recorded. The numbers are 069, 386, 539, 303, 097, 628, 458, 009, 036, 652, 694, 024, 178, 578 and 404 (refer to the procedure that follows the Table of Random Numbers). The fifteen households corresponding to these numbers in the list make up the simple random sample.

[Figure: Table of Random Numbers, with the procedure used by a researcher to obtain 15 random numbers between 1 and 728 from the table.]

In practice, true random sampling is not possible unless there is a good sampling frame and the population is fairly small.

Advantages and Disadvantages of SRS
Advantages:
1. We do not need to know the characteristics of the population; only minimal knowledge of the population is required.
2. It tends to be representative, i.e. it gives a fairly good, unbiased sample.
3. Simple to apply.
4. Analysis of the data is reasonably easy and has a sound mathematical basis.
Disadvantages:
1. A complete list is difficult to obtain.
2. There is always a chance of drawing a misleading sample.
3. Needs a larger sample size.
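Both ways of drawing a simple random sample described above can be imitated in Python. In this sketch the function names `lottery_sample` and `random_table_sample` are hypothetical. The lottery method is modelled with `random.Random.sample`, which draws without replacement like slips that are not returned to the box; the table method reads a string of digits in fixed-width groups. The digit string reuses the fifteen numbers from the household example, with two invented rejects (912, which exceeds 758, and a repeated 386) added to show how they are skipped.

```python
import random


def lottery_sample(population_size, sample_size, seed=None):
    """Lottery method: number the units 1..N, mix the slips, draw n without replacement."""
    slips = list(range(1, population_size + 1))
    return random.Random(seed).sample(slips, sample_size)


def random_table_sample(digits, population_size, sample_size):
    """Random-number-table method: read the digits in fixed-width groups and keep
    numbers from 1 to population_size, skipping out-of-range values and repeats."""
    width = len(str(population_size))           # 758 has 3 digits, so read 3 at a time
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen


# Lottery example: 40 students out of a class of 150.
students = lottery_sample(150, 40, seed=2024)
print(sorted(students))

# Random-table example: 15 households out of 758 (912 and the second 386 are invented rejects).
groups = ["069", "386", "912", "539", "303", "386", "097", "628", "458",
          "009", "036", "652", "694", "024", "178", "578", "404"]
households = random_table_sample("".join(groups), 758, 15)
print(households)
# [69, 386, 539, 303, 97, 628, 458, 9, 36, 652, 694, 24, 178, 578, 404]
```

In a real survey the digit stream would come from a printed table or a software generator; the function only encodes the reading rule.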


2. Systematic Sampling (Interval random sampling, Quasi random sampling)
Systematic sampling is appropriate where very large numbers are included in the target population and simple random sampling is difficult, or where lists are already grouped into sections or classes. A list of all elements of the population, i.e. the sampling frame, is needed. The target population must have the same characteristics (homogeneous).

From a random starting point, select every kth unit or individual until the intended sample size is obtained.
• Steps for implementing systematic random sampling:
i. Divide the population size by the sample size and round the result down to the nearest whole number, k:
   Interval, $k = \lfloor N / n \rfloor$, where N is the population size and n is the sample size.
ii. Use a random number table or the lottery method (or a similar device) to obtain a number, m, between 1 and k.
iii. Select for the sample those members of the population that are numbered m, m + k, m + 2k, ... until the intended sample size is obtained.

Example
In a survey covering a population of 10,000, it may be decided to take a sample of 250.
i. The sampling interval will be k = 10,000/250 = 40.
ii. A random number between 1 and 40 is chosen (using the lottery method, a random table or mathematical/statistical software). Suppose it is 3.
iii. The sampling series then becomes 3, 43 (3 + 40), 83 (43 + 40), etc., until we reach 250 elements.

Advantages and Disadvantages of Systematic Sampling
Advantages:
1. It is usually quicker than SRS.
2. It is easier to draw without mistakes.
3. Simple to apply.
4. It can be more precise than simple random sampling, as the sample is spread more evenly over the population.
Disadvantages:
1. It is not perfectly random, since the first number chosen predetermines the other elements.
2. It is not suitable if there is some pattern occurring at regular intervals, or a prearranged list, that coincides with the sampling interval in the ordering of the population elements.

3. Stratified Sampling
Stratified sampling also requires a sampling frame. In a stratified random sample, we first divide the population into subpopulations (non-overlapping groups), which are called strata. Elements within each stratum must have similar characteristics, but the strata must differ from one another, i.e. the population is heterogeneous. Then, a sample is selected from each of these strata using SRS or systematic sampling. Usually, the sizes of the samples selected from different strata are proportionate to the sizes of the subpopulations in these strata. The collection of all samples from all strata gives the stratified random sample.

Reasons for Stratification
• To increase sample efficiency (i.e. lower sampling variance).
• To ensure that certain key subgroups will have a sufficient sample.
• The creation of strata permits the use of different sample designs for different portions of the population.

Example 1
Suppose you want to know the mean income of employees in a factory. You recognize that the wages of managers are likely to be different from those of production workers or supervisors. Therefore, the population under study is heterogeneous in terms of wages. In order to get a sample that represents the whole population, we categorize the employees based on their job characteristics.

Job                   No. of employees     No. of employees
                      in the population    in the sample
Managers                      5                    1
Supervisors                  10                    3
Production workers          185                   46
Total                     N = 200              n = 50

The number of employees in the sample for each stratum must be proportional to the number of employees in the population for that stratum (here 50/200 = 25% of each stratum, rounded to whole employees so that the total is 50). Samples from each job category or stratum can be selected by simple random or systematic sampling.
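The systematic selection steps and the proportional allocation used in the stratified examples can be sketched together in Python. The function names are hypothetical, and the largest-remainder rounding rule is an assumption: the notes do not say how the exact shares 1.25, 2.5 and 46.25 were rounded, but this rule reproduces the 1, 3 and 46 shown in the table above.

```python
import math
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Systematic sampling: random start m in 1..k, then every kth unit."""
    k = population_size // sample_size          # interval k = N / n, rounded down
    m = random.Random(seed).randint(1, k)       # random start between 1 and k
    return [m + j * k for j in range(sample_size)]


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Fractional shares are rounded with the largest-remainder rule (an assumption),
    so the stratum sample sizes always add up to n.
    """
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: math.floor(share) for name, share in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Systematic example: N = 10,000 and n = 250, so k = 40.
units = systematic_sample(10_000, 250, seed=7)
print(units[:3], "...", units[-1])

# Stratified Example 1 (factory employees, N = 200, n = 50).
print(proportional_allocation(
    {"Managers": 5, "Supervisors": 10, "Production workers": 185}, 50))
# {'Managers': 1, 'Supervisors': 3, 'Production workers': 46}

# Stratified Example 2 (voter shares in percent, n = 500).
print(proportional_allocation({"Upper": 10, "Middle": 70, "Low": 20}, 500))
# {'Upper': 50, 'Middle': 350, 'Low': 100}
```

Because k is rounded down, the last selected unit m + (n - 1)k never exceeds N. The units within each stratum would then be chosen by SRS or systematic sampling, as the notes describe.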


Example 2

A town council is planning to build a swimming pool. A town planner needs to sample voter sentiment on using public funds to build the pool. We divide the voters into three strata: upper-income, middle-income and low-income. Of the population, 10% is in the upper-income group, 70% in the middle-income group and 20% in the low-income group.

If the sample size needed were 500, the numbers of upper-income, middle-income and low-income individuals sampled would be 50 (10% of 500), 350 (70% of 500) and 100 (20% of 500), respectively.

Advantages and Disadvantages of Stratified Sampling

Advantages
1. We can be sure no relevant group is omitted.
2. Greater precision is possible with a smaller sample size.
3. If data of known precision are wanted for certain subdivisions of the population, then each subdivision or stratum can be treated as a population.
4. Administratively easy: administrative convenience may dictate its use, so that each field office can supervise one stratum.
5. Sampling problems may differ markedly within a population (e.g. people in prisons and people outside).
6. Stratification will almost certainly produce a gain in precision in the estimates for the whole population, because a heterogeneous population is split into fairly homogeneous strata.

Disadvantages
1. We need to know about the population.
2. The proportions must be known.
3. Problems arise if the strata are not clearly defined.
4. Analysis is (or can be) quite complicated.
5. If the variable is somewhat complex or ambiguous (such as beliefs, attitudes or prejudices), it is difficult to separate individuals into subgroups according to these variables.
6. Difficulty in locating cases: if there are many variables of interest, dividing a large population into representative subgroups requires a great deal of effort.

Rationales for using stratified sampling over SRS are:
i. the cost per observation in the survey may be reduced;
ii. estimates of the population parameters may be wanted for each sub-population;
iii. increased accuracy at a given cost.

4. Cluster Sampling (Area Sampling, One Stage)

In cluster sampling, the whole population is first divided into groups (usually geographical, preexisting or natural) called clusters. Elements within each cluster should be as different from one another as possible, while the clusters should be as similar to one another as possible, so that each cluster is representative of the population. The list of all clusters is the sampling frame for this population. A random sample of clusters is then selected (using SRS, systematic or stratified sampling). Finally, all elements from each of the selected clusters are included in the sample.

Example 3

A researcher is interested in sampling the attitudes of persons living in a large city. He finds that it is difficult to get a list of all the people living in the city. However, the city can be divided into 9 sections, and he has the list of all 9 sections. He decides to select 2 sections at random as the sample and will survey all of the people in these two areas.

Advantages and Disadvantages of Cluster Sampling

Advantages
1. Lower cost, for example reduced field costs.
2. Applicable where no complete list of units is available (special lists need only be formed for the selected clusters).
3. It can simplify fieldwork.
4. It is convenient.

Disadvantages
1. Clusters may not be representative of the whole population; their members may be too alike.
2. Analysis is more complicated than for simple random sampling.
3. Elements in a cluster may not have the same variation in characteristics as elements selected individually from the population.

Differences between cluster and stratified sampling

Cluster
1. The cluster is treated as the sampling unit, so the analysis is done on a population of clusters.
2. Only the selected clusters are studied.
3. The main objective is to reduce costs by increasing sample efficiency.

Stratified
1. The analysis is done on elements within strata.
2. A random sample is drawn from each of the strata.
3. The main objective is increased precision.

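Examples 1 and 2 both use proportional allocation. Example 1 splits employees by job (N = 200, n = 50); Example 2 splits voters by income (n = 500). In each case, stratum h receives a sample size of n_h = n × N_h / N. The Python sketch below is one way to compute this, under one stated assumption. The notes do not say how fractional sizes are rounded, so the sketch uses the largest-remainder rule: round every stratum down, then give the leftover units to the strata with the largest fractional parts. This rule happens to reproduce the 1, 3, 46 split in the job table. The function name `proportional_allocation` is hypothetical.

```python
from math import floor


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to size.

    stratum_sizes maps each stratum to its population size N_h (or to any
    numbers in the same ratio, such as percentages). Each stratum first
    gets floor(n * N_h / N); leftover units go to the strata with the
    largest fractional parts (largest-remainder rule, an assumption).
    """
    N = sum(stratum_sizes.values())
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: floor(q) for h, q in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand out the remaining units, largest fractional part first.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


# Example 1: 200 employees, sample of 50
print(proportional_allocation(
    {"Managers": 5, "Supervisors": 10, "Production workers": 185}, 50))
# {'Managers': 1, 'Supervisors': 3, 'Production workers': 46}

# Example 2: 10%, 70% and 20% of voters, sample of 500
print(proportional_allocation({"Upper": 10, "Middle": 70, "Lower": 20}, 500))
# {'Upper': 50, 'Middle': 350, 'Lower': 100}
```

Only the relative sizes matter, so Example 2 can be entered as the percentages 10, 70 and 20. The selection within each stratum would then be made by simple random or systematic sampling, as the notes describe.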

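The Python sketch below illustrates one-stage cluster sampling as in Example 3. It selects whole clusters by SRS from the list of clusters, then keeps every element of each selected cluster. This is an illustrative sketch only: the function names, city sections, resident IDs and box contents are all invented. The proportion estimate (total "successes" divided by total elements in the chosen clusters) is a simple ratio estimate assumed here, not one given in the notes.

```python
import random


def one_stage_cluster_sample(clusters, m, rng=random):
    """One-stage cluster sampling.

    `clusters` maps each cluster label to the list of its elements, so the
    labels form the sampling frame. m clusters are chosen by simple random
    sampling and every element of each chosen cluster is kept.
    """
    chosen = rng.sample(sorted(clusters), m)  # SRS of cluster labels
    return {label: list(clusters[label]) for label in chosen}


def cluster_proportion(selected):
    """Estimate a population proportion from the selected clusters.

    Elements are coded 1 (has the attribute, e.g. defective) or 0; the
    estimate is the total count of 1s divided by the total number of
    elements in the selected clusters.
    """
    total = sum(len(elements) for elements in selected.values())
    hits = sum(sum(elements) for elements in selected.values())
    return hits / total


# Hypothetical city for Example 3: 9 sections with invented resident IDs
city = {f"Section {i}": [f"S{i}-R{j}" for j in range(1, 51)] for i in range(1, 10)}
sample = one_stage_cluster_sample(city, 2, rng=random.Random(7))
for section, residents in sample.items():
    print(section, len(residents))  # all 50 residents of each chosen section

# Hypothetical boxes, one per production line: 1 = defective, 0 = good
boxes = {"Line A": [0, 0, 1, 0, 0], "Line B": [0, 1, 0, 0, 0], "Line C": [0, 0, 0, 0, 1]}
print(cluster_proportion(one_stage_cluster_sample(boxes, 2, rng=random.Random(7))))  # 0.2
```

In the invented boxes, every line has the same defect rate (1 in 5), so any two boxes give the same estimate. Example 5, later in this chapter, describes this situation: when clusters are as variable as the whole population, one or two clusters can give a good estimate.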
5. Multistage Sampling

This technique is an extension of the cluster sampling procedure. Statisticians sometimes refer to cluster sampling as a one-stage sampling method.

With large populations, it is often necessary to carry out the sampling in two or more stages until the final number of sampling units is reached. If a survey covers a large geographical area, several stages of sampling are often involved. Suppose a survey is done in a particular state. Since the state covers too large an area, a sample of districts within the state is selected. Then a sample of mukim within each selected district is chosen.

Example 4
A researcher wanted to survey the attitudes of secondary school children in Kedah towards the English language. One possible way of choosing the sample is to select:
(i) a sample of education authorities (Pejabat Pendidikan Daerah, PPD);
(ii) a sample of schools in each selected PPD;
(iii) a sample of classes in each selected school;
(iv) a sample of pupils in each selected class.

Example 5
Clusters are formed by boxes of components coming off production lines, one cluster of components per line. If all the lines have approximately the same rate of defects, then the components in each cluster (box) are as variable in quality as the population as a whole. In this situation, a good estimate of the proportion of defectives produced could be obtained from one or two clusters.

Example 6
Suppose forest plots are selected to estimate the proportion of diseased rubber trees. If the density of diseased trees varies across the rubber estate, then many small plots (clusters), randomly or systematically located, would be desirable. However, randomly locating a plot in an estate is quite time-consuming, and once a plot is located, sampling many trees in that one plot is economically desirable. Thus, many small plots are advantageous for controlling variability, but a few large plots are advantageous economically. A balance between the size and the number of plots must be achieved. Pilot surveys with various plot sizes might help point the researcher in the right direction.

Advantages and Disadvantages of Multistage Sampling

Advantages
1. Focuses on important subpopulations but ignores irrelevant ones.
2. Improves the accuracy of estimation.
3. Efficient.

Disadvantages
1. Can be difficult to select relevant stratification variables.
2. Not useful when there are no homogeneous subgroups.
3. Can be expensive.
4. Requires accurate information about the run.

Remark:
This type of sampling is particularly useful where the populations under survey are widely dispersed and it would be impractical to take a simple random sample. It differs from stratified sampling in that you start with naturally occurring clusters; you do not develop them.

3. Non-probability Sampling

When probability sampling is impossible, non-probability sampling is an alternative procedure for taking a sample from the targeted population. This method is also appropriate in some circumstances, for example when the researcher's focus is on cultural data that require experts in the respective field.

There are many types of non-probability sampling, but we will discuss only two: quota sampling and convenience sampling.

1. Quota Sampling

Quota sampling divides the population into subgroups, which are then sampled in proportion to their occurrence in the population. A sampling frame is not needed. The key is estimating the percentage of people in each subgroup. The researcher may choose any respondent he considers appropriate for his research.

This type of sampling is similar to stratified sampling in some respects. However, the selection of respondents within strata is non-random, i.e. it does not matter how the people are found as long as the quota groups are filled appropriately. They could all be friends of yours in the quota group and that would be acceptable.

This type of sampling is suitable for descriptive surveys, such as people's opinions about a certain issue or acceptance of a new product in marketing research. It is also used when the nature of the issues to be investigated means that it is important to give respondents from particular subgroups a chance of being selected which is disproportionate to their
numerical strength, e.g. where it is important to include a significant number of respondents from minority populations, female entrepreneurs, etc.

Statisticians criticize quota sampling for its theoretical weakness; market and opinion researchers defend it for its cheapness and convenience.

Advantages and Disadvantages of Quota Sampling

Advantages
1. Less costly.
2. Administratively easy.
3. Quick reply.
4. Does not need any sampling frame.

Disadvantages
1. Poor or biased judgment can lead to a non-representative sample.
2. Its validity is difficult to assess.
3. Estimates of standard deviations are not possible.
4. Within a quota, the sampling may be unrepresentative (e.g. all young, attractive females).
5. The widely used social class grouping is subjective.
6. Checking of fieldwork is difficult.

2. Convenience Sampling

This non-probability method is often used during preliminary research efforts to get a gross estimate of the results without incurring the cost or time required to select a random sample. A convenience sample is a group of people whom you can conveniently locate and to whom you can administer the survey. Unfortunately, a convenience sample may not generalize to the target population. A convenience sample can be representative of the population if the researcher investigates the characteristics of the population. It is used in exploratory research, where the researcher is interested in getting an inexpensive approximation of the truth.

Comparison between random sample and non-random sample

Random sample
1. Each member of the population has a known, non-zero probability of being selected.
2. Probability methods include SRS, systematic, stratified and cluster sampling.
3. Sampling error can be calculated.
4. It is more complex, more time-consuming and usually more costly.

Non-random sample
1. Members are selected from the population in some non-random manner.
2. Non-probability methods include convenience sampling, judgmental sampling, quota sampling and snowball sampling.
3. The degree to which the sample differs from the population remains unknown.

1.9 DATA COLLECTION METHODS

There are several methods used to collect data in survey research. Each has its own advantages and disadvantages. The choice of method will depend on the type and size of the audience, the purpose of the study, the timeline, and the budget and staff available, etc. This section provides information to help make the appropriate choice of data collection method for survey research and evaluation.

1. Personal Interviews (Face-to-face Interviews)

This method employs an interviewer who meets respondents in person, either spontaneously (such as in a public place) or by scheduled appointment.

Advantages and Disadvantages of Personal Interviews

Advantages
1. Quick, and a high response rate is possible.
2. Interviews can be done in one location.
3. Face-to-face contact offers a personal element and trust.
4. Can interview groups of people (e.g. families) at one time.
5. Can reach inaccessible audiences.

Disadvantages
1. Time-consuming if interviews are conducted at subjects' locations.
2. Respondents may be more likely to give socially acceptable answers.
3. Respondents may not always be available for interviews.
4. The travel costs of the interviewer could be high.
5. Facial expressions or statements by the interviewer can affect the response obtained.

2. Telephone Interviews

This method involves calling respondents by telephone, typically on a spontaneous basis (as perceived by the respondent). It can also be done by scheduled appointment, taking the respondent's schedule into consideration. It is also possible to use an automated system in which users reply via a touch-tone telephone to a computer-based interview system.

Advantages and Disadvantages of Telephone Interviews

Advantages
1. Quick, and a high response rate is possible.
2. Can be inexpensive if dialing is local and staff/volunteers are available.

Disadvantages
1. Not everyone has a telephone.
2. Caller ID and answering machines limit access. The interviewer must establish credibility in competition with telemarketers.
3. Only people with telephones can be interviewed.
4. Only a limited number of questions can be asked.

3. Mail Questionnaires

This method uses a printed questionnaire that is mailed or delivered to respondents, who respond at will and return the survey by mail.

Advantages and Disadvantages of Mail Questionnaires

Advantages
1. Requires minimal staff to prepare and mail.
2. Relatively inexpensive.
3. No gestures from the interviewer to affect the response obtained.

Disadvantages
1. Takes time; follow-up is required to get responses.
2. Requires literacy.
3. Can get buried in junk mail.
4. It can be difficult to get accurate mailing lists.

Remark:

Mail surveys can be preferable when:
• survey costs are a concern
• intended respondents have busy schedules
• questions are sensitive and of a personal nature
• the survey is lengthy or complex

A telephone survey may be preferable when:
• survey results are needed quickly
• respondents need to be qualified
• the survey population is small

4. Direct Observations

The direct observation method is usually used by scientists, for example biologists studying an animal's life cycle and habitat. However, this method is not limited to this kind of study. It is also used in marketing research, where customers' behavior towards a certain product is observed without their knowledge. The observation must be carried out very carefully, because if the respondents know that they are being observed, the observers may not see their actual behavior.

Advantages and Disadvantages of Direct Observation

Advantages
1. If well executed, it is the best method for obtaining data about the behavior of individuals and groups.
2. Gives access to information from objective sources that are not affected by the respondents themselves.

Disadvantages
1. Usually expensive.
2. Needs well-qualified staff.
3. Observation may affect the behavior being studied.

1.10 Designing a questionnaire

Definition of questionnaire:
A questionnaire is a written instrument that contains a series of questions or items that attempt to collect information on a particular topic. Questionnaires may be handed out personally by the researcher to his respondents or may be posted through the mail.

There is a lot to consider when developing a questionnaire. The following is a list of some key points to think about when designing your questionnaire:

• Is the introduction informative? Does it stimulate respondent interest?
• Are the words simple, direct and familiar to all respondents?
• Do the questions read well? Does the overall questionnaire flow?
• Are the questions clear and as specific as possible?
• Does the questionnaire begin with easy and interesting questions?
• Do the questions specify a time reference?
• Are any of the questions double-barreled?
• Are any questions leading or loaded?
• Should the questions be open-ended or close-ended? If the questions are close-ended, are the response categories mutually exclusive and exhaustive?
• Are the questions applicable to all respondents?

The introduction of the questionnaire is very important because it outlines the pertinent information about the survey being conducted. The introduction should:

• provide the title or subject of the survey;
• identify the sponsor;
• explain the purpose of the survey;
• request the respondent's co-operation; and
• inform the respondent about confidentiality issues, the status of the survey (voluntary or mandatory) and any existing data-sharing agreements with other organizations.

Respondents frequently question the value of the gathered information to themselves and to others. Therefore, be sure to explain why it is important to complete the questionnaire, how the information will be used, and how respondents can access the results. Ensuring that respondents understand the value of their information is vital in undertaking a survey.

The opening questions of any survey questionnaire should establish the respondents' confidence in their ability to answer the remaining questions. If necessary, the opening questions should help determine whether the respondent is a member of the survey population.

A good questionnaire ends with a comments section that allows the respondent to record any other issues not covered by the questionnaire. This is one way of avoiding frustration on the part of the respondent, and it also allows them to express any thoughts, questions or concerns they might have. Lastly, there should be a message at the end thanking the respondents for their time and patience in completing the questionnaire.

Practice Exercises

1. Define the following statistical terms.
a) Statistics
b) Descriptive statistics
c) Pilot study
d) Variables
e) Continuous variables
f) Population
g) Primary data
h) Probability sampling
i) Sampling error
j) Census study

2. In each part below, decide whether the specified study would be descriptive or inferential. Provide a reason for each of your answers.
a) A tyre manufacturer wants to estimate the average life of a new type of steel-belted radial.
b) A sports writer plans to list the winning times for all swimming events in the 2000 Olympics.
c) A politician obtains the exact number of votes that were cast for her opponent in 2004.
d) A medical researcher tests an anticancer drug that may have harmful side effects.
e) A candidate for the Majlis Perwakilan Pelajar (MPP) estimates the percentage of voters that will vote for him in the upcoming MPP election.
f) An economist estimates the average income of all Kedah residents.
g) The owner of a small business determines the average salary of her 20 employees.

3. Classify each as nominal-level, ordinal-level, interval-level or ratio-level data.
a) Horsepower of motorcycle engines.
b) Ratings of newscasts in Malaysia (poor, fair, good, excellent).
c) Temperature of an automatic hand dryer.
d) Time required by car drivers to complete a course.
e) Salaries of cashiers of The Tops grocery stores.
f) Marital status of respondents to a survey on current accounts.
g) Ages of students enrolled in a short computer course.
h) Weights of beef cattle fed a special diet.
i) Ranking of weight lifters of Malaysia for the year 2002.
j) Pages in the telephone book for the town of Alor Setar.
k) The number of absences per semester a student has.
l) Data that can be classified according to color.
m) Number of quizzes given in a statistics course.
n) Classification of UiTM students according to programs.

4. Fill in the blanks with an appropriate word/statistical term.
a) Measurements or observations for a variable are called _____________.
b) A person's skin colour is an example of _____________ data.
c) The collection, organizing, summarizing and presentation of data is called ______________ statistics.
d) The weight of school children is classified as _____________ in the level of measurement.
e) The age of an e-PJJ student is a _______________ variable.
f) A sampling frame is not needed in ____________ sampling.
g) One disadvantage of ____________ as a method of data collection is the low response rate.
h) ____________ data can be obtained from other sources such as newspapers, economic journals and companies' annual reports.
i) A ____________ is a subset of a population.
j) A _____________ is a numerical characteristic of a population.

5. State whether each statement below is true or false.
a) Data collection involving the whole population is called a census study.
b) A sample is a collection of all items or measurements under study.
c) A continuous variable is a variable that can be obtained by counting.
d) The process by which the sample includes elements from each of the homogeneous segments is called stratified sampling.
e) One of the advantages of personal interviews is getting responses instantly.
f) Opinions on an issue, i.e. strongly agree, agree, disagree and strongly disagree, are at the ordinal level of measurement.
g) Generalizing from samples to a population is an area of inferential statistics.
h) Qualitative data can be represented by using charts and bar graphs.
i) The highest level of measurement is the nominal level.
j) In simple random sampling, every item in the population has the same chance of being selected.

6. State the type of variable for each of the statements below (qualitative, discrete quantitative or continuous quantitative).
a) The number of times per week students went to the library.
b) The types of shampoo brand used by a group of women.
c) The birth weight of newborns at a particular hospital.
d) Number of rooms available in a hotel.
e) The number of liters required to fill up a car.

7. Classify the level of measurement of the following:
a) The number of children in a family.
b) The weight of a teenager.
c) The temperature in summer.
d) The types of mammal located in a zoo.
e) Sizes of a t-shirt.
f) Citizenship.
g) Year of birth.

8. State the best sampling method employed (or to be employed) in each situation below.
a) A shopkeeper wishes to determine whether the crates of mandarins sent to his premises are still in good condition. He closes his eyes and chooses 6 crates at random.
b) A telephone company in the northern region wanted to survey customers' satisfaction with its services by selecting every 100th name, starting from the third name in its telephone book, and calling them.
c) A survey was conducted by choosing 20 males and 35 females at random, according to gender, to answer a questionnaire on "Should smoking be banned in public?"
d) A nationwide survey on the 'Anti Dadah Program' was conducted by interviewing all the people in a residential area chosen at random.
e) In a project, the researcher instructs the interviewer to find any 4 people between the ages of 20 and 25 living in a certain area to answer the questionnaire.

9. State the best data collection method for each situation below.
a) A manufacturer of a certain product would like to know the response of its product users in Malaysia.
b) A lecturer in an institution would like to survey students on his campus regarding a course.
c) A group of researchers wanted to find out about the accounting systems used by small and medium industries (SMIs) in Perlis and Kedah. They obtained the list of SMIs with their phone numbers.

10. In order to overcome the water problem in Taman Kelana, the Water Supply Department conducted a survey to estimate the average water consumption per household in the residential area. The Department decides to choose only a few blocks in the area. Respondents in the pre-selected blocks are then selected at random and interviewed by an interviewer.
In the study above, state
a. the population
b. the variable
c. the characteristic of the data collected
d. the method of sampling used
e. the method of data collection used.

11. A firm has 1300 workers, 507 of whom are women. The workers work in the manufacturing department, the storage section and the administration department. The firm wishes to conduct a survey on the relationship between job satisfaction and the incentives given. Each worker is assigned a number, and a random sample of 200 workers was selected.
a) State the population of the survey.
b) State the sampling frame.
c) State the variable(s) under study.
d) Name the method of sampling used.
e) Suggest another sampling method that could also be used for the above population.
f) Which data collection method is the most appropriate? Give two advantages of the method chosen.

12. The Faculty of Mathematics conducted a seminar on ICT usage in teaching. A total of 250 lecturers attended the seminar; 130 of the participants were male and the rest were female. At the end of the seminar, the faculty wanted to estimate the proportion of lecturers who had used ICT in their classrooms. For the study, they wanted to obtain a sample of 50 lecturers.
a) State the population of the study.
b) State the sampling frame of the study.
c) Suggest an appropriate sampling technique to be used. Give two reasons for your answer.
d) Elaborate on the technique that you chose in (c).
e) Suggest one data collection method that can be used to gather information from the sample. Give two reasons.

Answers to Practice Exercises

2.
a) Inferential
b) Descriptive
c) Descriptive
d) Inferential
e) Inferential
f) Inferential
g) Descriptive

3.
a) Ratio
b) Ordinal
c) Interval
d) Ratio
e) Ratio
f) Nominal
g) Ratio
h) Ratio
i) Ordinal
j) Ratio
k) Ratio
l) Nominal
m) Ratio
n) Nominal

4.
a) Data
b) Qualitative
c) Descriptive
d) Ratio
e) Continuous quantitative
f) Quota
g) Mail questionnaire
h) Secondary
i) Sample
j) Parameter

5.
a) T
b) F
c) F
d) T
e) T
f) T
g) T
h) T
i) F
j) T

6.
a) Discrete
b) Qualitative
c) Continuous
d) Discrete
e) Continuous

7.
a) Ratio
b) Ratio
c) Interval
d) Nominal
e) Ordinal
f) Nominal
g) Interval

8.
a) Simple random sampling
b) Systematic random sampling
c) Stratified sampling
d) Cluster sampling
e) Quota sampling.
9.
a) Mail questionnaire
b) Personal interview
c) Telephone interview

10.
a) residents of Taman Kelana
b) average water consumption per household
c) primary data
d) multistage
e) personal interview
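Answer 10(d) classifies the Taman Kelana design as multistage sampling. A few blocks are chosen first, and households are then selected at random within the chosen blocks. The Python sketch below assumes SRS at every stage and a frame stored as nested dictionaries. The function name, block labels and household counts are hypothetical. The same function would handle the four-stage design of Example 4 (PPD, school, class, pupil) if the frame were nested four levels deep.

```python
import random


def multistage_sample(frame, sizes, rng=random):
    """Multistage sampling with simple random sampling at every stage.

    `frame` is nested: dicts for the intermediate stages (e.g. blocks, or
    PPD -> school -> class) and a list of elements at the last stage.
    `sizes[s]` is the number of units drawn at stage s, so len(sizes) must
    equal the number of stages. Only units chosen at one stage are
    sampled at the next.
    """
    m, rest = sizes[0], sizes[1:]
    if isinstance(frame, dict):
        chosen = rng.sample(sorted(frame), m)  # SRS of units at this stage
        return {unit: multistage_sample(frame[unit], rest, rng) for unit in chosen}
    return rng.sample(list(frame), m)  # final stage: SRS of elements


# Hypothetical frame for Exercise 10: 8 blocks of 40 households each
taman_kelana = {
    f"Block {b}": [f"B{b}-H{h}" for h in range(1, 41)] for b in range(1, 9)
}
sample = multistage_sample(taman_kelana, [3, 10], rng=random.Random(42))
for block, households in sample.items():
    print(block, households)
```

In practice, only the households in the three selected blocks would need to be listed at the second stage. The sketch builds a full frame purely for convenience.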

11.
a) 1300 workers
b) list of all workers
c) job satisfaction, incentive given
d) SRS
e) Systematic
f) questionnaire

12.
a) 250 lecturers
b) List of all lecturers attending the seminar
c) Stratified sampling; the lecturers can be grouped according to sex: male and female lecturers.
d) Step 1: Obtain the sampling fraction (50/250 × 100 = 20%).
Step 2: Apply it to each group to get the proportionate number (male: 130 × 20% = 26; female: 120 × 20% = 24), a total of 50 samples.
Step 3: From each group, choose the respective number of lecturers at random, using simple random or systematic sampling.
e) Self-administered questionnaire.
