
Lecture 3 - Types of Data, Data Collection and Sampling: Notes

This document provides an overview of descriptive statistics, types of data, methods of data collection and sampling. It defines key terms like variables, data types (numerical, nominal, ordinal), and sampling methods (simple random, stratified, cluster). Sources of data are described as published, observational, experimental, or from surveys. Factors in sampling like sample size, sampling vs non-sampling errors are also summarized.

Uploaded by

Lovely Princess
Copyright
© All Rights Reserved

Lecture 3 – Types of Data, Data Collection and Sampling

NOTES
Descriptive statistics
 Involves arranging, summarising, and presenting a set of data in such a way that
useful information is produced.
 Its methods make use of graphical techniques and numerical descriptive
measures (e.g. averages) to summarise and present the data.
Key definitions
 Variable
o some characteristic of population or sample
o a variable is typically denoted with a capital letter: X, Y, Z…
o the values of a variable are the range of possible values for that variable
 Data
o The observed values of a variable (e.g. student marks: 14, 16…)
Types of data
 Numerical data – can be treated as ordinal or nominal
o The values of numerical data are real numbers
o Arithmetic operations can be performed on numerical data, thus it's
meaningful to talk about 2*height, or price + $1
o Numerical data are also called quantitative
 Nominal data – cannot be treated as ordinal or numerical
o The values (arbitrary numbers) of nominal data are categories (e.g.
responses to questions about marital status: single = 1, married = 2)
o These data are categorical in nature; arithmetic operations don’t make any
sense
o All we can calculate is the proportion of data that falls into each category
o Nominal data are also called qualitative or categorical
 Ordinal data – can be treated as nominal but not as numerical
o Ordinal data appear to be categorical in nature, but their values have an
order; a ranking to them (e.g. university course evaluating system: poor =
1, fair = 2)
o While it’s still not meaningful to do arithmetic on this data, we can say
things like excellent > poor, or fair < very good
o Order is maintained no matter what numeric values are assigned to each
category.
o Ordinal data are also called ranked
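
The three data types can be illustrated with a minimal Python sketch. All of the sample values below are invented for illustration, and the five-level rating scale is an assumption (the notes only give poor = 1 and fair = 2). The sketch shows which operation is meaningful for each type: arithmetic for numerical data, proportions for nominal data, and ordering for ordinal data.

```python
from collections import Counter

# Hypothetical sample data, invented for illustration only.
heights_cm = [160.0, 172.5, 181.0]                              # numerical
marital_status = ["single", "married", "single", "single"]      # nominal
ratings = ["poor", "excellent", "fair", "very good"]            # ordinal

# Numerical: arithmetic is meaningful, e.g. 2*height.
doubled_heights = [2 * h for h in heights_cm]

# Nominal: all we can calculate is the proportion in each category.
counts = Counter(marital_status)
proportions = {category: n / len(marital_status) for category, n in counts.items()}

# Ordinal: the codes are arbitrary, but they must preserve the order.
# This five-level scale is an assumption for the example.
RATING_ORDER = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
ranked = sorted(ratings, key=RATING_ORDER.get)

print(doubled_heights)
print(proportions)
print(ranked)
```

Replacing the codes 1 to 5 with, say, 10, 20, 30, 40, 50 would give the same ranking, which is the sense in which order is maintained no matter what numeric values are assigned.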
Methods of collecting data
 Statistics is a tool for converting data into useful information
 Sources of data
o Published data
o Data collected from observational studies (observation data)
o Data collected from experimental studies (experimental data)
o Data collected from surveys (survey data)
 Published data
o This type of data has already been collected by an organisation or by a
statistical agency and made available for others to use.
o This is often a preferred source of data due to low cost and convenience.
o Published data typically comes in digital form (or, if it’s old, as printed
material, disks, or tapes).
o Published data may be either primary data or secondary data
 Primary data
o Data published by the organisation that has collected it is called primary
data.
o E.g. data published by the ABS
 Secondary data
o Data published by an organisation different from the one that
originally collected and published it is called secondary data.
o E.g. the OECD compiles data from national sources for OECD countries
 Observational and experimental data
o When published data is unavailable, one needs to conduct a study to
generate the data.
o Observational study is one in which measurements representing a
variable of interest are observed and recorded, without controlling any
factor that might influence their values. (e.g. measuring height of a tree
over time)
o Experimental study is one in which measurements representing a
variable of interest are observed and recorded, while controlling factors
that might influence their values. (e.g. measuring the yield of different
types of rice using a certain fertilizer)
 Surveys
o A survey solicits information from survey participants
o The response rate (i.e. the proportion of selected participants who
completed the survey) is a key survey parameter.
o Surveys may be administered in a variety of ways (e.g. personal interview,
telephone interview)
 Sampling
o If the data are collected from the whole population, we have a census.
o However, a census is expensive. Statistical inference instead permits us
to draw conclusions about a population from a sample.

o Sampling means selecting a subset of a whole population. This is often
done instead of a census for a number of reasons, including cost and
practicality.
o Target population is the population about which we want to draw
inferences.
o Sampled population is the actual population from which the sample has
been drawn.
o In any case, the sampled population and the target population should be
similar to one another; otherwise, conclusions drawn from the sample may
not hold for the target population (e.g. when the sample is self-selected).
o E.g. a survey of opinion on a radio talk-back show topic has a target
population of all radio listeners. The sampled population is those
listeners who are interested in the topic. The sample selected is those
listeners who are interested in the topic and managed to contact the
radio station.
 Sampling plans
o A sampling plan is just a method or procedure for specifying how a
sample will be taken from a population.
o Most commonly used sampling plans are simple random sampling,
stratified random sampling and cluster sampling.
 Simple random sampling
o a simple random sample is a sample selected in such a way that every
possible sample of the same size is equally likely to be chosen. (e.g.
when drawing three names from a hat containing the names of all the
students in a class of 200, any group of three names is as likely to be
drawn as any other group of three names)
o to conduct simple random sampling
- assign a number to each element of the chosen population (or use
already given numbers)
- randomly select the sample numbers (members) using software
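
The two steps above can be sketched in Python using the standard library's random module. The function name simple_random_sample, the student numbering 1 to 200, and the seed are assumptions made for this illustration; any statistical software offers an equivalent.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Return sample_size distinct elements, chosen without replacement.

    random.Random.sample gives every subset of the requested size
    the same chance of being selected.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Step 1: assign a number to each element (hypothetical class of 200 students).
student_ids = list(range(1, 201))

# Step 2: let the software select the sample numbers.
chosen = simple_random_sample(student_ids, 3, seed=42)
print(chosen)
```

Fixing the seed only makes the draw reproducible; leaving it as None gives a fresh random sample on each run.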
 Stratified random sampling
o A stratified random sample is obtained by dividing the population into
mutually exclusive sets (or strata), and then drawing a simple random
sample from each stratum.
o With this procedure we can acquire information or make inferences about
- The whole population
- Each stratum
- The relationships among strata
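
A minimal Python sketch of stratified random sampling follows. The function stratified_sample, the split of the class into undergraduate and postgraduate strata, and the per-stratum sample sizes are all assumptions chosen for illustration; the notes do not prescribe how strata or sample sizes are chosen.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        # Each element falls into exactly one stratum (mutually exclusive sets).
        strata[stratum_of(element)].append(element)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }

# Hypothetical class of 200: students 1-150 undergraduate, 151-200 postgraduate.
students = [(i, "undergraduate" if i <= 150 else "postgraduate")
            for i in range(1, 201)]
sizes = {"undergraduate": 15, "postgraduate": 5}  # proportional to stratum size
sample = stratified_sample(students, lambda s: s[1], sizes, seed=7)
print({name: len(members) for name, members in sample.items()})
```

Because the result is kept per stratum, it can be used to look at each stratum separately, to compare strata, or to combine them for the whole population.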
 Cluster sampling
o A cluster sample is a simple random sample of groups or clusters of
elements (whereas a simple random sample consists of individual elements).
o This procedure is useful when

- It is difficult and costly to develop a complete list of the population
members (making it difficult to develop a simple random sampling
procedure).
- The population members are widely dispersed geographically
o Cluster sampling may increase sampling error, because of probable
similarities among cluster members.
o E.g. to draw a cluster sample of residents in Adelaide, first select a
number of streets in the Adelaide city area using a simple random
sampling method, and then include all residents in those selected
streets to form the cluster sample.
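
The street example can be sketched in Python as follows. The street names, the residents, and the function cluster_sample are hypothetical and exist only to show the mechanics: the random draw is made over clusters, and then every member of each chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random, then keep all of their members."""
    rng = random.Random(seed)
    # The simple random sample is taken over cluster names, not individuals.
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen_names for m in clusters[name]]
    return chosen_names, members

# Hypothetical streets and residents, invented for illustration.
streets = {
    "Street A": ["A1", "A2", "A3"],
    "Street B": ["B1", "B2"],
    "Street C": ["C1", "C2", "C3", "C4"],
    "Street D": ["D1"],
}
chosen_streets, residents = cluster_sample(streets, 2, seed=1)
print(chosen_streets, residents)
```

Only a list of streets is needed up front, not a list of every resident, which is why the approach suits populations that are hard to enumerate.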
 Sample size
o The larger the sample size, the more accurate we can expect the sample
estimates to be.
o That is, the closer we can expect the sample estimate to be to the
unknown population parameter we wish to estimate.
 Sampling and non-sampling errors
o two major types of errors can arise when a sampling procedure is
performed
- sampling error
- non-sampling error
 Sampling errors
o Sampling error refers to differences between the sample and the
population, because of the specific observations that happen to be
selected.
o E.g. estimating a population mean using a sample mean (sampling error =
sample mean – population mean)
o Increasing the sample size will, on average, reduce the sampling error.
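
The effect of sample size on sampling error can be checked with a small simulation. This is a sketch under stated assumptions: the population of 10,000 marks is synthetic (normally distributed around 65), and the sample sizes of 10 and 1,000 and the 200 repetitions are arbitrary choices. Since any single sample can be lucky or unlucky, the sketch averages the absolute sampling error over many repeated samples.

```python
import random
import statistics

def mean_abs_sampling_error(population, sample_size, repetitions, rng):
    """Average |sample mean - population mean| over repeated simple random samples."""
    population_mean = statistics.fmean(population)
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)

rng = random.Random(0)
# Synthetic population of marks, invented for illustration.
population = [rng.gauss(65, 10) for _ in range(10_000)]

small = mean_abs_sampling_error(population, 10, 200, rng)
large = mean_abs_sampling_error(population, 1000, 200, rng)
print(f"n=10: {small:.3f}   n=1000: {large:.3f}")
```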
 Non-sampling errors
o Non-sampling errors arise from mistakes made in the process of data
acquisition, or from sample observations being selected improperly.
o There are three types of non-sampling errors
- Errors in data acquisition
- Non-response errors
- Selection bias
o Increasing the sample size will NOT reduce this type of error.
 Errors in data acquisition
o Errors in data acquisition arise from the recording of incorrect responses,
due to
- Incorrect measurements being taken because of faulty equipment
- Mistakes made during transcription from primary sources
- Inaccurate recording of data due to misinterpretation of terms
- Inaccurate responses to questions concerning sensitive issues
- Clerical mistakes when transferring/recording data

 Non-response error
o Non-response error refers to error (or bias) introduced when responses
are not obtained from some members of the sample because they refuse
to respond for some reason.
o The sample observations that are collected may not be representative of
the target population.
o The response rate (i.e. the proportion of all people selected who complete
the survey) is a key survey parameter and helps in assessing the validity
of the survey.
 Selection bias
o Selection bias occurs when the sampling plan is such that some members
of the target population cannot possibly be selected for inclusion in the
sample. (e.g. selecting a sample of households in NSW using telephone
numbers listed in NSW White Pages, as not every NSW household
telephone number is listed in the White Pages)
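
Why a larger sample does not cure selection bias can be seen in a small simulation. Everything here is a hypothetical setup, not data about NSW: 7,000 "listed" and 3,000 "unlisted" households are generated, and the unlisted ones are assumed to differ on the numerical variable being measured. Sampling only from the listed households then misses the target population mean by about the same amount at any sample size.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic households, invented for illustration: unlisted households
# are assumed to have a higher average value of the measured variable.
listed = [rng.gauss(50, 5) for _ in range(7_000)]
unlisted = [rng.gauss(70, 5) for _ in range(3_000)]
target_mean = statistics.fmean(listed + unlisted)

def listed_only_error(sample_size):
    """Sample mean minus target mean when only listed households can be drawn."""
    sample = rng.sample(listed, sample_size)
    return statistics.fmean(sample) - target_mean

error_small = listed_only_error(50)
error_large = listed_only_error(5_000)
print(f"n=50: {error_small:.2f}   n=5000: {error_large:.2f}")
```

The larger sample estimates the mean of the listed households more precisely, but that is the wrong population, so the gap to the target mean remains.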
