
1 IntroductiontoStatisticsB

The document provides an overview of statistics, including definitions, types of data, and methods of data collection. It discusses the importance of defining populations, the distinction between primary and secondary data, and various sampling methods such as random, stratified, and cluster sampling. Additionally, it highlights the significance of data analysis and statistical inference in drawing conclusions from data.

Statistics

Data Sources

Michael Carr

Dublin Institute of Technology, Ireland

January 2015

Michael Carr Statistics


Outline

1. Statistics
Introduction
Data Types

2. Data Sources
Simple Random Samples



Statistics Introduction
Data Sources Introduction

Introduction

Statistics
The word statistics may refer to the subject 'statistics' or to
collections of sets of figures or data. A statistic is the result of a
calculation based on a set of data.
Statistics, as a subject, is used to explain and simplify a set of
figures, making them easier to understand and enabling conclusions
to be drawn.

Statistics in practice is applied to study the effectiveness of medical
treatments, the reaction of consumers to television advertising, the
popularity of political leaders, and much more.
For example, when processing examination results for large groups
it is standard practice to calculate the arithmetic mean and
standard deviation. These summaries draw out patterns in the data
rather than obscuring them, and make the performance of the students
in each subject easy to compare. Statistics includes:

1. Data analysis, which consists of methods and ideas for
organising and describing data using graphs, tables and
numerical summaries.
2. Data production, which consists of selecting samples and designing
experiments to produce data that can give clear answers to
specific questions.
3. Statistical inference, which draws conclusions from sample data and
indicates how trustworthy the conclusions are. It involves:
   1. Estimation of the unknown parameters of the mathematical
   model.
   2. Testing of hypotheses about the mathematical model.
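
The examination-results example mentioned earlier (arithmetic mean and standard deviation as numerical summaries) can be sketched as follows. This is only an illustration: the marks are made-up data, the `summarise` helper is a hypothetical name, and the sample standard deviation (divisor n - 1) is assumed; `statistics.pstdev` would be used instead if the marks were treated as the whole population.

```python
import statistics

# Hypothetical examination marks (out of 100) for two subjects.
marks = {
    "Mathematics": [35, 48, 52, 61, 67, 70, 74, 88],
    "Mechanics": [55, 58, 60, 61, 62, 63, 65, 68],
}

def summarise(values):
    """Return the arithmetic mean and the sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

for subject, values in marks.items():
    mean, sd = summarise(values)
    print(f"{subject}: mean = {mean:.1f}, standard deviation = {sd:.1f}")
```

The two made-up subjects have almost the same mean but very different spreads, which is the kind of pattern these two summaries are meant to draw out.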

Probability A knowledge of probability is necessary for statistical
inference.
Population In statistical usage the term 'population' means the
entire group of objects or individuals under consideration in any
statistical exercise. It is essential to clearly and precisely define the
population under consideration at the start of any problem.

Exercise 1 Define exactly the population of people you would
sample if you were conducting an opinion poll for an Irish General
Election.

Example 1 A study of all Irish-owned cars is not the same as a
study of all cars on Irish roads.
Example 2 When performing a study of third-year engineering
students at D.I.T. Bolton Street, one must clarify which, if any, of
the following groups are included:
1. Repeat students who are officially attending the course a
second time.
2. Repeat students who are unofficially attending the course.
3. Repeat students who are not attending the course a second
time.
4. Part-time students attending the same course at night.
5. Foreign students such as Erasmus students.

Qualitative data Data which is not easily quantified or measured,
for example: female or male, voter or non-voter, travellers by bus
or travellers by car, etc. Having only qualitative data limits the type
of statistical analysis you can undertake.
Quantitative data Data which involves measurement, e.g. height,
weight, earnings.

Discrete data Data which can be counted and results in integer
values only, i.e. whole numbers rather than fractions or decimals.
Example The number of children in a family, the number of
students in a class, the number of rooms in a house, the number of
phone calls coming into a switchboard in intervals of 5 minutes,
etc.
The number of children in a family can change by adding one or
two or three or by subtracting one, but can never have a value like
2.5.

Continuous data Data which changes gradually rather than in
discrete or integer amounts. For example, the height of a person
does not have to be either 1 metre or 2 metres (integer values) but
can be 1.75 metres, and if the person grows a little they can be
1.755 m and later 1.76 m. Weights, temperatures and distances
are all examples of continuous data.
Continuous data often looks like discrete data. For example, age is
a continuous variable but is often rounded down to a whole number
of years and hence can look like discrete data.
Continuous data is usually divided into groups or intervals. For
example, a study of examination results might group results as
follows:
0-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-100, where, say,
the interval 30-40 means 30 ≤ x < 40 for a mark x, etc.
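
As a rough sketch of the grouping just described, the snippet below assigns marks to the intervals listed above. It reads each interval as including its lower boundary and excluding its upper one. Two things are assumptions rather than part of the notes: a mark of exactly 100 is counted in the top interval, and the marks themselves are invented. `EDGES` and `interval_label` are hypothetical names.

```python
from bisect import bisect_right
from collections import Counter

# Interval boundaries taken from the grouping above: 0-20, 20-30, ..., 80-100.
EDGES = [0, 20, 30, 40, 50, 60, 70, 80, 100]

def interval_label(mark):
    """Return the label of the interval containing mark (lower <= mark < upper)."""
    if not EDGES[0] <= mark <= EDGES[-1]:
        raise ValueError(f"mark {mark} is outside the range 0-100")
    # Assumption: a mark of exactly 100 is counted in the top interval.
    i = min(bisect_right(EDGES, mark) - 1, len(EDGES) - 2)
    return f"{EDGES[i]}-{EDGES[i + 1]}"

# Hypothetical marks, for illustration only.
marks = [12, 30, 39.5, 40, 58, 61, 79, 80, 100]
counts = Counter(interval_label(m) for m in marks)
for label, count in counts.items():
    print(label, count)
```

A mark of 30 falls in 30-40 and a mark of 40 falls in 40-50, so every mark belongs to exactly one group.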

Finite data Data that has a finite (not infinite) upper limit. For
example, suppose a family has five children and you are asked to find
the probability that three of the children are male. The number
of males in the family can take on the values 0, 1, 2, 3, 4, 5. This
is a finite set of values and has a finite upper limit of 5. Finite data
can be either discrete or continuous.
Infinite data Data that has no finite upper limit. For example, the
number of phone calls coming into a switchboard in intervals of
one hour has no upper limit.

Levels of data measurement (Ref: Rouncefield & Holmes Chapter 16)

Categorical data
1. In nominal scales only classification and counting are possible.
For example, when considering work status there is no order
associated with: retired, in school, keeping house, other. Further
examples are gender, nationality, eye colour, hair colour, etc.
2. In ordinal scales items may be ranked in order, for example in
order of preference. For example, mark 1, 2 and 3 against the time
slots for a certain class to indicate your order of preference. Job
satisfaction can likewise be listed as: 1 very satisfied, 2
moderately satisfied, etc., i.e. the categories involve order.

Continuous data
1. Data which changes gradually rather than in discrete
or integer amounts.
2. An interval scale ranks items and gives them a numerical
value. The position of zero is fixed by convention, rather than
at an absolute zero. For example, the difference between 10 °C
and 20 °C is the same as the difference between 20 °C and
30 °C; however, a temperature of 20 °C is not double that of 10 °C.
3. A ratio scale is an interval scale which has an absolute zero,
such as weight, area, height or temperature in kelvin. This means
that ratios such as 'twice as heavy' can be calculated,
and an income of 50,000 is twice an income of 25,000.
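
The difference between an interval scale and a ratio scale can be checked numerically. The sketch below uses the temperatures from the example above. The conversion to kelvin is an addition made for illustration (it is not part of the notes), chosen because kelvin has an absolute zero. `celsius_to_kelvin` is a hypothetical helper name.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvin, a scale with an absolute zero."""
    return celsius + 273.15

t_low, t_high = 10.0, 20.0

# Differences are meaningful on an interval scale: both gaps are 10 degrees.
print(t_high - t_low, 30.0 - t_high)

# The Celsius ratio suggests "twice as hot", but the zero is only a convention.
print(t_high / t_low)

# On the kelvin (ratio) scale the same two temperatures differ by about 3.5%.
ratio_kelvin = celsius_to_kelvin(t_high) / celsius_to_kelvin(t_low)
print(round(ratio_kelvin, 3))
```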


Statistics
Simple Random Samples
Data Sources

Primary data This is data collected by the investigator for solving
the current problem or question. The sources of primary data are:
1. Questionnaires.
2. Interviews (i.e. asking personally for the required information).
3. Direct observation (e.g. counting the number of cars in a car
park).

Exercise 1 Discuss two advantages and two disadvantages of each of
the above sources of data.

Secondary data This is data, already in existence, that satisfies the
objectives of the current project. Secondary data can sometimes
be obtained from published sources (such as census data from the
Central Statistics Office publications) or existing records (such as
college records of students or staff).

List some useful sources of secondary data

Sampling Methods
When dealing with a large population it is not always practical to
include the entire population. Instead we must produce a
representative sample of the population.

Sample Frame A list of the entire population from which items can
be selected to form a sample. It is difficult to get a fully accurate
sample frame, e.g. the Register of Electors (for elections). The
following errors can occur in a sample frame: missing members,
duplicate members and incorrect members.
Example What percentage of Irish people are taller than 1.9 m?
Sample frame: the entire population of Ireland.

A random sample is a sample selected in such a way that every
item in the population has an equal chance of being included. For
example, give a number to each item of the population and then
draw from a hat as in a lottery.
Advantages
1. The method of selection is free from bias. However, there is no
guarantee that the sample is free from bias. For example, a
sample of Europeans might turn out to be all Swedes, and if
investigating "fair hair" it would be a biased sample, as
Swedes are a fair-haired nation.

Disadvantages
1. The sample could involve expensive travelling for the interviewers.
2. People selected might be difficult to find.
3. There might be no sample frame available. For example, if a
geneticist wanted to estimate the percentage of people with red hair
who have green eyes, there may not exist a list of people with red
hair.
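
The "numbers drawn from a hat" procedure for a simple random sample can be sketched in a few lines. This is a minimal illustration, assuming a sample frame is available as a list. The frame of 500 numbered items and the `simple_random_sample` name are hypothetical.

```python
import random

def simple_random_sample(sample_frame, sample_size, seed=None):
    """Draw sample_size distinct items, each item equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(sample_frame, sample_size)

# Hypothetical sample frame: items numbered 1 to 500, like tickets in a hat.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Sampling is done without replacement, so no item can be drawn twice, just as a ticket taken from the hat is not put back.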

Stratified Sampling (Ref: Rouncefield & Holmes Ch. 6 and 16)

1 Decide on the total sample size. Generally there is a trade-off
between accuracy and cost (1000 is often an appropriate
sample size).

2 Divide the sample into sub-samples with the same proportions
as the groups in the population.
Population of Cork = 400,000.
Population of Ireland = 4,000,000.
⇒ Cork is 400,000 / 4,000,000 = 10% of the population, so a sample
of 1000 should contain 0.10 × 1000 = 100 people from Cork.

3 Select at random from within each group (stratum) the
appropriate sub-sample.
4 Add sub-sample results together to obtain the figures for the
overall sample.
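
The four stratified-sampling steps above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the sub-sample sizes are rounded to whole numbers, the "Rest of Ireland" stratum and the scaled-down member lists are invented for the example, and `proportional_allocation` and `stratified_sample` are hypothetical names.

```python
import random

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their population sizes."""
    total = sum(stratum_sizes.values())
    # Rounding can leave the sub-samples a unit or two off the target total.
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sub-sample from each stratum and combine them."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))
    return sample

# Figures from the text: Cork is 400,000 of 4,000,000 people.
print(proportional_allocation({"Cork": 400_000, "Rest of Ireland": 3_600_000}, 1000))

# Hypothetical scaled-down strata (labels only) to show the random selection.
strata = {
    "Cork": [f"cork-{i}" for i in range(400)],
    "Rest of Ireland": [f"rest-{i}" for i in range(3600)],
}
sample = stratified_sample(strata, 100, seed=1)
print(len(sample))
```

With the figures from the text, the allocation gives Cork 100 of the 1000 places, matching the hand calculation.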

Exercise

You have been commissioned to conduct an opinion poll for an
upcoming referendum. Outline how you would do this. In your
answer you should refer to both sample size and sampling strategy. In
addition, outline any errors that might exist in your results.

Cluster Sampling The country is divided into many small areas.
Interviewers are sent to the areas with instructions to survey
every person or item they can find that fits the definition given, e.g.
fair-haired mothers, home-workshops, oak trees. A sample frame is
not necessary.

Quota Sampling Interviewers are told to interview all the people
they meet up to a given number (i.e. a quota). The quota can be
divided up into different types of people (e.g. age groups) with
sub-quotas for each.
Advantage
1. A sample frame is not needed.

Disadvantages
1. This is non-random sampling, so very few statistical tests can be used.
2. It is biased due to interviewer discretion over who is interviewed.

Multi-Stage sampling (The object is to cut down the cost)

1. Divide the country into many areas and select 3 or 4 by
random means.
2. Repeat the process a few times within the selected areas.
3. Then do a random sample of people in the few small areas that remain.

Advantages
1. The interviewer need only travel to a few areas.
2. A sample frame is not necessary.
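
A two-level version of the multi-stage procedure described above might look like the sketch below. It assumes that lists of people exist only for the final small areas, and it represents the country as a nested dictionary. The data, the region and district labels, and the `multi_stage_sample` name are all hypothetical.

```python
import random

def multi_stage_sample(country, areas_to_pick, people_per_area, seed=None):
    """Pick areas at random at each level, then sample people in the final areas.

    country is a nested dict: region -> district -> list of people.
    """
    rng = random.Random(seed)
    sample = []
    # Stage 1: select a few regions at random.
    for region in rng.sample(sorted(country), areas_to_pick):
        districts = country[region]
        # Stage 2: repeat the selection among that region's districts.
        for district in rng.sample(sorted(districts), areas_to_pick):
            # Final stage: a random sample of people within the small area.
            sample.extend(rng.sample(districts[district], people_per_area))
    return sample

# Hypothetical country: 10 regions, each with 8 districts of 50 people.
country = {
    f"region-{r}": {
        f"district-{r}-{d}": [f"person-{r}-{d}-{p}" for p in range(50)]
        for d in range(8)
    }
    for r in range(10)
}
sample = multi_stage_sample(country, areas_to_pick=3, people_per_area=5, seed=1)
print(len(sample))  # 3 regions x 3 districts x 5 people
```

Only nine districts are ever visited, which is where the saving in interviewer travel comes from.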

Systematic sampling If a 10% (say) sample is required then the
sample can be selected by taking every tenth item in the sample
frame.
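
Taking every tenth item from a sample frame can be sketched as below. The random starting position among the first ten items is an assumption added here (the notes do not say where to start), and the frame of 200 numbered items and the `systematic_sample` name are hypothetical.

```python
import random

def systematic_sample(sample_frame, step, seed=None):
    """Take every step-th item of the sample frame, starting at a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # assumption: random start among the first step items
    return sample_frame[start::step]

# Hypothetical sample frame of 200 numbered items; step 10 gives a 10% sample.
frame = list(range(1, 201))
sample = systematic_sample(frame, step=10, seed=1)
print(len(sample))
```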
