1 Introduction to Statistics B
Data Sources
Michael Carr
January 2015
Outline
1. Statistics
Introduction
Data Types
2. Data Sources
Simple Random Samples
Introduction
Statistics
The word statistics may refer to the subject 'statistics' or to
sets of figures or data. A statistic is the result of a
calculation based on a set of data.
Statistics, as a subject, is used to explain and simplify a set of
figures, making them easier to understand and enabling conclusions
to be drawn.
Finite data: Data that has a finite upper limit. For
example, suppose a family has five children and you are asked to find the
probability that three of the children are male. The number
of males in the family can then take the values 0, 1, 2, 3, 4, 5. This
is a finite set of values with a finite upper limit of 5. Finite data
can be either discrete or continuous.
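The five-children example can be worked through as a short sketch. It assumes, beyond what the text states, that births are independent and that each child is male with probability 0.5; the function name `prob_males` is illustrative only.

```python
from math import comb


def prob_males(k, n=5, p=0.5):
    """Binomial probability of exactly k males among n children.

    Assumption (not in the text): births are independent and each
    child is male with probability p.
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The number of males can only take the finite set of values 0..5.
values = list(range(6))
probabilities = [prob_males(k) for k in values]

print(values)          # [0, 1, 2, 3, 4, 5]
print(prob_males(3))   # 0.3125
```

Under these assumptions the six probabilities sum to 1, which is a quick check that the finite set 0 to 5 covers every possible outcome.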
Infinite data: Data that has no finite upper limit. For example, the
number of phone calls coming into a switchboard in a one-hour
interval has no upper limit.
Levels of data measurement (Ref: Rouncefield & Holmes, Chapter 16)
Categorical data
1. In nominal scales only classification and counting are possible.
For example, when considering work status there is no order
associated with: retired, in-school, keeping house, other. Further
examples are gender, nationality, eye colour and hair colour.
2. In ordinal scales items may be ranked in order.
For example, marking time slots 1, 2 and 3 to
indicate your order of preference for a certain class, or
listing job satisfaction as 1 very satisfied, 2
moderately satisfied, etc. An ordinal scale involves order.
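The difference between the two categorical scales can be sketched in a few lines. The responses below are made up for illustration, and the code 3 is a hypothetical lower satisfaction level that the text does not name.

```python
from collections import Counter
from statistics import median_low

# Hypothetical survey responses, invented for this sketch only.
work_status = ["retired", "in-school", "keeping house", "other", "retired"]

# Nominal scale: the categories have no order, so we can only
# classify the responses and count them.
status_counts = Counter(work_status)
print(status_counts["retired"])   # 2

# Ordinal scale: the codes carry an order
# (1 = very satisfied, 2 = moderately satisfied, 3 = hypothetical lower level).
job_satisfaction = [1, 2, 2, 3, 1, 2]

# Because the codes are ordered, sorting and a median are meaningful.
ranked = sorted(job_satisfaction)
print(ranked)                         # [1, 1, 2, 2, 2, 3]
print(median_low(job_satisfaction))   # 2
```

Sorting the work-status labels would only give alphabetical order, which says nothing about the data; that is the practical meaning of "no order" on a nominal scale.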
Continuous data
1. Data which changes gradually rather than in discrete or integer
amounts.
2. An interval scale ranks items and gives them a numerical
value. The position of zero is fixed by convention rather than
at an absolute zero. For example, the difference between 10 °C
and 20 °C is the same as the difference between 20 °C and 30 °C;
however, a temperature of 20 °C is not double that of 10 °C.
3. A ratio scale is an interval scale that has an absolute zero,
such as weight, area, height or temperature measured in kelvin. This
means that ratios such as 'twice as heavy' can be calculated;
for example, an income of 50,000 is twice an income of 25,000.
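The interval and ratio examples can be checked numerically. The sketch below uses the standard conversion from Celsius to kelvin (add 273.15), which is not part of the text, to show why "20 °C is double 10 °C" does not hold; the helper name `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to kelvin, a scale whose zero is absolute."""
    return celsius + 273.15


# Interval scale: differences are meaningful and equal.
diff_low = 20 - 10
diff_high = 30 - 20
print(diff_low == diff_high)   # True

# The ratio 20 / 10 = 2 is an artefact of where zero sits on the
# Celsius scale. On a scale with an absolute zero it is close to 1.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print(ratio_celsius)            # 2.0
print(round(ratio_kelvin, 3))   # 1.035

# Ratio scale: income has an absolute zero, so the ratio is meaningful.
print(50_000 / 25_000)          # 2.0
```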
Sampling Methods
When dealing with a large population it is not always practical to
study every member of the population. Instead, we must select a
representative sample of the population.
Sample Frame: A list of the entire population from which items can
be selected to form a sample. It is difficult to get a fully accurate
sample frame. The following errors can occur in a sample frame:
missing members, duplicate members and incorrect members. The
Register of Electors (used for elections) is an example of a sample
frame that is subject to these errors.
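The three kinds of sample frame error can be illustrated with a small check. This is a minimal sketch with made-up names; in practice the true population is rarely known, which is exactly why these errors are hard to detect. The function name `check_frame` is hypothetical.

```python
from collections import Counter


def check_frame(population, sample_frame):
    """Compare a sample frame (a list) against the true population (a set).

    Returns the missing, duplicate and incorrect members.
    """
    counts = Counter(sample_frame)
    listed = set(sample_frame)
    return {
        # In the population but absent from the frame.
        "missing": sorted(population - listed),
        # Listed more than once in the frame.
        "duplicate": sorted(name for name, n in counts.items() if n > 1),
        # Listed in the frame but not in the population.
        "incorrect": sorted(listed - population),
    }


# Hypothetical data, invented for this sketch only.
population = {"Aoife", "Brian", "Ciara", "Declan"}
sample_frame = ["Aoife", "Brian", "Brian", "Eamon"]

errors = check_frame(population, sample_frame)
print(errors["missing"])     # ['Ciara', 'Declan']
print(errors["duplicate"])   # ['Brian']
print(errors["incorrect"])   # ['Eamon']
```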
Example: What percentage of Irish people are taller than 1.9 m?
Sample frame: the entire population of Ireland
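A simple random sample for this question could be drawn as sketched below. The heights are synthetic, generated from an arbitrary normal distribution (mean 1.70 m, standard deviation 0.10 m) chosen purely for illustration; they are not real data, and the frame and sample sizes are likewise assumptions.

```python
import random

rng = random.Random(2015)  # fixed seed so the sketch is repeatable

# Synthetic heights in metres standing in for the sample frame.
# The distribution parameters are arbitrary, not real measurements.
frame = [rng.gauss(1.70, 0.10) for _ in range(10_000)]

# Simple random sample: every member of the frame is equally likely
# to be chosen, and no member is chosen twice.
sample = rng.sample(frame, 500)

# Estimate the percentage taller than 1.9 m from the sample alone.
taller = sum(1 for height in sample if height > 1.9)
percentage = 100 * taller / len(sample)
print(f"{percentage:.1f}% of the sample is taller than 1.9 m")
```

The estimate varies from sample to sample; changing the seed shows how much a sample of 500 can differ from the frame it was drawn from.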
Disadvantages
1. Sample could involve expensive travelling for the interviewers.
2. People selected might be difficult to find.
3. There might be no sample frame available. For example, a geneticist
might want to find the percentage of people with red hair who
have green eyes, but there may be no list of people with red
hair.
Exercise
Disadvantages
1. This is non-random sampling, so very few statistical tests can be used.
2. It is biased, because the interviewer has discretion over who is interviewed.
Advantages
1. The interviewer need only travel to a few areas.
2. A sample frame is not necessary.