
Chapter 1

The document provides an introduction to statistics, including definitions of key statistical concepts like data types, scales of measurement, and sources of data. It discusses how statistics originated from government practices and games of chance. The different types of data are defined, including quantitative vs qualitative, discrete vs continuous, and scales of measurement. Common analysis methods for different data types are also introduced.

Uploaded by

Olorato Rantaba

STATS 101

Introduction to statistics

Chapter 1 – Introduction
Statistics is defined as the collection, processing, interpretation, and presentation of data.

Brief historical background

The origin of statistics can be traced to two areas: (a) government and (b) games of
chance.

• Governments have long used censuses to count persons and property for purposes such
as taxation and the listing of economic resources.
• Games of chance date back thousands of years. The use of dice has been traced to
Egypt in about 3500 B.C. However, the mathematical study of such games began less
than four centuries ago. In 1654, Blaise Pascal and Pierre de Fermat (two
mathematicians) took up a gambling problem and solved it independently using
different approaches. Until then, the mathematics of games of chance (now known as
probability theory) had been overlooked. Nowadays, probability theory is applied to
many problems in the social and physical sciences, for example to measure the
uncertainty or risk of an event.

Sources of statistical data

Sources of data may be classified as follows:

i. Primary data: raw data collected first-hand by the researcher (e.g. an organisation,
person, authority or agency) through experiments, surveys, focus groups, interviews
and questionnaires.
ii. Secondary data: readily available data that was collected by someone else. It is
available to the public through publications, journals and newspapers.
Generally, primary sources are preferred to secondary sources because the possibility of
transcription errors is reduced. Primary sources are also accompanied by documentation and
precise definitions.

Types of statistical data and scales of measurement

There are two types of statistical data: quantitative (or numerical) and qualitative (or
categorical) data.

Quantitative (or numerical) data is obtained by a measuring or counting process. Examples
of such data include weights, ages, the number of seats in an auditorium, etc. In contrast,
qualitative (also known as categorical) data results from descriptive measures. Examples
include colours, marital status, religious affiliations, etc.

Quantitative/numerical data can further be classified as discrete or continuous. Discrete
data has distinct or separate values (i.e. it takes on only particular values). This type of
data is counted rather than measured. Examples of discrete data include the number of
prerequisite courses in the STATS program, the number of heads in 100 coin flips, the
number of students in the STATS 101 class, etc. Continuous data represents measurements;
its values are measured rather than counted (i.e. it can take on any value in an interval).
Examples of continuous data include the heights or weights of students in the STATS 101
class, the price of a toolkit, etc.

Scales of measurement for statistical data

There are four scales of measurement for statistical data which include nominal, ordinal,
interval and ratio scales.

Nominal Data: There is no natural ranking or ordering in the data. For example, political
affiliations (UDC, BMD, AP, BDP), gender (Female/Male), etc.

Ordinal Data: The data has a natural order, but there is no precise mathematical difference
between levels. For example, heat (low, medium, high), movie rating (1-star, 2-star), etc.
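
The difference between unordered and ordered categories can be sketched in Python. This is only an illustration: the `HEAT_ORDER` mapping and the sample `readings` are hypothetical and are not part of the course material. The idea is that ordered levels can be sorted and a middle value can be picked, even though arithmetic on the levels themselves has no meaning.

```python
# Hypothetical rank mapping for the ordered levels low < medium < high.
HEAT_ORDER = {"low": 0, "medium": 1, "high": 2}

# Hypothetical sample of seven ordered-category observations.
readings = ["high", "low", "medium", "low", "high", "medium", "low"]

# Ordered categories can be sorted because the levels have a natural order.
sorted_readings = sorted(readings, key=HEAT_ORDER.__getitem__)

# With an odd sample size, the middle observation is the median level.
median_level = sorted_readings[len(sorted_readings) // 2]

print(sorted_readings)
print(median_level)
```

Note that the ranks 0, 1, 2 are used only for sorting. Averaging them would assume equal spacing between levels, which this scale does not guarantee.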

Interval Data: Satisfies the following conditions:

• Intervals of equal length signify equal differences in the characteristic. For example,
the difference between 10°C and 20°C is the same as the difference between 90°C and
100°C.
• Differences make sense, but ratios do not. For example, 100°C is not twice as hot as
50°C.
• The scale does not have a ‘true zero’ starting point (i.e. it has an arbitrary zero).
Additionally, zero does not signify an absence of the characteristic, e.g. 0°C does not
represent the absence of heat.
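
The conditions for an interval scale can be checked numerically. The sketch below is an illustration only and assumes the temperatures are in degrees Celsius. The conversion to kelvin (a temperature scale with a true zero) is brought in as an assumption to show why a Celsius ratio is misleading, and the helper name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvin, a scale with a true zero."""
    return celsius + 273.15


# Equal intervals signify equal differences on the Celsius scale.
diff_low = 20 - 10
diff_high = 100 - 90

# The naive Celsius ratio suggests "twice as hot".
naive_ratio = 100 / 50

# On a scale with a true zero, the ratio of the same two temperatures
# is much closer to 1.
kelvin_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

print(diff_low == diff_high)
print(naive_ratio)
print(round(kelvin_ratio, 3))
```

The differences agree on both scales, but the ratio changes when the zero point moves, which is why ratios are not meaningful for interval data.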

Ratio Data: This scale carries more information than the interval scale. Ratio data
satisfies the following conditions:

• Both differences and ratios are meaningful. For example, two 2 ml glasses of water
are equivalent to one 4 ml glass of water. We can also say that 4 ml of water is twice
as much as 2 ml of water.
• The scale has a ‘true zero’ starting point. For example, 0 ml of water is a ‘true
zero’ because the glass is empty, which means an absence of water.

From above, we can now summarize what is known as follows:

Data types

• Quantitative/Numerical data
    • Discrete: Interval, Ratio
    • Continuous: Interval, Ratio
• Qualitative/Categorical data
    • Nominal
    • Ordinal

Why are data types important?

Data types matter because each statistical method is valid only for certain data types. For
instance, you cannot analyse continuous data the same way categorical data is analysed;
the results would be wrong. Therefore, it is important to know the data type in order to
choose the correct method of analysis.

In general, categorical data can be summarised using frequencies, proportions and
percentages, and the results can be visualised using simple pie charts, bar charts, etc. In
contrast, continuous data can be summarised using percentiles, median, interquartile range,
mean, mode, standard deviation and range, and can be visualised using histograms,
stem-and-leaf diagrams, box plots, etc. We demonstrate most of these methods in the next
chapters.
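
As a rough sketch of the summaries named above, the following Python example uses only the standard library. Both samples (`marital_status` and `heights_cm`) are hypothetical values made up for illustration. The quartiles come from `statistics.quantiles` with its default method, and other software may use a slightly different quartile convention.

```python
from collections import Counter
import statistics

# Hypothetical categorical sample: marital status of eight respondents.
marital_status = ["single", "married", "single", "divorced",
                  "married", "single", "married", "single"]

# Categorical summaries: frequencies, proportions and percentages.
frequencies = Counter(marital_status)
n = len(marital_status)
proportions = {level: count / n for level, count in frequencies.items()}
percentages = {level: 100 * p for level, p in proportions.items()}

# Hypothetical continuous sample: student heights in centimetres.
heights_cm = [158.0, 162.5, 165.0, 170.0, 171.5, 176.0, 180.0]

# Continuous summaries: centre, spread and quartiles.
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
stdev_height = statistics.stdev(heights_cm)  # sample standard deviation
data_range = max(heights_cm) - min(heights_cm)
q1, q2, q3 = statistics.quantiles(heights_cm, n=4)  # three quartile cut points
iqr = q3 - q1  # interquartile range

print(dict(frequencies), percentages)
print(mean_height, median_height, data_range, iqr)
```

Computing a mean of `marital_status` would raise an error, which mirrors the point made above: the data type determines which summaries are available.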

Progression check

1. A person’s highest educational attainment is which type of data?
(a) Nominal
(b) Continuous
(c) Ordinal
(d) Discrete numeric

2. Nominal data is often analysed in the form of:
(a) Ranks
(b) Counts
(c) Average

3. The total number of false alarms reported in a week is which type of data?
(a) Continuous
(b) Ordinal
(c) Discrete numeric
(d) Nominal
