Data Overview

The document provides an overview of data concepts, focusing on scales of measurement including nominal, ordinal, interval, and ratio scales, along with their properties and applicable mathematical operations. It also classifies data based on nature, source, and design, explaining the differences between qualitative and quantitative data. The training is aimed at beginners using Stata and is presented by Leguma Bakari from the Eastern Africa Statistical Training Center.

Overview of Concepts of Data (Theory)

Stata Beginners Level Training

Leguma Bakari
Email: [email protected]
Phone:+255 762 760 095
December 25, 2023
Eastern Africa Statistical Training Center (EASTC)

Outline

Introduction

Scales of Measurement of Data

Nominal Scale

Ordinal Scale

Interval Scale

Ratio Scale

Classification of Data

Data in Terms of Nature

Data in Terms of Source

Data in Terms of Design

Definitions

• Statistics is the discipline concerned with the collection,
organization, analysis, interpretation and presentation of data.
• Data are facts and statistics collected together for reference or
analysis.
• These facts may take the form of quantities, characters, or symbols.
• A variable is an element, feature, or factor that is liable to vary or
change.

Scales of Measurement of Data

• Scales of measurement refer to the ways in which variables (numbers)
are defined and categorized.
• Each scale of measurement has certain properties, which in turn
determine which statistical analyses are appropriate.
• There are four scales of measurement:
1. Nominal scale,
2. Ordinal scale,
3. Interval scale, and
4. Ratio scale.
• The mathematical operations a variable supports are the main
criterion for naming and identifying its scale of measurement.
• The mathematical operations are addition (+), subtraction (−),
multiplication (×), division (÷) and inequalities (< or >).
Scales of Measurement Summary
• Scales of measurement can first be classified into two groups,
namely
1. categorical data and
2. numeric data.
• Categorical data come from a variable with a fixed number of
possible responses; the scale is either nominal or ordinal.
1. In a nominal scale the categories cannot be ranked.
2. In an ordinal scale the categories can be ranked.
• Numeric data are any data whose responses are numbers; the scale
is either interval or ratio.
1. Interval data are measured data with a false zero.
2. Ratio data are measured (or count) data with a true zero.
• Any data on the interval or ratio scale can be converted to
categorical data.
• When converted, they become ordinal data, not nominal data,
e.g. GPA to GPA class or age to age group.
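
The conversion of a ratio variable into an ordinal one can be sketched in a few lines of Python. This is only an illustration: the cut points, the labels and the function name `age_group` are assumptions chosen for the example, not part of the training material.

```python
from bisect import bisect_right

# Hypothetical cut points and labels, chosen only for illustration.
AGE_BREAKS = [18, 35, 60]                        # lower bounds of groups 2-4
AGE_LABELS = ["0-17", "18-34", "35-59", "60+"]   # ordered youngest to oldest


def age_group(age):
    """Return the ordered age-group label for a non-negative age in years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many break points are <= age,
    # which is exactly the index of the matching label.
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


ages = [4, 17, 18, 34, 35, 59, 60, 82]
groups = [age_group(a) for a in ages]
print(groups)
```

Because the labels keep their order, the result is ordinal rather than nominal: the position of a label in `AGE_LABELS` can still be compared with < or >, but differences between positions no longer measure years.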
Nominal Scale

• A nominal scale is a scale in which numbers serve only as tags
or labels for the purpose of categorizing responses.
• No mathematical operation can be applied to a nominal
variable.
• Nominal responses cannot be ranked.
• Examples of nominal variables include gender, any binary (i.e.
yes/no) response, race and marital status.
• On a nominal scale you can compute only the mode, not the
median or any form of mean.

Nominal Scale Arithmetic Application

• Let us use the nominal variable marital status for demonstration.
• The numbers in a nominal variable are merely assigned codes (they
do not come from measurement).
• The possible values of the marital status variable are
1. Single
2. Married
3. Widowed
4. Divorced
• Addition (+)
  • From mathematics, 1 + 2 = 3.
  • For the nominal variable this is equivalent to
    Single + Married = Widowed, which is meaningless.
  • Therefore addition cannot be applied to nominal data.
  • Since addition cannot be applied, and computing the mean
    involves addition, the mean is an invalid statistic for nominal data.
  • Mean = $\frac{x_1 + x_2 + \cdots + x_n}{n}$
• Subtraction (−)
  • From mathematics, 4 − 2 = 2.
  • For the nominal variable this is equivalent to
    Divorced − Married = Married, which is meaningless.
  • Therefore subtraction cannot be applied to nominal data.
• Multiplication (×)
  • From mathematics, 2 × 2 = 4.
  • For the nominal variable this is equivalent to
    Married × Married = Divorced, which is meaningless.
  • Therefore multiplication cannot be applied to nominal data.
• Division (÷)
  • From mathematics, 3 ÷ 1 = 3.
  • For the nominal variable this is equivalent to
    Widowed ÷ Single = Widowed, which is meaningless.
  • Therefore division cannot be applied to nominal data.

• Inequality (< or >)
  • From mathematics, 2 > 1.
  • For the nominal variable this is equivalent to
    Married > Single, which is meaningless.
  • Inequality, in other words, implies ranking.
  • Since inequality cannot be applied, and computing the median
    involves ranking (ascending or descending), the median is an
    invalid statistic for nominal data.
• Among the three common measures of central tendency (mean,
median and mode), the mode is the only valid statistic, since it
involves only counts (frequencies).
• This concludes that no mathematical operation can be
applied to nominal data.

Marital Status    Frequency
Single            23
Married           67
Widowed           19
Divorced          32

• In the frequency table above, Married is the mode, since it is the
most frequently occurring value (67 observations).
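
As a small sketch of the same reasoning in Python, the mode can be read off the frequency table using nothing but counts. The variable names are assumptions made for this example; the frequencies are the ones in the table above.

```python
# Frequency table copied from the slide above.
marital_status_freq = {"Single": 23, "Married": 67, "Widowed": 19, "Divorced": 32}

# The mode is the category with the largest frequency; only counting is needed,
# so no arithmetic is performed on the category codes themselves.
mode_category = max(marital_status_freq, key=marital_status_freq.get)
total = sum(marital_status_freq.values())
print(mode_category, marital_status_freq[mode_category], total)
```

Note that the frequencies are added together here, not the categories: counts are ratio data even when the variable being counted is nominal.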

Ordinal Scale

• An ordinal scale is a scale in which numbers serve only as tags
or labels for the purpose of categorizing ordered responses.
• An ordinal scale implies that the categories must be put into an
order such that each category is considered greater (or less) than
every other category.
• For an ordinal variable, inequalities are the only
mathematical operations that can be applied.
• Examples of ordinal variables include education level and GPA
class.
• On an ordinal scale you can compute the mode and the median, but
not any form of mean.

Ordinal Scale Arithmetic Application

• Let us use the ordinal variable education level for demonstration.
• The numbers in an ordinal variable are merely assigned codes (they
do not come from measurement); only their order carries meaning.
• The possible values of the education level variable are
1. Primary
2. Secondary
3. College
4. University
• Addition (+)
  • From mathematics, 1 + 2 = 3.
  • For the ordinal variable this is equivalent to
    Primary + Secondary = College, which is meaningless.
  • Therefore addition cannot be applied to ordinal data.
  • Since addition cannot be applied, and computing the mean
    involves addition, the mean is an invalid statistic for ordinal data.
• Subtraction (−)
  • From mathematics, 4 − 2 = 2.
  • For the ordinal variable this is equivalent to
    University − Secondary = Secondary, which is meaningless.
  • Therefore subtraction cannot be applied to ordinal data.
• Multiplication (×)
  • From mathematics, 2 × 2 = 4.
  • For the ordinal variable this is equivalent to
    Secondary × Secondary = University, which is meaningless.
  • Therefore multiplication cannot be applied to ordinal data.
• Division (÷)
  • From mathematics, 3 ÷ 1 = 3.
  • For the ordinal variable this is equivalent to
    College ÷ Primary = College, which is meaningless.
  • Therefore division cannot be applied to ordinal data.

• Inequality (< or >)
  • From mathematics, 2 > 1.
  • For the ordinal variable this is equivalent to
    Secondary > Primary, which is meaningful.
  • Inequality, in other words, implies ranking.
  • Since inequality can be applied, and computing the median
    involves ranking (ascending or descending), the median is a
    valid statistic for ordinal data.
• Among the three common measures of central tendency (mean,
median and mode), the mode and the median are the only valid
statistics for summarization.
• This concludes that inequality is the only mathematical operation
that can be applied to ordinal data.
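
The ranking argument above can be illustrated with a short Python sketch. The list of responses is invented for the example, and the use of `statistics.median_low` is a design choice of this sketch (it always returns an observed value, so the result can be mapped back to a category), not something prescribed by the slides.

```python
import statistics

# Ordered categories from the slide; the codes 1-4 only record rank.
LEVELS = ["Primary", "Secondary", "College", "University"]
RANK = {level: i for i, level in enumerate(LEVELS, start=1)}

# Hypothetical responses, invented for illustration.
responses = ["Secondary", "Primary", "University", "Secondary", "College",
             "Primary", "Secondary"]

# Ranking is the only operation used: sort the codes and take the middle one.
ranks = sorted(RANK[r] for r in responses)
median_level = LEVELS[statistics.median_low(ranks) - 1]

# The mode needs only counts, so it works directly on the labels.
mode_level = statistics.mode(responses)
print(median_level, mode_level)
```

No mean is computed, because averaging the codes would treat the gap between Primary and Secondary as equal to the gap between College and University, which the ordinal scale does not guarantee.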

Binary Variable

• A binary variable is a nominal or ordinal variable with only two
possible responses.
• Examples: gender (male or female) and all forms of yes/no responses.

Interval Scale

• An interval scale is a scale of measurement in which a value of
zero (0) does not indicate the absence of the quantity being measured.
• A zero value that does not indicate absence of the quantity is also
called a false zero.
• For an interval variable three mathematical operations are
applicable:
  • Inequalities,
  • Addition, and
  • Subtraction.
• Examples of interval variables are temperature (e.g. in degrees
centigrade) and deviations.
• On an interval scale you can compute the mode, median and arithmetic
mean, but not the geometric mean.

Interval Scale Arithmetic Application

• Let us use temperature in degrees centigrade, an interval variable,
for demonstration.
• The numbers in an interval variable are original numbers (from
measurement).
• Possible values of the temperature variable are
  • 15 °C, −9 °C, 0 °C, 5 °C, −15 °C, 25 °C
• Addition (+)
  • From mathematics, 15 °C + 5 °C = 20 °C, which is possible and
    valid.
  • Since addition can be applied, and computing the mean involves
    addition, the mean is a valid statistic for interval data.
  • Mean = $\frac{x_1 + x_2 + \cdots + x_n}{n}$

• Subtraction (−)
  • From mathematics, 15 °C − 5 °C = 10 °C, which is possible and
    valid.
  • Any computation that involves only subtraction is valid because
    of this property.
• Multiplication (×)
  • From mathematics, 15 °C × 0 °C = ??, which has no meaningful
    result and is not valid.
  • Any computation that involves multiplication (e.g. the geometric
    mean) is invalid because of this property.
  • Geometric mean = $\sqrt[n]{y_1 \times y_2 \times \cdots \times y_n}$
• Division (÷)
  • From mathematics, 15 °C ÷ −5 °C = ??, which has no meaningful
    result and is not valid.
  • Any computation that involves division (e.g. the harmonic mean)
    is invalid because of this property.
  • Harmonic mean = $\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$

• Inequality (< or >)
  • From mathematics, 15 °C > 5 °C, which is possible and valid.
  • Inequality, in other words, implies ranking.
  • Since inequality can be applied, and computing the median
    involves ranking (ascending or descending), the median is a
    valid statistic for interval data.
• All three common measures of central tendency (arithmetic mean,
median and mode) are valid statistics for summarization.
• The geometric mean and the harmonic mean cannot be applied.
• This concludes that only inequality, addition and subtraction
can be applied to interval data.
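
One way to see why the false zero matters is to re-express the same temperatures in another unit. The sketch below is an illustration added here, not part of the original slides; the helper `c_to_f` and the comparison of 20 °C with 10 °C are assumptions made for the example. It suggests that the arithmetic mean survives a change of unit while a ratio of two temperatures does not.

```python
import statistics


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


temps_c = [15, -9, 0, 5, -15, 25]        # values from the slide
temps_f = [c_to_f(t) for t in temps_c]

# The arithmetic mean is consistent across units: converting the mean
# gives the same answer as averaging the converted values.
mean_c = statistics.fmean(temps_c)
mean_f = statistics.fmean(temps_f)

# A ratio is not: 20 / 10 is 2 in Celsius, but the same two temperatures
# expressed in Fahrenheit (68 and 50) give a different ratio.
ratio_c = 20 / 10
ratio_f = c_to_f(20) / c_to_f(10)
print(mean_c, c_to_f(mean_c), mean_f, ratio_c, ratio_f)
```

Since "twice as hot" depends on where the arbitrary zero is placed, statistics built on multiplication or division (geometric and harmonic means) are not meaningful on this scale.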

Ratio Scale

• A ratio scale is a scale of measurement in which a value of zero
(0) indicates the absence of the quantity being measured.
• A zero value that indicates absence of the quantity is also
called a true zero.
• For a ratio variable all five forms of mathematical operation are
applicable.
• Examples of ratio variables are height, weight and age.
• On a ratio scale you can compute many statistics, including the mode,
median, arithmetic mean and geometric mean.

Ratio Scale Arithmetic Application

• Let us use length, a ratio variable, for demonstration.
• The numbers in a ratio variable are original numbers (from
measurement).
• Possible values of the length variable are
  • 150 cm, 80 cm, 130 cm, 50 cm
• Addition (+)
  • From mathematics, 150 cm + 50 cm = 200 cm, which is possible
    and valid.
  • Since addition can be applied, and computing the mean involves
    addition, the mean is a valid statistic for ratio data.
  • Mean = $\frac{x_1 + x_2 + \cdots + x_n}{n}$

• Subtraction (−)
  • From mathematics, 150 cm − 50 cm = 100 cm, which is possible
    and valid.
  • Any computation that involves subtraction is valid because of
    this property.
• Multiplication (×)
  • From mathematics, 50 cm × 5 cm = 250 cm², which is possible
    and valid.
  • Any computation that involves multiplication (e.g. the geometric
    mean) is valid because of this property.
  • Geometric mean = $\sqrt[n]{y_1 \times y_2 \times \cdots \times y_n}$
• Division (÷)
  • From mathematics, 50 cm ÷ 5 cm = 10, which is possible and
    valid.
  • Any computation that involves division (e.g. the harmonic mean)
    is valid because of this property.
  • Harmonic mean = $\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$

• Inequality (< or >)
  • From mathematics, 50 cm > 5 cm, which is possible and valid.
  • Inequality, in other words, implies ranking.
  • Since inequality can be applied, and computing the median
    involves ranking (ascending or descending), the median is a
    valid statistic for ratio data.
• All three common measures of central tendency (arithmetic mean,
median and mode) are valid statistics for summarization.
• The geometric mean and the harmonic mean can also be applied.
• This concludes that all forms of mathematical operation can be
applied to ratio data.
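
As a minimal sketch, the statistics named above can be computed for the four lengths on the slide with Python's standard `statistics` module (Python 3.8 or later is assumed for `fmean` and `geometric_mean`). The variable names are illustrative only.

```python
import statistics

lengths_cm = [150, 80, 130, 50]   # values from the slide

arithmetic = statistics.fmean(lengths_cm)            # uses addition and division
geometric = statistics.geometric_mean(lengths_cm)    # uses multiplication
harmonic = statistics.harmonic_mean(lengths_cm)      # uses division (reciprocals)
median = statistics.median(lengths_cm)               # uses ranking only
print(arithmetic, geometric, harmonic, median)
```

For positive ratio data the three means are always ordered harmonic ≤ geometric ≤ arithmetic, which gives a quick sanity check on the output.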

Classification of Data

• Data types can be classified in terms of nature, source or design.
• In terms of nature, the types of data are qualitative data and
quantitative data.
• In terms of source, the types of data are primary data and
secondary data.
• In terms of design, the types of data are cross-section, time series
and longitudinal (also referred to as panel) data.

Type of Data in Terms of Nature

• In terms of nature, data can be either qualitative or
quantitative.
Quantitative Data
• These are measures of values or counts that are expressed as
numbers.
• Quantitative data are data expressed in numeric terms.
• Count data answer questions such as how many?, how much? or
how often?
• Measurement data are quantities such as weight, height and age.
Qualitative Data
• These are data concerned with descriptions, which can be observed but
cannot be computed or quantified, e.g. answers to why?
• Qualitative data include text, all categories, audio, video, images, etc.
Types of Data in Terms of Source

• In terms of source, data can be either primary or
secondary.

Primary Data

• This is information collected through original or first-hand research.
• Examples: surveys, focus group discussions, interviews,
observations and questionnaires.

Secondary Data

• This is information that was collected in the past by someone
else.
• Examples: material found on the internet, newspaper articles and
company reports.
Types of Data in Terms of Design

• In terms of design, data can be classified as
1. Cross-section data.
2. Time series data.
3. Longitudinal (panel) data.

Table 1: Summary Table

Design          Subjects (n)      Time units (t)    No. of obs.
Cross-section   Multiple (n>1)    Fixed (t=1)       n × 1 = n
Time series     Fixed (n=1)       Multiple (t>1)    1 × t = t
Panel           Multiple (n>1)    Multiple (t>1)    n × t = nt

• Possible subjects (key identifier): an individual, an institution, a
household, a city, a country or a region (continent), etc.
• Possible time units (index): a day, a week, a month, a quarter,
a half-year, a year or any other fixed time interval.
Cross-Section Data

• It is a collection of observations for multiple subjects (n>1) at
a single unit of time (t=1).
• In cross-section data the unit of analysis is the subject.
• The number of observations is obtained by counting the number
of subjects, i.e. n × 1 = n.
• Example: max temperature, humidity and wind (all three
behaviors) in New York City, SFO, Boston and Chicago (multiple
subjects) on 1/1/2015 (a single point in time).

Time Series Data

• It is a collection of observations (behaviors) for a single subject
(n=1) at different time units (t>1), generally equally
spaced.
• In time series data the unit of analysis is the time unit.
• The number of observations is obtained by counting the number
of time points, i.e. 1 × t = t.
• Example: max temperature, humidity and wind (all three
behaviors) in New York City (single subject) collected on the first
day of every year (multiple units of time).

Panel (Longitudinal) Data

• It is a collection of observations for multiple subjects (n>1) at
multiple units of time (t>1).
• It is also called cross-sectional time-series data, as it is a
combination of the two types described above.
• Example: max temperature, humidity and wind (all three
behaviors) in New York City, SFO and Boston (multiple subjects) on
the first day of every year (multiple time units).

Summary

• n is the number of subjects, also called cross-section units.
• t is the number of time units.
• For cross-section data, n>1 and t=1,
• For time series data, n=1 and t>1, and
• For panel data, n>1 and t>1.
• Further description of panel data:
  • When n>t, the panel is known as a micro panel.
  • When n<t, the panel is known as a macro panel.
  • When every subject is observed at every time unit (n × t
    observations in total), the panel is known as a balanced panel.
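
The rules in this summary can be sketched as a small Python function that inspects the (subject, time) pair of each observation. The function name `describe_design`, the example records and the `balanced` flag are assumptions made for this illustration; here `balanced` is taken to mean that every subject appears exactly once at every time unit.

```python
def describe_design(records):
    """Classify a data set from its (subject, time) pairs.

    Returns (n, t, design, balanced), where n and t are the numbers of
    distinct subjects and time units.
    """
    pairs = list(records)
    subjects = {s for s, _ in pairs}
    times = {t for _, t in pairs}
    n, t = len(subjects), len(times)
    if n > 1 and t > 1:
        design = "panel"
    elif n > 1:
        design = "cross-section"
    elif t > 1:
        design = "time series"
    else:
        design = "single observation"
    # Balanced: n * t distinct pairs and no pair recorded twice.
    balanced = len(pairs) == n * t and len(set(pairs)) == n * t
    return n, t, design, balanced


# Hypothetical records: three cities observed on the first day of two years.
cities = ["New York City", "SFO", "Boston"]
panel = [(c, y) for c in cities for y in (2015, 2016)]
cross_section = [(c, 2015) for c in cities]
time_series = [("New York City", y) for y in (2015, 2016, 2017)]

print(describe_design(panel))
print(describe_design(cross_section))
print(describe_design(time_series))
```

Dropping any one record from `panel` leaves n and t unchanged but makes the number of observations smaller than n × t, so the function then reports the panel as unbalanced.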
