Data and Sampling
• COLLECTION
• REPRESENTATION
Concept vs Variable
Measurement
• Cardinal: How many players are there in a football team?
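• Ordinal: Who ranks first (second, etc.) among the season’s scorers?
• Nominal: What is the number on the back of “Messi”?
• Interval: Temperature in °C (differences are meaningful, but there is no true zero)
• Ratio: Weight (there is a true zero, so ratios are meaningful)
• Continuous: Age, weight, length
• Discrete: Population, number of children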
Use
• Organized in univariate or multivariate format
• Statistical characteristics can be computed (mean, variance, probabilities, etc.)
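As a minimal sketch of computing these characteristics, the Python below uses the standard-library `statistics` module on a hypothetical list of player weights in kg (the values are invented for illustration). Note that `statistics.variance` returns the sample variance, dividing by n - 1.

```python
import statistics

# Hypothetical ratio-scale data: weights (kg) of five players
weights = [72.0, 80.0, 68.0, 90.0, 75.0]

mean_w = statistics.mean(weights)      # arithmetic mean
var_w = statistics.variance(weights)   # sample variance (divides by n - 1)
sd_w = statistics.stdev(weights)       # sample standard deviation

# Share of players heavier than 75 kg: an empirical probability
p_heavy = sum(w > 75 for w in weights) / len(weights)

print(mean_w, var_w, round(sd_w, 2), p_heavy)  # prints 77.0 72.0 8.49 0.4
```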
Qualitative data
Information that describes “qualities”. We create a “variable” that records (“logs”) a characteristic.
Measurement
• Labels
• Lists
• Classifications
Use
• For classification/organizational purposes
• Mean and variance cannot be computed (they are meaningless for labels)
• Discrete probabilities (the relative frequency of each category) can be computed
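To illustrate the difference, here is a minimal sketch with a hypothetical nominal variable (playing positions). A mean of the labels would make no sense, but counting categories gives frequencies, discrete probabilities and the most common category (the mode).

```python
from collections import Counter

# Hypothetical nominal data: playing position of each squad member
positions = ["Defender", "Midfielder", "Forward", "Defender",
             "Midfielder", "Defender", "Goalkeeper", "Forward"]

counts = Counter(positions)   # frequency of each category
n = len(positions)

# Discrete probability of each category = relative frequency
probabilities = {cat: c / n for cat, c in counts.items()}

# The most frequent category is still well defined for labels
mode = counts.most_common(1)[0][0]

print(counts["Defender"], probabilities["Defender"], mode)  # prints 3 0.375 Defender
```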
Quantifying Qualitative Data
The analysis of data requires the quantification of characteristics. We will consider the following approaches:
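One common way to quantify a qualitative variable is dummy (one-hot) coding: each category becomes a 0/1 indicator variable. This is offered as an assumed illustration, not necessarily one of the approaches the course lists; the variable and its categories are hypothetical.

```python
# Hypothetical qualitative variable: employment status of four respondents
status = ["employed", "unemployed", "inactive", "employed"]

# Fix the category order so the indicator columns are reproducible
categories = sorted(set(status))   # ['employed', 'inactive', 'unemployed']

# One 0/1 indicator per category for each respondent
dummies = [[1 if s == c else 0 for c in categories] for s in status]

# The mean of each indicator column is meaningful: it is the category share
shares = [sum(col) / len(status) for col in zip(*dummies)]

print(categories)
print(dummies)
print(shares)  # prints [0.5, 0.25, 0.25]
```

In a regression, one category is usually dropped and used as the baseline, to avoid perfect collinearity among the dummies.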
Sampling
Generalization
How can I collect data?
Data Collection
• Primary (usually qualitative)
◦ Participant observation
◦ Interview
◦ Questionnaires
• Secondary (usually quantitative)
◦ Documentary
◦ Survey
Participant Observation
• Researcher takes part in activity
◦ Researcher’s identity is revealed: Participant as observer
◦ Researcher’s identity is concealed: Complete participant
• Researcher observes activity
◦ Researcher’s identity is revealed: Observer as participant
◦ Researcher’s identity is concealed: Complete observer
Interviews and Questionnaires
A quick note on interviews and questionnaires: their reliability (do they give consistent results?) and validity (do they measure what they are meant to measure?) need to be ensured!
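Reliability can be checked in several ways. One widely used measure of a questionnaire’s internal consistency is Cronbach’s alpha; it is named here as an assumed example, since the slides do not specify a method. The sketch below computes it from a hypothetical set of Likert answers.

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Population variance is used for both terms; using the sample variance
    for both gives the same alpha, because the n / (n - 1) factors cancel.
    """
    k = len(responses[0])                      # number of items
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))              # transpose: one tuple per item
    item_vars = sum(statistics.pvariance(col) for col in items)
    total_var = statistics.pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical answers of 4 respondents to 3 Likert items (scale 1-5)
answers = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 2, 3],
]

alpha = cronbach_alpha(answers)
print(round(alpha, 3))  # prints 0.878
```

Validity, by contrast, cannot be read off a single statistic like this; it depends on whether the questions actually capture the intended concept.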
Do I collect information on everything? No, I use a sample.
Population vs Sample
Population
Sampling
• Random Sampling
◦ Stratified
◦ Cluster
◦ Multistage
• Non-Probability Sampling
◦ Quota
◦ Snowball
◦ Purposive: Extreme Case, Heterogeneous, Homogeneous, Critical Case, Typical Case
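As a minimal sketch of one of the random designs above, the Python below draws a proportional stratified sample: the population is split into strata and a simple random sample of the same fraction is drawn within each one. The population, the strata (regions) and the helper `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Proportional stratified random sample (hypothetical helper).

    population : list of units
    stratum_of : function mapping a unit to its stratum label
    fraction   : share of each stratum to draw (0 < fraction <= 1)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for units in strata.values():
        n_h = round(len(units) * fraction)       # proportional allocation
        sample.extend(rng.sample(units, n_h))    # simple random sample within the stratum
    return sample

# Hypothetical population: 60 students from "North", 40 from "South"
population = [("North", i) for i in range(60)] + [("South", i) for i in range(40)]

sample = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1, seed=1)
print(len(sample), sum(u[0] == "North" for u in sample))  # prints 10 6
```

Cluster sampling would instead randomly select whole groups (e.g. entire classes) and keep every unit in the chosen groups.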
Dataset Types
Chart Types
I have collected my sample:
Usable Datasets
A dataset is a collection of observations that are used in analyses.
◦ Some main sources of economic data:
Quandl.com
1. Historical FX Rates: here
2. Historical Stock Prices: here and here
3. Recent LIBOR rates:
4. Some Implied Volatilities:
5. Delayed Commodities:
6. US Fundamentals: some economic data: UK, US: here, here, here, here and here
7. Macroeconomic Data: here, here and here
Types of data:
◦ Time Series
◦ Cross-Sectional
◦ Panel Data
DATA
Time Series
Characteristics
One individual
One or many variables
Over time
Examples
GDP over time
Stock Price
Unemployment, etc.
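A time series keeps one individual and follows it over time, so the order of observations matters. The minimal sketch below uses hypothetical closing prices (not values read from the chart) and computes simple daily returns after sorting by date.

```python
from datetime import date

# Hypothetical daily closing prices of one index: one individual, one variable, over time
prices = {
    date(2021, 4, 29): 6834.0,
    date(2021, 4, 26): 6800.0,
    date(2021, 4, 28): 6800.0,
    date(2021, 4, 27): 6868.0,
}

# In a time series the order of observations matters: sort by date first
dates = sorted(prices)

# Simple daily return: today's price relative to the previous trading day
returns = {
    today: prices[today] / prices[prev] - 1
    for prev, today in zip(dates, dates[1:])
}

for day, r in returns.items():
    print(day.isoformat(), f"{r:+.2%}")
```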
[Figure: Time-series example: daily price and volume (M), Mar 29, 2021 to Apr 29, 2021]
Cross-Sectional
Characteristics
One point in time
Many individuals
Multiple Variables
Examples
GDP of many countries in a year
Covid cases across countries on a single day
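A cross-section compares many individuals at one point in time. The minimal sketch below uses hypothetical countries and counts (not the figures from the charts) to compute the kind of quantities plotted for the Covid example: a proportional survival ratio (the share of recorded cases that did not end in death, a rough descriptive ratio) and log10 coordinates for a deaths-vs-cases scatter.

```python
import math

# Hypothetical cross-section: cumulative cases and deaths per country on one day
data = {
    "Country A": {"cases": 1_000_000, "deaths": 20_000},
    "Country B": {"cases": 250_000, "deaths": 2_500},
    "Country C": {"cases": 40_000, "deaths": 600},
}

# Proportional survival: share of recorded cases that did not end in death
survival = {c: 1 - r["deaths"] / r["cases"] for c, r in data.items()}

# Log10 coordinates, as used for a deaths-vs-cases scatter on log scales
log_points = {c: (math.log10(r["cases"]), math.log10(r["deaths"])) for c, r in data.items()}

for c in data:
    x, y = log_points[c]
    print(f"{c}: survival {survival[c]:.1%}, log10(cases) {x:.2f}, log10(deaths) {y:.2f}")
```

Log scales are useful here because the countries differ by orders of magnitude in size.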
[Figure: Cross-sectional examples for European countries: Covid cases and deaths by country; proportional survival (survivals vs deaths, %); deaths (log #) vs cases (log #)]
Panel Data
Characteristics
Many individuals
Many points in time
Multiple Variables
Examples
GDP of many countries in many years
Covid Cases across countries every day
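Panel data combine both dimensions: each observation is identified by an individual and a time point. The minimal sketch below stores a hypothetical GDP panel in "long" format and shows that fixing the individual gives a time series, fixing the time point gives a cross-section, and the panel lets us follow each individual over time.

```python
# Hypothetical panel in "long" format: one row per (country, year)
panel = [
    {"country": "A", "year": 2019, "gdp": 100.0},
    {"country": "A", "year": 2020, "gdp": 95.0},
    {"country": "B", "year": 2019, "gdp": 50.0},
    {"country": "B", "year": 2020, "gdp": 52.0},
]

# Index observations by (individual, time)
gdp = {(r["country"], r["year"]): r["gdp"] for r in panel}

# Fixing the individual gives a time series ...
series_a = {year: v for (c, year), v in gdp.items() if c == "A"}

# ... and fixing the time point gives a cross-section
cross_2020 = {c: v for (c, year), v in gdp.items() if year == 2020}

# Following each individual over time: growth from 2019 to 2020
countries = sorted({c for c, _ in gdp})
growth = {c: gdp[(c, 2020)] / gdp[(c, 2019)] - 1 for c in countries}

print(series_a)    # {2019: 100.0, 2020: 95.0}
print(cross_2020)  # {'A': 95.0, 'B': 52.0}
```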
Activity
Do the following on the dataset provided:
• Plot the intensity of unemployment for European countries. Tips:
◦ Use the second layer for Total
◦ Use a filter to select only “Total”
◦ Use the maximum
• Plot the difference between males and females. Tips:
◦ Use the filter to select only males and females
◦ Use the Category option in 3D Maps
◦ Use the average estimate of unemployment
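The activity is meant for Excel 3D Maps. As an optional cross-check, the Python sketch below reproduces only the filtering and averaging steps of the second task (not the map). The records are hypothetical, and the labels "Males", "Females" and "Total" are assumed to match the dataset's sex column.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical unemployment estimates (%): several estimates per country and sex
records = [
    ("Spain", "Males", 13.0), ("Spain", "Males", 14.0),
    ("Spain", "Females", 17.0), ("Spain", "Females", 16.0),
    ("Germany", "Males", 4.0), ("Germany", "Females", 3.5),
    ("Germany", "Total", 3.8),  # removed by the filter below
]

# Filter step: keep only males and females, grouped by (country, sex)
groups = defaultdict(list)
for country, sex, rate in records:
    if sex in ("Males", "Females"):
        groups[(country, sex)].append(rate)

# Average estimate per group, then the male-female difference per country
avg = {key: mean(vals) for key, vals in groups.items()}
countries = sorted({c for c, _ in avg})
gap = {c: avg[(c, "Males")] - avg[(c, "Females")] for c in countries}

print(gap)  # prints {'Germany': 0.5, 'Spain': -3.0}
```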
Next Step
Analysis: What can I do with the data?
Descriptive Analysis: What
◦ I try to describe the data in order to find out what is there or what happens.
Test Hypotheses
◦ Differences
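The slides do not name a specific test. As one assumed example of testing a difference between two groups, the sketch below computes Welch’s t statistic and its approximate degrees of freedom by hand; the two samples are hypothetical.

```python
import math
import statistics

def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances
    se2 = vx / nx + vy / ny                                  # squared standard error of the difference
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df

# Hypothetical scores of two independent groups
group_1 = [5, 6, 7, 8, 9]
group_2 = [3, 4, 5, 6, 7]

t, df = welch_t(group_1, group_2)
print(t, df)  # prints 2.0 8.0
```

Turning t into a p-value needs the t distribution; a library such as SciPy provides this test directly as `scipy.stats.ttest_ind(x, y, equal_var=False)`.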
Exploratory Analysis: Why
◦ I try to establish causality (regressions)
◦ I try to create groups, e.g. of correlated variables (PCA)
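For the regression step, here is a minimal sketch of simple ordinary least squares (one explanatory variable) written by hand; the education and wage figures are hypothetical.

```python
import statistics

def ols_simple(x, y):
    """Intercept a and slope b of y = a + b * x by ordinary least squares."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-variation of x and y
    sxx = sum((xi - mx) ** 2 for xi in x)                     # variation of x
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical data: years of education (x) and hourly wage (y)
educ = [10, 12, 14, 16]
wage = [12.0, 15.0, 17.0, 20.0]

a, b = ols_simple(educ, wage)
print(round(a, 2), round(b, 2))  # prints -0.9 1.3
```

On its own, the slope measures an association: each extra year of education goes with about 1.3 more units of wage in this toy data. Reading it causally needs further assumptions (e.g. no omitted variables). Python 3.10+ also offers `statistics.linear_regression(x, y)`.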
Inferential Analysis: Sample to population
◦ I always work on a sample and generalize my findings to the population.
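A basic example of generalizing from a sample to the population is a confidence interval for the population mean. The sketch below assumes a large sample, so it uses the normal approximation (about 1.96 standard errors for 95%); the sample itself is simulated and hypothetical.

```python
import math
import random
import statistics

def mean_ci(sample, level=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation, which is reasonable for large samples;
    for small samples a t critical value would be more appropriate.
    """
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)           # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical sample of 200 observations drawn from a larger population
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(200)]

low, high = mean_ci(sample)
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```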