Data and Sampling

The document provides an overview of data definition, types, collection, and representation in social sciences, emphasizing the distinction between qualitative and quantitative data. It discusses various measurement types, data collection methods, and sampling techniques, highlighting the importance of reliability and validity. Additionally, it outlines different data presentation formats and analytical approaches for interpreting collected data.

• DATA DEFINITION AND TYPES

• COLLECTION
• REPRESENTATION

Data and Sampling


Definition & Types
What is Data?

What are the Different Types of Data?

Concept vs Variable

Measurement, Reliability and Validity


What is data?
Data in social sciences encompasses the philosophical sense of the term “observation”: the observations we make using our senses.
◦ When it is words: Qualitative
◦ When it is numbers: Quantitative
In brief:
Data is information we collect by observing the world around us, in
order to analyse it and arrive at conclusions.

We might collect it ourselves (Primary)


We might use existing data (Secondary)
Quantitative data
Information that can be “quantified”. We create a “variable” that measures a quantity (in Metric or Imperial units), expressed in
• Integers
• Fractions
• Roots, Etc.

Measurement
• Cardinal: How many players are there in a football team?
• Ordinal: Who is the first (second, etc.) scorer of the season?
• Nominal: What is the number on the back of “Messi”?
• Interval: Temperature
• Ratio: Weight
• Continuous: Age, Weight, Length
• Discrete: Population, #children

Use
• Organized in univariate or multivariate format
• Can compute statistical characteristics
• Mean, Variance, Probabilities, etc.
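The statistical characteristics listed above can be sketched in a few lines of Python. The ages and child counts below are hypothetical values made up for illustration; only the standard `statistics` module is used.

```python
import statistics

# Hypothetical ages in years (a continuous variable) for a small sample.
ages = [21.5, 34.0, 28.25, 45.0, 39.75]

# Hypothetical number of children (a discrete variable) for the same people.
children = [0, 2, 1, 3, 2]

mean_age = statistics.mean(ages)
var_age = statistics.variance(ages)  # sample variance (divides by n - 1)

# Empirical probability that a person in the sample has exactly 2 children.
p_two_children = children.count(2) / len(children)

print(mean_age, var_age, p_two_children)
```

With a single variable this is a univariate summary; keeping `ages` and `children` side by side for the same people is the simplest multivariate layout.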
Qualitative data
Information that can describe “qualities”. We create a “variable” that “logs” a characteristic

Measurement
• Labels
• Lists
• Classifications

Use
• For classification/organizational purposes
• Cannot compute arithmetic statistics (the mean and variance are not meaningful)
• Can compute frequencies and discrete probabilities
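As a minimal sketch of what can still be computed from labels, the example below counts how often each label occurs and turns the counts into empirical probabilities. The `employment_status` labels are hypothetical.

```python
from collections import Counter

# Hypothetical labels logged for six respondents (a qualitative variable).
employment_status = [
    "employed", "student", "employed", "retired", "employed", "student",
]

counts = Counter(employment_status)
n = len(employment_status)

# Relative frequencies act as discrete (empirical) probabilities.
probabilities = {label: count / n for label, count in counts.items()}

# The most frequent label (the mode) is a meaningful summary; a mean is not.
most_common_label, _ = counts.most_common(1)[0]

print(counts)
print(probabilities)
print(most_common_label)
```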
Quantifying Qualitative Data
The analysis of data requires quantification of characteristics. We will consider the following approaches:

Classification
• The characteristics can be used as a classification variable (e.g., male/female), for which we can measure
◦ the frequency of observations
◦ the intensity of other variables

Intensity
• We can allocate an intensity score
• This should be standardized
• It can be treated as a quantitative variable

Identification
• Principal Components
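The classification and intensity approaches can be sketched as follows. The records, the `income` variable and the 1 to 5 `agreement` score are hypothetical, and standardizing to z-scores is one common choice rather than the only one. Principal components are not shown.

```python
import statistics
from collections import defaultdict

# Hypothetical survey records: a qualitative label, a quantitative variable,
# and a 1-5 intensity score (e.g., strength of agreement with a statement).
records = [
    {"group": "female", "income": 2100.0, "agreement": 4},
    {"group": "male", "income": 1900.0, "agreement": 2},
    {"group": "female", "income": 2500.0, "agreement": 5},
    {"group": "male", "income": 2300.0, "agreement": 3},
]

# Classification: frequency of observations and mean income per group.
incomes_by_group = defaultdict(list)
for r in records:
    incomes_by_group[r["group"]].append(r["income"])

frequency = {g: len(v) for g, v in incomes_by_group.items()}
mean_income = {g: statistics.mean(v) for g, v in incomes_by_group.items()}

# Intensity: standardize the scores to z-scores (mean 0, sample std 1),
# after which they can be treated like any other quantitative variable.
scores = [r["agreement"] for r in records]
mu = statistics.mean(scores)
sigma = statistics.stdev(scores)
z_scores = [(s - mu) / sigma for s in scores]

print(frequency, mean_income, z_scores)
```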


Concept vs Variable
In social science we want to analyse concepts. We measure concepts with variables. All measures are human inventions and therefore they might not measure concepts well. We need to ensure the reliability and the validity of measurement.

Stock Price vs Identity
• Which one is easier to measure?
• Which one do you think can be measured more reliably?

Identity
• Can you think of appropriate measures?
• Do these parameters measure it in a reliable and valid way?
Data Collection
Population vs Sample

Sampling

Generalization
How can I collect data?

Data Collection
• Primary (Usually Qualitative)
◦ Participant Observation
◦ Interview
◦ Questionnaires
• Secondary (Usually Quantitative)
◦ Documentary
◦ Survey
Participant Observation
• Researcher takes part in activity
◦ Participant as observer: researcher’s identity is revealed
◦ Complete participant: researcher’s identity is concealed
• Researcher observes activity
◦ Observer as participant: researcher’s identity is revealed
◦ Complete observer: researcher’s identity is concealed
Interviews and Questionnaires

A quick note on interviews and questionnaires: their reliability and validity need to be ensured!
Do I collect information for everything? - Sample
Population vs Sample
• I can only observe a particular sample
• I can only collect data for a sample
• I can only work with what is observable
BUT: I need to learn something about the population
Can I really generalize?
[Diagram labels: Population, Sample]
How to: Sampling
• Probability
◦ Simple Random
◦ Systematic
◦ Stratified Random
◦ Cluster
◦ Multistage
• Non-Probability
◦ Quota
◦ Purposive (Extreme Case, Heterogeneous, Homogeneous, Critical Case, Typical Case)
◦ Snowball
◦ Self Selection
◦ Convenience

Sampling (Chapter 6)
Probability Sampling: The probability of each case being selected from the population is known (RANDOM)
◦ Simple Random (all classifications)
◦ Systematic (select cases following a fixed pattern, e.g., every k-th case)
◦ Stratified: Define and select Strata (e.g., students, pensioners, female, etc.)
◦ Clustered: Define a Cluster (similar groups, but different variables. E.g., Nantes vs Paris, or fans of PSG vs Nantes, etc.)
◦ Multistage: Clusters of Clusters until the sample is defined

Non-Probability Sampling: Non-Random. The probability of selecting each case is not known.
◦ Quota sampling: Non-random selection within a stratum
◦ Purposive: Cases are chosen to serve the researcher’s needs
◦ Snowball sampling: Start with one person; you can meet more participants through your initial contact
◦ Self selection: You make your intentions known and let the audience approach you
◦ Convenience: Select the cases that are easiest to approach (needs a modification of the initial idea)

Problems with non-probability sampling:
◦ Generalization is difficult
◦ Cannot be based on statistical techniques, which imposes limitations on the results
◦ Might be appropriate for Case Studies but not for empirically testing theories
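Three of the probability sampling techniques above can be sketched in Python. This is a minimal illustration under stated assumptions: the sampling frame of 100 cases and the "student"/"worker" strata are hypothetical, and the stratified version uses proportional allocation with simple rounding, so in general the stratum sizes may not add up exactly to the requested sample size.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 cases, each tagged with a stratum.
population = [
    {"id": i, "stratum": "student" if i % 4 == 0 else "worker"}
    for i in range(100)
]


def simple_random_sample(frame, n):
    """Every case has the same chance of being selected."""
    return random.sample(frame, n)


def systematic_sample(frame, n):
    """Pick a random start, then take every k-th case."""
    k = len(frame) // n
    start = random.randrange(k)
    return frame[start::k][:n]


def stratified_sample(frame, n, key):
    """Sample each stratum in proportion to its share of the frame."""
    strata = {}
    for case in frame:
        strata.setdefault(case[key], []).append(case)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, share))
    return sample


srs = simple_random_sample(population, 20)
systematic = systematic_sample(population, 20)
stratified = stratified_sample(population, 20, "stratum")

print(len(srs), len(systematic), len(stratified))
```

In each function the selection probability of a case is known in advance, which is what separates these techniques from quota, snowball or convenience sampling.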
Data Presentation
Dataset

Examples of Sources of information

Dataset Types

Chart Types
I have collected my sample: Usable Datasets
A dataset is a collection of observations that are used in analyses
◦ Some main sources of economic data: Quandl.com
1. Historical FX Rates: here
2. Historical Stock Prices: here and here
3. Recent LIBOR rates:
4. Some Implied Volatilities:
5. Delayed Commodities:
6. US Fundamentals:
7. Macroeconomic Data: here, here and here
◦ Some economic data: UK, US: here, here, here, here and here

Types of data:
◦ Time Series
◦ Cross Sectional
◦ Panel Data
Time Series
Characteristics
One individual
One or many variables
Over time

Examples
GDP over time
Stock Price
Unemployment, etc.

Relevant Chart Types


Lines
Candle bars
Line and Candlestick charts
[Figure: “Volume and Price” chart (axes: Volume (M) and Price) and “Open, Close, High and Low” candlestick chart of Price, daily from Mar 29, 2021 to Apr 29, 2021]
Cross-Sectional
Characteristics
One point in time
Many individuals
Multiple Variables

Examples
GDP of many countries in a year
Covid Cases in a day

Relevant Chart Types


Bar Chart
Pie Charts
Scatter Plot
Bar Charts and Pie Charts
[Figure: “Europe-Covid cases deaths” bar chart of Cases and Deaths by country; “Covid cases” pie chart by country; “Survival (abs)” and “Proportional Survival” charts of survivals and deaths by country]

Scatter Plot
[Figure: scatter plot of Deaths (log #) against Cases (log #) for European countries]
Panel Data
Characteristics
Many individuals
Many points in time
Multiple Variables

Examples
GDP of many countries in many years
Covid Cases across countries every day
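The relation between the three dataset types can be sketched with a small hypothetical panel: fixing the individual gives a time series, and fixing the point in time gives a cross-section. The countries "A" and "B" and all values below are made up for illustration.

```python
# Hypothetical panel: (country, year) -> value of one variable.
panel = {
    ("A", 2019): 1.0, ("A", 2020): -2.0, ("A", 2021): 3.0,
    ("B", 2019): 0.5, ("B", 2020): -4.0, ("B", 2021): 2.5,
}


def time_series(data, country):
    """One individual followed over time."""
    return {y: v for (c, y), v in sorted(data.items()) if c == country}


def cross_section(data, year):
    """Many individuals observed at one point in time."""
    return {c: v for (c, y), v in sorted(data.items()) if y == year}


series_a = time_series(panel, "A")
snapshot_2020 = cross_section(panel, 2020)

print(series_a)
print(snapshot_2020)
```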

Relevant Chart Types


Multivariate Bar (or line) charts
Contour Plots
Infographics
Infographics
Activity One

Activity
Do the following on the dataset provided
• Plot the intensity of unemployment for European Countries. Tips:
◦ Use the second layer for Total
◦ Use a filter to select only “Total”
◦ Use the maximum
• Plot the difference between Males and Females. Tips:
◦ Use the filter to select only Males and Females
◦ Use the Category option in 3D Maps
◦ Use the average estimate of unemployment
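The activity is meant to be done with filters and 3D Maps, but the same two aggregations can be sketched in Python to check the numbers. The rows, the column names (`country`, `sex`, `year`, `rate`) and the values are hypothetical; the dataset provided may be laid out differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; the real dataset's columns and values may differ.
rows = [
    {"country": "A", "sex": "Total", "year": 2019, "rate": 8.0},
    {"country": "A", "sex": "Total", "year": 2020, "rate": 9.0},
    {"country": "A", "sex": "Males", "year": 2019, "rate": 7.0},
    {"country": "A", "sex": "Males", "year": 2020, "rate": 9.0},
    {"country": "A", "sex": "Females", "year": 2019, "rate": 9.0},
    {"country": "A", "sex": "Females", "year": 2020, "rate": 10.0},
    {"country": "B", "sex": "Total", "year": 2019, "rate": 5.0},
    {"country": "B", "sex": "Total", "year": 2020, "rate": 6.5},
    {"country": "B", "sex": "Males", "year": 2019, "rate": 5.0},
    {"country": "B", "sex": "Males", "year": 2020, "rate": 6.0},
    {"country": "B", "sex": "Females", "year": 2019, "rate": 5.0},
    {"country": "B", "sex": "Females", "year": 2020, "rate": 5.0},
]

# Task 1: filter to "Total" and take the maximum rate per country.
max_total = {}
for r in rows:
    if r["sex"] == "Total":
        current = max_total.get(r["country"], r["rate"])
        max_total[r["country"]] = max(current, r["rate"])

# Task 2: filter to Males and Females, average per country and sex,
# then take the difference (Males minus Females).
rates = defaultdict(list)
for r in rows:
    if r["sex"] in ("Males", "Females"):
        rates[(r["country"], r["sex"])].append(r["rate"])

countries = sorted({country for country, _ in rates})
gap = {
    c: mean(rates[(c, "Males")]) - mean(rates[(c, "Females")])
    for c in countries
}

print(max_total)
print(gap)
```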
Next Step
Analysis: What can I do with the data?
Descriptive Analysis - What
◦ I try to describe the data in order to find out what is or what happens.
Test Hypotheses
◦ Differences
Exploratory Analysis - Why
◦ I try to establish relationships between variables and, where possible, causality (Regressions)
◦ I try to identify groups or underlying components (PCA)
Inferential - From sample to population
◦ I always work on a sample and generalize my findings to the population
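The step from descriptive to inferential analysis can be sketched as follows. The population is simulated and hypothetical, and the interval uses the usual normal approximation (1.96 standard errors), which is an assumption that suits a reasonably large random sample.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values (normally not observable in full).
population = [random.gauss(50, 10) for _ in range(10_000)]

# In practice only a sample is observed; here, a simple random sample.
sample = random.sample(population, 200)

# Descriptive: what does the sample look like?
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential: standard error of the mean and an approximate 95% interval
# for the unknown population mean.
standard_error = sample_sd / math.sqrt(len(sample))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(sample_mean, sample_sd, (ci_low, ci_high))
```

The interval is only trustworthy when the sample was drawn with a probability sampling technique; with a convenience or snowball sample the same arithmetic runs, but the generalization is not justified.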
