IDS2b Data, Science, Data Science

The document provides an introduction to data science. It discusses how data science is a new field that has emerged due to the abundance of data being generated. Data science utilizes this data and mathematical/statistical techniques to make discoveries and solve problems across many domains. Specifically, the document notes that data science can act as a "handmaiden" to other scientific fields by enabling new insights, as well as being an engineering discipline through direct applications and as an independent field of study focused on data itself.

Uploaded by

T Do
Copyright
© All Rights Reserved

Introduction to Data Science

Theory of (data) science

The fundamental asymmetry between
verification/confirmation and falsification
• Confirming the rule (model/theory) doesn’t add
anything.
• It only breeds false confidence, as seen in the
turkey problem and the issue of black swans.
• One only learns from falsification.
• Corollary: A scientific investigation needs to allow
for falsification. The conclusions can’t be foregone.
• People are not inclined to do this. The natural
tendency, in almost all human affairs, is to confirm
(support with evidence) what one already suspects.
Rule: If there is a vowel on one side,
there is an even number on the other.

Which cards need to be turned over to test the
rule (whether it is true or false)?
Rule:
If one is indoors, one has to wear a mask

Indoors    Outdoors    Wears a mask    Does not wear a mask

Who has to be checked to see whether the rule was violated?
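
The logic shared by both tasks can be sketched in Python. For a rule of the form "if P then Q", only a case that visibly shows P, or visibly shows not-Q, can expose a violation; everything else can at best confirm. The four mask cases are taken from the slide. The card faces (A, K, 4, 7) are an assumption for illustration, since the cards themselves are not reproduced in the text, and the function names are hypothetical.

```python
def must_check(visible_p, visible_q):
    """Decide whether a case can falsify the rule 'if P then Q'.

    Each argument is True, False, or None (None = that side is hidden).
    A visible P may hide a not-Q; a visible not-Q may hide a P.
    No other case can reveal a violation.
    """
    return visible_p is True or visible_q is False


# Mask rule from the slide: P = indoors, Q = wears a mask.
mask_cases = {
    "Indoors": (True, None),
    "Outdoors": (False, None),
    "Wears a mask": (None, True),
    "Does not wear a mask": (None, False),
}
to_check = [name for name, (p, q) in mask_cases.items() if must_check(p, q)]
print(to_check)  # ['Indoors', 'Does not wear a mask']


# Card rule: P = vowel, Q = even number. The card faces below are assumed.
def card_case(face):
    """Translate a visible card face into (visible_p, visible_q)."""
    if face.isalpha():
        return (face.upper() in "AEIOU", None)
    return (None, int(face) % 2 == 0)


cards_to_turn = [c for c in ["A", "K", "4", "7"] if must_check(*card_case(c))]
print(cards_to_turn)  # ['A', '7']
```

Checking the person who wears a mask, or turning over the even card, can only confirm the rule, which is why those cases are skipped.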


Is it now clear why falsification is
central in science?

One can only learn something from falsification,
not confirmation (to test hypotheses).
To be clear: What science is not
• A bunch of facts that are beyond doubt.
• Proof (!)
• Hard – not necessarily. Science aims at
simplicity. Misconception: Unfamiliar = hard.
• Concerned about how things should be.
• Instead: Science is a process. Coming up with
theories, then trying to falsify them in an
interplay between induction and deduction.
• The German term is more apt:
• Wissen = Knowledge
• Wissenschaft = Knowledge creation = Science
Is Mathematics a science?
Is neuroscience a science?
Is History a science?
Is engineering a science?
Field               Body of knowledge  Inductive  Goal: Natural understanding  Deductive  Experiments  Kind
Politics            ☓                  ☓          ☓                            ☓          ☓            ☓
Religion            ☓                  ☓          ☓                            ☓          ☓            ☓
English literature  ✓                  ☓          ☓                            ☓          ☓            ☓
Dance               ✓                  ☓          ☓                            ☓          ☓            ☓
Engineering         ✓                  ☓          ☓                            ☓          ☓            ☓
Philosophy          ✓                  ☓          ☓                            ☓          ☓            ☓
Computer science    ✓                  ☓          ☓                            ☓          ☓            ☓
Library science     ✓                  ☓          ☓                            ☓          ☓            ☓
Medicine            ✓                  ☓          ☓                            ☓          ☓            ☓
History             ✓                  ✓          ☓                            ☓          ☓            ☓
Mathematics         ✓                  ☓          ☓                            ✓          ☓            ☓
Economics           ✓                  ✓          ✓                            ✓          ☓            2
Physics             ✓                  ✓          ✓                            ✓          ✓            1
Astronomy           ✓                  ✓          ✓                            ✓          ☓            1
Psychology          ✓                  ✓          ✓                            ✓          ✓            2
Neuroscience        ✓                  ✓          ✓                            ✓          ✓
Hallmarks: Type I vs. Type II

Objects reducible to elements with simple behavior
  Type I: Yes (e.g. quarks, atoms, molecules)
  Type II: No (e.g. people, societies)
Objects reducible to categories with no intrinsic in-group variance
  Type I: Yes (e.g. all gold atoms are identical, variation is due to measurement noise)
  Type II: No (e.g. brains are inherently different between people, true variability)
Reactive subject matter
  Type I: No (i.e. once forces of nature have been understood, they don’t change)
  Type II: Yes (i.e. once rules of behavior have been found, they might well change as a result of this)
Ethical considerations
  Type I: No (e.g. no IRB necessary to do experiments with natural forces)
  Type II: Yes (e.g. human experiments, animal experiments)
Ergodicity
  Type I: Typically yes
  Type II: Typically no
Measurement typically on
  Type I: Ratio scale
  Type II: Lower than ratio scale
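
The in-group variance and ergodicity hallmarks can be loosely illustrated with a small simulation. This is a minimal sketch on made-up numbers, not a formal test of ergodicity: in the "Type I" population every unit has the same true value and all variation is measurement noise, while in the "Type II" population each unit has its own true value. All parameter values and names are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def time_averages(unit_means, n_obs=2000, noise_sd=1.0):
    """Return each unit's long-run average over n_obs noisy measurements."""
    return [
        statistics.fmean(random.gauss(mu, noise_sd) for _ in range(n_obs))
        for mu in unit_means
    ]


n_units = 50
# Type I: identical units, variation is measurement noise only.
type1_true = [10.0] * n_units
# Type II: units differ intrinsically (true variability).
type2_true = [random.gauss(10.0, 3.0) for _ in range(n_units)]

type1_avgs = time_averages(type1_true)
type2_avgs = time_averages(type2_true)

# Spread of the units' long-run averages around the population mean.
type1_spread = statistics.pstdev(type1_avgs)
type2_spread = statistics.pstdev(type2_avgs)

print(f"Type I spread of unit averages:  {type1_spread:.3f}")
print(f"Type II spread of unit averages: {type2_spread:.3f}")
```

In the Type I case every unit's long-run average ends up close to the population average, so the average describes each unit well. In the Type II case the spread stays large however many measurements are taken per unit, so the population average describes few individual units.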


There are dependencies between fields
Logic (Math)
  Primary goal: Proving theorems from axioms using deduction
    ↑ uses
Science (Physics)
  Primary goal: Recording and analyzing *data* to understand the natural world
    ↑ uses
Engineering (Medicine)
  Primary goal: Improving health
What about data science?
0) Most like “library science” – solely a field of
knowledge on a given subject?
1) Most like physics – aimed at understanding
an aspect of the natural world?
2) Most like computer science – a branch of
applied logic, proving things?
3) Most like engineering – making the world a
better place by solving specific problems?
The role of data: Data are fuel
• For the inducto-deductive engine of science.

Realm of ideas
    ↑ Induction        ↓ Deduction*
External physical world (Reality)


In regular science, data flows like water
What about data in data science –
science applied to data itself?
• A new kind of science.
• What makes it so special?
Data science as a relevant and independent
field is less than a decade old
A genuinely 21st century enterprise
What took so long?
Recap: Data is twice exceptional
1. Both born and made
• Resulting from measurement processes.
• Measurement: Systematically assigning a number to
an aspect of the natural world according to formal
rules.

2. By interpreting the mathematical properties of
these numbers, we apply mathematics to the natural
world
• This yields scientific data.


Scientific data is becoming ever more
abundant, as a function of measurements
There is an even broader sense of data
• In addition to scientific data (which results from
measurements), there is a broader conception of
data, which is “quantitative information” more
generally.
• Information = reduction of uncertainty
• The other major way to reduce uncertainty (in
addition to measurement) is digitization
(sometimes called digitalization)
• Analog signals (e.g. sounds, images) being
converted into numbers.
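
Digitization in this sense can be sketched in a few lines of Python: a continuous signal is sampled at regular intervals and each sample is rounded to one of a fixed number of integer levels. The 440 Hz tone, the 8 kHz sample rate, the 8-bit depth, and the function names are all assumptions chosen for illustration.

```python
import math


def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous signal with values in [-1, 1] and quantize it.

    Returns a list of integer codes in the range 0 .. 2**bits - 1.
    """
    levels = 2 ** bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz                # sampling: pick discrete times
        value = signal(t)
        # quantization: map [-1, 1] onto the available integer levels
        codes.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return codes


def tone(t):
    """An 'analog' sound: a 440 Hz sine wave, defined for any time t."""
    return math.sin(2 * math.pi * 440.0 * t)


codes = digitize(tone, duration_s=0.01, sample_rate_hz=8000, bits=8)
print(len(codes), min(codes), max(codes))
```

The result is plain quantitative information: a list of integers that can be stored, transmitted, and analyzed like any other data.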
This is fortuitous because a dire need has arisen

• Many scientific fields (economics, neuroscience,
psychology, microbiome, nutrition, epigenetics,
pharmacology, etc.) have run into a wall of diplexity.
• Diplexity: Fundamental, irreducible, inherently diverse
complexity.
• This renders most traditional data analysis approaches
(importantly all that assume ergodicity) inappropriate
or misleading.
• Impeding further progress in these fields.
• Luckily, there is salvation in (multivariate) big data
analytics.
Type I: Electron(s)
Mass: 9.10938356 × 10^-31 kg
Charge: -1.60217662 × 10^-19 C
Spin: ½
Number: At least 10^80
They are all identical.

Type II: Diplexity
→ 1) Data science as a “handmaiden”
to many scientific fields
• Akin to the relationship between mathematics and
physics.
• Issue: Can’t do experiments in many fields.
• And we all know that correlation ≠ causation.
• Or is it?
• Causal inference: If there are many replicates that
are characterized in a multivariate fashion, some
causal models are much more likely than others.
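
A minimal sketch of this idea on simulated data: two variables are strongly correlated, yet once a third measured variable is taken into account the correlation disappears. That makes the model "x causes y" much less plausible than "z drives both". The variables, effect sizes, and helper functions are assumptions for illustration only; this is not a general causal-inference method.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """Remove the linear effect of zs from ys (simple least squares)."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


# Many replicates, each characterized by three variables.
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]      # common cause
x = [zi + random.gauss(0, 0.5) for zi in z]     # driven by z
y = [zi + random.gauss(0, 0.5) for zi in z]     # driven by z, not by x

raw_r = pearson(x, y)
partial_r = pearson(residuals(x, z), residuals(y, z))

print(f"correlation of x and y:              {raw_r:.2f}")
print(f"correlation after accounting for z:  {partial_r:.2f}")
```

With only x and y measured, the two causal stories cannot be told apart. Measuring z as well is what lets the data favor one over the other.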
Can data science turn history into a science?

Lyall (2020) on the 825 engagements documented
and analyzed by “Project Mars”
It’s not just history
2) DS akin to CS/math:
• Many/most mathematical techniques were
developed in response to a specific analytical
need, e.g. Calculus, Fourier transform, Mann-
Whitney U test.
• Rich, polyphonic datasets generate new such
demands for completely new math/algorithm
development.
• This is happening, within DS.
3) DS as an engineering discipline:
• In a world awash with data, the potential for direct
application of DS is obvious.
• This is most definitely happening:
• ~10k years ago: Domestication of animals
• ~2k years ago: Metal tools
• ~200 years ago: Mechanical machines
• ~100 years ago: Electrical machines
• Now: Harnessing information itself
DS as a scientific field in its own right?
• Studying data itself…
