STAT121 / AC209 / E-109: CS109 Data Science
Joe Blitzstein
Outline
What?
Why?
Who?
How?
Data Science
To gain insights into data through
computation, statistics, and visualization
A Data Scientist Is...
A data scientist is someone who knows more
statistics than a computer scientist and more
computer science than a statistician.
- Josh Blumenstock
http://www.thedailyshow.com/watch/wed-october-17-2012/nate-silver
Some Key Principles
use many data sources (the plural of anecdote is not data)
weight the data thoughtfully (not all polls are equally good)
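The second principle can be made concrete with a small sketch. The polls, sample sizes, and quality ratings below are made up for illustration, and weighting by sample size times a quality rating is just one plausible scheme, not the method of any particular forecaster.

```python
# Hypothetical polls: (candidate share in %, sample size, quality rating in [0, 1]).
# All numbers are invented for illustration only.
polls = [
    (52.0, 1000, 0.9),
    (48.0, 400, 0.5),
    (50.0, 600, 0.7),
]


def simple_average(polls):
    """Treat every poll as equally good."""
    return sum(share for share, _, _ in polls) / len(polls)


def weighted_average(polls):
    """Weight each poll by sample size times quality (an assumed scheme)."""
    weights = [n * q for _, n, q in polls]
    weighted_sum = sum(share * w for (share, _, _), w in zip(polls, weights))
    return weighted_sum / sum(weights)


print("simple:  ", simple_average(polls))
print("weighted:", weighted_average(polls))
```

With these made-up inputs the weighted estimate is pulled toward the large, highly rated poll, while the simple average treats all three polls the same.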
Affymetrix Chip
[wikipedia]
Sequencing
Sequencing Cost
Genome Data
Genome Visualization
[Krzywinski 2009]
[Thorvaldsdóttir 2013]
[Meyer 2009]
Personalized Therapy
...10 years from now, each cancer
patient is going to want to get a genomic
analysis of their cancer and will expect
customized therapy based on that
information.
Director, The Cancer Genome Atlas
(TCGA), Time Magazine, 6/13/11
Netflix Prize
Some Challenges
massive data (500k users, 20k movies, 100m ratings)
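A quick back-of-the-envelope calculation shows what these sizes imply. It uses only the rounded figures quoted above; the variable names are illustrative.

```python
# Rounded figures from the slide.
users = 500_000
movies = 20_000
ratings = 100_000_000

# Every (user, movie) pair is a cell in the full ratings matrix.
possible_ratings = users * movies

# Fraction of cells that actually contain a rating.
density = ratings / possible_ratings

# Average number of ratings per user and per movie.
ratings_per_user = ratings / users
ratings_per_movie = ratings / movies

print(f"possible ratings:  {possible_ratings:,}")
print(f"matrix density:    {density:.2%}")
print(f"ratings per user:  {ratings_per_user:.0f}")
print(f"ratings per movie: {ratings_per_movie:.0f}")
```

So the data is massive and also sparse: with these figures only about 1% of the user-movie matrix is filled in, and the rest is what a recommender has to predict.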
http://blogs.hbr.org/cs/2012/10/big_data_hype_and_reality.html
Connectome
What is the connectivity of large brain circuits?
What?
Why?
Who?
How?
The Age of Big Data
BBC, 2013
Crime Prevention
Boston Globe,
Sunday, Aug 4, 2013
Big Data
2.5 exabytes of data created daily (2012)
[IBMbigdata]
[Domo]
Between the dawn of civilization and
2003, we only created five exabytes of
information; now we're creating that
amount every two days.
Eric Schmidt, Google (and others)
http://onesecond.designly.com/
Smarter Devices
Build a model.
Model the data. Fit the model.
Validate the model.
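The build, fit, validate loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the data is synthetic (a noisy line with intercept 2 and slope 3), the model is a straight line fit by ordinary least squares, and validation means checking the error on held-out points. The helper names `fit_line` and `mse` are made up for this example.

```python
import random


def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean squared error of the line y = a + b*x on the given points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Build: choose a linear model. Fit: estimate it on the first 150 points.
train_x, train_y = xs[:150], ys[:150]
a, b = fit_line(train_x, train_y)

# Validate: measure the error on the 50 points the fit never saw.
test_x, test_y = xs[150:], ys[150:]
test_error = mse(a, b, test_x, test_y)

print(f"intercept={a:.2f} slope={b:.2f} held-out MSE={test_error:.2f}")
```

Holding out data is the key design choice: an error computed on the training points alone would tend to flatter the model.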
What?
Why?
Who?
How?
Hanspeter Pfister
An Wang
My Background
Grew up in Switzerland
Be Flexible
Be Constructive
http://davidzinger.wordpress.com/2007/05/page/2/
Next Steps
HW 0
Good test of your basic skills