Lecture 1 - overview

The lecture provides an overview of data science, its significance, and the course structure. It emphasizes the interdisciplinary nature of data science and its applications in various fields such as business, government, and health. The course aims to prepare students for advanced studies and careers in data science by covering essential topics and utilizing Python as a primary tool.

Uploaded by

jevictoria
Copyright
© All Rights Reserved

Lecture 1 : Course Overview

An overview of data science and the course

2025 SPRING INTRO TO DATA SCIENCE


Today

• What is data science?

• Overview of the course


20th Century Innovation

Engineering and computer science played a key role


• nuclear power
• airplanes & automobiles
• the digital computer
• radio
• internet
• imaging
But how about these 20th Century questions?

• Does fertilizer increase crop yields?

• Does Streptomycin cure Tuberculosis?

• Does smoking cause lung cancer?


What is the difference?

• Deterministic versus random

• Deductive versus empirical

• Solutions deduced mostly from theory versus solutions deduced mostly from data
Cameron Davidson-Pilon at Dataorigami blog
Data

• Does fertilizer increase crop yields?


• Answer: Collect and analyze agricultural experimental data

• Does Streptomycin (an antibiotic) cure Tuberculosis?


• Answer: Collect and analyze clinical data

• Does smoking cause lung cancer?


• Answer: Collect and analyze observational study data
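
To make "collect and analyze experimental data" concrete, here is a minimal sketch of one possible analysis for the fertilizer question: a permutation test comparing mean yields of fertilized and unfertilized plots. The yield numbers, the function name `permutation_test`, and the choice of a permutation test are all assumptions made for illustration. They are not from the lecture and not real measurements.

```python
import random
import statistics


def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(treated) > mean(control).

    Returns the observed difference in means and the fraction of random
    relabelings whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.mean(treated) - statistics.mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Randomly reassign the "treated" label to n_treated of the plots.
        rng.shuffle(pooled)
        diff = (statistics.mean(pooled[:n_treated])
                - statistics.mean(pooled[n_treated:]))
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_permutations


# Hypothetical yields per plot (made-up numbers, for illustration only).
fertilized = [31.2, 29.8, 33.1, 30.5, 32.4, 31.9]
unfertilized = [28.4, 29.1, 27.6, 30.0, 28.8, 27.9]

observed_diff, p_value = permutation_test(fertilized, unfertilized)
print(f"observed difference in mean yield: {observed_diff:.2f}")
print(f"approximate one-sided p-value: {p_value:.4f}")
```

A small p-value here would only suggest that a difference this large is unlikely under random labeling of these made-up plots. Whether fertilizer causes the increase depends on how the experiment was designed, which is why the slide distinguishes experimental, clinical, and observational data.
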
21st century

• “I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.” NYT, 2009

What happened 10 years later?


Recommendation Systems
WIFIRE (UCSD) - Wildfire modeling and management

https://wifire.ucsd.edu
2019: First Image of a Black Hole

Katie Bouman
MIT/Caltech
Artificial intelligence! …so called
Data is changing the world

https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
The Darker Side of Data Science?

• Obscuring complex decisions:


• Mortgage-backed securities → market crash

• Reinforcing historical trends and biases:


• Hiring based on previous hiring data
• Recidivism and racially biased sentencing
• Social media, news, and politics

NPR author interview with Cathy O’Neil
But… I’m Optimistic!

• Knowledge is empowering.

• Data science offers immense potential to address challenging problems facing society.

• ‘Use knowledge for good’


• This is one of the main reasons that universities teach data science.
Hal Varian says

• “The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids.”
• This is a good definition of data science
Data Science

https://www.usu.edu/math/datascience/
Data Science Is a Fundamentally Interdisciplinary Field
Example Questions in Data Science

• Business
• In which markets should we focus our advertising campaign?
• Where should we put docking ports for our bikes?

• Government
• What areas of the world are at higher risk of climate change impact in 10 years? In 20?
• Do immigrants from poor countries have a positive or negative impact on the economy?

• Life
• What should we eat to avoid dying early of heart disease?
• Should I send my kids to daycare?
Data Science Requires Engineering and Scientific Insight

• Good data analysis is not:


• Simple application of a statistics recipe.
• Simple application of software.
• There are many tools out there for data science, but they are merely tools.
• They don’t do any of the important thinking
• They are NOT a focus of this course
• “The purpose of computing is insight, not numbers.”
• R. Hamming. Numerical Methods for Scientists and Engineers (1962).
This course: goals

• Prepare students for advanced SNU courses in statistics, machine learning, and more by providing the necessary foundation and context.
• Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques.
• Empower students to apply computational and inferential thinking to address real-world problems.
Tentative List of Topics to be Covered

• data wrangling
• visualization
• intro to modeling
• linear regression model
• random variables
• estimators, bias and variance
• cross validation and regularization
• parameter inference and bootstrap
• sampling methods
• unsupervised learning
• kernel methods
• classification
• decision trees and ensemble methods
• deep learning

We will use Python in this course

• A high-level, interpreted programming language.


• Easy to read & write

• Huge community & support


• Especially for modern data science applications

• Free

• Learning how to use Python is not a focus of this course.


• A kickoff lab session will be held before the first HW.
• Let’s go over the syllabus
