
Introduction to Data Science

The document provides an overview of Data Science, discussing its significance, sources of data, and the skills required to become a Data Scientist. It highlights the excitement surrounding data analytics through examples like Google Flu Trends and various applications in fields such as healthcare and marketing. Additionally, it contrasts Data Science with traditional fields like Business Intelligence and Machine Learning, emphasizing its focus on massive datasets and real-time analysis.

Uploaded by sahil.y.prince
Copyright © All Rights Reserved

Introduction to Data Science

• Dr Vatan Sehrawat
• Asst. Professor, Computer Sc. & Engg. Department
• RBS-SIET Zainabad
• [email protected]
• 8059211113
Data Science Overview
Why, Where, What, How, Who
Outline
• Data Science – Why all the excitement?
   – History
   – Examples
• Where does data come from?
• What is Data Science?
• How to do Data Science?
• Who are Data Scientists?
Data Science – Why all the excitement?

Data Analysis Has Been Around for a While…

R.A. Fisher, W.E. Deming, Hans Peter Luhn, Howard Dresner
Data Science: Why all the Excitement?
Exciting new, effective applications of data analytics, e.g.:

• Google Flu Trends: detecting outbreaks two weeks ahead of CDC data

• New models are estimating which cities are most at risk for the spread of the Ebola virus. The prediction model is built on various data sources, data types and analyses.
Why All the Excitement?

Predicting political campaigns and election outcomes
PageRank: The web as a behavioral dataset
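The slide names PageRank without showing how it works. Roughly, PageRank scores a page by how often a "random surfer" would land on it: with probability `damping` the surfer follows a random out-link, otherwise it jumps to a random page (0.85 is the value commonly cited from the original PageRank paper). The power-iteration sketch below is illustrative only: the toy graph, the function name and its parameters are assumptions, and real search ranking combines many more signals.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Return {page: score} for a link graph given as {page: [linked pages]}.

    Pages with no out-links ("dangling" pages) spread their rank evenly
    over all pages, so the scores always sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump share plus the evenly redistributed dangling rank.
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes damping * its rank equally along its out-links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: D links to C, but nothing links to D.
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In this toy graph C, which every other page links to, gets the highest score, while D, which no page links to, keeps only the random-jump share (1 - 0.85) / 4.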
Sponsored search
• Google's revenue is around $50 bn/year from marketing, 97% of the company's revenue.

• Sponsored search uses an auction: a pure competition among marketers trying to win access to consumers.

• In other words, it is a competition between models of consumers, i.e. of their likelihood of responding to the ad and of the right bid for the item.

• There are around 30 billion search requests a month, and perhaps a trillion events of history across search providers.

• Google AdWords and AdSense
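The slides say only that sponsored search "uses an auction". One widely studied design is the generalized second-price (GSP) auction, in which ads are ranked by bid multiplied by a predicted click-through rate, so the "model of the consumer" directly affects who wins. The sketch below is a simplified, hypothetical version (the `Bid` class, `run_gsp` and all numbers are illustrative; reserve prices, quality scores and other details of real ad systems are omitted) and is not a description of Google's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    bid: float  # maximum price per click the advertiser is willing to pay
    ctr: float  # model-predicted click-through rate (hypothetical values)


def run_gsp(bids, num_slots):
    """Allocate ad slots with a generalized second-price (GSP) auction.

    Ads are ranked by expected value per impression (bid * predicted CTR).
    Each winner pays, per click, the smallest amount that would still rank
    it above the next ad, so it never pays more than its own bid.
    Returns a list of (advertiser, price_per_click) in slot order.
    """
    ranked = sorted(bids, key=lambda b: b.bid * b.ctr, reverse=True)
    results = []
    for i, b in enumerate(ranked[:num_slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            price = nxt.bid * nxt.ctr / b.ctr
        else:
            price = 0.0  # nobody ranked below; a real system would apply a reserve price
        results.append((b.advertiser, price))
    return results


if __name__ == "__main__":
    bids = [Bid("A", 2.00, 0.05), Bid("B", 3.00, 0.02), Bid("C", 1.00, 0.04)]
    for advertiser, price in run_gsp(bids, num_slots=2):
        print(f"{advertiser} pays {price:.2f} per click")
```

B bids the most per click, but A wins the top slot because its higher predicted click rate makes it worth more per impression. A pays 0.06 / 0.05 = 1.20 per click (below its 2.00 bid) and B pays 0.04 / 0.02 = 2.00 (below its 3.00 bid).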


Other Data Science Applications
• Transaction Databases → Recommender Systems (Netflix), Fraud Detection (Security and Privacy)

• Wireless Sensor Data → Smart Home, Real-time Monitoring, Internet of Things

• Text Data, Social Media Data → Product Reviews and Consumer Satisfaction (Facebook, Twitter, LinkedIn), E-discovery

• Software Log Data → Automatic Troubleshooting (Splunk)

• Genotype and Phenotype Data → Epic, 23andMe, Patient-Centered Care, Personalized Medicine
Where does data come from?

“Big Data” Sources
It’s All Happening Online. Every:
• Click
• Ad impression
• Billing event
• …
• Fast forward, pause, …
• Server request
• Transaction
• Network message
• Fault

Source categories: User Generated (Web & Mobile); Internet of Things / M2M; Health/Scientific Computing


“Data is the New Oil”
– World Economic Forum 2011
5 Vs of Big Data
• Raw Data: Volume
• Change over time: Velocity
• Data types: Variety
• Data Quality: Veracity
• Information for Decision Making: Value
What can you do with the data?
Traffic Prediction and Earthquake Warning

Crowdsourcing + physical modeling + sensing + data assimilation to produce: [figure]

From Alex Bayen, UC Berkeley, Director, Institute of Transportation Studies
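"Data assimilation" here means merging a physical model's prediction with noisy sensor or crowdsourced measurements. A minimal form of that idea is the scalar Kalman-filter update sketched below: each source is weighted by how uncertain it is. This is a generic textbook technique chosen for illustration, not the method used in the work cited on the slide; the function name and the traffic-speed numbers are hypothetical.

```python
def assimilate(forecast, forecast_var, observation, obs_var):
    """Blend a model forecast with a sensor observation (scalar Kalman update).

    The Kalman gain weights the observation by the forecast's share of the
    total uncertainty, so the more trustworthy source pulls harder.
    Returns (analysis_estimate, analysis_variance).
    """
    gain = forecast_var / (forecast_var + obs_var)  # Kalman gain, between 0 and 1
    estimate = forecast + gain * (observation - forecast)
    variance = (1 - gain) * forecast_var  # combined estimate is less uncertain
    return estimate, variance


if __name__ == "__main__":
    # Hypothetical: a traffic model predicts 60 km/h on a road segment
    # (variance 100), while phone GPS probes report 45 km/h (variance 25).
    est, var = assimilate(60.0, 100.0, 45.0, 25.0)
    print(f"assimilated speed: {est:.1f} km/h, variance {var:.1f}")
```

With these inputs the gain is 100 / 125 = 0.8, so the estimate moves most of the way toward the more precise GPS reading (48 km/h) and its variance (20) is lower than either input's.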


What is Data Science?

“Data Science”: An Emerging Field

Data Science – A Definition

Data Science is the discipline that uses computer science, statistics, machine learning, visualization and human-computer interaction to collect, clean, integrate, analyze, visualize and interact with data in order to create data products.
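The definition describes a sequence of steps. A minimal sketch of the first four (collect, clean, integrate, analyze) is shown below on hypothetical building-sensor readings, one of the data types the deck associates with data science. The records, the sensor-to-building lookup table and every function name are invented for illustration; visualization and interaction are left out.

```python
import csv
import io
from statistics import mean

# Collect: hypothetical raw sensor readings, as they might arrive in a log.
RAW = """sensor_id,temp_c
s1,21.5
s2,not_a_number
s1,22.5
s3,19.0
s2,23.0
"""

# Hypothetical reference table used for integration (sensor -> building).
SENSOR_BUILDING = {"s1": "Library", "s2": "Lab", "s3": "Library"}


def collect(text):
    """Parse CSV text into a list of dicts, one per reading."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop readings whose temperature cannot be parsed as a number."""
    out = []
    for r in rows:
        try:
            out.append({"sensor_id": r["sensor_id"], "temp_c": float(r["temp_c"])})
        except ValueError:
            continue
    return out


def integrate(rows, lookup):
    """Join each reading with the building its sensor is installed in."""
    return [dict(r, building=lookup[r["sensor_id"]])
            for r in rows if r["sensor_id"] in lookup]


def analyze(rows):
    """Average temperature per building: a tiny 'data product'."""
    by_building = {}
    for r in rows:
        by_building.setdefault(r["building"], []).append(r["temp_c"])
    return {b: round(mean(v), 2) for b, v in sorted(by_building.items())}


if __name__ == "__main__":
    print(analyze(integrate(clean(collect(RAW)), SENSOR_BUILDING)))
```

Each step is a separate function so it can be tested or replaced on its own; the malformed `not_a_number` reading is discarded during cleaning, before it can distort the averages.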

Goal of Data Science

Turn data into data products.

Some recent ML competitions: https://www.kaggle.com/

NIST Pre-Pilot Data Science Evaluation: likely to be incorporated into the labs/final project
Data Science – A Visual Definition
Contrast: Databases

Aspect       | Databases                                                | Data Science
-------------|----------------------------------------------------------|------------------------------------------------------------------------------
Data value   | "Precious"                                               | "Cheap"
Data volume  | Modest                                                   | Massive
Examples     | Bank records, personnel records, census, medical records | Online clicks, GPS logs, tweets, building sensor readings
Priorities   | Consistency, error recovery, auditability                | Speed, availability, query richness
Structure    | Strongly structured (schema)                             | Weakly structured or unstructured (text)
Properties   | Transactions, ACID*                                      | CAP* theorem (at most 2 of the 3 guarantees), eventual consistency
Realizations | SQL                                                      | NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …

*ACID = Atomicity, Consistency, Isolation and Durability
*CAP = Consistency, Availability, Partition Tolerance
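"Eventual consistency" means replicas may disagree for a while (for example during a network partition, when the system chooses availability) but converge once they exchange updates. The toy sketch below uses last-writer-wins (LWW), one simple conflict-resolution policy; real NoSQL stores differ in their details, and the `Replica` class and its methods are hypothetical. Note that LWW silently discards the losing concurrent write, which is exactly the kind of trade-off an ACID database would not make.

```python
class Replica:
    """One copy of a key-value store that accepts writes locally."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, replica_name, value)

    def write(self, key, value, ts):
        self._merge_entry(key, (ts, self.name, value))

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[2]

    def _merge_entry(self, key, entry):
        # Last writer wins: keep the larger (timestamp, replica_name) pair.
        # The replica name breaks timestamp ties, so all replicas agree.
        current = self.store.get(key)
        if current is None or entry[:2] > current[:2]:
            self.store[key] = entry

    def sync_from(self, other):
        """Anti-entropy step: merge every entry held by another replica."""
        for key, entry in other.store.items():
            self._merge_entry(key, entry)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    # During a partition each replica keeps accepting writes (availability),
    # so the two copies temporarily disagree.
    a.write("cart", "book", ts=1)
    b.write("cart", "lamp", ts=2)
    print(a.read("cart"), b.read("cart"))  # book lamp
    # Once the partition heals, syncing makes them converge.
    a.sync_from(b)
    b.sync_from(a)
    print(a.read("cart"), b.read("cart"))  # lamp lamp
```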
Contrast: Business Intelligence

Business Intelligence | Data Science
----------------------|--------------------------------------
Querying the past     | Querying the past, present and future
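To make this contrast concrete: a BI-style query summarizes what already happened, whereas a data-science workflow may also fit a model and project it forward. The sketch below uses a hypothetical monthly sales series and an ordinary least-squares trend line; both the data and the choice of a linear model are assumptions for illustration, and real forecasting would need to validate the model first.

```python
def past_total(series):
    """Descriptive, BI-style query: what happened over the whole period."""
    return sum(series)


def forecast_next(series):
    """Fit an ordinary least-squares trend line y = a + b*t to the series
    (t = 0, 1, ..., n-1) and extrapolate it one period ahead."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


if __name__ == "__main__":
    monthly_sales = [100, 110, 120, 130, 140]  # hypothetical figures
    print("total so far:", past_total(monthly_sales))
    print("next-month forecast:", forecast_next(monthly_sales))
```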
Contrast: Machine Learning

Machine Learning                                            | Data Science
------------------------------------------------------------|----------------------------------------------------
Develop new (individual) models                             | Explore many models, build and tune hybrids
Prove mathematical properties of models                     | Understand empirical properties of models
Improve/validate on a few, relatively clean, small datasets | Develop/use tools that can handle massive datasets
Publish a paper                                             | Take action!
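The "explore many models" and "understand empirical properties" rows can be illustrated by trying several candidate settings of a model and keeping whichever scores best on held-out data, rather than proving which should be best. In the sketch below, k-nearest-neighbour regression and mean absolute error are choices made for brevity, not something the slides specify; the data and function names are hypothetical.

```python
def knn_predict(train, x, k):
    """k-nearest-neighbour regression: mean y of the k closest training x values."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def validation_mae(train, valid, k):
    """Mean absolute error of the k-NN model on held-out points."""
    return sum(abs(knn_predict(train, x, k) - y) for x, y in valid) / len(valid)


def compare_models(train, valid, candidate_ks):
    """Score every candidate setting empirically and return (best_k, scores)."""
    scores = {k: validation_mae(train, valid, k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    # Hypothetical data following y = 2x; validation points lie between
    # the training inputs.
    train = [(x, 2 * x) for x in range(10)]
    valid = [(2.5, 5.0), (6.5, 13.0)]
    best_k, scores = compare_models(train, valid, [1, 2, 4, 8])
    print("validation MAE per k:", scores)
    print("chosen k:", best_k)
```

On these points k = 1 and k = 8 both do worse than k = 2 (k = 4 ties with it). The choice comes from measured error on data the model has not seen, which is the empirical stance the table attributes to data science.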
