Data Science Class Lecture

This document provides an introduction to data science, including: 1) There is a huge amount of data being collected from various sources such as the web, financial transactions, social networks, etc. and data storage is becoming cheaper. 2) Data science aims to extract knowledge and insights from large, diverse datasets using techniques from computer science, mathematics, and statistics. 3) Data scientists analyze data to help decision makers in many fields, and turn data into useful products and insights for businesses and other organizations.

Uploaded by

Adarsh Sharma
Copyright
© All Rights Reserved

Introduction to Data Science

Data All Around

• Lots of data is being collected and warehoused
• Web data, e-commerce
• Financial transactions, bank/credit transactions
• Online trading and purchasing
• Social networks
How Much Data Do We Have?

• Google processes 160 PB a day (2013)
• Google processes more than 40,000 searches/sec (2018)
• Facebook has 60 TB of daily logs

• Cost of 1 TB of disk: $35
• Time to read a 1 TB disk: about 3 hrs (at 100 MB/s)
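The read time quoted above is capacity divided by throughput. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained sequential read at the quoted 100 MB/s; the function name is illustrative and not part of the lecture:

```python
def read_time_hours(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read capacity_bytes sequentially at a constant rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600


TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

hours = read_time_hours(1 * TB, 100 * MB)
print(f"{hours:.2f} hours")  # 2.78 hours, which the slide rounds to 3
```

The point of the figure is that storage is cheap but scanning it from a single disk is slow, which is one reason large datasets are spread across many machines.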
Big Data

Big Data is any data that is expensive to manage and hard to extract value from
• Volume
• The size of the data
• Velocity
• The latency of data processing relative to the growing demand for interactivity
• Variety and Complexity
• The diversity of sources, formats, quality, and structures
Big Data
CMS: Content Management System
SMS: Short Message Service
Types of Data We Have

• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web, …
• Streaming Data
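As a rough illustration of how the data types listed above differ, the sketch below stores the same made-up "who follows whom" information in relational, semi-structured (XML), and graph form. All names, fields, and values are hypothetical and not from the lecture.

```python
import xml.etree.ElementTree as ET

# Relational: fixed columns, one tuple per row (hypothetical "follows" table).
rows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]

# Semi-structured: elements share a loose shape, but attributes and
# children are optional (only one user has a "city" attribute).
xml_doc = """
<users>
  <user name="alice"><follows>bob</follows><follows>carol</follows></user>
  <user name="bob" city="Pune"><follows>carol</follows></user>
  <user name="carol"/>
</users>
"""
root = ET.fromstring(xml_doc)
xml_edges = [
    (user.get("name"), follows.text)
    for user in root.iter("user")
    for follows in user.findall("follows")
]

# Graph: adjacency sets, convenient for neighbor and path queries.
graph = {}
for src, dst in rows:
    graph.setdefault(src, set()).add(dst)
    graph.setdefault(dst, set())

print(xml_edges == rows)       # the XML encodes the same edges as the table
print(sorted(graph["alice"]))  # accounts that alice follows
```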
What To Do With These Data?

• Aggregation and Statistics
• Data warehousing and OLAP
• Indexing, Searching, and Querying
• Keyword based search
• Pattern matching
• Knowledge discovery
• Data Mining
• Statistical Modeling
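The first two groups of tasks above can be sketched in a few lines using only the Python standard library. The purchase log and the documents are made-up illustrations, and a real system would rely on a data warehouse or a search engine rather than in-memory dictionaries.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical purchase log: (customer, amount).
purchases = [("ann", 20.0), ("bob", 5.0), ("ann", 30.0), ("bob", 15.0), ("cy", 10.0)]

# Aggregation and statistics: total spend per customer and the overall mean.
totals = defaultdict(float)
for customer, amount in purchases:
    totals[customer] += amount
mean_amount = statistics.mean(amount for _, amount in purchases)

# Keyword-based search: an inverted index from each word to document ids.
docs = {
    1: "big data is hard to manage",
    2: "data science extracts knowledge",
    3: "order id 4417 shipped",
}
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)
hits = sorted(index["data"])

# Pattern matching: a regular expression that finds four-digit numbers.
ids = re.findall(r"\b\d{4}\b", docs[3])

print(dict(totals), mean_amount, hits, ids)
```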
What is Data Science?

• A field that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data
• Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges of big data
• Data science principles apply to all data – big and small
What is Data Science?

• Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many areas such as science, engineering, economics, politics, finance, and education
• Computer Science
• Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
• Mathematics
• Mathematical modeling
• Statistics
• Statistical and stochastic modeling, probability
• Gartner’s 2014 Hype Cycle
Data Science
Data Science
Real Life Examples

• Companies learn your secrets, shopping patterns, and preferences
• Data science and elections (2008, 2012)
• 1 million people installed the Obama Facebook app that gave access to info on “friends”
Data Scientists

• Data Scientist
• May be the highest-paid job of the 21st century
• They find stories and extract knowledge, but they are not reporters
Data Scientists

• Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions
What do Data Scientists do?

• National Security
• Cyber Security
• Business Analytics
• Engineering
• Healthcare
• And more …
Concentration in Data Science

• Mathematics and Applied Mathematics
• Applied Statistics/Data Analysis
• Solid Programming Skills (R, Python, Julia, SQL)
• Data Mining
• Database Storage and Management
• Machine Learning and Discovery
Goal of Data Science

Turn data into data products.
Data Science – A Visual Definition
Contrast: Databases
              Databases                           Data Science
Data Value    “Precious”                          “Cheap”
Data Volume   Modest                              Massive
Examples      Bank records, personnel records,    Online clicks, GPS logs, tweets,
              census, medical records             building sensor readings
Priorities    Consistency, error recovery,        Speed, availability,
              auditability                        query richness
Structured    Strongly (schema)                   Weakly or none (text)
Properties    Transactions, ACID*                 CAP*, eventual consistency
Realizations  SQL                                 NoSQL: MongoDB, CouchDB, HBase,
                                                  Cassandra, Riak, Memcached,
                                                  Apache River, …
* ACID = Atomicity, Consistency, Isolation, and Durability
* CAP = Consistency, Availability, Partition Tolerance
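To make the "Transactions, ACID" entry concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the names, the balances, and the transfer helper are hypothetical; the sketch only shows that a transfer either applies both of its updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK; credit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The credit is deliberately issued before the debit so that the failed transfer leaves partial work behind, which the rollback then undoes.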
Contrast: Business Intelligence
Business Intelligence                       Data Science
Querying the past                           Querying the past, present, and future
Contrast: Machine Learning
Machine Learning                            Data Science
Develop new (individual) models             Explore many models, build and tune hybrids
Prove mathematical properties of models     Understand empirical properties of models
Improve/validate on a few, relatively       Develop/use tools that can handle
clean, small datasets                       massive datasets
Publish a paper                             Take action!
