Lecture-1 Introduction To Data Science

This document introduces data science. It (1) discusses the large amounts of data being collected ("big data") from sources such as the web, financial transactions, and social networks; (2) describes the challenges big data poses in volume, velocity, variety, and complexity; and (3) outlines the fields (computer science, mathematics, and statistics) that contribute data science techniques for aggregating, indexing, searching, and discovering knowledge in large datasets.

Uploaded by

Tareq Hassan

Introduction to Data Science

Md Abul Kalam Azad
Professor
Dept. of Computer Science and Engineering
Jahangirnagar University
Savar, Dhaka-1342, Bangladesh
Outline

Data, Big Data and their Challenges


Data Science
Introduction
Why Data Science
Data Scientists
What do they do?
Major/Concentration in Data Science
What courses do we take
Data All Around

Lots of data is being collected and warehoused
Web data, e-commerce
Financial transactions, bank/credit transactions
Online trading and purchasing
Social Network
How Much Data Do We have?
Google processes 20 PB a day (2008)
Facebook has 60 TB of daily logs
eBay has 6.5 PB of user data + 50 TB/day (5/2009)
1000 genomes project: 200 TB

Cost of 1 TB of disk: $35
Time to read 1 TB of disk: about 3 hrs (at 100 MB/s)
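The read-time figure above is capacity divided by throughput. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sequential read rate; the function name is illustrative and not part of the lecture:

```python
def read_time_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read size_bytes sequentially at a constant throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

one_tb_hours = read_time_hours(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_tb_hours:.2f} hours")
```

The result is a little under 3 hours, matching the rounded figure on the slide. Scaling the same calculation to petabytes shows why a single disk cannot keep up with the daily volumes listed above.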
Big Data
Big Data is any data that is expensive to manage and hard to extract value from
Volume
The size of data
Velocity
The latency of data processing relative to the growing demand for interactivity
Variety and Complexity
The diversity of sources, formats, quality, and structures
Big Data
Types of Data We Have

Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF), …
Streaming Data
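To make the categories above concrete, here is a rough sketch of how each kind of data might look in a program. All names and values (the user "Ada", the `follows` graph, the `running_total` helper) are made up for illustration and do not come from the lecture.

```python
import xml.etree.ElementTree as ET

# Relational: fixed columns, one tuple per row.
columns = ("user_id", "name", "city")
row = (1, "Ada", "Dhaka")

# Semi-structured: XML whose elements may vary from record to record.
xml_text = "<user id='1'><name>Ada</name><city>Dhaka</city></user>"
user = ET.fromstring(xml_text)
xml_record = {
    "user_id": int(user.get("id")),
    "name": user.findtext("name"),
    "city": user.findtext("city"),
}

# Text: unstructured, so any structure has to be extracted (here, just tokens).
tokens = "Ada moved to Dhaka".split()

# Graph: who follows whom, stored as an adjacency list.
follows = {"Ada": ["Bob", "Cy"], "Bob": ["Ada"], "Cy": []}


# Streaming: values arrive one at a time and are processed incrementally.
def running_total(stream):
    total = 0
    for value in stream:
        total += value
        yield total


print(dict(zip(columns, row)) == xml_record)  # same facts, two representations
print(list(running_total([1, 2, 3])))
```

The relational row and the XML record carry the same facts; what differs is how much structure is fixed in advance, which is one source of the variety challenge.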
What To Do with These Data?

Aggregation and Statistics
Data warehousing and OLAP
Indexing, Searching, and Querying
Keyword based search
Pattern matching (XML/RDF)
Knowledge discovery
Data Mining
Statistical Modeling
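Two of the tasks listed above, aggregation and keyword-based search, can be sketched in a few lines. This is a toy illustration on made-up transaction records, not how a data warehouse or search engine is actually built; the records and the `search` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction records (made-up data for illustration only).
transactions = [
    {"customer": "alice", "amount": 30.0, "note": "online book purchase"},
    {"customer": "bob", "amount": 12.5, "note": "coffee shop"},
    {"customer": "alice", "amount": 70.0, "note": "online trading fee"},
]

# Aggregation and statistics: total and mean amount per customer.
amounts_by_customer = defaultdict(list)
for t in transactions:
    amounts_by_customer[t["customer"]].append(t["amount"])
summary = {
    customer: {"total": sum(amounts), "mean": mean(amounts)}
    for customer, amounts in amounts_by_customer.items()
}

# Indexing and searching: an inverted index from keyword to record positions.
index = defaultdict(set)
for position, t in enumerate(transactions):
    for word in t["note"].lower().split():
        index[word].add(position)


def search(keyword: str) -> list[dict]:
    """Return the records whose note contains the keyword."""
    return [transactions[i] for i in sorted(index.get(keyword.lower(), set()))]


print(summary["alice"])
print([t["amount"] for t in search("online")])
```

Building the index once makes each keyword lookup cheap, which is the basic reason indexing is listed alongside searching and querying.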
Big Data and Data Science

“… the most attractive job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist

The U.S. will need 140,000 - 190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011)

New Data Science institutes are being created or repurposed: NYU, Columbia, Washington, UCB, ...
New degree programs, courses, boot-camps:
e.g., at Berkeley: Stats, I-School, CS, Astronomy…
One proposal (elsewhere) for an MS in “Big Data Science”
What is Data Science?

An area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data
Data science is a multidisciplinary field of study whose goal is to address the challenges of big data
Data science principles apply to all data, big and small

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
What is Data Science?
Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education
Computer Science
Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
Mathematics
Mathematical Modeling
Statistics
Statistical and Stochastic modeling, Probability.
Why is it so Attractive?

[Figure: Gartner’s 2014 Hype Cycle, highlighting Data Science]
Data Science
Real Life Examples

Companies learn your secrets, shopping patterns, and preferences
For example, can we know if a woman is pregnant, even if she doesn’t want us to know? Target case study
Data Science and Election (2008, 2012)
1 million people installed the Obama Facebook app that gave access to info on “friends”
Data Scientists

Data Scientist
The Hottest Job of the 21st Century
They find stories, extract knowledge. They are not reporters
Data Scientists

Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions
What do Data Scientists do?

National Security
Cyber Security
Business Analytics
Engineering
Healthcare
And more ….
Concentration in Data Science

Mathematics and Applied Mathematics
Applied Statistics/Data Analysis
Solid Programming Skills (R, Python, Julia, SQL)
Data Mining
Database Storage and Management
Machine Learning and Discovery
THANKS