Introduction To Data Science

Uploaded by

srirupa dasgupta
Copyright
© All Rights Reserved
Introduction to Data Science

Srirupa Dasgupta
Department of Information Technology
Govt College of Engineering and Leather Technology
Outline
. Data, Big Data and Challenges
. Data Science
. Introduction
. Why Data Science
. Data Scientists
. What do they do?
. Major/Concentration in Data Science
. What courses to take.
Data All Around

. Lots of data is being collected and warehoused
. Web data, e-commerce
. Financial transactions, bank/credit transactions
. Online trading and purchasing
. Social Network
How Much Data Do We have?

. Google processes 20 PB a day (2008)
. Facebook has 60 TB of daily logs
. eBay has 6.5 PB of user data + 50 TB/day (5/2009)
. 1000 genomes project: 200 TB
. Cost of 1 TB of disk: $35
. Time to read 1 TB disk: 3 hrs (100 MB/s)
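The read-time figure above follows from simple arithmetic. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained sequential throughput of 100 MB/s; the function name read_time_hours is hypothetical and used only for illustration:

```python
def read_time_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to scan size_bytes once at a constant throughput."""
    seconds = size_bytes / throughput_bytes_per_s
    return seconds / 3600

# Assumed decimal units; binary units (TiB, MiB) would shift the result slightly.
TB = 10**12
MB = 10**6

hours = read_time_hours(1 * TB, 100 * MB)
print(f"{hours:.2f} hours")  # 2.78 hours, which the slide rounds to 3 hrs
```

Storage is cheap, but a single full scan of even one disk takes hours, which is why scanning the data is the bottleneck rather than storing it.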
Big Data
. Big Data is any data that is expensive to manage and hard to extract value from
. Volume
. The size of the data
. Velocity
. The latency of data processing relative to the growing
demand for interactivity
. Variety and Complexity
. The diversity of sources, formats, quality, structures
. Veracity
. The quality or insightfulness of the data
Types of Data We Have

. Relational Data (Tables/Transaction/Legacy Data)
. Text Data (Web)
. Semi-structured Data (XML)
. Graph Data
. Social Network, Semantic Web (RDF), …
. Streaming Data
. You can afford to scan the data only once
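The single-scan constraint on streaming data can be illustrated with a one-pass statistic. The sketch below uses Welford's online algorithm, a standard technique that is not named in the slides, to compute the count, mean, and population variance while touching each value exactly once. The function name and the sample readings are hypothetical.

```python
from typing import Iterable, Tuple

def one_pass_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) from a single scan."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance

# A generator can be consumed only once, like a real stream.
readings = (float(v) for v in [2, 4, 4, 4, 5, 5, 7, 9])
count, mean, variance = one_pass_stats(readings)
print(count, mean, variance)
```

Only three numbers are kept in memory regardless of how long the stream is, so the data never has to be stored or rescanned.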
What To Do With These Data?
. Aggregation and Statistics
. Data warehousing and OLAP

A data warehouse is a database of corporate information that has been obtained from one or several sources.
OLAP (Online Analytical Processing) is one of the computer processing technologies used to analyze and evaluate the information stored in it.
. Indexing, Searching, and Querying
. Keyword based search
. Pattern matching (XML/RDF)
. Knowledge discovery
. Data Mining
. Statistical Modeling
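The aggregation step listed above can be sketched as an OLAP-style roll-up: the same records are summarized along different dimensions. This is a minimal illustration using only the Python standard library, not a real warehouse or OLAP engine. The transaction records and the roll_up helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical transaction records: (region, product, amount)
transactions = [
    ("east", "shoes", 120.0),
    ("east", "bags", 80.0),
    ("west", "shoes", 200.0),
    ("west", "shoes", 50.0),
]

def roll_up(rows, key_index):
    """Sum the amount column (index 2), grouped by the chosen dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)

by_region = roll_up(transactions, 0)   # totals per region
by_product = roll_up(transactions, 1)  # totals per product
print(by_region)
print(by_product)
```

Changing the grouping key changes the view of the same data, which is the basic idea behind slicing a warehouse by different dimensions.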
Big Data and Data Science

What is Data Science?
. An area that manages, manipulates, extracts,
and interprets knowledge from a tremendous
amount of data
. Data science (DS) is a multidisciplinary field
of study whose goal is to address the challenges in
big data
. Data science principles apply to all data – big
and small
What is Data Science? (Contd.)
. Theories and techniques from many fields and disciplines
are used to investigate and analyze a large amount of data
to help decision makers in many industries such as science,
engineering, economics, politics, finance, and education
. Computer Science
. Pattern recognition, visualization, data warehousing, high-performance
computing, databases, AI
. Mathematics
. Mathematical Modeling
. Statistics
. Statistical and stochastic modeling, probability
Data Science
Real Life Examples
The Job of Data Scientists
. Data Scientist
. One of the most in-demand jobs of the 21st
century
. They find stories and extract knowledge, but they are
not reporters.
Data Scientists
. Data scientists are the key to realizing the
opportunities presented by big data. They
bring structure to it, find compelling patterns
in it, and advise executives on the
implications for products, processes, and
decisions.
What do data scientists do?
. National Security
. Cyber Security
. Business Analytics
. Engineering
. Healthcare
. And more …
