
Introduction to Data Science 2

The document provides an introduction to Data Science, outlining its significance, challenges, and the role of data scientists in managing and interpreting large datasets. It discusses the types of data, the emergence of Data Science as a field, and the skills required for a concentration in Data Science. Additionally, it highlights the growing demand for data professionals and the steps involved in a Data Science project.

Uploaded by

dr.zunairausman
Copyright
© All Rights Reserved

Introduction to Data Science

Mr. Muhammad Javaid Iqbal


Department of Computer Science and Information Technology
The Superior University, Lahore
Pakistan



Outline

• Data, Big Data and Challenges
• Data Science
  • Introduction
  • Why Data Science
• Data Scientists
  • What do they do?
• Major/Concentration in Data Science
  • What courses to take



Data All Around

• Lots of data is being collected and warehoused
  • Web data, e-commerce
  • Financial transactions, bank/credit transactions
  • Online trading and purchasing
  • Social networks



How Much Data Do We Have?

• Google processes 20 PB a day (2008)
• Facebook has 60 TB of daily logs
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• 1000 Genomes Project: 200 TB

• Cost of 1 TB of disk: $35
• Time to read a 1 TB disk: about 3 hrs (at 100 MB/s)
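
The read-time figure above can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a single disk read sequentially at a constant rate; the helper name `hours_to_read` is made up for illustration. Under those assumptions 1 TB takes roughly 2.8 hours, which the slide rounds to 3.

```python
# Back-of-the-envelope check of the disk-read estimate.
# Assumption: decimal units and a constant sequential read rate.
MB = 10**6
TB = 10**12
PB = 10**15


def hours_to_read(size_bytes: float, bytes_per_second: float) -> float:
    """Hours needed to read size_bytes sequentially at the given rate."""
    return size_bytes / bytes_per_second / 3600


one_tb_hours = hours_to_read(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_tb_hours:.2f} hours")

# The same single disk applied to the 20 PB/day figure quoted for Google.
google_day_years = hours_to_read(20 * PB, 100 * MB) / 24 / 365
print(f"20 PB at 100 MB/s: {google_day_years:.1f} years")
```

The second figure shows why data at this scale has to be spread over many disks and read in parallel: one disk would need years to read a single day of input.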



Big Data
• Big Data is any data that is expensive to manage and hard to extract value from
• Volume
  • The size of the data
• Velocity
  • The latency of data processing relative to the growing demand for interactivity
• Variety and Complexity
  • The diversity of sources, formats, quality, and structures



Types of Data We Have

• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  • Social Network, Semantic Web (RDF), …
• Streaming Data
  • You can only afford to scan the data once
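
The one-scan constraint on streaming data can be illustrated with a single-pass statistic. The sketch below uses Welford's method to compute the mean and sample variance without storing the stream; this technique is not named on the slides, and the `sensor_readings` generator and its values are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, sample variance) from one pass (Welford's method)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream: a generator can be consumed only once."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


count, mean, variance = running_stats(sensor_readings())
print(count, mean, variance)
```

Memory use stays constant no matter how long the stream is, because only three numbers are kept between readings.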



What To Do With These Data?

• Aggregation and Statistics
  • Data warehousing and OLAP
• Indexing, Searching, and Querying
  • Keyword-based search
  • Pattern matching (XML/RDF)
• Knowledge discovery
  • Data Mining
  • Statistical Modeling
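
Two of the operations listed above can be sketched in a few lines. The example below shows a group-by total (the kind of roll-up an OLAP system performs at scale) and a small inverted index for keyword search. All records, documents, and function names are made up for illustration; a real system would use a database or search engine rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical transaction records (illustrative data only).
sales = [
    {"region": "north", "product": "laptop", "amount": 1200.0},
    {"region": "north", "product": "phone", "amount": 800.0},
    {"region": "south", "product": "laptop", "amount": 1000.0},
    {"region": "south", "product": "phone", "amount": 600.0},
    {"region": "south", "product": "phone", "amount": 400.0},
]


def total_by(records, key):
    """Aggregation: sum the 'amount' field grouped by the given key."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key]] += row["amount"]
    return dict(totals)


by_region = total_by(sales, "region")
by_product = total_by(sales, "product")

# Hypothetical documents for keyword search.
documents = {
    1: "big data is hard to manage",
    2: "data science extracts knowledge from data",
    3: "statistics and machine learning",
}


def build_index(docs):
    """Indexing: map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Searching: ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(documents)
print(by_region, by_product)
print(search(index, "data science"))
```

Building the index once means each query touches only the entries for its own words instead of scanning every document.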



Big Data and Data Science

• “… the emerging job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist
• The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011)

• New Data Science institutes being created or repurposed: NYU, Columbia, Washington, UCB, ...
• New degree programs, courses, boot-camps:
  • e.g., at Berkeley: Stats, I-School, CS, Astronomy…
  • One proposal (elsewhere) for an MS in “Big Data Science”



What is Data Science?

• An area that manages, manipulates, extracts, and interprets knowledge from a tremendous amount of data
• Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges of big data
• Data science principles apply to all data, big and small



What is Data Science?

• Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many industries such as science, engineering, economics, politics, finance, and education
• Computer Science
  • Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
• Mathematics
  • Mathematical modeling
• Statistics
  • Statistical and stochastic modeling, probability



Why is it emerging?

• Gartner’s 2014 Hype Cycle



Data Science



Real Life Examples

• Companies learn your secrets, shopping patterns, and preferences
  • For example, can we know if a woman is pregnant, even if she doesn’t want us to know? Target case study
• Data Science and elections (2008, 2012)
  • 1 million people installed the Obama Facebook app that gave access to info on “friends”



Data Scientists

• Data Scientist
  • The Great Job of the 21st Century
• They find stories, extract knowledge. They are not reporters



Data Scientists

• Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions



What do Data Scientists do?

• National Security
• Cyber Security
• Business Analytics
• Engineering
• Healthcare
• And more…



Concentration in Data Science

• Mathematics and Applied Mathematics
• Applied Statistics/Data Analysis
• Solid Programming Skills (R, Python, Julia, SQL)
• Data Mining
• Database Storage and Management
• Machine Learning and Discovery



Data Science Project Flow

A Data Science project follows these steps:

1- Data Acquisition or Collection
2- Data Pre-Processing and Analysis
3- Feature Engineering (Extraction and Selection)
4- Machine Learning Models (Training)
5- Evaluation (Testing, Measures)
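
The five steps can be walked through end to end on a toy problem. The sketch below is only one possible illustration, using the Python standard library: the data is synthetic, the model is a simple nearest-centroid classifier chosen for brevity, and every function name is hypothetical. A real project would typically use libraries such as pandas and scikit-learn.

```python
import random
import statistics


def acquire(n_per_class=200, seed=0):
    """Step 1: acquisition. Synthetic rows of ([informative, noise], label)."""
    rng = random.Random(seed)
    rows = []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            rows.append(([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)], label))
    rng.shuffle(rows)
    return rows


def fit_scaler(rows):
    """Step 2: pre-processing. Learn per-column mean and standard deviation."""
    columns = list(zip(*(x for x, _ in rows)))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stdevs


def scale(rows, scaler):
    """Apply the learned standardization to every row."""
    means, stdevs = scaler
    return [([(v - m) / s for v, m, s in zip(x, means, stdevs)], y) for x, y in rows]


def class_means(rows, feature):
    """Mean of one feature per class label (assumes both labels are present)."""
    return {
        label: statistics.fmean([x[feature] for x, y in rows if y == label])
        for label in (0, 1)
    }


def select_feature(rows):
    """Step 3: feature selection. Keep the feature that best separates classes."""
    def separation(j):
        m = class_means(rows, j)
        return abs(m[1] - m[0])
    return max(range(len(rows[0][0])), key=separation)


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(rows, feature, centroids):
    """Step 5: evaluation. Fraction of rows classified correctly."""
    correct = sum(predict(centroids, x[feature]) == y for x, y in rows)
    return correct / len(rows)


data = acquire()
split = int(0.8 * len(data))          # hold out 20% for testing
train_rows, test_rows = data[:split], data[split:]
scaler = fit_scaler(train_rows)       # statistics come from training data only
train_rows, test_rows = scale(train_rows, scaler), scale(test_rows, scaler)
best = select_feature(train_rows)
model = class_means(train_rows, best)  # Step 4: training (the class centroids)
acc = accuracy(test_rows, best, model)
print(f"selected feature: {best}, test accuracy: {acc:.2f}")
```

The scaler and the feature choice are fitted on the training rows only, so the held-out rows give an honest estimate of how the model behaves on unseen data.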
