19 CS
1st Term, Final Year
Data Sciences
and Analytics
(DSA)
Prof. Dr. M. S. Memon
Course In-charge
1. Overview
DATA SCIENCE VOCABULARY
Prof. Dr. M. S. Memon 5
ALL ABOUT DATA SCIENCE
• Statistics
• Big Data Analytics
• Business Analytics
• Business Intelligence
• Data(base) Management
• Visualization
• Machine Learning
• Data Mining
• Artificial Intelligence
• Predictive Modelling
What is Data Science?
(Diagram: Data, Analysing, Manipulating)
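How is it different from ML, DL, and AI?
(Diagram: Artificial Intelligence, Machine Learning, Deep Learning)
Deep learning is a subset of machine learning, which is in turn a subset of artificial intelligence.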
Artificial Intelligence
• What is AI?
• Tools
• Applications
Deep Learning
• What is DL?
• Tools
• Applications
ALL ABOUT Data Science
1. WHAT IS DATA SCIENCE?
• “Data science, also known as data-driven science, is an
interdisciplinary field of scientific methods, processes,
algorithms and systems to extract knowledge or insights
from data in various forms, either structured or
unstructured, similar to data mining.”
• “Data science intends to analyze and understand actual
phenomena with ‘data’. In other words, the aim of data science
is to reveal the features or the hidden structure of complicated
natural, human, and social phenomena with data from a
different point of view from the established or traditional theory
and method.”
• Fourth paradigm
• “… change of all sciences moving from observational, to
theoretical, to computational and now to the 4th Paradigm –
Data-Intensive Scientific Discovery”
2. WHAT IS IMPORTANT?
Need to solve a real problem using data…
No applications, no data science.
3. Defining Data Science
A PROCESS OF FINDING KNOWLEDGE (HIDDEN PATTERNS)
IN RAW DATA USING THE
PRINCIPLES OF MACHINE LEARNING,
ALGORITHMS, AND VARIOUS TOOLS.
Data Science Process
• 01 Setting the research goal
• 02 Retrieving data
• 03 Data preparation
• 04 Data exploration
• 05 Data modeling
• 06 Results analysis and visualization
3.1. Setting the Research Goal
THE DATA SCIENCE RESEARCH GOAL IS
MOSTLY SET ACCORDING TO THE
ORGANIZATION'S REQUIREMENTS.
PREPARE A PROJECT CHARTER THAT
ANSWERS SOME MAJOR QUESTIONS:
What is going to be researched?
How will the organization benefit from it?
What resources and data are required?
What are the timetable and deliverables?
3.2. Retrieving the Data
DATA COLLECTION IS THE SECOND STEP OF
THE DATA SCIENCE PROCESS.
COLLECTING THE DATA REQUIRED BY THE
PROJECT CHARTER BY CHECKING ITS
EXISTENCE, ACCESSIBILITY, AND QUALITY BOTH
INSIDE AND OUTSIDE THE ORGANIZATION.
DEALING WITH DIFFERENT DATA FORMATS
AND DATABASES.
ACCESSING THIRD-PARTY RESOURCES TO
ENRICH THE QUALITY OF INFORMATION.
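The retrieval step can be illustrated with a minimal Python sketch. It assumes two small in-memory sources (a CSV export and a JSON feed) standing in for real files or databases; the sample records, column names, and helper functions (`load_csv`, `load_json`, `quality_report`) are hypothetical. It shows reading two different formats into one common structure and then running a basic quality check for missing values.

```python
import csv
import io
import json

# Hypothetical in-memory sources standing in for real files or database exports.
CSV_SOURCE = """customer_id,name,city
1,Ali,Hyderabad
2,Sara,
3,Omar,Karachi
"""

JSON_SOURCE = '[{"customer_id": 4, "name": "Hina", "city": "Lahore"}]'


def load_csv(text):
    """Parse CSV text into a list of dicts (csv yields every value as a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse a JSON array of objects, converting values to strings to match the CSV rows."""
    return [{key: str(value) for key, value in row.items()} for row in json.loads(text)]


def quality_report(rows, columns):
    """Count empty or missing values per column across all retrieved rows."""
    return {col: sum(1 for row in rows if not row.get(col)) for col in columns}


if __name__ == "__main__":
    rows = load_csv(CSV_SOURCE) + load_json(JSON_SOURCE)
    print(len(rows), "records retrieved")
    print(quality_report(rows, ["customer_id", "name", "city"]))
```

Normalizing every source to the same record shape early keeps the later preparation steps independent of where the data came from.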
3.3. Data Preparation
PREPARING GOOD-QUALITY DATA IN THE
REQUIRED FORMAT USING COMMON AND
DOMAIN-SPECIFIC PREPROCESSING STEPS.
Data Preparation Phases
• 01 Data Cleaning: removing inconsistencies and noisy data
• 02 Data Integration: enriching data by combining multiple data sources
• 03 Data Transformation: converting data into a format suitable for modeling
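The three preparation phases can be sketched in plain Python. This is a minimal illustration under stated assumptions: the records, the `REGIONS` lookup, the age limit of 120, median imputation, and min-max scaling are all hypothetical choices made for the example, not the only way to perform each phase.

```python
import statistics

# Hypothetical raw records: one exact duplicate, one missing age, one noisy (implausible) age.
SALES = [
    {"id": 1, "age": 25, "amount": 100.0},
    {"id": 2, "age": None, "amount": 250.0},
    {"id": 2, "age": None, "amount": 250.0},  # exact duplicate
    {"id": 3, "age": 200, "amount": 175.0},   # noise: implausible age
    {"id": 4, "age": 35, "amount": 400.0},
]
REGIONS = {1: "North", 2: "South", 3: "East", 4: "West"}  # hypothetical second source


def clean(rows, max_age=120):
    """Cleaning: drop exact duplicates and implausible ages, then impute missing ages with the median."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the raw data is left untouched
    valid = [r for r in unique if r["age"] is None or 0 <= r["age"] <= max_age]
    median_age = statistics.median(r["age"] for r in valid if r["age"] is not None)
    for r in valid:
        if r["age"] is None:
            r["age"] = median_age
    return valid


def integrate(rows, regions):
    """Integration: enrich each record with its region (inner join on id)."""
    return [{**r, "region": regions[r["id"]]} for r in rows if r["id"] in regions]


def min_max_scale(rows, column):
    """Transformation: rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero when all values are equal
    return [{**r, column: (r[column] - low) / span} for r in rows]


if __name__ == "__main__":
    for record in min_max_scale(integrate(clean(SALES), REGIONS), "amount"):
        print(record)
```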
3.4. Data Exploration
UNDERSTANDING THE DATA USING
STATISTICAL ANALYSIS AND VISUALIZATION.
DETECTING NOISE AND OUTLIERS.
UNDERSTANDING THE INTERACTIONS BETWEEN VARIABLES.
GETTING A SENSE OF THE DISTRIBUTION OF
THE DATA.
THIS STEP IS KNOWN AS EXPLORATORY
DATA ANALYSIS (EDA).
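A few core EDA checks can be sketched with Python's standard `statistics` module: summary statistics to sense the distribution, the interquartile-range (IQR) rule to flag outliers, and the Pearson coefficient to measure how two variables interact. The hours/scores data is hypothetical, and the 1.5 × IQR fence is a common convention assumed here, not a fixed rule.

```python
import math
import statistics

# Hypothetical sample: hours studied and exam score per student.
# The last score (950) is a deliberate data-entry error for the demo.
HOURS = [1, 2, 3, 4, 5, 6, 7, 8]
SCORES = [52, 55, 61, 64, 70, 73, 78, 950]


def describe(values):
    """Basic summary statistics to get a sense of the distribution."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, factor=1.5):
    """Flag values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(describe(SCORES))  # the mean is pulled far above the median by the 950 entry
    outliers = iqr_outliers(SCORES)
    print("outliers:", outliers)
    pairs = [(h, s) for h, s in zip(HOURS, SCORES) if s not in outliers]
    print("r =", round(pearson([h for h, _ in pairs], [s for _, s in pairs]), 3))
```

Comparing the mean with the median is a quick signal of skew or outliers before any plot is drawn.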
3.5. Building the Model
THIS STEP USES PREVIOUS EXPERIENCE OF THE DOMAIN TO
BUILD THE MODELS.
MODEL BUILDING DRAWS ON STATISTICS, OPERATIONS
RESEARCH METHODS, OPTIMIZATION, AND MACHINE LEARNING
ALGORITHMS.
HYPERPARAMETER TUNING IS DONE ITERATIVELY TO
SELECT THE FINAL MODEL.
THE FINAL MODEL IS SELECTED BASED ON ITS
PERFORMANCE ON THE VALIDATION SET OF THE DATA.
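Selecting a model by its validation performance can be sketched as a small tuning loop. Here a k-nearest-neighbours classifier is swapped in purely as an example model (the slides do not name a specific algorithm), its hyperparameter `k` is tuned over a hypothetical candidate list, and the 70/30 train/validation split and toy Gaussian data are assumptions made for the demo.

```python
import math
import random
from collections import Counter


def make_data(n_per_class, seed=0):
    """Hypothetical toy data: two overlapping 2-D Gaussian clusters labelled 0 and 1."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_per_class)]
    data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, evaluation, k):
    """Fraction of evaluation points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return hits / len(evaluation)


def tune_k(data, candidates=(1, 3, 5, 7, 9), train_fraction=0.7):
    """Hold out a validation split and keep the k with the best validation accuracy."""
    cut = int(len(data) * train_fraction)
    train, validation = data[:cut], data[cut:]
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the k listed first
    return best_k, scores


if __name__ == "__main__":
    best_k, scores = tune_k(make_data(50))
    print("validation accuracy per k:", scores)
    print("selected k:", best_k)
```

In practice a separate test set that is never used during tuning is usually kept aside, because the validation score of the chosen model is optimistically biased by the selection itself.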
3.5.1 Models
3.6. Result Analysis and Visualization
THIS STEP INVOLVES ANALYZING AND
VISUALIZING THE RESULTS.
THE RESULTS CAN BE ANALYZED USING:
• Quantitative measures
• Graphical measures
• Statistical measures
SOMETIMES IT IS IMPORTANT TO VISUALIZE THE
RESULTS DYNAMICALLY TO SHOW THEIR REAL-TIME
BEHAVIOR.
BUSINESS INTELLIGENCE TOOLS SUCH AS MICROSOFT
POWER BI, TABLEAU DESKTOP, GOOGLE CHARTS, AND
MICROSOFT BI ARE USED TO VISUALIZE THE RESULTS.
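For a binary classifier, common quantitative measures (accuracy, precision, recall, F1) can be computed from the confusion-matrix counts, and even a plain-text bar chart serves as a minimal graphical view. This is a sketch assuming hypothetical labels and predictions; the function names are illustrative, and BI tools like those listed above would replace the text chart in a real report.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Quantitative measures derived from the confusion-matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def text_bar_chart(metrics, width=20):
    """A minimal graphical measure: one text bar per metric, scaled to `width`."""
    return "\n".join(
        f"{name:<10}{'#' * round(value * width)} {value:.2f}" for name, value in metrics.items()
    )


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model predictions
    print(text_bar_chart(evaluate(y_true, y_pred)))
```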