Business Data Analytics Part 4

Part 4.

Analyze data
Tasks
Strong collaboration between a data scientist and a
business analyst ensures that the analytics work is
performed within the correct business context.

1/ Develop data analysis plan

2/ Prepare data

3/ Explore data

4/ Perform data analysis

5/ Assess the analytics and system approach taken
Develop data analysis plan
“If you fail to plan, you are planning to fail!”
― Benjamin Franklin

Image source:
https://artsandculture.google.com/asset/ZgEyj5EEKdux-g
When developing the data analysis plan, the analyst
determines:

1/ which techniques to use

2/ which models will be used

3/ which data sources will be used

4/ how data will be preprocessed and cleaned
Who creates the plan?

A delivery professional (such as a project manager or a business analyst) provides insights into the plan or may draft the initial plan for review by the data scientist. The data scientist possesses deep technical expertise to decide how the data analysis will be conducted.

Metrics and KPIs can be used to assist the data scientist in determining if the outcomes from data analysis are producing the results required to address the business need. Organizational knowledge helps business analysis professionals provide the context for the data scientist's work.
Technique: Linear regression
Linear regression models the linear relationship between an independent and a dependent variable, usually visualised as a straight line fitted through a plot of the data.

Example fit: Y = 3.69*X - 9.59, R² = 0.992


Usage considerations
Strengths:
● A proven method that is used extensively
● Easy to understand and explain

Limitations:
● Simple construct - may perform poorly
● The variables must be truly independent
● The variables should not be serially correlated
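To make the technique concrete, here is a minimal Python sketch, assuming NumPy and scikit-learn are installed; the data points are invented for illustration and are not taken from the slide's plot.

```python
# Minimal linear regression sketch (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical observations of one independent variable X and a dependent variable y.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([-6.0, -2.2, 1.5, 5.3, 8.9, 12.6])

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0])        # estimated coefficient of X
print("intercept:", model.intercept_)  # estimated intercept
print("R^2:", model.score(X, y))       # goodness of fit on the same data
```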
Technique: Seasonality analysis
Linear (or any other simple) regression will not always work, especially when we talk about
cyclic trends, such as seasonality over a timeframe. ARIMA is a forecasting algorithm based
on the idea that the information in the past values of the time series can predict future
values.
Usage considerations
Strengths:
● Can handle time-series data with trends

Limitations:
● Is slowly being phased out by more accurate algorithms
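A minimal forecasting sketch, assuming the statsmodels library is available; the monthly series and the ARIMA order (1, 1, 1) are illustrative choices, not recommendations.

```python
# ARIMA forecasting sketch (illustrative monthly series).
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly sales with a mild trend and a seasonal pattern.
sales = pd.Series(
    [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
     115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140],
    index=pd.date_range("2022-01-01", periods=24, freq="MS"),
)

# The (p, d, q) order would normally be chosen from ACF/PACF plots or a search.
model = ARIMA(sales, order=(1, 1, 1)).fit()
print(model.forecast(steps=3))  # point forecasts for the next three months
```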
Technique: Classification
Logistic regression is a statistical method for predicting binary classes - in simple words, it
helps attribute an observation to one of the two potential outcomes.

Image source:
https://commons.wikimedia.org/wiki/File:Exam_pass_logistic_curve.jpeg
Usage considerations
Strengths:
● Used for binary classification

Limitations:
● Can have high bias towards model assumptions
● Requires preprocessing and normalization of data
● There are other means of classification that work better under specific circumstances
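A minimal Python sketch, loosely following the exam-pass example from the linked figure; the hours and outcomes below are invented for illustration and scikit-learn is assumed to be installed.

```python
# Logistic regression sketch: hours studied vs. exam outcome (invented data).
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])  # 0 = fail, 1 = pass

clf = LogisticRegression().fit(hours, passed)
print(clf.predict([[2.75]]))        # predicted class for a new student
print(clf.predict_proba([[2.75]]))  # probability of fail vs. pass
```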
Naive Bayes is a technique for constructing classifiers: models that assign a class based on a set of features.

E.g. predict if a person is male or female based on height, weight, foot size.

For each individual, the classifier combines the prior probability of a person being of a specific gender with the probability of observing the given measurements for that gender, and assigns the more probable class.
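A sketch of the gender example using Gaussian Naive Bayes from scikit-learn; the measurements below are invented placeholders, not real data.

```python
# Gaussian Naive Bayes sketch for the gender example (invented measurements).
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Features: height (cm), weight (kg), foot size (EU).
X = np.array([
    [183, 82, 44], [178, 79, 43], [170, 77, 42], [175, 75, 43],   # male
    [160, 55, 38], [165, 60, 39], [158, 52, 37], [172, 63, 40],   # female
])
y = np.array(["male"] * 4 + ["female"] * 4)

clf = GaussianNB().fit(X, y)
print(clf.predict([[168, 62, 40]]))        # most probable class for a new person
print(clf.predict_proba([[168, 62, 40]]))  # posterior probability per class
```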
A decision tree is a decision support tool that uses a tree-like model of decisions and their
possible consequences, including chance event outcomes, resource costs, and utility.

[Diagram: a decision tree repeatedly splitting the dataset A1B2C3D4 into smaller subsets.]
Random forests are an ensemble learning method for classification, regression and other
tasks that operates by constructing a multitude of decision trees at training time.

Each tree follows 2 rules:

- Bagging: create each tree by randomly sampling your original data
- Feature randomness: use a subset of all possible features for each tree

[Diagram: bootstrap samples drawn from the dataset A1B2C3D4; elements may repeat because the sampling is done with replacement.]

Image source:
https://en.wikipedia.org/wiki/Random_forest#/media/File:Random_forest_diagram_complete.png
Example - step 1: get the training data

#   Red    Green  Blue   Size (cm)  Stripes  Class
1   1      1      0      6          0        Apple
2   0      1      0      8          0        Apple
3   0      0      1      0.5        0        Blueberry
4   0      0      1      0.5        0        Blueberry
5   0.65   0.35   0      8          0        Orange
6   0.65   0.35   0      10         0        Orange
7   0      1      0      32         1        Watermelon
8   0      1      0      35         1        Watermelon
Example - step 2: generate random forest using bagging and feature randomness

Tree rules for this sample:
1. blue > 0 -> Blueberry
2. size > 8 -> Watermelon

#   Red    Green  Blue   Size (cm)  Stripes  Class
1   1      1      0      6          0        Apple
2   0      1      0      8          0        Apple
3   0      0      1      0.5        0        Blueberry
4   0.1    0      0.9    0.3        0        Blueberry
5   0.65   0.35   0      8          0        Orange
6   0.65   0.35   0      10         0        Orange
7   0      1      0      32         1        Watermelon
8   0      1      0      35         1        Watermelon
Example - step 2: generate random forest using bagging and feature randomness

Tree rules for this sample:
1. stripes > 0 -> Watermelon
2. green < 1 -> Orange

#   Red    Green  Blue   Size (cm)  Stripes  Class
1   1      1      0      6          0        Apple
2   0      1      0      8          0        Apple
3   0      0      1      0.5        0        Blueberry
4   0.1    0      0.9    0.3        0        Blueberry
5   0.65   0.35   0      8          0        Orange
6   0.65   0.35   0      10         0        Orange
7   0      1      0      32         1        Watermelon
8   0      1      0      35         1        Watermelon
Example - step 2: generate random forest using bagging and feature randomness

Tree rules for this sample:
1. size > 10 -> Watermelon
2. green > 0 -> Orange

#   Red    Green  Blue   Size (cm)  Stripes  Class
1   1      1      0      6          0        Apple
2   0      1      0      8          0        Apple
3   0      0      1      0.5        0        Blueberry
4   0.1    0      0.9    0.3        0        Blueberry
5   0.65   0.35   0      8          0        Orange
6   0.65   0.35   0      10         0        Orange
7   0      1      0      32         1        Watermelon
8   0      1      0      35         1        Watermelon

* use the Gini impurity metric to find the best split


Example - step 3: apply the trees and pick a winner

New observation:
Red    Green  Blue   Size (cm)  Stripes  Class
0.7    0.3    0      10         0        ?

The three trees vote:
● size > 10 -> Watermelon; green > 0 -> Orange    => Orange
● stripes > 0 -> Watermelon; green < 1 -> Orange  => Orange
● blue > 0 -> Blueberry; size > 8 -> Watermelon   => Watermelon

Majority vote: Orange
Usage considerations
Strengths:
● Easy to visualise and understand (trees)
● Works in most cases with high accuracy

Limitations:
● May fall victim to generalisation errors (may perform poorly if future data is significantly different from the training data)
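The fruit example above can be reproduced in a few lines with scikit-learn, which applies bagging and feature randomness internally; treat this as a sketch rather than an exact replay of the hand-built trees.

```python
# Random forest sketch on the toy fruit data from the example.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Columns: red, green, blue, size (cm), stripes.
X = np.array([
    [1, 1, 0, 6, 0], [0, 1, 0, 8, 0],               # apples
    [0, 0, 1, 0.5, 0], [0, 0, 1, 0.5, 0],           # blueberries
    [0.65, 0.35, 0, 8, 0], [0.65, 0.35, 0, 10, 0],  # oranges
    [0, 1, 0, 32, 1], [0, 1, 0, 35, 1],             # watermelons
])
y = np.array(["Apple", "Apple", "Blueberry", "Blueberry",
              "Orange", "Orange", "Watermelon", "Watermelon"])

# Each tree is grown on a bootstrap sample and considers a random feature
# subset at every split (bagging + feature randomness).
forest = RandomForestClassifier(n_estimators=3, random_state=42).fit(X, y)
print(forest.predict([[0.7, 0.3, 0, 10, 0]]))  # the unseen fruit from step 3
```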
Other classification tools
● K-Nearest Neighbors Algorithm - grouping
observations together based on similarity in
a set of parameters. Often used to find
“similar items”, or “more like this product”.
● Support vector machine - an algorithm to
find a hyperplane in an N-dimensional space
(N — the number of features) that distinctly
classifies the data points.
● Perceptron - like SVM, it tries to find a hyperplane that classifies the data points, but it uses different math behind it, which allows it to keep training over time.

Image sources:
https://en.wikipedia.org/wiki/Support-vector_machine#/media/File:SVM_margin.png
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm#/media/File:Map1NN.png
https://en.wikipedia.org/wiki/Perceptron#/media/File:Perceptron_example.svg
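As a flavour of these alternatives, here is a minimal k-nearest-neighbours sketch with scikit-learn; the item features and labels are invented for illustration.

```python
# k-nearest neighbours sketch for "more like this" style grouping (invented data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two parameters per item, e.g. price and average rating.
X = np.array([[10, 4.5], [12, 4.7], [11, 4.4], [45, 3.1], [50, 2.9], [48, 3.3]])
y = np.array(["budget", "budget", "budget", "premium", "premium", "premium"])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[14, 4.2]]))  # label of the majority among the 3 nearest items
```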
Prepare data
Preparing data involves obtaining access to
the planned data sources and establishing
the relationships and linkages between
sources in order to create a coherent dataset.
— Guide to Business Data Analytics, IIBA
Preparing data

1. Understand relationships
2. Establish joins/linkages
3. Normalize
4. Standardize
5. Scale
6. Convert
7. Cleanse
8. Validate
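A pandas sketch touching several of the steps above (join, convert, cleanse, validate, scale); the file and column names are hypothetical examples, not from the source.

```python
# Data preparation sketch (hypothetical files and columns).
import pandas as pd

orders = pd.read_csv("orders.csv")        # order_id, customer_id, amount, order_date
customers = pd.read_csv("customers.csv")  # customer_id, region

# Establish joins/linkages between the two sources.
df = orders.merge(customers, on="customer_id", how="left")

# Convert types so dates and numbers behave as expected.
df["order_date"] = pd.to_datetime(df["order_date"])
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Cleanse and validate: drop duplicates and rows missing key fields.
df = df.drop_duplicates(subset="order_id").dropna(subset=["customer_id", "amount"])

# Scale a numeric column to the 0-1 range for later modelling.
df["amount_scaled"] = (df["amount"] - df["amount"].min()) / (df["amount"].max() - df["amount"].min())
```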
Explore data
Exploring data involves performing an initial
exploratory analysis to ensure the data being
collected is what was expected from the data
sources.
— Guide to Business Data Analytics, IIBA
The data scientist assesses the data quality to
determine the course of action using the following
checkpoints:

1/ Data integrity

2/ Data validity

3/ Data reliability

4/ Data bias
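A quick exploratory sketch of these checkpoints with pandas, assuming the prepared dataset from the previous step; the file and column names are hypothetical.

```python
# Exploratory data-quality sketch (hypothetical file and columns).
import pandas as pd

df = pd.read_csv("prepared_orders.csv")

print(df.shape)          # integrity: did we receive the expected volume of rows?
print(df.dtypes)         # validity: are the types what the sources promised?
print(df.isna().mean())  # reliability: share of missing values per column
print(df.describe())     # quick look at ranges and possible outliers
print(df["region"].value_counts(normalize=True))  # bias: is any segment over-represented?
```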
Perform data analysis
[Flow: the original question is reframed as a math question, which is then answered with a model.]

Modelling approaches:
1. Statistical tests
2. Regression analysis
3. Machine learning

Statistical foundations:
1. Types of data
2. Organisation of data
3. Central tendency
4. Deviation
5. Probability and its distribution
Technical
visualisations
Technical visualizations are used by data scientists to evolve their analysis into the detailed findings that drive insights. They may not be useful for communicating insights to business stakeholders, but technical visualizations deepen the team's understanding.
Image source:
https://en.wikipedia.org/wiki/Autocorrelation#/media/File:Acf_new.svg
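An autocorrelation plot like the one referenced above can be produced with statsmodels; the series below is synthetic, generated only to show a seasonal pattern.

```python
# Autocorrelation plot sketch, a typical technical visualisation (synthetic series).
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(0)
t = np.arange(200)
# Seasonal signal plus noise, so the ACF shows a visible 12-step cycle.
series = np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=t.size)

plot_acf(series, lags=40)  # correlation of the series with its lagged copies
plt.show()
```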
Assess the Analytics and System
Approach Taken
Analytics team: we have explored the data, we have analysed the data - but have we answered the research question?

[Flow: data exploration -> data analysis -> "Comfortable with the data being used?" If yes, finish the analysis; if no, loop back to exploration and analysis.]
Technique: Simulation
Observe the real world -> Build a model -> Test the model -> Use the model for decision making
Types of simulation:

1/ Risk simulation

2/ Event-based simulation

3/ Dynamic simulation
Usage considerations
Strengths:
● Cause-action-reaction chains can be modelled without disrupting the business.
● Complex business situations can be modelled with accurate inputs.
● Simulations are computationally efficient and involve lower data acquisition cost.
● They are accurate for business scenarios with many contributing factors and a low amount of data.
● Simulations can be used in modelling prescriptive actions and predictions under business constraints.

Limitations:
● Creating effective simulations requires expert knowledge of the system being simulated.
● The outcome of a simulation experiment can be difficult to explain due to the many variables involved.
● Other types of modelling techniques are considered more effective.
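A minimal Monte Carlo risk-simulation sketch in Python; every distribution and figure below is a hypothetical illustration, not real business data.

```python
# Monte Carlo risk-simulation sketch (hypothetical business figures).
import numpy as np

rng = np.random.default_rng(42)
n_runs = 100_000

# Uncertain demand and unit cost drive uncertain profit.
demand = rng.normal(loc=10_000, scale=1_500, size=n_runs)  # units sold
unit_cost = rng.uniform(low=4.0, high=6.0, size=n_runs)    # cost per unit
price = 9.0                                                 # fixed sale price
profit = demand * (price - unit_cost)

print("expected profit:", profit.mean())
print("5th percentile (downside risk):", np.percentile(profit, 5))
print("probability of a loss:", (profit < 0).mean())
```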
Technique: Optimisation
Optimization can be described as choosing
the best possible option among multiple
available options under some constraints.

— Guide to Business Data Analytics, IIBA


[Diagram: a decision model combines decision variables, an objective/cost/error function, and constraints.]
Usage considerations
Strengths:
● Optimization is the mathematical basis of most of the predictive, prescriptive, and operations research analytical models.
● Optimization methods converge rapidly (equating to finding the optimum solutions faster) when applied to large-scale and complex problems using many variables.

Limitations:
● The optimized solution may not be the best solution available.
● More complex formulations are difficult to explain to the stakeholders.
● The process requires very accurate formulation of the constraints.
● The optimization process at large scale requires processing power and time.
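A small linear-programming sketch with SciPy showing decision variables, an objective function, and constraints together; the product-mix numbers are hypothetical.

```python
# Linear-programming sketch (hypothetical product-mix problem).
from scipy.optimize import linprog

# Decision variables: units of product A and product B.
# Objective: maximise 30*A + 40*B; linprog minimises, so the profits are negated.
c = [-30, -40]

# Constraints: 2A + 3B <= 120 machine hours, A + 2B <= 70 labour hours.
A_ub = [[2, 3], [1, 2]]
b_ub = [120, 70]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("units of A and B:", result.x)
print("maximum profit:", -result.fun)
```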
Case study: simulation and optimisation
