Materials


Analytics- Basics

Why now?
Data Storage:

Time Period   TB/Cost
2000          19000
2010          62
2020          0.2

Data Processing:

Complementing the advancement and innovation in data storage, the speed at which data is processed increased substantially, which enabled companies to process the huge amounts of data they hold to derive business insights. (740 kHz to 5 GHz)

Other Factors:

• Cloud Computing

• Improvement in Devices

• Internet Penetration
The Data Ecosystem
                  Structured             Unstructured             Semi-Structured
Characteristics   Pre-defined format     No pre-defined format    Mix of both and has metadata
Storage           RDBMS/Data Warehouse   Data Lakes               DBMS
Use Case          CRM Analytics          Image Analytics          Web Analytics
Example           Excel/DBMS             Voice Mail/Email/Image   JSON/XML


Data Acquisition -> Data Storage -> Data Processing -> Analytics, Reporting & Viz
Types of Analytics
Ordered by increasing value and increasing time & complexity:

1. Descriptive -> What are my sales?
2. Diagnostic -> Why are my sales low?
3. Predictive -> What will my sales be next year?
4. Prescriptive -> Who should I target?
CRISP-DM- Framework for Analytics and Visualization

Most resources are spent on the first three stages.

Only a few models reach the deployment stage.

The success probability of an analytics initiative is 0.3.

Business & Data Understanding sets the tone
Business Understanding

Identify
• Business Problem/Objective
• Stakeholders
• Critical Success Factor
• Map the As-Is Process

Assess
• Assess the technology ecosystem
• Level of Data Literacy
• Benchmarking

Plan
• Plan & Prioritize list of initiatives
• Determine milestones
• Build Vs Buy
• Roadmap of Implementation

Data Understanding

• Determine Data Sources
• Ensure Data Quality
• Data Storage


Data Preparation- This stage determines the outcome
Cleaning:

1. Handling missing values (e.g., imputation)
2. Normalizing/structuring the data (records having USA and US as Country can be combined)
3. Removing unnecessary columns (you may need only a few columns, not all)
4. Outlier handling

Transformation:

5. Date formatting (you have daily sales data, but you want to plot a graph using monthly sales data)
6. Adding calculated fields (you have revenue and expense in your data; profit can be a calculated field)
7. Units (e.g., money represented in K, M, B for thousand, million and billion)
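
A minimal sketch of a few of the steps above (imputation, normalizing country names, a calculated profit field, monthly roll-up, and unit formatting), using only the Python standard library. The records, the `COUNTRY_ALIASES` mapping, and the function names are hypothetical, chosen only for illustration; mean imputation is one possible choice, not a recommendation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily records; None marks a missing revenue value.
records = [
    {"day": date(2024, 1, 5), "country": "USA", "revenue": 1200.0, "expense": 700.0},
    {"day": date(2024, 1, 20), "country": "US", "revenue": None, "expense": 650.0},
    {"day": date(2024, 2, 3), "country": "US", "revenue": 1800.0, "expense": 900.0},
]

# Assumed alias table for step 2 (combine "USA" and "US").
COUNTRY_ALIASES = {"USA": "US"}


def prepare(rows):
    """Impute missing revenue, normalize country, add month and profit."""
    known = [r["revenue"] for r in rows if r["revenue"] is not None]
    fill = mean(known)  # step 1: mean imputation (an assumption)
    cleaned = []
    for r in rows:
        revenue = r["revenue"] if r["revenue"] is not None else fill
        cleaned.append({
            "month": r["day"].strftime("%Y-%m"),                          # step 5
            "country": COUNTRY_ALIASES.get(r["country"], r["country"]),  # step 2
            "revenue": revenue,
            "profit": revenue - r["expense"],                            # step 6
        })
    return cleaned


def monthly_revenue(rows):
    """Aggregate prepared daily rows into monthly revenue totals."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["month"]] += r["revenue"]
    return dict(totals)


def in_thousands(amount):
    """Step 7: express an amount in thousands with a K suffix."""
    return f"{amount / 1000:.1f}K"


cleaned = prepare(records)
print(monthly_revenue(cleaned))
print(in_thousands(monthly_revenue(cleaned)["2024-01"]))
```

Outlier handling and column removal are left out to keep the sketch short.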
Modelling- Identification of Right Algorithms

Supervised learning uses a training set to teach models to yield the desired output. This training dataset includes inputs and correct outputs, which allow the model to learn over time.
Task Driven
Classification and Regression

Unsupervised learning uses machine learning algorithms to analyze and cluster unlabelled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. Its ability to discover similarities and differences in information makes it suited to cross-selling strategies, customer segmentation, and image recognition.
Data Driven
Evaluation – Which model to choose?
            Model 1   Model 2   Model 3
Features    10        20        30
Accuracy    80%       95%       97%

Confusion Matrix: Classification

                     Actual Positive   Actual Negative
Predicted Positive   True Positive     False Positive
Predicted Negative   False Negative    True Negative

Accuracy: (TP + TN) / Total
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
F1-Score: 2 * Precision * Recall / (Precision + Recall)
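
The four classification metrics can be computed directly from the confusion-matrix counts. The sketch below uses made-up counts (40 TP, 10 FP, 20 FN, 30 TN) and a hypothetical function name; it assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes tp + fp, tp + fn and precision + recall are all non-zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only, not taken from any real model.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.7 while recall is only about 0.667, which is why accuracy alone may not be enough to choose between models.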

Errors: Regression/Forecasting
Root Mean Square Error: the square root of the average of the squared differences between prediction and actual observation.

Mean Absolute Error: the average of the absolute differences between prediction and actual observation.
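
A short sketch of both error measures, following the definitions above. The actual and predicted values are invented for illustration, and the function names are not from any particular library.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root Mean Square Error: sqrt of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Illustrative values; the last prediction is deliberately far off.
actual = [100, 120, 140, 160]
predicted = [110, 120, 130, 200]

print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAE:  {mae(actual, predicted):.2f}")
```

Because the errors are squared before averaging, the single large miss pulls RMSE (about 21.2) well above MAE (15.0).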
Identify Me?

Ethics

TTV
Forecasting
Forecasting- Predict the Score

Match   Team Batting 1st   Team Batting 2nd
1       168                163
2       168                169
3       181                161
4       195                196
5       182                170

Predict Score of Team Batting 1st
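
One naive starting point, sketched below, is to average the first-innings scores from the table, either over all five matches or over only the most recent ones. This is a baseline under the assumption that past scores are the only information available, not a recommended forecasting method; the function name and the window of 3 are arbitrary choices.

```python
from statistics import mean

# First-innings scores from the table above (matches 1 to 5).
first_innings = [168, 168, 181, 195, 182]


def naive_forecast(scores, last_n=None):
    """Average all scores, or only the last_n most recent ones."""
    window = scores if last_n is None else scores[-last_n:]
    return mean(window)


print(f"Mean of all matches:    {naive_forecast(first_innings):.1f}")
print(f"Mean of last 3 matches: {naive_forecast(first_innings, last_n=3):.1f}")
```

The two baselines give 178.8 and 186.0, which shows how much the answer depends on how much history is used.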


Forecasting- Predict the Score 2

According to the experts, the score is expected to be around 180.

According to the players, the score is expected to be around 200.


Forecast the Sales-1
You are an operations manager for a leading beverage company. You have been assigned the task of predicting next week's sales demand. Your data science team comes up with 5 models, each with its own demand forecast.

Model   Demand
1       70
2       80
3       65
4       75
5       45
Forecast the Sales-2
You are an operations manager for a milk procurement company. You have been assigned the task of predicting next week's sales demand. Your data science team comes up with 5 models, each with its own demand forecast.

Model   Demand
1       70
2       80
3       65
4       75
5       45
Forecast the Sales-3

Type of Goods    Value   Forecasting
Perishable       High    Pessimistic
Non-Perishable   Low     Optimistic
Perishable       Low     Moderately Optimistic
Non-Perishable   High    Moderately Pessimistic
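
The table above can be turned into a simple selection rule over the five model forecasts (70, 80, 65, 75, 45). In the sketch below, each stance is mapped to a quantile of the model outputs; the specific quantile levels (0, 0.25, 0.75, 1) are an assumption made for illustration, as are all function and variable names.

```python
def quantile(values, q):
    """Linear-interpolation quantile of values for q between 0 and 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Stance per (type of goods, value), copied from the table above.
STANCE_BY_GOODS = {
    ("Perishable", "High"): "Pessimistic",
    ("Non-Perishable", "Low"): "Optimistic",
    ("Perishable", "Low"): "Moderately Optimistic",
    ("Non-Perishable", "High"): "Moderately Pessimistic",
}

# Assumed mapping from stance to a quantile of the model forecasts.
STANCE_QUANTILE = {
    "Pessimistic": 0.0,
    "Moderately Pessimistic": 0.25,
    "Moderately Optimistic": 0.75,
    "Optimistic": 1.0,
}


def pick_forecast(model_demands, goods_type, value):
    """Pick a demand figure from the model outputs based on the goods profile."""
    stance = STANCE_BY_GOODS[(goods_type, value)]
    return quantile(model_demands, STANCE_QUANTILE[stance])


demands = [70, 80, 65, 75, 45]
for goods_type, value in STANCE_BY_GOODS:
    print(goods_type, value, pick_forecast(demands, goods_type, value))
```

The same five model outputs lead to different planning numbers depending on how costly overstocking is for the product.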


Situational Analysis
1. A cyclonic storm is going to make landfall in Chennai in a week's time. The government has issued a warning to the public.

2. You work in a leading retail chain and are tasked with avoiding out-of-stock situations and finding products that would have unusual demand.

3. How would you begin?

4. What products do you think will have unusual demand?

Walmart
1. Strawberry Pop-Tarts
2. Beer
Forecasting- Basics

Time Series: Cost vs Year
Econometric Model: Price vs Demand

[Chart: cost (0 to 80) by year: 2010, 2020, 2030E]
Time Series Forecasting
A time series is a collection of observations of well-defined data
items obtained through repeated measurements over time.

Components of Time Series


1. Trend -> Upward or downward movement (long term)
2. Seasonality -> Christmas, New Year (short term, less than a year)
3. Cyclic -> Business cycles, economic forces (mid term)
4. Irregular/Random -> Can’t be predicted (e.g., COVID)
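
To make the trend and seasonality components concrete, the sketch below builds a synthetic quarterly series (a straight-line trend plus a repeating four-period pattern) and smooths it with a moving average whose window equals the season length. The series, the `SEASONAL` pattern, and the function name are invented for illustration; real data would also contain cyclic and irregular components.

```python
def moving_average(series, window):
    """Trailing moving average; the first value covers series[0:window]."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Synthetic data: trend of +2 per period plus a seasonal pattern summing to 0.
SEASONAL = [3, -1, -2, 0]
series = [10 + 2 * t + SEASONAL[t % 4] for t in range(12)]

# A window equal to the season length averages the seasonal pattern away.
trend = moving_average(series, window=4)
steps = [later - earlier for earlier, later in zip(trend, trend[1:])]

print("series:", series)
print("trend: ", trend)
print("steps: ", steps)
```

Each smoothed value rises by exactly 2, recovering the underlying trend that the seasonal swings hide in the raw series.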
Visualization
Some Visualization Basics-1
Bar graph: Compare things between 2 different groups or track changes over time
Scatter plot: Understand how much one metric is affecting another
Line chart: Track changes over time
Some Visualization Basics-2
Area chart: Similar to line chart. Track changes over time.
Box plot: Understand distribution, mean, median, quartiles etc.
Gauge chart: Goals or target
Self-Service BI Platform
Traditional                  Self Service
Analyst creates the report   Business creates the report
Standard Reports             Customized Reports
IT driven                    Department Driven

Challenges
• Educating the user on the capability and limitation of the tool
• End-user adoption

Advantages
• Offers greater flexibility
• Less dependency on the service organization
• End User Driven
• Cost Effective
Why should we learn PBI?
Roadmap of Different Roles

Data Analyst Data Engineer Data Scientist


Emerging Roles
Analytics Maturity- DELTA Model
ROI/TTV
Why is it difficult to measure RoI?

• The impact of an analytical initiative can reap rich dividends across multiple divisions/geographies/departments, making it harder to measure the RoI.

• Intangible benefits due to analytics. Example: applying analytics may reduce an employee's manual effort by 20%, resulting in better morale and lower employee turnover.

Levers to Improve the Impact of Analytics

• Clear Objective and Scope
• Last Mile Acceptance
• Data Literacy
• Begin with a Pilot
• Make or Buy
Who said this?
How will you measure your life?

"People often think that the best way to predict the future is by collecting as much data as possible before making a decision. But this… is like driving a car looking only at the rearview mirror, because data is only available about the past."

https://www.youtube.com/watch?v=pk35J2u8KqY
