Materials
Why now?
Data Storage: Data Processing:
Other Factors:
• Cloud Computing
• Improvement in Devices
• Internet Penetration
The Data Ecosystem
Type              Characteristics
Structured        Pre-defined format
Unstructured      No pre-defined format
Semi-Structured   Mix of both and has metadata
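To make the three data types concrete, here is a minimal sketch using only the Python standard library. The sample strings (orders, tags, the customer note) are invented for illustration and do not come from the course material.

```python
import csv
import io
import json

# Structured: every row follows the same pre-defined columns.
structured = "order_id,amount\n1,250\n2,400\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: no fixed schema, but keys (metadata) describe each value.
semi_structured = '{"order_id": 3, "amount": 120, "tags": ["gift", "express"]}'
record = json.loads(semi_structured)

# Unstructured: free text with no pre-defined format; meaning must be extracted.
unstructured = "Customer called to say the parcel arrived late but intact."
word_count = len(unstructured.split())

print(rows[0]["amount"], record["tags"], word_count)
```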
Analytics
Types of Analytics
Listed from lowest to highest value, time and complexity (figure axes: Value against Time & Complexity):
• Descriptive: What are my sales?
• Diagnostic: Why are my sales low?
• Predictive: What will my sales be next year?
• Prescriptive: Who should I target?
CRISP-DM- Framework for Analytics and Visualization
Data Understanding
Transformation:
5. Date formatting (you have daily sales data, but you want to plot a graph using monthly sales data)
6. Adding calculated fields (you have revenue and expense in your data; profit can be a calculated field)
7. Units (e.g. money represented in K, M, B for thousand, million and billion)
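The three transformations above can be sketched in plain Python. The daily records and the helper name `with_unit` are hypothetical, chosen only to illustrate the idea; a BI tool or a dataframe library would normally do this work.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily records: (day, revenue, expense).
daily = [
    (date(2024, 1, 30), 1200.0, 800.0),
    (date(2024, 1, 31), 900.0, 650.0),
    (date(2024, 2, 1), 1500.0, 1000.0),
]

# Date formatting: roll daily rows up to a "YYYY-MM" month key.
monthly_revenue = defaultdict(float)
for day, revenue, _expense in daily:
    monthly_revenue[day.strftime("%Y-%m")] += revenue

# Calculated field: profit is derived from two existing columns.
profits = [revenue - expense for _day, revenue, expense in daily]


def with_unit(value):
    """Format a number with K, M or B for thousand, million, billion."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:.1f}{suffix}"
    return f"{value:.0f}"


print(dict(monthly_revenue), profits, with_unit(monthly_revenue["2024-01"]))
```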
Modelling- Identification of Right Algorithms
Supervised learning (Task Driven)
Supervised learning uses a training set to teach models to yield the desired output. This training dataset includes inputs and correct outputs, which allow the model to learn over time.
Typical tasks: classification and regression.

Unsupervised learning (Data Driven)
Unsupervised learning uses machine learning algorithms to analyze and cluster unlabelled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. Its ability to discover similarities and differences in information makes it useful for cross-selling strategies, customer segmentation, and image recognition.
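The contrast between the two approaches can be shown on the same toy data. This is a minimal sketch, not a production algorithm: the spend figures are invented, the supervised part is a nearest-centroid classifier, and the unsupervised part is a tiny one-dimensional k-means with k fixed at 2. Both choices are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical monthly spend per customer.
spend = [10.0, 12.0, 11.0, 50.0, 52.0, 49.0]

# Supervised (task driven): correct outputs are supplied with the inputs.
labels = ["low", "low", "low", "high", "high", "high"]
centroids = {
    label: mean(x for x, y in zip(spend, labels) if y == label)
    for label in set(labels)
}


def classify(x):
    """Predict the label whose training mean is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Unsupervised (data driven): no labels; group the points by similarity.
def two_means(values, rounds=10):
    """Tiny 1-D k-means with k=2, seeded at the min and max value."""
    centers = [min(values), max(values)]
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


print(classify(15.0), classify(45.0), two_means(spend))
```

The classifier can only answer the question it was trained for (low or high spend), while the clustering step finds the two groups without being told what they mean.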
Evaluation – Which model to choose?
Model     Features   Accuracy
Model 1   10         80%
Model 2   20         95%
Model 3   30         97%
Recall: TP / (TP + FN)
F1-Score: 2 * Precision * Recall / (Precision + Recall)
Confusion matrix (surviving row): Negative | False Negative | True Negative
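The metrics above can be computed directly from confusion-matrix counts. In this sketch the counts are hypothetical, and the precision formula, TP / (TP + FP), is the standard definition rather than something stated in the surviving slide text.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive (standard definition)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 20, 30

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

Accuracy alone can hide poor recall when one class is rare, which is one reason to look at these metrics together when choosing between models.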
Errors: Regression/Forecasting
Root Mean Square Error: the square root of the average of squared differences between prediction and actual observation.
Mean Absolute Error: the average of absolute differences between prediction and actual observation.
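Both error measures follow directly from their definitions. A minimal sketch, with made-up actual and predicted values:

```python
from math import sqrt


def rmse(actual, predicted):
    """Square root of the mean of squared prediction errors."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean of absolute prediction errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical observations and forecasts.
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [110.0, 115.0, 100.0, 110.0]

print(rmse(actual, predicted), mae(actual, predicted))
```

Because RMSE squares each error before averaging, large misses weigh more heavily than in MAE, so RMSE is never smaller than MAE on the same data.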
Identify Me?
Ethics
TTV
Forecasting
Forecasting- Predict the Score
Model Demand
1 70
2 80
3 65
4 75
5 45
Forecast the Sales-2
You are an operations manager for a milk procurement company. You have been assigned the task of predicting the demand for next week's sales. Your data science team comes up with five models, each with its own demand forecast.
Model Demand
1 70
2 80
3 65
4 75
5 45
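The exercise leaves open how to choose among the five forecasts. One option, assumed here and not stated in the material, is to combine them instead of picking a single model. The sketch below compares the mean with the median, which is less affected by the outlying forecast of 45.

```python
from statistics import mean, median

# Demand forecasts from the five models in the table above.
forecasts = {1: 70, 2: 80, 3: 65, 4: 75, 5: 45}

values = list(forecasts.values())
combined_mean = mean(values)      # pulled down by the low forecast from model 5
combined_median = median(values)  # robust to a single outlying model

print(combined_mean, combined_median)
```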
Forecast the Sales-3
2. You work in a leading retail chain and are tasked with avoiding out-of-stock situations. Which products would have unusual demand?
Walmart
1. Strawberry pop tart
2. Beers
Forecasting- Basics
[Figure: cost (axis from 0 to 70) plotted by year for 2010, 2020 and 2030E]
Time Series Forecasting
A time series is a collection of observations of well-defined data
items obtained through repeated measurements over time.
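As an illustration, a time series can be held as dated observations of one item, and a very simple forecast can be made from the most recent values. The weekly demand figures and the moving-average method are assumptions chosen for this sketch; the material does not prescribe a specific technique.

```python
from datetime import date

# Hypothetical weekly demand: repeated measurements of one well-defined item.
series = [
    (date(2024, 1, 1), 62),
    (date(2024, 1, 8), 65),
    (date(2024, 1, 15), 70),
    (date(2024, 1, 22), 66),
    (date(2024, 1, 29), 71),
]


def moving_average_forecast(values, window=3):
    """Forecast the next point as the mean of the last `window` observations."""
    if len(values) < window:
        raise ValueError("not enough observations for the chosen window")
    recent = values[-window:]
    return sum(recent) / window


demand = [value for _day, value in series]
next_week = moving_average_forecast(demand)

print(next_week)
```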
End User Driven
Challenges:
• Educating the user on the capability and limitation of the tool.
• End-user adoption.
Advantages:
• Offers greater flexibility
• Less dependency on the service organization
• Cost effective
Why should we learn PBI?
Roadmap of Different Roles
• An analytical initiative can reap rich dividends across multiple divisions, geographies and departments, which makes its RoI harder to measure.
"People often think that the best way to predict the future is by collecting as much data as possible before making a decision. But this… is like driving a car looking only at the rearview mirror, because data is only available about the past."
https://www.youtube.com/watch?v=pk35J2u8KqY