1 - Intro To ML System Design
Building ML solutions:
From System Design to Deployment
Alexander Guschin
Mikhail Rozhkov
OUTLINE
3. Design ML Product
4. Problem Statement
Exercise
● Stakeholder interview simulation: gather information about the business problem, stakeholders and requirements.
● Goal: Understand requirements for ML Product
Output
1. Interviewing a client to gather requirements.
2. Drafting the first part of a design document: ML Product Design
a. Problem Statement (Motivation)
b. Value Proposition
c. Customers
d. Business Metrics (Success)
e. Assumptions and Constraints
Introduction to ML System Design
System design <> ML system design
I’m a designer now!
Source: How To Answer Any Machine Learning System Design Interview Question
What is System Design?
Source:
- https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers
ML System Design Document
System design <> ML system design
Business Model Canvas is a way to understand your business
● Simple
● Informative
● Efficient
Source:
- https://medium.com/@niloal361/what-is-the-business-model-canvas-e0f3e7816a4f
- https://hbr.org/2013/05/a-better-way-to-think-about-yo
Example: Netflix
Source:
- https://global.thepower.education/blog/business-model-canvas
Example: Toyota
Source:
- https://www.upgrad.com/blog/business-model-canvas-explained-with-examples/
Can it be useful for ML/AI projects?
● Proposed ML solution
● Operational requirements
Source:
- https://madewithml.com/courses/mlops/product-design/
ML goal is to drive the business impact
Overview: Purpose and Impact
Overview
● This project aims to improve the prediction accuracy of taxi trip durations for our company, EasyRide. We currently rely on an external provider with a MAPE > 30%.
Key points:
● Problem: High MAPE in current trip duration predictions.
● Solution: Develop an in-house ML model.
● Business Impact: Reduce revenue loss and customer churn by improving prediction accuracy.
● Timeline: 1 week for POC, 1 week for testing, decision point thereafter.
Example: ML-powered trip duration prediction for EasyRide Taxi optimizes dispatching, improves pricing accuracy, and reduces customer churn, boosting revenue and satisfaction.
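The overview quotes a MAPE (mean absolute percentage error) above 30% for the external provider. As a minimal sketch of how such a figure could be computed, the snippet below assumes the standard definition, the mean of |actual - predicted| / |actual|, expressed in percent. The function name and the sample trip durations are hypothetical and are not EasyRide data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Relative error of each prediction, then the average over all trips.
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical trip durations in minutes (illustrative only).
actual_minutes = [10.0, 20.0, 40.0]
predicted_minutes = [13.0, 15.0, 52.0]

score = mape(actual_minutes, predicted_minutes)
print(f"MAPE: {score:.1f}%")
```

With these made-up numbers the relative errors are 30%, 25% and 30%, so the result sits just under the 30% level mentioned in the overview.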
Problem Statement
> ML Product Design
> Guide: 2.1 - Problem Statement
2.1 - Problem Statement (Motivation)
Purpose
● Clearly define the business problem and its relevance to the organization.
Guiding questions:
● Why is the problem important to solve, and why now?
Example: NewPizza
● NewPizza, a franchise pizzeria, experiences challenges with managing peak times, leading to long wait times in queues. The goal is to predict customer queues a week in advance to optimize staff schedules and reduce service time.
● Accurate queue prediction allows for efficient staff allocation, improving customer experience by minimizing wait times and enhancing operational efficiency.
● Currently, the inability to predict queues results in suboptimal staffing, longer wait times for customers, and potential loss of sales due to customer dissatisfaction.
Key points:
● Problem: Significant variations in queue times, with more than 10% of customers waiting over 5 minutes.
● Current Approach: Manual planning of staffing by the owner.
● Industry Context: Fast-paced food service where quick turnaround is essential for customer satisfaction.
● Alignment: Directly supports NewPizza’s goal of enhancing customer service by ensuring a fast and efficient ordering process.
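The NewPizza problem is stated as a measurable quantity: the share of customers waiting over 5 minutes. A minimal sketch of that calculation follows. The function name, the default threshold parameter and the sample wait times are assumptions made for illustration, not data from the case.

```python
def share_waiting_over(wait_minutes, threshold_minutes=5.0):
    """Fraction of customers whose wait exceeded the threshold."""
    if not wait_minutes:
        raise ValueError("at least one wait time is required")
    over = sum(1 for w in wait_minutes if w > threshold_minutes)
    return over / len(wait_minutes)


# Hypothetical queue wait times in minutes for ten customers.
sample_waits = [1.5, 2.0, 6.5, 3.0, 4.0, 7.0, 2.5, 1.0, 3.5, 4.5]

share = share_waiting_over(sample_waits)
print(f"{share:.0%} of customers waited over 5 minutes")
```

Tracking this share before and after a staffing change gives a direct check on whether the problem statement is being addressed.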
Exercise: Client Interview
What information do we need? Who should we ask?
Group task:
● 15 min
Example: EasyRide Taxi
● EasyRide Taxi, a leading ride-hailing service in New York City, faces challenges in efficiently dispatching taxis and providing accurate estimated arrival times (ETAs) to customers.
● Accurate trip duration prediction is crucial for optimizing fleet management, improving customer satisfaction, and maximizing driver utilization.
Key points:
● Problem: High MAPE > 30%.
● Current Approach: External provider's service.
● Industry Context: Competitive market with accurate pricing as a differentiator.
● Alignment: Critical to EasyRide's strategy of superior customer service.
ML Product Design: EasyRide Taxi
Problem Statement
● High MAPE > 30%.
● External provider's prediction service (we can’t improve).
● Competitive market with accurate pricing as a differentiator.
● Critical to EasyRide's strategy of superior customer service.
Methodology, Value Proposition, Solution, Customers, Validation, App/UI/UX: (empty)
Customers & Value Proposition
> ML Product Design
> Guide: 2.2 - Customers
> Guide: 2.3 - Value Proposition
2.2 - Customers
Who are our customers?
Purpose
● To ensure all relevant perspectives are considered and to clarify who will be using or impacted by the system.
● To justify the use of AI/ML over traditional approaches and highlight its unique benefits.
Guiding questions:
● Who will be directly using the ML system?
● Whose work or processes will be affected by the system?
● How does AI/ML solve this problem better than traditional methods?
● What new capabilities does AI/ML bring to our business?
Practice
● 5 min
Customers
Who are our customers?
● Taxi app customers: unpredictable fares, long wait times, inaccurate ETAs, lack of pricing transparency.
● Taxi drivers: inefficient trip allocations, excessive idle time, unpredictable earnings, wasted fuel during empty rides.
● Business Leaders: revenue loss, difficulty in maintaining market share, customer churn.
● Internal Teams: inaccurate dispatching decisions, lack of real-time data insights, difficulty in optimizing operations, challenges in improving service quality.
Key points:
● Taxi app customers
● Taxi drivers
Notes:
- ETA stands for Estimated Time of Arrival.
- Instead of Customer Pains you may want to write down Customer Needs.
Value Proposition
Why is AI/ML required?
● Enhances pricing accuracy, minimizing revenue loss from over/underpricing
● Optimizes dispatching efficiency, reducing idle time and fuel costs
● Improves customer retention by providing more reliable ETAs
● Increases driver satisfaction through better trip allocation
● Adapts continuously to NYC's dynamic urban environment
Key points:
● Minimize revenue loss
● Improve customer retention
● Improve driver retention
ML Product Design: EasyRide Taxi
Problem Statement
● High MAPE > 30%.
● External provider's prediction service (we can’t improve).
● Competitive market with accurate pricing as a differentiator.
● Critical to EasyRide's strategy of superior customer service.
Value Proposition
● Minimize revenue loss
● Improve customer retention
● Improve driver retention
Customers
● Taxi app customers
● Taxi drivers
Methodology, Solution, Validation, App/UI/UX: (empty)
Business Metrics (Success)
> ML Product Design
> Guide: 2.4 - Business Metrics (Success)
> Guide: 2.5 - Cost Structure & ROI
> Guide: 2.6 - Assumptions and Constraints
2.4 - Business Metrics (Success)
How do we measure success?
Purpose
● To establish clear, quantifiable goals that align business objectives with technical performance.
● To justify the investment in the ML system and set realistic expectations for financial returns.
Guiding questions:
● How will we measure the success of this ML system?
● What metrics align with our business objectives?
● What are all the costs associated with this project?
● How will the costs change over time?
Hints:
- Estimate costs after a Solution draft is complete
Exercise: Business Metrics
How do we measure success?
Practice
Group task:
● 10 min
Key points:
Business Metrics (Success)
How do we measure success?
Success is measured by improved dispatching efficiency, pricing accuracy, and reduced customer churn. Primary Metrics and improvements:
1. Daily Revenue Increase (by dispatching efficiency improvement):
a. Current: $0 (baseline)
b. Target: $24,000 per day
2. Pricing Loss Reduction (by pricing accuracy improvement):
a. Current: $35,500 per day
b. Target: $18,000 per day
3. Booking Rate Improvement (by customer churn reduction):
a. Current: 88%
b. Target: 94%
Key points:
● Daily Revenue Increase by $24,000
● Pricing Loss Reduction by $17,500 per day (from $35,500 to $18,000)
● Booking Rate Improvement by 6 percentage points (from 88% to 94%)
● Timeline: Evaluate metrics daily, with quarterly reviews
Example Formulas
How do we calculate metrics?
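The formulas from this slide are not preserved in the text, so what follows is only a sketch under stated assumptions: each daily improvement is taken as the difference between the current and target values from the Business Metrics example, and the annual figure is assumed to be the daily figure times 365. All variable names are hypothetical. Under these assumptions the results match the annual dispatching and pricing benefits quoted in the Cost Structure & ROI example.

```python
DAYS_PER_YEAR = 365  # assumption: benefits accrue every day of the year

# Daily values taken from the Business Metrics example (USD per day).
daily_dispatch_gain = 24_000 - 0      # target revenue increase minus baseline
daily_pricing_gain = 35_500 - 18_000  # current pricing loss minus target loss

# Annualized benefits under the 365-day assumption.
annual_dispatch_benefit = daily_dispatch_gain * DAYS_PER_YEAR
annual_pricing_benefit = daily_pricing_gain * DAYS_PER_YEAR
annual_benefit = annual_dispatch_benefit + annual_pricing_benefit

# Booking rate change expressed in percentage points.
booking_rate_gain_pp = 94 - 88

print(f"Dispatching: ${annual_dispatch_benefit:,} per year")
print(f"Pricing:     ${annual_pricing_benefit:,} per year")
print(f"Total:       ${annual_benefit:,} per year")
print(f"Booking rate: +{booking_rate_gain_pp} percentage points")
```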
Assumptions & Expectations
How do we measure success?
Cost Structure & ROI
How do we measure success?
Costs involve initial development and ongoing operations. Substantial financial benefits are expected from improved dispatching and pricing accuracy.
1. Costs:
a. Initial Development: $214,083
b. Annual Operations: $386,000
c. First Year Total Cost: $600,083 (Initial Development + First Year Operation)
d. Subsequent Annual Cost: $386,000
2. Financial Benefits:
a. Annual Benefit: $15,147,500
■ Dispatching: $8,760,000
■ Pricing: $6,387,500
3. ROI:
a. First Year: 2,424%
b. Subsequent Years: 3,825%
Key points:
● Initial Development: $214,083
● Annual Operations: $386,000
● Annual Benefit: $15,147,500
● ROI:
○ First Year: 2,424%
○ Subsequent Years: 3,825%
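The slide does not state how ROI is calculated. The sketch below assumes the common definition ROI = (benefit - cost) / cost, applied to the cost and benefit figures above. The function and variable names are hypothetical. Under this assumption the first-year value comes out at about 2,424% and the subsequent-year value at about 3,824%, within one percentage point of the slide's figures. The small gap may come from rounding in the inputs.

```python
def roi_percent(benefit, cost):
    """Return on investment in percent: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return 100.0 * (benefit - cost) / cost


# Figures from the Cost Structure & ROI example (USD).
initial_development = 214_083
annual_operations = 386_000
annual_benefit = 15_147_500

# First year pays for development plus one year of operations.
first_year_cost = initial_development + annual_operations

first_year_roi = roi_percent(annual_benefit, first_year_cost)
subsequent_year_roi = roi_percent(annual_benefit, annual_operations)

print(f"First year cost: ${first_year_cost:,}")
print(f"First year ROI: {first_year_roi:,.0f}%")
print(f"Subsequent years ROI: {subsequent_year_roi:,.0f}%")
```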
ML Product Design: EasyRide Taxi
Problem Statement
● High MAPE > 30%.
● External provider's prediction service (we can’t improve).
● Competitive market with accurate pricing as a differentiator.
● Critical to EasyRide's strategy of superior customer service.
Value Proposition
● Minimize revenue loss
● Improve customer retention
● Improve driver retention
Customers
● Taxi app customers
● Taxi drivers
Methodology, Solution, Validation, App/UI/UX: (empty)