
MLOPS Case Study Submitted by: Ritik Karir

Q1. System design: Based on the above information, describe the KPI that the
business should track.

Answer:
1. Model-level KPIs (technical efficiency)

• Accuracy: Measures the overall correctness of the model when classifying images as "Dent," "Scratch,"
or "No damage."

• Precision, recall, and F1-score (per class):

• Precision: Indicates how many of the detected damages (dent/scratch) are classified correctly, which
reduces false positives.

• Recall: Indicates how many of the actual damages are detected, which reduces false negatives.

• F1-score: Balances precision and recall to provide a single performance metric.

• Confusion matrix: Visualizes performance across the different damage categories, helping to identify
misclassification patterns.

• Inference latency: Measures the time taken by the model to process an image and return a result,
which is critical for real-time applications.

2. Business impact KPIs (operational efficiency)

• Pricing accuracy index: Evaluates how close the resale prices estimated with the model's output are to
actual sale prices in the market.

• Reduction in manual inspection costs: Tracks the cost savings achieved by reducing the need for manual
car inspections.

• Lead-to-sale conversion rate: Monitors how faster, automatic damage detection improves the speed at
which listings are converted into successful sales.

• Operational scalability: Measures the system's ability to handle an increasing volume of car images
without a drop in performance.

3. Model reliability and maintenance KPIs (deployment and monitoring)

• Data drift detection rate: Tracks changes in the data distribution (e.g., lighting conditions in new
images) that may affect model performance.

• Model retraining frequency: Tracks how often the model requires retraining due to a decline in
performance or the availability of newly annotated data.
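
For illustration, here is a minimal Python sketch of how the model-level KPIs above could be computed with scikit-learn; the class labels and the y_true / y_pred arrays are placeholder values, not outputs of the actual system.

# Computing the model-level KPIs with scikit-learn.
# Class labels and the y_true / y_pred arrays are placeholder values.
from sklearn.metrics import classification_report, confusion_matrix

CLASSES = ["Dent", "Scratch", "None"]

y_true = ["Dent", "Scratch", "None", "Dent", "None"]   # ground-truth labels
y_pred = ["Dent", "None", "None", "Scratch", "None"]   # model predictions

# Per-class precision, recall, F1-score plus overall accuracy
print(classification_report(y_true, y_pred, labels=CLASSES))

# Confusion matrix to reveal misclassification patterns between damage types
print(confusion_matrix(y_true, y_pred, labels=CLASSES))
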
Q2. System design: Your company has decided to build an MLOps system. What
advantages would you get from building an MLOps system rather than a simple
model?

Answer:
• Scalability: MLOps allows the system to handle large volumes of car images and data, supporting the
company's growing international operations.

• Automation: Automates the entire ML lifecycle (data preprocessing, model training, deployment, and
monitoring), reducing manual effort and human error.

• Continuous integration and deployment (CI/CD): Enables fast, reliable model updates and improvements
without disrupting existing services.

• Model monitoring and drift detection: Continuously tracks model performance in production, detects data
drift (e.g., due to poor lighting), and triggers retraining when needed.

• Reproducibility and traceability: Maintains a complete record of model versions, experiments, and datasets,
enabling easy replication and comparison of results.

• Better collaboration: Provides consistent workflows between data scientists, engineers, and business teams
through standardized pipelines and shared tooling.

• Cost efficiency: Reduces operating costs by cutting manual intervention, optimizing resource usage, and
shortening time-to-market for new models.

Q3: System design: You must create an ML system that has the features of a
complete production stack, from experiment tracking to automated model
deployment and monitoring. For this problem, create an ML system design
(diagram)

Answer:
(Architecture diagram: Data Sources → Data Ingestion & Storage (AWS S3 data lake) → ETL & Data Processing
Pipeline (Airflow/Kubeflow) → Experimentation & Training (TensorFlow/Keras, MLflow) → Model Registry (MLflow) →
Deployment & Inference (Docker, Kubernetes, Flask/FastAPI) → Monitoring, Logging & Alerting (Prometheus, Grafana,
ELK), with a CI/CD pipeline and a retraining feedback loop connecting monitoring back to data ingestion and
training. Each component is explained in Q4.)
Q4. System design: After creating the architecture,
please specify your reason for choosing the specific
tools you chose for the use case.
Answer:
Explanation of why I chose these technologies for the use case:

Explanation of the Diagram:

Data Sources:

• Input: Car images and annotations from multiple sources (user uploads, partner dealerships, legacy
systems).

Data Ingestion & Storage:

• Function: Consolidate and store raw data in a centralized data lake (e.g., AWS S3).

ETL & Data Processing Pipeline:

• Tools: Orchestrated with Airflow or Kubeflow Pipelines.

• Function: Clean, augment, and transform the data to prepare it for training.

Experimentation & Training Layer:

• Tools: TensorFlow/Keras for model building; MLflow for experiment tracking (logging parameters, metrics,
and artifacts).

• Function: Run training experiments, tune models, and compare performance.

Model Registry & Artifact Store:

• Tools: MLflow Model Registry.

• Function: Maintain version control and organize the best-performing models for production use.

Deployment & Inference Layer:

• Tools: Docker for containerization; Kubernetes for orchestration; Flask/FastAPI for RESTful API endpoints.

• Function: Package and deploy the model to serve real-time predictions.

Monitoring, Logging & Alerting:

• Tools: Prometheus and Grafana for metrics, ELK Stack for logging, and custom drift detection solutions.
• Function: Continuously monitor the deployed model's performance, log issues, and trigger alerts if
metrics (such as drift or latency) fall out of acceptable ranges.

CI/CD Pipeline:

• Tools: Jenkins, GitLab CI/CD, or GitHub Actions.

• Function: Automate testing, building, and deployment processes, ensuring smooth transitions from
development to production.

Q5. Workflow of the solution:


You must specify the steps that should be taken to build
such a system end to end.
The steps should mention the tools used in each of the
components and how they are connected with one
another to solve the problem.
Answer:
1. Data Ingestion & Preprocessing

Data Sources & Storage:

What: Collect car images and annotations from various sources (user uploads, partner dealerships, legacy
systems).

Where: Store the raw data in a centralized cloud data lake (e.g., AWS S3, Azure Blob Storage).
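
A minimal sketch of the ingestion step, assuming an AWS S3 data lake accessed via boto3; the bucket name, key layout, and the ingest_image helper are hypothetical.

# Landing a raw car image in the S3 data lake under a source-specific prefix.
# Bucket name, key layout, and the ingest_image helper are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "car-damage-raw-data"  # assumed data-lake bucket

def ingest_image(local_path: str, source: str, image_id: str) -> None:
    # e.g. raw/user_uploads/car_001.jpg
    s3.upload_file(local_path, BUCKET, f"raw/{source}/{image_id}.jpg")

ingest_image("samples/car_001.jpg", source="user_uploads", image_id="car_001")
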

ETL & Data Processing:

Tool: Apache Airflow or Kubeflow Pipelines


How:

Schedule and orchestrate ETL jobs that extract raw images, clean them, and apply preprocessing steps.

Use Python libraries (e.g., TensorFlow’s ImageDataGenerator) to perform image augmentation (rotation, scaling,
brightness adjustments) and normalization.

Outcome: Preprocessed images are stored in a designated training repository.
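
A sketch of how this ETL step could be orchestrated, assuming an Apache Airflow DAG whose Python task applies Keras ImageDataGenerator augmentation; the DAG id, directory paths, schedule, and parameter values are illustrative assumptions (exact Airflow arguments may vary by version).

# Airflow DAG orchestrating the daily preprocessing job; the task uses Keras
# ImageDataGenerator for augmentation and normalization.
# DAG id, directory paths, and schedule are illustrative assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess_images():
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    datagen = ImageDataGenerator(
        rescale=1.0 / 255,            # normalization
        rotation_range=15,            # rotation augmentation
        zoom_range=0.1,               # scaling augmentation
        brightness_range=(0.7, 1.3),  # brightness adjustments
    )
    # Stream raw images and persist augmented copies to the training repository
    flow = datagen.flow_from_directory(
        "/data/raw", target_size=(224, 224), batch_size=32,
        save_to_dir="/data/processed", save_format="jpeg")
    for _ in range(len(flow)):  # one pass over the raw images
        next(flow)

with DAG(
    dag_id="car_damage_preprocessing",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="preprocess_images", python_callable=preprocess_images)
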

2. Experimentation & Model Training

Model Building:

Tool: TensorFlow/Keras

How:

Design a Convolutional Neural Network (CNN) to classify images into “Dent,” “Scratch,” or “None.”

Experiment with various architectures and hyperparameters.

Experiment Tracking:

Tool: MLflow

How:

Log hyperparameters, training metrics (accuracy, loss), and model artifacts during each experiment.

Compare different experiment runs to select the best-performing model.

Integration:

The preprocessed data from the ETL pipeline feeds directly into the training scripts, ensuring consistent input for
experiments.
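
A condensed sketch of the training-plus-tracking step, assuming TensorFlow/Keras with MLflow; the architecture, hyperparameters, and the randomly generated placeholder data are illustrative only, and the exact MLflow model-logging flavor call may differ by version.

# Small CNN for Dent/Scratch/None classification, tracked with MLflow.
# Placeholder random data stands in for the preprocessed ETL output.
import mlflow
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes: int = 3) -> tf.keras.Model:
    # Conv/Pool blocks followed by a dense classifier head
    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

x = np.random.rand(32, 224, 224, 3).astype("float32")          # placeholder images
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, 32), 3)  # placeholder labels

with mlflow.start_run(run_name="cnn_baseline"):
    mlflow.log_params({"batch_size": 8, "epochs": 2})
    model = build_model()
    history = model.fit(x, y, validation_split=0.25, epochs=2, batch_size=8)
    mlflow.log_metric("val_accuracy", float(max(history.history["val_accuracy"])))
    mlflow.keras.log_model(model, "model")  # store the artifact for later registration
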

3. Model Evaluation & Registration

Evaluation:

What: Assess model performance using metrics such as accuracy, precision, recall, and F1-score.

Model Registry:

Tool: MLflow Model Registry

How:

Register the best-performing model version.

Maintain version control and metadata (e.g., training parameters, experiment logs) to enable rollback if necessary.
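
A brief sketch of the registration step with the MLflow Model Registry; the run id, registry name, and description text are hypothetical.

# Registering the best run's model in the MLflow Model Registry.
# The run id, registry name, and description are hypothetical.
import mlflow
from mlflow.tracking import MlflowClient

RUN_ID = "abc123def456"              # id of the best-performing experiment run
MODEL_NAME = "car-damage-classifier"

# Create (or add a new version to) a named registry entry from the run artifact
result = mlflow.register_model(f"runs:/{RUN_ID}/model", MODEL_NAME)

# Attach metadata so the version can be audited or rolled back later
MlflowClient().update_model_version(
    name=MODEL_NAME,
    version=result.version,
    description="CNN baseline trained on the latest preprocessed dataset",
)
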

4. Automated Deployment & Inference


Containerization:

Tool: Docker

How: Package the trained model along with its inference server (using Flask or FastAPI) into a Docker container.

Orchestration & Deployment:

Tool: Kubernetes

How:

Deploy the containerized model to a Kubernetes cluster.

Use Kubernetes Ingress and Horizontal Pod Autoscaler to manage load balancing and auto-scale the service.

Inference API:

What: Expose a RESTful endpoint that accepts car images and returns the predicted damage classification.
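
A minimal sketch of this inference endpoint, assuming FastAPI with a Keras model loaded from a path baked into the container; the model path, input size, and label order are assumptions.

# Inference API: accepts an uploaded car image and returns the predicted
# damage class. Model path, input size, and label order are assumptions.
import io
import numpy as np
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
CLASSES = ["Dent", "Scratch", "None"]
model = tf.keras.models.load_model("/models/car_damage_model")  # baked into the image

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Preprocess the upload the same way as the training pipeline
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    array = np.asarray(image.resize((224, 224)), dtype="float32") / 255.0
    probs = model.predict(array[np.newaxis, ...])[0]
    return {"label": CLASSES[int(np.argmax(probs))],
            "confidence": float(np.max(probs))}

Inside the Docker container this app would be served by an ASGI server such as Uvicorn, and Kubernetes would route traffic to it through a Service/Ingress.
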

5. Continuous Monitoring, Logging & Alerting

Performance Monitoring:

Tools: Prometheus (for metrics collection) and Grafana (for dashboard visualization)

How:

Monitor key metrics such as inference latency, throughput, and error rates.
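
A sketch of how these metrics could be exported for Prometheus to scrape, using the standard prometheus_client library; the metric names and the model_predict stub are placeholders.

# Exporting inference latency and request counts with prometheus_client.
# Metric names and the model_predict stub are placeholders.
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Total prediction requests", ["status"])
LATENCY = Histogram("inference_latency_seconds", "Time spent per prediction")

def model_predict(image):
    return "None"  # stub standing in for the real model call

def predict_with_metrics(image):
    start = time.time()
    try:
        result = model_predict(image)
        REQUESTS.labels(status="success").inc()
        return result
    except Exception:
        REQUESTS.labels(status="error").inc()
        raise
    finally:
        LATENCY.observe(time.time() - start)

start_http_server(8001)  # expose /metrics on port 8001 for Prometheus to scrape
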

Drift Detection:

Approach:

Implement custom drift detection scripts or use libraries (e.g., Evidently AI) to continuously compare current input
data distributions against historical baselines.

Monitor prediction distributions to detect anomalies such as poor lighting conditions.
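
A sketch of a custom drift check along these lines: it compares the brightness distribution of recent production images against a training-time baseline with a two-sample Kolmogorov-Smirnov test from SciPy. The brightness feature, p-value threshold, and synthetic example data are assumptions.

# Custom drift check: compare mean image brightness between a training-time
# baseline and recent production images using a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def mean_brightness(images: np.ndarray) -> np.ndarray:
    # Reduce each (H, W, 3) image to a single average-brightness value
    return images.mean(axis=(1, 2, 3))

def detect_drift(baseline_images, recent_images, p_threshold=0.01) -> bool:
    stat, p_value = ks_2samp(mean_brightness(baseline_images),
                             mean_brightness(recent_images))
    return p_value < p_threshold  # True means the distributions differ: flag drift

# Synthetic example: recent images are systematically darker (poor lighting)
baseline = np.random.rand(200, 64, 64, 3)
recent = np.random.rand(200, 64, 64, 3) * 0.5
print("Drift detected:", detect_drift(baseline, recent))
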

Logging & Alerting:

Tools: ELK Stack (Elasticsearch, Logstash, Kibana) or Splunk

How:

Aggregate logs from the deployed service for debugging and historical analysis.

Set up alerts (using Prometheus Alertmanager or PagerDuty) to notify stakeholders if performance or drift metrics
exceed predefined thresholds.

6. Automated Retraining & CI/CD Integration

Retraining Triggers:
When:

If drift is detected (e.g., due to lighting issues) or when new annotated data is available.

How:

The drift monitoring or data ingestion pipeline (monitored via Airflow/Kubeflow) automatically triggers a retraining
job.

Retraining Pipeline:

Process:

Load the latest preprocessed data and retrain the model.

Log new experiments via MLflow and compare against the current production model.

If the updated model performs better, register the new version in the model registry.
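
A sketch of this compare-and-promote step, assuming the MLflow Model Registry with its Staging/Production stages; the model name, metric key, and promotion rule are assumptions.

# Promote the Staging candidate to Production only if its validation accuracy
# beats the current Production model. Names and stages follow standard MLflow
# registry usage; the specific values are assumptions.
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "car-damage-classifier"

def val_accuracy(version) -> float:
    # Look up the val_accuracy metric logged by the run that produced this version
    return client.get_run(version.run_id).data.metrics.get("val_accuracy", 0.0)

candidate = client.get_latest_versions(MODEL_NAME, stages=["Staging"])[0]
production = client.get_latest_versions(MODEL_NAME, stages=["Production"])

if not production or val_accuracy(candidate) > val_accuracy(production[0]):
    client.transition_model_version_stage(
        name=MODEL_NAME,
        version=candidate.version,
        stage="Production",
        archive_existing_versions=True,  # retire the previous Production version
    )
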

CI/CD Pipeline:

Tools: Jenkins, GitLab CI/CD, or GitHub Actions

How:

Automatically test, build, and deploy new models as soon as they pass integration and performance tests.

Ensure seamless updates from development to production.
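
A sketch of a performance gate that such a CI pipeline could run (for example, as a pytest step in a GitHub Actions workflow); the accuracy threshold and artifact paths are assumptions.

# CI performance gate: the build fails if the candidate model drops below a
# minimum validation accuracy. Threshold and artifact paths are assumptions.
import numpy as np
import tensorflow as tf

MIN_ACCURACY = 0.90  # assumed acceptance threshold

def test_candidate_model_meets_accuracy_bar():
    model = tf.keras.models.load_model("artifacts/candidate_model")
    x_val = np.load("artifacts/x_val.npy")  # held-out validation images
    y_val = np.load("artifacts/y_val.npy")  # one-hot encoded labels
    _, accuracy = model.evaluate(x_val, y_val, verbose=0)
    assert accuracy >= MIN_ACCURACY, f"accuracy {accuracy:.3f} is below the gate"
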

7. Feedback Loop

User Feedback:

How:

Collect user and system feedback (via operational metrics and logs) to identify areas for further improvement.

Continuous Improvement:

Outcome:

The feedback loop feeds back into the data ingestion layer, triggering further retraining and fine-tuning of the
model.

Integration Overview

Data Flow:

Raw images → ETL Pipeline (Airflow/Kubeflow) → Preprocessed Data → Training Pipeline (TensorFlow/Keras,
MLflow)
Experimentation & Versioning:

Training experiments → MLflow logging → Model Registry → Dockerized Deployment

Real-Time Inference:

Deployed RESTful API (Flask/FastAPI) on Kubernetes → Inference & Prediction → Monitoring


(Prometheus/Grafana)

Monitoring & Retraining:

Continuous drift and performance monitoring → Alerting → Automated retraining (triggered via CI/CD)

CI/CD Integration:

Automated build, test, and deployment cycles ensure that new code or models are smoothly transitioned to
production.
