
Machine Learning (ML) and

ML Engineering

CSE 473 24wi


Rob Minneker
Administrivia

Due date changes:

PM3: Due Feb 13th. Late deadline Feb 14th. Working to get autograder fixed.

PM 4: Due Feb 26th

RL: Due Feb 19th (out now!)

Uncertainty: Due Feb 23rd

ML: Cancelled

DL/NLP unchanged: Due March 1/8 respectively


ML as a subset of AI

We are getting “narrow” in this course

Homing in on data-driven decision making

New programming paradigm

- The data writes the program, not the programmer
Make a cat/dog image classifier
ML Models learn from data

Remember how Q-learning updated its estimates from other estimates (i.e.
bootstrapping)?

- This is a form of machine learning. We use data to find the patterns/rules and
then the model executes the logic
- This is in opposition to a programmer explicitly writing the logic

Notice how in the second example, we don’t have any rules. The model does all the
work.
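The tabular Q-learning update described above can be sketched in a few lines; the state and action names, learning rate, and discount here are illustrative, not from the programming assignment:

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One temporal-difference update of Q(s, a).

    The target is built from our own current estimates of the next
    state's values (bootstrapping) -- the data drives the update,
    not hand-written rules.
    """
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    target = r + gamma * best_next              # bootstrapped target
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)    # move estimate toward target

q = {}
q_update(q, s="s0", a="right", r=1.0, s_next="s1", actions=["left", "right"])
print(q[("s0", "right")])   # 0.1 after one update from an empty table
```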
Machine learning systems

The model is only a small part of the overall system!

ML engineers cover everything from data collection to deployment

CSE 446 will cover mostly the ML algorithms aspect

Credit: Chip Huyen


Machine learning systems in production

ML development is an iterative process with a lot of back-and-forth

Everything in production is a trade-off

- Memory, time, space, budget, latency, accuracy, precision, recall, safety,
reliability, maintainability, extensibility, etc.

Credit: Chip Huyen


Some ML fundamentals

Garbage in → Garbage out

- The quality of your data matters A LOT

Not all problems should be solved with ML!

- When you have a hammer everything looks like a nail

Models are not static

- They must change over time like the users and their data do
ML System Design Example

Prompt: Design the Twitter recommendation engine

Questions to consider:

- What data do you need? How will you collect it? How much do you need?
- What kind of model(s) will you use? How will you train/evaluate it?
- Will it be online or offline learning? Where will the model live?
- Will there be personalization? If so, how?
- Is there NSFW content that needs to be filtered out?
- Will you process individual tweets or batches? How and why?
- What will the end-user experience look like? How will ML enhance this?
ML System Design: Your turn!

Design the YouTube video recommendation engine

Work together with folks around you!

We’ll come back together to discuss.


ML Engineering Day 2

CSE 473 24wi


Rob Minneker
Agenda

ML (eng) fundamentals continued

ML System Design continued


Types of ML

Applications: NLP, Computer Vision, Robotics, Comp. Bio., Interactive learning,
Convex optimization, etc.

Adjacent fields: Databases, Data Viz, Compilers, Distributed Systems, etc.

[Diagram: branches of machine learning: ML, DL, NLP, RL, ML for Big Data]
https://resources.experfy.com/ai-ml/coding-deep-learning-for-beginners-types-of-machine-learning/
Example ML Workflow (greatly simplified)

1. Experiment in Jupyter Notebooks / Google Colab


a. Find model candidates
2. Scale up training jobs as needed
a. Specialized libraries (Microsoft DeepSpeed, etc.) + specialized compute
(GPU/TPU, etc.)
3. Export models for production inference
a. Export the model to a production format: e.g. Core ML (mobile), ONNX (server), etc.
4. Build production pipeline around the model
a. Optimize business metrics
5. Repeat 1-4 as data changes, needs change, tech changes, etc.
Keeping up with the times

ML Engineering is a field that changes every single day

Reading research papers, (engineering) blogs, OSS repos, documentation, etc.

Learn to be uncomfortable

- Many different mathematical notations, libraries, frameworks, etc.; you will
constantly be using new tools or developing them
ML System Design: Your turn!

Design the TikTok video recommendation engine

Work together with folks around you!

We’ll come back together to discuss.


The real TikTok engine
ML System Design extra resources

Textbook:

Designing Machine Learning Systems: An Iterative Process for Production-Ready


Applications by Chip Huyen

Online resources:

ByteByteGo: Machine Learning System Design Interview

Company engineering blogs: e.g. Uber Engineering blog


ML Engineering Day 3

CSE 473 24wi


Rob Minneker
Content adapted from Chip Huyen
https://huyenchip.com/2022/02/07/data-distribution-shifts-and-monitoring.html
Administrivia

Uncertainty HW released (due next Friday 2/23)

- Check out the Berkeley lectures on Markov Models and HMMs. Links are in the course
calendar and posted on Ed.
Agenda

ML (eng) fundamentals continued

- Data distribution shifts


ML system failures

“ML engineering is more engineering than ML”

Most of the failures of an ML system are due to distributed system failures

- Pure software engineering issues, nothing to do with the model itself. E.g.
dependency failure, deployment failure, hardware failure, downtime/crash, etc.

Some of the time the model/data is at fault


Data distribution shifts

Big issue in supervised learning (most common models deployed in production)

- ML models are trained from labeled datasets
- Inputs (covariates): X
- Outputs (labels): Y

Three major types of shift:

1. Covariate shift
2. Label shift
3. Concept drift
Data distribution shifts

In supervised learning, the training data can be viewed as samples from the joint
distribution: P(X, Y)
Remember, we can decompose the joint distribution two ways:
P(X, Y) = P(Y|X)*P(X) [eqn. 1]
P(X, Y) = P(X|Y)*P(Y) [eqn. 2]
Covariate shift: P(X) changes, but P(Y|X) remains the same. [eqn. 1]
Label shift: P(Y) changes, but P(X|Y) remains the same. [eqn. 2]
Concept drift: P(Y|X) changes, but P(X) remains the same. [eqn. 1]
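The two decompositions can be checked numerically on a toy joint distribution; the probabilities below are made up for illustration:

```python
# Tiny joint distribution over X in {0,1}, Y in {0,1}; verify that
# both factorizations reproduce the same P(X, Y).
P = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

# Marginals P(X) and P(Y) by summing out the other variable
P_x = {x: sum(p for (xi, _), p in P.items() if xi == x) for x in (0, 1)}
P_y = {y: sum(p for (_, yi), p in P.items() if yi == y) for y in (0, 1)}

for (x, y), p in P.items():
    p_y_given_x = p / P_x[x]                        # P(Y|X)
    p_x_given_y = p / P_y[y]                        # P(X|Y)
    assert abs(p_y_given_x * P_x[x] - p) < 1e-12    # eqn. 1
    assert abs(p_x_given_y * P_y[y] - p) < 1e-12    # eqn. 2
```

Each type of shift then corresponds to holding one factor of the chosen decomposition fixed while the other changes.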
Covariate shift

P(X, Y) = P(Y|X)*P(X)

Covariate shift: P(X) changes, but P(Y|X) remains the same.

I.e. the distribution of the input changes, but the conditional probability of a label
given an input remains the same.

Most widely studied form of data distribution shift

Essentially, the input distribution at training time differs from the input
distribution at inference time. There are many possible causes of this.
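One crude way to catch covariate shift in production is to compare a feature's serving distribution against its training distribution. The feature, numbers, and threshold below are all illustrative, not standard values:

```python
import statistics

def drift_score(train_feature, live_feature):
    """Standardized difference in means between the training and
    serving distributions of one feature -- a crude covariate-shift
    signal (real systems often use statistical tests instead)."""
    mu_train = statistics.mean(train_feature)
    mu_live = statistics.mean(live_feature)
    sd = statistics.pstdev(train_feature) or 1.0    # guard against zero std
    return abs(mu_live - mu_train) / sd

train_ages = [22, 25, 31, 40, 35, 28]
live_ages = [55, 61, 58, 63, 59, 60]    # much older users started arriving
if drift_score(train_ages, live_ages) > 3.0:
    print("possible covariate shift in 'age'")
```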
Label shift (aka prior shift)
P(X, Y) = P(X|Y)*P(Y)
Label shift: P(Y) changes, but P(X|Y) remains the same.
I.e. You can think of this as the case when the output distribution changes but for a
given output, the input distribution stays the same.
“When the input distribution changes, the output distribution also changes, resulting
in both covariate shift and label shift happening at the same time.”
E.g. predicting cancer incidence from age (imagine age is your only input). If
everyone takes an effective preventive drug, P(Y|X) is reduced for every age: the
age distribution stays constant while the true incidence of cancer decreases, so
P(Y) shifts without P(X) changing.
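Since label shift is a change in P(Y), one simple monitoring signal is to compare the marginal label distribution of recent predictions against the training labels; the counts below are made up:

```python
from collections import Counter

def label_dist(labels):
    """Empirical marginal distribution P(Y) from a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}

train_labels = ["cancer"] * 20 + ["healthy"] * 80
recent_preds = ["cancer"] * 5 + ["healthy"] * 95   # incidence dropped in the wild

print(label_dist(train_labels)["cancer"])   # 0.2
print(label_dist(recent_preds)["cancer"])   # 0.05
```

A large gap between the two distributions suggests the model's training-time prior over labels no longer matches reality.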
Concept drift (aka posterior shift)

P(X, Y) = P(Y|X)*P(X)
Concept drift: P(Y|X) changes, but P(X) remains the same.
I.e. when the input distribution remains the same but the conditional distribution of
the output given an input changes. → “same input, different output”
Predicting house prices. House features are fixed. House prices are dynamic (e.g.
early pandemic house prices were much cheaper)
In many cases these drifts are cyclic or seasonal. E.g. think of dynamic pricing for
rideshares: companies may have different models for weekday vs. weekend pricing.
Other types of shifts

These shifts are not exhaustive

Other things can mess up your model’s performance in the real world

E.g. changing feature values: maybe age used to be input in years and is now input in
months, so the range of feature values has drifted

Maybe your data pipeline has a bug and it starts feeding NaNs to your model
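Simple input validation can catch both of these failure modes before bad values reach the model. The feature name and expected range below are illustrative, and the range assumes age is measured in years:

```python
import math

def validate_age(value):
    """Sanity-check one feature value before inference."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        raise ValueError("age is missing/NaN -- pipeline bug?")
    if not 0 <= value <= 130:
        raise ValueError(
            f"age {value} out of range -- unit change (years -> months)?"
        )
    return value

validate_age(34)        # fine: 34 years
# validate_age(408)     # would raise: 34 years accidentally fed in as months
```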
Addressing data distribution shifts

Two main approaches in research/industry today.

Train models on massive datasets

- Hope that they learn all the complex patterns from the data

Retrain the models using labeled data from the target distribution

- Could be from scratch or continue training with the new data (fine-tuning)
Monitoring and observability

Monitor predictions

- What is your model outputting?

Monitor Features

- What is being fed to your model?

Logs, Dashboards
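A minimal sketch of what a per-batch prediction-monitoring record might look like; the field names and the 0.5 decision threshold are illustrative, and in practice this record would feed your real logging/dashboard stack:

```python
import json
import statistics
import time

def prediction_summary(preds):
    """Summarize one batch of model scores for logging/dashboards."""
    return {
        "ts": time.time(),                                  # when the batch ran
        "n": len(preds),                                    # batch size
        "mean": statistics.mean(preds),                     # score drift signal
        "p_positive": sum(p > 0.5 for p in preds) / len(preds),
    }

batch = [0.91, 0.12, 0.77, 0.45, 0.88]
print(json.dumps(prediction_summary(batch)))
```

Tracking these summaries over time is one way to notice the distribution shifts described above without waiting for labeled feedback.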
