Machine Learning (ML) and ML Engineering: CSE 473 24wi
ML Engineering
PM3: Due Feb 13th. Late deadline Feb 14th. Working to get autograder fixed.
ML: Cancelled
- This is a form of machine learning: we use data to find the patterns/rules, and
then the model executes the logic
- This is in contrast to a programmer explicitly writing the logic
Notice how in the second example, we don’t have any rules. The model does all the
work.
Machine learning systems
- They must change over time like the users and their data do
ML System Design Example
Questions to consider:
- What data do you need? How will you collect it? How much do you need?
- What kind of model(s) will you use? How will you train/evaluate it?
- Will it be online or offline learning? Where will the model live?
- Will there be personalization? If so, how?
- Is there NSFW content that needs to be filtered out?
- Will you process individual tweets or batches? How and why?
- What will the end-user experience look like? How will ML enhance this?
ML System Design: Your turn!
https://resources.experfy.com/ai-ml/coding-deep-learning-for-beginners-types-of-machine-learning/
Applications: NLP, Computer Vision, Robotics, Comp.
Bio., Interactive learning, Convex optimization, etc.
Example ML Workflow (greatly simplified)
Learn to be uncomfortable
Textbook:
Online resources:
- Check out the Berkeley lectures on Markov Models and HMMs. Links are in the
course calendar and posted on Ed.
Agenda
- Pure software engineering issues, nothing to do with the model itself. E.g.
dependency failure, deployment failure, hardware failure, downtime/crash, etc.
1. Covariate shift
2. Label shift
3. Concept drift
Data distribution shifts
In supervised learning, the training data can be viewed as samples from the joint
distribution: P(X, Y)
Remember, we can decompose the joint distribution two ways:
P(X, Y) = P(Y|X)*P(X) [eqn. 1]
P(X, Y) = P(X|Y)*P(Y) [eqn. 2]
Covariate shift: P(X) changes, but P(Y|X) remains the same. [eqn. 1]
Label shift: P(Y) changes, but P(X|Y) remains the same. [eqn. 2]
Concept drift: P(Y|X) changes, but P(X) remains the same. [eqn. 1]
Covariate shift
P(X, Y) = P(Y|X)*P(X)
I.e. the distribution of the input changes, but the conditional probability of a label
given an input remains the same.
Essentially, the input distribution at training time differs from the input
distribution at inference time. This can have many causes.
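As a minimal sketch of covariate shift (all distributions and numbers below are made up for illustration, not from the slides): the labeling rule, i.e. P(Y|X), is held fixed, while the input distribution P(X) shifts between training and serving. The model's inputs, and therefore its output mix, look very different in production even though the underlying rule never changed.

```python
import random

random.seed(0)

# Fixed labeling rule: P(Y|X) never changes.
def label(x):
    return 1 if x > 50 else 0

# Training inputs: ages clustered around 30.
train_x = [random.gauss(30, 8) for _ in range(10_000)]
# Serving inputs: ages clustered around 55 -> P(X) has shifted.
serve_x = [random.gauss(55, 8) for _ in range(10_000)]

train_pos_rate = sum(label(x) for x in train_x) / len(train_x)
serve_pos_rate = sum(label(x) for x in serve_x) / len(serve_x)

print(f"train positive rate: {train_pos_rate:.2f}")  # low
print(f"serve positive rate: {serve_pos_rate:.2f}")  # much higher
```

A model trained only on the first population would rarely have seen inputs near the decision boundary it now faces at serving time.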
Label shift (aka prior shift)
P(X, Y) = P(X|Y)*P(Y)
Label shift: P(Y) changes, but P(X|Y) remains the same.
I.e. You can think of this as the case when the output distribution changes but for a
given output, the input distribution stays the same.
“When the input distribution changes, the output distribution also changes, resulting
in both covariate shift and label shift happening at the same time.”
E.g., predicting cancer incidence from age. Suppose everyone takes an effective
anti-cancer drug that reduces P(Y|X) for every age. Imagine age is your only
input: the age distribution P(X) remains constant while the true incidence of
cancer P(Y) decreases. If the reduction is uniform across ages, P(X|Y) is
unchanged, so this is a label shift without a covariate shift.
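The cancer example can be simulated directly: hold P(X|Y) fixed (the age profile of sick vs. healthy people) and change only the prevalence P(Y). The distributions and incidence rates below are made-up illustrative values.

```python
import random

random.seed(1)

# P(X|Y) is fixed: patients with cancer skew older, healthy patients younger.
def sample_age(has_cancer):
    return random.gauss(65, 10) if has_cancer else random.gauss(40, 10)

def sample_population(p_cancer, n=10_000):
    """Sample Y first from P(Y), then X from the fixed P(X|Y)."""
    ages, labels = [], []
    for _ in range(n):
        y = random.random() < p_cancer
        labels.append(y)
        ages.append(sample_age(y))
    return ages, labels

def mean(xs):
    return sum(xs) / len(xs)

# Before the drug: 10% incidence. After: 2% -> P(Y) has shifted.
ages_before, y_before = sample_population(0.10)
ages_after, y_after = sample_population(0.02)

print(f"incidence before: {mean(y_before):.3f}, after: {mean(y_after):.3f}")
print(f"mean age before: {mean(ages_before):.1f}, after: {mean(ages_after):.1f}")
```

Note the knock-on effect the quote above describes: because P(Y) changed, the marginal input distribution (mean age of the whole population) drifts slightly too, even though P(X|Y) never moved.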
Concept drift (aka posterior shift)
P(X, Y) = P(Y|X)*P(X)
Concept drift: P(Y|X) changes, but P(X) remains the same.
I.e. when the input distribution remains the same but the conditional distribution of
the output given an input changes. → “same input, different output”
Predicting house prices. House features are fixed. House prices are dynamic (e.g.
early pandemic house prices were much cheaper)
In many cases these drifts are cyclic or seasonal. E.g., think of dynamic pricing
on rideshares: companies may have different models for weekday vs. weekend pricing.
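The house-price example can be sketched the same way: the input distribution P(X) (house features) is held fixed, and only the mapping from features to price, P(Y|X), changes between periods. The price-per-square-foot numbers below are hypothetical.

```python
import random

random.seed(2)

# Fixed input distribution: house sizes in square feet. P(X) is unchanged.
sizes = [random.uniform(800, 3000) for _ in range(5_000)]

# The mapping from features to price, P(Y|X), changes between periods.
def price_2019(sqft):
    return 300 * sqft          # pre-pandemic price per square foot (made up)

def price_2020(sqft):
    return 240 * sqft          # early-pandemic prices dropped (made up)

avg_2019 = sum(price_2019(s) for s in sizes) / len(sizes)
avg_2020 = sum(price_2020(s) for s in sizes) / len(sizes)
print(f"avg price 2019: {avg_2019:,.0f}  avg price 2020: {avg_2020:,.0f}")
```

Same houses in, different prices out: a model frozen in 2019 would systematically overpredict in 2020 even though its inputs look perfectly familiar.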
Other types of shifts
Other things can mess up your model’s performance in the real world
E.g., changing feature encodings: maybe age used to be input in years and is now
input in months → the range of feature values has drifted
Maybe your data pipeline has a bug and it starts feeding NaNs to your model
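Both failure modes above (a units change and a NaN-producing pipeline bug) can be caught with a simple range check on incoming features. This is a minimal sketch; `check_feature` and its threshold are hypothetical, not part of any particular monitoring library.

```python
import math

def check_feature(values, name, lo, hi, max_bad_frac=0.01):
    """Flag a feature whose values are NaN or fall outside the expected range."""
    bad = sum(1 for v in values
              if v is None or math.isnan(v) or not lo <= v <= hi)
    frac = bad / len(values)
    if frac > max_bad_frac:
        raise ValueError(f"{name}: {frac:.0%} of values outside [{lo}, {hi}] or NaN")
    return frac

ages_in_years = [23.0, 41.0, 35.0, 67.0]
check_feature(ages_in_years, "age", 0, 120)       # passes silently

ages_in_months = [276.0, 492.0, 420.0, 804.0]     # upstream switched to months
try:
    check_feature(ages_in_months, "age", 0, 120)
except ValueError as e:
    print("alert:", e)
```

A check like this sits in the serving path or a monitoring job, so a schema or pipeline change raises an alert instead of silently degrading predictions.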
Addressing data distribution shifts
- One approach: train large models on lots of data and hope that they learn all
the complex patterns from the data
Retrain the models using labeled data from the target distribution
- Could be from scratch or continue training with the new data (fine-tuning)
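The from-scratch vs. fine-tuning distinction can be illustrated with a toy one-parameter model trained by SGD (everything here, including the learning rate and the 2x → 3x shift in the target relationship, is made up for illustration): fine-tuning just continues training from the old weights on freshly labeled data from the target distribution.

```python
import random

random.seed(3)

def sgd(w, data, lr=0.01, epochs=50):
    """Fit y ≈ w * x by stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * (w * x - y) * x   # gradient step
    return w

# Old distribution: y = 2x. New (shifted) distribution: y = 3x.
old = [(x, 2.0 * x) for x in (random.uniform(0, 1) for _ in range(200))]
new = [(x, 3.0 * x) for x in (random.uniform(0, 1) for _ in range(200))]

w = sgd(0.0, old)          # initial training from scratch
print(f"after initial training: w = {w:.2f}")   # near 2.0
w = sgd(w, new)            # fine-tune: continue training on new data
print(f"after fine-tuning:      w = {w:.2f}")   # near 3.0
```

In practice the trade-off is cost vs. freshness: training from scratch is expensive but clean, while fine-tuning is cheap and fast but can inherit stale behavior from the old weights.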
Monitoring and observability
- Monitor predictions
- Monitor features
- Logs, dashboards
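Prediction monitoring can be as simple as comparing the model's recent output distribution against a baseline captured at deployment. This is a minimal sketch; the `PredictionMonitor` class, window size, and tolerance are hypothetical choices, not from any monitoring framework.

```python
from collections import deque

class PredictionMonitor:
    """Alert when the recent positive-prediction rate drifts from a baseline."""

    def __init__(self, baseline_rate, window=1000, tolerance=0.15):
        self.baseline = baseline_rate
        self.window = deque(maxlen=window)   # keeps only the most recent predictions
        self.tolerance = tolerance

    def record(self, prediction):
        self.window.append(prediction)

    def drifted(self):
        if len(self.window) < self.window.maxlen:
            return False                     # not enough data yet
        rate = sum(self.window) / len(self.window)
        return abs(rate - self.baseline) > self.tolerance

monitor = PredictionMonitor(baseline_rate=0.30, window=100)
for _ in range(100):
    monitor.record(1)                        # model suddenly predicts positive always
print("drift alert:", monitor.drifted())
```

A shift in the prediction distribution is often the first visible symptom of any of the three shifts above, which is why monitoring predictions (cheap, no labels needed) usually comes before monitoring features.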