Recap of Machine Learning
Concepts
• Essential Libraries and Tools:
  • Jupyter Notebook: browser-based interactive development environment (IDE) for running code
• Data:
  • Data Collection
  • Data Pre-processing
  • Feature Engineering
• Algorithms:
  • Supervised Models
  • Unsupervised Models
  • Model Boosting, Stacking, Ensembling
• Deployment & Monitoring
ML is good for:
Machine Learning Lifecycle
[Lifecycle diagram: Business Goal → Problem Definition → Data Collection & Preparation → Data Preprocessing & Feature Engineering → Model Training → Model Evaluation → Model Deployment → Model Serving → Model Monitoring → Model Maintenance, cycling back to Business Goal.]
The same lifecycle diagram is repeated with the roles that drive its stages highlighted: Business Analyst, Product Manager, Data Engineer, Data Scientist, Software Engineer, MLOps.
Customer Churn Example: Problem Formulation
Problem: Declining sales
◼ Fast and easy, but tends to be inaccurate without accounting for other features/correlations or the overall data structure; only suitable for MCAR (Missing Completely At Random); may be sensitive to noise, e.g., outliers.
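The trade-offs above read like those of simple (mean/median/mode) imputation of missing values; a minimal sketch with scikit-learn's SimpleImputer on made-up data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy column with one missing value (np.nan); MCAR assumed.
X = np.array([[1.0], [2.0], [np.nan], [4.0]])

# Mean imputation: fast and easy, but it ignores correlations with
# other features and is pulled around by outliers.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
# The nan is replaced by the column mean (1 + 2 + 4) / 3
```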
Feature Engineering
Sidebar: What is Cardinality?
X1 = {4, 6, 7}         Cardinality = 3
X2 = {9, 2, 7, 3, 1}   Cardinality = 5
Counties dataset:
Cust  County
1     002
2     010
3     030
4     002
5     006
6     047
County column: Cardinality = 47
Converting Categorical Features to Numeric
One-Hot Encoding (dummy coding)
   Marital    Single  Married  Divorced  Unknown
1  Single     1       0        0         0
2  Married    0       1        0         0
3  Divorced   0       0        1         0
4  Unknown    0       0        0         1
- Very simple
- but can create an explosion of features if cardinality is high
- Is not target-led
(pro-tip: can use an aggregation approach to reduce cardinality, e.g., ‘Nairobi’, ‘Kiambu’, ‘Nakuru’, ‘Other’)
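A minimal sketch of one-hot encoding with pandas (the column name and data are illustrative):

```python
import pandas as pd

# Toy marital-status column matching the table above
df = pd.DataFrame({"Marital": ["Single", "Married", "Divorced", "Unknown"]})

# One binary column per category: simple, but the feature count
# explodes when cardinality is high.
dummies = pd.get_dummies(df["Marital"])
```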
Label Encoding
Mon   1
Tue   2
Wed   3
Thur  4
Fri   5
Sat   6
Sun   7
- Also simple
- Works better for ordered categories, but may mislead the algorithm on scale and distance
- Is not target-led
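A sketch of label encoding with scikit-learn; passing the weekday order explicitly is an assumption (the default would sort the categories alphabetically):

```python
from sklearn.preprocessing import OrdinalEncoder

days = [["Mon"], ["Tue"], ["Wed"], ["Thur"], ["Fri"], ["Sat"], ["Sun"]]

# Explicit category order keeps Mon < Tue < ... < Sun.
enc = OrdinalEncoder(
    categories=[["Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"]]
)
codes = enc.fit_transform(days)  # 0..6; add 1 to match the 1..7 table
```

Note that an algorithm may now treat Sun (6) as "six times" Mon (0), which is the scale/distance caveat above.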
Clustering Approach to reduce cardinality
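The slide names the technique only; one possible sketch, assuming categories are clustered by their mean target value (the data, column names, and cluster count are made up):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data: a high-cardinality 'county' feature and a churn target
df = pd.DataFrame({
    "county": ["002", "010", "030", "002", "006", "047", "010", "030"],
    "churn":  [1, 0, 0, 1, 1, 0, 0, 0],
})

# Mean churn rate per county, then cluster those rates into 2 groups
means = df.groupby("county")["churn"].mean()
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(means.to_numpy().reshape(-1, 1))

# Replace each county with its cluster id: cardinality 5 -> 2
df["county_cluster"] = df["county"].map(dict(zip(means.index, labels)))
```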
Feature Scaling
Bring your features to the same or similar range of values or distributions.
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
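A quick sketch of both scalers on toy data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0]])

# MinMaxScaler squeezes values into the [0, 1] range
X_minmax = MinMaxScaler().fit_transform(X)

# StandardScaler centers to mean 0 and scales to unit variance
X_std = StandardScaler().fit_transform(X)
```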
Power Transformation for changing distribution
Techniques for converting a skewed distribution into a normal (or less-skewed) distribution. Many machine learning algorithms prefer, or perform better when, numerical variables have a Gaussian probability distribution.
◼ Log Transform
◼ Box-Cox Transform: takes care of extreme values
   Y' = (Y^λ − 1) / λ   when λ ≠ 0
   Y' = ln(Y)           when λ = 0
from sklearn.preprocessing import PowerTransformer
PowerTransformer(method='box-cox')
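A sketch of the Box-Cox transform via PowerTransformer on synthetic skewed data; the small shift keeping values strictly positive is an assumption (Box-Cox requires Y > 0):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Right-skewed synthetic data, shifted to be strictly positive
rng = np.random.default_rng(0)
Y = rng.exponential(scale=2.0, size=(500, 1)) + 1e-3

# lambda is estimated from the data; standardize=True (the default)
# also centers the result to mean 0 and unit variance.
pt = PowerTransformer(method="box-cox")
Y_t = pt.fit_transform(Y)
```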
Quantile Transform
A non-parametric data transformation technique to transform your numerical data distribution to follow a certain data distribution (e.g., the Gaussian distribution or Uniform distribution).
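A sketch mapping skewed data onto a uniform distribution (the parameters are illustrative):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
X = rng.exponential(size=(1000, 1))  # skewed input

# Map empirical quantiles onto a uniform [0, 1] distribution;
# output_distribution="normal" would target a Gaussian instead.
qt = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
X_u = qt.fit_transform(X)
```

Because it works on ranks rather than raw values, the quantile transform is non-parametric and robust to outliers.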
Linear separability implies that if there are two classes, then there will be a point, line, plane, or hyperplane that splits the input features in such a way that all points of one class are in one half-space and the second class is in the other half-space.
Linear Models For Classification
[Figure: admission outcome (Y = 0 not admitted, Y = 1 admitted) plotted against Exam Score, with a fitted sigmoid curve.]
Logistic regression fits a sigmoidal curve and constrains the probability to between 0 and 1.
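A minimal sketch of the admitted/not-admitted example (the exam scores and labels are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy exam scores with a 0/1 admission label (illustrative data)
X = np.array([[35], [45], [50], [55], [65], [75], [85], [90]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# predict_proba applies the sigmoid, so outputs lie strictly in (0, 1)
p = clf.predict_proba(np.array([[40], [80]]))[:, 1]
```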
Types of Logistic Regression: