
Recap of Machine Learning

The document provides a comprehensive overview of machine learning concepts, essential libraries, and tools such as Python, Scikit-Learn, and Jupyter Notebook. It covers various types of machine learning systems, data collection, preprocessing, feature engineering, and model training, along with practical applications like customer churn analysis. Additionally, it discusses the importance of exploratory data analysis, handling missing values, and feature scaling techniques to improve model performance.


Recap of Machine Learning

Concepts
• Essential Libraries and Tools:

• Python : the lingua franca for data science

• Scikit-Learn: open-source machine learning project (https://scikit-learn.org)

• Jupyter Notebook: browser-based interactive development environment (IDE) for running code

• NumPy: for scientific computing

• Pandas: for data wrangling and analysis

• Matplotlib and Seaborn: for plotting


Introduction
• What is ML?
• Types of ML Systems
• Applications and challenges

Data
• Data Collection
• Data Pre-processing
• Feature Engineering

Algorithms
• Supervised Models
• Unsupervised Models
• Model Boosting, Stacking, Ensembling

Training ML Models for Production
• Problem framing
• Training best practices
• Model validation
• Deployment & monitoring
ML is good for:

◼ Solutions requiring a long list of rules
◼ Solutions requiring extensive fine-tuning
◼ Complex problems unsolvable by traditional methods, e.g. perceptive problems
such as image recognition
◼ Fluctuating environments, e.g. the data changes or the problem changes
◼ Dealing with large, complex data
◼ Observable but unstudied phenomena, e.g. computer network logs
Recap: Types of ML Systems

Machine Learning
• Supervised
  – Classification (Binary, Multi-Class, Multi-Label)
  – Regression
• Unsupervised
  – Clustering (Divisive, Agglomerative)
  – Association
• Semi-Supervised
• Reinforcement Learning


The Machine Learning Lifecycle

Business Goal & Problem Definition → Data Collection & Preparation →
Data Preprocessing & Feature Engineering → Model Training → Model Evaluation →
Model Deployment → Model Serving → Model Monitoring → Model Maintenance →
(back to Business Goal)
The lifecycle is then repeated with the roles typically most involved at each stage
highlighted: Business Analyst and Product Manager (business goal and problem
definition), Data Engineer (data collection and preparation), Data Scientist (data
preprocessing, feature engineering, model training and evaluation), and Software
Engineer / MLOps (model deployment, serving, monitoring and maintenance).
Customer Churn Example: Problem Formulation
Problem: Declining sales

Why? (A) Customers buying fewer items, or (B) fewer customers buying?
If B, are there patterns? Loyal customers? Some regions? Some months? Etc.
What is churn?

Hypothesis: We can identify customers likely to churn before they leave.
This is a classification problem (ANN or DT?).
Understand the potential value, e.g., if we can convince 5% not to leave,
the financial impact is X.

Other Considerations:
- What tools and data do I have?
- If the model will be used as a batch system, the scores will be generated
monthly (weekly? daily? real time?) and sent to Y
- What is the activation incentive (an offer?), and how will success be
measured in production? Feedback loop?
EDA – Know your data

In statistics, exploratory data analysis is an approach to analyzing data sets to
summarize their main characteristics, often with visual methods. A statistical
model can be used or not, but primarily EDA is for seeing what the data can tell us
beyond the formal modeling or hypothesis testing task.
EDA Tools/Techniques
◼ Data set shape and feature types
  df.shape, df.dtypes
◼ Eye-balling of data
  Explore oddities by looking at column names etc.
  df.head()
◼ Univariate analysis
  Understand distributions, outliers, missing values, variance, unique values etc.
  df.describe(), box plots, cdf/pdf plots, violin plots
◼ Bivariate analysis
  Understand the relationship between 2 variables, e.g., age vs. target
  Box plots, pair plots
◼ Multivariate analysis
  Understand interactions between multiple variables
  Correlation matrix, pair plots, 3D plots etc.
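As a minimal sketch, the tabular checks above can be run in a few lines of pandas (the tiny DataFrame here is made up purely for illustration):

```python
# Hypothetical toy dataset to illustrate the basic EDA calls.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 62],
    "income": [30_000, 42_000, 58_000, 61_000, 75_000],
    "churned": [0, 0, 1, 0, 1],
})

print(df.shape)       # data set shape: (rows, columns)
print(df.dtypes)      # feature types
print(df.head())      # eye-ball the first rows
print(df.describe())  # univariate summary statistics
print(df.corr())      # correlation matrix for multivariate analysis
```

For the visual techniques (box plots, pair plots, violin plots), Seaborn's `sns.boxplot`, `sns.pairplot` and `sns.violinplot` take the same DataFrame directly.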
Clean your data
Missing data
Important: Understand why data is missing

Missing completely at random (MCAR)
Missing data are randomly distributed across the variable and unrelated to other
variables (no patterns observed; every record has the same probability of being missing).

Missing at random (MAR)
There might be systematic differences between missing and observed records, but these
are completely accounted for by other observed variables (e.g., more data is missing
for males vs. females, but the probability of missing is the same within each group).
The term 'random' is a bit of a misnomer.

Missing not at random (MNAR)
Missing data systematically differ from the observed values; the missingness is related
to the variable itself, e.g., not stating my preference for brand X because I don't like it.
Handling missing values
◼ Remove records with missing data
◼ Leave as-is
◼ Impute
  Substitution: fixed-value imputation (mean/mode/median/'unknown')
  Fast and easy, but tends to be inaccurate as it ignores other
  features/correlations and the overall data structure; only suitable for MCAR;
  may be sensitive to noise, e.g., outliers
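A sketch of the first and third options, using scikit-learn's SimpleImputer for mean substitution (the column names and values are made up):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25.0, np.nan, 47.0, 51.0],
                   "income": [30_000.0, 42_000.0, np.nan, 61_000.0]})

# Option 1: remove records with missing data
dropped = df.dropna()

# Option 3: impute with the column mean (only really suitable for MCAR)
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
```

`strategy` can also be `"median"`, `"most_frequent"`, or `"constant"` with a `fill_value` such as `'unknown'`, matching the substitution options above.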
Feature Engineering
Sidebar: What is Cardinality?

The number of unique elements in a set:

X1 = {4, 6, 7}        Cardinality = 3
X2 = {9, 2, 7, 3, 1}  Cardinality = 5

Counties dataset:

Cust  County
1     002
2     010
3     030
4     002
5     006
6     047

County cardinality = 47
Converting Categorical Features to Numeric
One-Hot Encoding (dummy coding)

   Marital    Single  Married  Divorced  Unknown
1  Single     1       0        0         0
2  Married    0       1        0         0
3  Divorced   0       0        1         0
4  Unknown    0       0        0         1

- Very simple
- But can create an explosion of features if cardinality is high
- Is not target-led
(Pro-tip: can use an aggregation approach to reduce cardinality, e.g., 'Nairobi', 'Kiambu', 'Nakuru', 'Other')

Label Encoding

Mon   1
Tue   2
Wed   3
Thur  4
Fri   5
Sat   6
Sun   7

- Also simple
- Works better for ordered categories, but may mislead the algorithm on scale and distance
- Is not target-led
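Both encodings can be sketched briefly in pandas (the marital and day values mirror the examples above; nothing else is from a real dataset):

```python
import pandas as pd

df = pd.DataFrame({"marital": ["Single", "Married", "Divorced", "Unknown"]})

# One-hot encoding (dummy coding): one binary column per category
one_hot = pd.get_dummies(df["marital"])

# Label encoding: map ordered categories to integers
day_order = {"Mon": 1, "Tue": 2, "Wed": 3, "Thur": 4, "Fri": 5, "Sat": 6, "Sun": 7}
days = pd.Series(["Wed", "Sun", "Mon"]).map(day_order)
```

For an unordered feature, scikit-learn's `OneHotEncoder` does the same job inside a pipeline; `OrdinalEncoder` covers the label-encoding case.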
Clustering Approach to Reduce Cardinality

For high-cardinality features, build similarity clusters and then perform
one-hot encoding or proportion representation.

Example: clustering US state codes into fewer categories.
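The simpler aggregation pro-tip from the one-hot slide can be sketched as follows: keep the most frequent categories and lump the rest into 'Other' (the county values are illustrative only):

```python
import pandas as pd

counties = pd.Series(["Nairobi", "Kiambu", "Nairobi", "Nakuru",
                      "Garissa", "Kisumu", "Nairobi", "Kiambu"])

# Keep the 2 most frequent categories; map everything else to 'Other'
top = counties.value_counts().nlargest(2).index
reduced = counties.where(counties.isin(top), "Other")
```

One-hot encoding `reduced` now produces 3 columns instead of 5, which is the point of the technique; a clustering-based grouping would replace the frequency rule with similarity clusters.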
Feature Enrichment

◼ Feature splitting: e.g., 'Management, tertiary' → 'Management' | 'tertiary'
◼ Date extraction: e.g., '1 May 2022' → '1 May 2022' + 'Sunday';
  '1 May 2022; 01:00hrs' → hour of the week (from 1 to 168)
◼ Combining features (domain led): e.g., Household income / Number of kids → Income per kid
  Can apply simple additions, subtractions, polynomials etc. to various
  features to extract a different dimension
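A sketch of the date-extraction and feature-combination ideas in pandas (column names are made up; the hour-of-week numbering assumes Monday 00:00 is hour 1, one plausible reading of the 1-to-168 convention above):

```python
import pandas as pd

df = pd.DataFrame({
    "signup": pd.to_datetime(["2022-05-01 01:00", "2022-05-03 14:00"]),
    "household_income": [90_000, 120_000],
    "num_kids": [3, 2],
})

# Date extraction: day name, and hour of the week in 1..168 (Mon 00:00 -> 1)
df["day_name"] = df["signup"].dt.day_name()
df["hour_of_week"] = df["signup"].dt.dayofweek * 24 + df["signup"].dt.hour + 1

# Combining features (domain led): income per kid
df["income_per_kid"] = df["household_income"] / df["num_kids"]
```

Feature splitting of a value like 'Management, tertiary' is the same idea with `df["col"].str.split(", ", expand=True)`.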
Feature Scaling
Bring your features to the same or similar range of values or distributions

◼ Normalization (min-max scaling): constrain data into a range (typically [0, 1])

Caution: if the min and max are outliers, the feature will be squeezed into a very
small range. In this case consider robust scaling: x' = (x − median) / inter-quartile range

from sklearn.preprocessing import MinMaxScaler
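A short sketch of both scalers on made-up data with one deliberate outlier, illustrating the caution above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

# Min-max: the outlier squeezes the other values near 0
minmax = MinMaxScaler().fit_transform(X)

# Robust scaling: (x - median) / inter-quartile range, outlier-resistant
robust = RobustScaler().fit_transform(X)
```

Here the four ordinary values end up below 0.031 under min-max scaling, while robust scaling keeps them spread between −1 and 0.5.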
Feature Scaling
Bring your features to the same or similar range of values or distributions

◼ Standardization: rescale data to achieve the properties of a standard normal
distribution (µ = 0, σ = 1)

from sklearn.preprocessing import StandardScaler
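The corresponding usage sketch (toy values only):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])
Z = StandardScaler().fit_transform(X)  # (x - mean) / std, so mu = 0, sigma = 1
```

Note that standardization does not bound the output to a fixed range the way min-max scaling does; it only fixes the mean and standard deviation.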
Power Transformation for Changing Distribution
Techniques for converting a skewed distribution to a normal (or less skewed)
distribution. Many machine learning algorithms prefer, or perform better when,
numerical variables have a Gaussian probability distribution.

◼ Log Transform
◼ Box-Cox Transform

Both change the distribution and take care of extreme values.

Y' = (Y^λ − 1) / λ   when λ ≠ 0
Y' = ln(Y)           when λ = 0

from sklearn.preprocessing import PowerTransformer
PowerTransformer(method='box-cox')
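A sketch on synthetic right-skewed data (the exponential sample is made up for illustration; note that Box-Cox requires strictly positive inputs, otherwise use method="yeo-johnson"):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(500, 1))  # right-skewed, strictly positive

# Box-Cox estimates lambda from the data; standardize=True (the default)
# also rescales the result to mean 0, std 1
pt = PowerTransformer(method="box-cox")
X_t = pt.fit_transform(X)
```

The fitted λ is available afterwards as `pt.lambdas_`.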
Quantile Transform
A non-parametric data transformation technique to transform your numerical data
distribution to follow a given target distribution (e.g. the Gaussian distribution
or the uniform distribution).

Related technique: Feature Discretization (binning), replacing numerical values with
bin numbers, e.g., number of times defaulted on a loan (0, 1, 2, 3, 4) can be binned
into 0 (never) or 1 (has defaulted one or more times).
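Both techniques can be sketched briefly (the exponential sample and the default counts are made up; the two-bin split is written directly as a threshold since it is simpler than a generic discretizer here):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Quantile transform: map a skewed sample onto an (approximately) uniform [0, 1]
rng = np.random.default_rng(1)
X = rng.exponential(size=(300, 1))
qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform")
X_u = qt.fit_transform(X)

# Discretization (binning): 0 = never defaulted, 1 = defaulted one or more times
defaults = np.array([0, 1, 2, 3, 4, 0, 0])
binned = (defaults > 0).astype(int)
```

Setting `output_distribution="normal"` instead maps the data onto a Gaussian; scikit-learn's `KBinsDiscretizer` generalizes the binning step to more than two bins.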
Linear Models
Linear Separability

(Figure: a linearly separable data set vs. a data set that is not linearly separable.)

Linear separability implies that if there are two classes, then there will be a point,
line, plane, or hyperplane that splits the input features in such a way that all points
of one class are in one half-space and the second class is in the other half-space.
Linear Models For Classification

Suppose we are looking at college admissions data:

Y = 0 (not admitted) or 1 (admitted)

Fit a linear regression model:

Y_pred = β0 + β1·x1 + … + βp·xp

(Figure: admitted vs. not admitted plotted against exam score, with a fitted line.)

Linear regression is not suitable for classification. Reasons:
• It underfits the data
• Predictions are not constrained between 0 and 1
• Outliers can have a large negative impact on the model
Logistic Regression

Y takes on the value 1 with success probability p, and the value 0 with failure
probability (1 − p).

We can use an appropriate link function (the logit) to produce a linearized model:

ln(p / (1 − p)) = β0 + β1·x1

(Figure: admitted vs. not admitted against exam score, with a sigmoidal curve and a cutoff.)

Logistic regression fits a sigmoidal curve and constrains the probability to between 0 and 1.
Types of Logistic Regression:

1. Binary Logistic Regression (2 outcomes)
2. Multinomial Logistic Regression (one can also binarize the target and do a 'one vs. all' approach)
3. Ordinal Logistic Regression (aka Ordered Logit, e.g., Olympic medals)
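The binary case above can be sketched with scikit-learn on a made-up admissions sample (exam scores and outcomes are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical exam scores and admission outcomes (0 = not admitted, 1 = admitted)
X = np.array([[35.0], [45.0], [52.0], [61.0], [70.0], [82.0], [88.0], [95.0]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba outputs are constrained to [0, 1];
# predict applies the default 0.5 cutoff on that probability
p_admit = clf.predict_proba([[90.0]])[0, 1]
```

For the multinomial case, the same `LogisticRegression` class handles multi-class targets directly.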

You might also like