BMI 704 - Machine Learning Lab
Lab
030719
Topics
• Types of Machine Learning
• Supervised - you have data in which both the input variables (features) and the outcome are observed
• Unsupervised - you have data, but there is no observed outcome
• Features
• i.e. the variables (Xs)
• The inputs you use to predict the outcome
• Models
• 1) Pick a person, e.g. a new patient
• 2) Substitute his features into the model, e.g.
Diabetes = 0.5*age + 0.2*sex + 2.1*BMI + …
Height = 0.2*age + 0.8*sex + 1.3*weight + …
• 3) The model output is his predicted outcome
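The three steps above can be sketched in plain Python. The coefficients are the slide's illustrative Height model. The slide's remaining "…" terms and any intercept are deliberately left out. The feature values and the helper name `predict` are hypothetical and chosen only for illustration.

```python
# Illustrative coefficients from the slide's Height model; the "…" terms
# and any intercept are NOT included (they are unknown here).
HEIGHT_COEFS = {"age": 0.2, "sex": 0.8, "weight": 1.3}

def predict(coefs, person):
    """Step 2: substitute one person's feature values into a linear model."""
    return sum(coef * person[name] for name, coef in coefs.items())

# Step 1: pick a person (hypothetical feature values; sex coded as 0/1)
person = {"age": 30, "sex": 1, "weight": 70}

# Step 3: the result is his *predicted* outcome, not an observed one
print(predict(HEIGHT_COEFS, person))
```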
Where does the prediction model come from?
• 1) Pick an algorithm
• Linear model
• Y = b0 + b1*X1 + b2*X2 + b3*X3 (the coefficients b are estimated from the data)
• 2) Split your data set into training and test sets (e.g. 80/20, 70/30): fit the model on the training set and evaluate it on the test set
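A minimal stdlib sketch of steps 1 and 2 follows. It makes an 80/20 random split, fits a least-squares line on the training rows only, and measures error on the held-out rows. For brevity it uses a single predictor X instead of X1, X2, X3. The data are synthetic (y = 1 + 2x plus noise). The function names are hypothetical helpers, not part of any R or Python library.

```python
import random

def train_test_split(rows, test_frac=0.2, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_frac))
    return rows[n_test:], rows[:n_test]

def fit_simple_linear(rows):
    """Ordinary least squares for y = b0 + b1*x on (x, y) pairs."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def mse(rows, b0, b1):
    """Mean squared prediction error of the fitted line on some rows."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in rows) / len(rows)

# Synthetic (hypothetical) data: y = 1 + 2x + Gaussian noise
rng = random.Random(42)
data = [(x, 1 + 2 * x + rng.gauss(0, 0.5)) for x in range(50)]

train, test = train_test_split(data, test_frac=0.2)   # 80/20 split
b0, b1 = fit_simple_linear(train)                     # fit on training set only
print(len(train), len(test), round(b1, 2))
print("test MSE:", mse(test, b0, b1))                 # evaluate on held-out data
```

Because the test rows are never used in fitting, the test MSE estimates how the model would do on new observations.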
Simple Regression
• R² - proportion of the variance in Y explained by the model
• If Y is binary (1 or 0):
• Sensitivity: fraction of true Y = 1 cases predicted as Ŷ = 1 (Y = 1 ➙ Ŷ = 1)
• Specificity: fraction of true Y = 0 cases predicted as Ŷ = 0 (Y = 0 ➙ Ŷ = 0)
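The metrics above can be computed directly from observed and predicted values. The sketch below uses the common definition R² = 1 − SS_res / SS_tot. It counts true/false positives and negatives for the 0/1 case. The toy vectors are made up for illustration.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot: share of the variance in y explained."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def sensitivity_specificity(y, y_hat):
    """For 0/1 labels: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)."""
    pairs = list(zip(y, y_hat))
    tp = sum(1 for a, b in pairs if a == 1 and b == 1)
    fn = sum(1 for a, b in pairs if a == 1 and b == 0)
    tn = sum(1 for a, b in pairs if a == 0 and b == 0)
    fp = sum(1 for a, b in pairs if a == 0 and b == 1)
    return tp / (tp + fn), tn / (tn + fp)

print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))

y     = [1, 1, 1, 1, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0, 1, 1]
print(sensitivity_specificity(y, y_hat))  # → (0.75, 0.5)
```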
Which model (algorithm) should you use?
Unsupervised Learning
• Not interested in predicting Y; the goal is exploratory analysis of the Xs
• Discovering patterns
• Finding subgroups you don't already know about
• Visualizing the results
• Summarizing the data with a few latent variables that capture most of its information
• i.e. most of the variance
[Figure: PCA loadings and scores, with the % of variance explained (x%)]
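To make "a few latent variables capturing most of the variance" concrete, here is a minimal PCA sketch for two variables. It uses the closed-form eigen-decomposition of the 2×2 sample covariance matrix. "Loading" is the unit direction of the first principal component. "Scores" are the centered observations projected onto it. The explained fractions play the role of the x% labels. The dataset is a small made-up example, and `pca_2d` is a hypothetical helper. The R function typically used for this, `prcomp`, is not reproduced here.

```python
import math

def pca_2d(points):
    """PCA for 2-D data via the 2x2 sample covariance matrix.
    Returns (fractions of variance explained by PC1 and PC2,
             PC1 loading vector, PC1 scores)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / (n - 1)   # var(x)
    c = sum(y * y for _, y in centered) / (n - 1)   # var(y)
    b = sum(x * y for x, y in centered) / (n - 1)   # cov(x, y)
    # Eigenvalues of [[a, b], [b, c]] in closed form (lam1 >= lam2)
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Eigenvector for lam1: (b, lam1 - a) solves (A - lam1*I) v = 0 when b != 0
    if b != 0:
        v = (b, lam1 - a)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    loading = (v[0] / norm, v[1] / norm)
    scores = [x * loading[0] + y * loading[1] for x, y in centered]
    total = lam1 + lam2
    return (lam1 / total, lam2 / total), loading, scores

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
explained, loading, scores = pca_2d(points)
print("variance explained:", explained)
print("PC1 loading:", loading)
```

Here the two variables are strongly correlated, so a single component (PC1) carries almost all of the variance.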
Unsupervised Learning
• Clustering
• PCA seeks a low-dimensional representation of the observations that explains a large fraction of the variance
• Clustering seeks homogeneous subgroups among the observations
• K-means clustering
• hierarchical clustering
K-means clustering
• Partitions a data set into K distinct, non-overlapping clusters
• You must specify the number of clusters K in advance
• The algorithm finds a local (not necessarily global) optimum
• Run it a few times, from different random starts, to see the different results
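The K-means points above can be illustrated with a minimal sketch of Lloyd's algorithm. It starts from a random cluster assignment, alternates centroid and assignment steps until nothing changes (a local optimum), and repeats from several random starts. It keeps the run with the lowest within-cluster sum of squares (WSS). The restart count and the toy data are assumptions for illustration. R's `kmeans` exposes a similar idea through its `nstart` argument, but this is not that implementation.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(members):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))

def within_ss(points, labels, k):
    """Total within-cluster sum of squares, the objective K-means minimizes."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            c = centroid(members)
            total += sum(sq_dist(p, c) for p in members)
    return total

def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initial assignment."""
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster is re-seeded at a randomly chosen observation
            centroids.append(centroid(members) if members else rng.choice(points))
        # Assignment step: each observation moves to its nearest centroid
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:   # nothing moved: a local optimum
            break
        labels = new_labels
    return within_ss(points, labels, k), labels

def kmeans(points, k, n_starts=10, seed=0):
    """Run K-means from several random starts and keep the lowest-WSS run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[0])

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
wss, labels = kmeans(points, k=2)
print(wss, labels)
```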
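Hierarchical clustering
• Builds a tree-based representation of the observations, called a dendrogram
• Bottom-up (agglomerative) clustering: start with each observation as its own cluster and repeatedly merge the two closest clusters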
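Bottom-up clustering can be sketched as follows. Every observation starts in its own cluster, and at each step the two closest clusters are merged. The merge history, with the merge heights, is the information a dendrogram draws. The slide does not name a linkage, so complete linkage (distance between the farthest members) is an assumption, with single linkage as the alternative. The helper `agglomerative` is hypothetical and deliberately naive (roughly cubic time or worse), so it suits only small examples. It is not R's `hclust`.

```python
import math

def agglomerative(points, linkage="complete"):
    """Naive bottom-up clustering.
    Returns merges as (cluster_a, cluster_b, height); original observations
    have ids 0..n-1 and each merged cluster gets the next unused id."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)

    def cluster_dist(a, b):
        d = [math.dist(points[i], points[j])
             for i in clusters[a] for j in clusters[b]]
        # complete linkage: farthest pair; otherwise single linkage: closest pair
        return max(d) if linkage == "complete" else min(d)

    while len(clusters) > 1:
        # Find the closest pair of current clusters
        a, b = min(((a, b) for a in clusters for b in clusters if a < b),
                   key=lambda ab: cluster_dist(*ab))
        height = cluster_dist(a, b)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for a, b, h in agglomerative(points):
    print(f"merge {a} + {b} at height {h:.2f}")
```

Cutting the resulting tree at a chosen height yields a set of clusters.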
Algorithms and Packages
• ML Algorithms (many, many, many!)
• Basics: linear models
• Shrinkage Methods
• Lasso and Ridge regression
• ElasticNet
• Non-linear methods
• Spline
• Support Vector Machines
• Tree based methods
• Decision trees
• Random Forests
• Packages in R
• Individual packages for each algorithm, e.g. glmnet (lasso, ridge, elastic net)
• Meta packages, e.g. caret (a common interface to many algorithms)
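To show what the shrinkage methods in this list do, here is a single-predictor sketch. It minimizes (1/(2n))·RSS + λ[(1−α)/2·b² + α|b|], which has the same form as the elastic-net penalty described in the glmnet documentation. Setting α = 0 gives ridge and α = 1 gives lasso. With one standardized predictor the minimizer has the closed form b = S(z, λα) / (1 + λ(1−α)), where z is the least-squares coefficient and S is soft-thresholding. This is a hypothetical helper, not glmnet itself, and its numbers need not match glmnet's output.

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0): the lasso shrink-to-zero step."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net_1d(x, y, lam, alpha):
    """Single-predictor elastic net after centering y and standardizing x
    (mean 0, mean(x^2) = 1). alpha = 0: ridge; alpha = 1: lasso.
    Returns the coefficient on the standardized-x scale."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    scale = (sum(v * v for v in xc) / n) ** 0.5
    xs = [v / scale for v in xc]
    z = sum(a * (b - my) for a, b in zip(xs, y)) / n   # least-squares coefficient
    return soft_threshold(z, lam * alpha) / (1 + lam * (1 - alpha))

x, y = [-1, 1], [-2, 2]
print(elastic_net_1d(x, y, lam=0, alpha=0))    # → 2.0  (no penalty: least squares)
print(elastic_net_1d(x, y, lam=1, alpha=0))    # → 1.0  (ridge shrinks toward 0)
print(elastic_net_1d(x, y, lam=0.5, alpha=1))  # → 1.5  (lasso shrinks by lam)
print(elastic_net_1d(x, y, lam=3, alpha=1))    # → 0.0  (lasso sets it exactly to 0)
```

The last line shows the practical difference between the two penalties. Ridge only shrinks coefficients, while the lasso can set them exactly to zero, which performs variable selection.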