Machine Learning With Scikit Learn Strata 2015
http://bit.ly/sklstrata
Me
Classification
Regression
Clustering
Semi-Supervised Learning
Feature Selection
Feature Extraction
Manifold Learning
Dimensionality Reduction
Kernel Approximation
Hyperparameter Optimization
Evaluation Metrics
Out-of-core learning
…
Get the notebooks!
http://bit.ly/sklstrata
Hi Andy,
Thanks
Supervised Machine Learning
Training Data + Training Labels -> Training -> Model
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)  # Training Data + Training Labels -> Model
IPython Notebook:
Chapter 1 - Introduction to Scikit-learn
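A minimal sketch of the fit / predict / score workflow around the `clf.fit(X_train, y_train)` call above. The digits dataset, the train/test split and the `n_estimators` value are assumptions made for illustration, not taken from the slides; the imports follow the current `sklearn.model_selection` layout (older releases kept these helpers in `sklearn.cross_validation`).

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: the bundled digits dataset stands in for the talk's data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Build the model from training data and training labels.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict labels for unseen data and measure accuracy on it.
y_pred = clf.predict(X_test)
test_accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Every scikit-learn classifier exposes the same `fit`, `predict` and `score` methods, so swapping in another model only changes the constructor line.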
Unsupervised Machine Learning
Unsupervised Transformations
pca = PCA()
IPython Notebook:
Chapter 2 – Unsupervised Transformers
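A short sketch of how an unsupervised transformer such as `pca = PCA()` is typically used: `fit` learns the transformation from the data alone (no labels), and `transform` applies it. The digits dataset and `n_components=2` are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Assumption: digits data (1797 samples, 64 pixel features) as an example.
X, _ = load_digits(return_X_y=True)

# Keep only the two directions of largest variance.
pca = PCA(n_components=2)
pca.fit(X)                 # learns the components; labels are not used
X_pca = pca.transform(X)   # projects the data onto those components

print(X.shape, "->", X_pca.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Transformers share the `fit` / `transform` interface, which is what later allows them to be chained in front of a model.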
All Data
IPython Notebook:
Chapter 3 - Cross-validation
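A minimal cross-validation sketch, assuming the iris dataset and 5 folds (neither is specified on the slides). `cross_val_score` splits all the data into folds, trains on all but one fold, and scores on the fold left out, once per fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumption: iris as a small stand-in dataset.
X, y = load_iris(return_X_y=True)

# One accuracy value per fold; each sample is used for testing exactly once.
scores = cross_val_score(SVC(), X, y, cv=5)
print("fold scores:", scores)
print("mean accuracy:", scores.mean())
```

Reporting the mean together with the spread of the fold scores gives a more stable estimate than a single train/test split.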
All Data
Test data
SVC(C=0.001, gamma=0.001)
SVC(C=0.01, gamma=0.001)
SVC(C=0.1, gamma=0.001)
SVC(C=1, gamma=0.001)
SVC(C=10, gamma=0.001)
IPython Notebook:
Chapter 4 – Grid Searches
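A sketch of a grid search over `SVC` parameters with a held-out test set. The `C` values are the ones listed on the slide; the `gamma` values other than 0.001, the digits dataset and `cv=3` are assumptions added for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# Set the test data aside before any parameter tuning happens.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 10],   # values from the slide
    "gamma": [0.001, 0.01, 0.1],      # assumption: only 0.001 is on the slide
}

# Cross-validate every (C, gamma) combination on the training part only.
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X_train, y_train)

print("best parameters:", grid.best_params_)
print("best cross-validation score:", grid.best_score_)

# The test set is touched once, after the parameters are chosen.
test_score = grid.score(X_test, y_test)
print("test score:", test_score)
```

By default `GridSearchCV` refits the best parameter combination on the whole training set, so `grid` can be used like any fitted model afterwards.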
Training Data, Training Labels
Feature Extraction -> Scaling -> Feature Selection -> Model
Cross Validation
IPython Notebook:
Chapter 5 - Preprocessing and Pipelines
Do cross-validation over all steps jointly.
Keep a separate test set until the very end.
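One way to follow both rules is to put every step into a `Pipeline` and grid-search the pipeline as a whole, so scaling and feature selection are refit inside each cross-validation fold. This is a sketch under assumptions: the breast cancer dataset, the step names (`scale`, `select`, `svc`) and the parameter values are illustrative choices, not from the talk.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# Separate test set, kept untouched until the very end.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling -> feature selection -> model, treated as a single estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("svc", SVC()),
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {"select__k": [5, 10, 20], "svc__C": [0.1, 1, 10]}

# Cross-validation runs over all steps jointly: each fold refits the
# scaler and the selector on that fold's training part only.
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

test_score = grid.score(X_test, y_test)
print("best parameters:", grid.best_params_)
print("test score:", test_score)
```

Fitting the scaler or selector on all the data before cross-validating would leak information from the validation folds into training, which is the mistake the pipeline avoids.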
Bag-of-Words Representations
CountVectorizer / TfidfVectorizer
tokenizer
Application: Insult detection
IPython Notebook:
Chapter 6 - Working With Text Data
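A small sketch of the bag-of-words vectorizers named above. The two example sentences are made up for illustration, and the whitespace tokenizer is only one possible choice for the `tokenizer` parameter.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, not from the talk.
docs = ["This is how you get ants.", "You get what you pay for."]

# Default behaviour: lowercase, then keep tokens of two or more word characters.
vect = CountVectorizer()
X_counts = vect.fit_transform(docs)   # sparse matrix: documents x vocabulary
vocab = sorted(vect.vocabulary_)
print(vocab)
print(X_counts.toarray())

# A custom tokenizer replaces the default token pattern; splitting on
# whitespace keeps punctuation attached to the words.
ws_vect = CountVectorizer(tokenizer=str.split, token_pattern=None)
ws_vect.fit(docs)
print(sorted(ws_vect.vocabulary_))

# Same vocabulary, but counts are reweighted by inverse document frequency
# and each row is scaled to unit length.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)
print(X_tfidf.shape)
```

Both vectorizers are transformers, so they can serve as the feature extraction step in front of a classifier.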
Overfitting and Underfitting
Figure: Accuracy (y-axis) against Model complexity (x-axis), with a Training curve and a Generalization curve; regions marked Underfitting, Sweet spot, Overfitting.
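The accuracy-versus-complexity picture can be approximated numerically with `validation_curve`. This is a sketch under assumptions: a decision tree's `max_depth` is used as the complexity knob and the digits dataset as the data, neither of which comes from the slides.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Assumption: tree depth plays the role of "model complexity".
depths = [1, 2, 4, 8, 16]
train_scores, valid_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5,
)

# Average over the folds: one training and one validation value per depth.
mean_train = train_scores.mean(axis=1)
mean_valid = valid_scores.mean(axis=1)
for depth, tr, va in zip(depths, mean_train, mean_valid):
    print(f"max_depth={depth:2d}  training={tr:.3f}  validation={va:.3f}")
```

A widening gap between the training and validation columns as depth grows is the usual sign of overfitting; low values in both columns indicate underfitting.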
Linear SVM
(RBF) Kernel SVM
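A hedged comparison of a linear and an RBF kernel SVM on data that is not linearly separable. The two-moons toy dataset and its parameters are assumptions chosen for illustration; the slides do not say which data the original plots used.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumption: two interleaving half circles as a nonlinear toy problem.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# Linear kernel: the decision boundary is a straight line.
linear_score = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()

# RBF kernel: the boundary can bend around the two shapes.
rbf_score = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()

print(f"linear kernel: {linear_score:.3f}")
print(f"RBF kernel:    {rbf_score:.3f}")
```

The extra flexibility of the RBF kernel is controlled by `C` and `gamma`, which is why those two parameters are the usual targets of a grid search.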
Decision Trees
Random Forests
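A sketch comparing a single decision tree with a random forest, which averages many randomized trees. The digits dataset, `n_estimators=100` and 5-fold cross-validation are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# A single fully grown tree tends to overfit the training data.
tree_score = cross_val_score(
    DecisionTreeClassifier(random_state=0), X, y, cv=5
).mean()

# A forest averages many trees built on bootstrap samples with random
# feature subsets, which reduces the variance of the prediction.
forest_score = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
).mean()

print(f"decision tree: {tree_score:.3f}")
print(f"random forest: {forest_score:.3f}")
```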
Thank you for your attention.
@t3kcit
@amueller