Machine Learning Pocket Reference
Working with Structured Data in Python
Matt Harrison
Machine Learning Pocket Reference
by Matt Harrison
Copyright © 2019 Matt Harrison. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].
978-1-492-04754-4
[LSI]
Table of Contents
Preface ix
Chapter 1: Introduction 1
Libraries Used 2
Installation with Pip 5
Installation with Conda 6
Impute Data 25
Normalize Data 27
Refactor 27
Baseline Model 29
Various Families 29
Stacking 31
Create Model 32
Evaluate Model 33
Optimize Model 34
Confusion Matrix 35
ROC Curve 36
Learning Curve 38
Deploy Model 39
Chapter 6: Exploring 55
Data Size 55
Summary Stats 56
Histogram 58
Scatter Plot 59
Joint Plot 60
Pair Grid 63
Box and Violin Plots 64
Comparing Two Ordinal Values 65
Correlation 67
RadViz 71
Parallel Coordinates 73
Penalize Models 100
Upsampling Minority 100
Generate Minority Data 101
Downsampling Majority 101
Upsampling Then Downsampling 103
Precision-Recall Curve 167
Cumulative Gains Plot 169
Lift Curve 171
Class Balance 172
Class Prediction Error 173
Discrimination Threshold 175
Index 295
Machine learning and data science are very popular right now
and are fast-moving targets. I have worked with Python and
data for most of my career and wanted to have a physical book
that could provide a reference for the common methods that I
have been using in industry and teaching during workshops to
solve structured machine learning problems.
This book is what I believe is the best collection of resources and examples for attacking a predictive modeling task if you have structured data. There are many libraries that perform a portion of the tasks required, and I have tried to incorporate those that I have found useful as I have applied these techniques in consulting or industry work.
Many may lament the lack of deep learning techniques. Those could be a book by themselves. I also prefer simpler techniques, and others in industry seem to agree: deep learning is well suited to unstructured data (video, audio, images), while powerful tools like XGBoost excel on structured data.
I hope this book serves as a useful reference for you to solve
pressing problems.
What to Expect
This book gives in-depth examples of solving common structured data problems. It walks through various libraries and models, their trade-offs, how to tune them, and how to interpret them.
The code snippets are meant to be sized such that you can use
and adapt them in your own projects.
TIP
This element signifies a tip or suggestion.
NOTE
This element signifies a general note.
WARNING
This element indicates a warning or caution.
If you feel your use of code examples falls outside fair use or
the permission given above, feel free to contact us at
[email protected].
How to Contact Us
Please address comments and questions concerning this book to the publisher.
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://www.oreilly.com/catalog/9781492047544.
To comment or ask technical questions about this book, send
email to [email protected].
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
Much thanks to my wife and family for their support. I’m
grateful to the Python community for providing a wonderful
language and toolset to work with. Nicole Tache has been
lovely to work with and provided excellent feedback. My technical reviewers, Mikio Braun, Natalino Busa, and Justin Francis, kept me honest. Thanks!
CHAPTER 1
Introduction
Libraries Used
This book uses many libraries. This can be a good thing and a bad thing. Some of these libraries may be hard to install or conflict with other library versions. Do not feel like you need to install all of these libraries. Use “JIT installation” and only install the libraries that you want to use as you need them.
>>> import autosklearn, catboost, \
...     category_encoders, dtreeviz, eli5, fancyimpute, \
...     fastai, featuretools, glmnet_py, graphviz, \
...     hdbscan, imblearn, janitor, lime, matplotlib, \
...     missingno, mlxtend, numpy, pandas, \
...     pandas_profiling, pdpbox, phate, pydotplus, \
...     rfpimp, scikitplot, scipy, seaborn, shap, \
...     sklearn, statsmodels, tpot, treeinterpreter, \
...     umap, xgbfir, xgboost, yellowbrick
To check which versions are installed, loop over the libraries and print each version (the loop prints “Missing” when a library does not expose __version__):
>>> for lib in [
...     autosklearn, catboost, category_encoders,
...     dtreeviz, eli5, fancyimpute, fastai,
...     featuretools, glmnet_py, graphviz, hdbscan,
...     imblearn, janitor, lime, matplotlib,
...     missingno, mlxtend, numpy, pandas,
...     pandas_profiling, pdpbox, phate, pydotplus,
...     rfpimp, scikitplot, scipy, seaborn, shap,
...     sklearn, statsmodels, tpot, treeinterpreter,
...     umap, xgbfir, xgboost, yellowbrick,
... ]:
...     try:
...         print(lib.__name__, lib.__version__)
...     except:
...         print("Missing", lib.__name__)
catboost 0.11.1
category_encoders 2.0.0
Missing dtreeviz
eli5 0.8.2
fancyimpute 0.4.2
fastai 1.0.28
featuretools 0.4.0
Missing glmnet_py
graphviz 0.10.1
hdbscan 0.8.22
imblearn 0.4.3
janitor 0.16.6
Missing lime
matplotlib 2.2.3
missingno 0.4.1
mlxtend 0.14.0
numpy 1.15.2
pandas 0.23.4
Missing pandas_profiling
pdpbox 0.2.0
phate 0.4.2
Missing pydotplus
rfpimp
scikitplot 0.3.7
scipy 1.1.0
seaborn 0.9.0
shap 0.25.2
sklearn 0.21.1
statsmodels 0.9.0
tpot 0.9.5
treeinterpreter 0.1.0
umap 0.3.8
xgboost 0.81
yellowbrick 0.9
NOTE
Most of these libraries are easily installed with pip or
conda. With fastai I need to use pip install
--no-deps fastai. The umap library is installed with pip
install umap-learn. The janitor library is installed
with pip install pyjanitor. The autosklearn library is
installed with pip install auto-sklearn.
I usually use Jupyter for doing an analysis. You can use other notebook tools as well. Note that some, like Google Colab, come with many of the libraries preinstalled (though the versions may be outdated).
Installation with Pip
Before using pip, we will create a sandbox environment to install our libraries into. This is a virtual environment, which we will name env:
$ python -m venv env
NOTE
On Macintosh and Linux, use python3; on Windows, use python. If Windows doesn’t recognize python from the command prompt, you may need to reinstall or fix your install and make sure you check the “Add Python to my PATH” checkbox.
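Once the environment exists, you activate it and install libraries into it with pip. The commands below are a minimal sketch rather than the book’s exact steps; the package names are only examples, and the activation path differs by platform:
$ source env/bin/activate          (macOS and Linux)
C:\> env\Scripts\activate.bat      (Windows)
(env) $ pip install pandas scikit-learn yellowbrick
(env) $ pip freeze > requirements.txt
(env) $ pip install -r requirements.txt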
Installation with Conda
To create a file with the package requirements in it, run:
(env) $ conda env export > environment.yml
To install these requirements in a new environment, run:
(other_env) $ conda env create -f environment.yml
WARNING
Some of the libraries mentioned in this book are not available to install from Anaconda’s repository. Don’t fret. It turns out you can use pip inside of a conda environment (no need to create a new virtual environment), and install these using pip.
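To show how the two fit together, here is a minimal environment.yml sketch; the environment name, channel, and package list are illustrative assumptions, not the book’s. Conda-managed packages go under dependencies, and anything that must come from pip goes in a nested pip: section:
name: ml
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pandas
  - scikit-learn
  - pip
  - pip:
      - yellowbrick
      - pyjanitor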
A typical machine learning project moves through the following steps:
• Business understanding
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment
Figure 2-1. Common workflow for machine learning.
Imports
This example is based mostly on pandas, scikit-learn, and Yellowbrick. The pandas library gives us tooling for easy data munging. The scikit-learn library has great predictive modeling, and Yellowbrick is a visualization library for evaluating models:
>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> from sklearn import (
... ensemble,
... preprocessing,
... tree,
... )
>>> from sklearn.metrics import (
... auc,
... confusion_matrix,
... roc_auc_score,
... roc_curve,
... )
>>> from sklearn.model_selection import (
... train_test_split,
... StratifiedKFold,
... )
>>> from yellowbrick.classifier import (
... ConfusionMatrix,
... ROCAUC,
... )
>>> from yellowbrick.model_selection import (
... LearningCurve,
... )
Ask a Question
In this example, we want to create a predictive model to answer a question. It will classify whether an individual survives the Titanic ship catastrophe based on individual and trip characteristics. This is a toy example, but it serves as a pedagogical tool for showing many steps of modeling. Our model should be able to take passenger information and predict whether that passenger would survive the Titanic.
This is a classification question, as we are predicting a label for survival: either they survived or they died.
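To make the shape of the problem concrete, here is a minimal sketch of such a classifier. It is not the book’s pipeline: it uses seaborn’s bundled Titanic dataset (fetched on first use) and an arbitrary handful of numeric columns, purely to illustrate predicting a survival label from passenger features:
>>> import seaborn as sns
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import train_test_split
>>> # Illustrative data only; the book engineers its features differently.
>>> df = sns.load_dataset("titanic")
>>> X = df[["pclass", "age", "sibsp", "parch", "fare"]].fillna(-1)
>>> y = df["survived"]  # 1 = survived, 0 = died
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.3, random_state=42, stratify=y)
>>> clf = RandomForestClassifier(random_state=42)
>>> clf.fit(X_train, y_train)
>>> clf.score(X_test, y_test)  # accuracy on the held-out passengers
The stratified split keeps the survived/died ratio similar in the training and test sets, which matters when the label is imbalanced.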