Python-cheatsheet-ML

Python Tips & Tricks for Machine Learning

Stop wasting time explicitly importing all the data science libraries you need to use in your dev environment. Instead, use Pyforest to lazily import them for you only when you need them:
pip install pyforest

Predictive Analysis Guide
And finally, a simple rule guide for when to apply which regression
technique when doing predictive analysis:
Regression Type | Usage | Result

Tired of writing for loops to join lists? Use the zip function instead:
Initial data:
a = ("John", "Charles", "Mike")
b = ("For", "Against", "For")
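A minimal sketch of the zip tip, reusing the tuples a and b from the initial data. The variable name pairs is illustrative and does not come from the original.

```python
# Pair up the two tuples element by element instead of looping over indices.
a = ("John", "Charles", "Mike")
b = ("For", "Against", "For")

# zip returns a lazy iterator, so wrap it in list() to materialize the pairs.
pairs = list(zip(a, b))
print(pairs)
```

If the inputs have different lengths, zip stops at the shortest one, so it is worth checking lengths first when every element matters.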
Standardized coefficient: $b_{j,\mathrm{std}} = b_j \, s_x / s_y$
Logistic | binary value | sigmoid(x)

Visualization Tips

Shortcut your initial data analysis with the profile function from the pandas-profiling package, which summarizes a DataFrame (counts, etc.) in just one line of code. For example:
df.profile_report()

Speed up pandas operations with pandarallel, for example by using its parallel version of apply instead of the plain one.

After an exception, you can launch an interactive debugger that takes you back to the point where the exception happened. Press q to exit.

Need to unstack a table? Use pandas, which can convert one level of an index into columns.
Initial data (trimmed)
state | email_provider
AK | aol.com
   | hotmail.cm
   | cox.net
   | kitty.com
AR | deleo.com
AZ | yahoo.com
   | aol.com
   | cox.net
   | nikolozakes.org
   | parvis.com
CA | gmail.com
   | cox.net
   | aol.com

Code:
clients.groupby('state')['email_provider'].value_counts().unstack().fillna(0)

Result (trimmed): a table indexed by state (ak, ar, ca, co) with one column per email_provider and counts as values (for example 0.0, 1.0, 2.0, 9.0).

Polynomial | Used for curvilinear data | $\beta_0 + \beta_0 x_1 + \dots$
Lasso | Best used when you have a small number of significant parameters | $N^{-1} \sum_{i=1}^{N} f(x_i, y_i, \dots)$
Ridge | ... significant parameters |

Computational costs matter. To check the running time of a block of code in a Jupyter notebook, preface the code block with %%time.

Working with Python in a Jupyter Notebook, but wish you had access to R? Now you can run both of them together by typing:
pip install rpy2

Scikit-Learn Tips

For classification, use the stratify parameter so that the train and test sets keep the same class proportions, which helps prediction quality, and set random_state for reproducible results. For example:
random_state = 59, stratify = y)
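The stratify tip can be sketched as follows, assuming scikit-learn is installed. The toy data (X, y) and test_size=0.25 are assumptions made for illustration; only random_state=59 and stratify=y come from the original fragment.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 80 samples of class 0, 20 of class 1.
X = [[i] for i in range(100)]
y = [0] * 80 + [1] * 20

# stratify=y keeps the 80/20 class ratio in both splits;
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=59, stratify=y
)

print(Counter(y_train))
print(Counter(y_test))
```

Without stratify, a small test set drawn from imbalanced data can end up with very few samples of the minority class, which makes evaluation scores unreliable.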
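For the unstack tip earlier in this section, here is a small runnable sketch, assuming pandas is installed. The contents of the clients DataFrame are hypothetical stand-ins, not the original data; only the chained call comes from the cheatsheet.

```python
import pandas as pd

# Hypothetical stand-in for the clients table; values are illustrative only.
clients = pd.DataFrame({
    "state": ["AK", "AK", "AR", "AZ", "AZ", "AZ"],
    "email_provider": ["aol.com", "cox.net", "cox.net",
                       "aol.com", "aol.com", "yahoo.com"],
})

# value_counts on the grouped column yields a Series with a two-level index
# (state, email_provider). unstack moves the inner level into columns, and
# fillna(0) replaces state/provider pairs that never occur.
counts = (
    clients.groupby("state")["email_provider"]
    .value_counts()
    .unstack()
    .fillna(0)
)
print(counts)
```

The result has one row per state and one column per provider, which is usually easier to scan or plot than the long, two-level form.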
Missing values in your dataset? Don’t settle for univariate methods to fill in the missing values when scikit-learn’s multivariate imputation based on