
Python Tips & Tricks for Machine Learning
Predictive Analysis Guide

General Tips

Stop wasting time explicitly importing all the data science libraries you need in your dev environment. Instead, use Pyforest to lazily import them for you only when you need them:

pip install pyforest

Tired of writing for loops to join lists? Use the zip function instead:

Initial data    a = ("John", "Charles", "Mike")
                b = ("For", "Against", "For")
Zip function    x = zip(a, b)
Result          (('John', 'For'), ('Charles', 'Against'), ('Mike', 'For'))
Visualization Tips

Tired of rendering static plots in Jupyter with %matplotlib? Try the %matplotlib notebook magic instead, which gives you interactive plots you can resize and zoom in on.

Want to more easily spot patterns in tabulated data? Create a heatmap using Seaborn's gradient capabilities. For example:

hm = sns.light_palette('green', as_cmap=True)
style = df.style.background_gradient(cmap=hm)
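
A self-contained sketch of the gradient trick, assuming an illustrative numeric DataFrame:

import numpy as np
import pandas as pd
import seaborn as sns

# Illustrative data; any numeric DataFrame works.
df = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'))

# Shade each cell from white to green according to its value.
hm = sns.light_palette('green', as_cmap=True)
style = df.style.background_gradient(cmap=hm)
style  # in Jupyter, the styled table renders inline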

Pandas Tips

Shortcut your initial data analysis with pandas' profiling function, which generates a detailed report of your data (missing values, variable counts, etc.) in just one line of code. For example:

df = pd.read_csv('somedata.csv')
df.profile_report()
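
The profile_report method is not part of core pandas; it is registered on DataFrames by the pandas-profiling package (since renamed ydata-profiling), so a sketch of the full workflow looks like this:

import pandas as pd
import pandas_profiling  # registers the .profile_report() accessor

df = pd.read_csv('somedata.csv')   # path from the example above
report = df.profile_report()
report.to_file('report.html')      # or let it render inline in Jupyter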

Speed up pandas operations with pandarallel. For example, instead of df.progress_apply(), use df.parallel_apply() to run the process in parallel.
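
A minimal sketch, assuming pandarallel is installed (pip install pandarallel):

import pandas as pd
from pandarallel import pandarallel

# Start the worker pool once per session.
pandarallel.initialize(progress_bar=True)

df = pd.DataFrame({'x': range(100_000)})

# Drop-in parallel replacement for df.apply / df.progress_apply.
result = df.parallel_apply(lambda row: row['x'] ** 2, axis=1)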

Need to unstack a table? Use pandas, which can convert one level of an index into the columns of your data frame. For example:

Initial data (trimmed):

state  email_provider
AK     aol.com
       hotmail.cm
       cox.net
       kitty.com
AR     deleo.com
AZ     yahoo.com
       aol.com
       cox.net
       nikolozakes.org
       parvis.com
CA     gmail.com
       cox.net
       aol.com

Unstack code:

clients.groupby('state')['email_provider'].value_counts().unstack().fillna(0)

Result (trimmed):

email_provider  angalich.com  ankeny.org  aol.com
state
ar                       0.0         0.0      2.0
az                       0.0         0.0      0.0
ct                       0.0         0.0      0.0
dc                       0.0         0.0      1.0
fl                       0.0         0.0      0.0
ga                       0.0         0.0      4.0
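
A runnable sketch of the same pattern, with a hypothetical clients frame standing in for the real data:

import pandas as pd

# Hypothetical data mirroring the example above.
clients = pd.DataFrame({
    'state': ['AK', 'AK', 'AZ', 'CA', 'CA'],
    'email_provider': ['aol.com', 'cox.net', 'yahoo.com', 'gmail.com', 'aol.com'],
})

# Count providers per state, pivot providers into columns, fill gaps with 0.
counts = (clients.groupby('state')['email_provider']
                 .value_counts()
                 .unstack()
                 .fillna(0))
print(counts)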

Jupyter Notebook Tips

Need to debug your code in Jupyter notebooks? Just type %debug to launch an interactive debugger that takes you back to the point where the exception happened. Press q to exit.

Computational costs matter. To check the running time of a block of code in a Jupyter notebook, preface the code block with %%time.

Working with Python in a Jupyter Notebook, but wish you had access to R? Now you can run both of them together; just pip install rpy2.
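
A sketch of the typical rpy2 notebook workflow once the package is installed:

# Load the IPython extension in a notebook cell:
%load_ext rpy2.ipython

# Then run R code in any cell with the %%R magic;
# -i pushes a Python object (e.g. a pandas DataFrame) into R:
%%R -i df
summary(df)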
Scikit-Learn Tips

Always use the stratify parameter to ensure the train and test sets preserve the same class proportions, for better prediction and reproducibility of results. For example:

train_x, test_x, train_y, test_y = train_test_split(x, y, random_state=59, stratify=y)

Missing values in your dataset? Don't settle for univariate methods to impute the missing values when scikit-learn's multivariate, k-Nearest-Neighbors-based imputation can offer better accuracy:

impute = KNNImputer(n_neighbors=2)
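
A self-contained sketch using the toy matrix from the scikit-learn docs:

import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# Each missing entry is replaced by the mean of that feature
# across the 2 nearest rows (nan-aware Euclidean distance).
impute = KNNImputer(n_neighbors=2)
X_filled = impute.fit_transform(X)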
Regression

And finally, a simple rule guide for when to apply which regression technique when doing predictive analysis:

Linear       Used to predict the value of a variable (called the dependent variable) based on the value of another variable.
             Y' = bX + A

Stepwise     Used when you have many variables and want to identify a subset of predictors.
             b_j.std = b_j * (s_x / s_y)

Logistic     Used when the dependent variable is a binary value.
             logit(p) = ß0 + ß1x1 + ß2x2

Polynomial   Used for curvilinear data.
             y = ß0 + ß1x + ß2x^2 + ...

Lasso        Best used when you have a small number of significant parameters.
             min (1/N) Σ(i=1..N) ƒ(xi, yi, ß) + λ Σ|ßj|

Ridge        Best used when you have a large number of significant parameters.
             ß = (XᵀX + λI)⁻¹ Xᵀy

ElasticNet   The happy medium between Lasso and Ridge.
             penalty: λ1 Σ(j=1..P) |ßj| + λ2 Σ(j=1..P) ßj²

[Figure: fitted-slope comparison - LinearRegression m = 0.05, Ridge m = 0.02, Lasso m = 0.00, ElasticNet m = 0.00]