
Scikit-Learn (Cyber-security) Cheat Sheet

by satwik dondapati (sati) via cheatography.com/121228/cs/22124/

Definition

Scikit-learn is an open-source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms through a unified interface.

Pandas functions for importing Data

pd.read_csv(filename)                  From a CSV file
pd.read_excel(filename)                From an Excel file
pd.read_sql(query, connection_object)  Read from a SQL table/database
pd.read_clipboard()                    Takes the contents of your clipboard and passes it to read_table()

Splitting Data

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
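By default 25% of the samples are held out for testing; test_size overrides this. A sketch (the 0.2 value is only an illustrative choice, not from the original):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)   # 20% of rows go to the test set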

Visualization using Scikit-learn

from sklearn.metrics import plot_roc_curve      Importing "plot_roc_curve" to plot
svc_disp = plot_roc_curve(svc, X_test, y_test)  Plotting Receiver Operating Characteristic curve
metrics.plot_confusion_matrix                   Plotting Confusion Matrix

Handling Missing Data

from sklearn.impute import SimpleImputer
missingvalues = SimpleImputer(missing_values=np.nan, strategy='mean')
missingvalues = missingvalues.fit(X[:, 1:3])
X[:, 1:3] = missingvalues.transform(X[:, 1:3])
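fit and transform can also be combined in a single call; a minimal sketch on a toy array (the values are made up for illustration):

import numpy as np
from sklearn.impute import SimpleImputer
X_toy = np.array([[1.0, 2.0], [np.nan, 6.0], [7.0, np.nan]])
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
X_filled = imputer.fit_transform(X_toy)   # each NaN replaced by its column's mean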
Clustering metrics

Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)

Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)

Linear Regression

from sklearn.linear_model import LinearRegression
linear_reg = LinearRegression()
linear_reg.fit(X, y)

Decision Tree and Random forest

from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)
regressor2 = RandomForestRegressor(n_estimators=100, random_state=0)
regressor2.fit(X, y)
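Once fitted, each regressor produces predictions with .predict(); a sketch (X_new is a hypothetical array of unseen samples, not defined in the original):

y_pred_linear = linear_reg.predict(X_new)   # predictions from the linear model
y_pred_tree   = regressor.predict(X_new)    # predictions from the decision tree
y_pred_forest = regressor2.predict(X_new)   # predictions from the random forest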
Pandas Data Cleaning functions

pd.isnull()          Checks for null values, returns a Boolean array
pd.notnull()         Opposite of pd.isnull()
df.dropna()          Drop all rows that contain null values
df.dropna(axis=1)    Drop all columns that contain null values
df.fillna(x)         Replace all null values with x

Numpy Basic Functions

import numpy as np            importing numpy
example = [0,1,2]
example = np.array(example)   array([0, 1, 2])
np.arange(1,4)                array([1, 2, 3])
np.zeros((2,2))               array([[0., 0.], [0., 0.]]), 2*2 array of zeros
np.linspace(0,10,2)           array([ 0., 10.]), two evenly spaced values from 0 to 10
np.eye(2)                     array([[1., 0.], [0., 1.]]), 2*2 identity matrix
example.reshape(3,1)          array([[0], [1], [2]])

Cross-Validation

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
X, y = make_regression(n_samples=1000, random_state=0)
lr = LinearRegression()
result = cross_validate(lr, X, y)
result['test_score']

It is used to know the effectiveness of our models by re-sampling the data and scoring the model on each of the resulting splits.
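In recent scikit-learn versions cross_validate uses 5 folds by default; a sketch with an explicit fold count (cv=10 is just an illustrative choice):

result = cross_validate(lr, X, y, cv=10)   # 10-fold cross-validation
result['test_score'].mean()                # average score across the folds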


Feature Scaling

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

Euclidean distance is dominated by features with larger numeric values, so all features should be brought onto the same scale; hence scaling should be done. Most models do feature scaling by themselves.

Loading Dataset from local Machine

import pandas as pd
data = pd.read_csv(pathname)

If the file is in the local directory then we can directly use the file name.
Loading Data from Standard datasets

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

SVR (Non-linear Regression model)

from sklearn.svm import SVR
regressor = SVR(kernel='rbf')
regressor.fit(X, y)
y_prediction = regressor.predict(values)

Basically, the kernel is selected based on the given problem. If the problem is linear, then kernel='linear'; if the problem is non-linear, we can choose either 'poly' or 'rbf' (Gaussian).
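A sketch of the same estimator with the other kernel choices (degree only matters for 'poly'; 3 is an illustrative value):

linear_svr = SVR(kernel='linear')           # for roughly linear relationships
poly_svr   = SVR(kernel='poly', degree=3)   # polynomial kernel of degree 3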

Encoding Categorical Variables

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
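Newer scikit-learn releases have dropped the categorical_features argument; a rough equivalent sketch using ColumnTransformer (assuming column 0 is the only categorical column):

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('onehot', OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)   # column 0 one-hot encoded, remaining columns passed through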
Some Classification Models

Logistic Regression
K-NN (K-Nearest Neighbours)
Support Vector Machine (SVM)
Naive Bayes
Decision Tree Classification
Random Forest Classification
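All of these follow the same fit/predict pattern; a minimal sketch with logistic regression (X_train, y_train, X_test as produced by train_test_split above):

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)      # learn from the training split
y_pred = classifier.predict(X_test)   # class labels for unseen samples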
Polynomial Regression

from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)

It not only checks the relation between X (independent) and y (dependent), but also with X^2 ... X^n (n is the degree specified by us).
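PolynomialFeatures only expands the feature matrix; a regression model is still fitted on X_poly. A sketch (not part of the original snippet):

from sklearn.linear_model import LinearRegression
poly_model = LinearRegression()
poly_model.fit(X_poly, y)   # linear fit on the polynomially expanded features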
Some Clustering Models

K-Means Clustering
Hierarchical Clustering
DB-SCAN
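A minimal K-Means sketch (the choice of 3 clusters is illustrative):

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=0)
labels = kmeans.fit_predict(X)   # cluster index assigned to each sample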
Evaluation of Regression Model Performance

R² = 1 - SS(residuals)/SS(total)
SS(res)   = SUM( (yᵢ - ŷᵢ)² )
SS(total) = SUM( (yᵢ - y_avg)² )

from sklearn.metrics import r2_score
r2_score(y_true, y_pred)

The greater the R² value, the better the model.
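A toy check of the formula (values made up for illustration):

y_true = [3.0, 2.0, 7.0]
y_pred = [2.5, 2.0, 8.0]
r2_score(y_true, y_pred)   # ~0.91; a perfect fit would give 1.0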
Knowing about Data information with Pandas

df.head(n)      First n rows of the DataFrame
df.tail(n)      Last n rows of the DataFrame
df.shape        Number of rows and columns
df.info()       Index, datatype and memory information
df.describe()   Summary statistics for numerical columns
Converting Dataframe to Matrix

data = pd.read_csv("data.csv")
X = data.iloc[:, :-1].values
y = data.iloc[:, 3].values

X holds the independent features and y is the dependent variable (the file is assumed to have four columns, so column 3 is the last one).

