Scikit Learn Cheat Sheet
Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

Importing Data with pandas
pd.read_csv(filename)                    From a CSV file
pd.read_excel(filename)                  From an Excel file
pd.read_sql(query, connection_object)    Read from a SQL table/database
pd.read_clipboard()                      Takes the contents of your clipboard and passes it to read_csv()

Visualization using Scikit-learn

Splitting Data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
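As a minimal sketch of the split above, the following runs train_test_split on a small toy array. The data, the explicit test_size=0.25 (which is also the default when no size is given) and the stratify argument are assumptions made for illustration, not part of the original sheet.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 20 samples, 2 features, balanced binary labels
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 25% of the rows; stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=7, stratify=y
)

print(X_train.shape, X_test.shape)  # (15, 2) (5, 2)
```

Fixing random_state makes the split reproducible between runs.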
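NumPy Array Functions
np.zeros((2,2))         array([[0., 0.], [0., 0.]])
np.linspace(0,10,2)     array([ 0., 10.]), gives two evenly spaced values from 0 to 10 (both endpoints included)
np.eye(2)               array([[1., 0.], [0., 1.]]), 2*2 Identity Matrix
example.reshape(3,1)    array([[0], [1], [2]]), with example = np.array([0, 1, 2])

Loading Dataset from local Machine
import pandas as pd
data = pd.read_csv(pathname)
If the file is in the current working directory, the file name alone can be used as the path.

Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
Euclidean distance is dominated by the features with larger numbers, so scaling should be done to bring all the values onto the same scale. The scaler is fitted on the training set only and then applied unchanged to the test set. Some models, such as tree-based ones, are largely insensitive to feature scale, but scikit-learn estimators generally do not scale features by themselves, so scale explicitly for distance-based or kernel-based models.

Cross-Validation
It is used to estimate the effectiveness of our models by re-sampling the data and fitting and evaluating the model on a different split in each iteration.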
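The re-sampling idea in this sheet is commonly carried out as k-fold cross-validation. The sketch below assumes the iris dataset bundled with scikit-learn, a K-NN classifier and 5 folds, all chosen here for illustration rather than taken from the original. Wrapping the scaler and the model in a pipeline means the scaler is re-fitted on the training folds of each iteration, so no information from the held-out fold leaks into the scaling.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset shipped with scikit-learn (an assumption for this sketch)
X, y = load_iris(return_X_y=True)

# Scale first, then classify; the pipeline is cloned and re-fitted per fold
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# One accuracy score per fold
scores = cross_val_score(model, X, y, cv=5)
print(scores)
print(scores.mean())
```

The mean of the fold scores is usually reported together with their spread.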
SVR(Non-linear Regression model)
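SVR is one of the models that is sensitive to feature scale, so it is a natural place to apply a scaler explicitly. The following is only a sketch under stated assumptions: the sine-shaped toy data, the RBF kernel and the default hyperparameters are illustrative choices, not part of the original sheet.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy non-linear data (hypothetical): y = sin(x) on [0, 5]
X = np.linspace(0, 5, 50).reshape(-1, 1)
y = np.sin(X).ravel()

# Scale the feature, then fit a support vector regressor with an RBF kernel
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
svr.fit(X, y)

# R^2 on the training data, only as a sanity check of the fit
score = svr.score(X, y)
print(score)
```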
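Some Classification Models
Logistic Regression
K-NN (K-Nearest Neighbours)
Support Vector Machine (SVM)
Naive Bayes
Decision Tree Classification
Random Forest Classification

Encoding Categorical Data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])  # integer-encode column 0
# categorical_features was removed in scikit-learn 0.22; select the column with ColumnTransformer instead
onehotencoder = ColumnTransformer([("onehot", OneHotEncoder(), [0])], remainder="passthrough", sparse_threshold=0)
X = onehotencoder.fit_transform(X)  # dense array, one-hot columns first

Polynomial Regression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)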
It models not only the relation between X (independent) and y (dependent), but also the relation between y and the powers X^2 ... X^n (n is the degree specified by us).

Some Clustering Models
K-Means Clustering
Hierarchical Clustering
DBSCAN

Evaluation of Regression Model Performance
R^2 = 1 - SS(residuals)/SS(total)
where SS(residuals) = sum((y - y_pred)^2) and SS(total) = sum((y - mean(y))^2)
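To connect the R^2 formula with polynomial regression, the sketch below fits a degree-2 model and computes R^2 both by hand and with r2_score. The synthetic quadratic data, the noise level and the variable names ss_res, ss_tot and r2_manual are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data with a little noise (hypothetical)
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 2.0 * X[:, 0] ** 2 - X[:, 0] + 1.0 + rng.normal(scale=0.5, size=40)

# Expand X into the columns 1, X, X^2 and fit an ordinary linear model on them
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
lin_reg = LinearRegression().fit(X_poly, y)
y_pred = lin_reg.predict(X_poly)

# R^2 = 1 - SS(residuals) / SS(total)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The library function should agree with the manual computation
r2_sklearn = r2_score(y, y_pred)
print(r2_manual, r2_sklearn)
```

An R^2 close to 1 means the residual sum of squares is small relative to the total variation in y.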
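data = pd.read_csv("data.csv")
X = data.iloc[:, :-1].values  # every column except the last
y = data.iloc[:, 3].values    # column at index 3
X holds the independent variables and y is the dependent variable. Index 3 assumes the target is the fourth column, which is also the last one in a four-column file.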
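The column selection can be tried without a file on disk. In this sketch the CSV text and its column names are made up for illustration, and the target is selected with index -1, which is equivalent to index 3 for a four-column table.

```python
import io

import pandas as pd

# Hypothetical four-column dataset held in memory instead of a file
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""
data = pd.read_csv(io.StringIO(csv_text))

X = data.iloc[:, :-1].values  # independent variables: all columns but the last
y = data.iloc[:, -1].values   # dependent variable: the last column

print(X.shape, y.shape)  # (3, 3) (3,)
```

Using -1 for the target keeps the selection correct if feature columns are added later, as long as the target stays last.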