Python For Data Science Cheat Sheet: Scikit-Learn Create Your Model Evaluate Your Model's Performance
This document summarizes key machine learning workflows in Python using the scikit-learn library. It covers loading and preparing data, and fitting supervised and unsupervised models such as linear regression, KNN, SVM, k-means clustering and PCA. It also covers evaluating model performance with classification metrics (accuracy, confusion matrix) and regression metrics (mean squared error, R² score), and using cross-validation to validate models.
PYTHON FOR DATA SCIENCE CHEAT SHEET
Learn Python for Data Science at www.edureka.co
Scikit-learn
Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example
>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Loading The Data
Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as Pandas DataFrame, are also acceptable.
>>> import numpy as np
>>> X = np.random.random((11, 5))  # 11 rows, one per label in y
>>> y = np.array(['M','M','F','F','M','F','M','M','F','F','F'])
>>> X[X < 0.7] = 0

Training And Test Data
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Create Your Model

Supervised Learning Estimators
Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()  # normalize=True was removed in scikit-learn 1.2; scale features with StandardScaler instead
Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')
Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()
KNN
>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)

Unsupervised Learning Estimators
Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)
K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting
Supervised learning
>>> lr.fit(X, y)  # Fit the model to the data
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)
Unsupervised Learning
>>> k_means.fit(X_train)  # Fit the model to the data
>>> pca_model = pca.fit_transform(X_train)  # Fit to data, then transform it

Prediction
Supervised Estimators
>>> y_pred = svc.predict(np.random.random((2,5)))  # Predict labels
>>> y_pred = lr.predict(X_test)  # Predict labels
>>> y_pred = knn.predict_proba(X_test)  # Estimate probability of a label
Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)  # Predict labels in clustering algos

Evaluate Your Model’s Performance

Classification Metrics
Accuracy Score
>>> knn.score(X_test, y_test)  # Estimator score method
>>> from sklearn.metrics import accuracy_score  # Metric scoring functions
>>> accuracy_score(y_test, y_pred)
Classification Report
>>> from sklearn.metrics import classification_report
>>> print(classification_report(y_test, y_pred))  # Precision, recall, f1-score and support
Confusion Matrix
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))

Regression Metrics
Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)
Mean Squared Error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)
R² Score
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)

Clustering Metrics
Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)
Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)
V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)

Cross-Validation
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))

Standardization
Encoding Categorical Features

Tune Your Model
Grid Search
>>> from sklearn.model_selection import GridSearchCV
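The Grid Search entry stops at the import, so here is a minimal sketch of how a search might look. It assumes a recent scikit-learn release, where GridSearchCV lives in sklearn.model_selection, and reuses the iris/KNN setup from A Basic Example. The parameter grid and the cv=4 setting are illustrative assumptions, not values from the cheat sheet.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import GridSearchCV, train_test_split

# Same data setup as the basic example: first two iris features only.
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)

# Hypothetical search space (an assumption for illustration):
# 3 neighbor counts x 2 weighting schemes = 6 candidate settings.
param_grid = {
    "n_neighbors": [3, 5, 7],
    "weights": ["uniform", "distance"],
}

# Every candidate is scored with 4-fold cross-validation on the training set.
grid = GridSearchCV(neighbors.KNeighborsClassifier(), param_grid, cv=4)
grid.fit(X_train, y_train)

best_params = grid.best_params_          # best-scoring parameter combination
best_cv_score = grid.best_score_         # its mean cross-validated accuracy
test_score = grid.score(X_test, y_test)  # accuracy of the refit best model on held-out data

print(best_params, best_cv_score, test_score)
```

By default GridSearchCV refits the best candidate on the full training set, which is why grid.score and grid.predict can be called directly after fitting.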