Python For Data Science Cheat Sheet: Scikit-Learn
This cheat sheet summarizes common machine learning tasks in Python with the scikit-learn library. It covers loading and preparing data, then creating and fitting supervised estimators (linear regression, SVM, naive Bayes, KNN) and unsupervised estimators (k-means clustering, PCA). It also covers evaluating models with classification metrics (accuracy, classification report, confusion matrix), regression metrics (mean absolute error, mean squared error, R² score) and clustering metrics (adjusted Rand index, homogeneity, V-measure), and validating them with cross-validation.
PYTHON FOR DATA SCIENCE CHEAT SHEET
Learn Python for Data Science at www.edureka.co
Scikit-learn

Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

A Basic Example
>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Loading The Data
Your data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as Pandas DataFrame, are also acceptable.
>>> import numpy as np
>>> X = np.random.random((10,5))
>>> y = np.array(['M','M','F','F','M','F','M','M','F','F'])
>>> X[X < 0.7] = 0

Training And Test Data
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Create Your Model

Supervised Learning Estimators

Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()

Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')

Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()

KNN
>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)

Unsupervised Learning Estimators

Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)

K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting

Supervised Learning
>>> lr.fit(X, y)                              #Fit the model to the data
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)

Unsupervised Learning
>>> k_means.fit(X_train)                      #Fit the model to the data
>>> pca_model = pca.fit_transform(X_train)    #Fit to data, then transform it

Prediction

Supervised Estimators
>>> y_pred = svc.predict(np.random.random((2,5)))   #Predict labels
>>> y_pred = lr.predict(X_test)                     #Predict labels
>>> y_pred = knn.predict_proba(X_test)              #Estimate probability of a label

Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)                #Predict labels in clustering algos

Evaluate Your Model's Performance

Classification Metrics

Accuracy Score
>>> knn.score(X_test, y_test)                       #Estimator score method
>>> from sklearn.metrics import accuracy_score      #Metric scoring functions
>>> accuracy_score(y_test, y_pred)

Classification Report
>>> from sklearn.metrics import classification_report
>>> print(classification_report(y_test, y_pred))    #Precision, recall, f1-score and support

Confusion Matrix
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))

Regression Metrics

Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)

Mean Squared Error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)

R² Score
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)

Clustering Metrics

Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)

Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)

Cross-Validation
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))

Standardization

Encoding Categorical Features

Tune Your Model

Grid Search
>>> from sklearn.model_selection import GridSearchCV
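The sheet's grid search entry stops at the import, so the following is only a minimal sketch of how GridSearchCV might be applied to the KNN classifier from the basic example. The parameter grid, the use of a pipeline, and the variable names `pipe`, `param_grid` and `grid` are assumptions made for illustration and are not taken from the sheet.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data as the basic example: the first two iris features.
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)

# Putting the scaler in a pipeline means it is refit on the training
# part of every cross-validation fold.
pipe = make_pipeline(StandardScaler(), neighbors.KNeighborsClassifier())

# Hypothetical search space, chosen only for illustration.
# make_pipeline names each step after its lowercased class name.
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7],
    "kneighborsclassifier__metric": ["euclidean", "manhattan"],
}

# Try every combination with 4-fold cross-validation on the training set.
grid = GridSearchCV(pipe, param_grid=param_grid, cv=4)
grid.fit(X_train, y_train)

best_params = grid.best_params_           # best combination found
best_cv_score = grid.best_score_          # mean CV accuracy for that combination
test_score = grid.score(X_test, y_test)   # accuracy of the refit best model on held-out data
print(best_params, best_cv_score, test_score)
```

After fitting, `grid` behaves like any other estimator: `grid.predict(X_test)` uses the best model refit on the whole training set.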