
Train - Test - Split Function

This document describes the sklearn.model_selection.train_test_split function, which splits arrays or matrices into random train and test subsets. It takes the data, the labels, and options that control the split, and returns the resulting train and test pieces. Examples show how to split sample data for use in machine learning models.

Uploaded by priyanshu

11/7/2019 sklearn.model_selection.train_test_split — scikit-learn 0.21.3 documentation


sklearn.model_selection.train_test_split

sklearn.model_selection.train_test_split(*arrays, **options)

Split arrays or matrices into random train and test subsets.

Quick utility that wraps input validation, next(ShuffleSplit().split(X, y)), and application to the input
data into a single call for splitting (and optionally subsampling) data in a one-liner.
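
The relationship to ShuffleSplit described above can be sketched as follows. This is a minimal illustration, not the library's internal code: the array contents, the 0.3 test fraction, and the seed are assumptions chosen for the example. With the same integer random_state, the manual index-based split is expected to select the same rows as the one-line call.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

# Illustrative data (assumed for this sketch): 10 samples, 2 features.
X = np.arange(20).reshape((10, 2))
y = np.arange(10)

# Manual route: draw a single shuffled split of row indices, then index the data.
splitter = ShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))
X_train_manual, X_test_manual = X[train_idx], X[test_idx]
y_train_manual, y_test_manual = y[train_idx], y[test_idx]

# One-liner route: validation, index generation, and indexing in a single call.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(X_train_manual.shape, X_test_manual.shape)
print(X_train.shape, X_test.shape)
```

The manual route is mainly useful when the indices themselves are needed, for example to index additional objects later.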

Read more in the User Guide.


Parameters:

*arrays : sequence of indexables with same length / shape[0]
    Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas
    dataframes.

test_size : float, int or None, optional (default=None)
    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset
    to include in the test split. If int, represents the absolute number of test samples. If
    None, the value is set to the complement of the train size. If train_size is also
    None, it will be set to 0.25.

train_size : float, int or None, optional (default=None)
    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset
    to include in the train split. If int, represents the absolute number of train samples. If
    None, the value is automatically set to the complement of the test size.

random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; if
    RandomState instance, random_state is the random number generator; if None, the
    random number generator is the RandomState instance used by np.random.

shuffle : boolean, optional (default=True)
    Whether or not to shuffle the data before splitting. If shuffle=False then stratify must
    be None.

stratify : array-like or None (default=None)
    If not None, data is split in a stratified fashion, using this as the class labels.

Returns:

splitting : list, length=2 * len(arrays)
    List containing train-test split of inputs.

    New in version 0.16: If the input is sparse, the output will be a
    scipy.sparse.csr_matrix. Else, output type is the same as the input type.
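
The interaction of test_size and stratify described in the parameter list can be illustrated with a short sketch. The data below (20 samples with a 15/5 class imbalance) and the seed are assumptions made up for this example; only the documented parameters are used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data (assumed): 20 samples, 15 of class 0 and 5 of class 1.
X = np.arange(40).reshape((20, 2))
y = np.array([0] * 15 + [1] * 5)

# Float test_size: a proportion of the dataset (0.2 * 20 = 4 test samples).
X_tr_f, X_te_f, y_tr_f, y_te_f = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Int test_size: an absolute number of test samples.
X_tr_i, X_te_i, y_tr_i, y_te_i = train_test_split(
    X, y, test_size=5, random_state=0)

# Neither size given: the test split defaults to 0.25 of the data.
X_tr_d, X_te_d, y_tr_d, y_te_d = train_test_split(X, y, random_state=0)

# stratify=y keeps the 75/25 class ratio in both subsets.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(len(y_te_f), len(y_te_i), len(y_te_d))
print(np.bincount(y_tr_s), np.bincount(y_te_s))
```

Without stratify, a small test split of imbalanced data may contain few or no samples of the minority class, which is the situation stratification is meant to avoid.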
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Examples
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]

>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
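
The role of random_state in the example above can be checked with a small sketch: calling the function twice with the same integer seed should give identical splits. The variable names first, second, and same are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape((5, 2))
y = list(range(5))

# Two independent calls with the same integer seed.
first = train_test_split(X, y, test_size=0.33, random_state=42)
second = train_test_split(X, y, test_size=0.33, random_state=42)

# Compare the four returned pieces pairwise (X_train, X_test, y_train, y_test).
same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same)

# With test_size=0.33 and 5 samples, the test split holds 2 samples.
print(len(first[1]), len(first[3]))
```

Leaving random_state at None draws from the global NumPy random state, so repeated runs will generally produce different splits.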

>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]
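
As a further sketch of shuffle=False, the call below splits three arrays at once and keeps the original order, so the test split is simply the tail of each input. The array w is a hypothetical set of per-sample weights added for illustration; it shows that the returned list has length 2 * len(arrays).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative, ordered data (assumed): 8 samples.
X = np.arange(16).reshape((8, 2))
y = np.arange(8)
w = np.linspace(0.1, 0.8, 8)  # hypothetical per-sample weights

# No shuffling: the last 2 samples of every array become the test split.
parts = train_test_split(X, y, w, test_size=2, shuffle=False)
X_train, X_test, y_train, y_test, w_train, w_test = parts

print(len(parts))
print(y_train, y_test)
```

An ordered split like this may be preferable when the rows have a meaningful sequence, such as observations recorded over time.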

Examples using sklearn.model_selection.train_test_split

Faces recognition example using eigenfaces and SVMs
Prediction Latency
Probability Calibration curves
Probability calibration of classifiers
Classifier comparison
Column Transformer with Mixed Types
Effect of transforming the targets in regression model
Comparing random forests and the multi-output meta estimator
Early stopping of Gradient Boosting
Feature transformations with ensembles of trees
Gradient Boosting Out-of-Bag estimates
Pipeline Anova SVM
Comparing various online solvers
MNIST classification using multinomial logistic + L1
Multiclass sparse logistic regression on newgroups20
Early stopping of Stochastic Gradient Descent
Parameter estimation using grid search with cross-validation
Confusion matrix
Receiver Operating Characteristic (ROC)
Precision-Recall
Classifier Chain
Comparing Nearest Neighbors with and without Neighborhood Components Analysis
Dimensionality Reduction with Neighborhood Components Analysis
Restricted Boltzmann Machine features for digit classification
Varying regularization in Multi-layer Perceptron
Using FunctionTransformer to select columns
Importance of Feature Scaling
Map data to a normal distribution
Feature discretization
Understanding the decision tree structure