DL Practical 1: Train-Test Split

Title: Python Code for Train - Test Dataset Split

Aim: Split a Dataset into Train and Test Sets.


Theory:
In machine learning, splitting the dataset into a training set and a test set lets us estimate the
model's performance on new, unseen data. We train the model on the training set and then
evaluate its performance on the test set, which gives us an estimate of how well the model will
perform on data it has not seen before.
Typically, a portion of the dataset is set aside as the test set, while the remaining data is used
to train the model. The ratio of the training set to the test set can vary depending on the
size of the dataset and the problem being solved. A common practice is to use an 80/20 or 70/30
split for training and testing respectively.
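As a rough illustration (a minimal sketch, not part of the required code for this practical), an 80/20 split of ten samples simply means keeping the first eight samples for training and the last two for testing:

# Manual 80/20 split of ten samples (illustrative sketch only;
# the practical itself uses scikit-learn's train_test_split below)
data = list(range(1, 11))             # 10 samples
split_point = int(len(data) * 0.8)    # 80% of 10 samples = 8
train = data[:split_point]            # first 8 samples
test = data[split_point:]             # last 2 samples
print(train)    # [1, 2, 3, 4, 5, 6, 7, 8]
print(test)     # [9, 10]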
NumPy and Pandas:

import numpy as np
import pandas as pd

Both Pandas and NumPy are essential libraries for scientific computation, including machine
learning, thanks to their intuitive syntax and high-performance array and matrix operations.
NumPy arrays facilitate advanced mathematical and other operations on large amounts of data.
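For instance (a small illustrative snippet, not part of the practical's code), a NumPy array supports element-wise arithmetic on the whole array without an explicit Python loop:

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a * 10)      # element-wise multiply: [10 20 30 40 50]
print(a.mean())    # mean of all elements: 3.0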

from sklearn.model_selection import train_test_split

This line imports the 'train_test_split' function from the Scikit-
Learn (sklearn) library's 'model_selection' module.
The 'train_test_split' function is a utility provided by Scikit-Learn to split a dataset
into two separate sets: a training set and a testing set. This is a common technique in machine
learning, where we want to train our model on a portion of the data and evaluate its performance
on the remaining portion.
The train-test split is used to estimate the performance of machine learning algorithms used for
prediction. It is a fast and easy procedure that lets us compare our own model's results against
those of other models. By default, when neither test_size nor train_size is given, the test set
receives 25% of the data and the training set receives the remaining 75%.
Data Splitting:
Scikit-learn (imported as sklearn) is one of the most useful and robust libraries for machine
learning in Python. It provides the model_selection module, which contains the splitter
function train_test_split().

Syntax:

train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
Parameters:
*arrays: the inputs, such as lists, NumPy arrays, data frames, or matrices.
test_size: a float between 0.0 and 1.0 that represents the proportion of the dataset used for
the test set. Its default value is None.
train_size: a float between 0.0 and 1.0 that represents the proportion of the dataset used for
the training set. Its default value is None.
random_state: controls the shuffling applied to the data before the split; it acts as a seed for
the random number generator.
shuffle: whether to shuffle the data before splitting. Its default value is True.
stratify: splits the data in a stratified fashion, preserving class proportions (see the sketch
after this list).
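The stratify sketch mentioned above might look like this (a hypothetical classification example with made-up labels X_cls and y_cls, not the dataset used later in this practical):

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: six samples of class 0 and four of class 1
X_cls = np.arange(10).reshape(-1, 1)
y_cls = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# stratify=y_cls keeps the 60/40 class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_cls, y_cls, test_size=0.5, stratify=y_cls, random_state=0)
print(np.bincount(y_tr))   # [3 2] -> 3 zeros, 2 ones
print(np.bincount(y_te))   # [3 2] -> 3 zeros, 2 ones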
Code:

import numpy as np
from sklearn.model_selection import train_test_split

# Generate some sample data
X = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([5,10,15,20,25,30,35,40,45,50])

# This will split the data into 80% train and 20% test
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
print(x_train)
print(y_train)

Output:
[ 5 4 2 10 3 7 6 1]
[25 20 10 50 15 35 30 5]

# Split the data into training and testing sets.
# The data is NOT shuffled before the split (shuffle=False), so the first
# 60% of the samples go to the training set in their original order.
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.6, shuffle=False)
print(x_train)
print(y_train)

Output:
[1 2 3 4 5 6]
[ 5 10 15 20 25 30]

In this example, we use the same dataset of 10 samples with a single input feature and one output
label. We use the `train_test_split` function to split the data into training and testing sets,
with 60% of the data used for training and the remaining 40% used for testing. We set
`shuffle=False` so the data is not shuffled before splitting; as a result, the first six samples
end up in the training set in their original order. Finally, we print the training arrays to
confirm that the data was split correctly.
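As an optional check (not shown in the original output above), printing the shapes confirms the 6/4 split of the ten samples:

# Optional check of the split sizes (assumes the X, y and 60/40 split above)
print(x_train.shape, x_test.shape)   # (6,) (4,)
print(y_train.shape, y_test.shape)   # (6,) (4,)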

# The data is shuffled before the split, and a random state of 5 is
# set for reproducibility.
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=5)
print(x_train)
print(x_test)

Output:
[5 8 2 1 9 7 4]
[10 6 3]

# It performs a train-test split on the variables X and y, with 60% of
# the data assigned to the training set and 40% to the test set.
# The data is randomly shuffled before the split, and no specific random state is set.
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.6, random_state=None)
print(x_train)
print(y_train)

Output:
[ 1 3 4 9 10 7]
[ 5 15 20 45 50 35]
# It performs a train-test split on the variables X and y, where 70% of
# the data is assigned to the training set and 30% to the test set.
# It will print the train and test data along with their shapes.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
print(x_train)
print(x_train.shape)
print(x_test)
print(x_test.shape)

Output:
[1 7 6 8 5 3 4]
(7,)
[ 2 10 9]
(3,)
`X`: input features
`y`: output labels
`train_size`: proportion of the data used for training (if neither train_size nor test_size is
given, the training set defaults to 0.75 of the data)
`shuffle`: whether to shuffle the data before splitting (default is True)
`random_state`: a seed value for the random number generator used to shuffle and split
the data (default is None)
`stratify`: preserves the proportion of classes in the output labels in both the training and
testing sets (default is None)
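As a brief illustration of random_state (a sketch reusing the X and y defined earlier, not part of the practical's required output), two calls with the same seed return identical splits:

# Same seed -> the same split on every run
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=5)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=5)
print(np.array_equal(a_train, b_train))   # True
print(np.array_equal(a_test, b_test))     # True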

Conclusion:
The train-test split is a common technique used in machine learning to evaluate the
performance of a model. This helps in assessing how well the model generalizes to unseen data.

Experiment Number | Date of Performance | Grade | Teacher's Sign
