DL Practical 1 Train - Test - Split

Deep learning train data

Uploaded by

tkalyankar200

Title: Python Code for Train - Test Dataset Split

Aim: Split a Dataset into Train and Test Sets.

Theory:
In machine learning, splitting the dataset into a training set and a test set lets us estimate
the model's performance on new, unseen data. We train the model on the training set and then
evaluate it on the test set, which the model has not seen during training. The test-set score
is therefore an estimate of how well the model will perform on new data.
Typically, a portion of the dataset is set aside as the test set, while the remaining data is used
for training the model. The ratio of the training set to the test set can vary depending on the
size of the dataset and the problem being solved. A common practice is to use an 80/20 or 70/30
split for training and testing, respectively.
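The train-then-evaluate workflow described above can be sketched as follows. This is only an illustrative example: the toy data (y = 3x + 2) and the choice of LinearRegression are assumptions made for the sketch, not part of the practical itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples following y = 3x + 2 (no noise).
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

# Hold out 20% of the samples as the test set (an 80/20 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training set only, then score on the unseen test set.
model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(X_train.shape, X_test.shape)
print(round(test_r2, 3))
```

The score is computed only on samples the model never saw while fitting, which is what makes it an estimate of performance on new data.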
NumPy and Pandas:

import numpy as np
import pandas as pd

Pandas and NumPy are essential libraries for scientific computing in Python, including machine
learning, because of their intuitive syntax and high-performance array computation
capabilities.
NumPy arrays support fast mathematical and other operations on large amounts of data.
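As a small illustration of how the two libraries fit together, the sketch below builds a pandas DataFrame from NumPy arrays, applies a vectorised NumPy operation to one column, and passes the DataFrame to scikit-learn's train_test_split. The column names and values are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical table: one feature column and one label column.
df = pd.DataFrame({
    "feature": np.arange(1, 11),
    "label": np.arange(1, 11) * 10,
})

# Vectorised NumPy arithmetic on a whole column at once.
df["feature_squared"] = np.square(df["feature"].to_numpy())

# train_test_split accepts the DataFrame directly and returns DataFrames.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=0)

print(train_df.shape, test_df.shape)
```

Each row ends up in exactly one of the two DataFrames, and the original row index is kept, so rows can be traced back to the source table.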

from sklearn.model_selection import train_test_split

This code imports the 'train_test_split' function from the 'model_selection' module of the
Scikit-Learn (sklearn) library.
The 'train_test_split' function is a utility function provided by Scikit-Learn to split a dataset
into two separate sets: a training set and a testing set. This is a common technique in machine
learning, where we want to train our model on a portion of the data and evaluate its performance
on the remaining portion.
The train-test split is used to estimate the performance of machine learning algorithms on
prediction tasks. It is a fast and easy procedure, and it lets us compare the results of
different machine learning models on the same held-out data. By default, when neither
test_size nor train_size is given, train_test_split places 25% of the data in the test set
and the remaining 75% in the training set.
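The default proportions can be checked directly. In the sketch below, neither test_size nor train_size is passed, so scikit-learn falls back to its documented default test size of 0.25. The 100-element array is a made-up input used only to make the counts easy to read.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical input: 100 samples, no size arguments given.
data = np.arange(100)
train, test = train_test_split(data, random_state=0)

print(len(train), len(test))
```

To get a 70/30 split instead, pass test_size=0.3 (or train_size=0.7) explicitly.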
Data Splitting:
Scikit-learn (imported as sklearn) is a widely used and robust library for machine learning in
Python. Its model_selection module provides the splitter function train_test_split().

Syntax:

train_test_split(*arrays, test_size=None, train_size=None,
                 random_state=None, shuffle=True, stratify=None)

Parameters:
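*arrays: the inputs to split, such as lists, NumPy arrays, pandas DataFrames, or SciPy sparse
matrices. All inputs must have the same length.
test_size: a float between 0.0 and 1.0 giving the proportion of the data to place in the test
set, or an int giving the absolute number of test samples. Its default value is None; if
train_size is also None, 0.25 is used.
train_size: a float between 0.0 and 1.0 giving the proportion of the data to place in the
training set, or an int giving the absolute number of training samples. Its default value is
None, in which case it is set to the complement of the test size.
random_state: controls the shuffling applied to the data before the split. It acts as a seed;
passing an int gives the same split on every call.
shuffle: whether to shuffle the data before splitting. Its default value is True. If it is
False, stratify must be None.
stratify: if not None, the data is split in a stratified fashion, using this array as the
class labels, so that each class keeps roughly the same proportion in both sets. Its default
value is None.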
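The effect of stratify is easiest to see on imbalanced labels. The sketch below uses a hypothetical dataset with 80 samples of class 0 and 20 of class 1, and checks that the 80:20 class ratio is kept in both the training and the test set. The data and variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

# stratify=y asks for the class proportions of y to be kept in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Count how many samples of each class landed in each set.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print(train_counts, test_counts)
```

Without stratify, a random split of a small or imbalanced dataset can leave the minority class under-represented in the test set.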
Code:

import numpy as np
from sklearn.model_selection import train_test_split

# Create a small sample dataset

X = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([10,20,30,40,50,60,70,80,90,100])

# This will split the data into 80% train and 20% test
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
print(x_train)
print(y_train)

Output (varies between runs because no random_state is set):
[ 5 4 2 10 3 7 6 1]
[ 50 40 20 100 30 70 60 10]

# Split the data into training and testing sets.

# The data is not shuffled (shuffle=False), so the first 60% of the samples go to the training set.
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.6, shuffle=False)
print(x_train)
print(y_train)

Output:
[1 2 3 4 5 6]
[10 20 30 40 50 60]

In this example, we reuse the 10-sample dataset defined above, which has one input feature and
one output label. We use the `train_test_split` function to split the data into training and
testing sets, with 60% of the data used for training and the remaining 40% used for testing. We
set `shuffle=False` to avoid shuffling the data before splitting it, so the training set holds
the first six samples in their original order. Printing the training arrays confirms that the
data was split correctly.

# The data is shuffled before the split, and a random state of 5 is set for reproducibility.
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=5)
print(x_train)
print(x_test)

Output:
[5 8 2 1 9 7 4]
[10 6 3]
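The reproducibility that random_state provides can be checked by running the same split twice. This is a minimal sketch with hypothetical variable names: both calls use the same seed, so they should return identical arrays.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1, 11)

# Two independent calls with the same seed.
first_train, first_test = train_test_split(X, train_size=0.7, random_state=5)
second_train, second_test = train_test_split(X, train_size=0.7, random_state=5)

# True when both calls produced exactly the same train and test arrays.
same_split = (
    np.array_equal(first_train, second_train)
    and np.array_equal(first_test, second_test)
)
print(same_split)
```

With random_state=None, the two calls would generally produce different splits.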

# Split the variables X and y, with 60% of the data assigned to the training set
# and 40% to the test set. The data is randomly shuffled before the split, and
# no specific random state is set, so the result varies between runs.
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.6, random_state=None)
print(x_train)
print(y_train)

Output:
[ 1 3 4 9 10 7]
[ 10 30 40 90 100 70]
# Perform a train-test split on the variables X and y, where 70% of the data is
# assigned to the training set and 30% to the test set.
# Print the train and test data and their shapes.
x_train, x_test, y_train, y_test = train_test_split(X,y, test_size = 0.3)
print(x_train)
print(x_train.shape)
print(x_test)
print(x_test.shape)

Output:
[1 7 6 8 5 3 4]
(7,)
[ 2 10 9]
(3,)
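Arguments used in the examples above:
`X`: input features
`y`: output labels
`train_size`: proportion of data used for training (default is None; if test_size is also None, 75% of the data is used for training)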
`shuffle`: whether to shuffle the data before splitting (default is True)
`random_state`: a seed value for the random number generator used for shuffling and splitting
the data (default is None)
`stratify`: preserve the proportion of classes in the output labels in both the training and testing
sets (default is None)

Conclusion:
The train-test split is a common technique used in machine learning to evaluate the
performance of a model. This helps in assessing how well the model generalizes to unseen data.

Experiment Number | Date of Performance | Grade | Teacher's Sign
