Machine Learning-Lecture 02
Machine Learning-Lecture 02
LECTURE – 02
• To use a dataset in Machine Learning, the dataset is first split into a training
and test set
• Reusing the same data for both training and testing is a bad idea
because we need to know how the method will work on data it was
not trained on.
Types of train-test split
• 50:50 split
• The function takes a loaded dataset as input and returns the dataset
split into two subsets
Train-Test Split Procedure in Scikit-Learn
• Ideally, you can split your original dataset into input (X) and output (y)
columns, then call the function passing both arrays and have them
split appropriately into train and test subsets.
• 0.33 where 33 percent of the dataset will be allocated to the test set
and 67 percent will be allocated to the training set.
Train-Test Split Procedure in Scikit-Learn