
Random State

The random_state argument in scikit-learn makes results reproducible by seeding the pseudo-random number generator that drives any random step, such as shuffling data before a split or initialising a model. Without random_state, results can differ from run to run. With a fixed integer random_state, the same splits and fitted models are produced across multiple runs. Setting it is preferred for algorithms with a random component, such as k-means clustering and tree-based methods like decision trees and random forests.

Uploaded by

Karan Patni

Random State

In Sklearn’s train_test_split and in many of its algorithms, you will find an argument called ‘random_state’.

Its function is to make the results reproducible.

Sklearn’s train_test_split splits arrays or matrices into random train and test subsets. When you call it without specifying a random_state, you will get a different split every time; this is expected behaviour. random_state controls the shuffling applied to the data before the split is made.

In unsupervised algorithms like KMeans, or tree-based/ensemble methods like decision trees (DT) and random forests (RF), setting random_state is preferred to get reproducible results. Without random_state, these algorithms can give different results on every run.

Train Test Split Example:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
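To see the effect, the call can be repeated with the same seed and the outputs compared. This is a minimal sketch: the toy arrays X and y below are made up for illustration and are not part of the original example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative: 12 samples with 2 features each.
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

# Two calls with the same random_state shuffle the rows identically.
split_a = train_test_split(X, y, test_size=0.33, random_state=42)
split_b = train_test_split(X, y, test_size=0.33, random_state=42)

# Each split is [X_train, X_test, y_train, y_test]; compare them pairwise.
same_seed_matches = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

print("same seed gives same split:", same_seed_matches)
print("train rows:", len(split_a[0]), "test rows:", len(split_a[1]))
```

Changing random_state to another integer, or leaving it out, will usually give a different partition of the same rows.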

KMeans Random State Example:

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=2, random_state=0)
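A short check of what the seed buys you with KMeans: fitting twice with the same random_state should give the same cluster labels and centres. The data below is synthetic and only for illustration, and n_init is set explicitly here as an assumption so that the sketch does not depend on the library's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (illustrative): two loose groups of 2-D points.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(5.0, 1.0, size=(50, 2)),
])

# Same seed for both fits, so the centroid initialisation is identical.
km_a = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
km_b = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels_match = np.array_equal(km_a.labels_, km_b.labels_)
centers_match = np.allclose(km_a.cluster_centers_, km_b.cluster_centers_)

print("labels match:", labels_match)
print("centres match:", centers_match)
```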



Decision Tree and Random Forest Example:

from sklearn import tree
from sklearn.ensemble import RandomForestClassifier

clf = tree.DecisionTreeClassifier(random_state=0)

clf = RandomForestClassifier(random_state=0)
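The same idea can be checked for the tree-based models by fitting each one twice with the same seed. This is a sketch under assumptions: the dataset comes from make_classification, and n_estimators=20 is chosen only to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Decision tree: the seed fixes the order in which features are tried at each split.
tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=0).fit(X, y)
trees_match = np.allclose(tree_a.feature_importances_, tree_b.feature_importances_)

# Random forest: the seed fixes the bootstrap samples and per-split feature subsets.
rf_a = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
forests_match = np.allclose(rf_a.predict_proba(X), rf_b.predict_proba(X))

print("trees match:", trees_match)
print("forests match:", forests_match)
```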

Similarly, you can check each algorithm’s documentation to see whether it accepts random_state and where to enter it.
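Alongside the documentation, one quick way to check from code is to look at an estimator's parameters with get_params(). The helper name accepts_random_state is made up for this sketch, and the estimators listed are just examples.

```python
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier


def accepts_random_state(estimator):
    """Return True if the estimator exposes a random_state parameter."""
    # get_params() returns the constructor parameters as a dict.
    return "random_state" in estimator.get_params()


for est in (KMeans(), DecisionTreeClassifier(), RandomForestClassifier(), LinearRegression()):
    print(type(est).__name__, accepts_random_state(est))
```

Estimators with no random step, such as ordinary least-squares regression, do not take the argument at all.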

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.

