
Random Forest Explained & Implemented in Python

The document provides an overview of the random forest algorithm and its implementation in Python. It explains that random forest is an ensemble tree-based algorithm that consists of a set of decision trees trained on random subsets of the data. It aggregates the votes from decision trees to make predictions, is highly accurate, can handle missing data, and avoids overfitting. The document also shows code samples to implement random forest for classification and regression problems in Python using scikit-learn.

Uploaded by

Pooja Bhushan

ANALYTICS INDIA MAGAZINE

Cheatsheet
RANDOM FOREST EXPLAINED & IMPLEMENTED IN PYTHON
WHAT IS THE RANDOM FOREST ALGORITHM?

• An ensemble tree-based algorithm. It consists of a set of decision trees, each trained on a random subset of the training data.

• The final class of a test data point is selected by aggregating the votes of all the decision trees.

• Often a highly accurate algorithm; some implementations can even work with missing values.

• It can be used for both classification and regression tasks.

• Overfitting hurts a model's performance on new data. A random forest is much less prone to it than a single decision tree, and adding more trees does not make it overfit more.

HOW DOES IT WORK?

• Choose random samples (with replacement) from the training dataset.

• Generate a decision tree for every sample and collect the prediction from every decision tree.

• Count the votes for each predicted class and pick the prediction with the most votes as the final class prediction. For regression, the trees' predictions are averaged instead.

    #Implementation
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.metrics import accuracy_score, r2_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv('data.csv')
    X = df.drop('class', axis=1)  # feature columns
    y = df['class']               # target column

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42)

    #Classification
    rfcl = RandomForestClassifier()
    rfcl.fit(X_train, y_train)
    y_pred = rfcl.predict(X_test)
    print(accuracy_score(y_test, y_pred))

    #Regression (assumes y holds a numeric target)
    rfr = RandomForestRegressor()
    rfr.fit(X_train, y_train)
    y_pred = rfr.predict(X_test)
    print(r2_score(y_test, y_pred))  # accuracy_score does not apply to regression
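The three steps listed under HOW DOES IT WORK? can also be sketched by hand, without scikit-learn. The following is a minimal illustration under stated assumptions, not how any library implements random forest: one-split "decision stumps" stand in for full decision trees, each stump looks at a single randomly chosen feature, and the function names and toy data are invented for this example.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Step 1: draw len(X) rows at random, with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y, rng):
    """Step 2: fit a one-split 'tree' (a stump) on one random feature.

    Returns (feature, threshold, left_label, right_label).
    """
    feature = rng.randrange(len(X[0]))
    majority = Counter(y).most_common(1)[0][0]
    best_errors, best_stump = None, None
    for threshold in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else majority
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best_errors is None or errors < best_errors:
            best_errors = errors
            best_stump = (feature, threshold, left_label, right_label)
    return best_stump

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_forest(X, y, n_trees=25, seed=0):
    """Train one stump per bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        X_boot, y_boot = bootstrap_sample(X, y, rng)
        forest.append(fit_stump(X_boot, y_boot, rng))
    return forest

def predict_forest(forest, row):
    """Step 3: every tree votes; the class with the most votes wins."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Toy data (invented): class 0 has small feature values, class 1 large ones.
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 1.0], [1.2, 2.5],
     [8.0, 9.0], [9.0, 8.5], [8.5, 9.5], [9.5, 8.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, [0.5, 0.5]))
print(predict_forest(forest, [9.9, 9.9]))
```

A single stump trained on an unlucky bootstrap sample can be wrong, but the majority vote across many stumps smooths such cases out, which is the intuition behind the ensemble.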

www.analyticsindiamag.com
