2. Random Forest Algorithm

Uploaded by

nicolaas.ryota

Random Forest Algorithm (for Crab Age Prediction)

How it Works: Random Forest is an ensemble learning algorithm that builds many decision trees. Each tree is trained
on a bootstrap sample of the training data and considers a random subset of the features at each split. For regression
tasks such as predicting the age of crabs, the forest averages the predictions of all trees.
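As a rough illustration of the averaging step, the sketch below fits a small forest on synthetic data and checks that its prediction matches the mean of its individual trees' predictions. The data, the number of features, and the parameter values are assumptions chosen for the example; none of them come from the crab dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 200 samples, 3 numeric features
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 5.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=200)

# bootstrap=True: each tree is trained on a resampled copy of the rows
# max_features=2: each split considers a random subset of 2 of the 3 features
forest = RandomForestRegressor(
    n_estimators=25, bootstrap=True, max_features=2, random_state=0
)
forest.fit(X, y)

# Average the per-tree predictions by hand for the first 5 samples
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# Compare with the forest's own prediction
forest_prediction = forest.predict(X[:5])
matches = np.allclose(manual_average, forest_prediction)
print(matches)
```

Each row of `per_tree` holds one tree's predictions, so the forest's output is simply the column-wise mean.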

Steps:

1. Collect Data: Gather crab measurements (e.g., size, weight, shell dimensions) together with each crab's age.

2. Preprocess Data: Handle missing data and split the data into training and testing sets.

3. Train Model: Build a Random Forest model using the training data.

4. Evaluate: Use metrics like Mean Absolute Error (MAE) and R² to assess the model’s performance.
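The four steps above can be sketched as a small regression workflow. This is a minimal illustration under stated assumptions: the data is synthetic, the column names `length`, `weight`, and `age` are hypothetical, and missing values are handled by dropping incomplete rows, which is only one of several possible strategies.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Step 1: synthetic stand-in for crab data (hypothetical columns and values)
rng = np.random.default_rng(42)
n = 300
crabs = pd.DataFrame({
    "length": rng.uniform(0.5, 2.0, n),
    "weight": rng.uniform(5.0, 60.0, n),
})
crabs["age"] = 4.0 * crabs["length"] + 0.1 * crabs["weight"] + rng.normal(0.0, 0.5, n)

# Step 2: simulate a few missing values, drop incomplete rows, then split
crabs.loc[[3, 17, 42], "weight"] = np.nan
crabs = crabs.dropna()
X = crabs[["length", "weight"]]
y = crabs["age"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: train a Random Forest regressor on the training set
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Step 4: evaluate on the held-out test set with MAE and R²
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}")
print(f"R²: {r2:.3f}")
```

MAE is expressed in the same unit as the target (here, the age), while R² is the fraction of the target's variance that the model explains.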

Advantages:

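• Can capture complex, non-linear relationships.

• Less prone to overfitting than a single decision tree, because averaging many trees reduces variance. How missing
values are handled depends on the implementation, so imputing or dropping them beforehand is the safe default.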
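The first advantage can be illustrated with a small comparison against a linear model on a deliberately non-linear target. The sine-shaped data below is an assumption made purely for demonstration and is unrelated to the crab dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic non-linear data (assumption): y follows a noisy sine wave
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 4.0 * np.pi, size=500)
X = x.reshape(-1, 1)
y = np.sin(x) + rng.normal(0.0, 0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Fit a straight line and a random forest to the same training data
linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_train, y_train)

# Compare R² on the held-out test set
linear_r2 = r2_score(y_test, linear.predict(X_test))
forest_r2 = r2_score(y_test, forest.predict(X_test))
print(f"Linear regression R²: {linear_r2:.3f}")
print(f"Random forest R²:     {forest_r2:.3f}")
```

A straight line cannot follow the oscillation, whereas the forest approximates it with many piecewise-constant splits.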

CODE
# Import the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load your dataset (replace 'your_dataset.csv' with the actual file path)
dataset = pd.read_csv('your_dataset.csv')

# Assume the last column is the target variable (crab age)
X = dataset.iloc[:, :-1]  # Features (assumed numeric, with no missing values)
y = dataset.iloc[:, -1]   # Target variable

# Note: RandomForestClassifier treats every distinct age as a separate class,
# so this script reports accuracy. To predict age as a continuous value,
# use RandomForestRegressor with MAE and R² instead.

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features. The scaler is fitted on the training set only, so
# no test-set statistics leak into training. (Tree-based models do not require
# scaling, so this step is optional.)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create the Random Forest model with default parameters
model = RandomForestClassifier(random_state=42)

# Hyperparameter grid for GridSearchCV (216 combinations in total)
param_grid = {
    'n_estimators': [50, 100, 200],      # Number of trees
    'max_depth': [None, 10, 20, 30],     # Maximum depth of trees
    'min_samples_split': [2, 5, 10],     # Minimum samples required to split a node
    'min_samples_leaf': [1, 2, 4],       # Minimum samples required at a leaf node
    'bootstrap': [True, False]           # Whether each tree is trained on a bootstrap sample
}

# Set up GridSearchCV with 5-fold cross-validation
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2)

# Fit the GridSearchCV model on the training data
grid_search.fit(X_train, y_train)

# Get the best parameters from the grid search
best_params = grid_search.best_params_
print(f"Best Hyperparameters: {best_params}")

# best_estimator_ has already been refitted on the full training set with the best parameters
best_model = grid_search.best_estimator_

# Predict on the test set
y_pred = best_model.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Random Forest model: {accuracy * 100:.2f}%")

# Print a classification report for more detailed performance analysis
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Perform cross-validation on the full dataset to assess the model's stability.
# The pipeline refits the scaler inside each fold.
cv_scores = cross_val_score(make_pipeline(StandardScaler(), best_model), X, y, cv=5)
print(f"Cross-Validation Accuracy: {cv_scores.mean() * 100:.2f}% ± {cv_scores.std() * 100:.2f}%")

OUTPUT
Accuracy of Random Forest model: 80.00%
