Decision Tree Algorithm (for Crab Age Prediction)
How it Works: A Decision Tree splits data into branches based on feature values, forming a tree-like structure. Each
internal node tests one feature (e.g., whether weight is below a threshold), and each leaf node holds a predicted value (e.g., the crab's age).
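To make the idea of a split concrete, here is a minimal sketch of how a single node might pick a threshold on one feature. It assumes a numeric target and uses weighted mean squared error as the impurity; the helper names (mse, best_split) and the toy weights and ages are hypothetical, not taken from a real crab dataset.

```python
def mse(values):
    """Mean squared error of the values around their own mean (node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(feature, target):
    """Return (threshold, weighted_mse) for the best binary split on one feature.

    If no split improves on the unsplit node, the threshold is None.
    """
    pairs = sorted(zip(feature, target))
    best = (None, mse(list(target)))
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical feature values cannot be separated
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [t for f, t in pairs if f <= threshold]
        right = [t for f, t in pairs if f > threshold]
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: crab weight and age (units are illustrative only)
weights = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]
ages = [4, 5, 4, 10, 11, 10]

threshold, score = best_split(weights, ages)
print(f"Best threshold: {threshold}, weighted MSE after split: {score:.3f}")
```

A full tree repeats this search over every feature and then recurses into the left and right groups until a stopping rule (such as a maximum depth) is reached.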
Steps:
1. Collect Data: Gather data on crabs' physical features (e.g., weight, shell length) and their corresponding ages.
2. Preprocess Data: Handle missing data, and split into training and testing sets.
3. Train Model: Build a Decision Tree by recursively splitting the data on the feature and threshold that most reduce
impurity (e.g., MSE for a numeric target such as age, or Gini when the target is a class label).
4. Evaluate: Use metrics like Mean Absolute Error (MAE) and R² on the test set to measure how close the predicted ages are to the true ages.
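Steps 2 through 4 can be sketched for a numeric age target as follows. This is only an illustration under stated assumptions: the data is synthetic (generated with NumPy rather than loaded from a crab dataset), the feature names weight and shell_length are placeholders, and max_depth=4 is an arbitrary choice.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a crab dataset (assumption: age grows with weight)
rng = np.random.default_rng(42)
n = 200
weight = rng.uniform(5, 60, n)
shell_length = 0.5 + 0.03 * weight + rng.normal(0, 0.1, n)
age = 2 + 0.2 * weight + rng.normal(0, 1.0, n)

X = np.column_stack([weight, shell_length])
y = age

# Step 2: split into training and testing sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: train a regression tree; splits minimise squared error by default
model = DecisionTreeRegressor(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# Step 4: evaluate with MAE and R² on the held-out test set
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}")
print(f"R²: {r2:.2f}")
```

Limiting the depth is one common way to keep a tree from memorising the training data; the right value depends on the dataset.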
Advantages:
Easy to interpret and visualize.
Can handle both numerical and categorical data (in scikit-learn, categorical features must first be encoded as numbers).
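Both advantages can be illustrated with a small sketch. It assumes a hypothetical table with a categorical sex column and a numeric weight column (the values are made up), one-hot encodes the categorical column with pandas, and prints the learned rules as text with scikit-learn's export_text.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data; column names and values are illustrative only
data = pd.DataFrame({
    "sex": ["M", "F", "I", "M", "F", "I"],
    "weight": [30.0, 32.0, 10.0, 34.0, 31.0, 12.0],
    "age": [10, 11, 4, 10, 11, 5],
})

# One-hot encode the categorical column so every feature is numeric
X = pd.get_dummies(data[["sex", "weight"]], columns=["sex"], dtype=float)
y = data["age"]

# Fit a shallow tree so the printed rules stay short
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)

# Render the tree as indented if/else rules
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```

Each printed line is either a threshold test on one feature or a leaf value, which is what makes a small tree easy to read and explain.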
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load your dataset (replace 'your_dataset.csv' with your actual file)
dataset = pd.read_csv('your_dataset.csv')
# Features (X) and target variable (y)
X = dataset.iloc[:, :-1].values  # Features (all columns except the target)
y = dataset.iloc[:, -1].values  # Target variable (the last column)
# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the Decision Tree model
# Note: this is a classifier, so each distinct age value is treated as a separate class label
model = DecisionTreeClassifier(random_state=42)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = model.predict(X_test)
# Evaluate the model using accuracy score (the fraction of test samples whose age is predicted exactly)
accuracy = accuracy_score(y_test, y_pred)
# Print the accuracy
print(f"Accuracy of Decision Tree model: {accuracy * 100:.2f}%")
Output:
Accuracy of Decision Tree model: 80.00%