Linear Regression

linear regression

Uploaded by

nicolaas.ryota
Copyright
© All Rights Reserved

Linear Regression

Linear Regression is a fundamental and widely used supervised machine learning algorithm for regression tasks. It
assumes a linear relationship between the independent variables (features) and the dependent variable (target).
For a project like Crab Age Prediction, Linear Regression would aim to predict the age of a crab based on physical
features such as weight, length, and shell dimensions.
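
As a minimal sketch of what "linear relationship" means here, the model can be written as follows. The symbols are assumptions for illustration: $x_1, \dots, x_p$ stand for the crab's features, $\hat{y}$ for the predicted age, and $n$ for the number of crabs in the training data. Ordinary least squares, the usual way to fit the model, picks the coefficients that minimize the sum of squared errors.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```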

Application in Crab Age Prediction


In the Crab Age Prediction project, the goal is to predict the crab's age based on its features (e.g., weight, shell
length). Linear Regression could be used if the relationship between the features and age is approximately linear.

Steps to Use Linear Regression for Crab Age Prediction


1. Data Collection:
Collect data on crabs, including their physical measurements (e.g., weight, shell length, width) and their
actual age.
2. Data Preprocessing:
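o Handle missing data by imputation (e.g., replacing missing values with the column mean or median).
o Encode categorical features such as Sex as numeric columns.
o Standardize or normalize features to bring them to a similar scale, which also makes the coefficients comparable.
o Check for multicollinearity (strong correlation between features), as it can make the coefficient estimates unstable.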
3. Model Building:
o Fit the Linear Regression model using a training dataset.
o Use libraries like Scikit-learn in Python for implementation.
4. Evaluation Metrics:
o Mean Absolute Error (MAE): Average of absolute differences between actual and predicted values.
o Mean Squared Error (MSE): Average of squared differences between actual and predicted values.
o R-squared (R²): Proportion of variance in the target variable explained by the model.
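
The preprocessing step above can be sketched roughly as follows. This is an illustration on synthetic data, not the project's actual pipeline: the helper names (impute_median, standardize, vif) and the three feature columns are hypothetical, and the variance inflation factor (VIF) is just one common way to check for multicollinearity.

```python
import numpy as np

def impute_median(X):
    """Replace NaNs in each column with that column's median."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on all the other columns."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(others)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic stand-ins for crab measurements (assumed, not real data)
rng = np.random.default_rng(0)
length = rng.normal(1.3, 0.3, 200)
diameter = 0.8 * length + rng.normal(0, 0.02, 200)  # nearly collinear with length
weight = rng.normal(25, 8, 200)                     # independent of the others
X = np.column_stack([length, diameter, weight])
X[5, 2] = np.nan                                    # one missing weight

X_clean = impute_median(X)
X_scaled = standardize(X_clean)
vifs = vif(X_scaled)
print("VIF per feature:", np.round(vifs, 1))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity. In this sketch the first two columns should land far above that range and the third close to 1.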

Advantages of Linear Regression for Crab Age Prediction


1. Simplicity:
Linear Regression is simple to implement and interpret.
2. Efficiency:
Computationally inexpensive, especially for small or medium-sized datasets.
3. Explainability:
Coefficients (β) provide insights into the contribution of each feature to the prediction.
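
To illustrate the explainability point, here is a small sketch of reading coefficients from a fitted model. The data is synthetic, generated with known coefficients so the fit can be checked, and the feature names weight_z and shell_length_z are hypothetical placeholders for standardized crab measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))  # two standardized, made-up features

# Assumed "true" relationship used to generate the synthetic ages
true_intercept = 10.0
true_coefs = np.array([3.0, 0.5])
y = true_intercept + X @ true_coefs + rng.normal(scale=0.5, size=n)

model = LinearRegression().fit(X, y)

# With standardized features, each coefficient is the change in predicted
# age per one standard deviation of that feature, holding the other fixed.
for name, coef in zip(["weight_z", "shell_length_z"], model.coef_):
    print(f"{name}: {coef:+.2f}")
print(f"intercept: {model.intercept_:.2f}")
```

Because the features are on the same scale, the larger coefficient marks the feature with the stronger influence on the prediction in this toy setup.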

Limitations
1. Linearity Assumption:
Linear Regression assumes a linear relationship between features and target, which may not hold for
crab age data.
2. Sensitivity to Outliers:
Outliers can heavily influence the regression line, leading to inaccurate predictions.
3. Feature Dependency:
Multicollinearity among features can cause instability in coefficient estimates.
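
The sensitivity to outliers can be seen in a toy example. This sketch uses made-up points that lie exactly on a line, then corrupts a single value and refits; np.polyfit with deg=1 is used here only as a compact least-squares line fit.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0  # points exactly on the line y = 2x + 1

# Least-squares fit on the clean data recovers the true slope
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Corrupt one observation and refit
y_outlier = y.copy()
y_outlier[-1] += 50.0
slope_out, intercept_out = np.polyfit(x, y_outlier, deg=1)

print(f"clean slope:        {slope_clean:.2f}")
print(f"slope with outlier: {slope_out:.2f}")
```

One corrupted point out of ten is enough to more than double the fitted slope here, because squared errors give extreme points a large weight.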
CODE
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
file_path = '/mnt/data/CrabAgePredictionDataset.csv'
data = pd.read_csv(file_path)

# One-hot encode the 'Sex' column (categorical to numerical).
# Integer codes from LabelEncoder would impose an arbitrary order on the
# categories, which a linear model would treat as a numeric scale.
data = pd.get_dummies(data, columns=['Sex'], drop_first=True, dtype=float)

# Features and target
X = data.drop('Age', axis=1)
y = data['Age']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features (the scaler is fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train a Linear Regression model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = linear_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print evaluation metrics
print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared Score (R2):", r2)

# Plot predicted vs actual values
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.7, color='blue')
# Dashed red line marks perfect predictions (predicted == actual)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red', linestyle='--')
plt.title('Predicted vs Actual Values')
plt.xlabel('Actual Age')
plt.ylabel('Predicted Age')
plt.grid(True)
plt.show()
• Input: The provided dataset (CrabAgePredictionDataset.csv) is used. It includes:
• Numerical and categorical features (e.g., Length, Sex).
• The target column (Age) for regression.
• Output:
• Metrics:
o Mean Squared Error (MSE): The average squared difference between predicted and actual values;
large errors are penalized more heavily.
o Mean Absolute Error (MAE): The average magnitude of the prediction errors, in the same units as Age.
o R-squared (R²): The proportion of variance in Age that the model explains (values closer to 1 are
better).
• Visualization:
o A scatter plot comparing predicted vs. actual values, with a reference line (ideal prediction).
• Expected Accuracy:
• Lower MSE and MAE indicate smaller prediction errors; both are 0 for a perfect fit.
• An R² closer to 1 means the model explains more of the variance in age.
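
As a worked example of how the three metrics are computed, the sketch below evaluates them by hand on four made-up age values. The numbers are invented for illustration and are not results from the crab dataset.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])  # made-up actual ages
y_pred = np.array([2.5, 5.0, 8.0, 9.0])  # made-up predictions

errors = y_true - y_pred

mae = np.mean(np.abs(errors))            # average absolute error
mse = np.mean(errors ** 2)               # average squared error

ss_res = np.sum(errors ** 2)                    # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot                      # share of variance explained

print(f"MAE={mae} MSE={mse} R2={r2}")  # MAE=0.375 MSE=0.3125 R2=0.9375
```

These definitions are the ones behind mean_absolute_error, mean_squared_error, and r2_score in Scikit-learn, so the same inputs should give the same values there.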
