ML Project
ML Project
Car Evaluation
Submitted
In Partial Fulfilment of the Requirements for the Award of Degree
BACHELOR OF TECHNOLOGY
IN
Submitted By
SCHOOL OF ENGINEERING
Department of Computer Science Engineering (Artificial
Intelligence Machine Learning)
5. Discussion
5.1 Analysis of Results
5.2 Strengths and Weaknesses
5.3 Model Interpretability
5.4 Reducing Type II Error
6. Conclusion
6.1 Summary of Findings
6.2 Recommendation
6.3 Future Work
1. Introduction
1.1 Context
1.4 Objective
The objective of this study is to evaluate and compare the performance of various
classification algorithms, including Random Forest, Decision Tree, Logistic
Regression, and K-Nearest Neighbors (KNN), on the Car Evaluation dataset. The
goal is to accurately predict the car evaluation based on input features such as
buying price, maintenance cost, number of doors, capacity, luggage boot size, and
safety rating. The study aims to identify the most effective algorithm by assessing
its accuracy, precision, recall, F1-score, and overall ability to generalize to new,
unseen data.
2 Literature Review
2.1Classification Algorithms
1. DataCleaning:
1.1 Checked for missing values (none present in this dataset).
1.2 Removed duplicates, if any.
2. FeatureEncoding:
2.1 Encoded categorical features (e.g., buying price, maintenance cost,
safety rating) using label encoding to convert them into numerical values
for model compatibility.
3. TargetVariableEncoding:
3.1 Converted the car evaluation target variable (unacceptable, acceptable,
good, very good) into numerical categories (0, 1, 2, 3).
4. Train-TestSplit:
4.1 Split the data into training and testing sets (80%-20%) to evaluate
model performance.
3.2PROGRAM:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, confusion_matrix,
classification_report
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
# Load dataset
data = pd.read_csv('car_evaluation.csv')
# Define models
models = {
"Logistic Regression": LogisticRegression(max_iter=1000),
"Decision Tree": DecisionTreeClassifier(random_state=42),
"Random Forest": RandomForestClassifier(random_state=42),
"K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5)
}
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 6))
plt.imshow(cm, cmap='Blues', interpolation='none')
plt.colorbar(label="Number of Samples")
plt.xticks(range(len(cm)), ["Unacc", "Acc", "Good", "V-good"],
rotation=45)
plt.yticks(range(len(cm)), ["Unacc", "Acc", "Good", "V-good"])
plt.title(f"Confusion Matrix - {name}")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.tight_layout()
plt.show()
3.3 Evaluation Metrics
The performance of the model was evaluated using the following metrics:
Accuracy: The percentage of correctly classified instances.
Precision: The proportion of true positive predictions among all positive
predictions made by the model.
Recall: The ability of the model to correctly identify all actual positives.
F1-Score: The harmonic mean of precision and recall, providing a balance
between the two metrics.
Confusion Matrix: A visual representation of actual versus predicted
classifications, helping to analyze the model's performance in detail.
4 Experiments and Results
4.1 Algorithm Comparison
The following classification algorithms were implemented and evaluated:
Each algorithm was trained and tested using the same train-test split and evaluated
using standard metrics: accuracy, precision, recall, F1-score, and confusion
matrix.
4.2 Results
The table below summarizes the performance metrics for each algorithm:
1. Cross-Validation:
o All models underwent 5-fold cross-validation to mitigate the impact
of data splits on performance metrics.
o The average metrics across folds confirmed the stability of the
results, with Random Forest consistently outperforming others.
2. Confusion Matrix Analysis:
o The confusion matrix revealed class-wise performance. Random
Forest had the lowest misclassification rates across all quality levels,
while k-NN struggled to distinguish between medium and high-
quality samples.
3. Significance Testing:
o A paired t-test was performed between the best-performing model
(Random Forest) and others to assess statistical significance.
o The p-value was < 0.05 in most cases, indicating that the
performance differences were statistically significant.
1. HighestAccuracy:
Random Forest achieved the highest accuracy (89.7%), indicating it
classified the most samples correctly across all classes.
2. BalancedPerformance:
The model showed strong performance in terms of precision, recall, and
F1-score (all above 89%), suggesting it effectively handled both false
positives and false negatives, making it robust in predicting wine quality.
3. Stability:
Random Forest, being an ensemble method, is less prone to overfitting
compared to individual models like Decision Trees or k-NN, which can be
sensitive to noise in the data.
4. FeatureHandling:
Random Forest performs well with a mixture of numerical features,
capturing complex relationships between variables (like acidity, alcohol
content, and pH), which is crucial for wine quality prediction.
In summary, Random Forest excels due to its high accuracy, ability to generalize
well, and overall balanced performance across various evaluation metrics, making
it the best choice for this task.
5 Discussion
This section provides an in-depth analysis of the results of the classification
model, highlights the strengths and weaknesses of the algorithm, and
discusses the interpretation of the model in the context of wine quality
prediction
The decision tree and K nearest neighbors were not accurate, the decision tree
was often overfitting and K-NN struggled in classifying medium-high-quality
hands
The results confirm that cluster models such as random forests are well suited for
complex, multi-classification problems, especially when the relationships
between inputs are complex.
The random forest showed robustness in dealing with noise and complex
interactions. Its ability to combine results from multiple decision trees mitigates
overfitting, making it a strong candidate for generalization.
The support vector mechanism worked well within clear resolution limits,
especially at high altitudes, and was effective for a variety of complex objects.
Logistic regression provided a good starting point, whose simple and explainable
nature made for easy understanding and quick estimation.
Weaknesses:
Decision tree models were often overfitting, especially when they had
insufficient data or when the hyperparameters were poorly refined.
K-Nearest Neighbors (k-NN) was sensitive to the choice of k and struggled with
unbalanced data distribution, resulting in imprecise predictions for some classes
SVM can be computationally expensive, especially for large data sets, and may
require optimization of kernel functions to obtain optimal results.
To find the algorithm with the fewest Type II errors (false negatives), we need to
look at recall for each model. The second type of error occurs when a model
incorrectly predicts a negative class (i.e., fails to find a truly positive model).
Recall is a direct part of the metric that measures the actual quality of information
that the model correctly identifies. More residuals show fewer Type II errors.
Based on the results of your tests:
Random forests were the most memorable, that is, they correctly identified the
most true positives and reduced false negatives.
Support Vector Machine (SVM) also performed well in recall, with values close
to those of random forest.
Logistic regression and decision trees recalled less compared to random forest,
indicating more true positive misses (higher error type II).
Neighbors in the immediate vicinity of K remembered little, indicating that it
struggled to find positive information (high errors in procedure).
At the end Random forests showed the lowest type II errors (false negatives)
based on high recall in all positive groups.
6 Conclusion:
This section summarizes the key findings from the experiments, offers
recommendations based on the results, and suggests directions for future work
in wine quality prediction using machine learning algorithms.
6.2 Recommendation
Based on the findings, we recommend the following:
1. Random Forest should be the primary choice for wine quality prediction
due to its superior performance across all evaluation metrics, especially in
terms of minimizing Type II error.
2. For cases where model interpretability is crucial, models like Logistic
Regression or Decision Trees can be used, as they offer clearer insights
into the decision-making process, albeit at the cost of slightly lower
performance.
3. SVM can be considered if computational resources allow and if the dataset
grows in size, as it performed well with high-dimensional data.
6.3 Future Work
Hyperparameter settings:
Key Technologies:
Exploring more advanced models such as deep neural networks (DNNs) could be
useful, especially if larger datasets can be obtained in the future.
Excellent speech:
Although random forest is the most efficient model, its definition can be further
enhanced through techniques such as the SHAP criterion or LIME to better
understand its decision-making process