
Classification of Wine Quality Using Machine Learning Models

i. Team Information
Members:

 Member 1: [Name], ID: [ID], Signature: [Signature]
 Member 2: [Name], ID: [ID], Signature: [Signature]
 Member 3: [Name], ID: [ID], Signature: [Signature]
 Member 4: [Name], ID: [ID], Signature: [Signature]

Abstract
The wine industry relies heavily on maintaining consistent quality standards to meet customer
expectations and ensure brand reputation. This project investigates the use of machine
learning models to automate the process of wine quality classification, thereby reducing
reliance on subjective and time-consuming sensory evaluations.

We applied five machine learning algorithms—Logistic Regression, Decision Trees, Support
Vector Machines (SVM), K-Nearest Neighbors (KNN), and Neural Networks—to predict
wine quality based on physicochemical properties. Evaluation metrics such as accuracy,
precision, recall, F1-score, and ROC-AUC were used to measure performance. Neural
Networks outperformed other models, achieving an accuracy of 80.5%, demonstrating their
effectiveness for complex classification tasks.

ii. Introduction
Context and Importance:

Wine quality significantly influences consumer satisfaction and purchasing decisions in the
competitive beverage industry. Traditional quality assessment methods rely on human tasters,
introducing subjectivity and variability into the process. Machine learning provides an
opportunity to automate and standardize this evaluation, ensuring consistency, reducing costs,
and offering valuable insights into the production process.

Objectives:

1. Develop and train machine learning models to classify wine quality using
physicochemical properties.
2. Evaluate the performance of these models using appropriate metrics.
3. Identify the model best suited for wine quality classification and explore its practical
applications.

This study aims to provide actionable insights into applying machine learning for automated
quality control in wine production.

iii. Background
Machine Learning in Quality Control:

Machine learning is widely used across industries for tasks such as defect detection, anomaly
detection, and classification. In the beverage industry, it has been applied to optimize
production processes, ensure consistent product quality, and enhance customer satisfaction.

Related Work:

1. Studies highlight the success of ensemble models (e.g., Random Forest, Gradient
Boosting) in handling non-linear relationships and noisy datasets.
2. Simple models like Logistic Regression are valued for their interpretability, though
they often underperform in complex tasks.
3. Neural Networks and SVM are well-regarded for their ability to model non-linear
relationships, but they require more computational resources.

This project builds on these findings, balancing interpretability and accuracy to determine the
best approach for wine quality classification.

iv. Dataset
Source and Description:

The dataset used is the Wine Quality Dataset, sourced from the UCI Machine Learning
Repository. It includes 1,599 red vinho verde wine samples from northern Portugal.

Features and Target Variable:

1. Input Features: 11 physicochemical properties, including:
o Fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur
dioxide, total sulfur dioxide, density, pH, sulfates, and alcohol.
2. Target Variable: A wine quality score ranging from 0 to 10, determined by sensory
evaluations (a loading sketch follows this list).
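
The report does not include data-loading code, but a minimal sketch, assuming the standard UCI CSV mirror (a path not stated in the report), pandas, and the semicolon-separated layout the file ships with, could look like this:

import pandas as pd

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "wine-quality/winequality-red.csv")   # assumed mirror path, not given in the report

df = pd.read_csv(URL, sep=";")        # 1,599 rows: 11 features plus "quality"
X = df.drop(columns="quality")        # physicochemical input features
y = df["quality"]                     # sensory quality score (integer)

print(X.shape)
print(y.value_counts().sort_index())  # shows the class imbalance around scores 5 and 6
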
Challenges:

1. Class Imbalance: Quality scores are skewed toward certain values (e.g., 5 and 6),
requiring preprocessing techniques to address imbalance.
2. Multicollinearity: Some features may be highly correlated, potentially impacting
model performance.

Preprocessing Steps:

1. Class Balancing: SMOTE (Synthetic Minority Oversampling Technique) was used to
create synthetic samples for underrepresented classes.
2. Feature Selection: Random Forest identified the 8 most important features to reduce
noise and improve model performance.
3. Scaling: StandardScaler normalized feature values to ensure equal weighting during
model training (a code sketch of these steps follows this list).
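
A minimal sketch of this preprocessing pipeline, assuming scikit-learn and imbalanced-learn and continuing from the loading snippet above. The split ratio, number of trees, and random seeds are illustrative assumptions, not values taken from the report:

from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

# X, y as loaded in the earlier snippet; split ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 1. Rank features with a Random Forest and keep the 8 most important ones.
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
top8 = pd.Series(rf.feature_importances_, index=X_train.columns).nlargest(8).index
X_train, X_test = X_train[top8], X_test[top8]

# 2. Oversample under-represented quality classes on the training split only.
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

# 3. Scale features so every model sees comparably weighted inputs.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
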

v. Methodology
Model Development:

We implemented five models, each tailored to address specific challenges in the dataset (a combined code sketch follows this list):

1. Logistic Regression:
o Used multinomial logistic regression for multi-class classification.
o Hyperparameter tuning was performed using GridSearchCV to optimize
performance.

2. Decision Tree:
o Limited tree depth to prevent overfitting.
o Captured non-linear relationships between features and quality scores.

3. SVM (Support Vector Machine):
o Employed an RBF kernel to handle non-linear decision boundaries.
o Configured probability outputs for ROC analysis.

4. KNN (K-Nearest Neighbors):
o Weighted neighbors by distance to improve classification in overlapping
regions.
o Optimized the number of neighbors (k) to balance bias and variance.

5. Neural Networks:
o Designed a feedforward architecture with two hidden layers (128 and 64
neurons) and ReLU activation.
o Configured dropout layers to reduce overfitting.
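
The following sketch shows how these five models might be configured with scikit-learn, continuing from the preprocessing snippet above. Hyperparameter values are illustrative assumptions except where the report states them (grid search for Logistic Regression, an RBF kernel, distance-weighted KNN, and two hidden layers of 128 and 64 ReLU units):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    # Logistic regression (multinomial by default with the lbfgs solver),
    # regularization strength tuned by grid search.
    "Logistic Regression": GridSearchCV(LogisticRegression(max_iter=1000),
                                        param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5),
    # Depth-limited tree to curb overfitting (the depth value is an assumption).
    "Decision Tree": DecisionTreeClassifier(max_depth=8, random_state=42),
    # RBF kernel with probability outputs enabled for ROC analysis.
    "SVM": SVC(kernel="rbf", probability=True, random_state=42),
    # Distance-weighted neighbors; k would be tuned in practice, 15 is illustrative.
    "KNN": KNeighborsClassifier(n_neighbors=15, weights="distance"),
    # Feedforward network with two hidden layers (128, 64) and ReLU activation.
    # Note: MLPClassifier has no dropout layers; the dropout described in the
    # report would need a framework such as Keras.
    "Neural Network": MLPClassifier(hidden_layer_sizes=(128, 64), activation="relu",
                                    max_iter=500, random_state=42),
}

fitted = {name: model.fit(X_train_s, y_train) for name, model in models.items()}
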
Evaluation Metrics:

1. Accuracy: Proportion of correct predictions.
2. Precision, Recall, and F1-Score: Evaluated the balance between false positives and
false negatives.
3. ROC Curves: Visualized the trade-offs between sensitivity and specificity for each
model (a metric-computation sketch follows this list).
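
A hedged sketch of computing these metrics for each fitted model from the snippet above. Weighted averaging is an assumption, since the report does not state which averaging scheme was used for the multi-class precision, recall, and F1 scores:

from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

for name, model in fitted.items():
    pred = model.predict(X_test_s)
    proba = model.predict_proba(X_test_s)          # used for the multi-class ROC-AUC
    print(f"{name:20s}"
          f" acc={accuracy_score(y_test, pred):.3f}"
          f" prec={precision_score(y_test, pred, average='weighted', zero_division=0):.3f}"
          f" rec={recall_score(y_test, pred, average='weighted'):.3f}"
          f" f1={f1_score(y_test, pred, average='weighted'):.3f}"
          f" auc={roc_auc_score(y_test, proba, multi_class='ovr', average='weighted'):.3f}")
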

vi. Results

Model                 Accuracy   Precision   Recall   F1-Score
Logistic Regression   36.7%      35.6%       36.7%    35.2%
Decision Tree         63.5%      63.3%       63.5%    63.1%
SVM                   72.8%      72.5%       72.8%    72.3%
KNN                   79.9%      79.1%       79.9%    79.3%
Neural Network        80.5%      80.4%       80.5%    80.2%

Key Observations

1. Neural Networks achieved the highest accuracy, followed closely by KNN.
2. Logistic Regression performed poorly due to its linear assumptions.
3. SVM and Decision Trees demonstrated moderate performance.

Visualizations

 Confusion Matrices: Showed that Neural Networks consistently reduced
misclassifications across classes.
 ROC Curves: Highlighted superior performance of Neural Networks and KNN in
distinguishing between classes.

[Figure: Confusion matrix graphs for all models]
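
The figures themselves are not reproduced here, but a sketch of how such confusion matrices and one-vs-rest ROC curves could be generated from the fitted models in the earlier snippets follows. The example class and plot styling are assumptions:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
from sklearn.preprocessing import label_binarize

# One confusion matrix figure per fitted model.
for name, model in fitted.items():
    ConfusionMatrixDisplay.from_estimator(model, X_test_s, y_test)
    plt.title(f"Confusion matrix: {name}")

# One-vs-rest ROC curves for a single example quality class (class 6 is assumed).
classes = sorted(y.unique())
y_test_bin = label_binarize(y_test, classes=classes)
fig, ax = plt.subplots()
for name, model in fitted.items():
    col = list(model.classes_).index(6)            # probability column for class 6
    RocCurveDisplay.from_predictions(y_test_bin[:, classes.index(6)],
                                     model.predict_proba(X_test_s)[:, col],
                                     name=name, ax=ax)
ax.set_title("One-vs-rest ROC curves, quality class 6")
plt.show()
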

vii. Discussion
Analysis of Results:

 Neural Networks captured complex patterns and delivered the best results across all
metrics.
 KNN performed nearly as well, highlighting its suitability for moderately complex
datasets.
 Logistic Regression struggled due to its linear assumptions, highlighting its
limitations for non-linear problems.

Comparison

Metric      Logistic Regression   Decision Tree   SVM     KNN     Neural Network
Accuracy    36.7%                 63.5%           72.8%   79.9%   80.5%
Precision   35.6%                 63.3%           72.5%   79.1%   80.4%

Use Case Benefits:

1. Automated Quality Control: Reduces reliance on subjective human evaluations.
2. Production Insights: Identifies key factors influencing wine quality.
3. Improved Customer Satisfaction: Ensures consistent quality, building trust.

viii. Conclusion
This study demonstrated the effectiveness of machine learning for wine quality classification.
Neural Networks emerged as the top-performing model with an accuracy of 80.5%,
showcasing their potential for complex classification tasks. These findings highlight the
opportunity to integrate machine learning into wine production workflows, automating
quality control and optimizing processes.

ix. Future Work

1. Data Augmentation: Incorporate datasets for white wines and other varieties.
2. Advanced Architectures: Test convolutional and recurrent neural networks for
enhanced performance.
3. Explainability: Use SHAP or LIME to interpret model predictions.
4. Real-Time Applications: Develop predictive tools for real-time quality assessment in
production lines.

x. References
1. "Wine Quality Data Set," UCI Machine Learning Repository,
https://archive.ics.uci.edu/ml/datasets/wine+quality.
2. Géron, A., "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow,"
2nd Edition, O'Reilly Media, 2019.
3. Scikit-learn Documentation, https://scikit-learn.org.
4. Chen, T., & Guestrin, C., "XGBoost: A Scalable Tree Boosting System," Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, 2016.
