Report
Learning Models
i. Team Information
Members:
Abstract
The wine industry depends on consistent quality standards to meet customer
expectations and protect brand reputation. This project investigates machine
learning models that automate wine quality classification, reducing
reliance on subjective and time-consuming sensory evaluation.
ii. Introduction
Context and Importance:
Wine quality significantly influences consumer satisfaction and purchasing decisions in the
competitive beverage industry. Traditional quality assessment methods rely on human tasters,
introducing subjectivity and variability into the process. Machine learning provides an
opportunity to automate and standardize this evaluation, ensuring consistency, reducing costs,
and offering valuable insights into the production process.
Objectives:
1. Develop and train machine learning models to classify wine quality using
physicochemical properties.
2. Evaluate the performance of these models using appropriate metrics.
3. Identify the model best suited for wine quality classification and explore its practical
applications.
This study aims to provide actionable insights into applying machine learning for automated
quality control in wine production.
iii. Background
Machine Learning in Quality Control:
Machine learning is widely used across industries for tasks such as defect detection, anomaly
detection, and classification. In the beverage industry, it has been applied to optimize
production processes, ensure consistent product quality, and enhance customer satisfaction.
Related Work:
1. Studies highlight the success of ensemble models (e.g., Random Forest, Gradient
Boosting) in handling non-linear relationships and noisy datasets.
2. Simple models like Logistic Regression are valued for their interpretability, though
they often underperform in complex tasks.
3. Neural Networks and Support Vector Machines (SVMs) are well regarded for modeling
non-linear relationships, but they require more computational resources.
This project builds on these findings, balancing interpretability and accuracy to determine the
best approach for wine quality classification.
iv. Dataset
Source and Description:
The dataset used is the Wine Quality Dataset from the UCI Machine Learning
Repository. It contains 1,599 red wine samples from Portugal, each described by 11
physicochemical measurements and a quality score. Two characteristics of the data
affect modeling:
1. Class Imbalance: Quality scores are skewed toward certain values (e.g., 5 and 6),
requiring preprocessing techniques to address imbalance.
2. Multicollinearity: Some features may be highly correlated, potentially impacting
model performance.
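Both issues can be checked before any modeling. The following is a minimal sketch in plain Python, assuming the quality labels and feature columns are already loaded as lists. The helper names (`class_weights`, `pearson`, `correlated_pairs`), the 0.8 correlation threshold, and the toy values are illustrative assumptions, not the project's actual preprocessing.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Toy values for illustration only; they are NOT taken from the wine dataset.
toy_quality = [5, 5, 5, 5, 6, 6, 6, 7]
toy_features = {
    "feat_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "feat_b": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2],  # roughly 2 * feat_a
    "feat_c": [5, 1, 4, 2, 8, 3, 7, 6],                      # weakly related
}

print(Counter(toy_quality))            # class distribution
print(class_weights(toy_quality))      # rarer classes receive larger weights
print(correlated_pairs(toy_features))  # candidate multicollinear pairs
```

Inverse-frequency weighting is only one way to counter imbalance, and resampling is another; the sketch shows the weighting formula because it needs no extra libraries.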
Preprocessing Steps:
v. Methodology
Model Development:
We implemented five models, each tailored to address specific challenges in the dataset:
1. Logistic Regression:
o Used multinomial logistic regression for multi-class classification.
o Hyperparameter tuning was performed using GridSearchCV to optimize
performance.
2. Decision Tree:
o Limited tree depth to prevent overfitting.
o Captured non-linear relationships between features and quality scores.
5. Neural Networks:
o Designed a feedforward architecture with two hidden layers (128 and 64
neurons) and ReLU activation.
o Configured dropout layers to reduce overfitting.
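The configurations listed above can be sketched with scikit-learn as follows. This is a hedged illustration rather than the project's code: the synthetic data from `make_classification`, the search grid `C_GRID`, the depth cap `MAX_DEPTH`, and the `alpha` value are all assumptions. scikit-learn's `MLPClassifier` does not provide dropout, so the sketch substitutes its L2 penalty as the regularizer; reproducing the dropout layers described above would need a framework such as Keras.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data (assumption): 11 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_redundant=2,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Logistic regression tuned with GridSearchCV.
# With the default lbfgs solver, recent scikit-learn versions fit a
# multinomial (softmax) model when the target has more than two classes.
C_GRID = [0.01, 0.1, 1.0, 10.0]  # hypothetical search grid
log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(
    log_reg, {"logisticregression__C": C_GRID}, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# Decision tree with a depth cap to limit overfitting.
MAX_DEPTH = 5  # hypothetical value
tree = DecisionTreeClassifier(max_depth=MAX_DEPTH, random_state=0)
tree.fit(X_train, y_train)

# Feedforward network with hidden layers of 128 and 64 ReLU units.
# alpha (L2 penalty) stands in for dropout, which MLPClassifier lacks.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(128, 64), activation="relu",
                  alpha=1e-3, max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)

print("best C:", search.best_params_["logisticregression__C"])
for name, model in [("logistic regression", search),
                    ("decision tree", tree),
                    ("neural network", mlp)]:
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Scaling is placed inside each pipeline so that cross-validation folds are standardized using training data only. The accuracies printed for the synthetic data say nothing about performance on the wine dataset.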
Evaluation Metrics:
vi. Results
Key Observations
Visualizations
vii. Discussion
Analysis of Results:
Neural Networks captured complex patterns and delivered the best results across all
metrics.
KNN performed nearly as well, which suggests that it suits moderately complex
datasets.
Logistic Regression struggled because it assumes a linear decision boundary, which
limits it on non-linear problems.
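The gap between a linear and a non-linear classifier can be illustrated on synthetic data. The sketch below is not the project's experiment: it assumes a toy XOR-style problem in which the class depends on the sign of the product of two features, so no single straight line separates the classes. The sample size, the seed, and `n_neighbors=5` are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# XOR-style toy data: class 1 when both features share the same sign.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A linear decision boundary cannot separate the four quadrants.
linear_acc = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# KNN classifies by local neighborhoods, so it can follow the quadrant layout.
knn_acc = (
    KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).score(X_test, y_test)
)

print(f"logistic regression accuracy: {linear_acc:.3f}")
print(f"KNN accuracy:                 {knn_acc:.3f}")
```

On data like this, logistic regression is expected to stay near chance level while KNN recovers most of the structure. Real wine data is less extreme, but the same mechanism explains why a purely linear model can lag behind.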
Comparison
viii. Conclusion
This study demonstrated that machine learning can classify wine quality effectively.
Neural Networks were the top-performing model, with an accuracy of 80.5%. These
findings point to an opportunity to integrate machine learning into wine production
workflows in order to automate quality control and optimize processes.
x. References
1. "Wine Quality Data Set," UCI Machine Learning Repository,
https://archive.ics.uci.edu/ml/datasets/wine+quality.
2. Géron, A., "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow,"
2nd Edition, O'Reilly Media, 2019.
3. Scikit-learn Documentation, https://scikit-learn.org.
4. Chen, T., & Guestrin, C., "XGBoost: A Scalable Tree Boosting System," Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, 2016.