Programme 5: Classification with support vector machines (SVM)
Problem Statement
The goal of this case study is to predict the quality of red wine from various physicochemical properties using a Support Vector Machine (SVM). The dataset winequality_red.csv contains features such as fixed acidity, volatile acidity, citric acid, and more, together with a target variable giving the quality rating of the wine on a scale from 0 to 10. Accurate prediction of wine quality can be valuable to wine producers and consumers by helping to assess and ensure quality standards.
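Before building the classifier it is worth checking how the quality ratings are distributed, since this dataset is typically quite imbalanced (mid-range scores dominate); that imbalance is what motivates the balanced class weights used in the programme below. A minimal exploratory sketch, assuming the same comma-separated winequality_red.csv file:

import pandas as pd  # Import pandas to load and summarize the data
data = pd.read_csv('winequality_red.csv')  # Read the dataset (assumed comma-separated, as in the programme below)
print(data['quality'].value_counts().sort_index())  # Count how many wines received each quality rating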
import pandas as pd # Import the pandas library for data manipulation
from sklearn.model_selection import train_test_split  # Import function to split data into training and testing sets
from sklearn.preprocessing import StandardScaler  # Import StandardScaler to standardize features
from sklearn.svm import SVC  # Import Support Vector Classifier for SVM modeling
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score  # Import various metrics to evaluate the model
# Load the dataset
data = pd.read_csv('winequality_red.csv')  # Read the dataset from a CSV file into a DataFrame
# Display the details of the dataset
data.info()  # Print summary information about the DataFrame, including data types and non-null counts of each column
# Features and target variable
X = data.drop('quality', axis=1)  # Drop the 'quality' column to create the feature set (X), which contains all columns except 'quality'
y = data['quality']  # Create the target variable (y), which contains only the 'quality' column
# Preprocessing: Standardize the features
scaler = StandardScaler() # Create an instance of StandardScaler
X_scaled = scaler.fit_transform(X)  # Fit the scaler to the feature data and transform it, standardizing the features
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)  # Split the data into training and testing sets with 30% of the data used for testing and a fixed random seed for reproducibility
# Initialize and train the SVM model with balanced class weights
model = SVC(kernel='rbf', gamma='scale', C=1.0, class_weight='balanced', probability=True)  # Create an instance of SVC with radial basis function kernel, automatic gamma scaling, regularization parameter C set to 1.0, balanced class weights, and probability estimates enabled
model.fit(X_train, y_train) # Train the SVM model using the training data
# Make predictions on the test set
y_pred = model.predict(X_test) # Use the trained model to predict the labels for the test set
# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)  # Calculate the accuracy of the predictions
print(f"\nAccuracy: {accuracy:.4f}")  # Print the accuracy with 4 decimal places
print("\nConfusion Matrix:")
cm = confusion_matrix(y_test, y_pred)  # Compute the confusion matrix to evaluate the performance of the classification
print(cm)  # Print the confusion matrix
precision = precision_score(y_test, y_pred, average=None)  # Calculate precision scores for each class, without averaging
recall = recall_score(y_test, y_pred, average=None)  # Calculate recall scores for each class, without averaging
f1 = f1_score(y_test, y_pred, average=None)  # Calculate F1 scores for each class, without averaging
print("\nPrecision for each clas
for, pin cnumerate(precision): # Iterate through each class's precision score
print(I"Class {i}: {p-4f}") # Print the precision score for cach class with 4 decimal places
print("\nRecall for each class:")
fori,r in enumerate(recall): #f Herate through each class's reeall score:
print(fClass {i}: {r.4f}") # Print the recall score for each class with 4 decimal placesprint("\nF1 Score for each elass:")
for i, fin enumerate(f1): # lterate through each class's FI score
prini(("Class {i}: {£.40}") 4 Print the F1 score for each class with 4 decimal places
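As a follow-up, the same per-class precision, recall, and F1 figures can be produced in a single call with scikit-learn's classification_report; a minimal sketch, assuming the y_test and y_pred variables from the programme above:

from sklearn.metrics import classification_report  # Builds a text summary of per-class precision, recall, F1, and support
print(classification_report(y_test, y_pred, zero_division=0))  # zero_division=0 suppresses warnings for classes with no predicted samples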