AIML Lab Week 8 Set-2
Task Instructions:
Dataset Download:
• Fashion MNIST Dataset: The task is to classify fashion items (10
categories) using grayscale images from the Fashion MNIST dataset.
• Dataset URL: https://www.kaggle.com/datasets/zalando-research/fashionmnist
Data Exploration:
1. Load the dataset into a DataFrame df_train for training data and df_test
for testing data.
o The Fashion MNIST dataset is available via libraries such as
keras.datasets.fashion_mnist.
2. Print statistics such as the mean, median, and standard deviation for
each feature.
o Use functions like df.describe() for numerical statistics.
o Investigate the distribution of pixel values.
3. Check for missing values:
o Verify whether there are missing values in the training or testing set.
If any are found, handle them accordingly (e.g., impute or drop them).
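The exploration steps above can be sketched as follows. This is a minimal example, assuming the dataset is loaded through keras.datasets.fashion_mnist as the handout suggests (the loader downloads the data on first use); the DataFrame names df_train and df_test match the task description.

```python
import pandas as pd
from keras.datasets import fashion_mnist

# Load the dataset; Keras returns it already split into train and test sets.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Put the flattened 28x28 images (784 pixel columns) into DataFrames,
# as the task sheet asks.
df_train = pd.DataFrame(x_train.reshape(len(x_train), -1))
df_test = pd.DataFrame(x_test.reshape(len(x_test), -1))

# Per-feature statistics (count, mean, std, quartiles) for a few pixel columns.
print(df_train.describe().iloc[:, :5])
print("median of pixel 400:", df_train[400].median())

# Missing-value check: the pixel arrays are dense uint8, so none are expected.
print("missing in train:", df_train.isna().sum().sum())
print("missing in test:", df_test.isna().sum().sum())
```

Looking at df_train.describe() also reveals the pixel-value distribution: most border pixels are near 0, while central pixels span the full 0-255 range.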
School of Computer Science Engineering and Technology
Data Preparation:
1. Preprocess Data:
o Normalize the data: Scale the pixel values to a range between 0
and 1 by dividing the image pixel values by 255.
o Reshape the images: The images are 28x28 pixels. Instead of
flattening, reshape them into (28,28,1) to maintain spatial features.
o Encode the labels: The target variable (fashion item) should be
one-hot encoded for classification.
2. Split the data into training (df_train) and testing (df_test) sets,
ideally with an 80-20 split (this can be done using train_test_split
from sklearn).
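The preparation steps above can be illustrated with a short sketch. To keep it self-contained it uses a small synthetic batch of uint8 images as a stand-in for the real Fashion MNIST arrays; in the lab you would apply the same lines to the arrays returned by fashion_mnist.load_data(). The one-hot encoding is done here with np.eye, which is equivalent to keras.utils.to_categorical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Fashion MNIST arrays: 200 grayscale 28x28 images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=200)

# 1. Normalize: scale pixel values to [0, 1] by dividing by 255.
X = images.astype("float32") / 255.0

# 2. Reshape to (28, 28, 1) so Conv2D layers keep the spatial structure.
X = X.reshape(-1, 28, 28, 1)

# 3. One-hot encode the 10 class labels.
y = np.eye(10, dtype="float32")[labels]

# 4. 80-20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (160, 28, 28, 1) (40, 28, 28, 1)
```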
Performance Evaluation:
1. Evaluate the performance of the model using:
o Accuracy (most common metric for classification).
o Confusion matrix to evaluate how well the model predicted the
fashion items.
2. Compare the performance of models by varying the number of filters in
convolutional layers and kernel size. For example, try different
configurations (e.g., 32, 64, 128 filters) and compare the results.
3. Analyze the model for overfitting:
o If overfitting occurs, consider adding Dropout layers or using the
EarlyStopping callback to mitigate it.
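One way to organize the filter/kernel comparison is to wrap model construction in a function whose arguments are the knobs the task asks you to vary. The sketch below is one reasonable architecture, not a prescribed one; the function name build_cnn and its defaults are illustrative. It also shows the Dropout layer and EarlyStopping callback mentioned above.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(filters=(32, 64), kernel_size=3, dropout=0.3):
    """Small Fashion MNIST CNN. `filters` and `kernel_size` are the
    hyperparameters to vary, e.g. (32,), (32, 64), (64, 128)."""
    model = keras.Sequential([layers.Input(shape=(28, 28, 1))])
    for f in filters:
        model.add(layers.Conv2D(f, kernel_size,
                                activation="relu", padding="same"))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dropout(dropout))  # regularization against overfitting
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",  # labels are one-hot
                  metrics=["accuracy"])
    return model

# EarlyStopping halts training once validation loss stops improving;
# pass it as callbacks=[early_stop] to model.fit(...).
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model = build_cnn(filters=(32, 64, 128))
model.summary()
```

After training, accuracy comes from model.evaluate, and the confusion matrix from sklearn.metrics.confusion_matrix applied to the true labels and model.predict(X_test).argmax(axis=1).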
Visualization:
1. Plot the training vs. validation accuracy and loss over epochs to
visually assess how well the model is learning.
2. Visualize the predictions:
o Show a sample of test images and their predicted vs. actual
labels.
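Both plots can be produced with matplotlib. To stay self-contained, the sketch below uses a hand-written history dictionary and random images as stand-ins; in the lab you would pass the .history attribute of the object returned by model.fit and real test images/predictions instead. The class names are the standard Fashion MNIST label names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for model.fit(...).history (assumption: replace with real values).
history = {
    "accuracy":     [0.75, 0.85, 0.88, 0.90, 0.91],
    "val_accuracy": [0.78, 0.84, 0.86, 0.87, 0.87],
    "loss":         [0.70, 0.42, 0.33, 0.28, 0.25],
    "val_loss":     [0.60, 0.45, 0.38, 0.36, 0.37],
}
epochs = range(1, len(history["loss"]) + 1)

# 1. Training vs. validation accuracy and loss over epochs.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(epochs, history["accuracy"], label="train")
ax1.plot(epochs, history["val_accuracy"], label="validation")
ax1.set_title("Accuracy"); ax1.set_xlabel("epoch"); ax1.legend()
ax2.plot(epochs, history["loss"], label="train")
ax2.plot(epochs, history["val_loss"], label="validation")
ax2.set_title("Loss"); ax2.set_xlabel("epoch"); ax2.legend()
fig.savefig("training_history.png")

# 2. Sample test images with predicted vs. actual labels.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
rng = np.random.default_rng(0)
images = rng.random((8, 28, 28))          # stand-in for x_test samples
true = rng.integers(0, 10, 8)             # stand-in for y_test
pred = rng.integers(0, 10, 8)             # stand-in for model predictions
fig2, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, img, t, p in zip(axes, images, true, pred):
    ax.imshow(img, cmap="gray")
    ax.set_title(f"P: {class_names[p]}\nA: {class_names[t]}", fontsize=6)
    ax.axis("off")
fig2.savefig("sample_predictions.png")
```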
Additional Tasks:
1. Experiment with different numbers of convolutional layers and filter
sizes. Compare performance by changing the number of filters in each
layer.
2. Hyperparameter tuning: Experiment with different configurations of
the model (e.g., number of filters, kernel sizes, activation functions)
and tune the model using techniques such as GridSearchCV or
RandomizedSearchCV from scikit-learn, or Keras Tuner's RandomSearch.
3. Evaluate using Cross-validation:
o Implement K-fold cross-validation to assess the model’s
consistency and generalizability.
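The K-fold pattern looks like the sketch below. To keep it fast and self-contained it uses synthetic data and a scikit-learn LogisticRegression as a placeholder; for the actual assignment you would iterate over the Fashion MNIST training arrays and rebuild + refit the CNN inside the loop so each fold starts from fresh weights.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the flattened training data (300 samples, 784 pixels).
rng = np.random.default_rng(0)
X = rng.random((300, 784))
y = rng.integers(0, 10, 300)

# Stratified folds keep the 10 class proportions similar in every fold.
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X, y), start=1):
    clf = LogisticRegression(max_iter=200)   # placeholder: rebuild CNN here
    clf.fit(X[train_idx], y[train_idx])
    acc = clf.score(X[val_idx], y[val_idx])
    scores.append(acc)
    print(f"fold {fold}: accuracy {acc:.3f}")

# A low standard deviation across folds indicates consistent generalization.
print(f"mean accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```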
Deliverables:
1. Submit the complete Python code used for implementing the assignment.
o Code for data preprocessing, model definition, training,
evaluation, and visualization.
2. Provide a detailed report summarizing the following:
o Findings from data exploration and preprocessing.
o Performance metrics (accuracy, loss, confusion matrix) for the
model.
o Visualizations of training history, predictions, and confusion
matrix.