Generative AI Mini Projects
Here are some mini-project ideas for generative AI along with brief explanations. I can
provide you with pseudocode or direct links to source code repositories if needed.
A typical tech stack for these apps: Python, Flask (or Django), and HTML/CSS for the UI.
2. Text-to-Image Generator
Description: Create a web app where users input a description, and the app generates
an image using DALL·E or Stable Diffusion.
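As a minimal sketch of the generation step (assuming the Hugging Face diffusers library and an illustrative Stable Diffusion checkpoint, not part of the original outline):

python
# Minimal text-to-image sketch using Hugging Face diffusers (assumed dependency);
# the checkpoint ID is illustrative
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # use a GPU if available; CPU works but is much slower

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("generated.png")

A Flask route would accept the prompt from a form field, run this call, and return the saved image.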
3. AI-Based Story Generator
Description: Generate short stories based on user inputs like genre, theme, and
characters using OpenAI's GPT model.
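A minimal sketch of the generation call, assuming the official openai Python package (v1+) with an API key in the environment; the model name and prompt wording are illustrative:

python
# Story-generation sketch using the openai package (assumed dependency)
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_story(genre, theme, characters):
    prompt = (f"Write a short {genre} story about {theme} "
              f"featuring these characters: {characters}.")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content

print(generate_story("mystery", "a missing heirloom", "a retired detective"))

The same pattern works for the poetry generator below by swapping the prompt template.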
6. Handwritten Digit Generator
Description: Train a GAN to generate realistic handwritten digits (MNIST dataset).
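As a minimal sketch of the generator half (the discriminator and adversarial training loop are omitted; layer sizes are conventional choices, not prescribed by the original list):

python
# Generator for a simple MNIST GAN: maps 100-dim noise to a 28x28 image
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(512, activation="relu"),
        layers.Dense(28 * 28, activation="tanh"),  # pixels scaled to [-1, 1]
        layers.Reshape((28, 28, 1)),
    ])

generator = build_generator()
noise = tf.random.normal([16, 100])
fake_digits = generator(noise)  # 16 candidate digit images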
9. Poetry Generator
Description: Use generative AI to create poetry based on user-specified themes or
keywords.
Let me know which project interests you most, and I can guide you further!
This project involves building a machine learning model using TensorFlow to classify
shopping products based on their images and text descriptions. Here's a step-by-step guide:
1. Project Setup
Requirements
Python 3.x
TensorFlow 2.x
scikit-learn, pandas, NumPy, and matplotlib (all used in the code below)
2. Data Collection
You can use a publicly available dataset like:
Fashion MNIST or one of the Amazon product datasets discussed later in this document.
Your own dataset: Images and descriptions of products labeled with categories.
3. Preprocessing
Image Preprocessing
python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values to [0, 1]
image_gen = ImageDataGenerator(rescale=1.0/255.0)

train_images = image_gen.flow_from_directory(
    "data/train/",
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical'
)
Text Preprocessing
Tokenize and pad the text descriptions using Tokenizer from TensorFlow/Keras.
python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Fit the tokenizer on the raw descriptions, then convert to padded sequences
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(product_descriptions)
sequences = tokenizer.texts_to_sequences(product_descriptions)
padded_sequences = pad_sequences(sequences, maxlen=100, padding='post')
Label Encoding
python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
labels = encoder.fit_transform(categories)
4. Model Architecture
This model combines image and text features for classification.
Image Model
python
# CNN branch for 128x128 RGB product images. The convolutional layers are a
# representative reconstruction; the Flatten/Dense/Dropout tail matches the
# detailed walkthrough later in this document.
image_input = tf.keras.layers.Input(shape=(128, 128, 3))
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')(image_input)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu')(x)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
image_output = tf.keras.layers.Dropout(0.5)(x)
Text Model
python
text_input = tf.keras.layers.Input(shape=(100,))
embedding = tf.keras.layers.Embedding(5000, 128)(text_input)
lstm = tf.keras.layers.LSTM(128)(embedding)
text_output = tf.keras.layers.Dropout(0.5)(lstm)
Combined Model
python
# Merge both branches and add the classification head
num_categories = len(encoder.classes_)

combined = tf.keras.layers.concatenate([image_output, text_output])
x = tf.keras.layers.Dense(128, activation='relu')(combined)
output = tf.keras.layers.Dense(num_categories, activation='softmax')(x)

model = tf.keras.Model(inputs=[image_input, text_input], outputs=output)
5. Training
Compile and train the model using the combined dataset.
python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Illustrative training call; names for the prepared image arrays, padded
# text sequences, and one-hot encoded labels are assumed
history = model.fit([train_image_array, padded_sequences], train_labels,
                    validation_split=0.2, epochs=10, batch_size=32)
6. Evaluation
Evaluate the model on a test set.
python
# Held-out test inputs and labels are placeholder names
loss, accuracy = model.evaluate([test_image_array, test_sequences], test_labels)
print(f"Test accuracy: {accuracy:.3f}")
7. Deployment
Save the model:
python
model.save("product_classifier_model.h5")
Deploy it using TensorFlow Serving or integrate it into a Flask/Django web app for
inference.
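As a sketch of the TensorFlow Serving route (paths and the model name are illustrative): export the model in SavedModel format with model.save("export/product_classifier/1") and serve that versioned directory with the official Docker image:

bash
# Serve the exported SavedModel over REST on port 8501 (illustrative paths)
docker run -p 8501:8501 \
  -v "$(pwd)/export/product_classifier:/models/product_classifier" \
  -e MODEL_NAME=product_classifier \
  tensorflow/serving

Predictions are then available via POST requests to /v1/models/product_classifier:predict.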
8. Example Repository
You can refer to TensorFlow Image and Text Classification for similar implementations.
Sure! Here’s a detailed elaboration of the Auto Classification for Shopping Products using
TensorFlow project. This project involves building a machine learning model that can classify
shopping products into predefined categories based on both product images and text
descriptions.
Project Overview
The goal of this project is to automatically classify products into various categories (such as
"Electronics," "Clothing," "Home Goods," etc.) using a combination of their images and
textual descriptions. By leveraging TensorFlow, you will build a multi-input deep learning
model that processes both types of data (images and text) to predict the product category.
1. Problem Statement
Given a dataset of shopping products with images and descriptions, the model should:
Output: Category of the product (e.g., electronics, clothing, home appliances, etc.).
This is a multi-modal classification problem as the model processes two different types of
data: visual (image) and textual (description).
2. Dataset
Dataset Options:
Fashion MNIST: A dataset containing grayscale images of clothing items (though it does
not have text).
Custom Dataset: You can create your own dataset, including images and text
descriptions of products, each labeled with a category.
Images: Each product should have an image in standard sizes (128x128 or 224x224,
depending on the model used).
Text Descriptions: A short description of the product (e.g., “A pair of blue denim jeans”).
3. Data Preprocessing
Image Preprocessing:
Resize: Since images come in different sizes, resize them to a standard size (e.g.,
128x128 or 224x224).
python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_gen = ImageDataGenerator(rescale=1.0/255.0)
train_images = image_gen.flow_from_directory(
    "data/train/",
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical'  # For multi-class classification
)
Text Preprocessing:
Tokenization: Convert words in the descriptions into tokens (numbers). Use Tokenizer
from TensorFlow/Keras.
Padding: Ensure all descriptions are of equal length using padding. This is important for
LSTM layers.
python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(product_descriptions)
sequences = tokenizer.texts_to_sequences(product_descriptions)
padded_sequences = pad_sequences(sequences, maxlen=100, padding='post')
Label Encoding:
Convert the product categories (labels) into numerical values using LabelEncoder.
python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
labels = encoder.fit_transform(categories)
4. Model Architecture
The model architecture will involve two parts:
1. Image Model: A Convolutional Neural Network (CNN) for processing product images.
2. Text Model: An LSTM-based network for processing the text descriptions.
We will combine the outputs of both models and add a dense layer for classification.
Image Model:
python
# The convolutional head is a representative reconstruction; the tail below
# (Flatten -> Dense -> Dropout) is as given
image_input = tf.keras.layers.Input(shape=(128, 128, 3))
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')(image_input)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu')(x)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
image_output = tf.keras.layers.Dropout(0.5)(x)
Text Model:
python
text_input = tf.keras.layers.Input(shape=(100,))
embedding = tf.keras.layers.Embedding(5000, 128)(text_input)
lstm = tf.keras.layers.LSTM(128)(embedding)
text_output = tf.keras.layers.Dropout(0.5)(lstm)
Combined Model:
Merge both the image and text outputs and add a final classification layer.
python
num_categories = len(encoder.classes_)

combined = tf.keras.layers.concatenate([image_output, text_output])
x = tf.keras.layers.Dense(128, activation='relu')(combined)
output = tf.keras.layers.Dense(num_categories, activation='softmax')(x)

model = tf.keras.Model(inputs=[image_input, text_input], outputs=output)
The softmax activation in the final layer ensures that the output represents the
probability distribution over the categories.
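To make that concrete, here is a tiny numeric check that softmax outputs are non-negative and sum to 1 (the logits are made up):

python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # made-up pre-activation scores
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)        # approx. [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- a valid probability distribution over categories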
5. Compile and Train the Model
python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Training: During training, both the image and text data will be fed into the model, which
will adjust its weights to improve accuracy.
6. Model Evaluation
After training, evaluate the model on a test dataset:
python
# Test-set variable names are placeholders
loss, accuracy = model.evaluate([test_image_array, test_sequences], test_labels)
print(f"Test accuracy: {accuracy:.3f}")
You can visualize the training/validation accuracy using a plot to check how well the
model is generalizing.
python
import matplotlib.pyplot as plt

# `history` is the object returned by model.fit(...)
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()
7. Deployment
After the model is trained and evaluated, it can be saved for deployment.
python
model.save("product_classifier_model.h5")
You can deploy this model on a web server using frameworks like Flask or Django to
create a REST API for inference. This allows users to upload product images and
descriptions to get the predicted category.
8. Future Enhancements
Model Fine-tuning: Fine-tune a pre-trained model (like ResNet50 or InceptionV3) for the image part (see the sketch after this list).
Improved Text Model: Use transformer-based models (like BERT) for better text
understanding.
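As a sketch of the fine-tuning enhancement (not part of the original walkthrough; shapes follow the 224x224 convention used above), the small CNN branch can be replaced with a frozen pre-trained backbone:

python
import tensorflow as tf

# Pre-trained ResNet50 backbone; freeze it so only the new head trains at first
base = tf.keras.applications.ResNet50(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet'
)
base.trainable = False  # unfreeze the top blocks later to fine-tune

image_input = tf.keras.layers.Input(shape=(224, 224, 3))
x = base(image_input, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
image_output = tf.keras.layers.Dense(128, activation='relu')(x)

This branch then drops into the combined model in place of the small CNN defined earlier.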
9. Example Repository
You can look for more complex implementations in public repositories, e.g., the official TensorFlow examples and tutorials.
This project is a great way to combine two different types of data (image and text) to solve a
real-world problem using deep learning. Let me know if you need further details on any
specific part of the implementation!
Here are some Kaggle datasets for Amazon product data you can explore:
10,000 Amazon Products Dataset
Ultimate Amazon Dataset
Amazon Products Dataset 2023
These datasets vary in size and details, so you can choose one that fits your project requirements.
For your Auto Classification for Shopping Products using TensorFlow project, you should
choose a dataset that includes images, text descriptions, and categories for products.
Based on the datasets I explored, here are the recommendations:
Key Considerations
For Prototyping: Start with the 10,000 Amazon Products Dataset to build and test your
pipeline.
For Scaling: Transition to the Ultimate Amazon Dataset for a more extensive dataset.
For Specialized Focus: If text is more critical, the Amazon Products Dataset 2023 might
suit your needs.
For your Auto Classification for Shopping Products using TensorFlow, the 10,000 Amazon
Products Dataset is the most suitable. Here's why:
Contains product descriptions (textual data), which can feed into your LSTM model.
While this dataset might not include images directly, you can enrich it with images scraped from the product URLs or merged in from a separate image dataset (see below).
4. Simpler Preprocessing:
As it’s smaller than massive datasets like the Ultimate Amazon Dataset,
preprocessing time (tokenizing text, resizing images, and encoding categories) will
be faster.
Steps for Your Project:
1. Initial Experimentation:
Use descriptions and category labels to train your text classifier first.
2. Image Enrichment: Once the pipeline works well, enrich the dataset with real product images.
3. Deployment Preparation:
Use this dataset to experiment with the multi-modal TensorFlow model architecture
(combining image and text inputs).
Would you like help with integrating an image dataset or setting up a pre-processing
pipeline?
Now that you've downloaded the dataset for your Auto Classification for Shopping
Products using TensorFlow project, here’s a structured step-by-step plan:
Step 1: Explore the Dataset
Open the dataset and examine its structure (CSV, JSON, etc.).
Image URL (if present): Source for visual data.
Identify and handle missing entries in critical fields (e.g., description, category).
Step 2: Preprocess the Data
1. Tokenization:
Tokenize the product descriptions using a library like Keras Tokenizer or NLTK.
Example in Python:
python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(df['description'])
sequences = tokenizer.texts_to_sequences(df['description'])
padded = pad_sequences(sequences, maxlen=200, padding='post')
2. Image Preprocessing:
Download images and resize them to a consistent size (e.g., 224x224 for models like ResNet).
3. Label Encoding:
python
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
df['category_encoded'] = label_encoder.fit_transform(df['category'])
Step 3: Build the Model
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

num_categories = df['category_encoded'].nunique()

model = Sequential([
    Embedding(input_dim=5000, output_dim=128, input_length=200),
    LSTM(128, return_sequences=True),
    Dropout(0.3),
    LSTM(64),
    Dense(64, activation='relu'),
    Dense(num_categories, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
For the multi-modal version, combine image features (CNN output) and text features (LSTM/GRU output) in a shared dense layer.
Step 4: Train the Model
1. Split Data:
python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    padded, df['category_encoded'].values, test_size=0.2, random_state=42
)

# Standard training call for the model defined above
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=10, batch_size=32)
2. Hyperparameter Tuning:
Experiment with, for example, the embedding dimension, LSTM units, dropout rate, learning rate, batch size, and number of epochs (see the sketch below).
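A minimal sketch of one such experiment, re-compiling with a smaller learning rate and training longer (the values are illustrative, not tuned):

python
from tensorflow.keras.optimizers import Adam

# Try a smaller learning rate and a larger batch size; values are illustrative
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=20, batch_size=64)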
Step 5: Save and Deploy the Model
python
model.save('product_classifier.h5')
Then build an API around the saved model to accept user inputs (e.g., product descriptions or images) and return category predictions.
Would you like detailed guidance on any of these steps? For example, setting up multi-modal
input or API deployment?
Let's start your Auto Classification for Shopping Products using TensorFlow project from
scratch with detailed guidance:
Step 1: Set Up the Environment
Ensure you have Python installed, and set up a virtual environment for this project.
bash
# Create and activate a virtual environment, then install the core packages
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install tensorflow pandas scikit-learn matplotlib
Suggested project structure:
project-folder/
├── data/ # For dataset files
├── images/ # For downloaded images (if using images)
├── models/ # For saving trained models
└── scripts/ # For your Python scripts
Step 2: Load and Clean the Dataset
python
import pandas as pd

# File name is a placeholder; point this at your downloaded dataset
df = pd.read_csv("data/products.csv")
print(df.head())
python
df = df.dropna(subset=['description', 'category'])
print(f"Cleaned dataset: {len(df)} rows")
Step 3: Preprocess the Data
1. Tokenize the Descriptions:
python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(df['description'])
padded = pad_sequences(tokenizer.texts_to_sequences(df['description']),
                       maxlen=200, padding='post')
2. Encode Labels:
python
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
df['category_encoded'] = label_encoder.fit_transform(df['category'])
print(label_encoder.classes_)  # Check the categories
Step 4: Build the Text Model
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

num_categories = df['category_encoded'].nunique()

model = Sequential([
    Embedding(input_dim=5000, output_dim=128, input_length=200),
    LSTM(128, return_sequences=True),
    Dropout(0.2),
    LSTM(64),
    Dense(64, activation='relu'),
    Dense(num_categories, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
Step 5: Train the Model
python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    padded, df['category_encoded'].values, test_size=0.2, random_state=42
)

history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=10, batch_size=32)
Step 6: Evaluate and Save the Model
1. Evaluate Performance:
python
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
2. Save the Model:
python
model.save('models/product_classifier.h5')
Step 7: Add Image Data (Multi-Modal Model)
1. Download Images:
python
import requests
from PIL import Image
from io import BytesIO

# Illustrative helper; assumes the dataset has a column of image URLs
def download_image(url, save_path):
    response = requests.get(url, timeout=10)
    Image.open(BytesIO(response.content)).convert("RGB").save(save_path)
2. Preprocess Images:
Load and normalize image data before passing it to the model.
3. Build the Image Model:
python
# MobileNetV2 backbone (used again in the detailed walkthrough below), frozen
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet'
)
base_model.trainable = False

image_model = Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu')
])
4. Combine the Models:
python
# Merge the two branches with the functional API; x_text and x_image are the
# branch outputs (full functional-API code follows in the next section)
from tensorflow.keras.layers import concatenate

combined = concatenate([x_text, x_image])
x_combined = Dense(128, activation='relu')(combined)
output = Dense(num_categories, activation='softmax')(x_combined)
Step 8: Deploy the Model
1. Serve as an API: wrap the saved model in a Flask endpoint (detailed in the next section).
2. Frontend Input: let users submit a product description and an image, and display the predicted category.
Would you like detailed code for combining text and image models or setting up the API?
yes
Text Model
python
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Dropout, concatenate
from tensorflow.keras.models import Model

# Text branch (reconstructed to be consistent with the combine step below)
text_input = Input(shape=(200,), name="text_input")
x_text = Embedding(input_dim=5000, output_dim=128)(text_input)
x_text = LSTM(128)(x_text)
x_text = Dropout(0.3)(x_text)
Image Model
The image model will handle product images using MobileNetV2, a pre-trained CNN.
python
# Image branch: frozen MobileNetV2 backbone, consistent with the combine step below
image_input = Input(shape=(224, 224, 3), name="image_input")
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base_model.trainable = False
x_image = base_model(image_input)
x_image = tf.keras.layers.GlobalAveragePooling2D()(x_image)
x_image = Dense(128, activation="relu")(x_image)
Combine Models
python
# Combine outputs
combined = concatenate([x_text, x_image])
x_combined = Dense(128, activation='relu')(combined)
output = Dense(num_categories, activation='softmax', name="output")(x_combined)
python
# Define and compile the model
model = Model(inputs=[text_input, image_input], outputs=output)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
Ensure all images are resized to (224, 224) and normalized to pixel values in [0, 1].
python
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def preprocess_image(image_path):
    img = load_img(image_path, target_size=(224, 224))
    img_array = img_to_array(img) / 255.0
    return img_array
3. Split Data:
Split both text and image data into training and testing sets.
python
from sklearn.model_selection import train_test_split

# padded_sequences, image_data, and labels are the aligned arrays prepared above
X_text_train, X_text_test, X_image_train, X_image_test, y_train, y_test = train_test_split(
    padded_sequences, image_data, labels, test_size=0.2, random_state=42
)
4. Train the Model:
python
history = model.fit(
    [X_text_train, X_image_train], y_train,
    validation_data=([X_text_test, X_image_test], y_test),
    epochs=10,
    batch_size=32
)
5. Save the Model:
python
model.save("models/multi_modal_product_classifier.h5")
Set Up an API
Use Flask or FastAPI to create an API endpoint.
1. Install Flask:
bash
pip install flask
2. Create the Prediction Endpoint:
python
from flask import Flask, request, jsonify
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.image import load_img, img_to_array

app = Flask(__name__)
model = load_model("models/multi_modal_product_classifier.h5")
# tokenizer and label_encoder must be the same objects used during training
# (e.g., reloaded from pickle files)

@app.route("/predict", methods=["POST"])
def predict():
    text = request.form["text"]
    image_file = request.files["image"]

    # Preprocess text
    sequence = tokenizer.texts_to_sequences([text])
    padded_sequence = pad_sequences(sequence, maxlen=200)

    # Preprocess image
    img = load_img(image_file, target_size=(224, 224))
    img_array = img_to_array(img) / 255.0
    img_array = np.expand_dims(img_array, axis=0)

    # Make prediction
    prediction = model.predict([padded_sequence, img_array])
    predicted_category = label_encoder.inverse_transform([np.argmax(prediction)])

    return jsonify({"predicted_category": predicted_category[0]})

if __name__ == "__main__":
    app.run(debug=True)
3. Run the Server:
bash
python app.py
Use tools like Postman or cURL to send text and image inputs to the endpoint.
Would you like more help with testing the API or visualizing results?
yes
To test and visualize your multi-modal classification API, you can use tools like Postman,
cURL, or write a Python script to automate the process. Below are detailed steps for both
approaches.
1. Using Postman
1. Install Postman and create a new POST request to http://127.0.0.1:5000/predict.
2. Go to the Body tab, choose form-data, and add a text field named text (a sample description) and a file field named image (a product image).
3. Send the request; the response contains the predicted category.
2. Using cURL
bash
curl -X POST http://127.0.0.1:5000/predict \
  -F "text=Sample product description" \
  -F "image=@path_to_image.jpg"
The server will return a JSON response with the predicted category.
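A successful call returns a JSON body like the following (the category value is illustrative):

json
{"predicted_category": "Electronics"}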
3. Using a Python Script
You can automate the testing with a Python script using the requests library.
bash
pip install requests
python
import requests

url = "http://127.0.0.1:5000/predict"
data = {
    "text": "Sample product description"
}
files = {
    "image": open("path_to_image.jpg", "rb")
}

response = requests.post(url, data=data, files=files)

if response.status_code == 200:
    print("Prediction:", response.json())
else:
    print("Error:", response.text)
Visualizing Results
To better analyze the performance and results, you can create plots for the training history and a confusion matrix.
1. Training and Validation Curves
python
import matplotlib.pyplot as plt

# Plot accuracy
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# Plot loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
2. Confusion Matrix
1. Generate Predictions:
python
from sklearn.metrics import confusion_matrix
import numpy as np

# Make predictions
y_pred = model.predict([X_text_test, X_image_test])
y_pred_classes = np.argmax(y_pred, axis=1)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred_classes)
print(cm)
2. Plot the Matrix:
python
import seaborn as sns

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
Would you like to discuss more visualization techniques, advanced deployment options (like
Docker), or further improvements to the model?
For your multi-modal product classification project, the Amazon Product Dataset is a good
choice, especially since it contains both product descriptions (text) and images, which fits the
needs of your project. There are a few datasets you can explore that offer a combination of
both text and image data for e-commerce products:
Data:
It includes both the product title (text) and images associated with it.
Data:
It may not have images, but you can combine it with image data from other sources for
a hybrid model.
4. DeepFashion
DeepFashion is another excellent dataset if you are focusing on fashion product images
and their textual descriptions.
Data:
Fashion-related images.
Textual descriptions of the items.
If your project is focused on fashion, DeepFashion would be ideal.
eBay Product Dataset is another good choice if you need a broader range of products, though it is more focused on listings and may lack the necessary text information in some cases.
Would you like more specific guidance on how to preprocess these datasets for your model?