Text-to-Image using Stable Diffusion HuggingFace Model
Models available through HuggingFace apply advanced machine learning techniques to a wide range of applications, from natural language processing to computer vision. They now also include models that generate images directly from text descriptions, most prominently Stable Diffusion. In this article, we will explore how to use the Stable Diffusion XL base model to transform textual descriptions into vivid images.
Prerequisites
- diffusers: A library from HuggingFace for diffusion models, commonly used for generative tasks such as text-to-image generation.
- invisible_watermark: This library embeds and detects invisible watermarks in digital images, which is useful for copyright protection (a short illustrative sketch follows the install commands below).
Install the prerequisites:
pip install diffusers
pip install invisible-watermark transformers accelerate safetensors
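The article installs invisible_watermark because the SDXL tooling can use it to mark generated images, so you normally do not call it yourself. The following is only a minimal, standalone sketch of the library itself; the file name sample.png and the watermark text are placeholder assumptions, not part of the original tutorial.
Python
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

# Read an existing image (placeholder file name) in BGR order, as OpenCV expects
bgr = cv2.imread("sample.png")

# Embed a short byte string as an invisible watermark using the DWT-DCT method
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", "sdxl".encode("utf-8"))
bgr_marked = encoder.encode(bgr, "dwtDct")
cv2.imwrite("sample_marked.png", bgr_marked)

# Recover the watermark later; the length is given in bits (4 bytes = 32 bits)
decoder = WatermarkDecoder("bytes", 32)
watermark = decoder.decode(bgr_marked, "dwtDct")
print(watermark.decode("utf-8"))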
Stable Diffusion XL Base Model for Text-to-Image
The Stable Diffusion XL base model is an advanced version of the popular Stable Diffusion model, designed for generating high-quality images from textual descriptions. This model is part of the broader category of diffusion models, which have gained significant attention for their ability to produce detailed and coherent images.
Implementing Stable Diffusion XL Base Model To Generate Images From Text
1. Using Diffusers Library
We will implement the code in Google Colab for computational efficiency.
The steps are as follows:
- Imports: DiffusionPipeline from diffusers for handling the diffusion model components, and torch for tensor operations and device management.
- Model Initialization: Loads the "stabilityai/stable-diffusion-xl-base-1.0" model using DiffusionPipeline.from_pretrained(), sets the tensor data type to torch.float16 for reduced memory usage, enables safetensors for secure tensor serialization, and selects the model variant optimized for float16 operations.
- Device Configuration: Transfers the model pipeline to the GPU with pipe.to("cuda") for faster processing.
- Prompt Setting: Defines the text prompt "An astronaut riding a green horse".
- Image Generation: Generates an image from the prompt and extracts the first image from the output batch with .images[0].
Python
from diffusers import DiffusionPipeline
import torch

# Load the Stable Diffusion XL base model with half-precision weights
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

# Move the pipeline to the GPU for faster inference
pipe.to("cuda")

prompt = "An astronaut riding a green horse"

# Generate an image and take the first result from the output batch
image = pipe(prompt=prompt).images[0]
image
Output:
[Output image: an astronaut riding a green horse, generated by the model]
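As a follow-up to the snippet above (reusing the pipe object it creates), the sketch below shows a couple of commonly adjusted generation parameters and how to save the result locally; the parameter values and file name are illustrative assumptions rather than settings from the original article.
Python
# More steps trade speed for detail; guidance_scale controls how closely the
# image follows the prompt (both values here are illustrative)
image = pipe(
    prompt="An astronaut riding a green horse",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

# Save the generated PIL image to disk (placeholder file name)
image.save("astronaut_green_horse.png")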
2. Implementation using HuggingFace Inference API
We can also use the HuggingFace Inference API by following these steps:
- Navigate to the model page on the HuggingFace website: Stable Diffusion XL Base Model.
- Click on the "Deploy" button on the model page.
- Select "Inference API" from the options provided.
- Copy the generated code snippet in your desired language.
Here's how to incorporate the code into the implementation:
Python
import requests
import io
from PIL import Image
from IPython.display import display

# Inference API endpoint for the Stable Diffusion XL base model
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
# Replace the placeholder below with your own HuggingFace access token
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxx"}

def query(payload):
    # Send the prompt to the Inference API and return the raw image bytes
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

image_bytes = query({
    "inputs": "Dog playing",
})

# Convert the image bytes to a PIL image
image = Image.open(io.BytesIO(image_bytes))

# Display the image
display(image)
Output:
[Output image: a dog playing, generated via the Inference API]
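The Inference API returns raw image bytes on success, but it can also respond with an error status, for example 503 while the model is still loading. As a hedged sketch, the variation below (reusing API_URL and headers from the snippet above, with illustrative retry settings) adds basic status checking:
Python
import time
import requests

def query_with_retry(payload, retries=3, wait_seconds=20):
    # Retry while the model is still loading (HTTP 503); fail fast on other errors
    for _ in range(retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code == 200:
            return response.content
        if response.status_code == 503:
            time.sleep(wait_seconds)
            continue
        response.raise_for_status()
    raise TimeoutError("Model did not become ready in time")

image_bytes = query_with_retry({"inputs": "Dog playing"})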
This article provides a clear, step-by-step guide to generating images from text using HuggingFace models, catering to both beginners and experienced users. We can apply these models to a variety of tasks, keeping copyright considerations in mind.