Add Habana Gaudi (HPU) Support #574

@@ -0,0 +1,51 @@
# Running Code on Habana Gaudi (HPU)

This directory contains instructions for running the inference portion of [Chapter 6](../../../ch06/01_main-chapter-code/ch06.ipynb) on Habana Gaudi processors (HPUs). The code demonstrates how to leverage HPU acceleration.

## Prerequisites

1. **Habana Driver and Libraries**
Make sure you have the correct driver and libraries installed for Gaudi processors. You can follow the official installation guide from Habana Labs:
[Habana Labs Installation Guide](https://docs.habana.ai/en/latest/Installation_Guide/index.html)

2. **SynapseAI SDK**
The SynapseAI SDK includes the compiler, runtime, and various libraries needed to compile and run models on Gaudi hardware.

### Note
If you are using an environment with Gaudi HPU instances, it most likely already has a preinstalled PyTorch build (e.g., version 2.4.0a0+git74cd574) that is optimized for Habana Gaudi processors, so it is important that you do not install another version of PyTorch. For this reason, this folder contains its own `requirements.txt` file that does not include PyTorch.
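
To confirm that the Gaudi-optimized build is the one in use before installing the remaining requirements, a quick check like the following can help (a minimal sketch; it assumes the Habana PyTorch bridge `habana_frameworks.torch` is preinstalled, and that it provides `torch.hpu.is_available()` — API details may vary across SynapseAI releases):

```python
import torch
import habana_frameworks.torch.core as htcore  # noqa: F401  Habana bridge (assumed preinstalled)

print(torch.__version__)          # should report the preinstalled build, e.g. 2.4.0a0+git74cd574
print(torch.hpu.is_available())   # True if a Gaudi device is visible
```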


## Getting Started
1. **Model Configuration**
The code supports various GPT-2 model sizes (see the configuration sketch after this list):
- GPT-2 Small (124M parameters)
- GPT-2 Medium (355M parameters)
- GPT-2 Large (774M parameters)
- GPT-2 XL (1558M parameters)
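
These sizes correspond to the standard GPT-2 architecture hyperparameters. For reference, here is a minimal sketch of a configuration mapping (the `model_configs` name is illustrative, not taken from the notebook; the numbers are the standard GPT-2 settings):

```python
# Standard GPT-2 architecture hyperparameters per model size
model_configs = {
    "gpt2-small (124M)":  {"emb_dim": 768,  "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)":  {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)":    {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}
```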

2. **Running the Code**

*Note: We assume that you have already downloaded the model weights and placed them in the `gpt2` directory inside this folder. We also reuse the `review_classifier.pth` weights created in [Chapter 6](../../../ch06/01_main-chapter-code/ch06.ipynb), so you do not need to download them separately; simply copy the `review_classifier.pth` file into this folder.*
- Open the `inference_on_gaudi.ipynb` notebook
- Follow the cells to:
- Initialize the HPU device
- Load and configure the model
- Run inference on the Gaudi processor (see the sketch after this list)
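
The device-initialization and inference steps look roughly like this (a minimal sketch, assuming the Habana PyTorch bridge is installed; `model` and `input_ids` stand for the GPT model and tokenized prompt built in the notebook):

```python
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device with PyTorch

device = torch.device("hpu")
model.to(device)   # `model`: the GPT model assembled in the notebook (assumption)
model.eval()

with torch.no_grad():
    logits = model(input_ids.to(device))  # `input_ids`: tokenized prompt (assumption)
    htcore.mark_step()  # flush the accumulated graph when running in lazy mode
```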

3. **Performance Monitoring**
The notebook includes performance comparison tools to measure inference time on the CPU vs. the HPU (see the sketch below).
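
The comparison boils down to timing the same forward pass on both devices, for example (an illustrative sketch, not the notebook's exact code; on HPU, a warm-up run matters because the first pass triggers graph compilation):

```python
import time
import torch

def time_inference(model, input_ids, device, n_runs=10):
    model = model.to(device)
    model.eval()
    input_ids = input_ids.to(device)
    with torch.no_grad():
        _ = model(input_ids)  # warm-up (triggers graph compilation on HPU)
        start = time.perf_counter()
        for _ in range(n_runs):
            _ = model(input_ids)
        # Note: in HPU lazy mode, htcore.mark_step() and torch.hpu.synchronize()
        # may be needed here for exact timings (assumption; depends on execution mode)
        return (time.perf_counter() - start) / n_runs

# Usage: time_inference(model, input_ids, torch.device("cpu")) vs.
#        time_inference(model, input_ids, torch.device("hpu"))
```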

## Code Structure

- `inference_on_gaudi.ipynb`: Main notebook for running inference on Gaudi
- `previous_chapters.py`: Supporting code from Chapter 6

## Troubleshooting

- **Driver Issues**: Make sure the driver version matches the SDK version.
- **Performance**: For optimal performance, monitor logs and use Habana's profiling tools to identify bottlenecks.

## Additional Resources

- [Habana Developer Site](https://developer.habana.ai/)
- [SynapseAI Reference](https://docs.habana.ai/en/latest/)
@@ -0,0 +1,157 @@
# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
# Source for "Build a Large Language Model From Scratch"
# - https://www.manning.com/books/build-a-large-language-model-from-scratch
# Code: https://github.com/rasbt/LLMs-from-scratch


import os
import urllib.request
import urllib.error

# import requests
import json
import numpy as np
import tensorflow as tf
from tqdm import tqdm


def download_and_load_gpt2(model_size, models_dir):
    # Validate model size
    allowed_sizes = ("124M", "355M", "774M", "1558M")
    if model_size not in allowed_sizes:
        raise ValueError(f"Model size not in {allowed_sizes}")

    # Define paths
    model_dir = os.path.join(models_dir, model_size)
    base_url = "https://openaipublic.blob.core.windows.net/gpt-2/models"
    backup_base_url = "https://f001.backblazeb2.com/file/LLMs-from-scratch/gpt2"
    filenames = [
        "checkpoint", "encoder.json", "hparams.json",
        "model.ckpt.data-00000-of-00001", "model.ckpt.index",
        "model.ckpt.meta", "vocab.bpe"
    ]

    # Download files
    os.makedirs(model_dir, exist_ok=True)
    for filename in filenames:
        # Build URLs with "/" explicitly; os.path.join would insert "\" on Windows
        file_url = f"{base_url}/{model_size}/{filename}"
        backup_url = f"{backup_base_url}/{model_size}/{filename}"
        file_path = os.path.join(model_dir, filename)
        download_file(file_url, file_path, backup_url)

    # Load settings and params
    tf_ckpt_path = tf.train.latest_checkpoint(model_dir)
    with open(os.path.join(model_dir, "hparams.json"), "r", encoding="utf-8") as f:
        settings = json.load(f)
    params = load_gpt2_params_from_tf_ckpt(tf_ckpt_path, settings)

    return settings, params


def download_file(url, destination, backup_url=None):
    def _attempt_download(download_url):
        with urllib.request.urlopen(download_url) as response:
            # Get the total file size from headers, defaulting to 0 if not present
            file_size = int(response.headers.get("Content-Length", 0))

            # Check if file exists and has the same size
            if os.path.exists(destination):
                file_size_local = os.path.getsize(destination)
                if file_size == file_size_local:
                    print(f"File already exists and is up-to-date: {destination}")
                    return True  # Indicate success without re-downloading

            block_size = 1024  # 1 Kilobyte

            # Initialize the progress bar with total file size
            progress_bar_description = os.path.basename(download_url)
            with tqdm(total=file_size, unit="iB", unit_scale=True, desc=progress_bar_description) as progress_bar:
                with open(destination, "wb") as file:
                    while True:
                        chunk = response.read(block_size)
                        if not chunk:
                            break
                        file.write(chunk)
                        progress_bar.update(len(chunk))
            return True

    try:
        if _attempt_download(url):
            return
    except (urllib.error.HTTPError, urllib.error.URLError):
        if backup_url is not None:
            print(f"Primary URL ({url}) failed. Attempting backup URL: {backup_url}")
            try:
                if _attempt_download(backup_url):
                    return
            except urllib.error.HTTPError:
                pass

        # If we reach here, both attempts have failed
        error_message = (
            f"Failed to download from both primary URL ({url})"
            f"{' and backup URL (' + backup_url + ')' if backup_url else ''}."
            "\nCheck your internet connection or the file availability.\n"
            "For help, visit: https://github.com/rasbt/LLMs-from-scratch/discussions/273"
        )
        print(error_message)
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


# Alternative way using `requests`
"""
def download_file(url, destination):
    # Send a GET request to download the file in streaming mode
    response = requests.get(url, stream=True)

    # Get the total file size from headers, defaulting to 0 if not present
    file_size = int(response.headers.get("content-length", 0))

    # Check if file exists and has the same size
    if os.path.exists(destination):
        file_size_local = os.path.getsize(destination)
        if file_size == file_size_local:
            print(f"File already exists and is up-to-date: {destination}")
            return

    # Define the block size for reading the file
    block_size = 1024  # 1 Kilobyte

    # Initialize the progress bar with total file size
    progress_bar_description = url.split("/")[-1]  # Extract filename from URL
    with tqdm(total=file_size, unit="iB", unit_scale=True, desc=progress_bar_description) as progress_bar:
        # Open the destination file in binary write mode
        with open(destination, "wb") as file:
            # Iterate over the file data in chunks
            for chunk in response.iter_content(block_size):
                progress_bar.update(len(chunk))  # Update progress bar
                file.write(chunk)  # Write the chunk to the file
"""


def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
    # Initialize parameters dictionary with empty blocks for each layer
    params = {"blocks": [{} for _ in range(settings["n_layer"])]}

    # Iterate over each variable in the checkpoint
    for name, _ in tf.train.list_variables(ckpt_path):
        # Load the variable and remove singleton dimensions
        variable_array = np.squeeze(tf.train.load_variable(ckpt_path, name))

        # Process the variable name to extract relevant parts
        variable_name_parts = name.split("/")[1:]  # Skip the 'model/' prefix

        # Identify the target dictionary for the variable
        target_dict = params
        if variable_name_parts[0].startswith("h"):
            layer_number = int(variable_name_parts[0][1:])
            target_dict = params["blocks"][layer_number]

        # Recursively access or create nested dictionaries
        for key in variable_name_parts[1:-1]:
            target_dict = target_dict.setdefault(key, {})

        # Assign the variable array to the last key
        last_key = variable_name_parts[-1]
        target_dict[last_key] = variable_array

    return params
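

# Example usage (illustrative sketch, not part of the original module):
# download the 124M checkpoint into ./gpt2 and load its weights into a
# nested dictionary of NumPy arrays.
if __name__ == "__main__":
    settings, params = download_and_load_gpt2(model_size="124M", models_dir="gpt2")
    print("hparams:", settings)                          # e.g. n_layer, n_head, n_embd
    print("top-level param keys:", list(params.keys()))  # e.g. "blocks", "wte", "wpe"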