Alpaca + CodeLlama 34b full example.ipynb - Colab

This document demonstrates how to use Unsloth to load a CodeLlama-34b model in quantized 4-bit format, prepare the Alpaca dataset, and fine-tune the model for 120 steps with TRL's SFTTrainer.



To run this, press "Runtime" and then press "Run all" on a free Tesla T4 Google Colab instance!

Join our Discord if you need help, and support us if you can!

To install Unsloth on your own computer, follow the installation instructions on our GitHub page here.

You will learn how to do data prep, how to train, how to run the model, and how to save it (e.g. for llama.cpp).

%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass

We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes, etc.
We also support Yi, Qwen (llamafied), Deepseek, and all Llama- and Mistral-derived architectures.
We support 16-bit LoRA or 4-bit QLoRA. Both are 2x faster.
max_seq_length can be set to anything, since we do automatic RoPE scaling via kaiokendev's method.
With PR 26037, we support downloading 4-bit models 4x faster! Our repo has Llama and Mistral 4-bit models.
[NEW] We make Gemma (6 trillion tokens) 2.5x faster! See our Gemma notebook.

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/codellama-34b-bnb-4bit", # "codellama/CodeLlama-34b-hf" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)


/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py:67: UserWarning: CUDA is not link
We shall run `ldconfig /usr/lib64-nvidia` to try to fix it.
  warnings.warn(
config.json: 100% 1.10k/1.10k [00:00<00:00, 92.7kB/s]
==((====))==  Unsloth: Fast Llama patching release 2023.12
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB
O^O/ \_/ \    CUDA capability = 8.0. Xformers = 0.0.22.post7. FA = True.
\        /    Pytorch version: 2.1.0+cu121. CUDA Toolkit = 12.1
 "-____-"     bfloat16 = TRUE. Platform = Linux

You passed `quantization_config` to `from_pretrained` but the model you're loading already ha
model.safetensors.index.json: 100% 198k/198k [00:00<00:00, 12.4MB/s]
Downloading shards: 100% 4/4 [14:45<00:00, 209.39s/it]
model-00001-of-00004.safetensors: 100% 4.98G/4.98G [03:54<00:00, 21.5MB/s]
model-00002-of-00004.safetensors: 100% 5.00G/5.00G [04:14<00:00, 18.1MB/s]
model-00003-of-00004.safetensors: 100% 5.00G/5.00G [03:54<00:00, 19.7MB/s]
model-00004-of-00004.safetensors: 100% 3.21G/3.21G [02:38<00:00, 21.4MB/s]
Loading checkpoint shards: 100% 4/4 [00:06<00:00, 1.54s/it]
generation_config.json: 116/116 [00:00<00:00,
We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Currently only supports dropout = 0
    bias = "none", # Currently only supports bias = "none"
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
)

Unsloth 2023.12 patched 48 layers with 48 QKV layers, 48 O layers and 48 MLP layers.

Data Prep


We now use the Alpaca dataset from yahma, which is a filtered version of the original 52K-example Alpaca dataset. You can replace this code section with your own data prep.

[NOTE] To train only on completions (ignoring the user's input), read TRL's docs here; a sketch of the usual pattern follows below.
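Purely as an illustration of that TRL feature (it is not part of this notebook's cells), the usual pattern is TRL's DataCollatorForCompletionOnlyLM, which masks the prompt tokens so only the response contributes to the loss. The response_template below assumes the Alpaca-style "### Response:" marker used later in this notebook:

from trl import DataCollatorForCompletionOnlyLM

# Illustration only (not from this notebook's cells): mask everything up to the
# response marker so loss is computed on the completion only. Pass
# `data_collator = collator` (and keep packing disabled) to the SFTTrainer below.
# For Llama-family tokenizers, TRL's docs suggest passing the template's token
# ids instead of the raw string if the masking does not line up.
collator = DataCollatorForCompletionOnlyLM(
    response_template = "### Response:",
    tokenizer = tokenizer,
)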

Alpaca dataset preparation code

(Show code)

Downloading readme: 100% 11.6k/11.6k [00:00<00:00, 855kB/s]

Downloading data: 100% 44.3M/44.3M [00:02<00:00, 23.6MB/s]

Generating train split: 51760/0 [00:00<00:00, 137153.36 examples/s]

Map: 100% 51760/51760 [00:00<00:00, 102802.36 examples/s]
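The preparation cell itself is collapsed in this export. Below is a minimal sketch of what Unsloth's standard Alpaca prep cell contains (the prompt template, a formatting function that appends the EOS token, and the yahma/alpaca-cleaned download). Treat it as a reconstruction rather than a verbatim copy of the hidden cell; it defines the alpaca_prompt and dataset names used by the trainer and inference cells later on.

# Sketch of the collapsed prep cell (reconstruction, not verbatim).
from datasets import load_dataset

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # must append EOS, otherwise generation never stops

def formatting_prompts_func(examples):
    # Build one training string per example: instruction + input + output + EOS.
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = [
        alpaca_prompt.format(ins, inp, out) + EOS_TOKEN
        for ins, inp, out in zip(instructions, inputs, outputs)
    ]
    return {"text": texts}

dataset = load_dataset("yahma/alpaca-cleaned", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)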

Train the model


Now let's use Huggingface TRL's SFTTrainer! More docs here: TRL SFT docs. We do 120 steps to speed things up, but you can set num_train_epochs = 1 for a full run and set max_steps = None to remove the step cap. We also support TRL's DPOTrainer!

from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 120,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
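For context on these settings (a quick back-of-the-envelope check, not part of the notebook): the effective batch size is per_device_train_batch_size times gradient_accumulation_steps, so 120 optimizer steps only touch a small slice of the 51,760 examples, which is why the training log further down reports "Epoch 0/1".

# Back-of-the-envelope check for the settings above (single GPU):
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 120
num_examples = 51760                      # size of the Alpaca split mapped earlier

effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # 16
examples_seen = effective_batch * max_steps                                  # 1920
print(effective_batch, examples_seen, f"{100 * examples_seen / num_examples:.1f}% of one epoch")
# -> 16 1920 3.7% of one epoch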

Special tokens have been added in the vocabulary, make sure the associated word embeddings ar
Map: 51760/51760 [00:09<00:00, 5660.67

Show current memory stats

(Show code)

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.


17.791 GB of memory reserved.
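The collapsed cell behind these stats is presumably something like the following sketch; variable names such as start_gpu_memory and max_memory are my own, and are reused by the final-stats sketch after training.

# Sketch (assumed, not the hidden cell verbatim): snapshot GPU memory before
# training so the post-training delta can be computed later.
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")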

trainer_stats = trainer.train()


You're using a CodeLlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, us
Unsloth: `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=F
[120/120 16:48, Epoch 0/1]
Step    Training Loss
1       1.589300
2       1.634000
3       1.608500
4       1.398300
5       1.652000
6       1.335000
7       1.408300
8       1.282900
9       1.352700
10      1.055300
11      1.100300
12      0.997700
13      1.052700
14      0.966900
15      0.882100
16      0.863700
17      0.846300
18      0.886200
19      0.724900
20      1.072100
21      0.856300
22      0.827600
23      0.875100
24      0.937700
25      0.886100
26      0.885200
27      0.992800
28      0.848600

Show final memory and time stats

(Show code)

1020.2541 seconds used for training.


17.0 minutes used for training.
Peak reserved memory = 23.98 GB.
Peak reserved memory for training = 6.189 GB.
Peak reserved memory % of max memory = 60.611 %.
Peak reserved memory for training % of max memory = 15.643 %.
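Again, a sketch of what the collapsed cell likely computes, assuming the start_gpu_memory and max_memory values from the earlier memory-stats sketch; trainer_stats.metrics["train_runtime"] is the standard Hugging Face Trainer field.

# Sketch (assumed): derive the figures printed above from the training run
# and the pre-training memory snapshot.
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime'] / 60, 1)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")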

Inference

Let's run the model! You can change the instruction and input - leave the output blank!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
]*1, return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 128, use_cache = True)
tokenizer.batch_decode(outputs)

/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1547: UserWarning: You have modified the pretrained model configuration
  warnings.warn(
['<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n1, 1, 2, 3, 5, 8\n\n### Response:\n1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393']

Saving, loading finetuned models

To save the final model, either use Huggingface's push_to_hub for an online save or save_pretrained for a local save.

To save to GGUF / llama.cpp, or for model merging, use model.merge_and_unload first, then save the model. Maxime Labonne's llm-course has a nice tutorial on converting HF to GGUF! This issue might be helpful for more info.

model.save_pretrained("lora_model") # Local saving
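A hedged sketch of the two saving paths described above: the Hub repo name and token are placeholders, and the GGUF route is the generic merge-then-convert recipe (merging LoRA into a 4-bit base usually means reloading the base model in 16 bit first), not something taken verbatim from this notebook's cells.

# Online save: push the LoRA adapters (and tokenizer) to the Hugging Face Hub.
# Placeholder repo name and token below.
model.push_to_hub("your_name/codellama-34b-alpaca-lora", token = "hf_...")
tokenizer.push_to_hub("your_name/codellama-34b-alpaca-lora", token = "hf_...")

# For GGUF / llama.cpp or model merging: fold the LoRA weights into the base
# model (reload the base in 16 bit first if it was loaded in 4 bit), save the
# merged checkpoint, then run llama.cpp's converter on it.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("merged_model")
tokenizer.save_pretrained("merged_model")
# Then, for example:
#   python llama.cpp/convert.py merged_model --outtype f16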
