Building and Fine-tuning GenAI Models
SOEN 691: Generative Artificial Intelligence
for Software Engineering
4
Building a Large
Language Model
5
How much does it cost?
• Training the smallest Llama 2 model (7B parameters) took about 180,000 GPU-hours
• Renting:
• NVIDIA A100: $1–2 per GPU-hour -> roughly $250,000 (see the quick check below)
Source: Llama 2: Open Foundation and Fine-Tuned Chat Models
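A quick back-of-the-envelope check (assuming a mid-range rental price of about $1.40 per GPU-hour, which is our own midpoint, not a figure from the paper): 180,000 GPU-hours × $1.40/GPU-hour ≈ $252,000; the full $1–2 range gives roughly $180,000–$360,000.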
6
Building a Large
Language Model
Theoretically…
Source: A Survey of Large Language Models
7
Building a machine learning model
Here are the 4 key steps to build a traditional ML model:
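As a rough illustration of that workflow, here is a minimal sketch in scikit-learn (the library, the toy dataset, and the breakdown into data, split, train, and evaluate are illustrative assumptions, not the course's prescribed steps):

```python
# Minimal traditional-ML workflow sketch: load data, split, train, evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1. Data: a small built-in dataset stands in for real project data.
X, y = load_iris(return_X_y=True)

# 2. Split: hold out a test set for unbiased evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Train: fit a classic classifier.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 4. Evaluate: measure accuracy on the held-out data.
print(accuracy_score(y_test, model.predict(X_test)))
```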
8
Want to know more?
Machine Learning
Book freely available
• Covers the basics of classifiers and regression models
• Emphasis on interpretability
• How do ML models work?
• Techniques to explain and interpret ML
models
9
Want to know more?
Engineering AI-Based Systems
• Course focused on Engineering AI-
Based Systems
• Requirements
• Architecture and Design
• Implementation
• Testing
• Operations
Source: Engineering AI-Based Software Systems
10
Building large language models
Here are the 4 key steps to build a large language model:
15
Step 1. Data Curation/Processing
Large Language Models do not understand "pure text"
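Instead, raw text is first converted into tokens (integer IDs). A minimal sketch with the Hugging Face `transformers` library, using the GPT-2 tokenizer purely as an example:

```python
# Text must be converted into token IDs before an LLM can process it.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example tokenizer, not the only choice

text = "Large Language Models do not understand pure text"
token_ids = tokenizer.encode(text)

print(token_ids)                                   # a list of integer IDs
print(tokenizer.convert_ids_to_tokens(token_ids))  # the sub-word pieces behind those IDs
```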
17
The Attention Mechanism
Learn to pay attention to some key words, depending on the context
The pink elephant tried to get into a car but it was too ________
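A minimal sketch of scaled dot-product attention, the computation behind this idea (NumPy, toy dimensions, no multi-head machinery or learned projections):

```python
# Toy scaled dot-product attention: each token's output is a context-weighted
# mix of all tokens' values, with weights given by query/key similarity.
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                                        # weighted sum of values

# 5 tokens, embedding size 8 (random stand-ins for learned projections)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (5, 8)
```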
18
Why don’t we have the same diversity?
[Figure: the variety of Machine Learning algorithms vs. LLM algorithms (encoder/decoder architectures)]
20
Encoder-only Architecture
Good for tasks requiring a nuanced understanding of the entire
sentence or code snippet.
• Code review, bug report understanding, and named entity recognition
Source: Large Language Models for Software Engineering: A Systematic Literature Review
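A sketch of typical encoder-only usage: encode the whole snippet and hand the resulting vector to a downstream classifier (the CodeBERT checkpoint is only an illustrative choice):

```python
# Encoder-only usage sketch: turn a code snippet into a fixed-size vector
# that a downstream classifier (e.g., for code review flags) can consume.
import torch
from transformers import AutoTokenizer, AutoModel

name = "microsoft/codebert-base"   # example encoder; any BERT-style checkpoint works similarly
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # one vector per token
embedding = hidden[:, 0, :]                      # first-token ([CLS]-style) summary vector
print(embedding.shape)                           # e.g. torch.Size([1, 768])
```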
21
Encoder-decoder Architecture
Good for translation or summarization tasks.
• Code summarization, (programming?) language translation
Source: Large Language Models for Software Engineering: A Systematic Literature Review
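A sketch of typical encoder-decoder usage for code summarization: the encoder reads the snippet, the decoder generates the summary (the CodeT5 summarization checkpoint is just one illustrative example):

```python
# Encoder-decoder usage sketch: sequence in, sequence out (e.g., code -> summary).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "Salesforce/codet5-base-multi-sum"  # example summarization checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```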
22
Decoder-only Architecture
Good for generative tasks
• Basically any SE task that requires generation
Source: Large Language Models for Software Engineering: A Systematic Literature Review
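A sketch of typical decoder-only usage: prompt the model and let it generate the continuation token by token (the CodeGen checkpoint is only an illustrative choice):

```python
# Decoder-only usage sketch: autoregressive generation from a code prompt.
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "Salesforce/codegen-350M-mono"   # example code LLM; GPT-2 would work the same way
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "# Python function that reverses a string\ndef reverse_string(s):"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```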
23
Architectures of LLMs for SE
Source: Large Language Models for Software Engineering: A Systematic Literature Review
24
Step 2. Model Architecture
Other design choices and hyper-parameters to tune:
• Activation functions, Layer normalization, Position Embeddings
Model Size
Rule of thumb: ~20 training tokens per parameter
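Applied to a 7B-parameter model, the rule of thumb suggests 7 × 10⁹ parameters × ~20 tokens/parameter ≈ 140 billion training tokens (illustrative arithmetic only; real models may be trained on more or fewer tokens).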
27
Example: HellaSwag
• Can a Machine really finish your sentence?
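Benchmarks in the HellaSwag style are typically scored by asking which candidate ending the model finds most likely. A simplified sketch of that idea (GPT-2 as a stand-in model; the official harness normalizes over the ending tokens only, which this sketch skips):

```python
# Multiple-choice scoring sketch: pick the ending the LM assigns the lowest loss
# (i.e., the highest likelihood) when appended to the context.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "The pink elephant tried to get into a car but it was too"
endings = [" big.", " delicious.", " polite."]

def loss_of(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()   # average next-token loss

best = min(endings, key=lambda e: loss_of(context + e))
print(best)
```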
28
Building large language models
Here are the 4 key steps to build a large language model:
29
DEMO
https://www.kaggle.com/code/diegoeliascosta/soen691-building-your-own-gpt
30
From Base Models to
SE-related Solutions
31
Fine-tuning in a nutshell
1. Define an SE-related task/problem where an LLM has the potential to assist
2. Find a reasonably sized dataset for the model to fine-tune on
3. Choose a pre-trained LLM
• Open source or closed source + API
4. Choose the method for fine-tuning (see the sketch after this list)
5. Evaluate the fine-tuned model on the SE task
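A minimal fine-tuning sketch using the Hugging Face `Trainer`, mapping roughly onto steps 2–5 above; the CodeBERT model and the IMDB dataset are placeholders you would swap for your own SE task and data:

```python
# Fine-tuning sketch: adapt a pre-trained encoder to a binary classification task
# (e.g., "is this bug report valid?"). Dataset and model names are placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "microsoft/codebert-base"         # step 3: choose a pre-trained LLM
dataset = load_dataset("imdb")                 # step 2: stand-in dataset; use your SE data
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args,      # step 4: full fine-tuning
                  train_dataset=dataset["train"].select(range(1000)),
                  eval_dataset=dataset["test"].select(range(500)))
trainer.train()
print(trainer.evaluate())                      # step 5: evaluate on the task
```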
32
Which SE tasks are commonly selected?
33
SE-Related Datasets
Challenge: Which ones are **really** open source?
35
Fine-tuning strategies
• Instruction tuning
• Alignment tuning
  • Reinforcement Learning from Human Feedback (RLHF)
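To make instruction tuning concrete, here is a hedged example of what one training record often looks like (the field names and prompt template vary across datasets; these are illustrative only):

```python
# Illustrative instruction-tuning record: the model is trained to map an
# instruction (plus optional input) to the desired response.
example = {
    "instruction": "Summarize what the following function does.",
    "input": "def add(a, b):\n    return a + b",
    "output": "Returns the sum of the two arguments.",
}

# During training, instruction + input are concatenated into a prompt and the
# loss is computed on the expected output tokens.
prompt = f"{example['instruction']}\n\n{example['input']}\n\n### Response:\n"
target = example["output"]
print(prompt + target)
```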
39
Course Project
• Be careful about the scope of the project
• You should be discussing a Software Engineering problem/application
• Using LLMs for other topics is not part of this course’s scope
• Start diving deep into the type of problem you want to address
40
Paper Critiques
• Next week’s class will be a research discussion class!
• 2 papers to read:
• 1 paper to summarize (half a page)
• 1 paper to critique (1 to 2 pages max)
• Summary + Positive points (3+) + Negative Points (3+)
41