
Introduction to

Building LLMs
CS229: Machine Learning
Yann Dubois | Aug. 13th 2024

Slides partially based on CS336, CS224N, CS324


2

LLMs

• LLMs & chatbots took over the world

• How do they work?


3

What matters when training LLMs

• Architecture (Transformer)
• Training algorithm/loss
• Data
• Evaluation
• Systems

Most of academia focuses on the architecture and the training algorithm/loss; what matters in practice is the data, evaluation, and systems.
Overview
Pretraining -> GPT3
• Task & loss

Post-training -> ChatGPT


5

Language Modeling
• LM: probability distribution over sequences of tokens/words p(x_1, …, x_L)
  P(the, mouse, ate, the, cheese) = 0.02
  P(the, the, mouse, ate, cheese) = 0.0001   (syntactic knowledge)
  P(the, cheese, ate, the, mouse) = 0.001    (semantic knowledge)

• LMs are generative models: x_{1:L} ~ p(x_1, …, x_L)

• Autoregressive (AR) language models:

  p(x_1, …, x_L) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) … = ∏_i p(x_i | x_{1:i−1})

  No approximation: this is just the chain rule of probability.

=> You only need a model that can predict the next token given past context!
6

AR Language Models

• Task: predict the next word

• Steps (inference only; a small sketch follows this slide):
  1. tokenize
  2. forward pass through the model
  3. predict the probability of the next token
  4. sample
  5. detokenize

(Figure: the prompt "She likely prefers" is fed to the model, which outputs a next-token distribution, e.g. "dogs".)
7
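To make the steps concrete, here is a minimal sketch of the inference loop in Python. The model and tokenizer (and their encode/decode/forward interfaces) are hypothetical stand-ins, not a specific library's API:

import torch

def generate(model, tokenizer, prompt, max_new_tokens=20, temperature=1.0):
    # 1. tokenize: text -> list of token ids (hypothetical encode())
    ids = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        # 2. forward: logits for the next token given the past context
        logits = model(torch.tensor(ids)[None, :])[0, -1]        # shape: (vocab_size,)
        # 3. turn logits into a probability distribution over the vocabulary
        probs = torch.softmax(logits / temperature, dim=-1)
        # 4. sample the next token (argmax instead would be greedy decoding)
        next_id = torch.multinomial(probs, num_samples=1).item()
        ids.append(next_id)
    # 5. detokenize: token ids -> text (hypothetical decode())
    return tokenizer.decode(ids)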

AR Neural Language Models

https://lena-voita.github.io/nlp_course/language_modeling.html#intro
8

Loss
• Classify the next token's index
• => cross-entropy loss
• => maximize the text's log-likelihood (a PyTorch sketch follows this slide):

  max ∏_i p(x_i | x_{1:i−1}) = min − Σ_i log p(x_i | x_{1:i−1}) = min ℒ(x_{1:L})

https://lena-voita.github.io/nlp_course/language_modeling.html#intro
9
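A minimal sketch of this loss in PyTorch, assuming a model that outputs one logit vector per position; the only subtlety is the shift by one so that position i predicts token i+1:

import torch
import torch.nn.functional as F

def lm_loss(logits, tokens):
    # logits: (batch, seq_len, vocab) predictions; tokens: (batch, seq_len) token ids
    pred = logits[:, :-1, :]      # predictions for x_{i+1} made from positions <= i
    target = tokens[:, 1:]        # the actual next tokens x_{i+1}
    # cross-entropy = negative log-likelihood of the observed next tokens
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))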

Tokenizer
• Why?
  • More general than words (e.g., handles typos)
  • Shorter sequences than with characters

• Idea: tokens as common subsequences (~3 letters)

• E.g., Byte Pair Encoding (BPE). Training steps (a small sketch follows this slide):
  1. Take a large corpus of text
  2. Start with one token per character
  3. Merge the most common pair of tokens into a new token
  4. Repeat until the desired vocab size (or everything is merged)

(tokenizer: text → token index)
10
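A minimal (and deliberately slow) sketch of the BPE training loop above; real tokenizers work on bytes and are heavily optimized, but the idea is the same:

from collections import Counter

def merge_pair(word, a, b):
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
            out.append(a + b); i += 2
        else:
            out.append(word[i]); i += 1
    return out

def train_bpe(corpus, vocab_size):
    # 2. start with one token per character (within whitespace-split words)
    words = [list(w) for w in corpus.split()]
    vocab = set(ch for w in words for ch in w)
    merges = []
    while len(vocab) < vocab_size:
        # count all adjacent token pairs in the corpus
        pairs = Counter((w[i], w[i + 1]) for w in words for i in range(len(w) - 1))
        if not pairs:
            break                                   # everything is merged
        # 3. merge the most common pair into a single new token
        a, b = pairs.most_common(1)[0][0]
        merges.append((a, b))
        vocab.add(a + b)
        words = [merge_pair(w, a, b) for w in words]
        # 4. repeat until the desired vocab size (or all merged)
    return vocab, merges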

Overview
Pretraining -> GPT3
• Task & loss
• Evaluation

Post-training -> ChatGPT


17

LLM evaluation: Perplexity

• Idea: validation loss
• To be more interpretable: use perplexity (a computation sketch follows this slide)

  PPL(x_{1:L}) = 2^{ℒ(x_{1:L}) / L} = ∏_i p(x_i | x_{1:i−1})^{−1/L}

  • Average per token (~independent of length)
  • Exponentiate => units independent of the log base
  • Perplexity is between 1 and |Vocab|

• Intuition: the number of tokens the model is "hesitating" between
18
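A small sketch of the computation, assuming you already have the probability the model assigned to each observed token; perplexity is the exponentiated average negative log-probability:

import math

def perplexity(token_probs):
    # token_probs: p(x_i | x_{1:i-1}) for each token in the sequence
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# a model that puts probability 1/10 on every observed token
# "hesitates" between ~10 tokens:
print(perplexity([0.1] * 50))   # ~10.0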

LLM evaluation: Perplexity

Between 2017 and 2023, models went from "hesitating" between ~70 tokens to <10 tokens.
Perplexity is no longer used for academic benchmarks, but it is still important during development.
19

LLM Evaluation: agg. std NLP benchmarks


Collect many benchmarks that can be evaluated automatically, and evaluate models across all of them.
Examples: Holistic Evaluation of Language Models (HELM), Hugging Face Open LLM Leaderboard.
20

LLM Evaluation: agg. std NLP benchmarks


• Mix of things that can be "easily" evaluated

• Typically there is a "gold" answer
  => evaluate the likelihood the LLM assigns to it vs. the other options

HELM-lite
[Liang+ 2022]
21

LLM Evaluation: eg MMLU


• Example: MMLU
• ~Most trusted pretraining benchmark

MMLU
[Hendrycks+ 2020]
22

Evaluation: challenges
• Sensitivity to prompting/inconsistencies
23

Evaluation: challenges
• Sensitivity to prompting/inconsistencies
• Train & test contamination (~not important for development)
Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data

Post-training -> ChatGPT


25

Data
• Idea: use all of the clean internet
• Note: the internet is dirty & not representative of what we want. In practice:
  1. Download all of the internet. Common Crawl: 250 billion pages, >1PB (>1e6 GB)
  2. Text extraction from HTML (challenges: math, boilerplate)
  3. Filter undesirable content (e.g. NSFW, harmful content, PII)
  4. Deduplicate (URL/document/line). E.g. the headers/footers/menus in forums are always the same
  5. Heuristic filtering. Remove low-quality documents (e.g. # words, word length, outlier tokens, dirty tokens); a small sketch follows this slide
  6. Model-based filtering. Predict whether the page could be referenced by Wikipedia
  7. Data mix. Classify data into categories (code/books/entertainment). Reweight domains using scaling laws to get high downstream performance

• Also: learning-rate annealing on high-quality data, continual pretraining with longer context


26
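As a rough illustration of step 5 (heuristic filtering), here is a sketch of a document filter; the thresholds below are made up for illustration, not the rules any particular lab uses:

def keep_document(text):
    words = text.split()
    if not (50 <= len(words) <= 100_000):           # too short / absurdly long
        return False
    mean_len = sum(len(w) for w in words) / len(words)
    if not (3 <= mean_len <= 10):                   # outlier word lengths
        return False
    alpha_frac = sum(w.isalpha() for w in words) / len(words)
    if alpha_frac < 0.8:                            # mostly symbols/numbers: likely boilerplate
        return False
    if "lorem ipsum" in text.lower():               # obvious junk marker
        return False
    return True

docs = ["some extracted web page text ...", "..."]
clean = [d for d in docs if keep_document(d)]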

Data
• Collecting data well is a huge part of practical LLM building (~the key)
• Lots of research to be done!
  • How do you process data well and efficiently?
  • How do you balance domains?
  • Synthetic data?
  • Multi-modal data?

• A lot of secrecy:
  • Competitive dynamics
  • Copyright liability

• Common academic datasets:
  • C4 (150B tokens | 800GB)
  • The Pile (280B tokens)
  • Dolma (3T tokens)
  • FineWeb (15T tokens)

• Closed: LLaMA 2 (2T tokens), LLaMA 3 (15T tokens), GPT-4 (~13T tokens?)
Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data
• Scaling laws

Post-training -> ChatGPT


28

Scaling laws
• Empirically: more data and larger models => better performance
• Larger models do not necessarily overfit

• Idea: predict model performance based on the amount of data & number of parameters (a small curve-fitting sketch follows this slide)

It works for many things!

Scaling laws
[Kaplan+ 2020]
29
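The basic trick behind these plots: a power law L ≈ a·N^(−b) is a straight line in log-log space, so you can fit it on a few small runs and extrapolate. A sketch with made-up numbers:

import numpy as np

# made-up (model size, validation loss) pairs from small training runs
sizes  = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([4.2, 3.9, 3.6, 3.35, 3.1])

# fit log(loss) = log(a) - b * log(size)   (assumes the irreducible loss term is ~0)
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
a, b = np.exp(intercept), -slope

# extrapolate to a model 100x larger than anything we trained
print(a * (1e11) ** (-b))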

Scaling laws: tuning


• You have 10K GPUs for a month, what model do you train?

• Old pipeline:
  • Tune hyperparameters on big models (e.g. 30 models)
  • Pick the best => the final model only gets as much training as each filtered-out one (e.g. 1 day)

• New pipeline:
  • Find scaling recipes (e.g. how the learning rate should decrease with size)
  • Tune hyperparameters on small models of different sizes (e.g. for <3 days)
  • Extrapolate to larger models using scaling laws
  • Train the final huge model (e.g. >27 days)
30

Scaling laws: eg LSTM


• Q: Should we use transformers or LSTMs?

A: Transformers have a better constant and a better scaling rate (slope)


Scaling laws
[Kaplan+ 2020]
31

Scaling laws: eg Chinchilla


• Q: How do we optimally allocate training* resources (size vs data)?

(Figures: isoflop curves, varying tokens & parameters; best number of tokens and of parameters for each isoflop budget.)

A: Use ~20 tokens for each parameter (20:1)

Chinchilla [Hoffmann+ 2022]
*doesn't consider inference cost => in practice use a larger ratio (>150:1)
32

Scaling laws: tuning


• Many questions you can try to answer with scaling laws

• Resource allocation:
  • Train models longer vs. train bigger models?
  • Collect more data vs. get more GPUs?

• Data:
  • Data repetition / multiple epochs?
  • Data mixture weighting?

• Algorithm:
  • Architecture: LSTMs vs. transformers?
  • Size: width vs. depth?
33

Bitter lesson

• Bitter lesson: models improve with scale & Moore’s Law


=> “only thing that matters in the long run is the leveraging of computation.”

The Bitter Lesson [Sutton 2019] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

• Don't spend time overcomplicating: do the simple things and scale them!
34

Training a SOTA model


• Example of current SOTA: LLaMA 3 400B
  • Data: 15.6T tokens; Parameters: 405B (~40 tokens/parameter => trained ~compute-optimally)

• FLOPs: 6·N·P = 6 × 15.6e12 × 405e9 ≈ 3.8e25 FLOPs (~2x below the executive-order reporting threshold)

• Compute: 16K H100s with an average throughput of 400 TFLOPS

• Time: 3.8e25 / (400e12 × 3600) ≈ 26M GPU-hours; / (16e3 × 24) ≈ 70 days (the paper reports ~30M GPU-hours)

• Cost: rented compute + salaries ≈ $2/h × 26M h + $500k/y × 50 employees = $52M + $25M ≈ $75M ($65-85M)

• Carbon emitted: 26M h × 0.7 kW × 0.24 kg CO2eq/kWh ≈ 4,400 tCO2eq (~2k return flights JFK-LHR)

• Next model? ~10x more FLOPs
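The same back-of-the-envelope arithmetic in one place, using the assumptions from this slide (6·N·P FLOPs, 400 TFLOPS per GPU, rough cost and carbon rates):

tokens       = 15.6e12                           # training tokens
params       = 405e9                             # model parameters
flops        = 6 * tokens * params               # ~3.8e25 FLOPs
gpu_flops    = 400e12                            # achieved throughput per H100 (FLOP/s)
gpu_hours    = flops / gpu_flops / 3600          # ~26M GPU-hours
days         = gpu_hours / (16_000 * 24)         # ~70 days on 16K GPUs
compute_cost = gpu_hours * 2                     # ~$52M at ~$2 per GPU-hour
salary_cost  = 50 * 500_000                      # ~$25M for ~50 people for a year
co2_tonnes   = gpu_hours * 0.7 * 0.24 / 1000     # ~4,400 tCO2eq
print(f"{flops:.1e} FLOPs, {gpu_hours/1e6:.0f}M GPU-hours, {days:.0f} days, "
      f"${(compute_cost + salary_cost)/1e6:.0f}M, {co2_tonnes:.0f} tCO2eq")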


Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data
• Scaling laws
• Systems

Post-training -> ChatGPT


Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data
• Scaling laws
• Systems

Post-training -> ChatGPT


• Task
37

Language Modeling ≠ assisting users


• Problem: language modeling is not what we want
38

Task: “alignment”
• Goal: the LLM should follow user instructions and the designer's desires (e.g. moderation)

• Background:
  • Data of desired behaviors is what we want, but it is scarce and expensive
  • Pretraining data scales, but it is not what we want

• Idea: finetune the pretrained LLM on a little desired data => "post-training"


Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data
• Scaling laws
• Systems

Post-training -> ChatGPT


• Task
• SFT: data & loss
40

Supervised finetuning (SFT)


• Idea: finetune the LLM with the language-modeling loss on the desired answers
  ("supervised" next-word prediction; a loss sketch follows this slide)
• How do we collect the data? Ask humans

OpenAssistant
[Kopf+ 2023]

This was ~the key step from GPT-3 to the ChatGPT model!


41
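A sketch of the SFT loss: it is exactly the pretraining next-token loss applied to (instruction, desired answer) sequences, usually with the prompt tokens masked out so that only the answer is supervised. The model and pre-tokenized inputs below are assumed, not a specific library's API:

import torch
import torch.nn.functional as F

def sft_loss(model, input_ids, prompt_len):
    # input_ids: (1, seq_len) = prompt tokens followed by the desired answer tokens
    logits = model(input_ids)                                  # (1, seq_len, vocab)
    pred, target = logits[:, :-1], input_ids[:, 1:]
    loss = F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                           target.reshape(-1), reduction="none")
    # mask out positions that predict prompt tokens: only the answer is supervised
    answer_mask = (torch.arange(target.size(1)) >= prompt_len - 1).float()
    return (loss * answer_mask).sum() / answer_mask.sum()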

Scalable data for SFT: eg Alpaca


• Problem: human data is slow to collect and expensive
• Idea: use LLMs to scale data collection

Alpaca
[Taori+ 2023]

Started as an academic replication of ChatGPT, but "synthetic data generation" is now a hot topic!
43

Scalable data for SFT: quantity?


• You need very little data for SFT! (~a few thousand examples)

LIMA
[Zhou+ 2023]

• Just learns the format of desired answers (length, bullet points, …)


• The knowledge is already in the pretrained LLM!
• Specializes to one “type of user”
Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data
• Scaling laws
• Systems

Post-training -> ChatGPT


• Task
• SFT: data & loss
• RLHF : data & loss
45

RL from Human Feedback (RLHF)


• Problem: SFT is behavior cloning of humans
1. Bound by human abilities: humans may prefer things that they are not able to generate
2. Hallucination: cloning a correct answer teaches the LLM to hallucinate if it didn't know about it!

   If the LLM doesn't know [Bivens 2013], this teaches the model to make up plausible-sounding references

3. Price: collecting ideal answers is expensive


46

RLHF
• Idea: maximize human preference rather than clone their behavior
• Pipeline:
1. For each instruction: generate 2 answers from a pretty good model (SFT)

2. Ask labelers to select their preferred answers

3. Finetune the model to generate more preferred answers

How??

47

RLHF: PPO
• Idea: use reinforcement learning
• What is the reward?
• Option 1: whether the model’s output is preferred to some baseline
• Issue: binary reward doesn’t have much information

• Option 2: train a reward model R using a logistic regression loss to classify preferences.
  p(i > j) = exp(R(x, ŷ_i)) / (exp(R(x, ŷ_i)) + exp(R(x, ŷ_j)))     [Bradley-Terry 1952]

  • Use the logits R(·) as the reward => continuous, information-rich signal! (a loss sketch follows this slide)

• Optimize, using PPO:

  E_{ŷ ~ p_θ(·|x)} [ R(x, ŷ) − β log( p_θ(ŷ|x) / p_ref(ŷ|x) ) ]

  -> the regularization term avoids over-optimization of the reward model

• Note: the LM here is a policy, not a model of some distribution


48
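A sketch of the reward-model training loss implied by the Bradley-Terry formula: maximizing log p(i > j) is a logistic loss on the reward difference between the preferred and rejected answers:

import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    # r_chosen, r_rejected: (batch,) scalar rewards R(x, y_chosen), R(x, y_rejected)
    # -log p(chosen > rejected) = -log sigmoid(R_chosen - R_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()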

RLHF: PPO -> ChatGPT

RLHF
[Ouyang+ 2022]
49

RLHF: PPO challenges


• Problem: RL is simple in theory, messy in practice (clipping, rollouts, outer loops, …)

(Figure: idealized PPO in the LM setting, with rollouts.)

AlpacaFarm
[Dubois+ 2023]


50

RLHF: DPO
• Idea: maximize probability of preferred output, minimize the other

DPO
[Rafailov+ 2023]

• This is ~equivalent (same global minima) to RLHF/PPO


• Much simpler than PPO and performs as well => the standard in the open-source community
51
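A sketch of the DPO loss from [Rafailov+ 2023]; it only needs the sequence-level log-probabilities of the preferred and rejected answers under the policy and under the frozen reference model:

import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # each log-prob is the sum of per-token log-probs of the full answer given the prompt
    chosen_ratio   = logp_chosen   - ref_logp_chosen      # log pi(y_w|x) - log pi_ref(y_w|x)
    rejected_ratio = logp_rejected - ref_logp_rejected    # log pi(y_l|x) - log pi_ref(y_l|x)
    # maximize the margin between the preferred and the rejected answer
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()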

RLHF: gains

(Figures: gains going from the pretrained model to SFT to PPO/DPO, on Learning to Summarize [Stiennon+ 2020] and AlpacaFarm [Dubois+ 2023].)
52

RLHF: human data


• Data: human crowdsourcing
(Figure: example annotation guidelines.)
53

RLHF: challenges of human data


• Slow & expensive

• Hard to focus on correctness rather than form (e.g. length) ("A Long Way to Go" [Singhal+ 2024])

• The annotator distribution shifts the model's behavior (LLM opinions, pretrain vs. post-train) [Santurkar+ 2023]

• Crowdsourcing ethics
54

RLHF: LLM data


• Idea: replace human preferences with LLM preferences

Works surprisingly well!


=> Standard in open community

AlpacaFarm
[Dubois+ 2023]
Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data
• Scaling laws

Post-training -> ChatGPT


• Task
• SFT: data & loss
• RLHF : data & loss
• Evaluation
56

Evaluation: aligned LLM


• How do we evaluate something like ChatGPT?

• Challenges:
  • Can't use validation loss to compare different methods
  • Can't use perplexity: not calibrated (and some aligned LLMs are policies!)
  • Large diversity of tasks
  • Open-ended tasks => hard to automate

• Idea: ask annotators for their preference between answers

InstructGPT
[Ouyang+ 2022]
57

Human evaluation: eg ChatBot Arena


• Idea: have users interact (blinded) with two chatbots, rate which is better.

• Problem: cost & speed!


ChatBot Arena
[Chiang+ 2024]
58

LLM evaluation: eg AlpacaEval


• Idea: use an LLM instead of a human

• Steps (a sketch follows this slide):
  • For each instruction: generate an output with the baseline and with the model to evaluate
  • Ask GPT-4 which output is better
  • Average win-probability => win rate

• Benefits:
  • 98% correlation with ChatBot Arena
  • < 3 min and < $10

• Challenge: spurious correlations

AlpacaEval
[Li+ 2023]
59
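A sketch of the evaluation loop; judge_prefers_model is a hypothetical function that queries the judge LLM (e.g. GPT-4) and returns the probability that it prefers the model's output over the baseline's:

def win_rate(instructions, model_generate, baseline_outputs, judge_prefers_model):
    # for each instruction: generate with the model being evaluated,
    # then ask the judge which of the two outputs is better
    prefs = []
    for instr, baseline_out in zip(instructions, baseline_outputs):
        model_out = model_generate(instr)
        prefs.append(judge_prefers_model(instr, model_out, baseline_out))  # in [0, 1]
    # average win-probability over instructions => win rate
    return sum(prefs) / len(prefs)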

LLM evaluation: spurious correlation


• e.g. LLM prefers longer outputs

• Possible solution: regression analysis / causal inference to "control" for length

AlpacaEval LC
[Dubois+ 2023]
Overview
Pretraining -> GPT3
• Task & loss
• Evaluation
• Data
• Scaling laws
• Systems

Post-training -> ChatGPT


61

Systems

• Problem: everyone is bottlenecked by compute!


• Why not buy more GPUs?
• GPUs are expensive and scarce!

• Physical limitations (eg communication between GPUs)

• => importance of resource allocation (scaling laws) and optimized pipelines


62

Systems 101: GPUs


• Massively parallel: the same instruction is applied on all threads, but on different inputs.
  => Optimized for throughput!

(Figure: GPU layout with many Streaming Multiprocessors (SMs).)
63

Systems 101: GPUs


• Massively parallel
• Fast matrix multiplication: special cores >10x faster than other fp ops
64

Systems 101: GPUs


• Massively parallel
• Fast matrix multiplication

• Compute > memory & communication:


• Hard to keep processors fed with data

(Figure: runtime breakdown of a BERT transformer into data movement, matmul, and activation time [Ivanov+ 2020].)
65

Systems 101: GPUs


• Massively parallel
• Fast matrix multiplication

• Compute > memory & communication


• Memory hierarchy:
• Closer to cores => faster but less memory
• Further from cores => more memory but slower
66

Systems 101: GPUs


• Massively parallel
• Fast matrix multiplication

• Compute > memory & communication


• Memory hierarchy

• Metric: Model Flop Utilization (MFU)


• Ratio: observed throughput / theoretical best for that GPU

• 50% is great!
68

Systems: low precision


• Fewer bits => faster communication & lower memory consumption
• For deep learning: decimal precision ~doesn't matter, except for exponents & weight updates
• Matrix multiplications can use bf16 instead of fp32

• For training: Automatic Mixed Precision (AMP)


• Weights stored in fp32, but converted to bf16 before computation
• Activations in bf16 => main memory gains
• (Only) matrix multiplications in bf16 => speed gains
• Gradients in bf16 => memory gains
• Master weights updated in fp32 => full precision
(a minimal training-step sketch follows this slide)
69
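A minimal sketch of a mixed-precision training step in PyTorch (assumes a CUDA GPU); with bf16 autocast you typically do not need the loss scaling that fp16 requires:

import torch

model = torch.nn.Linear(1024, 1024).cuda()        # master weights kept in fp32
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    # matmuls and activations run in bf16 inside this context
    loss = model(x).square().mean()
loss.backward()    # backward pass; the fp32 parameters keep fp32 state
opt.step()         # optimizer update in full precision
opt.zero_grad()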

Systems: operator fusion


• Problem:
• communication is slow
• every new PyTorch line moves variables to global memory

• Idea: communicate once


• torch.compile (a small sketch follows this slide)

(Figure: data moving between DRAM and SRAM & compute.)
70
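A sketch of the idea: in eager PyTorch each elementwise line below launches its own kernel and round-trips through global memory; torch.compile can fuse the chain into a single kernel:

import torch

def scaled_tanh(x):
    # three elementwise ops; eager PyTorch launches a kernel (and a
    # round-trip to global memory) for each one
    y = x * 0.5
    z = torch.tanh(x)
    return y * (1 + z)

fused = torch.compile(scaled_tanh)   # fuses the elementwise chain into one kernel
x = torch.randn(4096, 4096)
out = fused(x)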

Systems: tiling
• Idea: group and order threads to minimize global memory access (slow)

• Eg matrix multiplication

• Compute matrix multiplications in subphases to reuse memory


1. Load M_00 and N_00 tiles into SM

2. Compute partial sums for P


3. Load M_00 and N_20 into SM
4. …

• => reuse reads (~cache); a numpy sketch follows this slide

• => ~T× reduction of global reads (for T×T tiles)

(Figure: e.g. assume a thread can only keep 8 values in memory; without tiling it has to reread all values, so there are no cache hits!)
71
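A sketch of the tiling idea in plain numpy (on a GPU this happens inside a kernel, with tiles staged in the SM's shared memory): each loaded tile is reused for a whole block of partial sums.

import numpy as np

def tiled_matmul(M, N, tile=64):
    # P = M @ N computed tile by tile, reusing each loaded tile many times
    m, k = M.shape
    _, n = N.shape
    P = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for t in range(0, k, tile):
                # "load" one tile of M and one tile of N (into fast memory on a GPU)
                M_tile = M[i:i+tile, t:t+tile]
                N_tile = N[t:t+tile, j:j+tile]
                # accumulate partial sums for the P[i:i+tile, j:j+tile] block
                P[i:i+tile, j:j+tile] += M_tile @ N_tile
    return P

# sanity check against the untiled product
A, B = np.random.randn(256, 256), np.random.randn(256, 256)
assert np.allclose(tiled_matmul(A, B), A @ B)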

Systems: eg FlashAttention
• Idea: kernel fusion, tiling, recomputation for attention!

• 1.7x end to end speed up!

FlashAttention
[Dao+ 2022]
72

Systems: parallelization
• Problem:
• model very big => can’t fit on one GPU
• Want to use as many GPUs as possible

• Idea: split memory and compute across GPUs

• Background: to naively train a model with P billion parameters you need at least 16P GB of DRAM (~16 bytes per parameter)
  • 4P GB for the model weights (fp32)
  • 2 × 4P GB for the optimizer states (Adam moments)
  • 4P GB for the gradients

• E.g. for a 7B model you need 112 GB!


73

Systems: data parallelism


• Goal: use more GPUs
• Naïve data parallelization:
1. Copy model & optimizer on each GPU

2. Split data

3. Communicate and reduce (sum) gradients

• Pro: uses GPUs in parallel

• Con: no memory gains!


74

Systems: data parallelism


• Goal: split up memory

• Idea: each GPU updates a subset of the weights and communicates them before the next step => sharding

ZeRO
[Rajbhandari+ 2019]
75

Systems: model parallelism


• Problem: data parallelism only works if batch size >= # GPUS

• Idea: have every GPU take care of applying specific parameters (rather than updating)
• Eg pipeline parallel: every GPU has different layer

GPipe
[Huang+ 2018]
76

Systems: model parallelism


• Problem: data parallelism only works if batch size >= # GPUS

• Idea: have every GPU take care of applying specific parameters (rather than updating)
• Eg pipeline parallel: every GPU has different layer

• Eg tensor parallel: split single matrix across GPUs and use partial sum

Megatron-LM:
[Shoeybi+ 2019]
77

Systems: architecture sparsity


• Idea: models are huge => not every datapoint needs to go through every parameter
• E.g. Mixture of Experts: use a selector (router) layer so that fewer parameters are "active" per token => more parameters at the same FLOPs (a routing sketch follows below)

Sparse Expert Models
[Fedus+ 2022]
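A sketch of a token-level mixture-of-experts layer with top-1 routing, where the "selector" is just a small linear layer; real implementations add load-balancing losses and capacity limits:

import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)        # the selector layer
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)])

    def forward(self, x):                                  # x: (n_tokens, d_model)
        scores = self.router(x).softmax(dim=-1)            # (n_tokens, n_experts)
        best = scores.argmax(dim=-1)                       # one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = best == e
            if mask.any():                                 # only routed tokens hit this expert
                out[mask] = expert(x[mask]) * scores[mask][:, e:e+1]
        return out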
Wrap-up
79

Outlook
Haven’t touched upon:

• Architecture: MoE & SSM
• Decoding & inference
• UI & tools: ChatGPT
• Multimodality
• Misuse
• Context size
• Data wall
• Legality of data collection

Going further:
• CS224N: more of the background and historical context. Some adjacent material.
• CS324: more in-depth reading and lectures.

• CS336: you actually build your LLM. Heavy workload!


Questions?
