Add support for separate bias tensors #1250
Comments
Love it! If you want to spin up a PR, I'll gladly take a look.
Draft PR up: #1259. Currently, I have all of my branches in sequence to avoid merge conflicts, since many of them touch similar portions of the code (particularly around …)
gabe-l-hart added a commit to gabe-l-hart/torchchat that referenced this issue on Oct 7, 2024: pytorch#1250 Branch: BiasTensors-1250 Signed-off-by: Gabe Goodhart <[email protected]>
gabe-l-hart added a commit to gabe-l-hart/torchchat that referenced this issue on Oct 8, 2024: pytorch#1250 Branch: BiasTensors-1250 Signed-off-by: Gabe Goodhart <[email protected]>
gabe-l-hart added a commit to gabe-l-hart/torchchat that referenced this issue on Oct 8, 2024: pytorch#1250 Branch: BiasTensors-1250 Signed-off-by: Gabe Goodhart <[email protected]>
gabe-l-hart added a commit to gabe-l-hart/torchchat that referenced this issue on Oct 8, 2024: pytorch#1250 Branch: BiasTensors-1250 Signed-off-by: Gabe Goodhart <[email protected]>
Jack-Khuu pushed a commit that referenced this issue on Oct 9, 2024:
* feat: Add support for attention and ff biases (Branch: GraniteCodeSupport)
* fix(convert): Add support for permuted kvq bias weights in HF conversion (Branch: GraniteCodeSupport)
* fix(model): Add support for bias wqkv tensor in Attention (Branch: GraniteCodeSupport)
* fix(convert): Remove prints and unnecessary dict get (#1250, Branch: BiasTensors-1250)
* fix(convert): Remove unnecessary safe dict get (#1250, Branch: BiasTensors-1250)
Signed-off-by: Gabe Goodhart <[email protected]>
🚀 The feature, motivation and pitch
In the `transformers` implementation of llama, there are optional `bias` tensors for the LlamaMLP and LlamaAttention modules. Several additional models (specifically Granite Code 3B and 8B) use the `llama` architecture and have these separate bias tensors. The proposal here is to add the ability to indicate the presence of bias tensors in `TransformerArgs` and then support loading them in `Attention` and `FeedForward`.
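A minimal sketch of what this could look like on the model side (the flag names `attention_bias` and `feed_forward_bias` and the default dimensions are illustrative assumptions, not the final torchchat API):

```python
from dataclasses import dataclass

import torch.nn as nn
import torch.nn.functional as F


@dataclass
class TransformerArgs:
    dim: int = 4096
    hidden_dim: int = 11008
    # Hypothetical flags; defaulting to False keeps existing Llama checkpoints unchanged.
    attention_bias: bool = False
    feed_forward_bias: bool = False


class FeedForward(nn.Module):
    def __init__(self, config: TransformerArgs) -> None:
        super().__init__()
        # Bias parameters are only materialized when the config asks for them.
        self.w1 = nn.Linear(config.dim, config.hidden_dim, bias=config.feed_forward_bias)
        self.w3 = nn.Linear(config.dim, config.hidden_dim, bias=config.feed_forward_bias)
        self.w2 = nn.Linear(config.hidden_dim, config.dim, bias=config.feed_forward_bias)

    def forward(self, x):
        # SwiGLU feed-forward, as in the Llama architecture.
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

The same pattern would apply to the attention projections, gated by `attention_bias`.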
Alternatives
If this project is designed to be limited to official Llama models, these bias tensors are not needed.
Additional context
This issue is a piece of the puzzle for adding support for Granite Code 3b/8b, which use the llama architecture in `transformers` but take advantage of several pieces of the architecture that are not currently supported by torchchat. The work-in-progress for Granite Code can be found on my fork: https://github.com/gabe-l-hart/torchchat/tree/GraniteCodeSupport
RFC (Optional)
I have a working implementation to support these optional bias tensors that I plan to submit as a PR. The changes are along the following lines:
- Add parameters to `TransformerArgs` for attention and ffn bias
- Set the `bias` value based on these parameters in both the `Attention` and `FeedForward` modules
- Add the `bias` tensor names in `convert_hf_checkpoint`
- Handle permutation of the `bias` tensors in `convert_hf_checkpoint`
- Load the `bias` tensors in `model.py`
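For the conversion side, the q/k projections (and, for models like Granite Code, their biases) are stored in the interleaved rotary layout used by `transformers`, so a bias vector needs the same row reordering the weight matrices already receive. Below is a hedged sketch of that idea; the helper names and the exact reshape are assumptions modeled on the usual HF-to-Meta permutation, not necessarily the code in `convert_hf_checkpoint` as merged:

```python
import torch


def permute_weight(w: torch.Tensor, n_heads: int, head_dim: int, dim: int) -> torch.Tensor:
    # Reorder output rows from the interleaved rotary layout to the layout model.py expects.
    return (
        w.view(n_heads, 2, head_dim // 2, dim)
        .transpose(1, 2)
        .reshape(n_heads * head_dim, dim)
    )


def permute_bias(b: torch.Tensor, n_heads: int, head_dim: int) -> torch.Tensor:
    # The bias is a 1-D vector indexed by the same output rows, so it gets the same reordering.
    return (
        b.view(n_heads, 2, head_dim // 2)
        .transpose(1, 2)
        .reshape(n_heads * head_dim)
    )


if __name__ == "__main__":
    n_heads, head_dim, dim = 4, 8, 32
    q_weight = torch.randn(n_heads * head_dim, dim)
    q_bias = torch.randn(n_heads * head_dim)
    # Sanity check: permuting the weight with the bias appended as an extra column
    # keeps weight rows and bias entries aligned.
    aug = torch.cat([q_weight, q_bias.unsqueeze(1)], dim=1)
    permuted_aug = permute_weight(aug, n_heads, head_dim, dim + 1)
    assert torch.equal(permuted_aug[:, -1], permute_bias(q_bias, n_heads, head_dim))
```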