Hugging Face’s Post

Hugging Face reposted this

Merve Noyan

open-sourceress at 🤗 | Google Developer Expert in Machine Learning, MSc Candidate in Data Science

The bleeding-edge alignment technique DPO is now available for vision language models in Hugging Face TRL, along with LoRA/QLoRA support ⚡️ Links and more in the comments 🔖

DPO (Direct Preference Optimization) is a popular alignment technique for language models. TL;DR: instead of training a separate reward model and optimizing against its scores (as in RLHF), the model is fine-tuned directly on a preference dataset of prompts paired with chosen and rejected outputs; the loss nudges the model toward the chosen responses and away from the rejected ones.

DPO for vision language models works essentially the same way: since VLMs project images into the text embedding space, training is still just input tokens in, output tokens out.

Quentin Gallouédec implemented support for Idefics2, Llava 1.5, and PaliGemma in TRL 👏 VLM processors are still quite non-standard, so the only model-specific differences are in the processors and chat templates themselves, which makes adding a new model straightforward (see his PR in the links).

Thanks to TRL's support for PEFT and bitsandbytes, you can also try LoRA and QLoRA fine-tuning (covered in the blog post) 😏

Please try the scripts, share your models, and let us know how it goes!
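To make the workflow concrete, here is a minimal sketch of what VLM DPO with LoRA could look like in TRL. This is illustrative rather than the official script: it assumes a TRL version with VLM DPO support (0.9+), uses the RLAIF-V preference dataset and Idefics2 purely as examples, and picks hyperparameters for readability; exact argument names may differ between TRL releases, so check the PR and blog post linked in the comments for the up-to-date API.

import torch
from datasets import features, load_dataset
from peft import LoraConfig
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceM4/idefics2-8b"  # Llava 1.5 and PaliGemma work the same way
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id, do_image_splitting=False)

# An image preference dataset: each row has an image, a question, and a
# chosen/rejected answer pair (RLAIF-V used here as an example).
dataset = load_dataset("openbmb/RLAIF-V-Dataset", split="train")

def to_preference_format(example):
    # Wrap the raw columns in the processor's chat template so the trainer
    # sees plain "prompt"/"chosen"/"rejected" strings plus an image list.
    prompt = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": example["question"]}]}]
    chosen = [{"role": "assistant", "content": [{"type": "text", "text": example["chosen"]}]}]
    rejected = [{"role": "assistant", "content": [{"type": "text", "text": example["rejected"]}]}]
    return {
        "images": [example["image"]],
        "prompt": processor.apply_chat_template(prompt, tokenize=False),
        "chosen": processor.apply_chat_template(chosen, tokenize=False),
        "rejected": processor.apply_chat_template(rejected, tokenize=False),
    }

dataset = dataset.map(to_preference_format, remove_columns=dataset.column_names)

# Make sure the images column is typed as decoded images rather than raw bytes.
f = dataset.features
f["images"] = features.Sequence(features.Image(decode=True))
dataset = dataset.cast(f)

training_args = DPOConfig(
    output_dir="idefics2-8b-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # with PEFT, the frozen base weights act as the reference model
    args=training_args,
    train_dataset=dataset,
    tokenizer=processor,  # for VLMs, pass the processor in place of a tokenizer
    peft_config=LoraConfig(target_modules="all-linear"),  # LoRA adapters on all linear layers
)
trainer.train()

For QLoRA, the only change would be loading the base model in 4-bit by passing a bitsandbytes quantization config (e.g. BitsAndBytesConfig(load_in_4bit=True)) to from_pretrained; the preference data, loss, and trainer call stay the same, and only the adapter weights are trained in full precision.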

Ezoa DJANGORAN

Machine Learning Researcher | Computer Vision Engineer | Aircraft Maintenance Engineer B1.1 B2 | Predictive Maintenance | QT Beech 200 | Store Aircraft Manager | all Module B1.1 B2 obtained

Thank you very much

Polina Svidovsky

Data Scientist | Machine Learning Engineer

Great news! ORPO seems like a natural next step


Great stuff
