Hugging Face’s Post

Hugging Face reposted this

Merve Noyan

open-sourceress at 🤗 | Google Developer Expert in Machine Learning, MSc Candidate in Data Science

The bleeding-edge alignment technique DPO is now available for vision language models in Hugging Face TRL, along with LoRA/QLoRA support ⚡️ Links and more in the comments 🔖

DPO (Direct Preference Optimization) is a popular alignment technique for language models. TL;DR: instead of training a separate reward model and optimizing against its scores (as in RLHF), the model is fine-tuned directly on a preference dataset of prompts paired with chosen and rejected outputs; the loss nudges the model toward the chosen responses and away from the rejected ones.

DPO for vision language models works essentially the same way: since VLMs project images into the text embedding space, training is still just input tokens in, output tokens out.

Quentin Gallouédec implemented support for Idefics2, Llava 1.5, and PaliGemma in TRL 👏 VLM processors are still quite non-standard, so the only model-specific differences are in the processors and chat templates themselves, which makes adding a new model straightforward (see his PR in the links).

Thanks to TRL's support for PEFT and bitsandbytes, you can also try LoRA and QLoRA fine-tuning (covered in the blog post) 😏

Please try the scripts, share your models, and let us know how it goes!
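To make the workflow concrete, here is a minimal sketch of what VLM DPO with LoRA could look like in TRL. This is illustrative rather than the official script: it assumes a TRL version with VLM DPO support (0.9+), uses the RLAIF-V preference dataset and Idefics2 purely as examples, and picks hyperparameters for readability; exact argument names may differ between TRL releases, so check the PR and blog post linked in the comments for the up-to-date API.

import torch
from datasets import features, load_dataset
from peft import LoraConfig
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import DPOConfig, DPOTrainer

model_id = "HuggingFaceM4/idefics2-8b"  # Llava 1.5 and PaliGemma work the same way
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id, do_image_splitting=False)

# An image preference dataset: each row has an image, a question, and a
# chosen/rejected answer pair (RLAIF-V used here as an example).
dataset = load_dataset("openbmb/RLAIF-V-Dataset", split="train")

def to_preference_format(example):
    # Wrap the raw columns in the processor's chat template so the trainer
    # sees plain "prompt"/"chosen"/"rejected" strings plus an image list.
    prompt = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": example["question"]}]}]
    chosen = [{"role": "assistant", "content": [{"type": "text", "text": example["chosen"]}]}]
    rejected = [{"role": "assistant", "content": [{"type": "text", "text": example["rejected"]}]}]
    return {
        "images": [example["image"]],
        "prompt": processor.apply_chat_template(prompt, tokenize=False),
        "chosen": processor.apply_chat_template(chosen, tokenize=False),
        "rejected": processor.apply_chat_template(rejected, tokenize=False),
    }

dataset = dataset.map(to_preference_format, remove_columns=dataset.column_names)

# Make sure the images column is typed as decoded images rather than raw bytes.
f = dataset.features
f["images"] = features.Sequence(features.Image(decode=True))
dataset = dataset.cast(f)

training_args = DPOConfig(
    output_dir="idefics2-8b-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # with PEFT, the frozen base weights act as the reference model
    args=training_args,
    train_dataset=dataset,
    tokenizer=processor,  # for VLMs, pass the processor in place of a tokenizer
    peft_config=LoraConfig(target_modules="all-linear"),  # LoRA adapters on all linear layers
)
trainer.train()

For QLoRA, the only change would be loading the base model in 4-bit by passing a bitsandbytes quantization config (e.g. BitsAndBytesConfig(load_in_4bit=True)) to from_pretrained; the preference data, loss, and trainer call stay the same, and only the adapter weights are trained in full precision.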

Ezoa DJANGORAN

Machine Learning Researcher | Computer Vision Engineer | Aircraft Maintenance Engineer B1.1 B2 | Predictive Maintenance | QT Beech 200 | Store Aircraft Manager | all Module B1.1 B2 obtained

Thank you very much

Polina Svidovsky

Data Scientist | Machine Learning Engineer

Great news! ORPO seems like a natural next step


Great stuff
