AI at Meta’s Post


In April, we published a research paper on a new approach for building better and faster LLMs by using multi-token prediction. Using this approach, we can train language models to predict multiple future words at once, improving model capabilities and training efficiency while allowing for faster inference. In the spirit of responsible open science, we've released pre-trained models for code completion using this approach to enable further exploration in the research community.

Get the model on Hugging Face ➡️ https://go.fb.me/dm1giu
More on this approach ➡️ https://go.fb.me/x1zhdq

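For readers who want a concrete picture of the technique described above, here is a minimal, hypothetical sketch of a multi-token prediction objective in PyTorch: a shared trunk with one output head per future offset, trained with a summed cross-entropy loss. This illustrates the general idea only; it is not Meta's released code, and MultiTokenPredictor, trunk, and n_future are illustrative names.

```python
# Illustrative sketch only -- not Meta's implementation.
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    """Shared trunk with n_future independent heads; head k predicts the
    token k+1 steps ahead (one common way to frame multi-token prediction)."""
    def __init__(self, trunk: nn.Module, d_model: int, vocab_size: int, n_future: int = 4):
        super().__init__()
        self.trunk = trunk  # assumed: any causal transformer returning hidden states
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(n_future)
        )

    def forward(self, tokens):                      # tokens: (batch, seq)
        h = self.trunk(tokens)                      # (batch, seq, d_model)
        return [head(h) for head in self.heads]     # one logits tensor per head

def multi_token_loss(logits_per_head, tokens):
    """Sum of cross-entropies; head k is trained against tokens shifted k+1 ahead."""
    loss = 0.0
    for k, logits in enumerate(logits_per_head):
        shift = k + 1
        pred = logits[:, :-shift, :]                # positions with a valid target
        target = tokens[:, shift:]
        loss = loss + nn.functional.cross_entropy(
            pred.reshape(-1, pred.size(-1)), target.reshape(-1)
        )
    return loss
```

Because every head shares the same trunk, the extra prediction targets act as an auxiliary training signal at little added cost, which is one intuition for the capability and efficiency gains the post describes.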
Mariusz Nitecki

LLM Expert & Data Scientist Specializing in Advanced LLM Applications, LLM Implementations and Scalable Data Solutions

1w

I'm curious whether their multi-token model outperforms not only their own baseline but also the top models of a similar size. It works well for generative tasks, but the paper reports mixed results on multiple-choice benchmarks. See also https://arxiv.org/abs/2401.10774
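The arXiv link above is the Medusa paper, which uses extra decoding heads for self-speculative decoding; the same mechanism is how multi-token models can deliver the faster inference claimed in the post. Below is a hedged, illustrative sketch of that draft-and-verify loop, assuming the hypothetical model interface from the sketch above, greedy decoding, and batch size 1. It is not the exact algorithm from either paper.

```python
# Hedged sketch of head-based speculative decoding (batch size 1, greedy).
import torch

@torch.no_grad()
def speculate_step(model, tokens):
    """Draft n_future tokens from the extra heads at the last position,
    then verify them with the primary next-token head in one batched pass."""
    logits = model(tokens)                                # list of per-head logits
    draft = [l[:, -1, :].argmax(-1) for l in logits]      # one greedy token per head
    draft = torch.stack(draft, dim=1)                     # (batch, n_future)

    candidate = torch.cat([tokens, draft], dim=1)
    verify = model(candidate)[0]                          # primary head only
    # Accept draft token k only if the primary head, conditioned on all
    # earlier accepted tokens, would also have produced it.
    accepted = tokens
    for k in range(draft.size(1)):
        pos = tokens.size(1) - 1 + k                      # this position predicts token pos+1
        if verify[:, pos, :].argmax(-1).item() != draft[0, k].item():
            break
        accepted = torch.cat([accepted, draft[:, k : k + 1]], dim=1)
    return accepted
```

The first drafted token always matches the primary head's own greedy choice, so each pass commits at least one token and often several, which is where the inference speedup comes from.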

Jaffar Ali

CEO & Founder at Databiqs | Expert in AI, Blockchain, and Web Development | Innovating Future Technologies

7h

Multi-token prediction shows promise for improving efficiency and performance in language models, but managing complexity and resource demands, as well as ensuring consistent performance across varied datasets, may hinder widespread adoption.



Vincent Granville

Chief AI Scientist, GenAItechLab.com

1w

You are not the first to use multi-token prediction; I started before April. I also use contextual tokens. See https://mltblog.com/4aHYM4i

Wow, are we witnessing another "Attention is all you need" moment?

An interesting approach. Keen to play around with it!

Dr. Timo Reckling

Software Consultant at TNG Technology Consulting GmbH

6d

(Disclaimer: I haven't read the paper yet.) Probably a provocative question: any thoughts on why the paper was published in April but the model was only released now?

Antony Konnoth

Customer-focused IT Director | Digital Transformation Leader | Strategic Technology Innovator

17h

It will be interesting to compare and contrast how this stands up against contextual tokens.


Excellent work! Exciting news!

This is unique. Collaboration with a global AI community is more important than ever 🙏
