OpenAssistant Roadmap

The document provides a vision and roadmap for OpenAssistant, an open-source conversational chat-based assistant. OpenAssistant aims to be personalized, integrate with third-party systems, and retrieve information dynamically through search engines. The roadmap outlines releasing a minimum viable prototype in January 2023, then expanding capabilities like retrieval augmentation and rapid personalization in subsequent quarters. Getting to the MVP involves collecting human demonstrations, fine-tuning models on the data, and reinforcement learning with a reward model trained on additional human feedback. Main efforts include data collection, instruction dataset gathering, and model training experiments.


OpenAssistant

Vision & Roadmap


OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and can retrieve information dynamically to do so.

It can be extended and personalized easily and is developed as free, open-source software.
Your Conversational Assistant
State-of-the-art chat assistant that can be personalized to your needs.

Retrieval via Search Engines
External, upgradeable knowledge: no need for billions of parameters.

OpenAssistant unifies all knowledge work in one place.

Interfacing w/ external systems
Usage of APIs and third-party applications, described via language & demonstrations.

A building block for developers
Integrate OpenAssistant into your application.

● Uses modern deep learning
● Runs on consumer hardware
● Trains on human feedback
● Free and open
Our Vision

We want OpenAssistant to be the single, unifying platform that all other systems use to interface with humans.
Our Roadmap

ASAP: Minimum Viable Prototype
● Data Collection Pipeline
● RL on Human Feedback
● Assistant v1 usable
● Out January 2023!

Q1 2023: Growing Up
● Retrieval Augmentation
● Rapid Personalization
● Using External Tools

Q2 2023: Growing Out
● Third-Party Extensions
● Device Control
● Multi-Modality

…: How did we get here?
● What do you need?

Getting to MVP
We follow InstructGPT's three-step recipe.
[Figure: InstructGPT training pipeline. Source: InstructGPT]
1) Supervised Fine-Tuning on Human Demonstrations

● We need to collect (human) demonstrations of assistant interactions
  ○ Read our Data Structures Overview to see how
  ○ We estimate about 50k* demonstrations
    (* InstructGPT has 13k, 33k, and 31k samples for the three steps, respectively)
● Fine-tune a base model on the collected data
  ○ Candidates: GPT-J, CodeGen (surprisingly promising), FlanT5, GPT-JT
  ○ Can use pseudo-data (e.g. from a QA dataset) before we have the real data
● Additionally, collect instruction datasets
  ○ Quora, StackOverflow, appropriate subreddits, …
  ○ Training an "instruction detector" would allow us to e.g. filter Twitter for good data
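The pseudo-data idea above can be sketched in a few lines: wrap existing (question, answer) pairs as assistant demonstrations and serialize them for fine-tuning. This is an illustrative sketch only; the `Demonstration` class and the `<prompter>`/`<assistant>` markers are assumptions for the example, not the project's actual data structures (see the Data Structures Overview for those).

```python
# Sketch: turning an existing QA dataset into pseudo-demonstrations for
# supervised fine-tuning. The sample format and marker tokens below are
# illustrative assumptions, not the project's real schema.

from dataclasses import dataclass

@dataclass
class Demonstration:
    prompt: str    # what the user asked
    response: str  # what the assistant should reply

def qa_to_demonstrations(qa_pairs):
    """Wrap raw (question, answer) pairs as assistant demonstrations."""
    demos = []
    for question, answer in qa_pairs:
        question, answer = question.strip(), answer.strip()
        if not question or not answer:
            continue  # skip empty or malformed pairs
        demos.append(Demonstration(prompt=question, response=answer))
    return demos

def to_training_text(demo):
    """Serialize one demonstration into a single fine-tuning string."""
    return f"<prompter>{demo.prompt}</prompter><assistant>{demo.response}</assistant>"

qa_pairs = [
    ("What is the capital of France?", "Paris is the capital of France."),
    ("", "orphan answer"),  # malformed: filtered out
]
demos = qa_to_demonstrations(qa_pairs)
texts = [to_training_text(d) for d in demos]
```

The same filtering step is where an "instruction detector" could later plug in, scoring scraped candidates before they enter the training set.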
2) Training a Reward Model & RLHF

● We need to collect rankings of interactions
  ○ Again, read our Data Structures Overview to see how
● Reward model training could also use Active Learning
  ○ Keeps humans in the loop
  ○ Drastically decreases the amount of data needed
● Reinforcement Learning against the reward model
  ○ Follow InstructGPT and use PPO
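Training the reward model on collected rankings typically means a pairwise objective: for each comparison, the chosen response should score above the rejected one. A minimal numeric sketch of that InstructGPT-style loss, -log(sigmoid(r_chosen - r_rejected)), in plain Python (the rewards here are stand-in floats, not outputs of the project's actual model):

```python
# Sketch of the pairwise ranking loss used to train a reward model on
# human preference data. Real training would backpropagate this loss
# through a neural reward model; plain floats stand in for its outputs.

import math

def pairwise_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when the chosen
    response is scored clearly above the rejected one."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def batch_loss(pairs):
    """Average pairwise loss over a batch of (chosen, rejected) rewards."""
    return sum(pairwise_loss(c, r) for c, r in pairs) / len(pairs)

good = pairwise_loss(2.0, -2.0)  # correct ordering, wide margin: small loss
bad = pairwise_loss(-2.0, 2.0)   # inverted ordering: large loss
```

Active learning fits naturally here: the pairs where the current model's margin is near zero are exactly the ones most worth sending to human rankers.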
Main Efforts

● Data Collection Code → Backend, website, and Discord bot to collect data
● Instruction Dataset Gathering → Scraping & cleaning web data
● Gamification → Leaderboards & more, to make data collection more fun
● Model Training → Experiments on pseudo- and real data
● Infrastructure → Collection, training, and inference
● Data Collection → This is the bulk of the work
● Data Augmentation → Making more data from little data
● Privacy and Safety → Protecting sensitive data
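The Privacy and Safety effort will need, among other things, scrubbing of obvious personal data from collected text. A minimal regex-based sketch (the patterns, placeholder tags, and policy here are illustrative assumptions, not the project's actual pipeline, which would need far more robust detection and human review):

```python
# Sketch: scrubbing obvious PII from collected text before it enters a
# dataset. These regexes catch only clear-cut emails and phone-like
# numbers; a real privacy pipeline would go well beyond this.

import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{7,}\d")

def scrub(text):
    """Replace obvious emails and phone numbers with placeholder tags."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    return text

cleaned = scrub("Mail me at jane.doe@example.com or call +1 555 123 4567.")
```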


Principles
● We put the human in the center
● We need to get the MVP out fast, while we still have momentum
● We pull in one direction
● We are pragmatic
● We aim for models that can (or could, with some effort) be run on consumer hardware
● We rapidly validate our ML experiments on a small scale before going to a supercluster
Where to go from here?

● Read the Data Structures Documentation
● Come to our repository and grab an issue
● Join the LAION Discord and the YK Discord
