Using OpenAI's RealTime API _ WorkAdventure Documentation
In this article, I'm going to describe our experience creating a WorkAdventure bot using the new OpenAI Realtime API. This API is revolutionary because it allows you to interact with an AI model in speech-to-speech mode.
Before diving into the details, let's take a look at the final result:
INFO
This article is targeted at developers who want to start using the new OpenAI Realtime API in
their projects. The API is still in beta as I'm writing this article, so things might have changed by
the time you read it.
Previously, to interact with an AI model, you had to turn your voice into text, send it to the model, and
then turn the model's response back into speech. This process was somewhat slow. It could take a few
seconds for the AI to respond, leading to an awkward conversation.
We have already experimented with the previous OpenAI API in WorkAdventure. The results were good,
but the conversation was not as smooth as we would have liked.
With the new OpenAI Realtime API, the model directly takes your voice as input and responds in
real-time. This makes the conversation smoother. Furthermore, because there is no need to convert
speech to text, the model does not lose important information, like the tone of your voice, that
would otherwise be lost during transcription. It can also respond with an appropriate tone.
The API is still in beta, but the demos were very impressive. So we decided to give it a try.
Interacting with the API is very different from the previous chat completion API. Because we are dealing
with audio, the client and the API continuously exchange messages over a WebSocket. The client
sends audio chunks to the model and receives audio chunks in response. Because we are using a
WebSocket, the API is now stateful. This means you no longer need to resend the context of the
conversation at each turn. The model keeps track of the conversation context and can respond
accordingly.
The context of WorkAdventure is somewhat special. Bots are actually JavaScript scripts that run on
the browser side: each bot runs in a headless browser tab on a server and uses the Scripting API
to interact with the WorkAdventure map. Because the bots run in a browser that itself lives on the
server side, we can actually put the OpenAI key right into the browser. This saves us from having
to manage the key on the server side and run a separate proxy server in front of the API.
If you are looking to implement the real-time API in your own project, it is likely your setup will be
different. You might need a relay server that will live on the server, take calls from the client, and
forward them to the OpenAI API, adding the API key in the process. I would also like to mention that
Livekit seems to have a great higher-level library for handling bots:
https://docs.livekit.io/agents/openai/overview/ I haven't had the opportunity to test it yet, but it looks
promising and you probably should have a look at it before starting.
Getting started
Instead of directly talking on the WebSocket with the OpenAI Realtime API, we decided to use a
wrapper library provided by OpenAI: https://github.com/openai/openai-realtime-api-beta.
Now that the library is installed, let's create our "Robot" class that will handle the communication with
the API:
import { RealtimeClient } from '@openai/realtime-api-beta';

class Robot {
    private realtimeClient: RealtimeClient;

    constructor(
        apiKey: string,
        private audioTranscriptionModel = "whisper-1",
        private voice = "alloy",
    ) {
        // The key lives in the (server-side, headless) browser, as explained above.
        this.realtimeClient = new RealtimeClient({
            apiKey,
            dangerouslyAllowAPIKeyInBrowser: true,
        });

        // We update the session with the voice and the audio transcription
        // model we want to use.
        this.realtimeClient.updateSession({voice: this.voice});
        this.realtimeClient.updateSession({
            // VAD means "Voice Activity Detection".
            // In "server_vad" mode, we let OpenAI decide when to start and
            // stop speaking.
            turn_detection: {type: 'server_vad'},
            input_audio_transcription: {model: this.audioTranscriptionModel},
        });
    }
}
At this point, we can see in the console that the model is responding with audio chunks.
If we take a look at the "delta" part of the event, we can see some events contain "transcripts", which
are usually single words, and other events contain "audio" chunks.
Ok, so we have the audio chunks and we need to turn them into a MediaStream . A media stream is a
stream of audio data that can be played by the browser. We can create a MediaStream from an array of
audio chunks. The MediaStream can then be played by the browser or sent to a WebRTC connection.
The audio chunks sent by the Realtime API are in a PCM 16 format (that is: raw 16 bit PCM audio at
24kHz, 1 channel, little-endian).
In the browser however, the API in charge of managing audio is the Web Audio API. It is operating on
32bit float numbers. So we need to convert the 16bit integer audio chunks into 32bit float audio chunks.
First of all, I was very worried about this 'little-endian' thing. Little endian means that the least significant
byte is stored first. Handling that in JavaScript is not straightforward. However, when dealing with audio
output from OpenAI, the audio is correctly formatted, so you don't have to worry about endianness.
So turning from 16bit PCM to 32bit float is actually turning an integer value between -32768 and 32767
into a float value between -1 and 1, which is quite simple.
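As a sketch, the conversion can look like this (the function name is mine, not from the original code; dividing by 32768 maps the int16 range onto [-1, 1)):

```typescript
// The Realtime API sends raw 16-bit little-endian PCM; the Web Audio API wants
// 32-bit floats in [-1, 1]. Dividing by 32768 maps the int16 range onto floats.
function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
  const view = new DataView(buffer);
  const out = new Float32Array(buffer.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    // 'true' asks DataView for little-endian, regardless of the host platform.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```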
Now that we have the audio in the correct format, we need to create a MediaStream from it.
There are many ways to do that. The simplest one involves using a ScriptProcessorNode . This is
easy, but unfortunately, it is deprecated. You can also use an AudioBufferSourceNode , but the
documentation says it does not play well with streaming audio and is better suited to playing short
audio clips.
The technique I will use relies on the AudioWorkletNode . This is the modern way to do things, but it is a
bit more complex.
Audio worklets run in a separate thread and can be used to process audio in real-time. Even though
we don't perform heavy computations that could block the main thread, audio worklets force us to
process the audio in that separate thread anyway.
The audio worklet can receive audio chunks if a MediaStream is plugged in input and can send some
audio output to a destination MediaStream. It can also communicate with the main thread using the
postMessage API.
In this case, we will use the audio worklet to play the audio chunks we receive from the Realtime API.
We will send the audio chunks to the audio worklet using the postMessage API and the audio worklet
will generate a MediaStream.
The code will be split into 2 files: one contains the audio worklet processor, and the other is the
main file that creates the audio worklet node and connects it to the audio output.
Let's start with the audio worklet processor (the one running in a dedicated thread).
Data will be arriving via the postMessage API (in the onmessage method), as an object with this
structure:
{
pcmData: Float32Array
}
The pcmData field contains the audio data in the correct format (32bit float). We will store this data in
the audioQueue .
constructor() {
super();
this.port.onmessage = (event: MessageEvent) => {
const data = event.data.pcmData;
if (data instanceof Float32Array) {
this.audioQueue.push(data);
} else {
console.error("Invalid data type received in worklet", event.data);
}
};
}
process(inputs: Float32Array[][],
outputs: Float32Array[][],
parameters: Record<string, Float32Array>): boolean {
const output = outputs[0];
const outputData = output[0];
When the processor is running, the process method is called each time the audio worklet needs to
process audio data (so every few milliseconds). The process method has "inputs" and "outputs"
parameters. The "inputs" parameter will be completely ignored in our case. The "outputs" parameter is
an array of arrays of Float32Arrays. The first array contains the output data. In our case, we will only use
the first output array.
If there is data in the audioQueue , the processor will copy the data to the output buffer, trying to fill it as
much as possible. It is important to note that when the process function is called, the output buffer is
already filled with zeros (silence). So if there is no data in the audioQueue , the output buffer will be
filled with silence. This is exactly what we want.
Last but not least, please note that we call registerProcessor at the end of the file. The
output-pcm-worklet-processor string is the name we will use to create the audio worklet node in the main file.
Typescript support
If you are using Typescript, it is likely you will be missing the AudioWorkletProcessor type. There is an
NPM package that adds it ( @types/audioworklet ), but it conflicts with the DOM types. So instead, I
copied the types from a GitHub issue found here into my project.
Now that we have the audio worklet processor, we need to create the audio worklet node in the main
file.
From a bird's eye view, without any error management, the process looks like this:
await this.audioContext.resume();
// Let's load the audio worklet processor (assuming the file is served at this URL)
await this.audioContext.audioWorklet.addModule("output-pcm-processor.js");
// Instantiate the audio worklet node (we use the 'output-pcm-worklet-processor' string
// registered in the processor file)
const workletNode = new AudioWorkletNode(this.audioContext, 'output-pcm-worklet-processor');
In practice, it is a bit more complex. In a real-world application, you will use a bundler. So we cannot
reference the "pcm-processor.js" file directly without letting the bundler know about it.
In Vite, we can use the "?worker&url" parameter in imports to reference a file directly. This is very useful
for our use case.
// ...
await this.audioContext.audioWorklet.addModule(audioWorkletProcessorUrl);
When this is done, we can send the audio chunks to the audio worklet processor:
workletNode.port.postMessage(
{ pcmData: float32Array },
{ transfer: [float32Array.buffer] }
);
The transfer option is used to transfer the ownership of the buffer to the audio worklet processor.
This is important because the buffer is a large object and we don't want to copy it (that would waste
CPU cycles). We want to transfer it.
constructor(sampleRate = 24000) {
    this.audioContext = new AudioContext({ sampleRate });
    this.mediaStreamDestination =
        this.audioContext.createMediaStreamDestination();
    this.isWorkletLoaded = false;
}
From there, you can use the PCMStreamer class to stream audio data to a MediaStream that can be
used in a WebRTC connection. Whenever the Realtime API sends an audio chunk, you convert it to
a Float32Array and send it to the PCMStreamer instance via the appendAudioData method:
audioStream.appendAudioData(float32Array);
Success!
But this is only the beginning. So far, we are getting audio chunks from the Realtime API and playing
them in the browser. Now, we need to send our audio data to the Realtime API!
Fortunately, the openai-realtime-api-beta library automatically converts the 32-bit float audio data
to 16-bit PCM audio data in little endian on our behalf (the little endian part is important here).
We will just have to make sure we are sending the samples at the requested 24kHz sample rate.
Turning a MediaStream into audio chunks turns out to be quite similar to what we did previously. We will
use the Web Audio API and design a new worklet. This time, the worklet will receive a MediaStream -- in the
case of WorkAdventure, where multiple people can talk to the AI at the same time, it can receive many
MediaStreams. The MediaStreams will be mixed together and sent back to the main thread using the
postMessage API.
// Let's merge all the inputs in one big Float32Array by summing them
const mergedInput = new Float32Array(inputs[0][0].length);
A quick note about the parameters passed to the process function. The inputs parameter is an array
of arrays of Float32Arrays. Why do we have an array of arrays of arrays?
The innermost array is the audio data for a single channel: each value represents the audio level at a
given point in time. But microphones can be stereo. In that case, we don't have one channel but two. So we
wrap the Float32Arrays into an array that contains one channel if the microphone is mono, or two
channels if it is stereo.
But we can also have many input sources (many microphones, or many streams coming from
different WebRTC sources...). The outermost array is used to represent those different input sources.
On each call to the process function, we will sum all the input sources together to create a single
audio stream, and send this stream to the main thread via the postMessage API.
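The summing step above can be sketched as a pure function (the name mixInputs is mine; inside the worklet it would be called from process with the inputs parameter):

```typescript
// Sum the first channel of every input source into a single mono buffer.
// inputs[source][channel] is a Float32Array, as in AudioWorkletProcessor.process.
function mixInputs(inputs: Float32Array[][]): Float32Array {
  // All render quanta share the same length (128 samples at the time of writing).
  const frameLength = inputs[0]?.[0]?.length ?? 128;
  const merged = new Float32Array(frameLength);
  for (const channels of inputs) {
    const channel = channels[0]; // mono mix: take the first channel of each source
    if (!channel) continue;
    for (let i = 0; i < frameLength; i++) {
      merged[i] += channel[i];
    }
  }
  return merged;
}
```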
The code to instantiate the audio Worklet in the main file will look like this:
await this.audioContext.resume();
// Let's load the audio worklet processor (assuming the file is served at this URL)
await this.audioContext.audioWorklet.addModule(audioWorkletProcessorUrl);
// Instantiate the audio worklet node (we use the 'input-pcm-worklet-processor' string
// registered in the processor file)
const workletNode = new AudioWorkletNode(this.audioContext, 'input-pcm-worklet-processor');
// Connect the media stream as an input to our worklet node (this assumes you already
// have a MediaStream connected to your microphone)
const sourceNode = this.audioContext.createMediaStreamSource(mediaStream);
sourceNode.connect(workletNode);
Sending the audio chunks to the Realtime API is quite simple. We just need to call the
appendInputAudio method of the RealtimeClient. The data we pass must be a 16bit PCM audio buffer.
We can use the RealtimeUtils.floatTo16BitPCM method provided by the Realtime API to convert
the 32bit float audio data to 16bit PCM.
this.realtimeClient.appendInputAudio(RealtimeUtils.floatTo16BitPCM(audioBuffer));
However, just doing this will lead to a conversation that will fail after a few seconds. In our experience,
this is because we are calling the Realtime API too often.
Out of the box, when I tested the audio worklet, it generated Float32Arrays with 128 samples (i.e. 128
values in the array). Because we run at 24kHz, this means we are sending 24000 / 128 = 187.5 chunks
per second. This is probably a bit too much for the Realtime API.
We don't need to send audio chunks at a very high rate. One audio chunk every ~50ms should be more
than enough. That means we could target a chunk size of 24000 * 0.05 = 1200 samples.
We can simply buffer the audio chunks in the main thread and send them to the Realtime API when we
have enough data.
Managing interruptions
The Realtime API sends audio chunks faster than we can play them. This means that at some point in time,
we might have several seconds of audio chunks in our OutputAudioWorklet .
If we interrupt the conversation, this buffer is already on our computer and will still be played. This is not
what we want.
Fortunately, in VAD mode, the Realtime API sends a conversation.interrupted event whenever it
detects the user has interrupted the conversation. We can use this event to clear the buffer. The buffer
is inside the OutputAudioWorklet so we need to send a message to the worklet to clear the buffer.
Let's say that if we receive the following event, we clear the buffer:
{
emptyBuffer: true
}
constructor() {
super();
this.port.onmessage = (event: MessageEvent) => {
// In case the event is a buffer clear event, we empty the buffer
if (event.data.emptyBuffer === true) {
this.audioQueue = [];
} else {
const data = event.data.pcmData;
if (data instanceof Float32Array) {
this.audioQueue.push(data);
} else {
                console.error("Invalid data type received in worklet", event.data);
            }
}
};
}
//...
}
On the main thread side, we can add an additional method to the OutputPCMStreamer class to clear
the buffer:
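A minimal sketch of such a method, assuming the class keeps a reference to the worklet node's port (the MessagePortLike interface is mine, added so the sketch stays independent of browser types):

```typescript
// Minimal port interface so the sketch stays independent of browser types;
// in the browser this would simply be workletNode.port.
interface MessagePortLike {
  postMessage(message: unknown): void;
}

class OutputPCMStreamer {
  constructor(private workletPort: MessagePortLike) {}

  // Ask the worklet to drop any queued audio; the worklet's onmessage
  // handler reacts to { emptyBuffer: true } by clearing its audioQueue.
  resetAudioBuffer(): void {
    this.workletPort.postMessage({ emptyBuffer: true });
  }
}
```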
When we receive the conversation.interrupted event, we can call the resetAudioBuffer method
of the OutputPCMStreamer instance.
this.realtimeClient.on('conversation.interrupted', () => {
outputPCMStreamer.resetAudioBuffer();
});
What's remaining?
The system is working well, but there is still room for improvement.
Remember that OpenAI sends audio chunks faster than we can play them? Now imagine the
conversation is interrupted. We are going to remove a lot of audio chunks from the buffer, but we did not
tell the Realtime API that those audio chunks have not been played. So the Realtime API will "think" we
heard everything it said, but we didn't.
Using WorkAdventure?
If you are using WorkAdventure (you should!) and are looking to implement a bot using the Realtime
API, we added a few methods to the scripting API to help you with that.
This will make things considerably easier for you as we take care of the audio worklet instantiation and
the audio stream management.
Conclusion
This is it! We have a fully working system that allows us to interact with bots via voice! The bots'
reactivity is insane. The conversation is smooth and the tone of the bot is appropriate.
It took us about 3 days of work to get this first version working and we are very happy with the result. We
are now looking forward to integrating this system deeper into WorkAdventure, by adding more features
via the function calling feature of the Realtime API!
Next steps will be to have the bot better handle interruptions, and then react to the environment, guide
you through your office, etc...