Skribby

Realtime Transcription

Real-time transcription allows you to receive live transcript data as the meeting progresses, rather than waiting for the meeting to end. This is enabled by using any of the realtime transcription models.

Key Features

  • Live transcript streaming via WebSocket connection
  • Immediate access to spoken content as it happens
  • Suitable for applications requiring real-time processing
  • Optional realtime audio streaming for raw audio access

How it works

After you create a meeting bot with a realtime transcription model (or with the Realtime Audio addon enabled on a non-realtime model), the bot response includes two WebSocket URLs for event streaming: websocket_url and websocket_read_only_url.

  • websocket_url: Full access, allows Actions.
  • websocket_read_only_url: Read-only access. This is safe to provide to your customer's front-end directly.

Connect to either URL to start receiving live updates. You'll instantly receive a connected event containing all live transcripts generated up until that point, followed by live events as they happen.

If the Realtime Audio addon is enabled, you will also receive a websocket_audio_url. This is a separate, dedicated connection for raw audio data.

Realtime Audio Streaming

In addition to transcript events, you can also receive raw audio data in realtime via a separate WebSocket connection.

Enabling Realtime Audio

Realtime audio is enabled by default when you use a realtime model. To get realtime audio streaming with a non-realtime model, set realtime_audio: true in your request:

    {
      "transcription_model": "none",
      "meeting_url": "https://meet.google.com/abc-defg-hij",
      "service": "gmeet",
      "bot_name": "My Meeting Bot",
      "realtime_audio": true
    }

The bot response will include a websocket_audio_url field containing the WebSocket URL for receiving audio data.

Using the SDK

    // Create bot with realtime audio enabled
    const bot = await client.createBot({
      transcription_model: 'none',
      meeting_url: 'https://meet.google.com/abc-defg-hij',
      service: 'gmeet',
      bot_name: 'My Audio Bot',
      realtime_audio: true,
    });

    // Get the realtime client (includes audio by default)
    const realtimeClient = bot.getRealtimeClient();

    // Listen to audio events
    realtimeClient.on('audio', (buffer: Buffer) => {
      // buffer is 16-bit PCM audio at 16kHz sample rate
      processAudio(buffer);
    });

    await realtimeClient.connect();

    // Check audio connection status
    console.log('Audio connected:', realtimeClient.audioConnected);

Without Audio Streaming

If you don't need audio streaming, you can get a realtime client without it:

    // Get realtime client WITHOUT audio streaming
    const transcriptOnlyClient = bot.getRealtimeClient(true);
    await transcriptOnlyClient.connect();

Audio Format

The audio data received via the audio event is:

  • Format: 16-bit PCM (signed, little-endian)
  • Sample Rate: 16kHz
  • Channels: Mono
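
Many audio libraries expect floating-point samples rather than raw bytes. The following is a minimal sketch of decoding one chunk in the format listed above; pcm16ToFloat32 and chunkDurationSeconds are hypothetical helper names, not part of the SDK. It accepts a Uint8Array, which a Node.js Buffer also satisfies.

```typescript
const SAMPLE_RATE_HZ = 16000; // 16kHz, per the format above
const BYTES_PER_SAMPLE = 2; // 16-bit mono: two bytes per sample

// Hypothetical helper: decode signed 16-bit little-endian PCM into floats in [-1, 1).
function pcm16ToFloat32(chunk: Uint8Array): Float32Array {
  const sampleCount = Math.floor(chunk.byteLength / BYTES_PER_SAMPLE);
  // Respect byteOffset so views into a larger buffer decode correctly.
  const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order.
    samples[i] = view.getInt16(i * BYTES_PER_SAMPLE, true) / 32768;
  }
  return samples;
}

// Hypothetical helper: playback duration of one chunk, in seconds.
function chunkDurationSeconds(chunk: Uint8Array): number {
  return Math.floor(chunk.byteLength / BYTES_PER_SAMPLE) / SAMPLE_RATE_HZ;
}
```

At this sample rate, one second of audio is 32,000 bytes, which is useful when sizing buffers for downstream processing.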

Websocket Events

Events arrive as websocket messages in string form. Parse each message as JSON to get the payload. The JSON is always structured as follows:

{ "type": "[event]", "data": {...} }
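
If you are not using the SDK, each incoming message has to be parsed and checked before use. A minimal sketch of that step, assuming only the envelope shape shown above; parseEvent and the RealtimeEvent type are hypothetical names introduced for illustration:

```typescript
// Envelope from the structure above. `data` is absent on some events (e.g. "start").
interface RealtimeEvent {
  type: string;
  data?: unknown;
}

// Hypothetical helper: parse one raw websocket message.
// Returns null when the message is not valid JSON or has no string `type`.
function parseEvent(raw: string): RealtimeEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const candidate = parsed as { type?: unknown; data?: unknown };
  if (typeof candidate.type !== "string") {
    return null;
  }
  return { type: candidate.type, data: candidate.data };
}
```

Returning null instead of throwing keeps a single malformed message from taking down the message handler; whether to log or ignore such messages is left to the caller.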

Connected

When you connect to the websocket (including via websocket_read_only_url), you'll instantly receive a connected event. This event contains all live transcripts that have been generated up until the point of connection, allowing you to catch up on any conversation that occurred before you connected.

    {
      "type": "connected",
      "data": {
        "transcripts": [
          {
            "transcript": "This contains the spoken text.",
            "start": 1.23,
            "end": 4.56,
            "speaker": 0,
            "speaker_name": "John Doe"
          }
        ]
      }
    }

SDK tip: full transcript buffer

If you're using the TypeScript/JavaScript SDK, the RealtimeClient maintains an internal transcript buffer for you:

  • On connect(), the SDK resets the buffer.
  • When the "connected" snapshot arrives, the SDK seeds the buffer from data.transcripts.
  • On each "ts" event, the SDK appends the new segment.

You can read the full transcript so far at any time via realtimeClient.transcript:

    const realtimeClient = bot.getRealtimeClient();

    realtimeClient.on('ts', (segment) => {
      console.log(segment.speaker_name, segment.transcript);
      console.log('Transcript so far:', realtimeClient.transcript);
    });

    await realtimeClient.connect();
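
For integrations that do not use the SDK, the three buffer steps listed above could be reproduced by hand. A minimal sketch, assuming events have already been parsed into the { type, data } envelope; TranscriptBuffer and its method names are hypothetical and do not reflect the SDK's internals:

```typescript
interface TranscriptSegment {
  transcript: string;
  start: number;
  end: number;
  speaker: number;
  speaker_name: string;
}

// Hypothetical class mirroring the reset / seed / append steps.
class TranscriptBuffer {
  private segments: TranscriptSegment[] = [];

  // Call before (re)connecting so stale segments are dropped.
  reset(): void {
    this.segments = [];
  }

  // Route a parsed event envelope to the matching step.
  handle(event: { type: string; data?: unknown }): void {
    if (event.type === "connected") {
      // Seed from the snapshot of everything transcribed so far.
      const data = event.data as { transcripts?: TranscriptSegment[] } | undefined;
      this.segments = (data?.transcripts ?? []).slice();
    } else if (event.type === "ts") {
      // Append each new live segment.
      this.segments.push(event.data as TranscriptSegment);
    }
  }

  // Copy of the full transcript so far.
  all(): TranscriptSegment[] {
    return this.segments.slice();
  }
}
```

Resetting on every connect matters because the "connected" snapshot already contains earlier segments; appending it to an old buffer would duplicate them.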

Start

Once the meeting bot has joined the meeting and started recording, it sends this event to signal that transcripts will start arriving from this point on.
Refer to the Bot Lifecycle Documentation for a better understanding of the joining process.

{ "type": "start" }

Status Update

This event is sent whenever the bot's status changes. It provides information about the previous status and the new status, allowing you to track the bot's lifecycle transitions.

If new_status becomes finished or not_admitted, a third field called stop_reason will also be included. Refer to the Bot Lifecycle Documentation for all stop reason codes.

    {
      "type": "status-update",
      "data": {
        "old_status": "joining",
        "new_status": "recording"
      }
    }

When the bot stops, stop_reason is included:

    {
      "type": "status-update",
      "data": {
        "old_status": "processing",
        "new_status": "finished",
        "stop_reason": "meeting_ended"
      }
    }
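
A handler for this event usually needs to tell ordinary transitions apart from the ones that end the bot's run. A minimal sketch based only on the fields described above; StatusUpdate and describeStatusUpdate are hypothetical names, and the "unknown" fallback is an assumption for the case where stop_reason is missing:

```typescript
interface StatusUpdate {
  old_status: string;
  new_status: string;
  stop_reason?: string; // only present when the bot has stopped
}

// Hypothetical helper: true for the two statuses that carry a stop_reason.
function hasStopped(update: StatusUpdate): boolean {
  return update.new_status === "finished" || update.new_status === "not_admitted";
}

// Hypothetical helper: one log line per transition.
function describeStatusUpdate(update: StatusUpdate): string {
  const transition = `${update.old_status} -> ${update.new_status}`;
  if (hasStopped(update)) {
    return `${transition} (stopped: ${update.stop_reason ?? "unknown"})`;
  }
  return transition;
}
```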

Transcript

This event contains the live transcript of the meeting along with timestamps and speaker information. At the start, speaker_name may be a placeholder such as "Speaker 1", because the mapping between speaker ids and names is computed live. After a couple of sentences, once the system correlates the audio stream with the platform's participant list, the actual name (e.g., "John Doe") is provided.

    {
      "type": "ts",
      "data": {
        "transcript": "This contains the spoken text.",
        "start": 1.23,
        "end": 4.56,
        "speaker": 0,
        "speaker_name": "John Doe"
      }
    }
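
Because early segments may carry a placeholder name, a display layer may want to relabel them once the real name shows up. A minimal sketch of one way to do that, assuming the speaker id stays stable for a participant during the session (the source does not state this explicitly); renderTranscript and SpokenSegment are hypothetical names:

```typescript
interface SpokenSegment {
  transcript: string;
  speaker: number;
  speaker_name: string;
}

// Hypothetical helper: label every segment with the most recent name
// seen for its speaker id, so placeholders are replaced retroactively.
function renderTranscript(segments: SpokenSegment[]): string[] {
  const latestName: Record<number, string> = {};
  for (const segment of segments) {
    // Later segments overwrite earlier ones for the same speaker id.
    latestName[segment.speaker] = segment.speaker_name;
  }
  return segments.map(
    (segment) => `${latestName[segment.speaker] ?? segment.speaker_name}: ${segment.transcript}`,
  );
}
```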

Chat Message

This event triggers when a new chat message is sent.

Zoom only: chat message events include id and parent_id for thread support. id is the message id, and parent_id is the parent message or thread id.

    {
      "type": "chat-message",
      "data": {
        "id": "msg_8f1c8f0", // Zoom only
        "parent_id": "msg_7c2b4a9", // Zoom only
        "username": "John Doe",
        "content": "Foo bar.",
        "user_avatar": null // Either a URL or null. Do not rely on this field for permanent access.
      }
    }
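
On Zoom, the id and parent_id fields make it possible to rebuild threads on the client. A minimal sketch, assuming that top-level messages have no parent_id and that replies point at the thread's root message (the source only says parent_id is "the parent message or thread id"); groupByThread and ChatMessage are hypothetical names:

```typescript
interface ChatMessage {
  id?: string; // Zoom only
  parent_id?: string | null; // Zoom only
  username: string;
  content: string;
}

// Hypothetical helper: bucket messages by thread, keyed by the root message id.
function groupByThread(messages: ChatMessage[]): Record<string, ChatMessage[]> {
  const threads: Record<string, ChatMessage[]> = {};
  for (const message of messages) {
    // Replies are filed under their parent; top-level messages under their own id.
    const key = message.parent_id ?? message.id;
    if (key === undefined) {
      continue; // no ids available (non-Zoom platforms), so no threading
    }
    const thread = threads[key] ?? [];
    thread.push(message);
    threads[key] = thread;
  }
  return threads;
}
```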

Participant Tracked

Whenever we detect a new participant in the meeting, we send the participant-tracked event. At the start of the meeting, it is also sent once for each participant who is already present.

    {
      "type": "participant-tracked",
      "data": {
        "participantId": "John Doe",
        "participantName": "John Doe"
      }
    }

Started Speaking

When we detect that a participant is actively speaking, we send this event.

    {
      "type": "started-speaking",
      "data": {
        "participantId": "John Doe",
        "participantName": "John Doe"
      }
    }

Stopped Speaking

When we detect that a participant has stopped speaking, we send this event.

    {
      "type": "stopped-speaking",
      "data": {
        "participantId": "John Doe",
        "participantName": "John Doe"
      }
    }
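
The started-speaking and stopped-speaking events can be paired to keep a live view of who is talking. A minimal sketch, assuming participantId uniquely identifies a participant and that events are already parsed into the { type, data } envelope; ActiveSpeakers is a hypothetical class name:

```typescript
// Hypothetical tracker: participantId -> participantName for everyone currently speaking.
class ActiveSpeakers {
  private active: Record<string, string> = {};

  handle(event: { type: string; data?: unknown }): void {
    const data = event.data as { participantId?: unknown; participantName?: unknown } | undefined;
    const id = data?.participantId;
    if (typeof id !== "string") {
      return; // not a speaking event, or malformed payload
    }
    if (event.type === "started-speaking") {
      const name = data?.participantName;
      this.active[id] = typeof name === "string" ? name : id;
    } else if (event.type === "stopped-speaking") {
      delete this.active[id];
    }
  }

  // Names of the participants currently speaking.
  names(): string[] {
    return Object.keys(this.active).map((id) => this.active[id] ?? id);
  }
}
```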

Stop

Once the meeting is over and the recording has therefore stopped, this event is fired. It signals that no further transcripts will be shared and that it is safe to disconnect from the websocket server.

{ "type": "stop" }

Error

If anything goes wrong with our transcription provider, this error event is fired. Treat it the same as stop: no further transcription will be shared. We are alerted right away so that we can resolve the issue.

    {
      "type": "error",
      "data": {
        "message": "Error message"
      }
    }

Websocket Actions

You can also interact with the bot via the websocket to provide feedback as soon as possible. Actions use a pre-defined JSON structure that you send as a message. The examples below show JSON, but keep in mind that websocket messages are strings, so you need to serialize the JSON to a string before sending. Validating websocket messages is complicated, so actions are currently not validated: if an action does not contain the required data, it is simply not performed.

{ "action": "[action]", "data": {...} }

Send Chat Message

Via this action you can send a message to the chat of the meeting.

Zoom only: include reply_to_message to reply inside a thread. Use the message id from a chat-message event.

    {
      "action": "chat-message",
      "data": {
        "content": "Welcome to the meeting!",
        "reply_to_message": "msg_7c2b4a9" // Zoom only
      }
    }

Stop the bot

This will simply stop the bot.

{ "action": "stop" }
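
Since the server does not validate actions, it can help to check them on the client before serializing. A minimal sketch covering the two actions documented above; BotAction and serializeAction are hypothetical names, and treating empty content as invalid is an assumption about what counts as "required data":

```typescript
// The two documented actions, typed so the compiler catches missing fields.
type BotAction =
  | { action: "chat-message"; data: { content: string; reply_to_message?: string } }
  | { action: "stop" };

// Hypothetical helper: validate an action locally, then serialize it
// to the string form that websocket messages require.
function serializeAction(action: BotAction): string {
  if (action.action === "chat-message" && action.data.content.trim() === "") {
    // Assumption: an empty message would be silently dropped, so fail loudly here.
    throw new Error("chat-message requires non-empty content");
  }
  return JSON.stringify(action);
}
```

The resulting string is what you would pass to your websocket client's send method on a connection opened with websocket_url, since the read-only URL does not allow Actions.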

More events coming soon. If you have specific feature requests, please let us know via Discord.