Skribby

Skribby tracks exactly when each participant speaks, allowing you to build rich meeting visualizations or synchronized transcripts.

Real-time Speaking Events

If you use a Real-time Transcription model or have the Real-time Audio addon enabled, you can listen for speaking events via WebSockets:

started-speaking: Fired as soon as a participant's voice is detected.
stopped-speaking: Fired when a participant stops talking.

// Log each participant as soon as their voice is detected.
realtimeClient.on('started-speaking', (data) => {
    console.log(`${data.participantName} started speaking.`);
});
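
The two events can be combined to keep a live list of who is talking. The following is a minimal sketch, not an official client feature: `createActiveSpeakerTracker` is a hypothetical helper, and it assumes that `stopped-speaking` payloads carry the same `participantName` field shown above for `started-speaking`.

```javascript
// Hypothetical helper: keeps a set of participants who are currently speaking.
// Assumption: 'stopped-speaking' payloads also include participantName.
function createActiveSpeakerTracker() {
    const active = new Set();
    return {
        // Handlers close over `active`, so they can be passed directly as callbacks.
        onStarted(data) { active.add(data.participantName); },
        onStopped(data) { active.delete(data.participantName); },
        // Returns the current speakers as a sorted array of names.
        current() { return [...active].sort(); },
    };
}

const tracker = createActiveSpeakerTracker();
// Wiring, where realtimeClient is the client from the snippet above:
// realtimeClient.on('started-speaking', tracker.onStarted);
// realtimeClient.on('stopped-speaking', tracker.onStopped);
```

Calling `tracker.current()` at any point returns the names of everyone whose most recent event was `started-speaking`.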

Historical Speaker Data

Once a bot reaches the finished status, you can retrieve the full timeline of speaking events via the API.

`with-speaker-events` Option

By default, the GET /bot/{id} endpoint returns participant data without detailed event logs to keep the response size small. To include the full speaking timeline, add the with-speaker-events=true query parameter:

GET /api/v1/bot/{id}?with-speaker-events=true
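
As a rough illustration, the request could be issued from JavaScript as follows. The path and the query parameter come from this page. The base URL passed in, the `Bearer` authorization scheme, and the helper names `buildBotUrl` and `fetchBot` are assumptions made for the sketch, so check them against your account's API settings.

```javascript
// Builds the bot URL, optionally asking for the full speaking timeline.
// Assumption: baseUrl is whatever host your Skribby API is served from.
function buildBotUrl(baseUrl, botId, withSpeakerEvents = false) {
    const url = new URL(`/api/v1/bot/${encodeURIComponent(botId)}`, baseUrl);
    if (withSpeakerEvents) {
        url.searchParams.set('with-speaker-events', 'true');
    }
    return url.toString();
}

// Fetches a bot together with its speaker events.
// Assumption: the API accepts a Bearer token in the Authorization header.
async function fetchBot(baseUrl, botId, apiKey) {
    const res = await fetch(buildBotUrl(baseUrl, botId, true), {
        headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) {
        throw new Error(`Request failed with status ${res.status}`);
    }
    return res.json();
}
```

Leaving the third argument of `buildBotUrl` at its default produces the smaller response without event logs.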

Example Response

{
    "participants": [
        {
            "name": "John Doe",
            "events": [
                {
                    "type": "started-speaking",
                    "timestamp": 1750820602963
                },
                {
                    "type": "stopped-speaking",
                    "timestamp": 1750820605120
                }
            ]
        }
    ]
}
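
A response shaped like the one above can be turned into speaking segments by pairing each `started-speaking` event with the next `stopped-speaking` event. The sketch below assumes that timestamps are Unix epoch milliseconds, which the sample values suggest but this page does not state, and that a participant's events do not overlap. `toSpeakingSegments` and `totalSpeakingMs` are hypothetical helper names.

```javascript
// Pairs started/stopped events into segments for one participant.
// Assumption: timestamps are epoch milliseconds.
function toSpeakingSegments(participant) {
    const segments = [];
    let openStart = null;
    // Sort a copy so the caller's array is left untouched.
    const events = [...participant.events].sort((a, b) => a.timestamp - b.timestamp);
    for (const event of events) {
        if (event.type === 'started-speaking') {
            openStart = event.timestamp;
        } else if (event.type === 'stopped-speaking' && openStart !== null) {
            segments.push({
                start: openStart,
                end: event.timestamp,
                durationMs: event.timestamp - openStart,
            });
            openStart = null;
        }
        // A 'stopped-speaking' event with no open start is ignored.
    }
    return segments;
}

// Sums the duration of all completed segments for one participant.
function totalSpeakingMs(participant) {
    return toSpeakingSegments(participant).reduce((sum, s) => sum + s.durationMs, 0);
}
```

For the sample participant above, `totalSpeakingMs` returns 2157, the difference between the two timestamps.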

Speaker Identification

Real-time Identification

When you use a real-time model, Skribby may initially label participants as Speaker 1, Speaker 2, etc. As the meeting progresses, our system correlates audio streams with the platform's participant list. Once a participant is identified, their label changes to their actual display name (e.g., Jane Smith).

Asynchronous (Post-call) Identification

For non-real-time models, speaker identification is performed during post-processing. Skribby correlates the transcription engine's speaker segments with the meeting's participant metadata to automatically assign names to speaker IDs.

If the system is highly confident in the alignment, the speaker_name is assigned directly. In cases of lower confidence or overlapping signals, we provide a potential_speaker_names property instead. This property contains a list of likely participants, each with a confidence score, so your application can decide how to present the data.
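
One way an application might choose a display label is sketched below. The exact shape of `potential_speaker_names` is not specified on this page, so the sketch assumes an array of `{ name, confidence }` objects with `confidence` between 0 and 1. The `resolveSpeakerName` helper, the 0.5 threshold, and the fallback label are all illustrative choices.

```javascript
// Picks a label for a transcript segment.
// Assumption: potential_speaker_names is [{ name, confidence }] with confidence in [0, 1].
function resolveSpeakerName(segment, minConfidence = 0.5) {
    // A directly assigned name always wins.
    if (segment.speaker_name) {
        return segment.speaker_name;
    }
    const candidates = segment.potential_speaker_names ?? [];
    // Find the candidate with the highest confidence, if any.
    const best = candidates.reduce(
        (top, c) => (top === null || c.confidence > top.confidence ? c : top),
        null
    );
    if (best && best.confidence >= minConfidence) {
        // Mark the name as a guess so readers can tell it apart from a confirmed one.
        return `${best.name} (?)`;
    }
    return 'Unknown speaker';
}
```

Raising `minConfidence` trades fewer wrong names for more segments labeled as unknown.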