
Which transcription model should you use?
We looked at real Meeting Bot usage across production teams to see which models developers actually choose, and why.
Choosing a transcription model for a meeting bot is not a theoretical benchmark problem. It is a production trade-off between cost, accuracy, speaker labels, latency, and how messy your users' meetings are.
We looked at the last three months of Skribby model usage across production meeting bots and recordings. The pattern is clear: developers are not all optimizing for the same thing.
Some want the cheapest possible transcript. Some need realtime output. Some only want raw audio. And increasingly, teams are picking models that handle real meeting audio better: accents, interruptions, speaker changes, phone audio, names, numbers, and domain-specific terms.
Usage distribution over the last three months
This data covers recent Skribby usage across EU and Japan production regions.
| Model | Usage share | Notes |
|---|---|---|
| AssemblyAI Universal-2 | 34.7% | Most used overall |
| Soniox STT Async v4 | 25.4% | Previous Soniox async version |
| No transcription | 13.8% | Audio/recording only |
| Deepgram Nova-2 | 6.9% | Common async option |
| Groq Whisper Large v3 Turbo | 5.5% | Low-cost transcript option |
| Soniox STT Real-Time v4 | 4.2% | Previous Soniox realtime version |
| ElevenLabs Scribe v2 | 3.4% | Strong newer async usage |
| Deepgram Nova-3 Multilingual | 2.5% | Newer Deepgram async option |
| Soniox STT Async v5 | 1.5% | New Soniox model, already supported |
| Deepgram Nova-2 Realtime | 1.3% | Realtime alternative |
| Other models | 1.0% | Smaller long-tail usage |
The biggest change from earlier usage data is AssemblyAI moving into the top spot. Soniox is still very strong overall: when you combine Soniox async, realtime, v4, and v5 usage, Soniox accounts for 31.1% of model usage.
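The combined figure can be reproduced from the table rows themselves. The snippet below is a minimal sketch: the dictionary copies the shares from the table above, and `combined_share` is a hypothetical helper name that groups rows by provider prefix.

```python
# Usage shares (percent) copied from the table above.
USAGE_SHARE = {
    "AssemblyAI Universal-2": 34.7,
    "Soniox STT Async v4": 25.4,
    "No transcription": 13.8,
    "Deepgram Nova-2": 6.9,
    "Groq Whisper Large v3 Turbo": 5.5,
    "Soniox STT Real-Time v4": 4.2,
    "ElevenLabs Scribe v2": 3.4,
    "Deepgram Nova-3 Multilingual": 2.5,
    "Soniox STT Async v5": 1.5,
    "Deepgram Nova-2 Realtime": 1.3,
    "Other models": 1.0,
}


def combined_share(prefix: str) -> float:
    """Sum the shares of every listed row whose name starts with `prefix`."""
    total = sum(share for name, share in USAGE_SHARE.items() if name.startswith(prefix))
    # Round away floating-point noise; the table only has one decimal place.
    return round(total, 1)


print(combined_share("Soniox"))    # all Soniox rows listed in the table
print(combined_share("Deepgram"))  # all Deepgram rows listed in the table
```

With the table values as given, this prints 31.1 for Soniox and 10.7 for Deepgram. Models folded into the "Other models" row are not attributed to any provider here.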
The short version
If you want the safest default, start with Soniox or AssemblyAI.
If you care mostly about cost, Whisper is still attractive.
If you need realtime transcription, Soniox Real-Time is the most-used realtime option in our data.
If you want no vendor transcription at all, Skribby lets you run bots without transcription and handle the audio yourself.
That last point matters more than people expect. 13.8% of usage runs with no transcription model. These teams are usually doing one of three things:
- recording meetings and processing the audio later
- sending audio into their own transcription stack
- using Skribby for the meeting bot layer, not the transcription layer
That is a valid architecture. A Meeting Bot API should not force transcription if all you need is reliable meeting capture.
Why AssemblyAI is now the most used model
AssemblyAI Universal-2 made up 34.7% of the last three months of usage.
That does not automatically mean it is the best model for every use case. Usage is shaped by customer mix, defaults, pricing, and product requirements. But it does show that many production teams are willing to pay for a more feature-rich transcription model when the meeting output matters.
AssemblyAI is a good fit when you need:
- speaker labels
- stable general-purpose transcription
- profanity filtering
- custom vocabulary
- a mature async transcription path
For customer calls, recruiting notes, compliance review, and internal meeting intelligence, that feature set can be worth the extra cost.
Soniox is still the strongest all-rounder
Soniox remains one of the most important models in Skribby.
Combined Soniox usage across v4 async, v4 realtime, v5 async, and v5 realtime was 31.1% over the last three months.
The appeal is simple: Soniox gives you a strong balance of price, quality, speaker separation, multilingual support, and realtime availability.
For meeting bots, that mix is practical. Real meetings are noisy. People interrupt each other. They switch languages. They mention names, numbers, products, and weird internal acronyms. A model that looks good on clean audio can still struggle when it hits a normal sales call or team meeting.
Soniox is built closer to that reality than many generic transcription options.
Soniox v5 is now supported in Skribby
Soniox recently introduced new v5 models:
- soniox/stt-async-v5
- soniox/stt-rt-v5
Skribby supports both.
We are already seeing early usage in production. Soniox v5 Async is appearing in the model mix, and Soniox v5 Real-Time is starting to show up as well.
Soniox describes v5 as a major upgrade for real-world speech: better accuracy, stronger speaker separation, improved language identification, better handling of names and alphanumeric strings, and improved realtime endpointing.
Those upgrades matter specifically for meeting bots. The hardest parts of meeting transcription are rarely the easy sentences. They are the moments where someone says a customer name, a product code, a date, an account number, or an action item while two people are talking over each other.
That is where better speech models become better product experiences.
Realtime is still a minority, but it matters
Realtime models made up 5.8% of usage over the last three months.
That is still a minority. Most teams can wait until the meeting ends, because async transcription is simpler and often cheaper.
But realtime usage is strategically important. It powers products that need to act during the meeting:
- live captions
- AI meeting copilots
- sales coaching
- customer support escalation
- compliance monitoring
- live CRM enrichment
- agent workflows that need immediate context
Soniox realtime appears clearly in the data: Soniox STT Real-Time v4 alone accounts for 4.2% of usage, and v5 Real-Time is now supported and starting to show up.
If your product needs live transcript events, realtime model quality is not just a transcription decision. It affects latency, interruption handling, endpointing, and how natural the product feels.
Whisper is still useful, but know the trade-off
Groq Whisper Large v3 Turbo accounted for 5.5% of usage.
Whisper remains attractive because it is cheap and broadly capable. It is a good fit when you need a plain transcript and do not care much about speaker labels or realtime output.
That makes it useful for:
- content repurposing
- searchable archives
- simple summaries
- internal analysis pipelines
- teams that want the lowest transcription cost
The trade-off is speaker diarization: Whisper on its own does not tell you who said what. If your product needs speaker labels, Whisper is usually not the right default.
Deepgram and ElevenLabs have meaningful niches
Deepgram Nova-2 and Nova-3 Multilingual together account for 9.4% of usage, or 10.7% once you include Nova-2 Realtime.
Deepgram is often picked by teams that care about speed, developer experience, or realtime use cases. Nova-3 Multilingual is also starting to show up in production usage.
ElevenLabs Scribe v2 is also showing real adoption at 3.4% of usage. It is becoming part of the practical model set teams evaluate for meeting transcription.
The takeaway: there is no single universal winner. The best model depends on the product you are building.
How to choose a model for your meeting bot
Start with the product requirement, not the model name.
If you need the best default balance:
Use Soniox or AssemblyAI. Both are strong production choices, and both handle more than just plain text.
If you need realtime:
Use Soniox Real-Time or Deepgram Realtime. If you are building a live AI agent or meeting copilot, do not pick an async model and hope to work around it.
If you need the lowest cost:
Use Whisper, or run with no transcription and process the audio yourself.
If you need multilingual meetings:
Evaluate Soniox v5, Deepgram Nova-3 Multilingual, and Whisper on your actual audio. Do not rely only on provider language lists.
If you need speaker-aware meeting intelligence:
Avoid models without diarization. Speaker labels are not a nice-to-have if your product extracts action items, objections, decisions, or commitments.
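The guidance above can be written down as a small decision helper. This is only a sketch: `Requirements` and `shortlist` are hypothetical names, the returned strings are the model names used in this article rather than API identifiers, and the order in which the checks are applied is an assumption, not something the usage data dictates.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical description of what a product needs from transcription."""
    realtime: bool = False
    speaker_labels: bool = False
    lowest_cost: bool = False
    multilingual: bool = False
    own_transcription_stack: bool = False


def shortlist(req: Requirements) -> list[str]:
    """Return the models worth evaluating first, following the guidance above."""
    if req.own_transcription_stack:
        # Capture only; the audio is processed elsewhere.
        return ["No transcription"]
    if req.realtime:
        # Live products should not try to work around an async model.
        return ["Soniox Real-Time", "Deepgram Realtime"]
    if req.lowest_cost and not req.speaker_labels:
        return ["Whisper", "No transcription"]
    if req.multilingual:
        candidates = ["Soniox v5", "Deepgram Nova-3 Multilingual", "Whisper"]
        if req.speaker_labels:
            # Whisper is the option this article flags as weak on speaker labels.
            candidates.remove("Whisper")
        return candidates
    # Best default balance.
    return ["Soniox", "AssemblyAI"]


print(shortlist(Requirements(multilingual=True, speaker_labels=True)))
```

A shortlist like this is a starting point for evaluation on your own recordings, not a final answer.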
Why model choice matters less when the API normalizes output
One of the reasons Skribby supports multiple transcription providers is that customers should not have to rebuild their meeting bot integration every time they test a model.
Skribby handles the meeting bot layer for Google Meet, Microsoft Teams, and Zoom. It also normalizes transcription output across providers so you can switch models without rewriting your whole integration.
That matters because the right model can change as your product changes.
You might start with Whisper for cost. Move to Soniox for speaker labels. Add realtime transcription later for live agents. Test AssemblyAI for a customer segment that needs specific features. Or bring your own transcription key and run your own provider relationship.
A good Meeting Bot API should make those choices easy.
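To make the idea of normalized output concrete, here is a rough sketch of the pattern. Everything in it is illustrative: the `Segment` shape and both provider payload formats are made up for this example and are not Skribby's schema or any vendor's real response format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """Hypothetical normalized transcript segment (not Skribby's actual schema)."""
    speaker: Optional[str]  # None when the model provides no speaker labels
    start: float            # seconds from the start of the recording
    end: float
    text: str


def from_provider_a(payload: dict) -> list[Segment]:
    # Made-up shape: {"utterances": [{"speaker", "start_ms", "end_ms", "text"}]}
    return [
        Segment(u.get("speaker"), u["start_ms"] / 1000, u["end_ms"] / 1000, u["text"])
        for u in payload["utterances"]
    ]


def from_provider_b(payload: dict) -> list[Segment]:
    # Made-up shape: {"segments": [{"from", "to", "transcript"}]}, no speaker labels
    return [
        Segment(None, s["from"], s["to"], s["transcript"])
        for s in payload["segments"]
    ]


def speaker_turns(segments: list[Segment]) -> list[str]:
    """Downstream code only depends on Segment, so it works for either provider."""
    return [f"{s.speaker or 'unknown'}: {s.text}" for s in segments]


a = from_provider_a({"utterances": [
    {"speaker": "A", "start_ms": 0, "end_ms": 1200, "text": "Let's start."},
]})
b = from_provider_b({"segments": [
    {"from": 0.0, "to": 1.2, "transcript": "Let's start."},
]})
print(speaker_turns(a))
print(speaker_turns(b))
```

The point of the pattern is that switching models changes which adapter runs, while the code that consumes transcripts stays the same.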
Our recommendation
For most teams building with meeting bots, start with Soniox v5 or AssemblyAI Universal-2.
Pick Soniox v5 if you care about the balance between cost, multilingual speech, speaker-aware output, and realtime support.
Pick AssemblyAI if you want a feature-rich async model that is already heavily used in production across our customer base.
Pick Whisper if price matters more than speaker labels.
Pick no transcription if Skribby is only your meeting capture layer and you want to process audio somewhere else.
The important part is not picking the model with the best marketing page. It is picking the model that matches your product's actual meeting audio.
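One common way to compare models on your own audio is word error rate (WER): transcribe a few representative meetings by hand, run each candidate model on the same recordings, and count how many word-level edits separate each output from the reference. The function below is a minimal sketch of that metric. The function name and the sample transcripts are invented for illustration, and it does no punctuation or number normalization, which a real evaluation would likely need.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: a hand-checked reference and two candidate transcripts.
reference = "send the invoice to acme by friday"
candidates = {
    "model_a": "send the invoice to acme by friday",
    "model_b": "send the invoice to acne by friday",
}
for name, transcript in candidates.items():
    print(name, round(word_error_rate(reference, transcript), 3))
```

A single number hides where the errors fall, so it is worth also reading the mismatches: a model that gets filler words wrong is less of a problem than one that misses customer names or account numbers.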
FAQ
What is the most used transcription model in Skribby right now?
AssemblyAI Universal-2 is the most used model over the last three months, with 34.7% of model usage.
Is Soniox still popular?
Yes. Combined Soniox usage across v4 and v5, async and realtime, represents 31.1% of model usage in the last three months.
Does Skribby support Soniox v5?
Yes. Skribby supports soniox/stt-async-v5 and soniox/stt-rt-v5.
How much usage is realtime?
Realtime models account for 5.8% of usage over the last three months.
Why do some bots use no transcription model?
Some teams only need the meeting recording or realtime audio stream. Others process the audio later with their own transcription stack or bring their own provider setup.
Can I switch transcription models without changing my integration?
Yes. Skribby normalizes transcription output across providers, so you can test or switch models without rebuilding your meeting bot integration.
What model should I use for live AI agents?
Use a realtime model. Soniox Real-Time is the most-used realtime option in our recent data, and Soniox v5 Real-Time is now supported.