ASR Providers

DeLive supports twelve ASR backends through a unified provider registry. Each provider implements a common contract but uses different transport and audio processing strategies.

Need an API key? See the API Key Guide for step-by-step instructions on obtaining keys for each provider.

Provider Comparison

| Provider | Type | Transport | Audio | Streaming | Translation | Diarization | File |
|---|---|---|---|---|---|---|---|
| Soniox V4 | Cloud | WebSocket | MediaRecorder (WebM/Opus) | Yes | Yes | Yes | Yes |
| Volcengine | Cloud | WebSocket (via proxy) | AudioWorklet (PCM16) | Yes | No | No | Yes |
| ElevenLabs | Cloud | WebSocket (via proxy) | AudioWorklet (PCM16) | Yes | No | No | Yes |
| Mistral AI | Cloud | WebSocket (via proxy) | AudioWorklet (PCM16) | Yes | No | No | Yes |
| Gladia | Cloud | WebSocket (via proxy) | AudioWorklet (PCM16) | Yes | No | No | Yes |
| Deepgram | Cloud | WebSocket (via proxy) | AudioWorklet (PCM16) | Yes | No | No | Yes |
| AssemblyAI | Cloud | WebSocket (via proxy) | AudioWorklet (PCM16) | Yes | No | No | Yes |
| Cloudflare Workers AI | Cloud | REST (batch) | AudioWorklet (PCM16) | No | No | No | Yes |
| SiliconFlow | Cloud | REST (batch) | AudioWorklet (PCM16) | No | No | No | Yes |
| Groq | Cloud | REST (batch) | AudioWorklet (PCM16) | No | No | No | Yes |
| Local OpenAI | Local | REST (batch) | MediaRecorder (WebM/Opus) | No | No | No | No |
| whisper.cpp | Local | REST (local) | AudioWorklet (PCM16) | No | No | No | No |

Execution Modes

Real-Time Streaming

Used by Soniox, Volcengine, ElevenLabs, Mistral AI, Gladia, Deepgram, and AssemblyAI. Audio chunks are sent continuously over a WebSocket connection, and transcript updates arrive in real time.

  • Soniox emits token-level events (prefersTokenEvents: true) for fine-grained text updates
  • Volcengine, ElevenLabs, Mistral AI, Gladia, Deepgram, and AssemblyAI use local proxies on port 23456 to inject required authentication headers
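
For the providers listed above that take PCM16 input, the AudioWorklet's Float32 samples must be converted to 16-bit integers before they are sent. A minimal sketch of that conversion (the function name and clamping behavior are illustrative, not DeLive's actual code):

```typescript
// Convert Float32 samples in [-1, 1] (as produced by an AudioWorklet)
// to signed 16-bit PCM, clamping out-of-range values.
function float32ToPCM16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Scale negatives by 0x8000 and positives by 0x7FFF so both ends
    // map onto the full signed 16-bit range.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```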

Windowed Batch

Used by Cloudflare Workers AI, SiliconFlow, Groq, Local OpenAI-compatible, and whisper.cpp. Audio accumulates in a rolling buffer (max 45 seconds), and a REST call retranscribes the entire window at regular intervals.

  • Interval mode (Cloudflare, SiliconFlow, Groq, whisper.cpp): retranscribe every 1.5 seconds
  • Debounce mode (Local OpenAI): retranscribe 1200ms after the last audio chunk
  • A TranscriptStabilizer compares successive transcriptions and commits stable text prefixes, preventing text flickering
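
The stabilizer's core idea can be sketched as a longest-common-prefix check over successive hypotheses, committed on word boundaries (the class shape below is illustrative; DeLive's actual TranscriptStabilizer may differ):

```typescript
// Commits the longest word-boundary prefix shared by two consecutive
// retranscriptions; only text after the committed prefix may still
// change, which prevents visible flicker.
class TranscriptStabilizer {
  private committed = "";
  private previous = "";

  update(hypothesis: string): { committed: string; pending: string } {
    const a = this.previous.split(" ");
    const b = hypothesis.split(" ");
    let i = 0;
    while (i < a.length && i < b.length && a[i] === b[i]) i++;
    const stable = b.slice(0, i).join(" ");
    // Never retract text that was already committed.
    if (stable.length > this.committed.length) this.committed = stable;
    this.previous = hypothesis;
    return {
      committed: this.committed,
      pending: hypothesis.slice(this.committed.length),
    };
  }
}
```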

Electron-Managed Runtime

Used by whisper.cpp. DeLive manages the whisper-server binary lifecycle:

  1. The binary and model are imported or downloaded
  2. DeLive spawns the process and waits for HTTP readiness (up to 20 seconds)
  3. Audio is sent to POST /inference as WAV
  4. The process is stopped on disconnect or app quit
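
Step 3 implies wrapping raw PCM16 in a WAV container before the POST. A minimal header writer under the assumption of mono 16-bit audio (the function name and default sample rate are illustrative):

```typescript
// Wrap little-endian PCM16 samples in a minimal 44-byte WAV header
// so an HTTP endpoint expecting WAV (such as whisper-server's
// /inference) can parse them.
function pcm16ToWav(samples: Int16Array, sampleRate = 16000): ArrayBuffer {
  const dataSize = samples.length * 2;
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  view.setUint32(40, dataSize, true);
  new Int16Array(buf, 44).set(samples);
  return buf;
}
```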

Soniox V4

The most feature-rich provider with real-time streaming, translation, and speaker diarization.

Required: apiKey

Optional: model, languageHints, translationEnabled, translationTargetLanguage, enableSpeakerDiarization

Features:

  • Token-level real-time transcription
  • Real-time translation with dual-line captions
  • Speaker diarization with labeled tokens
  • Audio format: auto (WebM/Opus from MediaRecorder)
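
Token-level events can be folded into caption text with a small reducer: final tokens append permanently, while non-final tokens are redrawn on every event. The field names below are assumptions for illustration, not Soniox's exact wire format:

```typescript
interface Token {
  text: string;
  isFinal: boolean;
  speaker?: string; // present when diarization is enabled
}

// Fold one batch of token events into (final, tentative) caption text.
function foldTokens(
  finalText: string,
  tokens: Token[],
): { finalText: string; tentative: string } {
  let tentative = "";
  for (const t of tokens) {
    if (t.isFinal) finalText += t.text;
    else tentative += t.text;
  }
  return { finalText, tentative };
}
```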

Volcengine (火山引擎)

Chinese-focused real-time streaming through an embedded proxy.

Required: appKey, accessKey

Optional: languageHints

The browser cannot set custom WebSocket headers, so DeLive runs an embedded HTTP proxy in the Electron main process that forwards PCM16 audio to ByteDance's openspeech.bytedance.com endpoint with the required authentication headers.

Groq

Whisper large-v3-turbo / large-v3 through Groq's high-performance inference API.

Required: apiKey

Optional: model, languageHints

SiliconFlow (硅基流动)

SenseVoice, TeleSpeech, and Qwen Omni models through SiliconFlow's API.

Required: apiKey

Optional: model, languageHints

Mistral AI

Voxtral Realtime streaming ASR through the Mistral API.

Required: apiKey

Optional: model, languageHints

Uses a local WebSocket proxy (/ws/mistral on port 23456) to inject Authorization headers. Supports the Voxtral model family for real-time transcription.

Deepgram

Nova-3 and Nova-2 real-time streaming ASR through Deepgram's API.

Required: apiKey

Optional: model, languageHints

Uses a local WebSocket proxy (/ws/deepgram on port 23456) to inject Authorization: Token headers. Best for English and multilingual content.

AssemblyAI

Universal-3 Pro real-time streaming ASR through AssemblyAI's WebSocket API.

Required: apiKey

Optional: model

Uses a local WebSocket proxy (/ws/assemblyai on port 23456) to inject Authorization headers. Supports 6 streaming languages; best suited for English content.

ElevenLabs

Scribe v2 Realtime ASR through ElevenLabs' WebSocket API.

Required: apiKey

Optional: model, languageHints

Uses a local WebSocket proxy (/ws/elevenlabs on port 23456) to inject xi-api-key headers. Supports 90+ languages including Mandarin Chinese. Audio is sent as base64-encoded JSON payloads.
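
Encoding a PCM16 chunk as a base64 JSON payload might look like the following sketch; the message field name is an illustrative assumption, and Buffer stands in for the browser-side encoding that would run in the renderer:

```typescript
// Encode a PCM16 chunk as a base64 JSON payload of the kind the
// proxy forwards over the WebSocket (field name is illustrative).
function encodeAudioMessage(chunk: Int16Array): string {
  const bytes = Buffer.from(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  return JSON.stringify({ audio_chunk: bytes.toString("base64") });
}
```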

Gladia

Solaria-1 real-time streaming ASR with sub-300ms latency and 100+ language support.

Required: apiKey

Optional: model, languageHints

Uses a local WebSocket proxy (/ws/gladia on port 23456) that handles HTTP POST session initialization and injects the x-gladia-key authentication header. Supports live capture via system audio.

Cloudflare Workers AI

Whisper-based transcription through Cloudflare's Workers AI platform. Low cost with a generous free tier.

Required: apiToken, accountId

Optional: model, languageHints

Uses windowed batch retranscription with VAD filtering and anti-hallucination measures. Supports both live capture and file transcription. Available models include @cf/openai/whisper and @cf/openai/whisper-large-v3-turbo.

Local OpenAI-Compatible

Works with Ollama or any service exposing the OpenAI-compatible audio transcription endpoint.

Required: baseUrl, model

Optional: apiKey, languageHints

DeLive can probe the service at baseUrl, list installed models via /v1/models, and pull models from Ollama if detected.
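
Model listing can be sketched against the OpenAI-style response shape ({ data: [{ id }] }); the helper names here are illustrative, not DeLive's internals:

```typescript
interface ModelsResponse {
  data: { id: string }[];
}

// Extract installed model ids from an OpenAI-compatible /v1/models
// response body.
function listModelIds(body: ModelsResponse): string[] {
  return body.data.map((m) => m.id);
}

// Probe a running service at baseUrl (e.g. http://localhost:11434
// for Ollama) and return its installed models.
async function probeModels(baseUrl: string): Promise<string[]> {
  const res = await fetch(`${baseUrl.replace(/\/$/, "")}/v1/models`);
  if (!res.ok) throw new Error(`probe failed: HTTP ${res.status}`);
  return listModelIds(await res.json());
}
```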

Local whisper.cpp

Fully offline transcription using the whisper-server binary.

Required: modelPath

Optional: binaryPath, port (default 8177), languageHints

DeLive can import or download both the binary and model files. Silent audio chunks are automatically skipped to reduce unnecessary inference.
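
Silent-chunk skipping can be approximated with a simple RMS energy gate; the threshold value here is an illustrative assumption, not DeLive's tuned default:

```typescript
// Returns true when a PCM16 chunk's RMS energy falls below the
// threshold, meaning inference can be skipped for this chunk.
function isSilent(chunk: Int16Array, threshold = 0.01): boolean {
  let sumSq = 0;
  for (let i = 0; i < chunk.length; i++) {
    const s = chunk[i] / 32768; // normalize to [-1, 1)
    sumSq += s * s;
  }
  return Math.sqrt(sumSq / chunk.length) < threshold;
}
```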

Released under the Apache 2.0 License.