STT (Speech-to-Text)
STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech recognition is handled internally by the engine and you do not configure STT separately.
Each STT ships as both a namespaced class (from getpatter.stt import deepgram → deepgram.STT()) and a flat alias (from getpatter import DeepgramSTT). They are equivalent — pick whichever reads best. The flat aliases are convenient for short examples; the namespaced form avoids name collisions when you import several STTs together.
Quickstart
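A minimal sketch of the two import styles described above, using only the class names and Deepgram parameters listed on this page. The helper name build_deepgram_stt is hypothetical, and how the resulting object is handed to your pipeline is not shown here.

```python
def build_deepgram_stt():
    """Construct a Deepgram STT through both import styles."""
    # Imports are deferred so this module loads even before getpatter is installed.
    from getpatter import DeepgramSTT       # flat alias
    from getpatter.stt import deepgram      # namespaced form

    # api_key is omitted, so the key is read from DEEPGRAM_API_KEY.
    flat = DeepgramSTT(language="en", model="nova-3")
    namespaced = deepgram.STT(language="en", model="nova-3")
    return flat, namespaced
```

Call build_deepgram_stt() once getpatter is installed and DEEPGRAM_API_KEY is set; either returned object should be usable wherever an STT is expected.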
Supported providers
| Flat import | Namespaced import | Env var | Install extra |
|---|---|---|---|
| DeepgramSTT | getpatter.stt.deepgram.STT | DEEPGRAM_API_KEY | included |
| WhisperSTT | getpatter.stt.whisper.STT | OPENAI_API_KEY | included |
| OpenAITranscribeSTT | getpatter.stt.openai_transcribe.STT | OPENAI_API_KEY | included |
| CartesiaSTT | getpatter.stt.cartesia.STT | CARTESIA_API_KEY | getpatter[cartesia] |
| AssemblyAISTT | getpatter.stt.assemblyai.STT | ASSEMBLYAI_API_KEY | getpatter[assemblyai] |
| SonioxSTT | getpatter.stt.soniox.STT | SONIOX_API_KEY | getpatter[soniox] |
| SpeechmaticsSTT | getpatter.stt.speechmatics.STT | SPEECHMATICS_API_KEY | getpatter[speechmatics] |
Model enums
Each provider exports a typed StrEnum of valid model IDs alongside the provider class. They keep model= arguments tab-completable and reject typos at construction time, while still accepting raw strings for forward compatibility.
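To illustrate the idea, here is a self-contained sketch of a string-valued enum of model IDs. The enum name DeepgramModel, its member names, the "nova-2" entry, and the normalize_model helper are all assumptions for illustration; check the package for the names it actually exports.

```python
from enum import Enum


class DeepgramModel(str, Enum):
    """Hypothetical enum of model IDs; names here are illustrative only."""
    NOVA_3 = "nova-3"
    NOVA_2 = "nova-2"


def normalize_model(model):
    """Return the plain model ID for an enum member or a raw string."""
    if isinstance(model, DeepgramModel):
        return model.value
    # Raw strings pass through, so newer model IDs keep working.
    return model


# A mistyped member name fails immediately instead of reaching the provider.
try:
    DeepgramModel.NOVA_33
except AttributeError:
    typo_rejected = True
```

Because the members are also strings, DeepgramModel.NOVA_3 compares equal to "nova-3", which is what lets one model= argument accept either form.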
Deepgram
Streaming STT backed by Deepgram’s nova-3 model.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key. Reads from DEEPGRAM_API_KEY if omitted. |
| language | str | "en" | BCP-47 language code. |
| model | str | "nova-3" | Deepgram model ID. |
| encoding | str | "linear16" | Audio encoding sent to Deepgram. |
| sample_rate | int | 16000 | Sample rate in Hz. |
| endpointing_ms | int | 150 | Utterance endpointing in milliseconds. |
| utterance_end_ms | int \| None | 1000 | Grace period after speech ends, in milliseconds. |
| smart_format | bool | False | Smart formatting (numbers, dates, punctuation). Defaults to False because telephony agents feed transcripts straight back into an LLM, where smart-format rewrites can confuse downstream tool-call argument parsing. Pass smart_format=True to opt back in. |
| interim_results | bool | True | Stream interim transcripts. |
| vad_events | bool | True | Emit VAD start/end markers. |
Whisper (OpenAI)
HTTP-based STT via OpenAI Whisper. Reuses OPENAI_API_KEY.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key. Reads from OPENAI_API_KEY if omitted. |
| language | str | "en" | BCP-47 language code. |
| model | str | "whisper-1" | Whisper model ID. |
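Whisper's finals on telephony audio usually need filtering before they reach the LLM; the note after this example describes what the pipeline drops by default. The sketch below shows one way such a filter could work. It is not the pipeline's actual implementation: the Final type, the filter_finals function, and the exact matching rules are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

FILLERS = {"you", "thank you"}  # "." normalizes to "" and is dropped as empty
MIN_GAP_MS = 500                # minimum spacing between accepted finals


@dataclass
class Final:
    text: str
    at_ms: int  # arrival time of the final transcript, in milliseconds


def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing punctuation."""
    return text.strip().lower().rstrip(".!?,")


def filter_finals(finals: Iterable[Final]) -> List[Final]:
    """Drop filler finals, repeats of the last kept final, and rapid-fire finals."""
    kept: List[Final] = []
    for final in finals:
        norm = normalize(final.text)
        if not norm or norm in FILLERS:
            continue  # hallucinated filler or empty transcript
        if kept:
            prev = kept[-1]
            if normalize(prev.text) == norm:
                continue  # duplicate of the previous kept final
            if final.at_ms - prev.at_ms < MIN_GAP_MS:
                continue  # arrived too soon after the previous kept final
        kept.append(final)
    return kept


sample = [
    Final("you", 0),
    Final("I need to reschedule", 1000),
    Final("I need to reschedule", 2000),
    Final("on Tuesday", 3000),
    Final("Tuesday", 3200),
    Final("Thank you.", 5000),
]
kept_texts = [f.text for f in filter_finals(sample)]
# kept_texts == ["I need to reschedule", "on Tuesday"]
```

Comparing each final against the last kept one, rather than the last seen one, keeps a burst of rapid finals from suppressing everything that follows it.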
Whisper on mulaw 8 kHz routinely hallucinates short fillers ("you", ".", "thank you") and emits is_final=true on every chunk regardless of speech. The pipeline drops these by default, plus duplicate and sub-500 ms back-to-back finals, but for production prefer OpenAITranscribeSTT (gpt-4o-transcribe): same OPENAI_API_KEY, ~10× faster, no hallucination floor.
OpenAI Transcribe (gpt-4o-transcribe)
First-class STT for OpenAI’s gpt-4o-transcribe and gpt-4o-mini-transcribe models. It is a drop-in replacement for WhisperSTT with stronger multilingual quality and significantly lower latency. Reuses OPENAI_API_KEY.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key. Reads from OPENAI_API_KEY if omitted. |
| language | str \| None | None | BCP-47 language code. Auto-detect when omitted. |
| model | str | "gpt-4o-transcribe" | Either "gpt-4o-transcribe" or "gpt-4o-mini-transcribe". |
| response_format | str | "json" | Pass "verbose_json" to expose segment-level confidence and timestamps. |
Cartesia
Streaming STT using Cartesia’s ink-whisper. See Cartesia setup.
AssemblyAI
Universal Streaming STT via the AssemblyAI v3 WebSocket API. See AssemblyAI setup.
Soniox
Real-time STT via Soniox.
Speechmatics
Real-time STT via Speechmatics (Python SDK only; not yet ported to TypeScript).
Missing credentials
Each class raises ValueError at construction time if no API key is resolved from either api_key= or the matching env var.
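The resolution order can be sketched as follows. This is a standalone illustration, not the package's own code: the resolve_api_key helper, its env parameter, and the error message wording are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer the explicit argument, then the env var, else raise ValueError."""
    env = os.environ if env is None else env
    key = api_key or env.get(env_var)
    if not key:
        raise ValueError(f"Missing API key: pass api_key= or set {env_var}")
    return key


# With neither source set, resolution fails before any network call is made.
try:
    resolve_api_key(None, "DEEPGRAM_API_KEY", env={})
    missing_key_raised = False
except ValueError:
    missing_key_raised = True
```

Failing at construction time means a missing key shows up when the agent is built rather than partway through a live call.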
What’s Next
LLM
Configure the language model.
TTS
Configure speech synthesis.

