STT (Speech-to-Text)
STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, the engine handles speech recognition internally.
Every STT class is imported by name from the package barrel: import { DeepgramSTT } from "getpatter".
Quickstart
Supported providers
| Flat import | Namespaced import | Env var |
|---|---|---|
| DeepgramSTT | getpatter/stt/deepgram | DEEPGRAM_API_KEY |
| WhisperSTT | getpatter/stt/whisper | OPENAI_API_KEY |
| OpenAITranscribeSTT | getpatter/stt/openai-transcribe | OPENAI_API_KEY |
| CartesiaSTT | getpatter/stt/cartesia | CARTESIA_API_KEY |
| AssemblyAISTT | getpatter/stt/assemblyai | ASSEMBLYAI_API_KEY |
| SonioxSTT | getpatter/stt/soniox | SONIOX_API_KEY |
| SpeechmaticsSTT | getpatter/stt/speechmatics | SPEECHMATICS_API_KEY |
SpeechmaticsSTT is supported by the Python SDK but not yet by the TypeScript SDK. It is being ported to TypeScript in the upcoming release (see the Unreleased section in CHANGELOG.md). If you need Speechmatics now, use the Python SDK or wait for the next minor version.
Model enums
Each provider exports a typed const-object of valid model IDs alongside the provider class. They keep model options tab-completable and reject typos at construction time, while still accepting raw strings for forward compatibility:
AssemblyAIModel, CartesiaSTTModel, and SonioxModel.
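The const-object pattern described above can be sketched in plain TypeScript. This is an illustration only: ExampleTranscribeModel, ExampleTranscribeModelId, and isKnownModel are hypothetical names, not getpatter exports, and the two model IDs are the ones listed for OpenAI Transcribe on this page.

```typescript
// Hypothetical stand-in for a provider's model const-object.
// The names here are illustrative and are not exported by getpatter.
const ExampleTranscribeModel = {
  GPT_4O_TRANSCRIBE: "gpt-4o-transcribe",
  GPT_4O_MINI_TRANSCRIBE: "gpt-4o-mini-transcribe",
} as const;

// Union of the known IDs, widened to accept any other string.
// `string & {}` keeps the literal members visible to editor autocomplete.
type ExampleTranscribeModelId =
  | (typeof ExampleTranscribeModel)[keyof typeof ExampleTranscribeModel]
  | (string & {});

// Hypothetical helper: true when the ID is one of the known values.
function isKnownModel(model: string): boolean {
  return Object.keys(ExampleTranscribeModel).some(
    (key) =>
      ExampleTranscribeModel[key as keyof typeof ExampleTranscribeModel] === model,
  );
}

// Both assignments type-check: one uses the const-object, one a raw string.
const fromEnum: ExampleTranscribeModelId = ExampleTranscribeModel.GPT_4O_TRANSCRIBE;
const rawString: ExampleTranscribeModelId = "some-future-model";
```

The widened union is what lets a raw string through the type checker while the known IDs still appear in autocomplete. How getpatter itself types its enums may differ.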
Deepgram
Streaming STT backed by Deepgram’s nova-3 model.
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | API key. Reads from DEEPGRAM_API_KEY if omitted. |
| language | string | "en" | BCP-47 language code. |
| model | string | "nova-3" | Deepgram model ID. |
| encoding | string | "linear16" | Audio encoding sent to Deepgram. |
| sampleRate | number | 16000 | Sample rate in Hz. |
| endpointingMs | number | 150 | Utterance endpointing in milliseconds. |
| utteranceEndMs | number \| null | 1000 | Grace period after speech ends. |
| smartFormat | boolean | false | Smart formatting (numbers, dates, punctuation). Defaults to false because telephony agents feed transcripts straight back into an LLM, where smart-format rewrites can confuse downstream tool-call argument parsing. Pass smartFormat: true to opt back in. |
| interimResults | boolean | true | Stream interim transcripts. |
| vadEvents | boolean | true | Emit VAD start/end markers. |
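The defaults in the table can be summarized as a plain options object. The sketch below assumes the parameters are passed as a single options object, which the table suggests but does not state. DeepgramOptionsSketch, DEEPGRAM_DEFAULTS, and withDeepgramDefaults are hypothetical names used for illustration, not getpatter exports.

```typescript
// Hypothetical shape mirroring the parameter table; not a getpatter type.
interface DeepgramOptionsSketch {
  apiKey?: string;
  language: string;
  model: string;
  encoding: string;
  sampleRate: number;
  endpointingMs: number;
  utteranceEndMs: number | null;
  smartFormat: boolean;
  interimResults: boolean;
  vadEvents: boolean;
}

// Default values copied from the table above.
const DEEPGRAM_DEFAULTS: DeepgramOptionsSketch = {
  language: "en",
  model: "nova-3",
  encoding: "linear16",
  sampleRate: 16000,
  endpointingMs: 150,
  utteranceEndMs: 1000,
  smartFormat: false,
  interimResults: true,
  vadEvents: true,
};

// Hypothetical helper: caller overrides are layered on top of the defaults.
function withDeepgramDefaults(
  overrides: Partial<DeepgramOptionsSketch> = {},
): DeepgramOptionsSketch {
  return { ...DEEPGRAM_DEFAULTS, ...overrides };
}

// Opt back in to smart formatting and switch language; the rest stay default.
const deepgramOptions = withDeepgramDefaults({ smartFormat: true, language: "it" });
```

Only the fields the caller names are changed, so an agent that needs formatted numbers can set smartFormat: true without restating the other values.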
Whisper (OpenAI)
HTTP-based STT via OpenAI Whisper. Reuses OPENAI_API_KEY.
Whisper on mulaw 8 kHz routinely hallucinates short fillers ("you", ".", "thank you"). For production, prefer OpenAITranscribeSTT (gpt-4o-transcribe): same OPENAI_API_KEY, ~10× faster, no hallucination floor.
OpenAI Transcribe (gpt-4o-transcribe)
First-class STT for OpenAI’s gpt-4o-transcribe and gpt-4o-mini-transcribe models. It is a drop-in replacement for WhisperSTT with stronger multilingual quality and significantly lower latency. Reuses OPENAI_API_KEY.
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | API key. Reads from OPENAI_API_KEY if omitted. |
| language | string | — | BCP-47 language code. Auto-detect when omitted. |
| model | string | "gpt-4o-transcribe" | Either "gpt-4o-transcribe" or "gpt-4o-mini-transcribe". |
| responseFormat | string | "json" | Pass "verbose_json" to expose segment-level confidence and timestamps. |
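The apiKey fallback in the table (an explicit value first, then the provider's environment variable) can be sketched as a small helper. This is a minimal sketch, not getpatter's implementation: resolveApiKey is a hypothetical function, and the environment is passed in as a plain record so the example stays self-contained.

```typescript
// Hypothetical helper illustrating the apiKey fallback; not a getpatter export.
function resolveApiKey(
  explicit: string | undefined,
  envVar: string,
  env: Record<string, string | undefined>,
): string {
  // An explicit non-empty value wins; otherwise fall back to the environment.
  const key = explicit || env[envVar];
  if (!key) {
    // Fail early so a missing key surfaces before any audio is sent.
    throw new Error(`Missing API key: pass apiKey or set ${envVar}`);
  }
  return key;
}

// Simulated environment for the example.
const exampleEnv: Record<string, string | undefined> = {
  OPENAI_API_KEY: "key-from-env",
};

const explicitKey = resolveApiKey("key-from-code", "OPENAI_API_KEY", exampleEnv);
const fallbackKey = resolveApiKey(undefined, "OPENAI_API_KEY", exampleEnv);
```

In a Node.js process the third argument would typically be process.env. Throwing inside the helper matches the page's statement that a missing key is reported at construction time.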
Cartesia
Streaming STT using Cartesia’s ink-whisper. See Cartesia setup.
AssemblyAI
Universal Streaming STT via the AssemblyAI v3 WebSocket API. See AssemblyAI setup.
Soniox
Real-time STT via Soniox.
Missing credentials
Each class throws at construction time if no API key is resolved.
What’s Next
LLM
Configure the language model.
TTS
Configure speech synthesis.

