TTS (Text-to-Speech)
TTS is used in pipeline mode to synthesize the agent’s response audio. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech synthesis is handled internally by the engine.
Every TTS class is imported by name from the package barrel: import { ElevenLabsTTS } from "getpatter".
Quickstart
Supported providers
| Class | Env var |
|---|---|
| ElevenLabsTTS | ELEVENLABS_API_KEY |
| ElevenLabsWebSocketTTS | ELEVENLABS_API_KEY |
| OpenAITTS | OPENAI_API_KEY |
| CartesiaTTS | CARTESIA_API_KEY |
| RimeTTS | RIME_API_KEY |
| LMNTTTS | LMNT_API_KEY |
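As a rough illustration of how the env-var column relates to the construction-time check described under Missing credentials, the sketch below resolves a key from an explicit option first and from the provider's environment variable second, and throws when neither is set. `TTS_ENV_VARS`, `resolveApiKey`, and the error wording are hypothetical names for this example, not part of the getpatter API; the environment is passed in as a plain object so the snippet runs without Node typings.

```typescript
// Hypothetical lookup table mirroring the "Supported providers" table above.
const TTS_ENV_VARS = {
  ElevenLabsTTS: "ELEVENLABS_API_KEY",
  ElevenLabsWebSocketTTS: "ELEVENLABS_API_KEY",
  OpenAITTS: "OPENAI_API_KEY",
  CartesiaTTS: "CARTESIA_API_KEY",
  RimeTTS: "RIME_API_KEY",
  LMNTTTS: "LMNT_API_KEY",
} as const;

type TTSClassName = keyof typeof TTS_ENV_VARS;
type Env = Record<string, string | undefined>;

// Illustrative only: an explicit key wins, then the env var; otherwise throw.
function resolveApiKey(
  cls: TTSClassName,
  explicit: string | undefined,
  env: Env,
): string {
  const envVar = TTS_ENV_VARS[cls];
  const key = explicit ?? env[envVar];
  if (!key) {
    throw new Error(`${cls}: no API key. Pass apiKey or set ${envVar}.`);
  }
  return key;
}
```

In a Node process the third argument would typically be `process.env`, for example `resolveApiKey("OpenAITTS", undefined, process.env)`.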
Model / voice / format enums
Each provider exports typed const-objects for valid model IDs, voice presets, and output formats alongside the provider class. They keep the model / voice / outputFormat options tab-completable and reject typos at construction time, while still accepting raw strings for forward compatibility.
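The const-object pattern described above can be sketched in plain TypeScript as follows. The model IDs are the ones listed in the ElevenLabs section, but the member names (`FLASH_V2_5` and so on), the `ModelId` alias, and `pickModel` are assumptions made for this example; the names the package actually exports may differ. The sketch covers only editor completion and raw-string acceptance, not any runtime validation.

```typescript
// Hypothetical re-creation of a typed const-object; member names are assumed.
const ElevenLabsModel = {
  FLASH_V2_5: "eleven_flash_v2_5",
  TURBO_V2_5: "eleven_turbo_v2_5",
  V3: "eleven_v3",
  MULTILINGUAL_V2: "eleven_multilingual_v2",
  MONOLINGUAL_V1: "eleven_monolingual_v1",
} as const;

// Union of the literal values: "eleven_flash_v2_5" | "eleven_turbo_v2_5" | ...
type ElevenLabsModel = (typeof ElevenLabsModel)[keyof typeof ElevenLabsModel];

// `string & {}` keeps the literals in editor suggestions while still
// accepting any other string, e.g. a model ID released after this SDK version.
type ModelId = ElevenLabsModel | (string & {});

interface SketchTTSOptions {
  modelId?: ModelId;
}

// Falls back to the documented default model when none is given.
function pickModel(options: SketchTTSOptions = {}): string {
  return options.modelId ?? ElevenLabsModel.FLASH_V2_5;
}

const fromConstant = pickModel({ modelId: ElevenLabsModel.V3 });
const fromRawString = pickModel({ modelId: "some_future_model" });
```

Using a const-object instead of a TypeScript `enum` keeps the values as ordinary strings at runtime, so a constant and its raw string form are interchangeable.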
ElevenLabs
Streaming HTTP TTS via ElevenLabs. Default model "eleven_flash_v2_5" (~75 ms TTFB, drop-in replacement for eleven_turbo_v2_5). Other valid modelId literals: "eleven_v3", "eleven_turbo_v2_5", "eleven_multilingual_v2", "eleven_monolingual_v1".
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | API key; reads from ELEVENLABS_API_KEY if omitted. |
| voiceId | string | "EXAVITQu4vr4xnSDxMaL" (Sarah) | ElevenLabs voice ID (or name). |
| modelId | ElevenLabsModel \| string | "eleven_flash_v2_5" | Typed literal: eleven_flash_v2_5 / eleven_turbo_v2_5 / eleven_v3 / eleven_multilingual_v2 / eleven_monolingual_v1. |
| outputFormat | string | "pcm_16000" | ElevenLabs output format. |
Telephony factories — forTwilio() / forTelnyx()
When ElevenLabs runs in pipeline mode behind a phone carrier you can negotiate the carrier-native codec at the ElevenLabs HTTP layer and skip per-chunk SDK-side transcoding. The factory variants do that for you:
CartesiaTTS.forTwilio() / forTelnyx() and ElevenLabsConvAI.forTwilio() / forTelnyx() work the same way. Use them whenever you know the call will go out over Twilio or Telnyx: they shave tens of milliseconds off TTFB and reduce CPU usage on long calls.
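To make "negotiate the carrier-native codec at the ElevenLabs HTTP layer" more concrete, the sketch below builds a request URL for the ElevenLabs streaming endpoint with an `output_format` query parameter. Requesting `ulaw_8000` matches the 8 kHz μ-law audio that Twilio Media Streams carry. `buildStreamUrl` is a hypothetical helper; what the forTwilio() / forTelnyx() factories send internally is not shown on this page, and the format appropriate for Telnyx depends on the codec configured for the call, so it is left out here.

```typescript
const ELEVENLABS_TTS_BASE = "https://api.elevenlabs.io/v1/text-to-speech";

// Hypothetical helper: builds the streaming URL for a voice and output format.
function buildStreamUrl(voiceId: string, outputFormat: string): string {
  return (
    `${ELEVENLABS_TTS_BASE}/${encodeURIComponent(voiceId)}/stream` +
    `?output_format=${encodeURIComponent(outputFormat)}`
  );
}

// Documented default: 16 kHz PCM, which has to be transcoded before it can
// be sent on an 8 kHz μ-law carrier leg.
const defaultUrl = buildStreamUrl("EXAVITQu4vr4xnSDxMaL", "pcm_16000");

// Asking for μ-law at 8 kHz upstream removes that per-chunk transcoding step.
const twilioUrl = buildStreamUrl("EXAVITQu4vr4xnSDxMaL", "ulaw_8000");
```

The only difference between the two requests is the query parameter, which is why the choice can be packaged as a factory.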
WebSocket variant:
ElevenLabsWebSocketTTS is a drop-in alternative that streams audio over a WebSocket connection, saving ~50 ms of HTTP setup + TLS cold-start per utterance. See ElevenLabs WebSocket TTS for the full reference and limitations.
OpenAI
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | API key; reads from OPENAI_API_KEY if omitted. |
| voice | OpenAITTSVoice \| string | "alloy" | One of alloy, echo, fable, onyx, nova, shimmer. |
| model | OpenAITTSModel \| string | "gpt-4o-mini-tts" | OpenAI TTS model ID. |
| instructions | string | — | Voice direction (only honored by gpt-4o-mini-tts and newer). |
| speed | number | — | Playback speed multiplier in [0.25, 4.0]. |
| targetSampleRate | 8000 \| 16000 | 16000 | Output sample rate. Set to 8000 for Twilio carriers to collapse the 24 kHz → 16 kHz → 8 kHz chain into a single resample. |
OpenAITTSVoice and OpenAITTSModel are exported alongside the provider class.
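The targetSampleRate row describes collapsing a 24 kHz → 16 kHz → 8 kHz chain into one resample. The sketch below shows what a single-step conversion can look like, using plain linear interpolation over 16-bit PCM samples. It is not Patter's resampler, which is not shown on this page, and it omits the low-pass filtering a production downsampler would normally apply; `resampleLinear` is a hypothetical helper.

```typescript
// Illustrative single-pass resampler for mono 16-bit PCM (linear interpolation).
function resampleLinear(
  input: Int16Array,
  fromRate: number,
  toRate: number,
): Int16Array {
  if (fromRate === toRate || input.length === 0) return input.slice();

  const outLength = Math.floor((input.length * toRate) / fromRate);
  const out = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Blend the two neighbouring input samples.
    out[i] = Math.round(input[left] * (1 - frac) + input[right] * frac);
  }
  return out;
}

// A 20 ms frame at 24 kHz is 480 samples.
const frame24k = new Int16Array(480).fill(1000);
const frame16k = resampleLinear(frame24k, 24000, 16000); // 320 samples
const frame8k = resampleLinear(frame24k, 24000, 8000); // 160 samples
```

Going from 24 kHz straight to 8 kHz touches each frame once, whereas converting to 16 kHz first and then to 8 kHz runs the loop twice and interpolates twice.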
OpenAI TTS returns audio at 24 kHz. Patter automatically resamples it to targetSampleRate (16 kHz by default; pass targetSampleRate: 8000 to deliver μ-law-ready PCM directly to Twilio).
Cartesia
Raw PCM streaming via Cartesia’s sonic-2 bytes endpoint. See Cartesia setup.
Rime
Arcana (high fidelity) and Mist (low latency) via Rime’s HTTP endpoint. See Rime setup.
LMNT
Blizzard and Aurora via the LMNT HTTP API. See LMNT setup.
Missing credentials
Each class throws at construction time if no API key is resolved.
What’s Next
STT
Speech-to-text providers.
LLM
Language model providers.

