TTS (Text-to-Speech)
TTS is used in pipeline mode to synthesize the agent’s response audio. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech synthesis is handled internally by the engine.
Each TTS ships as both a namespaced class (from getpatter.tts import elevenlabs → elevenlabs.TTS()) and a flat alias (from getpatter import ElevenLabsTTS). They are equivalent: the flat aliases are convenient for short examples, while the namespaced form avoids name collisions when mixing providers.
Quickstart
Supported providers
| Flat import | Namespaced import | Env var | Install extra |
|---|---|---|---|
| ElevenLabsTTS | getpatter.tts.elevenlabs.TTS | ELEVENLABS_API_KEY | included |
| ElevenLabsWebSocketTTS | getpatter.tts.elevenlabs_ws.TTS | ELEVENLABS_API_KEY | included |
| OpenAITTS | getpatter.tts.openai.TTS | OPENAI_API_KEY | included |
| CartesiaTTS | getpatter.tts.cartesia.TTS | CARTESIA_API_KEY | getpatter[cartesia] |
| RimeTTS | getpatter.tts.rime.TTS | RIME_API_KEY | getpatter[rime] |
| LMNTTTS | getpatter.tts.lmnt.TTS | LMNT_API_KEY | getpatter[lmnt] |
Model / voice / format enums
Each provider exports typed StrEnums for valid model IDs, voice presets, and output formats alongside the provider class. They keep model= / voice= / output_format= arguments tab-completable and reject typos at construction time, while still accepting raw strings for forward compatibility.
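As a rough illustration of why a string-backed enum works well for these arguments, the sketch below defines a stand-in enum. DemoTTSVoice is hypothetical and is not Patter's actual definition; it only shows the general Python behaviour that members compare equal to their raw strings while a lookup by an unknown value fails.

```python
from enum import Enum


class DemoTTSVoice(str, Enum):
    """Hypothetical stand-in for a provider voice enum (not part of Patter)."""

    ALLOY = "alloy"
    NOVA = "nova"


# A str-mixin member is interchangeable with its raw string value.
assert DemoTTSVoice.ALLOY == "alloy"

# Looking a member up by value returns the singleton member.
assert DemoTTSVoice("nova") is DemoTTSVoice.NOVA

# A misspelled value is rejected with ValueError.
try:
    DemoTTSVoice("aloy")
except ValueError as exc:
    print(f"rejected: {exc}")
```

How the library reconciles typo rejection with accepting unknown raw strings is not shown here; the sketch covers only the enum mechanics.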
ElevenLabs
Streaming HTTP TTS via ElevenLabs. Default model: "eleven_flash_v2_5" (~75 ms TTFB, drop-in replacement for eleven_turbo_v2_5). Other valid model_id literals: "eleven_v3", "eleven_turbo_v2_5", "eleven_multilingual_v2", "eleven_monolingual_v1".
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key; read from ELEVENLABS_API_KEY if omitted. |
| voice_id | str | "EXAVITQu4vr4xnSDxMaL" (Sarah) | ElevenLabs voice ID (or name). |
| model_id | ElevenLabsModel \| str | "eleven_flash_v2_5" | Typed literal: eleven_flash_v2_5 / eleven_turbo_v2_5 / eleven_v3 / eleven_multilingual_v2 / eleven_monolingual_v1. |
| output_format | str | "pcm_16000" | ElevenLabs output format. |
Telephony factories — for_twilio() / for_telnyx()
When ElevenLabs runs in pipeline mode behind a phone carrier you can negotiate the carrier-native codec at the ElevenLabs HTTP layer and skip per-chunk SDK-side transcoding. The factory variants do that for you:
CartesiaTTS.for_twilio() / for_telnyx() and ElevenLabsConvAI.for_twilio() / for_telnyx() work the same way. Use them whenever you know the call will go out over Twilio or Telnyx — they shave tens of milliseconds off TTFB and drop CPU on long calls.
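To make concrete what per-chunk SDK-side transcoding involves, here is a minimal sketch of converting 16-bit PCM to G.711 μ-law. It assumes the carrier-native codec is 8 kHz μ-law, the format Twilio Media Streams use. The function names are illustrative and not part of Patter; this is only the kind of per-sample work that is avoided when the provider emits the carrier codec directly.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that stays in range after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def pcm16_to_ulaw(pcm: bytes) -> bytes:
    """Transcode little-endian 16-bit mono PCM to mu-law, one byte per sample."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(linear_to_ulaw(s) for s in samples)
```

Running this over every audio chunk of a long call adds up, which is the CPU cost the factories are described as removing.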
WebSocket variant
ElevenLabsWebSocketTTS is an opt-in low-latency drop-in for ElevenLabsTTS that uses the /stream-input WebSocket endpoint. It saves ~50 ms of HTTP request setup per utterance and avoids TLS cold-starts on bursty traffic. See the ElevenLabs WebSocket setup page for full details.
The WebSocket endpoint does not support eleven_v3* models; use the HTTP ElevenLabsTTS for v3.
OpenAI
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key; read from OPENAI_API_KEY if omitted. |
| voice | OpenAITTSVoice \| str | OpenAITTSVoice.ALLOY | One of alloy, echo, fable, onyx, nova, shimmer. |
| model | OpenAITTSModel \| str | OpenAITTSModel.GPT_4O_MINI_TTS | OpenAI TTS model ID. Older tts-1 / tts-1-hd are accepted as raw strings. |
| instructions | str \| None | None | Voice direction (only honored by gpt-4o-mini-tts and newer). |
| speed | float \| None | None | Playback speed multiplier in [0.25, 4.0]. |
| target_sample_rate | int | 16000 | Output sample rate. Must be 8000 or 16000. Set to 8000 for Twilio carriers to collapse the 24 kHz → 16 kHz → 8 kHz chain into a single resample (~1 ms saved per chunk). |
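The single-resample point in the target_sample_rate row can be illustrated with a rough sketch. Going from 24 kHz to 8 kHz is an exact 3:1 ratio, so one integer decimation pass suffices, whereas routing through 16 kHz needs a 3:2 stage followed by a 2:1 stage. The helper below is hypothetical and uses a crude box average as its low-pass filter; it is not Patter's resampler.

```python
import sys
from array import array


def decimate_pcm16(pcm: bytes, factor: int = 3) -> bytes:
    """Downsample little-endian 16-bit mono PCM by an integer factor.

    Each group of `factor` samples is averaged into one output sample
    (floor division), and a trailing partial group is dropped.
    """
    samples = array("h")
    samples.frombytes(pcm)  # raises ValueError if len(pcm) is odd
    if sys.byteorder == "big":
        samples.byteswap()  # array("h") uses native byte order

    usable = len(samples) - len(samples) % factor
    out = array(
        "h",
        (sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)),
    )

    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()
```

With factor=3, one second of 24 kHz input (24000 samples) yields 8000 samples, matching the 8 kHz rate used on Twilio calls.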
OpenAITTSVoice and OpenAITTSModel are exported alongside the provider class:
OpenAI TTS returns audio at 24 kHz. Patter automatically resamples to target_sample_rate (16 kHz by default; pass target_sample_rate=8000 to deliver μ-law-ready PCM directly to Twilio).
Cartesia
Raw PCM streaming via Cartesia’s sonic-2 bytes endpoint. See Cartesia setup.
Rime
Arcana (high fidelity) and Mist (low latency) via Rime’s HTTP endpoint. See Rime setup.
LMNT
Blizzard and Aurora via the LMNT HTTP API. See LMNT setup.
Missing credentials
Each class raises ValueError at construction time if no API key is resolved.
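The resolution order described in the parameter tables (explicit api_key first, then the provider's environment variable) might look roughly like the sketch below. resolve_api_key is a hypothetical helper written for illustration, not a Patter function, and the error message wording is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str], env_var: str) -> str:
    """Return the explicit key if given, else the environment value.

    Raises ValueError when neither source provides a non-empty key,
    mirroring the fail-fast behaviour at construction time.
    """
    key = explicit or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key found: pass api_key= or set {env_var}")
    return key
```

Failing at construction rather than on the first synthesis request surfaces a missing credential before a call is in progress.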
What’s Next
STT
Speech-to-text providers.
LLM
Language model providers.

