TTS (Text-to-Speech)

TTS is used in pipeline mode to synthesize the agent’s response audio. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech synthesis is handled internally by the engine. Each TTS ships as both a namespaced class (from getpatter.tts import elevenlabs, then elevenlabs.TTS()) and a flat alias (from getpatter import ElevenLabsTTS). The two are equivalent: the flat aliases are convenient for short examples, while the namespaced form avoids name collisions when mixing providers.

Quickstart

import asyncio
from getpatter import Patter, Twilio, DeepgramSTT, ElevenLabsTTS

phone = Patter(carrier=Twilio(), phone_number="+15550001234")  # TWILIO_* from env

agent = phone.agent(
    stt=DeepgramSTT(),                            # DEEPGRAM_API_KEY from env
    tts=ElevenLabsTTS(voice_id="rachel"),         # ELEVENLABS_API_KEY from env
    system_prompt="You are a helpful assistant.",
)

async def main():
    await phone.serve(agent)

asyncio.run(main())
The same agent using namespaced imports:
from getpatter.stt import deepgram
from getpatter.tts import elevenlabs

agent = phone.agent(
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
)

Supported providers

| Flat import | Namespaced import | Env var | Install extra |
| --- | --- | --- | --- |
| ElevenLabsTTS | getpatter.tts.elevenlabs.TTS | ELEVENLABS_API_KEY | included |
| ElevenLabsWebSocketTTS | getpatter.tts.elevenlabs_ws.TTS | ELEVENLABS_API_KEY | included |
| OpenAITTS | getpatter.tts.openai.TTS | OPENAI_API_KEY | included |
| CartesiaTTS | getpatter.tts.cartesia.TTS | CARTESIA_API_KEY | getpatter[cartesia] |
| RimeTTS | getpatter.tts.rime.TTS | RIME_API_KEY | getpatter[rime] |
| LMNTTTS | getpatter.tts.lmnt.TTS | LMNT_API_KEY | getpatter[lmnt] |

Model / voice / format enums

Each provider exports typed StrEnums for valid model IDs, voice presets, and output formats alongside the provider class. They keep model= / voice= / output_format= arguments tab-completable and reject typos at construction time, while still accepting raw strings for forward compatibility:
from getpatter import OpenAITTS
from getpatter.providers.openai_tts import OpenAITTSModel, OpenAITTSVoice
from getpatter.providers.elevenlabs_tts import ElevenLabsModel, ElevenLabsOutputFormat
from getpatter.providers.cartesia_tts import CartesiaTTSModel, CartesiaVoiceSpeed
from getpatter.providers.rime_tts import RimeModel, RimeAudioFormat
from getpatter.providers.lmnt_tts import LMNTModel, LMNTAudioFormat

tts = OpenAITTS(voice=OpenAITTSVoice.NOVA, model=OpenAITTSModel.GPT_4O_MINI_TTS)
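As a rough illustration of why a string-backed enum can stay interchangeable with raw strings, the sketch below defines a hypothetical stand-in enum. SketchVoice and normalize_voice are illustrative names, not Patter classes, and the library's own validation logic is not shown on this page. Members compare equal to their raw string values, so either form can be passed where a voice ID is expected.

```python
from enum import Enum


class SketchVoice(str, Enum):
    """Hypothetical stand-in for a provider voice enum (not part of getpatter)."""

    ALLOY = "alloy"
    NOVA = "nova"


def normalize_voice(voice):
    """Return the plain string ID for an enum member or a raw string.

    Assumption: unknown raw strings pass through unchanged, mirroring the
    forward-compatibility behaviour described above.
    """
    if isinstance(voice, SketchVoice):
        return voice.value
    return str(voice)


# Enum members and raw strings normalize to the same wire value.
print(normalize_voice(SketchVoice.NOVA), normalize_voice("nova"))
```

Looking up a member by value (SketchVoice("alloy")) is one way a typo could be caught early, since an unknown value raises ValueError; whether the real enums are used that way is an assumption.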

ElevenLabs

Streaming HTTP TTS via ElevenLabs. Default model "eleven_flash_v2_5" (~75 ms TTFB, drop-in replacement for eleven_turbo_v2_5). Other valid model_id literals: "eleven_v3", "eleven_turbo_v2_5", "eleven_multilingual_v2", "eleven_monolingual_v1".
from getpatter import ElevenLabsTTS

tts = ElevenLabsTTS()                        # reads ELEVENLABS_API_KEY
tts = ElevenLabsTTS(voice_id="rachel")
tts = ElevenLabsTTS(api_key="...", voice_id="EXAVITQu4vr4xnSDxMaL", model_id="eleven_v3")
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str \| None | None | API key. Read from ELEVENLABS_API_KEY if omitted. |
| voice_id | str | "EXAVITQu4vr4xnSDxMaL" (Sarah) | ElevenLabs voice ID (or name). |
| model_id | ElevenLabsModel \| str | "eleven_flash_v2_5" | Typed literal: eleven_flash_v2_5 / eleven_turbo_v2_5 / eleven_v3 / eleven_multilingual_v2 / eleven_monolingual_v1. |
| output_format | str | "pcm_16000" | ElevenLabs output format. |

Telephony factories — for_twilio() / for_telnyx()

When ElevenLabs runs in pipeline mode behind a phone carrier you can negotiate the carrier-native codec at the ElevenLabs HTTP layer and skip per-chunk SDK-side transcoding. The factory variants do that for you:
from getpatter import ElevenLabsTTS

# Twilio Media Streams: μ-law @ 8 kHz native — no resample, no μ-law encode in Python.
tts = ElevenLabsTTS.for_twilio(voice_id="rachel")

# Telnyx default: PCM @ 16 kHz native — no resample.
tts = ElevenLabsTTS.for_telnyx(voice_id="rachel")
CartesiaTTS.for_twilio() / for_telnyx() and ElevenLabsConvAI.for_twilio() / for_telnyx() work the same way. Use them whenever you know the call will go out over Twilio or Telnyx — they shave tens of milliseconds off TTFB and drop CPU on long calls.
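The factory pattern described above can be sketched as a pair of classmethods that pin the output format before the provider is constructed. SketchTTS is a hypothetical stand-in, not the Patter implementation; the format strings "ulaw_8000" and "pcm_16000" are the ones this page names for Twilio and Telnyx.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTTS:
    """Hypothetical stand-in for a TTS provider class (not part of getpatter)."""

    voice_id: str = "EXAVITQu4vr4xnSDxMaL"
    output_format: str = "pcm_16000"

    @classmethod
    def for_twilio(cls, **kwargs) -> "SketchTTS":
        # Twilio Media Streams carries mu-law at 8 kHz, so request it upstream
        # instead of transcoding each chunk locally.
        return cls(output_format="ulaw_8000", **kwargs)

    @classmethod
    def for_telnyx(cls, **kwargs) -> "SketchTTS":
        # Telnyx defaults to 16 kHz PCM, so request that format directly.
        return cls(output_format="pcm_16000", **kwargs)


tts = SketchTTS.for_twilio(voice_id="rachel")
print(tts.output_format)
```

Keeping the codec choice inside the factory means call sites only state which carrier they target, and the remaining keyword arguments pass through unchanged.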

WebSocket variant

ElevenLabsWebSocketTTS is an opt-in low-latency drop-in for ElevenLabsTTS that uses the /stream-input WebSocket endpoint. It saves ~50 ms of HTTP request setup per utterance and avoids TLS cold-starts on bursty traffic. See the ElevenLabs WebSocket setup page for full details.
from getpatter import ElevenLabsWebSocketTTS

tts = ElevenLabsWebSocketTTS()                       # reads ELEVENLABS_API_KEY
tts = ElevenLabsWebSocketTTS.for_twilio(api_key="...")   # ulaw_8000 native
tts = ElevenLabsWebSocketTTS.for_telnyx(api_key="...")   # pcm_16000 native
The WebSocket endpoint does not support eleven_v3* models — use the HTTP ElevenLabsTTS for v3.
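If the model ID comes from configuration, a small guard can route v3 models to the HTTP class. The helper below is a hypothetical sketch (pick_elevenlabs_transport is not a Patter function) and returns class names as strings so that it runs without the SDK installed; it assumes "eleven_v3*" means any model ID starting with eleven_v3.

```python
def pick_elevenlabs_transport(model_id: str) -> str:
    """Return the provider class name to use for a model ID (hypothetical helper)."""
    # Per the note above, eleven_v3* models are only available over HTTP.
    if model_id.startswith("eleven_v3"):
        return "ElevenLabsTTS"
    # Every other model can use the lower-latency WebSocket variant.
    return "ElevenLabsWebSocketTTS"


for model in ("eleven_flash_v2_5", "eleven_v3"):
    print(model, "->", pick_elevenlabs_transport(model))
```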

OpenAI

from getpatter import OpenAITTS

tts = OpenAITTS()                            # reads OPENAI_API_KEY
tts = OpenAITTS(voice="nova")

# Twilio: skip the intermediate 16 kHz step — resample 24k → 8k directly.
tts = OpenAITTS(target_sample_rate=8000)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str \| None | None | API key. Read from OPENAI_API_KEY if omitted. |
| voice | OpenAITTSVoice \| str | OpenAITTSVoice.ALLOY | One of alloy, echo, fable, onyx, nova, shimmer. |
| model | OpenAITTSModel \| str | OpenAITTSModel.GPT_4O_MINI_TTS | OpenAI TTS model ID. Older tts-1 / tts-1-hd are accepted as raw strings. |
| instructions | str \| None | None | Voice direction (only honored by gpt-4o-mini-tts and newer). |
| speed | float \| None | None | Playback speed multiplier in [0.25, 4.0]. |
| target_sample_rate | int | 16000 | Output sample rate. Must be 8000 or 16000. Set to 8000 for Twilio carriers to collapse the 24 k→16 k→8 k chain into a single resample (~1 ms saved per chunk). |
OpenAITTSVoice and OpenAITTSModel are exported alongside the provider class:
from getpatter.providers.openai_tts import OpenAITTSVoice, OpenAITTSModel
OpenAI TTS returns audio at 24 kHz — Patter automatically resamples to target_sample_rate (16 kHz by default; pass target_sample_rate=8000 to deliver μ-law-ready PCM directly to Twilio).
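To make the single-step resample concrete, here is a minimal sketch of converting 16-bit mono PCM from 24 kHz straight to 8 kHz or 16 kHz with linear interpolation. It is not Patter's resampler: resample_pcm16 is a hypothetical helper, and it omits the low-pass filtering a production resampler would normally apply before downsampling.

```python
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Linearly interpolate 16-bit mono PCM from src_rate to dst_rate."""
    src = array("h")  # signed 16-bit samples, native byte order
    src.frombytes(pcm)
    if not src or src_rate == dst_rate:
        return pcm
    n_out = len(src) * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(src) - 1
    out = array("h")
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Blend the two neighbouring source samples.
        out.append(round(src[lo] * (1 - frac) + src[hi] * frac))
    return out.tobytes()


# 10 ms of silence at 24 kHz is 240 samples; at 8 kHz it is 80 samples.
chunk = array("h", [0] * 240).tobytes()
print(len(resample_pcm16(chunk, 24000, 8000)) // 2)
```

Going from 24 kHz to 8 kHz in one call touches each chunk once, which is the saving the target_sample_rate=8000 setting is described as providing.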

Cartesia

Raw PCM streaming via Cartesia’s sonic-2 bytes endpoint. See Cartesia setup.
from getpatter import CartesiaTTS

tts = CartesiaTTS()                          # reads CARTESIA_API_KEY
tts = CartesiaTTS(voice="f786b574-daa5-4673-aa0c-cbe3e8534c02")  # Katie

Rime

Arcana (high fidelity) and Mist (low latency) via Rime’s HTTP endpoint. See Rime setup.
from getpatter import RimeTTS

tts = RimeTTS()                              # reads RIME_API_KEY
tts = RimeTTS(model="arcana", speaker="astra")
tts = RimeTTS(model="mistv2", speaker="cove", speed_alpha=1.1, reduce_latency=True)

LMNT

Blizzard and Aurora via the LMNT HTTP API. See LMNT setup.
from getpatter import LMNTTTS

tts = LMNTTTS()                              # reads LMNT_API_KEY
tts = LMNTTTS(model="blizzard", voice="leah")

Missing credentials

Each class raises ValueError at construction time if no API key is resolved:
ValueError: ElevenLabs TTS requires an api_key. Pass api_key='...' or
set ELEVENLABS_API_KEY in the environment.
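The resolution order implied above (explicit argument first, then the environment variable, otherwise an error) can be sketched as follows. resolve_api_key is a hypothetical helper written for illustration, not the SDK's internal function, and the message format simply follows the example error shown above.

```python
import os


def resolve_api_key(api_key, env_var, provider):
    """Return an explicit key, else the env var value, else raise ValueError."""
    key = api_key or os.environ.get(env_var)
    if not key:
        raise ValueError(
            f"{provider} requires an api_key. Pass api_key='...' or "
            f"set {env_var} in the environment."
        )
    return key


# An explicit key takes precedence over whatever is in the environment.
print(resolve_api_key("explicit-key", "ELEVENLABS_API_KEY", "ElevenLabs TTS"))
```

Failing at construction time, rather than on the first synthesis request, surfaces a missing key before any call is answered.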

What’s Next

STT

Speech-to-text providers.

LLM

Language model providers.