Agent Configuration
An Agent defines how your voice AI behaves: what it says, how it sounds, what tools it can use, and what guardrails it follows.
Creating an Agent
Use the phone.agent() factory method. The simplest form leans on env-var fallback and a default engine (OpenAIRealtime).
The built-in LLM providers for pipeline mode are OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, and GoogleLLM. Tool calling works across all five. See LLM for the full reference. For fully custom logic (multi-model routing, local models), drop llm= and pass an on_message callback to serve() instead; llm= and on_message are mutually exclusive.
The same pipeline using namespaced imports:
Agent Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| system_prompt | str | required | Instructions that define the agent’s behavior. |
| engine | OpenAIRealtime \| ElevenLabsConvAI \| None | None → OpenAI Realtime | End-to-end engine. See Engines. Omit for pipeline mode. |
| stt | STTProvider \| None | None | STT instance for pipeline mode (DeepgramSTT(), CartesiaSTT(), …). See STT. |
| llm | LLMProvider \| None | None | LLM instance for pipeline mode (AnthropicLLM(), GroqLLM(), …). Mutually exclusive with on_message on serve(). Ignored when engine is set. See LLM. |
| tts | TTSProvider \| None | None | TTS instance for pipeline mode (ElevenLabsTTS(), RimeTTS(), …). See TTS. |
| voice | str | "alloy" | Voice name. Usually inferred from the engine or TTS instance. |
| model | str | "gpt-4o-mini-realtime-preview" | Model ID for OpenAI Realtime. Usually inferred from the engine. |
| language | str | "en" | BCP-47 language code. |
| first_message | str | "" | If set, the agent speaks this immediately when a call connects. |
| tools | list[Tool] \| None | None | Tool(...) instances for function calling. See Tools. |
| variables | dict \| None | None | Dynamic variable substitutions for {placeholder} patterns in the system prompt. Values limited to 500 chars. |
| guardrails | list[Guardrail] \| None | None | Guardrail(...) instances applied to LLM output. See Guardrails. |
| hooks | PipelineHooks \| None | None | Pipeline hooks for intercepting STT/TTS processing. Pipeline mode only. See Events. |
| text_transforms | list[Callable] \| None | None | Text transformation functions applied to LLM output before TTS. Pipeline mode only. |
| vad | VADProvider \| None | None | Voice activity detection provider (e.g. Silero). Pipeline mode only. |
| audio_filter | AudioFilter \| None | None | Pre-STT audio filter (e.g. Krisp noise suppression). Pipeline mode only. |
| background_audio | BackgroundAudioPlayer \| None | None | Hold music / ambient-cue mixer. Pipeline mode only. |
| barge_in_threshold_ms | int | 300 | Sustained-voice window (ms) before treating caller audio as barge-in. Set to 0 to disable. |
| aggressive_first_flush | bool | False | Opt-in low-latency mode: emits the first clause on a soft punctuation boundary (comma, em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA on the first sentence at the cost of slightly clipped prosody. Hard-disabled when language starts with "it" (Italian decimal commas would split mid-number). Pipeline mode only. |
| disable_phone_preamble | bool | False | When False (default), Patter prepends a phone-friendly preamble to system_prompt that instructs the LLM to avoid markdown, emojis, bullet lists, and code blocks; spell out numbers and dates; and keep replies short. Set to True to ship system_prompt verbatim. |
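The aggressive_first_flush rule in the table is easier to follow as code. The sketch below is a stand-alone illustration and not Patter's implementation: the function name try_first_flush, the exact 40-character cutoff, and the choice to cut at the last soft boundary in the buffer are all assumptions made for this example.

```python
# Illustrative only: models the documented behavior, not the SDK's internals.
SOFT_BOUNDARIES = (",", "\u2014", "\u2013")  # comma, em-dash, en-dash
MIN_CHARS = 40  # assumed value for the documented "~40 chars"


def try_first_flush(buffer: str, language: str = "en", aggressive: bool = False) -> tuple[str, str]:
    """Return (clause_to_speak, remaining_buffer).

    The clause is "" when nothing should be flushed yet.
    """
    # Opt-in only, and hard-disabled for Italian ("it", "it-IT", ...),
    # where decimal commas would split a number in the middle.
    if not aggressive or language.lower().startswith("it"):
        return "", buffer
    # Wait until enough text has accumulated.
    if len(buffer) < MIN_CHARS:
        return "", buffer
    # Assumption: cut at the last soft boundary seen so far.
    cut = max(buffer.rfind(ch) for ch in SOFT_BOUNDARIES)
    if cut == -1:
        return "", buffer
    return buffer[: cut + 1], buffer[cut + 1 :]
```

With aggressive=True and an English buffer longer than 40 characters, the text up to and including the last comma is returned for synthesis and the rest stays buffered. With a language such as "it-IT" the buffer is returned untouched.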
Agent Dataclass
Agent is a frozen (immutable) dataclass. You can construct it directly when you need a dataclass outside of phone.agent():
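To show what "frozen" means in practice, here is a minimal sketch using a stand-in dataclass. AgentSketch is hypothetical and only borrows a few field names and defaults from the parameter table; it is not the SDK's Agent class, whose full field list and constructor may differ.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AgentSketch:
    """Hypothetical stand-in with a few fields from the parameter table."""

    system_prompt: str
    voice: str = "alloy"
    language: str = "en"
    first_message: str = ""


base = AgentSketch(system_prompt="You are a helpful receptionist.")

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    base.voice = "nova"
    mutated = True
except FrozenInstanceError:
    mutated = False

# To vary a field, build a new instance; the original is left unchanged.
spanish = replace(base, language="es", first_message="Hola, gracias por llamar.")
```

Because instances cannot be mutated, a configuration can be shared safely across concurrent calls, and per-call variations are expressed as new objects via dataclasses.replace.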
Prefer phone.agent() over constructing Agent directly: the factory method validates credentials, unpacks the engine/STT/TTS instances, and surfaces clear errors up front.
System Prompt
The system_prompt defines the agent’s personality, instructions, and constraints.
Dynamic Variables
Use {placeholder} syntax in the system prompt to inject dynamic values at call start. Values are limited to 500 characters each.
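The substitution described above can be sketched as a small stand-alone function. This is an illustration under stated assumptions, not the SDK's code: render_prompt is a hypothetical name, rejecting over-long values with ValueError is an assumption (the SDK may truncate or handle them differently), and leaving unknown placeholders untouched is also an assumption.

```python
import re

MAX_VALUE_CHARS = 500  # documented per-value limit
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace {name} placeholders in template with values from variables."""
    for name, value in variables.items():
        # Assumption: over-long values are rejected rather than truncated.
        if len(str(value)) > MAX_VALUE_CHARS:
            raise ValueError(f"variable {name!r} exceeds {MAX_VALUE_CHARS} characters")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Assumption: placeholders without a matching variable stay as written.
        return str(variables[name]) if name in variables else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


prompt = render_prompt(
    "You are helping {customer_name} with order {order_id}.",
    {"customer_name": "Ada", "order_id": "A-1042"},
)
```

In the SDK itself, the values would be supplied through the variables parameter listed in the table rather than through a helper like this one.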
First Message
When first_message is set, the agent speaks it immediately when a call connects.
Voice Selection
Voice is usually inferred from the engine or TTS instance, e.g. OpenAIRealtime(voice="nova") or ElevenLabsTTS(voice_id="rachel"). Available voices depend on the provider.
OpenAI Realtime voices: "alloy", "echo", "fable", "onyx", "nova", "shimmer"
Voice Activity Detection (VAD)
Pipeline-mode agents can plug a VAD provider into the vad= parameter to gate STT around real speech and drive barge-in detection. The SDK ships Silero VAD (an ONNX model, ~1 MB) with a telephony-tuned factory.
SileroVAD.for_phone_call(**overrides) is identical to SileroVAD.load(...) but pins sample_rate to 16 000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). All other parameters use the upstream snakers4/silero-vad defaults:
| Field | Default | Upstream equivalent |
|---|---|---|
| activation_threshold | 0.5 | threshold |
| deactivation_threshold | 0.35 | neg_threshold = threshold − 0.15 |
| min_speech_duration | 0.25 s | min_speech_duration_ms = 250 |
| min_silence_duration | 0.1 s | min_silence_duration_ms = 100 |
| prefix_padding_duration | 0.03 s | speech_pad_ms = 30 |
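Any of these defaults can be overridden through the factory's keyword arguments; for example, min_silence_duration can be raised from 0.1 s to 0.5–1.0 s.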
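The relationship between the field names above and the upstream parameters can be made concrete with a small sketch. VadSettingsSketch and to_upstream are hypothetical names used only for illustration; the defaults are copied from the table, and the code does not import or call the SDK's SileroVAD class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VadSettingsSketch:
    """Hypothetical container holding the defaults from the table (seconds)."""

    activation_threshold: float = 0.5
    deactivation_threshold: float = 0.35
    min_speech_duration: float = 0.25
    min_silence_duration: float = 0.1
    prefix_padding_duration: float = 0.03


def to_upstream(settings: VadSettingsSketch) -> dict:
    """Translate seconds-based fields into the upstream millisecond names."""
    return {
        "threshold": settings.activation_threshold,
        "neg_threshold": settings.deactivation_threshold,
        "min_speech_duration_ms": round(settings.min_speech_duration * 1000),
        "min_silence_duration_ms": round(settings.min_silence_duration * 1000),
        "speech_pad_ms": round(settings.prefix_padding_duration * 1000),
    }


defaults = to_upstream(VadSettingsSketch())

# Overriding one field: wait 0.8 s of silence before closing a speech segment.
patient = to_upstream(replace(VadSettingsSketch(), min_silence_duration=0.8))
```

A longer min_silence_duration makes the detector wait longer before declaring end of speech, which tolerates mid-sentence pauses at the cost of slower turn-taking.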
SileroVAD.load(...) and SileroVAD.for_phone_call(...) are synchronous (they load the ONNX model). Wrap them in asyncio.to_thread(...) so the event loop stays responsive during process startup.
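The pattern of moving a blocking load off the event loop can be sketched with the standard library alone. In the example below, load_vad_blocking is a hypothetical stand-in that sleeps to simulate model loading; in real code the callable handed to asyncio.to_thread would be SileroVAD.load or SileroVAD.for_phone_call.

```python
import asyncio
import time


def load_vad_blocking() -> str:
    """Hypothetical stand-in for a synchronous model load."""
    time.sleep(0.2)  # simulates the blocking ONNX load
    return "vad-ready"


async def heartbeat(ticks: list) -> None:
    """Other startup work that needs the event loop to keep running."""
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def startup() -> tuple[str, int]:
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks))
    # Run the blocking loader in a worker thread so the loop stays free.
    vad = await asyncio.to_thread(load_vad_blocking)
    await beat
    return vad, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(startup()))
```

Calling load_vad_blocking() directly inside startup() would stall the heartbeat task until the load finished. asyncio.to_thread requires Python 3.9 or later.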
