Agents
An agent configuration defines the personality, capabilities, and behavior of your AI voice assistant. Use phone.agent() to validate your agent config before connecting it to a phone number.
Basic Agent
The simplest form relies on environment-variable fallback and the default engine (OpenAIRealtime).
In pipeline mode, the llm option accepts five built-in providers: OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, and GoogleLLM. Tool calling works across all five. See LLM for the full reference. For fully custom logic (multi-model routing, local models), drop llm and pass an onMessage callback to serve() instead; llm and onMessage are mutually exclusive.
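The rules for how a reply source is chosen can be pictured as a small resolution step. The sketch below is hypothetical. ResponderOptions, LLMLike, and resolveResponder are illustrative names, not Patter's API. For brevity it merges the agent's engine and llm with the serve() callback into one options object.

It encodes the documented rules: llm and onMessage cannot both be set, and llm is ignored when engine is set. It also makes two labeled assumptions. First, engine takes precedence over onMessage. Second, with nothing set, the default engine applies.

```typescript
// Hypothetical stand-ins; Patter's real provider and callback types differ.
interface LLMLike {
  readonly model: string;
}
type OnMessageHandler = (userText: string) => Promise<string>;

// Simplification: in Patter, engine and llm are agent options and onMessage is
// passed to serve(); they are merged here only to show how the rules interact.
interface ResponderOptions {
  engine?: object;
  llm?: LLMLike;
  onMessage?: OnMessageHandler;
}

type Responder =
  | { kind: "engine" }
  | { kind: "llm"; llm: LLMLike }
  | { kind: "onMessage"; handler: OnMessageHandler };

function resolveResponder(opts: ResponderOptions): Responder {
  // Documented rule: llm and onMessage are mutually exclusive.
  if (opts.llm !== undefined && opts.onMessage !== undefined) {
    throw new Error("llm and onMessage are mutually exclusive; pass only one");
  }
  // Documented rule: llm is ignored when an engine is set.
  // Assumption: the engine also takes precedence over onMessage.
  if (opts.engine !== undefined) return { kind: "engine" };
  if (opts.llm !== undefined) return { kind: "llm", llm: opts.llm };
  if (opts.onMessage !== undefined) return { kind: "onMessage", handler: opts.onMessage };
  // Assumption: with nothing set, the default engine (OpenAIRealtime) applies.
  return { kind: "engine" };
}
```

Throwing on the conflicting pair, rather than silently preferring one, makes a misconfigured setup fail at startup instead of mid-call.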
AgentOptions
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| systemPrompt | string | Yes | - | Instructions that define the agent's persona and behavior. |
| engine | OpenAIRealtime \| ElevenLabsConvAI | No | OpenAIRealtime | End-to-end engine. See Engines. Omit for pipeline mode (pass stt and tts instead). |
| stt | STTProvider | No | - | STT instance for pipeline mode (new DeepgramSTT(), new CartesiaSTT(), …). |
| llm | LLMProvider | No | - | LLM instance for pipeline mode (new AnthropicLLM(), new GroqLLM(), …). Mutually exclusive with onMessage on serve(). Ignored when engine is set. See LLM. |
| tts | TTSProvider | No | - | TTS instance for pipeline mode (new ElevenLabsTTS(), new RimeTTS(), …). |
| voice | string | No | Provider default | Voice ID. Usually inferred from the engine or TTS instance. |
| model | string | No | "gpt-4o-mini-realtime-preview" | Model ID for OpenAI Realtime. Usually inferred from the engine. |
| language | string | No | "en" | BCP-47 language code. |
| firstMessage | string | No | - | Greeting spoken when the call connects. |
| tools | ToolDefinition[] | No | - | Function-calling tools. See Tools. |
| variables | Record<string, string> | No | - | Dynamic variables for {placeholder} substitution in systemPrompt. |
| guardrails | Guardrail[] | No | - | Output guardrails. See Guardrails. |
| hooks | PipelineHooks | No | - | Pipeline hooks for intercepting STT/TTS processing. |
| textTransforms | ((text: string) => string)[] | No | - | Text transformation functions applied to LLM responses. |
| vad | VADProvider | No | - | Voice activity detection provider. |
| audioFilter | AudioFilter | No | - | Audio preprocessing filter. |
| backgroundAudio | BackgroundAudioPlayer | No | - | Background audio player. |
| bargeInThresholdMs | number | No | 300 | Barge-in hang-over window, in milliseconds. Set to 0 to disable. |
| aggressiveFirstFlush | boolean | No | false | Opt-in low-latency mode: emits the first clause at soft punctuation (comma or em-dash) once the buffer reaches at least 40 characters. Saves 200–500 ms of time to first audio (TTFA). Hard-disabled when language="it" (Italian punctuation patterns are incompatible). |
| disablePhonePreamble | boolean | No | false | When false (default), Patter prepends a phone-friendly preamble to systemPrompt. The preamble instructs the LLM to avoid markdown, emojis, bullet lists, and code blocks, to spell out numbers and dates, and to keep replies short. Set to true to send systemPrompt verbatim. |
| provider | 'openai_realtime' \| 'elevenlabs_convai' \| 'pipeline' | No | Derived | Provider mode. Normally derived from engine or from stt + tts. Pass 'pipeline' explicitly when building a pipeline-mode agent without an engine instance. |
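To make the aggressiveFirstFlush rule concrete, here is a hypothetical sketch of the first-clause cut. The name firstFlushLength is illustrative and not part of the SDK. The sketch assumes the 40-character minimum applies to the text before the cut. Patter's real implementation may count characters or match language codes differently.

```typescript
// Soft punctuation that may end the first clause: comma and em-dash (U+2014).
const SOFT_BREAKS = new Set<string>([",", "\u2014"]);
// Assumption: at least this many characters must precede the cut.
const MIN_FIRST_CLAUSE_CHARS = 40;

/**
 * Hypothetical first-flush rule: returns how many leading characters of the
 * streamed LLM text could be sent to TTS early, or 0 to keep buffering.
 */
function firstFlushLength(buffer: string, language: string, aggressive: boolean): number {
  // Opt-in only, and hard-disabled for Italian as documented.
  if (!aggressive || language === "it") return 0;
  // Find the first soft break that leaves at least 40 characters before the cut.
  for (let i = MIN_FIRST_CLAUSE_CHARS - 1; i < buffer.length; i++) {
    if (SOFT_BREAKS.has(buffer.charAt(i))) return i + 1;
  }
  return 0; // no qualifying break yet: wait for more text or hard punctuation
}
```

In this sketch, commas before the 40-character mark are skipped. A reply such as "Yes, of course, I can book that appointment for you, just a moment." is therefore cut after "for you," rather than sending "Yes," to TTS on its own.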
Validation Rules
The phone.agent() method validates:
- Engine / pipeline: exactly one of engine or (stt + tts) must resolve correctly.
- Tools: must be an array. Each tool requires a name field and either a webhookUrl or a handler.
- Variables: must be a plain object (not an array).
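The three rules above can be mirrored in a few lines. This is a hypothetical re-implementation for illustration only. validateAgentConfig and AgentConfigDraft are not Patter exports, and the real phone.agent() may word its errors differently or apply extra checks. The sketch makes two assumptions. First, omitting engine, stt, and tts is valid because the default engine applies. Second, only the full engine plus stt + tts combination is treated as a conflict.

```typescript
// Hypothetical, simplified config shape; not Patter's AgentOptions type.
interface AgentConfigDraft {
  systemPrompt: string;
  engine?: object;
  stt?: object;
  tts?: object;
  tools?: unknown;
  variables?: unknown;
}

// Returns human-readable problems; an empty array means the draft passes.
function validateAgentConfig(cfg: AgentConfigDraft): string[] {
  const errors: string[] = [];

  // Rule 1: engine and the stt + tts pair are alternatives.
  const hasStt = cfg.stt !== undefined;
  const hasTts = cfg.tts !== undefined;
  if (cfg.engine !== undefined && hasStt && hasTts) {
    errors.push("pass either engine or stt + tts, not both");
  } else if (cfg.engine === undefined && hasStt !== hasTts) {
    errors.push("pipeline mode needs both stt and tts");
  }
  // Assumption: omitting all three is fine because the default engine applies.

  // Rule 2: tools must be an array; each entry needs a name plus webhookUrl or handler.
  const tools = cfg.tools;
  if (tools !== undefined) {
    if (!Array.isArray(tools)) {
      errors.push("tools must be an array");
    } else {
      tools.forEach((tool: unknown, i: number) => {
        const t = (tool ?? {}) as Record<string, unknown>;
        if (typeof t.name !== "string" || t.name === "") {
          errors.push(`tools[${i}] is missing a name`);
        }
        if (typeof t.webhookUrl !== "string" && typeof t.handler !== "function") {
          errors.push(`tools[${i}] needs a webhookUrl or a handler`);
        }
      });
    }
  }

  // Rule 3: variables must be a plain object (not an array, not null).
  const vars = cfg.variables;
  if (vars !== undefined && (typeof vars !== "object" || vars === null || Array.isArray(vars))) {
    errors.push("variables must be a plain object");
  }
  return errors;
}
```

Collecting every problem into a list, instead of throwing on the first one, lets a caller report all mistakes in a config at once.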
Dynamic Variables
Use {placeholder} syntax in your system prompt. At call time, each placeholder is replaced with the matching entry from the variables option.
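As a rough illustration of the substitution step, the hypothetical renderPrompt helper below replaces each {key} with the matching value. The sketch makes two assumptions that the docs do not specify. Placeholder names are word characters (letters, digits, underscore). Placeholders with no matching variable are left untouched. Patter's handling of escaping and missing keys may differ.

```typescript
// Hypothetical sketch of {placeholder} substitution; not Patter's implementation.
function renderPrompt(template: string, variables: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string): string => {
    // Only own keys count, so names like "toString" are not picked up from the prototype.
    const value = Object.prototype.hasOwnProperty.call(variables, key) ? variables[key] : undefined;
    // Assumption: unknown placeholders stay as written.
    return value ?? match;
  });
}
```

For example, renderPrompt("You are the receptionist for {business}. The caller is {name}.", { business: "Acme Dental", name: "Maria" }) produces "You are the receptionist for Acme Dental. The caller is Maria."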
System Tools
Two system tools are automatically injected into every agent:
- transfer_call: transfers the call to a specified phone number (E.164 format).
- end_call: ends the current call with an optional reason.
You do not need to declare them in your tools array. The AI model can invoke them based on conversation context.
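The sketch below shows, hypothetically, how an SDK could append the two system tools and check the E.164 requirement for transfer_call. The schemas and helper names (ToolSpec, withSystemTools, checkTransferTarget) are illustrative and not Patter's internals. The argument names phoneNumber and reason are likewise assumptions. The regular expression encodes the E.164 shape: a "+", a country code that does not start with 0, and at most 15 digits in total.

```typescript
// Hypothetical tool shape; Patter's ToolDefinition may differ.
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

// E.164: "+", first digit 1-9, then up to 14 more digits (15 digits max).
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(phone: string): boolean {
  return E164_PATTERN.test(phone);
}

// Illustrative schemas; the argument names are assumptions.
const SYSTEM_TOOLS: readonly ToolSpec[] = [
  {
    name: "transfer_call",
    description: "Transfer the call to a phone number in E.164 format.",
    parameters: {
      type: "object",
      properties: { phoneNumber: { type: "string" } },
      required: ["phoneNumber"],
    },
  },
  {
    name: "end_call",
    description: "End the current call, optionally giving a reason.",
    parameters: { type: "object", properties: { reason: { type: "string" } } },
  },
];

// Appends the system tools after the user's tools.
// This sketch does not handle a user tool that reuses a system tool's name.
function withSystemTools(userTools: readonly ToolSpec[]): ToolSpec[] {
  return [...userTools, ...SYSTEM_TOOLS];
}

// Hypothetical guard a transfer handler might run before dialing out.
function checkTransferTarget(args: { phoneNumber?: unknown }): string {
  if (typeof args.phoneNumber !== "string" || !isE164(args.phoneNumber)) {
    throw new Error("transfer_call expects an E.164 number such as +14155550123");
  }
  return args.phoneNumber;
}
```

Validating the number before dialing means a malformed target the model produced can be rejected and reported back to it, instead of failing at the telephony provider.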
Voice Activity Detection (VAD)
Pipeline-mode agents can plug a VAD provider into the vad option to gate STT so it runs only around detected speech, and to drive barge-in detection. The SDK ships Silero VAD (an ONNX model of about 1 MB) with a telephony-tuned factory, SileroVAD.forPhoneCall().
SileroVAD.forPhoneCall(options?) is identical to SileroVAD.load(...) but pins sampleRate to 16 000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). All other parameters use the upstream snakers4/silero-vad defaults:
| Field | Default | Upstream equivalent |
|---|---|---|
activationThreshold | 0.5 | threshold |
deactivationThreshold | 0.35 | neg_threshold = threshold − 0.15 |
minSpeechDuration | 0.25 s | min_speech_duration_ms = 250 |
minSilenceDuration | 0.1 s | min_silence_duration_ms = 100 |
prefixPaddingDuration | 0.03 s | speech_pad_ms = 30 |
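The paired thresholds and durations in the table form a hysteresis loop:
- Speech starts only after the per-frame probability stays at or above activationThreshold for minSpeechDuration.
- Speech ends only after the probability stays below deactivationThreshold for minSilenceDuration.

The sketch below is a simplified, hypothetical model of that loop, not the SDK's code. It makes two assumptions. Frames are 32 ms (512 samples at 16 kHz). prefixPaddingDuration is applied by moving each segment's start earlier by that amount.

```typescript
// Hypothetical model of the VAD hysteresis; not the SDK's implementation.
interface VadParams {
  activationThreshold: number;
  deactivationThreshold: number;
  minSpeechDuration: number; // seconds
  minSilenceDuration: number; // seconds
  prefixPaddingDuration: number; // seconds
}

// Defaults copied from the table above.
const DEFAULT_VAD_PARAMS: VadParams = {
  activationThreshold: 0.5,
  deactivationThreshold: 0.35,
  minSpeechDuration: 0.25,
  minSilenceDuration: 0.1,
  prefixPaddingDuration: 0.03,
};

interface SpeechSegment {
  startMs: number;
  endMs: number;
}

// probs: one speech probability per frame; frameMs = 32 assumes 512-sample frames at 16 kHz.
function segmentSpeech(
  probs: number[],
  frameMs = 32,
  p: VadParams = DEFAULT_VAD_PARAMS,
): SpeechSegment[] {
  const minSpeechMs = Math.round(p.minSpeechDuration * 1000);
  const minSilenceMs = Math.round(p.minSilenceDuration * 1000);
  const padMs = Math.round(p.prefixPaddingDuration * 1000);
  const segments: SpeechSegment[] = [];
  let inSpeech = false;
  let onsetMs = -1; // start of the candidate (or confirmed) speech run
  let silenceMs = -1; // start of the candidate end-of-speech silence

  probs.forEach((prob, i) => {
    const t = i * frameMs;
    if (!inSpeech) {
      if (prob >= p.activationThreshold) {
        if (onsetMs < 0) onsetMs = t;
        // Confirm speech once the run has lasted minSpeechDuration.
        if (t + frameMs - onsetMs >= minSpeechMs) inSpeech = true;
      } else {
        onsetMs = -1; // run broken before it was long enough
      }
    } else if (prob < p.deactivationThreshold) {
      if (silenceMs < 0) silenceMs = t;
      // End speech once the silence has lasted minSilenceDuration.
      if (t + frameMs - silenceMs >= minSilenceMs) {
        segments.push({ startMs: Math.max(0, onsetMs - padMs), endMs: silenceMs });
        inSpeech = false;
        onsetMs = -1;
        silenceMs = -1;
      }
    } else {
      silenceMs = -1; // back above the lower threshold: still speaking
    }
  });

  // Close a segment that is still open when the input ends.
  if (inSpeech) {
    segments.push({ startMs: Math.max(0, onsetMs - padMs), endMs: probs.length * frameMs });
  }
  return segments;
}
```

Try this on a trace of 10 speech frames, a 320 ms pause, 10 more speech frames, and trailing silence. With the defaults, the sketch returns two segments. With minSilenceDuration raised to 0.5 s, the pause no longer ends the segment, and it returns one.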
To keep brief pauses from splitting a caller's turn into separate speech segments, raise minSilenceDuration to 0.5–1.0 s.
SileroVAD.forPhoneCall() returns a Promise<SileroVAD> — await it once at process startup before constructing your agent. The underlying ONNX session is reused across calls.
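The 8 kHz μ-law to 16 kHz PCM step mentioned above can be sketched in two stages:
1. Decode each G.711 μ-law byte to a 16-bit linear sample.
2. Double the sample rate.

The decoder follows the standard G.711 expansion. The linear-interpolation upsampler is a deliberately simple stand-in, because the SDK's actual resampler is not described here. A production resampler would normally apply a proper low-pass filter. mulawToLinear, upsample2x, and mulaw8kToPcm16k are hypothetical names.

```typescript
// G.711 mu-law expansion to a 16-bit linear sample (classic g711.c algorithm).
const MULAW_BIAS = 0x84;

function mulawToLinear(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const mantissa = u & 0x0f;
  const exponent = (u & 0x70) >> 4;
  const magnitude = ((mantissa << 3) + MULAW_BIAS) << exponent;
  return u & 0x80 ? MULAW_BIAS - magnitude : magnitude - MULAW_BIAS;
}

// Doubles the sample rate by inserting the midpoint between neighbours
// (linear interpolation; a simple stand-in for a filtered resampler).
function upsample2x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 2);
  for (let i = 0; i < input.length; i++) {
    const cur = input[i];
    const next = i + 1 < input.length ? input[i + 1] : cur; // repeat the final sample
    out[2 * i] = cur;
    out[2 * i + 1] = (cur + next) >> 1;
  }
  return out;
}

// Hypothetical helper: 8 kHz mu-law bytes in, 16 kHz 16-bit PCM samples out.
function mulaw8kToPcm16k(payload: Uint8Array): Int16Array {
  const pcm8k = new Int16Array(payload.length);
  payload.forEach((byte, i) => {
    pcm8k[i] = mulawToLinear(byte);
  });
  return upsample2x(pcm8k);
}
```

A 20 ms frame at 8 kHz is 160 μ-law bytes, which this sketch turns into 320 samples at 16 kHz.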
