Documentation Index
Fetch the complete documentation index at: https://patter-06b046ce-feat-observability-otel-attrs-0-6-1.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Advanced Features
Patter includes production-ready features for handling real-world telephony scenarios.Call Recording
Enable recording on a per-call basis by passingrecording=True to serve(). Recordings are created via the Twilio Recordings API.
Call recording is available in local mode with Twilio. Recordings are managed through the Twilio Recordings API and stored in your Twilio account.
Accessing Recordings
Use the Twilio API to list and download recordings:Answering Machine Detection (AMD)
Detect whether a human or machine answered an outbound call. When a machine is detected, optionally leave a voicemail message and hang up.| Parameter | Type | Default | Description |
|---|---|---|---|
machine_detection | bool | False | Enable answering machine detection. |
voicemail_message | str | "" | Message to speak when a machine is detected. If empty, the call hangs up silently. |
How It Works
- Patter initiates the outbound call with AMD enabled.
- The telephony provider analyzes the audio to determine if a human or machine answered.
- Human detected: The call proceeds normally with the agent.
- Machine detected: If
voicemail_messageis set, it is spoken and the call ends. Otherwise, the call is disconnected.
voicemail_message on serve() for use with any outbound call made during the server’s lifetime:
DTMF Input
Keypad presses (DTMF tones) during a call are captured and forwarded to the AI agent as natural language text in the format[DTMF: N], where N is the key pressed (0-9, *, #).
Call Transfer
Patter automatically injects atransfer_call system tool into every agent. The AI decides when to transfer based on the conversation context and system prompt instructions.
You do not need to define
transfer_call as a tool. It is injected automatically by Patter.Barge-In (Interruption Handling)
Patter uses mark-based tracking for precise interruption handling. When a caller speaks while the agent is talking, the system:- Detects the interruption via audio marks sent by the telephony provider.
- Stops the current TTS playback at the exact point of interruption.
- Processes the caller’s new input immediately.
Configuration
Barge-in is enabled by default with a 300 ms hang-over window. Customize the sensitivity usingbarge_in_threshold_ms:
| Parameter | Type | Default | Description |
|---|---|---|---|
barge_in_threshold_ms | int | 300 | Hang-over window in milliseconds. Set to 0 to disable barge-in. Higher values delay interruption detection. |
Echo Cancellation (NLMS AEC)
On speakerphone or dev-tunnel deployments the agent’s outbound TTS bleeds back into the inbound mic feed. The pipeline-mode VAD then sees continuous voice-like energy and never registers silence — barge-in only fires during natural pauses in the TTS, producing the intermittent “interrupt sometimes works, other times the agent keeps talking” symptom. Acoustic echo cancellation (AEC) subtracts the estimated echo from the mic stream before VAD/STT see it. Patter ships a built-in NLMS (normalised least-mean-squares) adaptive filter with Geigel double-talk detection. Enable it with one flag — pipeline mode only:| Parameter | Type | Default | Description |
|---|---|---|---|
echo_cancellation | bool | False | When True (pipeline mode only), instantiates an NlmsEchoCanceller per call that subtracts the agent’s own TTS bleed from the inbound mic stream before VAD/STT see it. |
When to enable
- Enable for speakerphone callers, ngrok / Cloudflare tunnel demos, laptop-mic test harnesses, and any deployment where the agent can hear itself.
- Leave off for handset / headset callers — there is no bleed to cancel, and the 0.5–2 s convergence period would briefly attenuate caller speech if they spoke before any TTS played.
- See Barge-In above — AEC is the fix when barge-in only fires intermittently because of self-bleed.
Tuning
The defaultNlmsEchoCanceller is tuned for narrowband mono 16 kHz PCM (the format Patter’s pipeline pushes between transcoding and STT). For lower-level control — custom tap counts, step size, warmup behaviour — instantiate one directly and wire it into your pipeline:
| Constructor arg | Default | Notes |
|---|---|---|
sample_rate | 16000 | 8000 or 16000 only. |
filter_taps | 512 | 32 ms @ 16 kHz — covers typical cellular / VoIP echo paths. |
step_size | 0.1 | NLMS step in (0, 1] post-warmup. |
warmup_step_size | 0.5 | Aggressive 5× ramp during the first ~0.5 s for fast convergence. |
warmup_seconds | 0.5 | Duration of the warmup phase. |
leakage | 0.9999 | Slow forgetting of stale tap estimates. |
double_talk_rho | 0.6 | Geigel threshold — freezes adaptation when caller speaks over agent. |
NLMS AEC adds CPU work proportional to
filter_taps × frame_samples per inbound frame (~0.5–1 ms per 20 ms frame at the defaults). On commodity CPUs this is well under the per-frame budget, but profile if you stack AEC with heavy VAD + STT in the same event loop.Aggressive First-Flush (Low-Latency)
In pipeline mode, the sentence chunker normally waits for a hard sentence terminator (., !, ?, etc.) before emitting a chunk to TTS. With aggressive_first_flush=True on phone.agent(...), the chunker emits the first clause of each response on a soft punctuation boundary (,, em-dash —, en-dash –) once the buffer reaches ~40 characters.
Sentence chunker — abbreviations & terminators
The chunker does not split on common abbreviations (no spurious sentence breaks afterDr., vs., etc.). Coverage:
- English:
Mr,Mrs,Ms,Dr,St,Jr,Sr,Prof,Hon,Rev,vs,etc,Gen,Sen, plus the standard month/measurement set. - Italian:
Sig,Sig.ra,Sgr,Dott,Dott.ssa,Prof,Avv,Ing,Geom,Rag,Arch,On,Egr,Spett,Gent,Ill, plus business/legal abbreviations likeS.p.A.,S.r.l.,S.a.s.,ecc. - Multilingual sentence terminators: Latin (
. ! ?), Western ellipsis (…), CJK (。 ! ? 。 . ;), Hindi/Devanagari (। ॥), Arabic (؟ ؛ ۔ ؏), Armenian (։ ՜ ՞), Ethiopic (։ ፧), Khmer (។ ៕), Burmese (။), Tibetan (༎ ༏).
SentenceChunker constructor accepts an optional language= argument (BCP-47 code) — Patter forwards agent.language automatically, but you can construct one directly with the language you want when wiring the chunker manually:
Phone Preamble (System Prompt Wrapper)
By default, Patter prepends a phone-friendly preamble to every agent’ssystem_prompt before sending it to the LLM. The preamble instructs the model to:
- Avoid markdown, emojis, bullet lists, and code blocks.
- Spell out numbers and dates (e.g., “two thousand twenty-six”, not
2026). - Keep replies short — phone calls reward brevity over completeness.
| Parameter | Type | Default | Description |
|---|---|---|---|
disable_phone_preamble | bool | False | When True, ship system_prompt verbatim to the LLM. When False (default), prepend the phone-friendly preamble. |
AI Disclosure
Many jurisdictions require disclosure that the caller is speaking with an AI. Patter does not automatically inject a disclosure message. Instead, use thefirst_message field on your agent configuration to include an appropriate disclosure at the start of every call:
Conversation History
All callbacks receive the full conversation history asdata.history. Each entry includes the speaker role, text content, and timestamp:
History Entry Format
on_transcript, on_message, and on_call_end callbacks.

