Advanced Features

Patter includes production-ready features for handling real-world telephony scenarios.

Call Recording

Enable recording on a per-call basis by passing recording=True to serve():
await phone.serve(agent, port=8000, recording=True)
Call recording is available in local mode with Twilio. Recordings are managed through the Twilio Recordings API and stored in your Twilio account.

Accessing Recordings

Use the Twilio API to list and download recordings:
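import os

from twilio.rest import Client

# Credentials are read from the environment (standard Twilio variable names).
client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])

# List the 20 most recent recordings on the account.
for recording in client.recordings.list(limit=20):
    print(recording.sid, recording.duration)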
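
To download the audio itself, Twilio serves the media at the recording's resource URI with the .json suffix swapped for an audio extension. The sketch below assumes the URI has the shape /2010-04-01/Accounts/{AccountSid}/Recordings/{RecordingSid}.json; the helper name recording_media_url is hypothetical and the SIDs are placeholders.

```python
def recording_media_url(uri: str, fmt: str = "mp3") -> str:
    """Build the media download URL from a Twilio recording resource URI."""
    if fmt not in ("mp3", "wav"):
        raise ValueError("fmt must be 'mp3' or 'wav'")
    # Strip the trailing ".json" that Twilio appends to resource URIs.
    base = uri[: -len(".json")] if uri.endswith(".json") else uri
    return f"https://api.twilio.com{base}.{fmt}"


# Placeholder SIDs, for illustration only.
example_uri = "/2010-04-01/Accounts/ACXXXXXXXX/Recordings/REXXXXXXXX.json"
print(recording_media_url(example_uri))
```

Depending on your account settings, fetching that URL may require HTTP basic authentication with the same account SID and auth token.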

Answering Machine Detection (AMD)

Detect whether a human or machine answered an outbound call. When a machine is detected, optionally leave a voicemail message and hang up.
# Enable AMD on outbound calls
await phone.call(
    to="+15550009876",
    agent=agent,
    machine_detection=True,
    voicemail_message="Hi, this is Acme Corp calling about your appointment. Please call us back at 555-000-1234.",
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| machine_detection | bool | False | Enable answering machine detection. |
| voicemail_message | str | "" | Message to speak when a machine is detected. If empty, the call hangs up silently. |

How It Works

  1. Patter initiates the outbound call with AMD enabled.
  2. The telephony provider analyzes the audio to determine if a human or machine answered.
  3. Human detected: The call proceeds normally with the agent.
  4. Machine detected: If voicemail_message is set, it is spoken and the call ends. Otherwise, the call is disconnected.
You can also set voicemail_message on serve() for use with any outbound call made during the server’s lifetime:
await phone.serve(
    agent,
    port=8000,
    voicemail_message="Hi, please call us back at your earliest convenience.",
)
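
The human/machine branching described in the steps above can be sketched as a small decision function. This is an illustration, not Patter's internal code: the function name next_action is hypothetical, and the answered_by strings assume Twilio-style AMD results (human, machine_start, machine_end_beep, fax, unknown). Treating unknown as a human answer is a design choice made here, not something this page specifies.

```python
def next_action(answered_by: str, voicemail_message: str = "") -> str:
    """Return 'agent', 'voicemail', or 'hangup' for an AMD result."""
    is_machine = answered_by.startswith("machine") or answered_by == "fax"
    if not is_machine:
        # Human (or undetermined) answer: continue with the agent.
        return "agent"
    # Machine: speak the voicemail if one is configured, otherwise disconnect.
    return "voicemail" if voicemail_message else "hangup"


print(next_action("human"))
print(next_action("machine_end_beep", "Please call us back."))
print(next_action("machine_start"))
```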

DTMF Input

Keypad presses (DTMF tones) during a call are captured and forwarded to the AI agent as natural language text in the format [DTMF: N], where N is the key pressed (0-9, *, #).
User presses: 1
Agent receives: "[DTMF: 1]"

User presses: #
Agent receives: "[DTMF: #]"
No configuration is required. DTMF input is automatically handled and sent to the AI as part of the conversation transcript. The AI can interpret keypresses based on its system prompt:
agent = phone.agent(
    system_prompt="""You are an automated phone menu.
When the caller presses 1, transfer to sales.
When the caller presses 2, transfer to support.
When the caller presses 0, transfer to a human operator.""",
)
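
If your own code needs to act on keypresses deterministically (for example, when scanning transcript text in a callback), the [DTMF: N] markers can be pulled out with a regular expression. A minimal sketch, assuming the marker appears exactly in the format shown above; extract_dtmf is a hypothetical helper, not part of Patter.

```python
import re

# Matches the documented marker format: [DTMF: N] with N in 0-9, *, #.
DTMF_PATTERN = re.compile(r"\[DTMF: ([0-9*#])\]")


def extract_dtmf(text: str) -> list[str]:
    """Return the keys pressed, in order, from a piece of transcript text."""
    return DTMF_PATTERN.findall(text)


print(extract_dtmf("[DTMF: 1]"))
print(extract_dtmf("My code is [DTMF: 4][DTMF: 2][DTMF: #]"))
```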

Call Transfer

Patter automatically injects a transfer_call system tool into every agent. The AI decides when to transfer based on the conversation context and system prompt instructions.
agent = phone.agent(
    system_prompt="""You are a front desk receptionist.
If the caller asks about billing, transfer them to +15550001111.
If the caller asks about technical support, transfer them to +15550002222.
If the caller asks to speak to a manager, transfer them to +15550003333.""",
)
The transfer is executed via the Twilio API as a redirect. The caller hears hold music briefly while the transfer completes.
You do not need to define transfer_call as a tool. It is injected automatically by Patter.
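
Because transfer targets live in the system prompt, it can help to generate that prompt from a single routing table rather than editing numbers by hand. The sketch below is one way to do that; build_transfer_prompt and the ROUTES mapping are hypothetical names used for illustration.

```python
ROUTES = {
    "billing": "+15550001111",
    "technical support": "+15550002222",
    "management escalations": "+15550003333",
}


def build_transfer_prompt(role: str, routes: dict[str, str]) -> str:
    """Render one transfer instruction per route, after a role line."""
    lines = [f"You are {role}."]
    for topic, number in routes.items():
        lines.append(f"If the caller asks about {topic}, transfer them to {number}.")
    return "\n".join(lines)


print(build_transfer_prompt("a front desk receptionist", ROUTES))
```

The resulting string can be passed as system_prompt to phone.agent(...).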

Barge-In (Interruption Handling)

Patter uses mark-based tracking for precise interruption handling. When a caller speaks while the agent is talking, the system:
  1. Detects the interruption via audio marks sent by the telephony provider.
  2. Stops the current TTS playback at the exact point of interruption.
  3. Processes the caller’s new input immediately.
This creates a natural conversational experience where the caller can interrupt the agent mid-sentence, just like a real phone call.

Configuration

Barge-in is enabled by default with a 300 ms hang-over window. Customize the sensitivity using barge_in_threshold_ms:
agent = phone.agent(
    system_prompt="...",
    barge_in_threshold_ms=0,  # 0 disables barge-in; the default is 300
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| barge_in_threshold_ms | int | 300 | Hang-over window in milliseconds. Set to 0 to disable barge-in. Higher values delay interruption detection. |
A hang-over window of 300 ms prevents false positives from background noise while remaining responsive to genuine interruptions.
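
To see why a longer window filters out noise but delays detection, here is one possible reading of the hang-over window: an interruption fires only after speech has persisted for the full threshold. This is an illustrative sketch, not Patter's implementation; the class name BargeInGate and the 20 ms frame size are assumptions.

```python
class BargeInGate:
    """Fire an interruption only after speech persists for threshold_ms."""

    def __init__(self, threshold_ms: int = 300, frame_ms: int = 20) -> None:
        self.threshold_ms = threshold_ms
        self.frame_ms = frame_ms
        self._speech_ms = 0

    def push(self, is_speech: bool) -> bool:
        """Feed one per-frame VAD decision; return True when barge-in should fire."""
        if self.threshold_ms <= 0:
            return False  # 0 disables barge-in, as documented above
        # Any non-speech frame resets the run, so short noise bursts are ignored.
        self._speech_ms = self._speech_ms + self.frame_ms if is_speech else 0
        return self._speech_ms >= self.threshold_ms


gate = BargeInGate(threshold_ms=300, frame_ms=20)
decisions = [gate.push(True) for _ in range(15)]
print(decisions.index(True) + 1, "frames of speech before barge-in fires")
```

With 20 ms frames and a 300 ms threshold, the gate needs 15 consecutive speech frames, so a 100 ms burst of background noise never triggers it.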

Echo Cancellation (NLMS AEC)

On speakerphone or dev-tunnel deployments, the agent’s outbound TTS bleeds back into the inbound mic feed. The pipeline-mode VAD then sees continuous voice-like energy and never registers silence, so barge-in fires only during natural pauses in the TTS. The symptom is intermittent: interruption sometimes works, and other times the agent keeps talking.

Acoustic echo cancellation (AEC) subtracts the estimated echo from the mic stream before VAD/STT see it. Patter ships a built-in NLMS (normalised least-mean-squares) adaptive filter with Geigel double-talk detection. Enable it with one flag (pipeline mode only):
from getpatter import DeepgramSTT, AnthropicLLM, ElevenLabsTTS

agent = phone.agent(
    stt=DeepgramSTT(),
    llm=AnthropicLLM(),
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
    echo_cancellation=True,
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| echo_cancellation | bool | False | When True (pipeline mode only), instantiates an NlmsEchoCanceller per call that subtracts the agent’s own TTS bleed from the inbound mic stream before VAD/STT see it. |
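
For intuition about what an NLMS canceller with a Geigel check does, the toy below adapts an 8-tap filter against a simulated echo path. It is a teaching sketch, not Patter's NlmsEchoCanceller: the class name TinyNlms, the tap count, the step size, and the synthetic echo (reference delayed by 2 samples at half gain) are all assumptions chosen to keep the example small.

```python
import random


class TinyNlms:
    """Time-domain NLMS echo canceller with a Geigel double-talk check."""

    def __init__(self, taps: int = 8, step: float = 0.5, rho: float = 0.6) -> None:
        self.w = [0.0] * taps  # estimated echo-path impulse response
        self.x = [0.0] * taps  # recent reference (TTS) samples, newest first
        self.step = step
        self.rho = rho

    def process(self, ref: float, mic: float) -> float:
        """Return the mic sample with the estimated echo removed."""
        self.x = [ref] + self.x[:-1]
        echo_estimate = sum(w * x for w, x in zip(self.w, self.x))
        error = mic - echo_estimate
        # Geigel: a mic sample louder than rho * recent reference peak
        # suggests near-end speech, so adaptation is frozen for this sample.
        double_talk = abs(mic) > self.rho * max(abs(x) for x in self.x)
        if not double_talk:
            norm = sum(x * x for x in self.x) + 1e-9
            gain = self.step * error / norm
            self.w = [w + gain * x for w, x in zip(self.w, self.x)]
        return error


rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Simulated echo path: the mic hears the reference 2 samples late at half gain.
mic = [0.0, 0.0] + [0.5 * s for s in ref[:-2]]

aec = TinyNlms()
residual = [aec.process(r, m) for r, m in zip(ref, mic)]

mic_energy = sum(m * m for m in mic[-200:])
residual_energy = sum(e * e for e in residual[-200:])
print(f"mic energy {mic_energy:.3f}, residual energy {residual_energy:.3e}")
```

Once the filter has converged, the residual energy over the last 200 samples is a tiny fraction of the raw mic energy, which is what lets a downstream VAD see silence again while the agent is talking.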

When to enable

  • Enable for speakerphone callers, ngrok / Cloudflare tunnel demos, laptop-mic test harnesses, and any deployment where the agent can hear itself.
  • Leave off for handset / headset callers — there is no bleed to cancel, and the 0.5–2 s convergence period would briefly attenuate caller speech if they spoke before any TTS played.
  • See Barge-In above — AEC is the fix when barge-in only fires intermittently because of self-bleed.

Tuning

The default NlmsEchoCanceller is tuned for mono 16 kHz PCM (the format Patter’s pipeline pushes between transcoding and STT). For lower-level control (custom tap counts, step size, warmup behaviour), instantiate one directly and wire it into your pipeline:
from getpatter.audio.aec import NlmsEchoCanceller

# 8 kHz callers benefit from a longer filter window
aec = NlmsEchoCanceller(sample_rate=8000, filter_taps=1024)
| Constructor arg | Default | Notes |
| --- | --- | --- |
| sample_rate | 16000 | 8000 or 16000 only. |
| filter_taps | 512 | 32 ms @ 16 kHz; covers typical cellular / VoIP echo paths. |
| step_size | 0.1 | NLMS step in (0, 1] post-warmup. |
| warmup_step_size | 0.5 | Aggressive 5× ramp during the first ~0.5 s for fast convergence. |
| warmup_seconds | 0.5 | Duration of the warmup phase. |
| leakage | 0.9999 | Slow forgetting of stale tap estimates. |
| double_talk_rho | 0.6 | Geigel threshold; freezes adaptation when the caller speaks over the agent. |
NLMS AEC adds CPU work proportional to filter_taps × frame_samples per inbound frame (~0.5–1 ms per 20 ms frame at the defaults). On commodity CPUs this is well under the per-frame budget, but profile if you stack AEC with heavy VAD + STT in the same event loop.
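
The figures quoted in this section follow from simple arithmetic, reproduced below. The helper names are hypothetical, and the 20 ms frame length is taken from the note above. The product is the proportionality term only, not an exact operation count.

```python
def echo_tail_ms(filter_taps: int, sample_rate: int) -> float:
    """Length of echo path the filter can model, in milliseconds."""
    return 1000.0 * filter_taps / sample_rate


def tap_sample_products(filter_taps: int, sample_rate: int, frame_ms: int = 20) -> int:
    """filter_taps x frame_samples for one inbound frame."""
    frame_samples = sample_rate * frame_ms // 1000
    return filter_taps * frame_samples


print(echo_tail_ms(512, 16000))         # 32.0
print(echo_tail_ms(1024, 8000))         # 128.0
print(tap_sample_products(512, 16000))  # 163840
```
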
This is a lightweight time-domain AEC, not a drop-in replacement for production-grade DSP (WebRTC’s AEC3, Speex AEC). For tight integration with battle-tested DSP, wrap a binding to libwebrtc-audio-processing externally and feed it via audio_filter= instead.

Aggressive First-Flush (Low-Latency)

In pipeline mode, the sentence chunker normally waits for a hard sentence terminator (., !, ?, etc.) before emitting a chunk to TTS. With aggressive_first_flush=True on phone.agent(...), the chunker emits the first clause of each response on a soft punctuation boundary (comma, em-dash, or en-dash) once the buffer reaches ~40 characters.
agent = phone.agent(
    stt=DeepgramSTT(),
    llm=AnthropicLLM(),
    tts=ElevenLabsTTS(voice_id="rachel"),
    system_prompt="You are a helpful assistant.",
    aggressive_first_flush=True,
)
Trade-off: Saves 200–500 ms of time-to-first-audio (TTFA) on the first sentence of each turn, at the cost of slightly clipped prosody on the very first chunk.
aggressive_first_flush is hard-disabled when language starts with "it" (Italian). Italian uses the comma as a decimal separator (12,5), so an aggressive flush would split mid-number. The flag silently has no effect for Italian agents.
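
The flush rule can be sketched as a function that finds where the first chunk ends. This is one interpretation of the behaviour described above, not Patter's chunker: first_chunk_end is a hypothetical name, the 40-character threshold is applied to the position of the soft boundary, and abbreviation handling is left out.

```python
HARD = ".!?"
SOFT = ",\u2014\u2013"  # comma, em-dash, en-dash
MIN_SOFT_CHARS = 40     # approximate threshold quoted above


def first_chunk_end(buffer: str, aggressive: bool, language: str = "en") -> int:
    """Return the index just past the first chunk, or -1 if none is ready."""
    if language.lower().startswith("it"):
        aggressive = False  # mirrors the documented Italian opt-out
    for i, ch in enumerate(buffer):
        if ch in HARD:
            return i + 1
        if aggressive and ch in SOFT and i + 1 >= MIN_SOFT_CHARS:
            return i + 1
    return -1


text = "Sure, I can help you reschedule that appointment, let me check the calendar. One moment."
print(repr(text[: first_chunk_end(text, aggressive=True)]))
print(repr(text[: first_chunk_end(text, aggressive=False)]))
```

With the flag on, the first chunk ends at the second comma (the first comma falls before the 40-character mark); with it off, the chunk runs to the first full stop.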

Sentence chunker — abbreviations & terminators

The chunker does not split on common abbreviations (no spurious sentence breaks after Dr., vs., etc.). Coverage:
  • English: Mr, Mrs, Ms, Dr, St, Jr, Sr, Prof, Hon, Rev, vs, etc, Gen, Sen, plus the standard month/measurement set.
  • Italian: Sig, Sig.ra, Sgr, Dott, Dott.ssa, Prof, Avv, Ing, Geom, Rag, Arch, On, Egr, Spett, Gent, Ill, plus business/legal abbreviations like S.p.A., S.r.l., S.a.s., ecc.
  • Multilingual sentence terminators: Latin (. ! ?), Western ellipsis (…), CJK (。 ! ? 。 . ;), Hindi/Devanagari (। ॥), Arabic (؟ ؛ ۔ ؏), Armenian (։ ՜ ՞), Ethiopic (։ ፧), Khmer (។ ៕), Burmese (), Tibetan (༎ ༏).
This means the chunker streams cleanly on multilingual responses without hand-tuning. The SentenceChunker constructor accepts an optional language= argument (BCP-47 code) — Patter forwards agent.language automatically, but you can construct one directly with the language you want when wiring the chunker manually:
from getpatter import SentenceChunker

chunker = SentenceChunker(language="it")    # uses Italian honorifics + terminators
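
To illustrate what abbreviation-aware splitting means in practice, here is a deliberately small splitter that skips periods following a known abbreviation. It is a sketch, not the SentenceChunker implementation: split_sentences is a hypothetical name, the abbreviation set is a short English subset, and only Latin terminators are handled.

```python
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "st", "jr", "sr", "prof", "vs", "etc"}
TERMINATORS = ".!?"


def split_sentences(text: str) -> list[str]:
    """Split on terminators, skipping periods that end a known abbreviation."""
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch not in TERMINATORS:
            continue
        if ch == ".":
            # The token before the period, e.g. "Dr" in "Dr."
            words = text[start:i].split()
            if words and words[-1].lower() in ABBREVIATIONS:
                continue
        sentences.append(text[start : i + 1].strip())
        start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith will see you now. Please wait."))
```

A simple lookup like this cannot tell when an abbreviation genuinely ends a sentence (for example, a reply ending in "etc."), which is one reason a production chunker needs more context than a word list.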

Phone Preamble (System Prompt Wrapper)

By default, Patter prepends a phone-friendly preamble to every agent’s system_prompt before sending it to the LLM. The preamble instructs the model to:
  • Avoid markdown, emojis, bullet lists, and code blocks.
  • Spell out numbers and dates (e.g., “two thousand twenty-six”, not 2026).
  • Keep replies short — phone calls reward brevity over completeness.
Most callers benefit from this. If you ship a custom prompt that already encodes phone conventions — or you want to drive a non-voice LLM channel through the same agent — opt out:
agent = phone.agent(
    system_prompt="...",  # shipped to the LLM verbatim
    disable_phone_preamble=True,
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| disable_phone_preamble | bool | False | When True, ship system_prompt verbatim to the LLM. When False (default), prepend the phone-friendly preamble. |
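
Conceptually, the flag chooses between two prompt strings. The sketch below shows that composition; the preamble wording is a placeholder (the actual text Patter prepends is not reproduced on this page), and effective_prompt is a hypothetical helper.

```python
# Placeholder wording: the real preamble text is not shown in this page.
PHONE_PREAMBLE = (
    "You are speaking on a phone call. Avoid markdown, emojis, bullet lists, "
    "and code blocks. Spell out numbers and dates. Keep replies short."
)


def effective_prompt(system_prompt: str, disable_phone_preamble: bool = False) -> str:
    """Return the prompt the LLM would receive under each setting."""
    if disable_phone_preamble:
        return system_prompt  # shipped verbatim
    return f"{PHONE_PREAMBLE}\n\n{system_prompt}"


print(effective_prompt("You are a helpful assistant."))
```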

AI Disclosure

Many jurisdictions require disclosure that the caller is speaking with an AI. Patter does not automatically inject a disclosure message. Instead, use the first_message field on your agent configuration to include an appropriate disclosure at the start of every call:
agent = phone.agent(
    system_prompt="You are a helpful assistant.",
    first_message="Hi, this is an AI-powered assistant calling on behalf of Acme Corp. How can I help you?",
)
You are responsible for ensuring your AI disclosure complies with the regulations in your jurisdiction. Always consult legal counsel for compliance requirements.

Conversation History

Callbacks receive the full conversation history under the history key of the event payload. Each entry includes the speaker role, text content, and timestamp:
async def on_transcript(event):
    history = event.get("history", [])
    for entry in history:
        print(f"[{entry['timestamp']}] {entry['role']}: {entry['text']}")

History Entry Format

{
    "role": "user",       # or "assistant"
    "text": "Hello!",
    "timestamp": 1710489601.234  # Unix float from time.time()
}
History is available in on_transcript, on_message, and on_call_end callbacks.
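
Because timestamps are raw Unix floats, you will usually want to convert them before logging or storing a transcript. A minimal sketch using only the entry format shown above; format_history is a hypothetical helper, and UTC is chosen here for reproducible output.

```python
from datetime import datetime, timezone


def format_history(history: list[dict]) -> list[str]:
    """Render history entries as 'HH:MM:SS role: text' lines in UTC."""
    lines = []
    for entry in history:
        stamp = datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc)
        lines.append(f"{stamp:%H:%M:%S} {entry['role']}: {entry['text']}")
    return lines


sample = [
    {"role": "user", "text": "Hello!", "timestamp": 1710489601.234},
    {"role": "assistant", "text": "Hi, how can I help?", "timestamp": 1710489602.5},
]
print("\n".join(format_history(sample)))
```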

Complete Example

import os
import asyncio
from dotenv import load_dotenv
from getpatter import Patter, Twilio, OpenAIRealtime

load_dotenv()

phone = Patter(
    carrier=Twilio(),                               # TWILIO_* from env
    phone_number=os.environ["PHONE_NUMBER"],
    webhook_url=os.environ["WEBHOOK_URL"],
)

agent = phone.agent(
    engine=OpenAIRealtime(voice="nova"),            # OPENAI_API_KEY from env
    system_prompt="""You are an appointment reminder bot for Dr. Smith's office.

Behavior:
- Confirm the patient's identity by name and date of birth.
- Remind them of their upcoming appointment.
- If they want to reschedule, transfer to the front desk at +15550001111.
- If they press 1, confirm the appointment.
- If they press 2, cancel the appointment.

Be concise and professional.""",
    first_message="Hello! This is Dr. Smith's office calling with an appointment reminder.",
    variables={
        "patient_name": "Jane Doe",
        "appointment_date": "March 20th at 2:00 PM",
    },
)

async def on_call_start(event):
    print(f"Calling {event['callee']} for appointment reminder")

async def on_call_end(event):
    transcript = event["transcript"]
    # Save transcript to your database
    print(f"Call complete. {len(transcript)} messages exchanged.")

async def main():
    await phone.serve(
        agent,
        port=8000,
        recording=True,
        on_call_start=on_call_start,
        on_call_end=on_call_end,
    )

asyncio.run(main())