What Patter does

Patter is an open-source SDK that lets an AI agent answer and make phone calls. You write the agent logic — Patter handles the telephony plumbing (phone carrier, audio streams, speech recognition, speech synthesis, barge-in, call transfer, recording). A “hello world” is four lines:
from getpatter import Patter, Twilio, OpenAIRealtime

phone = Patter(carrier=Twilio(), phone_number="+15550001234")
agent = phone.agent(engine=OpenAIRealtime(), system_prompt="You are a friendly receptionist.")
await phone.serve(agent, tunnel=True)
Call your Twilio number; the AI picks up, talks to the caller, hangs up. Everything else — audio transcoding, webhook configuration, session lifecycle — is automatic.

Anatomy of a call

When someone dials your number, audio flows through four layers. Each layer is pluggable:
┌─────────────┐
│ Caller      │  "I'd like to book an appointment"
└──────┬──────┘
       │  (phone network, PSTN)
       ▼
┌─────────────┐
│ Carrier     │  Twilio or Telnyx: answers the call, opens a media stream
└──────┬──────┘
       │  (WebSocket, audio frames)
       ▼
┌─────────────┐
│ Patter SDK  │  Transcodes audio, runs barge-in detection, routes to the engine
└──────┬──────┘
       │  (PCM 16 kHz)
       ▼
┌─────────────┐
│ Engine      │  Speech → Text (STT) → Your LLM → Text → Speech (TTS)
└──────┬──────┘
       │  (audio back, same path in reverse)
       ▼
  Caller hears the AI reply
The five building blocks you’ll touch:

1. Carrier — the phone line

The carrier puts the call on the internet. Patter ships two:
| Carrier | Env vars | Notes |
| --- | --- | --- |
| Twilio() | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN | mulaw 8 kHz; webhook signature verification via HMAC-SHA1 |
| Telnyx() | TELNYX_API_KEY, TELNYX_CONNECTION_ID, TELNYX_PUBLIC_KEY | PCM 16 kHz native; Ed25519 webhook verification |
You pass one instance to Patter(carrier=...). See Carrier.
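
The HMAC-SHA1 check listed for Twilio can be sketched with the standard library alone. This is a minimal sketch of Twilio's published scheme for form-encoded webhooks (the full request URL, followed by each POST parameter name and value in sorted key order, signed with the auth token and base64-encoded). The function names are illustrative assumptions and not Patter's internals; the SDK performs this check for you.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the value expected in the X-Twilio-Signature header."""
    # Append each POST parameter name and value, sorted by name, to the URL.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_valid_request(auth_token: str, url: str, params: dict[str, str], header: str) -> bool:
    """Return True when the received header matches the locally computed signature."""
    expected = twilio_signature(auth_token, url, params)
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(expected.encode(), header.encode())
```

A request whose signature does not match should be rejected before any call logic runs, since anyone who discovers the webhook URL could otherwise post forged events to it.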

2. Engine vs pipeline — how the AI talks

Patter supports two ways to get from audio-in to audio-out.

Engine mode (easiest), where a single provider does everything:
from getpatter import OpenAIRealtime, ElevenLabsConvAI

agent = phone.agent(
    engine=OpenAIRealtime(),   # STT + LLM + TTS in one WebSocket — ~500 ms latency
    system_prompt="...",
)
| Engine | What it does | Best for |
| --- | --- | --- |
| OpenAIRealtime() | Speech-to-speech via OpenAI’s Realtime API | Lowest latency, general-purpose |
| ElevenLabsConvAI() | Managed conversational agent on ElevenLabs | Premium voice quality |
Pipeline mode (full control) — you pick each stage:
from getpatter import DeepgramSTT, AnthropicLLM, ElevenLabsTTS

agent = phone.agent(
    stt=DeepgramSTT(),           # speech → text
    llm=AnthropicLLM(),          # text → text (ANTHROPIC_API_KEY from env)
    tts=ElevenLabsTTS(),         # text → speech
    system_prompt="...",
)
await phone.serve(agent)
Want fully custom LLM logic (multi-model routing, local models, an internal gateway)? Drop llm= and pass an async on_message callback to serve() instead. llm= and on_message are mutually exclusive.

The pipeline has three independent stages. Swap any of them for a different vendor or a local model:
  • STT (Speech-to-Text) — transcribes caller audio in real time. Providers: DeepgramSTT, WhisperSTT, CartesiaSTT, SonioxSTT, SpeechmaticsSTT (Python-only), AssemblyAISTT. See STT.
  • LLM (Large Language Model) — generates the reply. Pass a class instance via llm=: OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, GoogleLLM. Tool calling works across all five. For anything else, use on_message. See LLM.
  • TTS (Text-to-Speech) — synthesizes the reply audio. Providers: ElevenLabsTTS, OpenAITTS, CartesiaTTS, RimeTTS, LMNTTTS. See TTS.
Pick engine mode when you want minimum code. Pick pipeline mode when you need a specific LLM, a custom voice, or fine-grained control over latency and cost.
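
To make the pipeline shape concrete, here is a minimal sketch of one conversational turn as three composed async stages. The stage signatures and stub functions are hypothetical and exist only for illustration; Patter's real provider classes have their own interfaces, and a production pipeline streams audio incrementally instead of handling one complete buffer per turn.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stage signatures, not Patter's actual provider interfaces.
STT = Callable[[bytes], Awaitable[str]]
LLM = Callable[[str], Awaitable[str]]
TTS = Callable[[str], Awaitable[bytes]]


async def run_turn(audio_in: bytes, stt: STT, llm: LLM, tts: TTS) -> bytes:
    """One turn: caller audio in, reply audio out."""
    transcript = await stt(audio_in)  # speech -> text
    reply = await llm(transcript)     # text -> text
    return await tts(reply)           # text -> speech


# Stub stages standing in for real vendors.
async def fake_stt(audio: bytes) -> str:
    return audio.decode()


async def fake_llm(text: str) -> str:
    return f"You said: {text}"


async def upper_llm(text: str) -> str:
    return text.upper()


async def fake_tts(text: str) -> bytes:
    return text.encode()
```

Because each stage only depends on the type it receives and returns, replacing fake_llm with upper_llm changes the reply without touching the STT or TTS stage, which is the property pipeline mode relies on.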

3. Tools — letting the AI do things

Tools are functions the LLM can call mid-conversation — “look up order #123”, “transfer to billing”, “schedule a callback for 3pm”. When the LLM decides to use a tool, Patter runs your handler (or POSTs to a webhook) and feeds the result back into the conversation.
from getpatter import OpenAIRealtime, tool

@tool
async def lookup_order(order_id: str) -> dict:
    """Look up an order by ID."""
    return await db.orders.find_one({"id": order_id})

agent = phone.agent(
    engine=OpenAIRealtime(),
    system_prompt="You are a support agent.",
    tools=[lookup_order],
)
Two system tools are always available: transfer_call (move the call to a human) and end_call (hang up). See Tools.
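
The call-and-feed-back loop described above can be sketched as a small dispatcher. This is an assumption-laden illustration, not Patter's implementation: the REGISTRY dict, the register decorator, and the dispatch function are hypothetical names, and the in-memory lookup_order stands in for a real database call.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[Any]]

# Hypothetical registry; Patter's @tool decorator and dispatch may differ.
REGISTRY: dict[str, Handler] = {}


def register(fn: Handler) -> Handler:
    """Make a handler discoverable by its function name."""
    REGISTRY[fn.__name__] = fn
    return fn


@register
async def lookup_order(order_id: str) -> dict:
    """Look up an order by ID (in-memory stand-in for a database)."""
    return {"id": order_id, "status": "shipped"}


async def dispatch(name: str, arguments_json: str) -> str:
    """Run the handler the model asked for and return a JSON string result."""
    handler = REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = await handler(**json.loads(arguments_json))
    except Exception as exc:
        # Report the failure to the model instead of crashing the call.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

Returning errors as data lets the model apologize or retry while the caller is still on the line, instead of the call dropping on an exception.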

4. Guardrails — what the AI can’t say

Guardrails run on every LLM output before it reaches TTS. They can block terms, run a custom check, or substitute a safe reply:
from getpatter import OpenAIRealtime, guardrail

agent = phone.agent(
    engine=OpenAIRealtime(),
    system_prompt="...",
    guardrails=[
        guardrail(name="no-medical", blocked_terms=["diagnosis", "prescription"],
                  replacement="Please consult a doctor."),
    ],
)
See Guardrails.
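
As a rough picture of what a blocked-terms check does to a reply before synthesis, consider the sketch below. The BlockedTerms class and run_guardrails function are hypothetical stand-ins, and the whole-word, case-insensitive matching rule is an assumption; Patter's own matching behavior may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockedTerms:
    """Hypothetical stand-in for a blocked-terms guardrail."""
    name: str
    blocked_terms: tuple[str, ...]
    replacement: str

    def check(self, text: str) -> Optional[str]:
        """Return the safe replacement if any blocked term appears, else None."""
        for term in self.blocked_terms:
            # Assumed rule: whole-word, case-insensitive match.
            if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
                return self.replacement
        return None


def run_guardrails(text: str, guardrails: list[BlockedTerms]) -> str:
    """Pass the reply through each guardrail; the first one that fires wins."""
    for guardrail in guardrails:
        replacement = guardrail.check(text)
        if replacement is not None:
            return replacement
    return text
```

Replacing the whole reply, and not just the offending word, avoids sending a sentence with a hole in it to the TTS stage.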

5. Tunnel — making your laptop reachable

A carrier needs a public HTTPS URL to deliver webhooks. In dev you don’t have one — Patter ships a built-in Cloudflare Quick Tunnel:
await phone.serve(agent, tunnel=True)   # starts cloudflared, configures Twilio webhook
In production, set webhook_url="api.yourcompany.com" to point at your own server and skip the tunnel entirely. See Tunneling.

Barge-in

When the caller starts speaking while the AI is mid-reply, Patter cancels the in-flight audio and routes the new input to the LLM. Tracking is mark-based (per-frame), so cancellation happens on the exact syllable that got interrupted rather than at a coarse chunk boundary. The default sensitivity is 300 ms of sustained voice. If you’re on a noisy line (speakerphone, ngrok relay), raise it with phone.agent(barge_in_threshold_ms=500).
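
The "sustained voice" threshold can be pictured as a counter over per-frame voice-activity flags. The sketch below is an illustration only: the BargeInDetector class, the 20 ms frame duration, and the reset-on-any-silent-frame rule are assumptions, and it does not model the mark-based playback tracking.

```python
from dataclasses import dataclass


@dataclass
class BargeInDetector:
    """Fires once the caller has spoken continuously for threshold_ms."""
    threshold_ms: int = 300
    frame_ms: int = 20   # assumed frame duration
    _voiced_ms: int = 0  # length of the current unbroken run of speech

    def push(self, is_speech: bool) -> bool:
        """Feed one frame's voice-activity flag; True means cancel playback."""
        if is_speech:
            self._voiced_ms += self.frame_ms
        else:
            self._voiced_ms = 0  # any silent frame resets the run
        return self._voiced_ms >= self.threshold_ms
```

With these assumptions, the default fires on the 15th consecutive voiced frame, while a 500 ms threshold needs 25, so short bursts of line noise are less likely to cut the AI off.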

Audio pipeline — transcoding details

The carriers don’t all speak the same format. Patter transcodes so every STT / engine sees the same PCM 16 kHz stream:
| Carrier | Wire format | Patter does |
| --- | --- | --- |
| Twilio | mulaw 8 kHz (ulaw/8000) | Decode to PCM 16 kHz in, encode back to mulaw out |
| Telnyx | PCM 16 kHz (L16/16000) | Passthrough |
OpenAI TTS returns 24 kHz PCM — Patter resamples to 16 kHz before sending to the carrier. You never touch any of this.
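
For readers curious what the inbound Twilio conversion involves, here is a minimal pure-Python sketch: standard G.711 mu-law expansion followed by a naive 2x upsample. The function names are hypothetical, and the midpoint interpolation is a simplification chosen for brevity; a production resampler would typically apply a proper low-pass filter, and Patter's actual implementation is not shown here.

```python
import struct


def mulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign, exponent, mantissa = u & 0x80, (u >> 4) & 0x07, u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def upsample_2x(samples: list[int]) -> list[int]:
    """Naive 8 kHz -> 16 kHz: insert the midpoint between neighbouring samples."""
    out: list[int] = []
    for i, sample in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out += [sample, (sample + nxt) // 2]
    return out


def mulaw_frame_to_pcm16k(frame: bytes) -> bytes:
    """Convert an 8 kHz mu-law frame to 16 kHz little-endian 16-bit PCM."""
    samples = upsample_2x([mulaw_to_pcm16(b) for b in frame])
    return struct.pack(f"<{len(samples)}h", *samples)
```

A 20 ms mu-law frame at 8 kHz is 160 bytes; after decoding and upsampling it becomes 320 samples, or 640 bytes of 16-bit PCM.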

What’s next

  • Python Quickstart: answer a real phone call in 5 minutes.
  • TypeScript Quickstart: same, but in TS.
  • Carrier setup: Twilio / Telnyx credentials.
  • Engines: OpenAI Realtime, ElevenLabs ConvAI.