What Patter does
Patter is an open-source SDK that lets an AI agent answer and make phone calls. You write the agent logic; Patter handles the telephony plumbing (phone carrier, audio streams, speech recognition, speech synthesis, barge-in, call transfer, recording). A “hello world” takes four lines of code.
Anatomy of a call
When someone dials your number, audio flows through five layers. Each layer is pluggable.
1. Carrier — the phone line
The carrier puts the call on the internet. Patter ships two:
| Carrier | Env vars | Notes |
|---|---|---|
| Twilio() | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN | mulaw 8 kHz; webhook signature verification via HMAC-SHA1 |
| Telnyx() | TELNYX_API_KEY, TELNYX_CONNECTION_ID, TELNYX_PUBLIC_KEY | PCM 16 kHz native; Ed25519 webhook verification |
Pass the carrier you want as Patter(carrier=...). See Carrier.
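The Twilio row mentions webhook signature verification via HMAC-SHA1. As a rough illustration of what such a check involves, the sketch below follows Twilio's documented scheme for form-encoded webhooks: append the POST parameters, sorted by name, to the full request URL, sign the result with the auth token, and Base64-encode the digest. The function names are hypothetical and this is not Patter's internal code; the URL, token, and parameters are placeholder values.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    """Compute the signature Twilio would attach to a form-encoded webhook."""
    # Twilio signs the full URL followed by every POST key and value,
    # with keys in sorted order and no separators between them.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict[str, str], header_signature: str
) -> bool:
    """Compare the expected signature with the X-Twilio-Signature header."""
    expected = twilio_signature(auth_token, url, params)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )


# Placeholder values for illustration only.
example_params = {"CallSid": "CA123", "From": "+15550100"}
example_sig = twilio_signature("secret", "https://example.com/voice", example_params)
print(is_valid_twilio_request("secret", "https://example.com/voice", example_params, example_sig))
```

A request whose parameters or URL were altered in transit produces a different digest, so the comparison fails and the webhook can be rejected.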
2. Engine vs pipeline — how the AI talks
Patter supports two ways to get from audio-in to audio-out.
Engine mode (easiest) uses a single provider for everything:
| Engine | What it does | Best for |
|---|---|---|
| OpenAIRealtime() | Speech-to-speech via OpenAI’s Realtime API | Lowest latency, general-purpose |
| ElevenLabsConvAI() | Managed conversational agent on ElevenLabs | Premium voice quality |
Pipeline mode chains three independent stages; swap any of them for a different vendor or a local model. To generate replies with your own code, omit llm= and pass an async on_message callback to serve() instead. llm= and on_message are mutually exclusive.
The three stages:
- STT (Speech-to-Text) transcribes caller audio in real time. Providers: DeepgramSTT, WhisperSTT, CartesiaSTT, SonioxSTT, SpeechmaticsSTT (Python-only), AssemblyAISTT. See STT.
- LLM (Large Language Model) generates the reply. Pass a class instance via llm=: OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, GoogleLLM. Tool calling works across all five. For anything else, use on_message. See LLM.
- TTS (Text-to-Speech) synthesizes the reply audio. Providers: ElevenLabsTTS, OpenAITTS, CartesiaTTS, RimeTTS, LMNTTTS. See TTS.
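To make the three-stage split concrete, here is a minimal sketch of one conversational turn passing through STT, LLM, and TTS. Everything in it is a hypothetical stand-in: the Protocol classes, the Fake* stubs, and run_turn are invented for illustration and are not Patter's classes or signatures. The point is only that each stage sits behind a small interface, so one can be swapped without touching the others.

```python
import asyncio
from typing import Protocol


class STT(Protocol):
    async def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    async def reply(self, text: str) -> str: ...


class TTS(Protocol):
    async def synthesize(self, text: str) -> bytes: ...


class FakeSTT:
    async def transcribe(self, audio: bytes) -> str:
        # Pretend the audio bytes already are the spoken words.
        return audio.decode("utf-8")


class FakeLLM:
    async def reply(self, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    async def synthesize(self, text: str) -> bytes:
        # Pretend the reply text already is synthesized audio.
        return text.encode("utf-8")


async def run_turn(stt: STT, llm: LLM, tts: TTS, audio: bytes) -> bytes:
    """One turn: caller audio in, reply audio out."""
    transcript = await stt.transcribe(audio)
    answer = await llm.reply(transcript)
    return await tts.synthesize(answer)


reply_audio = asyncio.run(run_turn(FakeSTT(), FakeLLM(), FakeTTS(), b"hello"))
print(reply_audio)
```

Replacing FakeLLM with any other object that has an async reply method changes the answer without touching the STT or TTS stage, which is the property the pipeline design relies on.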
3. Tools — letting the AI do things
Tools are functions the LLM can call mid-conversation: “look up order #123”, “transfer to billing”, “schedule a callback for 3pm”. When the LLM decides to use a tool, Patter runs your handler (or POSTs to a webhook) and feeds the result back into the conversation.
Built-in tools include transfer_call (move the call to a human) and end_call (hang up). See Tools.
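The tool round trip (the LLM names a tool, a handler runs, the result goes back into the conversation) can be sketched as follows. This is an illustration under stated assumptions: the TOOLS registry, the tool decorator, dispatch_tool_call, and lookup_order are hypothetical names rather than Patter's API, and the call and result shapes are modelled loosely on common LLM tool-calling formats.

```python
import json
from typing import Callable

# Hypothetical registry mapping tool names to handler functions.
TOOLS: dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a handler under its function name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_order(order_id: str) -> dict:
    # Stand-in for a real database or HTTP lookup.
    return {"order_id": order_id, "status": "shipped"}


def dispatch_tool_call(call: dict) -> dict:
    """Run the handler the LLM asked for and build the message fed back to it."""
    handler = TOOLS.get(call["name"])
    if handler is None:
        result = {"error": f"unknown tool: {call['name']}"}
    else:
        # Arguments arrive as a JSON string and are passed as keyword arguments.
        result = handler(**json.loads(call["arguments"]))
    return {"role": "tool", "name": call["name"], "content": json.dumps(result)}


message = dispatch_tool_call(
    {"name": "lookup_order", "arguments": '{"order_id": "123"}'}
)
print(message["content"])
```

Returning an error payload for an unknown tool, instead of raising, lets the model see the failure and recover in its next reply.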
4. Guardrails — what the AI can’t say
Guardrails run on every LLM output before it reaches TTS. They can block terms, run a custom check, or substitute a safe reply.
5. Tunnel — making your laptop reachable
A carrier needs a public HTTPS URL to deliver webhooks. In dev you don’t have one, so Patter ships a built-in Cloudflare Quick Tunnel. Outside of dev, point webhook_url="api.yourcompany.com" at your own server and skip the tunnel entirely. See Tunneling.
Barge-in
When the caller starts speaking while the AI is mid-reply, Patter cancels the in-flight audio and routes the new input to the LLM. Tracking is mark-based (per-frame), so cancellation happens on the exact syllable that got interrupted rather than at a coarse chunk boundary. Default sensitivity is 300 ms of sustained voice. Tune with agent(barge_in_threshold_ms=500) if you’re on a noisy line (speakerphone, ngrok relay).
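A sustained-voice threshold can be pictured as a counter over fixed-size audio frames: barge-in fires only once voiced frames have accumulated to the threshold without a gap. The sketch below assumes 20 ms frames and a per-frame voiced flag supplied by some voice activity detector. The class and its names are hypothetical and do not reflect how Patter implements detection.

```python
class BargeInDetector:
    """Fire once voice has been present for threshold_ms without interruption."""

    def __init__(self, threshold_ms: int = 300, frame_ms: int = 20) -> None:
        self.threshold_ms = threshold_ms
        self.frame_ms = frame_ms  # assumed frame size, not a Patter constant
        self.voiced_ms = 0

    def push(self, is_voiced: bool) -> bool:
        """Feed one frame; return True while the sustained-voice threshold is met."""
        if is_voiced:
            self.voiced_ms += self.frame_ms
        else:
            # Any silent frame resets the run, so short bursts never add up.
            self.voiced_ms = 0
        return self.voiced_ms >= self.threshold_ms


detector = BargeInDetector(threshold_ms=300)
# Ten voiced frames (200 ms), one silent frame, ten more voiced frames.
frames = [True] * 10 + [False] + [True] * 10
print(any(detector.push(voiced) for voiced in frames))
```

With 20 ms frames, a 300 ms threshold needs 15 consecutive voiced frames and a 500 ms threshold needs 25, which is why raising the value makes brief noise on a speakerphone less likely to cut the agent off.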
Audio pipeline — transcoding details
The carriers don’t all speak the same format. Patter transcodes so every STT / engine sees the same PCM 16 kHz stream:
| Carrier | Wire format | Patter does |
|---|---|---|
| Twilio | mulaw 8 kHz (ulaw/8000) | Decode to PCM 16 kHz in, encode back to mulaw out |
| Telnyx | PCM 16 kHz (L16/16000) | Passthrough |
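For the Twilio row, the inbound direction amounts to two steps: expand each 8-bit mu-law byte to a 16-bit linear sample (standard G.711 decoding), then double the sample rate from 8 kHz to 16 kHz. The following is a minimal pure-Python sketch of that idea, not Patter's transcoder. The function names are hypothetical, and the linear-interpolation upsampler is a deliberately naive stand-in for a proper resampling filter.

```python
import struct

MULAW_BIAS = 0x84  # bias added during G.711 mu-law encoding


def mulaw_decode_byte(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def mulaw_decode(payload: bytes) -> list[int]:
    return [mulaw_decode_byte(b) for b in payload]


def upsample_2x(samples: list[int]) -> list[int]:
    """Naive 8 kHz -> 16 kHz: insert the midpoint between neighbouring samples."""
    out: list[int] = []
    for i, sample in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out.append(sample)
        out.append((sample + nxt) // 2)
    return out


def twilio_to_pcm16k(payload: bytes) -> bytes:
    """mu-law 8 kHz bytes in, little-endian 16-bit PCM at 16 kHz out."""
    pcm = upsample_2x(mulaw_decode(payload))
    return struct.pack(f"<{len(pcm)}h", *pcm)


# 160 mu-law bytes is 20 ms of audio at 8 kHz.
print(len(twilio_to_pcm16k(b"\xff" * 160)))
```

Each input byte becomes two 16-bit samples, so the payload grows by a factor of four; the outbound direction reverses both steps.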
What’s next
- Python Quickstart: Answer a real phone call in 5 minutes.
- TypeScript Quickstart: Same, but in TS.
- Carrier setup: Twilio / Telnyx credentials.
- Engines: OpenAI Realtime, ElevenLabs ConvAI.

