Documentation Index
Fetch the complete documentation index at: https://patter-06b046ce-feat-observability-otel-attrs-0-6-1.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
LLM (Voice Mode)
Patter supports two voice architectures:| Mode | How to enable | When to use |
|---|---|---|
| Engine (speech-to-speech) | phone.agent(engine=OpenAIRealtime(...)) or engine=ElevenLabsConvAI(...) | Lowest-latency speech-to-speech. A single provider handles STT + LLM + TTS. |
| Pipeline (STT + LLM + TTS) | phone.agent(stt=..., llm=..., tts=...) (omit engine=) | Full control. Mix and match providers per stage. |
llm= selector in pipeline mode.
Pipeline mode
Compose the three stages independently. Each provider reads its credentials from the environment by default.{type: "text" | "tool_call" | "done"} chunk protocol, so your tools are defined once and run everywhere.
llm= and on_message are mutually exclusive. Pass one or the other on serve() — passing both raises a clear error at serve() time. When engine= is set, llm= is ignored (with a one-time warning in the logs). If neither llm= nor on_message is passed and OPENAI_API_KEY is set, Patter auto-constructs the default OpenAI LLM loop — existing 0.5.0 code still works.Supported LLM providers
| Flat import | Namespaced import | Env var | Install extra |
|---|---|---|---|
OpenAILLM | getpatter.llm.openai.LLM | OPENAI_API_KEY | included |
AnthropicLLM | getpatter.llm.anthropic.LLM | ANTHROPIC_API_KEY | getpatter[anthropic] |
GroqLLM | getpatter.llm.groq.LLM | GROQ_API_KEY | getpatter[groq] |
CerebrasLLM | getpatter.llm.cerebras.LLM | CEREBRAS_API_KEY | getpatter[cerebras] |
GoogleLLM | getpatter.llm.google.LLM | GEMINI_API_KEY (falls back to GOOGLE_API_KEY) | getpatter[google] |
api_key: str | None = None and fall back to the listed env var when it is omitted.
OpenAILLM
OpenAI Chat Completions with streaming + tool calling. Default model"gpt-4o-mini". For other OpenAI-compatible endpoints use the dedicated wrappers (GroqLLM, CerebrasLLM) — they subclass OpenAILLMProvider with the right base_url.
AnthropicLLM
Anthropic Messages API with native streaming andtool_use blocks, normalised to Patter’s chunk protocol. Default model "claude-haiku-4-5-20251001", default max_tokens=1024 (Anthropic requires an explicit cap on every request).
Prompt caching is enabled by default — cache_control: { type: "ephemeral" } is attached to the system prompt and the last tool block, which cuts time-to-first-token on long system prompts and large tool catalogs. Pass prompt_caching=False to disable.
pip install 'getpatter[anthropic]'.
GroqLLM
Hardware-accelerated Llama inference via Groq’s OpenAI-compatible Chat Completions API athttps://api.groq.com/openai/v1. Default model "llama-3.3-70b-versatile".
pip install 'getpatter[groq]'.
CerebrasLLM
Cerebras Inference API (OpenAI-compatible) athttps://api.cerebras.ai/v1. Default model "gpt-oss-120b" — production tier, ~3000 tok/sec on WSE-3, no deprecation date. Pass model="llama3.1-8b" (8B params, sub-100ms TTFT) for the smaller free-tier alternative. The 404 model_not_found error includes a recovery hint listing other valid IDs (qwen-3-235b-a22b-instruct-2507, llama-3.3-70b on paid tier).
Supports forwarding all OpenAI-style sampling kwargs (response_format, parallel_tool_calls, tool_choice, seed, top_p, frequency_penalty, presence_penalty, stop) and optional msgpack + gzip payload compression (enabled by default) — see Cerebras payload optimization. Failures retry once with exponential backoff and honour x-ratelimit-reset-* advisory headers; terminal errors raise PatterError.
pip install 'getpatter[cerebras]'.
GoogleLLM
Google Gemini via thegoogle-genai SDK. Supports the Gemini Developer API (API key) and Vertex AI (GCP project + location). Default model "gemini-2.5-flash".
pip install 'getpatter[google]'.
Custom LLM via on_message
For cases the five built-in providers don’t cover — multi-model routing, local llama.cpp, an internal gateway, caching layers — drop llm= and plug an async on_message callback instead:
Advanced: building a custom LLM provider
Three primitives are exported from the package barrel for users who need to plug in a custom LLM or tool dispatcher:LLMChunk— the streaming-output type yielded by everyLLMProvider.stream(...)implementation. Carries either a partial text delta, a tool-call delta, or a stream-end marker.DefaultToolExecutor— the default tool dispatcher used byLLMLoop. Constructs from atools=list and resolves both Pythonhandler=callables andwebhook_url=HTTP tools. Override its hooks to swap in custom error handling, telemetry, or authentication.OpenAILLMProvider— the parent class shared byOpenAILLM,GroqLLM,CerebrasLLM. Sampling kwargs (temperature,top_p,seed,tool_choice,response_format, …) live here and are forwarded by every subclass.
What’s next
STT
STT providers for pipeline mode.
TTS
TTS providers for pipeline mode.
Tools
Function calling (works across every LLM).
Engines
Speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI).

