LLM (Voice Mode)
Patter supports two voice architectures:

| Mode | How to enable | When to use |
|---|---|---|
| Engine (speech-to-speech) | phone.agent({ engine: new OpenAIRealtime(...) }) or engine: new ElevenLabsConvAI(...) | Lowest-latency speech-to-speech. A single provider handles STT + LLM + TTS. |
| Pipeline (STT + LLM + TTS) | phone.agent({ stt, llm, tts }) (omit engine) | Full control. Mix and match providers per stage. |
This page covers the llm selector in pipeline mode.
Pipeline mode
Compose the three stages independently. Each provider reads its credentials from the environment by default.

Every LLM provider is normalised to the same { type: "text" | "tool_call" | "done" } chunk protocol, so your tools are defined once and run everywhere.
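As a rough illustration of the chunk protocol, the sketch below models the three chunk kinds as a TypeScript discriminated union and folds a sequence of chunks into the text to speak plus the requested tool calls. Only the three type values come from this page; the Chunk type name, the text, name, and args fields, and the summarize helper are assumptions made for the example, not Patter's actual definitions.

```typescript
// Hypothetical chunk shapes: only the `type` discriminator values are taken
// from the documentation; every other field name is an assumption.
type Chunk =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; args: string }
  | { type: "done" };

interface Summary {
  text: string;
  toolCalls: { name: string; args: string }[];
  finished: boolean;
}

// Fold a sequence of chunks into the text to speak and the tools to run.
function summarize(chunks: Chunk[]): Summary {
  const out: Summary = { text: "", toolCalls: [], finished: false };
  for (const chunk of chunks) {
    switch (chunk.type) {
      case "text":
        out.text += chunk.text; // append the partial text delta
        break;
      case "tool_call":
        out.toolCalls.push({ name: chunk.name, args: chunk.args });
        break;
      case "done":
        out.finished = true;
        return out; // ignore anything after the end marker
    }
  }
  return out;
}

const demo = summarize([
  { type: "text", text: "Checking " },
  { type: "text", text: "your order." },
  { type: "tool_call", name: "lookup_order", args: '{"id":"42"}' },
  { type: "done" },
]);
```

Because the consumer switches only on type, the same function would work regardless of which provider produced the chunks, which is the property the shared protocol is meant to give.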
llm and onMessage are mutually exclusive. Pass one or the other on serve(); passing both raises a clear error at serve() time. When engine is set, llm is ignored (with a one-time warning in the logs). If neither llm nor onMessage is passed and OPENAI_API_KEY is set, Patter auto-constructs the default OpenAI LLM loop, so existing 0.5.0 code still works.

Supported LLM providers

| Class | Env var | Install |
|---|---|---|
| OpenAILLM | OPENAI_API_KEY | included |
| AnthropicLLM | ANTHROPIC_API_KEY | included |
| GroqLLM | GROQ_API_KEY | included |
| CerebrasLLM | CEREBRAS_API_KEY | included |
| GoogleLLM | GEMINI_API_KEY (falls back to GOOGLE_API_KEY) | included |

Every constructor accepts an optional apiKey?: string and falls back to the listed env var when it is omitted.
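The credential lookup described by the table can be sketched as a small pure function. This is an illustration only: the env var names are taken from the table above, while resolveApiKey, its signature, and its error messages are hypothetical and not part of Patter's API.

```typescript
// Env var names come from the provider table; the function itself is a
// hypothetical stand-in for whatever each constructor does internally.
type Env = Record<string, string | undefined>;

const ENV_VARS: Record<string, string[]> = {
  OpenAILLM: ["OPENAI_API_KEY"],
  AnthropicLLM: ["ANTHROPIC_API_KEY"],
  GroqLLM: ["GROQ_API_KEY"],
  CerebrasLLM: ["CEREBRAS_API_KEY"],
  // GoogleLLM checks GEMINI_API_KEY first, then GOOGLE_API_KEY.
  GoogleLLM: ["GEMINI_API_KEY", "GOOGLE_API_KEY"],
};

function resolveApiKey(provider: string, env: Env, apiKey?: string): string {
  // An explicit apiKey always wins over the environment.
  if (apiKey) return apiKey;
  const names = ENV_VARS[provider];
  if (!names) throw new Error(`Unknown provider: ${provider}`);
  // Otherwise take the first listed env var that is set and non-empty.
  for (const name of names) {
    const value = env[name];
    if (value) return value;
  }
  throw new Error(`${provider}: pass apiKey or set ${names.join(" or ")}`);
}
```

In a Node process the env argument would typically be process.env; passing it in explicitly keeps the sketch easy to test.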
OpenAILLM
OpenAI Chat Completions with streaming + tool calling. Default model "gpt-4o-mini".
AnthropicLLM
Anthropic Messages API with native streaming and tool_use blocks, normalised to Patter’s chunk protocol. Default model "claude-haiku-4-5-20251001". Pass maxTokens to override the default token cap.
Prompt caching is enabled by default — cache_control: { type: "ephemeral" } is attached to the system prompt and the last tool block, which cuts time-to-first-token on long system prompts and large tool catalogs. Pass promptCaching: false to disable.
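To make the placement of the cache markers concrete, here is a minimal sketch of how a request's system and tools fields might be assembled. The cache_control field and the block shapes follow the public Anthropic Messages API; the withPromptCaching helper and the sample tools are hypothetical, and how Patter builds the request internally is not shown on this page.

```typescript
interface CacheControl { type: "ephemeral" }
interface SystemBlock { type: "text"; text: string; cache_control?: CacheControl }
interface ToolDef {
  name: string;
  description: string;
  input_schema: Record<string, unknown>; // JSON Schema for the tool input
  cache_control?: CacheControl;
}

// Hypothetical helper: mark the system prompt and the LAST tool as cacheable.
function withPromptCaching(
  systemPrompt: string,
  tools: ToolDef[],
  promptCaching = true,
): { system: SystemBlock[]; tools: ToolDef[] } {
  const marker: CacheControl = { type: "ephemeral" };
  const system: SystemBlock[] = [
    promptCaching
      ? { type: "text", text: systemPrompt, cache_control: marker }
      : { type: "text", text: systemPrompt },
  ];
  const lastIndex = tools.length - 1;
  // Copy each tool so the caller's array is never mutated.
  const outTools = tools.map((tool, i) =>
    promptCaching && i === lastIndex ? { ...tool, cache_control: marker } : { ...tool },
  );
  return { system, tools: outTools };
}

// Illustrative tools, not part of any real catalog.
const sampleTools: ToolDef[] = [
  { name: "lookup_order", description: "Find an order", input_schema: { type: "object" } },
  { name: "cancel_order", description: "Cancel an order", input_schema: { type: "object" } },
];
const cachedRequest = withPromptCaching("You are a helpful phone agent.", sampleTools);
```

Marking only the last tool is enough to cover the whole tool list, because a cache breakpoint applies to the prefix of the request up to and including the marked block.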
GroqLLM
Hardware-accelerated Llama inference via Groq’s OpenAI-compatible Chat Completions API at https://api.groq.com/openai/v1. Default model "llama-3.3-70b-versatile".
CerebrasLLM
Cerebras Inference API (OpenAI-compatible) at https://api.cerebras.ai/v1. Default model "gpt-oss-120b" (production tier, ~3000 tok/sec on WSE-3, no deprecation date). Pass model: "llama3.1-8b" for the smaller free-tier alternative. The 404 model_not_found error includes a recovery hint listing other valid IDs.
Supports forwarding OpenAI-style sampling kwargs (responseFormat, parallelToolCalls, toolChoice, seed, topP, frequencyPenalty, presencePenalty, stop) and gzip request-body compression (enabled by default) — see Cerebras payload optimization. Failures retry once with exponential backoff and honour x-ratelimit-reset-* advisory headers; terminal errors throw PatterError.
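One way to combine exponential backoff with the advisory reset headers is sketched below. This is not Patter's retry code: retryDelayMs, the 250 ms base, and the 10 s cap are invented for illustration, and the sketch assumes the x-ratelimit-reset-* values are expressed in seconds and that the shortest advertised reset is the one worth waiting for.

```typescript
// Hypothetical delay calculation for a single retry.
// Assumptions: header values are (possibly fractional) seconds; base delay
// and cap are illustrative defaults, not Patter's real settings.
function retryDelayMs(
  attempt: number, // 0 for the first retry
  headers: Record<string, string>,
  baseMs = 250,
  capMs = 10000,
): number {
  // Plain exponential backoff: base, 2x base, 4x base, ...
  const backoff = baseMs * 2 ** attempt;

  // Smallest positive reset hint across all x-ratelimit-reset-* headers.
  let advisedMs: number | undefined;
  for (const [name, value] of Object.entries(headers)) {
    if (!name.toLowerCase().startsWith("x-ratelimit-reset-")) continue;
    const seconds = Number.parseFloat(value);
    if (Number.isFinite(seconds) && seconds > 0) {
      const ms = seconds * 1000;
      advisedMs = advisedMs === undefined ? ms : Math.min(advisedMs, ms);
    }
  }

  // Wait at least as long as the server advises, but never past the cap.
  return Math.min(capMs, Math.max(backoff, advisedMs ?? 0));
}
```

The cap matters on a phone call: a daily-quota reset can be hours away, and in that case failing fast with a terminal error is more useful than holding the caller in silence.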
GoogleLLM
Google Gemini via the Developer API (streaming SSE). Default model "gemini-2.5-flash".
Custom LLM via onMessage
For cases the five built-in providers don’t cover (multi-model routing, local inference, an internal gateway, caching layers), drop llm and plug in an async onMessage callback instead.
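As an example of the multi-model routing case, the sketch below builds an async callback that sends short acknowledgements to a fast backend and everything else to a larger one. The real onMessage argument and return types are not shown on this page, so IncomingMessage, Backend, pickBackend, and makeOnMessage are all hypothetical; treat this as the shape of the idea rather than the SDK's signature.

```typescript
// Hypothetical message shape; the real onMessage payload may differ.
interface IncomingMessage {
  text: string;
  history: { role: "user" | "assistant"; text: string }[];
}

// A backend is anything that turns a message into a reply.
type Backend = (msg: IncomingMessage) => Promise<string>;

// Route short statements to the fast model, questions and long turns to the
// larger one. The six-word threshold is an arbitrary illustration.
function pickBackend(text: string): "fast" | "smart" {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return words <= 6 && !text.includes("?") ? "fast" : "smart";
}

// Build the async callback that would be passed in place of llm.
function makeOnMessage(backends: Record<"fast" | "smart", Backend>) {
  return async (msg: IncomingMessage): Promise<string> =>
    backends[pickBackend(msg.text)](msg);
}
```

Each backend can wrap a local model, an internal gateway, or a cache lookup; the callback only needs to resolve to the reply text.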
Advanced: building a custom LLM provider
Four primitives are exported from the package barrel for users who need to plug in a custom LLM or tool dispatcher:

- LLMChunk: the streaming-output type yielded by every LLMProvider.stream(...) implementation. Carries either a partial text delta, a tool-call delta, or a stream-end marker.
- DefaultToolExecutor: the default tool dispatcher used by LLMLoop. Constructs from a tools array and resolves both inline handler callables and webhookUrl HTTP tools. Override its hooks to swap in custom error handling, telemetry, or authentication.
- OpenAILLMProvider: the parent class shared by OpenAILLM, GroqLLM, and CerebrasLLM. Sampling options (temperature, topP, seed, toolChoice, responseFormat, …) live here and are forwarded by every subclass.
- LLMLoop: the orchestration loop wiring an LLMProvider, a DefaultToolExecutor, and the streaming output back to TTS.
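The way these pieces fit together can be sketched with local stand-ins. Nothing below imports from Patter: SketchChunk, SketchProvider, CannedProvider, ToolHandler, and runLoop are hypothetical names that only mirror the roles described above (a provider that streams chunks, a tool dispatcher, and a loop that forwards text toward TTS). The sketch is also simplified: an actual loop would normally feed tool results back to the model for a follow-up turn.

```typescript
// Local stand-ins only; these are NOT Patter's exported types.
type SketchChunk =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> }
  | { type: "done" };

interface SketchProvider {
  stream(prompt: string): AsyncIterable<SketchChunk>;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

// A canned provider: requests one tool call, emits some text, then ends.
class CannedProvider implements SketchProvider {
  async *stream(prompt: string): AsyncGenerator<SketchChunk> {
    yield { type: "tool_call", name: "get_time", args: { zone: "UTC" } };
    yield { type: "text", text: `You asked: ${prompt}. ` };
    yield { type: "done" };
  }
}

// Drive the provider: dispatch tool calls to inline handlers and pass text
// deltas to the speech sink. Returns the collected tool results.
async function runLoop(
  provider: SketchProvider,
  tools: Record<string, ToolHandler>,
  prompt: string,
  speak: (text: string) => void,
): Promise<string[]> {
  const toolResults: string[] = [];
  for await (const chunk of provider.stream(prompt)) {
    if (chunk.type === "text") {
      speak(chunk.text);
    } else if (chunk.type === "tool_call") {
      const handler = tools[chunk.name];
      toolResults.push(
        handler ? await handler(chunk.args) : `unknown tool: ${chunk.name}`,
      );
    } else {
      break; // "done": stop reading the stream
    }
  }
  return toolResults;
}
```

A custom provider written this way only has to yield the three chunk kinds; the tool handling and the speech sink stay unchanged whichever model sits behind stream().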
What’s next
STT
STT providers for pipeline mode.
TTS
TTS providers for pipeline mode.
Tools
Function calling (works across every LLM).
Engines
Speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI).

