LLM (Voice Mode)

Patter supports two voice architectures:

| Mode | How to enable | When to use |
| --- | --- | --- |
| Engine (speech-to-speech) | `phone.agent({ engine: new OpenAIRealtime(...) })` or `engine: new ElevenLabsConvAI(...)` | Lowest-latency speech-to-speech. A single provider handles STT + LLM + TTS. |
| Pipeline (STT + LLM + TTS) | `phone.agent({ stt, llm, tts })` (omit `engine`) | Full control. Mix and match providers per stage. |

See Engines for engine-mode reference. This page focuses on the llm selector in pipeline mode.

Pipeline mode

Compose the three stages independently. Each provider reads its credentials from the environment by default.

// npx tsx example.ts
import { Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new DeepgramSTT(),                       // DEEPGRAM_API_KEY
  llm: new AnthropicLLM(),                      // ANTHROPIC_API_KEY
  tts: new ElevenLabsTTS({ voiceId: "rachel" }), // ELEVENLABS_API_KEY
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi!",
});

await phone.serve({ agent });

Tool calling works across every provider — each adapter normalizes its vendor-specific streaming format to Patter’s unified { type: "text" | "tool_call" | "done" } chunk protocol, so your tools are defined once and run everywhere.
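As a rough illustration of what consuming that chunk protocol could look like, the sketch below drains a stream into spoken text plus requested tool calls. Only the `type` discriminator comes from this page; the `Chunk` type is a local stand-in, and its payload fields (`content`, `name`, `arguments`) as well as `collectTurn` and `fakeStream` are assumptions made for the example.

```typescript
// Hypothetical local stand-in for Patter's chunk type. Only the `type`
// discriminator is documented; the payload fields are assumptions.
type Chunk =
  | { type: "text"; content: string }
  | { type: "tool_call"; name: string; arguments: string }
  | { type: "done" };

interface Turn {
  text: string;
  toolCalls: { name: string; args: unknown }[];
}

// Drain a chunk stream into the text to speak and the tool calls to run.
async function collectTurn(stream: AsyncIterable<Chunk>): Promise<Turn> {
  const turn: Turn = { text: "", toolCalls: [] };
  for await (const chunk of stream) {
    switch (chunk.type) {
      case "text":
        turn.text += chunk.content;
        break;
      case "tool_call":
        turn.toolCalls.push({ name: chunk.name, args: JSON.parse(chunk.arguments) });
        break;
      case "done":
        return turn; // stream-end marker: stop reading
    }
  }
  return turn;
}

// A fake stream standing in for the output of any provider adapter.
async function* fakeStream(): AsyncGenerator<Chunk> {
  yield { type: "text", content: "Checking " };
  yield { type: "text", content: "that now." };
  yield { type: "tool_call", name: "lookup_order", arguments: '{"id":42}' };
  yield { type: "done" };
}
```

Because every adapter emits the same three chunk kinds, a consumer like this does not need to know which vendor produced the stream.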

llm and onMessage are mutually exclusive. Set llm on the agent or pass onMessage to serve(), not both; supplying both raises a clear error at serve() time. When engine is set, llm is ignored (with a one-time warning in the logs). If neither llm nor onMessage is passed and OPENAI_API_KEY is set, Patter auto-constructs the default OpenAI LLM loop, so existing 0.5.0 code still works.
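The precedence rules above can be read as a small decision function. This is a sketch of the documented behaviour, not Patter's actual implementation: `resolveDriver`, the `Driver` labels, and the `"unconfigured"` outcome are hypothetical, and checking `engine` first is an assumption.

```typescript
type Driver = "engine" | "llm" | "onMessage" | "default-openai" | "unconfigured";

interface ModeOptions {
  engine?: unknown;
  llm?: unknown;
  onMessage?: unknown;
}

// Decide which component produces replies for a call.
function resolveDriver(
  opts: ModeOptions,
  env: Record<string, string | undefined>,
): Driver {
  // Assumption: engine is checked first, since llm is ignored when engine is set.
  if (opts.engine) return "engine";
  if (opts.llm && opts.onMessage) {
    throw new Error("llm and onMessage are mutually exclusive; pass only one");
  }
  if (opts.llm) return "llm";
  if (opts.onMessage) return "onMessage";
  // Neither was passed: fall back to the default OpenAI loop when a key exists.
  if (env.OPENAI_API_KEY) return "default-openai";
  // Assumption: this page does not describe what happens without a key.
  return "unconfigured";
}
```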

Supported LLM providers

| Class | Env var | Install |
| --- | --- | --- |
| `OpenAILLM` | `OPENAI_API_KEY` | included |
| `AnthropicLLM` | `ANTHROPIC_API_KEY` | included |
| `GroqLLM` | `GROQ_API_KEY` | included |
| `CerebrasLLM` | `CEREBRAS_API_KEY` | included |
| `GoogleLLM` | `GEMINI_API_KEY` (falls back to `GOOGLE_API_KEY`) | included |

All classes accept an options object with apiKey?: string and fall back to the listed env var when it is omitted.
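The fallback order can be pictured with a small helper. This is a minimal sketch, assuming an explicit `apiKey` always wins and that a missing key is an error; `resolveApiKey` is a hypothetical name, not a Patter export.

```typescript
// Resolve a credential: explicit option first, then each env var in order.
function resolveApiKey(
  explicit: string | undefined,
  envVars: string[],
  env: Record<string, string | undefined>,
): string {
  if (explicit) return explicit;
  for (const name of envVars) {
    const value = env[name];
    if (value) return value;
  }
  // Assumption: failing loudly here; the real classes may behave differently.
  throw new Error(`Missing API key: pass apiKey or set ${envVars.join(" or ")}`);
}

// GoogleLLM-style lookup: GEMINI_API_KEY first, then GOOGLE_API_KEY.
const googleKey = resolveApiKey(
  undefined,
  ["GEMINI_API_KEY", "GOOGLE_API_KEY"],
  { GOOGLE_API_KEY: "AIza-example" },
);
```

In application code the last argument would typically be `process.env`.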

OpenAILLM

OpenAI Chat Completions with streaming + tool calling. Default model "gpt-4o-mini".

import { OpenAILLM } from "getpatter";

const llm = new OpenAILLM();                              // reads OPENAI_API_KEY
const llm2 = new OpenAILLM({ apiKey: "sk-...", model: "gpt-4o-mini" });
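For context on what "streaming + tool calling" involves at the wire level: OpenAI's streaming Chat Completions responses deliver a tool call as a series of deltas keyed by `index`, with the JSON arguments split across chunks. The sketch below shows one way such fragments could be stitched together before being surfaced as a single tool call. `assembleToolCalls` is a hypothetical helper and is not claimed to be how the adapter does it.

```typescript
// Shape of one entry in `choices[0].delta.tool_calls` of a streamed chunk.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledCall {
  id: string;
  name: string;
  arguments: string; // JSON text, complete once the stream ends
}

// Merge streamed fragments into whole tool calls, grouped by index.
function assembleToolCalls(deltas: ToolCallDelta[]): AssembledCall[] {
  const calls: AssembledCall[] = [];
  for (const delta of deltas) {
    const call = (calls[delta.index] ??= { id: "", name: "", arguments: "" });
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name += delta.function.name;
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
  }
  return calls;
}

// Three fragments of a single call, as they might arrive over the stream.
const assembled = assembleToolCalls([
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Rome"}' } },
]);
```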

AnthropicLLM

Anthropic Messages API with native streaming and tool_use blocks, normalized to Patter’s chunk protocol. Default model "claude-haiku-4-5-20251001". Pass maxTokens to override the default token cap. Prompt caching is enabled by default: Patter attaches cache_control: { type: "ephemeral" } to the system prompt and to the last tool block, which cuts time-to-first-token on long system prompts and large tool catalogs. Pass promptCaching: false to disable.

import { AnthropicLLM } from "getpatter";

const llm = new AnthropicLLM();                           // reads ANTHROPIC_API_KEY
const llm2 = new AnthropicLLM({
  apiKey: "sk-ant-...",
  model: "claude-haiku-4-5-20251001",
  maxTokens: 2048,
});
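To make the caching behaviour concrete, the sketch below builds the `system` and `tools` portions of an Anthropic Messages request with `cache_control` markers in the two places described above. The field names follow Anthropic's public request format. `withPromptCaching` is a hypothetical helper and may not match how the adapter constructs its payload.

```typescript
interface CacheControl { type: "ephemeral" }

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ToolDef {
  name: string;
  description: string;
  input_schema: object;
  cache_control?: CacheControl;
}

// Mark the system prompt and the last tool as cache breakpoints.
function withPromptCaching(
  systemPrompt: string,
  tools: ToolDef[],
): { system: SystemBlock[]; tools: ToolDef[] } {
  const ephemeral: CacheControl = { type: "ephemeral" };
  const system: SystemBlock[] = [
    { type: "text", text: systemPrompt, cache_control: ephemeral },
  ];
  // Only the final tool carries the marker; earlier tools are left untouched.
  const cachedTools = tools.map((tool, i) =>
    i === tools.length - 1 ? { ...tool, cache_control: ephemeral } : tool,
  );
  return { system, tools: cachedTools };
}

const cachedRequest = withPromptCaching("You are a helpful assistant.", [
  { name: "lookup_order", description: "Find an order", input_schema: { type: "object" } },
  { name: "cancel_order", description: "Cancel an order", input_schema: { type: "object" } },
]);
```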

GroqLLM

Hardware-accelerated Llama inference via Groq’s OpenAI-compatible Chat Completions API at https://api.groq.com/openai/v1. Default model "llama-3.3-70b-versatile".

import { GroqLLM } from "getpatter";

const llm = new GroqLLM();                                // reads GROQ_API_KEY
const llm2 = new GroqLLM({ apiKey: "gsk_...", model: "llama-3.3-70b-versatile" });
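"OpenAI-compatible" means the request has the same shape as an OpenAI Chat Completions call and only the base URL and key differ. The sketch below builds such a request without sending it. `buildChatRequest` is a hypothetical helper for illustration, not part of getpatter.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Describe a streaming Chat Completions request for any compatible base URL.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
) {
  return {
    // Trim trailing slashes so the path joins cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages, stream: true }),
    },
  };
}

const groqRequest = buildChatRequest(
  "https://api.groq.com/openai/v1",
  "gsk_example",
  "llama-3.3-70b-versatile",
  [{ role: "user", content: "Hello" }],
);
```

Passing the result to `fetch(groqRequest.url, groqRequest.init)` would issue the call. Swapping the base URL is what lets one request shape serve several vendors.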

CerebrasLLM

Cerebras Inference API (OpenAI-compatible) at https://api.cerebras.ai/v1. Default model "gpt-oss-120b" — production tier, ~3000 tok/sec on WSE-3, no deprecation date. Pass model: "llama3.1-8b" for the smaller free-tier alternative. The 404 model_not_found error includes a recovery hint listing other valid IDs. Supports forwarding OpenAI-style sampling kwargs (responseFormat, parallelToolCalls, toolChoice, seed, topP, frequencyPenalty, presencePenalty, stop) and gzip request-body compression (enabled by default) — see Cerebras payload optimization. Failures retry once with exponential backoff and honour x-ratelimit-reset-* advisory headers; terminal errors throw PatterError.

import { CerebrasLLM } from "getpatter";

const llm = new CerebrasLLM();                            // reads CEREBRAS_API_KEY
const llm2 = new CerebrasLLM({
  apiKey: "csk-...",
  model: "gpt-oss-120b",                                  // default
  gzipCompression: true,                                  // defaults to true
  responseFormat: { type: "json_object" },                // OpenAI-style structured outputs
});
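Request-body compression of the kind described above can be approximated with Node's built-in `zlib`. This is a minimal sketch of gzip-encoding a JSON payload and labelling it with `Content-Encoding: gzip`. `gzipJsonBody` is a hypothetical helper, and what `gzipCompression: true` does internally is not shown on this page.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Serialize a payload to JSON, gzip it, and return matching headers.
function gzipJsonBody(payload: unknown): { body: Buffer; headers: Record<string, string> } {
  const body = gzipSync(Buffer.from(JSON.stringify(payload), "utf8"));
  return {
    body,
    headers: { "Content-Type": "application/json", "Content-Encoding": "gzip" },
  };
}

// A long, repetitive system prompt compresses well.
const chatPayload = {
  model: "gpt-oss-120b",
  messages: [{ role: "system", content: "You are a helpful assistant. ".repeat(200) }],
};
const compressed = gzipJsonBody(chatPayload);

// Round trip: decompressing gives back the original JSON.
const restored = JSON.parse(gunzipSync(compressed.body).toString("utf8"));
```

The saving grows with prompt and tool-catalog size, which is where voice agents tend to carry the most repeated text per turn.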

GoogleLLM

Google Gemini via the Developer API (streaming SSE). Default model "gemini-2.5-flash".

import { GoogleLLM } from "getpatter";

const llm = new GoogleLLM();                              // reads GEMINI_API_KEY, falls back to GOOGLE_API_KEY
const llm2 = new GoogleLLM({ apiKey: "AIza...", model: "gemini-2.5-flash" });
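For readers unfamiliar with SSE streaming, the sketch below extracts text deltas from a Gemini-style event stream, where each event is a `data:` line carrying a JSON object with `candidates[0].content.parts[].text`. `parseGeminiSse` is a hypothetical helper that handles complete lines only. A production parser would also need to buffer partial lines across network reads.

```typescript
// Pull the text fragments out of a block of SSE lines.
function parseGeminiSse(raw: string): string[] {
  const deltas: string[] = [];
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = line.slice("data:".length).trim();
    if (!payload) continue;
    const event = JSON.parse(payload);
    for (const part of event.candidates?.[0]?.content?.parts ?? []) {
      if (typeof part.text === "string") deltas.push(part.text);
    }
  }
  return deltas;
}

// Two events separated by blank lines, as SSE frames them.
const sampleSse = [
  'data: {"candidates":[{"content":{"parts":[{"text":"Hel"}]}}]}',
  "",
  'data: {"candidates":[{"content":{"parts":[{"text":"lo"}]}}]}',
  "",
].join("\n");

const sseDeltas = parseGeminiSse(sampleSse);
```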

Custom LLM via `onMessage`

For cases the five built-in providers don’t cover — multi-model routing, local inference, an internal gateway, caching layers — drop llm and plug an async onMessage callback instead:

// npx tsx example.ts
import { Patter, Twilio, DeepgramSTT, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new DeepgramSTT(),
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
});

await phone.serve({
  agent,
  onMessage: async ({ text }) => {
    // Route to any model you like — local inference, a private gateway, etc.
    return `You said: ${text}. How can I help?`;
  },
});
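As one illustration of the routing and caching use cases mentioned above, the sketch below builds a callback that sends short utterances to a fast backend, longer ones to a more careful backend, and caches replies by normalized text. The `{ text }` input and string return follow the example above. `makeRouter`, `Backend`, and the 12-word threshold are assumptions made for the example.

```typescript
// Callback shape inferred from the example above; other fields may exist.
type OnMessage = (input: { text: string }) => Promise<string>;
type Backend = (text: string) => Promise<string>;

// Build an onMessage callback that routes by utterance length and caches replies.
function makeRouter(fast: Backend, careful: Backend): OnMessage {
  const cache = new Map<string, string>();
  return async ({ text }) => {
    const key = text.trim().toLowerCase();
    const cached = cache.get(key);
    if (cached !== undefined) return cached;
    // Arbitrary threshold: longer utterances go to the slower backend.
    const backend = key.split(/\s+/).length > 12 ? careful : fast;
    const reply = await backend(text);
    cache.set(key, reply);
    return reply;
  };
}

// Stub backends that record how often they are called.
const calls = { fast: 0, careful: 0 };
const router = makeRouter(
  async (t) => { calls.fast++; return `fast: ${t}`; },
  async (t) => { calls.careful++; return `careful: ${t}`; },
);
```

The returned function can be passed as `onMessage` to `serve()`, with the stub backends replaced by real model clients.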

onMessage and llm cannot be used together. Combining them raises a clear error at serve() time — pick one.

Advanced: building a custom LLM provider

Four primitives are exported from the package barrel for users who need to plug in a custom LLM or tool dispatcher:

import { LLMChunk, DefaultToolExecutor, LLMLoop, OpenAILLMProvider } from "getpatter";

LLMChunk — the streaming-output type yielded by every LLMProvider.stream(...) implementation. Carries either a partial text delta, a tool-call delta, or a stream-end marker.
DefaultToolExecutor — the default tool dispatcher used by LLMLoop. Constructs from a tools array and resolves both inline handler callables and webhookUrl HTTP tools. Override its hooks to swap in custom error handling, telemetry, or authentication.
OpenAILLMProvider — the parent class shared by OpenAILLM, GroqLLM, CerebrasLLM. Sampling options (temperature, topP, seed, toolChoice, responseFormat, …) live here and are forwarded by every subclass.
LLMLoop — the orchestration loop wiring an LLMProvider, a DefaultToolExecutor, and the streaming output back to TTS.
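How these pieces relate can be shown with a toy version of the loop. Everything below is a local stand-in written for illustration: the `Provider` interface, the `Chunk` payload fields, `ScriptedProvider`, and `runLoop` are assumptions. They are not the signatures of the getpatter exports, which this page does not spell out.

```typescript
// Local stand-ins, NOT the getpatter exports.
type Chunk =
  | { type: "text"; content: string }
  | { type: "tool_call"; name: string; arguments: string }
  | { type: "done" };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolHandler = (args: unknown) => Promise<string>;

interface Provider {
  stream(messages: Message[]): AsyncIterable<Chunk>;
}

// Toy provider: requests a tool first, then answers once a tool result exists.
class ScriptedProvider implements Provider {
  async *stream(messages: Message[]): AsyncGenerator<Chunk> {
    const toolResult = messages.find((m) => m.role === "tool");
    if (!toolResult) {
      yield { type: "tool_call", name: "get_time", arguments: "{}" };
    } else {
      yield { type: "text", content: `It is ${toolResult.content}.` };
    }
    yield { type: "done" };
  }
}

// Stream from the provider, run requested tools, and re-prompt until
// a round finishes without any tool call.
async function runLoop(
  provider: Provider,
  tools: Record<string, ToolHandler>,
  userText: string,
  maxRounds = 3,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userText }];
  let spoken = "";
  for (let round = 0; round < maxRounds; round++) {
    let calledTool = false;
    for await (const chunk of provider.stream(messages)) {
      if (chunk.type === "text") spoken += chunk.content; // would be sent to TTS
      if (chunk.type === "tool_call") {
        const handler = tools[chunk.name];
        const result = handler
          ? await handler(JSON.parse(chunk.arguments))
          : `unknown tool ${chunk.name}`;
        messages.push({ role: "tool", content: result });
        calledTool = true;
      }
    }
    if (!calledTool) break;
  }
  return spoken;
}
```

Calling `runLoop(new ScriptedProvider(), { get_time: async () => "noon" }, "What time is it?")` takes two rounds: one that triggers the tool and one that produces the spoken answer.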

These are stable public symbols mirrored byte-for-byte by the Python SDK.

What’s next

STT

STT providers for pipeline mode.

TTS

TTS providers for pipeline mode.

Tools

Function calling (works across every LLM).

Engines

Speech-to-speech engines (OpenAI Realtime, ElevenLabs ConvAI).
