Agents

An agent configuration defines the personality, capabilities, and behavior of your AI voice assistant. Use phone.agent() to validate your agent config before connecting it to a phone number.

Basic Agent

The simplest form relies on environment-variable fallback for credentials and on the default engine (OpenAIRealtime):
import { Patter, Twilio } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  systemPrompt: "You are a helpful customer support agent for Acme Corp.",
  firstMessage: "Hello! How can I help you today?",
});   // defaults to engine: OpenAIRealtime, reads OPENAI_API_KEY
To pick the engine explicitly (flat imports):
import { Patter, Twilio, OpenAIRealtime } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  engine: new OpenAIRealtime({ voice: "nova" }),
  systemPrompt: "You are a helpful customer support agent for Acme Corp.",
});
Pipeline mode (pick STT, LLM, TTS independently):
import { Patter, Twilio, DeepgramSTT, AnthropicLLM, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

const agent = phone.agent({
  stt: new DeepgramSTT({ endpointingMs: 80 }),           // DEEPGRAM_API_KEY from env
  llm: new AnthropicLLM(),                               // ANTHROPIC_API_KEY from env
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),         // ELEVENLABS_API_KEY from env
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi!",
});
Available LLM providers: OpenAILLM, AnthropicLLM, GroqLLM, CerebrasLLM, GoogleLLM. Tool calling works across all five. See LLM for the full reference. For fully custom logic (multi-model routing, local models), drop llm and pass an onMessage callback to serve() instead — llm and onMessage are mutually exclusive.

AgentOptions

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| systemPrompt | string | Yes | | Instructions that define the agent’s persona and behavior. |
| engine | OpenAIRealtime \| ElevenLabsConvAI | No | OpenAIRealtime | End-to-end engine. See Engines. Omit for pipeline mode. |
| stt | STTProvider | No | | STT instance for pipeline mode (new DeepgramSTT(), new CartesiaSTT(), …). |
| llm | LLMProvider | No | | LLM instance for pipeline mode (new AnthropicLLM(), new GroqLLM(), …). Mutually exclusive with onMessage on serve(). Ignored when engine is set. See LLM. |
| tts | TTSProvider | No | | TTS instance for pipeline mode (new ElevenLabsTTS(), new RimeTTS(), …). |
| voice | string | No | Provider default | Voice ID. Usually inferred from the engine or TTS instance. |
| model | string | No | "gpt-4o-mini-realtime-preview" | Model ID for OpenAI Realtime. Usually inferred from the engine. |
| language | string | No | "en" | BCP-47 language code. |
| firstMessage | string | No | | Greeting spoken when the call connects. |
| tools | ToolDefinition[] | No | | Function calling tools. See Tools. |
| variables | Record<string, string> | No | | Dynamic variables for {placeholder} substitution in systemPrompt. |
| guardrails | Guardrail[] | No | | Output guardrails. See Guardrails. |
| hooks | PipelineHooks | No | | Pipeline hooks for intercepting STT/TTS processing. |
| textTransforms | ((text: string) => string)[] | No | | Text transformation functions for LLM responses. |
| vad | VADProvider | No | | Voice activity detection provider. |
| audioFilter | AudioFilter | No | | Audio preprocessing filter. |
| backgroundAudio | BackgroundAudioPlayer | No | | Background audio player. |
| bargeInThresholdMs | number | No | 300 | Barge-in hang-over window (ms). Set to 0 to disable. |
| aggressiveFirstFlush | boolean | No | false | Opt-in low-latency mode: emits the first clause on soft punctuation (comma, em-dash) once the buffer reaches ≥40 chars. Saves 200–500 ms TTFA. Hard-disabled when language="it" (Italian punctuation patterns are incompatible). |
| disablePhonePreamble | boolean | No | false | When false (default), Patter prepends a phone-friendly preamble to systemPrompt that instructs the LLM to avoid markdown, emojis, bullet lists, and code blocks; spell out numbers and dates; and keep replies short. Set to true to ship systemPrompt verbatim. |
| provider | 'openai_realtime' \| 'elevenlabs_convai' \| 'pipeline' | No | derived | Provider mode. Normally derived from engine / stt + tts. Pass 'pipeline' explicitly when building a pipeline-mode agent without an engine instance. |
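
The aggressiveFirstFlush description can be read as a simple gating rule. The sketch below is one possible interpretation, not Patter's implementation: the function name firstClauseToFlush, its options shape, and the choice to cut at the earliest soft-punctuation mark are all assumptions made for illustration. Only the 40-character minimum, the soft-punctuation set, and the language="it" exclusion come from the table.

```typescript
// Illustrative only: names and cut-point choice are assumptions, not the SDK's API.
const SOFT_PUNCTUATION: string[] = [",", "\u2014"]; // comma and em-dash
const MIN_FLUSH_CHARS = 40;

interface FlushOptionsSketch {
  aggressiveFirstFlush: boolean;
  language: string;
}

// Returns the clause to send to TTS early, or null to keep buffering.
function firstClauseToFlush(buffer: string, opts: FlushOptionsSketch): string | null {
  // Opt-in only, and hard-disabled for Italian.
  if (!opts.aggressiveFirstFlush || opts.language === "it") return null;
  // Wait until the buffer holds at least 40 characters.
  if (buffer.length < MIN_FLUSH_CHARS) return null;

  // Find the earliest soft-punctuation mark (assumed reading of "first clause").
  let cut = -1;
  for (const mark of SOFT_PUNCTUATION) {
    const idx = buffer.indexOf(mark);
    if (idx !== -1 && (cut === -1 || idx < cut)) cut = idx;
  }
  return cut === -1 ? null : buffer.slice(0, cut + 1);
}

const partial = "Thanks for calling Acme Corp support today, how can I help";
const flushed = firstClauseToFlush(partial, { aggressiveFirstFlush: true, language: "en" });
const italian = firstClauseToFlush(partial, { aggressiveFirstFlush: true, language: "it" });
const tooShort = firstClauseToFlush("Hi, there", { aggressiveFirstFlush: true, language: "en" });
```

With the flag off, a short buffer, or language set to "it", the sketch returns null and the text would wait for a normal sentence boundary.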

Validation Rules

The phone.agent() method validates:
  • Engine / pipeline: the config must resolve to exactly one mode, either an engine or an stt + tts pair.
  • Tools: must be an array. Each tool requires a name field and either a webhookUrl or a handler.
  • Variables: must be a plain object (not an array).
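
To make the rules above concrete, here is a minimal, self-contained sketch of equivalent checks. It does not import getpatter, and it is not the SDK's validator: the types ToolSketch and AgentOptionsSketch, the function validateAgentSketch, and the error strings are hypothetical. Treating "engine plus stt/tts" as an error is a strict reading of "exactly one" and may not match the real behavior.

```typescript
// Hypothetical local types for illustration; not imported from getpatter.
interface ToolSketch {
  name?: string;
  webhookUrl?: string;
  handler?: (args: unknown) => unknown;
}

interface AgentOptionsSketch {
  engine?: object;
  stt?: object;
  tts?: object;
  tools?: unknown;
  variables?: unknown;
}

// Returns a list of problems; an empty list means the config passed every check.
function validateAgentSketch(opts: AgentOptionsSketch): string[] {
  const errors: string[] = [];
  const hasStt = opts.stt !== undefined;
  const hasTts = opts.tts !== undefined;

  // Engine / pipeline: a lone stt or tts cannot form a pipeline.
  if (hasStt !== hasTts) errors.push("pipeline mode needs both stt and tts");
  // Assumption: an engine combined with stt/tts is rejected.
  if (opts.engine !== undefined && (hasStt || hasTts)) {
    errors.push("pass either engine or stt + tts, not both");
  }

  // Tools: must be an array; each tool needs a name and a webhookUrl or handler.
  if (opts.tools !== undefined) {
    if (!Array.isArray(opts.tools)) {
      errors.push("tools must be an array");
    } else {
      opts.tools.forEach((tool: ToolSketch, i: number) => {
        if (!tool.name) errors.push(`tools[${i}] is missing a name`);
        if (!tool.webhookUrl && !tool.handler) {
          errors.push(`tools[${i}] needs a webhookUrl or a handler`);
        }
      });
    }
  }

  // Variables: must be a plain object, not an array.
  const v = opts.variables;
  if (v !== undefined && (typeof v !== "object" || v === null || Array.isArray(v))) {
    errors.push("variables must be a plain object");
  }
  return errors;
}

const okConfig = validateAgentSketch({
  tools: [{ name: "lookup_order", webhookUrl: "https://example.com/hook" }],
  variables: { company: "Acme Corp" },
});
const badConfig = validateAgentSketch({ stt: {}, tools: [{ name: "no_target" }], variables: [] });
```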

Dynamic Variables

Use {placeholder} syntax in your system prompt. Variables are replaced at call time:
const agent = phone.agent({
  systemPrompt: "You are a support agent for {company}. The caller's name is {caller_name}.",
  variables: {
    company: "Acme Corp",
    caller_name: "John",
  },
});
Variables can also be overridden per-call when making outbound calls. See Features.
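
As a rough mental model of the substitution, the sketch below replaces each {placeholder} with the matching entry from a variables object and merges per-call overrides on top of the agent defaults. The helper substituteVariables is hypothetical and does not come from the SDK; leaving unknown placeholders untouched is an assumption, since the page does not say how Patter treats them.

```typescript
// Hypothetical helper for illustration; Patter's own substitution may differ.
function substituteVariables(template: string, variables: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string): string => {
    // Only use own properties so names like "constructor" are not picked up.
    const value = Object.prototype.hasOwnProperty.call(variables, key)
      ? variables[key]
      : undefined;
    // Assumption: unknown placeholders are left as written.
    return value ?? match;
  });
}

const template =
  "You are a support agent for {company}. The caller's name is {caller_name}.";

// Agent-level defaults, as in the example above.
const defaults: Record<string, string> = { company: "Acme Corp", caller_name: "John" };
// Per-call overrides win over the defaults.
const perCall: Record<string, string> = { caller_name: "Maria" };

const defaultPrompt = substituteVariables(template, defaults);
const overriddenPrompt = substituteVariables(template, { ...defaults, ...perCall });
```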

System Tools

Two system tools are automatically injected into every agent:
  • transfer_call — Transfers the call to a specified phone number (E.164 format).
  • end_call — Ends the current call with an optional reason.
You do not need to define these in your tools array. The AI model can invoke them based on conversation context.
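
Because transfer_call expects an E.164 number, it can help to check transfer targets in your own code before they reach the model's context. The sketch below is a generic E.164 shape check (a leading +, a non-zero first digit, at most 15 digits in total); the helper name isE164 is hypothetical and is not part of the SDK.

```typescript
// Generic E.164 shape check: "+" followed by 2 to 15 digits, first digit non-zero.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper; not provided by getpatter.
function isE164(phoneNumber: string): boolean {
  return E164_PATTERN.test(phoneNumber);
}

const validTarget = isE164("+15550001234");
const missingPlus = isE164("15550001234");
const withSpaces = isE164("+1 555 000 1234");
```

This checks only the format, not whether the number is assigned or reachable.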

Voice Activity Detection (VAD)

Pipeline-mode agents can plug a VAD provider into the vad option to gate STT around real speech and drive barge-in detection. The SDK ships Silero VAD (an ONNX model, ~1 MB) with a telephony-tuned factory:
import { Patter, Twilio, SileroVAD, DeepgramSTT, AnthropicLLM, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

// Recommended for any phone-call deployment.
const vad = await SileroVAD.forPhoneCall();

const agent = phone.agent({
  stt: new DeepgramSTT(),
  llm: new AnthropicLLM(),
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
  vad,
});
SileroVAD.forPhoneCall(options?) is identical to SileroVAD.load(...) but pins sampleRate to 16 000 Hz — the only sample rate Patter’s pipeline-mode audio bus uses (8 kHz mulaw from Twilio is upsampled to 16 kHz PCM before reaching the VAD). All other parameters use the upstream snakers4/silero-vad defaults:
| Field | Default | Upstream equivalent |
| --- | --- | --- |
| activationThreshold | 0.5 | threshold |
| deactivationThreshold | 0.35 | neg_threshold = threshold − 0.15 |
| minSpeechDuration | 0.25 s | min_speech_duration_ms = 250 |
| minSilenceDuration | 0.1 s | min_silence_duration_ms = 100 |
| prefixPaddingDuration | 0.03 s | speech_pad_ms = 30 |
Override these values per call site rather than changing a global default. A common tweak: if callers are cut off during natural pauses, raise minSilenceDuration to 0.5–1.0 s:
const vad = await SileroVAD.forPhoneCall({ minSilenceDuration: 0.5 });
SileroVAD.forPhoneCall() returns a Promise<SileroVAD>; await it once at process startup, before constructing your agent. The underlying ONNX session is reused across calls.
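
The defaults table maps directly onto the upstream parameter names: thresholds carry over, the deactivation threshold sits 0.15 below the activation threshold, and millisecond values become seconds. The sketch below works through that arithmetic with hypothetical local types (UpstreamSileroParams, VadFieldsSketch) and a hypothetical fromUpstream helper; none of these are exported by getpatter.

```typescript
// Hypothetical shapes for illustration only.
interface UpstreamSileroParams {
  threshold: number;
  min_speech_duration_ms: number;
  min_silence_duration_ms: number;
  speech_pad_ms: number;
}

interface VadFieldsSketch {
  activationThreshold: number;
  deactivationThreshold: number;
  minSpeechDuration: number; // seconds
  minSilenceDuration: number; // seconds
  prefixPaddingDuration: number; // seconds
}

// Round to three decimals to avoid floating-point noise such as 0.35000000000000003.
const round3 = (x: number): number => Math.round(x * 1000) / 1000;

function fromUpstream(p: UpstreamSileroParams): VadFieldsSketch {
  return {
    activationThreshold: p.threshold,
    deactivationThreshold: round3(p.threshold - 0.15), // neg_threshold
    minSpeechDuration: p.min_speech_duration_ms / 1000,
    minSilenceDuration: p.min_silence_duration_ms / 1000,
    prefixPaddingDuration: p.speech_pad_ms / 1000,
  };
}

// Upstream values as listed in the table above.
const vadDefaults = fromUpstream({
  threshold: 0.5,
  min_speech_duration_ms: 250,
  min_silence_duration_ms: 100,
  speech_pad_ms: 30,
});

// Per-call-site override: tolerate longer pauses before ending a speech segment.
const patientVad: VadFieldsSketch = { ...vadDefaults, minSilenceDuration: 0.5 };
```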

Engine vs Pipeline Mode

import { Patter, Twilio, OpenAIRealtime, ElevenLabsConvAI, DeepgramSTT, AnthropicLLM, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });

// OpenAI Realtime: end-to-end engine
const realtimeAgent = phone.agent({
  engine: new OpenAIRealtime(),
  systemPrompt: "...",
});

// ElevenLabs Conversational AI: end-to-end engine with natural voices
const convaiAgent = phone.agent({
  engine: new ElevenLabsConvAI({ agentId: "agent_abc123" }),
  systemPrompt: "...",
});

// Pipeline: pick STT, LLM, and TTS independently
const pipelineAgent = phone.agent({
  stt: new DeepgramSTT(),
  llm: new AnthropicLLM(),
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "...",
});
See LLM for a deeper comparison.