Features

Advanced capabilities available in the Patter TypeScript SDK.

Call Recording

Enable call recording via the Twilio Recordings API. Recordings are stored in your Twilio account.
await phone.serve({
  agent,
  recording: true,
  onCallEnd: async (data) => {
    console.log("Recording available in your Twilio console");
  },
});
Recording is only available with Twilio. Telnyx recording is not yet supported.
The SDK sets up a /webhooks/twilio/recording endpoint to receive recording status callbacks. Recording URLs are logged to the console when they become available.
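For orientation, Twilio sends recording status callbacks as form-encoded POST bodies. The sketch below shows how such a body could be parsed outside the SDK. It is not Patter's handler: `parseFormBody` and `parseRecordingCallback` are hypothetical helpers, and the sample values are made up. The field names (`CallSid`, `RecordingSid`, `RecordingUrl`, `RecordingStatus`) are the ones Twilio documents for these callbacks.

```typescript
// Hypothetical helpers: not part of the Patter SDK.
interface RecordingCallback {
  callSid: string;
  recordingSid: string;
  recordingUrl: string;
  recordingStatus: string;
}

// Decode an application/x-www-form-urlencoded body into key/value pairs.
function parseFormBody(body: string): Map<string, string> {
  const fields = new Map<string, string>();
  const decode = (s: string): string => decodeURIComponent(s.replace(/\+/g, " "));
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    fields.set(decode(key), decode(value));
  }
  return fields;
}

// Returns null when the body lacks the fields a recording callback should carry.
function parseRecordingCallback(body: string): RecordingCallback | null {
  const fields = parseFormBody(body);
  const recordingSid = fields.get("RecordingSid");
  const recordingUrl = fields.get("RecordingUrl");
  if (!recordingSid || !recordingUrl) return null;
  return {
    callSid: fields.get("CallSid") ?? "",
    recordingSid,
    recordingUrl,
    recordingStatus: fields.get("RecordingStatus") ?? "unknown",
  };
}
```

A parser like this is only needed if you route the callback to your own service; with `recording: true` the SDK's endpoint already receives it.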

Answering Machine Detection (AMD)

Detect voicemail systems on outbound calls and optionally leave a message. AMD is available with Twilio only.
await phone.call({
  to: "+15559876543",
  agent,
  machineDetection: true,
  voicemailMessage: "Hi, this is Acme Corp calling about your appointment tomorrow. Please call us back at 555-000-1234.",
});
When AMD detects a machine (after the beep or after silence), the SDK automatically:
  1. Plays the voicemailMessage as TwiML
  2. Hangs up the call
If no voicemailMessage is set, the call proceeds normally even when a machine is detected.
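The decision described above can be sketched as a small pure function. This is an illustration, not the SDK's code: `voicemailTwiml` and `escapeXml` are hypothetical names, and the assumption that the SDK acts on Twilio's `machine_end_beep` and `machine_end_silence` values of `AnsweredBy` is inferred from "after the beep or after silence".

```typescript
// Twilio AnsweredBy values that mean the greeting has finished
// (beep heard or silence detected). Assumed to be the ones that trigger voicemail.
const MACHINE_FINISHED = new Set(["machine_end_beep", "machine_end_silence"]);

// Escape text so it is safe inside a TwiML element.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Returns TwiML that speaks the message and hangs up, or null when the call
// should proceed normally (a human answered, or no message is configured).
function voicemailTwiml(answeredBy: string, voicemailMessage?: string): string | null {
  if (!MACHINE_FINISHED.has(answeredBy) || !voicemailMessage) return null;
  return `<Response><Say>${escapeXml(voicemailMessage)}</Say><Hangup/></Response>`;
}
```

Returning `null` mirrors the documented fallback: without a `voicemailMessage`, a detected machine does not change the call flow.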

DTMF Input

Keypad presses (DTMF tones) during a call are automatically forwarded to the AI agent as text in the format [DTMF: N], where N is the digit pressed (0-9, *, #).
User presses "1" → Agent receives: [DTMF: 1]
User presses "#" → Agent receives: [DTMF: #]
No additional configuration is needed. The AI agent’s system prompt can include instructions on how to handle DTMF input:
const agent = phone.agent({
  systemPrompt: `You are an IVR system. When the user presses:
- 1: Transfer to sales
- 2: Transfer to support
- 0: Transfer to operator
Respond to [DTMF: N] inputs accordingly.`,
});
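If your own code needs to recognise these markers, for example when scanning a transcript, a small parser is enough. The helper below is a sketch with a hypothetical name (`parseDtmf`); it assumes the marker arrives exactly in the `[DTMF: N]` form described above, with one key per marker.

```typescript
// Matches a single keypad marker such as "[DTMF: 1]" or "[DTMF: #]".
const DTMF_PATTERN = /^\[DTMF: ([0-9*#])\]$/;

// Returns the pressed key, or null when the text is not a DTMF marker.
function parseDtmf(text: string): string | null {
  const match = DTMF_PATTERN.exec(text.trim());
  return match?.[1] ?? null;
}
```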

Call Transfer

The transfer_call system tool is automatically available to every agent. The AI model invokes it when the caller asks to speak to a human.
const agent = phone.agent({
  systemPrompt: `You are a front desk assistant. If the caller wants to speak to a human, transfer them to +15559876543.`,
});
When the transfer tool is invoked, the SDK uses the Twilio REST API to redirect the active call to the target number.

Barge-In

Patter supports barge-in — the caller can interrupt the AI agent while it is speaking. The SDK uses mark-based audio tracking to detect when the caller starts speaking during AI playback. When barge-in occurs:
  1. The current TTS audio is immediately stopped
  2. The caller’s speech is processed normally
  3. The AI generates a new response

Configuration

Barge-in is enabled by default with a 300 ms hang-over window. Customize the sensitivity using bargeInThresholdMs:
const agent = phone.agent({
  systemPrompt: "...",
  bargeInThresholdMs: 0, // Disable barge-in (exact interruption)
});
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bargeInThresholdMs | number | 300 | Hang-over window in milliseconds. Set to 0 to disable barge-in. Higher values delay interruption detection. |
A hang-over window of 300 ms prevents false positives from background noise while remaining responsive to genuine interruptions.
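One plausible reading of the hang-over window is that caller speech must persist for the whole window before playback is interrupted. The sketch below models that reading only; `BargeInGate` is a hypothetical class, and the SDK's mark-based tracking may work differently.

```typescript
// Hypothetical model of a hang-over window: not the SDK's implementation.
class BargeInGate {
  private readonly thresholdMs: number;
  private speechStartedAt: number | null = null;

  constructor(thresholdMs: number) {
    this.thresholdMs = thresholdMs;
  }

  // Call once per audio frame. Returns true when agent playback should stop.
  update(nowMs: number, callerSpeaking: boolean, agentSpeaking: boolean): boolean {
    // Per the table above, 0 disables barge-in entirely.
    if (this.thresholdMs === 0) return false;
    if (!callerSpeaking || !agentSpeaking) {
      this.speechStartedAt = null; // a gap resets the window
      return false;
    }
    if (this.speechStartedAt === null) this.speechStartedAt = nowMs;
    return nowMs - this.speechStartedAt >= this.thresholdMs;
  }
}
```

Under this model a 100 ms noise burst never interrupts the agent at the default of 300 ms, while sustained speech does so 300 ms after it starts.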

Echo Cancellation (NLMS AEC)

On speakerphone or dev-tunnel deployments, the agent’s outbound TTS bleeds back into the inbound mic feed. The pipeline-mode VAD then sees continuous voice-like energy and never registers silence, so barge-in only fires during natural pauses in the TTS. The symptom is intermittent: sometimes an interruption works, other times the agent keeps talking.
Acoustic echo cancellation (AEC) subtracts the estimated echo from the mic stream before VAD/STT see it. Patter ships a built-in NLMS (normalised least-mean-squares) adaptive filter with Geigel double-talk detection. Enable it with one flag (pipeline mode only):
import { DeepgramSTT, AnthropicLLM, ElevenLabsTTS } from "getpatter";

const agent = phone.agent({
  stt: new DeepgramSTT(),
  llm: new AnthropicLLM(),
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
  echoCancellation: true,
});
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| echoCancellation | boolean | false | When true (pipeline mode only), instantiates an NlmsEchoCanceller per call that subtracts the agent’s own TTS bleed from the inbound mic stream before VAD/STT see it. |

When to enable

  • Enable for speakerphone callers, ngrok / Cloudflare tunnel demos, laptop-mic test harnesses, and any deployment where the agent can hear itself.
  • Leave off for handset / headset callers — there is no bleed to cancel, and the 0.5–2 s convergence period would briefly attenuate caller speech if they spoke before any TTS played.
  • See Barge-In above — AEC is the fix when barge-in only fires intermittently because of self-bleed.

Tuning

The default NlmsEchoCanceller is tuned for mono 16 kHz PCM (the format Patter’s pipeline pushes between transcoding and STT). For lower-level control (custom tap counts, step size, warmup behaviour), instantiate one directly:
import { NlmsEchoCanceller } from "getpatter/audio/aec";

// 8 kHz callers benefit from a longer filter window
const aec = new NlmsEchoCanceller({ sampleRate: 8000, filterTaps: 1024 });
| Constructor option | Default | Notes |
| --- | --- | --- |
| sampleRate | 16000 | 8000 or 16000 only. |
| filterTaps | 512 | 32 ms @ 16 kHz, which covers typical cellular / VoIP echo paths. |
| stepSize | 0.1 | NLMS step in (0, 1] post-warmup. |
| warmupStepSize | 0.5 | Aggressive 5× ramp during the first ~0.5 s for fast convergence. |
| warmupSeconds | 0.5 | Duration of the warmup phase. |
| leakage | 0.9999 | Slow forgetting of stale tap estimates. |
| doubleTalkRho | 0.6 | Geigel threshold; freezes adaptation when the caller speaks over the agent. |
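The window a given filterTaps value covers follows directly from the sample rate. The helper below (a hypothetical name, `filterWindowMs`) shows the arithmetic behind the 32 ms figure and what the 8 kHz example above works out to.

```typescript
// Echo-path window covered by an FIR filter: taps / sampleRate, in milliseconds.
function filterWindowMs(filterTaps: number, sampleRate: number): number {
  return (filterTaps * 1000) / sampleRate;
}

const defaultWindowMs = filterWindowMs(512, 16000);     // 32
const narrowbandWindowMs = filterWindowMs(1024, 8000);  // 128
```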
NLMS AEC adds CPU work proportional to filterTaps × frameSamples per inbound frame (~0.5–1 ms per 20 ms frame at the defaults). On commodity CPUs this is well under the per-frame budget, but profile if you stack AEC with heavy VAD + STT in the same event loop.
This is a lightweight time-domain AEC, not a drop-in replacement for production-grade DSP (WebRTC’s AEC3, Speex AEC). For tight integration with battle-tested DSP, wrap a binding externally and feed it via audioFilter instead.
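To make "time-domain NLMS with Geigel double-talk detection" concrete, here is a minimal per-sample sketch. It is not Patter's NlmsEchoCanceller: `TinyNlms` is a hypothetical class that omits warmup and leakage, and it only illustrates the general technique under the assumption that the echo reaching the mic is quieter than the far-end signal.

```typescript
// Illustrative only: a bare-bones NLMS canceller, not Patter's NlmsEchoCanceller.
class TinyNlms {
  private readonly weights: Float64Array; // current estimate of the echo path
  private readonly history: Float64Array; // recent far-end (TTS) samples, newest first
  private readonly stepSize: number;
  private readonly rho: number;

  constructor(taps: number, stepSize = 0.1, rho = 0.6) {
    this.weights = new Float64Array(taps);
    this.history = new Float64Array(taps);
    this.stepSize = stepSize;
    this.rho = rho;
  }

  // Takes one far-end sample and one mic sample; returns the echo-reduced mic sample.
  process(farEnd: number, mic: number): number {
    // Shift the far-end history right by one and store the newest sample first.
    this.history.copyWithin(1, 0);
    this.history[0] = farEnd;

    let echoEstimate = 0;
    let energy = 0;
    let peak = 0;
    for (let i = 0; i < this.weights.length; i++) {
      const x = this.history[i];
      echoEstimate += this.weights[i] * x;
      energy += x * x;
      peak = Math.max(peak, Math.abs(x));
    }
    const error = mic - echoEstimate;

    // Geigel test: a mic sample louder than rho * (far-end peak) is treated as
    // near-end speech, so adaptation is frozen for this sample.
    const doubleTalk = Math.abs(mic) > this.rho * peak;
    if (!doubleTalk && energy > 0) {
      // Normalised LMS update: step scaled by the far-end energy in the window.
      const gain = (this.stepSize * error) / (energy + 1e-9);
      for (let i = 0; i < this.weights.length; i++) {
        this.weights[i] += gain * this.history[i];
      }
    }
    return error;
  }
}
```

The Geigel check explains why a threshold such as doubleTalkRho matters: echo that arrives louder than rho times the far-end peak would be mistaken for the caller, and the filter would stop adapting.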

Aggressive First-Flush (Low-Latency)

In pipeline mode, the sentence chunker normally waits for a hard sentence terminator (., !, ?, etc.) before emitting a chunk to TTS. With aggressiveFirstFlush: true on phone.agent({ ... }), the chunker emits the first clause of each response on a soft punctuation boundary (comma, em dash, or en dash) once the buffer reaches ~40 characters.
import { DeepgramSTT, AnthropicLLM, ElevenLabsTTS } from "getpatter";

const agent = phone.agent({
  stt: new DeepgramSTT(),
  llm: new AnthropicLLM(),
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),
  systemPrompt: "You are a helpful assistant.",
  aggressiveFirstFlush: true,
});
Trade-off: Saves 200–500 ms of time-to-first-audio (TTFA) on the first sentence of each turn, at the cost of slightly clipped prosody on the very first chunk.
aggressiveFirstFlush is hard-disabled when language starts with "it" (Italian). Italian uses the comma as a decimal separator (12,5), so an aggressive flush would split mid-number. The flag silently has no effect for Italian agents.
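The flush rule can be sketched as a function over the text buffered so far. This is one interpretation, not the SDK's chunker: `takeFirstChunk` is a hypothetical name, and cutting at the last soft boundary once the buffer holds 40 characters is an assumption about details the description leaves open.

```typescript
const MIN_SOFT_FLUSH_CHARS = 40; // "~40 characters" from the description above

interface ChunkSplit {
  chunk: string; // text ready to send to TTS
  rest: string;  // text that stays in the buffer
}

// Returns the first chunk that may be emitted, or null if the buffer must keep growing.
function takeFirstChunk(
  buffer: string,
  aggressiveFirstFlush: boolean,
  language = "en",
): ChunkSplit | null {
  // Normal behaviour: cut just after the first hard sentence terminator.
  const hard = buffer.search(/[.!?]/);
  let cut = hard >= 0 ? hard + 1 : -1;

  // Aggressive behaviour: fall back to a soft boundary (comma, em dash, en dash).
  // Disabled for Italian, where the comma is a decimal separator.
  const softAllowed = aggressiveFirstFlush && !language.toLowerCase().startsWith("it");
  if (cut < 0 && softAllowed && buffer.length >= MIN_SOFT_FLUSH_CHARS) {
    const soft = Math.max(
      buffer.lastIndexOf(","),
      buffer.lastIndexOf("\u2014"),
      buffer.lastIndexOf("\u2013"),
    );
    if (soft >= 0) cut = soft + 1;
  }

  if (cut < 0) return null;
  return { chunk: buffer.slice(0, cut).trim(), rest: buffer.slice(cut) };
}
```

With the flag off, a buffer such as "Sure, I can help you with that booking, let me" yields nothing until a terminator arrives; with it on, everything up to the second comma can go to TTS immediately.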

Phone Preamble (System Prompt Wrapper)

By default, Patter prepends a phone-friendly preamble to every agent’s systemPrompt before sending it to the LLM. The preamble instructs the model to:
  • Avoid markdown, emojis, bullet lists, and code blocks.
  • Spell out numbers and dates (e.g., “two thousand twenty-six”, not 2026).
  • Keep replies short — phone calls reward brevity over completeness.
Most callers benefit from this. If you ship a custom prompt that already encodes phone conventions — or you want to drive a non-voice LLM channel through the same agent — opt out:
const agent = phone.agent({
  systemPrompt: "...",  // shipped to the LLM verbatim
  disablePhonePreamble: true,
});
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| disablePhonePreamble | boolean | false | When true, ship systemPrompt verbatim to the LLM. When false (default), prepend the phone-friendly preamble. |

Dynamic Variables

Use {placeholder} syntax in system prompts for per-call customization:
const agent = phone.agent({
  systemPrompt: "You are a support agent for {company}. The caller is {caller_name} with account {account_id}.",
  variables: {
    company: "Acme Corp",
    caller_name: "Default",
    account_id: "unknown",
  },
});

Per-Call Variable Override

Override agent-level variables for individual outbound calls:
await phone.call({
  to: "+15559876543",
  agent,
  variables: {
    caller_name: "Jane Smith",
    account_id: "ACC-12345",
  },
});
Call-level variables are merged with agent-level variables, with call-level taking precedence.
Variables are sanitized before substitution. Keys like __proto__, constructor, and prototype are stripped to prevent prototype pollution.
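The merge, sanitization, and substitution steps can be sketched together. This is an illustration rather than the SDK's code: `renderPrompt` is a hypothetical helper, and leaving unknown placeholders untouched is an assumption.

```typescript
// Keys that must never be copied from user-supplied objects.
const BLOCKED_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Merge agent-level and call-level variables (call-level wins), drop blocked
// keys, then substitute {placeholder} tokens in the template.
function renderPrompt(
  template: string,
  agentVars: Record<string, string>,
  callVars: Record<string, string> = {},
): string {
  const merged = new Map<string, string>();
  for (const source of [agentVars, callVars]) {
    for (const [key, value] of Object.entries(source)) {
      if (!BLOCKED_KEYS.has(key)) merged.set(key, value);
    }
  }
  // Unknown placeholders are left as-is (assumed behaviour).
  return template.replace(
    /\{(\w+)\}/g,
    (placeholder: string, key: string) => merged.get(key) ?? placeholder,
  );
}
```

Using a Map for the merged values sidesteps prototype lookups entirely, which is one simple way to get the safety property the note above describes.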

Conversation History

Every call maintains a conversation history that accumulates throughout the call. The history is:
  • Passed to onMessage callbacks as data.history
  • Included in onCallEnd as data.transcript
  • Capped at 200 entries per call (oldest entries are dropped when the limit is reached)
Each entry contains:
{
  role: string;       // "user" or "assistant"
  text: string;       // Transcript text
  timestamp: number;  // Unix timestamp (ms)
}
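A capped history of this shape can be modelled in a few lines. The sketch below is illustrative only; `appendEntry` and `MAX_HISTORY_ENTRIES` are hypothetical names, and the SDK may implement the cap differently.

```typescript
interface HistoryEntry {
  role: string;      // "user" or "assistant"
  text: string;      // Transcript text
  timestamp: number; // Unix timestamp (ms)
}

const MAX_HISTORY_ENTRIES = 200;

// Append an entry and drop the oldest ones once the cap is exceeded.
function appendEntry(
  history: HistoryEntry[],
  entry: HistoryEntry,
  limit = MAX_HISTORY_ENTRIES,
): void {
  history.push(entry);
  if (history.length > limit) {
    history.splice(0, history.length - limit);
  }
}
```

One consequence of the cap: on very long calls, data.transcript in onCallEnd holds only the most recent 200 entries, so persist entries from onMessage if you need the full conversation.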

AI Disclosure

Patter does not automatically play an AI disclosure message. If your jurisdiction requires callers to be informed they are speaking with an AI, include a disclosure in your agent’s firstMessage:
const agent = phone.agent({
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi, this is an AI assistant from Acme Corp. How can I help you today?",
});
The firstMessage is spoken as soon as the call connects, before the caller says anything. This is the recommended place for any legally required AI disclosure.

Max Call Duration

As a safety measure, calls are automatically terminated after 1 hour (60 minutes). This prevents runaway billing from calls that are accidentally left open. When the limit is reached:
  1. The SDK logs a warning: Call {callId} hit max duration (60min), terminating
  2. The call is hung up via the telephony provider API
This limit is not configurable and applies to all calls.

Outbound Calls

Make outbound calls in local mode:
// Twilio
await phone.call({
  to: "+15559876543",
  agent,
});

// With AMD and voicemail
await phone.call({
  to: "+15559876543",
  agent,
  machineDetection: true,
  voicemailMessage: "Please call us back.",
});

LocalCallOptions

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| to | string | Yes | | Destination phone number (E.164 format). |
| agent | AgentOptions | Yes | | Agent configuration for the call. |
| machineDetection | boolean | No | false | Enable AMD (Twilio only). |
| voicemailMessage | string | No | | Message to leave on voicemail. Requires machineDetection: true. |
| variables | Record<string, string> | No | | Per-call variable overrides merged into agent.variables. |
The to parameter must be in E.164 format (e.g., +15559876543). The SDK validates this and throws if the format is invalid.
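To catch bad numbers before calling phone.call, you can pre-validate with the commonly used E.164 pattern: a plus sign, a non-zero leading digit, and at most 15 digits in total. The helper below is a sketch with a hypothetical name (`isE164`); the SDK's own check may be stricter or looser.

```typescript
// "+" followed by 2 to 15 digits, the first of which is not 0.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(phoneNumber: string): boolean {
  return E164_PATTERN.test(phoneNumber);
}
```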