Metrics & Cost Tracking
Patter automatically tracks cost and latency for every call, broken down by provider component (STT, TTS, LLM, telephony).
How It Works
Metrics are collected automatically during calls. When a call ends, the onCallEnd callback receives a CallMetrics object with the full breakdown:
await phone.serve({
  agent,
  port: 8000,
  onCallEnd: async (event) => {
    const metrics = event.metrics;
    if (metrics) {
      console.log(`Duration: ${metrics.duration_seconds}s`);
      console.log(`Total cost: $${metrics.cost.total.toFixed(4)}`);
      console.log(` STT: $${metrics.cost.stt.toFixed(4)}`);
      console.log(` TTS: $${metrics.cost.tts.toFixed(4)}`);
      console.log(` LLM: $${metrics.cost.llm.toFixed(4)}`);
      console.log(` Telephony: $${metrics.cost.telephony.toFixed(4)}`);
      console.log(`Avg latency: ${metrics.latency_avg.total_ms}ms`);
      console.log(`P95 latency: ${metrics.latency_p95.total_ms}ms`);
    }
  },
});
Cost Breakdown
The CostBreakdown object provides per-component costs in USD:
| Field | Description |
|---|---|
| stt | Speech-to-text cost (Deepgram, Whisper). |
| tts | Text-to-speech cost (ElevenLabs, OpenAI TTS). |
| llm | LLM cost (OpenAI Realtime tokens). |
| telephony | Telephony cost (Twilio, Telnyx per-minute). |
| total | Sum of all components. |
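To see which component dominates a call's bill, the breakdown can be turned into percentage shares. The sketch below uses a local interface that mirrors the fields in the table; the `costShares` helper and the sample numbers are illustrative assumptions, not part of the SDK.

```typescript
// Local mirror of the documented CostBreakdown fields (all values in USD).
interface CostBreakdownSketch {
  stt: number;
  tts: number;
  llm: number;
  telephony: number;
  total: number;
}

type CostComponent = "stt" | "tts" | "llm" | "telephony";

// Hypothetical helper: each component's share of the total, as a percentage.
function costShares(cost: CostBreakdownSketch): Record<CostComponent, number> {
  const components: CostComponent[] = ["stt", "tts", "llm", "telephony"];
  const result: Record<CostComponent, number> = { stt: 0, tts: 0, llm: 0, telephony: 0 };
  if (cost.total <= 0) return result; // avoid dividing by zero on empty calls
  for (const key of components) {
    result[key] = (cost[key] / cost.total) * 100;
  }
  return result;
}

// Illustrative numbers only, not real billing data.
const sampleCost: CostBreakdownSketch = { stt: 0.01, tts: 0.03, llm: 0.05, telephony: 0.01, total: 0.1 };
const sampleShares = costShares(sampleCost);
console.log(`LLM share: ${sampleShares.llm.toFixed(1)}%`);
```

In an onCallEnd handler, the same helper could be applied to event.metrics.cost, assuming the object has the shape shown in the table.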
Latency Breakdown
The LatencyBreakdown object provides per-component latency in milliseconds:
| Field | Description |
|---|---|
| stt_ms | Time from user speech to transcript. |
| endpoint_ms | Time the endpointer waited after the last word before declaring end-of-utterance. |
| llm_ttft_ms | Time from end-of-utterance to the first LLM token. |
| llm_total_ms | Time from end-of-utterance to the last LLM token (full response). |
| llm_ms | Alias for llm_ttft_ms (kept for back-compat). |
| tts_ms | Time from first LLM token to first TTS audio byte. |
| tts_total_ms | Time from first LLM token to last TTS audio byte. |
| bargein_ms | Time from caller voice detected to TTS playback cancelled (only set on barge-in turns). |
| total_ms | End-to-end latency (user speech to first audio). |
CallMetrics exposes the full distribution: latency_avg, latency_p50 (median / typical UX), latency_p90 (steady-state outliers), latency_p95 (SLA), and latency_p99 (cold-start outliers).
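To illustrate why the percentiles matter alongside the average, here is a minimal sketch that computes them from a list of per-turn total_ms values. It uses the nearest-rank definition as an assumption; this page does not state which percentile method Patter uses internally, and the sample latencies are invented.

```typescript
// Nearest-rank percentile over a list of latencies in milliseconds.
// Assumption: one common definition, not necessarily the one the SDK uses.
function percentileMs(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Ten hypothetical turns; one cold-start outlier at 1900 ms.
const turnTotalsMs = [420, 380, 510, 450, 1900, 400, 430, 470, 390, 440];

const latencySummary = {
  avg: turnTotalsMs.reduce((sum, v) => sum + v, 0) / turnTotalsMs.length,
  p50: percentileMs(turnTotalsMs, 50),
  p95: percentileMs(turnTotalsMs, 95),
};
console.log(latencySummary);
```

With these numbers the single slow turn pulls the average well above the median, which is why the median is the better proxy for typical user experience.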
Per-Turn Metrics
Each conversation turn is tracked individually:
await phone.serve({
  agent,
  port: 8000,
  onCallEnd: async (event) => {
    const metrics = event.metrics;
    if (metrics) {
      for (const turn of metrics.turns) {
        console.log(`Turn ${turn.turn_index}:`);
        console.log(` User: ${turn.user_text}`);
        console.log(` Agent: ${turn.agent_text}`);
        console.log(` Latency: ${turn.latency.total_ms}ms`);
      }
    }
  },
});
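Per-turn data is most useful for finding the turns that felt slow. The following sketch filters turns above a latency threshold and sorts them slowest first; the `TurnLike` type is a local subset of the documented turn fields, and `slowTurns` is a hypothetical helper.

```typescript
// Local subset of the per-turn fields used in the example above.
interface TurnLike {
  turn_index: number;
  user_text: string;
  agent_text: string;
  latency: { total_ms: number };
}

// Hypothetical helper: turns whose end-to-end latency exceeds a threshold, slowest first.
function slowTurns(turns: TurnLike[], thresholdMs: number): TurnLike[] {
  return turns
    .filter((t) => t.latency.total_ms > thresholdMs)
    .sort((a, b) => b.latency.total_ms - a.latency.total_ms);
}

// Invented sample turns for illustration.
const sampleTurns: TurnLike[] = [
  { turn_index: 0, user_text: "Hi", agent_text: "Hello!", latency: { total_ms: 1400 } },
  { turn_index: 1, user_text: "Opening hours?", agent_text: "9 to 5.", latency: { total_ms: 450 } },
  { turn_index: 2, user_text: "Book a table", agent_text: "Sure.", latency: { total_ms: 900 } },
];

for (const t of slowTurns(sampleTurns, 800)) {
  console.log(`Turn ${t.turn_index} took ${t.latency.total_ms}ms`);
}
```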
Custom Pricing
Override default provider pricing estimates:
await phone.serve({
  agent,
  port: 8000,
  pricing: {
    deepgram: { price: 0.005 },  // Override STT price per minute
    elevenlabs: { price: 0.15 }, // Override TTS price per 1k chars
    twilio: { price: 0.015 },    // Override telephony price per minute
  },
});
PricingUnit
The pricing tables expose a PricingUnit constant so overrides don’t depend on raw strings:
import { PricingUnit } from "getpatter";
PricingUnit.MINUTE; // "minute" — per minute of audio (STT, telephony)
PricingUnit.THOUSAND_CHARS; // "1k_chars" — per thousand characters synthesised (TTS)
PricingUnit.TOKEN; // "token" — per token (LLM / Realtime)
PricingUnit ships as a const object plus a value-union type, so it is tree-shakeable. The Python PricingUnit StrEnum mirrors its values byte-for-byte.
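For readers unfamiliar with the "const object plus value-union type" pattern, here is a minimal stand-alone sketch of it. The name `PricingUnitSketch` and the `describeUnit` helper are illustrative; only the three string values are taken from this page.

```typescript
// Illustrative stand-in for the exported constant; values copied from this page.
const PricingUnitSketch = {
  MINUTE: "minute",
  THOUSAND_CHARS: "1k_chars",
  TOKEN: "token",
} as const;

// Value-union type derived from the object: "minute" | "1k_chars" | "token".
type PricingUnitSketch = (typeof PricingUnitSketch)[keyof typeof PricingUnitSketch];

// Hypothetical consumer: the compiler checks that every unit is handled.
function describeUnit(unit: PricingUnitSketch): string {
  switch (unit) {
    case PricingUnitSketch.MINUTE:
      return "per minute of audio";
    case PricingUnitSketch.THOUSAND_CHARS:
      return "per thousand characters";
    case PricingUnitSketch.TOKEN:
      return "per token";
  }
}

console.log(describeUnit(PricingUnitSketch.THOUSAND_CHARS));
```

Unlike a TypeScript enum, a plain const object compiles to an ordinary object literal, which is the property that bundlers rely on when removing unused exports.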
Model-Aware Pricing
Patter’s pricing tables are model-aware: every entry in DEFAULT_PRICING carries provider-level defaults plus an optional models map keyed by model identifier. When the agent’s adapter exposes a model field, the metrics layer passes it to the cost-calc functions, so the dashboard bills at the model-specific rate out of the box, with no manual override required.
import { PRICING_VERSION, PRICING_LAST_UPDATED } from "getpatter";
PRICING_VERSION; // "2026.3"
PRICING_LAST_UPDATED; // "2026-05-08"
How resolution works
The cost-calc helpers (calculateSttCost, calculateTtsCost, calculateRealtimeCost, calculateRealtimeCachedSavings) accept an optional final model parameter. The exported resolveProviderRates(config, model) helper merges per-model overrides on top of provider defaults using:
- Exact match in the provider’s models map.
- Longest-prefix match: gpt-realtime-2-2026-05-08 resolves against gpt-realtime-2.
- Provider defaults: fallback when the model is unknown or omitted.
CallMetricsAccumulator auto-tracks sttModel, ttsModel, and realtimeModel from the agent’s adapter model field (agent.stt.model, agent.tts.model, agent.model for Realtime). On every recordRealtimeUsage(usage) call the realtime model is also pulled from the response.done payload itself, overriding the call-level default — so mid-call model switches are billed correctly.
The optional model argument defaults to undefined, which preserves the legacy provider-rate behaviour. Existing callers compile and run unchanged.
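The three-step resolution order described above can be sketched as follows. This is a simplified re-implementation for illustration, not the SDK's source: the `RatesSketch` and `ProviderConfigSketch` types and the `resolveRatesSketch` function are hypothetical, and each entry carries only a single `price` field.

```typescript
interface RatesSketch {
  price: number;
}

interface ProviderConfigSketch extends RatesSketch {
  models?: Record<string, RatesSketch>;
}

// Hypothetical resolver: exact match, then longest-prefix match, then provider defaults.
function resolveRatesSketch(config: ProviderConfigSketch, model?: string): RatesSketch {
  const defaults: RatesSketch = { price: config.price };
  if (model === undefined || config.models === undefined) return defaults;
  const models = config.models;

  // 1. Exact match in the models map.
  if (Object.prototype.hasOwnProperty.call(models, model)) {
    return { ...defaults, ...models[model] };
  }

  // 2. Longest key that is a prefix of the requested model.
  let bestKey: string | undefined;
  for (const key of Object.keys(models)) {
    if (model.startsWith(key) && (bestKey === undefined || key.length > bestKey.length)) {
      bestKey = key;
    }
  }

  // 3. Fall back to provider defaults when nothing matched.
  return bestKey === undefined ? defaults : { ...defaults, ...models[bestKey] };
}

// Rates taken from the Deepgram rows on this page.
const deepgramSketch: ProviderConfigSketch = {
  price: 0.0077,
  models: { "nova-3": { price: 0.0077 }, "nova-2": { price: 0.0058 }, nova: { price: 0.0043 } },
};

console.log(resolveRatesSketch(deepgramSketch, "nova-2-general").price);
```

Both "nova" and "nova-2" are prefixes of "nova-2-general"; choosing the longest one is what keeps a dated or suffixed model identifier on the rate of its closest parent.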
Example A — Just select a model
The most common case: pick a model on your adapter, and Patter bills the right rate automatically.
import { Patter, Twilio, OpenAIRealtimeAdapter, OpenAIRealtimeModel } from "getpatter";
const agent = Patter.agent({
  systemPrompt: "You are a helpful assistant.",
  realtime: new OpenAIRealtimeAdapter({ model: OpenAIRealtimeModel.GPT_REALTIME_2 }),
});
const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });
// Billing auto-uses the gpt-realtime-2 rate ($32/M audio in, $64/M audio out).
Example B — Override one model, keep siblings intact
mergePricing overlays the nested models map shallowly. Overriding a single model leaves the other rates inside the same provider untouched.
const phone = new Patter({
  carrier: new Twilio(),
  phoneNumber: "+15550001234",
  pricing: {
    // Negotiated a discount on Nova-2 only; Nova-3 / Whisper rates stay default.
    deepgram: { models: { "nova-2": { price: 0.004 } } },
  },
});
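The overlay behaviour can be pictured with a small stand-alone merge function. This is a sketch under assumptions: `mergePricingSketch` and its types are hypothetical, it handles only a `price` field per entry, and it ignores providers that appear in the overrides but not in the defaults.

```typescript
interface ProviderPricingSketch {
  price: number;
  models?: Record<string, { price: number }>;
}

type PricingTableSketch = Record<string, ProviderPricingSketch>;
type PricingOverridesSketch = Record<string, Partial<ProviderPricingSketch>>;

// Hypothetical merge: provider fields are overlaid, and the nested models map is
// overlaid key by key so that untouched sibling models keep their default rates.
function mergePricingSketch(
  defaults: PricingTableSketch,
  overrides: PricingOverridesSketch,
): PricingTableSketch {
  const merged: PricingTableSketch = {};
  for (const [provider, base] of Object.entries(defaults)) {
    const over = overrides[provider];
    merged[provider] = over
      ? { ...base, ...over, models: { ...base.models, ...over.models } }
      : base;
  }
  return merged;
}

// Default rates copied from the Deepgram rows on this page.
const defaultTable: PricingTableSketch = {
  deepgram: { price: 0.0077, models: { "nova-3": { price: 0.0077 }, "nova-2": { price: 0.0058 } } },
};

const mergedTable = mergePricingSketch(defaultTable, {
  deepgram: { models: { "nova-2": { price: 0.004 } } },
});
console.log(mergedTable["deepgram"].models);
```

Only the nova-2 entry changes; nova-3 and the provider-level price carry over from the defaults.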
Example C — Register a brand-new model rate
Add a model that isn’t in the built-in table without touching SDK source.
const phone = new Patter({
  carrier: new Twilio(),
  phoneNumber: "+15550001234",
  pricing: {
    elevenlabs: {
      models: { my_custom_voice: { price: 0.075 } },
    },
  },
});
// When agent.tts.model === "my_custom_voice", calculateTtsCost picks up $0.075/1k.
Default Pricing (2026.3)
Provider-level defaults are listed below. Per-model rates live under DEFAULT_PRICING[provider].models and are auto-resolved when the adapter exposes its model identifier.
| Provider | Unit | Default Price (default model) |
|---|---|---|
| Deepgram (nova-3 streaming mono) | per minute | $0.0077 |
| OpenAI Whisper (whisper-1) | per minute | $0.006 |
| OpenAI Transcribe (gpt-4o-transcribe) | per minute | $0.006 |
| AssemblyAI | per minute | $0.0025 |
| Cartesia STT (ink-whisper) | per minute | $0.0025 |
| Soniox | per minute | $0.002 |
| Speechmatics (Pro) | per minute | $0.004 |
| ElevenLabs (eleven_flash_v2_5) | per 1k chars | $0.06 |
| OpenAI TTS (tts-1) | per 1k chars | $0.015 |
| Cartesia TTS (sonic-2) | per 1k chars | $0.030 |
| Rime (mistv2) | per 1k chars | $0.030 |
| LMNT (aurora) | per 1k chars | $0.050 |
| Inworld (inworld-tts-2) | per 1k chars | $0.020 |
| OpenAI Realtime (gpt-realtime-mini / gpt-4o-mini-realtime-preview) | per token | $10/M audio in · $20/M audio out · $0.60/M text in · $2.40/M text out (cached: $0.30/M audio · $0.06/M text) |
| Twilio (US inbound local) | per minute | $0.0085 (rounded up to whole minute, per Twilio) |
| Telnyx | per minute | $0.007 |
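As a worked example of how the units in this table turn into dollars, the sketch below prices a hypothetical 95-second call. The three helper functions are illustrative, not SDK exports, and the assumption that STT is billed on fractional minutes is mine; only the Twilio whole-minute rounding is stated in the table.

```typescript
// Per-minute STT cost. Assumption: fractional minutes are billed proportionally.
function sttCostUsd(audioSeconds: number, pricePerMinute: number): number {
  return (audioSeconds / 60) * pricePerMinute;
}

// Per-1k-character TTS cost.
function ttsCostUsd(characters: number, pricePer1kChars: number): number {
  return (characters / 1000) * pricePer1kChars;
}

// Per-minute telephony cost, optionally rounded up to whole minutes
// (the table notes this rounding for Twilio).
function telephonyCostUsd(durationSeconds: number, pricePerMinute: number, roundUp: boolean): number {
  const minutes = roundUp ? Math.ceil(durationSeconds / 60) : durationSeconds / 60;
  return minutes * pricePerMinute;
}

// Hypothetical call: 95 seconds of audio, 600 synthesised characters.
const sttUsd = sttCostUsd(95, 0.0077);                 // Deepgram nova-3 default
const ttsUsd = ttsCostUsd(600, 0.06);                  // ElevenLabs eleven_flash_v2_5 default
const twilioUsd = telephonyCostUsd(95, 0.0085, true);  // billed as 2 whole minutes
console.log(`Estimated total: $${(sttUsd + ttsUsd + twilioUsd).toFixed(4)}`);
```

The rounding matters for short calls: a 95-second call is billed by Twilio as two minutes, so telephony costs the same as a 120-second call.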
STT — per-model rates
| Provider | Model | Price |
|---|---|---|
| Deepgram | nova-3 (default) | $0.0077/min |
| Deepgram | nova-3-multilingual | $0.0092/min |
| Deepgram | nova-2 | $0.0058/min |
| Deepgram | nova | $0.0043/min |
| Deepgram | whisper-large / whisper-medium | $0.0048/min |
| OpenAI Whisper | whisper-1 (default) | $0.006/min |
| OpenAI Whisper | gpt-4o-transcribe | $0.006/min |
| OpenAI Whisper | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Whisper | gpt-realtime-whisper | $0.017/min |
| OpenAI Transcribe (openai_transcribe) | gpt-4o-transcribe (default) | $0.006/min |
| OpenAI Transcribe | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Transcribe | whisper-1 | $0.006/min |
TTS — per-model rates
| Provider | Model | Price |
|---|---|---|
| ElevenLabs (REST + WebSocket) | eleven_flash_v2_5 (default) | $0.06/1k |
| ElevenLabs | eleven_turbo_v2_5 | $0.05/1k |
| ElevenLabs | eleven_multilingual_v2 / eleven_monolingual_v1 | $0.18/1k |
| ElevenLabs | eleven_v3 | $0.30/1k |
| OpenAI TTS | tts-1 (default) | $0.015/1k |
| OpenAI TTS | tts-1-hd | $0.030/1k |
| OpenAI TTS | gpt-4o-mini-tts | $0.012/1k |
| Cartesia | sonic-1 / sonic-2 / sonic-english / sonic-multilingual | $0.030/1k |
| Rime | mistv2 (default) / mist | $0.030/1k |
| Rime | arcana | $0.040/1k |
| LMNT | aurora (default) / blizzard | $0.050/1k |
| Inworld | inworld-tts-2 (default) | $0.020/1k |
| Inworld | inworld-tts-1.5-max / inworld-tts-1.5 | $0.025/1k |
OpenAI Realtime — per-model rates
| Model | Audio in / out (per token) | Text in / out (per token) | Cached audio / text (per token) |
|---|---|---|---|
| gpt-realtime-mini (default) / gpt-4o-mini-realtime-preview | $0.00001 / $0.00002 | $0.0000006 / $0.0000024 | $0.0000003 / $0.00000006 |
| gpt-realtime | $0.000032 / $0.000064 | $0.000004 / $0.000016 | $0.0000004 / $0.0000004 |
| gpt-realtime-2 | $0.000032 / $0.000064 | $0.000004 / $0.000024 | $0.0000004 / $0.0000004 |
| gpt-4o-realtime-preview | $0.0001 / $0.0002 | $0.000005 / $0.000020 | $0.0000020 / $0.0000025 |
gpt-4o-realtime-preview is roughly 10x the cost of gpt-realtime-mini for audio. Switching realtime models has direct billing impact — confirm the model on agent.realtime.model matches the rate you expect.
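To make the billing impact concrete, the sketch below prices the same token usage under two of the models in the table. The `realtimeCostUsd` function and the usage field names are hypothetical; the per-token rates are copied from the table, and cached-token discounts are deliberately left out because this page does not spell out how they are applied.

```typescript
interface RealtimeRatesSketch {
  audioIn: number;
  audioOut: number;
  textIn: number;
  textOut: number;
}

// Hypothetical usage shape; the SDK's real usage payload may differ.
interface RealtimeUsageSketch {
  audioInTokens: number;
  audioOutTokens: number;
  textInTokens: number;
  textOutTokens: number;
}

// Per-token USD rates copied from the table (uncached only).
const MINI_RATES: RealtimeRatesSketch = { audioIn: 0.00001, audioOut: 0.00002, textIn: 0.0000006, textOut: 0.0000024 };
const PREVIEW_RATES: RealtimeRatesSketch = { audioIn: 0.0001, audioOut: 0.0002, textIn: 0.000005, textOut: 0.00002 };

function realtimeCostUsd(usage: RealtimeUsageSketch, rates: RealtimeRatesSketch): number {
  return (
    usage.audioInTokens * rates.audioIn +
    usage.audioOutTokens * rates.audioOut +
    usage.textInTokens * rates.textIn +
    usage.textOutTokens * rates.textOut
  );
}

// Invented usage for a short call.
const sampleUsage: RealtimeUsageSketch = { audioInTokens: 6000, audioOutTokens: 9000, textInTokens: 1200, textOutTokens: 300 };

const miniUsd = realtimeCostUsd(sampleUsage, MINI_RATES);
const previewUsd = realtimeCostUsd(sampleUsage, PREVIEW_RATES);
console.log(`mini: $${miniUsd.toFixed(4)}, preview: $${previewUsd.toFixed(4)}`);
```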
Twilio defaults match US inbound local. Override pricing.twilio.price for US toll-free inbound (~$0.022/min) or US outbound local (~$0.014/min).
Default pricing is based on publicly listed provider rates and may become stale; check the provider’s pricing page or pass your own overrides for authoritative numbers.
Real-Time Metrics
Use the onMetrics callback for live cost updates during a call:
await phone.serve({
  agent,
  port: 8000,
  onMetrics: async (data) => {
    const turn = data.turn as Record<string, unknown>;
    const latency = turn.latency as Record<string, number>;
    console.log(`Call ${data.call_id} — turn ${turn.turn_index}`);
    console.log(` Latency: ${latency.total_ms}ms`);
  },
});
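Because the example above casts the turn payload, a runtime check can be safer when the data feeds a dashboard. The sketch below validates the shape before reading it and keeps a running mean latency per call; `isLiveTurn`, `recordTurn`, and the `LiveTurn` type are hypothetical helpers that assume only the turn_index and latency.total_ms fields used above.

```typescript
interface LiveTurn {
  turn_index: number;
  latency: { total_ms: number };
}

// Type guard: checks the two fields this sketch relies on.
function isLiveTurn(value: unknown): value is LiveTurn {
  if (typeof value !== "object" || value === null) return false;
  const turn = value as Record<string, unknown>;
  const latency = turn["latency"];
  return (
    typeof turn["turn_index"] === "number" &&
    typeof latency === "object" &&
    latency !== null &&
    typeof (latency as Record<string, unknown>)["total_ms"] === "number"
  );
}

// Running mean of total_ms per call, updated incrementally on each turn.
const runningLatency = new Map<string, { turns: number; meanMs: number }>();

function recordTurn(callId: string, turn: unknown): number | undefined {
  if (!isLiveTurn(turn)) return undefined; // ignore malformed payloads
  const prev = runningLatency.get(callId) ?? { turns: 0, meanMs: 0 };
  const turns = prev.turns + 1;
  const meanMs = prev.meanMs + (turn.latency.total_ms - prev.meanMs) / turns;
  runningLatency.set(callId, { turns, meanMs });
  return meanMs;
}

console.log(recordTurn("call-1", { turn_index: 0, latency: { total_ms: 420 } }));
```

Inside an onMetrics handler, recordTurn(data.call_id, data.turn) would be the natural call site, assuming the payload has the shape shown in the example above.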
Data Types
import type {
  CallMetrics,
  CostBreakdown,
  LatencyBreakdown,
  TurnMetrics,
} from "getpatter";
CallMetrics
| Field | Type | Description |
|---|---|---|
| call_id | string | Unique call identifier. |
| duration_seconds | number | Total call duration, in seconds. |
| turns | TurnMetrics[] | Per-turn metrics. |
| cost | CostBreakdown | Cost breakdown. |
| latency_avg | LatencyBreakdown | Average latency. |
| latency_p50 | LatencyBreakdown | Median (50th percentile) latency. |
| latency_p90 | LatencyBreakdown | 90th percentile latency (steady-state outliers). |
| latency_p95 | LatencyBreakdown | 95th percentile latency. |
| latency_p99 | LatencyBreakdown | 99th percentile latency (cold-start outliers). |
| provider_mode | string | Voice mode used. |
| stt_provider | string | STT provider name. |
| tts_provider | string | TTS provider name. |
| llm_provider | string | LLM provider name. |
| telephony_provider | string | Telephony provider name. |
TurnMetrics
| Field | Type | Description |
|---|---|---|
| turn_index | number | Zero-based turn index. |
| user_text | string | What the user said. |
| agent_text | string | What the agent replied. |
| latency | LatencyBreakdown | Latency for this turn. |
| stt_audio_seconds | number | Audio duration processed by STT. |
| tts_characters | number | Characters synthesized by TTS. |
| timestamp | number | Unix timestamp. |
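The per-turn usage fields can be summed to get call-level usage, for example to reconcile against a provider invoice. This is a minimal sketch: `TurnUsage` is a local subset of the TurnMetrics fields listed above, and `usageTotals` is a hypothetical helper.

```typescript
// Local subset of the TurnMetrics fields that carry usage.
interface TurnUsage {
  stt_audio_seconds: number;
  tts_characters: number;
}

// Hypothetical helper: sums STT audio seconds and TTS characters across turns.
function usageTotals(turns: TurnUsage[]): { sttAudioSeconds: number; ttsCharacters: number } {
  return turns.reduce(
    (acc, turn) => ({
      sttAudioSeconds: acc.sttAudioSeconds + turn.stt_audio_seconds,
      ttsCharacters: acc.ttsCharacters + turn.tts_characters,
    }),
    { sttAudioSeconds: 0, ttsCharacters: 0 },
  );
}

// Invented sample data.
const usageSample: TurnUsage[] = [
  { stt_audio_seconds: 3, tts_characters: 120 },
  { stt_audio_seconds: 5, tts_characters: 80 },
  { stt_audio_seconds: 2, tts_characters: 200 },
];

const callUsage = usageTotals(usageSample);
console.log(`${callUsage.sttAudioSeconds}s of STT audio, ${callUsage.ttsCharacters} TTS characters`);
```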