Metrics & Cost Tracking

Patter automatically tracks cost and latency for every call, broken down by provider component (STT, TTS, LLM, telephony).

How It Works

Metrics are collected automatically during calls. When a call ends, the `onCallEnd` callback receives an event whose `metrics` field carries a `CallMetrics` object with the full breakdown:
```typescript
await phone.serve({
  agent,
  port: 8000,
  onCallEnd: async (event) => {
    const metrics = event.metrics;
    if (metrics) {
      console.log(`Duration: ${metrics.duration_seconds}s`);
      console.log(`Total cost: $${metrics.cost.total.toFixed(4)}`);
      console.log(`  STT: $${metrics.cost.stt.toFixed(4)}`);
      console.log(`  TTS: $${metrics.cost.tts.toFixed(4)}`);
      console.log(`  LLM: $${metrics.cost.llm.toFixed(4)}`);
      console.log(`  Telephony: $${metrics.cost.telephony.toFixed(4)}`);
      console.log(`Avg latency: ${metrics.latency_avg.total_ms}ms`);
      console.log(`P95 latency: ${metrics.latency_p95.total_ms}ms`);
    }
  },
});
```

Cost Breakdown

The CostBreakdown object provides per-component costs in USD:
| Field | Description |
| --- | --- |
| `stt` | Speech-to-text cost (Deepgram, Whisper). |
| `tts` | Text-to-speech cost (ElevenLabs, OpenAI TTS). |
| `llm` | LLM cost (OpenAI Realtime tokens). |
| `telephony` | Telephony cost (Twilio, Telnyx per-minute). |
| `total` | Sum of all components. |
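Every field is in USD and `total` is the sum of the components, so you can add breakdowns from several calls field by field, for example to report daily spend. The following is a minimal sketch. The `CostBreakdown` interface here is a local mirror of the fields in the table (the SDK exports its own type), `sumCosts` is a hypothetical helper that is not part of getpatter, and the numbers are illustrative.

```typescript
// Local mirror of the CostBreakdown fields (all values in USD).
interface CostBreakdown {
  stt: number;
  tts: number;
  llm: number;
  telephony: number;
  total: number;
}

// Hypothetical helper: add breakdowns from several calls field by field.
function sumCosts(breakdowns: CostBreakdown[]): CostBreakdown {
  const acc: CostBreakdown = { stt: 0, tts: 0, llm: 0, telephony: 0, total: 0 };
  for (const b of breakdowns) {
    acc.stt += b.stt;
    acc.tts += b.tts;
    acc.llm += b.llm;
    acc.telephony += b.telephony;
    acc.total += b.total;
  }
  return acc;
}

// Illustrative per-call breakdowns, e.g. collected from onCallEnd.
const daily = sumCosts([
  { stt: 0.0154, tts: 0.03, llm: 0, telephony: 0.017, total: 0.0624 },
  { stt: 0.0077, tts: 0.012, llm: 0, telephony: 0.0085, total: 0.0282 },
]);
console.log(`Daily total: $${daily.total.toFixed(4)}`);
```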

Latency Breakdown

The LatencyBreakdown object provides per-component latency in milliseconds:
| Field | Description |
| --- | --- |
| `stt_ms` | Time from user speech to transcript. |
| `endpoint_ms` | Time the endpointer waited after the last word before declaring end-of-utterance. |
| `llm_ttft_ms` | Time from end-of-utterance to the first LLM token. |
| `llm_total_ms` | Time from end-of-utterance to the last LLM token (full response). |
| `llm_ms` | Alias for `llm_ttft_ms` (kept for back-compat). |
| `tts_ms` | Time from first LLM token to first TTS audio byte. |
| `tts_total_ms` | Time from first LLM token to last TTS audio byte. |
| `bargein_ms` | Time from detecting the caller's voice to cancelling TTS playback (set only on barge-in turns). |
| `total_ms` | End-to-end latency (user speech to first audio). |
`CallMetrics` summarizes the latency distribution with `latency_avg`, `latency_p50` (median, typical UX), `latency_p90` (steady-state outliers), `latency_p95` (SLA), and `latency_p99` (cold-start outliers).
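You can reproduce these summaries from per-turn `total_ms` values, for example to aggregate across many calls, by using a nearest-rank percentile. This sketch assumes the nearest-rank method. Patter's own percentile computation may use a different interpolation, so small differences from the SDK's numbers are expected.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative per-turn total_ms values with one slow (cold-start style) outlier.
const totals = [620, 540, 710, 1980, 580, 650, 600, 690, 560, 630];
for (const p of [50, 90, 95, 99]) {
  console.log(`p${p}: ${percentile(totals, p)}ms`);
}
```

With only ten samples, p95 and p99 both land on the single outlier. This is why tail percentiles are more meaningful over many turns or calls.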

Per-Turn Metrics

Each conversation turn is tracked individually:
```typescript
await phone.serve({
  agent,
  port: 8000,
  onCallEnd: async (event) => {
    const metrics = event.metrics;
    if (metrics) {
      for (const turn of metrics.turns) {
        console.log(`Turn ${turn.turn_index}:`);
        console.log(`  User: ${turn.user_text}`);
        console.log(`  Agent: ${turn.agent_text}`);
        console.log(`  Latency: ${turn.latency.total_ms}ms`);
      }
    }
  },
});
```
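A common use of per-turn data is flagging turns that exceed a latency budget. The following is a minimal sketch. It uses a local `TurnLike` type that mirrors a subset of the `TurnMetrics` fields, and the 1500 ms budget is an arbitrary illustrative value.

```typescript
// Subset of TurnMetrics fields used here.
interface TurnLike {
  turn_index: number;
  user_text: string;
  latency: { total_ms: number };
}

// Return the turns whose end-to-end latency exceeds the budget, slowest first.
function slowTurns(turns: TurnLike[], budgetMs: number): TurnLike[] {
  return turns
    .filter((t) => t.latency.total_ms > budgetMs)
    .sort((a, b) => b.latency.total_ms - a.latency.total_ms);
}

const turns: TurnLike[] = [
  { turn_index: 0, user_text: "Hi", latency: { total_ms: 640 } },
  { turn_index: 1, user_text: "What are your hours?", latency: { total_ms: 1820 } },
  { turn_index: 2, user_text: "Thanks", latency: { total_ms: 2210 } },
];
for (const t of slowTurns(turns, 1500)) {
  console.log(`Turn ${t.turn_index} took ${t.latency.total_ms}ms: "${t.user_text}"`);
}
```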

Custom Pricing

Override default provider pricing estimates:
```typescript
await phone.serve({
  agent,
  port: 8000,
  pricing: {
    deepgram: { price: 0.005 },   // Override STT price per minute
    elevenlabs: { price: 0.15 },  // Override TTS price per 1k chars
    twilio: { price: 0.015 },     // Override telephony price per minute
  },
});
```

PricingUnit

The pricing tables expose a PricingUnit constant so overrides don’t depend on raw strings:
```typescript
import { PricingUnit } from "getpatter";

PricingUnit.MINUTE;          // "minute": per minute of audio (STT, telephony)
PricingUnit.THOUSAND_CHARS;  // "1k_chars": per thousand characters synthesised (TTS)
PricingUnit.TOKEN;           // "token": per token (LLM / Realtime)
```
It ships as a `const` object plus a value-union type, so it is tree-shakeable. Its string values match the Python `PricingUnit` StrEnum exactly.
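The const-object-plus-union idiom looks roughly like the following. This is a generic sketch of the TypeScript pattern, not the SDK's source. `unitsBilled` is a hypothetical helper, and the raw inputs it assumes for each unit are also assumptions: seconds of audio for `MINUTE`, characters for `THOUSAND_CHARS`, and tokens for `TOKEN`. It shows how each unit turns raw usage into a billable quantity.

```typescript
// const object: `as const` keeps each value as a string literal type.
const PricingUnit = {
  MINUTE: "minute",
  THOUSAND_CHARS: "1k_chars",
  TOKEN: "token",
} as const;

// Value-union type with the same name: "minute" | "1k_chars" | "token".
type PricingUnit = (typeof PricingUnit)[keyof typeof PricingUnit];

// Hypothetical helper: convert raw usage into the quantity a price applies to.
function unitsBilled(unit: PricingUnit, rawQuantity: number): number {
  switch (unit) {
    case PricingUnit.MINUTE:
      return rawQuantity / 60; // seconds of audio -> minutes
    case PricingUnit.THOUSAND_CHARS:
      return rawQuantity / 1000; // characters -> thousands of characters
    case PricingUnit.TOKEN:
      return rawQuantity; // tokens are billed as-is
    default: {
      // Compile-time exhaustiveness check: every union member is handled above.
      const exhaustive: never = unit;
      throw new Error(`unknown unit: ${String(exhaustive)}`);
    }
  }
}

// 90 s of STT audio at $0.0077/min.
console.log(unitsBilled(PricingUnit.MINUTE, 90) * 0.0077);
```

Because the values are plain strings, a union-typed parameter also accepts the literals directly, for example `unitsBilled("1k_chars", 2500)`.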

Model-Aware Pricing

Patter's pricing tables are model-aware. Every entry in `DEFAULT_PRICING` carries provider-level defaults plus an optional `models` map keyed by model identifier. When the agent's adapter exposes a `model` field, the metrics layer passes it to the cost-calc functions. The dashboard therefore bills at the model-specific rate out of the box, with no manual override required. The pricing table's version and last-update date are exported as constants:
```typescript
import { PRICING_VERSION, PRICING_LAST_UPDATED } from "getpatter";

PRICING_VERSION;       // "2026.3"
PRICING_LAST_UPDATED;  // "2026-05-08"
```

How resolution works

The cost-calc helpers (`calculateSttCost`, `calculateTtsCost`, `calculateRealtimeCost`, `calculateRealtimeCachedSavings`) accept an optional trailing `model` parameter. The exported `resolveProviderRates(config, model)` helper merges per-model overrides on top of provider defaults, resolving the model in this order:
  1. Exact match in the provider's `models` map.
  2. Longest-prefix match: `gpt-realtime-2-2026-05-08` resolves against `gpt-realtime-2`.
  3. Provider defaults: the fallback when the model is unknown or omitted.
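You can sketch the three resolution steps as a small standalone function. It mirrors the documented behaviour only. The `ProviderConfig` shape (modelled on the override examples on this page) and the name `resolveRates` are assumptions for illustration, not the SDK's `resolveProviderRates` implementation.

```typescript
// Assumed shape: a flat `price` plus an optional `models` map keyed by model identifier.
interface ProviderConfig {
  price?: number;
  models?: Record<string, { price?: number }>;
}

// Hypothetical re-implementation of the documented resolution order.
function resolveRates(config: ProviderConfig, model?: string): { price?: number } {
  const { models, ...defaults } = config;
  if (model === undefined || models === undefined) return defaults; // 3. defaults
  const id: string = model; // narrowed copy, safe to use inside callbacks

  // 1. Exact match in the models map.
  let override: { price?: number } | undefined = models[id];

  // 2. Longest-prefix match, e.g. "gpt-realtime-2-2026-05-08" -> "gpt-realtime-2".
  if (override === undefined) {
    const prefix = Object.keys(models)
      .filter((key) => id.startsWith(key))
      .sort((a, b) => b.length - a.length)[0];
    if (prefix !== undefined) override = models[prefix];
  }

  // 3. Provider defaults, with any per-model fields layered on top.
  return { ...defaults, ...override };
}

// Deepgram rates from the per-model STT table on this page.
const deepgram: ProviderConfig = {
  price: 0.0077,
  models: {
    "nova-3": { price: 0.0077 },
    "nova-3-multilingual": { price: 0.0092 },
    "nova-2": { price: 0.0058 },
    nova: { price: 0.0043 },
  },
};

console.log(resolveRates(deepgram, "nova-3-multilingual").price); // 0.0092 (exact match)
console.log(resolveRates(deepgram, "nova-2-general").price); // 0.0058 (longest prefix "nova-2")
console.log(resolveRates(deepgram, "some-unknown-model").price); // 0.0077 (defaults)
```

Sorting candidate prefixes by length means `nova-2-general` resolves to `nova-2` rather than the shorter `nova`.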
`CallMetricsAccumulator` automatically tracks `sttModel`, `ttsModel`, and `realtimeModel` from the adapter's `model` field (`agent.stt.model`, `agent.tts.model`, and `agent.model` for Realtime). On every `recordRealtimeUsage(usage)` call, the accumulator also reads the realtime model from the `response.done` payload itself, and that model overrides the call-level default. Mid-call model switches are therefore billed correctly.
The optional model argument defaults to undefined, which preserves the legacy provider-rate behaviour. Existing callers compile and run unchanged.
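A toy accumulator illustrates the effect of reading the model from each `response.done` payload. Everything here is hypothetical: `ToyRealtimeAccumulator`, the `UsageEvent` shape, and the fallback to the call-level model's rates are simplifications for illustration. The real `CallMetricsAccumulator` falls back to provider defaults and also tracks text and cached tokens. The per-token audio rates come from the OpenAI Realtime per-model table on this page.

```typescript
// Hypothetical usage event: audio token counts plus the model that produced the response.
interface UsageEvent {
  model?: string;
  audioInTokens: number;
  audioOutTokens: number;
}

// Per-token audio rates (USD) from the OpenAI Realtime per-model table.
const AUDIO_RATES: Record<string, { audioIn: number; audioOut: number }> = {
  "gpt-realtime-mini": { audioIn: 0.00001, audioOut: 0.00002 },
  "gpt-realtime-2": { audioIn: 0.000032, audioOut: 0.000064 },
};

class ToyRealtimeAccumulator {
  private costUsd = 0;
  private readonly callModel: string;

  constructor(callModel: string) {
    this.callModel = callModel;
  }

  // Bill each event at its own model's rate, falling back to the call-level model.
  record(usage: UsageEvent): void {
    const model = usage.model ?? this.callModel;
    const rates = AUDIO_RATES[model] ?? AUDIO_RATES[this.callModel];
    if (rates === undefined) throw new Error(`no rates for ${model}`);
    this.costUsd += usage.audioInTokens * rates.audioIn + usage.audioOutTokens * rates.audioOut;
  }

  get total(): number {
    return this.costUsd;
  }
}

const acc = new ToyRealtimeAccumulator("gpt-realtime-mini");
acc.record({ audioInTokens: 1000, audioOutTokens: 2000 }); // billed as gpt-realtime-mini
acc.record({ model: "gpt-realtime-2", audioInTokens: 1000, audioOutTokens: 2000 }); // switched mid-call
console.log(acc.total.toFixed(4));
```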

Example A — Just select a model

The most common case: pick a model on your adapter, and Patter bills the right rate automatically.
```typescript
import { OpenAIRealtimeAdapter, OpenAIRealtimeModel, Patter, Twilio } from "getpatter";

const agent = Patter.agent({
  systemPrompt: "You are a helpful assistant.",
  realtime: new OpenAIRealtimeAdapter({ model: OpenAIRealtimeModel.GPT_REALTIME_2 }),
});

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });
// Billing auto-uses the gpt-realtime-2 rate ($32/M audio in, $64/M audio out).
```

Example B — Override one model, keep siblings intact

`mergePricing` overlays the nested `models` map shallowly. Each model entry you pass replaces that entry only, so overriding a single model leaves the other model rates for the same provider untouched.
```typescript
const phone = new Patter({
  carrier: new Twilio(),
  phoneNumber: "+15550001234",
  pricing: {
    // Negotiated a discount on Nova-2 only; Nova-3 and Whisper rates stay at their defaults.
    deepgram: { models: { "nova-2": { price: 0.004 } } },
  },
});
```
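The following rough sketch shows the overlay behaviour. Provider-level fields from the override replace the defaults, while the `models` map is merged one model entry at a time. `mergePricingSketch` and the `PricingTable` shape are hypothetical stand-ins, not the SDK's `mergePricing`. The defaults use the Deepgram rates listed under Default Pricing.

```typescript
// Assumed shape: per-provider flat fields plus an optional models map.
interface ProviderPricing {
  price?: number;
  models?: Record<string, { price?: number }>;
}
type PricingTable = Record<string, ProviderPricing>;

// Overlay `overrides` on `defaults` without mutating either input.
function mergePricingSketch(defaults: PricingTable, overrides: PricingTable): PricingTable {
  const merged: PricingTable = { ...defaults };
  for (const [provider, override] of Object.entries(overrides)) {
    const base = defaults[provider] ?? {};
    merged[provider] = {
      ...base,
      ...override,
      // Shallow merge of the models map: untouched sibling models keep their rates.
      models: { ...base.models, ...override.models },
    };
  }
  return merged;
}

const defaults: PricingTable = {
  deepgram: { price: 0.0077, models: { "nova-3": { price: 0.0077 }, "nova-2": { price: 0.0058 } } },
};
const merged = mergePricingSketch(defaults, { deepgram: { models: { "nova-2": { price: 0.004 } } } });
console.log(merged.deepgram.models);
```

Passing a model key that is absent from the defaults simply adds it to the merged map, which is the case Example C relies on.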

Example C — Register a brand-new model rate

Add a model that isn’t in the built-in table without touching SDK source.
```typescript
const phone = new Patter({
  carrier: new Twilio(),
  phoneNumber: "+15550001234",
  pricing: {
    elevenlabs: {
      models: { my_custom_voice: { price: 0.075 } },
    },
  },
});
// When agent.tts.model === "my_custom_voice", calculateTtsCost picks up $0.075/1k.
```

Default Pricing (2026.3)

Provider-level defaults are listed below. Per-model rates live under DEFAULT_PRICING[provider].models and are auto-resolved when the adapter exposes its model identifier.
| Provider | Unit | Default price (default model) |
| --- | --- | --- |
| Deepgram (nova-3 streaming mono) | per minute | $0.0077 |
| OpenAI Whisper (whisper-1) | per minute | $0.006 |
| OpenAI Transcribe (gpt-4o-transcribe) | per minute | $0.006 |
| AssemblyAI | per minute | $0.0025 |
| Cartesia STT (ink-whisper) | per minute | $0.0025 |
| Soniox | per minute | $0.002 |
| Speechmatics (Pro) | per minute | $0.004 |
| ElevenLabs (eleven_flash_v2_5) | per 1k chars | $0.06 |
| OpenAI TTS (tts-1) | per 1k chars | $0.015 |
| Cartesia TTS (sonic-2) | per 1k chars | $0.030 |
| Rime (mistv2) | per 1k chars | $0.030 |
| LMNT (aurora) | per 1k chars | $0.050 |
| Inworld (inworld-tts-2) | per 1k chars | $0.020 |
| OpenAI Realtime (gpt-realtime-mini / gpt-4o-mini-realtime-preview) | per token | $10/M audio in · $20/M audio out · $0.60/M text in · $2.40/M text out (cached: $0.30/M audio · $0.06/M text) |
| Twilio (US inbound local) | per minute | $0.0085 (rounded up to the whole minute, per Twilio) |
| Telnyx | per minute | $0.007 |
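The following worked example applies these defaults to a hypothetical pipeline call on Twilio US inbound. The call lasts 4 minutes 10 seconds, all 250 s of caller audio go to Deepgram nova-3, and ElevenLabs synthesises 1,800 characters. The sketch assumes STT is billed on exact audio seconds, while Twilio rounds up to the whole minute as noted in the table. LLM cost is left out.

```typescript
// Default rates from the table (USD).
const DEEPGRAM_NOVA3_PER_MIN = 0.0077;
const ELEVENLABS_FLASH_PER_1K_CHARS = 0.06;
const TWILIO_US_INBOUND_PER_MIN = 0.0085;

const callSeconds = 250; // 4 min 10 s
const sttAudioSeconds = 250;
const ttsCharacters = 1800;

const stt = (sttAudioSeconds / 60) * DEEPGRAM_NOVA3_PER_MIN;
const tts = (ttsCharacters / 1000) * ELEVENLABS_FLASH_PER_1K_CHARS;
// Twilio bills whole minutes, so 4 min 10 s is charged as 5 minutes.
const telephony = Math.ceil(callSeconds / 60) * TWILIO_US_INBOUND_PER_MIN;

console.log(`STT $${stt.toFixed(4)}, TTS $${tts.toFixed(4)}, telephony $${telephony.toFixed(4)}`);
console.log(`Total $${(stt + tts + telephony).toFixed(4)}`);
```

Whole-minute rounding costs very little here, but on many short calls it can make telephony a noticeably larger share of the total.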

STT — per-model rates

| Provider | Model | Price |
| --- | --- | --- |
| Deepgram | nova-3 (default) | $0.0077/min |
| Deepgram | nova-3-multilingual | $0.0092/min |
| Deepgram | nova-2 | $0.0058/min |
| Deepgram | nova | $0.0043/min |
| Deepgram | whisper-large / whisper-medium | $0.0048/min |
| OpenAI Whisper | whisper-1 (default) | $0.006/min |
| OpenAI Whisper | gpt-4o-transcribe | $0.006/min |
| OpenAI Whisper | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Whisper | gpt-realtime-whisper | $0.017/min |
| OpenAI Transcribe (openai_transcribe) | gpt-4o-transcribe (default) | $0.006/min |
| OpenAI Transcribe | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Transcribe | whisper-1 | $0.006/min |

TTS — per-model rates

| Provider | Model | Price |
| --- | --- | --- |
| ElevenLabs (REST + WebSocket) | eleven_flash_v2_5 (default) | $0.06/1k |
| ElevenLabs | eleven_turbo_v2_5 | $0.05/1k |
| ElevenLabs | eleven_multilingual_v2 / eleven_monolingual_v1 | $0.18/1k |
| ElevenLabs | eleven_v3 | $0.30/1k |
| OpenAI TTS | tts-1 (default) | $0.015/1k |
| OpenAI TTS | tts-1-hd | $0.030/1k |
| OpenAI TTS | gpt-4o-mini-tts | $0.012/1k |
| Cartesia | sonic-1 / sonic-2 / sonic-english / sonic-multilingual | $0.030/1k |
| Rime | mistv2 (default) / mist | $0.030/1k |
| Rime | arcana | $0.040/1k |
| LMNT | aurora (default) / blizzard | $0.050/1k |
| Inworld | inworld-tts-2 (default) | $0.020/1k |
| Inworld | inworld-tts-1.5-max / inworld-tts-1.5 | $0.025/1k |

OpenAI Realtime — per-model rates

| Model | Audio in / out (per token) | Text in / out (per token) | Cached audio / text (per token) |
| --- | --- | --- | --- |
| gpt-realtime-mini (default) / gpt-4o-mini-realtime-preview | $0.00001 / $0.00002 | $0.0000006 / $0.0000024 | $0.0000003 / $0.00000006 |
| gpt-realtime | $0.000032 / $0.000064 | $0.000004 / $0.000016 | $0.0000004 / $0.0000004 |
| gpt-realtime-2 | $0.000032 / $0.000064 | $0.000004 / $0.000024 | $0.0000004 / $0.0000004 |
| gpt-4o-realtime-preview | $0.0001 / $0.0002 | $0.000005 / $0.000020 | $0.0000020 / $0.0000025 |
gpt-4o-realtime-preview is roughly 10x the cost of gpt-realtime-mini for audio. Switching realtime models has direct billing impact — confirm the model on agent.realtime.model matches the rate you expect.
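To see the billing impact concretely, the following sketch prices the same hypothetical usage under two models: 50,000 audio input tokens and 20,000 audio output tokens. It uses the audio rates from the per-model table. Text and cached tokens are left out for brevity, and `audioCost` is an illustrative helper, not an SDK function.

```typescript
// Per-token audio rates (USD) from the OpenAI Realtime per-model table.
const RATES = {
  "gpt-realtime-mini": { audioIn: 0.00001, audioOut: 0.00002 },
  "gpt-4o-realtime-preview": { audioIn: 0.0001, audioOut: 0.0002 },
} as const;

type RealtimeModel = keyof typeof RATES;

// Audio-only cost for a given token usage.
function audioCost(model: RealtimeModel, audioInTokens: number, audioOutTokens: number): number {
  const r = RATES[model];
  return audioInTokens * r.audioIn + audioOutTokens * r.audioOut;
}

const mini = audioCost("gpt-realtime-mini", 50_000, 20_000);
const preview = audioCost("gpt-4o-realtime-preview", 50_000, 20_000);
console.log(`mini $${mini.toFixed(2)} vs preview $${preview.toFixed(2)} (${(preview / mini).toFixed(1)}x)`);
```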
Twilio defaults match US inbound local. Override `pricing.twilio.price` for US toll-free inbound (~$0.022/min) or US outbound local (~$0.014/min). Default pricing is based on publicly listed provider rates and may become stale. For authoritative numbers, check each provider's pricing page or pass your own overrides.

Real-Time Metrics

Use the `onMetrics` callback for live metric updates during a call. The example below logs each turn's latency as it arrives:
```typescript
await phone.serve({
  agent,
  port: 8000,
  onMetrics: async (data) => {
    const turn = data.turn as Record<string, unknown>;
    const latency = turn.latency as Record<string, number>;
    console.log(`Call ${data.call_id}, turn ${turn.turn_index}`);
    console.log(`  Latency: ${latency.total_ms}ms`);
  },
});
```
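The casts in this example trust the payload shape. If you prefer to validate it at runtime, a small type guard works. The payload structure assumed here is inferred from the example and is otherwise an assumption: `call_id` plus a `turn` object carrying `turn_index` and `latency.total_ms`. `readTurnLatency` is a hypothetical helper.

```typescript
interface TurnLatency {
  callId: string;
  turnIndex: number;
  totalMs: number;
}

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null;

// Returns null instead of throwing when the payload lacks the expected shape.
function readTurnLatency(data: Record<string, unknown>): TurnLatency | null {
  const callId = data.call_id;
  const turn = data.turn;
  if (typeof callId !== "string" || !isRecord(turn)) return null;
  const latency = turn.latency;
  const turnIndex = turn.turn_index;
  if (!isRecord(latency) || typeof turnIndex !== "number") return null;
  const totalMs = latency.total_ms;
  if (typeof totalMs !== "number") return null;
  return { callId, turnIndex, totalMs };
}

console.log(readTurnLatency({ call_id: "CA123", turn: { turn_index: 2, latency: { total_ms: 812 } } }));
console.log(readTurnLatency({ call_id: "CA123", turn: {} })); // null
```

Inside `onMetrics`, you would call `readTurnLatency(data)` and skip logging when it returns `null`.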

Data Types

```typescript
import type {
  CallMetrics,
  CostBreakdown,
  LatencyBreakdown,
  TurnMetrics,
} from "getpatter";
```

CallMetrics

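| Field | Type | Description |
| --- | --- | --- |
| `call_id` | `string` | Unique call identifier. |
| `duration_seconds` | `number` | Total call duration in seconds. |
| `turns` | `TurnMetrics[]` | Per-turn metrics. |
| `cost` | `CostBreakdown` | Cost breakdown. |
| `latency_avg` | `LatencyBreakdown` | Average latency. |
| `latency_p50` | `LatencyBreakdown` | Median (50th percentile) latency. |
| `latency_p90` | `LatencyBreakdown` | 90th percentile latency (steady-state outliers). |
| `latency_p95` | `LatencyBreakdown` | 95th percentile latency. |
| `latency_p99` | `LatencyBreakdown` | 99th percentile latency (cold-start outliers). |
| `provider_mode` | `string` | Voice mode used. |
| `stt_provider` | `string` | STT provider name. |
| `tts_provider` | `string` | TTS provider name. |
| `llm_provider` | `string` | LLM provider name. |
| `telephony_provider` | `string` | Telephony provider name. |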

TurnMetrics

| Field | Type | Description |
| --- | --- | --- |
| `turn_index` | `number` | Zero-based turn index. |
| `user_text` | `string` | What the user said. |
| `agent_text` | `string` | What the agent replied. |
| `latency` | `LatencyBreakdown` | Latency for this turn. |
| `stt_audio_seconds` | `number` | Audio duration processed by STT. |
| `tts_characters` | `number` | Characters synthesized by TTS. |
| `timestamp` | `number` | Unix timestamp. |
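The `stt_audio_seconds` and `tts_characters` fields line up with the per-minute and per-1k-character pricing units, so you can rebuild a rough per-component estimate from the turns. The following is a minimal sketch. It assumes usage-proportional billing with no rounding and uses a local subset of `TurnMetrics`. The rates are the Deepgram nova-3 and ElevenLabs eleven_flash_v2_5 defaults from the pricing tables, and `estimateSttTtsCost` is a hypothetical helper.

```typescript
// Subset of TurnMetrics used for cost estimation.
interface TurnUsage {
  stt_audio_seconds: number;
  tts_characters: number;
}

// Sum usage across turns, then apply per-minute (STT) and per-1k-char (TTS) rates.
function estimateSttTtsCost(
  turns: TurnUsage[],
  sttPerMinute: number,
  ttsPer1kChars: number,
): { stt: number; tts: number } {
  let sttSeconds = 0;
  let ttsChars = 0;
  for (const t of turns) {
    sttSeconds += t.stt_audio_seconds;
    ttsChars += t.tts_characters;
  }
  return {
    stt: (sttSeconds / 60) * sttPerMinute,
    tts: (ttsChars / 1000) * ttsPer1kChars,
  };
}

const estimate = estimateSttTtsCost(
  [
    { stt_audio_seconds: 4.2, tts_characters: 120 },
    { stt_audio_seconds: 7.8, tts_characters: 380 },
  ],
  0.0077, // Deepgram nova-3, per minute
  0.06, // ElevenLabs eleven_flash_v2_5, per 1k chars
);
console.log(estimate);
```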