Metrics & Cost Tracking

Patter automatically tracks cost and latency for every call, broken down by provider component (STT, TTS, LLM, telephony).

How It Works

Metrics are collected automatically during calls. When a call ends, the on_call_end callback receives an event whose metrics entry holds a CallMetrics object with the full breakdown:
async def on_call_end(event):
    metrics = event.get("metrics")
    if metrics:
        print(f"Duration: {metrics.duration_seconds}s")
        print(f"Total cost: ${metrics.cost.total:.4f}")
        print(f"  STT: ${metrics.cost.stt:.4f}")
        print(f"  TTS: ${metrics.cost.tts:.4f}")
        print(f"  LLM: ${metrics.cost.llm:.4f}")
        print(f"  Telephony: ${metrics.cost.telephony:.4f}")
        print(f"Avg latency: {metrics.latency_avg.total_ms}ms")
        print(f"P95 latency: {metrics.latency_p95.total_ms}ms")

Cost Breakdown

The CostBreakdown object provides per-component costs in USD:
| Field | Description |
| --- | --- |
| stt | Speech-to-text cost (Deepgram, Whisper). |
| tts | Text-to-speech cost (ElevenLabs, OpenAI TTS). |
| llm | LLM cost (OpenAI Realtime tokens). |
| telephony | Telephony cost (Twilio, Telnyx per-minute). |
| total | Sum of all components. |
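
As a rough illustration of how the fields relate, the sketch below uses a hypothetical stand-in dataclass (CostSketch, not Patter's own CostBreakdown) in which total is derived from the four components. The dollar figures are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostSketch:
    """Hypothetical stand-in for CostBreakdown; all values are in USD."""

    stt: float = 0.0
    tts: float = 0.0
    llm: float = 0.0
    telephony: float = 0.0

    @property
    def total(self) -> float:
        # total is defined as the sum of the four components.
        return self.stt + self.tts + self.llm + self.telephony


# Made-up component costs for a single short call.
cost = CostSketch(stt=0.0154, tts=0.072, llm=0.0, telephony=0.017)
print(f"Total: ${cost.total:.4f}")
```

Whether Patter stores total or computes it on access is not stated here; the sketch only shows the arithmetic relationship between the fields.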

Latency Breakdown

The LatencyBreakdown object provides per-component latency in milliseconds:
| Field | Description |
| --- | --- |
| stt_ms | Time from user speech to transcript. |
| endpoint_ms | Time the endpointer waited after the last word before declaring end-of-utterance. |
| llm_ttft_ms | Time from end-of-utterance to the first LLM token. |
| llm_total_ms | Time from end-of-utterance to the last LLM token (full response). |
| llm_ms | Alias for llm_ttft_ms (kept for back-compat). |
| tts_ms | Time from first LLM token to first TTS audio byte. |
| tts_total_ms | Time from first LLM token to last TTS audio byte. |
| bargein_ms | Time from caller voice detected to TTS playback cancelled (only set on barge-in turns). |
| total_ms | End-to-end latency (user speech to first audio). |

CallMetrics exposes the full distribution: latency_avg, latency_p50 (median / typical UX), latency_p90 (steady-state outliers), latency_p95 (SLA), and latency_p99 (cold-start outliers).
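
To make the difference between the average and the percentiles concrete, here is a minimal sketch that summarises a list of per-turn total_ms values. It uses the nearest-rank percentile definition, which is an assumption: this page does not say which interpolation method Patter uses. The sample values are invented.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented per-turn end-to-end latencies (ms) with two slow outliers.
turn_totals_ms = [620.0, 710.0, 680.0, 1450.0, 700.0, 690.0, 2300.0, 705.0, 660.0, 715.0]

summary = {
    "avg": sum(turn_totals_ms) / len(turn_totals_ms),
    "p50": percentile(turn_totals_ms, 50),
    "p90": percentile(turn_totals_ms, 90),
    "p95": percentile(turn_totals_ms, 95),
    "p99": percentile(turn_totals_ms, 99),
}
for name, value in summary.items():
    print(f"{name}: {value:.0f}ms")
```

In this sample the two slow turns pull the average well above the median, which is why the median is the better guide to the typical caller experience and the upper percentiles are the better guide to outliers.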

Per-Turn Metrics

Each conversation turn is tracked individually:
async def on_call_end(event):
    metrics = event.get("metrics")
    if metrics:
        for turn in metrics.turns:
            print(f"Turn {turn.turn_index}:")
            print(f"  User: {turn.user_text}")
            print(f"  Agent: {turn.agent_text}")
            print(f"  Latency: {turn.latency.total_ms}ms")

Custom Pricing

Override default provider pricing estimates:
from getpatter import Patter, Twilio

phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        "deepgram": {"price": 0.005},      # Override STT price per minute
        "elevenlabs": {"price": 0.15},      # Override TTS price per 1k chars
        "twilio": {"price": 0.015},         # Override telephony price per minute
    },
)

PricingUnit

The pricing tables expose a PricingUnit StrEnum so overrides don’t depend on raw strings:
from getpatter.pricing import PricingUnit

PricingUnit.MINUTE          # "minute" — per minute of audio (STT, telephony)
PricingUnit.THOUSAND_CHARS  # "1k_chars" — per thousand characters synthesised (TTS)
PricingUnit.TOKEN           # "token" — per token (LLM / Realtime)
Subclassing str keeps the values JSON-serialisable and backward-compatible with code that compares against the literal strings (config.get("unit") == "minute").
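
The following sketch demonstrates that property with a hypothetical stand-in enum (UnitSketch) that mirrors the three documented values. It uses the str plus Enum mix-in rather than StrEnum so it also runs on Python versions before 3.11; for equality checks and JSON encoding the two behave the same way.

```python
import json
from enum import Enum


class UnitSketch(str, Enum):
    """Hypothetical stand-in mirroring the three documented PricingUnit values."""

    MINUTE = "minute"
    THOUSAND_CHARS = "1k_chars"
    TOKEN = "token"


config = {"unit": UnitSketch.MINUTE, "price": 0.005}

# Members are str instances, so comparing against the raw literal still works.
matches_literal = config.get("unit") == "minute"

# json.dumps writes the underlying string value, not the enum repr.
encoded = json.dumps(config)

# A raw string read back from JSON maps onto the same enum member.
decoded_unit = UnitSketch(json.loads(encoded)["unit"])

print(matches_literal, encoded, decoded_unit is UnitSketch.MINUTE)
```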

Model-Aware Pricing

Patter’s pricing tables are model-aware: every entry in DEFAULT_PRICING carries provider-level defaults plus an optional models map keyed by model identifier. When the agent’s adapter exposes a model attribute, the metrics layer threads it through the cost-calc functions, so the dashboard bills at the model-specific rate out of the box, with no manual override required.
PRICING_VERSION       # "2026.3"
PRICING_LAST_UPDATED  # "2026-05-08"

How resolution works

The cost-calc helpers (calculate_stt_cost, calculate_tts_cost, calculate_realtime_cost, calculate_realtime_cached_savings) accept an optional trailing model argument. The internal _resolve_provider_rates(config, model) helper merges per-model overrides on top of provider defaults, trying each of the following in order:
  1. Exact match in the provider’s models dict.
  2. Longest-prefix match: for example, gpt-realtime-2-2026-05-08 resolves against gpt-realtime-2.
  3. Provider defaults — fallback when the model is unknown or omitted.
CallMetricsAccumulator auto-tracks stt_model, tts_model, and realtime_model from the agent’s adapter model attribute (agent.stt.model, agent.tts.model, agent.model for Realtime). On every record_realtime_usage(usage) call the realtime model is also pulled from the response.done payload itself, overriding the call-level default — so mid-call model switches are billed correctly.
The optional model argument defaults to None, which preserves the legacy provider-rate behaviour. Existing callers compile and run unchanged.
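
The lookup order described in this section can be sketched as a small standalone function. resolve_rates_sketch is a hypothetical stand-in, not Patter's internal _resolve_provider_rates, and the config shape (a price, a unit, and a models map) is inferred from the override examples on this page.

```python
from typing import Any, Optional


def resolve_rates_sketch(config: dict[str, Any], model: Optional[str] = None) -> dict[str, Any]:
    """Return provider defaults with the best-matching model entry layered on top."""
    defaults = {key: value for key, value in config.items() if key != "models"}
    models: dict[str, dict[str, Any]] = config.get("models", {})

    if not model:
        return defaults  # Model omitted: provider defaults (legacy behaviour).
    if model in models:
        return {**defaults, **models[model]}  # 1. Exact match.

    prefixes = [name for name in models if model.startswith(name)]
    if prefixes:
        best = max(prefixes, key=len)  # 2. Longest matching prefix wins.
        return {**defaults, **models[best]}

    return defaults  # 3. Unknown model: provider defaults.


# Rates copied from the Deepgram rows of the default pricing tables.
deepgram = {
    "unit": "minute",
    "price": 0.0077,
    "models": {
        "nova": {"price": 0.0043},
        "nova-2": {"price": 0.0058},
        "nova-3": {"price": 0.0077},
        "nova-3-multilingual": {"price": 0.0092},
    },
}

for name in ("nova-2", "nova-2-general", "nova-3-multilingual-v2", "unknown-model", None):
    print(name, resolve_rates_sketch(deepgram, name)["price"])
```

Picking the longest prefix matters here because nova is a prefix of every other Deepgram key; a first-match rule would bill nova-2-general at the nova rate.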

Example A — Just select a model

The most common case: pick a model on your adapter, and Patter bills the right rate automatically.
from getpatter import Patter, Twilio
from getpatter.providers import OpenAIRealtimeAdapter, OpenAIRealtimeModel

agent = Patter.agent(
    system_prompt="You are a helpful assistant.",
    realtime=OpenAIRealtimeAdapter(model=OpenAIRealtimeModel.GPT_REALTIME_2),
)

phone = Patter(carrier=Twilio(), phone_number="+15550001234")
# Billing auto-uses the gpt-realtime-2 rate ($32/M audio in, $64/M audio out).

Example B — Override one model, keep siblings intact

merge_pricing overlays the nested models dict key by key instead of replacing it wholesale, so overriding a single model leaves the other rates inside the same provider untouched.
phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        # Negotiated a discount on Nova-2 only — Nova-3 / Whisper rates stay default.
        "deepgram": {"models": {"nova-2": {"price": 0.004}}},
    },
)
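
To show why sibling rates survive an override, here is a minimal sketch of a per-model overlay. overlay_pricing_sketch is a hypothetical helper written for this example; it is not Patter's merge_pricing, whose exact semantics may differ.

```python
import copy
from typing import Any


def overlay_pricing_sketch(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Overlay overrides on defaults, merging each provider's models map per model."""
    merged = copy.deepcopy(defaults)  # Never mutate the shared default table.
    for provider, override in overrides.items():
        target = merged.setdefault(provider, {})
        for key, value in override.items():
            if key == "models":
                models = target.setdefault("models", {})
                for model, rates in value.items():
                    # Keep existing fields for this model, then apply the override.
                    models[model] = {**models.get(model, {}), **rates}
            else:
                target[key] = value
    return merged


# A small excerpt of default rates, copied from the tables on this page.
defaults = {
    "deepgram": {
        "price": 0.0077,
        "models": {"nova-2": {"price": 0.0058}, "nova-3": {"price": 0.0077}},
    },
}

merged = overlay_pricing_sketch(defaults, {"deepgram": {"models": {"nova-2": {"price": 0.004}}}})
print(merged["deepgram"]["models"])
```

Only the nova-2 entry changes; nova-3 and the provider-level price keep their defaults, and the defaults dict itself is left unmodified.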

Example C — Register a brand-new model rate

Add a model that isn’t in the built-in table without touching SDK source.
phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        "elevenlabs": {
            "models": {"my_custom_voice": {"price": 0.075}},
        },
    },
)
# When agent.tts.model == "my_custom_voice", calculate_tts_cost picks up $0.075/1k.

Default Pricing (2026.3)

Provider-level defaults are listed below. Per-model rates live under DEFAULT_PRICING[provider]["models"] and are auto-resolved when the adapter exposes its model identifier.
| Provider | Unit | Default Price (default model) |
| --- | --- | --- |
| Deepgram (nova-3 streaming mono) | per minute | $0.0077 |
| OpenAI Whisper (whisper-1) | per minute | $0.006 |
| OpenAI Transcribe (gpt-4o-transcribe) | per minute | $0.006 |
| AssemblyAI | per minute | $0.0025 |
| Cartesia STT (ink-whisper) | per minute | $0.0025 |
| Soniox | per minute | $0.002 |
| Speechmatics (Pro) | per minute | $0.004 |
| ElevenLabs (eleven_flash_v2_5) | per 1k chars | $0.06 |
| OpenAI TTS (tts-1) | per 1k chars | $0.015 |
| Cartesia TTS (sonic-2) | per 1k chars | $0.030 |
| Rime (mistv2) | per 1k chars | $0.030 |
| LMNT (aurora) | per 1k chars | $0.050 |
| Inworld (inworld-tts-2) | per 1k chars | $0.020 |
| OpenAI Realtime (gpt-realtime-mini / gpt-4o-mini-realtime-preview) | per token | $10/M audio in · $20/M audio out · $0.60/M text in · $2.40/M text out (cached: $0.30/M audio · $0.06/M text) |
| Twilio (US inbound local) | per minute | $0.0085 (rounded up to whole minute, per Twilio) |
| Telnyx | per minute | $0.007 |

STT — per-model rates

| Provider | Model | Price |
| --- | --- | --- |
| Deepgram | nova-3 (default) | $0.0077/min |
| Deepgram | nova-3-multilingual | $0.0092/min |
| Deepgram | nova-2 | $0.0058/min |
| Deepgram | nova | $0.0043/min |
| Deepgram | whisper-large / whisper-medium | $0.0048/min |
| OpenAI Whisper | whisper-1 (default) | $0.006/min |
| OpenAI Whisper | gpt-4o-transcribe | $0.006/min |
| OpenAI Whisper | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Whisper | gpt-realtime-whisper | $0.017/min |
| OpenAI Transcribe (openai_transcribe) | gpt-4o-transcribe (default) | $0.006/min |
| OpenAI Transcribe | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Transcribe | whisper-1 | $0.006/min |

TTS — per-model rates

| Provider | Model | Price |
| --- | --- | --- |
| ElevenLabs (REST + WebSocket) | eleven_flash_v2_5 (default) | $0.06/1k |
| ElevenLabs | eleven_turbo_v2_5 | $0.05/1k |
| ElevenLabs | eleven_multilingual_v2 / eleven_monolingual_v1 | $0.18/1k |
| ElevenLabs | eleven_v3 | $0.30/1k |
| OpenAI TTS | tts-1 (default) | $0.015/1k |
| OpenAI TTS | tts-1-hd | $0.030/1k |
| OpenAI TTS | gpt-4o-mini-tts | $0.012/1k |
| Cartesia | sonic-1 / sonic-2 / sonic-english / sonic-multilingual | $0.030/1k |
| Rime | mistv2 (default) / mist | $0.030/1k |
| Rime | arcana | $0.040/1k |
| LMNT | aurora (default) / blizzard | $0.050/1k |
| Inworld | inworld-tts-2 (default) | $0.020/1k |
| Inworld | inworld-tts-1.5-max / inworld-tts-1.5 | $0.025/1k |

OpenAI Realtime — per-model rates

| Model | Audio in / out (per token) | Text in / out (per token) | Cached audio / text (per token) |
| --- | --- | --- | --- |
| gpt-realtime-mini (default) / gpt-4o-mini-realtime-preview | $0.00001 / $0.00002 | $0.0000006 / $0.0000024 | $0.0000003 / $0.00000006 |
| gpt-realtime | $0.000032 / $0.000064 | $0.000004 / $0.000016 | $0.0000004 / $0.0000004 |
| gpt-realtime-2 | $0.000032 / $0.000064 | $0.000004 / $0.000024 | $0.0000004 / $0.0000004 |
| gpt-4o-realtime-preview | $0.0001 / $0.0002 | $0.000005 / $0.000020 | $0.0000020 / $0.0000025 |

gpt-4o-realtime-preview is roughly 10x the cost of gpt-realtime-mini for audio. Switching realtime models has direct billing impact — confirm the model on agent.realtime.model matches the rate you expect.
Twilio defaults match US inbound local. Override pricing.twilio.price for US toll-free inbound (~$0.022/min) or US outbound local (~$0.014/min). Default pricing is based on publicly listed provider rates and may become stale; check the provider’s pricing page or pass your own overrides for authoritative numbers.
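
As a worked example of how the three pricing units turn into dollar amounts, the sketch below applies the default rates listed in this section to one invented call. The helper functions are hypothetical and are not Patter's calculate_* functions. Billing STT on seconds of audio processed is an assumption; the whole-minute round-up for Twilio follows the note in the default pricing table.

```python
import math

# Default rates copied from this section (USD).
DEEPGRAM_NOVA3_PER_MIN = 0.0077
ELEVENLABS_FLASH_PER_1K_CHARS = 0.06
TWILIO_US_INBOUND_PER_MIN = 0.0085
GPT_REALTIME_2_AUDIO_IN_PER_TOKEN = 0.000032


def stt_cost(audio_seconds: float, price_per_min: float) -> float:
    """Per-minute unit: seconds of audio converted to minutes."""
    return audio_seconds / 60 * price_per_min


def tts_cost(characters: int, price_per_1k: float) -> float:
    """Per-1k-chars unit: characters synthesised divided by 1000."""
    return characters / 1000 * price_per_1k


def telephony_cost(call_seconds: float, price_per_min: float) -> float:
    """Twilio-style billing: partial minutes round up to a whole minute."""
    return math.ceil(call_seconds / 60) * price_per_min


def per_token_to_per_million(price_per_token: float) -> float:
    """Convert a per-token rate to the per-million-token figure quoted in prose."""
    return price_per_token * 1_000_000


# Invented call: 95 s on the line, 60 s of caller audio, 1500 characters spoken.
stt = stt_cost(60, DEEPGRAM_NOVA3_PER_MIN)
tts = tts_cost(1500, ELEVENLABS_FLASH_PER_1K_CHARS)
telephony = telephony_cost(95, TWILIO_US_INBOUND_PER_MIN)
total = stt + tts + telephony

print(f"STT ${stt:.4f} + TTS ${tts:.4f} + telephony ${telephony:.4f} = ${total:.4f}")
print(f"gpt-realtime-2 audio in: ${per_token_to_per_million(GPT_REALTIME_2_AUDIO_IN_PER_TOKEN):.0f}/M tokens")
```

The 95-second call is billed as two telephony minutes, so short calls carry a proportionally larger telephony share than their duration suggests.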

Real-Time Metrics

Use the on_metrics callback for live cost updates during a call:
async def on_metrics(data):
    cost = data.get("cost_so_far")
    if cost:
        print(f"Running cost: ${cost.total:.4f}")

await phone.serve(
    agent,
    port=8000,
    on_metrics=on_metrics,
)
The cost_so_far value is a CostBreakdown dataclass, so access its fields as attributes (e.g. cost.total, cost.stt) rather than dictionary keys.
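
One possible use of the running cost is a per-call budget alert. The sketch below is self-contained and simulates the callback rather than importing Patter: CostSoFar is a hypothetical stand-in for the CostBreakdown passed as cost_so_far, and make_budget_guard is an illustrative helper, not part of the SDK.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class CostSoFar:
    """Hypothetical stand-in for the CostBreakdown passed as cost_so_far."""

    total: float


def make_budget_guard(
    limit_usd: float, alerts: list[str]
) -> Callable[[dict[str, Any]], Awaitable[None]]:
    """Build an on_metrics-style callback that records one alert when the limit is crossed."""
    fired = False

    async def on_metrics(data: dict[str, Any]) -> None:
        nonlocal fired
        cost = data.get("cost_so_far")
        # Attribute access (cost.total), matching the dataclass note above.
        if cost and not fired and cost.total >= limit_usd:
            fired = True
            alerts.append(f"budget of ${limit_usd:.2f} reached at ${cost.total:.4f}")

    return on_metrics


async def simulate() -> list[str]:
    """Feed the guard a series of invented running totals, as a live call might."""
    recorded: list[str] = []
    guard = make_budget_guard(0.05, recorded)
    for running_total in (0.01, 0.03, 0.06, 0.09):
        await guard({"cost_so_far": CostSoFar(total=running_total)})
    return recorded


alerts = asyncio.run(simulate())
print(alerts)
```

In a real deployment the callback returned by make_budget_guard would be passed as on_metrics to phone.serve; what to do when the limit is hit (log, notify, end the call) is left to the application.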

Data Types

from getpatter import CallMetrics, CostBreakdown, LatencyBreakdown, TurnMetrics

CallMetrics

| Field | Type | Description |
| --- | --- | --- |
| call_id | str | Unique call identifier. |
| duration_seconds | float | Total call duration. |
| turns | tuple[TurnMetrics, ...] | Per-turn metrics. |
| cost | CostBreakdown | Cost breakdown. |
| latency_avg | LatencyBreakdown | Average latency. |
| latency_p50 | LatencyBreakdown | Median (50th percentile) latency. |
| latency_p90 | LatencyBreakdown | 90th percentile latency (steady-state outliers). |
| latency_p95 | LatencyBreakdown | 95th percentile latency. |
| latency_p99 | LatencyBreakdown | 99th percentile latency (cold-start outliers). |
| provider_mode | str | Voice mode used. |
| stt_provider | str | STT provider name. |
| tts_provider | str | TTS provider name. |
| llm_provider | str | LLM provider name. |
| telephony_provider | str | Telephony provider name. |

TurnMetrics

| Field | Type | Description |
| --- | --- | --- |
| turn_index | int | Zero-based turn index. |
| user_text | str | What the user said. |
| agent_text | str | What the agent replied. |
| latency | LatencyBreakdown | Latency for this turn. |
| stt_audio_seconds | float | Audio duration processed by STT. |
| tts_characters | int | Characters synthesized by TTS. |
| timestamp | float | Unix timestamp. |