Metrics & Cost Tracking

Patter automatically tracks cost and latency for every call, broken down by provider component (STT, TTS, LLM, telephony).

How It Works

Metrics are collected automatically during calls. When a call ends, the on_call_end callback receives a CallMetrics object with the full breakdown:

async def on_call_end(event):
    metrics = event.get("metrics")
    if metrics:
        print(f"Duration: {metrics.duration_seconds}s")
        print(f"Total cost: ${metrics.cost.total:.4f}")
        print(f"  STT: ${metrics.cost.stt:.4f}")
        print(f"  TTS: ${metrics.cost.tts:.4f}")
        print(f"  LLM: ${metrics.cost.llm:.4f}")
        print(f"  Telephony: ${metrics.cost.telephony:.4f}")
        print(f"Avg latency: {metrics.latency_avg.total_ms}ms")
        print(f"P95 latency: {metrics.latency_p95.total_ms}ms")

Cost Breakdown

The CostBreakdown object provides per-component costs in USD:

Field	Description
`stt`	Speech-to-text cost (Deepgram, Whisper).
`tts`	Text-to-speech cost (ElevenLabs, OpenAI TTS).
`llm`	LLM cost (OpenAI Realtime tokens).
`telephony`	Telephony cost (Twilio, Telnyx per-minute).
`total`	Sum of all components.

Latency Breakdown

The LatencyBreakdown object provides per-component latency in milliseconds:

Field	Description
`stt_ms`	Time from user speech to transcript.
`endpoint_ms`	Time the endpointer waited after the last word before declaring end-of-utterance.
`llm_ttft_ms`	Time from end-of-utterance to the first LLM token.
`llm_total_ms`	Time from end-of-utterance to the last LLM token (full response).
`llm_ms`	Alias for `llm_ttft_ms` (kept for backward compatibility).
`tts_ms`	Time from first LLM token to first TTS audio byte.
`tts_total_ms`	Time from first LLM token to last TTS audio byte.
`bargein_ms`	Time from caller voice detected to TTS playback cancelled (only set on barge-in turns).
`total_ms`	End-to-end latency (user speech to first audio).

CallMetrics exposes the full distribution: latency_avg, latency_p50 (median / typical UX), latency_p90 (steady-state outliers), latency_p95 (SLA), and latency_p99 (cold-start outliers).
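
To make the percentile fields concrete, here is a minimal sketch of how such a summary could be derived from per-turn end-to-end latencies. It uses the nearest-rank method, which is an assumption; the source does not say which percentile method Patter uses, and the sample latencies are made up.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-turn total_ms values; one slow outlier at the end.
turn_total_ms = [620.0, 640.0, 655.0, 670.0, 690.0, 700.0, 720.0, 750.0, 900.0, 2400.0]

summary = {
    "avg": mean(turn_total_ms),
    "p50": percentile(turn_total_ms, 50),
    "p90": percentile(turn_total_ms, 90),
    "p95": percentile(turn_total_ms, 95),
    "p99": percentile(turn_total_ms, 99),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}ms")
```

With only ten turns, a single outlier dominates p95 and p99 and pulls the average well above the median, which is why p50 is the better indicator of typical caller experience.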

Per-Turn Metrics

Each conversation turn is tracked individually:

async def on_call_end(event):
    metrics = event.get("metrics")
    if metrics:
        for turn in metrics.turns:
            print(f"Turn {turn.turn_index}:")
            print(f"  User: {turn.user_text}")
            print(f"  Agent: {turn.agent_text}")
            print(f"  Latency: {turn.latency.total_ms}ms")
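
A common use of per-turn data is finding the turns that felt slowest to the caller. The sketch below ranks turns by `latency.total_ms`. The `slowest_turns` helper is hypothetical, and the turns are `SimpleNamespace` stand-ins shaped like the documented fields so the example runs without a live call.

```python
from types import SimpleNamespace


def slowest_turns(turns, limit=3):
    """Return up to `limit` turns, slowest first, by end-to-end latency."""
    return sorted(turns, key=lambda t: t.latency.total_ms, reverse=True)[:limit]


# Stand-ins for TurnMetrics objects; only the fields used here are filled in.
turns = (
    SimpleNamespace(turn_index=0, latency=SimpleNamespace(total_ms=640.0)),
    SimpleNamespace(turn_index=1, latency=SimpleNamespace(total_ms=1850.0)),
    SimpleNamespace(turn_index=2, latency=SimpleNamespace(total_ms=710.0)),
)

for turn in slowest_turns(turns, limit=2):
    print(f"Turn {turn.turn_index}: {turn.latency.total_ms:.0f}ms")
```

Inside a real on_call_end callback, the same helper could presumably be applied to metrics.turns directly.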

Custom Pricing

Override default provider pricing estimates:

from getpatter import Patter, Twilio

phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        "deepgram": {"price": 0.005},      # Override STT price per minute
        "elevenlabs": {"price": 0.15},      # Override TTS price per 1k chars
        "twilio": {"price": 0.015},         # Override telephony price per minute
    },
)
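
The arithmetic implied by these units can be sketched as follows. This is an illustration only: the `stt_cost` and `tts_cost` helpers are hypothetical and assume simple linear proration, while Patter's own calculate_stt_cost and calculate_tts_cost may round or bill differently.

```python
def stt_cost(audio_seconds, price_per_minute):
    """Per-minute pricing, prorated by the second (an assumption)."""
    return audio_seconds / 60.0 * price_per_minute


def tts_cost(characters, price_per_1k_chars):
    """Per-1k-character pricing, prorated by the character (an assumption)."""
    return characters / 1000.0 * price_per_1k_chars


# Using the override prices from the example above.
stt = stt_cost(180.0, 0.005)  # 3 minutes of audio, about $0.015
tts = tts_cost(1200, 0.15)    # 1,200 characters, about $0.18

print(f"STT: ${stt:.4f}  TTS: ${tts:.4f}")
```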

PricingUnit

The pricing tables expose a PricingUnit StrEnum so overrides don’t depend on raw strings:

from getpatter.pricing import PricingUnit

PricingUnit.MINUTE          # "minute" — per minute of audio (STT, telephony)
PricingUnit.THOUSAND_CHARS  # "1k_chars" — per thousand characters synthesised (TTS)
PricingUnit.TOKEN           # "token" — per token (LLM / Realtime)

Subclassing str keeps the values JSON-serialisable and backward-compatible with code that compares against the literal strings (config.get("unit") == "minute").
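
The behaviour described above can be checked with a small stand-in enum. `PricingUnitSketch` is not the SDK class; it mirrors the documented values and uses the `(str, Enum)` mixin so it also runs on Python versions older than 3.11, where `StrEnum` is unavailable.

```python
import json
from enum import Enum


class PricingUnitSketch(str, Enum):
    """Stand-in mirroring the documented PricingUnit values."""
    MINUTE = "minute"
    THOUSAND_CHARS = "1k_chars"
    TOKEN = "token"


config = {"unit": PricingUnitSketch.MINUTE, "price": 0.005}

# Comparison against the raw string still works because the member is a str.
matches_literal = config.get("unit") == "minute"

# json.dumps emits the plain string value, with no custom encoder needed.
encoded = json.dumps(config)

print(matches_literal, encoded)
```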

Model-Aware Pricing

Patter’s pricing tables are model-aware: every entry in DEFAULT_PRICING carries provider-level defaults plus an optional models map keyed by model identifier. When the agent’s adapter exposes a model attribute, the metrics layer passes it to the cost-calc functions, so the dashboard bills at the model-specific rate with no manual override.

PRICING_VERSION       # "2026.3"
PRICING_LAST_UPDATED  # "2026-05-08"

How resolution works

The cost-calc helpers (calculate_stt_cost, calculate_tts_cost, calculate_realtime_cost, calculate_realtime_cached_savings) accept an optional trailing model arg. The internal _resolve_provider_rates(config, model) helper merges per-model overrides on top of provider defaults, trying each of the following in order:

1. Exact match in the provider’s models dict.
2. Longest-prefix match: gpt-realtime-2-2026-05-08 resolves against gpt-realtime-2.
3. Provider defaults: the fallback when the model is unknown or omitted.
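
The lookup order above can be sketched in a few lines. This is not the SDK's _resolve_provider_rates; `resolve_rates` is a hypothetical re-implementation of the documented behaviour, and the provider entry uses placeholder rates.

```python
from typing import Optional


def resolve_rates(config: dict, model: Optional[str] = None) -> dict:
    """Merge a model's rates over provider defaults: exact match, then longest prefix."""
    defaults = {k: v for k, v in config.items() if k != "models"}
    models = config.get("models", {})
    if not model:
        return defaults  # model omitted: provider-level rates
    if model in models:
        return {**defaults, **models[model]}  # exact match
    prefixes = [name for name in models if model.startswith(name)]
    if prefixes:
        return {**defaults, **models[max(prefixes, key=len)]}  # longest prefix wins
    return defaults  # unknown model: provider-level rates


# Placeholder entry shaped like "provider defaults plus a models map".
provider = {
    "unit": "token",
    "price": 1.0,
    "models": {
        "gpt-realtime": {"price": 2.0},
        "gpt-realtime-2": {"price": 3.0},
    },
}

print(resolve_rates(provider, "gpt-realtime-2-2026-05-08"))
```

Picking the longest prefix matters here: the dated identifier starts with both gpt-realtime and gpt-realtime-2, and only the longer key is the right one.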

CallMetricsAccumulator auto-tracks stt_model, tts_model, and realtime_model from the agent’s adapter model attribute (agent.stt.model, agent.tts.model, agent.model for Realtime). On every record_realtime_usage(usage) call the realtime model is also pulled from the response.done payload itself, overriding the call-level default — so mid-call model switches are billed correctly.

The optional model argument defaults to None, which preserves the legacy provider-rate behaviour. Existing callers compile and run unchanged.

Example A — Just select a model

The most common case: pick a model on your adapter, and Patter bills the right rate automatically.

from getpatter import Patter, Twilio
from getpatter.providers import OpenAIRealtimeAdapter, OpenAIRealtimeModel

agent = Patter.agent(
    system_prompt="You are a helpful assistant.",
    realtime=OpenAIRealtimeAdapter(model=OpenAIRealtimeModel.GPT_REALTIME_2),
)

phone = Patter(carrier=Twilio(), phone_number="+15550001234")
# Billing auto-uses the gpt-realtime-2 rate ($32/M audio in, $64/M audio out).

Example B — Override one model, keep siblings intact

merge_pricing overlays the nested models dict shallowly. Overriding a single model leaves the other rates inside the same provider untouched.

from getpatter import Patter, Twilio

phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        # Negotiated a discount on Nova-2 only — Nova-3 / Whisper rates stay default.
        "deepgram": {"models": {"nova-2": {"price": 0.004}}},
    },
)
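
To illustrate why sibling models survive an override, here is a hedged sketch of a merge with this behaviour. `merge_pricing_sketch` is a hypothetical stand-in, not the SDK's merge_pricing, and it assumes each model entry is replaced as a whole while other keys in the models map are left alone. The default rates are taken from the tables on this page.

```python
import copy


def merge_pricing_sketch(defaults: dict, overrides: dict) -> dict:
    """Overlay overrides on defaults; the nested models map is merged key by key."""
    merged = copy.deepcopy(defaults)  # never mutate the built-in table
    for provider, override in overrides.items():
        entry = merged.setdefault(provider, {})
        for key, value in override.items():
            if key == "models":
                entry["models"] = {**entry.get("models", {}), **value}
            else:
                entry[key] = value
    return merged


defaults = {
    "deepgram": {
        "price": 0.0077,
        "models": {"nova-3": {"price": 0.0077}, "nova-2": {"price": 0.0058}},
    },
}

merged = merge_pricing_sketch(defaults, {"deepgram": {"models": {"nova-2": {"price": 0.004}}}})
print(merged["deepgram"]["models"])
```

A plain `dict.update` at the provider level would instead replace the whole models map and silently drop the nova-3 rate.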

Example C — Register a brand-new model rate

Add a model that isn’t in the built-in table without touching SDK source.

from getpatter import Patter, Twilio

phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        "elevenlabs": {
            "models": {"my_custom_voice": {"price": 0.075}},
        },
    },
)
# When agent.tts.model == "my_custom_voice", calculate_tts_cost picks up $0.075/1k.

Default Pricing (2026.3)

Provider-level defaults are listed below. Per-model rates live under DEFAULT_PRICING[provider]["models"] and are auto-resolved when the adapter exposes its model identifier.

Provider	Unit	Default Price (default model)
Deepgram (`nova-3` streaming mono)	per minute	$0.0077
OpenAI Whisper (`whisper-1`)	per minute	$0.006
OpenAI Transcribe (`gpt-4o-transcribe`)	per minute	$0.006
AssemblyAI	per minute	$0.0025
Cartesia STT (`ink-whisper`)	per minute	$0.0025
Soniox	per minute	$0.002
Speechmatics (Pro)	per minute	$0.004
ElevenLabs (`eleven_flash_v2_5`)	per 1k chars	$0.06
OpenAI TTS (`tts-1`)	per 1k chars	$0.015
Cartesia TTS (`sonic-2`)	per 1k chars	$0.030
Rime (`mistv2`)	per 1k chars	$0.030
LMNT (`aurora`)	per 1k chars	$0.050
Inworld (`inworld-tts-2`)	per 1k chars	$0.020
OpenAI Realtime (`gpt-realtime-mini` / `gpt-4o-mini-realtime-preview`)	per token	$10/M audio in · $20/M audio out · $0.60/M text in · $2.40/M text out (cached: $0.30/M audio · $0.06/M text)
Twilio (US inbound local)	per minute	$0.0085 (rounded up to whole minute, per Twilio)
Telnyx	per minute	$0.007

STT — per-model rates

Provider	Model	Price
Deepgram	`nova-3` (default)	$0.0077/min
Deepgram	`nova-3-multilingual`	$0.0092/min
Deepgram	`nova-2`	$0.0058/min
Deepgram	`nova`	$0.0043/min
Deepgram	`whisper-large` / `whisper-medium`	$0.0048/min
OpenAI Whisper	`whisper-1` (default)	$0.006/min
OpenAI Whisper	`gpt-4o-transcribe`	$0.006/min
OpenAI Whisper	`gpt-4o-mini-transcribe`	$0.003/min
OpenAI Whisper	`gpt-realtime-whisper`	$0.017/min
OpenAI Transcribe (`openai_transcribe`)	`gpt-4o-transcribe` (default)	$0.006/min
OpenAI Transcribe	`gpt-4o-mini-transcribe`	$0.003/min
OpenAI Transcribe	`whisper-1`	$0.006/min

TTS — per-model rates

Provider	Model	Price
ElevenLabs (REST + WebSocket)	`eleven_flash_v2_5` (default)	$0.06/1k
ElevenLabs	`eleven_turbo_v2_5`	$0.05/1k
ElevenLabs	`eleven_multilingual_v2` / `eleven_monolingual_v1`	$0.18/1k
ElevenLabs	`eleven_v3`	$0.30/1k
OpenAI TTS	`tts-1` (default)	$0.015/1k
OpenAI TTS	`tts-1-hd`	$0.030/1k
OpenAI TTS	`gpt-4o-mini-tts`	$0.012/1k
Cartesia	`sonic-1` / `sonic-2` / `sonic-english` / `sonic-multilingual`	$0.030/1k
Rime	`mistv2` (default) / `mist`	$0.030/1k
Rime	`arcana`	$0.040/1k
LMNT	`aurora` (default) / `blizzard`	$0.050/1k
Inworld	`inworld-tts-2` (default)	$0.020/1k
Inworld	`inworld-tts-1.5-max` / `inworld-tts-1.5`	$0.025/1k

OpenAI Realtime — per-model rates

Model	Audio in / out (per token)	Text in / out (per token)	Cached audio / text (per token)
`gpt-realtime-mini` (default) / `gpt-4o-mini-realtime-preview`	$0.00001 / $0.00002	$0.0000006 / $0.0000024	$0.0000003 / $0.00000006
`gpt-realtime`	$0.000032 / $0.000064	$0.000004 / $0.000016	$0.0000004 / $0.0000004
`gpt-realtime-2`	$0.000032 / $0.000064	$0.000004 / $0.000024	$0.0000004 / $0.0000004
`gpt-4o-realtime-preview`	$0.0001 / $0.0002	$0.000005 / $0.000020	$0.0000020 / $0.0000025

gpt-4o-realtime-preview is roughly 10x the cost of gpt-realtime-mini for audio. Switching realtime models has direct billing impact — confirm the model on agent.realtime.model matches the rate you expect.
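
The billing impact can be worked through with the per-token rates from the table. This is a rough sketch, not the SDK's calculate_realtime_cost: the usage keys are hypothetical names rather than the response.done payload shape, the token counts are invented, and cached-token discounts are ignored.

```python
# Per-token USD rates copied from the table above.
RATES = {
    "gpt-realtime-mini": {
        "audio_in": 0.00001, "audio_out": 0.00002,
        "text_in": 0.0000006, "text_out": 0.0000024,
    },
    "gpt-4o-realtime-preview": {
        "audio_in": 0.0001, "audio_out": 0.0002,
        "text_in": 0.000005, "text_out": 0.000020,
    },
}


def realtime_cost(model, usage):
    """Sum tokens x rate over the four token kinds (no cached-token handling)."""
    return sum(usage.get(kind, 0) * rate for kind, rate in RATES[model].items())


# Hypothetical token counts for one call.
usage = {"audio_in": 12_000, "audio_out": 9_000, "text_in": 1_500, "text_out": 400}

mini = realtime_cost("gpt-realtime-mini", usage)
preview = realtime_cost("gpt-4o-realtime-preview", usage)
print(f"mini: ${mini:.4f}  preview: ${preview:.4f}  ratio: {preview / mini:.1f}x")
```

For this usage the same call costs about $0.30 on gpt-realtime-mini and about $3.02 on gpt-4o-realtime-preview, since audio tokens dominate the bill.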

Twilio defaults match US inbound local. Override pricing.twilio.price for US toll-free inbound (~$0.022/min) or US outbound local (~$0.014/min).

Default pricing is based on publicly listed provider rates and may become stale; check the provider’s pricing page or pass your own overrides for authoritative numbers.
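
Because Twilio usage is rounded up to whole minutes, short overruns are billed as a full extra minute. A minimal sketch of that rounding, using the default US inbound local rate from the table; the `telephony_cost` helper is hypothetical and only illustrates the arithmetic.

```python
import math


def telephony_cost(duration_seconds, price_per_minute=0.0085):
    """Bill whole minutes, rounding any partial minute up."""
    billed_minutes = math.ceil(duration_seconds / 60)
    return billed_minutes * price_per_minute


for seconds in (45, 60, 61):
    print(f"{seconds}s -> ${telephony_cost(seconds):.4f}")
```

A 61-second call is therefore billed as two minutes ($0.0170), twice the cost of a 60-second call.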

Real-Time Metrics

Use the on_metrics callback for live cost updates during a call:

async def on_metrics(data):
    cost = data.get("cost_so_far")
    if cost:
        print(f"Running cost: ${cost.total:.4f}")

await phone.serve(
    agent,
    port=8000,
    on_metrics=on_metrics,
)

The cost_so_far value is a CostBreakdown dataclass, so access its fields as attributes (e.g. cost.total, cost.stt) rather than dictionary keys.
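
One way to use live cost updates is a per-call budget alert. The sketch below builds an on_metrics-style callback that fires once when the running total crosses a limit. `make_budget_guard` and `record_alert` are hypothetical helpers, and the callback is driven with stand-in payloads here rather than by phone.serve.

```python
import asyncio
from types import SimpleNamespace


def make_budget_guard(limit_usd, on_exceeded):
    """Build an on_metrics-style callback that calls on_exceeded once."""
    fired = False

    async def on_metrics(data):
        nonlocal fired
        cost = data.get("cost_so_far")
        if cost is None or fired:
            return
        if cost.total >= limit_usd:
            fired = True
            await on_exceeded(cost.total)

    return on_metrics


alerts = []


async def record_alert(total):
    alerts.append(total)


async def demo():
    guard = make_budget_guard(0.50, record_alert)
    # Stand-in payloads; a real payload carries a CostBreakdown under "cost_so_far".
    for total in (0.10, 0.45, 0.62, 0.80):
        await guard({"cost_so_far": SimpleNamespace(total=total)})


asyncio.run(demo())
print(alerts)
```

The guard reads cost.total as an attribute, matching the dataclass access pattern described above.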

Data Types

from getpatter import CallMetrics, CostBreakdown, LatencyBreakdown, TurnMetrics

CallMetrics

Field	Type	Description
`call_id`	`str`	Unique call identifier.
`duration_seconds`	`float`	Total call duration.
`turns`	`tuple[TurnMetrics, ...]`	Per-turn metrics.
`cost`	`CostBreakdown`	Cost breakdown.
`latency_avg`	`LatencyBreakdown`	Average latency.
`latency_p50`	`LatencyBreakdown`	Median (50th percentile) latency.
`latency_p90`	`LatencyBreakdown`	90th percentile latency (steady-state outliers).
`latency_p95`	`LatencyBreakdown`	95th percentile latency.
`latency_p99`	`LatencyBreakdown`	99th percentile latency (cold-start outliers).
`provider_mode`	`str`	Voice mode used.
`stt_provider`	`str`	STT provider name.
`tts_provider`	`str`	TTS provider name.
`llm_provider`	`str`	LLM provider name.
`telephony_provider`	`str`	Telephony provider name.

TurnMetrics

Field	Type	Description
`turn_index`	`int`	Zero-based turn index.
`user_text`	`str`	What the user said.
`agent_text`	`str`	What the agent replied.
`latency`	`LatencyBreakdown`	Latency for this turn.
`stt_audio_seconds`	`float`	Audio duration processed by STT.
`tts_characters`	`int`	Characters synthesized by TTS.
`timestamp`	`float`	Unix timestamp.
