Metrics & Cost Tracking
Patter automatically tracks cost and latency for every call, broken down by provider component (STT, TTS, LLM, telephony).
How It Works
Metrics are collected automatically during calls. When a call ends, the on_call_end callback receives a CallMetrics object with the full breakdown:
async def on_call_end(event):
    metrics = event.get("metrics")
    if metrics:
        print(f"Duration: {metrics.duration_seconds}s")
        print(f"Total cost: ${metrics.cost.total:.4f}")
        print(f" STT: ${metrics.cost.stt:.4f}")
        print(f" TTS: ${metrics.cost.tts:.4f}")
        print(f" LLM: ${metrics.cost.llm:.4f}")
        print(f" Telephony: ${metrics.cost.telephony:.4f}")
        print(f"Avg latency: {metrics.latency_avg.total_ms}ms")
        print(f"P95 latency: {metrics.latency_p95.total_ms}ms")
Cost Breakdown
The CostBreakdown object provides per-component costs in USD:
| Field | Description |
|---|---|
| stt | Speech-to-text cost (Deepgram, Whisper). |
| tts | Text-to-speech cost (ElevenLabs, OpenAI TTS). |
| llm | LLM cost (OpenAI Realtime tokens). |
| telephony | Telephony cost (Twilio, Telnyx per-minute). |
| total | Sum of all components. |
Latency Breakdown
The LatencyBreakdown object provides per-component latency in milliseconds:
| Field | Description |
|---|---|
| stt_ms | Time from user speech to transcript. |
| endpoint_ms | Time the endpointer waited after the last word before declaring end-of-utterance. |
| llm_ttft_ms | Time from end-of-utterance to the first LLM token. |
| llm_total_ms | Time from end-of-utterance to the last LLM token (full response). |
| llm_ms | Alias for llm_ttft_ms (kept for back-compat). |
| tts_ms | Time from first LLM token to first TTS audio byte. |
| tts_total_ms | Time from first LLM token to last TTS audio byte. |
| bargein_ms | Time from caller voice detected to TTS playback cancelled (only set on barge-in turns). |
| total_ms | End-to-end latency (user speech to first audio). |
CallMetrics exposes the full distribution: latency_avg, latency_p50 (median / typical UX), latency_p90 (steady-state outliers), latency_p95 (SLA), and latency_p99 (cold-start outliers).
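To make the percentile fields concrete, here is a minimal sketch of how such a distribution could be derived from per-turn total_ms values. It uses the nearest-rank method, which is an assumption: this page does not say which percentile method Patter applies. `nearest_rank_percentile` and the sample latencies are hypothetical, not part of the SDK.

```python
import math


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Return the nearest-rank percentile of values, for 0 < pct <= 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers pct percent of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical end-to-end latencies (ms) for ten turns of one call.
turn_totals_ms = [420.0, 480.0, 510.0, 530.0, 560.0,
                  600.0, 640.0, 700.0, 950.0, 1800.0]

summary = {
    "avg": sum(turn_totals_ms) / len(turn_totals_ms),
    "p50": nearest_rank_percentile(turn_totals_ms, 50),
    "p90": nearest_rank_percentile(turn_totals_ms, 90),
    "p95": nearest_rank_percentile(turn_totals_ms, 95),
    "p99": nearest_rank_percentile(turn_totals_ms, 99),
}
print(summary)
```

In this sample a single slow turn (1800 ms) pulls the average well above the median, which is why the median is the better proxy for typical UX and the high percentiles are the ones to watch for outliers.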
Per-Turn Metrics
Each conversation turn is tracked individually:
async def on_call_end(event):
    metrics = event.get("metrics")
    if metrics:
        for turn in metrics.turns:
            print(f"Turn {turn.turn_index}:")
            print(f" User: {turn.user_text}")
            print(f" Agent: {turn.agent_text}")
            print(f" Latency: {turn.latency.total_ms}ms")
Custom Pricing
Override default provider pricing estimates:
from getpatter import Patter, Twilio
phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        "deepgram": {"price": 0.005},  # Override STT price per minute
        "elevenlabs": {"price": 0.15},  # Override TTS price per 1k chars
        "twilio": {"price": 0.015},  # Override telephony price per minute
    },
)
PricingUnit
The pricing tables expose a PricingUnit StrEnum so overrides don’t depend on raw strings:
from getpatter.pricing import PricingUnit
PricingUnit.MINUTE # "minute" — per minute of audio (STT, telephony)
PricingUnit.THOUSAND_CHARS # "1k_chars" — per thousand characters synthesised (TTS)
PricingUnit.TOKEN # "token" — per token (LLM / Realtime)
Subclassing str keeps the values JSON-serialisable and backward-compatible with code that compares against the literal strings (config.get("unit") == "minute").
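The two properties claimed above can be checked with a small stand-in. `UnitSketch` is a hypothetical enum that mirrors the documented values; it uses the `(str, Enum)` mixin rather than `StrEnum` so it also runs on Python versions before 3.11, and it is not the SDK's own class.

```python
import json
from enum import Enum


class UnitSketch(str, Enum):
    """Hypothetical stand-in for PricingUnit with the documented values."""
    MINUTE = "minute"
    THOUSAND_CHARS = "1k_chars"
    TOKEN = "token"


# A config dict that stores the enum member instead of a raw string.
config = {"unit": UnitSketch.MINUTE, "price": 0.005}

# Legacy code comparing against the literal string keeps working.
matches_literal = config.get("unit") == "minute"

# Because the member is a str, json.dumps emits the plain value.
encoded = json.dumps(config)

print(matches_literal, encoded)
```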
Model-Aware Pricing
Patter’s pricing tables are model-aware: every entry in DEFAULT_PRICING carries provider-level defaults plus an optional models map keyed by model identifier. When the agent’s adapter exposes a model attribute, the metrics layer passes it through the cost-calc functions, so the dashboard bills at the model-specific rate out of the box, with no manual override required.
PRICING_VERSION # "2026.3"
PRICING_LAST_UPDATED # "2026-05-08"
How resolution works
The cost-calc helpers (calculate_stt_cost, calculate_tts_cost, calculate_realtime_cost, calculate_realtime_cached_savings) accept an optional trailing model arg. The internal _resolve_provider_rates(config, model) helper merges per-model overrides on top of provider defaults using:
- Exact match in the provider’s models dict.
- Longest-prefix match: gpt-realtime-2-2026-05-08 resolves against gpt-realtime-2.
- Provider defaults: fallback when the model is unknown or omitted.
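The three resolution steps above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the SDK's `_resolve_provider_rates`: the function name `resolve_rates_sketch`, the `text_out` key, and the shape of the config dict are all hypothetical, while the rates are taken from the Realtime table on this page.

```python
from typing import Any, Optional


def resolve_rates_sketch(config: dict[str, Any],
                         model: Optional[str] = None) -> dict[str, Any]:
    """Merge per-model overrides on top of provider defaults (illustrative)."""
    defaults = {k: v for k, v in config.items() if k != "models"}
    models = config.get("models", {})
    if not model:
        return defaults  # model omitted: legacy provider-rate behaviour
    if model in models:
        return {**defaults, **models[model]}  # step 1: exact match
    prefixes = [name for name in models if model.startswith(name)]
    if prefixes:
        # step 2: the longest matching prefix wins
        return {**defaults, **models[max(prefixes, key=len)]}
    return defaults  # step 3: unknown model falls back to provider defaults


# Hypothetical config shape; key names are assumptions for this sketch.
realtime = {
    "unit": "token",
    "text_out": 0.0000024,
    "models": {
        "gpt-realtime": {"text_out": 0.000016},
        "gpt-realtime-2": {"text_out": 0.000024},
    },
}

dated = resolve_rates_sketch(realtime, "gpt-realtime-2-2026-05-08")
print(dated["text_out"])
```

Both gpt-realtime and gpt-realtime-2 are prefixes of the dated identifier, so picking the longest one is what keeps the dated snapshot on the gpt-realtime-2 rate.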
CallMetricsAccumulator auto-tracks stt_model, tts_model, and realtime_model from the agent’s adapter model attribute (agent.stt.model, agent.tts.model, agent.model for Realtime). On every record_realtime_usage(usage) call the realtime model is also pulled from the response.done payload itself, overriding the call-level default — so mid-call model switches are billed correctly.
The optional model argument defaults to None, which preserves the legacy provider-rate behaviour. Existing callers compile and run unchanged.
Example A — Just select a model
The most common case: pick a model on your adapter, and Patter bills the right rate automatically.
from getpatter import Patter, Twilio
from getpatter.providers import OpenAIRealtimeAdapter, OpenAIRealtimeModel
agent = Patter.agent(
    system_prompt="You are a helpful assistant.",
    realtime=OpenAIRealtimeAdapter(model=OpenAIRealtimeModel.GPT_REALTIME_2),
)
phone = Patter(carrier=Twilio(), phone_number="+15550001234")
# Billing auto-uses the gpt-realtime-2 rate ($32/M audio in, $64/M audio out).
Example B — Override one model, keep siblings intact
merge_pricing overlays the nested models dict shallowly. Overriding a single model leaves the other rates inside the same provider untouched.
phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        # Negotiated a discount on Nova-2 only; Nova-3 / Whisper rates stay default.
        "deepgram": {"models": {"nova-2": {"price": 0.004}}},
    },
)
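For intuition, here is a minimal sketch of what a shallow overlay of the nested models dict could look like. `merge_pricing_sketch` is a hypothetical helper written for this illustration; the real merge_pricing may differ in signature and details, and the defaults dict only reproduces a few Deepgram rates from this page.

```python
import copy
from typing import Any

PricingTable = dict[str, dict[str, Any]]


def merge_pricing_sketch(defaults: PricingTable,
                         overrides: PricingTable) -> PricingTable:
    """Overlay overrides on defaults, merging the models dict per model."""
    merged = copy.deepcopy(defaults)  # never mutate the built-in table
    for provider, override in overrides.items():
        target = merged.setdefault(provider, {})
        for key, value in override.items():
            if key == "models":
                # Replace only the listed models; sibling entries survive.
                target.setdefault("models", {}).update(copy.deepcopy(value))
            else:
                target[key] = value
    return merged


defaults = {
    "deepgram": {
        "unit": "minute",
        "price": 0.0077,
        "models": {"nova-2": {"price": 0.0058}, "nova-3": {"price": 0.0077}},
    },
}
merged = merge_pricing_sketch(
    defaults, {"deepgram": {"models": {"nova-2": {"price": 0.004}}}}
)
print(merged["deepgram"]["models"])
```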
Example C — Register a brand-new model rate
Add a model that isn’t in the built-in table without touching SDK source.
phone = Patter(
    carrier=Twilio(),
    phone_number="+15550001234",
    pricing={
        "elevenlabs": {
            "models": {"my_custom_voice": {"price": 0.075}},
        },
    },
)
# When agent.tts.model == "my_custom_voice", calculate_tts_cost picks up $0.075/1k.
Default Pricing (2026.3)
Provider-level defaults are listed below. Per-model rates live under DEFAULT_PRICING[provider]["models"] and are auto-resolved when the adapter exposes its model identifier.
| Provider | Unit | Default Price (default model) |
|---|---|---|
| Deepgram (nova-3 streaming mono) | per minute | $0.0077 |
| OpenAI Whisper (whisper-1) | per minute | $0.006 |
| OpenAI Transcribe (gpt-4o-transcribe) | per minute | $0.006 |
| AssemblyAI | per minute | $0.0025 |
| Cartesia STT (ink-whisper) | per minute | $0.0025 |
| Soniox | per minute | $0.002 |
| Speechmatics (Pro) | per minute | $0.004 |
| ElevenLabs (eleven_flash_v2_5) | per 1k chars | $0.06 |
| OpenAI TTS (tts-1) | per 1k chars | $0.015 |
| Cartesia TTS (sonic-2) | per 1k chars | $0.030 |
| Rime (mistv2) | per 1k chars | $0.030 |
| LMNT (aurora) | per 1k chars | $0.050 |
| Inworld (inworld-tts-2) | per 1k chars | $0.020 |
| OpenAI Realtime (gpt-realtime-mini / gpt-4o-mini-realtime-preview) | per token | $10/M audio in · $20/M audio out · $0.60/M text in · $2.40/M text out (cached: $0.30/M audio · $0.06/M text) |
| Twilio (US inbound local) | per minute | $0.0085 (rounded up to whole minute, per Twilio) |
| Telnyx | per minute | $0.007 |
STT — per-model rates
| Provider | Model | Price |
|---|---|---|
| Deepgram | nova-3 (default) | $0.0077/min |
| Deepgram | nova-3-multilingual | $0.0092/min |
| Deepgram | nova-2 | $0.0058/min |
| Deepgram | nova | $0.0043/min |
| Deepgram | whisper-large / whisper-medium | $0.0048/min |
| OpenAI Whisper | whisper-1 (default) | $0.006/min |
| OpenAI Whisper | gpt-4o-transcribe | $0.006/min |
| OpenAI Whisper | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Whisper | gpt-realtime-whisper | $0.017/min |
| OpenAI Transcribe (openai_transcribe) | gpt-4o-transcribe (default) | $0.006/min |
| OpenAI Transcribe | gpt-4o-mini-transcribe | $0.003/min |
| OpenAI Transcribe | whisper-1 | $0.006/min |
TTS — per-model rates
| Provider | Model | Price |
|---|---|---|
| ElevenLabs (REST + WebSocket) | eleven_flash_v2_5 (default) | $0.06/1k |
| ElevenLabs | eleven_turbo_v2_5 | $0.05/1k |
| ElevenLabs | eleven_multilingual_v2 / eleven_monolingual_v1 | $0.18/1k |
| ElevenLabs | eleven_v3 | $0.30/1k |
| OpenAI TTS | tts-1 (default) | $0.015/1k |
| OpenAI TTS | tts-1-hd | $0.030/1k |
| OpenAI TTS | gpt-4o-mini-tts | $0.012/1k |
| Cartesia | sonic-1 / sonic-2 / sonic-english / sonic-multilingual | $0.030/1k |
| Rime | mistv2 (default) / mist | $0.030/1k |
| Rime | arcana | $0.040/1k |
| LMNT | aurora (default) / blizzard | $0.050/1k |
| Inworld | inworld-tts-2 (default) | $0.020/1k |
| Inworld | inworld-tts-1.5-max / inworld-tts-1.5 | $0.025/1k |
OpenAI Realtime — per-model rates
| Model | Audio in / out (USD per token) | Text in / out (USD per token) | Cached audio / text (USD per token) |
|---|---|---|---|
| gpt-realtime-mini (default) / gpt-4o-mini-realtime-preview | 0.00001 / 0.00002 | 0.0000006 / 0.0000024 | 0.0000003 / 0.00000006 |
| gpt-realtime | 0.000032 / 0.000064 | 0.000004 / 0.000016 | 0.0000004 / 0.0000004 |
| gpt-realtime-2 | 0.000032 / 0.000064 | 0.000004 / 0.000024 | 0.0000004 / 0.0000004 |
| gpt-4o-realtime-preview | 0.0001 / 0.0002 | 0.000005 / 0.000020 | 0.0000020 / 0.0000025 |
gpt-4o-realtime-preview is roughly 10x the cost of gpt-realtime-mini for audio. Switching realtime models has direct billing impact — confirm the model on agent.realtime.model matches the rate you expect.
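As a worked example of that billing impact, the sketch below multiplies token counts by the per-token rates from the Realtime table. The `RealtimeRates` dataclass, the `realtime_cost_sketch` function, and the token counts are hypothetical; cached-token discounts are left out, and the SDK's calculate_realtime_cost may account for usage differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RealtimeRates:
    """USD per token, copied from the per-model Realtime table."""
    audio_in: float
    audio_out: float
    text_in: float
    text_out: float


RATES = {
    "gpt-realtime-mini": RealtimeRates(0.00001, 0.00002, 0.0000006, 0.0000024),
    "gpt-4o-realtime-preview": RealtimeRates(0.0001, 0.0002, 0.000005, 0.000020),
}


def realtime_cost_sketch(model: str, audio_in: int, audio_out: int,
                         text_in: int, text_out: int) -> float:
    """Uncached cost in USD: token count times per-token rate, summed."""
    r = RATES[model]
    return (audio_in * r.audio_in + audio_out * r.audio_out
            + text_in * r.text_in + text_out * r.text_out)


# Hypothetical token usage for one short call.
usage = {"audio_in": 6000, "audio_out": 9000, "text_in": 1500, "text_out": 400}

mini_cost = realtime_cost_sketch("gpt-realtime-mini", **usage)
preview_cost = realtime_cost_sketch("gpt-4o-realtime-preview", **usage)
print(f"mini: ${mini_cost:.4f}  preview: ${preview_cost:.4f}")
```

With these token counts the same call costs about $0.24 on gpt-realtime-mini and about $2.42 on gpt-4o-realtime-preview.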
Twilio defaults match US inbound local. Override pricing.twilio.price for US toll-free inbound (~$0.022/min) or US outbound local (~$0.014/min).
Default pricing is based on publicly listed provider rates and may become stale. Check the provider’s pricing page or pass your own overrides for authoritative numbers.
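The default pricing table notes that Twilio minutes are rounded up to a whole minute. A minimal sketch of that arithmetic, assuming billing is simply the ceiling of the call duration in minutes times the per-minute price, is shown below. `telephony_cost_sketch` is a hypothetical helper, not an SDK function.

```python
import math


def telephony_cost_sketch(duration_seconds: float,
                          price_per_minute: float = 0.0085) -> float:
    """Bill whole minutes, rounding any partial minute up (assumed rule)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    billed_minutes = math.ceil(duration_seconds / 60)
    return billed_minutes * price_per_minute


# A 61-second call is billed as two minutes at the default inbound rate.
default_cost = telephony_cost_sketch(61)

# The same helper with the approximate toll-free inbound rate as an override.
toll_free_cost = telephony_cost_sketch(125, price_per_minute=0.022)

print(f"${default_cost:.4f}  ${toll_free_cost:.4f}")
```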
Real-Time Metrics
Use the on_metrics callback for live cost updates during a call:
async def on_metrics(data):
    cost = data.get("cost_so_far")
    if cost:
        print(f"Running cost: ${cost.total:.4f}")

await phone.serve(
    agent,
    port=8000,
    on_metrics=on_metrics,
)
The cost_so_far value is a CostBreakdown dataclass, so access its fields as attributes (e.g. cost.total, cost.stt) rather than dictionary keys.
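One possible use of attribute access on cost_so_far is a simple budget guard. The sketch below runs without the SDK by using `CostSketch`, a hypothetical stand-in that only mirrors the documented CostBreakdown field names; the budget threshold and the `alerts` list are likewise illustrative.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class CostSketch:
    """Hypothetical stand-in mirroring the documented CostBreakdown fields."""
    stt: float = 0.0
    tts: float = 0.0
    llm: float = 0.0
    telephony: float = 0.0

    @property
    def total(self) -> float:
        return self.stt + self.tts + self.llm + self.telephony


BUDGET_USD = 0.50
alerts: list[str] = []


async def on_metrics(data: dict) -> None:
    cost = data.get("cost_so_far")
    # Attribute access (cost.total), not dictionary access (cost["total"]).
    if cost is not None and cost.total > BUDGET_USD:
        alerts.append(f"Over budget: ${cost.total:.4f}")


# Simulate two metric updates: one under budget, one over.
asyncio.run(on_metrics({"cost_so_far": CostSketch(stt=0.01, llm=0.10)}))
asyncio.run(on_metrics({"cost_so_far": CostSketch(stt=0.02, llm=0.45, telephony=0.05)}))
print(alerts)
```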
Data Types
from getpatter import CallMetrics, CostBreakdown, LatencyBreakdown, TurnMetrics
CallMetrics
| Field | Type | Description |
|---|---|---|
| call_id | str | Unique call identifier. |
| duration_seconds | float | Total call duration. |
| turns | tuple[TurnMetrics, ...] | Per-turn metrics. |
| cost | CostBreakdown | Cost breakdown. |
| latency_avg | LatencyBreakdown | Average latency. |
| latency_p50 | LatencyBreakdown | Median (50th percentile) latency. |
| latency_p90 | LatencyBreakdown | 90th percentile latency (steady-state outliers). |
| latency_p95 | LatencyBreakdown | 95th percentile latency. |
| latency_p99 | LatencyBreakdown | 99th percentile latency (cold-start outliers). |
| provider_mode | str | Voice mode used. |
| stt_provider | str | STT provider name. |
| tts_provider | str | TTS provider name. |
| llm_provider | str | LLM provider name. |
| telephony_provider | str | Telephony provider name. |
TurnMetrics
| Field | Type | Description |
|---|---|---|
| turn_index | int | Zero-based turn index. |
| user_text | str | What the user said. |
| agent_text | str | What the agent replied. |
| latency | LatencyBreakdown | Latency for this turn. |
| stt_audio_seconds | float | Audio duration processed by STT. |
| tts_characters | int | Characters synthesized by TTS. |
| timestamp | float | Unix timestamp. |
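The stt_audio_seconds and tts_characters fields are enough to estimate STT and TTS spend by hand. The sketch below assumes cost is the usage total times the unit price (per minute for STT, per 1k characters for TTS); `TurnSketch` and `estimate_stt_tts_cost` are hypothetical, and the prices are the Deepgram nova-3 and ElevenLabs eleven_flash_v2_5 defaults from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnSketch:
    """Hypothetical stand-in with a subset of the TurnMetrics fields."""
    turn_index: int
    stt_audio_seconds: float
    tts_characters: int


def estimate_stt_tts_cost(turns: tuple[TurnSketch, ...],
                          stt_price_per_minute: float,
                          tts_price_per_1k_chars: float) -> tuple[float, float]:
    """Return (stt_cost, tts_cost) in USD from per-turn usage totals."""
    stt_minutes = sum(t.stt_audio_seconds for t in turns) / 60
    tts_thousands = sum(t.tts_characters for t in turns) / 1000
    return (stt_minutes * stt_price_per_minute,
            tts_thousands * tts_price_per_1k_chars)


# Hypothetical usage for a three-turn call.
turns = (
    TurnSketch(0, stt_audio_seconds=12.0, tts_characters=180),
    TurnSketch(1, stt_audio_seconds=8.0, tts_characters=240),
    TurnSketch(2, stt_audio_seconds=10.0, tts_characters=80),
)

stt_cost, tts_cost = estimate_stt_tts_cost(turns, 0.0077, 0.06)
print(f"STT: ${stt_cost:.5f}  TTS: ${tts_cost:.5f}")
```

An estimate like this is mainly useful as a sanity check against the cost.stt and cost.tts values reported on CallMetrics.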