

STT (Speech-to-Text)

STT is used in pipeline mode to transcribe caller audio before it reaches your LLM. If you use an engine such as OpenAIRealtime or ElevenLabsConvAI, speech recognition is handled internally by the engine. Every STT class is imported by name from the package barrel: import { DeepgramSTT } from "getpatter".
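In pipeline mode, each caller turn passes through three stages: STT turns the caller's audio into text, the LLM replies in text, and TTS turns that reply back into audio. The sketch below models this flow with hypothetical SttStage, LlmStage and TtsStage interfaces. These are not getpatter types. The real SDK also streams audio incrementally instead of handling a whole buffer at once.

```typescript
// Hypothetical stage interfaces, for illustration only (not getpatter types).
interface SttStage {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface LlmStage {
  reply(transcript: string): Promise<string>;
}
interface TtsStage {
  synthesize(text: string): Promise<Uint8Array>;
}

// One caller turn in pipeline mode: audio -> transcript -> reply -> audio.
async function runTurn(
  audio: Uint8Array,
  stt: SttStage,
  llm: LlmStage,
  tts: TtsStage,
): Promise<{ transcript: string; reply: string; speech: Uint8Array }> {
  const transcript = await stt.transcribe(audio); // caller speech becomes text
  const reply = await llm.reply(transcript);      // the LLM only ever sees text
  const speech = await tts.synthesize(reply);     // the reply becomes audio again
  return { transcript, reply, speech };
}

// Stub stages so the flow can run without any provider.
const fakeStt: SttStage = { transcribe: async () => "what are your hours" };
const fakeLlm: LlmStage = { reply: async (t) => `You asked: ${t}` };
const fakeTts: TtsStage = { synthesize: async (text) => new TextEncoder().encode(text) };

runTurn(new Uint8Array(160), fakeStt, fakeLlm, fakeTts).then((r) => console.log(r.reply));
// prints "You asked: what are your hours"
```

When you use an engine such as OpenAIRealtime, the engine collapses these stages into one, so there is no separate STT stage to configure.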

Quickstart

// npx tsx example.ts
import { Patter, Twilio, DeepgramSTT, ElevenLabsTTS } from "getpatter";

const phone = new Patter({ carrier: new Twilio(), phoneNumber: "+15550001234" });  // TWILIO_* from env

const agent = phone.agent({
  stt: new DeepgramSTT({ endpointingMs: 80 }),      // DEEPGRAM_API_KEY from env
  tts: new ElevenLabsTTS({ voiceId: "rachel" }),    // ELEVENLABS_API_KEY from env
  systemPrompt: "You are a helpful assistant.",
  firstMessage: "Hi!",
});

await phone.serve({ agent });

Supported providers

| Flat import | Namespaced import | Env var |
|---|---|---|
| DeepgramSTT | getpatter/stt/deepgram | DEEPGRAM_API_KEY |
| WhisperSTT | getpatter/stt/whisper | OPENAI_API_KEY |
| OpenAITranscribeSTT | getpatter/stt/openai-transcribe | OPENAI_API_KEY |
| CartesiaSTT | getpatter/stt/cartesia | CARTESIA_API_KEY |
| AssemblyAISTT | getpatter/stt/assemblyai | ASSEMBLYAI_API_KEY |
| SonioxSTT | getpatter/stt/soniox | SONIOX_API_KEY |
| SpeechmaticsSTT | getpatter/stt/speechmatics | SPEECHMATICS_API_KEY |
SpeechmaticsSTT is supported by the Python SDK but is not yet available in the TypeScript SDK. A TypeScript port is planned for the upcoming release (see the ## Unreleased section in CHANGELOG.md). Until then, use the Python SDK for Speechmatics or wait for the next minor version.
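Because each provider reads its key from a fixed environment variable, a startup check can report which STT providers are usable before the agent starts. The sketch below copies the mapping from the table above and leaves out SpeechmaticsSTT because the TypeScript SDK does not support it yet. configuredSttProviders is a hypothetical helper, not part of getpatter. In Node, you would pass it process.env.

```typescript
// Provider -> env var mapping, copied from the table above (TypeScript SDK only).
const STT_ENV_VARS = {
  DeepgramSTT: "DEEPGRAM_API_KEY",
  WhisperSTT: "OPENAI_API_KEY",
  OpenAITranscribeSTT: "OPENAI_API_KEY",
  CartesiaSTT: "CARTESIA_API_KEY",
  AssemblyAISTT: "ASSEMBLYAI_API_KEY",
  SonioxSTT: "SONIOX_API_KEY",
} as const;

type SttProvider = keyof typeof STT_ENV_VARS;

// Hypothetical helper: returns the providers whose API key is set to a non-blank value.
function configuredSttProviders(env: Record<string, string | undefined>): SttProvider[] {
  return (Object.keys(STT_ENV_VARS) as SttProvider[]).filter((name) => {
    const value = env[STT_ENV_VARS[name]];
    return value !== undefined && value.trim() !== "";
  });
}

console.log(configuredSttProviders({ OPENAI_API_KEY: "sk-test" }));
// prints [ 'WhisperSTT', 'OpenAITranscribeSTT' ]
```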

Model enums

Each provider exports a typed const-object of valid model IDs alongside the provider class. Using these constants gives you tab completion, and a misspelled constant fails type-checking before the agent runs. The constructor still accepts raw model-ID strings for forward compatibility, for example a model released after your SDK version:
import { DeepgramSTT, DeepgramModel } from "getpatter";

const stt = new DeepgramSTT({ model: DeepgramModel.NOVA_3 });
The same pattern applies to AssemblyAIModel, CartesiaSTTModel, and SonioxModel.
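The sketch below shows one common way to build this pattern in TypeScript: a const-object, a union type derived from it, and a string & {} escape hatch that accepts arbitrary strings without breaking autocompletion. It is an assumption about how such const-objects can be built. ExampleModel and resolveModel are hypothetical names, and the real DeepgramModel may list different IDs.

```typescript
// Hypothetical const-object of known model IDs (not the SDK's DeepgramModel).
const ExampleModel = {
  NOVA_3: "nova-3",
  NOVA_2: "nova-2",
} as const;

// Union of the known IDs: "nova-3" | "nova-2".
type ExampleModelId = (typeof ExampleModel)[keyof typeof ExampleModel];

// `string & {}` keeps editor completion for the known IDs while still
// accepting any other string, e.g. a model released after this SDK version.
type ModelOption = ExampleModelId | (string & {});

interface ExampleSttOptions {
  model?: ModelOption;
}

// Falls back to the default model when none is given.
function resolveModel(opts: ExampleSttOptions): string {
  return opts.model ?? ExampleModel.NOVA_3;
}

resolveModel({ model: ExampleModel.NOVA_3 }); // "nova-3"
resolveModel({ model: "some-future-model" }); // raw strings are still accepted
// resolveModel({ model: ExampleModel.NOVA3 }); // compile error: property does not exist
```

The type checker only catches typos in references to the constant. A misspelled raw string still compiles.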

Deepgram

Streaming STT backed by Deepgram’s nova-3 model.
import { DeepgramSTT } from "getpatter";

const stt = new DeepgramSTT();                                        // reads DEEPGRAM_API_KEY
const stt2 = new DeepgramSTT({ apiKey: "dg_...", endpointingMs: 80 }); // explicit key, shorter endpointing
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | (none) | API key; read from DEEPGRAM_API_KEY if omitted. |
| language | string | "en" | BCP-47 language code. |
| model | string | "nova-3" | Deepgram model ID. |
| encoding | string | "linear16" | Audio encoding sent to Deepgram. |
| sampleRate | number | 16000 | Sample rate in Hz. |
| endpointingMs | number | 150 | Utterance endpointing window, in milliseconds. |
| utteranceEndMs | number \| null | 1000 | Grace period after speech ends, in milliseconds. |
| smartFormat | boolean | false | Smart formatting (numbers, dates, punctuation). Off by default because telephony agents feed transcripts straight into an LLM, and smart-format rewrites can confuse downstream tool-call argument parsing. Pass smartFormat: true to opt back in. |
| interimResults | boolean | true | Stream interim transcripts. |
| vadEvents | boolean | true | Emit VAD start/end markers. |
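Deepgram's live-streaming endpoint (wss://api.deepgram.com/v1/listen) accepts these settings as query parameters such as endpointing, utterance_end_ms, smart_format, interim_results and vad_events. The sketch below assumes DeepgramSTT forwards its options to those parameters one-to-one, using the defaults from the table above. This page does not confirm that, and the SDK's actual request may differ. buildDeepgramListenUrl is a hypothetical helper. It also assumes that utteranceEndMs: null means the parameter is left out.

```typescript
interface DeepgramStreamOptions {
  language?: string;
  model?: string;
  encoding?: string;
  sampleRate?: number;
  endpointingMs?: number;
  utteranceEndMs?: number | null;
  smartFormat?: boolean;
  interimResults?: boolean;
  vadEvents?: boolean;
}

// Hypothetical helper: builds a Deepgram live-transcription URL using the table's defaults.
function buildDeepgramListenUrl(opts: DeepgramStreamOptions = {}): string {
  const params = new URLSearchParams({
    model: opts.model ?? "nova-3",
    language: opts.language ?? "en",
    encoding: opts.encoding ?? "linear16",
    sample_rate: String(opts.sampleRate ?? 16000),
    endpointing: String(opts.endpointingMs ?? 150),
    smart_format: String(opts.smartFormat ?? false),
    interim_results: String(opts.interimResults ?? true),
    vad_events: String(opts.vadEvents ?? true),
  });
  // Assumption: null disables utterance-end detection, so the parameter is omitted.
  const utteranceEndMs = opts.utteranceEndMs === undefined ? 1000 : opts.utteranceEndMs;
  if (utteranceEndMs !== null) params.set("utterance_end_ms", String(utteranceEndMs));
  return `wss://api.deepgram.com/v1/listen?${params.toString()}`;
}

console.log(buildDeepgramListenUrl({ endpointingMs: 80 }));
```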

Whisper (OpenAI)

HTTP-based STT via OpenAI Whisper. Reuses OPENAI_API_KEY.
import { WhisperSTT } from "getpatter";

const stt = new WhisperSTT();                                      // reads OPENAI_API_KEY
const stt2 = new WhisperSTT({ apiKey: "sk-...", language: "es" }); // explicit key, Spanish
On 8 kHz μ-law (mulaw) telephony audio, Whisper routinely hallucinates short fillers such as "you", "." and "thank you". For production, prefer OpenAITranscribeSTT (gpt-4o-transcribe). It uses the same OPENAI_API_KEY, is roughly 10× faster, and does not produce this baseline of hallucinated fillers.
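Telephony carriers usually deliver caller audio as 8 kHz G.711 μ-law (Twilio Media Streams, for example, send audio/x-mulaw at 8000 Hz). DeepgramSTT, by contrast, defaults to linear16 at 16000 Hz. The sketch below decodes μ-law bytes to 16-bit PCM with the standard G.711 expansion, then doubles the sample rate with linear interpolation. It only illustrates the conversion. This page does not say where the SDK performs it, and production resamplers normally apply a low-pass filter instead of plain interpolation.

```typescript
// Standard G.711 μ-law expansion of one byte to a 16-bit linear sample.
function decodeMulawSample(byte: number): number {
  const u = ~byte & 0xff;             // μ-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84; // remove the 132 bias
  return sign ? -magnitude : magnitude;
}

// Decodes a buffer of μ-law bytes to 16-bit PCM at the same sample rate.
function mulawToPcm16(bytes: Uint8Array): Int16Array {
  const out = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) out[i] = decodeMulawSample(bytes[i]);
  return out;
}

// Doubles the sample rate (8 kHz -> 16 kHz) by inserting the midpoint between neighbours.
function upsample2x(pcm: Int16Array): Int16Array {
  const out = new Int16Array(pcm.length * 2);
  for (let i = 0; i < pcm.length; i++) {
    const cur = pcm[i];
    const next = i + 1 < pcm.length ? pcm[i + 1] : cur; // repeat the last sample at the end
    out[2 * i] = cur;
    out[2 * i + 1] = Math.round((cur + next) / 2);
  }
  return out;
}

const pcm16k = upsample2x(mulawToPcm16(new Uint8Array([0xff, 0x80, 0x00])));
console.log(pcm16k.length); // 6
```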

OpenAI Transcribe (gpt-4o-transcribe)

First-class STT for OpenAI’s gpt-4o-transcribe and gpt-4o-mini-transcribe models — drop-in replacement for WhisperSTT with stronger multilingual quality and significantly lower latency. Reuses OPENAI_API_KEY.
import { OpenAITranscribeSTT } from "getpatter";

const stt = new OpenAITranscribeSTT();                                  // reads OPENAI_API_KEY, defaults to gpt-4o-transcribe
const stt2 = new OpenAITranscribeSTT({ model: "gpt-4o-mini-transcribe" }); // cheaper variant
const stt3 = new OpenAITranscribeSTT({ apiKey: "sk-...", language: "es" });
| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | (none) | API key; read from OPENAI_API_KEY if omitted. |
| language | string | (none) | BCP-47 language code. The language is auto-detected when omitted. |
| model | string | "gpt-4o-transcribe" | Either "gpt-4o-transcribe" or "gpt-4o-mini-transcribe". |
| responseFormat | string | "json" | Pass "verbose_json" to expose segment-level confidence and timestamps. |
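A file-based transcription request works on complete audio files rather than a live stream. Buffered caller audio therefore has to be packaged, for example as WAV, and posted to OpenAI's POST /v1/audio/transcriptions endpoint with the model, optional language, and response_format fields. The sketch below builds that request body. It assumes OpenAITranscribeSTT sends something equivalent, which this page does not confirm. pcm16ToWav and buildTranscriptionForm are hypothetical helpers, and the block needs Node 18+ for the global FormData and Blob.

```typescript
// Wraps mono 16-bit PCM samples in a minimal 44-byte RIFF/WAVE header.
function pcm16ToWav(samples: Int16Array, sampleRate: number): ArrayBuffer {
  const dataBytes = samples.length * 2;
  const buf = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buf);
  const writeAscii = (offset: number, text: string) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);  // size of everything after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2 bytes
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  samples.forEach((s, i) => view.setInt16(44 + i * 2, s, true));
  return buf;
}

interface TranscribeRequest {
  model?: "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
  language?: string;
  responseFormat?: string;
}

// Hypothetical helper: multipart body for POST https://api.openai.com/v1/audio/transcriptions.
function buildTranscriptionForm(wav: ArrayBuffer, req: TranscribeRequest = {}): FormData {
  const form = new FormData();
  form.append("file", new Blob([wav], { type: "audio/wav" }), "turn.wav");
  form.append("model", req.model ?? "gpt-4o-transcribe");
  form.append("response_format", req.responseFormat ?? "json");
  // Leaving language out lets the model auto-detect it.
  if (req.language) form.append("language", req.language);
  return form;
}

const form = buildTranscriptionForm(pcm16ToWav(new Int16Array(1600), 16000), { language: "es" });
```

To send it, pass the form as the body of a fetch POST with an Authorization: Bearer header that carries OPENAI_API_KEY.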

Cartesia

Streaming STT using Cartesia’s ink-whisper. See Cartesia setup.
import { CartesiaSTT } from "getpatter";

const stt = new CartesiaSTT();                                        // reads CARTESIA_API_KEY
const stt2 = new CartesiaSTT({ apiKey: "csk_...", language: "en" }); // explicit key

AssemblyAI

Universal Streaming STT via the AssemblyAI v3 WebSocket API. See AssemblyAI setup.
import { AssemblyAISTT } from "getpatter";

const stt = new AssemblyAISTT();                                  // reads ASSEMBLYAI_API_KEY

Soniox

Real-time STT via Soniox.
import { SonioxSTT } from "getpatter";

const stt = new SonioxSTT();                                      // reads SONIOX_API_KEY

Missing credentials

Each class throws at construction time if no API key is resolved:
Error: Deepgram STT requires an apiKey. Pass { apiKey: 'dg_...' } or
set DEEPGRAM_API_KEY in the environment.
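The constructors above resolve the key in this order: an explicit apiKey first, then the provider's environment variable, and a construction-time error if neither is present. The sketch below expresses that order as a hypothetical resolveApiKey helper, not a getpatter function. Its message copies the Deepgram example, and other providers' wording may differ.

```typescript
// Hypothetical helper: explicit option first, then the environment; blank values count as missing.
function resolveApiKey(
  provider: string,
  envVar: string,
  keyHint: string,
  explicit: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const key = explicit ?? env[envVar];
  if (!key) {
    throw new Error(
      `${provider} requires an apiKey. Pass { apiKey: '${keyHint}' } or set ${envVar} in the environment.`,
    );
  }
  return key;
}

try {
  resolveApiKey("Deepgram STT", "DEEPGRAM_API_KEY", "dg_...", undefined, {});
} catch (err) {
  console.error((err as Error).message);
}
```

Failing in the constructor surfaces a missing key at startup instead of on the first call.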

What’s Next

LLM

Configure the language model.

TTS

Configure speech synthesis.