
# Models

The brain behind every agent. Pick a model per agent, per session, or per turn, and bring your own key (BYOK) to skip model-side metering entirely.

## Frontier closed-source

| Model | Provider | Context | Strengths | BYOK |
| --- | --- | --- | --- | --- |
| gpt-5 | openai | 400k | Flagship reasoning, tool use, vision | |
| gpt-5-mini | openai | 400k | 5x cheaper, 90% of the quality, our voice-agent default | |
| gpt-5-nano | openai | 200k | Fastest streaming, low-stakes turns | |
| gpt-4o | openai | 128k | Multimodal, audio in/out | |
| gpt-4o-mini | openai | 128k | Cheap structured-output workhorse | |
| claude-opus-4-7 | anthropic | 1M | Long-context comprehension, careful reasoning | |
| claude-sonnet-4-6 | anthropic | 200k | Best agentic tool use available | |
| claude-haiku-4-5 | anthropic | 200k | Cheapest Claude, ~250 tok/s | |

## Frontier open-weights (via Groq LPU)

| Model | Provider | Context | Tok/s | BYOK |
| --- | --- | --- | --- | --- |
| llama-4-scout-17b | groq | 128k | ~750 | |
| llama-4-maverick | groq | 1M | ~400 | |
| llama-3.3-70b | groq | 128k | ~280 | |
| mixtral-8x22b | groq | 64k | ~500 | |
| qwen-2.5-72b | groq | 128k | ~450 | |
| deepseek-r1-distill-70b | groq | 128k | ~280 | |

Groq's LPU serves open-weights models at speeds closed APIs can't match; for streaming voice agents, that's the latency win.

## Multilingual / regional

| Model | Provider | Context | Notes |
| --- | --- | --- | --- |
| glm-4-plus | zhipu | 128k | Chinese + English. Cheaper than GPT-4o on identical tasks. |
| glm-4-air | zhipu | 128k | Faster GLM tier. |

## Specialty

| Model | Provider | Use case |
| --- | --- | --- |
| obliteratus-r1 | obliteratus | Uncensored / roleplay (apply: sales@ph0ny.com) |

## Picking a model

The Builder agent (and the meta-builder behind ph0ny.com) routes turns by stake:

```text
fast read     →  gpt-5-nano | claude-haiku-4-5 | llama-4-scout-17b
chat reply    →  gpt-5-mini | claude-sonnet-4-6 | llama-3.3-70b
tool dispatch →  gpt-5      | claude-sonnet-4-6
audit / eval  →  claude-opus-4-7 (1M context)
```
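The tiers above can be sketched as a small client-side router. This is illustrative only, not the SDK's internals: `pickModel`, the `Stake` type, and the tier table are all hypothetical names.

```ts
// Hypothetical stake-based router mirroring the tiers above.
type Stake = 'fast-read' | 'chat-reply' | 'tool-dispatch' | 'audit';

const TIERS: Record<Stake, string[]> = {
  'fast-read': ['gpt-5-nano', 'claude-haiku-4-5', 'llama-4-scout-17b'],
  'chat-reply': ['gpt-5-mini', 'claude-sonnet-4-6', 'llama-3.3-70b'],
  'tool-dispatch': ['gpt-5', 'claude-sonnet-4-6'],
  'audit': ['claude-opus-4-7'],
};

// Return the first model in the tier, skipping any the caller has disallowed
// (e.g. because a provider is currently rate-limiting).
function pickModel(stake: Stake, disallowed: Set<string> = new Set()): string {
  const candidates = TIERS[stake].filter((m) => !disallowed.has(m));
  if (candidates.length === 0) {
    throw new Error(`no model available for stake "${stake}"`);
  }
  return candidates[0];
}
```

Keeping the routing table in one place makes it easy to pass the result straight into the `llmModel` override shown below.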

Set the model once on the agent definition, then override it per call when needed:

```ts
const agent = await ph0ny.agents.create({
  name: 'OrderAI',
  llmModel: 'gpt-5-mini',
  // …
})

// Per-turn override:
await ph0ny.sessions.send(sessionId, {
  message: 'Confirm the order.',
  llmModel: 'gpt-5',
})
```

## Provider rotation

For higher availability or cost arbitrage, hand the SDK a list of models. We round-robin across providers and skip any that return a 5xx or hit a rate limit:

```ts
await ph0ny.agents.update(agentId, {
  llmModelRotation: [
    'claude-haiku-4-5',
    'gpt-5-mini',
    'llama-3.3-70b',
    'gpt-4o-mini',
  ],
})
```

The rotation also drives the anonymous chat on builder.ph0ny.com: a turn that 5xx's on one provider is retried on the next without the user noticing.
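A minimal sketch of that skip-and-retry behaviour, assuming nothing about the SDK internals; `callWithRotation` and `CallFn` are hypothetical names, not part of the ph0ny API.

```ts
// A provider call that resolves to the model's reply, or throws on a
// 5xx / rate-limit response.
type CallFn = (model: string) => Promise<string>;

// Try each model in the rotation in order; the first one that answers wins.
async function callWithRotation(rotation: string[], call: CallFn): Promise<string> {
  let lastError: unknown;
  for (const model of rotation) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // 5xx or rate limit: fall through to the next model
    }
  }
  // Every provider failed (or the rotation was empty): surface the last error.
  throw lastError ?? new Error('empty rotation list');
}
```

The real service round-robins the starting index across turns as well; this sketch only shows the failover half of the behaviour.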

## Where to next

Built by ph0ny.