# Models
The brain behind every agent. Pick a model per agent, per session, or per turn — and BYOK to skip metering on the model side entirely.
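BYOK keys are supplied on the agent itself. A minimal sketch: the `llmApiKey` field name below is an assumption for illustration, not a documented parameter; check the Agents API for the real shape.

```ts
// Hypothetical BYOK wiring. `llmApiKey` is an ASSUMED field name,
// not confirmed by this page — see the Agents API for the real parameter.
const agent = await ph0ny.agents.create({
  name: 'OrderAI',
  llmModel: 'claude-sonnet-4-6',
  // Your own provider key: requests bill to your account, so the
  // model side is unmetered here.
  llmApiKey: process.env.ANTHROPIC_API_KEY,
})
```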
## Frontier closed-source
| Model | Provider | Context | Strengths | BYOK |
|---|---|---|---|---|
| gpt-5 | openai | 400k | Flagship reasoning, tool use, vision | ✓ |
| gpt-5-mini | openai | 400k | 5x cheaper, 90% of the quality, our voice-agent default | ✓ |
| gpt-5-nano | openai | 200k | Fastest streaming, low-stakes turns | ✓ |
| gpt-4o | openai | 128k | Multimodal, audio in/out | ✓ |
| gpt-4o-mini | openai | 128k | Cheap structured-output workhorse | ✓ |
| claude-opus-4-7 | anthropic | 1M | Long-context comprehension, careful reasoning | ✓ |
| claude-sonnet-4-6 | anthropic | 200k | Best agentic tool-use available | ✓ |
| claude-haiku-4-5 | anthropic | 200k | Cheapest Claude, ~250 tok/s | ✓ |
## Frontier open-weights (via Groq LPU)
| Model | Provider | Context | Tok/s | BYOK |
|---|---|---|---|---|
| llama-4-scout-17b | groq | 128k | ~750 | ✓ |
| llama-4-maverick | groq | 1M | ~400 | ✓ |
| llama-3.3-70b | groq | 128k | ~280 | ✓ |
| mixtral-8x22b | groq | 64k | ~500 | ✓ |
| qwen-2.5-72b | groq | 128k | ~450 | ✓ |
| deepseek-r1-distill-70b | groq | 128k | ~280 | ✓ |
Groq's LPU hardware runs open-weights models at speeds closed APIs can't match; for streaming voice agents, that throughput is the latency win.
## Multilingual / regional
| Model | Provider | Context | Notes |
|---|---|---|---|
| glm-4-plus | zhipu | 128k | Chinese + English. Cheaper than GPT-4o on identical tasks. |
| glm-4-air | zhipu | 128k | Faster GLM tier. |
## Specialty
| Model | Provider | Use case |
|---|---|---|
| obliteratus-r1 | obliteratus | Uncensored / roleplay (apply: sales@ph0ny.com) |
## Picking a model
The Builder agent (and the meta-builder behind ph0ny.com) routes turns by stake:
```text
fast read     → gpt-5-nano | claude-haiku-4-5 | llama-4-scout
chat reply    → gpt-5-mini | claude-sonnet-4-6 | llama-3.3-70b
tool dispatch → gpt-5 | claude-sonnet-4-6
audit / eval  → claude-opus-4-7 (1M context)
```

Set the model on the agent definition once, then override per call when needed:
```ts
const agent = await ph0ny.agents.create({
  name: 'OrderAI',
  llmModel: 'gpt-5-mini',
  // …
})

// Per-turn override:
await ph0ny.sessions.send(sessionId, {
  message: 'Confirm the order.',
  llmModel: 'gpt-5',
})
```

## Provider rotation
For higher availability or cost arbitrage, hand the SDK a list — we round-robin across providers and skip the ones that 5xx or rate-limit:
```ts
await ph0ny.agents.update(agentId, {
  llmModelRotation: [
    'claude-haiku-4-5',
    'gpt-5-mini',
    'llama-3.3-70b',
    'gpt-4o-mini',
  ],
})
```

The rotation also drives the anonymous chat on builder.ph0ny.com: a turn that 5xx's on one provider gets retried on the next without the user noticing.
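The failover behavior amounts to a round-robin loop that skips unhealthy providers. A minimal sketch in plain TypeScript, assuming a `call` function that stands in for the actual provider request (the names here are illustrative, not the SDK's internals):

```typescript
// A provider call returns an HTTP-style status plus the reply text.
type ModelCall = (model: string) => Promise<{ status: number; text: string }>

// Try each model in order; skip any that 5xx, rate-limit (429),
// or throw (network failure), and return the first healthy reply.
async function sendWithRotation(
  rotation: string[],
  call: ModelCall,
): Promise<{ model: string; text: string }> {
  let lastError: unknown
  for (const model of rotation) {
    try {
      const res = await call(model)
      if (res.status >= 500 || res.status === 429) continue
      return { model, text: res.text }
    } catch (err) {
      lastError = err
    }
  }
  throw new Error(`all models in rotation failed: ${String(lastError)}`)
}
```

With the rotation list above, a 503 from the first provider would transparently fall through to `gpt-5-mini`, which is the "without the user noticing" behavior described.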
## Where to next
- Voices → pair the brain with a voice.
- Providers → full vendor matrix with logos and capabilities.
- Agents API → wire models into agent definitions.