# Models
The brain behind every agent. Pick a model per agent, per session, or per turn — and BYOK to skip metering on the model side entirely.
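BYOK keys are supplied on the agent itself. A minimal sketch: the `llmApiKey` field name below is an assumption for illustration, not a documented parameter; check the Agents API for the real shape.

```ts
// Hypothetical BYOK wiring. `llmApiKey` is an ASSUMED field name,
// not confirmed by this page — see the Agents API for the real parameter.
const agent = await ph0ny.agents.create({
  name: 'OrderAI',
  llmModel: 'claude-sonnet-4-6',
  // Your own provider key: requests bill to your account, so the
  // model side is unmetered here.
  llmApiKey: process.env.ANTHROPIC_API_KEY,
})
```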
## Frontier closed-source
| Model | Provider | Context | Strengths | BYOK |
|---|---|---|---|---|
| gpt-5 | openai | 400k | Flagship reasoning, tool use, vision | ✓ |
| gpt-5-mini | openai | 400k | 5x cheaper, 90% of the quality, our voice-agent default | ✓ |
| gpt-5-nano | openai | 200k | Fastest streaming, low-stakes turns | ✓ |
| gpt-4o | openai | 128k | Multimodal, audio in/out | ✓ |
| gpt-4o-mini | openai | 128k | Cheap structured-output workhorse | ✓ |
| claude-opus-4-7 | anthropic | 1M | Long-context comprehension, careful reasoning | ✓ |
| claude-sonnet-4-6 | anthropic | 200k | Best agentic tool-use available | ✓ |
| claude-haiku-4-5 | anthropic | 200k | Cheapest Claude, ~250 tok/s | ✓ |
## Frontier open-weights (via Groq LPU)
| Model | Provider | Context | Tok/s | BYOK |
|---|---|---|---|---|
| llama-4-scout-17b | groq | 128k | ~750 | ✓ |
| llama-4-maverick | groq | 1M | ~400 | ✓ |
| llama-3.3-70b | groq | 128k | ~280 | ✓ |
| mixtral-8x22b | groq | 64k | ~500 | ✓ |
| qwen-2.5-72b | groq | 128k | ~450 | ✓ |
| deepseek-r1-distill-70b | groq | 128k | ~280 | ✓ |
Groq's LPU hardware runs open-weights models at speeds closed APIs can't match; for streaming voice agents, that throughput is the latency win.
## Multilingual / regional
| Model | Provider | Context | Notes |
|---|---|---|---|
| glm-4-plus | zhipu | 128k | Chinese + English. Cheaper than GPT-4o on identical tasks. |
| glm-4-air | zhipu | 128k | Faster GLM tier. |
## Specialty
| Model | Provider | Use case |
|---|---|---|
| obliteratus-r1 | obliteratus | Uncensored / roleplay (apply: sales@ph0ny.com) |
## Picking a model
The Builder agent (and the meta-builder behind ph0ny.com) routes turns by stake:
```text
fast read     → gpt-5-nano | claude-haiku-4-5 | llama-4-scout
chat reply    → gpt-5-mini | claude-sonnet-4-6 | llama-3.3-70b
tool dispatch → gpt-5 | claude-sonnet-4-6
audit / eval  → claude-opus-4-7 (1M context)
```

Set the model on the agent definition once, then override per call when needed:
```ts
const agent = await ph0ny.agents.create({
  name: 'OrderAI',
  llmModel: 'gpt-5-mini',
  // …
})

// Per-turn override:
await ph0ny.sessions.send(sessionId, {
  message: 'Confirm the order.',
  llmModel: 'gpt-5',
})
```

## Provider rotation
For higher availability or cost arbitrage, hand the SDK a list — we round-robin across providers and skip the ones that 5xx or rate-limit:
```ts
await ph0ny.agents.update(agentId, {
  llmModelRotation: [
    'claude-haiku-4-5',
    'gpt-5-mini',
    'llama-3.3-70b',
    'gpt-4o-mini',
  ],
})
```

The rotation also drives the anonymous chat on builder.ph0ny.com: a turn that 5xx's on one provider gets retried on the next without the user noticing.
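The failover behavior amounts to a round-robin loop that skips unhealthy providers. A minimal sketch in plain TypeScript, assuming a `call` function that stands in for the actual provider request (the names here are illustrative, not the SDK's internals):

```typescript
// A provider call returns an HTTP-style status plus the reply text.
type ModelCall = (model: string) => Promise<{ status: number; text: string }>

// Try each model in order; skip any that 5xx, rate-limit (429),
// or throw (network failure), and return the first healthy reply.
async function sendWithRotation(
  rotation: string[],
  call: ModelCall,
): Promise<{ model: string; text: string }> {
  let lastError: unknown
  for (const model of rotation) {
    try {
      const res = await call(model)
      if (res.status >= 500 || res.status === 429) continue
      return { model, text: res.text }
    } catch (err) {
      lastError = err
    }
  }
  throw new Error(`all models in rotation failed: ${String(lastError)}`)
}
```

With the rotation list above, a 503 from the first provider would transparently fall through to `gpt-5-mini`, which is the "without the user noticing" behavior described.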
## Where to next
- Voices → pair the brain with a voice.
- Providers → full vendor matrix with logos and capabilities.
- Agents API → wire models into agent definitions.