Voices
A voice in ph0ny is a voiceId plus a provider. The same caller can sound like a different person in every call by swapping voice IDs at request time. We don't lock voices to providers — clone a voice once and use it across Cartesia, ElevenLabs, Fish Audio, or any compatible engine.
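To make the voiceId-plus-provider model concrete, here is a minimal sketch of choosing the voice per request. `makeVoice` and `ttsRequest` are hypothetical helpers written for illustration, not part of the ph0ny SDK; the payload fields mirror the `ph0ny.tts.synthesize` calls shown elsewhere on this page.

```javascript
// Hypothetical helpers for illustration; not part of the ph0ny SDK.

// A voice is an id paired with the provider that serves it.
function makeVoice(voiceId, provider) {
  if (!voiceId || !provider) {
    throw new Error('A voice needs both a voiceId and a provider');
  }
  return Object.freeze({ voiceId, provider });
}

// Build the payload for one TTS request. The voice is chosen per request,
// so two calls with the same text can use different voices.
function ttsRequest(text, voice) {
  return { text, voiceId: voice.voiceId, provider: voice.provider };
}

const warm = makeVoice('sonic-english-male', 'cartesia');
const friendly = makeVoice('rachel', 'elevenlabs');

const first = ttsRequest('Thanks for calling.', warm);
const second = ttsRequest('Thanks for calling.', friendly);
```

Each payload could then be passed to `ph0ny.tts.synthesize`, assuming it accepts these fields as in the examples on this page.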
Library voices
Curated, ready-to-use voices that ship with the platform. Pass voiceId directly to any TTS or video-avatar request.
| Voice ID | Provider | Style | Language |
|---|---|---|---|
| sonic-english-male | cartesia | Conversational, warm | en-US |
| sonic-english-female | cartesia | Conversational, neutral | en-US |
| sonic-broadcast-female | cartesia | News anchor | en-US |
| rachel | elevenlabs | Friendly, professional | en-US |
| bella | elevenlabs | Warm, approachable | en-US |
| josh | elevenlabs | Authoritative, deep | en-US |
| aura-asteria-en | deepgram | Confident, modern | en-US |
| aura-luna-en | deepgram | Polite, soft | en-US |
| fish-genesis | fish-audio | Multilingual neutral | multi |
| qwen-default-female | qwen-tts | Bilingual zh + en | zh, en |
The library is not exhaustive: every provider exposes its own catalog. Call GET /v1/voices?provider=<id> to list every voice the provider exposes for your account, including BYOK-only ones.
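As a rough sketch, the per-provider listing URL can be built like this. The base URL is taken from the curl examples on this page; `listVoicesUrl` is a hypothetical helper, and authentication is deliberately left out because this page does not describe it.

```javascript
// Sketch only: listVoicesUrl is a hypothetical helper, not an SDK function.
const API_BASE = 'https://api.ph0ny.com';

function listVoicesUrl(provider) {
  const url = new URL('/v1/voices', API_BASE);
  // searchParams takes care of percent-encoding the provider id.
  url.searchParams.set('provider', provider);
  return url.toString();
}

const cartesiaUrl = listVoicesUrl('cartesia');
const fishUrl = listVoicesUrl('fish-audio');
```

The resulting string can be passed to `fetch` or curl along with whatever credentials your account uses.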
Voice cloning
Clone any voice from an audio sample between 30 seconds and 10 minutes long. Cloned voices are scoped to your account and usable across compatible providers.
const cloned = await ph0ny.voices.clone({
  name: 'Drew (Founder)',
  audioUrl: 'https://uploads.example.com/drew-30s.mp3',
  consent: { speakerName: 'Drew Stone', confirmed: true },
})

// Use the same id on TTS, avatar, and dub.
await ph0ny.tts.synthesize({
  text: 'Welcome to the launch.',
  voiceId: cloned.voiceId,
})

await ph0ny.video.avatar({
  image_url: 'https://uploads.example.com/drew.jpg',
  text: 'Welcome to the launch.',
  voice_id: cloned.voiceId,
})

Consent matters. ph0ny refuses cloning requests that don't carry a confirmed speaker identity. We log the consent block on every clone and on every request that uses a cloned voice. See Pricing for retention details.
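Because requests without confirmed consent are refused, it can help to check a clone request locally before sending it. The sketch below is one possible pre-flight check, not ph0ny's actual validation: `checkCloneRequest` is a hypothetical helper, it assumes you already know the sample duration in seconds, and it treats both `consent.speakerName` and `consent.confirmed` as required.

```javascript
// Hypothetical pre-flight check; the authoritative validation is server-side.
const MIN_SAMPLE_SECONDS = 30;
const MAX_SAMPLE_SECONDS = 10 * 60;

// Returns a list of problems; an empty list means the request looks sendable.
function checkCloneRequest(request, sampleSeconds) {
  const problems = [];
  if (
    !Number.isFinite(sampleSeconds) ||
    sampleSeconds < MIN_SAMPLE_SECONDS ||
    sampleSeconds > MAX_SAMPLE_SECONDS
  ) {
    problems.push('sample must be between 30 seconds and 10 minutes long');
  }
  const consent = request.consent;
  if (!consent || !consent.speakerName) {
    problems.push('consent.speakerName is required');
  }
  if (!consent || consent.confirmed !== true) {
    problems.push('consent.confirmed must be true');
  }
  return problems;
}

const ok = checkCloneRequest(
  {
    name: 'Drew (Founder)',
    audioUrl: 'https://uploads.example.com/drew-30s.mp3',
    consent: { speakerName: 'Drew Stone', confirmed: true },
  },
  30,
);

const rejected = checkCloneRequest(
  {
    name: 'Unknown speaker',
    audioUrl: 'https://uploads.example.com/clip.mp3',
    consent: { confirmed: false },
  },
  12,
);
```

Returning a list of problems rather than throwing on the first one lets a UI show every issue at once.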
Provider compatibility
| Engine | Cloned voice usable? | Notes |
|---|---|---|
| Cartesia | ✓ | Re-encodes for Cartesia's voice space; ~5s prep on first use. |
| ElevenLabs | ✓ | Native instant clone if BYOK; falls back to phonetic adaptation otherwise. |
| Fish Audio | ✓ | Best zero-shot quality on non-English clones. |
| Resemble AI | ✓ | Enterprise-tier; needs explicit consent workflow. |
| F5 TTS | ✓ | Self-hosted; you keep the model. |
| CosyVoice | ✓ | Open-source; multilingual zero-shot. |
| Kokoro / Pocket / Inworld | ✗ | Use library voices on these. |
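The table above can be encoded as a small lookup for routing decisions. This is an illustrative sketch: the engine keys (`'resemble-ai'`, `'f5-tts'`, and so on) are identifiers chosen for this example and may not match the ids the API actually uses.

```javascript
// Transcribed from the compatibility table; engine keys are hypothetical ids.
const CLONE_SUPPORT = new Map([
  ['cartesia', true],
  ['elevenlabs', true],
  ['fish-audio', true],
  ['resemble-ai', true],
  ['f5-tts', true],
  ['cosyvoice', true],
  ['kokoro', false],
  ['pocket', false],
  ['inworld', false],
]);

function supportsClonedVoices(engine) {
  // Engines missing from the table are treated as unsupported.
  return CLONE_SUPPORT.get(engine) === true;
}

// Pick the first engine, in preference order, that accepts cloned voices.
// Returns null so the caller can fall back to a library voice.
function firstCloneCapable(preferred) {
  return preferred.find((engine) => supportsClonedVoices(engine)) ?? null;
}

const choice = firstCloneCapable(['kokoro', 'fish-audio', 'cartesia']);
const none = firstCloneCapable(['kokoro', 'inworld']);
```

Treating unknown engines as unsupported is the conservative default; a cloned voice is never sent to an engine the table does not list.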
Voice search & filter
# All English-language conversational voices, BYOK-eligible only
curl "https://api.ph0ny.com/v1/voices?lang=en&style=conversational&byok=true"
# Voices cloneable on both Cartesia and ElevenLabs
curl "https://api.ph0ny.com/v1/voices?cloneable=cartesia,elevenlabs"

Each voice carries a stable voiceId, the provider it came from, language tags, supported emotions, and a 5-second sample URL.
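Once a voice list has been fetched, the same metadata can be filtered client-side. The sketch below assumes a record shape: only `voiceId` and `provider` are names used on this page, while `languages`, `emotions`, and `sampleUrl` are hypothetical field names, and the emotion and sample values are placeholders.

```javascript
// Assumed record shape with placeholder emotions and sample URLs.
const voices = [
  { voiceId: 'rachel', provider: 'elevenlabs', languages: ['en-US'],
    emotions: ['neutral'], sampleUrl: 'https://example.com/rachel.mp3' },
  { voiceId: 'sonic-english-male', provider: 'cartesia', languages: ['en-US'],
    emotions: ['neutral'], sampleUrl: 'https://example.com/sonic.mp3' },
  { voiceId: 'qwen-default-female', provider: 'qwen-tts', languages: ['zh', 'en'],
    emotions: ['neutral'], sampleUrl: 'https://example.com/qwen.mp3' },
];

// A language filter of 'en' matches both 'en' and regional tags like 'en-US'.
function matchesLanguage(voice, language) {
  return voice.languages.some(
    (tag) => tag === language || tag.startsWith(language + '-'),
  );
}

// Filter by provider and/or language; omitted criteria match everything.
function filterVoices(list, { provider, language } = {}) {
  return list.filter(
    (voice) =>
      (provider === undefined || voice.provider === provider) &&
      (language === undefined || matchesLanguage(voice, language)),
  );
}

const english = filterVoices(voices, { language: 'en' }).map((v) => v.voiceId);
const chinese = filterVoices(voices, { language: 'zh' }).map((v) => v.voiceId);
const cartesiaOnly = filterVoices(voices, { provider: 'cartesia' }).map((v) => v.voiceId);
```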
Multilingual
Most ElevenLabs and Fish Audio voices speak 32+ languages from a single voice ID. Pass the target language and ph0ny re-routes to the right model variant:
await ph0ny.tts.synthesize({
  text: '¡Hola! Bienvenido a tu reserva.',
  voiceId: 'rachel',
  provider: 'elevenlabs',
  language: 'es-ES',
})

For dubbing existing audio while preserving the original speaker, use the Video API dub endpoint, which lip-syncs the video to the new audio in one pass.
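The `language` value in the synthesize call above is a BCP 47 tag such as `es-ES`. One way to guard against malformed tags is to canonicalize them before building the request. `normalizeLanguage` and `localizedRequest` are hypothetical helpers, and whether ph0ny requires canonical casing is an assumption; `Intl.getCanonicalLocales` is a standard JavaScript API.

```javascript
// Hypothetical helpers; not part of the ph0ny SDK.

// Canonicalize a BCP 47 language tag, e.g. 'es-es' becomes 'es-ES'.
// Intl.getCanonicalLocales throws a RangeError for structurally invalid tags.
function normalizeLanguage(tag) {
  const [canonical] = Intl.getCanonicalLocales(tag);
  return canonical;
}

// Copy a base request and attach the normalized target language.
function localizedRequest(base, language) {
  return { ...base, language: normalizeLanguage(language) };
}

const request = localizedRequest(
  {
    text: '¡Hola! Bienvenido a tu reserva.',
    voiceId: 'rachel',
    provider: 'elevenlabs',
  },
  'es-es',
);
```

Failing early on a tag like `es_ES` (underscore instead of hyphen) keeps a typo from reaching the API.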
Where to next
- Voice cloning guide: full walkthrough with consent flow.
- TTS / STT API: request/response shapes.
- Models: pair voice with the right LLM brain.