Voices

A voice in ph0ny is a voiceId plus a provider. The same caller can sound like a different person on every call by swapping voice IDs at request time. We don't lock voices to providers: clone a voice once and use it across Cartesia, ElevenLabs, Fish Audio, or any other compatible engine.
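A minimal sketch of swapping voices per call. It assumes you keep a pool of voiceId/provider pairs on your side. `VoiceRef` and `voiceForCall` are hypothetical client-side helpers, not part of the ph0ny SDK. Each selected pair would then go into the `voiceId` (and, optionally, `provider`) fields of the actual request.

```typescript
// Hypothetical local shape for "a voiceId plus a provider"; not an SDK type.
interface VoiceRef {
  voiceId: string;
  provider: string;
}

// Returns the voice for the n-th call, cycling through the pool so that
// consecutive calls use different voices.
function voiceForCall(pool: readonly VoiceRef[], callIndex: number): VoiceRef {
  if (pool.length === 0) {
    throw new Error("voice pool is empty");
  }
  // Double modulo keeps the index in range even for negative callIndex values.
  const i = ((callIndex % pool.length) + pool.length) % pool.length;
  return pool[i];
}

// Library voices taken from the table on this page.
const pool: VoiceRef[] = [
  { voiceId: "sonic-english-male", provider: "cartesia" },
  { voiceId: "rachel", provider: "elevenlabs" },
  { voiceId: "aura-luna-en", provider: "deepgram" },
];

for (let call = 0; call < 4; call++) {
  const voice = voiceForCall(pool, call);
  console.log(`call ${call}: ${voice.voiceId} via ${voice.provider}`);
}
```

Round-robin is only one policy. A random pick or a per-customer mapping would slot into the same function signature.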

Library voices

Curated, ready-to-use voices that ship with the platform. Pass voiceId directly to any TTS or video-avatar request.

Voice ID	Provider	Style	Language
`sonic-english-male`	cartesia	Conversational, warm	en-US
`sonic-english-female`	cartesia	Conversational, neutral	en-US
`sonic-broadcast-female`	cartesia	News anchor	en-US
`rachel`	elevenlabs	Friendly, professional	en-US
`bella`	elevenlabs	Warm, approachable	en-US
`josh`	elevenlabs	Authoritative, deep	en-US
`aura-asteria-en`	deepgram	Confident, modern	en-US
`aura-luna-en`	deepgram	Polite, soft	en-US
`fish-genesis`	fish-audio	Multilingual neutral	multi
`qwen-default-female`	qwen-tts	Bilingual zh + en	zh, en
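Because a voice is a voiceId plus a provider, it can help to resolve the provider from a library voiceId locally. The sketch below is a hand-copied snapshot of the table above, not a live catalog; `LIBRARY_VOICES`, `providerFor`, and `libraryVoicesFor` are hypothetical names.

```typescript
type Provider = "cartesia" | "elevenlabs" | "deepgram" | "fish-audio" | "qwen-tts";

// Snapshot of the library table, keyed by voiceId. The real catalog may differ.
const LIBRARY_VOICES: Record<string, Provider> = {
  "sonic-english-male": "cartesia",
  "sonic-english-female": "cartesia",
  "sonic-broadcast-female": "cartesia",
  rachel: "elevenlabs",
  bella: "elevenlabs",
  josh: "elevenlabs",
  "aura-asteria-en": "deepgram",
  "aura-luna-en": "deepgram",
  "fish-genesis": "fish-audio",
  "qwen-default-female": "qwen-tts",
};

// Own-property check so inherited keys such as "toString" are not mistaken for voices.
function providerFor(voiceId: string): Provider | undefined {
  return Object.prototype.hasOwnProperty.call(LIBRARY_VOICES, voiceId)
    ? LIBRARY_VOICES[voiceId]
    : undefined;
}

// Lists the library voiceIds that belong to one provider, in table order.
function libraryVoicesFor(provider: Provider): string[] {
  return Object.keys(LIBRARY_VOICES).filter((id) => LIBRARY_VOICES[id] === provider);
}

console.log(providerFor("josh")); // elevenlabs
console.log(libraryVoicesFor("deepgram").join(", ")); // aura-asteria-en, aura-luna-en
```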

This library is not exhaustive; every provider exposes its own catalog. Call GET /v1/voices?provider=<id> to list every voice that provider exposes for your account, including BYOK-only ones.
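A small sketch of building the catalog URL. It takes the base URL from the curl examples on this page and omits authentication, as those examples do. `voicesUrl` is a hypothetical helper. Only `provider` is documented for per-provider listing; the other keys shown in the search examples are passed through unchanged.

```typescript
// Base URL as it appears in the curl examples on this page.
const API_BASE = "https://api.ph0ny.com";

// Builds a GET /v1/voices URL from filter parameters. URLSearchParams handles
// escaping; note it encodes "," as "%2C", unlike the literal curl examples.
function voicesUrl(params: Record<string, string | boolean>): string {
  const url = new URL("/v1/voices", API_BASE);
  for (const key of Object.keys(params)) {
    url.searchParams.set(key, String(params[key]));
  }
  return url.toString();
}

console.log(voicesUrl({ provider: "elevenlabs" }));
// https://api.ph0ny.com/v1/voices?provider=elevenlabs
```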

Voice cloning

Clone any voice from an audio sample between 30 seconds and 10 minutes long. Cloned voices are scoped to your account and usable across compatible providers.

```typescript
const cloned = await ph0ny.voices.clone({
  name: 'Drew',
  audioUrl: 'https://uploads.example.com/drew-30s.mp3',
  consent: { speakerName: 'Drew Stone', confirmed: true },
})

// Use the same id on TTS, avatar, and dub.
await ph0ny.tts.synthesize({
  text: 'Welcome to the launch.',
  voiceId: cloned.voiceId,
})

await ph0ny.video.avatar({
  image_url: 'https://uploads.example.com/drew.jpg',
  text: 'Welcome to the launch.',
  voice_id: cloned.voiceId,
})
```
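Checking the 30-second to 10-minute window locally saves a round trip before uploading a sample. This is a minimal sketch. It assumes you already know the clip's length in seconds (measuring it is out of scope here), and it treats both bounds as inclusive, which is an assumption. `checkSampleDuration` is a hypothetical helper; the server remains the source of truth.

```typescript
// Bounds from the docs: 30 seconds to 10 minutes (inclusive here by assumption).
const MIN_SAMPLE_SECONDS = 30;
const MAX_SAMPLE_SECONDS = 10 * 60;

// Returns null when the length looks acceptable, otherwise a human-readable reason.
function checkSampleDuration(seconds: number): string | null {
  if (!Number.isFinite(seconds)) return "duration is not a finite number";
  if (seconds < MIN_SAMPLE_SECONDS) {
    return `sample is ${seconds}s; need at least ${MIN_SAMPLE_SECONDS}s`;
  }
  if (seconds > MAX_SAMPLE_SECONDS) {
    return `sample is ${seconds}s; limit is ${MAX_SAMPLE_SECONDS}s`;
  }
  return null;
}

console.log(checkSampleDuration(12)); // sample is 12s; need at least 30s
```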

Consent matters. ph0ny refuses cloning requests that don't carry a confirmed speaker identity. We log the consent block on every clone and on every request that uses a cloned voice. See Pricing for retention details.
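Because ph0ny refuses clone requests without a confirmed speaker identity, a client can fail fast before sending one. The sketch below mirrors the `consent` block from the clone example. `CloneRequest`, `CloneConsent`, and `assertConsent` are hypothetical local types and helpers, not SDK exports. Treating an empty `speakerName` as missing identity is our assumption.

```typescript
// Local mirror of the consent block used in the clone example above.
interface CloneConsent {
  speakerName: string;
  confirmed: boolean;
}

interface CloneRequest {
  name: string;
  audioUrl: string;
  consent?: CloneConsent;
}

// Throws on requests that lack a named, explicitly confirmed speaker.
function assertConsent(req: CloneRequest): void {
  const consent = req.consent;
  if (!consent) {
    throw new Error(`clone "${req.name}": missing consent block`);
  }
  if (consent.speakerName.trim() === "") {
    throw new Error(`clone "${req.name}": speakerName is empty`);
  }
  // Require a literal true rather than any truthy value.
  if (consent.confirmed !== true) {
    throw new Error(`clone "${req.name}": consent not confirmed`);
  }
}

assertConsent({
  name: "Drew",
  audioUrl: "https://uploads.example.com/drew-30s.mp3",
  consent: { speakerName: "Drew Stone", confirmed: true },
}); // passes silently
```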

Provider compatibility

Engine	Cloned voice usable?	Notes
Cartesia	✓	Re-encodes for Cartesia's voice space; ~5s prep on first use.
ElevenLabs	✓	Native instant clone if BYOK; falls back to phonetic adaptation otherwise.
Fish Audio	✓	Best zero-shot quality on non-English clones.
Resemble AI	✓	Enterprise-tier; needs explicit consent workflow.
F5 TTS	✓	Self-hosted; you keep the model.
CosyVoice	✓	Open-source; multilingual zero-shot.
Kokoro / Pocket / Inworld	✗	Use library voices on these.
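The table reduces to a lookup: use the clone where the engine supports it, otherwise fall back to one of that engine's library voices. In this sketch, the engine keys other than `cartesia`, `elevenlabs`, and `fish-audio` (which appear in the library table) are hypothetical identifiers. `CLONE_SUPPORT` and `voiceFor` are illustrative names. Unknown engines fall back conservatively. Resemble AI's extra consent workflow is not modeled.

```typescript
// Cloned-voice support per engine, transcribed from the compatibility table.
const CLONE_SUPPORT: Record<string, boolean> = {
  cartesia: true,
  elevenlabs: true,
  "fish-audio": true,
  "resemble-ai": true,
  "f5-tts": true,
  cosyvoice: true,
  kokoro: false,
  pocket: false,
  inworld: false,
};

// Picks the cloned voice when the engine accepts clones; otherwise returns the
// caller-supplied library voice for that engine. Unknown engines use the fallback.
function voiceFor(engine: string, clonedVoiceId: string, libraryFallback: string): string {
  const supported =
    Object.prototype.hasOwnProperty.call(CLONE_SUPPORT, engine) && CLONE_SUPPORT[engine];
  return supported ? clonedVoiceId : libraryFallback;
}
```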

Voice search & filter

Filter the catalog with query parameters on GET /v1/voices:

```bash
# All English-language conversational voices, BYOK-eligible only
curl "https://api.ph0ny.com/v1/voices?lang=en&style=conversational&byok=true"

# Voices cloneable on both Cartesia and ElevenLabs
curl "https://api.ph0ny.com/v1/voices?cloneable=cartesia,elevenlabs"
```

Each voice in the response includes a stable voiceId, the provider it comes from, language tags, supported emotions, and the URL of a 5-second sample.
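Server-side query parameters are the documented way to filter. Sometimes, though, you want to narrow a list you have already fetched, for example by emotion. The docs name the fields but not their exact keys, so `VoiceEntry`'s field names (`languages`, `emotions`, `sampleUrl`) are assumptions. The two sample entries are made-up placeholders, not real catalog voices.

```typescript
// Assumed shape of one catalog entry; key names are illustrative.
interface VoiceEntry {
  voiceId: string;
  provider: string;
  languages: string[]; // language tags such as "en-US" or "zh"
  emotions: string[];
  sampleUrl: string; // 5-second preview clip
}

// Keeps voices that speak the requested language ("en" also matches "en-US")
// and support every requested emotion.
function filterVoices(voices: VoiceEntry[], lang: string, emotions: string[] = []): VoiceEntry[] {
  const base = lang.toLowerCase();
  return voices.filter(
    (v) =>
      v.languages.some((tag) => {
        const t = tag.toLowerCase();
        return t === base || t.startsWith(base + "-");
      }) && emotions.every((e) => v.emotions.indexOf(e) !== -1),
  );
}

// Placeholder entries for illustration only.
const sample: VoiceEntry[] = [
  { voiceId: "voice-a", provider: "elevenlabs", languages: ["en-US"], emotions: ["happy", "calm"], sampleUrl: "https://example.com/a.mp3" },
  { voiceId: "voice-b", provider: "qwen-tts", languages: ["zh", "en"], emotions: ["calm"], sampleUrl: "https://example.com/b.mp3" },
];

console.log(filterVoices(sample, "en", ["happy"]).map((v) => v.voiceId).join(", ")); // voice-a
```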

Multilingual

Most ElevenLabs and Fish Audio voices speak 32+ languages from a single voice ID. Pass the target language and ph0ny re-routes to the right model variant:

```typescript
await ph0ny.tts.synthesize({
  text: '¡Hola! Bienvenido a tu reserva.',
  voiceId: 'rachel',
  provider: 'elevenlabs',
  language: 'es-ES',
})
```
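The docs say *most* ElevenLabs and Fish Audio voices are multilingual, so a client can at least flag requests that set `language` on other providers. This is a hedged sketch. `SynthesizeRequest` mirrors the call above and is not an SDK type, and `withLanguage` is a hypothetical helper. It does not choose a model variant, since ph0ny does that routing itself.

```typescript
// Providers the docs describe as multilingual from a single voice ID. "Most"
// voices, not all, so a true result is a hint rather than a guarantee.
const MULTILINGUAL_PROVIDERS = ["elevenlabs", "fish-audio"];

// Local mirror of the synthesize request shown above.
interface SynthesizeRequest {
  text: string;
  voiceId: string;
  provider: string;
  language?: string;
}

// Returns a copy of the request with `language` set, plus a flag saying whether
// the provider is one the docs list as multilingual.
function withLanguage(
  req: SynthesizeRequest,
  language: string,
): { request: SynthesizeRequest; likelySupported: boolean } {
  return {
    request: { ...req, language },
    likelySupported: MULTILINGUAL_PROVIDERS.indexOf(req.provider) !== -1,
  };
}

const { request, likelySupported } = withLanguage(
  { text: "¡Hola! Bienvenido a tu reserva.", voiceId: "rachel", provider: "elevenlabs" },
  "es-ES",
);
console.log(request.language, likelySupported); // es-ES true
```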

To dub existing audio while preserving the original speaker's voice, use the Video API dub endpoint. It lip-syncs the video to the new audio in a single pass.

Where to next

Voice cloning guide → — full walkthrough with consent flow.
TTS / STT API → — request/response shapes.
Models → — pair voice with the right LLM brain.
