Voices

A voice in ph0ny is a voiceId plus a provider. Because the voice is chosen per request, the same caller can sound like a different person on every call just by swapping voice IDs. We don't lock voices to providers: clone a voice once and use it across Cartesia, ElevenLabs, Fish Audio, or any compatible engine.

Library voices

Curated, ready-to-use voices that ship with the platform. Pass voiceId directly to any TTS or video-avatar request.

| Voice ID | Provider | Style | Language |
| --- | --- | --- | --- |
| sonic-english-male | cartesia | Conversational, warm | en-US |
| sonic-english-female | cartesia | Conversational, neutral | en-US |
| sonic-broadcast-female | cartesia | News anchor | en-US |
| rachel | elevenlabs | Friendly, professional | en-US |
| bella | elevenlabs | Warm, approachable | en-US |
| josh | elevenlabs | Authoritative, deep | en-US |
| aura-asteria-en | deepgram | Confident, modern | en-US |
| aura-luna-en | deepgram | Polite, soft | en-US |
| fish-genesis | fish-audio | Multilingual neutral | multi |
| qwen-default-female | qwen-tts | Bilingual zh + en | zh, en |

The library is not exhaustive: every provider exposes its own catalog. Call GET /v1/voices?provider=<id> to list every voice that provider exposes for your account, including BYOK-only ones.
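As a rough sketch, the listing call could be wrapped like this in TypeScript. The base URL is the one used in the curl examples on this page; the bearer-token header and the JSON response are assumptions, not confirmed details of the API, and `voicesUrl` and `listVoices` are hypothetical helper names.

```typescript
// Base URL as used in the curl examples on this page.
const BASE_URL = 'https://api.ph0ny.com'

// Build the listing URL for one provider id, e.g. 'cartesia' or 'elevenlabs'.
function voicesUrl(provider: string): string {
  const url = new URL('/v1/voices', BASE_URL)
  url.searchParams.set('provider', provider)
  return url.toString()
}

// Assumption: the API accepts a bearer token and answers with JSON.
async function listVoices(provider: string, apiKey: string): Promise<unknown> {
  const res = await fetch(voicesUrl(provider), {
    headers: { Authorization: `Bearer ${apiKey}` },
  })
  if (!res.ok) throw new Error(`Listing voices failed: HTTP ${res.status}`)
  return res.json()
}
```

Keeping the URL construction separate from the network call makes the query string easy to check without hitting the API.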

Voice cloning

Clone any voice from an audio sample between 30 seconds and 10 minutes long. Cloned voices are scoped to your account and usable across compatible providers.

```ts
const cloned = await ph0ny.voices.clone({
  name: 'Drew (Founder)',
  audioUrl: 'https://uploads.example.com/drew-30s.mp3',
  consent: { speakerName: 'Drew Stone', confirmed: true },
})

// Use the same id on TTS, avatar, and dub.
await ph0ny.tts.synthesize({
  text: 'Welcome to the launch.',
  voiceId: cloned.voiceId,
})

await ph0ny.video.avatar({
  image_url: 'https://uploads.example.com/drew.jpg',
  text: 'Welcome to the launch.',
  voice_id: cloned.voiceId,
})
```

Consent matters. ph0ny refuses cloning requests that don't carry a confirmed speaker identity. We log the consent block on every clone and on every request that uses a cloned voice. See Pricing for retention details.
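Since requests without confirmed consent are refused, it can help to check a request locally before sending it. The following is a minimal sketch, not part of the ph0ny SDK: `validateCloneRequest` is a hypothetical helper, the request fields mirror the clone example above, and `durationSeconds` is assumed to be measured by the caller.

```typescript
// Hypothetical pre-flight check. Field names mirror the clone() example;
// durationSeconds is an assumed caller-side measurement of the sample.
interface CloneRequest {
  name: string
  audioUrl: string
  consent: { speakerName: string; confirmed: boolean }
}

const MIN_SAMPLE_SECONDS = 30
const MAX_SAMPLE_SECONDS = 10 * 60

// Returns a list of problems; an empty list means the request looks acceptable.
function validateCloneRequest(req: CloneRequest, durationSeconds: number): string[] {
  const problems: string[] = []
  if (durationSeconds < MIN_SAMPLE_SECONDS || durationSeconds > MAX_SAMPLE_SECONDS) {
    problems.push('sample must be between 30 seconds and 10 minutes long')
  }
  if (req.consent.speakerName.trim() === '') {
    problems.push('consent.speakerName is required')
  }
  if (!req.consent.confirmed) {
    problems.push('consent.confirmed must be true')
  }
  return problems
}
```

Returning every problem at once, rather than throwing on the first, lets a UI show the user everything that needs fixing in one pass.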

Provider compatibility

| Engine | Cloned voice usable? | Notes |
| --- | --- | --- |
| Cartesia | Yes | Re-encodes for Cartesia's voice space; ~5s prep on first use. |
| ElevenLabs | Yes | Native instant clone if BYOK; falls back to phonetic adaptation otherwise. |
| Fish Audio | Yes | Best zero-shot quality on non-English clones. |
| Resemble AI | | Enterprise-tier; needs explicit consent workflow. |
| F5 TTS | | Self-hosted; you keep the model. |
| CosyVoice | | Open-source; multilingual zero-shot. |
| Kokoro / Pocket / Inworld | No | Use library voices on these. |
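One way to act on the last row is to fall back to a library voice whenever a request targets an engine that cannot use clones. This is a sketch under assumptions: the lowercase engine ids are guesses rather than confirmed provider ids, `pickVoiceId` is a hypothetical helper, and any engine not in the set is simply treated as accepting the cloned voice.

```typescript
// Engines the compatibility table lists as library-voice only.
// Assumption: these lowercase ids match the real provider ids.
const LIBRARY_ONLY_ENGINES = new Set(['kokoro', 'pocket', 'inworld'])

// Choose the cloned voice unless the engine only supports library voices.
function pickVoiceId(engine: string, clonedVoiceId: string, libraryVoiceId: string): string {
  return LIBRARY_ONLY_ENGINES.has(engine.toLowerCase()) ? libraryVoiceId : clonedVoiceId
}
```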

Voice search & filter

```bash
# All English-language conversational voices, BYOK-eligible only
curl "https://api.ph0ny.com/v1/voices?lang=en&style=conversational&byok=true"

# Voices cloneable on both Cartesia and ElevenLabs
curl "https://api.ph0ny.com/v1/voices?cloneable=cartesia,elevenlabs"
```

Each voice carries a stable voiceId, the provider it came from, language tags, supported emotions, and a 5-second sample URL.
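For client-side filtering of results already fetched, a voice record might be modeled as below. Only `voiceId` and `provider` are named on this page; `languages`, `emotions`, and `sampleUrl` are hypothetical field names chosen for illustration, as is the `filterVoices` helper.

```typescript
// Assumed shape of one voice record. Only voiceId and provider are named in
// the text; languages, emotions, and sampleUrl are hypothetical field names.
interface Voice {
  voiceId: string
  provider: string
  languages: string[]
  emotions: string[]
  sampleUrl: string
}

// Keep voices with a language tag starting with langPrefix and, if an
// emotion is given, that also list that emotion as supported.
function filterVoices(voices: Voice[], langPrefix: string, emotion?: string): Voice[] {
  const prefix = langPrefix.toLowerCase()
  return voices.filter(
    (v) =>
      v.languages.some((tag) => tag.toLowerCase().startsWith(prefix)) &&
      (emotion === undefined || v.emotions.includes(emotion)),
  )
}
```

Matching on a prefix means `'en'` accepts both `en` and `en-US` tags, which suits the mixed tag styles in the library table.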

Multilingual

Most ElevenLabs and Fish Audio voices speak 32+ languages from a single voice ID. Pass the target language and ph0ny re-routes to the right model variant:

```ts
await ph0ny.tts.synthesize({
  text: '¡Hola! Bienvenido a tu reserva.',
  voiceId: 'rachel',
  provider: 'elevenlabs',
  language: 'es-ES',
})
```

To dub existing audio while preserving the original speaker, use the Video API dub endpoint, which lip-syncs the video to the new audio in a single pass.

Built by ph0ny.