
Video API

Five operations cover the entire video surface. Every endpoint is async (returns a VideoJob), every endpoint accepts a provider field to pin an engine, and every endpoint emits a webhook on completion when webhook_url is set.

text
POST /v1/video/generate     text → video
POST /v1/video/avatar       image + audio (or text + voice) → talking-head video
POST /v1/video/lipsync      video + audio → lipsync'd video
POST /v1/video/dub          video → dubbed in another language, lipsync preserved
POST /v1/video/analyze      video → embeddings, transcription, semantic Q&A
GET  /v1/video/:jobId       poll job status

See Providers → Video for the full engine matrix.


Generate

Text-to-video. Render up to 30 seconds of generated video from a prompt.

POST /v1/video/generate

Request

json
{
  "prompt": "A barista pulling a perfect espresso shot, warm morning light, 35mm film",
  "negative_prompt": "blurry, distorted faces",
  "duration_seconds": 6,
  "width": 1280,
  "height": 720,
  "fps": 24,
  "provider": "runway",
  "webhook_url": "https://your-app.com/hooks/video"
}

Field             Type    Default   Description
prompt            string  required  1–2,000 chars.
negative_prompt   string            What to avoid.
duration_seconds  number  5         1–30.
width             int     768       256–1920.
height            int     512       256–1080.
fps               int     24        8–60.
provider          enum    auto      sora, runway, ltx-video, …
webhook_url       url               Notified on completion.

Response 202

json
{
  "id": "vj_01HM…",
  "type": "generate",
  "status": "pending",
  "provider": "runway",
  "created_at": "2026-05-05T18:00:00Z",
  "owner_id": "user_…"
}

Poll GET /v1/video/:jobId or wait on the webhook.
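
If you're not using the SDK, a polling loop is a few lines. A minimal sketch, assuming bearer-token auth and an api.ph0ny.com base URL (neither is confirmed on this page; substitute the host and auth scheme from your dashboard):

ts
// Minimal polling sketch. The base URL and Authorization header are
// assumptions; adjust to your actual API host and auth scheme.
async function pollVideoJob(jobId: string) {
  while (true) {
    const res = await fetch(`https://api.ph0ny.com/v1/video/${jobId}`, {
      headers: { Authorization: `Bearer ${process.env.PH0NY_API_KEY}` },
    })
    const job = await res.json()
    if (job.status === 'completed') return job.result
    if (job.status === 'failed') throw new Error(job.error?.message ?? 'job failed')
    await new Promise((r) => setTimeout(r, 2000)) // stay inside the recommended 1–5s window
  }
}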


Avatar

Talking-head video from a single still image plus audio. Pre-render the audio yourself, or chain through TTS in one call.

POST /v1/video/avatar

Request — pre-rendered audio

json
{
  "image_url": "https://uploads.example.com/headshot.jpg",
  "audio_url": "https://uploads.example.com/script.mp3",
  "provider": "heygen"
}

Request — TTS chaining

json
{
  "image_url": "https://uploads.example.com/headshot.jpg",
  "text": "Welcome to our launch. We're shipping in three weeks.",
  "voice_id": "rachel",
  "tts_provider": "elevenlabs",
  "provider": "hallo3"
}

Field            Type    Notes
image_url        url     Required. JPG/PNG. Face must be visible.
audio_url        url     Provide this or text + voice_id.
text             string  1–10,000 chars. Synthesizes audio first.
voice_id         string  Library or cloned voice. See /voices.
tts_provider     string  TTS engine. Default: default.
target_language  string  If set, translates text before synthesis.
provider         enum    Avatar engine: hallo3, heygen, liveportrait, echomimicv2, v-express, skyreels-a1.

Self-hosted engines (hallo3, liveportrait, echomimicv2, v-express, skyreels-a1) run on ph0ny GPUs and are billed per second of generated video. heygen requires BYOK.


Lipsync

Re-align lip motion in an existing video to a new audio track. Useful for dubbing, ADR, fixing flubbed takes.

POST /v1/video/lipsync
json
{
  "video_url": "https://uploads.example.com/clip.mp4",
  "audio_url": "https://uploads.example.com/replacement.mp3",
  "provider": "sync-labs"
}

Provider         Notes
sync-labs        Best commercial fidelity. BYOK.
heygen-lipsync   If you already use HeyGen.
latentsync       Self-hosted, open-source.
video-retalking  Self-hosted, robust on noisy footage.
musetalk         Realtime; lower latency at lower fidelity.

Dub

End-to-end localization: translate audio, synthesize in the target language, lipsync the original video to the new audio. One call.

POST /v1/video/dub
json
{
  "video_url": "https://uploads.example.com/launch.mp4",
  "target_language": "es-ES",
  "voice_id": "rachel",
  "tts_provider": "elevenlabs",
  "lipsync_provider": "sync-labs"
}

ph0ny will:

  1. Transcribe the source audio (Whisper).
  2. Translate to target_language preserving timing.
  3. Synthesize the new audio using voice_id on tts_provider.
  4. Re-lipsync the video using lipsync_provider.

If voice_id is omitted we use a multilingual voice on the same provider. If source_language is omitted we auto-detect.
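
In the SDK the whole pipeline is one call plus a wait. A sketch, assuming a video.dub method that mirrors the video.avatar method shown in the SDK section below (the dub method name itself is an assumption):

ts
import { ph0ny } from '@ph0ny/sdk'

// Sketch: assumes video.dub mirrors the documented video.avatar method.
const job = await ph0ny.video.dub({
  video_url: 'https://uploads.example.com/launch.mp4',
  target_language: 'es-ES',
  lipsync_provider: 'sync-labs',
})

// Resolves once all four pipeline stages above have finished.
const result = await ph0ny.video.waitForJob(job.id)
console.log(result.video_url)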


Analyze

Multimodal video understanding — semantic search, transcription, scene detection, Q&A.

POST /v1/video/analyze
json
{
  "video_url": "https://uploads.example.com/meeting.mp4",
  "provider": "twelve-labs",
  "tasks": ["transcribe", "scenes", "embed"]
}

Use cases:

  • Search across hours of footage — embed once, query in natural language. Pairs with Collections.
  • Auto-chapter long videos: scenes returns timestamps + descriptions.
  • Q&A over recorded calls — pass the analyze output as agent context.

Provider               Strength
twelve-labs            Best commercial — Marengo + Pegasus. BYOK.
internvideo2.5         Self-hosted; long-form.
qwen2.5-vl / qwen3-vl  Self-hosted; charts and screen capture.
videochat-flash        Realtime conversational video Q&A.
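
A typical flow is to analyze once and reuse the output. A sketch, assuming the SDK exposes video.analyze alongside the other methods (not confirmed on this page; the REST call above is the documented interface):

ts
import { ph0ny } from '@ph0ny/sdk'

// Sketch: video.analyze on the SDK is an assumption.
const job = await ph0ny.video.analyze({
  video_url: 'https://uploads.example.com/meeting.mp4',
  provider: 'twelve-labs',
  tasks: ['transcribe', 'scenes'],
})

const result = await ph0ny.video.waitForJob(job.id)
// The exact result fields per task aren't documented here;
// inspect the completed job's result object.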

Job lifecycle

Every endpoint returns:

json
{
  "id": "vj_…",
  "type": "generate" | "avatar" | "lipsync" | "dub" | "analyze",
  "status": "pending" | "processing" | "completed" | "failed",
  "provider": "runway",
  "progress": 0.42,
  "result": {
    "video_url": "https://r2.ph0ny.com/jobs/vj_…/output.mp4",
    "duration_seconds": 6.0,
    "width": 1280,
    "height": 720,
    "provider": "runway"
  },
  "error": { "code": "provider_timeout", "message": "…" },
  "created_at": "2026-05-05T18:00:00Z",
  "completed_at": "2026-05-05T18:00:42Z",
  "owner_id": "user_…"
}

Poll: GET /v1/video/:jobId (1–5s interval recommended). Webhook: the payload is signed with HMAC-SHA256 using your webhook secret. See Sessions → Webhooks for verifier code.
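
A verifier sketch in Node (the x-ph0ny-signature header name and hex digest encoding are assumptions; Sessions → Webhooks has the canonical version):

ts
import { createHmac, timingSafeEqual } from 'node:crypto'

// Sketch: compares an HMAC-SHA256 of the raw request body against the
// signature header. Header name and hex encoding are assumptions.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex')
  const a = Buffer.from(expected)
  const b = Buffer.from(signatureHex)
  return a.length === b.length && timingSafeEqual(a, b)
}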


SDK

ts
import { ph0ny } from '@ph0ny/sdk'

const job = await ph0ny.video.avatar({
  image_url: 'https://uploads.example.com/headshot.jpg',
  text: 'Welcome to ph0ny.',
  voice_id: 'rachel',
  provider: 'hallo3',
})

const result = await ph0ny.video.waitForJob(job.id)
console.log(result.video_url)
