
AI Video Avatar

Build talking AI avatars by chaining text-to-image (FLUX-2 Klein), text-to-speech (Kokoro, Chatterbox), and audio-to-video (LTX-2.3) into one pipeline. A full avatar costs from ~$0.04 and runs on deAPI's decentralized GPUs.

Why deAPI for AI video avatars?

deAPI's avatar pipeline chains three open-source models (FLUX-2 Klein, Kokoro / Chatterbox TTS, and LTX-2.3 audio-to-video) behind one unified API. With decentralized GPU infrastructure, deAPI delivers full talking-head avatars from ~$0.04 per generation, up to 20× cheaper than HeyGen / Synthesia-class SaaS. Whether you're building marketing automation, e-learning platforms, customer support flows, or faceless content channels, deAPI makes it simple to ship avatar video at scale.

  • 3-Step Pipeline

    Chain in one workflow

    Chain text-to-image + text-to-speech + audio-to-video through three API calls. A complete talking avatar — portrait, voice, and synced animation — generated from a single text description.

  • LTX-2.3 Animation

    State-of-the-art video model

    LTX-2.3 by Lightricks produces natural head movements, blinking, and facial expressions from a single portrait, driven by the generated speech in audio-to-video mode.

  • Low Cost

    ~$0.04 per avatar

    Decentralized GPUs keep talking-head video affordable at any scale: the $5 starter credit covers over 120 avatars.

  • Open-Source Models

    No vendor lock-in

    FLUX-2 Klein, LTX-2.3, Kokoro, Chatterbox: swap models anytime as better ones emerge. One key, one billing account, every modality of the pipeline.

Three Steps to a Talking Avatar

Chain three API calls to ship a full avatar — portrait, voice, animation

Step 1: Generate the portrait

What it does

Create a photorealistic or stylized portrait from a text description. Define gender, age, ethnicity, clothing, background — everything through a prompt. FLUX-2 Klein delivers high-quality faces in seconds.

API workflow

Send a single POST to /txt2img with your prompt and receive a download URL for the generated portrait. Optionally use prompt enhancement to optimize the prompt automatically.
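As a rough illustration of that call, here is a minimal sketch that builds the POST request without sending it. Only the /txt2img path comes from this page. The base URL, the bearer-token header, the JSON field names ("prompt", "model"), and the model id "flux-2-klein" are assumptions for illustration, so check them against the deAPI reference before use.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): base URL, bearer-token auth,
# the JSON field names "prompt" and "model", and the model id string.
BASE_URL = "https://api.example.invalid"  # placeholder, replace with the real base URL


def build_txt2img_request(
    prompt: str, api_key: str, model: str = "flux-2-klein"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /txt2img endpoint."""
    payload = {"prompt": prompt, "model": model}
    return urllib.request.Request(
        url=f"{BASE_URL}/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_txt2img_request(
    "Studio portrait of a woman in her 30s, navy blazer, soft grey background",
    api_key="YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. The shape of the response, including where the download URL lives, is not described here and would need to be read from the API documentation.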

Available Models

  • FLUX-2 Klein Text → Image

    Fast, high-quality photorealistic portraits

    from $0.00141/img

  • Z-Image Text → Image

    Alternative model for stylized portraits

    from $0.00248/img

  • Prompt Enhancement AI Boost

    Optimize prompts for better face generation

Step 2: Generate the voice

What it does

Generate natural-sounding speech from any text. Choose from multiple voices or clone a custom voice with Chatterbox. The generated audio file will be used in the next step to drive the avatar's animation.

API workflow

POST to /txt2audio with the text content and voice parameters, and receive an audio file URL. This audio feeds directly into LTX-2.3's audio-to-video endpoint.
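The request body for this step might look like the sketch below. The field names ("text", "model", "voice"), the model id "kokoro", and the voice name "narrator_en" are hypothetical placeholders, not documented deAPI parameters.

```python
import json
from typing import Optional


def build_txt2audio_payload(
    text: str, model: str = "kokoro", voice: Optional[str] = None
) -> bytes:
    """Return a JSON body for a POST to /txt2audio (field names are assumed)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text, "model": model}
    if voice is not None:
        # Only include a voice when the caller picks one explicitly.
        payload["voice"] = voice
    return json.dumps(payload).encode("utf-8")


body = build_txt2audio_payload("Welcome to the product tour.", voice="narrator_en")
print(body.decode("utf-8"))
```

Rejecting empty text before the call avoids paying for a request that cannot produce usable speech.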

Available Models

  • Kokoro TTS Text → Speech

    Fast, natural English voice generation

  • Chatterbox Voice Clone

    Clone any voice from a short audio sample

  • Qwen TTS Multilingual

    Multilingual speech for global content

Step 3: Animate the avatar

What it does

Combine the portrait and the generated audio in one step. LTX-2.3's audio-to-video mode takes an image and an audio file, then produces a video with lip-synced animation, natural head movements, and facial expressions.

API workflow

POST to /aud2video with the portrait URL, the generated audio URL, and a motion prompt. Receive a complete talking avatar video — audio and animation combined.
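A hedged sketch of the body for this call follows. The page names the three inputs (portrait URL, audio URL, motion prompt) but not the JSON keys, so "image_url", "audio_url", and "prompt" are assumptions.

```python
import json
from urllib.parse import urlparse


def build_aud2video_payload(image_url: str, audio_url: str, motion_prompt: str) -> bytes:
    """Return a JSON body for a POST to /aud2video (field names are assumed)."""
    for name, value in (("image_url", image_url), ("audio_url", audio_url)):
        parsed = urlparse(value)
        # Both inputs are outputs of earlier steps, so they should be absolute URLs.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{name} must be an absolute http(s) URL")
    payload = {
        "image_url": image_url,
        "audio_url": audio_url,
        "prompt": motion_prompt,
    }
    return json.dumps(payload).encode("utf-8")


body = build_aud2video_payload(
    "https://files.example.invalid/portrait.png",
    "https://files.example.invalid/speech.wav",
    "Subtle head movement, friendly expression, steady eye contact",
)
print(body.decode("utf-8"))
```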

Available Models

  • LTX-2.3 Audio → Video

    Lip-synced animation driven by speech

    from $0.0396/video

  • LTX-2.3 Image → Video

    Generic image-to-video for background motion

  • Prompt Enhancement AI Boost

    Optimize motion prompts for smoother results

Marketing & Sales

Generate personalized video messages at scale. Create product demos, explainer videos, and social media content with AI presenters — without hiring actors or booking studios.

E-Learning & Training

Build course videos with AI instructors. Translate training materials into any language with localized avatars. Update content instantly without re-recording.

Industries

  • SaaS & Product B2B

    Product walkthroughs and onboarding videos at scale

  • Customer Support Automation

    Multilingual FAQ avatars and support flows

  • Media & Content Creator

    Faceless YouTube channels and news avatars

See AI Video Avatar in Action

Watch how deAPI chains text-to-image, text-to-speech, and audio-to-video into a single talking-avatar pipeline. From API call to lip-synced video in under a minute.

  • Full pipeline from ~$0.04 per avatar
  • Three API calls, fully automatable
  • Webhook delivery — no polling needed
  • Free tier available
  • No credit card required
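To show how the three calls chain together, here is a minimal sketch with the HTTP transport injected as a function, so the flow can be exercised without network access. The endpoint paths come from this page. The payload keys and the "url" field in each response are assumptions, and fake_post is a stand-in for a real client.

```python
from typing import Callable, Dict, List

# A transport takes an endpoint path and a JSON-like payload and returns
# the decoded response. The "url" response field is an assumption.
Post = Callable[[str, Dict[str, str]], Dict[str, str]]


def make_avatar(description: str, script: str, motion_prompt: str, post: Post) -> str:
    """Run portrait, voice, and animation in order and return the video URL."""
    portrait = post("/txt2img", {"prompt": description})
    speech = post("/txt2audio", {"text": script})
    video = post(
        "/aud2video",
        {
            "image_url": portrait["url"],
            "audio_url": speech["url"],
            "prompt": motion_prompt,
        },
    )
    return video["url"]


calls: List[str] = []


def fake_post(path: str, payload: Dict[str, str]) -> Dict[str, str]:
    """Stand-in transport that records each call and returns a dummy URL."""
    calls.append(path)
    return {"url": f"https://files.example.invalid{path}/result"}


video_url = make_avatar(
    "Friendly presenter, studio lighting",
    "Hi, welcome to our onboarding.",
    "Natural head movement",
    post=fake_post,
)
print(calls)
print(video_url)
```

Swapping fake_post for a function that sends real requests leaves the chaining logic untouched. With webhook delivery, each step would instead be triggered by the previous step's callback rather than by a direct return value.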

Create your first AI avatar in under a minute

Chain three API calls and ship a talking-head video — no actors, no studio

Frequently Asked Questions

Everything you need to know

The pipeline uses three models: FLUX-2 Klein for portrait generation (text-to-image), Kokoro or Chatterbox for voice synthesis (text-to-speech), and LTX-2.3 for audio-driven animation (audio-to-video). You can also use Z-Image for image generation or Qwen TTS for multilingual voice.
The full pipeline costs approximately $0.04 per avatar: ~$0.0014 for image generation (FLUX-2 Klein) and ~$0.0396 for audio-to-video animation (LTX-2.3), plus TTS voice generation. With the $5 free credits you can generate around 120 avatars to get started.
You can use your own image: skip Step 1 and pass any portrait image URL directly to the audio-to-video endpoint (/aud2video, LTX-2.3). This works with photos, AI-generated images, or illustrations.
LTX-2.3's audio-to-video mode handles lip sync automatically. You pass the portrait image and the generated audio file to the /aud2video endpoint, and the model produces a video with lip-synced animation driven by the speech. No manual merging or FFmpeg needed.
Voice cloning is supported through Chatterbox. Upload a short reference clip, and the model generates speech matching that voice, which helps maintain a consistent brand voice across all avatar content.
Unlike SaaS platforms like HeyGen or Synthesia that require monthly subscriptions, deAPI is pay-per-use with no subscription. You get full API access to open-source models, complete customization of the pipeline, and pay only for what you use — starting from ~$0.04 per avatar.
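The cost figures quoted on this page can be checked with quick arithmetic. The sketch below uses the listed "from" prices for FLUX-2 Klein and LTX-2.3 audio-to-video. No TTS price is listed here, so tts_cost is an assumption that defaults to zero.

```python
IMAGE_COST = 0.00141    # FLUX-2 Klein, listed "from" price per image
VIDEO_COST = 0.0396     # LTX-2.3 audio-to-video, listed "from" price per video
STARTER_CREDIT = 5.00   # free credits on sign-up


def per_avatar_cost(tts_cost: float = 0.0) -> float:
    """Sum the per-step prices; tts_cost is unknown here and assumed to be 0."""
    return IMAGE_COST + VIDEO_COST + tts_cost


def avatars_for(credit: float, tts_cost: float = 0.0) -> int:
    """Number of complete avatars the given credit covers."""
    return int(credit // per_avatar_cost(tts_cost))


print(round(per_avatar_cost(), 5))   # 0.04101
print(avatars_for(STARTER_CREDIT))   # 121
```

With image and video alone, $5 covers 121 avatars, which matches the "over 120" and "around 120" figures. Any non-zero TTS price lowers that count.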