$5 free credits when you sign up

Simple, Transparent Pricing

Q: Can I really start for free, without a credit card?

Yes. Every new account receives $5 in free credits the moment you sign up — no credit card, no upfront commitment. That balance is enough to test most models extensively, generate hundreds of images, transcribe several hours of video, or prototype a complete AI pipeline. Your account starts on the Basic tier with conservative rate limits designed for testing; making any payment upgrades you to Premium with no waiting period.

Q: How do payments, top-ups, and B2B invoicing work?

deAPI uses Stripe for secure card payments and supported local methods. You can either make one-off top-ups (with preset amounts of $10, $25, or $50) or enable automatic top-ups, which recharge your balance whenever it drops below $2 — perfect for production workloads that can't afford to fail mid-job. For B2B customers needing custom invoices, larger commitments, or tailored billing terms, our team arranges individual agreements; just reach out to support.
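As a rough illustration of the automatic top-up rule described above, the sketch below simulates a prepaid balance locally. Only the $2 threshold comes from the text; the recharge amount and the function name `charge` are assumptions made for this example, and none of this is deAPI's actual billing code.

```python
from decimal import Decimal

AUTO_TOPUP_THRESHOLD = Decimal("2.00")  # from the text: recharge when the balance drops below $2
AUTO_TOPUP_AMOUNT = Decimal("10.00")    # assumption: the recharge amount is not stated in the text


def charge(balance: Decimal, cost: Decimal, auto_topup: bool = True) -> Decimal:
    """Deduct one job's cost, then recharge if the balance fell below the threshold."""
    if cost > balance:
        raise ValueError("insufficient balance for this job")
    balance -= cost
    if auto_topup and balance < AUTO_TOPUP_THRESHOLD:
        balance += AUTO_TOPUP_AMOUNT
    return balance


balance = Decimal("5.00")  # the free sign-up credit
for _ in range(3):
    balance = charge(balance, Decimal("1.50"))
    print(balance)
```

With three hypothetical $1.50 jobs, the balance stays untouched at exactly $2.00 after the second job and is recharged only once it falls below the threshold on the third.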

Q: What happens when I outgrow the standard limits?

The Basic tier offers conservative rate limits (typically 1–10 RPM depending on the endpoint) ideal for testing. The moment you make any payment via Stripe, your account upgrades to Premium with 300 RPM across all endpoints and unlimited daily requests — instantly, with no application process. For high-volume production needs beyond Premium, dedicated capacity, or enterprise terms, our team can set up bespoke arrangements with volume discounts.
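To stay under a per-minute limit such as the 300 RPM quoted above, a client can pace its own requests. The class below is a minimal sketch of even spacing; `RpmThrottle` is a hypothetical helper, not part of any deAPI SDK, and even spacing is stricter than a limit counted over a whole minute.

```python
import time


class RpmThrottle:
    """Client-side pacing: space calls so they never exceed `rpm` requests per minute."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `throttle.wait()` before each request with `RpmThrottle(300)` spaces requests 0.2 seconds apart. On the Basic tier a lower value would be passed instead, chosen per endpoint.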

Q: How is text-to-speech billed on deAPI?

TTS is billed per character of input text, with rates that vary by model. Kokoro starts at $0.77 per 1M characters, Chatterbox at $7.71 per 1M characters, and Qwen3 TTS at $12.86 per 1M characters for premium voice features. There's no per-request minimum, and the per-character rate stays consistent across all three TTS modes within a single model — custom_voice (preset speakers), voice_clone (clone from reference audio), and voice_design (create voice from a text description).
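The per-character billing above reduces to a single multiplication. The sketch below uses the quoted rates; the dictionary keys are illustrative labels rather than confirmed API model identifiers, and counting characters with `len()` is an assumption about how input text is measured.

```python
from decimal import Decimal

# USD per 1M input characters, as quoted in the text.
# Keys are illustrative labels, not confirmed API model identifiers.
RATE_PER_MILLION = {
    "kokoro": Decimal("0.77"),
    "chatterbox": Decimal("7.71"),
    "qwen3_tts": Decimal("12.86"),
}


def tts_cost(model: str, text: str) -> Decimal:
    """Estimated cost in USD for synthesizing `text` with `model`."""
    # Assumption: one billed character per Python string character.
    return RATE_PER_MILLION[model] * len(text) / Decimal(1_000_000)


script = "a" * 10_000  # stand-in for a 10,000-character script
for model in RATE_PER_MILLION:
    print(model, tts_cost(model, script))
```

Because there is no per-request minimum, an empty or very short input costs proportionally little in this model.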

Q: Which TTS model fits which use case?

Kokoro is the most cost-efficient option and well-suited to high-volume audiobook and e-learning content. Chatterbox offers expanded language support with built-in AI watermarking, fitting compliance-sensitive use cases. Qwen3 TTS CustomVoice delivers premium preset voices and is suitable for real-time voice agents. Qwen3 TTS VoiceClone reproduces a target voice from a short reference sample — great for branded narration. Qwen3 TTS VoiceDesign generates a unique voice from a text description alone, no audio reference needed.

Q: How are reference audio samples for voice cloning used?

For voice_clone mode, you upload a short reference audio file (typically 3–10 seconds, max 15 MB, in MP3, WAV, FLAC, OGG, or M4A format) along with your text. The model captures vocal characteristics like tone, accent, and cadence and synthesizes new speech in that voice. Optionally, providing a ref_text transcript of the reference audio improves cloning accuracy. You must own the rights to the reference audio or have explicit consent from the speaker; this is both a legal and an ethical requirement for voice synthesis.
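The format and size constraints above can be checked locally before uploading. This is a minimal sketch with a hypothetical helper name; it treats 1 MB as 1024 × 1024 bytes (an assumption) and does not check duration, which would require decoding the audio.

```python
from pathlib import Path

MAX_REFERENCE_BYTES = 15 * 1024 * 1024  # "max 15 MB"; assumes 1 MB = 1024 * 1024 bytes
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}


def check_reference_audio(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_REFERENCE_BYTES:
        problems.append("file exceeds 15 MB")
    return problems


print(check_reference_audio("speaker.wav", 480_000))
print(check_reference_audio("speaker.aac", 480_000))
```

The check relies on the file extension only, so a mislabeled file would still pass; it is meant as a quick pre-flight guard, not a substitute for server-side validation.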

Pay only for what you use. No subscriptions, no hidden fees.


Price Calculator

Text-to-Speech: Chatterbox

Estimated cost: $7.71 per 1M characters

TTS pricing scales linearly with character count.

Use case            Example                                 Characters  Price
Short paragraph     Product description, social media post       1,000  $0.00771
Blog article        5-minute podcast episode                    10,000  $0.07714
Short book chapter  15-20 minute audiobook chapter             100,000  $0.77143
Full audiobook      3-4 hour audiobook                       1,000,000  $7.71429
Large project       Full educational course narration        5,000,000  $38.57143
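The listed prices follow from linear scaling, but they imply a slightly more precise rate than the displayed $7.71 (at exactly $7.71, one million characters would cost $7.71000, not $7.71429). The sketch below assumes an unrounded rate of $7.714286 per 1M characters, inferred from the 5,000,000-character row; that figure is an inference from the listed prices, not a published rate.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: unrounded Chatterbox rate inferred from the listed prices
# (5,000,000 characters -> $38.57143); the page displays it as $7.71.
RATE_PER_MILLION = Decimal("7.714286")


def price(characters: int) -> Decimal:
    """Price in USD for `characters` input characters, rounded to 5 decimals."""
    exact = RATE_PER_MILLION * characters / Decimal(1_000_000)
    return exact.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)


for chars in (1_000, 10_000, 100_000, 1_000_000, 5_000_000):
    print(f"{chars:>9,} characters  ${price(chars)}")
```

For an authoritative figure, the estimate should be checked against the live price calculator rather than this inferred rate.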

Free tier available
No credit card required

See Chatterbox in action

Real samples, API docs & free $5 credits to start

How it works

Three Steps to Your First API Call

Sign Up & Get $5 Free

Create your account in 30 seconds. No credit card required. We'll add $5 in free credits to your balance.

Pick a Model & Call the API

Choose from available open-source models. One unified endpoint, same auth, same format. Test in Playground or hit the REST API.

Pay Only for What You Use

No monthly minimums, no pricing tiers, no lock-in. You are charged per request at the rates above, and you can top up anytime.

Frequently Asked Questions

Everything you need to know about deAPI pricing

Q: How does deAPI's pay-as-you-go pricing actually work?

Every request is billed dynamically, with the metric chosen to match each task: resolution × steps for images, characters for speech, tokens for embeddings, duration with optional resolution for video, hours for transcription, output characters for OCR, and per-image rates for background removal and upscaling. There are no subscriptions, no monthly minimums, and no hidden fees: you fund a prepaid balance and each successful inference deducts its exact cost. Before any job, you can call the matching /price endpoint to preview the precise cost for the model and parameters you plan to use.

Q: Why is deAPI typically more affordable than running models on traditional cloud GPUs?

Inference is routed through a globally distributed GPU network rather than concentrated in a few hyperscale data centers, which removes most of the infrastructure markup baked into traditional cloud pricing. We also serve highly optimized open-source models, many of them quantized (INT8, FP8, NF4) and distilled, so each request uses fewer GPU seconds without sacrificing output quality. The combined effect can deliver up to 20× lower inference cost for comparable workloads.

Q: Can I really cut my costs in half with playback speed?

Yes, and it's one of the most underused optimizations in voice production. Generating audio at 2× playback speed costs 0.5× the standard rate, while slow speed (0.5× playback) costs 2× the standard rate. A common workflow is to generate drafts at 2× to validate prompts and review timing, then re-render only approved scripts at standard speed, which meaningfully cuts the cost of an iterative production cycle.
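The two quoted data points (2× speed at half the rate, 0.5× speed at double the rate) are consistent with cost scaling as the inverse of playback speed. The sketch below assumes that inverse relationship also holds for other speeds, which the text does not confirm; the function name is hypothetical.

```python
def speed_adjusted_cost(base_cost: float, playback_speed: float) -> float:
    """Scale a standard-speed cost by 1 / playback_speed.

    Assumption: cost is inversely proportional to playback speed. The text
    only confirms the 2x (half cost) and 0.5x (double cost) cases.
    """
    if playback_speed <= 0:
        raise ValueError("playback_speed must be positive")
    return base_cost / playback_speed


standard = 1.00  # hypothetical cost of one script at standard speed, in USD
print(speed_adjusted_cost(standard, 2.0))  # draft at 2x playback
print(speed_adjusted_cost(standard, 0.5))  # slow render at 0.5x playback
```

Under this model, a draft pass at 2× followed by one final render at standard speed costs 1.5 times a single standard render, so the saving comes from avoiding repeated full-price iterations.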

Q: Are voice cloning and synthetic voices safe for commercial use?

Yes. The models are released under commercial-friendly open-source licenses, so generated audio can be used in marketing, products, podcasts, and client work. You can choose your output format (FLAC, MP3, WAV) and sample rate when calling the endpoint. Chatterbox also includes built-in AI watermarking, which helps with downstream traceability and compliance in regulated markets.
