Hiring a professional voiceover artist for a single 10-minute YouTube video runs anywhere from $100 to $500. Need that same video dubbed into five languages for a global audience? Multiply that cost by five—and add weeks of coordination with voice talent across time zones. For indie creators, educators, and small development teams, this bottleneck turns multilingual content into a luxury rather than a strategy.
ElevenLabs has become the go-to platform for solving exactly this problem. In this tutorial I will walk you through the entire platform—from creating your account and choosing the right plan, to cloning your voice, dubbing a video into 29+ languages, and making your first API call. Every step uses real pricing, real settings, and real workflows current as of April 2026.
If you want a broader comparison of ElevenLabs against competitors before diving in, read our companion piece: ElevenLabs 2026: Best AI Voice Generator? (I Tested It).
Step 1: Create Your ElevenLabs Account
Getting started takes less than two minutes.
- Go to ElevenLabs and click Sign Up.
- Register with your email address or sign in with Google. The Google option skips email verification entirely, so it is faster.
- Once logged in, you land on the ElevenLabs dashboard. Click Speech Synthesis in the left-hand menu to access the main workspace.
That is it—you are inside the platform with a free plan already active.
Step 2: Choose the Right Plan
ElevenLabs uses a credit-based system. Every character of text you generate consumes credits, and the exact rate depends on the model you select. Here is the current plan lineup as of early 2026:
| Plan | Monthly Price | Credits / Month | Key Unlocks |
|---|---|---|---|
| Free | $0 | 10,000 | ~10 min TTS, no commercial license |
| Starter | $5 | 30,000 | Commercial license, instant voice cloning, Studio & Dubbing API |
| Creator | $22 | 100,000 | Professional voice cloning, 192 kbps audio |
| Pro | $99 | 500,000 | Expanded API capabilities, higher concurrency |
| Scale | $330 | 2,000,000 | 3 workspace seats, team collaboration |
The free plan provides 10,000 credits per month—roughly 10 minutes of speech using the Multilingual v2 model, or about 20 minutes with the lighter Flash model. It is enough to experiment, but it carries no commercial license. Starter at $5/month is the minimum tier for monetized content. Creator at $22/month is the sweet spot for most serious creators because it unlocks Professional Voice Cloning and higher-quality 192 kbps audio output. Annual billing saves approximately 17% across all paid plans.
Pro tip: The Flash and Turbo models cost roughly 0.5 credits per character, effectively doubling your output for the same credit allocation. Use Flash for drafts and internal reviews, then switch to Multilingual v2 or Eleven v3 for final renders.
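To make the credit math concrete, here is a rough planning sketch. It assumes about 1,000 characters per minute of speech, a figure implied by "10,000 credits is roughly 10 minutes" at 1 credit per character; your real rate will vary with pacing and language. The function name and constant are illustrative, not part of any ElevenLabs tooling.

```python
# Monthly credit allowances copied from the plan table above.
PLAN_CREDITS = {
    "Free": 10_000,
    "Starter": 30_000,
    "Creator": 100_000,
    "Pro": 500_000,
    "Scale": 2_000_000,
}

# Assumption: ~1,000 characters per minute of speech, implied by
# "10,000 credits is roughly 10 minutes" at 1 credit per character.
CHARS_PER_MINUTE = 1_000


def estimated_minutes(credits: int, credits_per_char: float) -> float:
    """Return the approximate minutes of speech a credit balance can buy."""
    characters = credits / credits_per_char
    return characters / CHARS_PER_MINUTE


if __name__ == "__main__":
    for plan, credits in PLAN_CREDITS.items():
        full = estimated_minutes(credits, credits_per_char=1.0)   # Multilingual v2
        flash = estimated_minutes(credits, credits_per_char=0.5)  # Flash / Turbo
        print(f"{plan:8} ~{full:.0f} min (v2), ~{flash:.0f} min (Flash)")
```

Treat the output as a budgeting estimate only; the dashboard's usage counter is the source of truth.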
Step 3: Voice Cloning Walkthrough
Voice cloning is the feature that sets ElevenLabs apart. The platform offers two distinct methods, and understanding the difference before you upgrade will save you from picking the wrong tier.
Instant Voice Cloning (IVC)
Instant Voice Cloning creates a voice replica from a short audio sample in seconds. It does not train a custom AI model—instead, it uses ElevenLabs’ existing training data to approximate your voice. For standard accents, this works remarkably well.
Requirements:
- Audio length: 1–2 minutes of clean audio. Avoid recording more than 3 minutes—longer samples yield little improvement and can sometimes be detrimental to the clone.
- Plan: Available from the Starter plan ($5/month) and above.
- Format: MP3 at 128 kbps or above works fine; higher bitrates do not significantly improve clone quality.
How to create your instant clone:
- In the ElevenLabs dashboard, select Voices in the left sidebar, then click Add a new voice.
- From the modal, select Instant Voice Clone.
- Follow the on-screen instructions to upload or record your audio.
- Under the Voices section, select the Personal tab, then click on your voice clone to begin using it.
Recording tips that actually matter:
- No background noise. Record in the quietest room you have. AC units, fans, and keyboard sounds all degrade clone quality.
- No reverb. Avoid tiled bathrooms or large empty rooms.
- Consistent performance. Keep your tone either animated throughout or subdued throughout. Mixing the two in one sample tends to make the cloned voice sound unstable.
- Microphone distance. Stay roughly 20 cm (7–8 inches) from the microphone with a pop filter between you and the mic.
Audio quality matters far more than audio length. A clean 90-second recording will produce a better clone than a noisy five-minute clip every time.
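If you record to WAV before converting, a quick length check can catch samples that are too short or too long before you upload. The sketch below uses only Python's standard wave module, so it handles PCM WAV files only (not MP3), and the helper names are hypothetical. The thresholds come from the guidance above: 1 to 2 minutes recommended, 3 minutes as a ceiling.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_ivc_sample(duration_s: float) -> str:
    """Classify a sample length against the Instant Voice Cloning guidance."""
    if duration_s < 60:
        return "short: aim for 1-2 minutes of clean audio"
    if duration_s <= 120:
        return "ok: within the recommended 1-2 minutes"
    if duration_s <= 180:
        return "ok: longer than needed, but under the 3-minute ceiling"
    return "long: trim to under 3 minutes"
```

For example, check_ivc_sample(wav_duration_seconds("sample.wav")) reports on a file named sample.wav in the current directory. Length is the only thing this checks; noise and reverb still need your ears.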
Professional Voice Cloning (PVC)
Professional Voice Cloning trains a dedicated model on your voice, producing results that are virtually indistinguishable from the original speaker.
Requirements:
- Audio length: 30 minutes minimum; 2–3 hours recommended for the most accurate clone.
- Training time: Approximately 3–4 weeks as of early 2026.
- Plan: Requires the Creator plan ($22/month) or above.
- Content: Only your own voice may be cloned. ElevenLabs enforces voice verification.
How to create your professional clone:
- In the dashboard, go to Voices → Add a new voice → Professional Voice Clone.
- Upload your audio samples by clicking Upload samples, or record directly into the interface by selecting Record yourself.
- Choose the number of speakers (one), and submit.
- Wait for training to complete (you will receive an email notification).
- Once ready, find your clone under the Personal tab and click Use.
Critical detail: The AI will clone everything it hears—including artifacts, background noise, and room reverb. If you upload low-quality samples, those flaws will appear in every generation. Clean audio is non-negotiable.
If you are new to ElevenLabs, start with Instant Cloning to test the workflow. Once you are producing content regularly and need a voice that genuinely represents you at broadcast quality, invest in Professional Cloning. It becomes a reusable asset for your business.
Step 4: Multilingual Dubbing Tutorial
ElevenLabs’ Dubbing Studio translates audio and video across 29 languages while preserving the emotion, timing, and tone of each speaker. It is one of the fastest ways to expand your audience without re-recording anything.
Supported Languages
The dubbing tool covers English, Spanish, French, German, Japanese, Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, Tamil, Polish, Portuguese, Italian, and Hindi. You can dub between any pair of these languages.
How to Dub a Video Step-by-Step
- Open Dubbing Studio from the left navigation menu in your ElevenLabs dashboard.
- Enter a project name and select the source language and target language.
- Upload your file (MP3, MP4, WAV, or MOV—up to 500 MB and 45 minutes via the UI), or paste a URL from YouTube, TikTok, Vimeo, or X.
- Choose the number of speakers in your video.
- Select video resolution and decide whether to add a watermark. Keeping the watermark reduces credit usage.
- Click Create and wait a few minutes for processing.
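Before uploading through the UI, you can check a file against the limits listed in the steps above. This is a small local sketch with hypothetical names; it assumes 1 MB means 1024 × 1024 bytes and that you already know the clip's length in minutes.

```python
from pathlib import Path

# Limits quoted above for uploads through the web UI.
UI_MAX_BYTES = 500 * 1024 * 1024  # 500 MB, assuming 1 MB = 1024 * 1024 bytes
UI_MAX_MINUTES = 45
UI_FORMATS = {".mp3", ".mp4", ".wav", ".mov"}


def ui_upload_problems(filename: str, size_bytes: int, minutes: float) -> list[str]:
    """Return a list of reasons the file would be rejected (empty if none)."""
    problems = []
    if Path(filename).suffix.lower() not in UI_FORMATS:
        problems.append("unsupported format")
    if size_bytes > UI_MAX_BYTES:
        problems.append("larger than 500 MB")
    if minutes > UI_MAX_MINUTES:
        problems.append("longer than 45 minutes")
    return problems
```

An empty list means the file fits the UI limits as described here; anything else is worth fixing before you spend time on an upload.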
Using Dubbing Studio for Fine Control
If you need to edit transcripts or fine-tune translations:
- Check the Create a Dubbing Studio project box before clicking Create.
- Once generated, open the Studio project.
- Speaker cards show the original transcription and translation. Click inside any card to edit text freely.
- Regenerate individual speech segments until the output sounds right.
- Choose between Fixed Generations (keeps clip duration constant) and Dynamic Generations (adjusts clip length to match text naturally).
You can also swap voices at the clip or track level. The Studio offers clip clones (voice matched per segment), track clones (consistent voice across all clips for a speaker), and access to the full ElevenLabs Voice Library with thousands of pre-built options.
Credit cost: Dubbing consumes approximately 2,000 characters per minute for watermarked output and 3,000 characters per minute without a watermark. A Creator plan or higher is required to dub audio files.
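A quick estimate helps you see whether a dubbing job fits your monthly allowance. The sketch below rests on two assumptions that the figures above do not state outright: that one character is billed as one credit, and that cost scales linearly with the number of target languages. The function name is illustrative.

```python
import math

# Approximate rates quoted above, per minute of source media.
RATE_WATERMARKED = 2_000
RATE_NO_WATERMARK = 3_000


def dubbing_credits(minutes: float, languages: int, watermark: bool) -> int:
    """Estimate credits for dubbing one clip into several target languages.

    Assumes 1 character = 1 credit and linear scaling per language.
    """
    rate = RATE_WATERMARKED if watermark else RATE_NO_WATERMARK
    return math.ceil(minutes * rate * languages)


if __name__ == "__main__":
    # The scenario from the introduction: a 10-minute video, five languages.
    print(dubbing_credits(10, 5, watermark=False))
```

Under these assumptions, the 10-minute, five-language job from the introduction comes to about 150,000 credits without a watermark, which is more than a single month of the Creator plan.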
Step 5: API Usage Basics
The ElevenLabs API opens the door to automation—batch voiceover generation, app integrations, real-time conversational agents, and programmatic dubbing at scale. It is accessible through HTTP requests from any language, with official Python and TypeScript/Node.js SDKs available.
Getting Your API Key
- Log in to ElevenLabs.
- Click Developers in the left sidebar, then select the API Keys tab.
- Click Create API Key. Store it securely—treat it like a password.
The API is included in all plans, even the free plan, with no extra cost beyond your normal credit consumption. The website and the API draw from the same monthly quota.
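Since the key should be treated like a password, it is safer to read it from the environment than to paste it into source files. Here is a minimal helper; the function is hypothetical, and the variable name ELEVENLABS_API_KEY is an assumption you can change to whatever your deployment uses.

```python
import os


def load_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

Failing early with a clear message is easier to debug than an authentication error from a request made with an empty key.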
Your First Text-to-Speech Call
Here is a minimal curl example to generate speech and save it as an MP3:
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM" \
  -H "xi-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to ElevenLabs. This is your first generated voice.",
    "model_id": "eleven_flash_v2_5",
    "voice_settings": {
      "stability": 0.5,
      "similarity_boost": 0.75
    }
  }' \
  --output speech.mp3
Key parameters:
- Voice ID: 21m00Tcm4TlvDq8ikWAM is “Rachel,” the default female voice. Find other voice IDs in the Voice Lab or via the /v1/voices endpoint.
- Model ID: Use eleven_flash_v2_5 for low-latency generation (~75 ms), eleven_multilingual_v2 for polished multilingual narration, or eleven_v3 for the most expressive output across 70+ languages.
- Stability: Lower values (0.3–0.5) produce more expressive, varied speech. Higher values (0.7–1.0) keep output more consistent.
- Similarity Boost: 0.75–0.85 is the recommended range. Pushing to 1.0 can introduce background artifacts from the original training audio.
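The same request can also be built with Python's standard library alone, which is handy when you cannot install extra packages. The sketch below mirrors the curl call (same URL, headers, and JSON body) and rejects settings outside 0.0 to 1.0. That range is an assumption inferred from the values discussed above, and the function names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key, text, voice_id="21m00Tcm4TlvDq8ikWAM",
                      model_id="eleven_flash_v2_5",
                      stability=0.5, similarity_boost=0.75):
    """Build (but do not send) a text-to-speech POST request."""
    # Assumed valid range for both settings: 0.0 to 1.0 inclusive.
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(api_key, text, out_path="speech.mp3"):
    """Send the request and save the returned audio (this consumes credits)."""
    with urllib.request.urlopen(build_tts_request(api_key, text)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

Separating the builder from the network call lets you inspect or log the request without spending credits.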
Using the Python SDK
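import os

from elevenlabs.client import ElevenLabs

# Read the key from the environment instead of hard-coding it
client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

# convert() returns the audio as an iterator of byte chunks
audio = client.text_to_speech.convert(
    text="Hello from the ElevenLabs Python SDK.",
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model_id="eleven_flash_v2_5",
)

# Write the chunks to disk as an MP3 file
with open("speech.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)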
Install the SDK with pip install elevenlabs and store your key in an .env file or as an environment variable.
API Use Cases Worth Exploring
- Batch voiceover generation: Script a pipeline that reads blog posts from a CMS and generates audio versions automatically.
- Real-time conversational agents: Use eleven_flash_v2_5 with WebSocket streaming for chatbots and voice assistants at ~75 ms latency.
- Programmatic dubbing: The /v1/dubbing endpoint accepts files up to 1 GB and 2.5 hours, far beyond the UI limits.
- Dynamic product demos: Generate personalized audio greetings or onboarding narrations using customer names and details.
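For the batch voiceover case, long articles usually need to be split into request-sized pieces first. Here is one way to do that, breaking after sentence-ending punctuation so each chunk reads naturally. The function is a hypothetical helper, and max_chars is yours to choose: the per-request limit depends on the model (the FAQ below cites up to 40,000 characters for newer models).

```python
import re


def split_for_tts(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own text-to-speech request and the resulting audio files joined in order.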
Choosing the Right Model
| Model | Best For | Latency | Languages |
|---|---|---|---|
| eleven_flash_v2_5 | Real-time agents, streaming | ~75 ms | 32 |
| eleven_multilingual_v2 | Polished narration | ~250–300 ms | 29+ |
| eleven_v3 | Expressive dialogue, character work | Moderate | 70+ |
The Eleven v3 model, updated in February 2026, is the current best choice for expressive, natural-sounding output. It handles pauses, breathing, and emotional intonation significantly better than older models. It also supports the new bracketed “stage directions” syntax—write [whispering] or [excited] directly in your text to control emotional delivery without any coding markup.
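If your pipeline picks a model automatically, the comparison above reduces to a small lookup. This sketch encodes the table as data and adds a helper for the bracketed v3 tags; the selection rule (real-time first, then expressive, otherwise narration) is my own simplification, not an official recommendation.

```python
# Data copied from the model table; language counts are the stated minimums.
MODELS = {
    "eleven_flash_v2_5": {"best_for": "real-time agents, streaming", "languages": 32},
    "eleven_multilingual_v2": {"best_for": "polished narration", "languages": 29},
    "eleven_v3": {"best_for": "expressive dialogue, character work", "languages": 70},
}


def pick_model(realtime: bool = False, expressive: bool = False) -> str:
    """Pick a model ID; latency needs win over expressiveness."""
    if realtime:
        return "eleven_flash_v2_5"
    if expressive:
        return "eleven_v3"
    return "eleven_multilingual_v2"


def with_tag(tag: str, line: str) -> str:
    """Prefix a line with a bracketed v3 audio tag such as [whispering]."""
    return f"[{tag}] {line}"
```

For example, with_tag("whispering", "Do not wake the baby.") produces text you would send with the model returned by pick_model(expressive=True).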
Tips for Getting the Best Results
- Start with the Voice Library for rapid testing before investing time in a custom clone.
- Use the v3 audio tags for emotion control: type directions like [softly], [laughing], or [with urgency] directly in your script.
- Monitor your credit usage weekly. If your overages regularly hit 30–50% of the next plan’s price, upgrading is almost always cheaper than staying on a lower tier.
- Roll over unused credits. Paid plans allow rollover for up to two months—unused credits do not accumulate indefinitely.
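The upgrade rule of thumb from the tips above is easy to turn into a check you run against your billing history. This sketch takes your monthly overage spend as an input, because overage rates are not listed in this tutorial; the 30% default threshold is the lower bound of the range mentioned above, and the function name is hypothetical.

```python
def should_upgrade(monthly_overage_usd: float,
                   next_plan_price_usd: float,
                   threshold: float = 0.3) -> bool:
    """Return True when overage spend reaches the given share of the next plan's price."""
    return monthly_overage_usd >= threshold * next_plan_price_usd


if __name__ == "__main__":
    # Example: on Starter, with Creator ($22) as the next plan up.
    print(should_upgrade(8.0, 22.0))
```

Run it over several months rather than one, since the tip only applies when the overage is regular.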
See also: ElevenLabs vs Murf AI vs Play.ht: Best AI Voice Generator in 2026
Frequently Asked Questions
Is ElevenLabs free to use?
Yes. The free plan provides 10,000 credits per month, which translates to roughly 10 minutes of text-to-speech using the Multilingual v2 model. You get access to text-to-speech, speech-to-text, sound effects, voice design, and 3 Studio projects. However, the free plan does not include a commercial license—you must attribute ElevenLabs in any public content and cannot monetize output. For commercial use, you need at minimum the Starter plan at $5/month.
How much audio do I need for voice cloning?
For Instant Voice Cloning, 1–2 minutes of clean audio is recommended. Quality matters more than length—a focused 90-second recording in a quiet room outperforms a noisy five-minute clip. For Professional Voice Cloning, the bare minimum is 30 minutes of audio, but 2–3 hours produces the most accurate results. PVC training takes approximately 3–4 weeks as of early 2026.
Can my cloned voice speak other languages?
Yes. Generated voice clones can automatically speak 32+ languages, even if your original recording was in just one language. The multilingual models detect the language in your text and switch seamlessly while maintaining the acoustic properties of the chosen voice. Be aware that a voice cloned from English samples may carry a slight English accent when speaking other languages.
Is it legal to clone someone’s voice?
You can freely clone your own voice for any purpose. Cloning someone else’s voice requires their explicit consent. Using cloned voices for fraud, impersonation, or misleading content is illegal in most jurisdictions. ElevenLabs enforces voice verification for Professional Voice Cloning and currently only allows you to clone your own voice through PVC.
Does the API cost extra beyond my subscription?
No. The API is included in all plans, including the free plan. There is no extra charge—API usage draws from the same monthly credit pool as the web interface. However, the API supports much larger inputs (up to 40,000 characters per request for newer models) and has different concurrency limits depending on your tier.
Start Building With Your Voice
ElevenLabs has evolved from a text-to-speech tool into a complete audio infrastructure platform covering TTS, speech-to-text, voice cloning, dubbing, sound effects, music generation, and conversational AI agents. Whether you are a solo creator who needs a digital twin to narrate videos while you sleep, or a developer building a real-time voice assistant, the platform covers the full stack.
The fastest path forward: sign up for free, clone your voice in under five minutes with Instant Voice Cloning, and generate your first audio. You will hear the difference immediately.