Pronunciation API

The Dictionary Pronunciation API lets developers generate realistic speech audio from text. It covers single-word pronunciations, full-sentence readings, and custom voice configurations.

💡
Tip: New here? Start with the Quick Start guide to make your first API call in under 2 minutes.

Key Features

  • 100+ Languages — From English and Spanish to Japanese and Arabic
  • 500+ Natural Voices — Male, female, and neutral voice options
  • Multiple Formats — MP3, WAV, OGG, and FLAC output
  • SSML Support — Fine-grained control with Speech Synthesis Markup Language
  • Sub-200ms Latency — Optimized for real-time applications
  • Phonetic Precision — IPA transcription included in responses

Quick Start

Follow these steps to make your first Pronunciation API call.

Step 1: Get Your API Key

Sign up at dashboard.dictionary.com and generate an API key from your dashboard under Settings → API Keys.

Step 2: Make Your First Request

Here's a simple curl example that pronounces the word "serendipity" using the Pronounce a Word endpoint:

Bash / curl
curl "https://api.dictionary.com/v2/pronunciation/word/serendipity?voice=en-US-AriaNeural&format=mp3" \
  -H "Authorization: Bearer YOUR_API_KEY"

Step 3: Handle the Response

The API returns an audio URL along with phonetic transcription:

200 OK
JSON Response
{
  "word": "serendipity",
  "phonetic": "ˌser.ənˈdɪp.ə.t̬i",
  "audio_url": "https://cdn.dictionary.com/audio/pron_8f3a2b1c.mp3",
  "format": "mp3",
  "voice": "en-US-AriaNeural",
  "duration_ms": 1840,
  "cache_key": "pron_8f3a2b1c"
}
That's it! You can now stream or download the audio from the audio_url. Cached audio URLs remain valid for 24 hours.
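
As a rough illustration of handling this response, the Python sketch below reads the documented fields and works out when a cached URL should be treated as stale. The helper names `summarize` and `audio_expiry` are hypothetical, and counting the 24-hour window from the moment the response arrived is an assumption rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone

# Validity window for cached audio URLs, as stated above.
CACHE_TTL = timedelta(hours=24)


def audio_expiry(received_at: datetime) -> datetime:
    """Return the time after which the audio URL should be treated as stale.

    Assumption: the 24-hour window starts when the response was received.
    """
    return received_at + CACHE_TTL


def summarize(payload: dict) -> str:
    """Build a one-line summary from the documented response fields."""
    seconds = payload["duration_ms"] / 1000
    return f'{payload["word"]} /{payload["phonetic"]}/ ({seconds:.2f}s, {payload["format"]})'


sample = {
    "word": "serendipity",
    "phonetic": "ˌser.ənˈdɪp.ə.t̬i",
    "audio_url": "https://cdn.dictionary.com/audio/pron_8f3a2b1c.mp3",
    "format": "mp3",
    "duration_ms": 1840,
}
received = datetime(2025, 12, 1, 9, 0, tzinfo=timezone.utc)
print(summarize(sample))
print(audio_expiry(received).isoformat())
```

Storing the expiry next to the URL lets a client decide locally whether to reuse the link or request a fresh one.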

Authentication

All API requests require authentication using a Bearer token. Include your API key in the Authorization header.

HTTP Header
Authorization: Bearer sk_live_your_api_key_here

Key Types

Prefix     Type         Usage
sk_live_   Production   Use in your deployed applications
sk_test_   Sandbox      Use during development and testing
sk_pub_    Public       Client-side only (rate limited to 10 req/min)
⚠️
Security: Never expose sk_live_ or sk_test_ keys in client-side code. Use sk_pub_ keys or a backend proxy for browser-based applications.
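
One way to enforce this rule in a build or deploy script is to check the key prefix before a key is embedded anywhere client-side. The sketch below is illustrative only: the function names are hypothetical, and it relies solely on the three prefixes listed in the Key Types table.

```python
# Prefix-to-type mapping taken from the Key Types table.
KEY_TYPES = {
    "sk_live_": "production",
    "sk_test_": "sandbox",
    "sk_pub_": "public",
}


def key_type(api_key: str) -> str:
    """Return the key type implied by the prefix, or raise on unknown prefixes."""
    for prefix, kind in KEY_TYPES.items():
        if api_key.startswith(prefix):
            return kind
    raise ValueError("unrecognized API key prefix")


def safe_for_browser(api_key: str) -> bool:
    """Only public keys should ever be shipped in client-side code."""
    return key_type(api_key) == "public"


print(safe_for_browser("sk_pub_example"))
print(safe_for_browser("sk_live_example"))
```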

Base URL

All API requests are made to the following base URL:

Base URL https://api.dictionary.com/v2
ℹ️
Endpoint paths in this reference are shown with the /v2 prefix, so append them to https://api.dictionary.com. The API is HTTPS-only; plain HTTP requests are rejected.

Rate Limits

Rate limits depend on your subscription tier. All limits are calculated per API key.

Plan         Requests / Minute   Requests / Day   Max Audio Length
Free         30                  1,000            10 seconds
Pro          300                 50,000           60 seconds
Business     2,000               500,000          300 seconds
Enterprise   Custom              Unlimited        Custom

Rate Limit Headers

Every response includes rate limit information in headers:

Header                  Description
X-RateLimit-Limit       Maximum requests per window
X-RateLimit-Remaining   Remaining requests in current window
X-RateLimit-Reset       Unix timestamp when the window resets
Retry-After             Seconds to wait when rate limited (429)
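
A client can use these headers to pace itself. The following Python sketch shows one possible approach; the function name `seconds_to_wait` is hypothetical, and it assumes the headers arrive in a plain dict with the exact header names shown above and string values.

```python
def seconds_to_wait(status: int, headers: dict, now: float) -> float:
    """Return how long to pause before sending the next request.

    `headers` maps header names to string values; `now` is a Unix timestamp.
    """
    if status == 429 and "Retry-After" in headers:
        # The server states exactly how long to back off.
        return float(headers["Retry-After"])
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        # Requests remain in the current window, so no pause is needed.
        return 0.0
    # Window exhausted: wait until the reset timestamp (never negative).
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)


print(seconds_to_wait(429, {"Retry-After": "12"}, now=1000.0))
```

Passing `now` explicitly instead of calling `time.time()` inside the function keeps the logic easy to test.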

Error Handling

The API uses standard HTTP status codes and returns structured error responses.

Code   Status                Description
400    Bad Request           Invalid parameters or malformed request body
401    Unauthorized          Missing or invalid API key
403    Forbidden             API key revoked or insufficient permissions
404    Not Found             Requested voice or language not available
429    Too Many Requests     Rate limit exceeded
500    Internal Error        Server-side error; retry with exponential backoff
503    Service Unavailable   Temporary maintenance or overload

Error Response Format

JSON Error Response
{
  "error": {
    "code": "invalid_voice",
    "message": "Voice 'de-DE-MaxxNeural' is not available for the requested language",
    "status": 400,
    "details": [
      {
        "field": "voice",
        "value": "de-DE-MaxxNeural",
        "reason": "Voice does not support language 'fr-FR'"
      }
    ]
  }
}
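
The error envelope above maps naturally onto an exception type. The sketch below is one possible client-side wrapper, not part of any official SDK: the class name `PronunciationAPIError` and the `backoff_delay` helper are hypothetical, and treating 429 and 503 as retryable alongside 500 is an assumption.

```python
import random

# Assumption: rate-limit and availability errors are retried like 500s.
RETRYABLE = {429, 500, 503}


class PronunciationAPIError(Exception):
    """Hypothetical exception wrapping the documented error envelope."""

    def __init__(self, body: dict):
        err = body["error"]
        self.code = err["code"]
        self.status = err["status"]
        self.details = err.get("details", [])
        super().__init__(f'{self.status} {self.code}: {err["message"]}')

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter; `attempt` starts at 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Jitter spreads retries from many clients over time so they do not all hit the server at the same instant after an outage.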

Pronounce a Word

Generate pronunciation audio for a single word with phonetic transcription.

GET /v2/pronunciation/word/{word}

Path Parameters

Parameter   Type     Required   Description
word        string   Required   The word to pronounce (1-50 characters)

Query Parameters

Parameter          Type      Required   Default   Description
lang               string    Optional   auto      Language code (e.g., en-US, fr-FR, ja-JP)
voice              string    Optional   default   Specific voice ID. See List Available Voices
format             string    Optional   mp3       Audio format: mp3, wav, ogg, flac
speed              number    Optional   1.0       Speech rate from 0.5 (slow) to 2.0 (fast)
pitch              number    Optional   0         Pitch adjustment from -50 (low) to +50 (high)
include_phonetic   boolean   Optional   true      Include IPA phonetic transcription in response
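
To show how the path and query parameters fit together, here is a small Python sketch that validates the documented ranges and builds the request URL. The function name `word_url` is hypothetical, and validating on the client is a design choice rather than an API requirement.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.dictionary.com/v2"
FORMATS = {"mp3", "wav", "ogg", "flac"}


def word_url(word: str, *, lang: str = "auto", fmt: str = "mp3",
             speed: float = 1.0, pitch: float = 0) -> str:
    """Build the GET URL for a single-word pronunciation request."""
    if not 1 <= len(word) <= 50:
        raise ValueError("word must be 1-50 characters")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if not -50 <= pitch <= 50:
        raise ValueError("pitch must be between -50 and 50")
    query = urlencode({"lang": lang, "format": fmt, "speed": speed, "pitch": pitch})
    # Percent-encode the word so non-ASCII input and slashes are safe in the path.
    return f"{BASE_URL}/pronunciation/word/{quote(word, safe='')}?{query}"


print(word_url("serendipity", lang="en-US", speed=0.8))
```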

Response Example

200 OK
JSON
{
  "word": "serendipity",
  "phonetic": "ˌser.ənˈdɪp.ə.t̬i",
  "audio_url": "https://cdn.dictionary.com/audio/pron_8f3a2b1c.mp3",
  "format": "mp3",
  "voice": "en-US-AriaNeural",
  "duration_ms": 1840,
  "cache_key": "pron_8f3a2b1c",
  "language": "en-US",
  "syllables": 5,
  "stress_pattern": "00100"
}

Pronounce Text

Generate speech audio for arbitrary text, sentences, or paragraphs.

POST /v2/pronunciation/text

Request Body

Field           Type     Required   Description
text            string   Required   Text to pronounce (1-5000 characters). Plain text or SSML.
language        string   Required   BCP-47 language tag (e.g., en-US)
voice           string   Optional   Specific voice ID
output_format   string   Optional   mp3, wav, ogg, flac (default: mp3)
speed           number   Optional   Speech rate (0.5–2.0, default: 1.0)
pitch           number   Optional   Pitch adjustment (-50 to +50, default: 0)
volume          number   Optional   Volume level (0.0–1.0, default: 1.0)
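
Because the text field is capped at 5000 characters, longer plain-text input has to be split across several requests. The sketch below shows one possible splitting strategy that prefers sentence boundaries; the function name `split_text` is hypothetical, and the approach is not suitable for SSML input, where a naive split could cut through markup.

```python
import re

MAX_CHARS = 5000  # documented limit for the text field


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", limit=10))
```

Each returned chunk can then be sent as the text value of its own request.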

Code Examples

cURL
curl -X POST https://api.dictionary.com/v2/pronunciation/text \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The quick brown fox jumps over the lazy dog.",
    "language": "en-US",
    "voice": "en-US-GuyNeural",
    "output_format": "mp3",
    "speed": 0.9
  }'
Python
import requests

response = requests.post(
    "https://api.dictionary.com/v2/pronunciation/text",
    headers={
        "Authorization": "Bearer sk_live_your_api_key",
        "Content-Type": "application/json"
    },
    json={
        "text": "The quick brown fox jumps over the lazy dog.",
        "language": "en-US",
        "voice": "en-US-GuyNeural",
        "speed": 0.9
    }
)

# json() is a method on the response object, so it must be called
print(response.json()["audio_url"])
Node.js
const axios = require('axios');

// CommonJS modules do not allow top-level await, so wrap the call in an async function
async function main() {
  const response = await axios.post(
    'https://api.dictionary.com/v2/pronunciation/text',
    {
      text: 'The quick brown fox jumps over the lazy dog.',
      language: 'en-US',
      voice: 'en-US-GuyNeural',
      speed: 0.9
    },
    {
      headers: {
        Authorization: 'Bearer sk_live_your_api_key'
      }
    }
  );

  console.log(response.data.audio_url);
}

main().catch(console.error);
JavaScript (Fetch)
const response = await fetch(
  'https://api.dictionary.com/v2/pronunciation/text',
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer sk_live_your_api_key`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: 'The quick brown fox jumps over the lazy dog.',
      language: 'en-US',
      voice: 'en-US-GuyNeural',
      speed: 0.9
    })
  }
);

const data = await response.json();
console.log(data.audio_url);

Response

200 OK
JSON
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "audio_url": "https://cdn.dictionary.com/audio/pron_a1b2c3d4.mp3",
  "format": "mp3",
  "voice": "en-US-GuyNeural",
  "language": "en-US",
  "duration_ms": 3200,
  "character_count": 44,
  "cache_key": "pron_a1b2c3d4"
}

Phonetic Lookup

Retrieve IPA phonetic transcription without generating audio.

GET /v2/phonetics/{word}
Request
curl "https://api.dictionary.com/v2/phonetics/serendipity?lang=en-US" \
  -H "Authorization: Bearer sk_live_your_api_key"
200 OK
Response
{
  "word": "serendipity",
  "language": "en-US",
  "ipa": "ˌser.ənˈdɪp.ə.t̬i",
  "arpabet": "S EH0 R AH0 N D IH1 P AH0 T IY0",
  "syllables": [
    { "text": "ser", "stress": 0 },
    { "text": "en", "stress": 0 },
    { "text": "dip", "stress": 1 },
    { "text": "i", "stress": 0 },
    { "text": "ty", "stress": 0 }
  ],
  "stress_pattern": "00100",
  "primary_stress": 2
}
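
The syllables array carries enough information to rebuild the other stress fields, which can be handy for display or for sanity-checking a response. The Python sketch below uses hypothetical helper names and assumes that primary_stress is a zero-based syllable index, as the sample response suggests.

```python
def stress_pattern(syllables: list[dict]) -> str:
    """Join each syllable's stress value into a pattern string such as "00100"."""
    return "".join(str(s["stress"]) for s in syllables)


def primary_stress_index(syllables: list[dict]) -> int:
    """Return the zero-based index of the first syllable with stress 1."""
    for index, syllable in enumerate(syllables):
        if syllable["stress"] == 1:
            return index
    raise ValueError("no primary stress marked")


def mark_stress(syllables: list[dict]) -> str:
    """Render the word with the stressed syllable upper-cased."""
    return "-".join(
        s["text"].upper() if s["stress"] == 1 else s["text"] for s in syllables
    )


syllables = [
    {"text": "ser", "stress": 0},
    {"text": "en", "stress": 0},
    {"text": "dip", "stress": 1},
    {"text": "i", "stress": 0},
    {"text": "ty", "stress": 0},
]
print(mark_stress(syllables))
```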

Compare Pronunciation

Upload an audio recording and compare it against the correct pronunciation. Returns a similarity score and feedback.

POST /v2/pronunciation/compare
🎙️
Audio Requirements: Upload WAV or MP3 files. Maximum 30 seconds. Recommended sample rate: 16kHz, mono, 16-bit PCM for WAV.

Request Body (multipart/form-data)

Field      Type     Required   Description
audio      file     Required   User's audio recording (WAV/MP3, max 30s)
word       string   Required   The word that was being pronounced
language   string   Required   BCP-47 language tag
cURL
curl -X POST https://api.dictionary.com/v2/pronunciation/compare \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -F "audio=@recording.wav" \
  -F "word=beautiful" \
  -F "language=en-US"
200 OK
Response
{
  "word": "beautiful",
  "overall_score": 0.87,
  "ratings": {
    "accuracy": 0.89,
    "fluency": 0.92,
    "completeness": 1.0,
    "pronunciation": 0.83
  },
  "feedback": [
    {
      "phoneme": "juː",
      "score": 0.71,
      "suggestion": "The 'yu' sound needs more lip rounding"
    }
  ],
  "reference_audio": "https://cdn.dictionary.com/audio/ref_b3e4f5a2.mp3"
}
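
A learner-facing app will usually want to turn this response into a short list of things to practice. The sketch below is one way to do that in Python; the helper names are hypothetical, the 0.8 threshold is an arbitrary choice, and it assumes all scores are on a 0 to 1 scale as in the sample.

```python
def weak_phonemes(result: dict, threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return (phoneme, suggestion) pairs scoring below `threshold`, worst first."""
    flagged = [f for f in result.get("feedback", []) if f["score"] < threshold]
    flagged.sort(key=lambda f: f["score"])
    return [(f["phoneme"], f["suggestion"]) for f in flagged]


def lowest_rating(result: dict) -> str:
    """Name the rating category with the lowest score."""
    ratings = result["ratings"]
    return min(ratings, key=ratings.get)


result = {
    "word": "beautiful",
    "overall_score": 0.87,
    "ratings": {"accuracy": 0.89, "fluency": 0.92,
                "completeness": 1.0, "pronunciation": 0.83},
    "feedback": [
        {"phoneme": "juː", "score": 0.71,
         "suggestion": "The 'yu' sound needs more lip rounding"},
    ],
}
print(lowest_rating(result))
print(weak_phonemes(result))
```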

List Available Voices

Retrieve all available voices for a specific language or across all languages.

GET /v2/voices

Query Parameters

Parameter   Type      Description
language    string    Filter by language code (e.g., en-US)
gender      string    Filter by gender: male, female, neutral
style       string    Filter by style: natural, formal, casual, newscast
limit       integer   Max results per page (default: 50, max: 200)
Request
curl "https://api.dictionary.com/v2/voices?language=en-US&gender=female" \
  -H "Authorization: Bearer sk_live_your_api_key"
200 OK
Response
{
  "voices": [
    {
      "voice_id": "en-US-AriaNeural",
      "name": "Aria",
      "language": "en-US",
      "gender": "female",
      "style": "natural",
      "sample_url": "https://cdn.dictionary.com/voice-samples/en-US-AriaNeural.mp3",
      "locales": ["en-US"]
    },
    {
      "voice_id": "en-US-JennyNeural",
      "name": "Jenny",
      "language": "en-US",
      "gender": "female",
      "style": "newscast",
      "sample_url": "https://cdn.dictionary.com/voice-samples/en-US-JennyNeural.mp3",
      "locales": ["en-US"]
    }
  ],
  "total": 24,
  "page": 1,
  "per_page": 50
}
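
As an illustration of working with this endpoint, the Python sketch below builds a filtered request URL from the documented query parameters and picks a voice out of the response by style. Both helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

VOICES_URL = "https://api.dictionary.com/v2/voices"
ALLOWED_FILTERS = {"language", "gender", "style", "limit"}


def voices_url(**filters) -> str:
    """Build the list-voices URL, dropping filters whose value is None."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{VOICES_URL}?{query}" if query else VOICES_URL


def pick_voice(response: dict, style: str) -> Optional[str]:
    """Return the first voice_id with the requested style, or None."""
    for voice in response["voices"]:
        if voice["style"] == style:
            return voice["voice_id"]
    return None


print(voices_url(language="en-US", gender="female"))
```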

Custom Pronunciation

Create and manage custom pronunciation rules for proper nouns, brand names, and domain-specific terminology.

POST /v2/custom-pronunciations
Request Body
{
  "name": "Brand Names Dictionary",
  "entries": [
    {
      "text": "Kodak",
      "phoneme": "ˈkoʊ.dæk",
      "language": "en-US"
    },
    {
      "text": "Nissan",
      "phoneme": "ˈniː.sən",
      "language": "en-US"
    }
  ]
}
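
If pronunciations are kept in a simple mapping, the request body can be generated rather than written by hand. The sketch below assumes a single language per dictionary for brevity; the function name `dictionary_payload` is hypothetical, and the "Acme" entry is an illustrative placeholder.

```python
def dictionary_payload(name: str, pronunciations: dict[str, str], language: str) -> dict:
    """Turn a {text: IPA} mapping into the request body shape shown above."""
    entries = []
    for text, phoneme in pronunciations.items():
        if not text.strip() or not phoneme.strip():
            raise ValueError("text and phoneme must be non-empty")
        entries.append({"text": text, "phoneme": phoneme, "language": language})
    return {"name": name, "entries": entries}


payload = dictionary_payload(
    "Brand Names Dictionary",
    {"Nissan": "ˈniː.sən", "Acme": "ˈæk.mi"},  # "Acme" is an illustrative entry
    language="en-US",
)
print(payload["entries"][0])
```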

Using Custom Dictionaries

Reference your custom dictionary in pronunciation requests using the custom_dict_id parameter:

Request
curl -X POST https://api.dictionary.com/v2/pronunciation/text \
  -H "Authorization: Bearer sk_live_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Visit Kodak for your camera needs.",
    "language": "en-US",
    "custom_dict_id": "cd_abc123"
  }'

Batch Pronounce

Generate pronunciations for up to 100 words in a single request, which reduces API call overhead and cost.

POST /v2/pronunciation/batch
💡
Batch Pricing: Each word in a batch counts as 0.5 API calls, so a batch costs half as much as sending the same words individually.
Request Body
{
  "words": [
    { "text": "serendipity", "language": "en-US" },
    { "text": "ephemeral", "language": "en-US" },
    { "text": "ubiquitous", "language": "en-US" }
  ],
  "format": "mp3",
  "voice": "en-US-AriaNeural"
}
200 OK
Response
{
  "results": [
    {
      "word": "serendipity",
      "audio_url": "https://cdn.dictionary.com/audio/pron_8f3a2b1c.mp3",
      "phonetic": "ˌser.ənˈdɪp.ə.t̬i",
      "status": "success"
    },
    {
      "word": "ephemeral",
      "audio_url": "https://cdn.dictionary.com/audio/pron_2d4e6f8a.mp3",
      "phonetic": "əˈfem.ər.əl",
      "status": "success"
    },
    {
      "word": "ubiquitous",
      "audio_url": "https://cdn.dictionary.com/audio/pron_9c1b3d5e.mp3",
      "phonetic": "juːˈbɪk.wə.t̬əs",
      "status": "success"
    }
  ],
  "batch_id": "batch_x7y8z9",
  "credits_used": 1.5
}
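
Word lists longer than 100 entries need to be spread over several batch requests. The Python sketch below shows one way to chunk a list, estimate the cost at 0.5 API calls per word, and collect failures from a response. The helper names are hypothetical, and since only the "success" status appears in this reference, treating any other status value as a failure is an assumption.

```python
MAX_BATCH = 100         # documented per-request word limit
CREDITS_PER_WORD = 0.5  # documented batch pricing


def batches(words: list[str], language: str, size: int = MAX_BATCH) -> list[dict]:
    """Split words into request bodies of at most `size` entries each."""
    return [
        {"words": [{"text": w, "language": language} for w in words[i:i + size]]}
        for i in range(0, len(words), size)
    ]


def batch_credits(word_count: int) -> float:
    """Estimate the API calls charged for `word_count` batched words."""
    return word_count * CREDITS_PER_WORD


def failed_words(response: dict) -> list[str]:
    """Collect words whose per-item status is not "success"."""
    return [r["word"] for r in response["results"] if r["status"] != "success"]


bodies = batches([f"word{i}" for i in range(250)], "en-US")
print(len(bodies), batch_credits(250))
```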

SDKs & Libraries

We provide official SDKs for popular programming languages to make integration seamless.

Language     Package                                  Install Command                                 Repository
Python       dictionary-pronunciation                 pip install dictionary-pronunciation            GitHub →
JavaScript   @dictionary/pronunciation                npm install @dictionary/pronunciation           GitHub →
Java         com.dictionary:pronunciation-sdk         Maven / Gradle                                  GitHub →
Swift        DictionaryPronunciation                  Swift Package Manager                           GitHub →
Kotlin       com.dictionary:pronunciation-kotlin      Gradle                                          GitHub →
Go           github.com/dictionary/pronunciation-go   go get github.com/dictionary/pronunciation-go   GitHub →
Ruby         dictionary-pronunciation                 gem install dictionary-pronunciation            GitHub →
PHP          dictionary/pronunciation-php             composer require dictionary/pronunciation-php   GitHub →

Postman Collection

Import our ready-to-use Postman collection to test all Pronunciation API endpoints interactively.

Import
# Direct URL for Postman import
https://api.dictionary.com/openapi/postman-collection.json

# Or use the OpenAPI 3.0 spec
https://api.dictionary.com/openapi/spec.json
📦
The Postman collection includes pre-configured environment variables. Set your API_KEY variable after importing.

Changelog

v2.4  Latest — December 2025
  • Added pitch and volume parameters to /pronunciation/text
  • New batch endpoint: POST /pronunciation/batch (up to 100 words)
  • 15 new voices for Japanese (ja-JP) and Korean (ko-KR)
  • SSML support in text pronunciation endpoint
v2.3 — October 2025
  • Custom pronunciation dictionary support
  • Phonetic lookup now returns syllable breakdowns
  • Improved audio quality with neural voice models
  • FLAC output format added
v2.2 — August 2025
  • Pronunciation comparison endpoint with similarity scoring
  • Added stress_pattern to phonetic responses
  • Rate limit headers now included in all responses
v2.1 — June 2025
  • Initial v2 release with breaking changes from v1
  • New authentication model (Bearer tokens)
  • 500+ voices across 100+ languages
  • Sub-200ms latency improvements

Support

Need help with the Pronunciation API? Here are the best ways to get support:

Channel           Response Time      Best For
Developer Forum   Community-driven   General questions, code examples
Email Support     < 4 hours          Technical issues, billing
Slack Community   Real-time          Quick questions, networking
Enterprise SLA    < 1 hour           Critical issues (Enterprise plan)
🚀
Ready to start building? Get your free API key and make your first call in under 2 minutes.