POST /api/speak

Synthesize voice.

Caution

If you set a callback-url, we’ll send data to it using a POST request in JSON format. Use the embed URL (share.embed_url) from the callback response. Make sure result.speak_url in the /api/speak response matches the speak_url in the callback data.
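The check above could be sketched as follows. This is a minimal sketch, not the official handler: it assumes the callback body is a JSON object with speak_url at the top level and the embed URL nested under share (the full callback schema is not shown on this page). The function name embed_url_from_callback is hypothetical.

```python
import json


def embed_url_from_callback(speak_response: dict, callback_body: bytes) -> str:
    """Return share.embed_url if the callback belongs to our /api/speak request."""
    callback = json.loads(callback_body)
    expected = speak_response["result"]["speak_url"]
    # Assumed layout: speak_url sits at the top level of the callback JSON.
    if callback.get("speak_url") != expected:
        raise ValueError("callback speak_url does not match the /api/speak response")
    # Assumed layout: the embed URL is nested under "share".
    return callback["share"]["embed_url"]
```

Rejecting mismatched callbacks early keeps a handler that serves several concurrent requests from attaching an embed URL to the wrong speak.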

Request

Headers

Parameters

  • None

Body as JSON Object

Key

Type

Required

Description

text

string

Yes

The text to synthesize into speech.

tts_mode

string

Yes

Speech generation mode: either ‘actor’ or ‘audio_file’. Default is ‘actor’.

actor_id

string

Yes

The character ID, which is required only when tts_mode is ‘actor’. Retrieve your character from the Actor API.

speak_resource_id

string

Yes

The speak resource ID, which is required only when tts_mode is ‘audio_file’. Retrieve speak_resource_id from the Resource API.

lang

string

Yes

Language code of the text. In ‘actor’ mode, the available language codes are [‘en-us’, ‘ko-kr’, ‘ja-jp’, ‘es-es’, ‘zh-cn’, ‘auto’]. Use ‘auto’ for automatic language detection from the text.
In ‘audio_file’ mode, the available language codes are [‘en-us’, ‘ko-kr’, ‘ja-jp’, ‘zh-cn’, ‘fr-fr’, ‘de-de’, ‘es-es’, ‘pt-pt’, ‘it-it’, ‘ru-ru’, ‘ta-in’, ‘ar-sa’, ‘bg-bg’, ‘uk-ua’, ‘hr-hr’, ‘cs-cz’, ‘pl-pl’, ‘sk-sk’, ‘fi-fi’, ‘ro-ro’, ‘el-gr’, ‘nl-nl’, ‘sv-se’, ‘tl-ph’, ‘id-id’, ‘auto’].

xapi_hd

bool

No

Specify the sample rate. If set to true, you’ll get high-quality audio (44.1 kHz). Default is false (16 kHz).

xapi_audio_format

string

No

Specify audio format. If set to “mp3”, you’ll get an mp3 format file. Default is “wav” format.

model_version

string

No

Specify a model version name or alias. Refer to character version API. Use “latest” for the latest model.

emotion_tone_preset

string

No

Specify an emotion. Retrieve available emotions for your character from Actor API with actor id.

emotion_prompt

string

No

Specify a custom emotion in natural language (Korean or English). Used only when emotion_tone_preset is set to emotion-prompt. Make sure emotion_prompt is activated for your selected character.

volume

int

No

Specify the audio volume as a percentage. 50 halves the volume (0.5x) and 200 doubles it (2x). Default is 100.

speed_x

float

No

Control speaking speed. Values: 1.5 (slow), 1 (normal), or 0.5 (fast). Default is 1.

tempo

float

No

Control voice playing speed. Range: 0.5 (0.5x slow) to 2.0 (2x fast). Default is 1.0.

pitch

int

No

Control voice pitch. Range: -12 to 12. A value of 1 corresponds to one semitone. For example, 4 means 4 semitones higher than the original voice. Default is 0.

max_seconds

float

No

Limit maximum length of synthesized speech (1 to 60 seconds). Default is 30. Refer to limit length and fixed duration documents.

duration

float

No

Set a fixed length for the synthesized speech (1 to 60 seconds). Leave tempo and speed_x at their default values when using this parameter. Refer to fixed duration.

last_pitch

int

No

Control pitch of end of sentence. Values: -2 (lowest), -1 (low), 0 (normal), 1 (high), or 2 (highest).
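The rules in the table above can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: validate_speak_body is a hypothetical helper, it treats a missing tts_mode as the documented default ‘actor’, and it only covers the fields whose ranges are stated in the table.

```python
def validate_speak_body(body: dict) -> list:
    """Return a list of problems found in a /api/speak request body."""
    problems = []
    # Assumption: an omitted tts_mode falls back to the documented default.
    mode = body.get("tts_mode", "actor")
    # The ID field that must be present depends on the mode.
    id_field = {"actor": "actor_id", "audio_file": "speak_resource_id"}.get(mode)
    if id_field is None:
        problems.append("tts_mode must be 'actor' or 'audio_file'")
    required = ["text", "lang"] + ([id_field] if id_field else [])
    for field in required:
        if not body.get(field):
            problems.append(f"missing {field}")
    # Inclusive numeric ranges taken from the parameter descriptions.
    ranges = {
        "tempo": (0.5, 2.0),
        "pitch": (-12, 12),
        "max_seconds": (1, 60),
        "duration": (1, 60),
        "last_pitch": (-2, 2),
    }
    for field, (low, high) in ranges.items():
        if field in body and not low <= body[field] <= high:
            problems.append(f"{field} must be between {low} and {high}")
    return problems
```

An empty list means no problem was found by these checks; the server may still reject the request for other reasons, such as a disallowed actor_id.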

Example with cURL
  # Replace ${emotion} and ${24-letters-your_actor_id} with your own values.
  curl --request POST \
      --url https://typecast.ai/api/speak \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer $API_TOKEN" \
      --data '{
          "text": "My name is Juncheol.",
          "lang": "auto",
          "model_version": "latest",
          "emotion_tone_preset": "${emotion}",
          "actor_id": "${24-letters-your_actor_id}",
          "xapi_hd": true,
          "xapi_audio_format": "mp3",
          "max_seconds": 20,
          "volume": 100,
          "speed_x": 1,
          "tempo": 1,
          "pitch": 0
      }'

Check out an example of using a custom emotion with emotion_prompt in the following guide: Advanced speech synthesis: Apply custom emotion in your script
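For reference, roughly the same request can be built with the Python standard library. This is a sketch under assumptions: build_speak_request is a hypothetical helper, the optional fields are a subset of those in the cURL example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://typecast.ai/api/speak"


def build_speak_request(api_token: str, actor_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/speak."""
    body = {
        "text": text,
        "lang": "auto",
        "model_version": "latest",
        "actor_id": actor_id,
        "xapi_hd": True,
        "xapi_audio_format": "mp3",
        "max_seconds": 20,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the JSON body of the response.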

Response

Status Code

Status Code

Description

401

Authorization Error

400

JSON object representing an error. See how the error looks.

429

JSON object representing an error indicating that the request limit has been exceeded. See how the error looks. error_code is app/too-many-requests.

200

JSON object containing the result.
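A client might branch on these status codes along the following lines. This is only a sketch: the action names and the exponential backoff for 429 are assumptions for illustration, since this page does not prescribe a retry policy.

```python
def next_action(status: int) -> str:
    """Map a /api/speak HTTP status code to a client-side action."""
    if status == 200:
        return "use_result"
    if status == 401:
        return "check_token"
    if status == 400:
        return "fix_request"
    if status == 429:
        return "retry_later"
    return "unexpected"


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Assumed policy: double the wait on each attempt, up to a cap."""
    return min(cap, base * 2 ** attempt)
```

Only 429 is treated as retryable here; 400 and 401 indicate a problem with the request or the token that a retry will not fix.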

result consists of the following keys:

Key

Description

speak_v2_url

URL to view detailed information about the created speak.

speak_url

(Deprecated) URL to view details about the speak.

Example
{
  "result": {
    "speak_v2_url": "https://typecast.ai/api/speak/v2/{your-speak-id}",
    "speak_url": "https://typecast.ai/api/speak/{your-speak-id}"
  }
}
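Because speak_url is deprecated, a client would normally read speak_v2_url first. A minimal sketch, with speak_status_url as a hypothetical helper name:

```python
def speak_status_url(response: dict) -> str:
    """Pick the URL for looking up the created speak from a 200 response."""
    result = response["result"]
    # Prefer the v2 URL; fall back to the deprecated one only if v2 is absent.
    return result.get("speak_v2_url") or result["speak_url"]
```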

error_codes in 400 response

Error Code

Description

app/param/not-enough

Some required fields are not included in the request body.

app/invalid/text

The length of text exceeds 350, or text consists entirely of unpronounceable letters.

app/invalid/actor_id

actor_id is incorrect or disallowed.

Example
{
  "message": {
    "msg": "need params as actor_id",
    "error_code": "app/param/not-enough"
  }
}
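An error body of this shape can be turned into a readable message as sketched below. The describe_error helper and the short hint strings are illustrative assumptions; only the three error codes and the message.msg / message.error_code layout come from this page.

```python
# Short hints paraphrasing the documented 400 error codes.
KNOWN_400_ERRORS = {
    "app/param/not-enough": "required fields missing from the request body",
    "app/invalid/text": "text too long or entirely unpronounceable",
    "app/invalid/actor_id": "actor_id incorrect or disallowed",
}


def describe_error(body: dict) -> str:
    """Format an error response as 'code: server message (hint)'."""
    message = body.get("message", {})
    code = message.get("error_code", "unknown")
    msg = message.get("msg", "")
    hint = KNOWN_400_ERRORS.get(code, "no documented description")
    return f"{code}: {msg} ({hint})"
```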