POST /api/text-to-speech

Caution

This feature is not available to all customers. Please contact our sales team for eligibility details.

Synthesizes speech from the given text and returns the audio.

Request

Headers

Parameters

  • None

Body as JSON Object

| Key | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The text to synthesize into speech. |
| tts_mode | string | Yes | The mode used to generate speech: ‘actor’ or ‘audio_file’. Default is ‘actor’. |
| actor_id | string | Yes | The character ID. Required only when tts_mode is ‘actor’. Retrieve your character from the Actor API. |
| speak_resource_id | string | Yes | The speak resource ID. Required only when tts_mode is ‘audio_file’. Retrieve speak_resource_id from the Resource API. |
| lang | string | Yes | Language code of the text. Use ‘auto’ to detect the language automatically from the text. In ‘actor’ mode, the available codes are [‘en-us’, ‘ko-kr’, ‘ja-jp’, ‘es-es’, ‘zh-cn’, ‘auto’]. In ‘audio_file’ mode, the available codes are [‘en-us’, ‘ko-kr’, ‘ja-jp’, ‘zh-cn’, ‘fr-fr’, ‘de-de’, ‘es-es’, ‘pt-pt’, ‘it-it’, ‘ru-ru’, ‘ta-in’, ‘ar-sa’, ‘bg-bg’, ‘uk-ua’, ‘hr-hr’, ‘cs-cz’, ‘pl-pl’, ‘sk-sk’, ‘fi-fi’, ‘ro-ro’, ‘el-gr’, ‘nl-nl’, ‘sv-se’, ‘tl-ph’, ‘id-id’, ‘auto’]. |
| xapi_hd | bool | No | Sample rate. If true, the audio is high quality (44.1 kHz). Default is false (16 kHz). |
| xapi_audio_format | string | No | Audio format. Set to “mp3” to receive an MP3 file. Default is “wav”. |
| model_version | string | No | A model version name or alias. Refer to the character version API. Use “latest” for the latest model. |
| emotion_tone_preset | string | No | An emotion. Retrieve the emotions available for your character from the Actor API using the actor ID. |
| emotion_prompt | string | No | A custom emotion described in natural language (Korean or English). Used when emotion_tone_preset is ‘emotion-prompt’. Make sure emotion_prompt is activated for the selected character. |
| volume | int | No | Audio volume as a percentage. 50 halves the volume (0.5x) and 200 doubles it (2x). Default is 100. |
| speed_x | float | No | Speaking speed. Values: 1.5 (slow), 1 (normal), or 0.5 (fast); larger values are slower. Default is 1. |
| tempo | float | No | Voice playback speed. Range: 0.5 (0.5x, slower) to 2.0 (2x, faster). Default is 1.0. |
| pitch | int | No | Voice pitch in semitones. Range: -12 to 12. For example, 4 means 4 semitones higher than the original voice. Default is 0. |
| max_seconds | float | No | Maximum length of the synthesized speech (1 to 60 seconds). Default is 30. Refer to the limit length and fixed duration documents. |
| duration | float | No | Fixed length of the synthesized speech (1 to 60 seconds). Leave tempo and speed_x at their default values when setting duration. Refer to the fixed duration document. |
| last_pitch | int | No | Pitch at the end of the sentence. Values: -2 (lowest), -1 (low), 0 (normal), 1 (high), or 2 (highest). |
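
The conditional requirements and numeric ranges listed for the body fields can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API: the function name `build_tts_body`, the `RANGES` table, and the choice to raise `ValueError` are assumptions made for illustration, and only the rules stated in the field descriptions are checked.

```python
# Hypothetical client-side helper; names are illustrative, not part of the API.
RANGES = {
    "pitch": (-12, 12),
    "tempo": (0.5, 2.0),
    "max_seconds": (1, 60),
    "duration": (1, 60),
    "last_pitch": (-2, 2),
}


def build_tts_body(text, lang, tts_mode="actor", actor_id=None,
                   speak_resource_id=None, **options):
    """Return a request body dict, raising ValueError on a documented violation."""
    if tts_mode not in ("actor", "audio_file"):
        raise ValueError("tts_mode must be 'actor' or 'audio_file'")
    body = {"text": text, "lang": lang, "tts_mode": tts_mode}

    # actor_id is required only in 'actor' mode,
    # speak_resource_id only in 'audio_file' mode.
    if tts_mode == "actor":
        if not actor_id:
            raise ValueError("actor_id is required when tts_mode is 'actor'")
        body["actor_id"] = actor_id
    else:
        if not speak_resource_id:
            raise ValueError(
                "speak_resource_id is required when tts_mode is 'audio_file'")
        body["speak_resource_id"] = speak_resource_id

    # Check the documented numeric ranges; other options pass through unchanged.
    for key, value in options.items():
        if key in RANGES:
            low, high = RANGES[key]
            if not low <= value <= high:
                raise ValueError(f"{key} must be between {low} and {high}")
        body[key] = value
    return body
```

For instance, `build_tts_body("Hello", "en-us", actor_id="...", pitch=4)` returns a dict ready to be serialized as JSON, while `pitch=13` raises `ValueError` before any network call is made.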

Example with cURL
  curl --request POST \
      --url https://typecast.ai/api/text-to-speech \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer $API_TOKEN" \
      --data '{
          "text": "My name is Juncheol.",
          "lang": "auto",
          "model_version": "latest",
          "emotion_tone_preset": "${emotion}",
          "actor_id": "${24-letters-your_actor_id}",
          "xapi_hd": true,
          "xapi_audio_format": "mp3",
          "max_seconds": 20,
          "volume": 100,
          "speed_x": 1,
          "tempo": 1,
          "pitch": 0
      }'
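
For comparison, the same request can be issued from Python using only the standard library. This is a sketch under assumptions: the URL and headers are copied from the cURL example, while the function names `build_request` and `synthesize_to_file` and the output path are illustrative choices rather than anything defined by the API.

```python
import json
import urllib.request

API_URL = "https://typecast.ai/api/text-to-speech"  # taken from the cURL example


def build_request(token, body):
    """Build (but do not send) the POST request for the given JSON body."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def synthesize_to_file(token, body, path):
    """Send the request and write the returned audio bytes to path."""
    # urlopen raises urllib.error.HTTPError for 4xx responses.
    with urllib.request.urlopen(build_request(token, body)) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)


# Example call (assumes you supply your own token and actor ID):
# synthesize_to_file(token, {"text": "My name is Juncheol.", "lang": "auto",
#                            "actor_id": actor_id, "xapi_audio_format": "mp3"},
#                    "speech.mp3")
```

Serializing the body with `json.dumps` avoids hand-written JSON mistakes such as a trailing comma, which would make the payload invalid JSON.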

Check out an example of using a custom emotion with emotion_prompt in the following guide: Advanced speech synthesis: Apply custom emotion in your script

Response

Status Code

| Status Code | Description |
|---|---|
| 200 | Success. The response body is the synthesized audio as a binary file. |
| 400 | JSON object representing an error. See the error codes and example at the end of this section. |
| 401 | Authorization error. |
| 429 | JSON object representing an error indicating that the request limit has been exceeded. The error_code is app/too-many-requests. |

Result

On success, the response body is a binary file containing the synthesized audio.
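
Because the successful response is raw audio rather than text, it should be written in binary mode, with a file extension that matches the requested format. The sketch below assumes the caller still has the request body at hand; the helper names `audio_extension` and `save_audio` are hypothetical.

```python
def audio_extension(request_body):
    """Return the file extension matching the requested audio format."""
    # Per the request body description, the default format is "wav".
    return "." + request_body.get("xapi_audio_format", "wav")


def save_audio(audio_bytes, stem, request_body):
    """Write the raw response bytes in binary mode and return the path."""
    path = stem + audio_extension(request_body)
    with open(path, "wb") as out:
        out.write(audio_bytes)
    return path
```

Calling `save_audio(data, "speech", {"xapi_audio_format": "mp3"})` writes `speech.mp3`; with no format in the request body, the file is named `speech.wav`.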

error_codes in 400 response

| Error Code | Description |
|---|---|
| app/param/not-enough | Some required fields are missing from the request body. |
| app/invalid/text | The length of text exceeds 350, or text consists entirely of unpronounceable letters. |
| app/invalid/actor_id | actor_id is incorrect or disallowed. |

Example
{
  "message": {
    "msg": "need params as actor_id",
    "error_code": "app/param/not-enough"
  }
}
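
An error body shaped like the example above can be unpacked with a few lines of Python. This is a sketch, not an official client: the function names are hypothetical, and treating a 429 response as retryable is an assumed client policy. It also assumes that 429 bodies use the same `message` wrapper as the 400 example, which this page does not state explicitly.

```python
import json


def parse_error(raw_body):
    """Return (error_code, msg) from an error response body."""
    message = json.loads(raw_body).get("message", {})
    return message.get("error_code"), message.get("msg")


def is_retryable(status, error_code):
    """Assumed policy: retry later only when the request limit was exceeded."""
    return status == 429 or error_code == "app/too-many-requests"


example = ('{"message": {"msg": "need params as actor_id", '
           '"error_code": "app/param/not-enough"}}')
code, msg = parse_error(example)
print(code)  # app/param/not-enough
print(is_retryable(400, code))  # False
```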