
POST /api/text-to-speech

Synthesizes speech from text and returns it as an audio file.

Request

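Headers

Two headers are required: `Content-Type: application/json` and `Authorization: Bearer $API_TOKEN`, where `$API_TOKEN` is your API token.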

Parameters

None

Body as JSON Object

Key	Type	Required	Description
`text`	string	Yes	The text to synthesize into speech.
`tts_mode`	string	No	The synthesis mode: `actor` or `audio_file`. Default is `actor`.
`actor_id`	string	Conditional	The character ID. Required only when `tts_mode` is `actor`. Retrieve your characters from the Actor API.
`speak_resource_id`	string	Conditional	The speak resource ID. Required only when `tts_mode` is `audio_file`. Retrieve it from the Resource API.
`lang`	string	Yes	Language code of `text`. With `tts_mode: actor`, one of `en-us`, `eng`, `ko-kr`, `kor`, `ja-jp`, `jpn`, `es-es`, `spa`, `zh-cn`, `zho`, `auto`. With `tts_mode: audio_file`, one of `en-us`, `eng`, `ko-kr`, `kor`, `ja-jp`, `jpn`, `zh-cn`, `zho`, `fr-fr`, `fra`, `de-de`, `deu`, `es-es`, `spa`, `pt-pt`, `por`, `it-it`, `ita`, `ru-ru`, `rus`, `ta-in`, `tam`, `ar-sa`, `ara`, `bg-bg`, `bul`, `uk-ua`, `ukr`, `hr-hr`, `hrv`, `cs-cz`, `ces`, `pl-pl`, `pol`, `sk-sk`, `slk`, `fi-fi`, `fin`, `ro-ro`, `ron`, `el-gr`, `ell`, `nl-nl`, `nld`, `sv-se`, `swe`, `tl-ph`, `tgl`, `id-id`, `ind`, `auto`. Use `auto` to detect the language from the text automatically.
`xapi_hd`	bool	No	Sample rate. If true, returns high-quality 44.1 kHz audio. Default is false (16 kHz).
`xapi_audio_format`	string	No	Audio format: `wav` (default) or `mp3`.
`model_version`	string	No	Model version name or alias; see the character version API. Use `latest` for the latest model.
`emotion_tone_preset`	string	No	Emotion preset. Retrieve the emotions available for your character from the Actor API using its actor ID.
`emotion_prompt`	string	No	A custom emotion described in natural language (Korean or English). Used when `emotion_tone_preset` is `emotion-prompt`; `emotion_prompt` must be enabled for the selected character.
`volume`	int	No	Audio volume as a percentage of the original: 50 halves it, 200 doubles it. Default is 100.
`speed_x`	float	No	Speaking speed; larger values are slower. Values: 1.5 (slow), 1 (normal), or 0.5 (fast). Default is 1.
`tempo`	float	No	Playback speed of the voice, from 0.5 (0.5x, slower) to 2.0 (2x, faster). Default is 1.0.
`pitch`	int	No	Voice pitch in semitones, from -12 to 12. For example, 4 raises the pitch 4 semitones above the original voice. Default is 0.
`max_seconds`	float	No	Maximum length of the synthesized speech, from 1 to 60 seconds. Default is 30. See the limit length and fixed duration documents.
`duration`	float	No	Fixed length of the synthesized speech, from 1 to 60 seconds. When setting `duration`, leave `tempo` and `speed_x` at their default values. See fixed duration.
`last_pitch`	int	No	Pitch at the end of the sentence: -2 (lowest), -1 (low), 0 (normal), 1 (high), or 2 (highest).
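The conditional and range rules in the table can be checked on the client before a request is sent. The following is a minimal Python sketch under that idea; `build_tts_payload` and `_RANGES` are hypothetical names, and the checks cover only the constraints stated in the table (language codes and emotion names are left for the server to validate).

```python
from typing import Any, Optional

# Numeric ranges taken from the parameter table (inclusive bounds).
_RANGES = {
    "tempo": (0.5, 2.0),
    "pitch": (-12, 12),
    "max_seconds": (1, 60),
    "duration": (1, 60),
    "last_pitch": (-2, 2),
}


def build_tts_payload(
    text: str,
    lang: str,
    tts_mode: str = "actor",
    actor_id: Optional[str] = None,
    speak_resource_id: Optional[str] = None,
    **options: Any,
) -> dict:
    """Assemble a request body, checking only the rules stated in the table."""
    if tts_mode not in ("actor", "audio_file"):
        raise ValueError("tts_mode must be 'actor' or 'audio_file'")
    payload: dict = {"text": text, "lang": lang, "tts_mode": tts_mode}
    # actor_id and speak_resource_id are each required by exactly one mode.
    if tts_mode == "actor":
        if not actor_id:
            raise ValueError("actor_id is required when tts_mode is 'actor'")
        payload["actor_id"] = actor_id
    else:
        if not speak_resource_id:
            raise ValueError("speak_resource_id is required when tts_mode is 'audio_file'")
        payload["speak_resource_id"] = speak_resource_id
    # Reject out-of-range numeric options before the server does.
    for key, (low, high) in _RANGES.items():
        if key in options and not low <= options[key] <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
    # A fixed duration expects tempo and speed_x at their defaults.
    if "duration" in options and (
        options.get("tempo", 1.0) != 1.0 or options.get("speed_x", 1) != 1
    ):
        raise ValueError("leave tempo and speed_x at their defaults when setting duration")
    payload.update(options)
    return payload
```

For example, `build_tts_payload("Hello.", "auto", actor_id=my_actor_id, xapi_audio_format="mp3", tempo=1.2)` returns a dict ready for `json.dumps`; any other optional key from the table is passed through unchanged.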

Example with cURL

  curl --request POST \
      --url https://typecast.ai/api/text-to-speech \
      --header "Content-Type: application/json" \
      --header "Authorization: Bearer $API_TOKEN" \
      --output speech.mp3 \
      --data '{
          "text": "My name is Juncheol.",
          "lang": "auto",
          "model_version": "latest",
          "emotion_tone_preset": "<emotion>",
          "actor_id": "<your_24_character_actor_id>",
          "xapi_hd": true,
          "xapi_audio_format": "mp3",
          "max_seconds": 20,
          "volume": 100,
          "speed_x": 1,
          "tempo": 1,
          "pitch": 0
      }'

Replace `<emotion>` and `<your_24_character_actor_id>` with your own values. The JSON body is in single quotes, so shell variables inside it are not expanded.
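The same request can be sent from Python. This is a sketch using only the standard library (`urllib.request`); the function names `build_request` and `synthesize` are hypothetical, while the URL, headers, and body fields come from the cURL example.

```python
import json
import urllib.request

API_URL = "https://typecast.ai/api/text-to-speech"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request as the cURL example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def synthesize(token: str, payload: dict) -> bytes:
    """Send the request and return the raw audio bytes of a 200 response.

    urllib raises urllib.error.HTTPError for 4xx/5xx statuses; its .code and
    .read() give the status and the JSON error body.
    """
    with urllib.request.urlopen(build_request(token, payload), timeout=60) as resp:
        return resp.read()
```

For example, `audio = synthesize(os.environ["API_TOKEN"], body)` followed by writing `audio` to `speech.mp3` in binary mode mirrors the `--output` flag of the cURL command.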

For an example of a custom emotion set with `emotion_prompt`, see the guide Advanced speech synthesis: Apply custom emotion in your script.

Response

Status Code

Status Code	Description
200	Success. The response body is the synthesized audio file.
400	Bad request. The body is a JSON object describing the error; see the example at the end of this page.
401	Authorization error.
429	Request limit exceeded. The body is a JSON error object whose `error_code` is `app/too-many-requests`.
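A `429` means the request limit was hit. This page does not say how long to wait before retrying, so the Python sketch below simply retries with exponential backoff. `send_with_retry`, the attempt count, and the delays are assumptions rather than documented limits; `send` is any zero-argument callable that performs the request and returns `(status, body)`.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status, raw body)


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying with exponential backoff while it returns 429."""
    for attempt in range(max_attempts):
        status, body = send()
        # Return anything that is not rate-limited, or the last 429 once attempts run out.
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the backoff can be replaced in tests or by a shared rate limiter.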

Successful response

Upon success, the response body is a binary file containing the synthesized audio: WAV by default, or MP3 when `xapi_audio_format` is `mp3`.
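Because a successful response is raw audio rather than JSON, it helps to check the bytes before saving them. In this Python sketch the file extension follows `xapi_audio_format`, and `looks_like_audio` tests the standard WAV signature (`RIFF` with `WAVE` at byte 8) and MP3 signature (an `ID3` tag or an MPEG frame sync). These are general file-format facts, not behavior documented for this API, and all function names are hypothetical.

```python
def audio_extension(payload: dict) -> str:
    """File extension matching the requested xapi_audio_format (default wav)."""
    return "mp3" if payload.get("xapi_audio_format") == "mp3" else "wav"


def looks_like_audio(data: bytes, fmt: str) -> bool:
    """Cheap signature check on the first bytes of the response body."""
    if fmt == "wav":
        return data[:4] == b"RIFF" and data[8:12] == b"WAVE"
    # MP3: an ID3v2 tag, or an MPEG audio frame sync (11 leading set bits).
    return data[:3] == b"ID3" or (
        len(data) > 1 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
    )


def save_audio(data: bytes, payload: dict, stem: str = "speech") -> str:
    """Write the audio to '<stem>.<ext>' and return the path."""
    fmt = audio_extension(payload)
    if not looks_like_audio(data, fmt):
        raise ValueError(f"response body does not look like {fmt} audio")
    path = f"{stem}.{fmt}"
    with open(path, "wb") as f:
        f.write(data)
    return path
```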

`error_code` values in `400` responses

Error Code	Description
`app/param/not-enough`	One or more required fields are missing from the request body.
`app/invalid/text`	`text` is longer than 350 characters, or consists entirely of unpronounceable characters.
`app/invalid/actor_id`	`actor_id` is incorrect or disallowed.

Example `400` response body

{
  "message": {
    "msg": "need params as actor_id",
    "error_code": "app/param/not-enough"
  }
}
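Error bodies follow the `message.error_code` / `message.msg` shape shown in the example. The Python sketch below turns such a body into an exception; `TypecastError` and `parse_error` are hypothetical names, and the fallback for bodies that are not JSON (for example, a `401`, whose body format is not documented on this page) is an assumption.

```python
import json


class TypecastError(Exception):
    """Hypothetical exception carrying the status and the documented error fields."""

    def __init__(self, status: int, error_code: str, msg: str):
        super().__init__(f"{status} {error_code}: {msg}")
        self.status = status
        self.error_code = error_code
        self.msg = msg


def parse_error(status: int, body: bytes) -> TypecastError:
    """Build a TypecastError from an error response body."""
    try:
        message = json.loads(body).get("message", {})
    except (ValueError, AttributeError):  # not JSON, or not a JSON object
        message = {}
    if not isinstance(message, dict):
        message = {}
    return TypecastError(status, message.get("error_code", "unknown"), message.get("msg", ""))
```

With `urllib`, a caught `urllib.error.HTTPError` supplies both arguments: `raise parse_error(e.code, e.read())`.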