Text-to-Speech: Replicating an existing voice file.

Caution

This feature is not available to all customers. Please contact our sales team for eligibility details.

Important

Related parameters:

  • speak_resource_id

Description

Typecast offers a new API that generates speech in a voice and tone closely resembling a provided audio file. This document is a step-by-step guide to using the feature.

This is the reference audio file:

This is a generated voice that closely resembles the reference audio file:

There are four steps:

  1. Request a presigned URL to upload an audio file

  2. Upload an audio file to the given presigned URL

  3. Request speech generation

  4. Get the status of the speech generation

Details of each step are as follows:

1. Request presigned URL to upload an audio file: POST /api/speak/resource

First, upload the reference audio file that will later serve as the source for speech generation. To do so, request a presigned URL: the response of this API contains presigned_url and speak_resource_id.

The input parameter “audio_format” is required to generate presigned_url. Please check the allowable formats.

curl --request POST \
    --url 'https://typecast.ai/api/speak/resource' \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $API_TOKEN" \
    --data '{
        "audio_format": "wav"
    }'
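
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not the official sample: the function names are hypothetical, and the exact layout of the JSON response is not shown in this document, so the parsed response is returned as-is for the caller to read presigned_url and speak_resource_id from.

```python
import json
import urllib.request

API_BASE = "https://typecast.ai"  # host taken from the curl example above


def build_resource_request(api_token: str, audio_format: str = "wav") -> urllib.request.Request:
    """Build the POST /api/speak/resource request without sending it."""
    body = json.dumps({"audio_format": audio_format}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/api/speak/resource",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


def request_presigned_url(api_token: str, audio_format: str = "wav") -> dict:
    """Send the request and return the parsed JSON response unchanged."""
    request = build_resource_request(api_token, audio_format)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the request construction separate from the network call makes it possible to inspect the URL, headers, and body before anything is sent.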

2. Upload an audio file to the given presigned URL: PUT presigned_url

Upload the reference audio file to the presigned_url obtained in step 1.

curl --location \
    --request PUT \
    --url 'https://cdn.typecast.ai/...' \
    --upload-file '/Users/parkwoongki/Documents/Workspace/multilingual-dubbing/tmp/2s.wav'
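
A Python equivalent of the upload might look like the sketch below. The function names are hypothetical. http.client is used here on the assumption that the presigned URL expects a plain PUT of the raw file bytes, as the curl command sends, without extra headers such as a default Content-Type. Unlike curl with --location, this sketch does not follow redirects.

```python
import http.client
from pathlib import Path
from urllib.parse import urlsplit


def split_presigned_url(presigned_url: str) -> tuple[str, str, str]:
    """Return (scheme, host, path-with-query), keeping the signed query string intact."""
    parts = urlsplit(presigned_url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.scheme, parts.netloc, target


def upload_reference_audio(presigned_url: str, audio_path: str) -> int:
    """PUT the raw file bytes to the presigned URL and return the HTTP status code."""
    scheme, host, target = split_presigned_url(presigned_url)
    if scheme == "https":
        conn = http.client.HTTPSConnection(host)
    else:
        conn = http.client.HTTPConnection(host)
    try:
        conn.request("PUT", target, body=Path(audio_path).read_bytes())
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status
    finally:
        conn.close()
```

A status in the 2xx range would normally indicate that the upload was accepted; the exact status code returned by the storage backend is not stated in this document.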

3. Request speech generation: POST /api/speak

You are now ready to request speech generation from the reference audio file. Provide the speak_resource_id obtained in step 1. The response of this API contains speak_v2_url, which is used to check the status of the speech generation.

Please note that when tts_mode is ‘audio_file’, the available language codes are [ ‘en-us’, ‘ko-kr’, ‘ja-jp’, ‘zh-cn’, ‘fr-fr’, ‘de-de’, ‘es-es’, ‘pt-pt’, ‘it-it’, ‘ru-ru’, ‘ta-in’, ‘ar-sa’, ‘bg-bg’, ‘uk-ua’, ‘hr-hr’, ‘cs-cz’, ‘pl-pl’, ‘sk-sk’, ‘fi-fi’, ‘ro-ro’, ‘el-gr’, ‘nl-nl’, ‘sv-se’, ‘tl-ph’, ‘id-id’, ‘auto’ ].

curl --request POST \
    --url 'https://typecast.ai/api/speak' \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $API_TOKEN" \
    --data '{
        "tts_mode": "audio_file",
        "speak_resource_id": "63ee55f00000000000000000",
        "text": "Hello world!",
        "lang": "en-us",
        "xapi_hd": true
    }'
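
The request body above can also be assembled in Python. The sketch below is an illustration under assumptions: the function names are hypothetical, the client-side check against the language list is a convenience added here rather than something the API requires, and the parsed response is returned unchanged because its exact layout is not shown in this document.

```python
import json
import urllib.request

API_BASE = "https://typecast.ai"  # host taken from the curl example above

# Language codes this document lists for the "audio_file" tts_mode.
SUPPORTED_LANGS = {
    "en-us", "ko-kr", "ja-jp", "zh-cn", "fr-fr", "de-de", "es-es", "pt-pt",
    "it-it", "ru-ru", "ta-in", "ar-sa", "bg-bg", "uk-ua", "hr-hr", "cs-cz",
    "pl-pl", "sk-sk", "fi-fi", "ro-ro", "el-gr", "nl-nl", "sv-se", "tl-ph",
    "id-id", "auto",
}


def build_speak_request(api_token: str, speak_resource_id: str, text: str,
                        lang: str = "en-us") -> urllib.request.Request:
    """Build the POST /api/speak request for the audio_file mode without sending it."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang for audio_file mode: {lang!r}")
    payload = {
        "tts_mode": "audio_file",
        "speak_resource_id": speak_resource_id,  # value returned in step 1
        "text": text,
        "lang": lang,
        "xapi_hd": True,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/api/speak",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )


def request_speech(api_token: str, speak_resource_id: str, text: str,
                   lang: str = "en-us") -> dict:
    """Send the request and return the parsed JSON response unchanged."""
    request = build_speak_request(api_token, speak_resource_id, text, lang)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```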

4. Get the status of the speech generation: GET /api/speak/v2/{speak_id}

Check the status of your speech generation. Once result.status becomes done, the audio file is ready for download, and you can retrieve it from result.audio_download_url.

Follow this link to see how you can finally download the audio.
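
The status check in step 4 is typically done by polling. The following is a minimal sketch with hypothetical function names. It assumes the status endpoint accepts the same Bearer token as the other calls and that status and audio_download_url sit under a top-level "result" key, as the field names result.status and result.audio_download_url suggest. Only the done status is described in this document, so any other value is treated as "not finished yet".

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(speak_v2_url: str, api_token: str) -> dict:
    """GET the status URL returned by step 3 and return the parsed JSON."""
    request = urllib.request.Request(
        speak_v2_url,
        headers={"Authorization": f"Bearer {api_token}"},  # assumed to be required
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_done(fetch: Callable[[], dict], interval: float = 1.0,
                    max_attempts: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until result.status is "done", then return the download URL."""
    for _ in range(max_attempts):
        result = fetch()["result"]
        if result["status"] == "done":
            return result["audio_download_url"]
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"speech generation not done after {max_attempts} checks")
```

A call might look like wait_until_done(lambda: fetch_status(speak_v2_url, api_token)), where speak_v2_url comes from the step 3 response. Passing the fetch function in as an argument keeps the loop independent of the network code, and the interval and max_attempts defaults are arbitrary values chosen for illustration.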

Python Sample Code

For ease of use and clarity, please refer to the sample Python code linked below, which demonstrates the steps for performing speech synthesis with an existing voice file as described above.

If you have any additional questions about the sample code, feel free to leave a comment on GitHub.