POST /api/speak/prosody-meta/{:id}

An API for extracting the start/stop timestamps of each word in a synthesized speech generated by POST /api/speak, along with other data about that speech.

Caution

This API is not available to all customers. Please contact our sales team for eligibility details.

Request

Headers

Parameters

Path parameters

  • {:id}: ID of the speak from which the word timestamps are extracted. Refer to the response of POST /api/speak.

Body as JSON Object

| Key | Type | Required | Description |
|---|---|---|---|
| language | string | Yes | The language of the given speak, in the following format: en-us for English, ko-kr for Korean. |
| version | enum(string) | No | Either v1 or v2; defaults to v1. v1 returns results based on the normalized text, while v2 returns results based on the user’s original input. |

Example with cURL
curl --request POST \
  --url https://typecast.ai/api/speak/prosody-meta/{your_speak_id} \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $API_TOKEN" \
  --data '{"language": "ko-kr", "version": "v2"}'
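
The same request can be assembled programmatically. The following is a minimal Python sketch rather than an official client: the helper name build_prosody_meta_request is hypothetical, the base URL is copied from the cURL example, and the validation only mirrors the v1/v2 rule from the parameter table.

```python
BASE_URL = "https://typecast.ai/api/speak/prosody-meta"  # copied from the cURL example
ALLOWED_VERSIONS = ("v1", "v2")


def build_prosody_meta_request(speak_id, language, version="v1"):
    """Return (url, json_body) for a prosody-meta request (hypothetical helper)."""
    if not language:
        raise ValueError("language is required, e.g. 'en-us' or 'ko-kr'")
    if version not in ALLOWED_VERSIONS:
        raise ValueError(f"version must be one of {ALLOWED_VERSIONS}, got {version!r}")
    url = f"{BASE_URL}/{speak_id}"
    body = {"language": language, "version": version}
    return url, body


# "your_speak_id" is a placeholder for the ID returned by POST /api/speak.
url, body = build_prosody_meta_request("your_speak_id", "ko-kr", version="v2")
print(url)
print(body)
```

Validating the version on the client side catches typos before a request is sent; the server remains the authority on which values are accepted.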

Response

Status Code

| Status Code | Description |
|---|---|
| 401 | Authorization Error |
| 403 | Forbidden Error |
| 200 | JSON object containing the result. |
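
A client can branch on these status codes before reading the body. The sketch below is one possible way to do that in Python; the function name extract_result and the choice of exception types are assumptions for illustration, not part of the API.

```python
STATUS_MEANINGS = {
    401: "Authorization Error",
    403: "Forbidden Error",
}


def extract_result(status_code, payload):
    """Return payload['result'] on 200; raise for the documented error codes."""
    if status_code == 200:
        return payload["result"]
    if status_code in STATUS_MEANINGS:
        raise PermissionError(f"{status_code}: {STATUS_MEANINGS[status_code]}")
    # Any other code is not listed in the table above.
    raise RuntimeError(f"Unexpected status code: {status_code}")


print(extract_result(200, {"result": {"_id": "your_speak_id"}}))
```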

result consists of the following

| Key | Description |
|---|---|
| _id | ID of the given speak. |
| prosody_meta | Prosody metadata. See the prosody_meta table and example below for details. |
| actor_id | ID of the requested actor. |
| uid | ID of the user. |
| query | The body of the request that created the given speak. |
| status | The result status of the given speak. |
| task_id | The internal task ID. |
| speak_url | Deprecated. |
| audio | The metadata of the created speech audio file. |
| audio_path | The path of the audio file. |
| quality | Used internally. |
| sentence_task_ids | Used internally. |
| callback | Used internally. |
| download | Used internally. |
| text_count | Length of the text that you requested. |
| duration | How long the speak took, in seconds. |
| is_generated_by_api | True for API users. |
| seed | Used internally. |

prosody_meta consists of the following

| Key | Description |
|---|---|
| phoneme_seq | List of the words in the text used to synthesize the speak. |
| phoneme_location | List of start/stop time locations of the words in phoneme_seq. |
| phoneme_time | List of start/stop times (in seconds) of the words in phoneme_seq. |
| features | The intonation data. |
| request_version | The value passed in the version parameter of the request. Returns null if the value is v1. |
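
Because request_version is null for v1 requests, client code may want to normalize it before branching on the version. A minimal Python sketch, assuming the response has already been parsed into a dict; both helper names are hypothetical, and the length check assumes that phoneme_seq and phoneme_time line up one-to-one.

```python
def effective_version(prosody_meta):
    """Map a null or missing request_version to 'v1' (hypothetical helper)."""
    version = prosody_meta.get("request_version")
    return "v1" if version is None else version


def paired_words(prosody_meta):
    """Pair each word with its start/end time, checking that the lists line up."""
    words = prosody_meta["phoneme_seq"]
    times = prosody_meta["phoneme_time"]
    if len(words) != len(times):
        raise ValueError("phoneme_seq and phoneme_time have different lengths")
    return [(word, start, end) for word, (start, end) in zip(words, times)]


sample = {
    "phoneme_seq": ["thank", "you"],
    "phoneme_time": [[0.2230625, 0.48675], [0.567875, 0.709875]],
    "request_version": None,  # null in the JSON response for v1
}
print(effective_version(sample))
print(paired_words(sample))
```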

Prosody-Meta Example
{
    "result": {
        ...
        "prosody_meta": {
            "phoneme_seq": [
                "thank",
                "you"
            ],
            "phoneme_time": [
                [
                    0.2230625,
                    0.48675
                ],
                [
                    0.567875,
                    0.709875
                ]
            ]
        }
    }
}
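
The start/stop pairs in this example can be turned into per-word durations and inter-word pauses by simple subtraction. A sketch in Python, with the example values inlined as a dict; the variable names are illustrative only.

```python
# Values copied from the Prosody-Meta Example above.
prosody_meta = {
    "phoneme_seq": ["thank", "you"],
    "phoneme_time": [[0.2230625, 0.48675], [0.567875, 0.709875]],
}

times = prosody_meta["phoneme_time"]

# Duration of each word: stop time minus start time, in seconds.
durations = [end - start for start, end in times]

# Pause between consecutive words: next start minus current stop, in seconds.
pauses = [nxt[0] - cur[1] for cur, nxt in zip(times, times[1:])]

for word, duration in zip(prosody_meta["phoneme_seq"], durations):
    print(f"{word}: {duration:.3f} s")
print("pauses:", [f"{p:.3f}" for p in pauses])
```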

Example usage to extract timestamp of each word

Here is a sample script that extracts the word timestamps. The output shown corresponds to the Prosody-Meta Example response above.

Example with Python
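import os

import requests

# Base URL and token variable follow the cURL example; speak_id is a placeholder.
prosody_meta_base_url = "https://typecast.ai/api/speak/prosody-meta"
speak_id = "your_speak_id"
my_authorized_header = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

response = requests.post(
    f"{prosody_meta_base_url}/{speak_id}",
    headers=my_authorized_header,
    json={"language": "ko-kr", "version": "v2"},
)
response.raise_for_status()  # fail early on 401/403 instead of on a missing key

prosody_meta = response.json()["result"]["prosody_meta"]
prosody_seq = prosody_meta["phoneme_seq"]
prosody_time = prosody_meta["phoneme_time"]
for word, (start, end) in zip(prosody_seq, prosody_time):
    print(word, start, end)
# > thank 0.2230625 0.48675
# > you 0.567875 0.709875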
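
One use for these timestamps is highlighting the word being spoken at a given playback position. The following is a rough Python sketch under the assumption that the time ranges are sorted by start time and do not overlap, as in the example; word_at is a hypothetical helper, not part of the API.

```python
from bisect import bisect_right


def word_at(t, words, times):
    """Return the word spoken at time t (seconds), or None during silence.

    Assumes times is sorted by start time and ranges do not overlap.
    """
    starts = [start for start, _ in times]
    # Index of the last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t <= times[i][1]:
        return words[i]
    return None


words = ["thank", "you"]
times = [[0.2230625, 0.48675], [0.567875, 0.709875]]
for t in (0.1, 0.3, 0.5, 0.6):
    print(t, word_at(t, words, times))
```

The binary search keeps each lookup logarithmic in the number of words, which matters mainly for long texts; for short clips a linear scan would work equally well.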