POST /api/speak/prosody-meta/{:id}

Extracts the start and stop timestamps of each word in speech synthesized by POST /api/speak, along with other metadata for that speech.

Request

Headers

Required headers

Parameters

Path parameters

{:id}: ID of the speak from which the word timestamps are extracted. Refer to the response of POST /api/speak.

Body as JSON Object

Key	Type	Required	Description
`language`	string	Yes	The language of the given speak, in the following format: English: `en-us`, Korean: `ko-kr`.
`version`	enum(string)	No	Either `v1` or `v2`. The default value is `v1`. `v1` returns results based on normalized text, while `v2` returns results based on the user’s original input.

Example with cURL

curl --request POST \
  --url https://typecast.ai/api/speak/prosody-meta/{your_speak_id} \
  --header "Content-Type: application/json" \
  --header "Authorization: Bearer $API_TOKEN" \
  --data '{"language": "ko-kr", "version": "v2"}'
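The body parameters listed above can be checked on the client side before the request is sent. The following is a minimal sketch, not part of the API: `build_prosody_meta_body` is a hypothetical helper, and the set of language codes is assumed to be only the two shown in the table, although the service may accept others.

```python
import json

# Values copied from the parameter table; the language set is an assumption
# (the table lists en-us and ko-kr, but may not be exhaustive).
SUPPORTED_LANGUAGES = {"en-us", "ko-kr"}
SUPPORTED_VERSIONS = {"v1", "v2"}


def build_prosody_meta_body(language, version=None):
    """Return the request body as a dict, validated against the documented values."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    body = {"language": language}
    # `version` is optional; when omitted, the server default applies.
    if version is not None:
        if version not in SUPPORTED_VERSIONS:
            raise ValueError(f"unsupported version: {version!r}")
        body["version"] = version
    return body


print(json.dumps(build_prosody_meta_body("ko-kr", "v2")))
# > {"language": "ko-kr", "version": "v2"}
```

Leaving `version` out of the body keeps the request minimal and relies on the documented default of `v1`.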

Response

Status Code

Status Code	Description
200	JSON object containing the `result`.
401	Authorization Error
403	Forbidden Error
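One way a client might act on these status codes is sketched below. `extract_result` is a hypothetical helper, and the choice of exception types is an assumption made for illustration; only the three status codes and the `result` key come from this page.

```python
def extract_result(status_code, payload):
    """Return payload['result'] for a 200 response; raise for other codes."""
    if status_code == 200:
        return payload["result"]
    if status_code == 401:
        raise PermissionError("401 Authorization Error")
    if status_code == 403:
        raise PermissionError("403 Forbidden Error")
    # Codes not listed in the table are treated as unexpected.
    raise RuntimeError(f"unexpected status code: {status_code}")


# Usage with an already-decoded response body:
result = extract_result(200, {"result": {"_id": "example-id"}})
print(result["_id"])
# > example-id
```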

`result` consists of the following

Key	Description
`_id`	ID of the given speak.
`prosody_meta`	Prosody metadata. See the `prosody_meta` section below for details and an example.
`actor_id`	ID of the requested actor.
`uid`	ID of user.
`query`	The body of the request to create the given speak.
`status`	The result status of the given speak.
`task_id`	The internal task ID.
`speak_url`	Deprecated.
`audio`	The metadata of the created speech audio file.
`audio_path`	The path of the audio file.
`quality`	Used internally.
`sentence_task_ids`	Used internally.
`callback`	Used internally.
`download`	Used internally.
`text_count`	Length of the `text` that you requested.
`duration`	How much time the speak took, in seconds.
`is_generated_by_api`	True for API users.
`seed`	Used internally.

`prosody_meta` consists of the following

Key	Description
`phoneme_seq`	List of words in the text used to synthesize the speak.
`phoneme_location`	List of start/stop time locations of the words in `phoneme_seq`.
`phoneme_time`	List of start/stop times (in seconds) of the words in `phoneme_seq`.
`features`	The intonation data.
`request_version`	The value of the `version` parameter used in the request. Returns null if the value is `v1`.

Prosody-Meta Example

{
    "result": {
        ...
        "prosody_meta": {
            "phoneme_seq": [
                "thank",
                "you"
            ],
            "phoneme_time": [
                [
                    0.2230625,
                    0.48675
                ],
                [
                    0.567875,
                    0.709875
                ]
            ]
        }
    }
}
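The `phoneme_seq` and `phoneme_time` lists in the example above are index-aligned, so they can be zipped into per-word records. The sketch below is illustrative only: `WordTiming` and `parse_word_timings` are hypothetical names, and the length check assumes the two lists always have one entry per word.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self):
        return self.end - self.start


def parse_word_timings(prosody_meta):
    """Pair each word in phoneme_seq with its [start, end] entry in phoneme_time."""
    seq = prosody_meta["phoneme_seq"]
    times = prosody_meta["phoneme_time"]
    if len(seq) != len(times):
        raise ValueError("phoneme_seq and phoneme_time differ in length")
    return [WordTiming(w, s, e) for w, (s, e) in zip(seq, times)]


# Data copied from the Prosody-Meta Example.
example = {
    "phoneme_seq": ["thank", "you"],
    "phoneme_time": [[0.2230625, 0.48675], [0.567875, 0.709875]],
}
timings = parse_word_timings(example)
for t in timings:
    print(f"{t.word}: {t.duration:.3f}s")
# > thank: 0.264s
# > you: 0.142s
```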

Example usage to extract timestamp of each word

Here is a sample script that gets the timestamp of each word from the Prosody-Meta Example response above.

Example with Python

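import os

import requests

# Base URL taken from the cURL example; the token is read from the API_TOKEN environment variable.
prosody_meta_base_url = "https://typecast.ai/api/speak/prosody-meta"
speak_id = "{your_speak_id}"  # replace with the ID returned by POST /api/speak
my_authorized_header = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

response = requests.post(
    f"{prosody_meta_base_url}/{speak_id}",
    headers=my_authorized_header,
    json={"language": "ko-kr", "version": "v2"},
)
response.raise_for_status()  # raise on 401/403 instead of parsing an error body
prosody_meta = response.json()['result']['prosody_meta']
prosody_seq = prosody_meta['phoneme_seq']
prosody_time = prosody_meta['phoneme_time']
# Each phoneme_time entry is a [start, end] pair aligned with phoneme_seq.
for word, (start, end) in zip(prosody_seq, prosody_time):
    print(word, start, end)
# > thank 0.2230625 0.48675
# > you 0.567875 0.709875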
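Once the word timestamps are extracted, a common follow-up is to find which word is being spoken at a given playback time, for example to highlight it. The sketch below shows one possible approach with a binary search. `word_at` is a hypothetical helper, and it assumes the `phoneme_time` entries are sorted by start time and do not overlap, which this page does not state explicitly.

```python
from bisect import bisect_right


def word_at(prosody_meta, t):
    """Return the word spoken at time t (seconds), or None if t falls in silence."""
    times = prosody_meta["phoneme_time"]
    starts = [start for start, _ in times]
    i = bisect_right(starts, t) - 1  # index of the last word starting at or before t
    if i >= 0 and t <= times[i][1]:
        return prosody_meta["phoneme_seq"][i]
    return None


# Data copied from the Prosody-Meta Example.
example = {
    "phoneme_seq": ["thank", "you"],
    "phoneme_time": [[0.2230625, 0.48675], [0.567875, 0.709875]],
}
print(word_at(example, 0.3))
# > thank
print(word_at(example, 0.5))
# > None
print(word_at(example, 0.6))
# > you
```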