POST /api/speak/prosody-meta/{:id}¶
An API for extracting the start/stop timestamps of each word, along with other metadata, from synthesized speech generated by POST /api/speak.
Caution
This API is not available to all customers. Please contact our sales team for eligibility details.
Request¶
Headers¶
Parameters¶
Path parameters¶
{:id}: ID of the speak from which the word timestamps are extracted. Refer to the response of POST /api/speak.
Body as JSON Object¶
Key | Type | Required | Description |
---|---|---|---|
language | string | Yes | The language of the given speak, in the following format: English: en-us, Korean: ko-kr. |
version | enum(string) | No | The default value is |
Example with cURL
curl --request POST \
--url https://typecast.ai/api/speak/prosody-meta/{your_speak_id} \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $API_TOKEN" \
--data '{"language": "ko-kr", "version": "v2"}'
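The request body described above can also be assembled programmatically. The following is a minimal sketch, not part of the API: `build_prosody_meta_body` is a hypothetical helper, and it assumes the service expects lowercase language codes in the documented `en-us` / `ko-kr` form. The optional `version` key is left out when it is not supplied.

```python
import json
import re
from typing import Optional

# Assumption: language codes follow the documented lowercase "xx-xx" shape.
_LANGUAGE_PATTERN = re.compile(r"^[a-z]{2}-[a-z]{2}$")


def build_prosody_meta_body(language: str, version: Optional[str] = None) -> dict:
    """Hypothetical helper: build the JSON body for the prosody-meta request."""
    normalized = language.strip().lower()  # e.g. "ko-KR" -> "ko-kr"
    if not _LANGUAGE_PATTERN.match(normalized):
        raise ValueError(f"unexpected language code: {language!r}")
    body = {"language": normalized}
    if version is not None:
        # "version" is optional, so it is only sent when given explicitly.
        body["version"] = version
    return body


print(json.dumps(build_prosody_meta_body("ko-KR", version="v2")))
# prints {"language": "ko-kr", "version": "v2"}
```

The returned dictionary can be passed to an HTTP client as the JSON payload, in place of the literal string used in the cURL example.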
Response¶
Status Code¶
Status Code | Description |
---|---|
401 | |
403 | |
200 | JSON object containing the result. |
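A client will usually branch on these status codes before reading the body. The sketch below is one possible way to do that, under stated assumptions: `ProsodyMetaError` and `extract_result` are hypothetical names, the messages for 401 and 403 reflect only the generic HTTP meanings (Unauthorized and Forbidden) rather than anything this API documents, and a 200 response is assumed to carry a top-level `result` key as in the Prosody-Meta Example.

```python
from typing import Any, Dict


class ProsodyMetaError(Exception):
    """Hypothetical error type raised for non-200 responses."""


def extract_result(status_code: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the 'result' object for a 200 response, otherwise raise."""
    if status_code == 200:
        return payload["result"]
    if status_code == 401:
        # Generic HTTP meaning: the request was not authenticated.
        raise ProsodyMetaError("401: check the Authorization header")
    if status_code == 403:
        # Generic HTTP meaning: authenticated, but access is not permitted.
        raise ProsodyMetaError("403: access to this API is not permitted")
    raise ProsodyMetaError(f"unexpected status code {status_code}")


sample = {"result": {"prosody_meta": {"phoneme_seq": ["thank", "you"]}}}
print(extract_result(200, sample)["prosody_meta"]["phoneme_seq"])
# prints ['thank', 'you']
```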
result consists of the following¶
Key | Description |
---|---|
 | ID of the given speak. |
 | Prosody metadata. Refer to the section below for details and an example. |
 | ID of the requested actor. |
 | ID of the user. |
 | The body of the request that created the given speak. |
 | The result status of the given speak. |
 | The internal task ID. |
 | Deprecated. |
 | The metadata of the created speech audio file. |
 | The path of the audio file. |
 | Used internally. |
 | Used internally. |
 | Used internally. |
 | Used internally. |
 | Length of the |
 | How long the speak took (seconds). |
 | True for the API user. |
 | Used internally. |
prosody_meta consists of the following¶
Key | Description |
---|---|
 | List of words in the text used to synthesize the speak. |
 | List of start/stop time locations of the words in |
 | List of start/stop times (in seconds) of the words in |
 | The intonation data. |
 | The version value used in the version parameter of the request. Returns null if the value is |
Prosody-Meta Example
{
  "result": {
    ...
    "prosody_meta": {
      "phoneme_seq": [
        "thank",
        "you"
      ],
      "phoneme_time": [
        [
          0.2230625,
          0.48675
        ],
        [
          0.567875,
          0.709875
        ]
      ],
      ...
    }
  }
}
Example usage to extract timestamp of each word¶
Here is a sample script that extracts the timestamps, assuming a response like the Prosody-Meta Example above:
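import os

import requests

# Placeholders: use your own API token and the speak ID returned by POST /api/speak.
prosody_meta_base_url = "https://typecast.ai/api/speak/prosody-meta"
speak_id = "your_speak_id"
my_authorized_header = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

response = requests.post(
    f"{prosody_meta_base_url}/{speak_id}",
    headers=my_authorized_header,
    # The sample speak is English, so the documented "en-us" code is used here.
    json={"language": "en-us", "version": "v2"},
)
response.raise_for_status()  # stop early on 401/403

prosody_meta = response.json()["result"]["prosody_meta"]
prosody_seq = prosody_meta["phoneme_seq"]    # words of the synthesized text
prosody_time = prosody_meta["phoneme_time"]  # [start, stop] in seconds, one pair per word
for word, (start, end) in zip(prosody_seq, prosody_time):
    print(word, start, end)
# > thank 0.2230625 0.48675
# > you 0.567875 0.709875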
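Once the word timestamps are available, it is often useful to derive per-word durations and the pauses between words, for example when aligning captions. The following is a rough sketch that works offline on the values from the Prosody-Meta Example. `WordTiming`, `to_word_timings`, and `pauses` are hypothetical names, and the code assumes that `phoneme_seq` and `phoneme_time` line up one-to-one.

```python
from typing import List, NamedTuple


class WordTiming(NamedTuple):
    """One word with its start/stop time in seconds (hypothetical type)."""
    word: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def to_word_timings(prosody_meta: dict) -> List[WordTiming]:
    """Pair each word with its [start, stop] entry."""
    words = prosody_meta["phoneme_seq"]
    times = prosody_meta["phoneme_time"]
    if len(words) != len(times):
        # Assumption: the two lists are aligned one-to-one.
        raise ValueError("phoneme_seq and phoneme_time differ in length")
    return [WordTiming(w, s, e) for w, (s, e) in zip(words, times)]


def pauses(timings: List[WordTiming]) -> List[float]:
    """Silence between the end of each word and the start of the next."""
    return [nxt.start - cur.end for cur, nxt in zip(timings, timings[1:])]


sample_prosody_meta = {
    "phoneme_seq": ["thank", "you"],
    "phoneme_time": [[0.2230625, 0.48675], [0.567875, 0.709875]],
}
timings = to_word_timings(sample_prosody_meta)
for t in timings:
    print(f"{t.word}: {t.duration:.3f}s")
# prints thank: 0.264s
# prints you: 0.142s
print([f"{p:.3f}" for p in pauses(timings)])
# prints ['0.081']
```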