We highly encourage you to use the API through our SDK. If you still want to make requests on your own, there's a caveat to remember. Most JavaScript tools for making HTTP requests interpret the server's response as plaintext or JSON by default, but our TTS/VC endpoints return binary data. You have to take that into account to be able to use it. In the case of Axios, you have to configure the `responseType` as `arraybuffer`. With [Fetch](https://developer.mozilla.org/en-US/docs/Web/API/Response/arrayBuffer), you have to use the `response.arrayBuffer()` method. For other tools, please refer to the corresponding documentation. If you use our SDK, you don't have to worry about it.
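For illustration, here is a minimal Fetch-based sketch of reading such a binary response; the endpoint URL, headers, and request body below are placeholders for the example, not the real API contract:

// Sketch only: the URL, headers, and payload are illustrative placeholders.
const response = await fetch("https://api.example.com/v1/tts/convert", {
  method: "POST",
  headers: { "Content-Type": "application/json" /* plus your auth headers */ },
  body: JSON.stringify({ voiceId: 123, text: "Hello world" }),
});

// Read the body as binary data instead of text or JSON
const audio: ArrayBuffer = await response.arrayBuffer();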
The `ClientKey` and `ApiKey` required to authenticate the SDK are meant to be private. Packaging them with a frontend application effectively leaks them publicly and allows anyone to use your gemelo.ai account directly. We strongly advise proxying those requests via your own backend, or using the browser SDK only for internal purposes.
The JS/TS SDK works in all modern browsers and in Node.js >= 18.0.0. We support both CommonJS and ES Modules.
yarn add @charactr/api-sdk
Before you use the SDK, you have to initialize it with your account's credentials. To get them, please log in to the [Gemelo Platform](https://app.gemelo.ai/).
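As a rough sketch of what that initialization might look like (the import name, constructor, and credential field names below are assumptions, so follow the SDK's README for the authoritative setup):

// Hypothetical sketch: the names below are assumptions, not the confirmed SDK API.
import { CharactrAPISDK } from "@charactr/api-sdk";

const sdk = new CharactrAPISDK({
  ClientKey: process.env.GEMELO_CLIENT_KEY!, // keep both keys private
  APIKey: process.env.GEMELO_API_KEY!,
});
// the rest of this guide assumes an `sdk` object initialized like this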
Getting available voices
const voices = await sdk.tts.getVoices();
Refer here for the response type.
You can also browse the voices more conveniently on the [Gemelo Platform](https://app.gemelo.ai/).
Making TTS requests
const result = await sdk.tts.convert(voiceId, "Hello world");
Refer here for the response type.
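If you want to play the converted audio in the browser, something along these lines should work. This is only a sketch: the `audio` field name and the `audio/wav` MIME type are assumptions, so check the response type reference above for the actual shape of `result`.

// Sketch only: `result.audio` and the MIME type are assumptions about the response shape.
const blob = new Blob([result.audio], { type: "audio/wav" });
const url = URL.createObjectURL(blob);
await new Audio(url).play();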
TTS Simplex Streaming
Simplex streaming allows you to send text to the server and receive the converted audio response as a stream over WebSockets. It greatly reduces latency compared to the traditional REST endpoint and it's not limited by the text's length. To use it, call the `sdk.tts.convertStreamSimplex` method:
try {
  await sdk.tts.convertStreamSimplex(voiceId, text, {
    onData: (data: ArrayBuffer) => {
      // do anything with the incoming binary data
    },
  });
} catch (error) {
  console.error(error);
}
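One common way to consume the `onData` callback is to buffer the incoming chunks and assemble them once the call resolves, as in this sketch (the `audio/wav` MIME type is an assumption about the stream's format):

const chunks: ArrayBuffer[] = [];

await sdk.tts.convertStreamSimplex(voiceId, text, {
  onData: (data: ArrayBuffer) => {
    chunks.push(data); // buffer every incoming chunk
  },
});

// assemble the buffered chunks into a single blob once the stream has finished
const audioBlob = new Blob(chunks, { type: "audio/wav" });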
TTS Duplex Streaming
This more advanced technique allows you to start a two-way, WebSockets-based stream, where you can asynchronously send text and receive the converted audio. You control when the stream ends.
To use it, call the `sdk.tts.convertStreamDuplex` method:
async convertStreamDuplex(
  voice: number | Voice,
  cb: TTSStreamDuplexCallbacks
): Promise<TTSStreamDuplex>
You may pass callbacks to know when the stream ends/errors and to receive data:
export interface TTSStreamDuplexCallbacks {
  onData?: (data: ArrayBuffer) => void;
  onClose?: (event: CloseEvent) => void;
}
The method also returns a set of functions to control the stream:
export interface TTSStreamDuplex {
  /**
   * sends the text to be converted into audio to the server
   */
  convert: (text: string) => void;
  /**
   * resolves when there have been 5 seconds of stream inactivity
   */
  wait: () => Promise<void>;
  /**
   * terminates the WebSocket connection immediately;
   * in most use cases we advise using the `close()` method instead
   */
  terminate: () => void;
  /**
   * requests the server to close the connection gracefully
   */
  close: () => void;
}
Example usage:
const stream = await sdk.tts.convertStreamDuplex(voiceId, {
  onData: (data: ArrayBuffer) => {
    // do anything with the incoming binary data
  },
  onClose: (event: CloseEvent) => {
    if (event.code === 1000) {
      // the stream ended successfully
    } else {
      // the stream was closed because of an error
    }
  },
});

stream.convert(text);

stream.close();
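Because a `close()` issued immediately after `convert()` may not leave time for the audio to arrive, a reasonable pattern is to wait for stream inactivity first; this sketch uses only the methods documented above:

const chunks: ArrayBuffer[] = [];

const stream = await sdk.tts.convertStreamDuplex(voiceId, {
  onData: (data: ArrayBuffer) => {
    chunks.push(data);
  },
  onClose: (event: CloseEvent) => {
    console.log("stream closed with code", event.code);
  },
});

stream.convert("First sentence to synthesize.");
stream.convert("Second sentence, sent whenever you like.");

// `wait()` resolves after 5 seconds of stream inactivity,
// so the pending audio should have arrived by then
await stream.wait();
stream.close();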
Getting available voices
const voices = await sdk.vc.getVoices();
Refer here for the response type.
You can also browse the voices more conveniently on the [Gemelo Platform](https://app.gemelo.ai/).
Making VC requests
const inputAudioBlob = new Blob(); // your input audio, e.g. a recording or an uploaded file
const result = await sdk.vc.convert(voiceId, inputAudioBlob);
Refer here for the response type.
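The input audio can be any Blob, for example a file picked by the user in the browser (a File is already a Blob); the `#audio-input` element id in this sketch is just for illustration:

// grab an audio file from a standard <input type="file"> element
const fileInput = document.querySelector<HTMLInputElement>("#audio-input")!;
const inputAudioBlob = fileInput.files![0]; // File extends Blob, so it can be passed directly

const result = await sdk.vc.convert(voiceId, inputAudioBlob);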
The Voice Cloning API allows you to create voice clones from audio, list all created clones, rename and delete them, as well as use them with the TTS & VC endpoints. The only notable difference is that the `voiceType` option needs to be set to `cloned` when making the convert request:
{
  voiceType: "cloned",
}
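Presumably the option is passed alongside the regular convert arguments; treating it as a trailing options parameter, as in this sketch, is an assumption:

// Sketch only: passing the options object as a trailing parameter is an assumption.
const result = await sdk.tts.convert(clonedVoiceId, "Hello world", {
  voiceType: "cloned",
});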
For more, see the relevant code example: Full Voice Cloning Example.
Our SDK comes with a bunch of examples to help you start working with it. Please refer to the SDK's README page if you wish to run them.