Speaker enrollment
Speaker enrollment is the process by which an identity becomes associated with a voice or acoustic signature. This allows RingCentral to include those identities in any reports it generates about the speakers within a media file.
Our Speaker Enrollment API can be used to register speakers and their voices before calling our other speaker-related APIs, such as speaker identification and speaker diarization. The speaker enrollment process is content agnostic, meaning there are no specific requirements or restrictions on what the speaker says in order to generate their audio signature. However, for the best results, follow these guidelines:
- Use audio samples that are 12-24 seconds in length. Samples shorter than six seconds or longer than 30 seconds will be rejected.
- Ensure each sample includes continuous speech with no silence and, if possible, no background noise.
- Enroll a speaker using three to five audio samples that exhibit diversity in the person's speech, and do not use multiple samples from the same audio recording.
- If the total speech duration of an enrollment is less than 12 seconds, the enrollment will be treated as incomplete and enrollmentComplete will be set to false.
- Enrollments with status enrollmentComplete=True will be considered for identification; otherwise an error will be returned.
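The duration guidelines above can be checked on the client before any audio is uploaded. The following is a minimal sketch, not part of the RingCentral SDK: the helper names checkSampleDuration and reviewSamples are hypothetical, and it assumes you already know each sample's length in seconds.

```javascript
// Thresholds taken from the guidelines above (all values in seconds).
const MIN_SAMPLE_SECONDS = 6;           // shorter samples are rejected
const MAX_SAMPLE_SECONDS = 30;          // longer samples are rejected
const RECOMMENDED_MIN_SECONDS = 12;
const RECOMMENDED_MAX_SECONDS = 24;

// Hypothetical helper: classify a single sample by its duration.
function checkSampleDuration(seconds) {
  if (seconds < MIN_SAMPLE_SECONDS || seconds > MAX_SAMPLE_SECONDS) {
    return 'rejected';
  }
  if (seconds >= RECOMMENDED_MIN_SECONDS && seconds <= RECOMMENDED_MAX_SECONDS) {
    return 'recommended';
  }
  return 'accepted';
}

// Hypothetical helper: review the full set of samples planned for one speaker.
function reviewSamples(durations) {
  const warnings = [];
  if (durations.length < 3 || durations.length > 5) {
    warnings.push('Expected three to five samples, got ' + durations.length);
  }
  durations.forEach((seconds, index) => {
    const verdict = checkSampleDuration(seconds);
    if (verdict !== 'recommended') {
      warnings.push('Sample ' + (index + 1) + ' (' + seconds + 's) is ' + verdict);
    }
  });
  // The set is usable only if no individual sample would be rejected.
  const ok = durations.every((s) => checkSampleDuration(s) !== 'rejected');
  return { ok, warnings };
}

console.log(reviewSamples([15, 20, 4]));
```

This check covers sample length and count only. It cannot tell whether the samples come from different recordings or contain silence, so those guidelines still need to be verified separately.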
Each enrolled speaker is given a unique speakerId. The system does not record any other personal data relating to the speaker. It is therefore the developer's responsibility to store a speaker's speakerId and associate it with other data that will allow the client application to display the speaker's name or other speaker metadata in its final output.
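One way to keep that association is a lookup table in your own application, keyed by speakerId. The sketch below uses an in-memory Map purely for illustration; the function names, the profile fields, and the ID value are assumptions, and a real application would more likely persist this data in its own database.

```javascript
// Application-side directory: speakerId -> data your app needs for display.
// RingCentral stores none of this; it lives entirely in your application.
const speakerDirectory = new Map();

// Hypothetical helper: remember the profile that belongs to a speakerId.
function rememberSpeaker(speakerId, profile) {
  speakerDirectory.set(speakerId, { ...profile });
}

// Hypothetical helper: resolve a speakerId to a display name,
// falling back to the raw ID when the speaker is unknown to the app.
function displayName(speakerId) {
  const profile = speakerDirectory.get(speakerId);
  return profile ? profile.name : 'Unknown speaker (' + speakerId + ')';
}

// Illustrative ID and profile only.
rememberSpeaker('speaker-001', { name: 'Ada Example', team: 'Support' });
console.log(displayName('speaker-001'));
console.log(displayName('speaker-999'));
```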
If you find that speaker identification is unreliable for a given individual, consider augmenting that speaker's enrollment with additional audio files. A speaker is re-enrolled by updating their existing enrollment.
Register a voice-print by enrolling speakers
After you have set up a simple web server to process the response, copy and paste the code below into index.js and edit the variables in ALL CAPS so that your code runs properly.
const RC = require('@ringcentral/sdk').SDK;
require('dotenv').config();

// Media file to enroll, and the webhook that will receive the result
const MEDIA_URL = process.env.RC_MEDIA_URL;
const WEBHOOK_URL = '<INSERT YOUR WEBHOOK URL>';

// Initialize the RingCentral SDK and Platform
const rcsdk = new RC({
  'server': process.env.RC_SERVER_URL,
  'clientId': process.env.RC_CLIENT_ID,
  'clientSecret': process.env.RC_CLIENT_SECRET
});
const platform = rcsdk.platform();

// Authenticate with the RingCentral Developer Platform using the developer's JWT credential
platform.login({
  'jwt': process.env.RC_JWT
});

// Call the Speaker Enrollment API as soon as login succeeds
platform.on(platform.events.loginSuccess, () => {
  enrollSpeaker();
});

async function enrollSpeaker() {
  try {
    console.log("Enrolling speaker using RingCentral Enrollment API");
    // URL-encode the webhook so it survives being passed as a query parameter
    let resp = await platform.post("/ai/audio/v1/enrollments?webhook=" + encodeURIComponent(WEBHOOK_URL), {
      "contentUri": MEDIA_URL,
      "encoding": "Mpeg",
      "languageCode": "en-US",
      "enrollmentId": "manish3"
    });
    console.log("Job is " + resp.statusText + " with HTTP status code " + resp.status);
  }
  catch (e) {
    console.log("An error occurred : " + e.message);
  }
}
Run your sample code.
$ node index.js
Sample response
When RingCentral is done processing your request, it will post the response back to the webhook URL you specified. The payload of that callback will resemble the following:
{
  "enrollmentId": "manish3",
  "enrollmentComplete": false,
  "totalSpeechDuration": 8.180000000000001,
  "totalEnrollDuration": 18.0,
  "enrollmentQuality": "Poor"
}
Parameter | Type | Description |
---|---|---|
speakerId | String | Registered speaker ID. |
enrollmentQuality | String | Quality of the enrollment. Values will be one of: Poor, Average, Good, High. |
enrollmentComplete | Bool | Status of the enrollment. Will be set to True once the total speech duration is at least 12 seconds. |
totalSpeechDuration | Number | Total speech duration of the enrollment, in seconds. |
totalEnrollDuration | Number | Total duration of the enrollment. |
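Once the callback arrives, your webhook handler can decide whether the speaker is ready for identification or needs more audio. The following is a minimal sketch that works on the already-parsed JSON payload shown above. The function name summarizeEnrollment and the shape of its return value are assumptions for illustration, and the 12-second threshold is taken from the enrollment guidelines.

```javascript
// Speech required before an enrollment is treated as complete (seconds).
const REQUIRED_SPEECH_SECONDS = 12;

// Hypothetical helper: turn a callback payload into a simple status summary.
function summarizeEnrollment(payload) {
  const speech = Number(payload.totalSpeechDuration) || 0;
  const ready = payload.enrollmentComplete === true;
  // How much more speech is needed before the enrollment can complete.
  const missingSeconds = ready ? 0 : Math.max(0, REQUIRED_SPEECH_SECONDS - speech);
  return {
    enrollmentId: payload.enrollmentId,
    ready,
    quality: payload.enrollmentQuality,
    missingSeconds
  };
}

// The sample payload from above.
const summary = summarizeEnrollment({
  enrollmentId: 'manish3',
  enrollmentComplete: false,
  totalSpeechDuration: 8.180000000000001,
  totalEnrollDuration: 18.0,
  enrollmentQuality: 'Poor'
});

if (summary.ready) {
  console.log(summary.enrollmentId + ' is ready for identification');
} else {
  console.log(summary.enrollmentId + ' needs about ' +
    summary.missingSeconds.toFixed(1) + ' more seconds of speech');
}
```

For the sample payload, the enrollment is incomplete, so the speaker would need additional audio before being used for identification.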