Speaker enrollment
Speaker enrollment is a process by which an identity becomes associated with a voice or acoustic signature. This allows RingCentral to include those identities in any reports it generates about the speakers within a media file.
Our Speaker Enrollment API can be used to register speakers and their voices before calling our other speaker-related APIs, such as speaker identification and speaker diarization. The speaker enrollment process is content agnostic: there are no specific requirements or restrictions on what the speaker says in order to generate their audio signature. However, for the best results, follow these guidelines:
- Use audio samples that are 12 to 30 seconds long. Samples longer than 30 seconds may be rejected because the base64-encoded content is too large.
- Ensure the sample contains continuous monologue speech with no silence and, if possible, no background noise.
- Enroll a speaker multiple times using different audio samples that exhibit diversity in the person's speech. After each enrollment, check the enrollment quality in the response. You should expect the quality to be "High".
- If the total speech duration of an enrollment is less than 12 seconds, the enrollment will be treated as incomplete and enrollmentComplete will be set to false.
- Only enrollments with enrollmentComplete set to true will be considered for identification; otherwise an error will be returned.
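The size concern in the first guideline follows from how base64 works: every 3 bytes of input become 4 characters of text, so the encoded content is roughly one third larger than the audio file. The sketch below estimates the encoded length before uploading. The function names and the max_chars parameter are illustrative assumptions; the actual request size limit is not stated here.

```python
import base64
import math


def base64_encoded_length(num_bytes: int) -> int:
    """Length in characters of standard (padded) base64 for num_bytes of input."""
    return 4 * math.ceil(num_bytes / 3)


def fits_within(num_bytes: int, max_chars: int) -> bool:
    """True if the encoded content stays within max_chars (a hypothetical limit)."""
    return base64_encoded_length(num_bytes) <= max_chars


if __name__ == "__main__":
    sample = b"\x00" * 1000  # stand-in for real audio bytes
    # The estimate should match what the standard library actually produces.
    print(base64_encoded_length(len(sample)), len(base64.b64encode(sample)))
```

A check like this can run before the enrollment request, so that an oversized sample is trimmed or re-encoded locally instead of being rejected by the API.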
When enrolling a new speaker, you must specify a unique speakerId (a string of alphabetic, numeric, and underscore characters) as an input. The system registers the speakerId value and associates it with the voice signature in the provided audio file. If the specified speakerId already exists, the system returns a '409 Conflict' error. In that case, update the existing enrollment if the audio was recorded from the same person who was previously enrolled with that speaker id, or specify a new speakerId value if the audio was recorded from a different person.
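Because the id is restricted to alphabetic, numeric, and underscore characters, it can be worth validating a candidate value before sending the request. The following is a minimal sketch: the function name is hypothetical, and it assumes ASCII letters and digits with no length limit, which the text above does not confirm.

```python
import re

# Assumed pattern: one or more ASCII letters, digits, or underscores.
_SPEAKER_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_speaker_id(speaker_id: str) -> bool:
    """Return True if speaker_id uses only the assumed allowed characters."""
    return _SPEAKER_ID_PATTERN.fullmatch(speaker_id) is not None


if __name__ == "__main__":
    for candidate in ["59586_agent", "john doe", "a-b", ""]:
        print(repr(candidate), is_valid_speaker_id(candidate))
```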
A speaker id is returned in the responses of other AI APIs that support speaker identification, such as the speech-to-text API, when the voice in the audio matches the voice signature associated with a specified speakerId (provided that you set the speakerId in the API request). The system does not record any other personal data relating to the speaker. Therefore, it is the developer's responsibility to store the speaker id and associate it with other data that allows the client application to display the speaker's name or other speaker metadata in the final output.
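One way to meet that responsibility is to keep a lookup table from speaker id to display name in your own application and apply it when rendering results. The sketch below is illustrative only: the utterance shape (a list of dictionaries with speakerId and text keys) is a hypothetical stand-in, not the documented response format of any AI API, and the function name is an assumption.

```python
from typing import Dict, List


def label_speakers(utterances: List[dict], names_by_speaker_id: Dict[str, str]) -> List[str]:
    """Replace speaker ids with display names kept in the application's own store."""
    lines = []
    for utterance in utterances:
        speaker_id = str(utterance.get("speakerId", ""))
        # Fall back to a placeholder when the id has no stored metadata.
        name = names_by_speaker_id.get(speaker_id, f"Unknown speaker ({speaker_id})")
        lines.append(f"{name}: {utterance.get('text', '')}")
    return lines


if __name__ == "__main__":
    names = {"59586": "Agent 1200"}  # hypothetical application-side metadata
    sample = [
        {"speakerId": "59586", "text": "Hello"},
        {"speakerId": "777", "text": "Hi"},
    ]
    for line in label_speakers(sample, names):
        print(line)
```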
Note that a speakerId is a unique identifier within a single RingCentral account. Using a person's name as the speaker id in a large RingCentral account might therefore cause name collisions. The best practice for enrolling users under the same RingCentral account is to use each user's extension id, which is unique not only within the account but also across all other RingCentral accounts. A further benefit of using the extension id as the speaker id is that you don't need to store the speaker id with other user metadata in your own database: you can always use the extension id to read the user's metadata (for example, the first and last name) from your RingCentral account and replace the speaker id with the speaker's name. This is also useful when transcribing a RingCentral call recording. When you get the recording URL from a call record in your call log data, the record contains the extension id of the user who was one of the call parties, as in the abridged call record below. In that case, simply specify the extension id in the speakerIds body parameter of the AI API.
{
  "uri": "https://platform.ringcentral.com/restapi/v1.0/account/...",
  "id": "WKt-N7_...",
  "duration": 228,
  "durationMs": 227606,
  "type": "Voice",
  "direction": "Inbound",
  "action": "Phone Call",
  "result": "Accepted",
  "to": {
    "name": "Agent 1200",
    "extensionId": "59586xxxx",
    "extensionNumber": "11120"
  },
  "from": { "name": "...", "phoneNumber": "..." },
  "extension": {
    "uri": "https://platform.ringcentral.com/restapi/v1.0/account/...",
    "id": 59586xxxx
  },
  "recording": {
    "uri": "https://platform.ringcentral.com/restapi/v1.0/account/40119014xxxx/recording/401547458000",
    "id": "401547458000",
    "type": "OnDemand",
    "contentUri": "https://media.ringcentral.com/restapi/v1.0/account/40119014xxxx/recording/401547458000/content"
  }
  ...
}
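Given a call record like the one above, the extension ids can be collected and passed as the speakerIds value. The sketch below is a hedged illustration: the function name is hypothetical, it reads only the fields visible in the record above (to.extensionId, extension.id) plus an assumed from.extensionId for internal callers, and the rest of the AI API request body is omitted.

```python
from typing import List


def speaker_ids_from_call_record(record: dict) -> List[str]:
    """Collect the distinct extension ids found in a call record, as strings."""
    candidates = [
        record.get("to", {}).get("extensionId"),
        record.get("from", {}).get("extensionId"),  # assumed to exist for internal callers
        record.get("extension", {}).get("id"),
    ]
    ids: List[str] = []
    for value in candidates:
        # Skip missing fields and avoid listing the same extension twice.
        if value is not None and str(value) not in ids:
            ids.append(str(value))
    return ids


if __name__ == "__main__":
    record = {
        "to": {"name": "Agent 1200", "extensionId": "595861017"},
        "from": {"name": "Caller", "phoneNumber": "+16505550100"},
        "extension": {"id": 595861017},
    }
    # Only the speakerIds field is shown; other request parameters are omitted.
    print({"speakerIds": speaker_ids_from_call_record(record)})
```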
If you find that speaker identification is unreliable for a given individual, consider augmenting that speaker's enrollment with additional audio files. To re-enroll a speaker, update the enrollment for the existing speaker id.
Sample code
The following sample code shows how to enroll a speaker for a user extension, using the extension id as the speakerId. It checks whether the speaker id exists; if it does, the code updates the enrollment, otherwise it creates a new enrollment.
Running the code
- Edit the variables in ALL CAPS with your app and user credentials before running the code.
- You can only run the code against your production account, which means that you must use production app credentials.
- Also make sure that you have several recordings of your own voice.
const fs = require('fs')
const RC = require('@ringcentral/sdk').SDK

// Instantiate the SDK and get the platform instance
var rcsdk = new RC({
  server: 'https://platform.ringcentral.com',
  clientId: 'RC_APP_CLIENT_ID',
  clientSecret: 'RC_APP_CLIENT_SECRET'
});
var platform = rcsdk.platform();

/* Authenticate a user using a personal JWT token */
platform.login({ jwt: 'RC_USER_JWT' })

platform.on(platform.events.loginSuccess, () => {
  // set your valid audio content file name and path
  let contentFile = "VALID_AUDIO_CONTENT_FILE"
  create_speaker_enrollment(contentFile)
})

platform.on(platform.events.loginError, function(e){
  console.log("Unable to authenticate to platform. Check credentials.", e.message)
  process.exit(1)
});

/*
* Enroll a speaker id
*/
async function create_speaker_enrollment(contentFile) {
  try{
    // use own extension id as a unique enrollment id
    let tokens = await platform.auth().data()
    let enrollmentId = tokens.owner_id
    // read the audio file and encode its content as base64
    const base64data = fs.readFileSync(contentFile, {encoding: 'base64'})
    console.log(base64data.length)
    let endpoint = "/ai/audio/v1/enrollments"
    // check if this speaker id exists
    let enrollment = await read_enrollment(enrollmentId)
    if (enrollment){
      // speaker id exists => update it
      console.log("Existing enrollment", enrollment)
      let bodyParams = {
        encoding: "Mpeg", // Change the encoding if not an MP3 or MP4 file!
        languageCode: "en-US", // Change language code if not English US
        content: base64data
      }
      var resp = await platform.patch(`${endpoint}/${enrollmentId}`, bodyParams)
    }else{
      // speaker id does not exist => enroll a new one
      let bodyParams = {
        encoding: "Mpeg", // Change the encoding if not an MP3 or MP4 file!
        languageCode: "en-US",
        content: base64data,
        enrollmentId: enrollmentId
      }
      var resp = await platform.post(endpoint, bodyParams)
    }
    var jsonObj = await resp.json()
    console.log("New enrollment", jsonObj)
  }catch (e){
    console.log("Unable to enroll speaker identification.", e.message)
  }
}

/*
* Read a speaker id
*/
async function read_enrollment(enrollmentId) {
  try{
    let endpoint = `/ai/audio/v1/enrollments/${enrollmentId}`
    var resp = await platform.get(endpoint)
    var jsonObj = await resp.json()
    return jsonObj
  }catch (e){
    // an error here usually means that the speaker id is not enrolled yet
    console.log("Unable to find this speaker identification.", e.message)
    return null
  }
}

from ringcentral import SDK
import json
import base64

#
# Read a speaker id
#
def read_enrollment(enrollmentId):
    try:
        endpoint = f"/ai/audio/v1/enrollments/{enrollmentId}"
        resp = platform.get(endpoint)
        jsonObj = resp.json_dict()
        return jsonObj
    except Exception as e:
        # an error here usually means that the speaker id is not enrolled yet
        print ("Unable to find this speaker identification. " + str(e))
        return None

#
# Enroll speaker identification
#
def create_speaker_enrollment(contentFile):
    try:
        # use own extension id as a unique enrollment id
        tokens = platform.auth().data()
        enrollmentId = str(tokens['owner_id'])
        # read the audio file and encode its content as base64
        with open(contentFile, "rb") as f:
            base64_bytes = base64.b64encode(f.read())
            base64_string = base64_bytes.decode('utf-8')
        endpoint = '/ai/audio/v1/enrollments'
        # check if this speaker id exists
        enrollmentObj = read_enrollment(enrollmentId)
        if enrollmentObj is not None:
            # speaker id exists => update it
            print ("Existing enrollment")
            print(json.dumps(enrollmentObj, indent=2, sort_keys=True))
            bodyParams = {
                'encoding': "Mpeg",
                'languageCode': "en-US",
                'content': base64_string
            }
            resp = platform.patch(f"{endpoint}/{enrollmentId}", bodyParams)
        else:
            # speaker id does not exist => enroll a new one
            bodyParams = {
                'encoding': "Mpeg",
                'languageCode': "en-US",
                'content': base64_string,
                'enrollmentId': enrollmentId
            }
            resp = platform.post(endpoint, bodyParams)
        jsonObj = resp.json_dict()
        print ("New enrollment")
        print(json.dumps(jsonObj, indent=2, sort_keys=True))
    except Exception as e:
        print ("Unable to enroll speaker identification. " + str(e))

# Authenticate a user using a personal JWT token
def login():
    try:
        platform.login( jwt= "RC_USER_JWT" )
        # set your valid audio content file name and path
        contentFile = "VALID_AUDIO_CONTENT_FILE"
        create_speaker_enrollment(contentFile)
    except Exception as e:
        print ("Unable to authenticate to platform. Check credentials. " + str(e))

# Instantiate the SDK and get the platform instance
rcsdk = SDK("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com")
platform = rcsdk.platform()

login()

<?php
require('vendor/autoload.php');

// Instantiate the SDK and get the platform instance
$rcsdk = new RingCentral\SDK\SDK( 'RC_APP_CLIENT_ID', 'RC_APP_CLIENT_SECRET', 'https://platform.ringcentral.com' );
$platform = $rcsdk->platform();

/* Authenticate a user using a personal JWT token */
$platform->login(["jwt" => 'RC_USER_JWT']);

// For code sample testing purposes, we set the file name in the environment.
// Replace the $contentFile value with your valid audio file!
$contentFile = $_ENV['ENROLLMENT_CONTENT_3'];
create_speaker_enrollment($contentFile);

/*
* Enroll speaker identification
*/
function create_speaker_enrollment($contentFile)
{
  global $platform;
  try{
    // use own extension id as a unique enrollment id
    $tokens = $platform->auth()->data();
    $enrollmentId = $tokens['owner_id'];
    // read the audio file and encode its content as base64
    $content = file_get_contents($contentFile);
    $base64data = base64_encode($content);
    $endpoint = "/ai/audio/v1/enrollments";
    // check if this speaker id exists
    $enrollmentObj = read_enrollment($enrollmentId);
    if ($enrollmentObj){
      // speaker id exists => update it
      print_r ("Existing enrollment");
      print_r (json_encode($enrollmentObj, JSON_PRETTY_PRINT));
      $bodyParams = array (
        'encoding' => "Mpeg",
        'languageCode' => "en-US",
        'content' => $base64data
      );
      $resp = $platform->patch($endpoint . "/" . $enrollmentId, $bodyParams);
    }else{
      // speaker id does not exist => enroll a new one
      $bodyParams = array (
        'encoding' => "Mpeg",
        'languageCode' => "en-US",
        'content' => $base64data,
        'enrollmentId' => $enrollmentId
      );
      $resp = $platform->post($endpoint, $bodyParams);
    }
    print_r ("New enrollment");
    print_r (json_encode($resp->json(), JSON_PRETTY_PRINT));
  }catch (\RingCentral\SDK\Http\ApiException $e) {
    print_r ('Unable to enroll speaker identification. ' . $e->getMessage() . PHP_EOL);
  }
}

/*
* Read a speaker id
*/
function read_enrollment($enrollmentId) {
  global $platform;
  try{
    $endpoint = "/ai/audio/v1/enrollments/" . $enrollmentId;
    $resp = $platform->get($endpoint);
    $jsonObj = $resp->json();
    return $jsonObj;
  }catch (\RingCentral\SDK\Http\ApiException $e) {
    print_r ("Unable to find this speaker identification. " . $e->getMessage() . PHP_EOL);
    return null;
  }
}
?>

require 'ringcentral'
require 'base64'

#
# Read a speaker id
#
def read_enrollment(enrollmentId)
  begin
    endpoint = "/ai/audio/v1/enrollments/" + enrollmentId
    resp = $platform.get(endpoint)
    jsonObj = resp.body
    return jsonObj
  rescue StandardError => e
    puts ("Unable to find this speaker identification. " + e.to_s)
    return nil
  end
end

#
# Enroll speaker identification
#
def create_speaker_enrollment(contentFile)
  begin
    # use own extension id as a unique enrollment id
    tokens = $platform.token
    enrollmentId = tokens['owner_id'].to_s
    # read the audio file in binary mode; binread also closes the file
    contents = File.binread(contentFile)
    # strict_encode64 produces base64 without the line breaks that encode64 inserts
    base64_string = Base64.strict_encode64(contents)
    endpoint = "/ai/audio/v1/enrollments"
    # check if this speaker id exists
    enrollmentObj = read_enrollment(enrollmentId)
    if enrollmentObj != nil
      # speaker id exists => update it
      puts ("Existing enrollment")
      puts (enrollmentObj)
      bodyParams = {
        'encoding': "Mpeg",
        'languageCode': "en-US",
        'content': base64_string
      }
      resp = $platform.patch(endpoint + "/" + enrollmentId, payload: bodyParams)
    else
      # speaker id does not exist => enroll a new one
      bodyParams = {
        'encoding': "Mpeg",
        'languageCode': "en-US",
        'content': base64_string,
        'enrollmentId': enrollmentId
      }
      resp = $platform.post(endpoint, payload: bodyParams)
    end
    jsonObj = resp.body
    puts ("New enrollment")
    puts (jsonObj)
  rescue StandardError => e
    puts ("Unable to enroll speaker identification. " + e.to_s)
  end
end

# Authenticate a user using a personal JWT token
def login()
  begin
    $platform.authorize( jwt: "RC_USER_JWT" )
    # set your valid audio content file name and path
    contentFile = "VALID_AUDIO_CONTENT_FILE"
    create_speaker_enrollment(contentFile)
  rescue StandardError => e
    puts ("Unable to authenticate to platform. Check credentials. " + e.to_s)
  end
end

# Instantiate the SDK and get the platform instance
$platform = RingCentral.new( "RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com" )

login()

using System;
using System.IO;
using System.Threading.Tasks;
using System.Collections.Generic;
using RingCentral;
using Newtonsoft.Json;

namespace SpeakserIdentificationEnrollment {
  class Program {
    static RestClient restClient;

    static async Task Main(string[] args){
      try
      {
        // Instantiate the SDK
        restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");
        // Authenticate a user using a personal JWT token
        await restClient.Authorize("RC_USER_JWT");
        // set your valid audio content file name and path
        var contentFile = "VALID_AUDIO_CONTENT_FILE";
        await create_speaker_enrollment(contentFile);
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to authenticate to platform. Check credentials. " + ex.Message);
      }
    }

    /*
    * Enroll speaker identification
    */
    static private async Task create_speaker_enrollment(String contentFile)
    {
      try
      {
        // use own extension id as a unique enrollment id
        var enrollmentId = restClient.token.owner_id.ToString();
        // read the audio file and encode its content as base64
        var content_bytes = System.IO.File.ReadAllBytes(contentFile);
        var based64_data = System.Convert.ToBase64String(content_bytes);
        // check if this speaker id exists
        var enrollmentObj = await read_enrollment(enrollmentId);
        EnrollmentStatus resp = null;
        if (enrollmentObj != null)
        {
          // speaker id exists => update it
          Console.WriteLine("Existing enrollment");
          Console.WriteLine(JsonConvert.SerializeObject(enrollmentObj));
          var bodyParams = new EnrollmentPatchInput()
          {
            content = based64_data,
            encoding = "Mpeg",
            languageCode = "en-US"
          };
          resp = await restClient.Ai().Audio().V1().Enrollments(enrollmentId).Patch(bodyParams);
        }
        else
        {
          // speaker id does not exist => enroll a new one
          var bodyParams = new EnrollmentInput()
          {
            content = based64_data,
            encoding = "Mpeg",
            languageCode = "en-US",
            enrollmentId = enrollmentId
          };
          resp = await restClient.Ai().Audio().V1().Enrollments().Post(bodyParams);
        }
        Console.WriteLine("New enrollment");
        var jsonStr = JsonConvert.SerializeObject(resp);
        Console.WriteLine(jsonStr);
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to enroll a speaker identification. " + ex.Message);
      }
    }

    // Read a speaker identification
    static private async Task<EnrollmentStatus> read_enrollment(String enrollmentId)
    {
      try
      {
        var resp = await restClient.Ai().Audio().V1().Enrollments(enrollmentId).Get();
        return resp;
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to read a speaker identification. " + ex.Message);
        return null;
      }
    }
  }
}

package SpeakserIdentificationEnrollment;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

import com.google.common.reflect.TypeToken;
import com.google.gson.Gson;
import com.ringcentral.*;
import com.ringcentral.definitions.*;

public class SpeakserIdentificationEnrollment {
  static RestClient restClient;

  public static void main(String[] args) {
    var obj = new SpeakserIdentificationEnrollment();
    try {
      // Instantiate the SDK
      restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");
      // Authenticate a user using a personal JWT token
      restClient.authorize("RC_USER_JWT");
      // set your valid audio content file name and path
      var contentFile = "VALID_AUDIO_CONTENT_FILE";
      obj.create_speaker_enrollment(contentFile);
    } catch (RestException e) {
      System.out.println(e.getMessage());
    } catch (IOException e) {
      e.printStackTrace();
    }
  }

  /*
  * Enroll speaker identification
  */
  // IOException from reading the file or calling the API is handled in main()
  private void create_speaker_enrollment(String contentFile) throws IOException
  {
    try {
      // use own extension id as a unique enrollment id
      var enrollmentId = restClient.token.owner_id.toString();
      // read the audio file and encode its content as base64
      var content_bytes = Files.readAllBytes(Paths.get(contentFile));
      var based64_data = Base64.getEncoder().encodeToString(content_bytes);
      // check if this speaker id exists
      var enrollmentObj = read_enrollment(enrollmentId);
      EnrollmentStatus resp = null;
      if (enrollmentObj != null) {
        // speaker id exists => update it
        System.out.println("Existing enrollment");
        String jsonStr = new Gson().toJson(enrollmentObj, new TypeToken<Object>(){}.getType());
        System.out.println(jsonStr);
        var bodyParams = new EnrollmentPatchInput()
            .content(based64_data)
            .encoding("Mpeg")
            .languageCode("en-US");
        resp = restClient.ai().audio().v1().enrollments(enrollmentId).patch(bodyParams);
      } else {
        // speaker id does not exist => enroll a new one
        var bodyParams = new EnrollmentInput()
            .content(based64_data)
            .encoding("Mpeg")
            .languageCode("en-US")
            .enrollmentId(enrollmentId);
        resp = restClient.ai().audio().v1().enrollments().post(bodyParams);
      }
      System.out.println("New enrollment");
      @SuppressWarnings("serial")
      var jsonStr = new Gson().toJson(resp, new TypeToken<Object>(){}.getType());
      System.out.println(jsonStr);
    } catch (RestException e) {
      System.out.println("Unable to enroll a speaker identification. " + e.getMessage());
    }
  }

  // Read a speaker identification
  private EnrollmentStatus read_enrollment(String enrollmentId) throws RestException, IOException {
    try {
      var resp = restClient.ai().audio().v1().enrollments(enrollmentId).get();
      return resp;
    } catch (RestException e){
      System.out.println("Unable to read a speaker identification. " + e.getMessage());
      return null;
    }
  }
}
Sample response
If your speaker enrollment request is processed successfully, the response payload will resemble the following:
{
  "enrollmentId": "59586xxxx",
  "enrollmentComplete": true,
  "totalSpeechDuration": 28.180000000000001,
  "totalEnrollDuration": 28.0,
  "enrollmentQuality": "Average"
}
| Attribute | Type | Description |
|---|---|---|
| enrollmentId | String | Registered speaker id. |
| enrollmentQuality | String | Quality of the enrollment. Values will be one of: Poor, Average, Good, High. |
| enrollmentComplete | Bool | Status of the enrollment. Will be set to true if the total speech duration exceeds 12 seconds. |
| totalSpeechDuration | Number | Total speech duration of the enrollment, in seconds. |
| totalEnrollDuration | Number | Total duration of the enrollment, in seconds. |
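
The response attributes can drive the decision to enroll another audio sample for the same speaker. The following sketch is a hedged illustration: the function name and the target_quality parameter are hypothetical, and the ranking Poor < Average < Good < High is an assumption based on the order in which the values are listed above.

```python
# Assumed ranking, from lowest to highest quality.
QUALITY_ORDER = ["Poor", "Average", "Good", "High"]


def needs_more_audio(status: dict, target_quality: str = "High") -> bool:
    """Return True if another sample should be enrolled for this speaker."""
    # An incomplete enrollment is not considered for identification at all.
    if not status.get("enrollmentComplete", False):
        return True
    quality = status.get("enrollmentQuality")
    if quality not in QUALITY_ORDER:
        return True  # unknown or missing quality: treat as insufficient
    return QUALITY_ORDER.index(quality) < QUALITY_ORDER.index(target_quality)


if __name__ == "__main__":
    sample_response = {
        "enrollmentId": "59586xxxx",
        "enrollmentComplete": True,
        "totalSpeechDuration": 28.18,
        "totalEnrollDuration": 28.0,
        "enrollmentQuality": "Average",
    }
    print(needs_more_audio(sample_response))
```

With the sample response shown earlier, the check reports that more audio is needed, because the quality is "Average" rather than "High".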