Speaker enrollment

Last updated: 2024-02-08


Speaker enrollment is a process by which an identity becomes associated with a voice or acoustic signature. This allows RingCentral to include those identities in any reports it generates about the speakers within a media file.

Our Speaker Enrollment API can be used to register speakers and their voices before calling our other speaker-related APIs, such as speaker identification and speaker diarization. The speaker enrollment process is content agnostic: there are no specific requirements or restrictions on what the speaker says in order to generate their audio signature. However, for the best results, follow these guidelines:

Use audio samples that are 12-30 seconds long. Samples longer than 30 seconds may be rejected because their base64-encoded content is too large.
Ensure that each sample contains continuous monologue speech, with no silence and, if possible, no background noise.
Enroll a speaker multiple times using different audio samples that show variety in the person's speech. After each enrollment, check the enrollment quality in the response. You should aim for a quality of "High".
If the total speech duration of an enrollment is less than 12 seconds, the enrollment is treated as incomplete and enrollmentComplete is set to false.
Only enrollments with enrollmentComplete set to true are considered for identification; otherwise, an error is returned.
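You can check the length guideline before uploading a sample. The sketch below uses Python's standard `wave` module, so it assumes an uncompressed WAV sample. The sample code in this article sends MP3/MP4 audio as `Mpeg`, and the standard library cannot read the duration of those formats. The function name `check_sample` is hypothetical. The 12 and 30 second bounds come from the guidelines above. The function also reports the base64 length, which is roughly 4/3 of the raw file size. It does not detect silence or background noise.

```python
import base64
import io
import wave

MIN_SECONDS = 12.0  # shorter enrollments are treated as incomplete
MAX_SECONDS = 30.0  # longer samples may be rejected as too large once base64-encoded

def check_sample(wav_bytes: bytes) -> dict:
    """Return the duration (seconds) and base64 length of a WAV sample, plus any warnings.

    Assumes an uncompressed WAV file; this helper is hypothetical, not part of any SDK.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    encoded_len = len(base64.b64encode(wav_bytes))  # 4 * ceil(len / 3) characters
    warnings = []
    if duration < MIN_SECONDS:
        warnings.append(f"sample is {duration:.1f}s; at least {MIN_SECONDS:.0f}s is needed")
    elif duration > MAX_SECONDS:
        warnings.append(f"sample is {duration:.1f}s; more than {MAX_SECONDS:.0f}s may be rejected")
    return {"duration": duration, "base64_length": encoded_len, "warnings": warnings}
```

Run it on each file before enrolling, and skip or trim any sample that produces a warning.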

When enrolling a new speaker, you must specify a unique speakerId (a string of alphabetic, numeric, and underscore characters) as input. In the sample code below, it is sent as the enrollmentId body parameter. The system registers the speakerId value and associates it with the voice signature in the provided audio file. If the specified speakerId already exists, the system returns a 409 Conflict error. In that case, update the existing enrollment if the audio is from the same person who was previously enrolled with that speakerId. If the audio is from a different person, specify a new speakerId instead.
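Two small helpers can capture the speakerId rule and the 409 Conflict handling described above. This is a sketch with some assumptions:

- `is_valid_speaker_id` encodes "alphabetic, numeric, and underscore characters" as a regular expression. It assumes ASCII letters and does not enforce any length limit the API may have.
- `next_step_after_conflict` is a hypothetical helper. It makes no API calls and only decides between updating the enrollment and picking a new id.

```python
import re

# Letters, digits and underscores only (ASCII assumed); at least one character.
_SPEAKER_ID = re.compile(r"[A-Za-z0-9_]+")

def is_valid_speaker_id(speaker_id: str) -> bool:
    """Check a candidate speakerId against the documented character set."""
    return _SPEAKER_ID.fullmatch(speaker_id) is not None

def next_step_after_conflict(status_code: int, same_person: bool) -> str:
    """Decide what to do after an attempt to create a new enrollment (hypothetical helper)."""
    if status_code != 409:
        return "no_conflict"  # the create request was not rejected as a duplicate id
    # 409 Conflict: this speakerId is already enrolled
    return "update_existing" if same_person else "choose_new_id"
```

A name such as "John Smith" fails the check because of the space. This is one more reason to prefer ids such as extension ids.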

Other AI APIs that support speaker identification, such as the speech-to-text API, return a speakerId in their response when speech in the audio matches the voice signature associated with that speakerId. This only happens if you include the speakerId in the API request. The system does not record any other personal data about the speaker. It is therefore the developer's responsibility to store each speakerId and associate it with whatever data the client application needs to display the speaker's name or other metadata in the final output.
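The API stores only the voice signature and its id, so the mapping from speakerId to display data has to live in your application. The sketch below keeps that mapping in a dictionary and replaces ids with names in a list of utterances. `SpeakerDirectory` is a hypothetical class. The utterance shape (`speakerId` and `text` keys) is an assumed stand-in, not the actual response format of the speech-to-text API, so adapt it to the fields your response contains.

```python
from typing import Dict, List

class SpeakerDirectory:
    """Application-side store linking a speakerId to speaker metadata (hypothetical)."""

    def __init__(self) -> None:
        self._people: Dict[str, Dict[str, str]] = {}

    def register(self, speaker_id: str, **metadata: str) -> None:
        # Store whatever metadata your app needs, e.g. name="Alice"
        self._people[speaker_id] = metadata

    def display_name(self, speaker_id: str) -> str:
        # Fall back to the raw id when no name has been stored
        return self._people.get(speaker_id, {}).get("name", speaker_id)

    def label(self, utterances: List[Dict[str, str]]) -> List[str]:
        # Assumed utterance shape: {"speakerId": ..., "text": ...}
        return [f"{self.display_name(u['speakerId'])}: {u['text']}" for u in utterances]
```

In production, the dictionary would typically be replaced by a database table keyed by speakerId.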

Note that a speakerId must be unique within a RingCentral account. Using people's names as speaker ids in a large RingCentral account can therefore lead to name collisions. The best practice for enrolling users under the same RingCentral account is to use each user's extension id, which is unique not only within the account but across all RingCentral accounts. A further benefit of using the extension id as the speaker id is that you don't need to store the speaker id alongside other user metadata in your database. You can always use the extension id to read the user's metadata, e.g. their first and last name, from your RingCentral account and replace the speaker id with the speaker's name.

This is also useful when you transcribe a RingCentral call recording whose URL you obtained from a call record in your call log data. The call record contains the user's extension id as one of the call parties, so you can simply specify that extension id in the speakerIds body parameter of the AI API. For example, in the call record below, the user's extension id appears in the to.extensionId and extension.id fields:

{
  "uri": "https://platform.ringcentral.com/restapi/v1.0/account/...",
  "id": "WKt-N7_...",
  "duration": 228,
  "durationMs": 227606,
  "type": "Voice",
  "direction": "Inbound",
  "action": "Phone Call",
  "result": "Accepted",
  "to": {
    "name": "Agent 1200",
    "extensionId": "59586xxxx",
    "extensionNumber": "11120"
  },
  "from": { "name": "...", "phoneNumber": "..." },
  "extension": {
    "uri": "https://platform.ringcentral.com/restapi/v1.0/account/...",
    "id": 59586xxxx
  },
  "recording": {
    "uri": "https://platform.ringcentral.com/restapi/v1.0/account/40119014xxxx/recording/401547458000",
    "id": "401547458000",
    "type": "OnDemand",
    "contentUri": "https://media.ringcentral.com/restapi/v1.0/account/40119014xxxx/recording/401547458000/content"
  },
  ...
}
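You can collect the extension ids in a call record like the one above and pass them as speakerIds. The sketch below reads the fields visible in that excerpt: `extension.id` and `to.extensionId`. It also checks `from.extensionId`, which is an assumption for calls between two extensions, since the excerpt's `from` party shows only a name and phone number. The function name is hypothetical. It returns the ids as strings because the enrollment samples send the id as a string.

```python
from typing import Any, Dict, List

def speaker_ids_from_call_record(record: Dict[str, Any]) -> List[str]:
    """Collect unique extension ids from a call log record, in order of appearance (hypothetical)."""
    ids: List[str] = []
    candidates = [
        record.get("extension", {}).get("id"),
        record.get("from", {}).get("extensionId"),  # assumption: only present for internal callers
        record.get("to", {}).get("extensionId"),
    ]
    for value in candidates:
        if value is not None and str(value) not in ids:
            ids.append(str(value))
    return ids
```

The recording's `contentUri` from the same record is the media URL you would transcribe.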

If speaker identification is unreliable for a given individual, consider augmenting that speaker's enrollment with additional audio files. To re-enroll a speaker, update the enrollment for their existing speaker id.
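You can automate re-enrollment by updating the same id with one sample at a time and checking the returned quality after each update, as the guidelines recommend. The sketch below makes several assumptions:

- `send_update` is a hypothetical callable that you would implement with the SDK's PATCH request to `/ai/audio/v1/enrollments/{enrollmentId}`, as in the sample code.
- It is assumed to return the enrollment status as a dict containing `enrollmentComplete` and `enrollmentQuality`.
- Stopping once the quality reaches "High" is a design choice, not an API requirement.

```python
from typing import Callable, Dict, Iterable, Optional

def augment_enrollment(
    samples: Iterable[bytes],
    send_update: Callable[[bytes], Dict],
    target_quality: str = "High",
) -> Optional[Dict]:
    """Send samples one by one until the enrollment reaches the target quality."""
    status: Optional[Dict] = None
    for sample in samples:
        status = send_update(sample)  # hypothetical wrapper around the PATCH request
        if status.get("enrollmentComplete") and status.get("enrollmentQuality") == target_quality:
            break  # target reached: skip the remaining samples
    return status  # last status seen, or None if no samples were given
```

Injecting `send_update` keeps the loop independent of any particular SDK. It also lets you test the loop with a fake function.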

Sample code

The following sample code shows how to enroll a speaker for a user extension, using the extension id as the speakerId. It checks whether the speaker id already exists. If it does, the code updates that enrollment; otherwise, it creates a new enrollment.

Running the code

Replace the variables in ALL CAPS with your app and user credentials before running the code.
You can only run this code against your production account, which means that you must use production app credentials.
Also make sure that you have made several recordings of your own voice, because the code enrolls your own extension.
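Instead of editing the placeholders in the source, you may prefer to read them from environment variables so that credentials stay out of your code. In this minimal sketch, the environment variable names reuse the ALL-CAPS placeholder names. That naming is an assumption, not something the SDKs look for on their own. The function `load_settings` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed variable names, mirroring the placeholders in the samples
REQUIRED = ("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "RC_USER_JWT", "VALID_AUDIO_CONTENT_FILE")

def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the sample's placeholder values from the environment, failing fast if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

In the Python sample, you would then pass settings["RC_APP_CLIENT_ID"] and settings["RC_APP_CLIENT_SECRET"] to SDK(...), and settings["RC_USER_JWT"] to platform.login(jwt=...).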

JavaScript

const fs = require ('fs')
const RC = require('@ringcentral/sdk').SDK

// Instantiate the SDK and get the platform instance
var rcsdk = new RC({
    server: 'https://platform.ringcentral.com',
    clientId: 'RC_APP_CLIENT_ID',
    clientSecret: 'RC_APP_CLIENT_SECRET'
});
var platform = rcsdk.platform();

// Once logged in, enroll the speaker
platform.on(platform.events.loginSuccess, () => {
    // set your valid audio content file name and path
    let contentFile = "VALID_AUDIO_CONTENT_FILE"
    create_speaker_enrollment(contentFile)
})

platform.on(platform.events.loginError, function(e){
    console.log("Unable to authenticate to platform. Check credentials.", e.message)
    process.exit(1)
});

/* Authenticate a user using a personal JWT token */
platform.login({ jwt: 'RC_USER_JWT' })

/*
* Enroll a speaker id
*/
async function create_speaker_enrollment(contentFile) {
  try{
    // use own extension id as a unique enrollment id
    let tokens = await platform.auth().data()
    let enrollmentId = tokens.owner_id

    // read the audio file and base64-encode it
    const base64data = fs.readFileSync(contentFile, {encoding: 'base64'})
    const endpoint = "/ai/audio/v1/enrollments"
    let resp

    // check if this speaker id exists
    let enrollment = await read_enrollment(enrollmentId)
    if (enrollment){
      // speaker id exists => update it
      console.log("Existing enrollment", enrollment)
      let bodyParams = {
              encoding: "Mpeg", // Change the encoding if not an MP3 or MP4 file!
              languageCode: "en-US", // Change language code if not English US
              content: base64data
            }
      resp = await platform.patch(`${endpoint}/${enrollmentId}`, bodyParams)
    }else{
      // speaker id does not exist => enroll a new one
      let bodyParams = {
              encoding: "Mpeg", // Change the encoding if not an MP3 or MP4 file!
              languageCode: "en-US",
              content: base64data,
              enrollmentId: enrollmentId
            }
      resp = await platform.post(endpoint, bodyParams)
    }
    let jsonObj = await resp.json()
    console.log("New enrollment", jsonObj)
  }catch (e){
    console.log("Unable to enroll speaker identification.", e.message)
  }
}

/*
* Read a speaker id
*/
async function read_enrollment(enrollmentId) {
  try{
    let endpoint = `/ai/audio/v1/enrollments/${enrollmentId}`
    var resp = await platform.get(endpoint)
    var jsonObj = await resp.json()
    return jsonObj
  }catch (e){
    console.log("Unable to find this speaker identification.", e.message)
    return null
  }
}

Python

from ringcentral import SDK
import json
import base64

#
# Read a speaker id
#
def read_enrollment(enrollmentId):
    try:
        endpoint = f"/ai/audio/v1/enrollments/{enrollmentId}"
        resp = platform.get(endpoint)
        jsonObj = resp.json_dict()
        return jsonObj
    except Exception as e:
        print ("Unable to find this speaker identification. " + str(e))
        return None


#
# Enroll speaker identification
#
def create_speaker_enrollment(contentFile):
    try:
        # use own extension id as a unique enrollment id
        tokens = platform.auth().data()
        enrollmentId = str(tokens['owner_id'])

        with open(contentFile, "rb") as f:
            base64_bytes = base64.b64encode(f.read())
        base64_string = base64_bytes.decode('utf-8')

        endpoint = '/ai/audio/v1/enrollments'

        # check if this speaker id exists
        enrollmentObj = read_enrollment(enrollmentId)
        if enrollmentObj is not None:
            # speaker id exists => update it
            print ("Existing enrollment")
            print(json.dumps(enrollmentObj, indent=2, sort_keys=True))
            bodyParams = {
                'encoding': "Mpeg",
                'languageCode': "en-US",
                'content': base64_string
            }
            resp = platform.patch(f"{endpoint}/{enrollmentId}", bodyParams)
        else:
            # speaker id does not exist => enroll a new one
            bodyParams = {
                'encoding': "Mpeg",
                'languageCode': "en-US",
                'content': base64_string,
                'enrollmentId': enrollmentId
            }
            resp = platform.post(endpoint, bodyParams)

        jsonObj = resp.json_dict()
        print ("New enrollment")
        print(json.dumps(jsonObj, indent=2, sort_keys=True))
    except Exception as e:
        print ("Unable to enroll speaker identification. " + str(e))


# Authenticate a user using a personal JWT token
def login():
    try:
        platform.login(jwt="RC_USER_JWT")
        # set your valid audio content file name and path
        contentFile = "VALID_AUDIO_CONTENT_FILE"
        create_speaker_enrollment(contentFile)
    except Exception as e:
        print ("Unable to authenticate to platform. Check credentials. " + str(e))

# Instantiate the SDK and get the platform instance
rcsdk = SDK("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com")
platform = rcsdk.platform()

login()

PHP

<?php
require('vendor/autoload.php');

// Instantiate the SDK and get the platform instance
$rcsdk = new RingCentral\SDK\SDK( 'RC_APP_CLIENT_ID', 'RC_APP_CLIENT_SECRET', 'https://platform.ringcentral.com' );
$platform = $rcsdk->platform();

/* Authenticate a user using a personal JWT token */
$platform->login(["jwt" => 'RC_USER_JWT']);
// Set your valid audio content file name and path
$contentFile = "VALID_AUDIO_CONTENT_FILE";
create_speaker_enrollment($contentFile);

/*
* Enroll speaker identification
*/
function create_speaker_enrollment($contentFile)
{
  global $platform;
  try{
    // use own extension id as a unique enrollment id
    $tokens = $platform->auth()->data();
    $enrollmentId = $tokens['owner_id'];

    $content =  file_get_contents($contentFile);
    $base64data = base64_encode($content);

    $endpoint = "/ai/audio/v1/enrollments";

    // check if this speaker id exists
    $enrollmentObj = read_enrollment($enrollmentId);
    if ($enrollmentObj){
      // speaker id exists => update it
      print_r ("Existing enrollment");
      print_r (json_encode($enrollmentObj, JSON_PRETTY_PRINT));
      $bodyParams = array (
        'encoding' => "Mpeg",
        'languageCode' => "en-US",
        'content' => $base64data
      );
      $resp = $platform->patch($endpoint . "/" . $enrollmentId, $bodyParams);
    }else{
      // speaker id does not exist => enroll a new one
      $bodyParams = array (
        'encoding' => "Mpeg",
        'languageCode' => "en-US",
        'content' => $base64data,
        'enrollmentId' => $enrollmentId
      );
      $resp = $platform->post($endpoint, $bodyParams);
    }
    print_r ("New enrollment");
    print_r (json_encode($resp->json(), JSON_PRETTY_PRINT));
  }catch (\RingCentral\SDK\Http\ApiException $e) {
    print_r ('Unable to enroll speaker identification. ' . $e->getMessage() . PHP_EOL);
  }
}

/*
* Read a speaker id
*/
function read_enrollment($enrollmentId) {
  global $platform;
  try{
    $endpoint = "/ai/audio/v1/enrollments/" .$enrollmentId;
    $resp = $platform->get($endpoint);
    $jsonObj = $resp->json();
    return $jsonObj;
  }catch (\RingCentral\SDK\Http\ApiException $e) {
    print_r ("Unable to find this speaker identification." . $e->getMessage() . PHP_EOL);
    return null;
  }
}
?>

Ruby

require 'ringcentral'
require 'base64'

#
# Read a speaker id
#
def read_enrollment(enrollmentId)
    begin
        endpoint = "/ai/audio/v1/enrollments/" + enrollmentId
        resp = $platform.get(endpoint)
        jsonObj = resp.body
        return jsonObj
    rescue StandardError => e
        puts ("Unable to find this speaker identification. " + e.to_s)
        return nil
    end
end

#
# Enroll speaker identification
#
def create_speaker_enrollment(contentFile)
    begin
        # use own extension id as a unique enrollment id
        tokens = $platform.token
        enrollmentId = tokens['owner_id'].to_s

        # read the audio file and base64-encode it without inserted line breaks
        contents = File.binread(contentFile)
        base64_string = Base64.strict_encode64(contents)

        endpoint = "/ai/audio/v1/enrollments"

        # check if this speaker id exists
        enrollmentObj = read_enrollment(enrollmentId)
        if enrollmentObj != nil
            # speaker id exists => update it
            puts ("Existing enrollment")
            puts (enrollmentObj)
            bodyParams = {
                  'encoding': "Mpeg",
                  'languageCode': "en-US",
                  'content': base64_string
                }
            resp = $platform.patch(endpoint + "/" + enrollmentId, payload: bodyParams)
        else
            # speaker id does not exist => enroll a new one
            bodyParams = {
                  'encoding': "Mpeg",
                  'languageCode': "en-US",
                  'content': base64_string,
                  'enrollmentId': enrollmentId
                }
            resp = $platform.post(endpoint, payload: bodyParams)
        end

        jsonObj = resp.body
        puts ("New enrollment")
        puts (jsonObj)
    rescue StandardError => e
      puts ("Unable to enroll speaker identification. " + e.to_s)
    end
end

# Authenticate a user using a personal JWT token
def login()
  begin
    $platform.authorize( jwt: "RC_USER_JWT" )
    # set your valid audio content file name and path
    contentFile = "VALID_AUDIO_CONTENT_FILE"
    create_speaker_enrollment(contentFile)
  rescue StandardError => e
    puts ("Unable to authenticate to platform. Check credentials. " + e.to_s)
  end
end

# Instantiate the SDK and get the platform instance
$platform = RingCentral.new( "RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com" )

login()

C#

using System;
using System.IO;
using System.Threading.Tasks;
using System.Collections.Generic;
using RingCentral;
using Newtonsoft.Json;

namespace SpeakerIdentificationEnrollment {
  class Program {
    static RestClient restClient;
    static async Task Main(string[] args){
      try
      {
        // Instantiate the SDK
        restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");

        // Authenticate a user using a personal JWT token
        await restClient.Authorize("RC_USER_JWT");

        // set your valid audio content file name and path
        var contentFile = "VALID_AUDIO_CONTENT_FILE";
        await create_speaker_enrollment(contentFile);
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to authenticate to platform. Check credentials. " + ex.Message);
      }
    }
    /*
    * Enroll speaker identification
    */
    static private async Task create_speaker_enrollment(String contentFile)
    {
      try
      {
        // use own extension id as a unique enrollment id
        var enrollmentId = restClient.token.owner_id.ToString();

        var content_bytes = System.IO.File.ReadAllBytes(contentFile);
        var based64_data =  System.Convert.ToBase64String(content_bytes);

        // check if this speaker id exists
        var enrollmentObj = await read_enrollment(enrollmentId);
        EnrollmentStatus resp = null;

        if (enrollmentObj != null)
        {
          // speaker id exists => update it
          Console.WriteLine("Existing enrollment");
          Console.WriteLine(JsonConvert.SerializeObject(enrollmentObj));
          var bodyParams = new EnrollmentPatchInput()
                          {
                            content = based64_data,
                            encoding = "Mpeg",
                            languageCode = "en-US"
                          };
          resp = await restClient.Ai().Audio().V1().Enrollments(enrollmentId).Patch(bodyParams);
        }
        else
        {
          // speaker id does not exist => enroll a new one
          var bodyParams = new EnrollmentInput()
                          {
                            content = based64_data,
                            encoding = "Mpeg",
                            languageCode = "en-US",
                            enrollmentId = enrollmentId
                          };
          resp = await restClient.Ai().Audio().V1().Enrollments().Post(bodyParams);
        }

        Console.WriteLine("New enrollment");
        var jsonStr = JsonConvert.SerializeObject(resp);
        Console.WriteLine(jsonStr);
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to enroll a speaker identification. " + ex.Message);
      }
    }
    // Read a speaker identification
    static private async Task<EnrollmentStatus> read_enrollment(String enrollmentId)
    {
      try
      {
        var resp = await restClient.Ai().Audio().V1().Enrollments(enrollmentId).Get();
        return resp;
      }
      catch (Exception ex)
      {
        Console.WriteLine("Unable to read a speaker identification. " + ex.Message);
        return null;
      }
    }
  }
}

Java

package SpeakerIdentificationEnrollment;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import com.google.gson.Gson;

import com.ringcentral.*;
import com.ringcentral.definitions.*;

public class SpeakerIdentificationEnrollment {
    static RestClient restClient;

    public static void main(String[] args) {
      var obj = new SpeakerIdentificationEnrollment();
      try {
        // Instantiate the SDK
        restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");

        // Authenticate a user using a personal JWT token
        restClient.authorize("RC_USER_JWT");

        // set your valid audio content file name and path
        var contentFile = "VALID_AUDIO_CONTENT_FILE";
        obj.create_speaker_enrollment(contentFile);

      } catch (RestException e) {
        System.out.println(e.getMessage());
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
    /*
    * Enroll speaker identification
    */
    private void create_speaker_enrollment(String contentFile)
    {
      try {
        // use own extension id as a unique enrollment id
        var enrollmentId = restClient.token.owner_id.toString();

        // read the audio file and base64-encode it
        var content_bytes = Files.readAllBytes(Paths.get(contentFile));
        var base64_data = Base64.getEncoder().encodeToString(content_bytes);

        // check if this speaker id exists
        var enrollmentObj = read_enrollment(enrollmentId);
        EnrollmentStatus resp = null;

        if (enrollmentObj != null) {
          // speaker id exists => update it
          System.out.println("Existing enrollment");
          System.out.println(new Gson().toJson(enrollmentObj));
          var bodyParams = new EnrollmentPatchInput()
                .content(base64_data)
                .encoding("Mpeg")
                .languageCode("en-US");
          resp = restClient.ai().audio().v1().enrollments(enrollmentId).patch(bodyParams);
        } else {
          // speaker id does not exist => enroll a new one
          var bodyParams = new EnrollmentInput()
                .content(base64_data)
                .encoding("Mpeg")
                .languageCode("en-US")
                .enrollmentId(enrollmentId);
          resp = restClient.ai().audio().v1().enrollments().post(bodyParams);
        }

        System.out.println("New enrollment");
        System.out.println(new Gson().toJson(resp));
      } catch (RestException e) {
        System.out.println("Unable to enroll a speaker identification. " + e.getMessage());
      } catch (IOException e) {
        // Files.readAllBytes and the SDK calls can also throw IOException
        System.out.println("Unable to read the audio file or send the request. " + e.getMessage());
      }
    }
    // Read a speaker identification
    private EnrollmentStatus read_enrollment(String enrollmentId) throws RestException, IOException {
      try {
        var resp = restClient.ai().audio().v1().enrollments(enrollmentId).get();
        return resp;
      } catch (RestException e){
        System.out.println("Unable to read a speaker identification. " + e.getMessage());
        return null;
      }
    }
}

Sample response

If your speaker enrollment request is processed successfully, the response payload will resemble the following:

{
    "enrollmentId": "59586xxxx",
    "enrollmentComplete": true,
    "totalSpeechDuration": 28.180000000000001,
    "totalEnrollDuration": 28.0,
    "enrollmentQuality": "Average"
}

Attribute	Type	Description
`enrollmentId`	String	Registered speaker id.
`enrollmentQuality`	String	Quality of the enrollment. Values will be one of: `Poor`, `Average`, `Good`, `High`.
`enrollmentComplete`	Boolean	Status of the enrollment. Set to `true` if the total speech duration is at least 12 seconds.
`totalSpeechDuration`	Number	Total duration of speech in the enrollment, in seconds.
`totalEnrollDuration`	Number	Total duration of the enrollment audio, in seconds.
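When you process the response programmatically, it can help to convert it into a typed object and rank the quality values. This is a sketch with some assumptions:

- The `EnrollmentResult` class is hypothetical.
- The ordering Poor < Average < Good < High is assumed from the order in which the table lists the values.
- Durations are assumed to be in seconds.

The field names come from the sample response above.

```python
from dataclasses import dataclass

QUALITY_ORDER = ["Poor", "Average", "Good", "High"]  # assumed ascending order

@dataclass
class EnrollmentResult:
    enrollment_id: str
    complete: bool
    speech_seconds: float  # totalSpeechDuration (assumed seconds)
    enroll_seconds: float  # totalEnrollDuration (assumed seconds)
    quality: str

    @classmethod
    def from_json(cls, data: dict) -> "EnrollmentResult":
        """Build a result from the parsed JSON response body."""
        return cls(
            enrollment_id=str(data["enrollmentId"]),
            complete=bool(data["enrollmentComplete"]),
            speech_seconds=float(data["totalSpeechDuration"]),
            enroll_seconds=float(data["totalEnrollDuration"]),
            quality=data["enrollmentQuality"],
        )

    def meets(self, minimum: str = "Good") -> bool:
        """True if the enrollment is complete and at least the given quality."""
        return self.complete and QUALITY_ORDER.index(self.quality) >= QUALITY_ORDER.index(minimum)
```

For the sample response above, `meets("Good")` returns False because the quality is "Average". That result suggests enrolling additional samples.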