Extract interaction analytics from a media file
Interaction analytics is used to understand a conversation between two or more people in a meeting and to extract meaningful insights from it at scale. This API is comprehensive in that, in addition to its unique capabilities, it also bundles functionality found in our other APIs. When processing a media file, this API provides multiple levels of insights, including:
- Conversation insights
    - transcription with smart punctuation
    - content summaries
    - keywords and conversation metrics
- Speaker-level insights
- Utterance-level insights
    - emotion recognition
Let's say we want to analyze a meeting between a sales representative and a customer, and that meeting lasted for twenty minutes. Here are some of the insights we can extract using this API:
- Speaker talking time, e.g. a sales representative spoke for ten minutes, and the customer spoke for eight minutes.
- Speaker pace, measured as the average number of words spoken per minute.
- Speaker emotions, i.e. the tone or emotional context of every utterance.
- An auto-generated meeting summary.
Extracting interaction analytics
For the best results, we recommend following the guidelines below; a request body reflecting these recommendations is sketched after the list.

- The `audioType` parameter provides the system with a hint about the nature of the meeting, which helps improve accuracy. We recommend setting this parameter to `CallCenter` when 2-3 speakers are expected to be identified and to `Meeting` when 4 or more speakers are expected.
- Set the `enableVoiceActivityDetection` parameter to `True` if you want silence and noise segments removed from the diarization output. We suggest setting it to `True` in most circumstances.
- Setting the `source` parameter helps optimize the diarization process by selecting a specialized acoustic model built specifically for the corresponding audio source.
- If you specify the `speakerIds` parameter, make sure that all the speaker ids in the array exist; otherwise, the API call will fail. As a good practice, read the speaker ids from your account and use the correct ids of the speakers you expect to hear in the audio file.
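For example, a request body for a web meeting with four or more participants recorded through RingCentral might look like the sketch below. The `contentUri` and speaker id are placeholders, not values from the API.

```javascript
// Sketch only: a request body applying the guidelines above.
// Replace the placeholder contentUri with a URL the service can download.
let bodyParams = {
    contentUri: "PUBLICLY-ACCESSIBLE-CONTENT-URI",
    encoding: "Mpeg",
    languageCode: "en-US",
    source: "RingCentral",               // lets the service pick a matching acoustic model
    audioType: "Meeting",                // 4 or more speakers expected
    enableVoiceActivityDetection: true,  // drop silence and noise from the diarization output
    insights: [ "All" ]
    // speakerIds: [ "EXISTING-SPEAKER-ID" ]  // only include ids that exist in your account
};
```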
Request body parameters
Parameter | Type | Description |
---|---|---|
`encoding` | String | Encoding of the audio file, e.g. MP3, WAV, etc. |
`sampleRate` | Number | Sample rate of the audio file. Optional. |
`languageCode` | String | Language spoken in the audio file. Default: "en-US". |
`separateSpeakerPerChannel` | Boolean | Set to `True` if the input audio is multi-channel and each channel has a separate speaker. Optional. Default: `False`. |
`speakerCount` | Number | Number of speakers in the file. Optional. |
`audioType` | String | Type of the audio based on the number of speakers. Optional. Permitted values: `CallCenter`, `Meeting`, `EarningsCalls`, `Interview`, `PressConference`, `Voicemail`. |
`speakerIds` | List[String] | Set of speakers to be identified from the audio. Optional. |
`enableVoiceActivityDetection` | Boolean | Apply voice activity detection. Optional. Default: `False`. |
`contentUri` | String | Publicly accessible URL of the media file to be analyzed. |
`source` | String | Source of the audio file, e.g. `Phone`, `RingCentral`, `GoogleMeet`, `Zoom`, etc. Optional. |
`insights` | List[String] | List of insights to be returned. Specify `["All"]` to extract all insight analytics. Permitted values: `All`, `KeyPhrases`, `Emotion`, `AbstractiveSummaryLong`, `AbstractiveSummaryShort`, `ExtractiveSummary`, `TalkToListenRatio`, `Energy`, `Pace`, `QuestionsAsked`, `Topics`. |
`speechContexts` | List[Phrase Object] | Words and phrases used to boost transcription accuracy, for example person names or company names. |
Sample code to extract insights of a conversation
The following code sample shows how to extract insights from a conversation in a call recording.
Follow the instructions in the quick start section to set up and run your server code before running the sample code below.
Running the code
- Edit the variables in ALL CAPS with your app and user credentials before running the code.
- You can only run this code against your production account, which means that you have to use app credentials for production.
- Also make sure that you have made several recordings of your own voice.
const fs = require ('fs')
const RC = require('@ringcentral/sdk').SDK
// Instantiate the SDK and get the platform instance
var rcsdk = new RC({
server: 'https://platform.ringcentral.com',
clientId: 'RC_APP_CLIENT_ID',
clientSecret: 'RC_APP_CLIENT_SECRET'
});
var platform = rcsdk.platform();
/* Authenticate a user using a personal JWT token */
platform.login({ jwt: 'RC_USER_JWT' })
platform.on(platform.events.loginSuccess, () => {
NGROK = "NGROK-TUNNEL-ADDRESS"
WEBHOOK_URL = NGROK + "/webhook";
CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI"
analyze_interaction()
})
platform.on(platform.events.loginError, function(e){
console.log("Unable to authenticate to platform. Check credentials.", e.message)
process.exit(1)
});
/*
* Transcribe a call recording and analyze interaction
*/
async function analyze_interaction() {
try {
let bodyParams = {
contentUri: CONTENT_URI,
encoding: "Mpeg",
languageCode: "en-US",
source: "RingCentral",
audioType: "Meeting",
insights: [ "All" ],
enableVoiceActivityDetection: true,
separateSpeakerPerChannel: true
}
let endpoint = `/ai/insights/v1/async/analyze-interaction?webhook=${WEBHOOK_URL}`
let resp = await platform.post(endpoint, bodyParams);
let jsonObj = await resp.json();
if (resp.status == 202) {
console.log("Job ID: " + jsonObj.jobId);
console.log("Ready to receive response at: " + WEBHOOK_URL);
}
} catch (e) {
console.log(`Unable to call this API. ${e.message}`);
}
}
from ringcentral import SDK
import os,sys,urllib.parse,json
NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS"
WEBHOOK_URL = NGROK_ADDRESS + "/webhook";
CONTENT_URI = 'PUBLICLY-ACCESSIBLE-CONTENT-URI'
#
# Transcribe a call recording and analyze interaction
#
def analyze_interaction():
try:
bodyParams = {
'contentUri': CONTENT_URI,
'encoding': "Mpeg",
'languageCode': "en-US",
'source': "RingCentral",
'audioType': "CallCenter",
'insights': [ "All" ],
'enableVoiceActivityDetection': True,
'separateSpeakerPerChannel': True
}
endpoint = f'/ai/insights/v1/async/analyze-interaction?webhook={urllib.parse.quote(WEBHOOK_URL)}'
resp = platform.post(endpoint, bodyParams)
jsonObj = resp.json()
if resp.response().status_code == 202:
print(f'Job ID: {resp.json().jobId}');
print(f'Ready to receive response at: {WEBHOOK_URL}');
except Exception as e:
print ("Unable to analyze interaction. " + str(e))
# Authenticate a user using a personal JWT token
def login():
try:
platform.login( jwt= "RC_USER_JWT" )
analyze_interaction()
except Exception as e:
print ("Unable to authenticate to platform. Check credentials. " + str(e))
# Instantiate the SDK and get the platform instance
rcsdk = SDK("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com")
platform = rcsdk.platform()
login()
<?php
require('vendor/autoload.php');
// Instantiate the SDK and get the platform instance
$rcsdk = new RingCentral\SDK\SDK( 'RC_APP_CLIENT_ID', 'RC_APP_CLIENT_SECRET', 'https://platform.ringcentral.com' );
$platform = $rcsdk->platform();
/* Authenticate a user using a personal JWT token */
$platform->login(["jwt" => 'RC_USER_JWT']);
$NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS";
$WEBHOOK_URL = $NGROK_ADDRESS . "/webhook";
$CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI";
analyze_interaction();
/*
* Transcribe a call recording and analyze interaction
*/
function analyze_interaction()
{
global $platform, $WEBHOOK_URL, $CONTENT_URI;
try {
$bodyParams = array (
'contentUri' => $CONTENT_URI,
'encoding' => "Mpeg",
'languageCode' => "en-US",
'source' => "RingCentral",
'audioType' => "CallCenter",
'insights' => array ( "All" ),
'enableVoiceActivityDetection' => True,
'separateSpeakerPerChannel' => True
);
$endpoint = "/ai/insights/v1/async/analyze-interaction?webhook=" . urlencode($WEBHOOK_URL);
$resp = $platform->post($endpoint, $bodyParams);
$jsonObj = $resp->json();
if ($resp->response()->getStatusCode() == 202) {
print_r ("Job ID: " . $jsonObj->jobId . PHP_EOL);
print_r("Ready to receive response at: " . $WEBHOOK_URL . PHP_EOL);
}
}catch (\RingCentral\SDK\Http\ApiException $e) {
// Getting error messages using PHP native interface
print_r ('HTTP Error: ' . $e->getMessage() . PHP_EOL);
// Another way to get message, but keep in mind, that there could be no response if request has failed completely
print_r ('Unable to analyze interaction. ' . $e->apiResponse->response()->error() . PHP_EOL);
}
}
?>
require 'ringcentral'
NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS"
WEBHOOK_URL = NGROK_ADDRESS + "/webhook";
CONTENT_URI = 'PUBLICLY-ACCESSIBLE-CONTENT-URI'
#
# Transcribe a call recording and analyze interaction
#
def analyze_interaction()
bodyParams = {
'contentUri': CONTENT_URI,
'encoding': "Mpeg",
'languageCode': "en-US",
'source': "RingCentral",
'audioType': "CallCenter",
'insights': [ "All" ],
'enableVoiceActivityDetection': true,
'separateSpeakerPerChannel': true
}
queryParams = {
'webhook': WEBHOOK_URL
}
endpoint = "/ai/insights/v1/async/analyze-interaction"
begin
resp = $platform.post(endpoint, payload: bodyParams, params: queryParams)
body = resp.body
if resp.status == 202
puts('Job ID: ' + body['jobId']);
puts ('Ready to receive response at: ' + WEBHOOK_URL);
end
rescue StandardError => e
puts ("Unable to analyze interaction. " + e.to_s)
end
end
# Authenticate a user using a personal JWT token
def login()
begin
$platform.authorize( jwt: "RC_USER_JWT" )
analyze_interaction()
rescue StandardError => e
puts ("Unable to authenticate to platform. Check credentials. " + e.to_s)
end
end
# Instantiate the SDK and get the platform instance
$platform = RingCentral.new( "RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com" )
login()
using System;
using System.IO;
using System.Threading.Tasks;
using System.Collections.Generic;
using RingCentral;
using Newtonsoft.Json;
namespace AnalyzeInteraction {
class Program {
static RestClient restClient;
static string NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS";
static string WEBHOOK_URL = NGROK_ADDRESS + "/webhook";
static string CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI";
static async Task Main(string[] args){
try
{
// Instantiate the SDK
restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");
// Authenticate a user using a personal JWT token
await restClient.Authorize("RC_USER_JWT");
await analyze_interaction();
}
catch (Exception ex)
{
Console.WriteLine("Unable to authenticate to platform. Check credentials. " + ex.Message);
}
}
/*
* Transcribe a call recording and analyze interaction
*/
static private async Task analyze_interaction()
{
try
{
var bodyParams = new InteractionInput()
{
contentUri = CONTENT_URI,
encoding = "Mpeg",
languageCode = "en-US",
source = "RingCentral",
audioType = "CallCenter",
insights = new String[] { "All" },
enableVoiceActivityDetection = true,
separateSpeakerPerChannel = true
};
var queryParams = new CaiAnalyzeInteractionParameters() { webhook = WEBHOOK_URL };
var resp = await restClient.Ai().Insights().V1().Async().AnalyzeInteraction().Post(bodyParams, queryParams);
Console.WriteLine("Job ID: " + resp.jobId);
Console.WriteLine("Ready to receive response at: " + WEBHOOK_URL);
}
catch (Exception ex)
{
Console.WriteLine("Unable to analyze interaction. " + ex.Message);
}
}
}
}
package AnalyzeInteraction;
import java.io.IOException;
import com.google.common.reflect.TypeToken;
import com.google.gson.Gson;
import com.ringcentral.*;
import com.ringcentral.definitions.*;
public class AnalyzeInteraction {
static String NGROK_ADDRESS = "NGROK-TUNNEL-ADDRESS";
static String WEBHOOK_URL = NGROK_ADDRESS + "/webhook";
static String CONTENT_URI = "PUBLICLY-ACCESSIBLE-CONTENT-URI";
static RestClient restClient;
public static void main(String[] args) {
var obj = new AnalyzeInteraction();
try {
// Instantiate the SDK
restClient = new RestClient("RC_APP_CLIENT_ID", "RC_APP_CLIENT_SECRET", "https://platform.ringcentral.com");
// Authenticate a user using a personal JWT token
restClient.authorize("RC_USER_JWT");
obj.analyze_interaction();
} catch (RestException e) {
System.out.println(e.getMessage());
} catch (IOException e) {
e.printStackTrace();
}
}
/*
* Transcribe a call recording and analyze interaction
*/
private void analyze_interaction()
{
try {
var bodyParams = new InteractionInput()
.contentUri(CONTENT_URI)
.encoding("Mpeg")
.languageCode("en-US")
.source("RingCentral")
.audioType("CallCenter")
.insights(new String[] {"All"})
.enableVoiceActivityDetection(true)
.separateSpeakerPerChannel(true);
var queryParams = new CaiAnalyzeInteractionParameters().webhook(WEBHOOK_URL);
var resp = restClient.ai().insights().v1().async().analyzeInteraction().post(bodyParams, queryParams);
System.out.println("Job ID: " + resp.jobId);
System.out.println("Ready to receive response at: " + WEBHOOK_URL);
} catch (Exception ex) {
System.out.println("Unable to analyze interaction. " + ex.getMessage());
}
}
}
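The analyze-interaction endpoint responds with `202 Accepted` and a job id; the completed analytics are then delivered to the webhook URL you supplied. The sketch below is a minimal receiver, assuming an Express server exposed through your ngrok tunnel on the same `/webhook` path used above and a webhook payload shaped like the example response that follows; your quick-start server may already provide an equivalent handler.

```javascript
// Minimal webhook receiver sketch (assumes Express; adapt to your quick-start server).
const express = require('express');
const app = express();

// A completed job can be large, so raise the JSON body limit.
app.use(express.json({ limit: '50mb' }));

app.post('/webhook', (req, res) => {
  // Acknowledge the delivery immediately.
  res.status(200).send();

  const job = req.body;
  console.log(`Job ${job.jobId} completed with status: ${job.status}`);
  if (job.status === 'Success') {
    // job.response is assumed to have the structure shown in the example response below.
    console.log(JSON.stringify(job.response, null, 2));
  }
});

app.listen(3000, () => console.log('Webhook receiver listening on port 3000'));
```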
Example response
{
"jobId": "80800e1a-a663-11ee-b548-0050568ccd07",
"api": "/ai/insights/v1/async/analyze-interaction",
"creationTime": "2023-12-29T16:01:18.558Z",
"completionTime": "2023-12-29T16:01:29.217Z",
"expirationTime": "2024-01-05T16:01:18.558Z",
"status": "Success",
"response": {
"utteranceInsights": [
{
"start": 3.72,
"end": 7.56,
"text": "Good evening, thank you for calling electronics or this is Rachel.",
"confidence": 0.85,
"speakerId": "0",
"insights": [
{
"name": "Emotion",
"value": "Neutral",
"confidence": 0.54
}
]
},
{
"start": 7.56,
"end": 8.96,
"text": "How may I assist you?",
"confidence": 0.85,
"speakerId": "0",
"insights": [
{
"name": "Emotion",
"value": "Fear",
"confidence": 0.71
}
]
},
{
"start": 8.96,
"end": 9.8,
"text": "Hi, Rachel.",
"confidence": 0.85,
"speakerId": "1",
"insights": [
{
"name": "Emotion",
"value": "Neutral",
"confidence": 0.79
}
]
},
{
"start": 9.8,
"end": 11.16,
"text": "I would like to know how to use this car.",
"confidence": 0.85,
"speakerId": "1",
"insights": [
{
"name": "Emotion",
"value": "Neutral",
"confidence": 0.4
}
]
},
{
"start": 11.16,
"end": 14.28,
"text": "Bluetooth headset I recently purchased from your store.",
"confidence": 0.85,
"speakerId": "1",
"insights": [
{
"name": "Emotion",
"value": "Neutral",
"confidence": 0.46
}
]
},
{
"start": 14.28,
"end": 21.36,
"text": "Sure, ma'am, I can help you out with that, but before anything else, I have your name so that I can address you properly.",
"confidence": 0.87,
"speakerId": "0",
"insights": [
{
"name": "Emotion",
"value": "Neutral",
"confidence": 0.91
}
]
},
{
"start": 21.36,
"end": 23.58,
"text": "Yes, this is Meredith Blake.",
"confidence": 0.87,
"speakerId": "1",
"insights": [
{
"name": "Emotion",
"value": "Neutral",
"confidence": 0.91
}
]
},
...
],
"speakerInsights": {
"speakerCount": 2,
"insights": [
{
"name": "Energy",
"values": [
{
"speakerId": "0",
"value": 93.11
},
{
"speakerId": "1",
"value": 93.65
}
]
},
{
"name": "Pace",
"values": [
{
"speakerId": "0",
"value": "medium",
"wpm": 152.9
},
{
"speakerId": "1",
"value": "fast",
"wpm": 196.9
}
]
},
{
"name": "TalkToListenRatio",
"values": [
{
"speakerId": "0",
"value": "58:42"
},
{
"speakerId": "1",
"value": "42:58"
}
]
},
{
"name": "QuestionsAsked",
"values": [
{
"speakerId": "0",
"value": 5,
"questions": [
{
"text": "Good evening, thank you for calling electronics or this is Rachel. How may I assist you?",
"start": 3.72,
"end": 8.96
},
{
"text": "Okay, thank you for that, Mrs. Plague, what exactly do you want done with your headset?",
"start": 23.9,
"end": 29.72
},
...
]
},
{
"speakerId": "1",
"value": 3,
"questions": [
{
"text": "Well, we have already done that. I only ask a simple question. Why can't you seem to get that?",
"start": 102.22,
"end": 107.7
},
...
]
}
]
}
]
},
"conversationalInsights": [
{
"name": "KeyPhrases",
"values": [
{
"start": 11.55,
"end": 11.94,
"value": "headset",
"confidence": 0.92
},
{
"start": 13.89,
"end": 14.28,
"value": "store",
"confidence": 0.94
},
{
"start": 29.36,
"end": 29.72,
"value": "headset",
"confidence": 0.86
},
{
"start": 34.32,
"end": 34.72,
"value": "headset",
"confidence": 0.91
},
{
"start": 38.68,
"end": 39.08,
"value": "phone",
"confidence": 0.86
},
{
"start": 43.77,
"end": 44.24,
"value": "iphone",
"confidence": 0.89
},
...
]
},
{
"name": "ExtractiveSummary",
"values": []
},
{
"name": "Topics",
"values": [
{
"value": "car bluetooth headset",
"start": 9.8,
"end": 114.2,
"confidence": 0.92
}
]
},
{
"name": "AbstractiveSummaryLong",
"values": [
{
"value": "First speaker helps second speaker use a car bluetooth headset from the store and asks speaker 1 to switch off the device with speaker's phone.",
"start": 3.72,
"end": 114.2,
"confidence": 0.4,
"groupId": "0"
}
]
},
{
"name": "AbstractiveSummaryShort",
"values": [
{
"value": "First speaker helps second speaker use a car bluetooth headset from the store and asks speaker 1 to switch off the device with speaker's phone.",
"start": 3.72,
"end": 114.2,
"confidence": 0.4
}
]
}
]
}
}
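The sections below document the objects that make up the `response` field above. As a quick orientation, the following sketch (plain JavaScript, assuming a payload shaped like the example response) pulls out the per-utterance emotions and the long abstractive summary:

```javascript
// Sketch: reading utterance-level and conversational insights from a completed job.
// `result` is assumed to have the shape of the example response above.
function printHighlights(result) {
  const { utteranceInsights, conversationalInsights } = result.response;

  // Per-utterance transcript with the detected emotion, when present.
  for (const u of utteranceInsights) {
    const emotion = u.insights.find((i) => i.name === 'Emotion');
    console.log(`[${u.speakerId}] ${u.text} (${emotion ? emotion.value : 'n/a'})`);
  }

  // The long abstractive summary of the whole conversation.
  const summary = conversationalInsights.find((i) => i.name === 'AbstractiveSummaryLong');
  if (summary && summary.values.length > 0) {
    console.log('Summary: ' + summary.values[0].value);
  }
}
```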
Interaction-Analytics-Object
Interaction analytics are presented as insights grouped under the following category objects:

Parameter | Type | Description |
---|---|---|
`utteranceInsights` | List[Utterance-Insights] | List of utterances and the insights computed for each utterance. |
`speakerInsights` | Object | The set of insights computed for each speaker separately. |
`conversationalInsights` | List[Conversational-Insights-Object] | List of insights computed by analyzing the conversation as a whole. |
Utterance-Insights
The `utteranceInsights` attribute is a list of objects, each of which contains the following key/value pairs:

Parameter | Type | Description |
---|---|---|
`start` | Number | Start time of the audio segment in seconds. |
`end` | Number | End time of the audio segment in seconds. |
`text` | String | The transcription output corresponding to the segment (a.k.a. an utterance). |
`confidence` | Number | The confidence score for the transcribed segment. |
`speakerId` | String | The speaker id for the corresponding audio segment. |
`insights` | List[Utterance-Insights-Unit] | List of insights from the utterance text. |
Utterance-Insights-Unit
Currently, only the `Emotion` insight is supported.

Parameter | Type | Description |
---|---|---|
`name` | String Enum | Currently supported insight: [`Emotion`]. |
`value` | String | Possible values: Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, Trust, Neutral. |
`confidence` | Number | Confidence score. Optional. |
Speaker-Insights-Object
The `speakerInsights` object contains the number of speakers detected and the insights computed for each speaker.

Parameter | Type | Description |
---|---|---|
`speakerCount` | Number | Number of speakers detected. If `speakerCount` is not specified in the request, the number of speakers is estimated algorithmically. |
`insights` | List[Speaker-Insights-Unit] | List of overall insights, each computed separately for each speaker. |
Speaker-Insights-Unit
Parameter | Type | Description |
---|---|---|
`name` | String Enum | Name of the insight. Possible values: `Energy`, `Pace`, `TalkToListenRatio`, `QuestionsAsked`. |
`values` | List[Speaker-Insights-Value-Unit] | Values corresponding to the insight. |
Speaker-Insights-Value-Unit
- `Energy`

    Parameter | Type | Description |
    ---|---|---|
    `speakerId` | String | The speaker id for whom insights are computed. |
    `value` | Number | The computed value of the insight for this speaker. |

- `Pace`

    Parameter | Type | Description |
    ---|---|---|
    `speakerId` | String | The speaker id for whom insights are computed. |
    `value` | String | The label of speech speed: `slow`, `medium`, or `fast`. |
    `wpm` | Number | The average number of words per minute spoken by this speaker. |

- `TalkToListenRatio`

    Parameter | Type | Description |
    ---|---|---|
    `speakerId` | String | The speaker id for whom insights are computed. |
    `value` | String | The computed ratio of time the speaker spends talking versus listening. |

- `QuestionsAsked`

    Parameter | Type | Description |
    ---|---|---|
    `speakerId` | String | The speaker id for whom insights are computed. |
    `value` | Number | The computed value of the insight for this speaker. |
    `questions` | List[Question-Insights-Value-Unit] | List of questions asked by this speaker. |
Question-Insights-Value-Unit
Parameter | Type | Description |
---|---|---|
`text` | String | The question a speaker asked. |
`start` | Number | The start time of the audio segment in seconds. |
`end` | Number | The end time of the audio segment in seconds. |
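Putting the last two objects together, the sketch below (again assuming a payload shaped like the example response above) lists each speaker's talk-to-listen ratio and the questions they asked:

```javascript
// Sketch: reading speaker-level insights from a completed job.
// `result` is assumed to have the shape of the example response above.
function printSpeakerInsights(result) {
  const { insights } = result.response.speakerInsights;

  // Talk-to-listen ratio per speaker, e.g. "58:42".
  const ratios = insights.find((i) => i.name === 'TalkToListenRatio');
  if (ratios) {
    for (const v of ratios.values) {
      console.log(`Speaker ${v.speakerId} talk-to-listen ratio: ${v.value}`);
    }
  }

  // Questions asked by each speaker.
  const questions = insights.find((i) => i.name === 'QuestionsAsked');
  if (questions) {
    for (const v of questions.values) {
      console.log(`Speaker ${v.speakerId} asked ${v.value} question(s):`);
      for (const q of v.questions) {
        console.log(`  - ${q.text}`);
      }
    }
  }
}
```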
Timed-Segment
Parameter | Type | Description |
---|---|---|
`start` | Number | Start time of the audio segment in seconds. |
`end` | Number | End time of the audio segment in seconds. |
Conversational-Insights-Object
Parameter | Type | Description |
---|---|---|
`name` | String Enum | Name of the insight. Possible values: `AbstractiveSummaryLong`, `AbstractiveSummaryShort`, `ExtractiveSummary`, `KeyPhrases`, `Topics`. |
`values` | List[Conversational-Insights-Value-Unit] | Values corresponding to the insight. |
Conversational-Insights-Value-Unit
- `KeyPhrases`

    Parameter | Type | Description |
    ---|---|---|
    `start` | Number | Start time of the audio segment in seconds. |
    `end` | Number | End time of the audio segment in seconds. |
    `value` | String | The output corresponding to the insight. |
    `confidence` | Number | The confidence score for the computed insight. |

- `Topics`

    Parameter | Type | Description |
    ---|---|---|
    `start` | Number | Start time of the audio segment in seconds. |
    `end` | Number | End time of the audio segment in seconds. |
    `value` | String | The output corresponding to the insight. |
    `confidence` | Number | The confidence score for the computed insight. |

- `ExtractiveSummary`

    Parameter | Type | Description |
    ---|---|---|
    `start` | Number | Start time of the audio segment in seconds. |
    `end` | Number | End time of the audio segment in seconds. |
    `sentence` | String | The summarized text segment. |

- `AbstractiveSummaryLong`

    Parameter | Type | Description |
    ---|---|---|
    `value` | String | The text of a long abstractive summary. |
    `start` | Number | Start time of the audio segment in seconds. |
    `end` | Number | End time of the audio segment in seconds. |
    `confidence` | Number | The confidence score for the computed insight. |
    `groupId` | String | The index of this long abstractive summary. |

- `AbstractiveSummaryShort`

    Parameter | Type | Description |
    ---|---|---|
    `value` | String | The text of a short abstractive summary. |
    `start` | Number | Start time of the audio segment in seconds. |
    `end` | Number | End time of the audio segment in seconds. |
    `confidence` | Number | The confidence score for the computed insight. |
NOTES:
- In the case of `ExtractiveSummary`, the start and end times refer to the exact time of the segment.
- In the case of `AbstractiveSummaryLong` and `AbstractiveSummaryShort`, the start and end times refer to the span of text that is abstracted.