Introduction to the RingCentral Artificial Intelligence APIs
RingCentral Artificial Intelligence API is in beta
RingCentral's Artificial Intelligence API is currently in beta. Developers should be aware of the following:
- Its feature set does not reflect the full scope currently planned.
- Backwards compatibility is not guaranteed from one release to the next during the beta period. Changes that affect your applications may be introduced at any time, with little notice.
Getting Started with the Artificial Intelligence API
The RingCentral Artificial Intelligence API helps developers process media files and extract meaningful insights from them. Beyond creating a transcript, the API supports sentiment and emotion analysis, speaker identification, and speaker diarization. It can also extract and generate key content, such as content summaries and action items.
What can I do using the Artificial Intelligence API?
In addition to converting speech-to-text, the Artificial Intelligence API also helps developers to extract meaningful insights and data from media files. Using the Artificial Intelligence API, developers can:
- Convert speech to text
- Detect the engagement level with talk-to-listen ratios
- Enhance post-meeting experiences by automatically creating an agenda for the next meeting
- Perform sentiment/emotional analysis of those speaking
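To make the talk-to-listen ratio mentioned above more concrete, here is a minimal sketch of how such a ratio could be derived from diarized speaker segments. The segment layout (speaker ID, start, end) and the function name are assumptions made for illustration; they are not the API's actual response format, and the API may define the metric differently.

```python
def talk_to_listen_ratio(segments, speaker):
    """Ratio of the time `speaker` talks to the time everyone else talks.

    `segments` is a list of (speaker_id, start_seconds, end_seconds) tuples.
    This layout is a hypothetical stand-in for diarization output.
    """
    talk = sum(end - start for who, start, end in segments if who == speaker)
    listen = sum(end - start for who, start, end in segments if who != speaker)
    if listen == 0:
        # Nobody else spoke: the ratio is unbounded unless the speaker was silent too.
        return float("inf") if talk > 0 else 0.0
    return talk / listen


# Hypothetical call: the agent speaks for 40 seconds, the customer for 20.
call = [("agent", 0, 30), ("customer", 30, 50), ("agent", 50, 60)]
ratio = talk_to_listen_ratio(call, "agent")
```

A ratio above 1 means the chosen speaker talked more than they listened.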
Key Artificial Intelligence API concepts
Below are some of the key concepts used by the Artificial Intelligence APIs that you should become familiar with.
Calling the APIs asynchronously
Certain requests, such as extracting emotion from a large audio file, may take some time to process and could cause your request to time out. In these circumstances we recommend calling these APIs asynchronously by specifying a URL via the request's webhook parameter. When RingCentral finishes processing your request, it posts a response payload to the webhook URL you specified.
To correlate a callback you receive in this manner with the request that generated it, we recommend including a correlation ID of some kind in the webhook URL you specify.
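One way to embed a correlation ID is as a query parameter on the webhook URL. The sketch below uses only the Python standard library. The parameter name correlationId and both helper names are assumptions chosen for illustration; RingCentral does not prescribe them.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_correlation_id(webhook_url, correlation_id=None, param="correlationId"):
    """Return (url, correlation_id), with the ID appended as a query parameter.

    `param` is a hypothetical parameter name; any name your receiver
    understands will do.
    """
    correlation_id = correlation_id or uuid.uuid4().hex
    parts = urlsplit(webhook_url)
    # Preserve any query parameters already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, correlation_id))
    return urlunsplit(parts._replace(query=urlencode(query))), correlation_id


def correlation_id_from(callback_url, param="correlationId"):
    """Recover the ID from the URL (or path) a callback was delivered to."""
    return dict(parse_qsl(urlsplit(callback_url).query)).get(param)
```

Store the returned ID alongside your own record of the request, then pass the returned URL as the request's webhook parameter.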
Upon receiving a callback from RingCentral, respond with an HTTP status code of 200 to acknowledge receipt. Any other HTTP status code signals RingCentral to attempt redelivery. RingCentral will attempt to redeliver a callback up to five times with exponential backoff.
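The receiving side of this exchange can be sketched with Python's built-in http.server module. This is a minimal illustration, not a production server. It assumes the callback body is JSON and that the correlation ID arrives in a hypothetical correlationId query parameter. The in-memory RESULTS dictionary stands in for whatever storage your application uses.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

RESULTS = {}  # correlation ID -> payload (in-memory, for illustration only)


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # "correlationId" is an assumed parameter name, not a RingCentral convention.
        query = parse_qs(urlsplit(self.path).query)
        correlation_id = query.get("correlationId", [None])[0]
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            # Keep the raw text rather than rejecting the delivery.
            payload = body.decode("utf-8", errors="replace")
        RESULTS[correlation_id] = payload
        # Acknowledge with 200; any other status would prompt redelivery.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(port=0):
    """Create a server bound to localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), CallbackHandler)
```

Calling make_server(8080).serve_forever() would start listening on port 8080. Because callbacks may be redelivered, it is worth making the handler idempotent: here, a repeated delivery simply overwrites the entry stored under the same correlation ID.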
What can affect the processing time of a media file?
When deciding whether to call an Artificial Intelligence API asynchronously, consider the following factors that can affect the processing time of your request:
- The relative geographic location of the API caller compared to the RingCentral Platform server
- Network infrastructure, e.g. speed and load
- The size of the request payload, e.g. the duration of the audio file
- The number of jobs in the RingCentral Platform processing queue
Supported audio formats
RingCentral supports various audio formats for ease of integration. We support all audio types supported by ffmpeg, including but not limited to the following:
- MP3, MP4, MPA
- PCM (signed/unsigned) (8/16/32/64 bit) (big/little endian)
- WMV (Windows Media Video)
What languages are supported for media input (e.g., English, German, Spanish)?
Speaker identification itself is language-agnostic, but for all other APIs we currently only support English. Support for other languages is planned prior to the end of our beta period.
What is speaker enrollment, and how does it work?
Speaker enrollment is the process by which identities are associated with a voice or acoustic signature. This ultimately allows RingCentral to identify who is speaking and to include that identity information in its reports.
Tips on how to get the best results from media files
Do not process or alter media files. We recommend passing audio and media content in its original format, without processing or modifying it in any way. Changes such as re-encoding, up-sampling, down-sampling, and automatic gain control (AGC) will reduce the accuracy of the results you receive.
Store media data in a lossless format. Lossy audio will have a negative impact on the accuracy of the API.
Transmit multiple audio channels. Allow RingCentral to downmix audio channels into a single channel during its transcoding process so that it can leverage channel segmentation in its analysis.
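Before submitting a file, it can be useful to confirm that it still carries its original channels and sample format. The sketch below reads the header of a PCM WAV file with Python's standard wave module and leaves the audio data untouched. The function name describe_wav is an illustrative choice, and the wave module handles only uncompressed WAV, so other containers would need a different tool.

```python
import wave


def describe_wav(source):
    """Read WAV header fields without modifying the audio data.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }
```

If a recording that was captured in stereo reports a single channel, it was probably downmixed somewhere before upload, which is the step the tip above suggests leaving to RingCentral.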