Events Interface

Problem Statement

Sellers interact with customers and accounts primarily through emails and meetings. It is essential for Rox Agents and our platform to have a pristine view of real-life events with customers. We should have a unified interface that all consumers can query to get the most up-to-date and accurate description of activity with a customer.

Many sellers use third-party call recorders like Gong, Attention, and Avoma that generate transcripts for their meetings. We ingest these transcripts and the metadata about the events through the API integrations provided by these call recorders and store them internally. Sellers usually schedule meetings using calendar invites, but might meet ad hoc as well. Calendar invites might also be deleted later on or moved to a new time. Hence, we needed an entity resolution engine that can:

  • Map call recorder events to calendar events

  • Determine which call recorder events should be separate events because they are not present in the seller's calendar.

Challenge

Sellers edit meetings on their calendar very frequently, leading to cases where the calendar event is moved or deleted after the meeting has happened and the call recording is saved. This leads to a unique problem: call recordings can't be associated with calendar events on the basis of the calendar_event_id alone.

As a result, we were missing out on essential context about a seller's accounts.

Solution

Building the Entity Resolution Logic

Given a call recorder event and a calendar event, we want to determine as accurately as possible whether these two entities refer to the same real-world meeting. Based on our observations, we came up with a matcher algorithm: two events are the same when both of the following conditions are met:

  • Their scheduled start times match reasonably, i.e., they are within two minutes of each other

  • At least one of the following fields matches:

    • Calendar Event Id

    • Event Title

Call recorder events that could not be mapped to any calendar event through the above algorithm are treated as net-new interactions with the customer. These events are materialised as separate unified events, as sketched below.
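To make the matcher concrete, here is a minimal sketch in plain Python. The event shapes and field names (scheduled_start, calendar_event_id, title) are illustrative assumptions; the production logic runs as Snowpark queries over Snowflake tables rather than over in-memory objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Illustrative event shapes; the real schemas live in Snowflake tables.
@dataclass(frozen=True)
class CalendarEvent:
    calendar_event_id: str
    title: str
    scheduled_start: datetime

@dataclass(frozen=True)
class RecorderEvent:
    calendar_event_id: Optional[str]  # may be stale or missing
    title: str
    scheduled_start: datetime

START_TOLERANCE = timedelta(minutes=2)

def is_same_meeting(rec: RecorderEvent, cal: CalendarEvent) -> bool:
    """Both conditions must hold: start times within two minutes of each
    other, and either the calendar event id or the title matches."""
    starts_close = abs(rec.scheduled_start - cal.scheduled_start) <= START_TOLERANCE
    id_matches = rec.calendar_event_id is not None and rec.calendar_event_id == cal.calendar_event_id
    title_matches = rec.title == cal.title
    return starts_close and (id_matches or title_matches)

def resolve(recorder_events, calendar_events):
    """Pair each recorder event with a calendar event when possible;
    unmatched recorder events become net-new unified events."""
    matched, net_new = [], []
    for rec in recorder_events:
        cal = next((c for c in calendar_events if is_same_meeting(rec, c)), None)
        if cal is not None:
            matched.append((rec, cal))
        else:
            net_new.append(rec)
    return matched, net_new
```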

Lookup table creation

Starting from an empty lookup table, we process two datasets sequentially: first the calendar events, then the call recorder events.
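A minimal sketch of how such a lookup might work, assuming it is keyed on the start time rounded to the minute so that candidates for a recorder event can be found without comparing every pair; the bucketing scheme is an assumption, not the exact production layout.

```python
from collections import defaultdict
from datetime import timedelta

def build_lookup(calendar_events):
    """First pass: index calendar events by start time rounded to the
    minute, so matching never requires a full cross join."""
    lookup = defaultdict(list)
    for cal in calendar_events:
        bucket = cal.scheduled_start.replace(second=0, microsecond=0)
        lookup[bucket].append(cal)
    return lookup

def candidates(lookup, rec):
    """Second pass: a recorder event probes its own minute bucket plus
    the neighbouring ones to honour the two-minute tolerance."""
    base = rec.scheduled_start.replace(second=0, microsecond=0)
    for offset in range(-2, 3):
        yield from lookup.get(base + timedelta(minutes=offset), [])
```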

Graph creation

This entity resolution takes place during the regular runs of our knowledge graph build. During that process, we assign a unique rox_id to every entity; it persists across consecutive runs of the graph build and is the unique identifier for the entity throughout the entire Rox system.

  1. For the first graph run for a seller's org, every event is new, so each entity is assigned a fresh rox_id

  2. If the graph exists already - we need to determine which events are being introduced for the first time and which already existed. For the latter, we need to intelligently carry over the rox_id from the previous version of the graph
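A hedged sketch of the rox_id assignment, assuming the previous graph is available as a mapping from a resolution key to the rox_id it was assigned; the resolution_key helper and the use of UUIDs are illustrative assumptions.

```python
import uuid

def resolution_key(event):
    # Hypothetical helper: some stable identity for the entity,
    # e.g. (scheduled_start, title); the real key is internal to Rox.
    return (event.scheduled_start, event.title)

def assign_rox_ids(current_events, previous_ids):
    """previous_ids maps resolution keys to rox_ids from the last graph
    build; it is empty on the first run, so every entity gets a fresh id."""
    assigned = {}
    for event in current_events:
        key = resolution_key(event)
        if key in previous_ids:
            # The entity already existed: keep its rox_id stable.
            assigned[key] = previous_ids[key]
        else:
            # First run, or a net-new entity: mint a fresh rox_id.
            assigned[key] = str(uuid.uuid4())
    return assigned
```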

Technical Architecture

As with our entire graph building process, the datasets are stored as relational database tables in Snowflake. We use the Snowpark library in Python to load the tables as DataFrames and perform data operations.
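For illustration, a sketch of the Snowpark setup under assumed table and column names (CALENDAR_EVENTS, CALL_RECORDER_EVENTS, SCHEDULED_START); the connection parameters are placeholders.

```python
from snowflake.snowpark import Session
import snowflake.snowpark.functions as F

# Connection parameters are placeholders; real values come from config.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Tables load lazily as DataFrames; no data moves until they are used.
calendar_df = session.table("CALENDAR_EVENTS")
recorder_df = session.table("CALL_RECORDER_EVENTS")

# Example operation: candidate pairs whose scheduled starts are within
# two minutes (120 seconds) of each other.
pairs = calendar_df.join(
    recorder_df,
    F.abs(
        F.datediff("second", calendar_df["SCHEDULED_START"], recorder_df["SCHEDULED_START"])
    ) <= 120,
)
```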

Handling scalability

A few of Rox's customers have large datasets for both calendar events and call recorder events, which resulted in very long-running queries on Snowflake. Since we are committed to completing every build of our knowledge graph in less than 30 minutes, we had to come up with innovative methods to speed up the entire process. A few of those include:

  1. Optimizing the lookup table process - Instead of using a complex full outer join over 3 conditions, we broke the process into a map and a reduce step that relied on smart de-duplication logic. This reduced the execution time of the worst queries from ~45 minutes down to ~15 seconds.

  2. Materializing tables more frequently - Snowflake executes queries lazily, which means that the queries behind a data frame run only when the data frame is accessed. We observed that the final query which produced the resulting graph was overloaded with SELECT and JOIN clauses. Materializing tables regularly, that is, writing the data frames back to temporary tables in Snowflake (sketched below), made the entire process much more efficient and clean.
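A short sketch of the materialization trick, continuing from the Snowpark session above; the temporary table name is illustrative. Snowpark's built-in cache_result() does the same thing in one call.

```python
# After a long chain of joins and filters, the SQL behind a DataFrame
# grows large and expensive to compile and execute. Writing it out to a
# temporary table resets the accumulated query plan.
pairs.write.save_as_table(
    "TMP_CANDIDATE_PAIRS", mode="overwrite", table_type="temporary"
)
candidate_pairs = session.table("TMP_CANDIDATE_PAIRS")

# Equivalent one-liner: cache_result() materializes the DataFrame into a
# temporary table and returns a new DataFrame that reads from it.
candidate_pairs = pairs.cache_result()
```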
