Events Interface
Problem Statement
Sellers interact with customers and accounts primarily through emails and meetings. It is essential for Rox Agents and our platform to have a pristine view of real-life events with customers. We should have a unified interface that can be queried by all consumers to get the most up-to-date and accurate description of activity with a customer.
Many sellers use third-party call recorders like Gong, Attention, and Avoma that generate transcripts for their meetings. We ingest these transcripts and metadata about the events through the API integrations provided by these call recorders and store them internally. Sellers usually schedule meetings using calendar invites, but might meet ad hoc as well. Calendar invites might also be deleted later on or moved to a new time. Hence, we needed an entity resolution engine which can -
Map call recorder events to calendar events.
Determine which call recorder events should be separate events because they are not present in the seller's calendar.
Challenge
Sellers edit meetings on their calendars very frequently, leading to cases where the calendar event is moved or deleted after the meeting has happened and the call recording has been saved. This leads to a unique problem - call recordings can't be associated with calendar events on the basis of the calendar_event_id alone.
As a result, we were missing essential context about a seller's accounts.
Solution
Building an Entity Resolution Logic
Given a call recorder event and a calendar event, we want to be as accurate as possible in determining whether these two entities describe the same real-world meeting. Based on our observations, we came up with a matcher algorithm which states that two events are the same when both of these conditions are met -
Their scheduled start times match reasonably - they are within 2 minutes of each other
One of these two fields is equal -
Calendar Event Id
Event Title
Call recorder events that could not be mapped to any calendar event through the above algorithm are treated as net-new interactions with the customer. These events are materialized as separate unified events.
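In code, the matcher reduces to a small predicate. Below is a minimal pure-Python sketch of that logic; the dict-based event shape and field names (start_time, calendar_event_id, title) are illustrative, not our actual schema:

```python
from datetime import timedelta

# Tolerance from the matcher described above.
START_TIME_TOLERANCE = timedelta(minutes=2)

def is_same_meeting(recorder_event: dict, calendar_event: dict) -> bool:
    """Return True when a call recorder event and a calendar event
    describe the same real-world meeting."""
    # Condition 1: scheduled start times are within 2 minutes of each other.
    delta = abs(recorder_event["start_time"] - calendar_event["start_time"])
    if delta > START_TIME_TOLERANCE:
        return False
    # Condition 2: the calendar event id OR the event title matches.
    return (
        recorder_event.get("calendar_event_id") == calendar_event["calendar_event_id"]
        or recorder_event.get("title") == calendar_event["title"]
    )
```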
Lookup table creation
Starting from an empty lookup table, we process two datasets sequentially - first the calendar events, then the call recorder events.
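Sketched in Python, that sequential build could look like the following; build_lookup_table and the event field names are hypothetical, and the linear scan is for clarity only (the production version runs as set-based queries in Snowflake):

```python
import uuid

def build_lookup_table(calendar_events: list, recorder_events: list) -> dict:
    """Map every source event to a unified event id, calendar events first."""
    lookup = {}  # (source, source_event_id) -> unified_event_id

    # Pass 1: every calendar event seeds a unified event.
    for cal in calendar_events:
        lookup[("calendar", cal["calendar_event_id"])] = str(uuid.uuid4())

    # Pass 2: each recorder event either attaches to the unified event of
    # a matched calendar event, or becomes a net-new unified event.
    for rec in recorder_events:
        match = next((c for c in calendar_events if is_same_meeting(rec, c)), None)
        if match is not None:
            unified_id = lookup[("calendar", match["calendar_event_id"])]
        else:
            unified_id = str(uuid.uuid4())  # net-new interaction
        lookup[("recorder", rec["recorder_event_id"])] = unified_id

    return lookup
```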

Graph creation
This entity resolution takes place during the regular runs of our knowledge graph. During that process, we assign a unique rox_id to all entities, which is persisted across consecutive runs of the graph build and is the unique identifier for the entity throughout the entire Rox system.

For the first graph run for a seller's org, every event is new, so each one is assigned a freshly generated rox_id.

If the graph already exists, we need to determine which events are being introduced for the first time and which already existed. For the latter, we need to intelligently carry over the rox_id from the previous version of the graph.
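A simplified sketch of how that carry-over could work, assuming each event resolves to a stable natural key that can be matched across runs (the natural_key field and the rox_ id format are illustrative assumptions):

```python
import uuid

def assign_rox_ids(current_events: list, previous_rox_ids: dict) -> dict:
    """Assign a rox_id to every event in the current graph run.

    previous_rox_ids maps a stable natural key to the rox_id assigned in
    the previous run; an empty dict models the first run for an org.
    """
    assignments = {}
    for event in current_events:
        key = event["natural_key"]
        if key in previous_rox_ids:
            # Event already existed: carry its rox_id over so the id
            # stays stable across consecutive graph builds.
            assignments[key] = previous_rox_ids[key]
        else:
            # Event introduced for the first time: mint a fresh rox_id.
            assignments[key] = f"rox_{uuid.uuid4()}"
    return assignments
```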

Technical Architecture
As with the rest of our graph building process, the datasets are stored as relational tables in Snowflake. We use the Snowpark library in Python to load the tables as dataframes and perform data operations.
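A minimal Snowpark sketch of that setup; the connection parameters and table names below are placeholders:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Placeholder credentials; in practice these come from a secrets store.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Load the relational tables as lazily evaluated Snowpark DataFrames.
calendar_df = session.table("CALENDAR_EVENTS")
recorder_df = session.table("CALL_RECORDER_EVENTS")

# Transformations only build a query plan; nothing runs on Snowflake
# until the DataFrame is consumed (e.g. via .collect() or .show()).
recent_calls = recorder_df.filter(col("START_TIME") >= "2024-01-01")
```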
Handling scalability
A few of Rox's customers have large datasets for both calendar events and call recorder events, which resulted in very long-running queries on Snowflake. Since we are committed to completing every build of our knowledge graph in less than 30 minutes, we had to come up with innovative methods to speed up the entire process. A few of those include -
Optimizing the lookup table process - Instead of using a complex full outer join over 3 conditions, we broke the process into a map and a reduce step that relied on smart de-duplicating logic (see the first sketch after this list). This reduced the execution time of the worst queries from ~45 minutes down to ~15 seconds.
Materializing tables more frequently - Snowpark executes queries lazily, which means that the queries involved in creating a data frame are executed only when the data frame is accessed. We observed that the final query which gave us the resulting graph was overloaded with SELECT and JOIN clauses. Materializing tables regularly, that is, writing the data frames back to temporary tables in Snowflake, made the entire process much more efficient and clean (see the second sketch after this list).
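To make the first optimization concrete, here is a hedged Snowpark sketch of the map-and-reduce shape, simplified to the calendar_event_id key only (the real logic also keys on title plus a start-time window, and all column names here are illustrative):

```python
from snowflake.snowpark.functions import col

# Map: project each side down to a single match key plus its own id,
# rather than feeding full tables into a full outer join over three
# OR'd conditions, which Snowflake could not execute efficiently.
cal_keyed = calendar_df.select(
    col("CALENDAR_EVENT_ID").alias("MATCH_KEY"),
    col("ROX_ID"),
)
rec_keyed = recorder_df.select(
    col("CALENDAR_EVENT_ID").alias("MATCH_KEY"),
    col("RECORDER_EVENT_ID"),
)

# Reduce: one cheap equi-join on the key, then a de-duplication pass so
# each recorder event maps to at most one calendar event.
matched = rec_keyed.join(cal_keyed, on="MATCH_KEY", how="left")
matched = matched.drop_duplicates("RECORDER_EVENT_ID")
```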
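And a sketch of the materialization pattern, assuming a recent Snowpark version where save_as_table accepts table_type; the table name is hypothetical:

```python
# Break the accumulated query plan: persist the intermediate DataFrame
# to a temporary table that lives only for this session...
intermediate_df.write.save_as_table(
    "TMP_RESOLVED_EVENTS",   # hypothetical name
    mode="overwrite",
    table_type="temporary",
)

# ...then continue from a fresh scan of that table, so downstream joins
# start from a shallow plan instead of deeply nested SELECTs and JOINs.
resolved_df = session.table("TMP_RESOLVED_EVENTS")
```

Snowpark's DataFrame.cache_result() offers a one-call version of the same pattern when naming the intermediate table doesn't matter.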