Copyright © 2024 RoxAI. All rights reserved. 251 Rhode Island St, Suite 207,
San Francisco, CA 94103

Entity Graph builder

Introduction

The foundation of any effective Unified Knowledge Graph (UKG) lies in its ability to resolve entities across multiple data sources. Entity resolution ensures that data from CRM systems, ticketing platforms, product logs, and public data sources like ZoomInfo are accurately linked, de-duplicated, and contextualized.

This blog explores the step-by-step process of resolving entities and constructing the knowledge graph, highlighting algorithms, data flow, and real-world challenges.


The Challenge: Fragmented Data, Shared Identities

Entity resolution tackles the problem of fragmented identities across data sources. For example:

  • A Salesforce account for "Oracle" might have an ID of "a" and a domain of oracle.com.

  • The same organization in Zendesk might have an external ID of "1", also linked to oracle.com.

  • Product logs might refer to the same entity by a completely different mechanism.

Without entity resolution, these would appear as separate entities, leading to inconsistency and duplication.


Entity Resolution in Four Steps

Step 1: Build the Lookup Table Using IDs

The simplest form of resolution applies when the systems share IDs with one another. For example, Zendesk has an external ID field that maps to the Salesforce ID.

Example Lookup Table:

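| Source        | ID  | Domain     | External ID | Account ID |
|---------------|-----|------------|-------------|------------|
| Salesforce    | a   | oracle.com |             |            |
| Zendesk       | 1   | oracle.com | a           |            |
| Product Usage | one |            |             |            |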
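
A minimal sketch of Step 1 in Python, assuming each source record is a plain dict and that the Zendesk external ID holds the Salesforce ID. The record shapes, field names, and the `build_lookup` helper are hypothetical, not Rox's actual schema. In particular, the `account_id` field on product usage records is an assumption made only for illustration.

```python
from collections import defaultdict

# Hypothetical source records; field names are illustrative only.
salesforce = [{"id": "a", "domain": "oracle.com"}]
zendesk = [{"id": "1", "domain": "oracle.com", "external_id": "a"}]
# Assumption: product usage rows carry the Salesforce account ID.
product_usage = [{"id": "one", "account_id": "a"}]


def build_lookup(salesforce, zendesk, product_usage):
    """Group source IDs under the Salesforce ID they share."""
    lookup = defaultdict(dict)
    for rec in salesforce:
        lookup[rec["id"]]["salesforce"] = rec["id"]
    for rec in zendesk:
        if rec.get("external_id"):
            lookup[rec["external_id"]]["zendesk"] = rec["id"]
    for rec in product_usage:
        if rec.get("account_id"):
            lookup[rec["account_id"]]["product_usage"] = rec["id"]
    return list(lookup.values())


lookup_table = build_lookup(salesforce, zendesk, product_usage)
print(lookup_table)
```

Records that carry no shared ID are skipped here; they would need one of the fuzzier matching strategies instead.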


Step 2: Merge Data into the Knowledge Graph

Using the lookup table:

  • No Existing Graph: Create a new graph where each row in the lookup table is assigned a unique rox_id (the universal entity identifier).

  • Existing Graph: Merge the lookup table with the existing graph using an outer join. Keep entities that appear for the first time and drop those that no longer exist in the source data.

Example Graph:

| rox_id | Salesforce | Zendesk | Product Usage |
|--------|------------|---------|---------------|
| uuid1  | a          | 1       | one           |
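
One possible reading of the merge, sketched in Python under stated assumptions: rows are matched on the Salesforce ID, known entities keep their `rox_id`, new entities get a fresh UUID, and graph rows that are absent from the lookup table are not carried over. The `merge_into_graph` helper and the row layout are hypothetical.

```python
import uuid


def merge_into_graph(graph, lookup_table):
    """Merge lookup rows into the graph, reusing rox_ids where possible.

    graph and lookup_table are lists of dicts keyed by source name; graph
    rows also carry a "rox_id". Matching on the Salesforce ID is an assumption.
    """
    existing = {row.get("salesforce"): row["rox_id"] for row in graph}
    merged = []
    for row in lookup_table:
        # Reuse the known rox_id, or mint a new universal identifier.
        rox_id = existing.get(row.get("salesforce")) or str(uuid.uuid4())
        merged.append({"rox_id": rox_id, **row})
    # Graph rows with no counterpart in the lookup table are dropped.
    return merged


first = [{"salesforce": "a", "zendesk": "1", "product_usage": "one"}]
graph = merge_into_graph([], first)
updated = merge_into_graph(graph, first + [{"salesforce": "b"}])
```

In this sketch `updated` keeps the original `rox_id` for account "a" and assigns a new one to account "b".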


Step 3: Resolve Relationships and Data Sources

Once entities are linked:

  • Assign a priority to relationships, ensuring data is merged in the correct order (e.g., Salesforce > Zendesk > Product Usage).

  • Materialize rox_id mappings for each data source, creating a unified representation.

Unified Representation:

| rox_id | Data Source   | Source ID |
|--------|---------------|-----------|
| uuid1  | Salesforce    | a         |
| uuid1  | Zendesk       | 1         |
| uuid1  | Product Usage | one       |
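
The unified representation can be sketched as an unpivot of the wide graph rows, ordered by source priority. This is an illustration only: the `SOURCE_PRIORITY` mapping mirrors the example order above, and `materialize_mappings` is a hypothetical helper name.

```python
# Assumed priority order from the example above (lower number wins).
SOURCE_PRIORITY = {"salesforce": 0, "zendesk": 1, "product_usage": 2}


def materialize_mappings(graph):
    """Turn wide graph rows into (rox_id, data_source, source_id) tuples."""
    mappings = []
    for row in graph:
        sources = [s for s in row if s != "rox_id" and row[s] is not None]
        # Unknown sources sort after all known ones.
        sources.sort(key=lambda s: SOURCE_PRIORITY.get(s, len(SOURCE_PRIORITY)))
        for source in sources:
            mappings.append((row["rox_id"], source, row[source]))
    return mappings


graph = [{"rox_id": "uuid1", "product_usage": "one", "zendesk": "1", "salesforce": "a"}]
mappings = materialize_mappings(graph)
for mapping in mappings:
    print(mapping)
```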


Step 4: Materialize the Entities

The final step involves resolving attributes like domain, name, or email for each entity:

  1. Use ERM relationships to map fields from individual data sources.

  2. Resolve conflicts using rules (e.g., prioritize Salesforce over Zendesk for domains).

  3. Store the resolved values in the knowledge graph for downstream applications.

Example Materialized Entity:

| Entity Type | rox_id | Domain     | Source System |
|-------------|--------|------------|---------------|
| Company     | uuid1  | oracle.com | Salesforce    |
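
A minimal sketch of rule-based conflict resolution for one attribute, assuming the rule is "take the first non-empty value in priority order". The priority list and the `resolve_attribute` helper are hypothetical; real rules could differ per field.

```python
# Assumed priority order, highest first.
SOURCE_PRIORITY = ["salesforce", "zendesk", "product_usage"]


def resolve_attribute(values_by_source, priority=SOURCE_PRIORITY):
    """Return (value, source) from the highest-priority source with a value."""
    for source in priority:
        value = values_by_source.get(source)
        if value:
            return value, source
    return None, None


# Domain values as reported by each source for the same rox_id.
domains = {"zendesk": "oracle.com", "salesforce": "oracle.com"}
value, source = resolve_attribute(domains)
entity = {
    "entity_type": "Company",
    "rox_id": "uuid1",
    "domain": value,
    "source_system": source,
}
print(entity)
```

Recording the winning source alongside the value keeps the resolved attribute traceable for downstream applications.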


Algorithms and Optimization

  1. Exact Match:

    • Directly links IDs across systems.

    • Example: Zendesk external ID "a" maps to Salesforce ID "a".

  2. Fuzzy Match:

    • Uses fields like domains or emails for approximate matching.

    • Weighted matching (e.g., TF-IDF on domain) improves accuracy when values are similar but not identical.

  3. Priority-Based Resolution:

    • Orders data sources based on trustworthiness or data quality.

    • Applies advanced AI algorithms to assess the relevance between entity records.
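
To make the fuzzy-match idea concrete, here is a small pure-Python sketch of TF-IDF weighting over character trigrams of a domain, compared with cosine similarity. Trigrams, the smoothed IDF formula, and all function names are assumptions chosen for illustration; the source only says "TF-IDF on domain".

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def tfidf_vectors(values):
    """Build a TF-IDF weighted trigram vector for each value."""
    docs = [Counter(trigrams(v)) for v in values]
    df = Counter(gram for doc in docs for gram in doc)
    n = len(docs)
    # Smoothed IDF, so trigrams present in every value keep a positive weight.
    return [
        {g: tf * (math.log((1 + n) / (1 + df[g])) + 1) for g, tf in doc.items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


domains = ["oracle.com", "oracle.co.uk", "databricks.com"]
vecs = tfidf_vectors(domains)
similar = cosine(vecs[0], vecs[1])    # oracle.com vs oracle.co.uk
different = cosine(vecs[0], vecs[2])  # oracle.com vs databricks.com
print(similar > different)
```

Common fragments such as ".com" appear in many domains and are down-weighted, so the score is driven by the distinctive part of the name.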


Challenges in Entity Resolution

  1. Low Fidelity Data:

    • Fields like domain or email might be incorrect or incomplete.

    • Example: A Salesforce entry for Databricks pointing to https://spark.apache.org.

  2. High Cardinality:

    • Multiple results for a single query (e.g., ZoomInfo returning several potential matches).

    • Solution: Introduce a "User Feedback Required" step for ambiguous cases.

  3. Dynamic Updates:

    • Ensuring real-time sync with new data sources while maintaining graph consistency.

Given the challenges above, we cannot simply persist the automated resolution. Some associations need a human in the loop to verify and confirm them. Once an association is confirmed, we maintain the mapping we constructed.
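
A "User Feedback Required" step could be sketched as a simple routing rule: accept a match automatically only when one candidate clearly wins, and queue everything else for review. The thresholds, the `Candidate` type, and the `decide` helper are hypothetical values and names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    source_id: str
    score: float  # match confidence in [0, 1]


def decide(candidates, accept=0.9, margin=0.1):
    """Auto-accept a clear winner; otherwise flag the case for review.

    Returns ("matched", candidate), ("needs_feedback", ranked_candidates)
    or ("no_match", None).
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if not ranked:
        return "no_match", None
    best = ranked[0]
    runner_up = ranked[1].score if len(ranked) > 1 else 0.0
    if best.score >= accept and best.score - runner_up >= margin:
        return "matched", best
    return "needs_feedback", ranked


status, result = decide([Candidate("zi-1", 0.95), Candidate("zi-2", 0.92)])
print(status)
```

Two close candidates, as in the usage line, are routed to review rather than resolved automatically.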


Streaming Graphs

Although Rox operates on internet-scale data, users expect to see entities and their relationships as quickly as possible. Today the graph build runs mostly in batch. With the advent of the latest streaming technologies, Rox is looking to capture changes, process deltas as soon as they are detected, and materialize entities and relationships from them. We will explore this in future sections.

Conclusion

Entity resolution is the cornerstone of building a Unified Knowledge Graph. By linking, de-duplicating, and contextualizing data across diverse sources, the UKG enables seamless insights and intelligent automation. The process, though complex, ensures that organizations can leverage their data with accuracy and confidence.


Last updated 5 months ago