Rox Data Layer: Powering Unified, Scalable Revenue Intelligence

Problem Statement

In today’s enterprise landscape, revenue data is scattered across disparate systems—CRMs, support tools, billing platforms, and more. Each of these systems stores information in unique formats, making it difficult to gain a single, trusted view of customer accounts. Teams spend hours reconciling records, building custom pipelines, and stitching data back together—all while risking data loss, duplication, and inconsistency.

The result: fragmented insights, siloed teams, and delayed decisions.


Features of the Rox Data Layer

The Rox Data Layer materializes data into columns, which can be used to better understand your territory—either through a high-level rollup view or a deep dive into an individual account.

  • Projections: Projections dynamically walk the UKG to construct queries. Each projection query includes both the relevant column and the rox_id, a unique identifier attached to every entity in the system. This ensures that data from disparate systems can be stitched together consistently while preserving entity-level fidelity.

  • Custom Queries: For advanced use cases, custom queries allow users to define their own transformations and logic. Each custom query is associated with a column and can be written in plain SQL. Rox abstracts away the complexity: users only need to define the query logic, while the system handles execution, caching, and optimization.
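
As a rough sketch of the projection idea, a query builder might pair each requested column with rox_id so that results from different sources can be re-joined on the shared identifier. The function and table names below are illustrative assumptions, not Rox's actual API:

```python
def build_projection_query(column: str, source_table: str) -> str:
    """Assemble a projection query that selects a single column
    alongside rox_id, so results from different sources can be
    stitched back together on the shared identifier."""
    # Identifiers are assumed pre-validated; production code should
    # quote/escape them rather than interpolate strings directly.
    return (
        f"SELECT rox_id, {column} AS value "
        f"FROM {source_table} "
        f"WHERE {column} IS NOT NULL"
    )

# Example: project the (hypothetical) revenue column from a CRM table.
query = build_projection_query("revenue", "crm_accounts")
```

Because every projection carries rox_id, downstream joins across systems reduce to equality joins on a single stable key.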


System Design

Snowflake

All external data systems are consolidated into Snowflake, our central data warehouse. Using data integration platforms such as Fivetran, we sync source data into Snowflake on a scheduled cadence.

Snowflake then acts as our processing engine, executing queries that transform, enrich, or combine data—for example, creating account-level scoring models from multiple attributes. We chose Snowflake because of its ability to handle compute-intensive operations at scale.

To further streamline ingestion, we’ve enabled Change Data Capture (CDC) through audit metadata (created_at, last_modified_on, etc.). This ensures only net-new or updated entities are processed, improving efficiency and freshness.
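
The CDC idea above can be sketched as a simple watermark filter over the audit metadata fields. The function and row shapes here are illustrative assumptions, not Rox's internal implementation:

```python
from datetime import datetime

def cdc_filter(rows, watermark):
    """Keep only rows created or modified after the last sync watermark,
    using the audit metadata fields (created_at, last_modified_on)."""
    fresh = [
        r for r in rows
        if max(r["created_at"], r["last_modified_on"]) > watermark
    ]
    # Advance the watermark to the newest change we just processed,
    # so the next run only picks up net-new or updated entities.
    new_watermark = max(
        (max(r["created_at"], r["last_modified_on"]) for r in fresh),
        default=watermark,
    )
    return fresh, new_watermark

rows = [
    {"rox_id": 101, "created_at": datetime(2024, 1, 1),
     "last_modified_on": datetime(2024, 1, 5)},
    {"rox_id": 102, "created_at": datetime(2024, 1, 2),
     "last_modified_on": datetime(2024, 1, 2)},
]
fresh, watermark = cdc_filter(rows, datetime(2024, 1, 3))
```

Only the first row survives the filter (modified after the watermark), and the watermark advances to its modification time.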


PostgreSQL: Low-Latency Querying

To deliver sub-second performance for end users, we replicate processed data from Snowflake into PostgreSQL (PSQL). This ensures that queries against the Rox Data Layer remain fast and reliable, enabling real-time pagination, filtering, and sorting. To copy this data reliably, we leverage our internal Task Framework, which runs one of the following:

  • Snowflake → S3 → PSQL copies, where values are exported to S3 and then ingested into PostgreSQL.

  • CDC-driven updates, where the task framework dynamically generates PSQL statements.

Either path ensures that the PSQL layer remains synchronized and ready to serve low-latency queries.
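
For the CDC-driven path, one hedged sketch of generating PSQL statements is rendering each change as an idempotent UPSERT keyed on rox_id, so replaying the same change twice is harmless. The table name, parameter style, and key column below are assumptions for illustration:

```python
def to_upsert(table: str, row: dict) -> str:
    """Render one CDC change as a PostgreSQL UPSERT keyed on rox_id.
    Uses psycopg-style %(name)s placeholders; values are bound at
    execution time, never interpolated into the SQL string."""
    cols = list(row)
    placeholders = ", ".join(f"%({c})s" for c in cols)
    # On conflict, overwrite every non-key column with the new value.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in cols if c != "rox_id")
    return (
        f"INSERT INTO {table} ({', '.join(cols)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT (rox_id) DO UPDATE SET {updates}"
    )

stmt = to_upsert("accounts", {"rox_id": 101, "name": "Acme Corp"})
```

Idempotent statements make retries safe, which matters for a task framework that may re-run failed jobs.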


Challenges & Next Steps

While the Rox Data Layer is already delivering significant value, we are actively tackling several technical challenges to further improve scale and reliability:

Volume of Data

A single enterprise can have hundreds of thousands to millions of accounts. However, only a subset of those accounts is viewed at any given moment. Our challenge is to intelligently determine which accounts to prioritize in queries, ensuring resources are allocated efficiently while still delivering on-demand responsiveness.

Consistency Across Systems

Maintaining consistency between Snowflake (the processing layer) and PSQL (the querying layer) is a critical challenge. Discrepancies can occur at two points:

  1. When the data integration system syncs source data into Snowflake.

  2. When jobs transfer processed data from Snowflake into PSQL.
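
A first-line reconciliation between the two layers can be as simple as comparing per-table row counts and flagging drift. The function below is an illustrative sketch, not Rox's actual tooling:

```python
def find_count_drift(snowflake_counts: dict, psql_counts: dict) -> dict:
    """Return tables whose row counts differ between the processing
    layer (Snowflake) and the querying layer (PSQL), as a cheap
    first-pass consistency check."""
    tables = set(snowflake_counts) | set(psql_counts)
    return {
        t: (snowflake_counts.get(t, 0), psql_counts.get(t, 0))
        for t in tables
        if snowflake_counts.get(t, 0) != psql_counts.get(t, 0)
    }

drift = find_count_drift(
    {"accounts": 1000, "contacts": 500},
    {"accounts": 1000, "contacts": 498},
)
```

Count checks catch gross discrepancies cheaply; finer-grained checks (checksums, per-rox_id diffs) can follow on the flagged tables only.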

With CDC now enabled within Snowflake, the next step is to work with integration platforms to also support CDC. This would allow Rox to make the entire pipeline event-driven, reducing latency and eliminating discrepancies between systems.

Data Representation

One point of contention was deciding how to structure tables in Snowflake and PostgreSQL to store data effectively.

Wide-Format vs. Long-Format Schema

  • Wide Format (traditional structure): Each row represents a single entity (e.g., an account), with attributes spread across multiple columns.

    rox_id   | name        | revenue   | industry   | region  
    -----------------------------------------------------------  
    101      | Acme Corp   | 1200000   | Retail     | North America  
    102      | Globex Inc  | 850000    | Finance    | Europe  
  • Long Format (Rox approach): Each row represents a single attribute-value pair, typed by column.

    rox_id   | column_name | string_value  | int_value | float_value
    ----------------------------------------------------------------
    101      | name        | Acme Corp     | NULL      | NULL
    101      | revenue     | NULL          | NULL      | 1200000
    101      | industry    | Retail        | NULL      | NULL
    101      | region      | North America | NULL      | NULL

We ultimately chose the long-format structure. While this approach makes the table sparser, since only one typed value column is populated per row, it delivers significant advantages: the design enables straightforward partitioning, filtering, and sorting operations.

By storing values in a typed, long-format structure:

  • Numeric values (int_value, float_value) can support < and > operations.

  • String values (string_value) can support IN clauses or fuzzy-matching operations.

  • Filtering, sorting, and pagination become simple, consistent, and performant across any type of column.
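
A minimal, self-contained sketch of the typed long-format table, using SQLite purely for illustration (Rox runs on Snowflake and PostgreSQL; the data mirrors the example above):

```python
import sqlite3

# In-memory stand-in for the long-format table described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_values (
        rox_id       INTEGER,
        column_name  TEXT,
        string_value TEXT,
        int_value    INTEGER,
        float_value  REAL
    )
""")
conn.executemany(
    "INSERT INTO account_values VALUES (?, ?, ?, ?, ?)",
    [
        (101, "name",    "Acme Corp",  None, None),
        (101, "revenue", None,         None, 1200000.0),
        (102, "name",    "Globex Inc", None, None),
        (102, "revenue", None,         None, 850000.0),
    ],
)

# Numeric comparison against the typed value column: accounts with
# revenue above one million.
big = [
    row[0]
    for row in conn.execute(
        "SELECT rox_id FROM account_values "
        "WHERE column_name = 'revenue' AND float_value > 1000000"
    )
]
```

The same WHERE shape works for any column: swap the column_name predicate and the typed value column, and filtering, sorting, and pagination stay uniform.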


Conclusion

By unifying data from every system into a single, trusted view, Rox eliminates silos and accelerates the path from raw data to actionable insight through the Rox Data Layer.

For business leaders, this translates to:

  • Operational efficiency by reducing manual reconciliation across systems.

  • Revenue growth through a consolidated near real-time view of your territory.

With Rox, your organization gains a scalable, enterprise-ready hub for revenue intelligence, ensuring that every decision is backed by data you can trust.
