Query Optimization Practice at Rox: List Deal Query Example

It is known that Object-Relational Mapping (ORM) would produce inefficient query. We at Rox have to rewrite the query against Postgres, and sometimes observed up to 1000x execution time difference!

Background

Rox Data Fabric consists of multiple entities (i.e., User, Account, Person, and Deal) and the linkage between these entities. The List Deal query accesses the following tables:

Deal (rox_entity_id, name, stage_name, amount, close_date, next_step, ...)
EntityLink (source_entity_id, source_entity_type, target_entity_id, target_entity_type, relationship_type, ...)
DealStage (public_id, name, is_won, is_hidden, is_closed, ...)

One of most important queries is to list the deals associated with an Account Executive (AE), along with the owned accounts.

Version 0

The initial version of this query is a 4-table joins:

SELECT d.*,
    account_owner.source_entity_id AS account_id,
    account_owner.target_entity_id AS owner_id
FROM Deal d JOIN DealStage s ON d.stage_name=s.name
    JOIN EntityLink deal_account ON d.rox_entity_id=deal_account.source_entity_id
    JOIN EntityLink account_owner ON deal_account.target_entity_id=account_owner.source_entity_id
WHERE NOT s.is_hidden;

It serves well for deals owned by a single user, until Rox has to serve an Account Manager who would view all the deals owned by a team of AEs.

Query Rewriting

After a second thought of categorizing the deals to list, we split the compute of deal_id, account_id, and owner_id into a union of two subqueries:

Deals owned by the user: a semi-join on EntityLink
Deals owned by others, but viewable as part of owned accounts as a manager: a self-join of EntityLink

Thus, we reduced the cardinality of the join inputs.

And the union result still needs to do joins with Deal and DealStage.

It runs fast alone, but P99 latency under load was so high that it is NOT an interactive query anymore.

Join Reordering

After examining the query execution plan, we observed that the most of the execution time was the HashJoin between Deal and DealStage, where not only the query could not take full advantage of the index on Deal, but also the join result splitting into disk degrades the query performance significantly.

We confirmed this theory by increasing work_mem.

Since we know Deal is large, while the deals to list is much smaller, we proposed join reordering by:

First Deal joins with the union result, and
Then joins DealStage and filters with DealStage.is_hidden.

The reordered plan reduced the variation in query execution time, without bumping work_mem up.

If you are interested in query optimization in production, please join us!

PreviousHow We Built a Scalable, Intelligent Engine for Sales Insights NextHow Rox Handles Refresh Flow from CRM

Last updated 3 months ago