Query Optimization Practice at Rox: List Deal Query Example

It is known that Object-Relational Mapping (ORM) would produce inefficient query. We at Rox have to rewrite the query against Postgres, and sometimes observed up to 1000x execution time difference!

Background

Rox constructs Knowledge Graph, which consists of multiple entities (User, Account, Person, and Deal) and the linkage between these entities. For the Deal query below, we have the following tables:

  • Deal (rox_entity_id, name, stage_name, amount, close_date, next_step, ...)

  • EntityLink (source_entity_id, source_entity_type, target_entity_id, target_entity_type, relationship_type, ...)

  • DealStage (public_id, name, is_won, is_hidden, is_closed, ...)

One of most important queries is to list the deals associated with an Account Executive (AE), along with the owning accounts.

Version 0

The initial version of this query is a 4-table joins:

SELECT d.*,
    account_owner.source_entity_id AS account_id,
    account_owner.target_entity_id AS owner_id
FROM Deal d JOIN DealStage s ON d.stage_name=s.name
    JOIN EntityLink deal_account ON d.rox_entity_id=deal_account.source_entity_id
    JOIN EntityLink account_owner ON deal_account.target_entity_id=account_owner.source_entity_id
WHERE NOT s.is_hidden;

It serves well for deals owned by a single user, until Rox has to serve an Account Manager who would view all the deals owned by a team of AEs.

Query Rewriting

After a second thought of categorizing the deals to list, we split the part that gets deal_id, account_id, and owner_id into a union of two subqueries:

  • Deals owned by the user: a semi-join on EntityLink

  • Deals owned by others, but viewable from account assignment rules: a self-join of EntityLink

The union result still needs to do joins with Deal and DealStage. Thus, we reduced to 3-table joins.

It runs fast alone, but P99 latency was so high under load that it is NOT an interactive query anymore.

Join Reordering

After examining the query execution plan, we observed that the most of the execution time was the HashJoin between Deal and DealStage, where not only the query could not take full advantage of the index on Deal, but also the join result splitting into disk degrades the query performance significantly.

We confirmed this theory by increasing work_mem.

Since we know Deal is large, while the deals to list is much smaller, we proposed join reordering by:

  1. First Deal joins with the union result, and

  2. Then joins DealStage and filters with DealStage.is_hidden.

The reordered plan reduced the variation in query execution time, without increasing work_mem.


If you are interested in query optimization in production, please join us!

Last updated