Query Optimization Practice at Rox: List Deal Query Example
It is known that Object-Relational Mapping (ORM) would produce inefficient query. We at Rox have to rewrite the query against Postgres, and sometimes observed up to 1000x execution time difference!
Background
Rox constructs Knowledge Graph, which consists of multiple entities (User, Account, Person, and Deal) and the linkage between these entities. For the Deal query below, we have the following tables:
Deal (rox_entity_id, name, stage_name, amount, close_date, next_step, ...)
EntityLink (source_entity_id, source_entity_type, target_entity_id, target_entity_type, relationship_type, ...)
DealStage (public_id, name, is_won, is_hidden, is_closed, ...)
One of most important queries is to list the deals associated with an Account Executive (AE), along with the owning accounts.
Version 0
The initial version of this query is a 4-table joins:
SELECT d.*,
account_owner.source_entity_id AS account_id,
account_owner.target_entity_id AS owner_id
FROM Deal d JOIN DealStage s ON d.stage_name=s.name
JOIN EntityLink deal_account ON d.rox_entity_id=deal_account.source_entity_id
JOIN EntityLink account_owner ON deal_account.target_entity_id=account_owner.source_entity_id
WHERE NOT s.is_hidden;
Query Rewriting
After a second thought of categorizing the deals to list, we split the part that gets deal_id, account_id, and owner_id
into a union of two subqueries:
Deals owned by the user: a semi-join on
EntityLink
Deals owned by others, but viewable from account assignment rules: a self-join of
EntityLink
The union result still needs to do joins with Deal
and DealStage
. Thus, we reduced to 3-table joins.
Join Reordering
After examining the query execution plan, we observed that the most of the execution time was the HashJoin
between Deal
and DealStage
, where not only the query could not take full advantage of the index on Deal
, but also the join result splitting into disk degrades the query performance significantly.
We confirmed this theory by increasing work_mem
.
Since we know Deal is large, while the deals to list is much smaller, we proposed join reordering by:
First
Deal
joins with the union result, andThen joins
DealStage
and filters withDealStage.is_hidden
.
The reordered plan reduced the variation in query execution time, without increasing work_mem
.
If you are interested in query optimization in production, please join us!
Last updated