# Query Optimization Practice at Rox: List Deal Query Example

It is known that Object-Relational Mapping (ORM) would produce inefficient query. We at Rox have to rewrite the query against Postgres, and sometimes observed up to 1000x execution time difference!

## Background

Rox Data Fabric consists of multiple entities (i.e., User, Account, Person, and Deal) and the linkage between these entities. The List Deal query accesses the following tables:

* `Deal (rox_entity_id, name, stage_name, amount, close_date, next_step, ...)`
* `EntityLink (source_entity_id, source_entity_type, target_entity_id, target_entity_type, relationship_type, ...)`
* `DealStage (public_id, name, is_won, is_hidden, is_closed, ...)`

One of most important queries is to list the deals associated with an Account Executive (AE), along with the owned accounts.

## Version 0

The initial version of this query is a 4-table joins:

```sql
SELECT d.*,
    account_owner.source_entity_id AS account_id,
    account_owner.target_entity_id AS owner_id
FROM Deal d JOIN DealStage s ON d.stage_name=s.name
    JOIN EntityLink deal_account ON d.rox_entity_id=deal_account.source_entity_id
    JOIN EntityLink account_owner ON deal_account.target_entity_id=account_owner.source_entity_id
WHERE NOT s.is_hidden;
```

{% hint style="info" %}
It serves well for deals owned by a single user, until Rox has to serve an Account Manager who would view all the deals owned by a team of AEs.
{% endhint %}

## Query Rewriting

After a second thought of categorizing the deals to list, we split the compute of `deal_id, account_id, and owner_id` into a union of two subqueries:

* Deals owned by the user: a semi-join on `EntityLink`
* Deals owned by others, but viewable as part of owned accounts as a manager: a self-join of `EntityLink`

Thus, we reduced the cardinality of the join inputs.

And the union result still needs to do joins with `Deal` and `DealStage`.&#x20;

{% hint style="info" %}
It runs fast alone, but P99 latency under load was so high that it is NOT an interactive query anymore.
{% endhint %}

## Join Reordering

After examining the query execution plan, we observed that the most of the execution time was the `HashJoin` between `Deal` and `DealStage`, where not only the query could not take full advantage of the index on `Deal`, but also the join result splitting into disk degrades the query performance significantly.

We confirmed this theory by increasing `work_mem`.

Since we know Deal is large, while the deals to list is much smaller, we proposed join reordering by:

1. First `Deal` joins with the union result, and
2. Then joins `DealStage` and filters with `DealStage.is_hidden`.

The reordered plan reduced the variation in query execution time, without bumping `work_mem` up.

***

If you are interested in query optimization in production, please [join us](https://www.rox.com/company#job-listings)!&#x20;
