ASOF JOIN keyword

ASOF JOIN is a powerful SQL keyword that allows you to join two time-series tables.

It is a variant of the JOIN keyword and shares many of its execution traits.

This document will demonstrate how to utilize them, and link to other relevant JOIN context.

JOIN overview

The JOIN operation is broken into three components:

  • Select clause
  • Join clause
  • Where clause

This document will demonstrate the JOIN clause, where the other keywords demonstrate their respective clauses.

Visualized, a JOIN operation looks like this:

Flow chart showing the syntax of the high-level syntax of the JOIN keyword

  • selectClause - see the SELECT reference docs for more information.

  • joinClause ASOF JOIN with an optional ON clause which allows only the = predicate and an optional TOLERANCE clause:

    Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword

  • whereClause - see the WHERE reference docs for more information.

In addition, the following are items of importance:

  • Columns from joined tables are combined in a single row.

  • Columns with the same name originating from different tables will be automatically aliased into a unique column namespace of the result set.

  • Though it is usually preferable to explicitly specify join conditions, QuestDB will analyze WHERE clauses for implicit join conditions and will derive transient join conditions where necessary.

Execution order

Join operations are performed in order of their appearance in a SQL query.

Read more about execution order in the JOIN reference documentation.

ASOF JOIN

ASOF JOIN joins two time-series on their timestamp, using the following logic: for each row in the first time-series,

  1. consider all timestamps in the second time-series earlier or equal to the first one
  2. choose the latest such timestamp
  3. If the optional TOLERANCE clause is specified, an additional condition applies: the chosen record from t2 must satisfy t1.ts - t2.ts <= tolerance_value. If no record from t2 meets this condition (along with t2.ts <= t1.ts), then the row from t1 will not have a match.

Example

Let's use an example with two tables:

  • trades: trade events on a single stock
  • order_book: level-1 order book snapshots for that stock

trades data:

timestamppricesize
08:00:00.007140175.97400
08:00:00.609618178.55400
08:00:00.672131176.09400
08:00:00.672147176.03400
08:00:01.146931175.45400
08:00:01.495188177.90400
08:00:01.991977175.35400
08:00:01.991991175.36400
08:00:02.039451175.36400
08:00:02.836413175.55400
08:00:03.447858176.79400
08:00:04.782191181.0015
08:00:05.408871175.77400
08:00:06.007145176.52400
08:00:06.740159184.001
08:00:07.593841175.75400
08:00:10.310291176.3829
08:00:10.550535175.86400
08:00:10.761790175.94400
08:00:12.046660176.15400
08:00:12.897624176.62400
08:00:13.838193176.5125
08:00:15.125509176.17400
08:00:16.727077176.48400
08:00:18.813886176.68400
08:00:22.180535176.05400
08:00:25.125634176.16400
08:00:26.117889176.331
08:00:26.184839176.52400
08:00:26.185102176.4125

order_book data:

timestampbid_pricebid_sizeask_priceask_size
08:00:00176.475542176.8213054
08:00:01176.334744176.68404
08:00:02176.07136176.764946
08:00:03176.0784176.752182
08:00:04176.07112176.592734
08:00:05176.38212176.56966
08:00:06176.33176176.528174
08:00:07176.33276176.677345
08:00:08176.3348176.671600
08:00:09176.3566176.672400
08:00:10176.36695176.3820698
08:00:11176.3598176.592800
08:00:12176.48104176.594040
08:00:13176.48165176.386035
08:00:14176.3556176.38720
08:00:15176.35119176.381530
08:00:16176.35133176.383710
08:00:18176.3584176.381880
08:00:19176.3514176.38180
08:00:20176.3514176.38180
08:00:21176.35112176.381440
08:00:22176.35133176.381710
08:00:25176.35122176.383929
08:00:26176.35300176.376952
08:00:28176.0728176.37496

We want to join each trade event to the relevant order book snapshot. All we have to write is

A basic ASOF JOIN example
trades ASOF JOIN order_book

and we get this result:

timestamppricesizetimestamp1bid_pricebid_sizeask_priceask_size
08:00:00.007140175.9740008:00:00176.475542176.8213054
08:00:00.609618178.5540008:00:00176.475542176.8213054
08:00:00.672131176.0940008:00:00176.475542176.8213054
08:00:00.672147176.0340008:00:00176.475542176.8213054
08:00:01.146931175.4540008:00:01176.334744176.608404
08:00:01.495188177.9040008:00:01176.334744176.608404
08:00:01.991977175.3540008:00:01176.334744176.608404
08:00:01.991991175.3640008:00:01176.334744176.608404
08:00:02.039451175.3640008:00:02176.07136176.764946
08:00:02.836413175.5540008:00:02176.07136176.764946
08:00:03.447858176.7940008:00:03176.0784176.752182
08:00:04.782191181.001508:00:04176.07112176.592734
08:00:05.408871175.7740008:00:05176.38212176.506966
08:00:06.007145176.5240008:00:06176.33176176.528174
08:00:06.740159184.00108:00:06176.33176176.528174
08:00:07.593841175.7540008:00:07176.33276176.677345
08:00:10.310291176.382908:00:10176.36695176.3820698
08:00:10.550535175.8640008:00:10176.36695176.3820698
08:00:10.761790175.9440008:00:10176.36695176.3820698
08:00:12.046660176.1540008:00:12176.48104176.594040
08:00:12.897624176.6240008:00:12176.48104176.594040
08:00:13.838193176.512508:00:13176.48165176.386035
08:00:15.125509176.1740008:00:15176.35119176.381530
08:00:16.727077176.4840008:00:16176.35133176.383710
08:00:18.813886176.6840008:00:18176.3584176.381880
08:00:22.180535176.0540008:00:22176.35133176.381710
08:00:25.125634176.1640008:00:25176.35122176.383929
08:00:26.117889176.33108:00:26176.35300176.376952
08:00:26.184839176.5240008:00:26176.35300176.376952
08:00:26.185102176.412508:00:26176.35300176.376952

Using ON for matching column value

The tables in the above example are just about one stock; in reality the same table covers many stocks, and you want the results not to get mixed between them. This is what the ON clause is for -- you can point out the key (ticker) column and get results separate for each key.

Here's the trades table expanded to include two stocks, and a new symbol column:

timestampsymbolpricesize
08:00:00.007168AAPL176.91400
08:00:00.834205AAPL175.93400
08:00:00.988111AAPL176.47100
08:00:01.199577AAPL175.46400
08:00:01.495172AAPL177.95400
08:00:01.538683GOOG175.82400
08:00:01.555565AAPL176.3325
08:00:02.006636GOOG150.010
08:00:02.039451AAPL175.36400
08:00:02.460454GOOG175.45400
08:00:03.012909GOOG175.51
08:00:03.494927GOOG185.05
08:00:03.524212AAPL175.48400
08:00:04.648333AAPL175.66400
08:00:04.943421GOOG175.48400
08:00:05.884890AAPL176.5428
08:00:05.961856GOOG175.66400
08:00:06.589806GOOG175.65400
08:00:06.740159AAPL184.01
08:00:07.342978GOOG176.55400
08:00:07.345877AAPL176.73400
08:00:10.419065AAPL176.41400
08:00:11.636237AAPL176.69400
08:00:11.683078GOOG176.67400
08:00:13.650868AAPL176.52124
08:00:13.650880AAPL176.59124
08:00:14.055762AAPL176.66400
08:00:14.083022GOOG176.81400
08:00:15.088091GOOG176.52400
08:00:15.125494AAPL176.12400
08:00:15.147691GOOG176.54400

Order book, similarly extended with the symbol column:

timestampsymbolbid_pricebid_sizeask_priceask_size
08:00:00AAPL176.475542176.8213054
08:00:01GOOG130.327516130.925652
08:00:01AAPL176.334744176.68404
08:00:02GOOG130.599046130.689264
08:00:02AAPL176.07136176.764946
08:00:03GOOG130.344086130.8212676
08:00:03AAPL176.0784176.752182
08:00:04GOOG130.29350130.798780
08:00:04AAPL176.07112176.592734
08:00:05GOOG130.29182130.686060
08:00:05AAPL176.38212176.56966
08:00:06GOOG130.48394130.656828
08:00:06AAPL176.33176176.528174
08:00:07GOOG130.52366130.6121260
08:00:07AAPL176.33276176.677345
08:00:08GOOG130.48480130.7613032
08:00:08AAPL176.3348176.671600
08:00:09GOOG130.48216130.746458
08:00:09AAPL176.3566176.672400
08:00:10GOOG130.4872130.742400
08:00:10AAPL176.36695176.3820698
08:00:11GOOG130.511236130.5226596
08:00:11AAPL176.3598176.592800
08:00:12GOOG130.5378130.6822000
08:00:12AAPL176.48104176.594040
08:00:13GOOG130.6174130.685200
08:00:13AAPL176.48165176.386035
08:00:14GOOG130.6138130.628616
08:00:14AAPL176.3556176.38720
08:00:15GOOG130.6394130.529374

And here's the ASOF JOIN query with the ON clause added:

ASOF JOIN with symbol matchingDemo this query
SELECT t.timestamp, t.symbol, price, size, bid_price, bid_size, ask_price, ask_size
FROM trades t ASOF JOIN order_book ON (symbol);

Result:

timestampsymbolpricesizebid_pricebid_sizeask_priceask_size
08:00:00.007168AAPL176.91400176.475542176.8213054
08:00:00.834205AAPL175.93400176.475542176.8213054
08:00:00.988111AAPL176.47100176.475542176.8213054
08:00:01.199577AAPL175.46400176.334744176.608404
08:00:01.495172AAPL177.95400176.334744176.608404
08:00:01.538683GOOG175.82400130.327516130.9025652
08:00:01.555565AAPL176.3325176.334744176.608404
08:00:02.006636GOOG150.0010130.599046130.689264
08:00:02.039451AAPL175.36400176.07136176.764946
08:00:02.460454GOOG175.45400130.599046130.689264
08:00:03.012909GOOG175.501130.344086130.8212676
08:00:03.494927GOOG185.005130.344086130.8212676
08:00:03.524212AAPL175.48400176.0784176.752182
08:00:04.648333AAPL175.66400176.07112176.592734
08:00:04.943421GOOG175.48400130.29350130.798780
08:00:05.884890AAPL176.5428176.38212176.506966
08:00:05.961856GOOG175.66400130.29182130.686060
08:00:06.589806GOOG175.65400130.48394130.656828
08:00:06.740159AAPL184.001176.33176176.528174
08:00:07.342978GOOG176.55400130.52366130.6121260
08:00:07.345877AAPL176.73400176.33276176.677345
08:00:10.419065AAPL176.41400176.36695176.3820698
08:00:11.636237AAPL176.69400176.3598176.592800
08:00:11.683078GOOG176.67400130.511236130.5226596
08:00:13.650868AAPL176.52124176.48165176.386035
08:00:13.650880AAPL176.59124176.48165176.386035
08:00:14.055762AAPL176.66400176.3556176.38720
08:00:14.083022GOOG176.81400130.60138130.628616
08:00:15.088091GOOG176.52400130.60394130.529374
08:00:15.125494AAPL176.12400176.3556176.38720
08:00:15.147691GOOG176.54400130.60394130.529374

TOLERANCE clause

The TOLERANCE clause enhances ASOF and LT JOINs by limiting how far back in time the join should look for a match in the right table. The TOLERANCE parameter accepts a time interval value (e.g., 2s, 100ms, 1d).

When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at t2.ts if both conditions are met: t2.ts <= t1.ts and t1.ts - t2.ts <= tolerance_value

This ensures that the matched record from the right table is not only the latest one on or before t1.ts, but also within the specified time window.

ASOF JOIN with a TOLERANCE parameter
SELECT ...
FROM table1
ASOF JOIN table2 TOLERANCE 10s
[WHERE ...]

TOLERANCE also works together with the ON clause:

ASOF JOIN with keys and a TOLERANCE parameter
SELECT ...
FROM table1
ASOF JOIN table2 ON (key_column) TOLERANCE 1m
[WHERE ...]

The interval_literal must be a valid QuestDB interval string, like '5s' (5 seconds), '100ms' (100 milliseconds), '2m' ( 2 minutes), '3h' (3 hours), or '1d' (1 day).

Example using TOLERANCE:

Consider the trades and order_book tables from the previous examples. If we want to join trades to order book snapshots that occurred no more than 1 second before the trade:

TOLERANCE example
SELECT t.timestamp, t.price, t.size, ob.timestamp AS ob_ts, ob.bid_price, ob.bid_size
FROM trades t
ASOF JOIN order_book ob TOLERANCE 1s;

Let's analyze a specific trade: trades at 08:00:01.146931. Without TOLERANCE, it joins with order_book at 08:00:01. The time difference is 0.146931s. If we set TOLERANCE '100ms', this trade would not find a match, because 0.146931s (146.931ms) is greater than 100ms. The previous order_book entry at 08:00:00 would be even further away (1.146931s).

Another trade: trades at 08:00:00.007140. Without TOLERANCE, it joins with order_book at 08:00:00. The time difference is 0.007140s (7.14ms). If we set TOLERANCE '5ms', this trade would not find a match because 7.14ms > 5ms.

Supported Units for interval_literal

The TOLERANCE interval literal supports the following time unit qualifiers:

  • U: Microseconds
  • T: Milliseconds
  • s: Seconds
  • m: Minutes
  • h: Hours
  • d: Days
  • w: Weeks

For example, '100U' is 100 microseconds, '50T' is 50 milliseconds, '2s' is 2 seconds, '30m' is 30 minutes, '1h' is 1 hour, '7d' is 7 days, and '2w' is 2 weeks. Please note that months (M) and years (Y) are not supported as units for the TOLERANCE clause.

Performance impact of TOLERANCE

Specifying TOLERANCE can also improve performance. ASOF JOIN execution plans often scan backward in time on the right table to find a matching entry for each left-table row. TOLERANCE allows these scans to terminate early - once a right-table record is older than the left-table record by more than the specified tolerance - thus avoiding unnecessary processing of more distant records.

Timestamp considerations

ASOF join can be performed only on tables or result sets that are ordered by time. When a table is created with a designated timestamp the order of records is enforced and the timestamp column name is in the table metadata. ASOF join uses this timestamp column from metadata.

caution

ASOF join requires that the tables or subqueries have designated timestamps. This means they have an ascending order timestamp column, which may need to be specified with timestamp(ts). See below!

In case tables do not have a designated timestamp column, but data is in chronological order, timestamp columns can be specified at runtime:

SELECT *
FROM (a timestamp(ts))
ASOF JOIN (b timestamp (ts));

SQL Performance Hints for ASOF JOIN

QuestDB supports SQL hints that can optimize non-keyed ASOF join performance when filters are applied to the joined table:

ASOF JOIN with optimization hint
SELECT /*+ USE_ASOF_BINARY_SEARCH(trades order_book) */ *
FROM trades
ASOF JOIN (
SELECT * FROM order_book
WHERE state = 'VALID'
) order_book;

For more information on when and how to use these optimization hints, see the SQL Hints documentation.

SPLICE JOIN

Want to join all records from both tables?

SPLICE JOIN is a full ASOF JOIN.

Read the JOIN reference for more information on SPLICE JOIN.