Shrinking the Decision Space

A semantic model on public US flight data: the agent picks a named expression at the right grain instead of re-deriving joins and roll-ups, so there’s less to decide and less to get wrong.

By George Hoersting | June 4, 2026

Every decision an agent makes is a chance to fail. Agents need the right context, but as context grows it rots and the odds of hallucination climb. A smaller decision space, meaning context compacted into a surface that’s easy to navigate, orients the agent faster and makes it more accurate. This post walks through how to provide that kind of executable context about data, using real-world transportation data.

The US Bureau of Transportation Statistics publishes the On-Time Performance dataset: one record per domestic flight flown by a US carrier, with scheduled versus actual times, the resulting delay, origin and destination airports, and operating carrier. It is denormalized reporting data with lots of foreign keys. It ships as zipped CSVs from transtats.bts.gov, one file per month, at several hundred thousand flights a month.

Xorq is a Python library with a composable expression language built on Ibis and Apache DataFusion. Expressions run across database engines, cache intelligently, and compile into portable, content-addressed artifacts you can version in git and share as a catalog.

semantic-bts is a repo that publishes a catalog with this data, so you can clone it and query the flight model directly instead of hand-writing the joins and choosing the grain of each aggregate. It holds the source expression that downloads and parses the BTS zips, defines the dimensions and measures with boring-semantic-layer, and builds the expressions into xorq-catalog-bts. pi.dev is the agentic harness, with a small pi package so the agent works against the expressions natively.

semantic-bts comes with one helper command, rebuild, which blows away and rebuilds the catalog locally from src/exprs/.


A smaller decision space

A semantic layer defines dimensions and measures once, with the aggregates pinned to a grain, so the agent picks from a limited subset to generate a new calculation. When an agent is permitted to reconstruct joins and aggregations from scratch it often fails, which is roughly what we found on DABStep. Fewer decisions, fewer failures.
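To make "picks from a limited subset" concrete, here is a minimal, library-free sketch. The names DIMENSIONS, MEASURES, and plan_query are hypothetical (as are the origin and dest dimension names) and are not part of boring-semantic-layer or Xorq. The only point is that a request naming something outside the model is rejected rather than improvised.

```python
# Hypothetical sketch: a semantic model treated as a closed vocabulary.
# The dimension and measure names loosely mirror the ones in this post.
DIMENSIONS = {"quarter", "reporting_airline", "origin", "dest"}
MEASURES = {"n_flights", "avg_dep_delay", "dep_delay_pct_block"}


def plan_query(dimensions, measures):
    """Validate a request against the model and return a normalized plan."""
    unknown_dims = sorted(set(dimensions) - DIMENSIONS)
    unknown_measures = sorted(set(measures) - MEASURES)
    if unknown_dims or unknown_measures:
        raise ValueError(
            f"not in model: dimensions={unknown_dims}, measures={unknown_measures}"
        )
    # The plan carries only names; joins and aggregation logic stay in the model.
    return {"group_by": tuple(dimensions), "aggregate": tuple(measures)}


plan = plan_query(["quarter", "reporting_airline"], ["avg_dep_delay"])
```

With a surface like this, the agent's choice is which names to combine, not how to compute them.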

The catalog stores definitions, not data

Xorq builds each expression into a content-addressed artifact and stores it in a catalog. The catalog at xorq-catalog-bts is about 600KB; the million or so rows of flight data do not live inside the expression. The data is fetched and cached on your machine (a few hundred MB of parquet under ~/.cache/xorq).

So the first run does download and parse a million rows, but only once: the cache is keyed by a content hash of the expression, so nothing recomputes unless the inputs or logic change. The key is based on the expression’s identity; connection info from your Xorq profile is excluded. The cache here is a local parquet directory, so on a different machine it rebuilds. If the expression is pointed at a shared source instead (say, a Snowflake table cached into shared storage) and a second machine’s expression resolves to the same hash, it fetches from the remote cache. Since credentials aren’t in the key, two people with the same expression share the cache.
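The keying idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not Xorq's hashing scheme: the dictionary layout, the "connection" field, and the cache_key helper are all hypothetical.

```python
import hashlib
import json


def cache_key(expr_definition: dict) -> str:
    """Hash only what defines the computation; drop connection details."""
    identity = {k: v for k, v in expr_definition.items() if k != "connection"}
    # sort_keys gives a canonical serialization, so key order cannot change the hash
    canonical = json.dumps(identity, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


alice = {
    "source": "bts_on_time",
    "year_months": "2025_11,2025_12",
    "logic": "group by quarter, reporting_airline",
    "connection": {"user": "alice", "password": "secret-a"},
}
# Same expression, different credentials
bob = {**alice, "connection": {"user": "bob", "password": "secret-b"}}
# Same credentials, different inputs
other_months = {**alice, "year_months": "2025_10,2025_11"}

same_key = cache_key(alice) == cache_key(bob)
changed_key = cache_key(alice) != cache_key(other_months)
```

Here same_key and changed_key are both true: different credentials map to one cache entry, while a change to the inputs produces a new one.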


The build files

The expressions are plain Python in src/semantic_bts/exprs/. The source is a UDXF that downloads the BTS zips, hitting external HTTPS endpoints. The months it fetches aren’t hardcoded in the expression; they come from a deferred parameter year_months (default 2025_11,2025_12). The parameter lives in the expression graph, and its value binds at execution time, not build time.

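import xorq.api as xo

# Excerpt: flight_udxf, fetch_bts_months, and ParquetSnapshotCache are
# defined or imported elsewhere in the build file.
con = xo.connect()

# One-row table whose only column is the deferred year_months parameter
months_input = xo.memtable({"_": [0]}).select(
    year_months=xo.coalesce(
        xo.param("year_months", "string", default="2025_11,2025_12"), ""
    )
)
source_expr = flight_udxf(months_input, process_df=fetch_bts_months, ...)

# Snapshot the parsed flights to parquet so later runs skip the download
expr = source_expr.cache(ParquetSnapshotCache.from_kwargs(source=con))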

(The coalesce wrap works around xorq#2037.)

Every downstream expression is built on this parameterized source, so the param propagates through the whole graph. That lets you change the range at run time on any published expression without touching the build file (see Changing the range at run time below).
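For readers wondering what the year_months value encodes, the sketch below shows one plausible way to turn it into (year, month) pairs. It assumes the format is comma-separated YEAR_MONTH tokens, as the default suggests. parse_year_months is a hypothetical helper, not the repo's fetch_bts_months, which may parse the value differently.

```python
def parse_year_months(value: str) -> list[tuple[int, int]]:
    """Turn a string like "2025_11,2025_12" into [(2025, 11), (2025, 12)]."""
    pairs = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty input, such as an empty-string fallback
        year_text, month_text = token.split("_")
        year, month = int(year_text), int(month_text)
        if not 1 <= month <= 12:
            raise ValueError(f"month out of range in {token!r}")
        pairs.append((year, month))
    return pairs


default_months = parse_year_months("2025_11,2025_12")
```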

The model maps readable names onto the raw BTS columns:

model = SemanticModel(
    table=flights,
    dimensions={
        "quarter":           Dimension(expr=lambda t: t.Quarter),
        "reporting_airline": Dimension(expr=lambda t: t.Reporting_Airline),
        # ... origin/dest, time blocks, etc.
    },
    measures={
        "n_flights":     Measure(expr=lambda t: t.count()),
        "avg_dep_delay": Measure(expr=lambda t: t.DepDelay.mean()),
        "dep_delay_pct_block": Measure(
            expr=lambda t: t.DepDelay.sum() / t.ActualElapsedTime.sum() * 100
        ),
    },
)

The three aggregate entries are query() calls that share the same measures and differ only in grain, so aggregation logic is defined once using a simple spec:

expr_quarter_car = model.query(
    dimensions=["quarter", "reporting_airline"],
    measures=MEASURES,
).to_untagged()
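
Pinning a measure to its definition matters most for ratios. The sketch below uses plain Python and made-up rows (not BTS data) to show that dep_delay_pct_block, defined as a ratio of sums, gives a different answer from averaging finer-grained percentages. The row layout and the helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical flight rows: (quarter, carrier, dep_delay_min, elapsed_min)
rows = [
    (4, "AA", 10.0, 100.0),
    (4, "AA", 30.0, 300.0),
    (4, "DL", 0.0, 200.0),
    (4, "DL", 50.0, 100.0),
]


def dep_delay_pct_block(rows, grain):
    """sum(delay) / sum(elapsed) * 100, grouped by the columns named in grain."""
    index = {"quarter": 0, "carrier": 1}
    sums = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        key = tuple(row[index[name]] for name in grain)
        sums[key][0] += row[2]  # running total of delay minutes
        sums[key][1] += row[3]  # running total of elapsed minutes
    return {key: delay / elapsed * 100 for key, (delay, elapsed) in sums.items()}


# Same measure definition, two grains
by_carrier = dep_delay_pct_block(rows, ["quarter", "carrier"])
by_quarter = dep_delay_pct_block(rows, ["quarter"])

# A tempting but wrong roll-up: average the per-carrier percentages
naive_quarter = sum(by_carrier.values()) / len(by_carrier)
```

On these rows the quarter-level value is about 12.9, while the naive average of the carrier-level values is about 13.3. Defining the measure once and varying only the grain avoids that class of mistake.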

Asking pi for a new query

Now that we’ve built the expressions and cataloged them, we can expose them to pi as tools it can query to build its context.

[Figure: pi querying the BTS catalog]

A new metric is just dimensions, measures, and a grain. Drop into pi with nix, no clone needed:

nix run github:xorq-labs/semantic-bts#pi

Ask in plain language, for example:

"make an expression that checks whether delays grow across the departure time blocks through the day"

Because pi already knows the model’s dimensions and measures, it composes the query, builds the artifact, and registers it as a new catalog entry.


Changing the range at run time

Every expression with the bound param in its execution graph can be passed runtime params. Hand a downstream aggregate a different range and the runtime walks back up the graph and rebuilds the source flights from the zip files for those months:

xorq catalog -p xorq-catalog-bts run flights-by-quarter-carrier \
    --params year_months=2025_10,2025_11

Less to decide

A published catalog moves the joins, grains, and aggregations out of the agent’s hands. Picking flights-by-quarter-carrier is really just picking flights and carrier; re-deriving it is fifty lines of pandas with ample opportunities to confabulate a join or the grain of an aggregate. New metrics are still composed by agents, but the surface the agent reasons about stays small. That smaller decision space is what took Haiku from 50% to 84% on DABStep.

Aside: rebinding a leaf directly

The param above is the supported way to change the range. But the range isn’t special: any sub-expression in the graph can be swapped structurally with replace_nodes. Here we abandon the param entirely and substitute a hardcoded months input. The catalog wraps the entry in a content-addressing tag, so first peel that off with .ls.fused, then match the parameterized months projection and replace it:

import xorq.api as xo
import xorq.vendor.ibis.expr.operations as ops
from xorq.catalog.catalog import Catalog
from xorq.common.utils.graph_utils import replace_nodes

# the published catalog entry, loaded straight from the catalog repo
flights = Catalog.from_repo_path("xorq-catalog-bts").load("flights")

# A hardcoded replacement for the parameterized months input.
# Note: pass xo.literal(...); a bare string is read as a column name.
new_months = xo.memtable({"_": [0]}).select(
    year_months=xo.literal("2025_10,2025_11")
)

def rebind(node, _):
    # months_input is the only projection sitting directly on a memtable
    is_months = (
        isinstance(node, ops.Project)
        and "year_months" in node.values
        and isinstance(node.parent, ops.InMemoryTable)
    )
    return new_months.op() if is_months else node

rebound = replace_nodes(rebind, flights.ls.fused).to_expr()

Everything downstream is untouched (the UDXF body, the semantic model, the aggregates); only the months input changes, and the swapped subgraph carries no param. Because that input is part of the hashed expression, rebound gets its own content hash and caches independently.
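The structural swap can be pictured on a toy tree. The sketch below is not Xorq's replace_nodes. Node and replace_where are hypothetical names, and the three-node graph only mimics the shape of months input, source, and aggregate.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """A toy immutable expression node: a name plus child nodes."""

    name: str
    children: tuple = ()


def replace_where(node, replacer):
    """Rebuild the tree bottom-up, letting replacer swap any node."""
    new_children = tuple(replace_where(child, replacer) for child in node.children)
    rebuilt = replace(node, children=new_children)
    return replacer(rebuilt)


# months input -> source -> aggregate, mirroring the graph in this post
months = Node("months_param")
source = Node("fetch_flights", (months,))
aggregate = Node("by_quarter_carrier", (source,))

# Swap the parameterized leaf for a hardcoded one; everything else is rebuilt as-is
hardcoded = Node("months_literal")
rebound = replace_where(
    aggregate, lambda n: hardcoded if n.name == "months_param" else n
)
```

Because the nodes are immutable, the original aggregate is left as it was, and rebound is a distinct tree that differs only at the leaf. That is the property that lets a content hash tell the two apart.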


If this is useful, star Xorq and semantic-bts on GitHub.