By Hussain Sultan | July 10, 2025
In ~300 lines of Python, we build a FeatureHouse that ingests live weather, back-fills history, and serves sub-millisecond online features. It runs on DuckDB + DuckLake with Arrow Flight—no Spark cluster, no Redis required.
Picture this: You’re building a fraud detection model. At 2:05 PM, a transaction comes in that you need to classify. Your feature store helpfully provides the “average transaction amount in the last hour”—but silently includes transactions from 2:07 PM that haven’t happened yet.
Your offline model evaluation looks perfect (95% accuracy!) because it had access to “future” data. But in production, your model fails spectacularly because it can’t see into the future.
This is feature leakage, one of the more common pitfalls when ML projects move from notebook to production, and one of the many problems a good Feature Store solves.
Feature Stores solve important problems: feature registry, materialization, and ML development safeguards (such as preventing label leakage). But they typically come with painful trade-offs:
Could we compose a better architecture that eliminates these pain points?
See the complete code here: weather_flight.py
This includes two micro-libraries:
A FeatureHouse is simply a data lake that speaks features minus the pain points:
Everything lives in open formats. No vendor lock-in—move the storage and the platform follows.

As mentioned, Xorq is an open framework for creating composite data stacks for ML data processing. To make this possible, Xorq features:
FeatureHouse ensures leak-free features through a three-layer defense system:
```python
# This window definition is safe - it only looks backwards
import xorq as xo

win6 = xo.window(
    group_by=["city"],
    order_by="timestamp",
    preceding=5,  # Only use previous 5 records
    following=0,  # Never peek into the future
)
```

Window operations create safe rolling aggregations, but they're just the first layer.
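To see what `preceding=5, following=0` means numerically, here is a plain-Python sketch of the same backward-looking mean over six hypothetical readings (illustrative only, not the Xorq implementation):

```python
temps = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]  # hypothetical readings, one per tick

def rolling_mean_preceding(values, preceding=5):
    """Mean over rows i-preceding .. i; rows after i are never touched."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - preceding): i + 1]
        out.append(sum(window) / len(window))
    return out

print(rolling_mean_preceding(temps))
# [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]  (each value uses only current and past rows)
```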
The real magic happens in the asof_join—this is what guarantees you only see data points that were actually available at prediction time:
```python
# For each entity at each timestamp, get the LATEST feature
# that was computed at or BEFORE that timestamp
result_expr = entity_df.asof_join(
    feature_expr,
    on="event_timestamp",    # Join on time
    predicates=["user_id"],  # Match entities
)
```

What makes AsOf joins special:
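In plain Python, the per-entity semantics boil down to a sketch like this (illustrative only, not the Xorq implementation; the timestamps echo the 2:05 PM fraud example from the introduction):

```python
from datetime import datetime

def asof_pick(event_time, feature_rows):
    """Return the latest (timestamp, value) computed at or before event_time."""
    eligible = [(ts, value) for ts, value in feature_rows if ts <= event_time]
    return max(eligible) if eligible else None

feature_rows = [
    (datetime(2025, 7, 10, 13, 55), 42.0),  # computed before the prediction: eligible
    (datetime(2025, 7, 10, 14, 7), 99.0),   # computed after the prediction: invisible
]

print(asof_pick(datetime(2025, 7, 10, 14, 5), feature_rows))
# (datetime.datetime(2025, 7, 10, 13, 55), 42.0)
```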
```python
from datetime import timedelta

FeatureView(
    name="temp_mean_6s",
    ttl=timedelta(seconds=3600),  # 1 hour freshness guarantee
    ...
)

# After the asof join, filter out stale features
if view.ttl:
    result_expr = result_expr.filter(
        feature_timestamp >= (event_timestamp - view.ttl)
    )
```

TTL provides the final safety net:
Consider this timeline for user fraud detection:
```
Timeline:    10:00   10:30   11:00   11:30   12:00   12:30
Features:    F1      F2      F3      F4      F5      F6
Prediction:                                  ^
                                             Need features here
```
At 12:00, we need to predict fraud. Here’s how each layer protects us:
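As a plain-Python illustration (the datetimes are hypothetical stand-ins for the timeline above), here is how the as-of and TTL layers behave at 12:00; the window layer already constrained how F1 through F6 were computed:

```python
from datetime import datetime, timedelta

# The F1..F6 feature values from the timeline above, computed every 30 minutes.
features = [
    (datetime(2025, 7, 10, 10, 0),  "F1"),
    (datetime(2025, 7, 10, 10, 30), "F2"),
    (datetime(2025, 7, 10, 11, 0),  "F3"),
    (datetime(2025, 7, 10, 11, 30), "F4"),
    (datetime(2025, 7, 10, 12, 0),  "F5"),
    (datetime(2025, 7, 10, 12, 30), "F6"),
]

prediction_time = datetime(2025, 7, 10, 12, 0)
ttl = timedelta(hours=1)

# As-of join: only features computed at or before 12:00 are visible, so F6 is not.
visible = [(ts, value) for ts, value in features if ts <= prediction_time]
latest_ts, latest_value = max(visible)        # 12:00, "F5"

# TTL: the chosen feature must still be fresh at prediction time.
assert latest_ts >= prediction_time - ttl     # F5 is 0 minutes old, well within 1 hour
print(latest_value)                           # "F5"
```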
```
pip install xorq[examples]
```

And clone the GitHub repository from here: xorq, then navigate to examples/weather_flight.py.
We monitor four cities and compute a six-second rolling mean temperature.
```
python weather_flight.py serve_features      # 1. expose UDXF expr on Catalog Server
python weather_flight.py push                # 2. fetch raw data -> DuckLake
python weather_flight.py materialize_online  # 3. roll & upload online features
python weather_flight.py infer               # 4. fetch online features
python weather_flight.py historical          # 5. leak-free back-fill
```
```python
# 1. Declarative entities & sources
city = Entity("city", key_column="city", timestamp_column="timestamp")
offline_source = DataSource("batch", duck_con, "weather_history", schema)
online_source = DataSource("online", flight_backend, "weather_history", schema)

# 2. A leak-free feature definition
win6 = xo.window(
    group_by=["city"],
    order_by="timestamp",
    preceding=5,
    following=0,
)
temp_mean = Feature(
    "temp_mean_6s", city,
    expr=offline_source.table.temp_c.mean().over(win6),
)

# 3. Register with the store
view = FeatureView(
    "city_weather", offline_source, online_source,
    entity=city, features=[temp_mean],
)
store = FeatureStore()
store.registry.register_entity(city)
store.register_source(offline_source)
store.register_source(online_source)
store.register_view(view)
```

The magic happens in the window definition. Traditional feature stores might accidentally include future data. FeatureHouse explicitly prevents this:
```
Timeline:  t-5   t-4   t-3   t-2   t-1   t0(now)   t+1
Window:    |-------------- used ---------------|
                                                   ^ignored^
```
When you request features at time t0, the rolling window only considers data from t-5 through t0. Data from t+1 is invisible.
We took inspiration from Feast's API and implement similar semantics. A proper integration with Feast would let Feast provide project management and the feature registry while delegating multi-engine transformation to Xorq, perhaps with the Catalog Server exposed as a data source in Feast.
The real power emerges when you combine Xorq’s deferred execution with Feast’s mature feature management platform. This gives you a clear separation of concerns:
Feast Handles: Project Management & Collaboration
Xorq Handles: Multi-Engine Execution & Lineage
See the code here: benchmark gist
This confirms that our current bottleneck lies not in the transport protocol, but in the storage engine—DuckDB. DuckDB’s lack of support for concurrent writes means that PUT throughput quickly flattens as load increases. In contrast, Arrow Flight is engineered for high-performance parallel data transfer—benchmarks show it can achieve up to 4,800 MB/s for DoPut and 6,000 MB/s for DoGet, utilizing nearly full network bandwidth.
Note that this does not benchmark DuckLake; it benchmarks the in-process DuckDB instance that serves the materialized features in the example above.
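For orientation, a round-trip measurement of this kind can be sketched with pyarrow's Flight client as below. The endpoint, path, and ticket are hypothetical placeholders; the actual harness lives in the gist linked above.

```python
import time
import pyarrow as pa
import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")  # hypothetical Flight endpoint
table = pa.table({"city": ["nyc"] * 100_000, "temp_c": [20.0] * 100_000})

# DoPut: stream the table to the server and time it.
start = time.perf_counter()
writer, _ = client.do_put(flight.FlightDescriptor.for_path("bench"), table.schema)
writer.write_table(table)
writer.close()
put_secs = time.perf_counter() - start

# DoGet: read the same data back.
start = time.perf_counter()
result = client.do_get(flight.Ticket(b"bench")).read_all()
get_secs = time.perf_counter() - start

print(f"DoPut: {put_secs:.3f}s  DoGet: {get_secs:.3f}s  rows: {result.num_rows}")
```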

This demo shows the core concepts, but a production FeatureHouse would add:
The beauty of the composable approach is that each piece can be swapped independently. Need Snowflake instead of DuckDB? Just change the connection string.
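As a sketch of that swap (hypothetical connection details, reusing the DataSource call shown in the example above):

```python
import ibis

# Hypothetical Snowflake connection; credentials elided.
snow_con = ibis.snowflake.connect(
    user="...", account="...", database="...", warehouse="...",
)

# Same DataSource call as before, just a different backend connection.
offline_source = DataSource("batch", snow_con, "weather_history", schema)
```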
Heavyweight feature platforms solved yesterday’s problems at yesterday’s scale.
A FeatureHouse built with Xorq, Arrow Flight, and DuckLake gives you:
All in a few hundred lines of declarative Python, running anywhere you can copy a file.
The lake-first approach means your features live in open formats, your transformations are portable across engines, and your time semantics are mathematically sound. The Catalog Server lets you expose these features over a portable Arrow Flight server that abstracts away the execution from its declaration.
Give it a spin at github.com/xorq-labs/xorq and let us know how far you can push the lake!
See also: