ML Pipelines 
From Another Planet

Do-anything, run-anywhere Python UDFs that simplify and accelerate ML workflows and development.

ML Pipelines? 
More like ML Wormholes

Multi-engine ML pipelines made simple

No more time-consuming ML pipeline development. xorq’s multi-engine system seamlessly moves data between query engines, allowing you to leverage the strengths of each engine within a unified workflow.

  • Orchestrate flows via a simple declarative Pythonic scripting language
  • Fast, transparent data movement between engines
  • Batching, memory management handled for you
  • No serialization/deserialization overhead
  • Supports Snowflake, Trino, Pandas, DuckDB, Postgres, and many others
import xorq as xo
from xorq.expr.relations import into_backend

# Connect to different engines
pg = xo.postgres.connect_env()
db = xo.duckdb.connect()

# Get tables from different sources
batting = pg.table("batting")

# Load awards_players into DuckDB
awards_players = xo.examples.awards_players.fetch(backend=db)

# Filter data in respective engines
left = batting.filter(batting.yearID == 2015)
right = awards_players.filter(awards_players.lgID == "NL").drop("yearID", "lgID")

# Move right table into postgres for efficient join
expr = left.join(
    into_backend(right, pg),
    ["playerID"],
    how="semi"
)[["yearID", "stint"]]

# Execute the multi-engine query
result = expr.execute()
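The into_backend pattern above can be sketched with a stdlib-only analogue: two sqlite3 connections stand in for two query engines, and the filtered rows are materialized in the destination engine so the semi-join runs locally (sqlite3 and the table contents here are illustrative stand-ins, not part of xorq):

```python
import sqlite3

# Two separate in-memory databases stand in for two query engines.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

src.execute("CREATE TABLE awards (playerID TEXT, lgID TEXT)")
src.executemany("INSERT INTO awards VALUES (?, ?)",
                [("a1", "NL"), ("a2", "AL"), ("a3", "NL")])

dst.execute("CREATE TABLE batting (playerID TEXT, yearID INT)")
dst.executemany("INSERT INTO batting VALUES (?, ?)",
                [("a1", 2015), ("a2", 2015), ("a4", 2015)])

# "into_backend": filter in the source engine, then materialize the
# result inside the destination engine so the join happens locally.
rows = src.execute("SELECT playerID FROM awards WHERE lgID = 'NL'").fetchall()
dst.execute("CREATE TEMP TABLE awards_nl (playerID TEXT)")
dst.executemany("INSERT INTO awards_nl VALUES (?)", rows)

# Semi-join entirely within the destination engine.
result = dst.execute(
    "SELECT yearID FROM batting "
    "WHERE playerID IN (SELECT playerID FROM awards_nl)"
).fetchall()
```

xorq does this movement for you via Arrow rather than row-by-row inserts, but the shape is the same: filter where the data lives, move only what the join needs.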

Caching & in-memory performance for fast iteration

Built-in performance optimizations such as caching and in-memory data transfer ensure fast execution of ML data flows, and even faster iteration as you develop and test.

  • In-memory, zero-copy data transfer across engines
  • Cache results from upstream query engines
  • Automatically invalidate cache when source data changes
  • Chain caches across multiple engines
import xorq as xo
from xorq.caching import SourceStorage

# Connect to source database
pg = xo.postgres.connect_env()
con = xo.connect()  # empty in-process connection

# Create source storage
storage = SourceStorage(source=con)

# Register table from postgres and cache it
batting = pg.table("batting")

# Cache the filtered data in the source backend
cached = (
    batting.filter(batting.yearID == 2015)
    .cache(storage=storage)  # cache expression
)

# Execute the query - results will be cached
result = xo.execute(cached)
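The invalidate-on-change behavior can be sketched in plain Python: keying the cache on a fingerprint of the source data means a changed source simply misses the old entry and recomputes (`SourceCache` here is a hypothetical illustration, not xorq's implementation):

```python
import hashlib

class SourceCache:
    """Toy cache keyed by a fingerprint of the source data, so any
    change in the source automatically invalidates the entry."""
    def __init__(self):
        self._store = {}

    def _key(self, name, source_rows):
        digest = hashlib.sha256(repr(source_rows).encode()).hexdigest()
        return (name, digest)

    def get_or_compute(self, name, source_rows, compute):
        key = self._key(name, source_rows)
        if key not in self._store:          # miss: compute and store
            self._store[key] = compute(source_rows)
        return self._store[key]

cache = SourceCache()
rows = [("a", 2015), ("b", 2014)]
first = cache.get_or_compute(
    "f2015", rows, lambda rs: [r for r in rs if r[1] == 2015])

# Changed source data -> different key -> recomputed, never stale.
rows2 = rows + [("c", 2015)]
second = cache.get_or_compute(
    "f2015", rows2, lambda rs: [r for r in rs if r[1] == 2015])
```

xorq applies the same principle at the expression level: the cache key incorporates the upstream data, so stale results are never served.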

Build powerful, portable UD(x)Fs

xorq provides the escape velocity you need to avoid the functional and performance limitations (and cost) of executing UDFs within your native query engines. Gain powerful data processing capabilities and run xorq UD(x)Fs on any platform.

  • Scalar UDFs with model integration
  • UDAF aggregation
  • UDWF windowing
  • Supports Snowflake, Trino, Pandas, DuckDB, and many others
import xorq as xo
from xorq.expr.ml import make_quickgrove_udf
from pathlib import Path
from xorq import _

t = xo.examples.diamonds.fetch()

model_path = Path(xo.options.pins.get_path("diamonds-model"))
model = make_quickgrove_udf(model_path, model_name="diamonds_model")
expr = t.mutate(pred=model.on_expr).filter(_.carat < 1).select(_.pred)
result = expr.execute()
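For a stdlib-only analogue of scalar UDFs and UDAFs, sqlite3's `create_function` and `create_aggregate` show the same shape: a per-row function and a step/finalize aggregate registered with the engine (the "model" and table here are toy stand-ins, not xorq's quickgrove integration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE diamonds (carat REAL, price REAL)")
con.executemany("INSERT INTO diamonds VALUES (?, ?)",
                [(0.5, 1500.0), (0.9, 4000.0), (1.2, 7000.0)])

# Scalar UDF: a stand-in "model" scoring each row.
con.create_function("predict", 1, lambda carat: carat * 5000.0)

# UDAF: a custom aggregate defined as a step()/finalize() class.
class Mean:
    def __init__(self):
        self.total, self.count = 0.0, 0
    def step(self, value):
        self.total += value
        self.count += 1
    def finalize(self):
        return self.total / self.count if self.count else None

con.create_aggregate("mean_price", 1, Mean)

preds = con.execute(
    "SELECT predict(carat) FROM diamonds WHERE carat < 1"
).fetchall()
avg = con.execute("SELECT mean_price(price) FROM diamonds").fetchone()[0]
```

The difference with xorq is portability: the same UD(x)F definition runs unchanged across backends instead of being tied to one engine's registration API.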

ML pipelines as expressions

Compose end-to-end machine learning pipelines - from data fetching through prediction - into single, executable expressions. xorq handles the execution details, optimization, and caching of intermediate results.

  • Update the pipeline with new data
  • Modify individual steps without rewriting the whole pipeline
  • Cache and reuse expensive computations
  • Execute different parts of the pipeline on different engines
...

# Create composite expression out of other xorq UDFs
expr = transformed_test_data.mutate(
    predict_expr_udf.on_expr(transformed_test_data).name(prediction_key)
)
expr.execute()
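The pipeline-as-expression idea can be sketched in a few lines of plain Python: each step wraps its upstream node, and nothing runs until `execute()` walks the chain (`Expr` here is a hypothetical illustration of deferred composition, not xorq's API):

```python
# Each step is a deferred node holding a function and its upstream node;
# building the pipeline does no work, only execute() does.
class Expr:
    def __init__(self, fn, parent=None):
        self.fn, self.parent = fn, parent

    def then(self, fn):
        return Expr(fn, parent=self)

    def execute(self, data=None):
        upstream = self.parent.execute(data) if self.parent else data
        return self.fn(upstream)

pipeline = (
    Expr(lambda _: [1.0, 2.0, 3.0, 4.0])           # fetch
    .then(lambda xs: [x for x in xs if x > 1.5])   # transform
    .then(lambda xs: [x * 10 for x in xs])         # "predict"
)
result = pipeline.execute()
```

Because each step is just a node, you can swap one `then` without rebuilding the rest, and a real system like xorq can cache or route individual nodes to different engines.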

Simply Powerful Multi-Engine Scripting

Use Case Examples

Free xorq Training

Spend 30 minutes with xorq engineering to get on the fast path to better ML engineering.

Schedule Free Training