By Hussain Sultan | March 27, 2025
Today, we’re excited to announce Xorq, an open source compute format for AI that greatly simplifies building and deploying multi-engine transformations, features, models, ML pipelines, and other AI compute with first-class support for pandas.
We think of Xorq as the missing analog to Apache Iceberg—an open format that makes compute modular and shareable.
Before founding Xorq, we were data scientists building complex ML pipelines at leading tech companies. We were frustrated by the brittleness of bespoke tooling required to fluidly develop and deploy ML pipelines:
We wanted a more standard and ergonomic way to build, cache, and serve pipelines—without locking ourselves into a single engine. And it was with that idea that Dan, Daniel, and I founded Xorq on a mission to help Python developers make a quantum leap in AI innovation.
We’ll tell the story behind the name Xorq on another day, but for now, just know that it’s from out of this world, and we pronounce it “zork.”
Xorq is for Python developers who are building increasingly heterogeneous ML pipelines. It is especially valuable to people who require the ability to:
As such, Xorq is well-suited to ML and AI use cases such as:
Get started easily with:
pip install xorq
Here’s a simple example demonstrating Xorq’s ease of use:
import xorq as xo
import xorq.vendor.ibis.expr.datatypes as dt


# pandas UDF: for each row, check whether the URL appears in the title.
@xo.udf.make_pandas_udf(
    schema=xo.schema({"title": str, "url": str}),
    return_type=dt.bool,
    name="url_in_title",
)
def url_in_title(df):
    return df.apply(
        lambda s: (s.url or "") in (s.title or ""),
        axis=1,
    )


con = xo.connect()
name = "hn-data-small.parquet"
# Build a deferred read of the Parquet file and add the UDF result as a column.
expr = xo.deferred_read_parquet(
    con,
    xo.options.pins.get_path(name),
    name,
).mutate(**{"url_in_title": url_in_title.on_expr})

# Nothing runs until execute() is called.
expr.execute().head()

We really look forward to your feedback on the new release of Xorq. Here are some resources to help you get started:
We’re targeting our V1 release for June. Between now and then, our roadmap includes several key enhancements:
What does it mean to be pandas-style?
pandas-style: users can define functions that receive a pandas DataFrame and return a Python object castable to a pyarrow object, rather than having to receive and return pyarrow objects.
What does sklearn-style mean?
sklearn-style: users can either create deferred pipelines that directly reference scikit-learn classes (which conform to the fit-transform / fit-predict API) or create their own deferred operations by providing the fit and transform/predict methods. See the scikit-learn developer guide.
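The second option, supplying your own fit and transform functions, can be sketched in plain Python. This is an illustration of the idea only: `DeferredStep`, `bind`, and the mean-centering functions are hypothetical names invented here, not Xorq or scikit-learn APIs. The step records the two functions, does no work when it is bound to data, and fits at most once when the result is finally requested.

```python
class DeferredStep:
    """Hypothetical sketch of a deferred fit/transform operation."""

    def __init__(self, fit, transform):
        self.fit = fit
        self.transform = transform
        self.fit_calls = 0      # counts how often fit actually ran
        self._fitted = False
        self._model = None

    def _get_model(self, train):
        # Fit lazily and reuse the fitted state on later calls.
        if not self._fitted:
            self._model = self.fit(train)
            self._fitted = True
            self.fit_calls += 1
        return self._model

    def bind(self, train, data):
        # Return a thunk; nothing is computed until it is called.
        return lambda: self.transform(self._get_model(train), data)


def fit_mean(values):
    # "fit": learn the mean of the training values.
    return sum(values) / len(values)


def center(mean, values):
    # "transform": subtract the learned mean.
    return [v - mean for v in values]


step = DeferredStep(fit=fit_mean, transform=center)
pending = step.bind([1.0, 2.0, 3.0], [2.0, 4.0])
calls_before = step.fit_calls      # still 0: binding did no work

centered = pending()               # fit runs here, then transform
centered_again = pending()         # reuses the fitted mean
print(centered, step.fit_calls)
```

Splitting the operation into a fit function and a transform function is what allows the fitted state to be computed once and reused, which is the same contract scikit-learn estimators follow.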