# DataFrame Backends
Arrow-native output, your DataFrame library.
xbbg is DataFrame-library agnostic. The core engine returns Apache Arrow data; conversion to your preferred library happens just before results are returned.
The backend abstraction is provided by narwhals, which handles the translation layer between Arrow and the supported libraries.
## Supported Backends
### Eager Backends
Eager backends return a fully materialized DataFrame immediately.
| Backend | Output type | Best for |
|---|---|---|
| `pandas` | `pd.DataFrame` | Traditional workflows, ecosystem compatibility |
| `polars` | `pl.DataFrame` | High performance, large datasets |
| `pyarrow` | `pa.Table` | Zero-copy interop, memory efficiency |
| `narwhals` | Narwhals DataFrame | Library-agnostic code |
| `modin` | Modin DataFrame | Pandas API with parallel execution |
| `cudf` | cuDF DataFrame | GPU-accelerated processing (requires NVIDIA GPU) |
### Lazy Backends
Lazy backends defer execution. The query graph is built when you call xbbg functions and evaluated only when you explicitly trigger execution (e.g. `.collect()` for Polars, `.execute()` for DuckDB).
| Backend | Output type | Best for |
|---|---|---|
| `polars_lazy` | `pl.LazyFrame` | Deferred execution, query optimization |
| `narwhals_lazy` | Narwhals LazyFrame | Library-agnostic lazy evaluation |
| `duckdb` | DuckDB relation | SQL analytics, OLAP queries |
| `dask` | Dask DataFrame | Out-of-core and distributed computing |
| `ibis` | Ibis Table | Unified interface to many backends |
| `pyspark` | Spark DataFrame | Big data processing (requires Java) |
| `sqlframe` | SQLFrame DataFrame | SQL-first DataFrame operations |
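The deferred model can be illustrated without any DataFrame library: each chained call records a step, and nothing runs until an explicit collect. A toy sketch of the concept, not xbbg's implementation:

```python
class LazyQuery:
    """Toy lazy pipeline: records steps, runs them only on collect()."""

    def __init__(self, source):
        self._source = source
        self._steps = []  # deferred operations, applied in order

    def filter(self, predicate):
        self._steps.append(lambda rows: [r for r in rows if predicate(r)])
        return self  # chainable; still nothing executed

    def collect(self):
        rows = list(self._source)
        for step in self._steps:
            rows = step(rows)
        return rows


prices = [{"ticker": "AAPL", "px": 185.0}, {"ticker": "MSFT", "px": 410.0}]
query = LazyQuery(prices).filter(lambda r: r["px"] > 200)  # no work done yet
print(query.collect())  # [{'ticker': 'MSFT', 'px': 410.0}]
```

Real lazy backends do the same thing with a query plan that the engine can also optimize before running.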
> **Warning**
>
> Lazy backends support only the LONG, SEMI_LONG, LONG_TYPED, and LONG_WITH_METADATA output formats.
## Selecting a Backend
### Global default
Set the backend once for your session. All subsequent calls use it unless overridden.
```python
import xbbg
from xbbg import Backend

xbbg.set_backend(Backend.POLARS)

# All calls now return pl.DataFrame
from xbbg import blp
df = blp.bdp('AAPL US Equity', 'PX_LAST')
```

You can also pass a string:

```python
xbbg.set_backend('polars')
```

### Per-call override
Pass `backend` as a keyword argument to any data function. This overrides the global default for that call only.
```python
from xbbg import blp

# Overrides the global default for this call
df = blp.bdp('AAPL US Equity', 'PX_LAST', backend='pandas')
```

### Checking Availability
Not all backends are installed in every environment. Use these utilities to inspect what is available before writing code that assumes a specific backend.
```python
from xbbg import get_available_backends, is_backend_available, print_backend_status

# Returns a list of installed backend names
print(get_available_backends())
# ['pandas', 'polars', 'pyarrow', ...]

# Check a specific backend
if is_backend_available('polars'):
    print("Polars is installed")

# Print a detailed status table for all backends
print_backend_status()
```

## Backend Examples
### pandas

```python
from xbbg import blp, Backend

df = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.PANDAS)

# Returns pd.DataFrame
print(type(df))  # <class 'pandas.core.frame.DataFrame'>
```

### polars
```python
from xbbg import blp, Backend

df = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.POLARS)

# Returns pl.DataFrame
print(type(df))  # <class 'polars.dataframe.frame.DataFrame'>
```

### pyarrow
```python
from xbbg import blp, Backend

table = blp.bdh(
    'SPX Index',
    'PX_LAST',
    start_date='2024-01-01',
    end_date='2024-12-31',
    backend=Backend.PYARROW,
)

# Returns pa.Table — no copy from the internal representation
print(type(table))  # <class 'pyarrow.lib.Table'>
```

### duckdb
```python
from xbbg import blp, Backend

relation = blp.bdh(
    'SPX Index',
    'PX_LAST',
    start_date='2024-01-01',
    end_date='2024-12-31',
    backend=Backend.DUCKDB,
)

# Returns a DuckDB relation — not yet executed
result = relation.fetchdf()  # trigger execution, returns pd.DataFrame
```

## Performance Considerations
PyArrow is the zero-copy option. Because xbbg's internal representation is Arrow, returning a `pa.Table` requires no serialization or memory copy. Use this when passing data to other Arrow-compatible systems (DuckDB, Polars, Spark via Arrow Flight, etc.) or when memory pressure matters.
Polars is the best choice for pure computation on large datasets. Its columnar engine and lazy execution model handle datasets that would be slow or impractical in pandas. The `polars_lazy` backend lets you chain additional query steps before triggering evaluation.
pandas remains the widest-compatibility option. Use it when integrating with libraries that only accept `pd.DataFrame`, or when working with existing pandas-based pipelines. It is not required as a dependency — if your code uses the `polars` or `pyarrow` backends exclusively, pandas does not need to be installed.
Lazy backends (DuckDB, Polars lazy, Dask, etc.) are useful when you want to compose queries across multiple xbbg calls before materializing any data, or when the result set is too large to hold in memory.
## Related
- Output Formats — control the shape of returned data (LONG, LONG_TYPED, etc.)
- API Reference — full function documentation with all parameters
