
API Reference

AEFIndex

Manages the GeoParquet index for spatial/temporal queries.

from aef_loader import AEFIndex, DataSource

# Source Cooperative (public, no auth)
index = AEFIndex(source=DataSource.SOURCE_COOP)

# Or GCS (requires project for requester-pays)
index = AEFIndex(source=DataSource.GCS, gcp_project="my-project")

# Download the index (cached after first download)
await index.download()

# Load into memory
gdf = index.load()

# Query for tiles
tiles = await index.query(
    bbox=(-122.5, 37.5, -122.0, 38.0),  # WGS84 coordinates
    years=(2020, 2023),                   # Year range
    limit=10,                             # Max tiles
)

VirtualTiffReader

Opens COGs as virtual Zarr stores organized by UTM zone. You can pass chunks as a parameter here to control chunking from the start.

For example, you almost certainly want all bands together on a single worker, so pass chunks={"band": -1} to keep the band dimension from being split across chunks. Otherwise, costly rechunks/shuffles between workers are required after zones are merged.

from aef_loader import VirtualTiffReader

async with VirtualTiffReader() as reader:
    # Load tiles organized by UTM zone
    tree = await reader.open_tiles_by_zone(tiles)

    # Returns a DataTree with zones as children:
    # ├── 10N/  → Dataset with A00-A63 variables in EPSG:32610
    # ├── 10S/  → Dataset with A00-A63 variables in EPSG:32710
    # ├── 11N/  → Dataset with A00-A63 variables in EPSG:32611
    # ...
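The zone labels and EPSG codes in the tree above follow the standard WGS 84 / UTM numbering: 326xx for northern zones and 327xx for southern ones. As a minimal sketch, the mapping can be written as below. The helper name utm_zone_to_epsg is hypothetical and is not part of aef_loader.

```python
def utm_zone_to_epsg(zone: str) -> int:
    """Map a zone label such as "10N" to its WGS 84 / UTM EPSG code.

    Hypothetical helper for illustration; not part of aef_loader.
    """
    number, hemisphere = int(zone[:-1]), zone[-1].upper()
    if not 1 <= number <= 60 or hemisphere not in ("N", "S"):
        raise ValueError(f"not a UTM zone label: {zone!r}")
    # WGS 84 / UTM codes are 326xx in the northern hemisphere
    # and 327xx in the southern hemisphere.
    base = 32600 if hemisphere == "N" else 32700
    return base + number


for label in ("10N", "10S", "11N"):
    print(label, "-> EPSG:%d" % utm_zone_to_epsg(label))
```

This can be handy for checking that each child Dataset carries the CRS you expect before reprojecting.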

DataSource

Enum for selecting the data backend.

from aef_loader import DataSource

DataSource.SOURCE_COOP  # AWS S3 (free, no auth)
DataSource.GCS          # Google Cloud Storage (requester pays)

Utility Functions

mask_nodata

Mask NoData values (-128) before processing.

from aef_loader import mask_nodata

masked = mask_nodata(data)

dequantize_aef

Dequantize int8 embeddings to float32. NoData values (-128) become NaN. Sets both nodata and _FillValue attrs to NaN on the output; all other existing attrs are preserved.

Formula: ((value / 127.5) ** 2) * sign(value)

from aef_loader import dequantize_aef

float_data = dequantize_aef(data)
# float_data.attrs["nodata"]     -> NaN
# float_data.attrs["_FillValue"] -> NaN
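To make the formula concrete, here is a scalar sketch in plain Python. It is only an illustration of the arithmetic, assuming the formula and the -128 sentinel stated above; the library itself operates on whole arrays, and the name dequantize_value is hypothetical.

```python
import math

NODATA = -128  # AEF int8 NoData sentinel, as stated above
SCALE = 127.5  # divisor from the documented formula


def dequantize_value(value: int) -> float:
    """Apply ((value / 127.5) ** 2) * sign(value) to a single int8 value.

    Hypothetical scalar helper; not part of aef_loader.
    """
    if value == NODATA:
        return math.nan
    # copysign puts the sign of `value` back onto the squared magnitude.
    return math.copysign((value / SCALE) ** 2, value)


for v in (64, -64, 127, 0, NODATA):
    print(v, "->", dequantize_value(v))
```

For an input of 64 this gives roughly 0.252. Squaring compresses small magnitudes toward zero, which is why the result differs from a plain cast.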

int8_to_float32

Cast int8 embeddings to float32 without dequantization. Unlike dequantize_aef, this performs a simple type cast (e.g. 64 → 64.0, not 0.252). NoData values (-128) become NaN. Useful for pipelines where the quantized embeddings are statistically equivalent and the nonlinear dequantization is unnecessary.

from aef_loader import int8_to_float32

float_data = int8_to_float32(data)
# 64 (int8) -> 64.0 (float32), -128 -> NaN

quantize_aef

Quantize float32 embeddings back to int8 for storage. Useful after dequantizing, reprojecting with bilinear interpolation, and re-quantizing. Sets both nodata and _FillValue attrs to -128 on the output; all other existing attrs are preserved.

from aef_loader import quantize_aef

int8_data = quantize_aef(float_data)
# int8_data.attrs["nodata"]     -> -128
# int8_data.attrs["_FillValue"] -> -128
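The exact quantization scheme is not spelled out here, so the following is a sketch under an assumption: that quantize_aef inverts the dequantization formula (signed square root, scale by 127.5, round) and clamps to [-127, 127] so that -128 stays reserved for NoData. The name quantize_value is hypothetical, and the real function works on arrays rather than scalars.

```python
import math

NODATA = -128  # AEF int8 NoData sentinel
SCALE = 127.5  # scale from the dequantization formula


def quantize_value(value: float) -> int:
    """Invert the dequantization formula for a single float (assumed scheme).

    Hypothetical scalar helper; not part of aef_loader.
    """
    if math.isnan(value):
        return NODATA
    # Undo the squaring with a signed square root, then rescale.
    scaled = math.copysign(math.sqrt(abs(value)), value) * SCALE
    # Clamp to [-127, 127] so that -128 remains the NoData sentinel.
    return max(-127, min(127, round(scaled)))


# Round trip: dequantize every valid int8 code, then quantize it again.
for code in range(-127, 128):
    dequantized = math.copysign((code / SCALE) ** 2, code)
    assert quantize_value(dequantized) == code
```

Under this assumption the round trip is lossless for untouched values. Values produced by bilinear interpolation fall between codes and are snapped to the nearest one.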

set_aef_nodata

Stamp nodata and _FillValue attrs on a DataArray or Dataset. Returns a new object (the input is not modified). Defaults to -128 (the AEF int8 sentinel); pass np.nan for dequantized float data.

import numpy as np

from aef_loader import set_aef_nodata

# Raw / quantized embeddings
da = set_aef_nodata(da)              # nodata=-128, _FillValue=-128

# Dequantized float data
da = set_aef_nodata(da, nodata=np.nan)  # nodata=NaN, _FillValue=NaN

reproject_datatree

Reproject all zones in a DataTree to a common CRS.

While not recommended, you can provide dst_nodata to reproject with a different nodata value.

In general, the library tracks changes to the nodata value for you. For example, when you dequantize from int8 to float32, nodata changes from -128 to NaN. Each transformation sets the correct nodata value on its output via set_aef_nodata, so downstream tools like xr_reproject can read it and handle it correctly during reprojection.

from aef_loader.utils import reproject_datatree
from odc.geo.geobox import GeoBox

target = GeoBox.from_bbox(
    bbox=(-122.5, 37.5, -122.0, 38.0),
    crs="EPSG:4326",
    resolution=0.0001,
)
combined = reproject_datatree(tree, target)
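Before reprojecting, it can help to estimate how large the target grid is. The arithmetic below is a rough sketch for the GeoBox in the example above, assuming 64 bands (A00-A63) and a single time step. The exact shape odc-geo produces may differ by a pixel depending on how it snaps the bounding box.

```python
bbox = (-122.5, 37.5, -122.0, 38.0)  # (xmin, ymin, xmax, ymax) in degrees
resolution = 0.0001                  # degrees per pixel
n_bands = 64                         # A00-A63

xmin, ymin, xmax, ymax = bbox
width = round((xmax - xmin) / resolution)
height = round((ymax - ymin) / resolution)

values = width * height * n_bands
int8_gb = values * 1 / 1e9     # int8 stores 1 byte per value
float32_gb = values * 4 / 1e9  # float32 stores 4 bytes per value

print(f"{height} x {width} pixels per band")
print(f"{int8_gb:.1f} GB as int8, {float32_gb:.1f} GB as float32 (one time step)")
```

The fourfold difference between int8 and float32 is one reason to choose chunk sizes deliberately before dequantizing a large area.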

split_bands

Split a multi-band dataset into individual band variables.

from aef_loader import split_bands

split = split_bands(dataset)