Bulk dataset & DuckDB tier

The full normalized corpus — provisions, ML labels, the 42-subject taxonomy, and the rollup cube — as downloadable Parquet, plus DuckDB-over-R2 for ad-hoc analysis beyond the pre-aggregated API.
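As a rough illustration of the ad-hoc analysis this tier is meant to enable, the sketch below builds a DuckDB query over a downloaded Parquet export. It is only a sketch under stated assumptions: the schema is not yet published, so the `subject` column, the local file name, and the helper names `top_subjects_sql` and `run` are all hypothetical.

```python
def top_subjects_sql(parquet_path: str, limit: int = 10) -> str:
    """Build a DuckDB query that counts rows per subject.

    ASSUMPTION: the provision_subjects export has a column named `subject`.
    The real schema has not been published, so treat this name as a placeholder.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Escape single quotes so the path is safe inside a SQL string literal.
    quoted = parquet_path.replace("'", "''")
    return (
        "SELECT subject, count(*) AS n "
        f"FROM read_parquet('{quoted}') "
        "GROUP BY subject ORDER BY n DESC "
        f"LIMIT {int(limit)}"
    )


def run(parquet_path: str, limit: int = 10):
    """Execute the query. Requires the third-party `duckdb` package."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()  # in-memory database
    return con.execute(top_subjects_sql(parquet_path, limit)).fetchall()
```

Calling `run("provision_subjects.parquet")` against a local copy of the export would return `(subject, n)` rows, assuming the column name above matches the published schema.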

What’s coming

◆ Parquet exports: provision_labels, provision_subjects, index_rollups, and the provisions_labeled fact table
◆ DuckDB-over-R2 endpoints (/v1/corpus/search, /v1/corpus/stats, /v1/corpus/aggregate) for full-corpus queries
◆ A documented, stable schema with GEOID / OCD-ID / URN / FRBR identifiers
◆ Versioned snapshots with a changelog, so research is reproducible
◆ Academic access terms; the legal text is public domain
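To make the planned endpoint layout concrete, here is a minimal sketch of a helper that builds request URLs for the three corpus endpoints. Only the three paths come from the list above. The host `api.example.org`, the query parameter names (`q`, `snapshot`), and the function name `corpus_url` are placeholders, since the request format has not been documented yet.

```python
from urllib.parse import urlencode

# ASSUMPTION: placeholder host; the real base URL is not stated on this page.
BASE_URL = "https://api.example.org"

# The three paths named in the list above.
ENDPOINTS = {
    "search": "/v1/corpus/search",
    "stats": "/v1/corpus/stats",
    "aggregate": "/v1/corpus/aggregate",
}


def corpus_url(endpoint: str, **params: str) -> str:
    """Return the full URL for one corpus endpoint.

    Query parameters are hypothetical; they are sorted by name so the
    same inputs always produce the same URL.
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    url = BASE_URL + ENDPOINTS[endpoint]
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url
```

Sorting the parameters keeps URLs deterministic, which fits the reproducibility goal of versioned snapshots: a query pinned to a hypothetical `snapshot` parameter would always serialize the same way.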

The underlying data already exists in the engine — this surface is being built next.