Methodology & data dictionary

A full account of how the corpus is built: harvest sources and coverage definitions, the Atlas Civica Format, the FRBR versioning model, the ML labeling pipeline, and the normative scoring.

What’s coming

◆Coverage tiers: what “lit / pullable / trace / dark” mean, exactly
◆The Atlas Civica Format field-by-field, with stable identifier scheme
◆Versioning: Work / Expression / Manifestation + content-hash diffs
◆ML pipeline: llama-3.3-70b teacher labels → ModernBERT distillation plan
◆Normative dimensions & their limits — directional model estimates, not legal advice
◆How the function classes and normative scores are defined and validated

Get notified Explore what’s live

The underlying data already exists in the engine — this surface is being built next.