Transparency
Methodology & data dictionary
A full account of how the corpus is built: harvest sources and coverage definitions, the Atlas Civica Format, the FRBR versioning model, the ML labeling pipeline, and the normative scoring.
What’s coming
- ◆Coverage tiers: what “lit / pullable / trace / dark” mean, exactly
- ◆The Atlas Civica Format field-by-field, with stable identifier scheme
- ◆Versioning: Work / Expression / Manifestation + content-hash diffs
- ◆ML pipeline: llama-3.3-70b teacher labels → ModernBERT distillation plan
- ◆Normative dimensions & their limits — directional model estimates, not legal advice
- ◆How the function classes and normative scores are defined and validated
The underlying data already exists in the engine — this surface is being built next.