Project Status — 2026-03-30
What's Done
Data Pipeline
- 72,045 paragraphs extracted from ~9,000 10-K filings + 207 8-K filings
- 14 filing generators identified, quality metrics per generator
- 6 surgical patches applied (orphan words + heading stripping)
- Quality tier system: clean (80.7%), headed (10.3%), degraded (6.0%), minor (3.0%)
- Embedded bullet detection (2,163 paragraphs flagged degraded, 0.5x sample weight)
- All data integrity rules formalized (frozen originals, UUID-linked patches)
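A minimal sketch of the frozen-originals rule. It assumes each paragraph record carries its UUID under a hypothetical `id` field and each patch is a dict of replacement fields keyed by that UUID (the real schema may differ). Patched output is written as new records, and the original lines are only read.

```python
import json
from typing import Dict, Iterable, Iterator


def apply_patches(original_lines: Iterable[str],
                  patches: Dict[str, dict]) -> Iterator[str]:
    """Yield patched JSONL lines without modifying the original records.

    Hypothetical schema: every record has a UUID under "id", and `patches`
    maps that UUID to the fields that should be replaced.
    """
    for line in original_lines:
        record = json.loads(line)
        patch = patches.get(record["id"])
        if patch:
            # Build a new dict; the parsed original is left untouched.
            record = {**record, **patch}
        yield json.dumps(record)
```

Writing the result to a separate `*.patched.jsonl` file, rather than rewriting the source, keeps every change traceable to its UUID.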
GenAI Labeling (Stage 1)
- Prompt v2.5 locked after 12+ iterations
- 3-model panel: gemini-flash-lite + mimo-v2-flash + grok-4.1-fast
- 150,009 annotations completed ($115.88, 0 failures)
- Orphan word re-annotation: 1,537 paragraphs re-run ($3.30), merged into stage1.patched.jsonl
- Codebook v3.0 with 3 major rulings
DAPT + TAPT Pre-Training
- DAPT corpus: 14,568 documents, ~1.056B tokens, cleaned (XBRL, URLs, page numbers stripped)
- DAPT training complete: 1 epoch on 500M tokens, ~14.5h on an RTX 3090; eval loss 0.7250, perplexity 1.65
- DAPT checkpoint at checkpoints/dapt/modernbert-large/final/
- TAPT config: 5 epochs, whole-word masking, seq_len=512, batch=32
- Custom WholeWordMaskCollator (upstream transformers collator broken for BPE tokenizers)
- Python 3.14 → 3.13 rollback (dill/datasets pickle incompatibility)
- Procedure documented in docs/DAPT-PROCEDURE.md
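Whole-word masking for a BPE tokenizer can be sketched without the upstream collator. The idea is to group sub-tokens by the word indices that a Hugging Face fast tokenizer returns from `BatchEncoding.word_ids()`, which gives `None` for special tokens. This is an illustrative sketch, not the project's WholeWordMaskCollator. It always substitutes the mask token, omitting the usual 80/10/10 mask/random/keep split, and it handles a single unpadded sequence.

```python
import random
from typing import List, Optional, Tuple

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def whole_word_mask(input_ids: List[int], word_ids: List[Optional[int]],
                    mask_token_id: int, mlm_probability: float = 0.15,
                    rng: Optional[random.Random] = None
                    ) -> Tuple[List[int], List[int]]:
    """Mask every sub-token of a random subset of whole words.

    `word_ids[i]` is the index of the word that token i belongs to, or None
    for special tokens, which are never masked.
    """
    rng = rng or random.Random()
    words = sorted({w for w in word_ids if w is not None})
    n_to_mask = max(1, round(len(words) * mlm_probability)) if words else 0
    chosen = set(rng.sample(words, n_to_mask))
    masked, labels = [], []
    for token, word in zip(input_ids, word_ids):
        if word in chosen:
            masked.append(mask_token_id)   # hide all sub-tokens of the word
            labels.append(token)           # and predict the original ids
        else:
            masked.append(token)
            labels.append(IGNORE_INDEX)    # no loss on unmasked positions
    return masked, labels
```

With a fast tokenizer, `enc = tokenizer(text)` followed by `enc.word_ids()` supplies the second argument.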
Documentation
- docs/DATA-QUALITY-AUDIT.md: full audit with all patches and quality tiers
- docs/EDGAR-FILING-GENERATORS.md: 14 generators with signatures and quality profiles
- docs/DAPT-PROCEDURE.md: pre-flight checklist, commands, monitoring guide
- docs/NARRATIVE.md: 11 phases documented through TAPT launch
What's In Progress
TAPT Training — Running
Training on the 72K Item 1C paragraphs, starting from the DAPT checkpoint: 5 epochs, whole-word masking, seq_len=512, batch=32. Early loss: 1.46 → 1.40 (first 1% of training). Expected runtime ~1.6h total on the RTX 3090; expected final loss ~1.0-1.2.
```bash
bun run py:train dapt --config configs/tapt/modernbert.yaml
```
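As a rough sanity check on the schedule, the optimizer step count follows from the TAPT config. This sketch assumes one paragraph per 512-token sequence (no packing), no gradient accumulation, a kept final partial batch, and that all 72,045 paragraphs are used. If the config packs or drops examples, the numbers change.

```python
import math


def total_optimizer_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """Steps for `epochs` full passes, counting the last partial batch."""
    return math.ceil(n_examples / batch_size) * epochs


# Assumed: one paragraph per sequence, no packing, no gradient accumulation.
steps = total_optimizer_steps(n_examples=72_045, batch_size=32, epochs=5)
first_percent = math.ceil(0.01 * steps)        # the "first 1%" window
seconds_per_step = 1.6 * 3600 / steps          # implied by the ~1.6h estimate
print(steps, first_percent, round(seconds_per_step, 2))
```

Under these assumptions the run is 11,260 optimizer steps. The "first 1%" then covers about 113 steps, and ~1.6h works out to roughly 0.5 s per step.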
Human Labeling (139/1,200)
- 3 of 6 annotators started: 68 + 50 + 21 paragraphs completed
- Deployed via labelapp with quiz gating + warmup
- Each annotator needs 600 paragraphs (BIBD assignment)
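The 600-per-annotator figure is consistent with a specific layout: each of the 1,200 paragraphs is labeled by 3 of the 6 annotators, spread evenly over all C(6,3) = 20 annotator triples. Under that layout each triple gets 60 paragraphs, each annotator sits in 10 triples (600 paragraphs), and each pair of annotators shares 240 paragraphs. The sketch below builds such an assignment. Whether the labelapp uses exactly this design is an assumption, and the annotator names are placeholders.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def bibd_assignment(paragraph_ids: Sequence[Hashable],
                    annotators: List[str],
                    per_paragraph: int = 3) -> Dict[Hashable, Tuple[str, ...]]:
    """Round-robin paragraphs over every k-subset of annotators.

    Using all C(v, k) subsets equally often gives every annotator the same
    load and every pair the same overlap (a balanced incomplete block
    design). Shuffle `paragraph_ids` first so no triple gets a biased slice.
    """
    blocks = list(combinations(annotators, per_paragraph))
    if len(paragraph_ids) % len(blocks):
        raise ValueError("paragraph count must be a multiple of the block count")
    return {pid: blocks[i % len(blocks)] for i, pid in enumerate(paragraph_ids)}
```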
What's Next (in dependency order)
1. Fine-tuning pipeline (no blockers — can build now)
Build the dual-head classifier (7-class category + 4-class specificity) with:
- Shared ModernBERT backbone + 2 linear classification heads
- Sample weighting from quality tiers (1.0 clean/headed/minor, 0.5 degraded)
- Confidence-stratified label assembly (unanimous → majority → judge)
- Train/val/test split with stratification
- Ablation configs: base vs +DAPT vs +DAPT+TAPT
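A minimal PyTorch sketch of the shared-backbone, two-head layout with quality-tier sample weights (1.0 for clean/headed/minor, 0.5 for degraded). Several choices here are assumptions. The backbone can be any module that returns `.last_hidden_state`, as a Hugging Face `AutoModel` for ModernBERT does. Pooling is masked mean (CLS pooling is an equally valid choice). The two cross-entropy losses are summed with equal weight. The class and function names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualHeadClassifier(nn.Module):
    """Shared encoder with a 7-way category head and a 4-way specificity head."""

    def __init__(self, backbone: nn.Module, hidden_size: int,
                 n_categories: int = 7, n_specificity: int = 4):
        super().__init__()
        self.backbone = backbone
        self.category_head = nn.Linear(hidden_size, n_categories)
        self.specificity_head = nn.Linear(hidden_size, n_specificity)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Masked mean pooling: average only over non-padding positions.
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.category_head(pooled), self.specificity_head(pooled)


def weighted_dual_loss(cat_logits, spec_logits, cat_labels, spec_labels,
                       sample_weights):
    """Weighted mean of per-example (category CE + specificity CE)."""
    cat_loss = F.cross_entropy(cat_logits, cat_labels, reduction="none")
    spec_loss = F.cross_entropy(spec_logits, spec_labels, reduction="none")
    per_example = (cat_loss + spec_loss) * sample_weights
    return per_example.sum() / sample_weights.sum()
```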
2. Judge prompt v3.0 update (no blockers; can do now)
Update buildJudgePrompt() with codebook v3.0 rulings:
- Materiality disclaimers → Strategy Integration
- SPACs → None/Other
- Person-vs-function test for Management↔RMP
Then re-bench against the gold labels.
3. Training data assembly (blocked on judge + human labels)
Combine all annotation sources into final training dataset:
- Unanimous Stage 1 labels (35,204 paragraphs, ~97% accuracy)
- Calibrated majority labels (~9-12K, ~85-90%)
- Judge high-confidence labels (~2-3K, ~84%)
- Judge low-confidence → downweight or exclude
- Quality tier sample weights applied
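The assembly rules above can be sketched per paragraph and per label dimension (category or specificity). The sketch makes three assumptions. The caller passes a judge label only for paragraphs that were sent to the judge (unresolved or flagged majority cases). Judge low-confidence labels are excluded unless a down-weight factor is given. The record and field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence

# Quality-tier sample weights from the fine-tuning plan.
TIER_WEIGHTS = {"clean": 1.0, "headed": 1.0, "minor": 1.0, "degraded": 0.5}


@dataclass(frozen=True)
class TrainingLabel:
    label: str
    source: str   # "unanimous", "judge-high", "judge-low" or "majority"
    weight: float


def assemble_label(panel: Sequence[str], tier: str,
                   judge_label: Optional[str] = None,
                   judge_confident: bool = False,
                   low_conf_weight: float = 0.0) -> Optional[TrainingLabel]:
    """Return the training label for one paragraph, or None to exclude it."""
    tier_weight = TIER_WEIGHTS[tier]
    top, votes = Counter(panel).most_common(1)[0]
    if votes == len(panel):
        return TrainingLabel(top, "unanimous", tier_weight)
    if judge_label is not None:
        # Judged cases: unresolved splits and flagged majorities.
        if judge_confident:
            return TrainingLabel(judge_label, "judge-high", tier_weight)
        if low_conf_weight > 0.0:
            return TrainingLabel(judge_label, "judge-low",
                                 tier_weight * low_conf_weight)
        return None  # low-confidence judge label: excluded
    if 2 * votes > len(panel):
        return TrainingLabel(top, "majority", tier_weight)
    return None  # no majority and no judge label yet
```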
4. Judge production run (blocked on human gold labels)
Run the judge on the ~409 unresolved + flagged majority cases, then validate against the expanded gold set from human labels.
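For the validation step, raw agreement with the gold set can be reported alongside Cohen's kappa. Kappa corrects for the agreement expected by chance from each side's label frequencies. A stdlib sketch; whether the project reports kappa at all is an assumption.

```python
from collections import Counter
from typing import Sequence, Tuple


def accuracy_and_kappa(pred: Sequence[str], gold: Sequence[str]) -> Tuple[float, float]:
    """Observed agreement and Cohen's kappa between two label sequences."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label lists")
    n = len(gold)
    observed = sum(p == g for p, g in zip(pred, gold)) / n
    pred_counts, gold_counts = Counter(pred), Counter(gold)
    # Chance agreement: product of each side's marginal frequency per label.
    expected = sum(pred_counts[k] * gold_counts[k] for k in pred_counts) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # undefined: both sides use one label
    return observed, (observed - expected) / (1.0 - expected)
```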
5. Fine-tuning + ablations (blocked on steps 1-3)
7 experiments: the 6 combinations of {base, +DAPT, +DAPT+TAPT} × {with/without SCL}, plus one run of the best config.
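This sketch assumes SCL means a supervised contrastive loss added to the classification loss; the status notes do not spell it out. One common batch-level form pulls together pooled embeddings that share a label and pushes apart the rest. The sketch uses PyTorch. The temperature default, and the choice of which head's labels define positives, are both assumptions.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(features: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over one batch.

    features: (batch, dim) pooled embeddings; labels: (batch,) class ids.
    For each anchor, averages -log softmax similarity over its same-label
    positives; anchors without a positive in the batch are skipped.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.T / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    # Exclude each anchor's similarity to itself from the softmax denominator.
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    if not valid.any():
        return features.sum() * 0.0  # no positive pairs: zero, keeps the graph
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return (-pos_log_prob[valid] / pos_counts[valid]).mean()
```

In training, a term like this would typically be added to the cross-entropy loss with a tunable weight (for example `loss = ce + lam * scl`, where `lam` is a hypothetical knob).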
6. Evaluation + paper (blocked on everything above)
Full GenAI benchmark (9 models) on the 1,200-paragraph holdout, comparison tables, and write-up.
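For the comparison tables, macro-averaged F1 weighs each class equally, which keeps small classes from being swamped by large ones. A stdlib sketch, assuming macro-F1 is among the reported metrics. Only classes that appear in the predictions or the gold labels are scored.

```python
from typing import Sequence


def macro_f1(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over classes seen in pred or gold."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label lists")
    scores = []
    for cls in sorted(set(pred) | set(gold)):
        tp = sum(p == cls and g == cls for p, g in zip(pred, gold))
        fp = sum(p == cls and g != cls for p, g in zip(pred, gold))
        fn = sum(p != cls and g == cls for p, g in zip(pred, gold))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```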
Parallel Tracks
```
Track A (GPU):   DAPT ✓ → TAPT (running) → Fine-tuning → Eval
                                               ↑
Track B (API):   Judge v3 → Judge run ─────────┤
                                               ↑
Track C (Human): Labeling (139/1200) → Gold set validation
                                               ↑
Track D (Code):  Fine-tune pipeline build ─────┘
```
TAPT finishes in ~1.5h. Track D (fine-tune pipeline) can proceed now. Track B can start the prompt update, but the production run waits for Track C. Everything converges at fine-tuning.
Key File Locations
| What | Where |
|---|---|
| Patched paragraphs | data/paragraphs/training.patched.jsonl (49,795) |
| Patched annotations | data/annotations/stage1.patched.jsonl (150,009) |
| Quality scores | data/paragraphs/quality/quality-scores.jsonl (72,045) |
| DAPT corpus | data/dapt-corpus/shard-*.jsonl (14,756 docs) |
| DAPT config | python/configs/dapt/modernbert.yaml |
| TAPT config | python/configs/tapt/modernbert.yaml |
| DAPT checkpoint | checkpoints/dapt/modernbert-large/final/ |
| Training CLI | python/main.py dapt --config ... |