docs restructuring

2026-04-05 21:00:40 -04:00 · 2026-04-05 21:00:40 -04:00 · 745172adb8
commit 745172adb8
parent 3e010a6d0c
18 changed files with 21 additions and 3 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -15,9 +15,12 @@ Bun workspace monorepo. Three packages:
 | Codebook ethos (reasoning behind every codebook decision) | `docs/CODEBOOK-ETHOS.md` |
 | Project narrative (decisions, roadblocks, lessons) | `docs/NARRATIVE.md` |
 | Project status & todo list | `docs/STATUS.md` |
-| v1 codebook (preserved) | `docs/LABELING-CODEBOOK-v1.md` |
-| v1 narrative (preserved) | `docs/NARRATIVE-v1.md` |
-| Implementation plan for labelapp | `docs/labelapp-plan.md` |
+| Specificity improvement plan (pending threshold tuning) | `docs/SPECIFICITY-IMPROVEMENT-PLAN.md` |
+| Training docs (DAPT procedure, data quality audit, strategy notes) | `docs/training/` |
+| Data pipeline reference (tech guide, HTML cleaning, filing generators) | `docs/data-pipeline/` |
+| v1 archive (codebook, narrative, iteration logs, analyses) | `docs/archive/v1/` |
+| Planning archive (project overview, implementation plan, labelapp plan) | `docs/archive/planning/` |
+| Professor-provided reference materials | `docs/reference/` |
 | Labelapp-specific agent guide | `labelapp/AGENTS.md` |
 | Docker compose (Postgres) | `docker-compose.yaml` (root) |
 | DB credentials | `sec_cybert` / `sec_cybert` / `sec_cybert` on localhost:5432 |
--- a/docs/archive/planning/PROJECT-OVERVIEW.md
+++ b/docs/archive/planning/PROJECT-OVERVIEW.md
--- a/docs/archive/planning/implementation-plan.md
+++ b/docs/archive/planning/implementation-plan.md
--- a/docs/archive/planning/labelapp-plan.md
+++ b/docs/archive/planning/labelapp-plan.md
--- a/docs/archive/planning/signoff-deliverable.md
+++ b/docs/archive/planning/signoff-deliverable.md
--- a/docs/archive/v1/CODEBOOK-RATIONALE.md
+++ b/docs/archive/v1/CODEBOOK-RATIONALE.md
--- a/docs/archive/v1/F1-STRATEGY.md
+++ b/docs/archive/v1/F1-STRATEGY.md
--- a/docs/archive/v1/LABELING-CODEBOOK-v1.md
+++ b/docs/archive/v1/LABELING-CODEBOOK-v1.md
--- a/docs/archive/v1/NARRATIVE-v1.md
+++ b/docs/archive/v1/NARRATIVE-v1.md
--- a/docs/archive/v1/POST-LABELING-PLAN.md
+++ b/docs/archive/v1/POST-LABELING-PLAN.md
--- a/docs/archive/v1/T5-ANALYSIS.md
+++ b/docs/archive/v1/T5-ANALYSIS.md
--- a/docs/archive/v1/V35-ITERATION-LOG.md
+++ b/docs/archive/v1/V35-ITERATION-LOG.md
--- a/docs/data-pipeline/EDGAR-FILING-GENERATORS.md
+++ b/docs/data-pipeline/EDGAR-FILING-GENERATORS.md
--- a/docs/data-pipeline/SEC-HTML-CLEANING.md
+++ b/docs/data-pipeline/SEC-HTML-CLEANING.md
--- a/docs/data-pipeline/TECHNICAL-GUIDE.md
+++ b/docs/data-pipeline/TECHNICAL-GUIDE.md
--- a/docs/training/DAPT-PROCEDURE.md
+++ b/docs/training/DAPT-PROCEDURE.md
--- a/docs/training/DATA-QUALITY-AUDIT.md
+++ b/docs/training/DATA-QUALITY-AUDIT.md
--- a/docs/training/STRATEGY-NOTES.md
+++ b/docs/training/STRATEGY-NOTES.md
@@ -194,3 +194,18 @@ Option 2 is defensible: "Human inter-annotator agreement on specificity (alpha=0
 The F1 threshold is achievable. The project is strong. The specificity distribution is the only structural problem, and it's fixable by aligning the codebook with the professor's construct (which we drifted from by being too precise). Everything else — the T5 ambiguity, the representative sample, the small classes — is manageable.

 The worst thing to do right now is panic and pivot. The second worst thing is to agonize and delay. Pick a path, execute, get real numbers.
+
+---
+
+## Decision Made: Option A (executed)
+
+**Chosen:** Option A — broaden Level 2 + loosen Level 4 to require 1+ QV facts, full v2 codebook reboot.
+
+**What happened:**
+- v2 codebook approved 2026-04-04 (5/6 group approval)
+- Stage 1 re-run: Grok 4.1 Fast ×3 self-consistency panel, $135.51
+- Specificity distribution shifted to L1=41.1%, L2=22.7%, L3=24.9%, L4=11.4% — healthy
+- Independent threshold heads replaced CORAL, solving the spec F1 bottleneck (0.517 → 0.945)
+- Final model: Cat F1=0.943, Spec F1=0.945, both well above 0.80 target
+
+See `docs/STATUS.md` for full pipeline status and `docs/SPECIFICITY-IMPROVEMENT-PLAN.md` for the architecture iteration.