From 745172adb881f43f30bad87b107f395d5348135a Mon Sep 17 00:00:00 2001
From: Joey Eamigh <55670930+JoeyEamigh@users.noreply.github.com>
Date: Sun, 5 Apr 2026 21:00:40 -0400
Subject: [PATCH] docs restructuring

---
 CLAUDE.md                                         |  9 ++++++---
 docs/{ => archive/planning}/PROJECT-OVERVIEW.md   |  0
 .../{ => archive/planning}/implementation-plan.md |  0
 docs/{ => archive/planning}/labelapp-plan.md      |  0
 .../{ => archive/planning}/signoff-deliverable.md |  0
 docs/{ => archive/v1}/CODEBOOK-RATIONALE.md       |  0
 docs/{ => archive/v1}/F1-STRATEGY.md              |  0
 docs/{ => archive/v1}/LABELING-CODEBOOK-v1.md     |  0
 docs/{ => archive/v1}/NARRATIVE-v1.md             |  0
 docs/{ => archive/v1}/POST-LABELING-PLAN.md       |  0
 docs/{ => archive/v1}/T5-ANALYSIS.md              |  0
 docs/{ => archive/v1}/V35-ITERATION-LOG.md        |  0
 .../EDGAR-FILING-GENERATORS.md                    |  0
 docs/{ => data-pipeline}/SEC-HTML-CLEANING.md     |  0
 docs/{ => data-pipeline}/TECHNICAL-GUIDE.md       |  0
 docs/{ => training}/DAPT-PROCEDURE.md             |  0
 docs/{ => training}/DATA-QUALITY-AUDIT.md         |  0
 docs/{ => training}/STRATEGY-NOTES.md             | 15 +++++++++++++++
 18 files changed, 21 insertions(+), 3 deletions(-)
 rename docs/{ => archive/planning}/PROJECT-OVERVIEW.md (100%)
 rename docs/{ => archive/planning}/implementation-plan.md (100%)
 rename docs/{ => archive/planning}/labelapp-plan.md (100%)
 rename docs/{ => archive/planning}/signoff-deliverable.md (100%)
 rename docs/{ => archive/v1}/CODEBOOK-RATIONALE.md (100%)
 rename docs/{ => archive/v1}/F1-STRATEGY.md (100%)
 rename docs/{ => archive/v1}/LABELING-CODEBOOK-v1.md (100%)
 rename docs/{ => archive/v1}/NARRATIVE-v1.md (100%)
 rename docs/{ => archive/v1}/POST-LABELING-PLAN.md (100%)
 rename docs/{ => archive/v1}/T5-ANALYSIS.md (100%)
 rename docs/{ => archive/v1}/V35-ITERATION-LOG.md (100%)
 rename docs/{ => data-pipeline}/EDGAR-FILING-GENERATORS.md (100%)
 rename docs/{ => data-pipeline}/SEC-HTML-CLEANING.md (100%)
 rename docs/{ => data-pipeline}/TECHNICAL-GUIDE.md (100%)
 rename docs/{ => training}/DAPT-PROCEDURE.md (100%)
 rename docs/{ => training}/DATA-QUALITY-AUDIT.md (100%)
 rename docs/{ => training}/STRATEGY-NOTES.md (93%)

diff --git a/CLAUDE.md b/CLAUDE.md
index fc05bc8..b9a3db4 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -15,9 +15,12 @@ Bun workspace monorepo. Three packages:
 | Codebook ethos (reasoning behind every codebook decision) | `docs/CODEBOOK-ETHOS.md` |
 | Project narrative (decisions, roadblocks, lessons) | `docs/NARRATIVE.md` |
 | Project status & todo list | `docs/STATUS.md` |
-| v1 codebook (preserved) | `docs/LABELING-CODEBOOK-v1.md` |
-| v1 narrative (preserved) | `docs/NARRATIVE-v1.md` |
-| Implementation plan for labelapp | `docs/labelapp-plan.md` |
+| Specificity improvement plan (pending threshold tuning) | `docs/SPECIFICITY-IMPROVEMENT-PLAN.md` |
+| Training docs (DAPT procedure, data quality audit, strategy notes) | `docs/training/` |
+| Data pipeline reference (tech guide, HTML cleaning, filing generators) | `docs/data-pipeline/` |
+| v1 archive (codebook, narrative, iteration logs, analyses) | `docs/archive/v1/` |
+| Planning archive (project overview, implementation plan, labelapp plan) | `docs/archive/planning/` |
+| Professor-provided reference materials | `docs/reference/` |
 | Labelapp-specific agent guide | `labelapp/AGENTS.md` |
 | Docker compose (Postgres) | `docker-compose.yaml` (root) |
 | DB credentials | `sec_cybert` / `sec_cybert` / `sec_cybert` on localhost:5432 |
diff --git a/docs/PROJECT-OVERVIEW.md b/docs/archive/planning/PROJECT-OVERVIEW.md
similarity index 100%
rename from docs/PROJECT-OVERVIEW.md
rename to docs/archive/planning/PROJECT-OVERVIEW.md
diff --git a/docs/implementation-plan.md b/docs/archive/planning/implementation-plan.md
similarity index 100%
rename from docs/implementation-plan.md
rename to docs/archive/planning/implementation-plan.md
diff --git a/docs/labelapp-plan.md b/docs/archive/planning/labelapp-plan.md
similarity index 100%
rename from docs/labelapp-plan.md
rename to docs/archive/planning/labelapp-plan.md
diff --git a/docs/signoff-deliverable.md b/docs/archive/planning/signoff-deliverable.md
similarity index 100%
rename from docs/signoff-deliverable.md
rename to docs/archive/planning/signoff-deliverable.md
diff --git a/docs/CODEBOOK-RATIONALE.md b/docs/archive/v1/CODEBOOK-RATIONALE.md
similarity index 100%
rename from docs/CODEBOOK-RATIONALE.md
rename to docs/archive/v1/CODEBOOK-RATIONALE.md
diff --git a/docs/F1-STRATEGY.md b/docs/archive/v1/F1-STRATEGY.md
similarity index 100%
rename from docs/F1-STRATEGY.md
rename to docs/archive/v1/F1-STRATEGY.md
diff --git a/docs/LABELING-CODEBOOK-v1.md b/docs/archive/v1/LABELING-CODEBOOK-v1.md
similarity index 100%
rename from docs/LABELING-CODEBOOK-v1.md
rename to docs/archive/v1/LABELING-CODEBOOK-v1.md
diff --git a/docs/NARRATIVE-v1.md b/docs/archive/v1/NARRATIVE-v1.md
similarity index 100%
rename from docs/NARRATIVE-v1.md
rename to docs/archive/v1/NARRATIVE-v1.md
diff --git a/docs/POST-LABELING-PLAN.md b/docs/archive/v1/POST-LABELING-PLAN.md
similarity index 100%
rename from docs/POST-LABELING-PLAN.md
rename to docs/archive/v1/POST-LABELING-PLAN.md
diff --git a/docs/T5-ANALYSIS.md b/docs/archive/v1/T5-ANALYSIS.md
similarity index 100%
rename from docs/T5-ANALYSIS.md
rename to docs/archive/v1/T5-ANALYSIS.md
diff --git a/docs/V35-ITERATION-LOG.md b/docs/archive/v1/V35-ITERATION-LOG.md
similarity index 100%
rename from docs/V35-ITERATION-LOG.md
rename to docs/archive/v1/V35-ITERATION-LOG.md
diff --git a/docs/EDGAR-FILING-GENERATORS.md b/docs/data-pipeline/EDGAR-FILING-GENERATORS.md
similarity index 100%
rename from docs/EDGAR-FILING-GENERATORS.md
rename to docs/data-pipeline/EDGAR-FILING-GENERATORS.md
diff --git a/docs/SEC-HTML-CLEANING.md b/docs/data-pipeline/SEC-HTML-CLEANING.md
similarity index 100%
rename from docs/SEC-HTML-CLEANING.md
rename to docs/data-pipeline/SEC-HTML-CLEANING.md
diff --git a/docs/TECHNICAL-GUIDE.md b/docs/data-pipeline/TECHNICAL-GUIDE.md
similarity index 100%
rename from docs/TECHNICAL-GUIDE.md
rename to docs/data-pipeline/TECHNICAL-GUIDE.md
diff --git a/docs/DAPT-PROCEDURE.md b/docs/training/DAPT-PROCEDURE.md
similarity index 100%
rename from docs/DAPT-PROCEDURE.md
rename to docs/training/DAPT-PROCEDURE.md
diff --git a/docs/DATA-QUALITY-AUDIT.md b/docs/training/DATA-QUALITY-AUDIT.md
similarity index 100%
rename from docs/DATA-QUALITY-AUDIT.md
rename to docs/training/DATA-QUALITY-AUDIT.md
diff --git a/docs/STRATEGY-NOTES.md b/docs/training/STRATEGY-NOTES.md
similarity index 93%
rename from docs/STRATEGY-NOTES.md
rename to docs/training/STRATEGY-NOTES.md
index 76524ce..29058c9 100644
--- a/docs/STRATEGY-NOTES.md
+++ b/docs/training/STRATEGY-NOTES.md
@@ -194,3 +194,18 @@ Option 2 is defensible: "Human inter-annotator agreement on specificity (alpha=0
 The F1 threshold is achievable. The project is strong. The specificity distribution is the only structural problem, and it's fixable by aligning the codebook with the professor's construct (which we drifted from by being too precise). Everything else — the T5 ambiguity, the representative sample, the small classes — is manageable.
 
 The worst thing to do right now is panic and pivot. The second worst thing is to agonize and delay. Pick a path, execute, get real numbers.
+
+---
+
+## Decision Made: Option A (executed)
+
+**Chosen:** Option A — broaden Level 2 + loosen Level 4 to 1+ QV fact, full v2 codebook reboot.
+
+**What happened:**
+- v2 codebook approved 2026-04-04 (5/6 group approval)
+- Stage 1 re-run: Grok 4.1 Fast ×3 self-consistency panel, $135.51
+- Specificity distribution shifted to L1=41.1%, L2=22.7%, L3=24.9%, L4=11.4% — healthy
+- Independent threshold heads replaced CORAL, solving the spec F1 bottleneck (0.517 → 0.945)
+- Final model: Cat F1=0.943, Spec F1=0.945, both well above 0.80 target
+
+See `docs/STATUS.md` for full pipeline status and `docs/SPECIFICITY-IMPROVEMENT-PLAN.md` for the architecture iteration.