updating narrative at checkpoint resume
parent 669632af7b
commit 1dce1ccb73
```diff
@@ -743,12 +743,12 @@ Full procedure, optimization journey, and cloud cost analysis in `docs/DAPT-PROC
 
 ### Early Training Results
 
-First eval at step 54 (~3% through):
-
-- **Loss: 0.80** — the model already knows English, so loss starts low. For comparison, a randomly initialized model would start at ~10.8. The loss reflects the model's ability to predict masked SEC filing tokens from context.
-- **grad_norm: 0.066** — very small, indicating gentle weight updates. Healthy sign.
-- **learning_rate: 2.66e-5** — still in warmup phase (first 93 steps, 5% of training).
-
-Expected trajectory: loss drifts from ~0.80 to ~0.55-0.65 over the run. This is not the dramatic loss curve of fine-tuning — DAPT is nudging a capable language model toward SEC-specific vocabulary and co-occurrence patterns, not teaching it a new task from scratch.
+| Step | Loss   | grad_norm | LR      | Epoch | Note                             |
+|------|--------|-----------|---------|-------|----------------------------------|
+| 54   | 0.7991 | 0.066     | 2.66e-5 | 0.03  | Still in warmup (first 93 steps) |
+| 1280 | 0.7233 | 0.068     | 1.57e-5 | 0.70  | 70% through, steady decline      |
+
+The loss dropped 0.076 over ~1,200 steps — a gentle, steady downward drift. For comparison, a randomly initialized model would start at ~10.8 (ln(50280 vocab size)). Starting at 0.80 reflects that ModernBERT already knows English; the model is learning SEC-specific token co-occurrence patterns, not language fundamentals. grad_norm remained stable at ~0.07 throughout, indicating healthy, non-volatile weight updates.
 
 ### TAPT Planning
```
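The ~10.8 random-init figure quoted in the hunk above follows from the cross-entropy of a uniform guess over the vocabulary, which is ln(vocab_size). A quick check:

```python
import math

# Random-init MLM loss ~ cross-entropy of a uniform guess over the vocab:
# -ln(1/vocab_size) = ln(vocab_size). Vocab size of 50,280 is taken from
# the narrative above.
vocab_size = 50280
random_init_loss = math.log(vocab_size)
print(f"{random_init_loss:.2f}")  # -> 10.83
```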
```diff
@@ -154,7 +154,11 @@ def train(config: DAPTConfig) -> None:
     )
 
-    # Train (with optional checkpoint resume)
-    trainer.train(resume_from_checkpoint=config.training.resume_from_checkpoint)
+    # Auto-detect checkpoint for resume (True = find latest in output_dir)
+    resume = config.training.resume_from_checkpoint
+    if resume is None and any((output_dir).glob("checkpoint-*")):
+        resume = True
+
+    trainer.train(resume_from_checkpoint=resume)
 
     # Save final model + tokenizer
     final_dir = output_dir / "final"
```
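The auto-detect logic above can be exercised in isolation. A minimal sketch — the `resolve_resume` helper name is mine, not in the repo — assuming `resume_from_checkpoint` is `None` when the config leaves it unset:

```python
from pathlib import Path
import tempfile

def resolve_resume(output_dir: Path, configured):
    """Mirror of the diff's logic: when resume_from_checkpoint is unset
    (None), resume only if at least one checkpoint-* directory already
    exists in output_dir; an explicit True/False/path is passed through."""
    resume = configured
    if resume is None and any(output_dir.glob("checkpoint-*")):
        resume = True
    return resume

# Usage sketch with a throwaway directory (hypothetical checkpoint name):
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    print(resolve_resume(out, None))   # None  -> fresh run, no checkpoints yet
    (out / "checkpoint-54").mkdir()
    print(resolve_resume(out, None))   # True  -> resume from latest checkpoint
    print(resolve_resume(out, False))  # False -> explicit config wins
```

Passing `True` to `transformers.Trainer.train(resume_from_checkpoint=...)` makes the Trainer look up the most recent `checkpoint-*` directory in `output_dir` itself, which is why the auto-detect only needs to decide *whether* a checkpoint exists, not *which* one.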