From 1dce1ccb73e274f33e4b65c913e0e13c86603137 Mon Sep 17 00:00:00 2001
From: Joey Eamigh <55670930+JoeyEamigh@users.noreply.github.com>
Date: Mon, 30 Mar 2026 15:20:59 -0400
Subject: [PATCH] updating narrative at checkpoint resume

---
 docs/NARRATIVE.md        | 10 +++++-----
 python/src/dapt/train.py |  6 +++++-
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/NARRATIVE.md b/docs/NARRATIVE.md
index 0c74a40..49dc365 100644
--- a/docs/NARRATIVE.md
+++ b/docs/NARRATIVE.md
@@ -743,12 +743,12 @@ Full procedure, optimization journey, and cloud cost analysis in `docs/DAPT-PROC
 
 ### Early Training Results
 
-First eval at step 54 (~3% through):
-- **Loss: 0.80** — the model already knows English, so loss starts low. For comparison, a randomly initialized model would start at ~10.8. The loss reflects the model's ability to predict masked SEC filing tokens from context.
-- **grad_norm: 0.066** — very small, indicating gentle weight updates. Healthy sign.
-- **learning_rate: 2.66e-5** — still in warmup phase (first 93 steps, 5% of training).
+| Step | Loss | grad_norm | LR | Epoch | Note |
+|------|------|-----------|-----|-------|------|
+| 54 | 0.7991 | 0.066 | 2.66e-5 | 0.03 | Still in warmup (first 93 steps) |
+| 1280 | 0.7233 | 0.068 | 1.57e-5 | 0.70 | 70% through, steady decline |
 
-Expected trajectory: loss drifts from ~0.80 to ~0.55-0.65 over the run. This is not the dramatic loss curve of fine-tuning — DAPT is nudging a capable language model toward SEC-specific vocabulary and co-occurrence patterns, not teaching it a new task from scratch.
+The loss dropped 0.076 over ~1,200 steps — a gentle, steady downward drift. For comparison, a randomly initialized model would start at ~10.8 (ln(50280 vocab size)). Starting at 0.80 reflects that ModernBERT already knows English; the model is learning SEC-specific token co-occurrence patterns, not language fundamentals. grad_norm remained stable at ~0.07 throughout, indicating healthy, non-volatile weight updates.
 
 ### TAPT Planning
 
diff --git a/python/src/dapt/train.py b/python/src/dapt/train.py
index 92d7476..10d774e 100644
--- a/python/src/dapt/train.py
+++ b/python/src/dapt/train.py
@@ -154,7 +154,11 @@ def train(config: DAPTConfig) -> None:
     )
 
     # Train (with optional checkpoint resume)
-    trainer.train(resume_from_checkpoint=config.training.resume_from_checkpoint)
+    # Auto-detect checkpoint for resume (True = find latest in output_dir)
+    resume = config.training.resume_from_checkpoint
+    if resume is None and any((output_dir).glob("checkpoint-*")):
+        resume = True
+    trainer.train(resume_from_checkpoint=resume)
 
     # Save final model + tokenizer
     final_dir = output_dir / "final"