From 1dce1ccb73e274f33e4b65c913e0e13c86603137 Mon Sep 17 00:00:00 2001
From: Joey Eamigh <55670930+JoeyEamigh@users.noreply.github.com>
Date: Mon, 30 Mar 2026 15:20:59 -0400
Subject: [PATCH] updating narrative at checkpoint resume

---
 docs/NARRATIVE.md        | 10 +++++-----
 python/src/dapt/train.py |  6 +++++-
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/NARRATIVE.md b/docs/NARRATIVE.md
index 0c74a40..49dc365 100644
--- a/docs/NARRATIVE.md
+++ b/docs/NARRATIVE.md
@@ -743,12 +743,12 @@ Full procedure, optimization journey, and cloud cost analysis in `docs/DAPT-PROC
 
 ### Early Training Results
 
-First eval at step 54 (~3% through):
-- **Loss: 0.80** — the model already knows English, so loss starts low. For comparison, a randomly initialized model would start at ~10.8. The loss reflects the model's ability to predict masked SEC filing tokens from context.
-- **grad_norm: 0.066** — very small, indicating gentle weight updates. Healthy sign.
-- **learning_rate: 2.66e-5** — still in warmup phase (first 93 steps, 5% of training).
+| Step | Loss | grad_norm | LR | Epoch | Note |
+|------|------|-----------|-----|-------|------|
+| 54 | 0.7991 | 0.066 | 2.66e-5 | 0.03 | Still in warmup (first 93 steps) |
+| 1280 | 0.7233 | 0.068 | 1.57e-5 | 0.70 | 70% through, steady decline |
 
-Expected trajectory: loss drifts from ~0.80 to ~0.55-0.65 over the run. This is not the dramatic loss curve of fine-tuning — DAPT is nudging a capable language model toward SEC-specific vocabulary and co-occurrence patterns, not teaching it a new task from scratch.
+The loss dropped 0.076 over ~1,200 steps — a gentle, steady downward drift. For comparison, a randomly initialized model would start at ~10.8 (ln(50280 vocab size)). Starting at 0.80 reflects that ModernBERT already knows English; the model is learning SEC-specific token co-occurrence patterns, not language fundamentals. grad_norm remained stable at ~0.07 throughout, indicating healthy, non-volatile weight updates.
 
 ### TAPT Planning
 
diff --git a/python/src/dapt/train.py b/python/src/dapt/train.py
index 92d7476..10d774e 100644
--- a/python/src/dapt/train.py
+++ b/python/src/dapt/train.py
@@ -154,7 +154,11 @@ def train(config: DAPTConfig) -> None:
     )
 
     # Train (with optional checkpoint resume)
-    trainer.train(resume_from_checkpoint=config.training.resume_from_checkpoint)
+    # Auto-detect checkpoint for resume (True = find latest in output_dir)
+    resume = config.training.resume_from_checkpoint
+    if resume is None and any((output_dir).glob("checkpoint-*")):
+        resume = True
+    trainer.train(resume_from_checkpoint=resume)
 
     # Save final model + tokenizer
     final_dir = output_dir / "final"