updating narrative at checkpoint resume

This commit is contained in:
Joey Eamigh 2026-03-30 15:20:59 -04:00
parent 669632af7b
commit 1dce1ccb73
No known key found for this signature in database
GPG Key ID: CE8C05DFFC53C9CB
2 changed files with 10 additions and 6 deletions


@ -743,12 +743,12 @@ Full procedure, optimization journey, and cloud cost analysis in `docs/DAPT-PROC
### Early Training Results
First eval at step 54 (~3% through):
- **Loss: 0.80** — the model already knows English, so loss starts low. For comparison, a randomly initialized model would start at ~10.8. The loss reflects the model's ability to predict masked SEC filing tokens from context.
- **grad_norm: 0.066** — very small, indicating gentle weight updates. Healthy sign.
- **learning_rate: 2.66e-5** — still in warmup phase (first 93 steps, 5% of training).
| Step | Loss | grad_norm | LR | Epoch | Note |
|------|------|-----------|-----|-------|------|
| 54 | 0.7991 | 0.066 | 2.66e-5 | 0.03 | Still in warmup (first 93 steps) |
| 1280 | 0.7233 | 0.068 | 1.57e-5 | 0.70 | 70% through, steady decline |
Expected trajectory: loss drifts from ~0.80 to ~0.55-0.65 over the run. This is not the dramatic loss curve of fine-tuning — DAPT is nudging a capable language model toward SEC-specific vocabulary and co-occurrence patterns, not teaching it a new task from scratch.
The loss dropped 0.076 over ~1,200 steps — a gentle, steady downward drift. For comparison, a randomly initialized model would start at ~10.8 (ln(50280 vocab size)). Starting at 0.80 reflects that ModernBERT already knows English; the model is learning SEC-specific token co-occurrence patterns, not language fundamentals. grad_norm remained stable at ~0.07 throughout, indicating healthy, non-volatile weight updates.
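The ~10.8 figure for a random initialization follows directly from cross-entropy over a uniform distribution: an untrained model assigns roughly equal probability to each of the 50,280 vocabulary entries, so its expected loss is -ln(1/V) = ln(V). A quick sanity check:

```python
import math

# Cross-entropy of a uniform predictor over the vocabulary:
# each token gets probability 1/V, so loss = -ln(1/V) = ln(V).
VOCAB_SIZE = 50280  # vocab size cited in the text

random_init_loss = math.log(VOCAB_SIZE)
print(f"{random_init_loss:.2f}")  # ~10.83
```

This is why the absolute loss value is uninformative across models with different vocabularies; only the trajectory within a run is comparable.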
### TAPT Planning


@ -154,7 +154,11 @@ def train(config: DAPTConfig) -> None:
)
# Train (with optional checkpoint resume)
trainer.train(resume_from_checkpoint=config.training.resume_from_checkpoint)
# Auto-detect checkpoint for resume (True = find latest in output_dir)
resume = config.training.resume_from_checkpoint
if resume is None and any(output_dir.glob("checkpoint-*")):
resume = True
trainer.train(resume_from_checkpoint=resume)
# Save final model + tokenizer
final_dir = output_dir / "final"