updating narrative at checkpoint resume
parent 669632af7b
commit 1dce1ccb73
```diff
@@ -743,12 +743,12 @@ Full procedure, optimization journey, and cloud cost analysis in `docs/DAPT-PROC
 
 ### Early Training Results
 
-First eval at step 54 (~3% through):
-
-- **Loss: 0.80** — the model already knows English, so loss starts low. For comparison, a randomly initialized model would start at ~10.8. The loss reflects the model's ability to predict masked SEC filing tokens from context.
-- **grad_norm: 0.066** — very small, indicating gentle weight updates. Healthy sign.
-- **learning_rate: 2.66e-5** — still in warmup phase (first 93 steps, 5% of training).
-
-Expected trajectory: loss drifts from ~0.80 to ~0.55-0.65 over the run. This is not the dramatic loss curve of fine-tuning — DAPT is nudging a capable language model toward SEC-specific vocabulary and co-occurrence patterns, not teaching it a new task from scratch.
+| Step | Loss   | grad_norm | LR      | Epoch | Note                             |
+|------|--------|-----------|---------|-------|----------------------------------|
+| 54   | 0.7991 | 0.066     | 2.66e-5 | 0.03  | Still in warmup (first 93 steps) |
+| 1280 | 0.7233 | 0.068     | 1.57e-5 | 0.70  | 70% through, steady decline      |
+
+The loss dropped 0.076 over ~1,200 steps — a gentle, steady downward drift. For comparison, a randomly initialized model would start at ~10.8 (ln(50280 vocab size)). Starting at 0.80 reflects that ModernBERT already knows English; the model is learning SEC-specific token co-occurrence patterns, not language fundamentals. grad_norm remained stable at ~0.07 throughout, indicating healthy, non-volatile weight updates.
 
 ### TAPT Planning
```
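The ~10.8 random-init figure quoted in the hunk above follows from the cross-entropy of a uniform guess over the vocabulary, which is ln(vocab_size). A quick check:

```python
import math

# Random-init MLM loss ~ cross-entropy of a uniform guess over the vocab:
# -ln(1/vocab_size) = ln(vocab_size). Vocab size of 50,280 is taken from
# the narrative above.
vocab_size = 50280
random_init_loss = math.log(vocab_size)
print(f"{random_init_loss:.2f}")  # -> 10.83
```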
```diff
@@ -154,7 +154,11 @@ def train(config: DAPTConfig) -> None:
     )
 
-    # Train (with optional checkpoint resume)
-    trainer.train(resume_from_checkpoint=config.training.resume_from_checkpoint)
+    # Auto-detect checkpoint for resume (True = find latest in output_dir)
+    resume = config.training.resume_from_checkpoint
+    if resume is None and any((output_dir).glob("checkpoint-*")):
+        resume = True
+
+    trainer.train(resume_from_checkpoint=resume)
 
     # Save final model + tokenizer
     final_dir = output_dir / "final"
```
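The auto-detect logic above can be exercised in isolation. A minimal sketch — the `resolve_resume` helper name is mine, not in the repo — assuming `resume_from_checkpoint` is `None` when the config leaves it unset:

```python
from pathlib import Path
import tempfile

def resolve_resume(output_dir: Path, configured):
    """Mirror of the diff's logic: when resume_from_checkpoint is unset
    (None), resume only if at least one checkpoint-* directory already
    exists in output_dir; an explicit True/False/path is passed through."""
    resume = configured
    if resume is None and any(output_dir.glob("checkpoint-*")):
        resume = True
    return resume

# Usage sketch with a throwaway directory (hypothetical checkpoint name):
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    print(resolve_resume(out, None))   # None  -> fresh run, no checkpoints yet
    (out / "checkpoint-54").mkdir()
    print(resolve_resume(out, None))   # True  -> resume from latest checkpoint
    print(resolve_resume(out, False))  # False -> explicit config wins
```

Passing `True` to `transformers.Trainer.train(resume_from_checkpoint=...)` makes the Trainer look up the most recent `checkpoint-*` directory in `output_dir` itself, which is why the auto-detect only needs to decide *whether* a checkpoint exists, not *which* one.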