SEC-cyBERT

Author	SHA1	Message	Date
Joey Eamigh	67beaede45	quantization + onnx sweeps. Phase 10.8: torchao/bnb quant sweep on iter1-independent. bf16 already optimal; torchao int8-wo gives -19% VRAM at no F1 cost; all 4-bit variants collapse (ModernBERT-large too quant-sensitive). Phase 10.9: ONNX export + ORT eval. Legacy exporter is the only working path (dynamo adds 56 Memcpy nodes); ORT fp32 gives -22% latency vs torch via kernel fusion, but bf16+flash-attn-2 still wins; fp16 broken on rotary; dynamic int8 silently falls back to CPU with a 0.5 F1 collapse. Driver scripts wired to bun run py:quant / py:onnx; full reports at results/eval/{quant,onnx}/REPORT.md.	2026-04-07 05:10:38 -04:00
Joey Eamigh	6f47185af9	data: add 241 data files to LFS store. Initial migration of data/ to .lfs-store/ with zstd-19 compression and xxh3 per-file change detection manifest.	2026-04-05 16:24:49 -04:00
Joey Eamigh	a5f06f2db7	infra: migrate from DVC to Git LFS with xxh3 change detection. Replace DVC pipeline with Git LFS on self-hosted Gitea. New scripts use per-file xxh3 hashing for change detection and parallel zstd-19 compression. Supports separate data/checkpoint push modes.	2026-04-05 16:21:14 -04:00
Joey Eamigh	41df5923f2	adding old strategy notes for posterity (need to clean git history anyway)	2026-04-05 12:17:03 -04:00
Joey Eamigh	42f8849b14	first finetune attempt	2026-04-05 12:16:16 -04:00
Joey Eamigh	d653ed9a20	pivot point	2026-04-03 14:43:53 -04:00
Joey Eamigh	c0273c9e2e	adding dvc backend so data can be cleanly pulled	2026-03-30 16:53:35 -04:00
Joey Eamigh	fe5155ab6d	pretraining config for run	2026-03-29 23:55:49 -04:00
Joey Eamigh	a9a7d59603	DAPT/TAPT scaffolding	2026-03-29 16:12:19 -04:00
Joey Eamigh	3260a9c5d9	labelapp scaffold	2026-03-28 23:44:37 -04:00
Joey Eamigh	78d1f978de	initial scrape and tag	2026-03-28 20:39:36 -04:00

11 Commits