Phase 10.8: torchao/bnb quant sweep on iter1-independent. bf16 is already
optimal; torchao int8-wo (weight-only) cuts VRAM 19% at no F1 cost; all
4-bit variants collapse (ModernBERT-large is too quant-sensitive).
Phase 10.9: ONNX export + ORT eval. The legacy exporter is the only working
path (the dynamo exporter adds 56 Memcpy nodes); ORT fp32 gives 22% lower
latency than torch via kernel fusion, but bf16+flash-attn-2 still wins;
fp16 is broken on the rotary embeddings; dynamic int8 silently falls back
to CPU, with a 0.5 F1 collapse.
Driver scripts wired to bun run py:quant / py:onnx; full reports at
results/eval/{quant,onnx}/REPORT.md.
# Data (working copies; compressed copies tracked via Git LFS in .lfs-store/)
/data/
/models/
/checkpoints/
/results/eval/onnx/models/
*.tar.zst
*.onnx
*.onnx.data

# Dependencies
ts/node_modules/
ts/bun.lock

# Python
python/.venv/
python/uv.lock
python/*.whl
__pycache__/
*.pyc

# Editor
.vscode/
.idea/

# OS
.DS_Store

# dependencies (bun install)
node_modules

# output
out
dist
*.tgz

# code coverage
coverage
*.lcov

# logs
logs
*.log
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json

# dotenv environment variable files
.env
.env.development.local
.env.test.local
.env.production.local
.env.local

# caches
.eslintcache
.cache
*.tsbuildinfo
unsloth_compiled_cache/

# Finder (MacOS) folder config
.DS_Store