Joey Eamigh
e4d341cecc
data: 2 new
2026-04-06 15:50:30 -04:00
Joey Eamigh
4f5c88d94a
trying ensemble and nofilter versions of the model
2026-04-06 15:50:15 -04:00
Joey Eamigh
745172adb8
docs restructuring
2026-04-05 21:00:40 -04:00
Joey Eamigh
3e010a6d0c
update readme to show git lfs
2026-04-05 20:33:57 -04:00
Joey Eamigh
dac00f90db
data: 246 new
2026-04-05 18:44:40 -04:00
Joey Eamigh
6f47185af9
data: add 241 data files to LFS store
Initial migration of data/ to .lfs-store/ with zstd-19 compression
and xxh3 per-file change detection manifest.
2026-04-05 16:24:49 -04:00
Joey Eamigh
823b476f2a
data: 241 new
2026-04-05 16:23:49 -04:00
Joey Eamigh
a5f06f2db7
infra: migrate from DVC to Git LFS with xxh3 change detection
Replace DVC pipeline with Git LFS on self-hosted Gitea. New scripts
use per-file xxh3 hashing for change detection and parallel zstd-19
compression. Supports separate data/checkpoint push modes.
2026-04-05 16:21:14 -04:00
Joey Eamigh
2e932bc327
working model!!!!!
2026-04-05 15:37:50 -04:00
Joey Eamigh
41df5923f2
adding old strategy notes for posterity (need to clean git history anyway)
2026-04-05 12:17:03 -04:00
Joey Eamigh
42f8849b14
first finetune attempt
2026-04-05 12:16:16 -04:00
Joey Eamigh
531317f7d4
corpus labeled
2026-04-05 01:30:39 -04:00
Joey Eamigh
6f4d6c57a4
labelapp updates v2
2026-04-05 00:55:53 -04:00
Joey Eamigh
160adc42ab
v2 holdout
2026-04-04 22:49:24 -04:00
Joey Eamigh
1f2d748a1d
new codebook and ethos
2026-04-04 15:01:20 -04:00
Joey Eamigh
d653ed9a20
pivot point
2026-04-03 14:43:53 -04:00
Joey Eamigh
26367a8e86
analyze gold
2026-04-02 09:28:44 -04:00
Joey Eamigh
c9497f5709
6 model panel benchmark
2026-04-02 02:02:36 -04:00
Joey Eamigh
e2c7a21c99
human labeling done
2026-04-02 00:28:31 -04:00
Joey Eamigh
b4319845e4
signoff deliverable draft
2026-03-31 16:58:29 -04:00
Joey Eamigh
96246d0197
docs & reference files
2026-03-31 16:27:47 -04:00
Joey Eamigh
32cd5ecfa8
opus golden set scaffolding
2026-03-30 22:02:52 -04:00
Joey Eamigh
7b660fe361
roll back to python 3.13 to fix everything lol
2026-03-30 21:25:46 -04:00
Joey Eamigh
8190950f1a
fix some bugs for tapt
2026-03-30 20:44:10 -04:00
Joey Eamigh
75ab92628b
wrong pytorch somehow installed itself again lol
2026-03-30 20:15:16 -04:00
Joey Eamigh
3292980d33
tapt setup
2026-03-30 19:46:20 -04:00
Joey Eamigh
c0273c9e2e
adding dvc backend so data can be cleanly pulled
2026-03-30 16:53:35 -04:00
Joey Eamigh
1dce1ccb73
updating narrative at checkpoint resume
2026-03-30 15:20:59 -04:00
Joey Eamigh
669632af7b
tweaking checkpoint saves
2026-03-30 11:41:20 -04:00
Joey Eamigh
313e14fb96
decisions for TAPT
2026-03-30 00:33:28 -04:00
Joey Eamigh
fe5155ab6d
pretraining config for run
2026-03-29 23:55:49 -04:00
Joey Eamigh
3b06d19d46
fix labelapp cheat sheet sidebar scroll
2026-03-29 22:45:00 -04:00
Joey Eamigh
e5f89ffabb
caching in the pipelines
2026-03-29 21:17:50 -04:00
Joey Eamigh
99cf4a606c
thread tokenization and chunking
2026-03-29 21:03:11 -04:00
Joey Eamigh
9d41dd199f
DAPT and precleaning for DAPT
2026-03-29 20:33:39 -04:00
Joey Eamigh
c4d7732c87
initial dapt prep
2026-03-29 17:05:48 -04:00
Joey Eamigh
273a862cdf
fix migration system
2026-03-29 16:59:51 -04:00
Joey Eamigh
ca4bc288c9
labelapp timing fixes and migration
2026-03-29 16:37:51 -04:00
Joey Eamigh
a9a7d59603
DAPT/TAPT scaffolding
2026-03-29 16:12:19 -04:00
Joey Eamigh
99762c8ab3
nextjs entrypoint
2026-03-29 14:38:26 -04:00
Joey Eamigh
11cb91564b
fix docker scripts
2026-03-29 01:17:00 -04:00
Joey Eamigh
8e773d5335
deployment and minor tweaks
2026-03-29 01:15:37 -04:00
Joey Eamigh
8c496ededa
labelapp onboarding
2026-03-29 00:59:58 -04:00
Joey Eamigh
3505c45cdc
labelapp v1
2026-03-29 00:32:24 -04:00
Joey Eamigh
3260a9c5d9
labelapp scaffold
2026-03-28 23:44:37 -04:00
Joey Eamigh
b1503a1942
init pg db for labelapp
2026-03-28 23:26:56 -04:00
Joey Eamigh
48e488933a
updating narrative and plan
2026-03-28 22:54:32 -04:00
Joey Eamigh
78d1f978de
initial scrape and tag
2026-03-28 20:39:36 -04:00