sec-cyBERT
Bun workspace monorepo. Three packages:
- packages/schemas/: shared Zod schemas (@sec-cybert/schemas). Import directly by path: from "@sec-cybert/schemas/label.ts"
- ts/: GenAI labeling pipeline (CLI scripts, Vercel AI SDK, OpenRouter)
- labelapp/: Next.js human labeling webapp (Drizzle, Postgres, shadcn/ui, Playwright)
Quick reference
| What | Where |
|---|---|
| Shared schemas (Zod) | packages/schemas/src/ |
| Labeling codebook (source of truth for all category/specificity definitions) | docs/LABELING-CODEBOOK.md |
| Codebook ethos (reasoning behind every codebook decision) | docs/CODEBOOK-ETHOS.md |
| Project narrative (decisions, roadblocks, lessons) | docs/NARRATIVE.md |
| Project status & todo list | docs/STATUS.md |
| v1 codebook (preserved) | docs/LABELING-CODEBOOK-v1.md |
| v1 narrative (preserved) | docs/NARRATIVE-v1.md |
| Implementation plan for labelapp | docs/labelapp-plan.md |
| Labelapp-specific agent guide | labelapp/AGENTS.md |
| Docker compose (Postgres) | docker-compose.yaml (root) |
| DB credentials | sec_cybert / sec_cybert / sec_cybert on localhost:5432 |
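As a minimal sketch, the local credentials in the table could be turned into a Postgres connection URL as shown here. It assumes the three `sec_cybert` values are user, password, and database name in that order, which the table does not state explicitly. `PgCredentials` and `toConnectionString` are hypothetical names for illustration, not part of the repo.

```typescript
// Assumption: "sec_cybert / sec_cybert / sec_cybert" means user / password / database.
interface PgCredentials {
  user: string;
  password: string;
  database: string;
  host: string;
  port: number;
}

// Local development values from the quick reference table.
const localCredentials: PgCredentials = {
  user: "sec_cybert",
  password: "sec_cybert",
  database: "sec_cybert",
  host: "localhost",
  port: 5432,
};

// Hypothetical helper: builds a postgres:// URL, escaping each component
// so that special characters in a password cannot break the URL.
function toConnectionString(c: PgCredentials): string {
  const user = encodeURIComponent(c.user);
  const password = encodeURIComponent(c.password);
  const database = encodeURIComponent(c.database);
  return `postgres://${user}:${password}@${c.host}:${c.port}/${database}`;
}

console.log(toConnectionString(localCredentials));
// prints postgres://sec_cybert:sec_cybert@localhost:5432/sec_cybert
```

Which environment variable (if any) the labelapp reads this URL from is not covered here; check labelapp/AGENTS.md before relying on a specific name.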
Root scripts
All commands run from repo root via bun run <script>. No need to cd into subpackages.
Labelapp (la:*)
| Script | What it does |
|---|---|
| la:dev | Start Next.js dev server (Turbopack) |
| la:build | Production build |
| la:typecheck | TypeScript type-check |
| la:lint | ESLint |
| la:test | API tests + Playwright E2E |
| la:test:api | API tests only (bun test) |
| la:test:e2e | Playwright E2E only |
| la:db:generate | Generate Drizzle migration |
| la:db:migrate | Apply Drizzle migrations |
| la:db:studio | Drizzle Studio (DB browser) |
| la:seed | Seed paragraphs + annotations |
| la:sample | Run paragraph sampling |
| la:assign | Generate annotator assignments |
| la:export | Export labels |
| la:docker | Build + push Docker image |
GenAI pipeline (ts:*)
| Script | What it does |
|---|---|
| ts:sec | CLI entrypoint (bun run ts/src/cli.ts) |
| ts:typecheck | TypeScript type-check |
Python training (py:*)
| Script | What it does |
|---|---|
| py:train | CLI entrypoint (uv run main.py; pass the subcommand as an argument, e.g. bun run py:train dapt --config ...) |
Data management (data:*)
| Script | What it does |
|---|---|
| data:push | Compress data/ → .dvc-store/, DVC add + push to R2 |
| data:pull | DVC pull from R2 + decompress into data/ |
| data:package | Build standalone .tar.zst archives for submission |
Cross-package
| Script | What it does |
|---|---|
| typecheck | Type-check all TS packages in parallel |
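The prefix convention used by the root scripts (la:, ts:, py:, data:, and unprefixed for cross-package) can be illustrated with a small lookup. This is only a sketch: `scriptArea` and the area map are hypothetical and do not exist in the repo; the area names are taken from the section headings above.

```typescript
// Area names mirror the section headings; the map itself is illustrative only.
const AREA_BY_PREFIX = new Map<string, string>([
  ["la", "Labelapp"],
  ["ts", "GenAI pipeline"],
  ["py", "Python training"],
  ["data", "Data management"],
]);

// Hypothetical helper: maps a root script name to the area it belongs to.
// The text before the first ":" is treated as the prefix; a name without
// ":" is treated as a cross-package script.
function scriptArea(script: string): string {
  const separator = script.indexOf(":");
  if (separator === -1) {
    return "Cross-package";
  }
  const prefix = script.slice(0, separator);
  return AREA_BY_PREFIX.get(prefix) ?? "Unknown";
}

console.log(scriptArea("la:test:e2e")); // prints Labelapp
console.log(scriptArea("typecheck")); // prints Cross-package
```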
Rules
- bun for all JS/TS. uv for Python.
- No barrel files. Direct path-based imports only.
- No TODO comments. Finish what you start.
- No parallel codepaths. Find and extend existing code before writing new.
- Schemas live in packages/schemas/; do not duplicate type definitions elsewhere.
- labelapp/ uses flat layout (no src/ dir): app/, db/, lib/, components/ at root.
- labelapp/ uses file-based Drizzle migrations (drizzle-kit generate + drizzle-kit migrate), not push.
- Tests: bun test for backend route integration (__test__/ dirs adjacent to routes), Playwright for E2E (tests/).