SEC-cyBERT/CLAUDE.md
2026-04-05 00:55:53 -04:00

sec-cyBERT

Bun workspace monorepo. Three packages:

  • packages/schemas/ — shared Zod schemas (@sec-cybert/schemas). Import directly by path: from "@sec-cybert/schemas/label.ts"
  • ts/ — GenAI labeling pipeline (CLI scripts, Vercel AI SDK, OpenRouter)
  • labelapp/ — Next.js human labeling webapp (Drizzle, Postgres, shadcn/ui, Playwright)

Quick reference

| What | Where |
| --- | --- |
| Shared schemas (Zod) | packages/schemas/src/ |
| Labeling codebook (source of truth for all category/specificity definitions) | docs/LABELING-CODEBOOK.md |
| Codebook ethos (reasoning behind every codebook decision) | docs/CODEBOOK-ETHOS.md |
| Project narrative (decisions, roadblocks, lessons) | docs/NARRATIVE.md |
| Project status & todo list | docs/STATUS.md |
| v1 codebook (preserved) | docs/LABELING-CODEBOOK-v1.md |
| v1 narrative (preserved) | docs/NARRATIVE-v1.md |
| Implementation plan for labelapp | docs/labelapp-plan.md |
| Labelapp-specific agent guide | labelapp/AGENTS.md |
| Docker compose (Postgres) | docker-compose.yaml (root) |
| DB credentials | sec_cybert / sec_cybert / sec_cybert on localhost:5432 |
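
As a minimal sketch, the local credentials listed above could be assembled into a Postgres connection URL like this. It assumes the three sec_cybert values are user, password, and database name; the source does not say which is which, though the resulting URL is the same in any order because all three are identical. DbConfig and buildDatabaseUrl are hypothetical names, not taken from the repo.

```typescript
// Hypothetical helper; the names here are illustrative and not from the repo.
interface DbConfig {
  user: string;
  password: string;
  database: string;
  host: string;
  port: number;
}

// Assumption: the three "sec_cybert" values are user, password, and database.
const localDb: DbConfig = {
  user: "sec_cybert",
  password: "sec_cybert",
  database: "sec_cybert",
  host: "localhost",
  port: 5432,
};

// Build a postgres:// URL, percent-encoding the parts that may contain
// reserved characters.
function buildDatabaseUrl(cfg: DbConfig): string {
  const user = encodeURIComponent(cfg.user);
  const password = encodeURIComponent(cfg.password);
  const database = encodeURIComponent(cfg.database);
  return `postgres://${user}:${password}@${cfg.host}:${cfg.port}/${database}`;
}

console.log(buildDatabaseUrl(localDb));
// prints "postgres://sec_cybert:sec_cybert@localhost:5432/sec_cybert"
```

How labelapp actually reads its connection settings is not described here; check labelapp/AGENTS.md before relying on any particular environment variable name.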

Root scripts

All commands run from repo root via bun run <script>. No need to cd into subpackages.

Labelapp (la:*)

| Script | What it does |
| --- | --- |
| la:dev | Start Next.js dev server (Turbopack) |
| la:build | Production build |
| la:typecheck | TypeScript type-check |
| la:lint | ESLint |
| la:test | API tests + Playwright E2E |
| la:test:api | API tests only (bun test) |
| la:test:e2e | Playwright E2E only |
| la:db:generate | Generate Drizzle migration |
| la:db:migrate | Apply Drizzle migrations |
| la:db:studio | Drizzle Studio (DB browser) |
| la:seed | Seed paragraphs + annotations |
| la:assign | Generate annotator assignments |
| la:export | Export labels |
| la:docker | Build + push Docker image |

GenAI pipeline (ts:*)

| Script | What it does |
| --- | --- |
| ts:sec | CLI entrypoint (bun run ts/src/cli.ts) |
| ts:typecheck | TypeScript type-check |

Python training (py:*)

| Script | What it does |
| --- | --- |
| py:train | CLI entrypoint (uv run main.py); pass the subcommand as an argument, e.g. bun run py:train dapt --config ... |

Data management (data:*)

| Script | What it does |
| --- | --- |
| data:push | Compress data/.dvc-store/, DVC add + push to R2 |
| data:pull | DVC pull from R2 + decompress into data/ |
| data:package | Build standalone .tar.zst archives for submission |

Cross-package

| Script | What it does |
| --- | --- |
| typecheck | Type-check all TS packages in parallel |
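
The prefix convention used by the root scripts above can be sketched as a small lookup, together with a helper that composes a bun run invocation. This is an illustration only: SCRIPT_AREAS, areaForScript, and bunRunArgs are hypothetical names, and the scripts themselves are presumably defined in the root package.json, not in code like this.

```typescript
// Hypothetical illustration of the root-script prefix convention.
const SCRIPT_AREAS = {
  la: "labelapp",
  ts: "GenAI pipeline",
  py: "Python training",
  data: "data management",
} as const;

type Prefix = keyof typeof SCRIPT_AREAS;

// Type guard: is this string one of the known prefixes?
function isPrefix(value: string): value is Prefix {
  return Object.prototype.hasOwnProperty.call(SCRIPT_AREAS, value);
}

// Return the area a root script belongs to. Scripts without a known prefix
// (such as "typecheck") are treated as cross-package.
function areaForScript(script: string): string {
  const colon = script.indexOf(":");
  if (colon === -1) return "cross-package";
  const prefix = script.slice(0, colon);
  return isPrefix(prefix) ? SCRIPT_AREAS[prefix] : "cross-package";
}

// Compose the argv for running a root script from the repo root,
// appending any extra arguments after the script name.
function bunRunArgs(script: string, extraArgs: string[] = []): string[] {
  return ["bun", "run", script, ...extraArgs];
}

console.log(areaForScript("la:db:migrate")); // prints "labelapp"
console.log(areaForScript("typecheck")); // prints "cross-package"
console.log(bunRunArgs("py:train", ["dapt"]).join(" ")); // prints "bun run py:train dapt"
```

The last call mirrors the py:train row, where the subcommand follows the script name on the command line.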

Rules

  • bun for all JS/TS. uv for Python.
  • No barrel files. Direct path-based imports only.
  • No TODO comments. Finish what you start.
  • No parallel codepaths. Find and extend existing code before writing new.
  • Schemas live in packages/schemas/ — do not duplicate type definitions elsewhere.
  • labelapp/ uses flat layout (no src/ dir): app/, db/, lib/, components/ at root.
  • labelapp/ uses file-based Drizzle migrations (drizzle-kit generate + drizzle-kit migrate), not push.
  • Tests: bun test for backend route integration (__test__/ dirs adjacent to routes), Playwright for E2E (tests/).