# sec-cyBERT

Bun workspace monorepo. Three packages:

- `packages/schemas/`: shared Zod schemas (`@sec-cybert/schemas`). Import directly by path: `from "@sec-cybert/schemas/label.ts"`
- `ts/`: GenAI labeling pipeline (CLI scripts, Vercel AI SDK, OpenRouter)
- `labelapp/`: Next.js human labeling webapp (Drizzle, Postgres, shadcn/ui, Playwright)

## Quick reference

| What | Where |
|------|-------|
| Shared schemas (Zod) | `packages/schemas/src/` |
| Labeling codebook (source of truth for all category/specificity definitions) | `docs/LABELING-CODEBOOK.md` |
| Project narrative (decisions, roadblocks, lessons) | `docs/NARRATIVE.md` |
| Implementation plan for labelapp | `docs/labelapp-plan.md` |
| Labelapp-specific agent guide | `labelapp/AGENTS.md` |
| Docker compose (Postgres) | `docker-compose.yaml` (root) |
| DB credentials | `sec_cybert` / `sec_cybert` / `sec_cybert` on `localhost:5432` |

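
For scripts that need a connection string, the DB credentials row can be turned into a Postgres URL. This is a minimal sketch, assuming the three `sec_cybert` values are user, password, and database name in that order (the table does not label them). `DbConfig` and `buildDatabaseUrl` are hypothetical names for illustration, not code from this repo.

```typescript
// Hypothetical helper for illustration; not part of the repo.
interface DbConfig {
  user: string;
  password: string;
  database: string;
  host: string;
  port: number;
}

// Assumption: the `sec_cybert` triple is user / password / database.
const localDb: DbConfig = {
  user: "sec_cybert",
  password: "sec_cybert",
  database: "sec_cybert",
  host: "localhost",
  port: 5432,
};

// Percent-encode the credentials so special characters cannot break the URL.
function buildDatabaseUrl(cfg: DbConfig): string {
  const user = encodeURIComponent(cfg.user);
  const password = encodeURIComponent(cfg.password);
  return `postgres://${user}:${password}@${cfg.host}:${cfg.port}/${cfg.database}`;
}

console.log(buildDatabaseUrl(localDb));
```
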
## Root scripts

All commands run from repo root via `bun run <script>`. No need to `cd` into subpackages.

### Labelapp (`la:*`)

| Script | What it does |
|--------|-------------|
| `la:dev` | Start Next.js dev server (Turbopack) |
| `la:build` | Production build |
| `la:typecheck` | TypeScript type-check |
| `la:lint` | ESLint |
| `la:test` | API tests + Playwright E2E |
| `la:test:api` | API tests only (`bun test`) |
| `la:test:e2e` | Playwright E2E only |
| `la:db:generate` | Generate Drizzle migration |
| `la:db:migrate` | Apply Drizzle migrations |
| `la:db:studio` | Drizzle Studio (DB browser) |
| `la:seed` | Seed paragraphs + annotations |
| `la:sample` | Run paragraph sampling |
| `la:assign` | Generate annotator assignments |
| `la:export` | Export labels |
| `la:docker` | Build + push Docker image |

### GenAI pipeline (`ts:*`)

| Script | What it does |
|--------|-------------|
| `ts:sec` | CLI entrypoint (`bun run ts/src/cli.ts`) |
| `ts:typecheck` | TypeScript type-check |

### Python training (`py:*`)

| Script | What it does |
|--------|-------------|
| `py:train` | CLI entrypoint (`uv run main.py`). Pass the subcommand as an argument, e.g. `bun run py:train dapt --config ...` |

### Data management (`data:*`)

| Script | What it does |
|--------|-------------|
| `data:push` | Compress `data/` → `.dvc-store/`, DVC add + push to R2 |
| `data:pull` | DVC pull from R2 + decompress into `data/` |
| `data:package` | Build standalone `.tar.zst` archives for submission |

### Cross-package

| Script | What it does |
|--------|-------------|
| `typecheck` | Type-check all TS packages in parallel |

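
To illustrate what "in parallel" means for the `typecheck` script, here is a minimal sketch, assuming it fans out to `ts:typecheck` and `la:typecheck` (the actual script definition is not shown in this guide). `typecheckAll` and `ScriptRunner` are hypothetical names; the runner is injected so the sketch does not depend on any particular process API.

```typescript
// Assumption: the root `typecheck` script covers these two per-package scripts.
const TYPECHECK_SCRIPTS = ["ts:typecheck", "la:typecheck"] as const;

// A runner takes a script name and resolves to that script's exit code.
type ScriptRunner = (script: string) => Promise<number>;

// Hypothetical helper: succeeds only if every type-check exits with code 0.
async function typecheckAll(run: ScriptRunner): Promise<boolean> {
  // Start every check before awaiting any of them, so they run concurrently.
  const exitCodes = await Promise.all(TYPECHECK_SCRIPTS.map((script) => run(script)));
  return exitCodes.every((code) => code === 0);
}
```

Under Bun, one possible runner would wrap `Bun.spawn(["bun", "run", script])` and return its `exited` promise. Because the checks are independent, total time is roughly that of the slowest package rather than the sum of both.
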
## Rules

- `bun` for all JS/TS. `uv` for Python.
- No barrel files. Direct path-based imports only.
- No TODO comments. Finish what you start.
- No parallel codepaths. Find and extend existing code before writing new.
- Schemas live in `packages/schemas/`; do not duplicate type definitions elsewhere.
- `labelapp/` uses flat layout (no `src/` dir): `app/`, `db/`, `lib/`, `components/` at root.
- `labelapp/` uses file-based Drizzle migrations (`drizzle-kit generate` + `drizzle-kit migrate`), not `push`.
- Tests: `bun test` for backend route integration (`__test__/` dirs adjacent to routes), Playwright for E2E (`tests/`).
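
The "no barrel files" rule can be made concrete with a small check on import specifiers. This is a sketch only, assuming workspace packages share the `@sec-cybert/` scope seen in `@sec-cybert/schemas`; `isDirectPathImport` is a hypothetical helper, not an existing lint rule in this repo.

```typescript
// Assumption: all workspace packages live under this npm scope.
const WORKSPACE_SCOPE = "@sec-cybert/";

// Hypothetical check: true when a workspace import names a concrete file
// (e.g. "@sec-cybert/schemas/label.ts") instead of a package root or index file.
function isDirectPathImport(specifier: string): boolean {
  // Third-party and relative imports are out of scope for this check.
  if (!specifier.startsWith(WORKSPACE_SCOPE)) return true;

  // "schemas/label.ts" -> ["schemas", "label.ts"]; drop the package name.
  const filePath = specifier.slice(WORKSPACE_SCOPE.length).split("/").slice(1);
  if (filePath.length === 0) return false; // bare package root, i.e. a barrel

  const fileName = filePath[filePath.length - 1];
  return fileName !== "index.ts" && fileName !== "index";
}

console.log(isDirectPathImport("@sec-cybert/schemas/label.ts")); // true
console.log(isDirectPathImport("@sec-cybert/schemas")); // false
```
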