SEC-cyBERT/ts/data/pilot/pilot-report.txt

SEC-cyBERT Prompt Pilot Report — 2026-03-28T05:14:43.318Z
Prompt version: v1.0
Sample: 0 paragraphs, seed=42
Models: google/gemini-3.1-flash-lite-preview, xiaomi/mimo-v2-flash, x-ai/grok-4.1-fast
Total cost: $0.0000

═══ PER-MODEL STATS ═══

google/gemini-3.1-flash-lite-preview:
  Cost: $0.0000 (n/a per para, 0 paragraphs sampled)
  Tokens: 0 in, 0 out, 0 reasoning
  Avg latency: 0ms
  Categories:
  Specificity:
  Category confidence:
  Specificity confidence:

xiaomi/mimo-v2-flash:
  Cost: $0.0000 (n/a per para, 0 paragraphs sampled)
  Tokens: 0 in, 0 out, 0 reasoning
  Avg latency: 0ms
  Categories:
  Specificity:
  Category confidence:
  Specificity confidence:

x-ai/grok-4.1-fast:
  Cost: $0.0000 (n/a per para, 0 paragraphs sampled)
  Tokens: 0 in, 0 out, 0 reasoning
  Avg latency: 0ms
  Categories:
  Specificity:
  Category confidence:
  Specificity confidence:


═══ AGREEMENT ANALYSIS ═══
Paragraphs with all 3 models: 0

Content Category Agreement:
  3/3 unanimous: 0/0 (n/a)
  2/3 majority:  0/0 (n/a)
  All disagree:  0/0 (n/a)

Specificity Level Agreement:
  3/3 unanimous: 0/0 (n/a)
  2/3 majority:  0/0 (n/a)
  All disagree:  0/0 (n/a)

Both dimensions 3/3: 0/0 (n/a)

Pairwise category agreement:
  gemini-3.1-flash-lite-preview × mimo-v2-flash: 0/0 (n/a)
  gemini-3.1-flash-lite-preview × grok-4.1-fast: 0/0 (n/a)
  mimo-v2-flash × grok-4.1-fast: 0/0 (n/a)


═══ DISAGREEMENT DETAILS ═══


═══ COST PROJECTIONS (50K paragraphs) ═══
  TOTAL Stage 1 (all 3 models): n/a (no per-paragraph cost observed to extrapolate from)

  Observed disagreement rate: n/a (0 paragraphs compared)
  Estimated Stage 2 judge calls: n/a
  (Judge cost depends on Sonnet 4.6 pricing — see OpenRouter)