SEC-cyBERT/results/eval/iter1-clspool/report_opus-46.txt
2026-04-07 00:51:48 -04:00

======================================================================
HOLDOUT EVALUATION: iter1-clspool vs Opus-4.6
======================================================================
Samples evaluated: 1200
Total inference time: 6.84s
Avg latency: 5.70ms/sample
Throughput: 175 samples/sec
──────────────────────────────────────────────────
CATEGORY CLASSIFICATION
──────────────────────────────────────────────────
Macro F1: 0.9229 ✓ (target: 0.80)
Weighted F1: 0.9228
Macro Prec: 0.9183
Macro Recall: 0.9311
MCC: 0.9102
AUC (OvR): 0.9925
ECE: 0.0622
Kripp Alpha: 0.9096
Category                        F1     Prec   Recall
------------------------- -------- -------- --------
Board Governance            0.9455   0.9204   0.9720
Incident Disclosure         0.9212   0.8837   0.9620
Management Role             0.9245   0.9187   0.9304
None/Other                  0.9115   0.8476   0.9858
Risk Management Process     0.8529   0.9096   0.8028
Strategy Integration        0.9498   0.9905   0.9123
Third-Party Risk            0.9550   0.9578   0.9521
──────────────────────────────────────────────────
SPECIFICITY CLASSIFICATION
──────────────────────────────────────────────────
Macro F1: 0.8804 ✓ (target: 0.80)
Weighted F1: 0.8976
Macro Prec: 0.8892
Macro Recall: 0.8750
MCC: 0.8466
AUC (OvR): 0.9698
QWK: 0.9188
MAE: 0.1408
ECE: 0.0874
Kripp Alpha: 0.9041
Level                           F1     Prec   Recall
------------------------- -------- -------- --------
L1: Generic                 0.9267   0.9041   0.9504
L2: Domain                  0.7972   0.8085   0.7862
L3: Firm-Specific           0.8465   0.9189   0.7846
L4: Quantified              0.9514   0.9254   0.9789
======================================================================
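
The summary figures above can be cross-checked against the per-class rows. The sketch below is not part of the evaluation script. It is a hypothetical sanity check that assumes the macro metrics are unweighted means over classes and that F1 is the harmonic mean of precision and recall. The dictionary and function names are illustrative only. Because the inputs are the rounded four-decimal values from the tables, the results should agree with the report only to about three decimals.

```python
# Per-class (precision, recall) pairs copied from the tables in this report.
category_pr = {
    "Board Governance": (0.9204, 0.9720),
    "Incident Disclosure": (0.8837, 0.9620),
    "Management Role": (0.9187, 0.9304),
    "None/Other": (0.8476, 0.9858),
    "Risk Management Process": (0.9096, 0.8028),
    "Strategy Integration": (0.9905, 0.9123),
    "Third-Party Risk": (0.9578, 0.9521),
}
specificity_pr = {
    "L1: Generic": (0.9041, 0.9504),
    "L2: Domain": (0.8085, 0.7862),
    "L3: Firm-Specific": (0.9189, 0.7846),
    "L4: Quantified": (0.9254, 0.9789),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro(per_class):
    """Unweighted mean of precision, recall, and F1 over all classes."""
    n = len(per_class)
    return {
        "precision": sum(p for p, _ in per_class.values()) / n,
        "recall": sum(r for _, r in per_class.values()) / n,
        "f1": sum(f1(p, r) for p, r in per_class.values()) / n,
    }


category_macro = macro(category_pr)
specificity_macro = macro(specificity_pr)

# Timing figures from the report header: 1200 samples in 6.84 s.
samples, total_seconds = 1200, 6.84
latency_ms = total_seconds / samples * 1000
throughput = samples / total_seconds

print("category   ", {k: round(v, 4) for k, v in category_macro.items()})
print("specificity", {k: round(v, 4) for k, v in specificity_macro.items()})
print(f"latency {latency_ms:.2f} ms/sample, throughput {throughput:.0f} samples/sec")
```

If a recomputed value differs from the reported one by more than rounding error, either a table row was mistranscribed or the script averages differently than assumed here.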