SEC-cyBERT/results/eval/iter1-clspool/report_gpt-54.txt
2026-04-07 00:51:48 -04:00

======================================================================
HOLDOUT EVALUATION: iter1-clspool vs GPT-5.4
======================================================================
Samples evaluated: 1200
Total inference time: 6.84s
Avg latency: 5.70ms/sample
Throughput: 175 samples/sec
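The latency and throughput figures above appear to follow directly from the sample count and total inference time. A minimal sketch of that arithmetic, assuming average latency is simply total time divided by samples (the helper name `latency_and_throughput` is hypothetical and not part of the evaluation code):

```python
def latency_and_throughput(samples: int, total_seconds: float) -> tuple[float, float]:
    """Return (average latency in ms per sample, throughput in samples per second)."""
    # Assumption: latency is amortized over the whole run, not measured per call.
    avg_latency_ms = total_seconds / samples * 1000.0
    throughput = samples / total_seconds
    return avg_latency_ms, throughput


# Values copied from the report header.
latency_ms, per_second = latency_and_throughput(samples=1200, total_seconds=6.84)
print(f"Avg latency: {latency_ms:.2f}ms/sample")
print(f"Throughput: {per_second:.0f} samples/sec")
```

With the reported inputs this reproduces 5.70ms/sample and 175 samples/sec.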
──────────────────────────────────────────────────
CATEGORY CLASSIFICATION
──────────────────────────────────────────────────
Macro F1: 0.9296 ✓ (target: 0.80)
Weighted F1: 0.9307
Macro Prec: 0.9290
Macro Recall: 0.9334
MCC: 0.9179
AUC (OvR): 0.9911
ECE: 0.0556
Kripp Alpha: 0.9175
Category                        F1     Prec   Recall
------------------------- -------- -------- --------
Board Governance            0.9518   0.9602   0.9435
Incident Disclosure         0.9540   0.9651   0.9432
Management Role             0.9290   0.9000   0.9600
None/Other                  0.8800   0.8049   0.9706
Risk Management Process     0.8653   0.8883   0.8434
Strategy Integration        0.9652   0.9905   0.9412
Third-Party Risk            0.9621   0.9940   0.9322
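The macro-averaged figures can be cross-checked against the per-category rows: a macro average is the unweighted mean over classes, and each F1 is the harmonic mean of that row's precision and recall. The sketch below assumes the report uses these standard definitions. It works from the rounded table values, so agreement is only expected to about four decimal places, and the names `per_category` and `f1_from` are illustrative.

```python
from statistics import mean

# (precision, recall, f1) copied from the per-category table in this report.
per_category = {
    "Board Governance": (0.9602, 0.9435, 0.9518),
    "Incident Disclosure": (0.9651, 0.9432, 0.9540),
    "Management Role": (0.9000, 0.9600, 0.9290),
    "None/Other": (0.8049, 0.9706, 0.8800),
    "Risk Management Process": (0.8883, 0.8434, 0.8653),
    "Strategy Integration": (0.9905, 0.9412, 0.9652),
    "Third-Party Risk": (0.9940, 0.9322, 0.9621),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Macro averages: unweighted mean over the seven categories.
macro_precision = mean([p for p, _, _ in per_category.values()])
macro_recall = mean([r for _, r, _ in per_category.values()])
macro_f1 = mean([f for _, _, f in per_category.values()])

print(f"Macro Prec:   {macro_precision:.4f}")
print(f"Macro Recall: {macro_recall:.4f}")
print(f"Macro F1:     {macro_f1:.4f}")

# Per-row consistency: recomputed F1 versus the reported F1.
for name, (p, r, f1) in per_category.items():
    print(f"{name:<25} {f1_from(p, r):.4f} (reported {f1:.4f})")
```

Weighted F1 cannot be recomputed this way because the report does not list per-category support counts.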
──────────────────────────────────────────────────
SPECIFICITY CLASSIFICATION
──────────────────────────────────────────────────
Macro F1: 0.8920 ✓ (target: 0.80)
Weighted F1: 0.9098
Macro Prec: 0.9042
Macro Recall: 0.8836
MCC: 0.8634
AUC (OvR): 0.9778
QWK: 0.9225
MAE: 0.1275
ECE: 0.0766
Kripp Alpha: 0.9100
Level                           F1     Prec   Recall
------------------------- -------- -------- --------
L1: Generic                 0.9362   0.9230   0.9498
L2: Domain                  0.8091   0.8865   0.7440
L3: Firm-Specific           0.8718   0.8423   0.9034
L4: Quantified              0.9510   0.9652   0.9372
======================================================================