diff --git a/.dvc-store.dvc b/.dvc-store.dvc index beaedb2..44b1443 100644 --- a/.dvc-store.dvc +++ b/.dvc-store.dvc @@ -1,6 +1,6 @@ outs: -- md5: c633654a20f23d76af34689f7e27d58a.dir - size: 729964105 - nfiles: 111 +- md5: 6147599f136e4781a2de20adcb2aba1f.dir + size: 737313104 + nfiles: 135 hash: md5 path: .dvc-store diff --git a/docs/POST-LABELING-PLAN.md b/docs/POST-LABELING-PLAN.md new file mode 100644 index 0000000..3a8ec0b --- /dev/null +++ b/docs/POST-LABELING-PLAN.md @@ -0,0 +1,111 @@ +# Post-Labeling Plan — Gold Set Repair & Final Pipeline + +Written 2026-04-01 while waiting for the last human annotator to finish. + +--- + +## The Situation + +Human labeling is nearly complete (1,200 paragraphs, 6 annotators, 3 per paragraph via BIBD). Current inter-annotator agreement: +- **Cohen's Kappa (avg):** 0.622 +- **Krippendorff's alpha:** 0.616 + +These numbers are at the floor of "substantial agreement" (Landis & Koch) but below the 0.667 threshold Krippendorff recommends for tentative conclusions. The holdout was deliberately stratified to over-sample hard cases (120 Management↔RMP splits, 80 None/Other↔Strategy splits, 80 Spec [3,4] splits, etc.), so raw consensus reflects sampling difficulty, not pure annotator quality. + +The task is genuinely hard: 7 categories, 4 specificity levels, 5 decision rules, 3 codebook rulings, multi-step reasoning required (person-vs-function test, QV fact counting). The GenAI panel struggled with the same boundaries. + +--- + +## Immediate Analysis (once last annotator finishes) + +1. **Export labels** from labelapp (`bun run la:export`) +2. **Per-dimension alpha:** Compute Krippendorff's alpha for category and specificity separately. Hypothesis: category alpha is significantly higher than specificity alpha (matching the GenAI pattern where Spec 4 was only 37.6% unanimous). +3. **Pairwise Kappa matrix:** All 15 annotator pairs. Identify if one annotator is a systematic outlier or if disagreement is uniform. +4. 
**Stratum-level agreement:** Break down consensus rates by sampling stratum (Management↔RMP, None/Other↔Strategy, Spec [3,4], proportional random, etc.). The hard strata should show lower agreement; the proportional random stratum should be higher. + +--- + +## The Adverse Incentive Problem + +The assignment requires F1 > 0.80 on the holdout to pass. This creates a perverse incentive: pick easy, unambiguous paragraphs for the holdout → high human agreement, high GenAI scores, high fine-tuned model F1 → passing grade, meaningless evaluation. + +We did the opposite: stratified to stress-test decision boundaries. This produces a harder holdout with lower headline numbers but an actually informative evaluation. + +**Mitigation:** Report F1 on both the full 1,200 holdout AND the 720-paragraph "proportional stratified random" subsample separately. The proportional subsample approximates what a random holdout would look like. The delta between the two quantifies exactly how much performance degrades at decision boundaries. This isn't gaming — it's rigorous reporting. + +The A-grade criteria ("error analysis," "comparison to amateur labels") are directly served by our approach. The low human agreement rate is a finding, not a failure. + +--- + +## Gold Set Repair Strategy: 13+ Signals Per Paragraph + +### Existing signals (7 per paragraph) +- 3 human labels (from labelapp, with notes and timing) +- 3 Stage 1 GenAI labels (gemini-flash-lite, mimo-v2-flash, grok-4.1-fast) +- 1 Opus golden label (with full reasoning trace) + +### New signals from GenAI benchmark (6+ additional) +The assignment requires benchmarking 6+ models from 3+ suppliers against the holdout. This serves triple duty: +1. Assignment deliverable (GenAI benchmark table) +2. Gold set repair evidence (6+ more annotation signals for adjudication) +3. 
"GenAI vs amateur" comparison (A-grade criterion) + +**Candidate models (6+ from 3+ suppliers):** +- OpenAI: gpt-5.4-mini, gpt-5.4 +- Google: gemini-3-flash, gemini-3-pro (or similar) +- Anthropic: claude-sonnet-4.6, claude-haiku-4.5 +- xAI: grok-4.20 (or similar) +- Others as needed for count + +After the benchmark, each paragraph has **13+ independent annotations**. This is an absurdly rich signal for adjudication. + +### Adjudication tiers + +**Tier 1 — High confidence:** 10+/13 annotators agree on both dimensions. Gold label, no intervention needed. Expected: ~500-600 paragraphs. + +**Tier 2 — Clear majority with cross-validation:** Human majority exists (2/3) and matches GenAI consensus (majority of 10 GenAI labels). Strong signal — take the consensus. Expected: ~300-400 paragraphs. + +**Tier 3 — Human split, GenAI consensus:** Humans disagree but GenAI labels converge. Use Opus reasoning trace + GenAI consensus to inform expert adjudication. Human (Joey) makes the final call. Expected: ~100-200 paragraphs. + +**Tier 4 — Universal disagreement:** Humans and GenAI both split. Genuinely ambiguous. Expert adjudication with documented reasoning, or flag as inherently ambiguous and report in error analysis. Expected: ~50-100 paragraphs. + +The GenAI labels are evidence for adjudication, not the gold label itself. The final label is always a human decision. This avoids circularity — we're not evaluating GenAI against GenAI-derived labels. We're using GenAI agreement patterns to identify which human label is most likely correct in cases of human disagreement. + +If we can't produce reliable gold labels from 13+ signals per paragraph, the construct itself is ill-defined. That would be an important finding too — but given that the GenAI panel achieved 70.8% both-unanimous on 50K paragraphs (unstratified), and the hardest axes have clear codebook resolutions, the construct should hold. 
+ +--- + +## The Meta-Narrative + +The finding that trained student annotators achieve α = 0.616 while calibrated LLM panels achieve 70.8%+ unanimity on the same task supports the synthetic experts hypothesis, with the caveat that panel unanimity measures consistency, not accuracy. For complex, rule-heavy classification tasks requiring multi-step reasoning, LLMs with reasoning tokens can match or exceed human annotation consistency; whether that consistency tracks correctness is what the adjudicated gold set will test. + +This isn't a failure of the humans — it's the whole point of the project. The Ringel pipeline exists because these tasks are too cognitively demanding for consistent human annotation at scale. The human labels are essential as a calibration anchor, but GenAI's advantage on rule-application tasks is a key finding. + +--- + +## Task Sequence (dependency order) + +### Can start now (no blockers) +- [ ] Judge prompt v3.0 update (codebook rulings → `buildJudgePrompt()`) +- [ ] Fine-tuning pipeline code (dual-head classifier, sample weighting, train/val/test split) +- [ ] GenAI benchmark infrastructure (scripts to run 6+ models on holdout) + +### After last annotator finishes +- [ ] Export + per-dimension alpha + pairwise Kappa matrix + stratum breakdown +- [ ] Run GenAI benchmark on the 1,200-paragraph holdout (6+ models, 3+ suppliers) +- [ ] Gold set adjudication using 13+ signals per paragraph +- [ ] Judge v3.0 validation against adjudicated gold set + +### After gold set is finalized +- [ ] Training data assembly (unanimous + calibrated majority + judge) +- [ ] Fine-tuning + ablations (7 experiments) +- [ ] Final evaluation on holdout +- [ ] Writeup + IGNITE slides + +--- + +## Open Questions + +1. **F1 threshold per-dimension?** Worth asking Ringel if the 0.80 F1 requirement applies to the joint 28-class label or can be reported per-dimension (category + specificity separately). +2. **Soft labels for ambiguous cases?** For Tier 4 paragraphs, could use label distributions as soft targets during training instead of forcing a hard label. More sophisticated but harder to evaluate. +3. **One bad annotator vs.
uniform disagreement?** The pairwise Kappa matrix will answer this. If one annotator is systematically off, their labels could be downweighted during adjudication. diff --git a/docs/reference/Ringel 2023 Synthetic Experts.pdf b/docs/reference/Ringel 2023 Synthetic Experts.pdf new file mode 100644 index 0000000..108e157 Binary files /dev/null and b/docs/reference/Ringel 2023 Synthetic Experts.pdf differ diff --git a/docs/signoff-deliverable.md b/docs/signoff-deliverable.md index 51f94b2..4369e61 100644 --- a/docs/signoff-deliverable.md +++ b/docs/signoff-deliverable.md @@ -37,11 +37,11 @@ Our construct of interest is **cybersecurity disclosure quality** in SEC filings ## 2. Sources and Citations -The construct is **theoretically grounded** in disclosure theory ([Verrecchia, 2001]()) and regulatory compliance as an information-provision mechanism. The SEC's final rule provides the taxonomic backbone: it specifies four content domains — governance, risk management, strategy integration, and incident disclosure — creating a natural multi-class classification task directly from the regulatory text. Our categories further map to [NIST CSF 2.0](https://www.nist.gov/cyberframework) functions (GOVERN, IDENTIFY, PROTECT, DETECT, RESPOND, RECOVER) for independent academic grounding. +The construct is **theoretically grounded** in disclosure theory ([Verrecchia, 2001]()) and regulatory compliance as an information-provision mechanism. The SEC's final rule provides the taxonomic backbone: it specifies four content domains — governance, risk management, strategy integration, and incident disclosure — creating a natural multi-class classification task directly from the regulatory text. Our categories further map to [NIST CSF 2.0](https://www.nist.gov/cyberframework) functions (GOVERN, IDENTIFY, PROTECT, DETECT, RESPOND, RECOVER) for independent academic grounding. -The **specificity dimension** draws on the disclosure quality literature. [Berkman et al. 
(2018)](https://doi.org/10.2308/accr-52165) demonstrate that boilerplate risk-factor disclosures are uninformative to investors, while specific disclosures predict future outcomes. [Gordon, Loeb, and Sohail (2010)](https://doi.org/10.1016/j.jaccpubpol.2010.09.013) establish that voluntary IT security disclosures vary in informativeness and that more specific disclosures correlate with market valuations. [Von Solms and Von Solms (2004)](https://doi.org/10.1016/j.cose.2004.07.002) provide the information security governance framework connecting board oversight to operational risk management. The [Gibson Dunn annual surveys](https://www.gibsondunn.com/cybersecurity-disclosure-overview-2024/) of S&P 100 cybersecurity disclosures empirically document the variation in quality across firms, confirming that the specificity gradient is observable in practice. +The **specificity dimension** draws on the disclosure quality literature. [Hope, Hu, and Lu (2016)](https://doi.org/10.1007/s11142-016-9371-1) demonstrate that boilerplate risk-factor disclosures are uninformative to investors, while specific disclosures predict future outcomes. [Gordon, Loeb, and Sohail (2010)](https://doi.org/10.2307/25750692) establish that voluntary IT security disclosures vary in informativeness and that more specific disclosures correlate with market valuations. [Von Solms and Von Solms (2004)](https://doi.org/10.1016/j.cose.2004.05.002) provide the information security governance framework connecting board oversight to operational risk management. The [Gibson Dunn annual surveys](https://www.gibsondunn.com/cybersecurity-disclosure-survey-of-form-10-k-cybersecurity-disclosures-by-sp-100-cos/) of S&P 100 cybersecurity disclosures empirically document the variation in quality across firms, confirming that the specificity gradient is observable in practice. 
-The **methodological foundation** is the [Ringel (2023)](https://arxiv.org/abs/2310.15560) synthetic experts pipeline — frontier LLMs generate training labels, then a small open-weights model is fine-tuned to approximate the GenAI labeler at near-zero marginal cost. [Ma et al. (2026)](https://arxiv.org/abs/2601.09142) provide the multi-model consensus labeling architecture we adopt for quality assurance. **No validated classifier or public labeled dataset for SEC cybersecurity disclosure quality currently exists** — this is the gap our project fills. +The **methodological foundation** is the [Ringel (2023)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4542949) synthetic experts pipeline — frontier LLMs generate training labels, then a small open-weights model is fine-tuned to approximate the GenAI labeler at near-zero marginal cost. [Ma et al. (2026)](https://arxiv.org/abs/2601.09142) provide the multi-model consensus labeling architecture we adopt for quality assurance. **No validated classifier or public labeled dataset for SEC cybersecurity disclosure quality currently exists** — this is the gap our project fills. ## 3. 
Data Description diff --git a/labelapp/.sampled-ids.original.json b/labelapp/.sampled-ids.original.json new file mode 100644 index 0000000..95084d2 --- /dev/null +++ b/labelapp/.sampled-ids.original.json @@ -0,0 +1,1202 @@ +[ + "002cda6b-a604-4820-af2e-805ed8de6b92", + "00729651-233a-4c81-8bc9-8c9371e5e55f", + "0076cdcd-042f-4c0a-94a0-21ab0bb56961", + "00a8b4a9-7732-4726-8864-8bdc3758b921", + "00afff92-e27a-407b-9a89-94dffae69ded", + "00c8076f-1692-4c77-aac5-8d9fe8256d4e", + "00fe18b8-3fd1-4f0b-9a0f-2ef1c654e559", + "0144bfd2-5513-4d83-89e2-50db3003354a", + "01644e4d-252e-480e-90e8-62a70bfcd33c", + "01664cf4-653a-48af-94e2-87972bad68b7", + "0178fbd7-c912-4be9-8d3e-b9ef90f0a79d", + "01e4ee22-f5e2-42f4-b4a1-c6e66ec16a6c", + "01ee1e53-aa28-4046-9e60-875939838eef", + "01fe797c-a453-40ce-b012-36b8e5a6e321", + "0254176f-40ca-4d1d-b0bf-24b68f9f0b5f", + "026c8eca-8fdb-42f2-a5c6-a01e6f08000a", + "0271e72c-0c5c-49c5-bc13-0911c0938c11", + "035cfc55-3936-4da7-b3ce-8ed2432fe082", + "0386c996-51a7-4230-bc94-64627b540dad", + "03a5c956-4893-4ad1-beb4-40916d68ece5", + "03e3a0bf-bc1a-45f7-a1e6-7d176bb8402d", + "03e7a40e-5b5d-49e3-9e07-75da8596917f", + "040fc21e-1e6b-402c-bfae-0f0c14d6935f", + "0421e917-1127-434f-9153-425aebe34526", + "043dcd53-793c-4cf0-b0a6-604366b475e6", + "04692fe3-9f04-419a-a1d5-98f7ad37055e", + "04e33d43-67c6-43bc-84b2-a0f868b23afb", + "04fab55e-8883-419a-ab31-9b77ca1e886a", + "050581dc-ff37-4470-b419-d4834ceee34a", + "051cd71c-07c6-487c-802c-3c9e1fd785ca", + "0522b5b1-d827-4fe1-bd91-662ce9b2059c", + "059c3afb-e408-4c38-92ee-72f00d14537d", + "05fdb95c-f042-46a8-b6d7-96890f16ccb5", + "066d9753-3a21-4dab-9a69-9bbf96274e4c", + "0686f89b-98b5-4ce9-b0f4-9140c412e1dd", + "06be810d-6a8d-48ca-abe1-ae53c6350ea2", + "0716b78d-759e-4bfa-927a-f4bdc976b425", + "077e429c-de5e-4c01-8e10-a5ce0fa31bc9", + "07a48b66-f3f4-4e7b-ac98-29c42760a616", + "0847ccd0-9aa2-41cf-a21f-20eec833c9b4", + "089cdb76-e593-4e78-9b6a-5bff54806a36", + "08cc8f1d-d8cd-4d45-80c2-320d21722558", + 
"0982534a-75a8-41e9-96b1-17b645d95909", + "0aa49a29-b3c7-4ad1-a8f9-ed38c2c9ad9c", + "0aabaad9-77c1-4690-9ffc-43f90fef1b0e", + "0af7e40c-ec7d-453c-8ff2-aa0ab992e1bb", + "0b10af20-2cdd-4a9d-9290-8ee68ab02bef", + "0b7d7a40-9d63-4788-9200-839eaa1f1aa8", + "0b929ecc-dceb-455a-9668-a349829a7ef8", + "0ba1e4a2-76b7-4166-ba7f-e9fb513cb79c", + "0bf5e3e6-d86c-417d-883e-793ecc364994", + "0c8bab24-2fec-4096-8c70-ebe674c412a5", + "0c93b4c5-f39b-4391-ae7e-ba01ca0b0ee6", + "0ce67ebb-95d0-47c6-addd-2dd44564404a", + "0cecf56c-091a-40c0-9ec0-ead2b367c3b1", + "0ceeb618-d0c1-4a58-8708-a7735fbecd29", + "0cf3f32d-0d3d-4bf4-afd1-71bed85f1f49", + "0d29c9de-e69e-4e32-9dff-3e695653e724", + "0d63f232-6a34-4e36-9559-c7b7ba64594d", + "0d8ce839-7e76-40d9-89ec-7036d3ce1b53", + "0d8de798-357b-4304-bac7-dd6ac9b6756f", + "0dcbec49-5c0b-4dc5-a8ab-2a0006e12776", + "0e2ffc88-d579-40c2-bc21-1d3bd79345ce", + "0e4847aa-1759-455d-bfc9-38d9695cf451", + "0e63c9c4-7a87-4438-96f9-30ed7483eb20", + "0e781ac2-191a-4a9d-8f11-3a8ba229d812", + "0e838e22-4576-4817-a205-ef88e6153c0e", + "0e9fdf8e-5890-4626-91b4-53fd92db609b", + "0f0bb60a-8c01-42ba-a3de-43c78938536e", + "0f1c9ceb-7222-45a3-8df7-80bb97b82e26", + "0f8d60d2-3e53-4f4c-a083-0a21b923e1a0", + "0f9f3acd-a50e-443f-8065-c1a3e395bdac", + "0fcfe7f4-3ab7-4bc8-a464-ac6bce08159d", + "104ccdf3-8529-424d-abd2-bb6e0094f83f", + "106dd0cf-664b-4726-869a-5f4b9ba7a278", + "1082fe29-d7f8-49a5-b0e8-6588b97c7e5e", + "108c2f88-0017-4d97-8194-bc0cf23c3f20", + "1099c71c-1f0a-4358-8e93-586e439367e9", + "11348064-57f4-4944-a820-63be2af231b4", + "115c023f-71d7-4c71-8f43-c593ef213ff5", + "117b8c81-3f10-4a37-9e54-5f00b760e9bc", + "118255e4-99b8-4b2e-9884-708f88762960", + "11b0cbcb-d3b7-44c1-a95a-16e053520f25", + "11ecc4ed-3029-438f-9a89-8f3f906b604f", + "12148ccd-1269-498d-b043-8279a081cb30", + "12880a52-e70b-4d24-9d0d-270114b2102b", + "12c9055b-546e-467f-a17c-265af3a9485d", + "12d2a0ff-b468-4d13-a647-bf40ed9ddbdb", + "133547d2-5fba-4137-867f-3c3ff719baef", + 
"13a9c0c7-41ef-4b57-a3da-e0ae37111053", + "13b0b45e-8383-44ac-90f4-50a2246c1b1d", + "13bf83c6-961f-4ee7-a104-4a673c90a7ac", + "13d1f969-d229-44a0-969a-ecef3bfcb620", + "13fb6ba3-ae0f-49d4-bfa6-b6edaba3f632", + "152b5ab2-65b8-42a2-b178-3492769c88d7", + "1544173d-acf8-4fcf-8fed-0f9f44d9fa76", + "158c33a8-3534-4c17-949c-063598358934", + "15bb5b7f-452e-4d9a-9015-9b953ebacd20", + "15e7cf99-a685-4b9c-a9d9-a1c2d196ca4b", + "15ef9917-aa77-48a2-8314-1b52949f1e40", + "160fec46-85f1-459a-a930-d74c6fd3fd67", + "1626d556-df43-4114-9929-8dcbda2aac45", + "1627726f-d34e-4d7f-8d1c-0b4eadf65001", + "165c40e0-6570-45a2-ad41-fcb8e25231a4", + "1673f332-9f72-4498-935b-5bf2f037ca56", + "168b2a33-599d-4531-80fa-0a5d28dbd99c", + "1691aaaa-4756-4c14-a92c-465ac3b11132", + "175a1268-4faa-4eb6-8432-13959d8adf13", + "1782f989-cd58-4acf-8c30-22d8af199c27", + "17f2cc44-c64b-4358-a4ac-6d9f57c94021", + "17f4da72-2839-4b90-ae08-4e60dc5b6551", + "185229c4-61bf-46b7-9042-7ab2a435d97a", + "1867f61a-d86b-4039-b21e-8faaa4679c95", + "18c73e38-6c43-4e60-b6a3-77f7a1269d12", + "18d8523e-ee1b-4608-90a4-0ba7256b6dd5", + "18e566e2-f9f3-44d7-8553-759ecf1053fb", + "1936606a-e39f-4591-82e4-69ebe871cee0", + "1a298d27-8563-4847-a0b0-cc2b145655d9", + "1a4ee0fb-19f6-4f2f-ad17-ae656716f8df", + "1a90c7ec-ec01-4805-9013-32c00613216e", + "1a9b4fd7-f944-4301-bbcf-fb085cf8235e", + "1b26954b-7c4f-4958-a4a6-01f8c59d07b7", + "1b301d86-ba26-4a97-a797-86c299755fef", + "1b759a39-1878-4b17-bf0b-648866c1035b", + "1bb44ab1-bf51-4cd2-8cbf-bea08c7b9ed0", + "1bb53013-6ee5-4cf5-81d7-feb46f32361a", + "1bf000d0-2baa-4d95-aef5-595422cde9e8", + "1c467688-8951-4610-ba69-f53295a7a3fb", + "1d2347cc-2345-4680-aa81-bbd76569663e", + "1d555463-8868-4d8f-a01a-987f5c4bba2f", + "1d780f32-0716-4fd7-b6d4-10152a0811a5", + "1d7deb2d-0122-4b17-919e-bceeff9ae9d5", + "1d8c9889-43df-4797-8643-b06bdec395aa", + "1dc3d160-9826-4ca1-a23a-51d634e8a6d7", + "1dd7b104-f8c4-4299-83c9-ffb8005bd944", + "1dd98252-6e22-4d27-a6c0-7fd612339906", + 
"1ddcdf90-6547-47f9-bb54-7124d97d66b2", + "1df7c959-d68f-4f50-b4f8-696dd1b23dd7", + "1e12023d-eddc-4fd5-9b7a-0a73aae4861b", + "1e663392-ee13-4489-9434-f5cb3c40160c", + "1e775594-38cf-4f8c-9922-24bd5e80cc07", + "1f1bb7df-5ea3-4085-b9a1-232cbf90daf9", + "1f29ea8c-3208-47b0-be00-62f1c31046e3", + "1f2c5ba9-2935-4c92-b6a9-d3206969d041", + "1fb13733-2f62-4ebf-8f48-36300dd10fe7", + "20212526-1cb7-4cc6-9f42-674dd36627d8", + "203ccd43-147d-46fc-b092-2e1072a2d47a", + "203d0d09-ce67-4ffc-bd39-49e7931cf525", + "206e2e28-64e0-43da-b40b-4b0b8f2a09da", + "20c70335-70e6-45a8-8a52-a394d3c51530", + "20cb5b3f-0ca5-42bf-9ecf-b1484cbd655f", + "20e158f9-eb79-4cb2-8f90-4f1181d952c5", + "211038c2-4fe1-41f0-91fc-15e154ef532a", + "215b0075-f87b-4546-b97c-7cd65bd0654e", + "218db9d0-4300-4588-aa90-d6594a0965ef", + "21a1f08f-4438-4122-af64-9fa3363abc36", + "21bf0b84-c1b4-4f13-9c16-74ad5a288a7f", + "21d7e07e-8598-4737-b1b2-357af5868a99", + "221bd4bf-ccbc-4779-9fb5-379575ce08b7", + "222f8bed-feb9-43ab-b898-05213d59e6c6", + "2280bb85-de8e-4016-a569-137e253bf51c", + "22da6695-00e9-4e93-9447-f04840a42c71", + "234f6ede-9d0b-4db8-9e54-fd2851c9ce90", + "236d0f17-3a55-4e56-a745-b3b774d456db", + "2383c9dd-c743-4ffe-8f26-f2774a2fcc19", + "23d37342-e629-4859-9a3e-3497676ef60b", + "23e9bfde-7098-4a1b-8831-4642e87d75c9", + "2437eb5a-8c70-4978-8c22-1a1e6898ca16", + "24522a01-e0fa-496d-90a9-027cfa60a01f", + "24ed45fc-4935-48fb-bfb1-7a65e2494157", + "25139676-edff-4cc0-aa89-2c1f681f0ade", + "251e60b1-2a1c-42d4-9793-c1815c222693", + "252ac317-ff0e-49d6-aa5f-c8ca067b586e", + "25f77ed5-a401-4a60-a0b6-c438f972ef6b", + "25f82556-820b-404e-990f-1883d082d142", + "2608bfd3-adc1-4a28-9189-8f8226521ac5", + "2633a6b3-d601-49d9-a369-831f9db7a2be", + "264bf8af-23a4-4941-929a-ba0a8820a66b", + "26b9bd93-5d45-4249-95d6-e0f4f6d85072", + "26bf0b7e-f067-4eae-8bcd-9d073ae89d09", + "271075ad-bd59-4498-82c0-9d2d1bcb653e", + "27336789-7ee1-4b01-bea8-1093845b980b", + "274c8155-b46d-4ff2-8e9b-fd8bcc4090db", + 
"27589342-b708-4738-9008-0f90d68fe452", + "27799b3b-3006-4781-b29b-a4fd702817de", + "27800e97-f932-4eed-b723-ac245db3b259", + "27e9ab17-656b-468f-b64f-f1614f786c6e", + "28955a3a-ff97-4ee2-a505-2ec0f0fe4eb2", + "28aff949-ca00-4087-aa96-004a2a18e1fc", + "29191e26-c43f-47f4-a7c3-8429406e1b9a", + "293635eb-1d2c-4c5f-83c0-656e0922db42", + "2946c8f6-df6f-4ea6-8f1a-49ea582e2e16", + "294c8a54-8a36-497b-9af4-2055cb3678d2", + "295f7d51-262f-4179-8bb0-9358f7101052", + "29944faa-02ea-4af6-9424-855fa6d7607d", + "29bbd722-d7c8-4e63-b093-ebffce879a9d", + "29d2b8e9-fb02-402f-8a7f-9741fa08f414", + "2a0977f1-8d32-4eb9-8b8e-d3cf19e230cb", + "2ac26f22-0426-4f39-8c20-03aed982886f", + "2ae9067e-8b46-406d-8565-ebd0f4033dc5", + "2b782a8e-0b4b-43b3-a350-832548d0f45f", + "2b7a71fb-895b-4d01-a756-4fac9301d842", + "2b9fcc0b-83d7-4b15-a03b-72eec55e1f3b", + "2ba57c3f-f61c-4d9c-a058-233d57e66e3e", + "2bccf3ac-7efa-4be8-8f66-28f9f78ee385", + "2c1dabfb-0300-46ac-b5a0-d5f9e141cea1", + "2c53b7df-ce08-41e4-8d57-94ccdaf23b49", + "2c5e3e60-d6e7-4ab7-8432-dde8bb3b6529", + "2cb214d2-f4d5-4795-bcf5-291c28f45323", + "2cfa468d-35e8-4fff-80d7-09eac8ec7aa9", + "2d0f6871-63cc-436e-b8d2-5db3a55dbaf7", + "2d6eaf49-860a-420e-8e62-658e4fda69ef", + "2d9a622a-ba1f-46a7-a66b-3160750127d4", + "2db251a5-b50e-4b9a-b460-4f8fb46aa469", + "2dee4287-78e9-481b-8cd6-c1350cf7733d", + "2e027569-24fa-473d-8e3e-3a35626419e0", + "2e5290ed-06ad-44be-b4ee-dff865a2afdc", + "2e75e9aa-2ea0-4de9-99e7-3fdec19dd061", + "2e783ae4-3b9e-4683-b630-f0dd0d16f01e", + "2e7a18d8-55b3-4e10-b433-2e08832e2816", + "2e8cbdbf-59b5-48fd-b203-746817df53ed", + "2eb5d422-c33c-401c-a978-3863bc0e7f24", + "2eedebac-8d43-4e29-b35e-0e6d39932129", + "2ef814c7-c748-4f6b-9b73-344b715d6469", + "2f05f0af-0ec1-4076-b73f-df86a1db7952", + "2f1c6847-9826-49b7-808a-4d606306867b", + "2f89172a-49e3-4bf0-bb05-1721abfc116d", + "2faf709e-91d3-4c7e-a511-e7e8ff865615", + "301de177-adae-4ac3-a561-86caf0d7ee6c", + "303685cf-ce63-4429-98fe-fcef1152a4ab", + 
"30603a17-1e21-463d-beab-f75dc1e71410", + "30d186b9-097c-4270-83a9-caf64819573c", + "3135990c-85d9-41c1-a8a6-2967cfafc2f4", + "3187942b-e561-4a10-808e-0e93e0bfa344", + "31d56093-b24e-4ba1-bcc9-88c74e687fae", + "325420dc-f389-41b2-9ab2-b76cd0452cef", + "32884934-9f18-4c8b-b718-af2676169b33", + "32b6ff49-e216-4b25-9a52-ef8454309e1e", + "33185cdc-99e9-46c5-aa77-a37ca6448ae5", + "334dab27-9991-4b54-b263-bf0bae0aed58", + "33bac9fa-af20-4c74-85d0-6554f526dd43", + "33c082e0-2c43-4210-8db1-2265eb96e1df", + "3421fe7c-b119-422b-8f3c-00ca4e904b23", + "3442463d-20c1-4215-b3e1-f87c8f23ecfc", + "3488649a-9198-4627-af9c-2d8472699089", + "34b3f840-e91f-4da6-9bd7-01ee9ac58a14", + "34c7fd32-df05-4342-9c6a-94cd8ce37da4", + "3594dab8-7f9b-4bce-ab2d-620e9c7b00f8", + "35a5f80a-3bb8-4ced-93ef-b96236a2e68d", + "35d6f3f7-82de-4517-ae87-5f84a9a9320f", + "3601781d-1602-4744-8ed4-26b169035206", + "367108c2-8f7b-4a80-95d6-054d3b8d5d16", + "367d3a37-b2dd-4710-b950-d55ca7b4f960", + "36b67740-9d45-4584-ad5b-83842f567e6e", + "36c16fc2-5792-4444-9012-c0f67f9b08f7", + "36e94e96-652a-4b9f-b622-9f4db0e26559", + "372f4f2a-93d1-4cdf-ae34-95ba8aa1e5bc", + "3735c653-261e-472e-a2d2-0536ee93faee", + "376577ad-f825-4c96-aca4-4db4d582aeeb", + "3774b08c-ddd1-45c0-98f0-c111ac08cdaf", + "382ed87d-8358-494b-a621-cf1cb93e9ca4", + "3840e1ab-921b-4933-a690-6bfcbe5fbad5", + "386bc49f-d4e3-46e9-a44c-cbe72890e6f1", + "386c83d8-9ad4-45f3-b101-2fc6436279d9", + "386fcaef-018b-4389-a75c-e354677086f6", + "3877ea7a-b7f4-4e19-9168-f5ae8eae2205", + "3879887f-4f04-4208-96f3-dd90fa3c29a1", + "388ecdf2-b6c9-455d-9ce2-a3df16e3eacc", + "389ce947-a0b4-4c9b-8e88-be828c888a48", + "38be69f2-7663-4cdd-8e89-893c31fc2206", + "38dc1e81-e4f2-4ed3-97d7-e1efe841bc2e", + "393af8db-1f4a-4442-83cb-623edb8933cd", + "3948d0cb-56f6-4559-91e5-8f910aa98760", + "396fedc9-4672-436e-b05c-ab74523db3d1", + "39dd6b59-db21-42c1-a220-4ffac71ecca3", + "3a4f4524-bda8-4e88-beb2-e8f26bf9756c", + "3a55951d-04bc-4650-9519-ddde7c21d5f6", + 
"3a764fe7-1202-4d3b-8193-0cd7061ace56", + "3b11496f-b9f9-485d-ac9d-d06c3b791e26", + "3b15b5f0-c9a9-4d47-9980-1804a2293d71", + "3b3ed064-bf06-4c18-82d7-89d2194525b3", + "3bad3d1f-6b69-4874-ba6f-9f5030bccc15", + "3c47379a-9926-4595-bad7-afd816fbfb5b", + "3c5bf8fc-195d-44c8-a2a2-069fce59dfe5", + "3cac6612-b99a-4488-add9-e0cdcb5aea25", + "3daca104-0626-4162-b205-6fc14089527a", + "3dd90084-1889-44b7-8fab-0cd9bf213b96", + "3e1ef814-7d6b-420c-8576-7e50a712c7a4", + "3e376dc9-861b-42a7-aa1b-791cac70f992", + "3e3f2c61-dad8-4cd6-b0fd-f17d9a9bf5ba", + "3e6ffc65-b4f9-4798-a833-7648197d64f8", + "3ed85781-2f91-466a-b750-0e07b9b3e544", + "3f510185-7584-40eb-a021-c7f719a6d8ef", + "3f523a86-6849-4cd3-a7b0-a68bf2f4f093", + "3fbe5176-e6bd-476d-9496-ba00b8c3bde4", + "4004d4c1-0ebe-4004-b71a-06ef0035aa65", + "401ea95f-d6eb-41a6-b828-a8456d337c7d", + "40448459-830c-4811-9334-46bb4e7b4d4d", + "404b69fe-ef86-4610-b35c-45ac238b0b55", + "40c60b3d-44dc-4621-b31c-a6f257c31a53", + "41492a09-0efc-462b-8fd9-a432c21f90a0", + "4153c8e5-7e08-4b05-b467-332a3f431667", + "417e5fc8-10c3-471f-ae37-78a8e33d7ee0", + "41fe546a-ec28-4e29-8aee-edb9e1eebe9c", + "4202044e-49b4-4a72-b23c-57147565f398", + "4213efb3-4b53-4c98-b4f1-b0159b16e2eb", + "422d302f-7f0a-4221-bc0f-beae3866bb4d", + "423f662e-8468-4200-99c9-3af5d70199ca", + "424b7a5d-7fa6-457d-9031-5473b349e795", + "4266bc28-4acb-409d-9b36-ac3ee83d1521", + "4371c9c4-4586-4e30-82ce-ae73aa99c116", + "43805023-dbc2-430b-a10b-f17fa8b482ac", + "43975c32-eb3d-4962-81d5-e6fc42037d97", + "44129438-a376-4160-9d6a-7c01c6d50840", + "442e4710-5965-4a84-af18-7f2410fa40a5", + "44748813-ed16-48ff-b21d-ca53fa34e0a7", + "453fda4c-e795-4ecf-87c4-4c75c0ff21f5", + "45961c99-f02a-46a8-822f-d21d29f1d0f5", + "45abc0d8-8c9c-4c3e-b20a-35c2df2c6bcd", + "45b4d73e-ccf7-46e5-ac74-8c32881cc5f5", + "45d2ff50-de69-4ebd-a50d-5f5f7c5fbb67", + "460a7267-fbb3-4bb0-8283-10c94a9447fc", + "4649139e-0e18-4d17-8f05-d20c4cad0371", + "46b0aaa5-aad8-4207-8069-de525e7274e8", + 
"46be2d0e-208a-40ff-bd3e-04ceb4373c58", + "46ccbf60-b34a-4d65-871a-e32fc7c26a12", + "46dcff96-10f9-4564-b5f0-7362d3aba68e", + "46e6f334-d67b-4ada-8541-c15222b24bed", + "46f7ff18-9773-4645-8fd6-68b52ed8aec0", + "47d3438a-c1f8-43a4-a2a4-267553d7cccb", + "47f80eb4-3f86-4c91-b231-e9a2df60965c", + "48164b09-f79c-4238-aead-c2f91be5b0c6", + "485b76a9-4e62-450b-a1be-ae238be570e6", + "48a4418d-5945-4ce7-ad48-628c92fcd5ca", + "48b7d0c4-3a36-4484-85ae-39c4270e0e7e", + "48f835e9-74c0-41d8-bde7-a220e992e224", + "48f88098-8b77-448c-8e63-ac7ee83ed676", + "499f8c92-407f-49d4-8b49-0a66dc034d4e", + "49a49249-290f-4c04-85f9-d09a79e42fc3", + "49e627b4-d0a7-4b5f-9fe7-d285e1a84df5", + "4a62d43a-f07b-41c7-a930-bcff62b62a8e", + "4a807308-4de1-4acd-b33a-ba6f73ff3679", + "4a961689-32e7-4d37-805f-d7bc6f0a8d0f", + "4aa7fde8-fde2-4c58-9f54-3f28936cf252", + "4ac260fb-2bbd-4930-b205-6e6469d6bc5e", + "4ae72f14-8dc6-4637-8680-3f3421654ddf", + "4bb6a21e-d29d-4460-b88b-5d8d4e49c10a", + "4c0343e8-da15-4fef-9734-d3055b5ef5f4", + "4c187b47-2191-4b08-9e08-135c0e842418", + "4c5783e9-491c-4fc4-b122-a9d91dbe8db8", + "4caf9df7-d02a-41f6-b9bf-b7396b1ab2bc", + "4cb927be-b60b-4d7d-bc4f-5e4e0fb3caa0", + "4cbf7a31-2e69-4c78-bd9a-81ca8baa4603", + "4ccc39f0-1f82-4705-b39a-e7f785981c67", + "4d35732c-fa1a-4be9-91d1-9425316bc1ca", + "4d3d6fdb-99f2-4ad2-9844-fc7c7bc9c66b", + "4d5a0e93-da04-43e8-9c4f-4c1e211521b0", + "4d892f97-7ad1-43fb-be8e-b27122773adc", + "4def7a80-ff7b-4ce3-ae48-0b23d2d474f9", + "4e6dae55-6846-45d2-bf2c-674e5dd03eb0", + "4eb059c0-8985-43ce-ad16-1844597c6c2a", + "4f182b2b-5cf8-43e5-aa03-6b050ac369ca", + "4f4473bd-3faf-491c-9698-710d721f57ec", + "4fb15f62-323e-4120-815b-36b53da951fc", + "4fcc605d-25c9-4143-8b16-6342e31ece15", + "4fd9f918-2e58-474f-8820-6b118fbb5e62", + "4fdbb014-2a7a-4a29-a05d-11aef8211163", + "4ff0258d-ce5c-4cae-9273-37e0fd9a47e2", + "501d8eda-6772-4e80-b9fa-1d59102230dd", + "502fc6a5-a5d0-439b-8e30-50638dc5f98b", + "50c22a12-1b2c-4505-ac12-2ea5901f44ba", + 
"50d02d0c-4efe-4947-af17-52aed57cf645", + "5105862f-b71f-4701-903d-c1f45b8b5fae", + "5107940f-bded-46ff-ba23-afb074b052bf", + "5180a97d-f3cd-4283-80d7-f49bdddc6d7f", + "51eaa722-8948-4e63-bac6-55bd4c7ce059", + "5231f113-e574-42de-907d-1e4c0522d123", + "52589fce-da18-4e99-9b57-1614541c29a8", + "52b61f8c-6ac7-4ce5-a8b0-e40d4b3159f5", + "531238d9-ef96-495d-96ca-7439ef6ecff5", + "53327174-7ae7-4ad2-9f38-be1022570b2a", + "534404c3-0512-40d0-90cc-ee291bb760bc", + "535d8ef6-7fb9-4acc-a0df-242c9b243d82", + "5373d4b4-81cf-49ca-8561-ae80462818e4", + "5378a0fd-1dc5-46e1-ab6d-448c8e4f712c", + "53ae6543-9942-432a-8689-be96748896b0", + "543e0e1e-0e29-4aa5-b8b2-bbe7e240845c", + "5462f9ad-464b-45b4-a05e-086cffac9b35", + "54e41421-aafa-4363-8da2-170efe107f6d", + "554f841a-e1b6-4cd5-a06d-635966343512", + "555a173c-e4d9-4984-adab-d23e21d01757", + "555fa570-011f-4eb2-8c23-f6b1c97696eb", + "55e3c32d-461d-4e76-befa-e0b4fb41918d", + "560d4d23-4aec-4d5a-adc2-495951b8b68f", + "5613db98-bd7e-4332-b153-162408254143", + "563125fb-1738-486c-a47a-953696459d0a", + "564b3049-3ab0-4cad-9d89-a61aa1fac8c0", + "567d3f32-1305-40cc-bede-1efd85ccf666", + "56a2e867-e89c-497b-a03e-daa44f9b0607", + "570731ec-2324-45be-81e0-959ceb6df5e4", + "5750092e-a69d-47df-8dae-84393aacf4f2", + "57b6ac3a-e2e3-4cda-b8be-b246f36698dd", + "57e73d73-042f-4a64-9ab7-d7ae0c0215bc", + "5830bdaa-2667-409d-ba86-ecc50391a25a", + "5835970d-521b-4ae1-a43a-1d9cbe85151c", + "5836d8c8-c0d9-4d5b-89cb-4fa6e40f7054", + "58507dbf-1ec6-4b17-8e52-3886ef2f2b5a", + "5881b504-501c-4579-95a8-dcb0496c0789", + "58f81ca0-a5e4-41fb-b610-c0d2202f8ec6", + "59081129-5e5d-4970-a8f9-0ca434e9762b", + "5909de57-28ab-4137-9374-55f7854b9dd6", + "59598645-bc83-4192-a7aa-bd1c61ede567", + "596ae5ef-3ab5-4d93-91f1-52dd43ba2831", + "597f87fa-4da3-49d2-a69c-fb5aefd6c8c0", + "59826d4e-6f1a-4153-ba5e-23b8be468f09", + "59bc96c8-9ce2-48df-96f1-b26dfceadae4", + "59eaea29-abb1-4ee6-b37b-20edf4e88139", + "59f2dc0f-03f6-4e45-bbde-ab7dc0049ba7", + 
"5a00fd33-9bb6-439a-8c7d-cc5696e5719e", + "5a072d98-f0e1-4bb7-8bb8-618213c4e1d0", + "5a242b9b-5006-420d-bdbe-4ed22ba133cd", + "5a4dc79c-7d87-41f7-aa55-106665291bb8", + "5a521ae4-f17f-4d2c-a350-e52f700f04c4", + "5a667e5f-49f9-455f-8daf-57ef5b00a325", + "5a8c96c9-d88a-42f1-ae6b-453230ea1bef", + "5aa5214f-a712-4c7f-958a-9242977852b4", + "5acc92f6-1eee-417c-96eb-3927df18bfa2", + "5aeb9975-5b99-4f04-9512-5f16b4c08deb", + "5b796118-7fa8-4b08-a9eb-4c0852a28113", + "5bd30697-7bd4-4374-bbb7-5625b8bcb422", + "5c49b8ab-c4be-4723-aa83-835288df26ea", + "5c5327bf-9bda-472b-a85f-b2477d752c88", + "5cdb4c13-2105-4fa0-ae14-905b6348b88d", + "5cfc5783-c20a-44c8-96f3-83eef3d8e5df", + "5d0638e5-7a70-462c-83a7-d22c4bcabef9", + "5d5982f6-af39-472e-b9f6-b1f66ec5319e", + "5d614803-47cb-4e0a-bc16-f2390b1406a3", + "5d732b7c-d12f-4250-b2b7-2e687f6f2a11", + "5d88433b-8a5a-4d2e-a19f-0b6562bf835d", + "5d93037a-b7f1-4021-90dc-48c9d909645e", + "5df3a6c9-0050-4abd-a163-5b092f375929", + "5e38e2b3-8430-4ab7-9422-18a5dcea63e3", + "5ec733fe-e590-42b2-aea3-61f234941acc", + "5f21fd27-86c5-4a3e-acbd-790655ffb376", + "5f3ce958-a210-4029-a2c0-5ef9cf76d2b7", + "5f568227-40c4-47f2-b9e1-93dc0220d31d", + "602f93ae-64a4-46c6-9377-9a7c380d82ed", + "606eaa10-e1e9-4c63-9f65-c42823fa1516", + "60b33188-df88-4181-894e-737a314ce694", + "60c8eae1-8b16-46d3-b377-c1a3ef837368", + "60f45081-92cb-423d-995a-dde0d0044d45", + "610e2326-9d07-43ee-9fa4-0fc7b86744a5", + "616993a3-b537-4571-848c-988c8fde4346", + "616a631f-f49e-496e-adf9-391d0100aaa9", + "6175d12b-87be-4f9d-a8d3-7d5aeac46a06", + "61aca4ee-2733-4752-8583-dd8bd7c55df5", + "61c3fe3e-9286-4226-a298-50d0db49b34c", + "61ed94ef-48af-407c-96ed-8acd6ffc5e1d", + "61f22bcf-ac20-46d2-be37-a3535f16077e", + "622263db-ce91-4f04-b565-8060ebacb323", + "6230f75c-fc5e-47e1-b05a-dc5f0ba26402", + "62519925-251f-422f-928a-9f630c434483", + "625daf86-0dda-427c-b6a1-6a3e981da6d0", + "6286c3f5-e240-438e-b249-9a160bd72ae4", + "62a9f3f3-5517-467d-8fe5-b8b56b7b917c", + 
"62eb56e8-83c4-480e-b6fb-ccdd7bc13bf8", + "62fd2104-90d7-4283-8055-f45f9447aa3f", + "63106629-9d6e-440e-9eb5-5bdf082b397d", + "63ccdf96-efb9-4f3d-bad8-741e4a3c9606", + "63fba9ea-b11d-4d6e-a482-44f9a04873c0", + "645a380b-66fd-444e-b69f-ff2e2e1e5b34", + "64ee5917-7a0e-414b-8240-d7bdeb33a28e", + "650abb7d-2c86-4555-a625-2b32198ae52b", + "652c920d-a71f-4b8b-bfad-20ee0e26e5f9", + "65518c9d-20a6-40ce-bff6-e8a00943094c", + "6563e622-51a4-4194-aebc-e51dce5659df", + "65673eeb-e8ee-418b-bd9e-44c00559d57c", + "65a0e95f-f6ab-434a-8946-75f5f5ac2d36", + "65ee7840-ca56-4746-9df2-feefa667b850", + "661edf05-5e99-47e9-814e-204fddab2ab9", + "66293f82-fec1-43f3-b9e3-e55170bb16af", + "6668a518-e8f2-4265-8e25-1be92405ff16", + "6684ee2a-0ba1-4f0c-a563-5a25b105932e", + "66e4953d-838c-4512-80fc-d0f509462d4c", + "6702c52c-17b4-4f8e-abb9-5a81cc7f5091", + "6705a01e-96f6-492e-82de-18b39a84219d", + "673c49ff-dd5e-4322-aa46-d5f5a7fd5e53", + "674a470f-ab1a-41a4-b0bc-c85d13446b07", + "674c82e9-38c8-4c5e-8bc5-99dc61c19535", + "676e5a8a-5d4f-435a-8703-e432e19d4bd5", + "67b87782-aa9d-49a3-8fcd-da3e74b70cab", + "680177c6-e0df-4252-874f-55f75dec03f6", + "6807f074-3a2d-4f87-bd44-8d61628882a6", + "6827a002-04e6-433a-84ac-e1da97048753", + "68382b22-a5fb-41b1-a059-e1880378fb1c", + "685f4643-cea7-4b65-8fbf-16a6173f1c53", + "688f9e4c-abc0-45f7-9b99-bd2892d43cea", + "692b5e88-a6a5-48cd-94f6-f1f98230df39", + "6946c251-15ad-4cb1-af97-dbf1d5713f85", + "69688847-eb44-439e-b454-23004cf3521a", + "69d83009-a4fb-476d-a18f-451c3eb4ed8b", + "6abaa5bb-4373-4861-bb47-3bb9755938a9", + "6ac76a89-0fad-405f-ad22-905561a566d5", + "6acde2cb-07cb-4b4c-82e5-1ea6df30bcdd", + "6ad178b1-0c3f-4daf-90b7-b0afd30ed529", + "6ae1e45b-1d28-467a-81db-f81d6419f094", + "6ae79fe7-679d-4129-abbf-39585fdd0295", + "6af8730f-a8ff-4043-9bda-98ddefbb5044", + "6ba82f0f-c060-4d83-861c-4b13a5df84d5", + "6bbf100b-c6f3-479b-9335-d4839ca536ff", + "6bfb3e6b-c3a4-4e87-9ec7-7c56f2072eeb", + "6c0a656c-be06-4533-9ceb-0f2db6da234b", + 
"6c3f844b-76d4-4850-959e-c42da5d62d62", + "6c4b9daa-8944-47c0-9659-9b4418a16d10", + "6c8ecfe2-c111-4ecb-b77f-02696fc99aaa", + "6c9f002e-48e4-4969-ab33-2a383e697a2b", + "6cc627c8-6e63-4066-8a42-73bd4660a28c", + "6cd23130-3e16-4a9e-b914-d5ce8f0c9e1b", + "6ce04115-062b-4fd9-9ef9-e5a4ffeb033d", + "6d585815-39e0-4374-8425-a58e99e84355", + "6d79373d-b62d-46da-9779-25d4f4ead163", + "6dc6bb4a-2466-4b66-9ad7-280aa553f6d8", + "6e1ec8d2-a21a-4f14-b557-1f8b7d334b97", + "6e1fdbcc-0f5c-43a0-a2ff-a789b6e92455", + "6e2c4431-3817-4a11-b048-550727ab6b14", + "6e9aaf29-6800-4bb5-ad09-0f7ecac5073b", + "6eee07ff-cb60-477c-87b9-9dea6a9a4db1", + "6f019912-237f-4262-9da4-2eaa029f983a", + "6ffeff42-1c3d-48bf-b5a7-b3420669e7ac", + "709ac4bc-a81c-407b-8c0d-ad382dfe2a1c", + "70e686ae-342c-40ae-b5e0-d3b148923d85", + "713ee024-f818-4148-aaef-adf9cb35d704", + "714c2a42-c414-46a8-9f06-af81f990e5de", + "716e0b36-3128-42bb-af98-f618a9195f61", + "7178826b-a041-42a7-b3bd-e36c8372d34d", + "717f94d2-2ced-4e10-b4e6-676bcb78ef25", + "71c51cf9-e528-421d-bc67-9892ccb75631", + "71fc7a51-830b-4970-b89a-e8ae050d9b9c", + "7256a0ff-a101-40a6-97d6-82e68e8d51c8", + "7259ccde-d605-4afa-81d7-1ba765cd0eec", + "7267409e-9e6a-49fa-bb35-ca19156d07db", + "728d85d1-a0f3-4c1c-a48e-8d6b53ac5aec", + "729e169d-0a0a-477f-b844-6aef3ba53a0a", + "72b83f02-f9db-4ff2-b00f-7d749eb80df0", + "72ce7dbe-cb29-47c0-92b8-f79a422186a8", + "72dac570-b429-4814-9049-b0af34f5a25e", + "7394296a-c2cd-44a5-b223-525e10d33ca4", + "73bfbcac-377e-47c2-8deb-60dbe289cf35", + "73f11ba2-6878-4654-a9b0-f657a3a2d91c", + "74608e45-9119-4d03-968d-4ace20560a9c", + "747f5e58-cc63-45c0-8a8f-b79477476711", + "74d29d78-8de1-43ca-8d55-345ff6548f0a", + "75982451-d25f-41ef-bf0b-24a98feb8ab1", + "75a4bb66-837b-42e5-b077-18254c6e4794", + "75c9653a-8a18-44fc-a207-9622afa6a239", + "75de7441-2798-41d5-b433-4dc3e48e9193", + "7605ea9e-c79e-4d76-95c9-692276e97fa9", + "7622cba9-6085-4554-9cc1-d4114c4ced9b", + "7676155d-4fb4-4c29-ac11-d772aff9e7ba", + 
"76f3c200-8c07-4e06-ae47-ddac37a194d8", + "773169e1-b25e-403a-961f-fffcfcfb233d", + "779094df-cd8d-4b9d-a782-2a28fb495a1d", + "779eecf7-d7e8-4776-8144-4a89a007b45c", + "7844e0f6-86e8-4a23-aea5-d00e86dc2165", + "7846b522-fb97-4ca0-a018-3151f7db4d8c", + "78556354-a2b8-4f54-9db0-ccd2bc9d65cc", + "78617a6e-f7c1-4f57-854f-79fe7265eaf2", + "78bb1640-fbce-4531-9fbc-277622e8f085", + "78cad2a1-856d-4b45-a647-f6533cea8280", + "78e00e8d-1ae9-4fff-b98d-b5a75603e0cf", + "79347c2f-79a1-4b1a-87e7-4109ea2ea17a", + "796c4802-45af-460d-afb0-1b02d04b524d", + "7a36add6-9ea7-4a4d-a755-caf8ac83f3fb", + "7a3ade82-a062-4fa3-a27d-976b8455f4bc", + "7a70eb58-1285-48e7-be9f-75b90d354cfa", + "7bf890f1-1b98-49cf-b8ec-13ef2b654b35", + "7c09602a-952d-4648-aeaf-af4323667614", + "7c27116f-938d-4fe6-8e2c-3665c9056c2f", + "7c657345-ec7d-4b61-9d2b-c8fae0e2a75c", + "7c69a76c-00fa-4fe4-b6b6-1c79e8c27c06", + "7cb820e9-7d7d-436a-8716-98006f969474", + "7cc8afea-5673-4804-b42a-cfc2cab4e4dd", + "7cd50851-109b-4dba-a63d-f36ae4571c7b", + "7cee2df5-dddb-49d3-aa33-f6eff02ce9df", + "7d0144b6-c6b5-4964-a41b-66d18962cbbf", + "7d021fcc-4a12-4c61-a4d5-003fce2c56d5", + "7d33a32b-e3bd-44f3-961c-a8adfc4cd7f1", + "7dd4e1b8-921f-47fc-b6a3-1d3ebd1e8f7b", + "7defd9f1-24fb-4a9c-a71b-aa11a31de024", + "7e2a6f47-5390-4826-ba8a-9fc6f5a62c2a", + "7e57c388-3094-44b3-9b8f-bdf51a29ae16", + "7e9ae3d0-450a-4439-becb-1fe78312986a", + "7eea97e3-f808-4d16-a0d4-a83d5515eda3", + "7ef53cab-e268-4ed5-8b02-84ac874e5c9b", + "7f2a848d-b987-418d-9d12-224c996849fa", + "7f360625-812a-4ad6-abfe-75d16775421f", + "7ffdc0b3-3763-462a-8106-4893c7c0d22c", + "8016afa2-1f0c-4ba4-ad49-c828c79ed495", + "80599a62-a3f3-460d-b38b-295c4d546a45", + "807c96b2-fb80-4528-b857-1b38fbc33770", + "80973bf4-4795-4c5a-8000-135545f73b7e", + "80cc593a-83bb-4836-bac5-1319b98eab06", + "80e80cd4-3950-4bcc-b23a-fb1ba8d86fd8", + "80fccf92-ba60-4146-8b11-d2b4c8e3c46c", + "8106dbae-f1db-4727-87d2-de31b8ce77c5", + "81319dea-6ef7-4b8b-9c07-cfbffedefb46", + 
"817ef94b-5b1c-4410-9a93-b97264039ef0", + "8193e58f-876b-4b2c-a859-224a2c4ad0fb", + "81a11b09-a5ae-48e1-8b1f-d4f50193c479", + "81b9627d-2ca7-436d-9bec-9bc9e28909ec", + "81e63730-fb39-42e9-9a9f-cbd7f16d4cc0", + "82145116-888a-450f-8bdb-209b3166a3ea", + "822079af-eadb-4a6c-9ad2-c2291c6394eb", + "822c2dd5-fd71-471a-a605-e64d2f129084", + "825a8905-2809-49da-92ba-b29195112770", + "826a053d-e669-444c-b3df-f83fde403bce", + "82a04ccb-b52e-4103-af2a-31c239bb08c5", + "82a1ef46-16a8-45d0-a323-09fc3565f631", + "82d0fdd9-53d5-43fa-b3a9-eaa3f0d4b515", + "82dc722f-31b0-4ece-a9c0-7771d0b8d9ed", + "837e31d5-059e-42d8-95d5-d6e44cba3cc3", + "84097720-5a36-47e0-ab03-17a35b408e9f", + "84a88f0c-aca9-45d8-b6e2-5480afe8a953", + "84b55dc9-5275-4726-a8e4-1469dd70e0a4", + "84b7a1b3-9163-43e6-8cca-54bf799c77aa", + "84ce68ab-a62b-4f16-9b69-17308c98db57", + "84d756f6-678b-4b55-8ad9-1fe7cd4deae1", + "84e4f1f0-d242-42c0-b61d-9fb9df2b1ce5", + "84f1edaf-1df6-4dc8-b6de-385b2b41c7a4", + "8576a196-c4c1-4fc5-9690-b629670f43ac", + "85845d83-b950-4703-8621-39d0fa284cbc", + "86045e04-6bcd-4dea-8037-2acf0463393f", + "86430d28-a635-4445-af97-a89ca56eeeef", + "8649dc07-a676-4793-a227-2571cad766e6", + "8674a30c-1e37-42aa-baaf-57134eb6affb", + "86836057-6be1-4568-9772-149c6076dc64", + "8689d015-4cf9-496b-a88c-92bfca09d18f", + "8780c40f-5312-4c30-b613-38f5e2fa7f31", + "87e4c2ba-11e3-468d-8413-22fdc3004fc6", + "87fb6a75-e736-47a3-84b3-aeaf1bcb2c4b", + "8815cfd5-bb37-4152-a03d-b704c81c43ea", + "8844948b-8d46-446e-99ec-8c6ff33bd1a8", + "884f9bf0-c2e9-42f7-9591-d250037b5a23", + "88b02618-596a-4cd5-b4c2-318ff1860ae0", + "88c01412-db7a-4df1-afe2-df5a5842c719", + "8906a723-0b17-4c27-ab13-abd43b89de14", + "8946e62d-c55d-459c-a48b-cd3e9a81153c", + "894e8858-5cb7-45e8-af46-1baa17f3e884", + "897f99a9-98f6-48dc-ae00-7c2e2db0f3e2", + "8a2c8769-bfd9-4336-ab9d-f8475a47aa2b", + "8a341870-01aa-421a-b858-5f26020c8c4d", + "8a923fdc-0dae-4e35-b542-1a819a672b2e", + "8a9c22f0-c6a0-419d-b0d3-b262853d5290", + 
"8aa854c3-1bfa-45f6-8efb-9ace27256dd0", + "8ac84372-e47e-46b1-95bd-e3ae850c8820", + "8adfdec4-5093-490d-a6d4-0f608c11c30e", + "8b1dcdf1-abc2-4266-947d-7e86565ad7f3", + "8b23337f-01ec-489a-912d-d8166317da28", + "8b7e6e15-a7a5-4c2e-89ce-269f465ec1d1", + "8c3f3c78-dbbd-4c40-b843-96b9ca94813e", + "8c706ca8-f631-4147-a1ac-3fd6105fab61", + "8da45b9f-7e7c-432a-80aa-e57558887704", + "8dee7567-1e56-4231-87bd-ada9699826bf", + "8e7b5c25-099a-49d3-87e2-88ca1257e0d8", + "8e82f178-f848-4b4a-93f0-397894c69026", + "8e83d76f-9c09-4c38-ac6a-c803a9680f1e", + "8e88ff5b-2a6f-47d6-bc79-d69b8ad9377e", + "8ef41e9a-87bf-4d7d-af07-d0d2631b49b7", + "8f4198fa-90d8-457b-b191-d2ce93b6e38f", + "8f8c31b5-c9fb-4fd9-85f1-cff2b4cf72f7", + "8fa195b2-8243-4cad-8a4a-c77a7f8ac9d5", + "900bf85d-f088-42ed-af45-8bee5174311b", + "9038d183-df61-49a3-befb-8883bf4c6a00", + "90499d34-49f0-419a-9d99-02d6e05ddcca", + "9052f482-b888-48d1-87a4-ae2f61220d02", + "90799147-85f8-440c-a39a-852fe2ce9701", + "90947422-a911-4413-bf79-1fc68b6f9619", + "90a05e35-1af0-476e-8076-7ef746f47c6b", + "90a7b739-ebf9-41a7-94f7-8bc2cb0d994a", + "90b7af2f-7e61-4216-a963-ebec5cd2eef2", + "90c2f3e7-a4ab-4b3c-acd6-0fe01958bba7", + "90c4bb57-226b-4b1c-a076-bd1eca9b8879", + "90e0ec52-2f5f-4edf-b315-cd52e999583d", + "90f99f84-c451-494a-997e-6976550cad62", + "9102bcbe-6597-418e-9e06-aa6288e5e100", + "919e9ce2-a163-453f-9f76-040c2d6798fd", + "91a17880-28da-453f-b0b4-903e4ffd0631", + "91d0f84e-f1ec-483f-989d-107f5e7dfc0d", + "92159a8a-1b06-4fbd-b147-3589f038061f", + "9276e6c7-938a-4339-8eff-6e652f3a79c1", + "928fd150-6124-4d39-9889-6fe44872e8c3", + "92a54e0a-dda5-41a4-8138-2c093a5d1693", + "92a87177-ad9e-4c33-abd1-f598462dca01", + "92c27438-e26d-4bb5-96e9-499e0efc1447", + "9308236c-7ae4-4137-b9d3-a80cfa1acc09", + "930cc5cf-7146-4f08-a987-aaa779989cd4", + "93a958d7-4ed6-47d0-9d7a-f91a04482567", + "95145e2c-9477-4bf4-be30-cda4e7974d27", + "9550996a-8edd-431f-b41e-3f8da9816a5b", + "9551107e-5795-4bce-8aaf-98e8c2587ca8", + 
"95586545-9a8f-4e0e-8a8b-f94068adf115", + "955aad3d-283a-4f18-86f6-1fc73b1eae03", + "9565b398-1de7-44ec-90fe-8c51fb94e4a5", + "9594d9e4-9dfb-4624-a9af-025a8a27134d", + "95bc6352-7756-4037-a89b-3e49d90802ba", + "962207fd-b1d8-4662-aada-2b6df0fbc065", + "96249d2e-8c94-43a0-bc49-f214ad44a7ed", + "96807cd8-dcf1-42d8-9106-38e0f4d84628", + "96be83bf-632e-4fa4-b05a-c9fbc6435823", + "96f3aeed-cc86-456f-8e14-c5bd69a94d32", + "96f576d0-b9df-4281-ac47-cba62cd0a432", + "96fc3216-a996-486a-8361-272a78ae989e", + "97289214-c023-4c06-9a5c-b6b1e58ae343", + "9769a9e2-c4f0-4f48-9fce-8c4fb1bb85df", + "97b9edd5-07ca-46a7-b24a-bcef4125db46", + "97d3b836-0d7d-4930-ba52-09961aa35ece", + "98160740-bf18-4851-96a7-e703344932c5", + "9890b588-3cb6-4f9a-a7cc-ca79ae90fb79", + "98ca43b6-7688-4ff5-bb16-65b7704add9c", + "98cee863-78c8-4407-97f4-1137b298dedd", + "993f1d83-f4a5-4c99-bc70-5394c5a63ce8", + "99466d6b-8715-46b5-8d0c-dffedcb2e4c1", + "99724e22-9dc0-460e-87dc-8096cb319a81", + "99d5cade-77ae-45cf-a374-31de1a232c83", + "99f000fc-2289-4b9e-850e-db2f6df0be76", + "9a541bee-6ec7-4bc0-99f1-3fe58e7deb01", + "9a72ceeb-4bab-4365-88e0-1eb96920e7a8", + "9ae5998f-0019-4e68-b402-9e687cc8ce39", + "9b4901f5-bac2-4412-ac25-cf1c18836ede", + "9be6444e-e860-417e-96bf-eaa20d558263", + "9be91f4e-d314-49a2-b434-2cc57e67d34e", + "9c27230c-71fe-49ca-8c53-a27ebcc00889", + "9c4801b8-0b9f-4df1-8728-fa586c268d5c", + "9c7fb76a-581b-418c-b5c9-cd578227a198", + "9c976444-812b-4e99-ae59-3da07969cac5", + "9ceba5c7-90e9-4982-af76-31a5b5b2ffb8", + "9d04c74b-3042-43f5-9ff0-cbce5fb38a53", + "9d177233-ec01-46d6-a34f-7a1b8ce32969", + "9d503c73-712f-4bb1-8385-b29cffcfd375", + "9df0bd9c-a19d-46a5-98c9-0fc149505699", + "9df822c1-657c-443a-bf28-ae90e2f67da1", + "9e5692c0-a357-4c6d-9774-f0eaabaee6e1", + "9e586bfa-c84b-494f-9f4d-66b7d68182ec", + "9e7fab25-f94d-437b-b7c1-bacd5c12150e", + "9f63c967-0fff-4207-952d-fa3af374fc66", + "9fa16d67-ce43-454b-9a69-239de11cc488", + "9fbfd0aa-fd1b-4925-9938-179a5186c3cb", + 
"a00818ae-596e-4f9f-8b3b-3304e5c0b668", + "a01e2502-3a64-4dd8-93c2-f886ea6ff16b", + "a03ad6dd-4982-405c-b955-6e16432a1153", + "a058fed5-9651-4fc3-b588-fa2861a634af", + "a0cfd7eb-daeb-48c6-b804-e024575c4545", + "a0d01951-defd-425c-99ce-ccb10f298462", + "a113b4aa-b818-409c-852e-63935c0135a7", + "a12c7c2e-6fa1-4a6d-8054-1735e4c8c1ea", + "a1ed8d6b-9e69-4033-a5f8-7d8c83786227", + "a24bd8eb-17e5-4b4b-84c1-c52dac519aea", + "a2ff7e1e-da9c-4e94-a3b7-711f011cead1", + "a3643b5f-1cfe-4736-bcd8-fb48afe4201b", + "a3783db6-471a-4787-979f-9ef0893004fc", + "a39493ae-b0b2-4d89-963d-eb2fefa9a378", + "a3d2c86e-9ae2-46f3-a223-1260f4df9922", + "a3d5aea6-c1ab-462a-b19f-b8058381f5df", + "a3e744d8-4a9d-49f7-bccd-fa73d8f634d0", + "a4219bba-ae8b-4f4f-aaf7-fb3922c16db2", + "a42cc5d3-30e6-46e1-8d5b-1d72bdd91902", + "a43f2cca-b7e3-4564-9d03-2947b34ce51b", + "a46d41af-fce6-43ee-9287-031b06e757b9", + "a4c10ca1-3932-4180-979d-0e161452c0b1", + "a4cae512-f8a2-4a36-9453-b5f5390a674d", + "a527301c-6cda-44ba-9322-42a35d91b8a7", + "a54ce093-dc2d-40b1-89c5-c201ee59ce95", + "a5af12c2-76ee-4f7f-a02c-6716fe5e90bb", + "a5d8217b-5a49-406e-a83a-21fb11099b8e", + "a5e950d5-76f4-45e3-a2fc-9d8e5c5149cc", + "a5ef71d1-67e5-4bde-bb9a-b320ed812fbe", + "a6137076-b5ac-4e4c-8ee5-c7c87506d213", + "a6702086-da1b-4040-b7f3-f61c8bc8b97d", + "a6c2fe3d-c20c-4f0c-ad8b-7f01fffe86cd", + "a7072da8-3382-43ac-bac8-7b5b6362d2e6", + "a74953d0-5406-4572-b28b-2fa3a4cc3c58", + "a749730e-2658-4b09-933c-78d45a59329e", + "a79dd5c8-3566-4702-8693-988bcb76875d", + "a7bd6eec-b131-460a-bbc6-ec9f3231a6a2", + "a7bf4e7f-88de-4880-9418-742998e3a21b", + "a7ea8beb-e5fb-4e3d-91db-a50312a8efa0", + "a80aff38-8302-4373-bb06-befe339e930d", + "a81d4844-6791-4fd9-a744-9d34eab9a6da", + "a81dbbc2-1335-42dc-b661-31457386bda2", + "a83a32a6-1f1b-46f1-aeab-312d4b2aa374", + "a84d46b9-3fa9-4121-ab7e-19184d15e9b8", + "a8752596-c9c3-4b28-83d1-cb53c2a8db48", + "a8c84b9b-e4a1-4f5c-a24a-e2bb9f77e0f4", + "a9609ce5-a159-4e71-ab5b-49b96f325077", + 
"a96317ea-0530-452a-8e31-9d8926bafe6b", + "a974c769-da57-48fd-a541-7442e8798ace", + "a9a32930-b13b-45c8-9cac-0f1a9dcd78a7", + "a9fe672f-4204-4360-832e-67620331c47a", + "aa54b6b4-a702-4f55-86ce-e8b82a343e4e", + "aa71773c-01c2-45b8-b8b5-1a5b36542895", + "aaa8974b-9dc7-4504-b59e-270342d8a12b", + "aaf14d8d-4a47-4320-9e89-ed5c2116bde7", + "ab012400-f0c7-455a-8ceb-82841019bb80", + "ab0f3398-b7d4-491b-8133-4b2e8fe5fb0a", + "ab78a084-8495-4def-b686-be8984c7e5eb", + "ab9c49b4-5ba3-4e6a-9d7e-7a5a99bb4efb", + "aba1ade9-deac-4d54-94b4-86ee7eb15cc7", + "ac953e67-6343-4622-917e-492f37ce9dee", + "acd886a4-f48c-4d68-bcc3-2e72f87ed9c7", + "acfc3e42-2898-4ceb-accd-3ab52485add7", + "ad307553-7efa-409e-a8cc-0908e5276ed9", + "ad3de8e8-6fcb-4d30-9fdc-c2fd7d80f06b", + "ad870dc0-3256-4f75-b624-a930b69a29aa", + "ad958b81-6ae4-497e-b14e-cbfe011234aa", + "adb017a1-2c2d-4c49-b5f9-abb79808eb13", + "ae4b4d92-5a4b-4607-8f06-b8554487b9aa", + "ae8ed85e-3b8a-4911-bc50-197887abaf90", + "aed31c1c-cf14-488e-b96d-9af73b99044e", + "aed443b4-ae99-4ead-8b54-3c558654abdf", + "aef2b7f1-2d56-44cc-9e12-da50c1391da2", + "af319651-c1b3-450c-a62e-cbb83382a3f5", + "afebb56b-4601-4d78-b9a8-43c8cec665c5", + "b006b40f-7e6f-4e58-8cb0-cac8fc3df631", + "b058dca1-11f6-492d-bb2b-e19ce5e76e23", + "b05aa5fc-60b1-4504-8d51-a6dc7e3b6a9a", + "b0873b3f-abcf-453a-8e17-cdf5b14f9dae", + "b08c876a-2d29-452e-b888-ce2632098ac7", + "b0e8bfee-88b1-44eb-8642-1418b41596e3", + "b0f899c4-cdf1-4885-8852-f43eadd3a950", + "b109d725-0803-40f0-b56f-093fceaf1358", + "b113de62-02d5-45a1-a70e-8ca82a0ae631", + "b17942f1-f420-4202-8193-02878b3304e5", + "b1ac4f9b-aaf5-4733-88d7-5bf54a167b1f", + "b1b216b6-2d7c-4b0c-82b3-4851f952275f", + "b1f7d2aa-c949-496e-a72d-a58bf5ec6182", + "b2462d95-ea2a-4236-a9e8-bb271e9ea159", + "b2afb1c8-c323-4492-b722-74ba5442235f", + "b2d08922-40b2-48e5-b00f-2ed903b7d4d9", + "b2d694e0-35f9-4cf6-a66a-2dc5f60cd559", + "b2e9c9e0-da9a-4259-a6b8-176be992473a", + "b2edd90c-74c7-48c8-ab0a-8087071fba1e", + 
"b33723ef-13a5-49ba-8aac-f5f73bd2e227", + "b33b87dd-2122-411a-8885-3d0cb3fab88c", + "b364efb4-c26d-4219-91de-066b2a5bead0", + "b3ed8da9-eeaa-4ac4-914d-59400953f0c1", + "b3f3d320-211a-4b66-835b-22b9b5f36173", + "b4006bdb-5685-408f-992a-1ca2e257ce61", + "b483e3da-b38e-474c-88de-2f1625319d3b", + "b53a472e-a47d-4377-aa6d-274c66a1429b", + "b5493020-18f9-4116-9518-ee9174cb0c83", + "b586c110-fc9e-4645-9a5a-c02a1e19449e", + "b5b0646c-c57f-4b3a-a854-a0494deb23ba", + "b5c57e1f-8847-455b-8074-38dd06e6ab5c", + "b6177f82-aa0c-44d2-a857-6efe2329e407", + "b66911de-c416-4d58-b5b4-1c54f479cba9", + "b6a2db27-9e13-432a-9afb-9a2bc49fb328", + "b7662d01-5bd7-4212-9ece-30fee5c949f7", + "b782b049-bb05-40a1-8835-6b1948ea9e05", + "b7adbee3-7396-4f69-8008-48a081883897", + "b80c3cb5-08ae-47c1-a199-3ad639c42c49", + "b87cbc6e-5f99-454a-8888-6a5e2927809a", + "b87d49cd-63e2-48d5-9bd2-73cb83db6735", + "b8ba082b-249c-4149-ba2c-ba2c1ecc501a", + "b8be7b53-1496-4e88-bd44-20c122e52793", + "b8c52004-ec18-440c-a0f2-ab0eb818f145", + "b8fe4afa-f7ca-4a6c-9720-115bf82f6deb", + "b90b39b4-739c-4619-b769-4fcbe69730b7", + "b90e04b9-62b4-4cac-994c-0f337218e1e4", + "b940155e-c038-4127-b6ab-2a5770dd1e5b", + "b9504d84-bd0b-464a-9365-313571898cb2", + "b9546cdb-f6b6-4a74-9701-23710631e0c5", + "b972bdbc-374f-4393-b537-f4ed6581564e", + "b9b4667c-5f06-4edf-a0b7-1b91dda61b19", + "b9e9490d-6ed9-4711-8334-7ad79c7a131a", + "b9f2651e-9539-4741-9d53-83b0927b40f5", + "ba0c1976-a9f5-437e-a28a-97b52267e2e1", + "ba355476-6a8d-483d-ba86-40245b28058f", + "ba40ff6e-f655-4787-87ad-08b10055caeb", + "bac9d757-414a-4d9d-bb53-a81a7e89c262", + "bacc90c2-09ec-4d86-a7d2-cdd6b2f53eb5", + "bb301f0b-608d-4dca-a1f9-d0e71a767f32", + "bb7b89f2-a226-4fac-82b8-0d8b589b9626", + "bb90c9e3-aefb-4293-85d2-9692a0dbb66e", + "bc00cbbb-21be-46e2-af8a-0ac0ec7624f0", + "bc09677b-81ba-4f7a-8a82-13bbfaee699c", + "bc7c47e4-43b0-4445-a27d-ecf8a0fd8555", + "bccf76d7-e153-4711-890d-c6808e94f882", + "bceae63e-b310-4fff-bde1-d3bfae1a01cd", + 
"bcfa0355-3e0b-4e41-b98b-e023ca86d43c", + "bd2481f9-57fd-44ac-9299-89303b57fb47", + "bd257850-b8d8-4419-8205-cb4011982da1", + "bd99e7da-24fc-49b2-85be-a73f22c73e6e", + "bdafac91-0d75-4344-a04a-0f34d132b2c7", + "bdb79c32-c77a-43f2-948d-eefb2a6b77e3", + "bdb842a3-8341-4c62-be64-58a684403df3", + "bdc04d22-423c-4504-b7f1-d56927a886e1", + "be18435e-0c08-443b-b05a-37202e55ddf5", + "be379dfc-6a2b-4150-9d85-0e51a6090159", + "becda35c-d05f-46de-9493-aba283bba66f", + "beddf08c-bdf9-4130-9336-2ecbc6eeb30c", + "bee6b5ae-7726-4cec-8a6e-0bf85c86ce4a", + "beebada8-c735-434b-a5ed-ed5aa524466e", + "bf14fc17-5226-4605-bad2-886f6a337573", + "bf365b99-5f72-4a45-aabd-5fd3f4fd4d02", + "bfc7ed87-c0d9-453d-8c13-75f47fba48a7", + "bfce66bf-0235-4d99-9bae-ae88ab01720a", + "bff60e93-88c7-40a2-912a-62a0cff1e530", + "c02b7e6b-caf6-4e76-b6cc-07f09475b195", + "c07fd453-85e4-481b-bab0-f90847d86c0b", + "c09c309b-d146-4411-a5a1-d757f77c3d83", + "c0b83aab-2117-41f3-ba47-7fdf7c7635d3", + "c0b97e72-f59b-4909-92ef-eddb648a7338", + "c0e2c7c2-eac9-4762-8486-00d2e70b3314", + "c10f2a54-e1a4-4b45-9c06-487e1e771962", + "c14be38e-e882-4bca-b6dd-d95ffb38112d", + "c2214d7c-e032-4f5d-a169-a6f3e6691fea", + "c226c652-3c64-4119-a34e-304d433522a6", + "c2649f6a-ef1a-4cef-8644-32e520988953", + "c2a4cb57-0585-4dc9-a4af-178c82b99df8", + "c2e27c96-c6de-452e-9267-3b14f3071fe3", + "c3301a50-127d-40ea-bdf8-7fe6163b3f64", + "c34fb56c-9190-4e93-8e75-c322dbb563ae", + "c3525305-b28b-4f2b-af73-9e8ae52e2221", + "c399bd92-537e-4acf-a4eb-6eb1d98346e2", + "c3aede75-9013-463c-9561-f27c66b5354d", + "c442d966-32f3-4e50-b0c7-a0ea006f2aa6", + "c4b8e879-fd79-48de-bbd2-b06f3d50a196", + "c57b25ce-ef5d-4d33-a32e-ace50573f601", + "c5a72602-25c5-475b-a8ed-b983d1da5219", + "c5df588a-4c58-479b-b258-2de50988e150", + "c67211ad-6a9f-4f34-af12-0fa39a9d32c1", + "c67cab9e-ba4e-4067-96a5-cb63666cdca2", + "c690bb40-d309-4c46-aed1-23fc0aa9f6a5", + "c6a1ee8d-9155-453f-b5c5-54aceb97ad93", + "c71739a9-02bb-43cb-b6b7-1f9603ce71ef", + 
"c736a87e-aa88-4ba0-9a96-5ddc35ebe6e9", + "c75bcdbe-e377-4199-a677-d4076e54268b", + "c76ef227-027b-4c81-ba42-58cbd24ef8b8", + "c7948d23-36a0-422d-89ea-f005b433cda0", + "c79c1907-c40c-4e89-bb42-3b005f603d72", + "c8aa381e-b0e7-4f14-a1dc-466080715c25", + "c8da897b-2bf4-42e6-a044-699d883a443b", + "c8ea51e9-50bd-4ec3-86f3-18fd4abfd380", + "c9242275-42e8-4734-93d2-4075527c52c6", + "c9867121-f0ad-4d25-9315-e88821ba7bf3", + "ca77908e-4a83-411a-b7cb-1f9f30527889", + "ca827347-7f2f-4689-b48e-b2df0c968ec2", + "cabd37d5-d786-48f8-8e33-173b32e574e4", + "caedddce-5902-4c38-865b-bba97070acfd", + "cb0b9981-0578-4c5c-9af3-fe55b6d9997a", + "cb38f06a-3a65-4ea2-aec6-0a3f01eb1792", + "cb518f47-d383-4fe5-b3cd-635c8e6a2939", + "cb743ab0-6d3b-42a7-b9e7-c143e3984d70", + "cbd52b23-8a09-4448-8a4d-443377a6821a", + "cc144adb-dfbf-4f73-8d7b-2244be6c9037", + "cc82eb9f-fd4a-446d-8412-ed4418edb0a4", + "cd0b3588-f057-4bdb-8cdc-943ab2dcc0c2", + "cd0d67ef-483f-4251-b324-a1464f0a5caf", + "cd73e5fc-bb05-401a-80f4-5d97c58463ab", + "cdd11f1a-fec4-451d-8290-ff7780ec53c2", + "cde042e7-72a2-4f9c-8602-3f57fc4b19e3", + "ce1e8fe4-ed69-4919-a213-24a56f67c4a0", + "ceb95454-91d0-4260-af0b-9615927f20c1", + "ced01c22-0398-46a9-b35c-199df27caad0", + "cf825f05-9826-4d89-ad6e-9195c7a5a6f0", + "cf8f0457-5f48-4848-b1d6-dc3317de9fbe", + "d05a22d8-3454-48d3-b9d2-1e3c212cb5e0", + "d0947ac3-0730-488e-94eb-e67d8ad49e46", + "d0fb4bf3-afba-4491-a5f0-15793545c583", + "d11e3134-d9eb-415f-923a-2f75ef246315", + "d141e5b5-2144-46b2-a04b-e1b0512e220f", + "d181cca2-826d-46ae-944e-cf873da41790", + "d19a3712-7784-42cb-9f49-d1b984c57e77", + "d19a9fcd-f9c4-4c1e-b86f-7c659d2eb2ce", + "d1b2e5cf-dde2-45c7-b598-a2acde491b47", + "d1c5b5fb-9ec8-41c4-acc8-aa3f94fdf3f7", + "d1ec2af8-064d-4cb2-af5c-1ec577454f94", + "d20e87dc-110f-4300-8ad3-189d3f43a8e5", + "d244829b-dacf-4eef-bfbc-7eccfe9c99d9", + "d30b82d8-fed0-4850-9cc9-1ee98b0beed6", + "d338efe5-0965-40f5-9738-b0c6772ad347", + "d3729625-6b9e-4e09-8473-6e80e71f0cc6", + 
"d3ed6a69-c878-4b83-874d-5561975e353d", + "d43e8168-6722-4704-85ef-cb0d1b828211", + "d4534ef4-00b8-4beb-a1ed-14996bbe4ade", + "d47d2cd2-ccf1-4ea2-9160-e7d7c3dbb45e", + "d4af55bf-a1c6-4f78-b62e-9ba2b1ac42f3", + "d4afe916-f0d0-4fc6-b937-ec58b1fc9cd2", + "d4e8fe10-3d0a-4f3f-81d5-bb0693021407", + "d5989759-471e-46c7-bcde-c217782a5d62", + "d5dc17c2-b997-42b6-8349-4a3397ee1788", + "d63f18e4-7d2c-4657-b1dd-2ee8b3abb3ac", + "d6701bf5-7687-4c32-a098-a83815bb9303", + "d6860afb-be7c-43f9-9b7c-db3f494f5e76", + "d68d7969-d9cb-47aa-a430-caafa1ee77c4", + "d6c963ae-6c2b-484e-afd0-6aa3116b804c", + "d76a676e-2b4d-4adf-b57c-08bb0322f2b1", + "d76df2df-84c8-4bde-a1fe-c55e4aa52bbd", + "d772cf84-d72d-4bdf-9661-48e094a4b60d", + "d7791f6e-00ae-49e0-8983-d8921887e3a4", + "d78797d2-a9f7-4f55-a5ed-c058d8c4c1c3", + "d7a74414-85c5-452c-9b6c-fa0e5587029b", + "d7c8655f-8382-4aa1-96d1-3d7c7d2502d8", + "d7d20e72-6b92-473b-aaa1-9cff198b956d", + "d80a3b97-f0b8-4a3a-b0aa-1a889beef77b", + "d81e2764-9108-455a-8e9d-b42ca1cc3ab8", + "d844fb6c-c2ae-4beb-8da2-e01a3bd4bd8e", + "d8664c7f-042e-4330-b009-d68c95dac3cf", + "d880e79a-dd2f-4e13-abba-d24842928b78", + "d8b602f6-77c5-43c8-aaff-07db32b73887", + "d8cd7f87-d138-4d9d-95fb-c011256f7fbf", + "d948b0ce-67a2-4862-89a4-ef6b492098da", + "d992026c-3d2f-4d47-934f-640ae2b442ce", + "d99c4022-edf3-402b-8ccd-e2d82930b26e", + "d9fa7ca5-e87f-49ed-bb5d-c4512da94028", + "da68dda8-df22-4f21-90cc-79b184766d0a", + "da834fa4-68a5-4db9-8ffe-eacfec8f9427", + "dadb2fbb-c253-4fc0-b560-ac8c0b95067b", + "dae9a793-a402-455d-ac09-d442455af480", + "db74f26d-f0d7-42e4-9691-b5d774cc5bb2", + "db92a416-e884-4581-b239-75eff07292c8", + "dbbc5d9c-f77a-4d80-9e9d-1c404d761237", + "dbda5827-9a55-49ca-b6c6-9c5023cff945", + "dc1217f8-04dd-4eb7-94cf-f39c225e84cf", + "dc75dfcf-35cb-495a-a596-c76a8f63f243", + "dc8a2798-1068-4096-a615-7cfb08d42ee5", + "dccf9d9f-0dd4-44e5-9f4c-0357e3d22a40", + "dd2ab3b1-0742-41b1-b208-72a4edc1096f", + "de6db0b2-f002-41ad-9d60-91eb57303125", + 
"de72a755-f529-48bb-ae8c-0511961b7e7e", + "de8195a4-f338-4846-b4af-1317e15ed154", + "de8e0c5c-e260-4abd-a46a-f8004546421b", + "de9f7325-2ddd-4107-8de7-f402ff98172c", + "df77a03c-0658-4deb-b59b-ad0fa3daf11c", + "df962ba1-e68f-4d35-bed4-c44f3eb3eea0", + "e02c5a28-986a-4af2-96fc-72d44fd04a5b", + "e0479f62-c0a7-4bd6-bb6b-8e8731e49807", + "e0798c7e-fe8d-4941-a1eb-cc9d290ba698", + "e07e55ea-6e58-4af1-9f3f-6f882b022036", + "e0911c7c-5831-4ab3-b393-64b1added846", + "e150a440-53bc-4b3e-bf9b-f4ce4d2b547d", + "e1960124-a190-43bd-b194-611c36bdeb29", + "e19dfeb3-5e1d-46db-aa7a-97c249a40e9d", + "e2060fcb-7c72-4b83-bba1-55ed5f311719", + "e222b757-f3f7-4445-ba21-0c1880ed0d31", + "e224bfe3-51e1-4114-8c54-94ae813bd74a", + "e22a19ff-37f8-4d34-8f1d-c0c8960a7495", + "e2529c91-4e67-4c0a-a1ee-4190674977ba", + "e29762d8-4c69-4bc6-b59a-2db0058270f6", + "e328f630-0f1d-49c1-8eae-0c71c0c92815", + "e35b502e-e2b0-4425-82ff-5fb25d1546cd", + "e37345d3-54cf-4ca9-904d-e8d8dc2764a0", + "e3b703a7-d8f3-4268-8b8a-d3f8aea1e8bd", + "e40563c3-99eb-4ff4-9b63-ecd3d0b778a4", + "e435bb11-f619-4ac2-8252-03adb0ccc13b", + "e46153e5-570f-445b-ace2-d205690a2df1", + "e4694324-4ba9-4615-b1be-ce19c9d9c504", + "e4bd0e2f-f0ef-4a42-8665-95b43c917296", + "e4d1f954-086e-4199-bf31-dfb4d267bd1f", + "e4d4cb20-b297-4bc8-83cb-f7fc99d72bad", + "e53b0593-c6a8-4606-8549-6771e048ffe9", + "e5761211-edba-441b-a8fd-a6aeaf8a7fea", + "e590a496-0d98-4305-82d2-01f439ad34d6", + "e5ac9d7d-3148-4036-8e7b-6ee7b5e2c672", + "e5c441cf-99a8-4d66-94f6-d96c69440b9f", + "e5dc3022-8f08-4f08-933a-8f6b04e7cce1", + "e5f5962e-1323-40c7-a89a-0bd9d2205253", + "e60a0e4a-e472-4397-a373-2b5e476330a9", + "e65eb295-c06d-4378-996c-e8f3adf43a38", + "e664133c-023c-4a21-aed3-a89da3840c34", + "e71d21e9-d65c-4b77-908c-1966deae97d0", + "e7af9370-d070-4a1a-a746-33b2596cdc93", + "e7baf7db-1661-4067-b5de-09a3f3faa6ec", + "e7d16ef0-7dbf-494b-81d7-a0c1b63df61f", + "e7e3afe3-50d3-4458-8e69-379827a8ab2e", + "e80e1ddd-3377-4ab9-a4ed-1886dcd8af44", + 
"e851ad91-6b76-4b16-b21e-ca8bc06b7efc", + "e85cd79e-af6c-44d4-9c95-78037975225e", + "e8a0896d-c301-479e-a545-fa24f49e0d88", + "e8c6b15d-32dd-4b5a-9789-3a1982b048ca", + "e8d4dfa9-c5fe-4f2c-93aa-18d13a90ae45", + "e8ebd93a-2f1d-463f-abf4-700b2c980692", + "e8ff4c45-6603-4595-9258-c88a6e4188f8", + "e91d954f-0884-4a28-9d86-cb3f63781e58", + "e92582fd-c7b9-4a71-8447-b30d9f2100a2", + "e9449ccc-f538-466f-ba5e-5cfddeea25f6", + "e98ada24-b233-4f42-ba41-dd30f474eaa7", + "e9b3f103-3fe1-44ab-8f12-414c226a484c", + "e9bd80ff-82c5-4c17-835d-652f218e8762", + "e9cecee8-2df4-4790-9f21-17274cb23e1f", + "e9ea2148-1a15-4ea0-838f-625a2d5ee0c1", + "ea47d1c2-afc2-471b-9c08-28bc76ab9bc4", + "ea5d70b7-06f4-4cce-a04a-55ebe03ef76d", + "eacf9788-571e-4b2e-9af0-bdc6a88be68f", + "eaec187b-a870-46f7-990e-b05583ea4076", + "eb1f8fd3-4cc3-4038-881b-255c09395850", + "eb415ee3-3f78-4809-810b-f37918549d56", + "ebf0e287-670b-4fd6-ac3b-178fbd53c644", + "ec1acdf2-78a7-46dc-b629-823caa9a04cf", + "ec2124c4-d62e-45a7-b270-fefc06d2db74", + "ec36614d-f6d7-4606-86e2-a03e982577e0", + "ec66215d-8770-452f-9ca5-28427140d34d", + "eca862bc-ab8c-470c-a997-c6bf35a4eff5", + "ecbbcfb0-6c80-42a0-81ef-fab4cfe3bbb0", + "eccf3587-ab80-4f8d-b231-b36444c9d85f", + "ecf94710-c9df-4afa-8506-7d017211d7fa", + "ed9f7f44-fa1f-4ee1-96b6-93aafefc3202", + "ee2082ce-6877-4dcd-bdee-c392e6830ef4", + "ee39224b-77e8-4d17-b8e8-c8d76a3722ae", + "ee505036-3c21-40f9-b895-ac4c28507309", + "ee9c9482-9b2e-4541-91ca-c12d25fdf240", + "eeda524e-e08e-4954-b1e8-894212b7a87d", + "ef3e8012-b311-4b23-ad71-3b7ade623102", + "ef4b317c-3416-4c6c-a980-3690211bf9a1", + "ef7243d3-252c-4f79-928c-82c73bf3ed91", + "ef847696-4000-4a1f-9156-187b486c5e9e", + "f026f2be-6229-4dac-944c-596b02e693ad", + "f0b90c24-2880-4511-b638-3b6529768f58", + "f0f7e779-b5a1-4c27-80ba-6cba91aa5305", + "f11e2b8a-19ec-43d3-aec3-dab270bfcaf2", + "f1338708-1c99-4e11-9e8d-0fd117c66e83", + "f156f30d-b08d-433b-99e5-f9f51d703c14", + "f16c67ff-2d2c-4527-a91a-ef003cf4eb05", + 
"f18141cf-d339-4240-a3a8-a1d09154f0a8", + "f1ad0a7b-12c4-477d-9a0c-16f6e5428c33", + "f1b32c63-d9af-4146-a957-5e868fff3e54", + "f1e531b6-45ea-4a74-aafa-bd390b2d83e3", + "f22d5b0c-c75c-4dbb-8db0-cd5f3b77bfc8", + "f280485e-51b0-4986-9c1b-e45746c573cf", + "f2be0710-9b9a-413c-a8d1-21578aded0ab", + "f2c4614a-a2b6-40f8-88c6-84b36b7841f9", + "f2cae539-ca17-4130-a300-cdad850cb254", + "f2e477db-4366-42c6-ace1-939b824ce2ac", + "f2f07023-6a9b-49d7-9e7d-2f963bd045c8", + "f30c44c4-fa20-4bc1-8356-b14d0dcdf8da", + "f322b45a-42b9-4aca-aee1-0e569b54231a", + "f32d955c-2b59-4ba9-910d-9fb4450b2c41", + "f36a1d13-e5b3-48cc-b488-9f71d55c98c9", + "f3981230-f843-497d-9f44-c0cd0bbf66a9", + "f3aa0622-6cf6-4ea7-9998-fdf5be71997f", + "f3da11d8-14d8-4f7a-81cd-844dc74be917", + "f3db92c2-bec7-42dc-9559-3499a1245c8c", + "f3df3650-f616-40ac-8208-e54b3e309b82", + "f3ea3432-5958-4da0-8b88-88433b831877", + "f3eca696-f6e5-4ef8-9572-19b241e51293", + "f41efaa0-c68d-4a71-863e-5967e9feb1eb", + "f4239653-a46f-4fb6-b70f-f44b48a9d270", + "f46290c4-9b6a-4304-8c3e-57703d556f01", + "f4656a7e-a880-4521-8d5c-bce8adb795f5", + "f46ea2af-962b-4f55-b34e-1eb64bb9b705", + "f4c19448-715d-40ea-b92c-6462772bdf89", + "f549fd64-fc55-4029-bf9f-64b186f01299", + "f579c1bb-1d4d-466c-a13c-f8fdf5f88974", + "f5d4c379-0c77-479f-b86d-3e675c0361be", + "f66b04fe-4247-4a30-8ddd-9d19c53c91b2", + "f717907a-0119-4b22-9c09-e3fa554e18fb", + "f7364aed-8735-49f9-b7df-3f936dd85435", + "f73921b8-5ea9-48f6-8c88-2d63f4faa05e", + "f7514ae1-79f8-46a4-8902-5e6b43277f95", + "f759b9fc-5fb2-4bb9-829b-cd193b240bc0", + "f75ac78a-3be4-4f63-a650-3ba65f1fa138", + "f80cafc4-6f0c-40b6-a773-7353edde3c88", + "f842011c-114e-4c2e-9998-b6ac0d017dca", + "f86b677f-0338-4e47-a484-aba9fe4927a8", + "f89b1b5f-1807-44a7-a434-db0fe437ee9f", + "f8edf775-fe03-482f-9493-9c9141105f5a", + "f96fc799-dc3a-4ef2-9d20-c806ef4ea694", + "f99eaead-55b4-41b2-82f4-e3ab0e43555c", + "fa29e5c8-db3b-43b3-9d51-c1c36c7c944c", + "fa5d826d-487b-4c90-b01b-9e2aa71523f8", + 
"fa99ef96-ff49-457c-af4d-29c3c01fd973", + "faaef33b-90eb-4f67-a836-d2b93ce0889e", + "fabc7808-1764-4a79-ad0c-c1a95f3058e7", + "fb1bb73c-e1b0-4170-8510-a244e016ce2f", + "fb2068f1-f10c-49f4-9593-57ec723d5543", + "fb43034b-ffb6-40c0-b2f2-078a3f10b308", + "fb60bb03-9246-4163-8e39-534ed903f954", + "fb61a4d7-0a83-4429-a7ba-59143016484e", + "fc06fb45-9efe-49ee-ac87-b61582f13edd", + "fc08b118-d11b-4c50-82d0-05751236c035", + "fc3b3308-6811-4799-beb1-bc3a0ee20c0e", + "fc3dcc25-2607-462c-a232-6d46a3bd9055", + "fc55a473-8b19-4cce-9325-ee56d5481b0a", + "fca0e578-ce80-4193-a0ba-c4e78d1f60a7", + "fcacbb35-ea8f-4781-9822-165eead2a3ed", + "fcc65c5e-874b-42ed-bba6-69dfe6c4571a", + "fcf4f2bb-3ec9-4b51-a0a6-12db6aa11472", + "fcf8dff7-a782-4208-9597-2b91fc04d18c", + "fd38f4ed-c749-4187-9650-8db39902551c", + "fd819411-41f2-49e8-af6d-afc8f8e73ebf", + "fd848475-d61b-4708-b855-6b2e7675638b", + "fdc4f628-08bf-4c13-836d-22fd57587371", + "fdc58e76-f293-48a9-ace3-3dbd4fa0e363", + "fe0bfe8c-685e-4d5b-8710-d35ecfa04161", + "fe1bb7fd-054d-4db7-bb9a-797113b1afe8", + "fe1e4214-23a5-4935-979f-2fe3ee42693c", + "fe38e111-5e1d-4d20-b2e6-f01985643dbe", + "fe3d318e-2c46-4dab-aab2-1e478e38a79f", + "fee1d24b-9bca-4ce6-8956-e82debf94a38", + "ff3b17ba-4a82-4263-81a8-8762e3f961bd" +] \ No newline at end of file diff --git a/labelapp/package.json b/labelapp/package.json index a1ed1bf..8db81d7 100644 --- a/labelapp/package.json +++ b/labelapp/package.json @@ -16,6 +16,7 @@ "sample": "bun run scripts/sample.ts", "assign": "bun run scripts/assign.ts", "export": "bun run scripts/export.ts", + "dump": "bun run scripts/dump-all.ts", "test": "bun test app/ lib/ && playwright test", "test:api": "bun test app/ lib/", "test:e2e": "playwright test", diff --git a/labelapp/scripts/dump-all.ts b/labelapp/scripts/dump-all.ts new file mode 100644 index 0000000..3de0891 --- /dev/null +++ b/labelapp/scripts/dump-all.ts @@ -0,0 +1,597 @@ +/** + * Comprehensive data dump from the labelapp database. 
+ * + * Exports: + * data/gold/human-labels-raw.jsonl — every individual label with timing + * data/gold/paragraphs-holdout.jsonl — paragraph metadata for the 1,200 holdout + * data/gold/annotators.json — annotator profiles + onboarding timestamps + * data/gold/quiz-sessions.jsonl — all quiz attempts + * data/gold/metrics.json — comprehensive IRR: per-dimension alpha/kappa, pairwise matrices, per-category, per-stratum + */ + +process.env.DATABASE_URL ??= + "postgresql://sec_cybert:sec_cybert@10.1.10.10:5432/sec_cybert"; + +import { writeFile, mkdir } from "node:fs/promises"; +import { existsSync } from "node:fs"; +import { db } from "../db"; +import * as schema from "../db/schema"; +import { + cohensKappa, + krippendorffsAlpha, + agreementRate, + perCategoryAgreement, +} from "../lib/metrics"; + +const OUT_DIR = "/home/joey/Documents/sec-cyBERT/data/gold"; + +const CATEGORIES = [ + "Board Governance", + "Management Role", + "Risk Management Process", + "Third-Party Risk", + "Incident Disclosure", + "Strategy Integration", + "None/Other", +]; + +function toJSONL(records: object[]): string { + return records.map((r) => JSON.stringify(r)).join("\n") + "\n"; +} + +async function main() { + if (!existsSync(OUT_DIR)) await mkdir(OUT_DIR, { recursive: true }); + + // ── Load everything ── + console.log("Loading all data from database..."); + const [allLabels, allAnnotators, allParagraphs, allQuizSessions, allAdjudications] = + await Promise.all([ + db.select().from(schema.humanLabels), + db.select().from(schema.annotators), + db.select().from(schema.paragraphs), + db.select().from(schema.quizSessions), + db.select().from(schema.adjudications), + ]); + + const nonAdminAnnotators = allAnnotators.filter((a) => a.id !== "admin"); + const annotatorIds = nonAdminAnnotators.map((a) => a.id).sort(); + const annotatorNames = new Map(allAnnotators.map((a) => [a.id, a.displayName])); + + // Filter to non-admin labels only + const labels = allLabels.filter((l) => l.annotatorId !== 
"admin"); + + console.log(` ${labels.length} human labels (non-admin)`); + console.log(` ${allParagraphs.length} paragraphs`); + console.log(` ${nonAdminAnnotators.length} annotators`); + console.log(` ${allQuizSessions.length} quiz sessions`); + console.log(` ${allAdjudications.length} adjudications`); + + // ── 1. Raw labels JSONL ── + console.log("\nExporting raw labels..."); + const rawLabels = labels.map((l) => ({ + paragraphId: l.paragraphId, + annotatorId: l.annotatorId, + annotatorName: annotatorNames.get(l.annotatorId) ?? l.annotatorId, + contentCategory: l.contentCategory, + specificityLevel: l.specificityLevel, + notes: l.notes, + labeledAt: l.labeledAt?.toISOString() ?? null, + sessionId: l.sessionId, + durationMs: l.durationMs, + activeMs: l.activeMs, + })); + await writeFile(`${OUT_DIR}/human-labels-raw.jsonl`, toJSONL(rawLabels)); + console.log(` ${rawLabels.length} labels → human-labels-raw.jsonl`); + + // ── 2. Paragraph metadata JSONL ── + console.log("\nExporting paragraph metadata..."); + const paragraphRecords = allParagraphs.map((p) => ({ + id: p.id, + text: p.text, + wordCount: p.wordCount, + paragraphIndex: p.paragraphIndex, + companyName: p.companyName, + cik: p.cik, + ticker: p.ticker, + filingType: p.filingType, + filingDate: p.filingDate, + fiscalYear: p.fiscalYear, + accessionNumber: p.accessionNumber, + secItem: p.secItem, + stage1Category: p.stage1Category, + stage1Specificity: p.stage1Specificity, + stage1Method: p.stage1Method, + stage1Confidence: p.stage1Confidence, + })); + await writeFile(`${OUT_DIR}/paragraphs-holdout.jsonl`, toJSONL(paragraphRecords)); + console.log(` ${paragraphRecords.length} paragraphs → paragraphs-holdout.jsonl`); + + // ── 3. Annotators JSON ── + console.log("\nExporting annotator profiles..."); + const annotatorProfiles = nonAdminAnnotators.map((a) => ({ + id: a.id, + displayName: a.displayName, + onboardedAt: a.onboardedAt?.toISOString() ?? 
null,
+ }));
+ await writeFile(`${OUT_DIR}/annotators.json`, JSON.stringify(annotatorProfiles, null, 2));
+ console.log(` ${annotatorProfiles.length} annotators → annotators.json`);
+
+ // ── 4. Quiz sessions JSONL ──
+ console.log("\nExporting quiz sessions...");
+ const quizRecords = allQuizSessions.map((q) => ({
+ id: q.id,
+ annotatorId: q.annotatorId,
+ annotatorName: annotatorNames.get(q.annotatorId) ?? q.annotatorId,
+ startedAt: q.startedAt?.toISOString() ?? null,
+ completedAt: q.completedAt?.toISOString() ?? null,
+ passed: q.passed,
+ score: q.score,
+ totalQuestions: q.totalQuestions,
+ answers: q.answers,
+ }));
+ await writeFile(`${OUT_DIR}/quiz-sessions.jsonl`, toJSONL(quizRecords));
+ console.log(` ${quizRecords.length} quiz sessions → quiz-sessions.jsonl`);
+
+ // ── 5. Comprehensive metrics ──
+ console.log("\nComputing metrics...");
+
+ // Group labels by paragraph
+ const byParagraph = new Map<string, typeof labels>();
+ for (const label of labels) {
+ const group = byParagraph.get(label.paragraphId);
+ if (group) group.push(label);
+ else byParagraph.set(label.paragraphId, [label]);
+ }
+
+ // Only paragraphs with 3+ labels
+ const fullyLabeled = new Map<string, typeof labels>();
+ for (const [pid, lbls] of byParagraph) {
+ if (lbls.length >= 3) fullyLabeled.set(pid, lbls);
+ }
+
+ // Paragraphs with 2+ labels (for pairwise)
+ const multiLabeled = new Map<string, typeof labels>();
+ for (const [pid, lbls] of byParagraph) {
+ if (lbls.length >= 2) multiLabeled.set(pid, lbls);
+ }
+
+ const multiLabeledParaIds = [...multiLabeled.keys()];
+
+ // ─── Per-annotator stats ───
+ const perAnnotatorStats = annotatorIds.map((aid) => {
+ const myLabels = labels.filter((l) => l.annotatorId === aid);
+ const activeTimes = myLabels
+ .map((l) => l.activeMs)
+ .filter((t): t is number => t !== null);
+ const wallTimes = myLabels
+ .map((l) => l.durationMs)
+ .filter((t): t is number => t !== null);
+ return {
+ id: aid,
+ name: annotatorNames.get(aid) ??
aid,
+ labelCount: myLabels.length,
+ medianActiveMs: activeTimes.length > 0 ? median(activeTimes) : null,
+ meanActiveMs: activeTimes.length > 0 ? mean(activeTimes) : null,
+ medianDurationMs: wallTimes.length > 0 ? median(wallTimes) : null,
+ meanDurationMs: wallTimes.length > 0 ? mean(wallTimes) : null,
+ totalActiveMs: activeTimes.length > 0 ? sum(activeTimes) : null,
+ totalDurationMs: wallTimes.length > 0 ? sum(wallTimes) : null,
+ labelsWithActiveTime: activeTimes.length,
+ };
+ });
+
+ // ─── Category consensus ───
+ const categoryArrays: string[][] = [];
+ for (const lbls of fullyLabeled.values()) {
+ categoryArrays.push(lbls.map((l) => l.contentCategory));
+ }
+ const categoryConsensusRate = agreementRate(categoryArrays);
+
+ // ─── Specificity consensus ───
+ const specArrays: string[][] = [];
+ for (const lbls of fullyLabeled.values()) {
+ specArrays.push(lbls.map((l) => String(l.specificityLevel)));
+ }
+ const specConsensusRate = agreementRate(specArrays);
+
+ // ─── Both consensus ───
+ const bothArrays: string[][] = [];
+ for (const lbls of fullyLabeled.values()) {
+ bothArrays.push(
+ lbls.map((l) => `${l.contentCategory}|${l.specificityLevel}`),
+ );
+ }
+ const bothConsensusRate = agreementRate(bothArrays);
+
+ // ─── Krippendorff's Alpha: category (nominal data; shared impl uses ordinal distance, not 0/1) ───
+ // We encode categories as integers for alpha computation
+ const catIndex = new Map<string, number>(CATEGORIES.map((c, i) => [c, i + 1]));
+
+ const categoryRatingsMatrix: (number | null)[][] = annotatorIds.map(
+ (annotatorId) =>
+ multiLabeledParaIds.map((paraId) => {
+ const label = multiLabeled
+ .get(paraId)
+ ?.find((l) => l.annotatorId === annotatorId);
+ if (!label) return null;
+ return catIndex.get(label.contentCategory) ??
null; + }), + ); + + // Krippendorff's alpha for category (note: using ordinal distance on nominal data + // — this is conservative; nominal distance would give higher alpha) + const categoryAlpha = + annotatorIds.length >= 2 && multiLabeledParaIds.length > 0 + ? krippendorffsAlpha(categoryRatingsMatrix) + : 0; + + // ─── Krippendorff's Alpha: specificity (ordinal) ─── + const specRatingsMatrix: (number | null)[][] = annotatorIds.map( + (annotatorId) => + multiLabeledParaIds.map((paraId) => { + const label = multiLabeled + .get(paraId) + ?.find((l) => l.annotatorId === annotatorId); + return label?.specificityLevel ?? null; + }), + ); + + const specAlpha = + annotatorIds.length >= 2 && multiLabeledParaIds.length > 0 + ? krippendorffsAlpha(specRatingsMatrix) + : 0; + + // ─── Pairwise Cohen's Kappa — category ─── + const kappaCategory: number[][] = Array.from( + { length: annotatorIds.length }, + () => new Array(annotatorIds.length).fill(0), + ); + const kappaCatDetails: { + a1: string; + a2: string; + kappa: number; + n: number; + }[] = []; + + for (let i = 0; i < annotatorIds.length; i++) { + kappaCategory[i][i] = 1; + for (let j = i + 1; j < annotatorIds.length; j++) { + const a1 = annotatorIds[i]; + const a2 = annotatorIds[j]; + const shared1: string[] = []; + const shared2: string[] = []; + + for (const [, lbls] of multiLabeled) { + const l1 = lbls.find((l) => l.annotatorId === a1); + const l2 = lbls.find((l) => l.annotatorId === a2); + if (l1 && l2) { + shared1.push(l1.contentCategory); + shared2.push(l2.contentCategory); + } + } + + if (shared1.length >= 2) { + const kappa = cohensKappa(shared1, shared2); + kappaCategory[i][j] = kappa; + kappaCategory[j][i] = kappa; + kappaCatDetails.push({ + a1: annotatorNames.get(a1) ?? a1, + a2: annotatorNames.get(a2) ?? 
a2, + kappa, + n: shared1.length, + }); + } + } + } + + // ─── Pairwise Cohen's Kappa — specificity ─── + const kappaSpec: number[][] = Array.from( + { length: annotatorIds.length }, + () => new Array(annotatorIds.length).fill(0), + ); + const kappaSpecDetails: { + a1: string; + a2: string; + kappa: number; + n: number; + }[] = []; + + for (let i = 0; i < annotatorIds.length; i++) { + kappaSpec[i][i] = 1; + for (let j = i + 1; j < annotatorIds.length; j++) { + const a1 = annotatorIds[i]; + const a2 = annotatorIds[j]; + const shared1: string[] = []; + const shared2: string[] = []; + + for (const [, lbls] of multiLabeled) { + const l1 = lbls.find((l) => l.annotatorId === a1); + const l2 = lbls.find((l) => l.annotatorId === a2); + if (l1 && l2) { + shared1.push(String(l1.specificityLevel)); + shared2.push(String(l2.specificityLevel)); + } + } + + if (shared1.length >= 2) { + const kappa = cohensKappa(shared1, shared2); + kappaSpec[i][j] = kappa; + kappaSpec[j][i] = kappa; + kappaSpecDetails.push({ + a1: annotatorNames.get(a1) ?? a1, + a2: annotatorNames.get(a2) ?? 
a2,
+ kappa,
+ n: shared1.length,
+ });
+ }
+ }
+ }
+
+ // ─── Per-category agreement ───
+ const perCategory = perCategoryAgreement(
+ labels.map((l) => ({
+ category: l.contentCategory,
+ annotatorId: l.annotatorId,
+ paragraphId: l.paragraphId,
+ })),
+ CATEGORIES,
+ );
+
+ // ─── Per-stratum agreement (using stage1 data to identify strata) ───
+ const paragraphMeta = new Map<string, (typeof allParagraphs)[number]>(allParagraphs.map((p) => [p.id, p]));
+
+ // Classify each paragraph's stratum based on stage1 data
+ function classifyStratum(pid: string): string {
+ const para = paragraphMeta.get(pid);
+ if (!para) return "unknown";
+ const method = para.stage1Method;
+ const cat = para.stage1Category;
+ const spec = para.stage1Specificity;
+
+ // Check if it was a disputed paragraph based on method
+ if (method === "unresolved") return "unresolved";
+ if (method === "majority") {
+ // Try to identify the dispute type from the category
+ if (cat === "Management Role" || cat === "Risk Management Process")
+ return "mgmt_rmp_split";
+ if (cat === "None/Other" || cat === "Strategy Integration")
+ return "noneother_strategy_split";
+ if (cat === "Board Governance") return "board_mgmt_split";
+ if (spec === 3 || spec === 4) return "spec_34_split";
+ return "majority_other";
+ }
+ if (method === "unanimous") return "unanimous";
+ return "proportional_random";
+ }
+
+ const strataAgreement: Record<string, { total: number; agreed: number }> = {};
+ for (const [pid, lbls] of fullyLabeled) {
+ const stratum = classifyStratum(pid);
+ if (!strataAgreement[stratum]) {
+ strataAgreement[stratum] = { total: 0, agreed: 0 };
+ }
+ strataAgreement[stratum].total++;
+ const allSameCat = lbls.every(
+ (l) => l.contentCategory === lbls[0].contentCategory,
+ );
+ const allSameSpec = lbls.every(
+ (l) => l.specificityLevel === lbls[0].specificityLevel,
+ );
+ if (allSameCat && allSameSpec) strataAgreement[stratum].agreed++;
+ }
+
+ const strataRates: Record<string, { total: number; agreed: number; rate: number }> = {};
+ for (const [stratum, data] of Object.entries(strataAgreement)) {
+ strataRates[stratum] = {
+ ...data,
+
rate: data.total > 0 ? data.agreed / data.total : 0,
+ };
+ }
+
+ // ─── Timing summary ───
+ const allActiveTimes = labels
+ .map((l) => l.activeMs)
+ .filter((t): t is number => t !== null);
+ const allWallTimes = labels
+ .map((l) => l.durationMs)
+ .filter((t): t is number => t !== null);
+
+ // ─── Category distribution ───
+ const categoryDist: Record<string, number> = {};
+ for (const cat of CATEGORIES) categoryDist[cat] = 0;
+ for (const l of labels) {
+ categoryDist[l.contentCategory] =
+ (categoryDist[l.contentCategory] ?? 0) + 1;
+ }
+
+ // ─── Specificity distribution ───
+ const specDist: Record<string, number> = { "1": 0, "2": 0, "3": 0, "4": 0 };
+ for (const l of labels) {
+ specDist[String(l.specificityLevel)] =
+ (specDist[String(l.specificityLevel)] ?? 0) + 1;
+ }
+
+ // ─── Majority label distribution (for fully-labeled paragraphs) ───
+ const majorityCategories: Record<string, number> = {};
+ for (const cat of CATEGORIES) majorityCategories[cat] = 0;
+
+ for (const lbls of fullyLabeled.values()) {
+ const catCounts = new Map<string, number>();
+ for (const l of lbls) {
+ catCounts.set(l.contentCategory, (catCounts.get(l.contentCategory) ?? 0) + 1);
+ }
+ let maxCount = 0;
+ let majorCat = "";
+ for (const [cat, count] of catCounts) {
+ if (count > maxCount) {
+ maxCount = count;
+ majorCat = cat;
+ }
+ }
+ if (majorCat) majorityCategories[majorCat]++;
+ }
+
+ const metrics = {
+ summary: {
+ totalLabels: labels.length,
+ totalParagraphs: allParagraphs.length,
+ fullyLabeledParagraphs: fullyLabeled.size,
+ adjudicatedParagraphs: allAdjudications.length,
+ annotatorCount: annotatorIds.length,
+ },
+ consensus: {
+ categoryOnly: round(categoryConsensusRate, 4),
+ specificityOnly: round(specConsensusRate, 4),
+ both: round(bothConsensusRate, 4),
+ },
+ krippendorffsAlpha: {
+ category: round(categoryAlpha, 4),
+ specificity: round(specAlpha, 4),
+ note: "Category alpha uses ordinal distance on nominal data (conservative).
Specificity alpha uses ordinal distance.", + }, + pairwiseKappa: { + category: { + annotators: annotatorIds.map((id) => annotatorNames.get(id) ?? id), + matrix: kappaCategory.map((row) => row.map((v) => round(v, 4))), + pairs: kappaCatDetails.map((d) => ({ + ...d, + kappa: round(d.kappa, 4), + })), + mean: round( + kappaCatDetails.length > 0 + ? kappaCatDetails.reduce((s, d) => s + d.kappa, 0) / + kappaCatDetails.length + : 0, + 4, + ), + }, + specificity: { + annotators: annotatorIds.map((id) => annotatorNames.get(id) ?? id), + matrix: kappaSpec.map((row) => row.map((v) => round(v, 4))), + pairs: kappaSpecDetails.map((d) => ({ + ...d, + kappa: round(d.kappa, 4), + })), + mean: round( + kappaSpecDetails.length > 0 + ? kappaSpecDetails.reduce((s, d) => s + d.kappa, 0) / + kappaSpecDetails.length + : 0, + 4, + ), + }, + }, + perCategoryAgreement: Object.fromEntries( + Object.entries(perCategory).map(([k, v]) => [k, round(v, 4)]), + ), + perStratumAgreement: strataRates, + distributions: { + categoryLabels: categoryDist, + specificityLabels: specDist, + majorityCategories, + }, + timing: { + overallMedianActiveMs: allActiveTimes.length > 0 ? median(allActiveTimes) : null, + overallMeanActiveMs: allActiveTimes.length > 0 ? round(mean(allActiveTimes), 0) : null, + overallMedianDurationMs: allWallTimes.length > 0 ? median(allWallTimes) : null, + overallMeanDurationMs: allWallTimes.length > 0 ? round(mean(allWallTimes), 0) : null, + totalActiveHours: + allActiveTimes.length > 0 + ? round(sum(allActiveTimes) / 3_600_000, 2) + : null, + totalWallHours: + allWallTimes.length > 0 + ? 
round(sum(allWallTimes) / 3_600_000, 2) + : null, + labelsWithActiveTime: allActiveTimes.length, + labelsWithoutActiveTime: labels.length - allActiveTimes.length, + }, + perAnnotator: perAnnotatorStats, + }; + + await writeFile(`${OUT_DIR}/metrics.json`, JSON.stringify(metrics, null, 2)); + console.log(` metrics → metrics.json`); + + // ── Print summary to console ── + console.log("\n" + "=".repeat(60)); + console.log("HUMAN LABELING SUMMARY"); + console.log("=".repeat(60)); + console.log(`\nParagraphs: ${fullyLabeled.size} fully labeled / ${allParagraphs.length} total`); + console.log(`Labels: ${labels.length} total`); + console.log(`\n── Consensus Rates (3/3 agree) ──`); + console.log(` Category only: ${(categoryConsensusRate * 100).toFixed(1)}%`); + console.log(` Specificity only: ${(specConsensusRate * 100).toFixed(1)}%`); + console.log(` Both: ${(bothConsensusRate * 100).toFixed(1)}%`); + console.log(`\n── Krippendorff's Alpha ──`); + console.log(` Category: ${categoryAlpha.toFixed(4)}`); + console.log(` Specificity: ${specAlpha.toFixed(4)}`); + console.log(`\n── Pairwise Kappa (category) ──`); + console.log(` Mean: ${metrics.pairwiseKappa.category.mean}`); + for (const pair of kappaCatDetails) { + console.log(` ${pair.a1} × ${pair.a2}: ${pair.kappa.toFixed(4)} (n=${pair.n})`); + } + console.log(`\n── Pairwise Kappa (specificity) ──`); + console.log(` Mean: ${metrics.pairwiseKappa.specificity.mean}`); + for (const pair of kappaSpecDetails) { + console.log(` ${pair.a1} × ${pair.a2}: ${pair.kappa.toFixed(4)} (n=${pair.n})`); + } + console.log(`\n── Per-Category Agreement ──`); + for (const [cat, rate] of Object.entries(perCategory)) { + console.log(` ${cat}: ${(rate * 100).toFixed(1)}%`); + } + console.log(`\n── Per-Stratum Agreement ──`); + for (const [stratum, data] of Object.entries(strataRates)) { + console.log( + ` ${stratum}: ${(data.rate * 100).toFixed(1)}% (${data.agreed}/${data.total})`, + ); + } + console.log(`\n── Timing ──`); + if 
(allActiveTimes.length > 0) { + console.log(` Median active time: ${(median(allActiveTimes) / 1000).toFixed(1)}s`); + console.log(` Mean active time: ${(mean(allActiveTimes) / 1000).toFixed(1)}s`); + console.log(` Total active hours: ${(sum(allActiveTimes) / 3_600_000).toFixed(2)}h`); + console.log(` Total wall hours: ${(sum(allWallTimes) / 3_600_000).toFixed(2)}h`); + } + console.log(` Labels with active timer: ${allActiveTimes.length}/${labels.length}`); + + console.log(`\n── Per-Annotator ──`); + for (const a of perAnnotatorStats) { + const activeH = a.totalActiveMs ? (a.totalActiveMs / 3_600_000).toFixed(2) : "N/A"; + const medSec = a.medianActiveMs ? (a.medianActiveMs / 1000).toFixed(1) : "N/A"; + console.log( + ` ${a.name}: ${a.labelCount} labels, median ${medSec}s active, ${activeH}h total`, + ); + } + + console.log(`\n${"=".repeat(60)}`); + console.log(`All data exported to ${OUT_DIR}/`); + console.log("=".repeat(60)); + + process.exit(0); +} + +function median(arr: number[]): number { + const sorted = [...arr].sort((a, b) => a - b); + const mid = Math.floor(sorted.length / 2); + return sorted.length % 2 !== 0 + ? 
sorted[mid] + : (sorted[mid - 1] + sorted[mid]) / 2; +} + +function mean(arr: number[]): number { + return arr.reduce((s, v) => s + v, 0) / arr.length; +} + +function sum(arr: number[]): number { + return arr.reduce((s, v) => s + v, 0); +} + +function round(n: number, decimals: number): number { + const factor = 10 ** decimals; + return Math.round(n * factor) / factor; +} + +main().catch((err) => { + console.error("Dump failed:", err); + process.exit(1); +}); diff --git a/package.json b/package.json index 1088d87..5752141 100644 --- a/package.json +++ b/package.json @@ -16,6 +16,7 @@ "la:sample": "bun run --filter labelapp sample", "la:assign": "bun run --filter labelapp assign", "la:export": "bun run --filter labelapp export", + "la:dump": "bun run --filter labelapp dump", "la:docker": "docker build -f labelapp/Dockerfile -t registry.claiborne.soy/labelapp:latest . --push", "ts:sec": "bun run --filter sec-cybert sec", "ts:typecheck": "bun run --filter sec-cybert typecheck", diff --git a/scripts/analyze-gold.py b/scripts/analyze-gold.py new file mode 100644 index 0000000..cf0638e --- /dev/null +++ b/scripts/analyze-gold.py @@ -0,0 +1,1224 @@ +""" +Comprehensive analysis of human labeling data cross-referenced with +Stage 1 GenAI panel and Opus golden labels. + +Outputs charts to data/gold/charts/ and a summary to stdout. 
+""" + +import json +import os +from collections import Counter, defaultdict +from pathlib import Path + +import matplotlib +matplotlib.use("Agg") +import matplotlib.pyplot as plt +import matplotlib.ticker as mticker +import numpy as np + +# ── Paths ── +GOLD_DIR = Path("/home/joey/Documents/sec-cyBERT/data/gold") +CHART_DIR = GOLD_DIR / "charts" +STAGE1_PATH = Path("/home/joey/Documents/sec-cyBERT/data/annotations/stage1.patched.jsonl") +OPUS_PATH = Path("/home/joey/Documents/sec-cyBERT/data/annotations/golden/opus.jsonl") +HOLDOUT_PATH = GOLD_DIR / "paragraphs-holdout.jsonl" +LABELS_PATH = GOLD_DIR / "human-labels-raw.jsonl" +METRICS_PATH = GOLD_DIR / "metrics.json" +OPUS_ID_MAP_PATH = GOLD_DIR / "opus-to-db-id-map.json" + +CATEGORIES = [ + "Board Governance", "Management Role", "Risk Management Process", + "Third-Party Risk", "Incident Disclosure", "Strategy Integration", "None/Other", +] +CAT_SHORT = ["BG", "MR", "RMP", "TPR", "ID", "SI", "N/O"] +CAT_MAP = dict(zip(CATEGORIES, CAT_SHORT)) +SPEC_LEVELS = [1, 2, 3, 4] + +CHART_DIR.mkdir(parents=True, exist_ok=True) + + +def load_jsonl(path: Path) -> list[dict]: + records = [] + with open(path) as f: + for line in f: + line = line.strip() + if line: + records.append(json.loads(line)) + return records + + +def majority_vote(items: list[str]) -> str | None: + """Return majority item if one exists, else None.""" + c = Counter(items) + top, count = c.most_common(1)[0] + return top if count > len(items) / 2 else None + + +def plurality_vote(items: list) -> tuple: + """Return most common item and its count.""" + c = Counter(items) + return c.most_common(1)[0] + + +# ── Load data ── +print("Loading data...") +human_labels = load_jsonl(LABELS_PATH) +paragraphs_all = load_jsonl(HOLDOUT_PATH) +opus_labels = load_jsonl(OPUS_PATH) +metrics = json.loads(METRICS_PATH.read_text()) + +# Build paragraph metadata lookup (only holdout ones) +holdout_ids = {l["paragraphId"] for l in human_labels} +para_meta = {} +for p in 
paragraphs_all: + if p["id"] in holdout_ids: + para_meta[p["id"]] = p + +# Load Stage 1 annotations for holdout +stage1_annots = [] +with open(STAGE1_PATH) as f: + for line in f: + d = json.loads(line) + if d["paragraphId"] in holdout_ids: + stage1_annots.append(d) + +# Build lookups +# Opus labels: only use if we have sufficient coverage (>50% of holdout) +# The Opus golden run may have been done on a different sample than what's in the DB. +opus_by_pid: dict[str, dict] = {} +for r in opus_labels: + if r["paragraphId"] in holdout_ids: + opus_by_pid[r["paragraphId"]] = r +# Also try ID remapping if direct match is low +if len(opus_by_pid) < 600 and OPUS_ID_MAP_PATH.exists(): + opus_id_map = json.loads(OPUS_ID_MAP_PATH.read_text()) + for r in opus_labels: + db_pid = opus_id_map.get(r["paragraphId"]) + if db_pid and db_pid in holdout_ids and db_pid not in opus_by_pid: + opus_by_pid[db_pid] = r + +OPUS_AVAILABLE = len(opus_by_pid) >= 600 # gate all Opus analysis on sufficient coverage +opus_coverage = len(opus_by_pid) +print(f" Opus labels matched to holdout: {opus_coverage}/1200" + f" {'— SKIPPING Opus analysis (insufficient coverage)' if not OPUS_AVAILABLE else ''}") + +# Stage 1: 3 annotations per paragraph +stage1_by_pid: dict[str, list[dict]] = defaultdict(list) +for a in stage1_annots: + stage1_by_pid[a["paragraphId"]].append(a) + +# Human labels grouped by paragraph +human_by_pid: dict[str, list[dict]] = defaultdict(list) +for l in human_labels: + human_by_pid[l["paragraphId"]].append(l) + +# Annotator names +annotator_names = sorted({l["annotatorName"] for l in human_labels}) +annotator_ids = sorted({l["annotatorId"] for l in human_labels}) +name_to_id = {} +for l in human_labels: + name_to_id[l["annotatorName"]] = l["annotatorId"] + +print(f" {len(human_labels)} human labels across {len(holdout_ids)} paragraphs") +print(f" {len(stage1_annots)} Stage 1 annotations") +print(f" {len(opus_labels)} Opus labels") +print(f" Annotators: {', '.join(annotator_names)}") 
+ +# ── Derive per-paragraph consensus labels ── +consensus = {} # pid -> {human_cat, human_spec, human_cat_method, ...} +for pid, lbls in human_by_pid.items(): + cats = [l["contentCategory"] for l in lbls] + specs = [l["specificityLevel"] for l in lbls] + + cat_maj = majority_vote(cats) + spec_maj = majority_vote([str(s) for s in specs]) + + # Stage 1 + s1 = stage1_by_pid.get(pid, []) + s1_cats = [a["label"]["content_category"] for a in s1] + s1_specs = [a["label"]["specificity_level"] for a in s1] + s1_cat_maj = majority_vote(s1_cats) if s1_cats else None + s1_spec_maj = majority_vote([str(s) for s in s1_specs]) if s1_specs else None + + # Opus + op = opus_by_pid.get(pid) + op_cat = op["label"]["content_category"] if op else None + op_spec = op["label"]["specificity_level"] if op else None + + consensus[pid] = { + "human_cats": cats, + "human_specs": specs, + "human_cat_maj": cat_maj, + "human_spec_maj": int(spec_maj) if spec_maj else None, + "human_cat_unanimous": len(set(cats)) == 1, + "human_spec_unanimous": len(set(specs)) == 1, + "s1_cats": s1_cats, + "s1_specs": s1_specs, + "s1_cat_maj": s1_cat_maj, + "s1_spec_maj": int(s1_spec_maj) if s1_spec_maj else None, + "s1_cat_unanimous": len(set(s1_cats)) == 1 if s1_cats else False, + "opus_cat": op_cat, + "opus_spec": op_spec, + "word_count": para_meta.get(pid, {}).get("wordCount", 0), + } + + +# ═══════════════════════════════════════════════════════════ +# CHART 1: Pairwise Kappa Heatmaps (category + specificity) +# ═══════════════════════════════════════════════════════════ +def plot_kappa_heatmaps(): + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5.5)) + + for ax, dim_key, title in [ + (ax1, "category", "Category"), + (ax2, "specificity", "Specificity"), + ]: + data = metrics["pairwiseKappa"][dim_key] + names = data["annotators"] + matrix = np.array(data["matrix"]) + + # Mask diagonal + mask = np.eye(len(names), dtype=bool) + display = np.where(mask, np.nan, matrix) + + im = ax.imshow(display, 
cmap="RdYlGn", vmin=0, vmax=1, aspect="equal") + ax.set_xticks(range(len(names))) + ax.set_xticklabels(names, rotation=45, ha="right", fontsize=9) + ax.set_yticks(range(len(names))) + ax.set_yticklabels(names, fontsize=9) + ax.set_title(f"Pairwise Cohen's κ — {title}", fontsize=12, fontweight="bold") + + for i in range(len(names)): + for j in range(len(names)): + if i != j: + color = "white" if matrix[i][j] < 0.4 else "black" + ax.text(j, i, f"{matrix[i][j]:.2f}", ha="center", va="center", + fontsize=8, color=color) + + fig.colorbar(im, ax=[ax1, ax2], shrink=0.8, label="Cohen's κ") + fig.tight_layout() + fig.savefig(CHART_DIR / "01_kappa_heatmaps.png", dpi=150) + plt.close(fig) + print(" 01_kappa_heatmaps.png") + + +# ═══════════════════════════════════════════════════════════ +# CHART 2: Per-annotator category distribution +# ═══════════════════════════════════════════════════════════ +def plot_annotator_category_dist(): + fig, ax = plt.subplots(figsize=(12, 6)) + + # Also add Stage 1 majority (and Opus if available) + sources = list(annotator_names) + ["Stage1 Maj"] + (["Opus"] if OPUS_AVAILABLE else []) + + dist = {s: Counter() for s in sources} + for l in human_labels: + dist[l["annotatorName"]][l["contentCategory"]] += 1 + + for pid, c in consensus.items(): + if c["s1_cat_maj"]: + dist["Stage1 Maj"][c["s1_cat_maj"]] += 1 + if OPUS_AVAILABLE and c["opus_cat"]: + dist["Opus"][c["opus_cat"]] += 1 + + x = np.arange(len(sources)) + width = 0.11 + offsets = np.arange(len(CATEGORIES)) - len(CATEGORIES) / 2 + 0.5 + + colors = plt.cm.Set2(np.linspace(0, 1, len(CATEGORIES))) + + for i, (cat, color) in enumerate(zip(CATEGORIES, colors)): + counts = [dist[s].get(cat, 0) for s in sources] + totals = [sum(dist[s].values()) for s in sources] + pcts = [c / t * 100 if t > 0 else 0 for c, t in zip(counts, totals)] + ax.bar(x + offsets[i] * width, pcts, width, label=CAT_MAP[cat], color=color) + + ax.set_xticks(x) + ax.set_xticklabels(sources, rotation=45, ha="right") + 
ax.set_ylabel("% of labels") + ax.set_title("Category Distribution by Annotator (incl. Stage1 & Opus)", fontweight="bold") + ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", fontsize=8) + ax.yaxis.set_major_formatter(mticker.PercentFormatter()) + fig.tight_layout() + fig.savefig(CHART_DIR / "02_category_distribution.png", dpi=150) + plt.close(fig) + print(" 02_category_distribution.png") + + +# ═══════════════════════════════════════════════════════════ +# CHART 3: Per-annotator specificity distribution +# ═══════════════════════════════════════════════════════════ +def plot_annotator_spec_dist(): + fig, ax = plt.subplots(figsize=(12, 5)) + + sources = list(annotator_names) + ["Stage1 Maj"] + (["Opus"] if OPUS_AVAILABLE else []) + + dist = {s: Counter() for s in sources} + for l in human_labels: + dist[l["annotatorName"]][l["specificityLevel"]] += 1 + + for pid, c in consensus.items(): + if c["s1_spec_maj"]: + dist["Stage1 Maj"][c["s1_spec_maj"]] += 1 + if OPUS_AVAILABLE and c["opus_spec"]: + dist["Opus"][c["opus_spec"]] += 1 + + x = np.arange(len(sources)) + width = 0.18 + colors = ["#e74c3c", "#f39c12", "#2ecc71", "#3498db"] + spec_labels = ["1 Generic", "2 Sector", "3 Firm-Specific", "4 Quantified"] + + for i, (level, color, label) in enumerate(zip(SPEC_LEVELS, colors, spec_labels)): + counts = [dist[s].get(level, 0) for s in sources] + totals = [sum(dist[s].values()) for s in sources] + pcts = [c / t * 100 if t > 0 else 0 for c, t in zip(counts, totals)] + ax.bar(x + (i - 1.5) * width, pcts, width, label=label, color=color) + + ax.set_xticks(x) + ax.set_xticklabels(sources, rotation=45, ha="right") + ax.set_ylabel("% of labels") + ax.set_title("Specificity Distribution by Annotator (incl. 
Stage1 & Opus)", fontweight="bold") + ax.legend() + ax.yaxis.set_major_formatter(mticker.PercentFormatter()) + fig.tight_layout() + fig.savefig(CHART_DIR / "03_specificity_distribution.png", dpi=150) + plt.close(fig) + print(" 03_specificity_distribution.png") + + +# ═══════════════════════════════════════════════════════════ +# CHART 4: Human confusion matrix (aggregated pairwise) +# ═══════════════════════════════════════════════════════════ +def plot_human_confusion(): + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6)) + + # Category confusion + cat_conf = np.zeros((len(CATEGORIES), len(CATEGORIES))) + cat_idx = {c: i for i, c in enumerate(CATEGORIES)} + + for pid, lbls in human_by_pid.items(): + cats = [l["contentCategory"] for l in lbls] + for i in range(len(cats)): + for j in range(i + 1, len(cats)): + a, b = cat_idx[cats[i]], cat_idx[cats[j]] + cat_conf[a][b] += 1 + cat_conf[b][a] += 1 + + # Normalize rows + row_sums = cat_conf.sum(axis=1, keepdims=True) + cat_conf_norm = np.where(row_sums > 0, cat_conf / row_sums * 100, 0) + + im1 = ax1.imshow(cat_conf_norm, cmap="YlOrRd", aspect="equal") + ax1.set_xticks(range(len(CAT_SHORT))) + ax1.set_xticklabels(CAT_SHORT, fontsize=9) + ax1.set_yticks(range(len(CAT_SHORT))) + ax1.set_yticklabels(CAT_SHORT, fontsize=9) + ax1.set_title("Human Category Confusion (row-normalized %)", fontweight="bold") + ax1.set_xlabel("Annotator B") + ax1.set_ylabel("Annotator A") + + for i in range(len(CAT_SHORT)): + for j in range(len(CAT_SHORT)): + val = cat_conf_norm[i][j] + if val > 0.5: + color = "white" if val > 40 else "black" + ax1.text(j, i, f"{val:.0f}", ha="center", va="center", + fontsize=7, color=color) + + # Specificity confusion + spec_conf = np.zeros((4, 4)) + for pid, lbls in human_by_pid.items(): + specs = [l["specificityLevel"] for l in lbls] + for i in range(len(specs)): + for j in range(i + 1, len(specs)): + a, b = specs[i] - 1, specs[j] - 1 + spec_conf[a][b] += 1 + spec_conf[b][a] += 1 + + row_sums = 
spec_conf.sum(axis=1, keepdims=True)
+    spec_conf_norm = np.where(row_sums > 0, spec_conf / row_sums * 100, 0)
+
+    im2 = ax2.imshow(spec_conf_norm, cmap="YlOrRd", aspect="equal")
+    ax2.set_xticks(range(4))
+    ax2.set_xticklabels(["Spec 1", "Spec 2", "Spec 3", "Spec 4"], fontsize=9)
+    ax2.set_yticks(range(4))
+    ax2.set_yticklabels(["Spec 1", "Spec 2", "Spec 3", "Spec 4"], fontsize=9)
+    ax2.set_title("Human Specificity Confusion (row-normalized %)", fontweight="bold")
+
+    for i in range(4):
+        for j in range(4):
+            val = spec_conf_norm[i][j]
+            if val > 0.5:
+                color = "white" if val > 40 else "black"
+                ax2.text(j, i, f"{val:.0f}", ha="center", va="center",
+                         fontsize=9, color=color)
+
+    fig.colorbar(im1, ax=ax1, shrink=0.8)
+    fig.colorbar(im2, ax=ax2, shrink=0.8)
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "04_human_confusion.png", dpi=150)
+    plt.close(fig)
+    print(" 04_human_confusion.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 5: Human majority vs Stage 1 majority vs Opus
+# ═══════════════════════════════════════════════════════════
+def plot_cross_source_confusion():
+    comparisons = [
+        ("Human Maj", "Stage1 Maj", "human_cat_maj", "s1_cat_maj"),
+    ]
+    if OPUS_AVAILABLE:
+        comparisons += [
+            ("Human Maj", "Opus", "human_cat_maj", "opus_cat"),
+            ("Stage1 Maj", "Opus", "s1_cat_maj", "opus_cat"),
+        ]
+    ncols = len(comparisons)
+    fig, axes = plt.subplots(1, ncols, figsize=(7 * ncols, 5.5))
+    if ncols == 1:
+        axes = [axes]
+
+    for ax, (name_a, name_b, key_a, key_b) in zip(axes, comparisons):
+        conf = np.zeros((len(CATEGORIES), len(CATEGORIES)))
+        cat_idx = {c: i for i, c in enumerate(CATEGORIES)}
+        total = 0
+        agree = 0
+
+        for pid, c in consensus.items():
+            a_val = c[key_a]
+            b_val = c[key_b]
+            if a_val and b_val:
+                conf[cat_idx[a_val]][cat_idx[b_val]] += 1
+                total += 1
+                if a_val == b_val:
+                    agree += 1
+
+        # Normalize rows
+        row_sums = conf.sum(axis=1, keepdims=True)
+        conf_norm = np.where(row_sums > 0, conf / row_sums * 100, 0)
+
+        im = ax.imshow(conf_norm, cmap="YlGnBu", aspect="equal")
+        ax.set_xticks(range(len(CAT_SHORT)))
+        ax.set_xticklabels(CAT_SHORT, fontsize=8)
+        ax.set_yticks(range(len(CAT_SHORT)))
+        ax.set_yticklabels(CAT_SHORT, fontsize=8)
+        pct = agree / total * 100 if total > 0 else 0
+        ax.set_title(f"{name_a} vs {name_b}\n({pct:.1f}% agree, n={total})",
+                     fontweight="bold", fontsize=10)
+        ax.set_ylabel(name_a)
+        ax.set_xlabel(name_b)
+
+        for i in range(len(CAT_SHORT)):
+            for j in range(len(CAT_SHORT)):
+                val = conf_norm[i][j]
+                if val > 0.5:
+                    color = "white" if val > 50 else "black"
+                    ax.text(j, i, f"{val:.0f}", ha="center", va="center",
+                            fontsize=7, color=color)
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "05_cross_source_category.png", dpi=150)
+    plt.close(fig)
+    print(" 05_cross_source_category.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 6: Cross-source specificity confusion
+# ═══════════════════════════════════════════════════════════
+def plot_cross_source_specificity():
+    comparisons = [
+        ("Human Maj", "Stage1 Maj", "human_spec_maj", "s1_spec_maj"),
+    ]
+    if OPUS_AVAILABLE:
+        comparisons += [
+            ("Human Maj", "Opus", "human_spec_maj", "opus_spec"),
+            ("Stage1 Maj", "Opus", "s1_spec_maj", "opus_spec"),
+        ]
+    ncols = len(comparisons)
+    fig, axes = plt.subplots(1, ncols, figsize=(5.5 * ncols, 4.5))
+    if ncols == 1:
+        axes = [axes]
+
+    for ax, (name_a, name_b, key_a, key_b) in zip(axes, comparisons):
+        conf = np.zeros((4, 4))
+        total = 0
+        agree = 0
+
+        for pid, c in consensus.items():
+            a_val = c[key_a]
+            b_val = c[key_b]
+            if a_val is not None and b_val is not None:
+                conf[a_val - 1][b_val - 1] += 1
+                total += 1
+                if a_val == b_val:
+                    agree += 1
+
+        row_sums = conf.sum(axis=1, keepdims=True)
+        conf_norm = np.where(row_sums > 0, conf / row_sums * 100, 0)
+
+        im = ax.imshow(conf_norm, cmap="YlGnBu", aspect="equal")
+        ax.set_xticks(range(4))
+        ax.set_xticklabels(["S1", "S2", "S3", "S4"], fontsize=9)
+        ax.set_yticks(range(4))
+        ax.set_yticklabels(["S1", "S2", "S3", "S4"], fontsize=9)
+        pct = agree / total * 100 if total > 0 else 0
+        ax.set_title(f"{name_a} vs {name_b}\n({pct:.1f}% agree, n={total})",
+                     fontweight="bold", fontsize=10)
+        ax.set_ylabel(name_a)
+        ax.set_xlabel(name_b)
+
+        for i in range(4):
+            for j in range(4):
+                val = conf_norm[i][j]
+                if val > 0.5:
+                    color = "white" if val > 50 else "black"
+                    ax.text(j, i, f"{val:.0f}", ha="center", va="center",
+                            fontsize=9, color=color)
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "06_cross_source_specificity.png", dpi=150)
+    plt.close(fig)
+    print(" 06_cross_source_specificity.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 7: Per-annotator agreement with Stage1 and Opus
+# ═══════════════════════════════════════════════════════════
+def plot_annotator_vs_references():
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+
+    # Build per-annotator label lookup
+    ann_labels: dict[str, dict[str, dict]] = defaultdict(dict)
+    for l in human_labels:
+        ann_labels[l["annotatorName"]][l["paragraphId"]] = l
+
+    for ax, dim, title in [(ax1, "cat", "Category"), (ax2, "spec", "Specificity")]:
+        ref_sources = [
+            ("Stage1 Maj", "s1_cat_maj", "s1_spec_maj"),
+            ("Human Maj", "human_cat_maj", "human_spec_maj"),
+        ]
+        if OPUS_AVAILABLE:
+            ref_sources.insert(1, ("Opus", "opus_cat", "opus_spec"))
+
+        x = np.arange(len(annotator_names))
+        width = 0.25 if len(ref_sources) == 3 else 0.3
+
+        for ri, (ref_name, ref_key_cat, ref_key_spec) in enumerate(ref_sources):
+            rates = []
+            for ann_name in annotator_names:
+                agree = 0
+                total = 0
+                for pid, lbl in ann_labels[ann_name].items():
+                    c = consensus.get(pid)
+                    if not c:
+                        continue
+                    if dim == "cat":
+                        ref_val = c[ref_key_cat]
+                        ann_val = lbl["contentCategory"]
+                    else:
+                        ref_val = c[ref_key_spec]
+                        ann_val = lbl["specificityLevel"]
+                    if ref_val is not None:
+                        total += 1
+                        if str(ann_val) == str(ref_val):
+                            agree += 1
+                rates.append(agree / total * 100 if total > 0 else 0)
+
+            # Center the bar group on each tick whether there are 2 reference
+            # sources (no Opus) or 3 (with Opus)
+            ax.bar(x + (ri - (len(ref_sources) - 1) / 2) * width, rates, width, label=ref_name)
+
+        ax.set_xticks(x)
+        ax.set_xticklabels(annotator_names, rotation=45, ha="right")
+        ax.set_ylabel("Agreement %")
+        ax.set_title(f"Per-Annotator {title} Agreement with References", fontweight="bold")
+        ax.legend()
+        ax.set_ylim(0, 100)
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "07_annotator_vs_references.png", dpi=150)
+    plt.close(fig)
+    print(" 07_annotator_vs_references.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 8: Agreement rate by word count (binned)
+# ═══════════════════════════════════════════════════════════
+def plot_agreement_by_wordcount():
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+
+    # Bin paragraphs by word count
+    wc_bins = [(0, 50), (51, 80), (81, 120), (121, 180), (181, 500)]
+    bin_labels = ["≤50", "51-80", "81-120", "121-180", "181+"]
+
+    for ax, dim, title in [(ax1, "cat", "Category"), (ax2, "both", "Both")]:
+        rates = []
+        ns = []
+        for lo, hi in wc_bins:
+            agree = 0
+            total = 0
+            for pid, c in consensus.items():
+                wc = c["word_count"]
+                if lo <= wc <= hi:
+                    total += 1
+                    if dim == "cat":
+                        if c["human_cat_unanimous"]:
+                            agree += 1
+                    else:
+                        if c["human_cat_unanimous"] and c["human_spec_unanimous"]:
+                            agree += 1
+            rates.append(agree / total * 100 if total > 0 else 0)
+            ns.append(total)
+
+        bars = ax.bar(range(len(bin_labels)), rates, color="#3498db")
+        ax.set_xticks(range(len(bin_labels)))
+        ax.set_xticklabels(bin_labels)
+        ax.set_xlabel("Word Count")
+        ax.set_ylabel("Unanimous Agreement %")
+        ax.set_title(f"{title} Consensus by Paragraph Length", fontweight="bold")
+        ax.set_ylim(0, 80)
+
+        for bar, n in zip(bars, ns):
+            ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
+                    f"n={n}", ha="center", va="bottom", fontsize=8)
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "08_agreement_by_wordcount.png", dpi=150)
+    plt.close(fig)
+    print(" 08_agreement_by_wordcount.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 9: Active time vs agreement
+# ═══════════════════════════════════════════════════════════
+def plot_time_vs_agreement():
+    fig, ax = plt.subplots(figsize=(10, 5))
+
+    # For each paragraph, compute median active time and whether humans agreed
+    agreed_times = []
+    disagreed_times = []
+
+    for pid, lbls in human_by_pid.items():
+        times = [l["activeMs"] for l in lbls if l["activeMs"] is not None]
+        if not times:
+            continue
+        med_time = sorted(times)[len(times) // 2] / 1000  # seconds
+
+        cats = [l["contentCategory"] for l in lbls]
+        if len(set(cats)) == 1:
+            agreed_times.append(med_time)
+        else:
+            disagreed_times.append(med_time)
+
+    bins = np.linspace(0, 120, 30)
+    ax.hist(agreed_times, bins=bins, alpha=0.6, label=f"Category agreed (n={len(agreed_times)})",
+            color="#2ecc71", density=True)
+    ax.hist(disagreed_times, bins=bins, alpha=0.6, label=f"Category disagreed (n={len(disagreed_times)})",
+            color="#e74c3c", density=True)
+    ax.set_xlabel("Median Active Time per Paragraph (seconds)")
+    ax.set_ylabel("Density")
+    ax.set_title("Labeling Time: Agreed vs Disagreed Paragraphs", fontweight="bold")
+    ax.legend()
+    ax.set_xlim(0, 120)
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "09_time_vs_agreement.png", dpi=150)
+    plt.close(fig)
+    print(" 09_time_vs_agreement.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 10: None/Other deep dive — what do people label instead?
+# ═══════════════════════════════════════════════════════════
+def plot_none_other_analysis():
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+
+    # For paragraphs where at least one annotator said None/Other
+    # What did the others say?
+    noneother_vs = Counter()
+    noneother_pids = set()
+    for pid, lbls in human_by_pid.items():
+        cats = [l["contentCategory"] for l in lbls]
+        if "None/Other" in cats and len(set(cats)) > 1:
+            noneother_pids.add(pid)
+            for c in cats:
+                if c != "None/Other":
+                    noneother_vs[c] += 1
+
+    # Also: paragraphs where NO human said None/Other but Stage1 or Opus did
+    s1_noneother_human_not = Counter()
+    for pid, c in consensus.items():
+        human_cats = set(c["human_cats"])
+        if "None/Other" not in human_cats:
+            if c["s1_cat_maj"] == "None/Other":
+                for hc in c["human_cats"]:
+                    s1_noneother_human_not[hc] += 1
+
+    cats_sorted = sorted(noneother_vs.keys(), key=lambda c: -noneother_vs[c])
+    ax1.barh(range(len(cats_sorted)), [noneother_vs[c] for c in cats_sorted], color="#e74c3c")
+    ax1.set_yticks(range(len(cats_sorted)))
+    ax1.set_yticklabels([CAT_MAP.get(c, c) for c in cats_sorted])
+    ax1.set_xlabel("Count")
+    ax1.set_title(f"When someone says N/O but others disagree\n({len(noneother_pids)} paragraphs)",
+                  fontweight="bold")
+    ax1.invert_yaxis()
+
+    # What does Stage1 say when humans disagree on category?
+    s1_for_disagreed = Counter()
+    for pid, c in consensus.items():
+        if not c["human_cat_unanimous"] and c["s1_cat_maj"]:
+            s1_for_disagreed[c["s1_cat_maj"]] += 1
+
+    cats_sorted2 = sorted(s1_for_disagreed.keys(), key=lambda c: -s1_for_disagreed[c])
+    ax2.barh(range(len(cats_sorted2)), [s1_for_disagreed[c] for c in cats_sorted2], color="#3498db")
+    ax2.set_yticks(range(len(cats_sorted2)))
+    ax2.set_yticklabels([CAT_MAP.get(c, c) for c in cats_sorted2])
+    ax2.set_xlabel("Count")
+    ax2.set_title(f"Stage1 majority for human-disagreed paragraphs\n(n={sum(s1_for_disagreed.values())})",
+                  fontweight="bold")
+    ax2.invert_yaxis()
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "10_none_other_analysis.png", dpi=150)
+    plt.close(fig)
+    print(" 10_none_other_analysis.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 11: Aaryan vs everyone else — where does he diverge?
+# ═══════════════════════════════════════════════════════════
+def plot_outlier_annotator():
+    # Find the annotator with lowest avg kappa
+    cat_kappas = metrics["pairwiseKappa"]["category"]["pairs"]
+    ann_kappa_sum = defaultdict(lambda: {"sum": 0, "n": 0})
+    for pair in cat_kappas:
+        ann_kappa_sum[pair["a1"]]["sum"] += pair["kappa"]
+        ann_kappa_sum[pair["a1"]]["n"] += 1
+        ann_kappa_sum[pair["a2"]]["sum"] += pair["kappa"]
+        ann_kappa_sum[pair["a2"]]["n"] += 1
+
+    outlier = min(ann_kappa_sum, key=lambda a: ann_kappa_sum[a]["sum"] / ann_kappa_sum[a]["n"])
+
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
+
+    # What does the outlier label differently?
+    # Compare outlier's category choices vs the majority of the other 2 annotators
+    outlier_id = name_to_id.get(outlier, outlier)
+    outlier_diverge_from = Counter()  # (outlier_cat, others_cat) pairs
+    outlier_diverge_to = Counter()
+
+    for pid, lbls in human_by_pid.items():
+        outlier_lbl = None
+        others = []
+        for l in lbls:
+            if l["annotatorName"] == outlier:
+                outlier_lbl = l
+            else:
+                others.append(l)
+
+        if outlier_lbl and len(others) >= 2:
+            other_cats = [o["contentCategory"] for o in others]
+            if other_cats[0] == other_cats[1] and other_cats[0] != outlier_lbl["contentCategory"]:
+                outlier_diverge_from[other_cats[0]] += 1
+                outlier_diverge_to[outlier_lbl["contentCategory"]] += 1
+
+    # Diverge FROM (what category the others agreed on)
+    cats1 = sorted(outlier_diverge_from.keys(), key=lambda c: -outlier_diverge_from[c])
+    ax1.barh(range(len(cats1)), [outlier_diverge_from[c] for c in cats1], color="#e74c3c")
+    ax1.set_yticks(range(len(cats1)))
+    ax1.set_yticklabels([CAT_MAP.get(c, c) for c in cats1])
+    ax1.set_xlabel("Count")
+    ax1.set_title(f"{outlier} disagrees: what others chose\n(others agreed, {outlier} didn't)",
+                  fontweight="bold")
+    ax1.invert_yaxis()
+
+    # Diverge TO (what did the outlier pick instead)
+    cats2 = sorted(outlier_diverge_to.keys(), key=lambda c: -outlier_diverge_to[c])
+    ax2.barh(range(len(cats2)), [outlier_diverge_to[c] for c in cats2], color="#f39c12")
+    ax2.set_yticks(range(len(cats2)))
+    ax2.set_yticklabels([CAT_MAP.get(c, c) for c in cats2])
+    ax2.set_xlabel("Count")
+    ax2.set_title(f"What {outlier} chose instead", fontweight="bold")
+    ax2.invert_yaxis()
+
+    fig.suptitle(f"Outlier Analysis: {outlier} (lowest avg κ = "
+                 f"{ann_kappa_sum[outlier]['sum']/ann_kappa_sum[outlier]['n']:.3f})",
+                 fontweight="bold", fontsize=12)
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "11_outlier_annotator.png", dpi=150)
+    plt.close(fig)
+    print(" 11_outlier_annotator.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 12: Human vs GenAI consensus comparison
+# ═══════════════════════════════════════════════════════════
+def plot_human_vs_genai_consensus():
+    fig, axes = plt.subplots(1, 3, figsize=(16, 5))
+
+    # For each paragraph: human unanimity vs stage1 unanimity
+    # Quadrants: both agree, human only, stage1 only, neither
+    human_unan_cat = sum(1 for c in consensus.values() if c["human_cat_unanimous"])
+    s1_unan_cat = sum(1 for c in consensus.values() if c["s1_cat_unanimous"])
+    both_unan_cat = sum(1 for c in consensus.values()
+                        if c["human_cat_unanimous"] and c["s1_cat_unanimous"])
+
+    human_unan_spec = sum(1 for c in consensus.values() if c["human_spec_unanimous"])
+    s1_unan_spec = sum(1 for c in consensus.values()
+                       if c["s1_specs"] and len(set(c["s1_specs"])) == 1)
+
+    # Chart 1: Category agreement Venn-style comparison
+    ax = axes[0]
+    labels_data = ["Human\nunanimous", "Stage1\nunanimous", "Both\nunanimous"]
+    vals = [human_unan_cat, s1_unan_cat, both_unan_cat]
+    pcts = [v / 1200 * 100 for v in vals]
+    bars = ax.bar(range(3), pcts, color=["#3498db", "#e74c3c", "#2ecc71"])
+    ax.set_xticks(range(3))
+    ax.set_xticklabels(labels_data)
+    ax.set_ylabel("%")
+    ax.set_title("Category Unanimity Rates", fontweight="bold")
+    for bar, v, p in zip(bars, vals, pcts):
+        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 1,
+                f"{p:.1f}%\n({v})", ha="center", fontsize=9)
+
+    # Chart 2: Human vs Stage1 category agreement breakdown
+    ax = axes[1]
+    both_agree = 0  # human unanimous AND matches s1
+    human_unan_s1_diff = 0  # human unanimous but s1 differs
+    s1_unan_human_diff = 0  # s1 unanimous but human majority differs
+    both_majority_agree = 0  # neither unanimous but majorities match
+    majorities_differ = 0
+
+    for pid, c in consensus.items():
+        hm = c["human_cat_maj"]
+        sm = c["s1_cat_maj"]
+        hu = c["human_cat_unanimous"]
+        su = c["s1_cat_unanimous"]
+        if not hm or not sm:
+            continue
+        if hm == sm:
+            both_majority_agree += 1
+        else:
+            majorities_differ += 1
+
+    total = both_majority_agree + majorities_differ
+    vals = [both_majority_agree, majorities_differ]
+    pcts = [v / total * 100 for v in vals]
+    labels_d = ["Majorities\nagree", "Majorities\ndiffer"]
+    colors_d = ["#2ecc71", "#e74c3c"]
+    bars = ax.bar(range(2), pcts, color=colors_d)
+    ax.set_xticks(range(2))
+    ax.set_xticklabels(labels_d)
+    ax.set_ylabel("%")
+    ax.set_title(f"Human vs Stage1 Category Agreement\n(n={total})", fontweight="bold")
+    for bar, v, p in zip(bars, vals, pcts):
+        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.5,
+                f"{v}\n({p:.1f}%)", ha="center", fontsize=9)
+
+    # Chart 3: Same for specificity
+    ax = axes[2]
+    spec_agree = 0
+    spec_differ = 0
+    for pid, c in consensus.items():
+        hm = c["human_spec_maj"]
+        sm = c["s1_spec_maj"]
+        if hm is None or sm is None:
+            continue
+        if hm == sm:
+            spec_agree += 1
+        else:
+            spec_differ += 1
+
+    total = spec_agree + spec_differ
+    vals = [spec_agree, spec_differ]
+    pcts = [v / total * 100 for v in vals]
+    bars = ax.bar(range(2), pcts, color=colors_d)
+    ax.set_xticks(range(2))
+    ax.set_xticklabels(labels_d)
+    ax.set_ylabel("%")
+    ax.set_title(f"Human vs Stage1 Specificity Agreement\n(n={total})", fontweight="bold")
+    for bar, v, p in zip(bars, vals, pcts):
+        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.5,
+                f"{v}\n({p:.1f}%)", ha="center", fontsize=9)
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "12_human_vs_genai_consensus.png", dpi=150)
+    plt.close(fig)
+    print(" 12_human_vs_genai_consensus.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 13: Per-annotator specificity bias
+# ═══════════════════════════════════════════════════════════
+def plot_specificity_bias():
+    fig, ax = plt.subplots(figsize=(10, 5))
+
+    # For each annotator, compare their spec vs the Stage1 majority spec
+    ann_labels_by_name: dict[str, dict[str, dict]] = defaultdict(dict)
+    for l in human_labels:
+        ann_labels_by_name[l["annotatorName"]][l["paragraphId"]] = l
+
+    names = annotator_names
+    biases = []  # mean(annotator_spec - stage1_majority_spec)
+    for name in names:
+        diffs = []
+        for pid, lbl in ann_labels_by_name[name].items():
+            c = consensus.get(pid)
+            if c and c["s1_spec_maj"] is not None:
+                diffs.append(lbl["specificityLevel"] - c["s1_spec_maj"])
+        biases.append(np.mean(diffs) if diffs else 0)
+
+    colors = ["#e74c3c" if b < -0.1 else "#2ecc71" if b > 0.1 else "#95a5a6" for b in biases]
+    bars = ax.bar(range(len(names)), biases, color=colors)
+    ax.set_xticks(range(len(names)))
+    ax.set_xticklabels(names, rotation=45, ha="right")
+    ax.set_ylabel("Mean (Human - Stage1 Maj) Specificity")
+    ax.set_title("Specificity Bias vs Stage1 (negative = under-rates, positive = over-rates)",
+                 fontweight="bold")
+    ax.axhline(0, color="black", linewidth=0.5)
+
+    for bar, b in zip(bars, biases):
+        ax.text(bar.get_x() + bar.get_width() / 2,
+                bar.get_height() + (0.02 if b >= 0 else -0.05),
+                f"{b:+.2f}", ha="center", fontsize=9)
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "13_specificity_bias.png", dpi=150)
+    plt.close(fig)
+    print(" 13_specificity_bias.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 14: Disagreement axes — human vs GenAI top confusions
+# ═══════════════════════════════════════════════════════════
+def plot_disagreement_axes():
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
+
+    # Human disagreement axes (where 2 annotators agree, 1 disagrees)
+    human_axes = Counter()
+    for pid, lbls in human_by_pid.items():
+        cats = [l["contentCategory"] for l in lbls]
+        if len(set(cats)) == 2:
+            c = Counter(cats)
+            items = c.most_common()
+            axis = tuple(sorted([items[0][0], items[1][0]]))
+            human_axes[axis] += 1
+        elif len(set(cats)) == 3:
+            for i, c1 in enumerate(cats):
+                for c2 in cats[i+1:]:
+                    if c1 != c2:
+                        axis = tuple(sorted([c1, c2]))
+                        human_axes[axis] += 1
+
+    top_human = human_axes.most_common(10)
+    labels_h = [f"{CAT_MAP[a]}↔{CAT_MAP[b]}" for (a, b), _ in top_human]
+    counts_h = [c for _, c in top_human]
+
+    ax1.barh(range(len(labels_h)), counts_h, color="#e74c3c")
+    ax1.set_yticks(range(len(labels_h)))
+    ax1.set_yticklabels(labels_h, fontsize=9)
+    ax1.set_xlabel("Disagreement count")
+    ax1.set_title("Human Top Disagreement Axes", fontweight="bold")
+    ax1.invert_yaxis()
+
+    # Stage 1 disagreement axes on same paragraphs
+    s1_axes = Counter()
+    for pid, c in consensus.items():
+        s1_cats = c["s1_cats"]
+        if len(set(s1_cats)) == 2:
+            cnt = Counter(s1_cats)
+            items = cnt.most_common()
+            axis = tuple(sorted([items[0][0], items[1][0]]))
+            s1_axes[axis] += 1
+        elif len(set(s1_cats)) == 3:
+            for i, c1 in enumerate(s1_cats):
+                for c2 in s1_cats[i+1:]:
+                    if c1 != c2:
+                        axis = tuple(sorted([c1, c2]))
+                        s1_axes[axis] += 1
+
+    top_s1 = s1_axes.most_common(10)
+    labels_s = [f"{CAT_MAP[a]}↔{CAT_MAP[b]}" for (a, b), _ in top_s1]
+    counts_s = [c for _, c in top_s1]
+
+    ax2.barh(range(len(labels_s)), counts_s, color="#3498db")
+    ax2.set_yticks(range(len(labels_s)))
+    ax2.set_yticklabels(labels_s, fontsize=9)
+    ax2.set_xlabel("Disagreement count")
+    ax2.set_title("Stage 1 Top Disagreement Axes (same paragraphs)", fontweight="bold")
+    ax2.invert_yaxis()
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "14_disagreement_axes.png", dpi=150)
+    plt.close(fig)
+    print(" 14_disagreement_axes.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 15: Quiz performance vs labeling quality
+# ═══════════════════════════════════════════════════════════
+def plot_quiz_vs_quality():
+    fig, ax = plt.subplots(figsize=(10, 5))
+
+    # Load quiz data
+    quiz_sessions = load_jsonl(GOLD_DIR / "quiz-sessions.jsonl")
+
+    # Best quiz score per annotator
+    best_quiz: dict[str, int] = {}
+    attempts: dict[str, int] = defaultdict(int)
+    for q in quiz_sessions:
+        name = q["annotatorName"]
+        attempts[name] += 1
+        if q["passed"]:
+            if name not in best_quiz or q["score"] > best_quiz[name]:
+                best_quiz[name] = q["score"]
+
+    # Agreement rate with Stage1 majority per annotator
+    ann_labels_by_name: dict[str, dict[str, dict]] = defaultdict(dict)
+    for l in human_labels:
+        ann_labels_by_name[l["annotatorName"]][l["paragraphId"]] = l
+
+    s1_agree = {}
+    for name in annotator_names:
+        agree = 0
+        total = 0
+        for pid, lbl in ann_labels_by_name[name].items():
+            c = consensus.get(pid)
+            if c and c["s1_cat_maj"]:
+                total += 1
+                if lbl["contentCategory"] == c["s1_cat_maj"]:
+                    agree += 1
+        s1_agree[name] = agree / total * 100 if total > 0 else 0
+
+    x = np.arange(len(annotator_names))
+    width = 0.35
+    ax.bar(x - width/2, [attempts.get(n, 0) for n in annotator_names],
+           width, label="Quiz attempts", color="#f39c12")
+    ax2 = ax.twinx()
+    ax2.bar(x + width/2, [s1_agree.get(n, 0) for n in annotator_names],
+            width, label="Category agree w/ Stage1 (%)", color="#3498db", alpha=0.7)
+
+    ax.set_xticks(x)
+    ax.set_xticklabels(annotator_names, rotation=45, ha="right")
+    ax.set_ylabel("Quiz attempts", color="#f39c12")
+    ax2.set_ylabel("Stage1 agreement %", color="#3498db")
+    ax.set_title("Quiz Attempts vs Labeling Quality (Stage1 Agreement)", fontweight="bold")
+
+    lines1, labels1 = ax.get_legend_handles_labels()
+    lines2, labels2 = ax2.get_legend_handles_labels()
+    ax.legend(lines1 + lines2, labels1 + labels2, loc="upper left")
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "15_quiz_vs_quality.png", dpi=150)
+    plt.close(fig)
+    print(" 15_quiz_vs_quality.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# CHART 16: Aaryan-excluded metrics comparison
+# ═══════════════════════════════════════════════════════════
+def plot_with_without_outlier():
+    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
+
+    # Find outlier (lowest avg kappa)
+    cat_kappas = metrics["pairwiseKappa"]["category"]["pairs"]
+    ann_kappa_sum = defaultdict(lambda: {"sum": 0, "n": 0})
+    for pair in cat_kappas:
+        ann_kappa_sum[pair["a1"]]["sum"] += pair["kappa"]
+        ann_kappa_sum[pair["a1"]]["n"] += 1
+        ann_kappa_sum[pair["a2"]]["sum"] += pair["kappa"]
+        ann_kappa_sum[pair["a2"]]["n"] += 1
+    outlier = min(ann_kappa_sum, key=lambda a: ann_kappa_sum[a]["sum"] / ann_kappa_sum[a]["n"])
+
+    # Compute consensus with and without outlier
+    # For paragraphs where outlier participated
+    outlier_participated = 0
+    cat_agree_with = 0
+    cat_agree_without = 0
+    spec_agree_with = 0
+    spec_agree_without = 0
+    both_agree_with = 0
+    both_agree_without = 0
+
+    for pid, lbls in human_by_pid.items():
+        if len(lbls) < 3:
+            continue
+        names = [l["annotatorName"] for l in lbls]
+        if outlier not in names:
+            continue
+        outlier_participated += 1
+
+        cats_all = [l["contentCategory"] for l in lbls]
+        specs_all = [l["specificityLevel"] for l in lbls]
+        cats_excl = [l["contentCategory"] for l in lbls if l["annotatorName"] != outlier]
+        specs_excl = [l["specificityLevel"] for l in lbls if l["annotatorName"] != outlier]
+
+        cat_u_all = len(set(cats_all)) == 1
+        cat_u_excl = len(set(cats_excl)) == 1
+        spec_u_all = len(set(specs_all)) == 1
+        spec_u_excl = len(set(specs_excl)) == 1
+
+        if cat_u_all: cat_agree_with += 1
+        if cat_u_excl: cat_agree_without += 1
+        if spec_u_all: spec_agree_with += 1
+        if spec_u_excl: spec_agree_without += 1
+        if cat_u_all and spec_u_all: both_agree_with += 1
+        if cat_u_excl and spec_u_excl: both_agree_without += 1
+
+    n = outlier_participated
+    metrics_labels = ["Category\nUnanimous", "Specificity\nUnanimous", "Both\nUnanimous"]
+    with_vals = [cat_agree_with / n * 100, spec_agree_with / n * 100, both_agree_with / n * 100]
+    without_vals = [cat_agree_without / n * 100, spec_agree_without / n * 100, both_agree_without / n * 100]
+
+    ax = axes[0]
+    x = np.arange(3)
+    width = 0.35
+    ax.bar(x - width/2, with_vals, width, label="All 3 annotators", color="#e74c3c")
+    ax.bar(x + width/2, without_vals, width, label=f"Excluding {outlier}", color="#2ecc71")
+    ax.set_xticks(x)
+    ax.set_xticklabels(metrics_labels)
+    ax.set_ylabel("% of paragraphs")
+    ax.set_title(f"Agreement on {outlier}'s paragraphs (n={n})", fontweight="bold")
+    ax.legend()
+
+    for i, (w, wo) in enumerate(zip(with_vals, without_vals)):
+        delta = wo - w
+        ax.text(i, max(w, wo) + 2, f"Δ={delta:+.1f}pp", ha="center", fontsize=9, fontweight="bold")
+
+    # Chart 2: kappa distributions with/without
+    ax = axes[1]
+    kappas_with = [p["kappa"] for p in cat_kappas]
+    kappas_without = [p["kappa"] for p in cat_kappas if outlier not in (p["a1"], p["a2"])]
+
+    positions = [1, 2]
+    bp = ax.boxplot([kappas_with, kappas_without], positions=positions, widths=0.5,
+                    patch_artist=True)
+    bp["boxes"][0].set_facecolor("#e74c3c")
+    bp["boxes"][0].set_alpha(0.5)
+    bp["boxes"][1].set_facecolor("#2ecc71")
+    bp["boxes"][1].set_alpha(0.5)
+
+    ax.set_xticks(positions)
+    ax.set_xticklabels(["All pairs", f"Excl. {outlier}"])
+    ax.set_ylabel("Cohen's κ (category)")
+    ax.set_title("Kappa Distribution", fontweight="bold")
+
+    # Add individual points
+    for pos, kappas in zip(positions, [kappas_with, kappas_without]):
+        jitter = np.random.normal(0, 0.04, len(kappas))
+        ax.scatter([pos + j for j in jitter], kappas, alpha=0.6, s=30, color="black", zorder=3)
+
+    fig.tight_layout()
+    fig.savefig(CHART_DIR / "16_with_without_outlier.png", dpi=150)
+    plt.close(fig)
+    print(" 16_with_without_outlier.png")
+
+
+# ═══════════════════════════════════════════════════════════
+# TEXTUAL ANALYSIS OUTPUT
+# ═══════════════════════════════════════════════════════════
+def print_analysis():
+    print("\n" + "=" * 70)
+    print("CROSS-SOURCE ANALYSIS")
+    print("=" * 70)
+
+    # Human majority vs Stage1 majority vs Opus — category
+    h_eq_s1 = sum(1 for c in consensus.values()
+                  if c["human_cat_maj"] and c["s1_cat_maj"] and c["human_cat_maj"] == c["s1_cat_maj"])
+    h_eq_op = sum(1 for c in consensus.values()
+                  if c["human_cat_maj"] and c["opus_cat"] and c["human_cat_maj"] == c["opus_cat"])
+    s1_eq_op = sum(1 for c in consensus.values()
+                   if c["s1_cat_maj"] and c["opus_cat"] and c["s1_cat_maj"] == c["opus_cat"])
+
+    # Count where all exist
+    n_with_all_cat = sum(1 for c in consensus.values()
+                         if c["human_cat_maj"] and c["s1_cat_maj"] and c["opus_cat"])
+    n_with_hmaj = sum(1 for c in consensus.values() if c["human_cat_maj"])
+    n_with_s1maj = sum(1 for c in consensus.values() if c["s1_cat_maj"])
+
+    print("\n── Category Agreement Rates ──")
+    print(f" Human maj = Stage1 maj: {h_eq_s1}/{n_with_hmaj} ({h_eq_s1/n_with_hmaj*100:.1f}%)")
+    if OPUS_AVAILABLE:
+        n_with_opus_and_hmaj = sum(1 for c in consensus.values()
+                                   if c["human_cat_maj"] and c["opus_cat"])
+        n_with_opus_and_s1 = sum(1 for c in consensus.values()
+                                 if c["s1_cat_maj"] and c["opus_cat"])
+        if n_with_opus_and_hmaj > 0:
+            print(f" Human maj = Opus: {h_eq_op}/{n_with_opus_and_hmaj} ({h_eq_op/n_with_opus_and_hmaj*100:.1f}%)")
+        if n_with_opus_and_s1 > 0:
+            print(f" Stage1 maj = Opus: {s1_eq_op}/{n_with_opus_and_s1} ({s1_eq_op/n_with_opus_and_s1*100:.1f}%)")
+    else:
+        print(f" (Opus comparison skipped — only {opus_coverage}/1200 matched)")
+
+    # Specificity
+    h_eq_s1_spec = sum(1 for c in consensus.values()
+                       if c["human_spec_maj"] is not None and c["s1_spec_maj"] is not None
+                       and c["human_spec_maj"] == c["s1_spec_maj"])
+
+    n_h_spec = sum(1 for c in consensus.values() if c["human_spec_maj"] is not None)
+
+    print("\n── Specificity Agreement Rates ──")
+    print(f" Human maj = Stage1 maj: {h_eq_s1_spec}/{n_h_spec} ({h_eq_s1_spec/n_h_spec*100:.1f}%)")
+
+    # Disagreement patterns between human and Stage1
+    print("\n── Disagreement Patterns (Human vs Stage1) ──")
+    human_unan_s1_agrees = 0
+    human_unan_s1_differs = 0
+    s1_unan_human_agrees = 0
+    s1_unan_human_differs = 0
+    for c in consensus.values():
+        hm = c["human_cat_maj"]
+        sm = c["s1_cat_maj"]
+        hu = c["human_cat_unanimous"]
+        su = c["s1_cat_unanimous"]
+        if hm and sm:
+            if hu and su:
+                if hm == sm:
+                    human_unan_s1_agrees += 1
+                else:
+                    human_unan_s1_differs += 1
+
+    print(f" Both unanimous, agree: {human_unan_s1_agrees}")
+    print(f" Both unanimous, DIFFER: {human_unan_s1_differs}")
+
+    # Where do the majorities differ? Top confusion axes
+    human_s1_confusion = Counter()
+    for c in consensus.values():
+        hm = c["human_cat_maj"]
+        sm = c["s1_cat_maj"]
+        if hm and sm and hm != sm:
+            axis = tuple(sorted([hm, sm]))
+            human_s1_confusion[axis] += 1
+
+    if human_s1_confusion:
+        print("\n Top Human↔Stage1 disagreement axes:")
+        for (a, b), count in human_s1_confusion.most_common(8):
+            print(f" {CAT_MAP[a]}↔{CAT_MAP[b]}: {count}")
+
+    # Paragraphs with NO majority on any source
+    no_human_maj = sum(1 for c in consensus.values() if c["human_cat_maj"] is None)
+    no_s1_maj = sum(1 for c in consensus.values() if c["s1_cat_maj"] is None)
+    print("\n── 3-way splits (no majority) ──")
+    print(f" Human: {no_human_maj} paragraphs")
+    print(f" Stage1: {no_s1_maj} paragraphs")
+
+
+# ═══════════════════════════════════════════════════════════
+# Run all
+# ═══════════════════════════════════════════════════════════
+print("\nGenerating charts...")
+plot_kappa_heatmaps()
+plot_annotator_category_dist()
+plot_annotator_spec_dist()
+plot_human_confusion()
+plot_cross_source_confusion()
+plot_cross_source_specificity()
+plot_annotator_vs_references()
+plot_agreement_by_wordcount()
+plot_time_vs_agreement()
+plot_none_other_analysis()
+plot_outlier_annotator()
+plot_human_vs_genai_consensus()
+plot_specificity_bias()
+plot_disagreement_axes()
+plot_quiz_vs_quality()
+plot_with_without_outlier()
+print_analysis()
+print(f"\nAll charts saved to {CHART_DIR}/")
diff --git a/ts/src/cli.ts b/ts/src/cli.ts
index 778e400..337048c 100644
--- a/ts/src/cli.ts
+++ b/ts/src/cli.ts
@@ -1,7 +1,7 @@
 import { readJsonl } from "./lib/jsonl.ts";
 import { Paragraph } from "@sec-cybert/schemas/paragraph.ts";
 import { Annotation } from "@sec-cybert/schemas/annotation.ts";
-import { STAGE1_MODELS } from "./lib/openrouter.ts";
+import { STAGE1_MODELS, BENCHMARK_MODELS } from "./lib/openrouter.ts";
 import { runBatch } from "./label/batch.ts";
 import { runGoldenBatch } from "./label/golden.ts";
 import { computeConsensus } from "./label/consensus.ts";
@@ -26,7 +28,9 @@ Commands:
   label:annotate-all [--limit N] [--concurrency N]
   label:consensus
   label:judge [--concurrency N]
-  label:golden [--paragraphs ] [--limit N] [--delay N] (Opus via Agent SDK)
+  label:golden [--paragraphs ] [--limit N] [--delay N] [--concurrency N] (Opus via Agent SDK)
+  label:bench-holdout --model [--concurrency N] [--limit N] (benchmark model on holdout)
+  label:bench-holdout-all [--concurrency N] [--limit N] (all BENCHMARK_MODELS on holdout)
   label:cost`);
   process.exit(1);
 }
@@ -223,8 +225,8 @@ async function cmdJudge(): Promise<void> {
 }
 
 async function cmdGolden(): Promise<void> {
-  // Load the 1,200 human-labeled paragraph IDs from the labelapp sample
-  const sampledIdsPath = "../labelapp/.sampled-ids.json";
+  // Load the 1,200 human-labeled paragraph IDs from the original sample
+  const sampledIdsPath = flag("ids") ?? "../labelapp/.sampled-ids.original.json";
   const sampledIds = new Set(
     JSON.parse(await import("node:fs/promises").then((fs) => fs.readFile(sampledIdsPath, "utf-8"))),
   );
@@ -248,9 +250,77 @@ async function cmdGolden(): Promise<void> {
     errorsPath: `${DATA}/annotations/golden/opus-errors.jsonl`,
     limit: flag("limit") !== undefined ? flagInt("limit", 50) : undefined,
     delayMs: flag("delay") !== undefined ? flagInt("delay", 1000) : 1000,
+    concurrency: flagInt("concurrency", 1),
   });
 }
 
+async function loadHoldoutParagraphs(): Promise<Paragraph[]> {
+  const sampledIdsPath = "../labelapp/.sampled-ids.original.json";
+  const sampledIds = new Set(
+    JSON.parse(await import("node:fs/promises").then((fs) => fs.readFile(sampledIdsPath, "utf-8"))),
+  );
+  process.stderr.write(` Loaded ${sampledIds.size} holdout IDs from ${sampledIdsPath}\n`);
+
+  const paragraphsPath = `${DATA}/paragraphs/paragraphs-clean.patched.jsonl`;
+  const { records: allParagraphs, skipped } = await readJsonl(paragraphsPath, Paragraph);
+  if (skipped > 0) process.stderr.write(` ⚠ Skipped ${skipped} invalid paragraph lines\n`);
+
+  const paragraphs = allParagraphs.filter((p) => sampledIds.has(p.id));
+  process.stderr.write(` Matched ${paragraphs.length}/${sampledIds.size} holdout paragraphs\n`);
+
+  if (paragraphs.length === 0) {
+    process.stderr.write(" ✖ No matching paragraphs found\n");
+    process.exit(1);
+  }
+  return paragraphs;
+}
+
+async function cmdBenchHoldout(): Promise<void> {
+  const modelId = flag("model");
+  if (!modelId) {
+    console.error("--model is required");
+    process.exit(1);
+  }
+  const paragraphs = await loadHoldoutParagraphs();
+  const modelShort = modelId.split("/")[1]!;
+
+  await runBatch(paragraphs, {
+    modelId,
+    stage: "benchmark",
+    outputPath: `${DATA}/annotations/bench-holdout/${modelShort}.jsonl`,
+    errorsPath: `${DATA}/annotations/bench-holdout/${modelShort}-errors.jsonl`,
+    sessionsPath: SESSIONS_PATH,
+    concurrency: flagInt("concurrency", 60),
+    limit: flag("limit") !== undefined ? flagInt("limit", 50) : undefined,
+  });
+}
+
+async function cmdBenchHoldoutAll(): Promise<void> {
+  const paragraphs = await loadHoldoutParagraphs();
+  const concurrency = flagInt("concurrency", 60);
+  const limit = flag("limit") !== undefined ? flagInt("limit", 50) : undefined;
+
+  // Exclude Stage 1 models — we already have their annotations
+  const benchModels = BENCHMARK_MODELS.filter(
+    (m) => !(STAGE1_MODELS as readonly string[]).includes(m),
+  );
+  process.stderr.write(` Running ${benchModels.length} benchmark models (excluding Stage 1 panel)\n`);
+
+  for (const modelId of benchModels) {
+    const modelShort = modelId.split("/")[1]!;
+    process.stderr.write(`\n ═══ ${modelId} ═══\n`);
+    await runBatch(paragraphs, {
+      modelId,
+      stage: "benchmark",
+      outputPath: `${DATA}/annotations/bench-holdout/${modelShort}.jsonl`,
+      errorsPath: `${DATA}/annotations/bench-holdout/${modelShort}-errors.jsonl`,
+      sessionsPath: SESSIONS_PATH,
+      concurrency,
+      limit,
+    });
+  }
+}
+
 async function cmdCost(): Promise<void> {
   const modelCosts: Record<string, number> = {};
   const stageCosts: Record<string, number> = {};
@@ -359,6 +429,12 @@ switch (command) {
   case "label:golden":
     await cmdGolden();
     break;
+  case "label:bench-holdout":
+    await cmdBenchHoldout();
+    break;
+  case "label:bench-holdout-all":
+    await cmdBenchHoldoutAll();
+    break;
   case "label:cost":
     await cmdCost();
     break;
diff --git a/ts/src/label/golden.ts b/ts/src/label/golden.ts
index 282617c..9e7dca9 100644
--- a/ts/src/label/golden.ts
+++ b/ts/src/label/golden.ts
@@ -74,6 +74,8 @@ export interface GoldenBatchOpts {
   limit?: number;
   /** Delay between requests in ms. Default 1000 (1 req/s). */
   delayMs?: number;
+  /** Number of concurrent workers. Default 1 (serial). */
+  concurrency?: number;
 }
 
 /** Build the enhanced system prompt: full codebook + v2.5 operational prompt + JSON schema. */
@@ -138,6 +140,9 @@ async function annotateGolden(
     outputTokens: 0,
   };
 
+  // Prevent git pull and other non-essential traffic when running concurrently
+  process.env.CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC = "1";
+
   for await (const message of query({
     prompt: buildUserPrompt(paragraph),
     options: {
@@ -150,10 +155,11 @@ async function annotateGolden(
       // No tools — pure classification
      allowedTools: [],
      disallowedTools: ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebSearch", "WebFetch", "Agent", "AskUserQuestion"],
-      // Isolation: no hooks, no settings, no session persistence
+      // Isolation: no hooks, no settings, no session persistence, no plugins
      hooks: {},
      settingSources: [],
      persistSession: false,
+      plugins: [],
       // Single-turn: one prompt → one structured response
      maxTurns: 1,
      permissionMode: "dontAsk",
@@ -242,14 +248,15 @@
 }
 
 /**
- * Run golden set annotation: serial 1-req/s through the Agent SDK.
+ * Run golden set annotation through the Agent SDK.
+ * Supports concurrent workers for parallelism.
 * Crash-safe with JSONL checkpoint resume.
*/ export async function runGoldenBatch( paragraphs: Paragraph[], opts: GoldenBatchOpts, ): Promise { - const { outputPath, errorsPath, limit, delayMs = 1000 } = opts; + const { outputPath, errorsPath, limit, delayMs = 1000, concurrency = 1 } = opts; const runId = uuidv4(); // Build system prompt once (codebook + operational prompt) @@ -272,69 +279,86 @@ export async function runGoldenBatch( } process.stderr.write( - ` Starting golden annotation │ Opus 4.6 (Agent SDK) │ ${total} remaining of ${paragraphs.length}\n`, + ` Starting golden annotation │ Opus 4.6 (Agent SDK) │ ${total} remaining of ${paragraphs.length} │ concurrency=${concurrency}\n`, ); let processed = 0; let errored = 0; const startTime = Date.now(); + // Serialized file writes to prevent corruption + let writeQueue = Promise.resolve(); + function safeAppend(path: string, data: object) { + writeQueue = writeQueue.then(() => appendJsonl(path, data)); + return writeQueue; + } + // Graceful shutdown let stopping = false; const onSignal = () => { if (stopping) return; stopping = true; - process.stderr.write("\n ⏸ Stopping — finishing current request...\n"); + process.stderr.write("\n ⏸ Stopping — finishing in-flight requests...\n"); }; process.on("SIGINT", onSignal); process.on("SIGTERM", onSignal); - for (const paragraph of remaining) { - if (stopping) break; + // Dashboard refresh + function renderStatus() { + const elapsed = (Date.now() - startTime) / 1000; + const rate = elapsed > 0 ? (processed / elapsed) * 60 : 0; + const etaMin = rate > 0 ? 
Math.round((total - processed) / rate) : 0; + process.stderr.write( + `\x1b[2K\r ${processed}/${total} (${((processed / total) * 100).toFixed(1)}%) │ ${rate.toFixed(1)} para/min │ ETA ${etaMin}m │ ${errored} errors`, + ); + } + const dashboardInterval = setInterval(renderStatus, 2000); - try { - const annotation = await annotateGolden(paragraph, runId, systemPrompt); - await appendJsonl(outputPath, annotation); - processed++; + // Worker pool: N concurrent workers pulling from shared queue + let nextIdx = 0; + async function worker() { + while (nextIdx < remaining.length && !stopping) { + const idx = nextIdx++; + const paragraph = remaining[idx]!; - if (processed % 10 === 0 || processed === total) { - const elapsed = (Date.now() - startTime) / 1000; - const rate = (processed / elapsed) * 60; - const etaMin = Math.round((total - processed) / rate); - process.stderr.write( - ` ${processed}/${total} (${((processed / total) * 100).toFixed(1)}%) │ ${rate.toFixed(1)} para/min │ ETA ${etaMin}m │ ${errored} errors\n`, - ); + try { + const annotation = await annotateGolden(paragraph, runId, systemPrompt); + await safeAppend(outputPath, annotation); + processed++; + } catch (error) { + errored++; + await safeAppend(errorsPath, { + paragraphId: paragraph.id, + error: error instanceof Error ? error.message : String(error), + modelId: "anthropic/claude-opus-4-6", + timestamp: new Date().toISOString(), + }); + + if (errored >= 10 && processed === 0) { + stopping = true; + process.stderr.write("\n ✖ 10 errors with no successes. Stopping.\n"); + } } - } catch (error) { - errored++; - await appendJsonl(errorsPath, { - paragraphId: paragraph.id, - error: error instanceof Error ? error.message : String(error), - modelId: "anthropic/claude-opus-4-6", - timestamp: new Date().toISOString(), - }); - process.stderr.write( - ` ✖ Error on ${paragraph.id}: ${error instanceof Error ? 
error.message : String(error)}\n`, - ); - - // 5 consecutive errors with no successes = likely systemic - if (errored >= 5 && processed === 0) { - process.stderr.write(" ✖ 5 errors with no successes. Stopping.\n"); - break; + // Per-worker delay between requests + if (!stopping) { + await new Promise((r) => setTimeout(r, delayMs)); } } - - // Rate limit: 1 req/s - if (!stopping) { - await new Promise((r) => setTimeout(r, delayMs)); - } } + const workers = Array.from( + { length: Math.min(concurrency, remaining.length) }, + () => worker(), + ); + await Promise.all(workers); + // Cleanup + clearInterval(dashboardInterval); process.off("SIGINT", onSignal); process.off("SIGTERM", onSignal); + renderStatus(); process.stderr.write( `\n ✓ Golden annotation done: ${processed} processed, ${errored} errors\n`, );