SEC-cyBERT/docs/LABELING-CODEBOOK.md
2026-04-04 15:01:20 -04:00

# Labeling Codebook — SEC Cybersecurity Disclosure Quality (v2)
This codebook is the authoritative reference for all human and GenAI labeling. Every annotator (human or model) must follow these definitions exactly.

---
## Classification Design
**Unit of analysis:** One paragraph from an SEC filing (Item 1C of 10-K, or Item 1.05/8.01/7.01 of 8-K). Each paragraph is classified in isolation — do not use context from other paragraphs in the same filing.
**Each paragraph receives two labels:**
1. **Content Category** — one of 7 mutually exclusive classes
2. **Specificity Level** — ordinal integer from 1 to 4
Each paragraph receives exactly one content category (single-label, not multi-label).

---
## Dimension 1: Content Category
### Primary Test
For every paragraph, ask: **"What question does this paragraph primarily answer?"**
| Question | Category |
|----------|----------|
| How does the board oversee cybersecurity? | Board Governance |
| How is management organized to handle cybersecurity? | Management Role |
| What does the cybersecurity program do? | Risk Management Process |
| How are third-party cyber risks managed? | Third-Party Risk |
| What happened in a cybersecurity incident? | Incident Disclosure |
| How does cybersecurity affect the business or finances? | Strategy Integration |
| *(none of the above / no substantive disclosure)* | None/Other |
If a paragraph touches multiple categories, assign the one whose question the paragraph most directly answers. When genuinely split, the category whose content occupies the most text wins.
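The tiebreak can be sketched as a small helper. This is only an illustration: `CategorySpan` and `dominantCategory` are hypothetical names, the spans are assumed to have been attributed to categories by the annotator already, and character count is assumed as the proxy for "occupies the most text."

```typescript
// Hypothetical input: text spans an annotator has already attributed to a category.
interface CategorySpan {
  category: string;
  text: string;
}

// Returns the category with the largest total character count, or null for no spans.
// Exact ties resolve to the category seen first; the codebook does not define that case.
function dominantCategory(spans: CategorySpan[]): string | null {
  const totals = new Map<string, number>();
  for (const span of spans) {
    totals.set(span.category, (totals.get(span.category) ?? 0) + span.text.length);
  }
  let best: string | null = null;
  let bestLength = -1;
  totals.forEach((length, category) => {
    if (length > bestLength) {
      best = category;
      bestLength = length;
    }
  });
  return best;
}
```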
---
### Board Governance (BG)
- **SEC basis:** Item 106(c)(1)
- **Answers:** "How does the board oversee cybersecurity?"
- **Covers:** Board or committee oversight of cybersecurity risks, briefing frequency and scope, board member cybersecurity expertise, delegation of oversight responsibilities, how information flows to the board
- **Key markers:** "Board of Directors oversees," "Audit Committee," "quarterly briefings to the board," "board-level expertise," "board committee," "reports to the Board"
- **Includes:** Governance-chain paragraphs (Board → Committee → Officer → Program) where the purpose is describing the oversight structure
> *"The Audit Committee receives quarterly reports from the CISO on the Company's cybersecurity posture, including threat landscape assessments and vulnerability management results."*
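> → **BG, Specificity 3**: answers "how does the board oversee?" The CISO and program details are subordinate to the reporting structure. "CISO" is a cybersecurity-specific title on the firm-specific (IS) list, which lifts specificity to 3; "Audit Committee" and "quarterly" do not count.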
---
### Management Role (MR)
- **SEC basis:** Item 106(c)(2)
- **Answers:** "How is management organized to handle cybersecurity?"
- **Covers:** Cybersecurity leadership roles and responsibilities, qualifications and credentials, career history and experience, management-level committee structure and membership, reporting lines between management roles, team composition and size
- **Key markers:** "CISO," "reports to the CIO," "years of experience," "CISSP," "management committee," named individuals, career background, team size
- **Assign when:** The paragraph's primary content is about management's organizational role in cybersecurity — who holds responsibilities, how those responsibilities are divided, what qualifies those people, or how management-level oversight is structured
**Person-removal test (MR vs. RMP):** Remove all person-specific content (names, titles, qualifications, experience, reporting lines, team composition, committee membership). If the remaining text still describes a substantive cybersecurity program → **RMP**. If the paragraph collapses to near-nothing → **MR**.
> *"Our CFO and VP of IT jointly oversee our cybersecurity program. The CFO is responsible for risk governance and insurance, while the VP of IT manages technical operations. They report to the board quarterly on cybersecurity matters."*
> → **MR, Specificity 3** — answers "how is management organized?" Role allocation and reporting structure are the substance.
> *"Our CISO oversees a cybersecurity program that includes vulnerability scanning, penetration testing, and incident response planning aligned with NIST CSF."*
> → **RMP, Specificity 3**: person-removal test: "a cybersecurity program that includes vulnerability scanning, penetration testing, and incident response planning aligned with NIST CSF" → still a complete program description. The CISO is attribution, not content, for the category decision, but "CISO" is still a firm-specific title for specificity.
---
### Risk Management Process (RMP)
- **SEC basis:** Item 106(b)
- **Answers:** "What does the cybersecurity program do?"
- **Covers:** Risk assessment methodology, framework adoption, vulnerability management, security monitoring, incident response planning, security operations, tools and technologies, employee training programs, ERM integration
- **Key markers:** "NIST CSF," "ISO 27001," "vulnerability management," "penetration testing," "SOC," "SIEM," "incident response," "tabletop exercises"
- **Assign when:** The paragraph's primary content describes cybersecurity activities, processes, tools, or methodologies — regardless of who is mentioned as responsible
> *"We maintain a cybersecurity program aligned with the NIST Cybersecurity Framework. The program includes regular vulnerability assessments, penetration testing, and 24/7 monitoring through our Security Operations Center."*
> → **RMP, Specificity 2** — answers "what does the program do?" Domain terminology present but nothing unique to THIS company.
---
### Third-Party Risk (TP)
- **SEC basis:** Item 106(b)
- **Answers:** "How are third-party cyber risks managed?"
- **Covers:** Vendor/supplier cybersecurity oversight, external assessor requirements, contractual security requirements, supply chain risk management
- **Key markers:** "third-party," "service providers," "vendor risk," "SOC 2 report," "contractual requirements," "supply chain"
- **Assign when:** The central topic is oversight of external parties' cybersecurity
**TP vs. RMP:** A firm hired to assess the company's OWN security → RMP (the firm serves the company's program). Requirements imposed ON vendors, or assessment of vendors' cybersecurity → TP.
> *"We require all critical vendors to maintain SOC 2 Type II certification and conduct annual security assessments of our top 50 service providers."*
> → **TP, Specificity 4** — requirements imposed on vendors; "50 service providers" is a QV-eligible fact.
---
### Incident Disclosure (ID)
- **SEC basis:** 8-K Item 1.05 (and 8.01/7.01 post-May 2024)
- **Answers:** "What happened in a cybersecurity incident?"
- **Covers:** Description of actual cybersecurity incidents — nature, scope, timing, impact, remediation, investigation
- **Key markers:** "unauthorized access," "detected," "incident," "breach," "forensic investigation," "remediation," "compromised"
- **Assign when:** The paragraph describes events that actually occurred
- **Not for:** Hypothetical incident language ("we may experience...") in a 10-K → classify by actual content (usually RMP or SI)
> *"On January 15, 2024, we detected unauthorized access to our customer support portal. We activated our incident response plan and engaged Mandiant for forensic investigation."*
> → **ID, Specificity 4** — describes what happened; January 15, 2024 (specific date) and Mandiant (named third-party firm) are QV-eligible facts.
---
### Strategy Integration (SI)
- **SEC basis:** Item 106(b)(2)
- **Answers:** "How does cybersecurity affect the business or finances?"
- **Covers:** Materiality assessments, cybersecurity insurance, budget/investment allocation, cost of incidents, business strategy impact
- **Key markers:** "materially affected," "reasonably likely to materially affect," "insurance," "budget," "investment"
**Materiality assessment rule:** A paragraph that states a CONCLUSION about whether cybersecurity has or will affect business outcomes is SI — regardless of how generic the language is.
| Language | Type | Category |
|----------|------|----------|
| "Have not materially affected our business strategy, results of operations, or financial condition" | Backward-looking assessment | **SI** |
| "Are reasonably likely to materially affect" | Forward-looking assessment (SEC's Item 106(b)(2) language) | **SI** |
| "We have not experienced any material cybersecurity incidents" | Negative assertion with materiality framing | **SI** |
| "Could have a material adverse effect on our business" | Speculation — every 10-K says this | **Not SI** — classify by other content, or N/O |
| "Managing material risks" | "Material" as adjective, not an assessment | **Not SI** |
| "Risks that may materially affect... see Item 1A" | Describes what another section covers | **N/O** |
**The test:** Is the company STATING A CONCLUSION about materiality? "Reasonably likely" signals an assessment (SEC's required language). Bare "could" or "may" is speculation.
> *"Risks from cybersecurity threats have not materially affected, and are not reasonably likely to materially affect, our business strategy, results of operations, or financial condition."*
> → **SI, Specificity 1** — materiality assessment, but boilerplate language.
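As a rough illustration of the test, the sketch below encodes the table rows as keyword patterns. It is a heuristic under stated assumptions, not a substitute for annotator judgment: `materialitySignal` and the regular expressions are hypothetical and cover only the phrasings quoted in this section.

```typescript
// Hypothetical helper: keyword heuristic for the materiality test.
// The patterns are illustrative assumptions taken from the table rows in this section.
type MaterialitySignal = "assessment" | "cross-reference" | "speculation" | "none";

const ASSESSMENT_PATTERNS: RegExp[] = [
  /\b(?:have|has) not materially affected\b/i,
  /\breasonably likely to materially affect\b/i,
  /\bnot experienced any material cybersecurity incidents?\b/i,
];

// Bare "could" / "may" followed by "material(ly)" in the same sentence.
const SPECULATION_PATTERN = /\b(?:could|may|might)\b[^.]*\bmaterial(?:ly)?\b/i;

// A pointer to the risk factors section.
const CROSS_REFERENCE_PATTERN = /\bsee\b[^.]*\bItem 1A\b/i;

function materialitySignal(paragraph: string): MaterialitySignal {
  // A stated conclusion wins even when a cross-reference follows it (SI).
  if (ASSESSMENT_PATTERNS.some((p) => p.test(paragraph))) return "assessment";
  // A pure pointer elsewhere is N/O even if it mentions materiality.
  if (CROSS_REFERENCE_PATTERN.test(paragraph)) return "cross-reference";
  // Speculation is not SI; classify by other content, or N/O.
  if (SPECULATION_PATTERN.test(paragraph)) return "speculation";
  return "none";
}
```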
---
### None/Other (N/O)
- **Answers:** None of the six substantive questions
- **Covers:** Forward-looking disclaimers, section headers, cross-references, general business language, non-cybersecurity content, text extracted from outside Item 1C/1.05
- **Always receives Specificity 1**
**SPACs and no-operations companies:** Companies with no cybersecurity program receive N/O regardless of incidental mentions of board oversight or risk acknowledgment. The absence of a program is not a disclosure.
**N/O vs. SI:** A cross-reference is N/O even if it mentions materiality. "For risks that may materially affect us, see Item 1A" → N/O (pointing elsewhere). "Risks have not materially affected us. See Item 1A" → SI (the first sentence IS an assessment; the cross-reference is subordinate).
**N/O vs. RMP:** Generic risk language without cybersecurity-specific content is N/O. But if a paragraph describes actual cybersecurity measures ("we have implemented network monitoring and access controls"), it is RMP — even if the framing is generic.
> *"This Annual Report contains forward-looking statements within the meaning of Section 27A of the Securities Act."*
> → **N/O, Specificity 1**
> *"We are a special purpose acquisition company with no business operations. We have not adopted any cybersecurity risk management program."*
> → **N/O, Specificity 1** — no substantive disclosure; absence of a program is not a program description.
---
## Category Decision Rules
### Rule 1: Board Governance vs. Management Role
**Governance-chain paragraphs** (Board → Committee → Officer → Program) default to **BG** when the purpose is describing the oversight structure. They become MR only when management's organizational role — responsibilities, qualifications, committee membership — is the primary content.
| Pattern | Category |
|---------|----------|
| Board/committee oversees, receives reports, delegates | BG |
| Management reports TO the board (describing oversight flow) | BG |
| Management roles, responsibilities, and how they're divided | MR |
| Person's qualifications, credentials, experience | MR |
| Management-level committee structure and membership | MR |
| Board-level committee (Audit Committee, Risk Committee of the Board) | BG |
| Management-level committee (Cybersecurity Steering Committee, Security Council) | MR (if about structure/membership) or RMP (if about activities) |
### Rule 2: Management Role vs. Risk Management Process
Apply the **person-removal test**: remove all person-specific content (names, titles, qualifications, experience, reporting lines, team composition, committee membership). If a substantive cybersecurity program description remains → **RMP**. If the paragraph loses its substance → **MR**.
**The core distinction:** MR is about **roles** — who is responsible, how they're organized, what qualifies them. RMP is about **activities** — what the program does, how it operates, what tools and frameworks it uses.
| Signal | Category |
|--------|----------|
| Role title as brief attribution, then program details | RMP |
| Role allocation, responsibilities divided among people | MR |
| Reporting structure between management roles | MR |
| Qualifications, credentials, experience, career history | MR |
| Activities, tools, frameworks, processes as primary content | RMP |
| "Under the direction of our CISO, the Company has implemented..." | RMP (program is the content) |
### Rule 3: Third-Party Risk vs. Risk Management Process
| Signal | Category |
|--------|----------|
| Company's own internal processes, tools, teams | RMP |
| Third parties hired to serve the company (assessors, consultants, pen testers) | RMP |
| Requirements imposed on vendors | TP |
| Assessment of vendor cybersecurity posture | TP |
| Third parties mentioned as one component of an internal program | RMP |
| Vendor oversight as the central topic | TP |
### Rule 4: Incident Disclosure vs. Strategy Integration
| Signal | Category |
|--------|----------|
| What happened (timeline, scope, response actions) | ID |
| Business/financial impact of an incident | SI |
| Mixed — incident with brief cost mention | ID (incident frame dominates) |
| Mixed — financial analysis with brief incident reference | SI (business frame dominates) |
### Rule 5: None/Other Threshold
N/O only when the paragraph contains no substantive cybersecurity disclosure. If a paragraph describes any actual cybersecurity measure, process, or assessment — however generic — assign the relevant substantive category.
**Exceptions:**
- SPACs and no-operations companies with no real program → N/O
- Pure speculation ("could have a material adverse effect") with no substantive disclosure → N/O
- Regulatory compliance acknowledged generically ("subject to various regulations") without describing what the company does to comply → N/O
---
## Dimension 2: Specificity Level
### Decision Test
Check in order. Stop at the first "yes."
1. **Does it contain at least one QV-eligible fact?** → **Level 4: Quantified-Verifiable**
2. **Does it contain at least one firm-specific fact (IS list)?** → **Level 3: Firm-Specific**
3. **Does it use any cybersecurity domain terminology (Domain list)?** → **Level 2: Domain-Adapted**
4. **None of the above** → **Level 1: Generic Boilerplate**
None/Other paragraphs always receive Specificity 1.
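The ordered check can be written as a short function. This is a minimal sketch: the `SpecificityEvidence` flags are hypothetical and assume that someone (annotator or upstream extractor) has already decided whether each kind of fact is present.

```typescript
// Hypothetical evidence flags, assumed to be set before the decision test runs.
interface SpecificityEvidence {
  isNoneOther: boolean;         // content category is None/Other
  hasQvFact: boolean;           // at least one QV-eligible fact
  hasFirmSpecificFact: boolean; // at least one fact from the IS list
  hasDomainTerm: boolean;       // at least one term from the Domain list
}

type SpecificityLevel = 1 | 2 | 3 | 4;

// Checks in order and stops at the first "yes".
function specificityLevel(e: SpecificityEvidence): SpecificityLevel {
  if (e.isNoneOther) return 1; // None/Other always receives Specificity 1
  if (e.hasQvFact) return 4;
  if (e.hasFirmSpecificFact) return 3;
  if (e.hasDomainTerm) return 2;
  return 1;
}
```

Because the checks run from the highest level down, a paragraph with both a domain term and a QV-eligible fact lands on Level 4 without any extra logic.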
### Level Definitions
| Level | Name | Description |
|-------|------|-------------|
| 1 | Generic Boilerplate | Could paste into any company's filing unchanged. Uses only general business/risk language. No cybersecurity domain terminology, no firm-specific details, no verifiable facts. |
| 2 | Domain-Adapted | Uses cybersecurity domain terminology that a security professional would recognize as industry-specific, but contains nothing unique to THIS company. |
| 3 | Firm-Specific | Contains at least one fact identifying something unique to THIS company's cybersecurity posture — a named role, committee, program, or organizational detail not found at every public company. |
| 4 | Quantified-Verifiable | Contains at least one hard fact that an external party could independently verify — a specific number, date, named external entity, or externally-issued credential. |
---
### Domain Terminology — any one → at least Level 2
These terms originate from the cybersecurity domain. The test: **would this term appear naturally in a generic enterprise risk management document that has nothing to do with cybersecurity?** If no → it is domain terminology.
**Practices and activities:**
- Penetration testing / pen testing
- Vulnerability scanning / vulnerability assessments
- Red teaming / red team exercises
- Phishing simulations
- Security awareness training
- Threat hunting / threat intelligence
- Patch management
- Identity and access management (IAM)
- Data loss prevention (DLP)
- Network segmentation
- Encryption (as a security measure)
**Tool and infrastructure categories:**
- SIEM (Security Information and Event Management)
- SOC (Security Operations Center)
- EDR / XDR / MDR (endpoint/extended/managed detection and response)
- WAF (Web Application Firewall)
- IDS / IPS (Intrusion Detection/Prevention System)
- MFA / 2FA (Multi-Factor Authentication)
- Firewall (as a security control)
- Antivirus / anti-malware
**Architectural concepts:**
- Zero trust / zero trust architecture
- Defense in depth
- Least privilege / principle of least privilege
**Named standards and frameworks:**
- NIST CSF / NIST Cybersecurity Framework
- ISO 27001 / ISO 27002
- SOC 2 (Type I / Type II)
- CIS Controls / CIS Benchmarks
- PCI DSS, HIPAA, GDPR (in cybersecurity context)
- COBIT, MITRE ATT&CK
**Specific threat types:**
- Ransomware, malware, phishing (as a threat)
- DDoS (Distributed Denial of Service)
- Supply chain attack / compromise
- Social engineering
- Advanced persistent threat (APT)
- Zero-day vulnerability / exploit
### NOT Domain Terminology — remains Level 1
These terms belong to general business, IT, or enterprise risk management.
- Risk assessment / risk management / risk mitigation
- Incident response plan / IRP
- Business continuity plan / disaster recovery
- Tabletop exercises (without cybersecurity qualifier)
- Enterprise risk management (ERM)
- Internal controls / policies and procedures
- Compliance (general)
- "Processes to identify, assess, and manage risks"
- "Measures to protect our systems and data"
- "Regular monitoring" / "continuous improvement"
- "Cross-functional team"
- "Dedicated cybersecurity team" (organizational approach, not a unique fact)
---
### Firm-Specific Facts — any one → at least Level 3
These identify something unique to THIS company's cybersecurity posture.
**IS firm-specific:**
- **Cybersecurity-specific titles:** CISO, CTO, CIO, VP of IT/Security, Information Security Officer, Director of IT Security, Cybersecurity Director, Chief Digital Officer (when overseeing cyber)
- **Named non-generic committees:** Technology Committee, Cybersecurity Committee, Cybersecurity Steering Committee, Risk Committee (NOT "Audit Committee" — every public company has one)
- **Specific team/department compositions:** "Legal, Compliance, and Finance" (NOT "a cross-functional team")
- **Named internal programs with distinguishing identifiers:** "Cyber Incident Response Plan (CIRP)" (generic "incident response plan" does not qualify)
- **Named individuals** in a cybersecurity role context
- **Specific organizational claims:** "24/7 security operations" (implies specific organizational investment beyond generic monitoring)
**NOT firm-specific (too generic):**
- "The Board," "Board of Directors," "Audit Committee," "management" — exist at every public company
- CEO, CFO, COO, President, General Counsel — not cybersecurity-specific roles
- "Head of IT," "IT Manager," "Director of IT" — general IT titles, not cybersecurity leadership
- "Third-party experts," "external consultants," "cybersecurity firms" — unnamed entities
- "Quarterly," "annual," "regular" — generic cadences without specific dates
- "The Company," company self-references, subsidiary names
- Generic program names without distinguishing identifiers: "incident response plan," "cybersecurity program," "risk management program"
---
### QV-Eligible Facts — any one → Level 4
A QV fact is one that an external party could independently verify using public records, certification databases, or third-party sources. The test: **is it both quantified (a specific number, date, or named external reference) AND independently verifiable?**
**IS QV-eligible:**
- **Specific numbers:** dollar amounts, headcounts, percentages, years of experience (as a number), team sizes, specific durations
- **Specific dates:** month+year or exact date tied to a cybersecurity fact or event
- **Named external entities:** third-party firms (Mandiant, Deloitte, CrowdStrike), products/tools (Splunk, CrowdStrike Falcon, Azure Sentinel, ServiceNow)
- **Certifications held by individuals:** CISSP, CISM, CEH, CRISC (verifiable via issuing body)
- **Certifications/audits held by the company:** "We maintain ISO 27001 certification," "completed SOC 2 Type II audit" (verifiable external claim)
- **Named universities in credential context:** "Ph.D. from Princeton University"
**NOT QV-eligible:**
- Named roles (CISO, CTO) → firm-specific (Level 3), not a quantified claim
- Named individuals without verifiable details → identification, not a quantified claim
- Named committees → organizational structure, not externally verifiable
- Named internal programs → firm naming, not external verification
- Named standards FOLLOWED (not certified): "aligned with NIST CSF" → Domain-Adapted (Level 2)
- Generic cadences: "quarterly," "annually" → not specific enough to verify
- Fiscal year as generic reporting context: "fiscal 2024" without a specific cybersecurity fact tied to it
**Certification distinction:**
- "Our program is aligned with ISO 27001" → **Level 2** (references a standard)
- "We are working toward ISO 27001 certification" → **Level 3** (firm-specific intent)
- "We maintain ISO 27001 certification" → **Level 4** (verifiable claim with external body)
---
### Validation Step
Before finalizing specificity:
1. Identify all facts in the paragraph
2. Check each against the NOT lists — remove any that appear
3. Classify the remaining facts: QV-eligible → Level 4, firm-specific → Level 3, domain terminology → Level 2
4. Apply the decision test with validated facts only
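The four steps above can be sketched as a filter followed by a maximum. The names here (`CandidateFact`, `NOT_LIST`, `validatedSpecificity`) are hypothetical, and `NOT_LIST` is only a small illustrative subset of the NOT lists in this codebook.

```typescript
type FactKind = "qv" | "firm-specific" | "domain";
type SpecificityLevel = 1 | 2 | 3 | 4;

// Step 1: facts identified in the paragraph, with a tentative kind.
interface CandidateFact {
  text: string;
  kind: FactKind;
}

// Illustrative subset of the NOT lists; a real list would carry every entry.
const NOT_LIST = new Set<string>([
  "audit committee",
  "board of directors",
  "quarterly",
  "annually",
  "incident response plan",
  "risk assessment",
  "cross-functional team",
]);

const LEVEL_BY_KIND: Record<FactKind, SpecificityLevel> = {
  qv: 4,
  "firm-specific": 3,
  domain: 2,
};

function validatedSpecificity(facts: CandidateFact[]): SpecificityLevel {
  // Step 2: drop anything that appears on a NOT list.
  const kept = facts.filter((f) => !NOT_LIST.has(f.text.trim().toLowerCase()));
  // Steps 3 and 4: classify what remains and take the highest level, defaulting to 1.
  return kept.reduce<SpecificityLevel>(
    (best, f) => (LEVEL_BY_KIND[f.kind] > best ? LEVEL_BY_KIND[f.kind] : best),
    1,
  );
}
```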
---
## LLM Response Schema
```typescript
import { z } from "zod";

// The seven mutually exclusive content categories (Dimension 1).
export const ContentCategory = z.enum([
  "Board Governance",
  "Management Role",
  "Risk Management Process",
  "Third-Party Risk",
  "Incident Disclosure",
  "Strategy Integration",
  "None/Other",
]);

// Ordinal specificity level (Dimension 2), 1 through 4.
export const SpecificityLevel = z.union([
  z.literal(1),
  z.literal(2),
  z.literal(3),
  z.literal(4),
]);

// Self-reported confidence, recorded separately for each dimension.
export const Confidence = z.enum(["high", "medium", "low"]);

// One label record per paragraph.
export const LabelOutput = z.object({
  content_category: ContentCategory
    .describe("The single most applicable content category"),
  specificity_level: SpecificityLevel
    .describe("1=generic, 2=domain-adapted, 3=firm-specific, 4=quantified-verifiable"),
  category_confidence: Confidence
    .describe("high=clear-cut, medium=some ambiguity, low=genuinely torn"),
  specificity_confidence: Confidence
    .describe("high=clear-cut, medium=borderline, low=could argue 2+ levels"),
  reasoning: z.string()
    .describe("1-2 sentence justification citing specific evidence from the text"),
});
```
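The schema checks each field on its own, so it does not enforce the rule that None/Other always receives Specificity 1. A minimal sketch of that cross-field check follows, written against plain types so it runs without zod; `Label` and `crossFieldErrors` are hypothetical names, and in the zod schema the same rule could presumably be attached as a refinement.

```typescript
// Plain-type mirror of the two label fields involved in the rule (hypothetical names).
type ContentCategory =
  | "Board Governance"
  | "Management Role"
  | "Risk Management Process"
  | "Third-Party Risk"
  | "Incident Disclosure"
  | "Strategy Integration"
  | "None/Other";

interface Label {
  content_category: ContentCategory;
  specificity_level: 1 | 2 | 3 | 4;
}

// Returns human-readable violations; an empty array means the label is consistent.
function crossFieldErrors(label: Label): string[] {
  const errors: string[] = [];
  if (label.content_category === "None/Other" && label.specificity_level !== 1) {
    errors.push("None/Other paragraphs must have specificity_level 1");
  }
  return errors;
}
```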
---
## Annotator Information Template
Each paragraph is presented with this context:
```
Company: {company_name} ({ticker})
Filing type: {filing_type}
Filing date: {filing_date}
Section: {sec_item}
Paragraph:
{paragraph_text}
```
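Filling the template is plain string substitution. The sketch below assumes a hypothetical `ParagraphContext` record whose field names match the placeholders; the function name is likewise an assumption.

```typescript
// Hypothetical record; field names mirror the template placeholders.
interface ParagraphContext {
  company_name: string;
  ticker: string;
  filing_type: string;
  filing_date: string;
  sec_item: string;
  paragraph_text: string;
}

// Renders the context block exactly in the line order of the template.
function renderAnnotatorContext(c: ParagraphContext): string {
  return [
    `Company: ${c.company_name} (${c.ticker})`,
    `Filing type: ${c.filing_type}`,
    `Filing date: ${c.filing_date}`,
    `Section: ${c.sec_item}`,
    "Paragraph:",
    c.paragraph_text,
  ].join("\n");
}
```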
---
## Gold Set Protocol
### Sampling
**Stratified by category** with specificity floor:
- ~170 paragraphs per content category × 7 = ~1,190
- Random within each stratum (not selected for difficulty)
- Secondary constraint: minimum ~100 per specificity level across the full set
- Separate development set (~200 paragraphs) for prompt iteration — excluded from holdout
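With 170 paragraphs in each of 7 categories, the holdout comes to 170 × 7 = 1,190. The sampling plan might be sketched as below. All names are hypothetical, the random source is injectable so a run can be made reproducible, and how to resample when a specificity level falls under the floor is left open because the codebook does not specify it.

```typescript
// Draws up to `perCategory` items at random from each category stratum.
function stratifiedSample<T extends { category: string }>(
  pool: T[],
  perCategory: number,
  rng: () => number = Math.random,
): T[] {
  const strata = new Map<string, T[]>();
  for (const item of pool) {
    const bucket = strata.get(item.category) ?? [];
    bucket.push(item);
    strata.set(item.category, bucket);
  }
  const sample: T[] = [];
  strata.forEach((bucket) => {
    // Fisher-Yates shuffle, then keep the first `perCategory` items.
    for (let i = bucket.length - 1; i > 0; i--) {
      const j = Math.floor(rng() * (i + 1));
      const tmp = bucket[i];
      bucket[i] = bucket[j];
      bucket[j] = tmp;
    }
    sample.push(...bucket.slice(0, perCategory));
  });
  return sample;
}

// Secondary constraint: lists the specificity levels that fall short of the floor.
function levelsBelowFloor(sample: { specificity: number }[], floor: number): number[] {
  return [1, 2, 3, 4].filter(
    (level) => sample.filter((s) => s.specificity === level).length < floor,
  );
}
```

Calling `stratifiedSample(pool, 170)` and then `levelsBelowFloor(sample, 100)` would report which levels still need attention before the set is frozen.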
### Human Labeling
1. Three independent annotators label the full holdout using this codebook
2. Compute inter-rater reliability:
   - Cohen's Kappa (category, pairwise) — target > 0.75
   - Krippendorff's Alpha (specificity, ordinal) — target > 0.67
3. Gold labels = majority vote. Where all three disagree, model consensus serves as tiebreaker.
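Two of these steps are simple enough to sketch: the majority vote and pairwise Cohen's Kappa, computed as (observed agreement − chance agreement) / (1 − chance agreement). The function names are hypothetical, and Krippendorff's Alpha is not covered here.

```typescript
// Returns the label at least two annotators agree on, or null on a three-way split
// (the case where model consensus serves as tiebreaker).
function majorityVote<T>(votes: [T, T, T]): T | null {
  const [a, b, c] = votes;
  if (a === b || a === c) return a;
  if (b === c) return b;
  return null;
}

// Cohen's Kappa for two raters over the same paragraphs (nominal labels).
function cohenKappa(rater1: string[], rater2: string[]): number {
  if (rater1.length !== rater2.length || rater1.length === 0) {
    throw new Error("raters must label the same non-empty set of paragraphs");
  }
  const n = rater1.length;
  const counts1 = new Map<string, number>();
  const counts2 = new Map<string, number>();
  let agree = 0;
  for (let i = 0; i < n; i++) {
    if (rater1[i] === rater2[i]) agree++;
    counts1.set(rater1[i], (counts1.get(rater1[i]) ?? 0) + 1);
    counts2.set(rater2[i], (counts2.get(rater2[i]) ?? 0) + 1);
  }
  const observed = agree / n;
  // Chance agreement: sum over labels of the product of each rater's marginal rate.
  let expected = 0;
  counts1.forEach((count, label) => {
    expected += (count / n) * ((counts2.get(label) ?? 0) / n);
  });
  // If both raters used a single identical label, agreement is trivially perfect.
  return expected === 1 ? 1 : (observed - expected) / (1 - expected);
}
```

With three annotators, the pairwise statistic would be computed for each of the three rater pairs and each compared against the 0.75 target.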
### AI-Labeled Extension
Up to 20,000 additional paragraphs are labeled by model panel consensus for supplementary evaluation. This extension is not the assignment-defined holdout.

---
## NIST CSF 2.0 Mapping
| Our Category | NIST CSF 2.0 |
|-------------|-------------|
| Board Governance | GOVERN (GV.OV, GV.RR) |
| Management Role | GOVERN (GV.RR, GV.RM) |
| Risk Management Process | IDENTIFY (ID.RA), GOVERN (GV.RM), PROTECT (all) |
| Third-Party Risk | GOVERN (GV.SC) |
| Incident Disclosure | DETECT, RESPOND, RECOVER |
| Strategy Integration | GOVERN (GV.OC, GV.RM) |