
# Codebook Rationale & Interpretive Guide

Companion to `LABELING-CODEBOOK.md`. Covers the "why" behind design decisions and common interpretive pitfalls that aren't obvious from the codebook itself.

---

## Category Design: Mapping to SEC Regulation S-K Item 106

The six substantive categories follow the structure of the SEC's cybersecurity disclosure rules (adopted July 2023). Five draw on Regulation S-K Item 106; Incident Disclosure draws on Form 8-K Item 1.05:

| Codebook Category | SEC Basis | What the SEC is asking |
|---|---|---|
| Board Governance | Item 106(c)(1) | How does the board oversee cyber risk? |
| Management Role | Item 106(c)(2) | Who in management is responsible, and what qualifies them? |
| Risk Management Process | Item 106(b) | What processes do you use to assess, identify, and manage cyber risk? |
| Third-Party Risk | Item 106(b) | How do you handle vendor/supply chain cyber risk? |
| Strategy Integration | Item 106(b)(2) | Has cyber risk materially affected your business or financials? |
| Incident Disclosure | 8-K Item 1.05 | What happened in an actual cybersecurity incident? |
| None/Other | N/A | Classifier catch-all for non-substantive content |
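
For anyone wiring these labels into code, the table can be held as a small lookup. This is a minimal sketch, not the project's actual schema: the names `Category`, `SEC_BASIS`, and `substantive_categories` are hypothetical, and only the label strings and SEC citations are taken from the table above.

```python
from enum import Enum


class Category(Enum):
    """Codebook categories; member names are illustrative, values match the table."""
    BOARD_GOVERNANCE = "Board Governance"
    MANAGEMENT_ROLE = "Management Role"
    RISK_MANAGEMENT_PROCESS = "Risk Management Process"
    THIRD_PARTY_RISK = "Third-Party Risk"
    STRATEGY_INTEGRATION = "Strategy Integration"
    INCIDENT_DISCLOSURE = "Incident Disclosure"
    NONE_OTHER = "None/Other"


# SEC basis per category, copied from the table above. None marks the
# catch-all class, which has no SEC basis.
SEC_BASIS = {
    Category.BOARD_GOVERNANCE: "Item 106(c)(1)",
    Category.MANAGEMENT_ROLE: "Item 106(c)(2)",
    Category.RISK_MANAGEMENT_PROCESS: "Item 106(b)",
    Category.THIRD_PARTY_RISK: "Item 106(b)",
    Category.STRATEGY_INTEGRATION: "Item 106(b)(2)",
    Category.INCIDENT_DISCLOSURE: "8-K Item 1.05",
    Category.NONE_OTHER: None,
}


def substantive_categories():
    """Return the categories that have an SEC basis (everything but None/Other)."""
    return [c for c in Category if SEC_BASIS[c] is not None]
```

Note that Risk Management Process and Third-Party Risk share the same SEC basis, which is why the lookup cannot be inverted to recover a category from a citation.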

### Editorial choice: Third-Party Risk as a separate category

The SEC does not give Third-Party Risk its own subsection: vendor and supply-chain oversight falls under 106(b), alongside general risk management. The codebook carves it out as a distinct class because vendor-risk disclosures follow a different enough pattern that separating them is analytically useful.

### "Risk Management" is broader than it sounds

The SEC's 106(b) definition of risk management encompasses the full lifecycle: assessing, identifying, **and managing** cybersecurity risks. Under frameworks like NIST CSF (which the SEC references), "managing" includes Respond and Recover functions — not just preventive controls.

This means incident response **procedures** (escalation chains, playbooks, notification workflows, materiality determination processes) are Risk Management Process, not Incident Disclosure. The test:

| What the paragraph describes | Category |
|---|---|
| Pre-established process for handling incidents (playbooks, escalation chains, "in the event of...") | **Risk Management Process** |
| An actual incident that occurred (dates, scope, remediation of a real event) | **Incident Disclosure** |

Conditional language ("in the event of," "if necessary," "if and when") is a strong signal that the paragraph describes a process, not an event.
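
The conditional-language signal could be sketched as a simple phrase check. This is an illustrative heuristic under stated assumptions, not a labeling rule from the codebook: the function name `has_conditional_language` is hypothetical, the phrase list contains only the three examples quoted above, and a match is a hint toward Risk Management Process rather than a decision.

```python
import re

# The three conditional phrases quoted in the text above. The list is
# illustrative, not exhaustive.
CONDITIONAL_PHRASES = ("in the event of", "if necessary", "if and when")

_CONDITIONAL_RE = re.compile(
    "|".join(r"\b" + re.escape(p) + r"\b" for p in CONDITIONAL_PHRASES),
    re.IGNORECASE,
)


def has_conditional_language(paragraph: str) -> bool:
    """Return True if the paragraph contains any of the conditional phrases."""
    # Collapse whitespace so phrases split across line breaks still match.
    normalized = " ".join(paragraph.split())
    return _CONDITIONAL_RE.search(normalized) is not None
```

A paragraph that describes a real incident can still contain conditional wording, so an annotator should treat a `True` result as a prompt to check for dates, scope, and remediation of an actual event before settling on a category.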

### "Strategy Integration" is narrower than it sounds

Strategy Integration does not mean "strategic approach to cybersecurity." It specifically covers the **business and financial consequences** of cyber risk — the SEC 106(b)(2) question of whether cyber risk hit the bottom line or changed business strategy.

What qualifies:
- Materiality assessments ("have not materially affected our business strategy, results of operations, or financial condition")
- Cybersecurity spending and investment (budgets, dollar amounts, year-over-year changes)
- Insurance coverage (carriers, limits, deductibles)
- Financial impact of incidents (costs, revenue loss, insurance claims)

What does not qualify:
- Describing a sophisticated incident response process (that's Risk Management Process even though it's "strategic" in the colloquial sense)
- Describing a materiality **determination process** (the process for deciding if something is material is Risk Management Process; the actual materiality **conclusion** is Strategy Integration)

---

## Specificity Scale: Design Rationale

### The four levels measure disclosure quality progression

| Level | What it tells you |
|---|---|
| 1 — Generic Boilerplate | Company said nothing substantive. Could paste into any filing unchanged. |
| 2 — Sector-Adapted | Company name-dropped a recognized standard (NIST, ISO 27001, SOC 2, etc.) but nothing unique to their organization. |
| 3 — Firm-Specific | Company disclosed at least one fact unique to their organization. |
| 4 — Quantified-Verifiable | Company disclosed two or more independently verifiable hard facts. |
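
Read top-down, the table behaves like a decision ladder in which the highest satisfied level wins. The sketch below is one way to express that, assuming the annotator has already counted the relevant facts. The function name and parameters are hypothetical, and treating a single verifiable hard fact as sufficient for Level 3 is an assumption of this sketch, not something the table states.

```python
def specificity_level(
    names_standard: bool,
    firm_specific_facts: int,
    verifiable_hard_facts: int,
) -> int:
    """Map annotator-supplied counts to a 1-4 specificity level."""
    # Level 4: two or more independently verifiable hard facts.
    if verifiable_hard_facts >= 2:
        return 4
    # Level 3: at least one fact unique to the organization.
    # Assumption: a lone verifiable hard fact also counts as firm-specific.
    if firm_specific_facts >= 1 or verifiable_hard_facts >= 1:
        return 3
    # Level 2: names a recognized standard but nothing firm-specific.
    if names_standard:
        return 2
    # Level 1: generic boilerplate.
    return 1
```

For example, `specificity_level(True, 1, 0)` returns 3: naming a standard does not hold a paragraph at Level 2 once a firm-specific fact is present.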

### "Sector-Adapted" refers to the cybersecurity sector, not the company's industry

The name is misleading. "Sector-Adapted" does not mean "the company adapted its disclosure to its industry" (e.g., a bank discussing financial-sector cyber risks). It means the company referenced a recognized **cybersecurity** standard or framework — NIST CSF, ISO 27001, SOC 2, PCI DSS, HIPAA, etc. The "sector" is cybersecurity itself. A utility company mentioning NERC CIP and a retailer mentioning PCI DSS both qualify for Level 2 the same way — they named a standard. The company's own industry is irrelevant to the specificity score.

### Level 2 is intentionally narrow

Level 2 requires that a paragraph name a recognized standard and contain zero firm-specific facts. In practice this is uncommon: most filings either say nothing specific (Level 1) or name a framework alongside a CISO or named committee in the same paragraph (Level 3).

This is a feature, not a bug. The analytically interesting distinction is between Level 1 (boilerplate box-checking) and Level 3/4 (substantive disclosure). Level 2 is a real but thin middle ground. A broader, fuzzier Level 2 would make the classifier's job harder without adding research value.

### The research contribution is the specificity dimension itself

The SEC requires cybersecurity disclosure but does not grade its quality. The 1-4 specificity scale measures something the SEC doesn't: how much of a disclosure is substance and how much is boilerplate. The core research question is whether companies are genuinely disclosing or just checking the regulatory box.

### Common specificity pitfalls

**Generic practices are not specific.** Penetration testing, vulnerability scanning, tabletop exercises, phishing simulations, security awareness training, encryption, logging and monitoring — all Level 1. These are standard activities that appear in nearly every filing.

**Long paragraphs can still be Level 1.** A paragraph can list ten generic security practices and still be boilerplate. Length and detail are not the same as specificity.

**Cross-references and section titles don't add specificity.** Quoting a long Risk Factors section title with specific-sounding language ("collaborators, contract research organizations, third-party logistics providers") is just metadata, not disclosure substance.

**The materiality boilerplate is Level 1.** The phrase "have not materially affected, and are not reasonably likely to materially affect, our business strategy, results of operations, or financial condition" appears nearly verbatim in thousands of filings. It is Strategy Integration (it makes a materiality assessment) but Specificity 1 (the assessment is template language).
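
The materiality phrase is regular enough that a pattern match can flag it. The sketch below is a hedged illustration: the names `MATERIALITY_BOILERPLATE_RE` and `label_materiality_boilerplate` are hypothetical, the pattern tolerates only optional commas and whitespace differences, and other near-verbatim variants will not match.

```python
import re

# Matches the template sentence quoted above, with commas made optional.
MATERIALITY_BOILERPLATE_RE = re.compile(
    r"have not materially affected,? and are not reasonably likely to "
    r"materially affect,? our business strategy, results of operations,? "
    r"or financial condition",
    re.IGNORECASE,
)


def label_materiality_boilerplate(paragraph: str):
    """Return ("Strategy Integration", 1) if the template phrase is present, else None."""
    # Collapse whitespace so line-wrapped filings still match.
    normalized = " ".join(paragraph.split())
    if MATERIALITY_BOILERPLATE_RE.search(normalized):
        return ("Strategy Integration", 1)
    return None
```

The return value reflects the phrase alone. A paragraph that pairs the template sentence with firm-specific facts may deserve a higher specificity level, so this check works better as a pre-annotation hint than as a final label.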