SEC-cyBERT/docs/data-pipeline/EDGAR-FILING-GENERATORS.md
2026-04-05 21:00:40 -04:00

SEC EDGAR Filing Generator Reference

Reference for identifying which software generated a given SEC 10-K HTML filing. Built from direct inspection of EDGAR filings and market research (March 2026).


1. Major Vendors and HTML Signatures

Workiva (Wdesk) -- Market Leader for 10-K/10-Q

Filing agent CIK: 0001628280

HTML comment signature (the three comments immediately after the XML declaration):

<?xml version='1.0' encoding='ASCII'?>
<!--XBRL Document Created with the Workiva Platform-->
<!--Copyright 2025 Workiva-->
<!--r:{uuid},g:{uuid},d:{hex-id}-->

Detection heuristics:

  • HTML comment: XBRL Document Created with the Workiva Platform
  • HTML comment: Copyright \d{4} Workiva
  • Third comment contains r: and g: UUIDs plus a d: hex ID (document/generation tracking)
  • xml:lang="en-US" attribute on <html> tag
  • Body uses inline styles exclusively (no CSS classes on content elements)
  • Heavy use of <span> with inline styles containing background-color, font-family, font-size, font-weight, line-height in every span
  • Div IDs follow pattern: i{hex32}_{number} (e.g., id="i56b78781f7c84a038f6ae0f6244f7dd8_1")
  • Tables use display:inline-table and vertical-align:text-bottom
  • iXBRL fact IDs follow pattern: F_{uuid} (e.g., id="F_d8dc1eb1-109d-445d-a55a-3dde1a81ca63")
  • No <meta name="generator"> tag
  • No CSS classes on body content (purely inline styles)

Structural patterns:

  • Span-heavy: nearly every text fragment wrapped in <span style="...">
  • Font specified as font-family:'Times New Roman',sans-serif (note: sans-serif fallback, unusual)
  • Line-height specified on every span (e.g., line-height:120%)
  • Background color explicitly set: background-color:#ffffff

Known quality issues:

  • Extremely verbose HTML; simple paragraphs become deeply nested span trees
  • Text extraction is clean because span boundaries align with word boundaries
  • Large file sizes due to inline style repetition
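
The comment and div-ID heuristics above can be checked mechanically. The following is a minimal sketch, not a reference implementation: the function name `workiva_evidence`, the 4,096-character head window, and the exact regexes are assumptions built from the sample signature shown in this section.

```python
import re

# Patterns derived from the sample Workiva header above (assumed, not exhaustive).
WORKIVA_COMMENT = re.compile(r"<!--\s*XBRL Document Created with the Workiva Platform\s*-->")
WORKIVA_COPYRIGHT = re.compile(r"<!--\s*Copyright \d{4} Workiva\s*-->")
WORKIVA_TRACKING = re.compile(r"<!--\s*r:[^,\s]+,g:[^,\s]+,d:\S+?\s*-->")
WORKIVA_DIV_ID = re.compile(r'id="i[0-9a-f]{32}_\d+"')


def workiva_evidence(html: str, head_chars: int = 4096) -> dict:
    """Report which Workiva heuristics match; head_chars is a hypothetical window size."""
    head = html[:head_chars]
    return {
        "platform_comment": bool(WORKIVA_COMMENT.search(head)),
        "copyright_comment": bool(WORKIVA_COPYRIGHT.search(head)),
        "tracking_comment": bool(WORKIVA_TRACKING.search(head)),
        # Div IDs appear throughout the body, so search the whole document.
        "div_id_pattern": bool(WORKIVA_DIV_ID.search(html)),
    }
```

Returning each signal separately, rather than a single boolean, makes it easier to see which heuristic failed when a filing has been post-processed.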

DFIN / Donnelley Financial Solutions (ActiveDisclosure)

DFIN files under several agent CIKs and produces two distinct HTML output formats.

DFIN "New" ActiveDisclosure (primary)

Filing agent CIK: 0000950170 (also 0000950130)

HTML comment signature:

<?xml version='1.0' encoding='ASCII'?>
<!-- DFIN New ActiveDisclosure (SM) Inline XBRL Document - http://www.dfinsolutions.com/ -->
<!-- Creation Date :2025-02-18T12:36:24.4008+00:00 -->
<!-- Copyright (c) 2025 Donnelley Financial Solutions, Inc. All Rights Reserved. -->

Detection heuristics:

  • HTML comment: DFIN New ActiveDisclosure
  • HTML comment: http://www.dfinsolutions.com/
  • HTML comment: Copyright (c) \d{4} Donnelley Financial Solutions
  • HTML comment: Creation Date : with ISO timestamp
  • Body style: padding:8px;margin:auto!important;
  • Inline styles use font-kerning:none;min-width:fit-content; on most spans
  • Extensive use of white-space:pre-wrap on spans
  • CSS class item-list-element-wrapper and page-border-spacing present
  • iXBRL fact IDs follow pattern: F_{uuid}

Structural patterns:

  • Every text span carries min-width:fit-content (distinctive)
  • Uses &#160; for spacing extensively
  • Uses <p> tags with inline margins for all paragraphs
  • Tables use explicit padding-top:0in;vertical-align:top;padding-bottom:0in cell styles
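
The `Creation Date :` comment doubles as a generation timestamp. A minimal sketch of extracting it follows; it assumes the timestamp always carries fractional seconds and a UTC offset, as in the sample header above, and the function name `dfin_creation_date` is illustrative.

```python
import re
from datetime import datetime
from typing import Optional

# Matches e.g. <!-- Creation Date :2025-02-18T12:36:24.4008+00:00 -->
CREATION_DATE = re.compile(r"<!--\s*Creation Date\s*:\s*(\S+)\s*-->")


def dfin_creation_date(html_head: str) -> Optional[datetime]:
    """Return the timezone-aware creation timestamp, or None if absent or unparseable."""
    match = CREATION_DATE.search(html_head)
    if match is None:
        return None
    try:
        # %f accepts 1-6 fractional digits; %z accepts offsets written as +00:00.
        return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
    except ValueError:
        return None
```

`strptime` is used instead of `datetime.fromisoformat` because older Python versions reject fractional seconds that are not exactly three or six digits long.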

DFIN Legacy (RR Donnelley heritage)

Filing agent CIK: 0001193125

HTML signature:

<?xml version='1.0' encoding='ASCII'?>
<html xmlns:link="..." xmlns:xbrldi="..." ...>
<head>
<title>10-K</title>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
</head>
<body style="line-height:normal;background-color:white;">
<h5 style="font-size:10pt;font-weight:bold"><a href="#toc">Table of Contents</a></h5>

Detection heuristics:

  • No identifying HTML comments (no generator/copyright comment)
  • Accession number prefix 0001193125 is definitive
  • <body style="line-height:normal;background-color:white;">
  • Immediately starts with <h5> Table of Contents link
  • Uses deprecated namespace aliases: xmlns:xl, xmlns:xbrll, xmlns:deprecated
  • iXBRL fact IDs follow pattern: Fact_{large_number} (e.g., id="Fact_129727210")
  • Uses <FONT> tags (HTML 3.2 style) in some documents
  • Uppercase HTML tags in older filings (<P>, <B>, <DIV>)

Structural patterns:

  • Cleaner HTML than ActiveDisclosure New
  • Uses semantic <h5> for table of contents
  • Inline styles are simpler and more standard
  • Document filenames follow the pattern: d{number}d10k.htm

Toppan Merrill (Bridge)

Filing agent CIKs: 0001104659 (primary), 0001558370 (secondary)

HTML comment signature:

<?xml version='1.0' encoding='ASCII'?>
<!-- iXBRL document created with: Toppan Merrill Bridge iXBRL 10.9.0.3 -->
<!-- Based on: iXBRL 1.1 -->
<!-- Created on: 2/21/2025 8:11:11 PM -->
<!-- iXBRL Library version: 1.0.9062.16423 -->
<!-- iXBRL Service Job ID: {uuid} -->

Detection heuristics:

  • HTML comment: iXBRL document created with: Toppan Merrill Bridge iXBRL
  • HTML comment: iXBRL Library version:
  • HTML comment: iXBRL Service Job ID:
  • Includes version number in comment (e.g., 10.9.0.3)
  • <title> tag contains company name + period end date (e.g., Sunstone Hotel Investors,&#160;Inc._December 31, 2024)
  • Uses xmlns:xs alongside xmlns:xsi (both XML Schema namespaces)
  • Body starts with <div style="margin-top:30pt;"></div> (distinctive)
  • iXBRL hidden div uses display:none; (no additional styles on the div)

Structural patterns:

  • Context IDs use descriptive names with GUIDs: As_Of_12_31_2024_{base64-like}, From_01_01_2024_to_12_31_2024_{guid}
  • Hidden fact IDs follow pattern: Hidden_{base64-like}
  • Unit ref IDs follow pattern: Unit_Standard_USD_{base64-like}
  • No CSS classes used on content elements
  • Relatively clean HTML structure
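
Because the Bridge comment embeds a version number, it can be parsed for grouping filings by generator release. This is a sketch under the assumption that the version is always dot-separated integers, as in the `10.9.0.3` sample; `bridge_version` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Captures the dotted version that follows the product name in the first comment.
BRIDGE_VERSION = re.compile(r"Toppan Merrill Bridge iXBRL\s+(\d+(?:\.\d+)*)")


def bridge_version(html_head: str) -> Optional[Tuple[int, ...]]:
    """Return the Bridge version as a tuple of ints, or None if the comment is missing."""
    match = BRIDGE_VERSION.search(html_head)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))
```

Tuples of integers compare element by element, so `bridge_version(head) >= (10, 9)` works as a version check without string comparison pitfalls.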

RDG Filings (ThunderDome Portal)

Filing agent CIK: 0001437749

HTML signature:

<?xml version='1.0' encoding='ASCII'?>
<html xmlns:thunderdome="http://www.RDGFilings.com" ...>
 <head>
  <title>avpt20241231_10k.htm</title>
  <!-- Generated by ThunderDome Portal - 2/27/2025 6:06:48 PM -->
  <meta http-equiv="Content-Type" content="text/html"/>
 </head>
 <body style="cursor: auto; padding: 0in 0.1in; font-family: &quot;Times New Roman&quot;, Times, serif; font-size: 10pt;">

Detection heuristics:

  • XML namespace: xmlns:thunderdome="http://www.RDGFilings.com"
  • HTML comment: Generated by ThunderDome Portal
  • <title> contains the filing filename
  • Body style includes cursor: auto; padding: 0in 0.1in
  • iXBRL fact IDs prefixed with thunderdome- (e.g., id="thunderdome-EntityCentralIndexKey")
  • Context ref IDs use simple date ranges: d_2024-01-01_2024-12-31
  • Other fact IDs follow ixv-{number} or c{number} pattern

Market presence: ~14,000 filings/year across all form types (rank #9 among filing agents); an estimated ~5% of 10-K/10-Q filings.


Broadridge Financial Solutions (PROfile)

Filing agent CIKs: 0001140361 (primary), 0001133228 (secondary)

HTML comment signature:

<!-- Licensed to: Broadridge
     Document created using Broadridge PROfile 25.1.1.5279
     Copyright 1995 - 2025 Broadridge -->

Detection heuristics:

  • HTML comment: Licensed to: Broadridge
  • HTML comment: Document created using Broadridge PROfile with version number
  • HTML comment: Copyright 1995 - \d{4} Broadridge
  • CSS classes with BRPF prefix: BRPFPageBreak, BRPFPageBreakArea, BRPFPageFooter, BRPFPageHeader, BRPFPageNumberArea
  • CSS class: DSPFListTable
  • CSS class: cfttable
  • CSS class: Apple-interchange-newline (suggests Mac/WebKit origin)
  • Context ref IDs use XBRL-standard descriptive format: c20240101to20241231_AxisName_MemberName

Note: Broadridge acquired CompSci Resources LLC in July 2024 and is integrating CompSci's Transform platform. Filings generated by Transform may move to Broadridge branding over time.


CompSci / Novaworks (Transform and GoFiler)

CompSci Transform and Novaworks GoFiler share some styling conventions but leave distinct signatures.

CompSci Transform (now Broadridge)

Filed via: EdgarAgents LLC (0001213900) or other agents

HTML comment signature:

<?xml version='1.0' encoding='ASCII'?>
<!-- Generated by CompSci Transform (tm) - http://www.compsciresources.com -->
<!-- Created: Mon Mar 17 19:46:10 UTC 2025 -->

Detection heuristics:

  • HTML comment: Generated by CompSci Transform
  • HTML comment: http://www.compsciresources.com
  • XML namespace: xmlns:compsci="http://compsciresources.com"
  • Body wrapped in: <div style="font: 10pt Times New Roman, Times, Serif">
  • Uses <!-- Field: Rule-Page --> and <!-- Field: /Rule-Page --> HTML comments as structural markers
  • Empty <div> tags used as spacers between paragraphs
  • iXBRL context refs use simple sequential IDs: c0, c1, c2, ...
  • iXBRL fact IDs follow ixv-{number} pattern
  • Uses shorthand CSS: font: 10pt Times New Roman, Times, Serif (combined property)
  • Margin shorthand: margin: 0pt 0

Known quality issues:

  • Words can be broken across <span> tags mid-word
  • Heavy use of &#160; for spacing
  • Empty divs between every paragraph create parsing noise
  • <!-- Field: ... --> comments interspersed throughout document body
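
One way to work around these issues is to collect text per block element rather than per span. The sketch below uses only the standard library `html.parser`; the set of block tags and the names `FilingTextExtractor` and `extract_text` are assumptions, and real filings may need a more tolerant parser.

```python
from html.parser import HTMLParser

# Tags treated as line boundaries (an assumed, non-exhaustive set).
BLOCK_TAGS = {"p", "div", "br", "tr", "td", "th", "li",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class FilingTextExtractor(HTMLParser):
    """Collects text, breaking lines only at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # &#160; arrives as U+00A0; treat it as an ordinary space.
        self.parts.append(data.replace("\xa0", " "))

    # handle_comment is left at its default no-op, so
    # <!-- Field: ... --> markers never reach the output.


def extract_text(html: str) -> str:
    parser = FilingTextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    # Dropping empty lines removes the noise from empty spacer divs.
    return "\n".join(line for line in lines if line)
```

Inline tags such as `<span>` add no separator, so a word split across several spans is rejoined. The trade-off is that adjacent spans that rely on layout rather than whitespace for separation will also be merged.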

Novaworks GoFiler (XDX format)

Filed via: SECUREX Filings (0001214659) or self-filed

HTML signature:

<head>
     <title></title>
<meta http-equiv="Content-Type" content="text/html"/>
</head>
<!-- Field: Set; Name: xdx; ID: xdx_021_US%2DGAAP%2D2024%2D... -->
<!-- Field: Set; Name: xdx; ID: xdx_03B_... -->

Detection heuristics:

  • HTML comments with pattern: <!-- Field: Set; Name: xdx; ID: xdx_{code}_{data} -->
  • XDX comments appear between </head> and <body> (unusual placement)
  • Body style: font: 10pt Times New Roman, Times, Serif (same shorthand as CompSci)
  • Empty <title></title> tag
  • iXBRL fact IDs use xdx2ixbrl{number} pattern (e.g., id="xdx2ixbrl0102")
  • Standard fact IDs use Fact{number:06d} pattern (e.g., id="Fact000003")
  • Context refs use From{date}to{date} or AsOf{date} format (no separators within date)

XDX explained: XDX (XBRL Data Exchange) is GoFiler's proprietary format that uses HTML tag ID attributes ("engrams") to embed XBRL metadata. The xdx_ comments carry taxonomy, entity, period, and unit definitions that GoFiler uses to generate the final iXBRL.
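
Since the XDX comments must be both recognized (for detection) and stripped (before text extraction), a small helper pair is useful. This is a sketch: the split of the ID into a code and a data part follows the `xdx_{code}_{data}` pattern above, and percent-decoding the data part is an assumption based on the `%2D` sequences in the sample.

```python
import re
from typing import List, Tuple
from urllib.parse import unquote

# Matches <!-- Field: Set; Name: xdx; ID: xdx_{code}_{data} -->
XDX_COMMENT = re.compile(
    r"<!--\s*Field:\s*Set;\s*Name:\s*xdx;\s*ID:\s*xdx_([0-9A-Za-z]+)_(.*?)\s*-->",
    re.DOTALL,
)


def parse_xdx_comments(html: str) -> List[Tuple[str, str]]:
    """Return (code, decoded_data) pairs for every XDX comment found."""
    return [(m.group(1), unquote(m.group(2))) for m in XDX_COMMENT.finditer(html)]


def strip_xdx_comments(html: str) -> str:
    """Remove XDX comments so they do not pollute downstream parsing."""
    return XDX_COMMENT.sub("", html)
```

A non-empty result from `parse_xdx_comments` is itself a detection signal for GoFiler.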


Discount EDGAR / NTDAS (XBRLMaster / EDGARMaster)

Filing agent CIK: 0001477932

HTML signature:

<head>
  <title>crona_10k.htm</title>
  <!--Document Created by XBRLMaster-->
  <meta http-equiv="Content-Type" content="text/html"/>
</head>
<body style="text-align:justify;font:10pt times new roman">

Detection heuristics:

  • HTML comment: Document Created by XBRLMaster
  • Body style: text-align:justify;font:10pt times new roman
  • Hidden iXBRL div has id="XBRLDIV"
  • Additional body styles include margin-left:7%;margin-right:7%
  • Uses lowercase times new roman (no capitalization)
  • iXBRL fact IDs use ixv-{number} pattern

EdgarAgents LLC

Filing agent CIK: 0001213900

EdgarAgents is a filing agent service, not a document creation tool. The HTML they submit is typically generated by CompSci Transform, GoFiler, or other tools. Check the HTML comments to identify the actual generator.


DFIN Legacy (pre-iXBRL / SGML-era)

Filing agent CIK: 0001193125

Older filings (pre-2019) from this CIK may be wrapped in an SGML <DOCUMENT> envelope:

<DOCUMENT>
<TYPE>10-K
<SEQUENCE>1
<FILENAME>d913213d10k.htm
<DESCRIPTION>10-K
<TEXT>
<HTML><HEAD>
<TITLE>10-K</TITLE>
</HEAD>
<BODY BGCOLOR="WHITE" STYLE="line-height:Normal">
<Center><DIV STYLE="width:8.5in" align="left">

Detection heuristics:

  • Uppercase HTML tags: <HTML>, <HEAD>, <BODY>, <P>, <B>
  • BGCOLOR="WHITE" attribute (deprecated HTML)
  • <Center> tag with capital C
  • <DIV STYLE="width:8.5in" (page-width container)
  • <FONT> tags for styling
  • Filename pattern: d{number}d10k.htm
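
The SGML envelope fields can be read before touching the HTML itself. A minimal sketch, assuming one `<FIELD>value` pair per line ahead of `<TEXT>` as in the sample above; `parse_sgml_header` and `LEGACY_DFIN_FILENAME` are illustrative names.

```python
import re

# Header fields appear one per line, without closing tags, before <TEXT>.
SGML_FIELD = re.compile(r"^<(TYPE|SEQUENCE|FILENAME|DESCRIPTION)>(.*)$", re.MULTILINE)

# Filename convention noted above: d{number}d10k.htm
LEGACY_DFIN_FILENAME = re.compile(r"^d\d+d10k\.htm$", re.IGNORECASE)


def parse_sgml_header(document: str) -> dict:
    """Return the envelope fields that precede <TEXT>, keyed by lowercase field name."""
    header, _, _ = document.partition("<TEXT>")
    return {name.lower(): value.strip() for name, value in SGML_FIELD.findall(header)}
```

For example, `LEGACY_DFIN_FILENAME.match(parse_sgml_header(doc).get("filename", ""))` combines the envelope parse with the filename heuristic.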

2. Filing Agent Market Share

Based on secfilingdata.com total filings across all form types:

Rank | Filing Agent               | CIK        | 2025 Filings | Total (All Time)
-----|----------------------------|------------|--------------|-----------------
1    | Donnelley Financial (DFIN) | 0001193125 | 65,180       | 1,872,890
2    | EdgarAgents LLC            | 0001213900 | 48,021       | 367,211
3    | Quality Edgar (QES)        | 0001839882 | 38,017       | 151,031
4    | Toppan Merrill             | 0001104659 | 48,260       | 988,715
5    | WallStreetDocs Ltd         | 0001918704 | 22,387       | 56,431
6    | Workiva (Wdesk)            | 0001628280 | 21,606       | 141,795
7    | M2 Compliance LLC          | 0001493152 | 13,810       | 164,603
8    | Davis Polk & Wardwell LLP  | 0000950103 | 16,231       | 326,359
9    | RDG Filings (ThunderDome)  | 0001437749 | 14,209       | 187,270
10   | Morgan Stanley             | 0001950047 | 12,822       | 56,468
11   | Broadridge                 | 0001140361 | --           | 597,664
14   | SECUREX Filings            | 0001214659 | --           | 115,218
19   | Blueprint                  | 0001654954 | --           | 62,250
20   | FilePoint                  | 0001398344 | --           | 76,218
38   | Discount EDGAR             | 0001477932 | --           | 37,422

For 10-K/10-Q specifically (estimated from biotech IPO data and market research):

  • DFIN: ~40-50% of annual/quarterly filings
  • Workiva: ~25-35% (has been gaining share from DFIN since ~2010)
  • Toppan Merrill: ~10-15%
  • RDG Filings: ~5%
  • Broadridge/CompSci: ~5%
  • Others (law firms, self-filed, smaller agents): ~5-10%

3. XBRL/iXBRL Tool Signatures

The iXBRL tagging tool is often the same as the filing generator, but not always. Key distinguishing patterns in the iXBRL layer:

Tool               | Context Ref Pattern                                | Fact ID Pattern                       | Unit Ref Pattern
-------------------|----------------------------------------------------|---------------------------------------|-------------------------
Workiva            | C_{uuid}                                           | F_{uuid}                              | U_{uuid}
DFIN New           | C_{uuid}                                           | F_{uuid}                              | Standard names
DFIN Legacy        | Fact_{large_int}                                   | Fact_{large_int}                      | Standard names
Toppan Merrill     | As_Of_{date}_{guid} / From_{date}_to_{date}_{guid} | Hidden_{guid}                         | Unit_Standard_USD_{guid}
ThunderDome        | d_{date_range} / i_{date}                          | thunderdome-{name} or ixv-{n} or c{n} | Standard names
CompSci Transform  | c0, c1, c2 ...                                     | ixv-{number}                          | Standard names
GoFiler (XDX)      | From{date}to{date} / AsOf{date}                    | xdx2ixbrl{number}                     | Standard names
XBRLMaster         | From{date}to{date}                                 | ixv-{number}                          | Standard names
Broadridge PROfile | c{date}to{date}_{axis}_{member}                    | Descriptive                           | Standard names
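
The fact ID column lends itself to a simple classifier. The sketch below is illustrative only: the regexes are inferred from the example IDs in this document, several tools share the `ixv-{number}` pattern, and the label strings and function names are assumptions.

```python
import re
from collections import Counter
from typing import Optional

# Ordered from most to least specific; labels mirror the table rows.
FACT_ID_PATTERNS = [
    ("Workiva or DFIN New",
     re.compile(r"^F_[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}$", re.IGNORECASE)),
    ("DFIN Legacy", re.compile(r"^Fact_\d+$")),
    ("Toppan Merrill", re.compile(r"^Hidden_")),
    ("ThunderDome", re.compile(r"^thunderdome-")),
    ("GoFiler (XDX)", re.compile(r"^(?:xdx2ixbrl\d+|Fact\d{6})$")),
    # Shared by several tools, so it can only narrow the candidates.
    ("CompSci Transform, XBRLMaster, or ThunderDome", re.compile(r"^ixv-\d+$")),
]

ID_ATTR = re.compile(r'\bid="([^"]+)"')


def classify_fact_id(fact_id: str) -> Optional[str]:
    """Return the first matching tool label, or None for unrecognized IDs."""
    for label, pattern in FACT_ID_PATTERNS:
        if pattern.match(fact_id):
            return label
    return None


def tally_fact_ids(html: str) -> Counter:
    """Count recognized ID patterns across every id="..." attribute in the document."""
    labels = (classify_fact_id(value) for value in ID_ATTR.findall(html))
    return Counter(label for label in labels if label)
```

A tally is more robust than a single match because one stray ID (for example from pasted content) is outvoted by the dominant pattern.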

4. Detection Priority Order

For maximum reliability, check signatures in this order:

  1. HTML comments (first 10 lines) -- most generators embed identifying comments
    • Workiva Platform --> Workiva
    • DFIN New ActiveDisclosure --> DFIN New
    • Toppan Merrill Bridge --> Toppan Merrill
    • ThunderDome Portal --> RDG Filings
    • CompSci Transform --> CompSci/Broadridge
    • Broadridge PROfile --> Broadridge
    • XBRLMaster --> Discount EDGAR / NTDAS
  2. XML namespaces on <html> tag
    • xmlns:thunderdome="http://www.RDGFilings.com" --> RDG
    • xmlns:compsci="http://compsciresources.com" --> CompSci
  3. XDX comments between head and body --> GoFiler/Novaworks
  4. Accession number prefix (first 10 digits) --> identifies filing agent CIK
  5. Body style patterns as fallback
  6. iXBRL fact ID patterns as secondary confirmation
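
Steps 1 through 4 of this order can be sketched as a single function. This is an illustrative outline rather than the pipeline's actual code: `detect_generator`, its return shape, and the CIK-to-generator mapping (limited to agents this document ties to a specific tool) are assumptions, and the body-style and fact-ID fallbacks (steps 5 and 6) are left out.

```python
import re
from typing import Optional, Tuple

# Step 1: generator comments, searched only in the first 10 lines.
COMMENT_SIGNATURES = [
    (re.compile(r"Workiva Platform"), "Workiva"),
    (re.compile(r"DFIN New ActiveDisclosure"), "DFIN New"),
    (re.compile(r"Toppan Merrill Bridge"), "Toppan Merrill"),
    (re.compile(r"ThunderDome Portal"), "RDG Filings"),
    (re.compile(r"CompSci Transform"), "CompSci/Broadridge"),
    (re.compile(r"Broadridge PROfile"), "Broadridge"),
    (re.compile(r"XBRLMaster"), "Discount EDGAR / NTDAS"),
]
# Step 2: namespaces declared on the <html> tag.
NAMESPACE_SIGNATURES = [
    (re.compile(r'xmlns:thunderdome="http://www\.RDGFilings\.com"'), "RDG Filings"),
    (re.compile(r'xmlns:compsci="http://compsciresources\.com"'), "CompSci"),
]
# Step 3: XDX comments that appear before <body>.
XDX_COMMENT = re.compile(r"<!--\s*Field:\s*Set;\s*Name:\s*xdx;")
# Step 4: accession prefixes of agents tied to one generator in this document.
AGENT_CIKS = {
    "0001628280": "Workiva",
    "0000950170": "DFIN New",
    "0000950130": "DFIN New",
    "0001193125": "DFIN Legacy",
    "0001104659": "Toppan Merrill",
    "0001558370": "Toppan Merrill",
    "0001437749": "RDG Filings",
    "0001140361": "Broadridge",
    "0001133228": "Broadridge",
    "0001477932": "Discount EDGAR / NTDAS",
}
HTML_TAG = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
BODY_TAG = re.compile(r"<body\b", re.IGNORECASE)


def detect_generator(html: str, accession: Optional[str] = None
                     ) -> Tuple[Optional[str], Optional[str]]:
    """Return (generator, evidence_kind), or (None, None) if nothing matched."""
    head = "\n".join(html.splitlines()[:10])
    for pattern, label in COMMENT_SIGNATURES:
        if pattern.search(head):
            return label, "html_comment"

    html_tag = HTML_TAG.search(html)
    if html_tag:
        for pattern, label in NAMESPACE_SIGNATURES:
            if pattern.search(html_tag.group(0)):
                return label, "xml_namespace"

    before_body = BODY_TAG.split(html, maxsplit=1)[0]
    if XDX_COMMENT.search(before_body):
        return "GoFiler/Novaworks", "xdx_comment"

    if accession and accession[:10] in AGENT_CIKS:
        return AGENT_CIKS[accession[:10]], "accession_prefix"

    return None, None
```

Returning the evidence kind alongside the label lets callers treat an `accession_prefix` match as weaker than an `html_comment` match, since an agent CIK identifies who submitted the filing rather than which tool produced the HTML.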

5. Known Quality Issues by Generator

CompSci Transform

  • Words broken across spans: Text is split at arbitrary character boundaries, not word boundaries. A single word like "cybersecurity" may be split across 2-3 <span> tags. This breaks naive text extraction that operates per-element.
  • Empty div spacers: <div>\n\n</div> between every paragraph adds noise.
  • Field comments in body: <!-- Field: Rule-Page --> markers interspersed with content.

Workiva

  • Extreme span nesting: Every text run gets its own <span> with full inline style. A simple bold sentence may have 5+ spans.
  • Large file sizes: Inline style repetition causes 10-K files to be 2-5x larger than equivalent DFIN filings.
  • Clean word boundaries: Despite heavy span usage, spans align with word/phrase boundaries, making text extraction reliable.

DFIN New ActiveDisclosure

  • min-width:fit-content everywhere: Unusual CSS property on every span; may cause rendering inconsistencies in older browsers.
  • font-kerning:none: Explicit kerning disable on all text spans.
  • Generally clean: Text extraction works well; word boundaries respected.

DFIN Legacy

  • Uppercase HTML tags: Older filings use <P>, <B>, <FONT> -- need case-insensitive parsing.
  • Mixed HTML versions: Some documents mix HTML 3.2 and 4.0 constructs.
  • SGML wrappers: Some filings wrapped in <DOCUMENT> SGML envelope.

GoFiler / Novaworks

  • XDX comment noise: Multiple <!-- Field: Set; ... --> comments that must be stripped.
  • Generally clean HTML: Body content is straightforward.

Toppan Merrill Bridge

  • Clean output: Among the cleanest generators. Minimal inline style bloat.
  • GUID-heavy IDs: Context and unit refs use base64-like GUIDs that are less human-readable.

6. Self-Filed / In-House Filings

Some large filers submit directly using their own CIK as the accession number prefix. These filings have no generator comment and variable HTML quality.

Detection: Accession number prefix matches the filer's own CIK (e.g., Halliburton CIK 0000045012 files with accession 0000045012-25-000010).

However: Even self-filed companies typically use a commercial tool. Halliburton's self-filed 10-K contains the Workiva comment signature, indicating they use Workiva but submit directly rather than through a filing agent.
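
The prefix comparison described above is easy to get wrong when CIKs are stored without leading zeros. A minimal sketch, assuming accession numbers in the dashed form shown in the Halliburton example; `is_self_filed` is a hypothetical helper name.

```python
import re
from typing import Union

# Dashed accession number: 10-digit submitter ID, 2-digit year, 6-digit sequence.
ACCESSION = re.compile(r"^(\d{10})-(\d{2})-(\d{6})$")


def is_self_filed(accession: str, filer_cik: Union[str, int]) -> bool:
    """True when the accession prefix equals the filer's own zero-padded CIK."""
    match = ACCESSION.match(accession.strip())
    if match is None:
        raise ValueError(f"not a dashed accession number: {accession!r}")
    return match.group(1) == str(filer_cik).strip().zfill(10)
```

A `True` result only says who submitted the filing; as the Halliburton case shows, the generator comments still need to be checked to find the tool that produced the HTML.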

Truly in-house HTML (no commercial tool) is rare among 10-K filers. When it occurs:

  • No identifying comments
  • No consistent structural patterns
  • May use Word-to-HTML conversion (look for mso- CSS prefixes from Microsoft Office)
  • May have minimal or no iXBRL tagging

7. Law Firm Filings

Several large law firms act as filing agents:

  • Davis Polk & Wardwell (0000950103) -- 326K total filings
  • Paul Weiss (0000950142) -- 56K total filings
  • Foley & Lardner (0000897069) -- 30K total filings
  • Sidley Austin (0000905148) -- 39K total filings
  • Seward & Kissel (0000919574) -- 107K total filings

Law firms typically file transactional documents (S-1, proxy, 8-K) rather than periodic 10-K filings. The HTML in law-firm-filed documents often comes from Word conversion and lacks commercial generator signatures.


8. Summary: Quick Detection Regex Table

Pattern                                              | Generator
-----------------------------------------------------|------------------
/Workiva Platform/                                   | Workiva
/DFIN New ActiveDisclosure/                          | DFIN (New)
/Donnelley Financial Solutions/                      | DFIN (New)
/Toppan Merrill Bridge/                              | Toppan Merrill
/ThunderDome Portal/                                 | RDG Filings
/CompSci Transform/                                  | CompSci/Broadridge
/Broadridge PROfile/                                 | Broadridge
/XBRLMaster/                                         | Discount EDGAR
/xmlns:thunderdome="http:\/\/www\.RDGFilings\.com"/  | RDG Filings
/xmlns:compsci="http:\/\/compsciresources\.com"/     | CompSci
/Field: Set; Name: xdx/                              | GoFiler/Novaworks
/dfinsolutions\.com/                                 | DFIN
/min-width:fit-content/                              | DFIN (New)
/BRPFPage/                                           | Broadridge PROfile
/id="XBRLDIV"/                                       | XBRLMaster

Sources