# CAPSTONE: Construct of Interest and Data Sign-off by Dr. D.

**Due:** April 7 by 11:59pm | **Points:** 0 | **Submission:** File upload

---

## Overview

Your capstone project requires you to classify a construct of interest in data at scale.

You must get approval for your construct of interest and data from your instructor.

This is an ungraded assignment. However, it is the prerequisite to your capstone project.

## Goal

Pick a well-documented, theoretically founded construct of interest. Explain why a firm would want to classify this construct at scale.

### Your Construct Must Be:

- **Business-relevant** — addresses a real business decision
- **Theoretically grounded** — anchored in established literature
- **Well documented** — clearly defined in academic or industry sources
- **Observable in text** — detectable in your data source
- **Definable with clear rules** — specific enough for reliable labeling
- **Complex & nuanced** — more than just sentiment (not positive/negative)

You must pick one of the seven provided constructs of interest from here: https://www.ringel.ai/UNC/2026/BUSI488/Class23/Ringel_488-2026_Capstone_Constructs.pdf

## Define Your Construct Precisely

Turn the concept into labels humans can apply consistently.

### You Must Create:

- **Label set** (classes/categories)
- **Clear definitions and decision rules** for each label
- **Borderline cases** — guidance for unclear examples
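- **None/Other policy** — if applicable (usually needed for multi-class, where every item must receive exactly one label; usually not needed for multi-label, where an item may receive no label at all)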
- **2-3 example texts** per label (your own examples)
- **Decision:** multi-class (one label per item) vs multi-label (multiple labels can apply)
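One way to keep these pieces together is to write the label set down as a small data structure and check it against the list above. The following is a minimal sketch, not a required format: the class names `Label` and `Codebook`, the placeholder labels `LABEL_A` and `LABEL_B`, and the placeholder texts are all hypothetical and stand in for your own construct.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    name: str
    definition: str
    decision_rule: str
    examples: list[str] = field(default_factory=list)          # 2-3 of your own texts
    borderline_notes: list[str] = field(default_factory=list)  # guidance for unclear cases


@dataclass
class Codebook:
    construct: str
    multi_label: bool  # False means multi-class: exactly one label per item
    labels: list[Label]

    def problems(self) -> list[str]:
        """Return the checklist items this codebook does not yet satisfy."""
        issues = []
        names = [label.name for label in self.labels]
        if len(names) != len(set(names)):
            issues.append("label names are not unique")
        for label in self.labels:
            if not label.definition.strip():
                issues.append(f"{label.name}: missing definition")
            if not label.decision_rule.strip():
                issues.append(f"{label.name}: missing decision rule")
            if not 2 <= len(label.examples) <= 3:
                issues.append(f"{label.name}: needs 2-3 example texts")
        # Assumption: a multi-class scheme usually needs a catch-all label.
        if not self.multi_label and "None/Other" not in names:
            issues.append("multi-class label set has no None/Other label")
        return issues

    def validate_annotation(self, assigned: list[str]) -> bool:
        """Check that one item's labels are allowed under this codebook."""
        known = {label.name for label in self.labels}
        if any(name not in known for name in assigned):
            return False
        if len(assigned) != len(set(assigned)):
            return False
        if self.multi_label:
            return True  # zero or more distinct labels may apply
        return len(assigned) == 1


def placeholder_label(name: str) -> Label:
    # Hypothetical content; replace with your real definitions and examples.
    return Label(
        name=name,
        definition=f"Placeholder definition of {name}.",
        decision_rule=f"Placeholder rule for assigning {name}.",
        examples=[f"placeholder text 1 for {name}", f"placeholder text 2 for {name}"],
    )


codebook = Codebook(
    construct="PLACEHOLDER_CONSTRUCT",
    multi_label=False,
    labels=[placeholder_label(n) for n in ("LABEL_A", "LABEL_B", "None/Other")],
)
print(codebook.problems())
```

An empty list from `problems()` only means the structural checklist is covered; whether the definitions and rules are clear enough for consistent labeling still has to be judged by people applying them.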

### Consider Your Data Source

Before finalizing, ask yourself:

- Do these data, when classified, inform and improve a business decision?
- Can the construct of interest (all its labels/classes) be sufficiently found in these data?
- Are these data abundantly available and do they need to be analyzed frequently and/or at scale to justify building a vertical AI?

## Important Guardrails

- **Choose a data source** that fits your construct and is realistically useful to a firm
- **Use public data or properly de-identified data only** — no sensitive internal company data
- **Pilot test first:** Before committing, run a quick manual pilot on 100–200 texts in the developer platform playground or ChatGPT to confirm that your construct appears in the source and that your labels are workable
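The pilot step can be made reproducible with a few lines of code. The sketch below is illustrative only: it assumes your texts are already in a Python list, that the pilot is multi-class (one label recorded per text), and that a label seen fewer than `min_count` times deserves a second look. The function names, the seed, and the threshold of 5 are assumptions, not course requirements.

```python
import random
from collections import Counter


def draw_pilot_sample(texts, n=150, seed=488):
    """Draw a reproducible random sample of texts for manual labeling."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(texts, min(n, len(texts)))


def label_coverage(pilot_labels, label_set, min_count=5):
    """Count how often each label was assigned and flag rarely seen labels.

    pilot_labels: one label per pilot text (multi-class assumption).
    min_count: assumed threshold below which a label counts as sparse.
    """
    counts = Counter(pilot_labels)
    table = {label: counts.get(label, 0) for label in label_set}
    sparse = [label for label, count in table.items() if count < min_count]
    return table, sparse


# Hypothetical usage with placeholder texts and labels.
texts = [f"placeholder text {i}" for i in range(1000)]
sample = draw_pilot_sample(texts)
table, sparse = label_coverage(
    ["LABEL_A"] * 10 + ["LABEL_B"] * 2,
    ["LABEL_A", "LABEL_B", "LABEL_C"],
)
print(len(sample), table, sparse)
```

Labels that land in the sparse list suggest either that the data source rarely contains that class or that its decision rule is too narrow; both are worth resolving before you commit to the source.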

## Deliverable for This Assignment

**Two-page maximum, double-spaced documentation** containing:

1. **Definition** of your construct of interest and its labels/classes

2. **Sources & Citations** that support your construct of interest (and its classes/labels), demonstrating that it is:
   - Theoretically founded
   - Well-established in literature
   - Meaningful to decision makers

3. **Data Description** explaining:
   - Which data you will identify your construct in
   - How you will acquire these data
   - Why identifying your construct at scale/frequently in these data is valuable (justifies the need for a vertical AI)