# CAPSTONE: Construct of Interest and Data Sign-off by Dr. D.

**Due:** April 7 by 11:59pm | **Points:** 0 | **Submission:** File upload

---

## Overview

Your capstone project requires you to classify a construct of interest in data at scale.

You must get approval for your construct of interest and data from your instructor.

This is an ungraded assignment. However, it is the prerequisite to your capstone project.

## Goal

Pick a well-documented, theoretically founded construct of interest. Explain why a firm would want to classify this construct at scale.

### Your Construct Must Be:

- **Business-relevant:** addresses a real business decision
- **Theoretically grounded:** anchored in established literature
- **Well documented:** clearly defined in academic or industry sources
- **Observable in text:** detectable in your data source
- **Definable with clear rules:** specific enough for reliable labeling
- **Complex and nuanced:** more than simple sentiment (not just positive/negative)

You must pick one of the seven provided constructs of interest listed here: https://www.ringel.ai/UNC/2026/BUSI488/Class23/Ringel_488-2026_Capstone_Constructs.pdf

## Define Your Construct Precisely

Turn the concept into labels humans can apply consistently.

### You Must Create:

- **Label set** (classes/categories)
- **Clear definitions and decision rules** for each label
- **Borderline cases:** guidance for unclear examples
- **None/Other policy:** if applicable (multi-class: yes; multi-label: no)
- **2–3 example texts** per label (your own examples)
- **Decision:** multi-class (one label per item) vs. multi-label (multiple labels can apply)

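The checklist above can be captured in a small data structure so that nothing is forgotten before sign-off. The following is a minimal sketch, not a required format: the class names `Label` and `Codebook`, the function `check_codebook`, and the placeholder labels `LABEL_A` and `LABEL_B` are all hypothetical, and the None/Other check encodes one reading of "multi-class: yes; multi-label: no".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Label:
    name: str
    definition: str
    decision_rule: str
    examples: List[str] = field(default_factory=list)  # your own example texts


@dataclass
class Codebook:
    construct: str
    labels: List[Label]
    multi_label: bool                  # False means multi-class (one label per item)
    borderline_guidance: str = ""      # how to handle unclear examples
    none_label: Optional[str] = None   # e.g. "None/Other" (assumed: multi-class only)


def check_codebook(cb: Codebook) -> List[str]:
    """Return a list of open problems; an empty list means the checklist is met."""
    problems = []
    for lab in cb.labels:
        if not lab.definition or not lab.decision_rule:
            problems.append(f"{lab.name}: missing definition or decision rule")
        if not 2 <= len(lab.examples) <= 3:
            problems.append(f"{lab.name}: needs 2-3 example texts")
    if not cb.borderline_guidance:
        problems.append("missing guidance for borderline cases")
    if cb.multi_label and cb.none_label is not None:
        problems.append("multi-label codebooks do not need a None/Other label")
    if not cb.multi_label and cb.none_label is None:
        problems.append("multi-class codebooks should state a None/Other policy")
    return problems


# Placeholder content only; replace with your own construct, labels, and examples.
demo = Codebook(
    construct="<your construct>",
    labels=[
        Label("LABEL_A", "definition of A", "apply when ...", ["text 1", "text 2"]),
        Label("LABEL_B", "definition of B", "apply when ...", ["text 1"]),
    ],
    multi_label=False,
    borderline_guidance="If unsure between A and B, ...",
)
print(check_codebook(demo))
```

With the placeholder values shown, the check reports that `LABEL_B` has too few example texts and that the multi-class codebook still lacks a None/Other policy.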
### Consider Your Data Source

Before finalizing, ask yourself:

- Do these data, when classified, inform and improve a business decision?
- Can the construct of interest (all its labels/classes) be found often enough in these data?
- Are these data abundantly available, and must they be analyzed frequently and/or at scale, so that building a vertical AI is justified?

## Important Guardrails

- **Choose a data source** that fits your construct and is realistically useful to a firm
- **Use only public or properly de-identified data:** no sensitive internal company data
- **Pilot test first:** Before committing, do a quick manual pilot on 100–200 texts in the developer platform playground or ChatGPT to confirm that your construct appears in the source and that your labels are workable

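One way to read the results of the pilot is to tally how often each label was assigned and flag labels that rarely or never appear. The sketch below assumes a multi-class pilot (one label per text); the function name `pilot_summary`, the threshold `min_count`, and the label names and counts in the usage example are made up for illustration and are not part of the assignment.

```python
from collections import Counter


def pilot_summary(pilot_labels, label_set, min_count=5):
    """Count pilot labels and list those seen fewer than min_count times."""
    counts = Counter(pilot_labels)
    # Catch typos or labels that are not in the agreed label set.
    unknown = sorted(set(counts) - set(label_set))
    if unknown:
        raise ValueError(f"labels not in label set: {unknown}")
    # Include labels with zero occurrences so gaps are visible.
    table = {lab: counts.get(lab, 0) for lab in label_set}
    rare = [lab for lab, n in table.items() if n < min_count]
    return table, rare


# Made-up pilot of 100 texts with placeholder labels.
label_set = ["LABEL_A", "LABEL_B", "LABEL_C", "LABEL_D"]
pilot = ["LABEL_A"] * 60 + ["LABEL_B"] * 37 + ["LABEL_C"] * 3

table, rare = pilot_summary(pilot, label_set)
print(table)
print("Rarely or never observed:", rare)
```

In this made-up pilot, `LABEL_C` and `LABEL_D` fall below the threshold, which would suggest either a different data source or a revised label set. For a multi-label pilot, flatten each text's list of labels into one list before calling the function.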
## Deliverable for This Assignment

**A double-spaced document of at most two pages** containing:

1. **Definition** of your construct of interest and its labels/classes

2. **Sources & Citations** that support your construct of interest (and its classes/labels), demonstrating that it is:
   - Theoretically founded
   - Well established in the literature
   - Meaningful to decision makers

3. **Data Description** explaining:
   - Which data you will identify your construct in
   - How you will acquire these data
   - Why identifying your construct at scale and/or frequently in these data is valuable (that is, why it justifies a vertical AI)