Each layer successively raises the coherence of incoming data.
Brief
| Pipeline Phase | Layer | Primary Operation | Example Outputs |
|---|---|---|---|
| 0. Ingestion | Input Normalization | Ingest content from search, feeds, and platforms. | Unified content schema. |
| 1. Meta-Data | Normalization (canonicalization) | Field mapping, date unification, URL canonicalization. | Canonical_URL, Article_Author, ISO 8601 date. |
| 2. Storyline | Deduplication & Clustering | Merge duplicates, build story clusters. | Story_ID, cluster metadata. |
| 3. Semantic | Facet Navigation | Assign Parts, Steps, and Types (Polynyms). | { "facet": "Mail-In Voting", "mode": "Type" } |
Three Standardization Layers (examples)
1. Meta-Data Standardization (Normalization Layer)
Purpose: Establish a uniform data substrate.
Goal: Convert heterogeneous feeds into a canonical schema.
| Focus | Problem Addressed | Example Task |
|---|---|---|
| Field Mapping | Inconsistent field names across feeds. | Map all variants of author fields (e.g., <dc:creator>, <atom:author>) → Article_Author. |
| Date Unification | Multiple date/time formats and zones. | Convert all timestamps to ISO 8601 in UTC. |
| URL Canonicalization | Duplicates caused by tracking URLs. | Strip tracking query parameters (e.g., utm_source) to yield a unique Canonical_URL. |
→ Output: A clean, normalized metadata layer ready for story analysis.
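
The three tasks in the table above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the alias table, the list of tracking parameters, and the function names `map_fields`, `to_iso_utc`, and `canonical_url` are all hypothetical, and naive timestamps are assumed to already be in UTC.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical alias table: source field name -> canonical field name.
FIELD_ALIASES = {
    "dc:creator": "Article_Author",
    "atom:author": "Article_Author",
    "author": "Article_Author",
}

# Assumed tracking-parameter rules; a real deployment would extend these.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def map_fields(record: dict) -> dict:
    """Rename known field variants to their canonical names."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


def to_iso_utc(raw: str) -> str:
    """Parse an ISO 8601 or RFC 2822 timestamp and return ISO 8601 in UTC."""
    try:
        parsed = datetime.fromisoformat(raw)
    except ValueError:
        parsed = parsedate_to_datetime(raw)  # common in RSS pubDate fields
    if parsed.tzinfo is None:
        # Assumption: timestamps without a zone are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


def canonical_url(url: str) -> str:
    """Drop tracking parameters and the fragment; lowercase scheme and host."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), "")
    )
```

Only parameters that match the tracking rules are removed, so a URL such as `https://Example.com/news/story?utm_source=rss&id=42` keeps `id=42`, which may identify the article itself.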
2. Storyline Standardization (Deduplication & Clustering Layer)
Purpose: Establish structural coherence across feeds.
Goal: Identify unique stories and merge redundant variants.
| Focus | Problem Addressed | Example Task |
|---|---|---|
| Deduplication | Same article appears in multiple feeds. | Detect via Canonical_URL + content hash. |
| Clustering | Different sources report the same event. | Group articles with high Title/Body similarity into a single Story Cluster. |
| Entity Tagging | Inconsistent entity mentions. | Map variant mentions to one canonical name (e.g., “MSFT” → “Microsoft Corp.”). |
→ Output: Distinct, enriched story clusters that serve as narrative units for semantic analysis.
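
As a rough sketch of the deduplication and clustering tasks above, the following Python uses only the standard library. Several choices here are assumptions rather than anything the source specifies: an article counts as a duplicate when either its `Canonical_URL` or its content hash has been seen before, similarity is measured on titles only with `difflib.SequenceMatcher`, the 0.6 threshold is arbitrary, and the `title` and `body` field names are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def content_hash(body: str) -> str:
    """Hash the body after lowercasing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles: list) -> list:
    """Keep the first article seen for each Canonical_URL or content hash."""
    seen_urls, seen_hashes, unique = set(), set(), []
    for article in articles:
        url = article["Canonical_URL"]
        digest = content_hash(article["body"])
        if url in seen_urls or digest in seen_hashes:
            continue  # assumption: a match on either key marks a duplicate
        seen_urls.add(url)
        seen_hashes.add(digest)
        unique.append(article)
    return unique


def cluster_by_title(articles: list, threshold: float = 0.6) -> list:
    """Greedy single-pass clustering: join the first cluster whose
    representative (first) title is similar enough, else start a new one."""
    clusters = []
    for article in articles:
        title = article["title"].lower()
        for cluster in clusters:
            representative = cluster[0]["title"].lower()
            if SequenceMatcher(None, representative, title).ratio() >= threshold:
                cluster.append(article)
                break
        else:
            clusters.append([article])
    return clusters
```

A greedy pass is order-dependent and compares titles only; a production system would more likely compare bodies as well, for example with shingling or embeddings, but the control flow stays the same.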
3. Semantic Standardization (Facet Navigation Layer)
Purpose: Establish interpretive coherence.
Goal: Map clustered stories into structured, analyzable meaning spaces.
Facet Modes
| Mode | Definition | Function | Example Facets (Democracy Context) |
|---|---|---|---|
| Part | A component or side of a whole. | Expresses opposition, duality, or balance. | Fair / Unfair · Security / Access |
| Step | A stage in a sequence or process. | Expresses order, evolution, or progression. | Registration → Voting → Counting → Certification |
| Type | A kind or classification. | Groups by nature, category, or identity. | Mail-in / In-person Voting · NGO / Gov / Media |
Facet Modes Across Polynym Sizes
| Nym Size | Example | Mode Illustrated |
|---|---|---|
| Mononym (1) | Trust | Part / Step / Type depending on context |
| Bionym (2) | Fair / Unfair | Part |
| Bionym (2) | Campaign / Election | Step |
| Bionym (2) | Liberal / Conservative | Type |
| Trionym (3) | Access / Security / Trust | Part |
| Trionym (3) | Registration / Voting / Certification | Step |
| Trionym (3) | Social Media / News Media / Government | Type |
| Tetranym (4) | Audit / Cybersecurity / Chain of Custody / Paper Ballots | Part |
| Tetranym (4) | Civic Education / Registration / Voting / Post-Election Challenges | Step |
| Tetranym (4) | Voters / Officials / Judges / Observers | Type |
| Pentanym (5–9) | Mixed-mode hybrids combining multiple Parts, Steps, and Types for full coverage. | — |
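
One possible way to represent a polynym in code is a small immutable record that carries its facets and a single mode. The sketch below is illustrative only: the names `FacetMode`, `Polynym`, `size_name`, and `assignments` are hypothetical, and mixed-mode hybrids are deliberately not modeled, since they would need a mode per facet rather than one mode per polynym.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class FacetMode(Enum):
    PART = "Part"
    STEP = "Step"
    TYPE = "Type"


# Size names taken from the table above.
SIZE_NAMES = {1: "Mononym", 2: "Bionym", 3: "Trionym", 4: "Tetranym"}


@dataclass(frozen=True)
class Polynym:
    facets: Tuple[str, ...]
    mode: FacetMode  # simplification: one mode for the whole polynym

    @property
    def size_name(self) -> str:
        """Return the nym-size label for the number of facets."""
        count = len(self.facets)
        if count in SIZE_NAMES:
            return SIZE_NAMES[count]
        if 5 <= count <= 9:
            return "Pentanym"
        raise ValueError(f"unsupported polynym size: {count}")

    def assignments(self) -> list:
        """Emit one {"facet": ..., "mode": ...} record per facet."""
        return [{"facet": facet, "mode": self.mode.value} for facet in self.facets]


election_steps = Polynym(("Registration", "Voting", "Certification"), FacetMode.STEP)
```

Here `election_steps.size_name` is `"Trionym"`, and `election_steps.assignments()` yields records in the same shape as the `{ "facet": ..., "mode": ... }` example output of the Semantic phase.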
Operational Integration
- Filtering: “Show all Steps in the process.”
- Comparison: “Compare sentiment across all Parts of the voting system.”
- Visualization: “Plot Step-mode facets on a timeline.”
→ Output: A semantically standardized layer enabling structured analysis, bias detection, and orientation visualization.
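
The filtering, comparison, and timeline operations listed above might look like this over facet-tagged stories. Everything in the sketch is an assumption for illustration: the record layout, the `sentiment` field and its sample values, the fixed `STEP_ORDER` list, and the function names are not defined by the source.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample records; sentiment values are illustrative only.
SAMPLE_STORIES = [
    {"Story_ID": "s1", "facet": "Voting", "mode": "Step", "sentiment": 0.25},
    {"Story_ID": "s2", "facet": "Security", "mode": "Part", "sentiment": -0.5},
    {"Story_ID": "s3", "facet": "Registration", "mode": "Step", "sentiment": 0.75},
    {"Story_ID": "s4", "facet": "Access", "mode": "Part", "sentiment": 0.5},
    {"Story_ID": "s5", "facet": "Security", "mode": "Part", "sentiment": 0.5},
]

# Assumed process order, taken from the Step example in the Facet Modes table.
STEP_ORDER = ["Registration", "Voting", "Counting", "Certification"]


def filter_by_mode(stories: list, mode: str) -> list:
    """Filtering: keep only stories tagged with the given facet mode."""
    return [story for story in stories if story["mode"] == mode]


def mean_sentiment_by_facet(stories: list, mode: str) -> dict:
    """Comparison: average sentiment per facet within one mode."""
    grouped = defaultdict(list)
    for story in filter_by_mode(stories, mode):
        grouped[story["facet"]].append(story["sentiment"])
    return {facet: mean(scores) for facet, scores in grouped.items()}


def step_timeline(stories: list) -> list:
    """Visualization prep: Step-mode stories sorted by process order.
    Raises ValueError for a Step facet that is missing from STEP_ORDER."""
    steps = filter_by_mode(stories, "Step")
    return sorted(steps, key=lambda story: STEP_ORDER.index(story["facet"]))
```

With the sample data, `mean_sentiment_by_facet(SAMPLE_STORIES, "Part")` returns `{"Security": 0.0, "Access": 0.5}`, and `step_timeline(SAMPLE_STORIES)` places the Registration story before the Voting story, ready to hand to a plotting library.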
