NewsVoy’s Three Standardization Layers

Ingested content passes through three layers, each raising its coherence in turn: first the metadata, then the story structure, then the meaning.

Brief

Pipeline Phase | Layer | Primary Operation | Example Outputs
0. Ingestion | Input Normalization | Ingest content from search, feeds, and platforms. | Unified content schema.
1. Meta-Data | Normalization (canonicalization) | Field mapping, date unification, URL canonicalization. | Canonical_URL, Article_Author, ISO-8601 date.
2. Storyline | Deduplication & Clustering | Merge duplicates, build story clusters. | Story_ID, cluster metadata.
3. Semantic | Facet Navigation | Assign Parts, Steps, and Types (Polynyms). | { "facet": "Mail-In Voting", "mode": "Type" }

Three Standardization Layers (examples)
1. Meta-Data Standardization (Normalization Layer)

Purpose: Establish a uniform data substrate.
Goal: Convert heterogeneous feeds into a canonical schema.

Focus | Problem Addressed | Example Task
Field Mapping | Inconsistent field names across feeds. | Map all variants of author fields (e.g., <dc:creator>, <atom:author>) → Article_Author.
Date Unification | Multiple date/time formats and zones. | Convert all to ISO 8601 UTC.
URL Canonicalization | Duplicates caused by tracking URLs. | Strip query parameters to yield a unique Canonical_URL.
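
The three tasks in the table can be sketched in Python as follows. This is a minimal illustration, not NewsVoy's actual implementation: the raw field names (`author`, `published`, `link`), the output key `Published`, and the list of tracking parameters are all assumptions. The sketch also strips only tracking parameters rather than every query parameter, on the assumption that some parameters (such as an article id) identify the content.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of raw field names that all mean "author".
AUTHOR_FIELDS = ("dc:creator", "atom:author", "author")

# Assumed tracking parameters; a real deployment would maintain its own list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def map_author(raw: dict) -> Optional[str]:
    """Return the first non-empty author variant found in a raw feed item."""
    for name in AUTHOR_FIELDS:
        value = raw.get(name)
        if value:
            return value.strip()
    return None


def to_iso_utc(text: str) -> str:
    """Parse an ISO 8601 or RFC 2822 timestamp and return ISO 8601 in UTC."""
    try:
        dt = datetime.fromisoformat(text)
    except ValueError:
        dt = parsedate_to_datetime(text)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC
    return dt.astimezone(timezone.utc).isoformat()


def canonicalize_url(url: str) -> str:
    """Lowercase the host, drop tracking parameters and the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/") or "/",
        urlencode(sorted(kept)),  # sorted so parameter order cannot create duplicates
        "",                       # fragment removed
    ))


def normalize(raw: dict) -> dict:
    """Map one raw feed item onto the canonical fields named in the table."""
    return {
        "Article_Author": map_author(raw),
        "Published": to_iso_utc(raw["published"]),  # "published" is a hypothetical key
        "Canonical_URL": canonicalize_url(raw["link"]),  # "link" is a hypothetical key
    }
```

For instance, `canonicalize_url("https://Example.com/story/?utm_source=rss&id=7#top")` yields `https://example.com/story?id=7`, so two links that differ only in tracking parameters collapse to one Canonical_URL.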

Output: A clean, normalized metadata layer ready for story analysis.


2. Storyline Standardization (Deduplication & Clustering Layer)

Purpose: Establish structural coherence across feeds.
Goal: Identify unique stories and merge redundant variants.

Focus | Problem Addressed | Example Task
Deduplication | Same article appears in multiple feeds. | Detect via Canonical_URL + content hash.
Clustering | Different sources report the same event. | Group articles with high Title/Body similarity into a single Story Cluster.
Entity Tagging | Inconsistent entity mentions. | Normalize entities (e.g., “Microsoft Corp.” instead of “MSFT”).
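
The deduplication task could look roughly like the sketch below. It assumes each article is a dict with a `Canonical_URL` and a hypothetical `body` field, and it treats a match on either the URL or the content hash as a duplicate. Whether the two signals should be combined with "or" or "and" is a design choice the brief does not specify.

```python
import hashlib
import re


def content_hash(body: str) -> str:
    """Hash the body after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles: list) -> list:
    """Keep the first article seen for each Canonical_URL or content hash."""
    seen_urls = set()
    seen_hashes = set()
    unique = []
    for article in articles:
        url = article["Canonical_URL"]
        digest = content_hash(article["body"])  # "body" is a hypothetical field
        if url in seen_urls or digest in seen_hashes:
            continue  # same link, or same text syndicated under another link
        seen_urls.add(url)
        seen_hashes.add(digest)
        unique.append(article)
    return unique
```

An exact hash only catches copies that are identical after whitespace and case normalization. Lightly edited reprints would pass through and are left to the clustering step.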

Output: Distinct, enriched story clusters that serve as narrative units for semantic analysis.
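
One simple way to form such clusters is a greedy pass over article titles using Jaccard similarity on word tokens. The sketch below is only an illustration: the brief does not name a similarity measure, and the `title` field, the 0.5 threshold, and the `story-N` format for Story_ID are assumptions.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric word tokens of a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(articles: list, threshold: float = 0.5) -> list:
    """Assign each article to the first cluster whose seed title is similar enough."""
    clusters = []
    for article in articles:
        title_tokens = tokens(article["title"])  # "title" is a hypothetical field
        for existing in clusters:
            if jaccard(title_tokens, existing["tokens"]) >= threshold:
                existing["articles"].append(article)
                break
        else:
            # No cluster matched: this article seeds a new story.
            clusters.append({
                "Story_ID": f"story-{len(clusters) + 1}",
                "tokens": title_tokens,
                "articles": [article],
            })
    return clusters
```

Comparing against the seed title only keeps the sketch short. A production system would more likely compare body text as well, for example with embeddings or MinHash, and revisit cluster assignments as new articles arrive.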


3. Semantic Standardization (Facet Navigation Layer)

Purpose: Establish interpretive coherence.
Goal: Map clustered stories into structured, analyzable meaning spaces.

Facet Modes

Mode | Definition | Function | Example Facets (Democracy Context)
Part | A component or side of a whole. | Expresses opposition, duality, or balance. | Fair / Unfair · Security / Access
Step | A stage in a sequence or process. | Expresses order, evolution, or progression. | Registration → Voting → Counting → Certification
Type | A kind or classification. | Groups by nature, category, or identity. | Mail-in / In-person Voting · NGO / Gov / Media

Facet Modes Across Polynym Sizes

Nym Size | Example | Mode Illustrated
Mononym (1) | Trust | Part / Step / Type depending on context
Bionym (2) | Fair / Unfair | Part
Bionym (2) | Campaign / Election | Step
Bionym (2) | Liberal / Conservative | Type
Trionym (3) | Access / Security / Trust | Part
Trionym (3) | Registration / Voting / Certification | Step
Trionym (3) | Social Media / News Media / Government | Type
Tetranym (4) | Audit / Cybersecurity / Chain of Custody / Paper Ballots | Part
Tetranym (4) | Civic Education / Registration / Voting / Post-Election Challenges | Step
Tetranym (4) | Voters / Officials / Judges / Observers | Type
Pentanym (5–9) | Mixed-mode hybrids combining multiple Parts, Steps, and Types for full coverage. | Mixed
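
A polynym of this kind can be modeled as a small data structure. The sketch below is an assumption about how one might represent it, not a documented NewsVoy type: the `Polynym` class and its method names are hypothetical, and it covers only single-mode polynyms of sizes 1 to 4. Mixed-mode Pentanyms would need a mode per facet and are left out.

```python
from dataclasses import dataclass
from typing import Tuple

MODES = ("Part", "Step", "Type")
SIZE_NAMES = {1: "Mononym", 2: "Bionym", 3: "Trionym", 4: "Tetranym"}


@dataclass(frozen=True)
class Polynym:
    """A group of 1 to 4 facets that share a single mode (hypothetical type)."""
    facets: Tuple[str, ...]
    mode: str

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if len(self.facets) not in SIZE_NAMES:
            raise ValueError("this sketch supports 1 to 4 facets only")

    @property
    def size_name(self) -> str:
        """Name of the polynym size, e.g. 'Trionym' for three facets."""
        return SIZE_NAMES[len(self.facets)]

    def records(self) -> list:
        """One {'facet', 'mode'} record per facet, in the order given."""
        return [{"facet": facet, "mode": self.mode} for facet in self.facets]


# Example taken from the Trionym row of the table.
process = Polynym(("Registration", "Voting", "Certification"), "Step")
```

Here `process.size_name` is `"Trionym"`, and `process.records()` produces records in the same `{"facet": ..., "mode": ...}` shape used elsewhere in this brief.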

Operational Integration

Each facet assignment is emitted as a record, for example:
{"facet": "Mail-In Voting", "mode": "Type"}

The mode field supports queries such as:
  • Filtering: “Show all Steps in the process.”
  • Comparison: “Compare sentiment across all Parts of the voting system.”
  • Visualization: “Plot Step-mode facets on a timeline.”
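
The filtering and comparison queries above can be sketched over a list of facet records. The `sentiment` field and both function names are hypothetical, added only to make the comparison concrete. The brief itself specifies only `facet` and `mode`.

```python
from collections import defaultdict
from statistics import mean


def filter_by_mode(records: list, mode: str) -> list:
    """Filtering: keep only the records whose mode matches."""
    return [record for record in records if record["mode"] == mode]


def mean_sentiment_by_facet(records: list, mode: str) -> dict:
    """Comparison: average a (hypothetical) sentiment score per facet of one mode."""
    scores = defaultdict(list)
    for record in filter_by_mode(records, mode):
        scores[record["facet"]].append(record["sentiment"])
    return {facet: mean(values) for facet, values in scores.items()}


# Illustrative records; the sentiment values are made up for the example.
records = [
    {"facet": "Security", "mode": "Part", "sentiment": 0.5},
    {"facet": "Security", "mode": "Part", "sentiment": -0.5},
    {"facet": "Access", "mode": "Part", "sentiment": 0.25},
    {"facet": "Registration", "mode": "Step", "sentiment": 0.0},
    {"facet": "Mail-In Voting", "mode": "Type", "sentiment": 0.75},
]

steps = filter_by_mode(records, "Step")
parts = mean_sentiment_by_facet(records, "Part")
```

Because every record carries its mode explicitly, each of the three queries reduces to a filter on that one field followed by whatever aggregation or plotting the view needs.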

Output: A semantically standardized layer enabling structured analysis, bias detection, and orientation visualization.