NewsVoy’s Three Standardization Layers

The three standardization layers follow an initial ingestion phase (Phase 0), and each layer successively raises the coherence of the incoming data.

Pipeline Phase	Layer	Primary Operation	Example Outputs
0. Ingestion	Input Normalization	Ingest content from search, feeds, and platforms.	Unified content schema.
1. Meta-Data	Normalization (canonicalization)	Field mapping, date unification, URL canonicalization.	`Canonical_URL`, `Article_Author`, ISO 8601 UTC date.
2. Storyline	Deduplication & Clustering	Merge duplicates, build story clusters.	`Story_ID`, cluster metadata.
3. Semantic	Facet Navigation	Assign Parts, Steps, and Types (Polynyms).	`{ "facet": "Mail-In Voting", "mode": "Type" }`

Layer 1: Meta-Data (Normalization)

Purpose: Establish a uniform data substrate.
Goal: Convert heterogeneous feeds into a canonical schema.

Focus	Problem Addressed	Example Task
Field Mapping	Inconsistent field names across feeds.	Map all variants of author fields (e.g., `<dc:creator>`, `<atom:author>`) → `Article_Author`.
Date Unification	Multiple date/time formats and zones.	Convert all to ISO 8601 UTC.
URL Canonicalization	Duplicates caused by tracking URLs.	Strip tracking query parameters (e.g., `utm_*`) and fragments to yield a unique `Canonical_URL`, keeping any parameters that identify the article itself.

→ Output: A clean, normalized metadata layer ready for story analysis.
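
The three normalization tasks above can be sketched in Python as follows. This is a minimal illustration, not NewsVoy's actual implementation: the alias table, the list of tracking parameters, and the function names are all assumptions made for the example, and naive timestamps are assumed to already be in UTC.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical alias table: source field name -> canonical field name.
FIELD_ALIASES = {
    "dc:creator": "Article_Author",
    "atom:author": "Article_Author",
    "author": "Article_Author",
}

# Assumed tracking parameters; a real deployment would maintain its own list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def map_fields(raw: dict) -> dict:
    """Rename known field variants to their canonical names."""
    return {FIELD_ALIASES.get(key, key): value for key, value in raw.items()}


def to_iso8601_utc(value: str) -> str:
    """Parse an ISO 8601 or RFC 2822 timestamp and re-express it in UTC."""
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        # RSS feeds commonly use RFC 2822 dates.
        parsed = parsedate_to_datetime(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without a zone are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


def canonicalize_url(url: str) -> str:
    """Drop tracking parameters and the fragment; lowercase scheme and host."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), "")
    )
```

Note that only tracking parameters are removed. Stripping every query parameter would merge distinct articles on sites that identify a story by a parameter such as `?id=7`.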

Layer 2: Storyline (Deduplication & Clustering)

Purpose: Establish structural coherence across feeds.
Goal: Identify unique stories and merge redundant variants.

Focus	Problem Addressed	Example Task
Deduplication	Same article appears in multiple feeds.	Detect via `Canonical_URL` + content hash.
Clustering	Different sources report the same event.	Group articles with high Title/Body similarity into a single Story Cluster.
Entity Tagging	Inconsistent entity mentions.	Normalize entity variants to one canonical form (e.g., “MSFT” → “Microsoft Corp.”).

→ Output: Distinct, enriched story clusters that serve as narrative units for semantic analysis.
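
A rough Python sketch of the deduplication and clustering tasks follows. Several choices here are assumptions for illustration only: the record keys `title` and `body`, treating a match on either `Canonical_URL` or content hash as a duplicate, the greedy single-pass grouping, the 0.6 similarity threshold, and the `story-N` format for `Story_ID`. It compares titles only, using the standard library's `difflib`; a production system would likely use a stronger similarity measure over the body text as well.

```python
import hashlib
from difflib import SequenceMatcher


def content_hash(body: str) -> str:
    """Hash the body text after normalizing case and whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each Canonical_URL or content hash."""
    seen_urls, seen_hashes, unique = set(), set(), []
    for article in articles:
        digest = content_hash(article["body"])
        if article["Canonical_URL"] in seen_urls or digest in seen_hashes:
            continue  # same URL or same text: treat as a duplicate
        seen_urls.add(article["Canonical_URL"])
        seen_hashes.add(digest)
        unique.append(article)
    return unique


def cluster_by_title(articles: list[dict], threshold: float = 0.6) -> dict:
    """Greedily group articles whose titles resemble a cluster's first title."""
    clusters: list[list[dict]] = []
    for article in articles:
        title = article["title"].lower()
        for cluster in clusters:
            representative = cluster[0]["title"].lower()
            if SequenceMatcher(None, representative, title).ratio() >= threshold:
                cluster.append(article)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([article])
    # Hypothetical Story_ID format: "story-1", "story-2", ...
    return {f"story-{i}": cluster for i, cluster in enumerate(clusters, start=1)}
```

Greedy grouping depends on input order and compares each article only against a cluster's first member, which keeps the sketch short at the cost of accuracy.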

Layer 3: Semantic (Facet Navigation)

Purpose: Establish interpretive coherence.
Goal: Map clustered stories into structured, analyzable meaning spaces.

Mode	Definition	Function	Example Facets (Democracy Context)
Part	A component or side of a whole.	Expresses opposition, duality, or balance.	Fair / Unfair · Security / Access
Step	A stage in a sequence or process.	Expresses order, evolution, or progression.	Registration → Voting → Counting → Certification
Type	A kind or classification.	Groups by nature, category, or identity.	Mail-in / In-person Voting · NGO / Gov / Media

Nym Size	Example	Mode Illustrated
Mononym (1)	Trust	Part / Step / Type depending on context
Bionym (2)	Fair / Unfair	Part
Bionym (2)	Campaign / Election	Step
Bionym (2)	Liberal / Conservative	Type
Trionym (3)	Access / Security / Trust	Part
Trionym (3)	Registration / Voting / Certification	Step
Trionym (3)	Social Media / News Media / Government	Type
Tetranym (4)	Audit / Cybersecurity / Chain of Custody / Paper Ballots	Part
Tetranym (4)	Civic Education / Registration / Voting / Post-Election Challenges	Step
Tetranym (4)	Voters / Officials / Judges / Observers	Type
Pentanym (5–9)	Mixed-mode hybrids combining multiple Parts, Steps, and Types for full coverage.	—

→ Output: A semantically standardized layer enabling structured analysis, bias detection, and orientation visualization.
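
One way to represent facets, modes, and nym sizes in code is sketched below. The `Mode`, `Facet`, and `nym_size` names are hypothetical and chosen for this example. The size labels follow the Nym Size table above, and because that table does not define sizes above 9, the sketch rejects them rather than guessing.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    PART = "Part"  # a component or side of a whole
    STEP = "Step"  # a stage in a sequence or process
    TYPE = "Type"  # a kind or classification


@dataclass(frozen=True)
class Facet:
    facet: str
    mode: Mode

    def to_dict(self) -> dict:
        """Serialize to the shape shown in the pipeline table."""
        return {"facet": self.facet, "mode": self.mode.value}


# Labels taken from the Nym Size table; sizes 5 to 9 share one label there.
NYM_NAMES = {1: "Mononym", 2: "Bionym", 3: "Trionym", 4: "Tetranym"}


def nym_size(facets: list[Facet]) -> str:
    """Return the nym-size label for a group of facets."""
    count = len(facets)
    if count in NYM_NAMES:
        return NYM_NAMES[count]
    if 5 <= count <= 9:
        return "Pentanym"
    raise ValueError(f"no nym size defined for {count} facets")


# Usage: the election-process Trionym from the table, in Step mode.
process = [
    Facet("Registration", Mode.STEP),
    Facet("Voting", Mode.STEP),
    Facet("Certification", Mode.STEP),
]
print(nym_size(process), [f.to_dict() for f in process])
```

Keeping the mode as an enum rather than a free-form string means a misspelled mode fails at construction time instead of silently creating a fourth category.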

Build Intuit