Eurostat Pipeline

Eurostat's Urban Audit publishes 220+ comparable indicators across 1,000+ European cities as flat TSV files. Cittopia's pipeline turns these into the per-city JSON datasets that power the atlas.

Version 0.9 · Public draft · Updated 2026-04-30 · Maintainer: Tunç Meriç

Raw input

Eurostat ships datasets as gzipped TSV files such as estat_urb_cpop1.tsv.gz (population by year, by city, by indicator). Each row holds one series, a single city and indicator, with one column per time period; each cell may be a number, a flag, or a colon (missing).
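As a rough illustration of the raw format, the following Python sketch streams a gzipped TSV and splits each cell into a value and a flag. It is not Cittopia's actual parser. It assumes the usual Eurostat layout, where the first header cell lists the dimension names separated by commas and ends with a backslash and the time label, and where a flag follows the number after a space. The function names `parse_cell` and `read_rows` are hypothetical.

```python
import csv
import gzip


def parse_cell(cell):
    """Split a raw cell into (value, flag); value is None when missing."""
    parts = cell.strip().split()
    if not parts or parts[0] == ":":
        # A colon marks a missing observation; keep any trailing flag.
        return None, (parts[1] if len(parts) > 1 else "")
    try:
        value = float(parts[0])
    except ValueError:
        # The cell holds only a flag and no number.
        return None, parts[0]
    return value, (parts[1] if len(parts) > 1 else "")


def read_rows(source):
    """Yield (dimensions, {year: (value, flag)}) per row of a gzipped TSV.

    `source` may be a path or a binary file object.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        # Assumed header cell shape: "dim_a,dim_b,dim_c\TIME_PERIOD".
        dim_names = header[0].split("\\")[0].split(",")
        years = [int(h.strip()) for h in header[1:]]
        for row in reader:
            dims = dict(zip(dim_names, row[0].split(",")))
            cells = {y: parse_cell(c) for y, c in zip(years, row[1:])}
            yield dims, cells
```

Because `read_rows` is a generator over a streamed file, memory use stays roughly constant regardless of file size. The column names that identify the indicator and the city are read from the header rather than hard-coded.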

Pipeline stages

  1. Decompress & parse. Stream the .gz file, parse TSV, identify the indicator code and city code columns.
  2. City code mapping. Match Eurostat city codes (e.g. BG003C for Varna) against Cittopia's canonical city slugs.
  3. Time-series construction. For each city + indicator, build a {year: value} dict. Drop colons (missing) and flagged estimates.
  4. Normalisation. Apply unit conversion and z-scoring against the corpus.
  5. Confidence stamp. Compute a freshness score per indicator (full credit if < 2 years old, decaying after).
  6. Write JSON. Emit per-city objects into assets/data/global-cities.js.
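Stages 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code. It assumes that any non-empty flag disqualifies a cell, that "the corpus" means all ingested cities for the same indicator, that the population standard deviation is used, and that freshness decays exponentially with a three-year half-life. The real decay curve is not specified above, and every function and parameter name here is hypothetical.

```python
from statistics import mean, pstdev


def build_series(cells):
    """Stage 3: keep unflagged numeric observations as {year: value}.

    `cells` maps year -> (value, flag), with value None when missing.
    """
    return {
        year: value
        for year, (value, flag) in cells.items()
        if value is not None and not flag
    }


def z_scores(value_by_city):
    """Stage 4 (z-scoring only): standardise one indicator across cities."""
    values = list(value_by_city.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All cities share the same value, so no city deviates from the mean.
        return {city: 0.0 for city in value_by_city}
    return {city: (v - mu) / sigma for city, v in value_by_city.items()}


def freshness(age_years, half_life=3.0):
    """Stage 5: full credit under 2 years, then assumed exponential decay."""
    if age_years < 2:
        return 1.0
    return 0.5 ** ((age_years - 2) / half_life)
```

With these assumptions, an observation that is five years old scores 0.5 and one that is eight years old scores 0.25. Swapping in a linear or stepwise decay would only change the last line of `freshness`.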

Example: Varna (BG003C)

varna: {
    eurostat: 'BG003C',
    indicators: {
        population:      { 2013: 334_679, ..., 2022: 332_686 },
        employment_rate: { 2013: 53.8,    ..., 2022: 56.4 },
        ...
    },
    confidence: 0.86,
}
Coverage (April 2026): 12 EU cities are fully ingested. Full Urban Audit coverage (1,000+ cities) is planned for Phase 5 of the roadmap.

Last updated 30 April 2026 by Tunç Meriç