# DESTATIS fetcher
Federal Statistical Office of Germany
## Source data
DESTATIS provides SDMX data via an [API](https://www-genesis.destatis.de/genesisWS/web) that requires a login and password dedicated to DBnomics.
Data is grouped by hierarchical themes (see the `Themes` link on the [home page](https://www-genesis.destatis.de/genesis/online)). Only data belonging to a chosen subset of themes (listed in `destatis_util.py`) is downloaded.
- The source-data repo contains one subdirectory per theme code (`42`, `45`, ... `81`).
- Each subdirectory contains:
  - one datacubes file (named `{themeid}_datacubes.xml`) listing the datacubes of the theme (code, name)
  - pairs of:
    - a datacube file (named `{dataset_id}.xml`) containing time series data
    - a datacube structure file (named `{dataset_id}.structure.xml`) containing dimension information
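
The layout above can be walked by matching each structure file with its datacube file. The following is a minimal sketch, not the fetcher's actual code: the function name `pair_datacube_files` is hypothetical, and it works on a plain list of file names from one theme subdirectory.

```python
def pair_datacube_files(filenames):
    """Map dataset_id -> (datacube file, structure file) for one theme directory.

    Hypothetical helper: only the naming scheme ({dataset_id}.xml and
    {dataset_id}.structure.xml) comes from the layout described above.
    """
    names = set(filenames)
    suffix = ".structure.xml"
    pairs = {}
    for name in sorted(names):
        if not name.endswith(suffix):
            continue  # skips datacube files and the {themeid}_datacubes.xml file
        dataset_id = name[: -len(suffix)]
        data_name = f"{dataset_id}.xml"
        if data_name in names:  # keep only complete pairs
            pairs[dataset_id] = (data_name, name)
    return pairs
```

Datacube files without a matching structure file are left out, so a caller can detect incomplete downloads by comparing the result with the directory listing.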
### Source format oddities
- Datacube SDMX files embed CSV content (`;` delimiter) in `<quaderDaten>` tags. The number of columns varies because several CSV sections coexist, each with its own header line.
- The time series data itself is preceded by metadata (also encoded as CSV).
- Each row of time series data contains several observation values, each relative to an indicator (described in the datacube structure file).
- Each observation value spans 4 columns: the 1st contains the value, unless the 2nd is a dash (`-`), in which case the observation is treated as N/A.
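
The oddities above could be handled roughly as follows. This is a sketch under assumptions, not the converter's implementation: the function names are hypothetical, the `<quaderDaten>` element is matched by local name only, and the caller is assumed to know which cells of a row hold the observation groups. Values are kept as raw strings because their number format is not described here.

```python
import csv
import io
import xml.etree.ElementTree as ET


def quader_rows(xml_text):
    """Return all non-empty CSV rows embedded in <quaderDaten> elements."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        # Compare local names so this works with or without XML namespaces.
        if elem.tag.rsplit("}", 1)[-1] == "quaderDaten" and elem.text:
            reader = csv.reader(io.StringIO(elem.text), delimiter=";")
            # Rows may have different lengths: several CSV sections coexist.
            rows.extend(row for row in reader if row)
    return rows


def decode_observations(obs_cells, group_size=4):
    """Decode the observation cells of one row, one value per indicator.

    Each group of 4 cells yields the 1st cell, or None (N/A) when the
    2nd cell is a dash.
    """
    if len(obs_cells) % group_size != 0:
        raise ValueError("observation cells must come in groups of 4")
    values = []
    for start in range(0, len(obs_cells), group_size):
        value, flag = obs_cells[start], obs_cells[start + 1]
        values.append(None if flag == "-" else value)
    return values
```

The order of the returned values follows the order of the groups in the row; mapping each position to its indicator would rely on the datacube structure file.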
## Download
The download script checks the [new data RSS feed](https://www-genesis.destatis.de/genesis/online/news?language=en) to find out which datacubes have been updated.
- If any have been updated, the download script fetches not only the modified datacubes but all datacubes belonging to the same theme.
- The download script can ignore the RSS feed and download all datacubes when run with the `--all-datasets` parameter.
- Datacube files are post-processed after download: the download timestamp in the content is erased (to avoid spurious git commits).
- TODO: datacube files contain download URLs (including login and password); just replace the password value with "XXXX".
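
The TODO above could be addressed with a small text substitution during post-processing. The sketch below is only an illustration: the function name `mask_password` is hypothetical, and the query parameter name `passwort` is an assumption that should be checked against the URLs actually stored in the datacube files.

```python
import re


def mask_password(content, param="passwort", mask="XXXX"):
    """Replace the value of a URL query parameter with a mask.

    `param` is an assumed parameter name, not taken from this README.
    The value is considered to end at '&', a quote, '<', '>' or whitespace,
    which also covers '&amp;' separators inside XML content.
    """
    pattern = re.compile(r"(\b" + re.escape(param) + r"=)[^&\"'<>\s]*")
    return pattern.sub(lambda match: match.group(1) + mask, content)
```

Because the replacement value is constant, running this step on every download also keeps the stored files identical between runs, in the same spirit as the timestamp erasure.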
## Convert
The convert process scans the source-data repo to generate json-data (1 datacube == 1 dataset). The category tree is based on the theme hierarchy (see the skeleton in `destatis_util.py`) and is filled with the generated datasets.
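
Filling the category tree could look like the sketch below. It is an illustration only: the function name, the skeleton shape (nested dicts with `code`, `name` and optional `children`) and the `datasets_by_theme` mapping are assumptions, and the real skeleton in `destatis_util.py` may be structured differently.

```python
def fill_category_tree(skeleton, datasets_by_theme):
    """Return a new tree where each theme node also lists its datasets.

    skeleton: list of theme nodes, each {"code", "name", optional "children"}.
    datasets_by_theme: theme code -> list of (dataset_code, dataset_name).
    Both shapes are assumptions made for this sketch.
    """

    def fill(node):
        # Sub-themes first, then the datasets attached to this theme.
        children = [fill(child) for child in node.get("children", [])]
        children += [
            {"code": code, "name": name}
            for code, name in datasets_by_theme.get(node["code"], [])
        ]
        return {"code": node["code"], "name": node["name"], "children": children}

    return [fill(node) for node in skeleton]
```

The skeleton itself is not modified, so the same theme hierarchy can be reused across conversion runs.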
