EUROSTAT

Provider

SDMX: SDMX-ML 2.1
REST API: http://ec.europa.eu/eurostat/data/web-services (SDMX and JSON). To check: whether it can be used to fetch only updated data.
Bulk Download: SDMX XML or TSV http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&dir=data
Account: No

Existence of a hierarchy of datasets on the web site: Yes
How to recover the information: read http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents.xml; update information is given by the lastUpdate date.
- The lastModified date corresponds to the last change in the table structure (possibly the addition of a new observation?)
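A minimal sketch of how the lastUpdate dates could be used to select the datasets to refresh. It assumes the dates have already been extracted from table_of_contents.xml into a {dataset_code: lastUpdate} dictionary and that they use the dd.mm.yyyy format; both are assumptions to check against the actual file. The function names are hypothetical.

```python
from datetime import datetime


def parse_toc_date(text):
    """Parse a table_of_contents.xml date (ASSUMED format: dd.mm.yyyy)."""
    return datetime.strptime(text.strip(), "%d.%m.%Y")


def datasets_to_update(previous, current):
    """Return the codes that are new or whose lastUpdate is later than before.

    previous, current: {dataset_code: lastUpdate string} from two runs.
    """
    changed = []
    for code, last_update in current.items():
        old = previous.get(code)
        if old is None or parse_toc_date(last_update) > parse_toc_date(old):
            changed.append(code)
    return sorted(changed)


previous = {"a": "01.01.2017", "b": "05.03.2017"}
current = {"a": "01.01.2017", "b": "06.03.2017", "c": "06.03.2017"}
print(datasets_to_update(previous, current))  # ['b', 'c']
```

If the dates really have only day resolution, they cannot distinguish the 11:00 and 23:00 updates of the same day, so a dataset updated twice on one day would need another rule.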

Parsing of table_of_contents.xml is done in the function build_data_tree().
It uses the lxml iterative parser so as not to build the entire XML tree in memory.
Categories are created by the sub-function create_category(). The dictionary _category needs to be modified to fit the new schema.
The iterative XML parser looks for leaf tags, which identify datasets. It then uses XPath to get the parent branches and the components of a dataset. This could be improved and sped up by handling branch and children as parser events. This does not need to be done in the first iteration for this provider; if it is not done, open an issue to do it in the future.
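A sketch of the event-based improvement suggested above: parent branches are tracked with start/end events instead of XPath lookups. It uses the standard library xml.etree.ElementTree.iterparse as a stand-in (lxml.etree.iterparse offers the same events interface). The element names (branch, children, leaf, code), the type="dataset" attribute and the namespace in the sample are assumptions about the layout of table_of_contents.xml; iter_datasets is a hypothetical name, not the existing build_data_tree().

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_datasets(source):
    """Yield (dataset_code, branch_codes) for each leaf of type 'dataset'."""
    path = []        # local names of the currently open elements
    branches = []    # codes of the currently open branches (None until read)
    leaf_code = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            path.append(name)
            if name == "branch":
                branches.append(None)
            elif name == "leaf":
                leaf_code = None
            continue
        # "end" event: the element text is complete here
        path.pop()
        parent = path[-1] if path else None
        if name == "code":
            if parent == "branch":
                branches[-1] = (elem.text or "").strip()
            elif parent == "leaf":
                leaf_code = (elem.text or "").strip()
        elif name == "leaf":
            if elem.get("type") == "dataset":  # ASSUMED attribute
                yield leaf_code, tuple(branches)
            elem.clear()  # free the leaf subtree once processed
        elif name == "branch":
            branches.pop()
            elem.clear()


SAMPLE = b"""<nt:tree xmlns:nt="urn:eu.europa.ec.eurostat.navtree">
  <nt:branch><nt:code>data</nt:code><nt:children>
    <nt:branch><nt:code>economy</nt:code><nt:children>
      <nt:leaf type="dataset"><nt:code>nama_10_gdp</nt:code></nt:leaf>
      <nt:leaf type="table"><nt:code>tec00001</nt:code></nt:leaf>
    </nt:children></nt:branch>
  </nt:children></nt:branch>
</nt:tree>"""

print(list(iter_datasets(io.BytesIO(SAMPLE))))
# [('nama_10_gdp', ('data', 'economy'))]
```

The branch stack gives the category path of each dataset directly, which is what create_category() needs, without going back up the tree.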

datasetCode: provided
how to get release date: in DSD
dataset docHref: yes
dataset notes: yes
dimension_list: provided in DSD
use of attributes: yes
attribute_list: provided in DSD
available frequencies:
- A: Annual
- S: Half-yearly, semester
- Q: Quarterly
- M: Monthly
- W: Weekly
- B: Business week
- D: Daily
- H: Hourly (not supported by DBnomics, and possibly never used by Eurostat; if it appears, log an error so that we can identify the dataset)
- N: Minutely (not supported by DBnomics, and possibly never used by Eurostat; if it appears, log an error so that we can identify the dataset)
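A minimal sketch of the frequency check described above, assuming the standard logging module and a hypothetical function check_frequency() called once per series; the logger name is also an assumption.

```python
import logging

logger = logging.getLogger("eurostat")  # ASSUMED logger name

FREQUENCIES = {
    "A": "Annual",
    "S": "Half-yearly, semester",
    "Q": "Quarterly",
    "M": "Monthly",
    "W": "Weekly",
    "B": "Business week",
    "D": "Daily",
    "H": "Hourly",
    "N": "Minutely",
}
UNSUPPORTED = {"H", "N"}  # not supported by DBnomics


def check_frequency(dataset_code, freq):
    """Return True if the frequency can be stored; log an error otherwise."""
    if freq not in FREQUENCIES:
        logger.error("dataset %s: unknown frequency %r", dataset_code, freq)
        return False
    if freq in UNSUPPORTED:
        logger.error("dataset %s: unsupported frequency %s (%s)",
                     dataset_code, freq, FREQUENCIES[freq])
        return False
    return True
```

Logging the dataset code in the message is what makes the offending dataset easy to find.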
availability of previous updates: no
existence of real time datasets: no
warning: the dimension and attribute code lists in the DSD are more complete than the values actually used in a dataset. The code lists must therefore be rebuilt while reading the series.
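A sketch of rebuilding the code lists from the series, assuming the DSD code lists are available as {dimension: {code: label}} and each series key as {dimension: code}; these shapes and the function name are hypothetical.

```python
from collections import defaultdict


def rebuild_codelists(dsd_codelists, series_keys):
    """Keep only the DSD codes that actually occur in the series.

    dsd_codelists: {dimension: {code: label}} as read from the DSD
    series_keys: iterable of {dimension: code} dicts, one per series
    """
    used = defaultdict(set)
    for key in series_keys:
        for dim, code in key.items():
            used[dim].add(code)
    # Preserve the DSD order and labels, drop unused codes
    return {
        dim: {code: label for code, label in codes.items() if code in used[dim]}
        for dim, codes in dsd_codelists.items()
    }


dsd = {"geo": {"FR": "France", "DE": "Germany", "IT": "Italy"},
       "unit": {"EUR": "Euro", "PC": "Percent"}}
keys = [{"geo": "FR", "unit": "EUR"}, {"geo": "DE", "unit": "EUR"}]
print(rebuild_codelists(dsd, keys))
# {'geo': {'FR': 'France', 'DE': 'Germany'}, 'unit': {'EUR': 'Euro'}}
```

Because series_keys can be any iterable, the used codes can be collected while the series are streamed rather than after loading them all.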

the datasets are provided as *.zip files at http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data; see the function make_url()
each *.zip file contains a *.dsd.xml file and a *.sdmx.xml file
the dimensions and other characteristics of a dataset are provided in the *.dsd.xml file, and the data in the *.sdmx.xml file
the code for processing these files is in dlstats\dlstats\xml_utils.py
the *.dsd.xml file is processed by the class XMLStructure()
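A sketch of splitting a downloaded archive into its two members with the standard zipfile module. It assumes the archive bytes are already downloaded (make_url() is not reproduced here) and that exactly one member ends with .dsd.xml and one with .sdmx.xml; split_dataset_zip is a hypothetical name.

```python
import io
import zipfile


def split_dataset_zip(data):
    """Return (dsd_bytes, sdmx_bytes) from a dataset *.zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        dsd = [n for n in names if n.endswith(".dsd.xml")]
        sdmx = [n for n in names if n.endswith(".sdmx.xml")]
        if len(dsd) != 1 or len(sdmx) != 1:
            raise ValueError(f"unexpected archive content: {names}")
        return zf.read(dsd[0]), zf.read(sdmx[0])


# Build a small archive in memory to try the function
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("nama_10_gdp.dsd.xml", "<dsd/>")
    zf.writestr("nama_10_gdp.sdmx.xml", "<data/>")
print(split_dataset_zip(buf.getvalue()))  # (b'<dsd/>', b'<data/>')
```

For large datasets, ZipFile.open() returns a file-like object that can be fed to an iterative parser instead of loading the whole *.sdmx.xml member with read().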

calendar of future updates: no
summary of previous updates: only the latest one, in http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents.xml
regular updates: the site is updated every day at 11:00 and 23:00
RSS flow: http://ec.europa.eu/eurostat/cache/RSS/rss_estat_news.xml
best way to monitor updates: read http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents.xml at 11:05 and 23:05, every day
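A minimal sketch of computing the next time table_of_contents.xml should be read, following the 11:05 and 23:05 schedule above. It assumes naive datetimes expressed in the same time zone as the Eurostat schedule, which these notes do not specify; next_check is a hypothetical name.

```python
from datetime import datetime, time, timedelta

# 5 minutes after each publication (time zone ASSUMED to match Eurostat's)
CHECK_TIMES = (time(11, 5), time(23, 5))


def next_check(now):
    """Return the next datetime at which table_of_contents.xml should be read."""
    for t in CHECK_TIMES:
        candidate = datetime.combine(now.date(), t)
        if candidate > now:
            return candidate
    # Both checks of today are past: first check of tomorrow
    return datetime.combine(now.date() + timedelta(days=1), CHECK_TIMES[0])


print(next_check(datetime(2017, 4, 19, 12, 0)))  # 2017-04-19 23:05:00
```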

dimension names may be uppercase in the DSD but lowercase in the SDMX series, so we force them to lowercase
this may not be the case for attribute names, but we also force them to lowercase for consistency
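A sketch of the lowercasing rule, applied to either dimension or attribute names. The function name is hypothetical, and the collision check is an added safeguard, not something stated above: it fails loudly if two names differ only by case.

```python
def normalize_names(values):
    """Lowercase dimension or attribute names so DSD and series keys match.

    Raises ValueError if two names become identical after lowercasing.
    """
    normalized = {}
    for name, value in values.items():
        key = name.lower()
        if key in normalized:
            raise ValueError(f"names collide after lowercasing: {name!r}")
        normalized[key] = value
    return normalized


print(normalize_names({"GEO": "FR", "unit": "EUR"}))  # {'geo': 'FR', 'unit': 'EUR'}
```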