EUROSTAT
Provider
Data accessibility
Desired datasets
Data tree and categories
Existing code in fetcher eurostat.py
- parsing of table_of_contents.xml is done in function build_data_tree()
- it uses the lxml iterative parser in order not to build the entire xml tree in memory
- categories are created by sub-function create_category. The dictionary _category needs to be modified to fit the new schema
- the iterative xml parser looks for the tag leaf, which identifies datasets. Then it uses xpath to get the parent branches and the components of a dataset. This could be improved and accelerated by handling branch and children as parser events. This does not need to be done in the first iteration for this provider. If it is not done, open an issue to do it in the future.
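The event-based idea above could look roughly like the sketch below. It is not the existing build_data_tree() code. It uses the standard library xml.etree.ElementTree.iterparse instead of lxml (lxml.etree.iterparse takes the same events argument), and the sample document is a simplified, hypothetical stand-in for table_of_contents.xml: the real file may use namespaces, other child elements, and a different element order. The function name iter_datasets is also made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified table of contents (assumption: each <code>
# element appears before the <children> element of its branch).
SAMPLE_TOC = b"""<tree>
  <branch>
    <code>economy</code>
    <children>
      <branch>
        <code>national_accounts</code>
        <children>
          <leaf><code>nama_gdp</code></leaf>
          <leaf><code>nama_cons</code></leaf>
        </children>
      </branch>
      <leaf><code>ei_bsco</code></leaf>
    </children>
  </branch>
</tree>"""


def iter_datasets(source):
    """Yield (dataset_code, parent_branch_codes) for every leaf element."""
    open_tags = []      # tags of the elements currently open
    branch_stack = []   # codes of the branches currently open
    leaf_code = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_tags.append(elem.tag)
            if elem.tag == "branch":
                branch_stack.append(None)  # code is filled in when read
            continue
        # "end" event: the element text is complete at this point
        open_tags.pop()
        if elem.tag == "code":
            parent = open_tags[-1]
            if parent == "branch":
                branch_stack[-1] = elem.text
            elif parent == "leaf":
                leaf_code = elem.text
        elif elem.tag == "leaf":
            yield leaf_code, list(branch_stack)
            elem.clear()  # release the content of the processed leaf
        elif elem.tag == "branch":
            branch_stack.pop()
            elem.clear()


datasets = list(iter_datasets(io.BytesIO(SAMPLE_TOC)))
for code, parents in datasets:
    print(code, "/".join(parents))
```

The parent branches come from a stack kept up to date on start and end events, so no xpath query is needed for each leaf.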
Datasets
- datasetCode: provided
- how to get release date: in DSD
- dataset docHref: yes
- dataset notes: yes
- dimension_list: provided in DSD
- use of attributes: yes
- attribute_list: provided in DSD
- available frequencies:
  - A: Annual
  - S: Half-yearly, semester
  - Q: Quarterly
  - M: Monthly
  - W: Weekly
  - B: Business week
  - D: Daily
  - H: Hourly (DBnomics doesn't support this frequency. It may not be used by Eurostat. If it appears, log an error so that we can see the dataset)
  - N: Minutely (DBnomics doesn't support this frequency. It may not be used by Eurostat. If it appears, log an error so that we can see the dataset)
- availability of previous updates: no
- existence of real time datasets: no
- warning: dimension and attribute code lists are more complete than the values actually used in a dataset. It is necessary to rebuild the code list while reading the series.
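The warning about code lists can be illustrated with a small sketch. The data structures are assumptions made for the example (a dict of code lists per dimension and a list of series dicts with a "dimensions" entry). They are not the structures returned by XMLStructure() or XMLData(), and rebuild_code_lists is a hypothetical helper name.

```python
def rebuild_code_lists(dsd_code_lists, series_iter):
    """Keep only the codes that occur in at least one series.

    dsd_code_lists: {dimension: {code: label}} as declared in the DSD
    series_iter: iterable of {"dimensions": {dimension: code}}
    """
    used = {}
    for series in series_iter:
        for dimension, code in series["dimensions"].items():
            used.setdefault(dimension, set()).add(code)
    # Filter the DSD lists, keeping the DSD order of the codes
    return {
        dimension: {
            code: label
            for code, label in codes.items()
            if code in used.get(dimension, set())
        }
        for dimension, codes in dsd_code_lists.items()
    }


# Hypothetical sample data
dsd_code_lists = {
    "geo": {"FR": "France", "DE": "Germany", "IT": "Italy"},
    "unit": {"EUR": "Euro", "PC": "Percentage"},
}
series = [
    {"dimensions": {"geo": "FR", "unit": "EUR"}},
    {"dimensions": {"geo": "DE", "unit": "EUR"}},
]
rebuilt = rebuild_code_lists(dsd_code_lists, series)
print(rebuilt)
```

In a real fetcher the set of used codes would more likely be accumulated while the series are streamed, so that the data file is read only once.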
Existing code in fetcher eurostat.py
- the datasets are provided in a *.zip file at http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data, see function make_url()
- the *.zip file contains a dsd.xml file and a sdmx.xml file
- the dimensions and other characteristics of a dataset are provided in the *.dsd.xml file and the data in the sdmx.xml file
- the code for processing these files is provided in dlstats\dlstats\xml_utils.py
- the function to process the dsd.xml file is provided by class XMLStructure()
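As an illustration of the archive layout described above, the sketch below separates the two members of a downloaded archive with the standard library zipfile module. It is not the existing fetcher code. The function name split_archive and the member names in the demo archive are hypothetical, and the only assumption about the real file names is that they end in dsd.xml and sdmx.xml.

```python
import io
import zipfile


def split_archive(zip_bytes):
    """Return (dsd_xml, sdmx_xml) as bytes from a dataset archive."""
    dsd = sdmx = None
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("dsd.xml"):
                dsd = archive.read(name)
            elif name.endswith("sdmx.xml"):
                sdmx = archive.read(name)
    if dsd is None or sdmx is None:
        raise ValueError("archive must contain a dsd.xml and a sdmx.xml file")
    return dsd, sdmx


# Build a small archive in memory to try the function (hypothetical names)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("demo.dsd.xml", "<structure/>")
    demo.writestr("demo.sdmx.xml", "<data/>")

dsd_xml, sdmx_xml = split_archive(buffer.getvalue())
print(len(dsd_xml), len(sdmx_xml))
```

For large datasets it may be preferable to pass archive.open(name) to the parser rather than reading each member fully into memory.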
Series
- Series key: provided
- Series name: (provided or to be made up from dimensions)
- Series docHref: yes/no
- Series notes: yes/no
- missing values: code for missing values or way to detect them
- date format:
- mixed frequencies in the same dataset: some datasets have mixed frequencies
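Since a dataset can mix frequencies, and the H and N codes listed in the Datasets section are not supported by DBnomics, one possible way to handle both points is sketched below. The series structure (dicts with "key" and "frequency"), the logger name, the function name, and the choice to skip unsupported series after logging are all assumptions made for the example. The existing fetcher may proceed differently.

```python
import logging
from collections import defaultdict

log = logging.getLogger("eurostat")  # hypothetical logger name

# Frequency codes listed in the Datasets section, minus H and N
SUPPORTED_FREQUENCIES = {"A", "S", "Q", "M", "W", "B", "D"}


def group_by_frequency(dataset_code, series_list):
    """Group series keys by frequency; log and skip unsupported ones."""
    groups = defaultdict(list)
    for series in series_list:
        frequency = series["frequency"]
        if frequency not in SUPPORTED_FREQUENCIES:
            log.error(
                "dataset %s: unsupported frequency %r in series %s",
                dataset_code, frequency, series["key"],
            )
            continue
        groups[frequency].append(series["key"])
    return dict(groups)


# Hypothetical sample series
sample_series = [
    {"key": "A.FR", "frequency": "A"},
    {"key": "Q.FR", "frequency": "Q"},
    {"key": "Q.DE", "frequency": "Q"},
    {"key": "H.FR", "frequency": "H"},
]
groups = group_by_frequency("demo_dataset", sample_series)
print(groups)
```

The error message carries the dataset code, so that a dataset using an unsupported frequency can be found from the logs.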
Existing code in fetcher eurostat.py
- see above
- the function to process the sdmx.xml file is provided by class XMLData()
Updates
Special problems
- dimension names may be in uppercase in the DSD but in lowercase in the SDMX series: we force them to lower case
- this may not be the case for attribute names, but we also force them to lower case for symmetry
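A minimal sketch of the lower-casing rule, assuming dimensions and attributes are held in dicts keyed by name. The helper name and the sample values are hypothetical. Only the names are lower-cased, the codes are left unchanged.

```python
def lowercase_names(mapping):
    """Return a copy of the mapping with lower-cased names (keys)."""
    return {name.lower(): value for name, value in mapping.items()}


# Hypothetical sample: names as found in the DSD and in one series
dsd_dimensions = {"GEO": ["FR", "DE"], "UNIT": ["EUR"]}
series_dimensions = {"geo": "FR", "unit": "EUR"}
series_attributes = {"OBS_STATUS": "p"}

dsd_dimensions = lowercase_names(dsd_dimensions)
series_dimensions = lowercase_names(series_dimensions)
series_attributes = lowercase_names(series_attributes)

# After normalisation the names of both sources can be compared directly
print(sorted(dsd_dimensions) == sorted(series_dimensions))
```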