EUROSTAT
========

## Provider

* **provider_name**: Eurostat
* **provider_longname**: Eurostat
* **provider URL**: http://ec.europa.eu/eurostat/home
* **region**: European Union
* **terms of use**: http://ec.europa.eu/eurostat/about/policies/copyright
* **approximate number of datasets**: > 6000

## Data accessibility

* **SDMX**: SDMX-ML 2.1
* **REST API**: http://ec.europa.eu/eurostat/data/web-services (SDMX and JSON). Check whether it can be used to retrieve only updated data.
* **Bulk Download**: SDMX-ML or TSV, at http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&dir=data
* **Account**: No

## Desired datasets

* **All datasets**

## Data tree and categories

* **Existence of a hierarchy of datasets on web site**: Yes
* **How to recover the information**: http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents.xml, with update information in the `lastUpdate` date.
    * The `lastModified` date corresponds to the last change in the table structure (probably the addition of a new observation?).

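The update dates can be read from `table_of_contents.xml` without loading the whole tree. Below is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse` (the fetcher itself uses `lxml`, whose `iterparse` works in a similar way). It assumes that each dataset is a `leaf` element with direct `code`, `lastUpdate` and `lastModified` children, and it ignores namespaces. `read_update_dates` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_update_dates(source):
    """Return {dataset_code: (lastUpdate, lastModified)} for every leaf.

    Assumption: dates are kept as the raw strings found in the file.
    """
    dates = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "leaf":
            continue
        # Collect the text of the leaf's direct children by local tag name.
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        if "code" in fields:
            dates[fields["code"]] = (fields.get("lastUpdate"), fields.get("lastModified"))
        elem.clear()  # free the leaf's children once they have been read
    return dates
```
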
### Existing code in fetcher eurostat.py

* parsing of `table_of_contents.xml` is done in function `build_data_tree()`
* it uses the `lxml` iterative parser so as not to build the entire XML tree in memory
* `categories` are created by the sub-function `create_category`. The dictionary `_category` needs to be modified to fit the new schema
* the iterative XML parser looks for the tag `leaf`, which identifies datasets. It then uses `xpath` to get the parent branches and the components of a dataset. This could be improved and accelerated by defining `branch` and `children` as events. This does not need to be done in the first iteration for this provider; if it is not done, open an issue to do it later.

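Tracking the open branches with `start`/`end` events, as suggested above, removes the need for `xpath` lookups of parents. Below is a minimal sketch with the standard library parser; `dataset_parents` is hypothetical and assumes that every `branch` and `leaf` element has a direct `code` child.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def dataset_parents(source):
    """Map each dataset code to the codes of its ancestor branches, outermost first."""
    open_tags = []   # local names of the currently open elements
    branches = []    # codes of the currently open <branch> elements
    leaf_code = None
    parents = {}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            open_tags.append(name)
            if name == "branch":
                branches.append(None)  # filled in when the branch's <code> ends
            continue
        open_tags.pop()  # this element is now closed
        if name == "code":
            owner = open_tags[-1] if open_tags else None
            text = (elem.text or "").strip()
            if owner == "branch":
                branches[-1] = text
            elif owner == "leaf":
                leaf_code = text
        elif name == "leaf":
            if leaf_code is not None:
                parents[leaf_code] = list(branches)
            leaf_code = None
            elem.clear()
        elif name == "branch":
            branches.pop()
            elem.clear()
    return parents
```

Each leaf gets its ancestry from the stack, so no element ever has to look back up the tree.
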
## Datasets

* **datasetCode**: provided
* **how to get release date**: in DSD
* **dataset docHref**: yes
* **dataset notes**: yes
* **dimension_list**: provided in DSD
* **use of attributes**: yes
* **attribute_list**: provided in DSD
* **available frequencies**:
    * A: Annual
    * S: Half-yearly, semester
    * Q: Quarterly
    * M: Monthly
    * W: Weekly
    * B: Business week
    * D: Daily
    * H: Hourly (DBnomics doesn't support this frequency. It may not be used by Eurostat. If it appears, log an error so that we can see the dataset.)
    * N: Minutely (DBnomics doesn't support this frequency either. It may not be used by Eurostat. If it appears, log an error so that we can see the dataset.)
* **availability of previous updates**: no
* **existence of real time datasets**: no
* **warning**: dimension and attribute code lists are more complete than the values actually used in a dataset. The code lists must be rebuilt while reading the series.

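Both rules above can be applied in a single pass over the series: unsupported frequencies are logged together with the dataset code, and the DSD code lists are reduced to the codes actually used. Below is a minimal sketch; the function names and the `{dimension: {code: label}}` shape are hypothetical, not the fetcher's API.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("eurostat")

# Frequency codes listed above; H and N are not supported by DBnomics.
FREQUENCIES = {"A": "Annual", "S": "Half-yearly, semester", "Q": "Quarterly",
               "M": "Monthly", "W": "Weekly", "B": "Business week", "D": "Daily"}
UNSUPPORTED = {"H": "Hourly", "N": "Minutely"}


def check_frequency(dataset_code, freq):
    """Return True if freq is usable; otherwise log an error naming the dataset."""
    if freq in FREQUENCIES:
        return True
    label = UNSUPPORTED.get(freq, "unknown")
    logger.error("dataset %s: unsupported frequency %s (%s)", dataset_code, freq, label)
    return False


def rebuild_codelists(dsd_codelists, series_dimensions):
    """Keep only the codes actually used by the series of a dataset.

    dsd_codelists: {dimension: {code: label}} as read from the DSD.
    series_dimensions: iterable of {dimension: code} dicts, one per series.
    """
    used = defaultdict(set)
    for dims in series_dimensions:
        for dim, code in dims.items():
            used[dim].add(code)
    return {dim: {code: label for code, label in codes.items() if code in used[dim]}
            for dim, codes in dsd_codelists.items()}
```
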
### Existing code in fetcher eurostat.py

* the datasets are provided as `*.zip` files at http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data, see function `make_url()`
* each `*.zip` file contains a `dsd.xml` file and a `sdmx.xml` file
* the dimensions and other characteristics of a dataset are provided in the `*.dsd.xml` file, and the data in the `sdmx.xml` file
* the code for processing these files is in `dlstats\dlstats\xml_utils.py`
* the `dsd.xml` file is processed by the class `XMLStructure()`

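Splitting a downloaded archive into its DSD and data parts only needs the standard `zipfile` module. Below is a minimal sketch; `bulk_url` is a hypothetical stand-in for `make_url()`, and the `file=data/<code>.sdmx.zip` pattern is an assumption, not read from the fetcher.

```python
import io
import zipfile

# Assumption: archive naming pattern of the bulk download listing.
BULK_URL = ("http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/"
            "BulkDownloadListing?sort=1&file=data/{code}.sdmx.zip")


def bulk_url(dataset_code):
    """Hypothetical equivalent of make_url() for one dataset."""
    return BULK_URL.format(code=dataset_code)


def split_archive(zip_bytes):
    """Return (dsd_xml, sdmx_xml) as bytes from a downloaded dataset archive."""
    dsd = sdmx = None
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("dsd.xml"):      # matches dsd.xml and <code>.dsd.xml
                dsd = archive.read(name)
            elif name.endswith("sdmx.xml"):   # matches sdmx.xml and <code>.sdmx.xml
                sdmx = archive.read(name)
    if dsd is None or sdmx is None:
        raise ValueError("archive must contain a dsd.xml and a sdmx.xml file")
    return dsd, sdmx
```

The two byte strings can then be handed to the DSD and data parsers.
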
## Series

* **Series key**: provided
* **Series name**: (provided or to be made up from dimensions)
* **Series docHref**: yes/no
* **Series notes**: yes/no
* **missing values**: code for missing values or way to detect them
* **date format**:
* **mixed frequencies in the same dataset**: some datasets have mixed frequencies

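For datasets with mixed frequencies, series can be split into one group per frequency before they are stored. Below is a minimal sketch, assuming dot-separated series keys with the frequency code in the first position; the key layout and `group_by_frequency` are assumptions.

```python
from collections import defaultdict


def group_by_frequency(series_keys, freq_position=0, sep="."):
    """Group the series keys of a mixed-frequency dataset by frequency code.

    Assumption: keys look like "A.B1GQ.FR", with the frequency dimension
    at index freq_position.
    """
    groups = defaultdict(list)
    for key in series_keys:
        groups[key.split(sep)[freq_position]].append(key)
    return dict(groups)
```
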
### Existing code in fetcher eurostat.py

* see above
* the `sdmx.xml` file is processed by the class `XMLData()`

## Updates

* **calendar of future updates**: no
* **summary of previous updates**: only the latest one, in http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents.xml
* **regular updates**: the site is updated every day at 11:00 and 23:00
* **RSS flow**: http://ec.europa.eu/eurostat/cache/RSS/rss_estat_news.xml
* **best way to monitor updates**: read http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=table_of_contents.xml at 11:05 and 23:05, every day

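This monitoring schedule comes down to two steps: compute the next poll time, then compare the `lastUpdate` values between two reads of `table_of_contents.xml`. Below is a minimal sketch; the function names are hypothetical. The time zone of the schedule is not stated here, so `now` is assumed to be a naive datetime in that same local time.

```python
from datetime import datetime, time, timedelta

# Poll times from the section above: 5 minutes after each daily update.
POLL_TIMES = (time(11, 5), time(23, 5))


def next_poll(now):
    """Return the next datetime at which table_of_contents.xml should be read."""
    for t in POLL_TIMES:
        candidate = datetime.combine(now.date(), t)
        if candidate > now:
            return candidate
    # Both polls of today are past: first poll of tomorrow.
    return datetime.combine(now.date() + timedelta(days=1), POLL_TIMES[0])


def changed_datasets(previous, current):
    """Codes of datasets that are new or whose lastUpdate changed.

    previous / current: {dataset_code: lastUpdate} from two successive reads.
    """
    return sorted(code for code, last_update in current.items()
                  if previous.get(code) != last_update)
```
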
## Special problems

* dimension names may be in uppercase in the DSD but in lowercase in the SDMX series: we force them to lower case
* this may not be the case for attribute names, but we also force them to lower case for symmetry

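Forcing names to lower case on both sides lets the dimensions of a series be matched against the DSD whatever their case. Below is a minimal sketch; `lower_keys` and `match_series_to_dsd` are hypothetical helpers.

```python
def lower_keys(mapping):
    """Return a copy of mapping with lower-cased keys (dimension or attribute names)."""
    return {name.lower(): value for name, value in mapping.items()}


def match_series_to_dsd(dsd_dimensions, series_dimensions):
    """Align a series' dimensions with the DSD after forcing both to lower case.

    dsd_dimensions: dimension names in DSD order, e.g. ["FREQ", "GEO"].
    series_dimensions: {name: code} as read from one SDMX series.
    Raises KeyError if a DSD dimension is missing from the series.
    """
    series = lower_keys(series_dimensions)
    return [(name.lower(), series[name.lower()]) for name in dsd_dimensions]
```
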