World_bank
Provider
- provider_name: World Bank
- provider_longname: World Bank
- provider URL: http://www.worldbank.org/
- number of datasets: 237 (https://data.worldbank.org/data-catalog, not all relevant for DBnomics)
Data accessibility
- SDMX: For some databases
- REST API: yes, JSON and XML formats, http://data.worldbank.org/developers/api-overview
- Bulk Download: yes, zipped Excel or CSV files
- Account required: No
Remarks by @cbenz:
The API delivers more structured data than the bulk download files, which suggests that the API is the source of truth. The API also delivers JSON, which is easier to parse than CSV. So I'd prefer the API by default. Before starting to download a dataset, the developer should check that the data are the same in the bulk files and in the API. If the bulk data quality turns out to be better, we'll revisit this choice.
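The Bulk vs API check could start from a comparison like the one below. This is a minimal sketch, assuming both sources have already been loaded into dicts keyed by a hypothetical (series_key, period) tuple, with None for missing values; compare_sources and the default tolerance are illustrative and not part of any existing code.

```python
import math


def compare_sources(api, bulk, rel_tol=1e-9):
    """Return sorted (key, api_value, bulk_value) triples where the two sources disagree.

    Both inputs map a hypothetical (series_key, period) tuple to a float,
    or to None for a missing value.
    """
    mismatches = []
    for key in sorted(set(api) | set(bulk)):
        a, b = api.get(key), bulk.get(key)
        if a is None and b is None:
            continue  # missing in both sources: treated as agreement
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches.append((key, a, b))
    return mismatches
```

A relative tolerance avoids flagging differences that only come from rounding in one of the two formats; an empty result means the two sources agree on every observation.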
If JSON files are too big to fit in RAM, consider downloading XML instead and parsing it incrementally with the lxml iterparse method.
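A minimal sketch of incremental XML parsing. It assumes that the API's XML output wraps each observation in a wb:data element one level below a root wb:data element, with wb:country (carrying an id attribute), wb:date and wb:value children, in the namespace http://www.worldbank.org; these are assumptions to verify against a real response. iter_observations is a hypothetical helper. It falls back to the standard library's xml.etree.ElementTree, whose iterparse accepts the same source and events arguments.

```python
import io

try:
    from lxml import etree  # preferred parser, as suggested above
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same iterparse interface

WB_NS = "{http://www.worldbank.org}"  # assumed namespace of the API's XML output


def iter_observations(source):
    """Yield (country_id, date, value) tuples, one observation at a time."""
    depth = 0
    for event, elem in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # Assumed layout: observation records are <wb:data> elements directly under the root <wb:data>.
        if elem.tag == WB_NS + "data" and depth == 1:
            country = elem.find(WB_NS + "country")
            value = elem.findtext(WB_NS + "value")
            yield (
                country.get("id") if country is not None else None,
                elem.findtext(WB_NS + "date"),
                float(value) if value else None,  # empty element means a missing value
            )
            elem.clear()  # release the memory held by the processed record
```

Passing a file object opened in binary mode (or a filename) as source keeps only one record in memory at a time instead of the whole document.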
Desired datasets in existing code
- DB: API + Bulk download (http://databank.worldbank.org/data/download/DB_csv.zip)
- GEM: API + Bulk download Excel files (http://databank.worldbank.org/data/download/GemDataEXTR.zip)
- GMC: API + Bulk download (contained in GemDataEXTR.zip)
- WDI: API + Bulk download (http://databank.worldbank.org/data/download/WDI_csv.zip)
- WGI: API + Bulk download (http://databank.worldbank.org/data/download/WGI_csv.zip)
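The bulk archives listed above could be read without unpacking them to disk, as in this minimal sketch. read_zipped_csvs is a hypothetical helper; it assumes the archives are ordinary zip files whose CSV members are UTF-8 encoded (possibly with a byte-order mark), and it filters members by extension only because the member names are not documented here.

```python
import csv
import io
import zipfile


def read_zipped_csvs(zip_bytes):
    """Return {member_name: list_of_row_dicts} for every CSV file in a bulk-download archive."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members (readme files, etc.)
            # utf-8-sig strips a leading byte-order mark if present (encoding is an assumption)
            with io.TextIOWrapper(archive.open(name), encoding="utf-8-sig", newline="") as text:
                tables[name] = list(csv.DictReader(text))
    return tables
```

The archive bytes could come from urllib.request.urlopen("http://databank.worldbank.org/data/download/WDI_csv.zip").read(). Loading every row into lists is fine for a first comparison with the API but would need streaming for the largest archives.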
Desired additional datasets
- JEDH: API (+ query tool) (see below)
- QPSD: API (+ query tool)
- (QEDS)SDDS: API + Bulk download (http://databank.worldbank.org/data/download/SDDS_csv.zip)
- (QEDS)GDDS: API + Bulk download (http://databank.worldbank.org/data/download/GDDS_csv.zip)
Data tree
- Existence of a hierarchy of datasets on web site: Yes, only one level
- How to recover the information: from https://data.worldbank.org/, looking only at our selected datasets
Datasets
- datasetCode: see above
- how to get release date: from dataset HTML page
- dataset docHref: from dataset HTML page
- dataset notes: No
- dimension_list: to be built from the dimensions of each series in the CSV files
- use of attributes: No
- attribute_list: No
- available frequencies: Annual, Quarterly, Monthly, Daily
- availability of previous updates: No
- existence of real time datasets: No
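Building dimension_list from the CSV files could look like this minimal sketch. It assumes the bulk CSV has "Country Code"/"Country Name" and "Indicator Code"/"Indicator Name" column pairs; those column names, the DIMENSION_COLUMNS mapping and build_dimension_list are hypothetical and should be checked against the actual files.

```python
import csv

# Hypothetical mapping: dimension code -> (code column, label column) in the bulk CSV.
DIMENSION_COLUMNS = {
    "country": ("Country Code", "Country Name"),
    "indicator": ("Indicator Code", "Indicator Name"),
}


def build_dimension_list(lines):
    """Collect {dimension: {code: label}} from CSV lines (a file object or any iterable of strings)."""
    dimensions = {dim: {} for dim in DIMENSION_COLUMNS}
    for row in csv.DictReader(lines):
        for dim, (code_col, label_col) in DIMENSION_COLUMNS.items():
            # keep the first label seen for each code
            dimensions[dim].setdefault(row[code_col], row[label_col])
    return dimensions
```

Each series row contributes its codes and labels, so the resulting lists contain exactly the dimension values that occur in the file.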
Series
- Series key: to be made up, using dimension codes
- Series name: to be made up using dimension labels (called "name" in the CSV files)
- Series docHref: No
- Series notes: No
- missing values: empty cell in CSV file
- date format: "%a, %d %b %Y %H:%M:%S GMT"
- mixed frequencies in the same dataset: yes
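The series rules above can be sketched as small helpers. make_series_key, make_series_name, parse_value and parse_date are hypothetical names; the "-" key separator is an assumption chosen because indicator codes such as NY.GDP.MKTP.CD already contain dots, and the frequency code "A" used in the usage example is illustrative, reflecting that frequencies are mixed within a dataset.

```python
from datetime import datetime, timezone

DATE_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"  # format noted above


def make_series_key(codes, separator="-"):
    """Join dimension codes, in a fixed order, into a series key.

    "-" is an assumed separator: "." would clash with dots inside indicator codes.
    """
    return separator.join(codes)


def make_series_name(labels, separator=" - "):
    """Join dimension labels, in the same order as the codes, into a series name."""
    return separator.join(labels)


def parse_value(cell):
    """An empty CSV cell means a missing value."""
    cell = cell.strip()
    return float(cell) if cell else None


def parse_date(text):
    """Parse a date in DATE_FORMAT as an aware UTC datetime.

    %a and %b assume an English (C) locale; "GMT" is matched literally, so UTC is attached explicitly.
    """
    return datetime.strptime(text, DATE_FORMAT).replace(tzinfo=timezone.utc)
```

For example, make_series_key(["A", "NY.GDP.MKTP.CD", "FRA"]) gives "A-NY.GDP.MKTP.CD-FRA".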
Updates
- calendar of future updates: No
- summary of previous updates: No
- regular updates: ?
- RSS flow: No
- best way to monitor updates: read HTML page of each dataset once a day
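The daily check of each dataset's HTML page could be as simple as comparing a stored digest of the page, as in this minimal sketch. page_digest and has_changed are hypothetical helpers; hashing the whole page will also flag purely cosmetic changes, so extracting the release date from the page would be more precise, but the page structure is not documented here.

```python
import hashlib


def page_digest(html):
    """SHA-256 hex digest of a dataset HTML page."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def has_changed(html, previous_digest):
    """Return (changed, new_digest); the first run (no stored digest) counts as a change."""
    digest = page_digest(html)
    return digest != previous_digest, digest
```

The returned digest would be stored after each run and passed back in the next day; a change then triggers a download of the dataset.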
Special problems
Other remarks
Data samples
- location:
- description:
Web API URLs
- all datasets: http://api.worldbank.org/v2/datacatalog?format=json&per_page=999
- JEDH dataset: http://api.worldbank.org/v2/datacatalog/15?format=json&per_page=32700
- indicators of JEDH dataset: http://api.worldbank.org/v2/sources/54/indicators?format=json&per_page=32700 (54 is obtained from the apisourceid property in the response of the previous URL)
- values of one indicator for all countries: http://api.worldbank.org/v2/countries/all/indicators/Q.8A0.5B0.C.5A.ALL.IECE.1.ALL.MX.TO1.ALL?format=json&per_page=32700
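Walking these URLs page by page could look like this minimal sketch. It assumes that indicator responses are a two-element JSON list (pagination metadata with a "pages" field, then a list of records, possibly null), that the API accepts a page query parameter alongside per_page, and that a datacatalog entry lists its properties as a "metatype" array of id/value pairs; indicator_url, split_page, fetch_all and api_source_id are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://api.worldbank.org/v2"


def indicator_url(indicator, page=1, per_page=32700):
    """URL for one page of an indicator's values for all countries (pattern from the URLs above)."""
    query = urllib.parse.urlencode({"format": "json", "per_page": per_page, "page": page})
    return f"{API_BASE}/countries/all/indicators/{urllib.parse.quote(indicator)}?{query}"


def split_page(payload):
    """Split a decoded response into (total_pages, records).

    Assumes the body is [pagination metadata, list of records]; a null record list becomes [].
    """
    meta, records = payload[0], payload[1] or []
    return int(meta["pages"]), records


def fetch_all(indicator, per_page=32700):
    """Download every page of an indicator and concatenate the records (performs network calls)."""
    records, page, pages = [], 1, 1
    while page <= pages:
        with urllib.request.urlopen(indicator_url(indicator, page, per_page)) as resp:
            pages, batch = split_page(json.load(resp))
        records.extend(batch)
        page += 1
    return records


def api_source_id(catalog_entry):
    """Read the apisourceid value from a datacatalog entry (assumed "metatype" list of id/value pairs)."""
    for item in catalog_entry.get("metatype", []):
        if item.get("id") == "apisourceid":
            return item.get("value")
    return None
```

With a large per_page, as in the URLs above, most indicators fit in a single page; the loop only matters when "pages" is greater than 1.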