Handle dataset releases
Related to #718 (closed)
- As an economist
- I want to access each release (past and latest) of a particular dataset, when it is published with releases
- in order to do reproducible data processing.
Acceptance criteria
- the fetcher authors MUST be able to declare the releases of a dataset in a meta-data file (releases.json)
- the web API MUST accept a :latest suffix for the "dataset" endpoint, and redirect to the latest release
Description
This issue describes an evolution of the DBnomics conceptual data model, introducing the new concept of a dataset release.
When a provider distributes a dataset as named releases and lets users download previous releases, DBnomics can integrate those releases and offer them to its users.
Each dataset can have many releases. Each release is just a normal dataset named after the pattern {dataset_code_prefix}:{release_name} (e.g. WEO:2020-04). As a consequence, all the existing components of DBnomics continue to work without needing any changes.
An actual release name can be any string (2020-01, 1.1, before_trump). The special release name latest refers to the latest known release.
To encode the relationship between a dataset and its releases, a new releases.json file is introduced:
// releases.json
{
  "dataset_releases": [
    {
      "dataset_code_prefix": "WEO",
      "name": "WEO by countries", // optional
      "releases": [
        {"code": "2020-04"},
        {"code": "2020-10"} // latest release at this time
      ]
    }
  ]
}
The latest release corresponds to the last item of the releases array. If that array evolves, the latest release will always correspond to its last item.
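The "last item wins" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name find_latest_release_code is hypothetical and not part of any DBnomics package, and the sample data is the example above with the comments removed, because standard JSON has no comment syntax.

```python
import json

# Same shape as the releases.json example above, without the comments.
RELEASES_JSON = """
{
  "dataset_releases": [
    {
      "dataset_code_prefix": "WEO",
      "name": "WEO by countries",
      "releases": [{"code": "2020-04"}, {"code": "2020-10"}]
    }
  ]
}
"""


def find_latest_release_code(releases_data, dataset_code_prefix):
    """Return the code of the last declared release, or None if unknown.

    Hypothetical helper: the latest release is simply the last item
    of the "releases" array for the given prefix.
    """
    for item in releases_data["dataset_releases"]:
        if item["dataset_code_prefix"] == dataset_code_prefix:
            releases = item["releases"]
            return releases[-1]["code"] if releases else None
    return None


releases_data = json.loads(RELEASES_JSON)
print(find_latest_release_code(releases_data, "WEO"))
```

Relying on array order keeps the file simple, but it means fetcher authors must always append new releases at the end of the array.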
When the user asks for {dataset_code_prefix}:latest:
- in the API: the HTTP request is redirected (HTTP 302) to the actual latest release name
- in the UI: the HTTP request is redirected (HTTP 302) to the actual latest release name
- from language modules (Python, R, etc.): because the fetch function calls the API, it just has to follow the redirection.
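The detection step shared by the API and the UI could look like the sketch below. It is not the actual dbnomics-api code: resolve_latest and the latest_by_prefix mapping are hypothetical names, and splitting on the last ":" is an assumption, since the behaviour for provider dataset codes that already contain a ":" is still an open question in this issue.

```python
LATEST = "latest"


def resolve_latest(dataset_code, latest_by_prefix):
    """Return the dataset code to redirect to, or None if no redirect applies.

    latest_by_prefix is a hypothetical mapping from dataset_code_prefix
    to the code of its latest release, e.g. {"WEO": "2020-10"}.
    """
    # Assumption: the release name is whatever follows the last ":".
    prefix, separator, release_name = dataset_code.rpartition(":")
    if not separator or release_name != LATEST:
        return None  # not a ":latest" request, serve it as usual
    latest_release = latest_by_prefix.get(prefix)
    if latest_release is None:
        return None  # no releases declared for this prefix
    return f"{prefix}:{latest_release}"


target = resolve_latest("WEO:latest", {"WEO": "2020-10"})
print(target)
```

When a target is returned, the caller would answer with a temporary redirect (HTTP 302) to it, never a permanent one, because the target changes each time a new release is declared.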
An HTTP redirection is a good way to let users understand that the latest release depends on the current time, and to encourage them to use an actual release name. However, in language modules, users will not see the redirection and have to accept the risk of using the latest release. They must choose between always having the latest data and potentially breaking their source code, in particular if it is executed automatically, for example every day.
To simplify fetcher development, when the category tree is just a flat list of datasets, it is conceptually possible to generate it from the releases meta-data: each dataset_code_prefix would become a category with one node per release. This will be possible with dbnomics-fetcher-toolbox (cf #622). Meanwhile, fetcher authors can write category_tree.json manually.
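As a rough illustration of that idea, the sketch below derives one category per dataset_code_prefix with one child node per release. The node shape (code, name, children keys) is an assumption about category_tree.json and is not specified in this issue, and build_category_tree is a hypothetical function, not the dbnomics-fetcher-toolbox API.

```python
def build_category_tree(releases_data):
    """Build a flat category tree from releases meta-data (hypothetical helper).

    Assumption: a category node is a dict with "code", optional "name"
    and "children"; a dataset node is a dict with a "code".
    """
    tree = []
    for item in releases_data["dataset_releases"]:
        prefix = item["dataset_code_prefix"]
        category = {
            "code": prefix,
            "children": [
                {"code": f"{prefix}:{release['code']}"}
                for release in item["releases"]
            ],
        }
        if "name" in item:  # "name" is optional in releases.json
            category["name"] = item["name"]
        tree.append(category)
    return tree


sample = {
    "dataset_releases": [
        {
            "dataset_code_prefix": "WEO",
            "name": "WEO by countries",
            "releases": [{"code": "2020-04"}, {"code": "2020-10"}],
        }
    ]
}
category_tree = build_category_tree(sample)
print(category_tree)
```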
Details
- We preferred introducing a new file named releases.json, instead of adding a new property to provider.json, to avoid data model changes.
- Data validation: each release declared in releases.json MUST correspond to a dataset code such as {dataset_code_prefix}:{release_name}
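The data validation rule can be sketched as a simple membership check. This is not the dbnomics-data-model implementation: find_missing_release_datasets is a hypothetical name, and the set of existing dataset codes is assumed to have been collected beforehand, for example from the json-data directory.

```python
def find_missing_release_datasets(releases_data, dataset_codes):
    """Return the dataset codes implied by releases.json that do not exist.

    dataset_codes is assumed to be the set of dataset codes actually
    present for the provider.
    """
    missing = []
    for item in releases_data["dataset_releases"]:
        prefix = item["dataset_code_prefix"]
        for release in item["releases"]:
            expected_code = f"{prefix}:{release['code']}"
            if expected_code not in dataset_codes:
                missing.append(expected_code)
    return missing


declared = {
    "dataset_releases": [
        {
            "dataset_code_prefix": "WEO",
            "releases": [{"code": "2020-04"}, {"code": "2020-10"}],
        }
    ]
}
# Only the 2020-04 release exists here, so 2020-10 is reported.
errors = find_missing_release_datasets(declared, {"WEO:2020-04"})
print(errors)
```

An empty result means every declared release matches an existing dataset; any other result lists the codes a validator would report as errors.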
Tasks
- dbnomics-data-model: add a schema model for releases.json (cf dbnomics/dbnomics-data-model!44 (merged))
- dbnomics-data-model: validate releases.json if it exists in json-data directory
- dbnomics-api: detect when asking {dataset_code_prefix}:latest in each endpoint accepting a dataset code, and redirect to the latest release (HTTP 302 temporary, very important, do not use a permanent redirect) (cf dbnomics/dbnomics-api!9 (merged))
- dbnomics-website: detect when asking {dataset_code_prefix}:latest in each route accepting a dataset code, and redirect to the latest release (HTTP 302 temporary, very important, do not use a permanent redirect) (cf dbnomics/dbnomics-website!7 (merged))
- dbnomics-docs: document feature (cf dbnomics/dbnomics-docs!1 (merged))
- declare WEO and WEOAGG releases in IMF fetcher (cf #718 (closed) and imf-fetcher!1 (merged))
Questions
- What happens if a dataset code has a : as published by the provider?
- Is Solr impacted by this issue? I don't think so for now...