Introduce discontinued datasets, check category tree
A recent update (cf. dbnomics/dbnomics-data-model@e4777951) to the validation_storage.py script in dbnomics-data-model checks the consistency between the datasets referenced in category_tree.json and the datasets published by the provider.
This check was added after a DBnomics user reported a 404 Not Found on a dataset link in a BLS category tree (#420 (closed)).
Now, the validation script checks that:
- every dataset found in the category tree has to exist in json-data
- every dataset published by the provider has to be found in category_tree
The first part of the check is really helpful and can reveal problems in the convert process.
The second part isn't so straightforward: it can also reveal problems in the convert phase, but it will fail on discontinued datasets.
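The two checks above can be sketched as a pair of set differences. This is only an illustration, not the code of validation_storage.py: the function names are hypothetical, and it assumes that the category tree is a list of nested nodes where a node with a "children" key is a category and any other node with a "code" key is a dataset.

```python
def iter_dataset_codes(nodes):
    """Yield the dataset codes referenced in a category tree.

    Assumption: a node with a "children" key is a category,
    any other node with a "code" key is a dataset.
    """
    for node in nodes:
        if "children" in node:
            yield from iter_dataset_codes(node["children"])
        elif "code" in node:
            yield node["code"]


def check_category_tree(category_tree, published_codes):
    """Return (missing_from_storage, missing_from_tree) as sorted lists."""
    tree_codes = set(iter_dataset_codes(category_tree))
    published = set(published_codes)
    # 1st check: datasets referenced in the tree but absent from json-data.
    missing_from_storage = sorted(tree_codes - published)
    # 2nd check: datasets present in json-data but absent from the tree.
    missing_from_tree = sorted(published - tree_codes)
    return missing_from_storage, missing_from_tree


example_tree = [
    {"name": "Prices", "children": [{"code": "CPI"}, {"code": "PPI"}]},
    {"code": "EMP"},
]
missing_from_storage, missing_from_tree = check_category_tree(
    example_tree, ["CPI", "EMP", "OLD"]
)
```

In this made-up example, "PPI" would trigger the first check (a dead link in the tree), while "OLD" would trigger the second one, which is exactly what happens with a discontinued dataset.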
Discontinued datasets are datasets which were once published by the provider but are no longer available on the provider's website. They are kept in the json-data repository but are no longer generated from source-data. Such datasets are no longer referenced in the category tree, but they remain directly available through their page at https://db.nomics.world/{provider}/{dataset}.
How can we flag discontinued datasets and avoid validation errors on the category tree?
Proposal
- flag discontinued datasets
  - => declare new `discontinued` attribute in data-model dataset.json schema
  - => manually update dataset.json of known discontinued datasets
- in validation_storage.py, don't fail if a discontinued dataset doesn't appear in the provider category tree
- bonus:
  - provider page: automatically add links to discontinued datasets under the category tree
  - dataset page: display "Discontinued since YY-mm-dd" for discontinued datasets
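A minimal sketch of how the validation could skip discontinued datasets, under stated assumptions: the function name is hypothetical, the content of each dataset.json is assumed to be already loaded into a dict keyed by dataset code, and the proposed "discontinued" attribute is treated as set whenever it holds a truthy value (a boolean, or a date string if the "since" date ends up being stored there). The actual type of the attribute is still to be decided in the schema.

```python
def find_unlisted_datasets(tree_codes, datasets):
    """Return the codes of published datasets missing from the category tree.

    tree_codes: set of dataset codes referenced in category_tree.json
    datasets: mapping of dataset code -> content of its dataset.json (dict)

    Datasets carrying a truthy "discontinued" attribute (hypothetical name
    taken from the proposal) are not reported.
    """
    return sorted(
        code
        for code, metadata in datasets.items()
        if code not in tree_codes and not metadata.get("discontinued")
    )


example_datasets = {
    "CPI": {"code": "CPI"},
    "OLD": {"code": "OLD", "discontinued": True},
    "NEW": {"code": "NEW"},
}
unlisted = find_unlisted_datasets({"CPI"}, example_datasets)
```

Here only "NEW" would still be reported: "OLD" is missing from the tree too, but it is flagged as discontinued, so the validation would not fail on it.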