Document and validate JSON data model
Related to #104 (closed)
- As a fetcher developer
- I want to validate the JSON files I produce and read up-to-date documentation of the JSON data model
- in order to be confident about the correctness of the files my fetcher produces.
Acceptance criteria
- provider.json MUST be defined as a JSON schema file, based on the current implementation of validate_provider
- category.json MUST be defined as a JSON schema file, based on the current implementation of validate_category
- dataset.json MUST be defined as a JSON schema file, based on the current implementation of validate_dataset
- series.json MUST be defined as a JSON schema file, based on the current implementation of validate_series
- JSON schemas SHOULD be committed in the dbnomics-converters repository, under a schemas sub-directory
- MongoDB compatibility DOES NOT HAVE to be kept
- Each validate_FOO function MUST be changed to call a Python implementation of JSON schema against FOO.json
- For now, contextual constraints DO NOT HAVE to be implemented
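As an illustration of the validate_FOO criterion above, here is a minimal sketch of what validate_dataset could look like once it delegates to a JSON schema implementation. It assumes the third-party jsonschema package is the chosen Python implementation, which this issue does not specify. The schema is a hypothetical, heavily simplified stand-in for schemas/dataset.json: only the "code" property is mentioned in this issue, and "name" is an assumption.

```python
import jsonschema

# Hypothetical, simplified stand-in for schemas/dataset.json.
# Only "code" comes from this issue; "name" is an assumed property.
DATASET_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "code": {"type": "string", "minLength": 1},
        "name": {"type": "string"},
    },
    "required": ["code"],
}


def validate_dataset(dataset_json):
    """Raise jsonschema.ValidationError if dataset_json violates the schema."""
    jsonschema.validate(instance=dataset_json, schema=DATASET_SCHEMA)


# A valid document passes silently.
validate_dataset({"code": "GDP", "name": "Gross domestic product"})

# An invalid document raises, and the message names the violated rule.
try:
    validate_dataset({"name": "dataset without a code"})
except jsonschema.ValidationError as exc:
    print("invalid dataset.json:", exc.message)
```

In the real implementation the schema would be loaded from the schemas sub-directory rather than defined inline.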
Resources
Contextual constraints
JSON schema only allows defining context-free constraints: each property is validated independently of the others, and each file independently of the other files. Contextual constraints span several files and cannot be expressed this way. For example, we need to validate the uniqueness of dataset["code"] among all the datasets of the provider, not only those of the parent category of the dataset. This can be implemented by each fetcher in its own code, for example by maintaining a set of already-used dataset codes.
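A minimal sketch of such a fetcher-side check, using a set of already-used dataset codes. The function name find_duplicate_dataset_codes and the sample data are hypothetical; only the dataset["code"] key comes from this issue.

```python
def find_duplicate_dataset_codes(datasets):
    """Return the dataset codes that occur more than once, in order of detection.

    `datasets` is an iterable of dataset dicts gathered from all the
    categories of one provider.
    """
    seen_codes = set()  # codes already used by this provider
    duplicates = []
    for dataset in datasets:
        code = dataset["code"]
        if code in seen_codes:
            duplicates.append(code)
        else:
            seen_codes.add(code)
    return duplicates


# Datasets coming from two different categories of the same provider.
category_a = [{"code": "GDP"}, {"code": "CPI"}]
category_b = [{"code": "GDP"}, {"code": "UNEMP"}]

print(find_duplicate_dataset_codes(category_a + category_b))  # ['GDP']
```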
MongoDB BSON support
validate_* functions have a format parameter which is either json or bson (MongoDB binary JSON format).
Here we drop the bson support, so the parameter is dropped as well.
Documentation generation
As a benefit, Markdown documentation generation will be possible, with projects like https://github.com/AnalyticalGraphicsInc/wetzel (and several others).
This is not included in the scope of this story.
Why switch to JSON schema
- standard
- descriptive, like Swagger for APIs
- allows documentation generation
- easy to understand (newcomers have difficulties with the current validation code)
- validate functions should only validate, not convert
Technical steps
- Adapt validate_FOO functions to be get_FOO_validation_errors
- Drop support for MongoDB in dbnomics-importer
- Update fetchers to use the new validation functions, to validate the concept
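The first step could be sketched as follows: instead of raising on the first problem, get_dataset_validation_errors collects every schema violation and returns them as readable strings. This assumes the third-party jsonschema package; the helper get_validation_errors, the inline schema, and the "name" property are hypothetical and only serve the illustration.

```python
import jsonschema

# Hypothetical, simplified stand-in for schemas/dataset.json.
DATASET_SCHEMA = {
    "type": "object",
    "properties": {
        "code": {"type": "string"},
        "name": {"type": "string"},
    },
    "required": ["code", "name"],
}


def get_validation_errors(instance, schema):
    """Return one "path: message" string per schema violation (empty if valid)."""
    validator = jsonschema.Draft7Validator(schema)
    # Sort on the stringified path so that the output order is stable.
    errors = sorted(
        validator.iter_errors(instance),
        key=lambda error: [str(part) for part in error.absolute_path],
    )
    return [
        "{}: {}".format(
            "/".join(str(part) for part in error.absolute_path) or "<root>",
            error.message,
        )
        for error in errors
    ]


def get_dataset_validation_errors(dataset_json):
    return get_validation_errors(dataset_json, DATASET_SCHEMA)


# Two violations: "name" is missing and "code" is not a string.
for line in get_dataset_validation_errors({"code": 42}):
    print(line)
```

Returning a list lets a fetcher log all the problems of a file in one pass and decide for itself whether to abort.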