Document and validate JSON data model

Related to #104 (closed)

  • As a fetcher developer
  • I want to validate the JSON files I produce and read up-to-date documentation of the JSON data model
  • in order to be confident about the correctness of the files my fetcher produces.

Acceptance criteria

  • provider.json MUST be defined as a JSON schema file, based on the current implementation of validate_provider
  • category.json MUST be defined as a JSON schema file, based on the current implementation of validate_category
  • dataset.json MUST be defined as a JSON schema file, based on the current implementation of validate_dataset
  • series.json MUST be defined as a JSON schema file, based on the current implementation of validate_series
  • JSON schemas SHOULD be committed in the dbnomics-converters repository, under a schemas sub-directory
  • MongoDB compatibility DOES NOT HAVE to be kept
  • Each validate_FOO function MUST be changed to validate its input against FOO.json using a Python implementation of JSON schema
  • For now, contextual constraints DO NOT HAVE to be implemented

Resources

Contextual constraints

JSON schema can only express context-free constraints: each property is validated independently of the others, and each file independently of the other files. Contextual constraints fall outside its scope. For example, we need to validate the uniqueness of dataset["code"] among all the datasets of the provider, not only those of the parent category of the dataset. Each fetcher can implement this in its own code, for example by maintaining a set of already-used dataset codes.
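A minimal sketch of such a contextual check, assuming each dataset is a dict with a "code" key as in dataset.json. The function name find_duplicate_dataset_codes is hypothetical and not part of dbnomics-converters; a fetcher could run something like it over all the datasets of a provider:

```python
def find_duplicate_dataset_codes(datasets):
    """Return the dataset codes that occur more than once.

    Hypothetical helper: `datasets` is any iterable of dicts having a "code" key.
    Codes are returned in the order their first duplicate is encountered.
    """
    seen = set()        # codes already used by a previous dataset
    duplicates = []     # codes found at least twice
    for dataset in datasets:
        code = dataset["code"]
        if code in seen and code not in duplicates:
            duplicates.append(code)
        seen.add(code)
    return duplicates
```

The set has to span the whole provider, not a single category, which is why this check cannot live in the per-file schema.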

MongoDB BSON support

validate_* functions have a format parameter which is either json or bson (MongoDB binary JSON format).

This story drops BSON support, so the format parameter is dropped as well.

Documentation generation

As a side benefit, Markdown documentation can be generated from the schemas with projects like https://github.com/AnalyticalGraphicsInc/wetzel (and several others).

This is not included in the scope of this story.

Why switch to JSON schema

  • standard
  • descriptive, like Swagger for APIs
    • allows documentation generation
  • easy to understand (newcomers have difficulties with the current validation code)
  • validate functions should only validate, not convert

Technical steps

  • Turn the validate_FOO functions into get_FOO_validation_errors functions
  • Drop support for MongoDB in dbnomics-importer
  • Update fetchers to use the new validation functions, as a proof of concept
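
As an illustration of the first step, here is a minimal sketch of what get_provider_validation_errors could look like, assuming the third-party jsonschema package is chosen as the Python implementation. The schema below is hypothetical and inlined only to keep the example self-contained; the real provider.json would be derived from the current validate_provider and loaded from the schemas sub-directory.

```python
import jsonschema  # third-party package, assumed here: pip install jsonschema

# Hypothetical schema, for illustration only: the actual properties and
# required fields must come from the current validate_provider implementation.
PROVIDER_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "code": {"type": "string", "minLength": 1},
        "name": {"type": "string"},
    },
    "required": ["code"],
}


def get_provider_validation_errors(provider, schema=PROVIDER_SCHEMA):
    """Return a list of error messages; the list is empty if `provider` is valid.

    The function only validates: it neither converts nor modifies `provider`.
    """
    validator = jsonschema.Draft4Validator(schema)
    messages = []
    for error in validator.iter_errors(provider):
        # error.path locates the offending property inside the document
        location = "/".join(str(part) for part in error.path) or "<root>"
        messages.append("{}: {}".format(location, error.message))
    return messages
```

Returning every error instead of raising on the first one lets a fetcher developer fix a file in a single pass, for example by printing each message of get_provider_validation_errors({"name": 3}).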
Edited Nov 07, 2017 by Christophe Benz