All folders inside source-data and json-data **MUST** follow the naming convention:

- `<provider_slug>-json-data`
- `<provider_slug>-fetcher`

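As an illustration, the two naming patterns listed above can be derived from a provider slug with a small helper. This is a hypothetical sketch: `SUFFIXES`, `expected_repo_names` and `follows_convention` are not part of any DB.nomics package, and only the two suffixes shown in this list are covered.

```python
from __future__ import annotations

from pathlib import Path

# Suffixes of the two naming patterns listed above (hypothetical constant).
SUFFIXES = ("json-data", "fetcher")


def expected_repo_names(provider_slug: str) -> dict[str, str]:
    """Map each suffix to the folder name it implies for this provider."""
    return {suffix: f"{provider_slug}-{suffix}" for suffix in SUFFIXES}


def follows_convention(folder: str, provider_slug: str) -> bool:
    """True if the last component of `folder` is one of the expected names."""
    return Path(folder).name in expected_repo_names(provider_slug).values()
```

For example, `expected_repo_names("wto")` returns `{'json-data': 'wto-json-data', 'fetcher': 'wto-fetcher'}`.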
#### data-model

Data-model defines the JSON data model of DB.nomics. Each new json-data repository produced by a fetcher must comply with this data model.

The expected format of the data produced by your fetcher is represented in `tree-sample-json-data`, together with a set of requirements and constraints that you have to validate using the data-model script.

First, let's install it inside the dbnomics virtual env:

```sh
(nomics_env) me@mylaptop:~$ ls dbnomics-data-model/
build  dbnomics_data_model.egg-info  README.md  setup.py
dbnomics_data_model  dist  scripts
```

* Install the package

Now that it is installed, let's use it!

* First, check the information about the package:

```sh
(nomics_env) me@mylaptop:~$ pip show dbnomics-data-model
Name: dbnomics-data-model
Author-email: christophe.benz@cepremap.org
License: https://www.gnu.org/licenses/agpl-3.0.en.html
Location: /home/me/dbnomics/dbnomics-data-model
Requires: dulwich, jsonschema, ujson
```

* Inside your fetcher you *MUST* declare the version of data-model you are using.

Example:

```python
DATA_MODEL_VERSION = "0.7.1"
```
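A fetcher can also check at startup that the data-model installed in the virtual env matches the version it declares. The sketch below uses the standard-library `importlib.metadata` module and the distribution name `dbnomics-data-model` shown by `pip show`; the helper names and the choice of a strict equality check are assumptions, not part of data-model.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version declared by the fetcher, as in the example above.
DATA_MODEL_VERSION = "0.7.1"


def installed_data_model_version(dist_name: str = "dbnomics-data-model") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def data_model_version_matches(declared: str = DATA_MODEL_VERSION,
                               dist_name: str = "dbnomics-data-model") -> bool:
    """True only if dist_name is installed and its version equals the declared one.

    Assumption: an exact string match is required; relax it if you accept
    compatible versions.
    """
    installed = installed_data_model_version(dist_name)
    if installed is None:
        print(f"{dist_name} is not installed in this environment")
        return False
    if installed != declared:
        print(f"{dist_name}: declared {declared}, installed {installed}")
        return False
    return True
```

Calling `data_model_version_matches()` at the top of the fetcher makes a version mismatch visible before any conversion starts.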

The json-data produced by your fetcher must follow the same structure as the `tree-sample-json-data` stored in the dbnomics-data-model folder.

Here is the expected tree of files and directories:

```
/
| |- etc.
```
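To compare your output with this tree, it can help to print your json-data folder in the same `|-` style. A minimal sketch with `pathlib`; the function name `tree_lines` is hypothetical, and the `.git` directory is skipped on the assumption that it is not part of the expected tree.

```python
from __future__ import annotations

from pathlib import Path


def tree_lines(root: str, max_depth: int = 3) -> list[str]:
    """List root's content in the `|-` style used above, skipping the .git directory."""
    lines = ["/"]

    def walk(directory: Path, depth: int) -> None:
        if depth > max_depth:
            return
        for entry in sorted(directory.iterdir(), key=lambda p: p.name):
            if entry.name == ".git":
                continue
            suffix = "/" if entry.is_dir() else ""
            # One "| " prefix per nesting level below the root, then "|- ".
            lines.append("| " * (depth - 1) + "|- " + entry.name + suffix)
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(Path(root), 1)
    return lines
```

Usage: `print("\n".join(tree_lines("wto-json-data")))`.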

The json-data schema expresses the following constraints:

- The repository directory name MUST be equal to the provider code + `"-json-data"`.
- Each dataset directory name MUST be equal to the dataset code.
- Conversions MUST be stable: running the conversion script twice MUST give the same result as running it once.
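The three constraints above can be spot-checked locally with a small helper. This sketch is not part of data-model and rests on assumptions: it treats every non-hidden sub-directory of the repository as a dataset directory, expects you to supply the set of dataset codes, and checks stability by comparing a SHA-256 digest of the whole tree (excluding `.git`) after a first and a second run of your conversion script. `layout_problems` and `tree_digest` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def layout_problems(repo_dir: str, provider_code: str, dataset_codes: set[str]) -> list[str]:
    """Check the two naming constraints and return human-readable problems (empty if OK)."""
    repo = Path(repo_dir)
    problems = []
    # Constraint 1: repository directory name == provider code + "-json-data".
    expected_repo = f"{provider_code}-json-data"
    if repo.name != expected_repo:
        problems.append(f"repository directory is {repo.name!r}, expected {expected_repo!r}")
    # Constraint 2: each dataset directory is named after its dataset code.
    # Assumption: every non-hidden sub-directory is a dataset directory.
    found = {p.name for p in repo.iterdir() if p.is_dir() and not p.name.startswith(".")}
    for name in sorted(found - dataset_codes):
        problems.append(f"directory {name!r} does not match any dataset code")
    for code in sorted(dataset_codes - found):
        problems.append(f"no directory for dataset code {code!r}")
    return problems


def tree_digest(root: str) -> str:
    """SHA-256 over the relative path and content of every file under root, .git excluded."""
    root_path = Path(root)
    digest = hashlib.sha256()
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if ".git" in rel.parts or not path.is_file():
            continue
        data = path.read_bytes()
        digest.update(rel.as_posix().encode() + b"\0")
        # Length prefix keeps file boundaries unambiguous.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

To test stability, run your conversion, store `tree_digest("wto-json-data")`, run it again and compare: a different digest means the second execution changed the output. The digest deliberately ignores timestamps and permissions, on the assumption that only content equivalence matters here.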

You will have to validate these constraints using the data-model script.

##### Validate your data

* Validate the hierarchy of the data produced by your fetcher, using the script provided in the data-model folder:

```sh
./scripts/validate_json_data_git_repository.py wto-json-data
```
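If you want to run this validation from Python (for example at the end of a fetcher run or in a test suite), a thin `subprocess` wrapper is enough. This is a sketch: `validate_json_data` is a hypothetical helper, the default script path is the one used in the command above, and it assumes the script reports failures through a non-zero exit status.

```python
import subprocess
import sys
from pathlib import Path

# Path of the validation script, relative to the directory the command above is run from.
VALIDATOR = Path("scripts/validate_json_data_git_repository.py")


def validate_json_data(repo_dir: str, validator: Path = VALIDATOR) -> bool:
    """Run the validation script on repo_dir; True if it exits with status 0.

    Assumption: the script signals invalid data with a non-zero exit status.
    """
    result = subprocess.run(
        [sys.executable, str(validator), repo_dir],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show the script's own messages to help locate the problem.
        print(result.stdout, result.stderr, sep="\n")
    return result.returncode == 0
```

Using `sys.executable` runs the script with the interpreter of the active virtual env (here `nomics_env`), where dbnomics-data-model is installed.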

:bulb: If some of your data doesn't fit the model and the model needs an additional constraint, you can add it to the tree example and open a PR on the data-model repository that upgrades its version.

#### Source Data

Source data is a folder and a git repository where the raw datasets are stored, i.e. a raw deposit of the provider's source files.