... | ... | @@ -38,6 +38,7 @@ For pedagogical purpose we will create the tree for dbnomics project by creating |
|
|
- dbnomics-source-data
|
|
|
- dbnomics-json-data
|
|
|
- dbnomics-fetcher
|
|
|
|
|
|
and by **cloning** from source:
|
|
|
- dbnomics-data-model
|
|
|
|
... | ... | @@ -70,11 +71,15 @@ All folders inside source-data and json-data **MUST** follow the naming conventi |
|
|
- `<provider_slug>-json-data`
|
|
|
- `<provider_slug>-fetcher`
|
|
|
|
|
|
#### Data_model :tools: FIXME
|
|
|
#### data-model
|
|
|
|
|
|
Data-model defines the json data model of DB.nomics. Each new json-data produced by a fetcher must be compliant with this json-data model.
|
|
|
The expected format of the data produced by your fetcher is illustrated by `tree-sample-json-data`, together with a set of requirements and constraints that you have to validate using the data-model scripts.
|
|
|
|
|
|
data_model defines the correct json data model of DB.nomics for one provider produced by a fetcher and exposed in the corresponding dbnomics-json-data.
|
|
|
|
|
|
Inside the virtual env
|
|
|
First, let's install it inside the dbnomics virtual env
|
|
|
|
|
|
* `clone` the `repository` dbnomics/data_model
|
|
|
|
... | ... | @@ -98,39 +103,94 @@ Inside the virtual env |
|
|
|
|
|
```bash
|
|
|
|
|
|
(nomics_env) me@mylaptop:~$ ls dbnomics-data_model/
|
|
|
build  dbnomics_data_model.egg-info  README.md  setup.py
|
|
|
dbnomics_data_model dist scripts
|
|
|
(nomics_env) me@mylaptop:~$ ls dbnomics-data_model/
|
|
|
build  dbnomics_data_model.egg-info  README.md  setup.py
|
|
|
dbnomics_data_model dist scripts
|
|
|
```
|
|
|
Validate a JSON data Git repository:
|
|
|
|
|
|
```sh
|
|
|
./scripts/validate_json_data_git_repository.py <git_repo_dir>
|
|
|
* Install the package
|
|
|
|
|
|
#### for example:
|
|
|
./scripts/validate_json_data_git_repository.py wto-json-data
|
|
|
```bash
|
|
|
(nomics_env) me@mylaptop:~$ pip install -e dbnomics-data_model/
|
|
|
```
|
|
|
Add it to the package requirements with the current version
|
|
|
|
|
|
> When pulling dbnomics-converters think about reinstalling the current version
|
|
|
with:
|
|
|
```pip install -e dbnomics-data_model/```
|
|
|
|
|
|
Now that it is installed, let's use it!
|
|
|
|
|
|
* First check the information about the package
|
|
|
```bash
|
|
|
(nomics_env) me@mylaptop:~$ pip show dbnomics-data-model
|
|
|
Name: dbnomics-data-model
|
|
|
Version: 0.7.1
|
|
|
Summary: Define and validate DB.nomics data model.
|
|
|
Home-page: https://git.nomics.world/dbnomics/dbnomics-data-model
|
|
|
Author: Christophe Benz
|
|
|
Author-email: christophe.benz@cepremap.org
|
|
|
License: https://www.gnu.org/licenses/agpl-3.0.en.html
|
|
|
Location: /home/me/dbnomics/dbnomics-data-model
|
|
|
Requires: dulwich, jsonschema, ujson
|
|
|
|
|
|
```
|
|
|
* Inside your fetcher you *MUST* declare the version of data-model you are using
|
|
|
|
|
|
Example:
|
|
|
``` python
|
|
|
|
|
|
DATA_MODEL_VERSION = "0.7.1"
|
|
|
```
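
A fetcher could also verify at startup that the declared version matches the installed package. The following is only a minimal sketch: the helper names `installed_data_model_version` and `version_matches` are hypothetical, and it assumes the package is installed under the distribution name `dbnomics-data-model` reported by `pip show`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version declared by the fetcher, as in the example above.
DATA_MODEL_VERSION = "0.7.1"


def installed_data_model_version(dist_name: str = "dbnomics-data-model") -> Optional[str]:
    """Return the installed version of the distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_matches(declared: str, installed: Optional[str]) -> bool:
    """True only when a version is installed and equals the declared one exactly."""
    return installed is not None and installed == declared


if __name__ == "__main__":
    installed = installed_data_model_version()
    if not version_matches(DATA_MODEL_VERSION, installed):
        print(f"data-model mismatch: declared {DATA_MODEL_VERSION}, installed {installed}")
```

An exact string comparison is the strictest choice; a fetcher that tolerates patch releases would need a looser rule.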
|
|
|
### Test
|
|
|
|
|
|
From a virtualenv in which this package is installed:
|
|
|
The json-data produced by your fetcher must be similar to the `tree-sample-json-data`
|
|
|
|
|
|
Here is the expected tree of files and directories:
|
|
|
|
|
|
```
|
|
|
/
|
|
|
|- categories_tree.json
|
|
|
|- datapackage.json
|
|
|
|- provider.json
|
|
|
|- dataset1
|
|
|
| |- dataset.json
|
|
|
| |- A1.B1.C1.tsv
|
|
|
| |- A1.B1.C2.tsv
|
|
|
| |- A1.B2.C1.tsv
|
|
|
| |- A1.B2.C2.tsv
|
|
|
| |- etc.
|
|
|
|
|
|
|
|- dataset2
|
|
|
| |- dataset.json
|
|
|
| |- I1.J1.tsv
|
|
|
| |- I1.J2.tsv
|
|
|
| |- etc.
|
|
|
```
|
|
|
|
|
|
Constraints:
|
|
|
|
|
|
- The repository directory name MUST be equal to the provider code + "-json-data".
|
|
|
- Each dataset directory name MUST be equal to the dataset code.
|
|
|
- Conversions MUST be stable: 2 executions of the conversion script MUST be equivalent to one.
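
The naming constraints above can be checked quickly before running the full validation. This is a rough sketch, not the data-model implementation: `expected_repo_name` and `naming_errors` are hypothetical helpers, and the dataset check only looks for a `dataset.json` file in each sub-directory, as shown in the tree above.

```python
from pathlib import Path
from typing import List


def expected_repo_name(provider_code: str) -> str:
    """Repository directory name required by the constraints: provider code + "-json-data"."""
    return f"{provider_code}-json-data"


def naming_errors(repo_dir: Path, provider_code: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    errors = []
    expected = expected_repo_name(provider_code)
    if repo_dir.name != expected:
        errors.append(f"repository directory is named {repo_dir.name!r}, expected {expected!r}")
    if repo_dir.is_dir():
        for child in sorted(repo_dir.iterdir()):
            # Skip files and hidden directories such as .git.
            if not child.is_dir() or child.name.startswith("."):
                continue
            if not (child / "dataset.json").is_file():
                errors.append(f"dataset directory {child.name!r} has no dataset.json")
    return errors


if __name__ == "__main__":
    for problem in naming_errors(Path("wto-json-data"), "wto"):
        print(problem)
```

It does not replace the validation script; it only catches naming mistakes early.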
|
|
|
|
|
|
|
|
|
* Validate the hierarchy of the data produced using the script provided in data-model folder
|
|
|
|
|
|
```sh
|
|
|
./scripts/test_tree_sample.sh
|
|
|
```
|
|
|
|
|
|
* Validate a JSON data Git repository using the script provided in data-model:
|
|
|
|
|
|
* Install the library :tools: FIXME
|
|
|
|
|
|
```sh
|
|
|
./scripts/validate_json_data_git_repository.py <git_repo_dir>
|
|
|
|
|
|
```bash
|
|
|
(nomics_env) me@mylaptop:~$ pip install -e dbnomics-data_model/
|
|
|
# for example:
|
|
|
./scripts/validate_json_data_git_repository.py wto-json-data
|
|
|
```
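
If you prefer to run the validation from Python, for instance at the end of a conversion, a thin wrapper around the script is enough. The sketch below rests on assumptions: `build_validate_command` and `validate` are hypothetical names, the script is assumed to live in `scripts/` under the data-model checkout, and it is assumed to exit with a non-zero status when validation fails.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script path relative to the data-model checkout, as used in the commands above.
VALIDATE_SCRIPT = Path("scripts") / "validate_json_data_git_repository.py"


def build_validate_command(repo_dir: str) -> List[str]:
    """Build the command line; the repository path is made absolute so it survives a cwd change."""
    return [sys.executable, str(VALIDATE_SCRIPT), str(Path(repo_dir).resolve())]


def validate(repo_dir: str, data_model_dir: str = ".") -> bool:
    """Run the script from the data-model checkout; assumes exit code 0 means success."""
    result = subprocess.run(build_validate_command(repo_dir), cwd=data_model_dir)
    return result.returncode == 0


if __name__ == "__main__":
    ok = validate("wto-json-data")
    print("valid" if ok else "invalid")
```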
|
|
|
Add it to the package requirements with the current version
|
|
|
|
|
|
> When pulling dbnomics-converters think about reinstalling the current version
|
|
|
with:
|
|
|
```pip install -e dbnomics-data_model/```
|
|
|
|
|
|
:bulb: If some of your data doesn't fit the model and the model needs an additional constraint, you can add it to the tree example and make a PR on
|
|
|
|
|
|
#### Source Data
|
|
|
|
... | ... | @@ -238,7 +298,7 @@ Correspond to script to_source_data.py in your fetcher that populate the source |
|
|
* by using the most appropriate method
|
|
|
|
|
|
|
|
|
Datasets tha havec to be stored are listed in the corresponding ***Analysis*** you will find in the gitlab project dbnomics-fetcher/management along with the corresponding issue
|
|
|
Datasets that have to be stored are listed in the corresponding ***Analysis*** document you will find in the gitlab project dbnomics-fetcher/management along with the corresponding issue
|
|
|
|
|
|
|
|
|
|
... | ... | @@ -267,7 +327,7 @@ Correspond to your script to_dbnomics.py in your fetcher that will populate JSON |
|
|
Useful tips:
|
|
|
* Open the corresponding Analysis that defines the structure and the targeted time series
|
|
|
you will need to extract from raw data stored in source_data
|
|
|
|
|
|
* Don't forget to validate the data produced using dbnomics-data-model
|
|
|
|
|
|
### Requirements
|
|
|
|
... | ... | |