... | ... | @@ -29,11 +29,12 @@ A Fetcher is composed of modules or bricks: |
|
|
* Fetcher: dbnomics-fetcher
|
|
|
|
|
|
|
|
|
For pedagogical purposes, we will create the tree for the dbnomics project by creating
|
|
|
dbnomics-source-data
|
|
|
dbnomics-json-data
|
|
|
dbnomics-fetcher
|
|
|
and downloading dbnomics-converter
|
|
|
For pedagogical purposes, we will create the tree for the dbnomics project by creating:
|
|
|
- dbnomics-source-data
|
|
|
- dbnomics-json-data
|
|
|
- dbnomics-fetcher
|
|
|
and by downloading:
|
|
|
- dbnomics-converter
|
|
|
|
|
|
```bash
|
|
|
(nomics_env) me@mylaptop:~$ mkdir dbnomics-source-data
|
... | ... | @@ -58,6 +59,7 @@ At the end of the procedure a DBNOMICS fetcher should be ordered in your compute |
|
|
└── <provider_slug>-source-data
|
|
|
|
|
|
```
|
|
|
|
|
|
All folders inside source-data and json-data MUST follow the naming conventions:
|
|
|
- `<provider_slug>-source-data`
|
|
|
- `<provider_slug>-json-data`
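As an illustration of the naming convention above, here is a minimal Python sketch that checks a folder name. The helper name `is_valid_repo_name` is hypothetical, and so is the assumption that a provider slug is made of lowercase letters, digits and hyphens; the guide itself only fixes the two suffixes.

```python
import re

# Assumption: a provider slug is lowercase letters/digits, optionally
# separated by single hyphens. The guide only fixes the two suffixes.
_REPO_NAME = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*-(source|json)-data$")


def is_valid_repo_name(name: str) -> bool:
    """Return True if name looks like <provider_slug>-source-data
    or <provider_slug>-json-data."""
    return _REPO_NAME.match(name) is not None
```

A check like this could be run before creating or cloning a repository, so that a misnamed folder is caught early.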
|
... | ... | @@ -65,35 +67,42 @@ All folders inside source-data and json-data MUST follow the naming conventions: |
|
|
|
|
|
#### Converter
|
|
|
|
|
|
Converters are a set of utilities to convert raw data to dbnomics formats.
|
|
|
Converters are a set of utilities that convert raw data (source-data) into dbnomics-formatted data (json-data).
|
|
|
|
|
|
* Clone the repository `dbnomics/dbnomics-converters`:
|
|
|
|
|
|
* Using SSH: you must first add your SSH key to your profile on git.nomics.world.
|
|
|
|
|
|
```bash
|
|
|
|
|
|
(nomics_env) me@mylaptop:~$ git clone git@git.nomics.world:dbnomics/dbnomics-converters.git
|
|
|
|
|
|
```
|
|
|
You will have to add your fingerprint to the server
|
|
|
|
|
|
* Using https:
|
|
|
|
|
|
```bash
|
|
|
|
|
|
(nomics_env) me@mylaptop:~$ git clone https://git.nomics.world/dbnomics/dbnomics-converters
|
|
|
|
|
|
```
|
|
|
* Check that the clone succeeded:
|
|
|
|
|
|
```bash
|
|
|
|
|
|
(nomics_env) me@mylaptop:~$ ls dbnomics-converters/
|
|
|
dbnomics_converters  setup.cfg  setup.py
|
|
|
|
|
|
```
|
|
|
|
|
|
#### Source Data
|
|
|
|
|
|
Source data is a folder and a git repository where the raw datasets are stored, i.e. a raw deposit for downloaded provider sources.
|
|
|
Source data is a folder and a git repository where the raw datasets are stored, i.e. a raw deposit of the provider's source files.
|
|
|
|
|
|
|
|
|
|
|
|
* Create an empty repository for source data:
|
|
|
* Create an empty repository for **source-data**:
|
|
|
|
|
|
Inside https://git.nomics.world/dbnomics-source-data click on `New Project`
|
|
|
|
... | ... | @@ -103,9 +112,10 @@ Name your project following the naming convention: |
|
|
|
|
|
Add a description for the project following this pattern:
|
|
|
|
|
|
`Series from <providername> (acronym explanation) macro economic database in source format`
|
|
|
`Series from <provider_name> (acronym explanation) macro economic database in source format`
|
|
|
|
|
|
> Leave the visibility set to public
|
|
|
|
|
|
Leave the visibility set to public
|
|
|
|
|
|
* Clone it inside your dbnomics-source-data folder:
|
|
|
|
... | ... | @@ -116,20 +126,14 @@ Let the visibility to public |
|
|
|
|
|
```
|
|
|
|
|
|
Once your fetcher is operational, it will download the raw provider files into this repository; you will then have to add, commit and push them:
|
|
|
|
|
|
```bash
|
|
|
(nomics_env) me@mylaptop:~/dbnomics-source-data/<provider_slug>-source-data$ git add .
(nomics_env) me@mylaptop:~/dbnomics-source-data/<provider_slug>-source-data$ git commit -m "Source-data: <provider_slug>"
|
|
|
(nomics_env) me@mylaptop:~/dbnomics-source-data/<provider_slug>-source-data$ git push
|
|
|
```
|
|
|
|
|
|
|
|
|
> Your fetcher's script `<provider_slug>_to_source_data.py` will populate this dedicated repository, `<provider_slug>-source-data`, with the targeted datasets.
|
|
|
|
|
|
#### JSON Data
|
|
|
|
|
|
JSON data is a folder and a git repository where the results of the fetching and conversion process are stored, i.e. all the datasets in dbnomics JSON format.
|
|
|
JSON data is a git repository that stores the results of converting the raw datasets (source-data repository) into **DB.nomics datasets** (json-data).
|
|
|
|
|
|
* Create an empty repository for source data:
|
|
|
* Create an empty repository for **json-data**:
|
|
|
|
|
|
Inside https://git.nomics.world/dbnomics-json-data click on `New Project`
|
|
|
|
... | ... | @@ -141,7 +145,7 @@ Add a description for the project following this pattern: |
|
|
|
|
|
`Series from <providername> (acronym explanation) macro-economic data converted to DB.nomics JSON format`
|
|
|
|
|
|
Leave the visibility set to public
|
|
|
> Leave the visibility set to public
|
|
|
|
|
|
|
|
|
* Clone it inside the dbnomics-json-data folder:
|
... | ... | @@ -154,33 +158,26 @@ Let the visibility to public |
|
|
|
|
|
```
|
|
|
|
|
|
Once your fetcher and converter scripts are operational, the converted datasets will be written into this repository; you will then have to add, commit and push them:
|
|
|
|
|
|
```bash
|
|
|
(nomics_env) me@mylaptop:~/dbnomics-json-data/<provider_slug>-json-data$ git add .
(nomics_env) me@mylaptop:~/dbnomics-json-data/<provider_slug>-json-data$ git commit -m "JSON-data: <provider_slug>"
(nomics_env) me@mylaptop:~/dbnomics-json-data/<provider_slug>-json-data$ git push
|
|
|
```
|
|
|
|
|
|
#### Fetcher
|
|
|
|
|
|
The Fetcher is a set of two scripts to acquire and transform raw data from source to dbnomics datasets.
|
|
|
|
|
|
Inside https://git.nomics.world/dbnomics-fetchers create a new project
|
|
|
|
|
|
* Create an empty repository for your fetcher:
|
|
|
|
|
|
Inside https://git.nomics.world/dbnomics-fetchers click on `New Project`
|
|
|
|
|
|
Name your project following the naming convention:
|
|
|
|
|
|
`<provider_slug>-fetcher`
|
|
|
|
|
|
Add a description for the project following this pattern:
|
|
|
> Name your project following the naming convention:
|
|
|
`<provider_slug>-fetcher`
|
|
|
|
|
|
> Add a description for the project following this pattern:
|
|
|
`DB.nomics fetcher for series from <provider_name> (acronym explanation if needed) macro economic database`
|
|
|
|
|
|
Leave the visibility set to public
|
|
|
> Leave the visibility set to public
|
|
|
|
|
|
|
|
|
* Clone it inside the dbnomics-fetchers folder
|
|
|
* Clone it inside your dbnomics-fetchers folder:
|
|
|
|
|
|
|
|
|
```bash
|
... | ... | @@ -189,29 +186,37 @@ Let the visibility to public |
|
|
(nomics_env) me@mylaptop:~/dbnomics-fetchers/$ cd <provider_slug>-fetcher
|
|
|
```
|
|
|
|
|
|
A fetcher is composed of:
|
|
|
The fetcher will be composed of two mandatory files:
|
|
|
|
|
|
* <provider_slug>_to_source_data.py
|
|
|
* <provider_slug>_source_data_to_dbnomics.py
|
|
|
|
|
|
##### to_source_data
|
|
|
|
|
|
`<provider_slug>_to_source_data.py` is a script that:
|
|
|
|
|
|
##### Writing the Fetcher
|
|
|
* given a provider
|
|
|
* populate the **source-data** repository
|
|
|
* with the raw data of the provider
|
|
|
* by using the most appropriate method
|
|
|
|
|
|
* Create a file '<provider_slug>_to_source_data.py' inside dbnomics-fetchers/<provider_slug>-fetcher/
|
|
|
* Read the analysis that specifies which datasets we want to store and how to access them
|
|
|
* Define the targeted datasets
|
|
|
* Specify the source-data repository for your provider in your '<provider_slug>_to_source_data.py'
|
|
|
|
|
|
* Start by coding the routine that downloads one dataset, in a flat layout, into the dedicated `<provider_slug>-source-data` directory
|
|
|
The datasets that have to be stored are listed in the corresponding ***Analysis***, which you will find in the gitlab project dbnomics-fetcher/management along with the corresponding issue.
|
|
|
|
|
|
|
|
|
* '<provider_slug>_to_source_data.py' will be executed from the CLI and should take at least one argument: the path of the source-data directory for this provider
|
|
|
|
|
|
* Create the file '<provider_slug>_to_source_data.py' inside dbnomics-fetchers/<provider_slug>-fetcher/
|
|
|
|
|
|
* Read the analysis that specifies which datasets we want to store and how to access them
|
|
|
|
|
|
* Define the targeted datasets
|
|
|
|
|
|
* Specify the **source-data repository** for your provider in your `<provider_slug>_to_source_data.py`. This script will be executed from the CLI by gitlab-CI, so it should take at least one argument: the destination for the datasets, i.e. the path of the source-data repository corresponding to your provider
|
|
|
|
|
|
* Add your file, commit and push:
|
|
|
```bash
|
|
|
(nomics_env) me@mylaptop:~/dbnomics-fetchers/<provider_slug>-fetcher$ git add .
|
|
|
(nomics_env) me@mylaptop:~/dbnomics-fetchers/<provider_slug>-fetcher$ git commit -m "ADD Fetcher: <provider_slug>"
|
|
|
(nomics_env) me@mylaptop:~/dbnomics-fetchers/<provider_slug>-fetcher$ git commit -m "ADD Fetcher: <provider_slug>_to_source_data.py "
|
|
|
(nomics_env) me@mylaptop:~/dbnomics-fetchers/<provider_slug>-fetcher$ git push
|
|
|
```
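The steps above describe a script that receives the path of the source-data repository on the command line and fills it with raw files. A minimal sketch of what `<provider_slug>_to_source_data.py` might look like is given below. It is an illustration under stated assumptions, not the official template: the `DATASETS` mapping, the `download` helper and the `target_dir` argument name are all hypothetical, and the real list of datasets comes from the Analysis.

```python
"""Illustrative sketch of <provider_slug>_to_source_data.py (not an official template)."""
import argparse
import sys
import urllib.request
from pathlib import Path

# Hypothetical placeholder: file name -> download URL, as listed in the Analysis.
DATASETS = {
    # "dataset-code.csv": "https://provider.example/dataset-code.csv",
}


def parse_args(argv=None):
    """Parse the single mandatory argument: the source-data repository path."""
    parser = argparse.ArgumentParser(description="Download raw provider data.")
    parser.add_argument("target_dir", type=Path,
                        help="path of the <provider_slug>-source-data repository")
    return parser.parse_args(argv)


def download(url: str, dest: Path) -> None:
    """Save the raw bytes behind url to dest, unchanged."""
    with urllib.request.urlopen(url) as response:
        dest.write_bytes(response.read())


def main(argv=None) -> int:
    args = parse_args(argv)
    if not args.target_dir.is_dir():
        print(f"Target directory not found: {args.target_dir}", file=sys.stderr)
        return 1
    for file_name, url in DATASETS.items():
        # Flat layout: one file per dataset, directly in the target directory.
        download(url, args.target_dir / file_name)
    return 0

# When used as a script, finish the file with: sys.exit(main())
```

Under these assumptions the script would be invoked with the repository path as its only argument, for example `python <provider_slug>_to_source_data.py ~/dbnomics-source-data/<provider_slug>-source-data`. Returning an exit code from `main` lets a CI job detect a missing target directory.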
|
|
|
|
... | ... | |