### Requirements

* Git
* Python3
* virtualenv

### Create your environment

Inside your working directory:

* Create a `virtualenv` for DBNOMICS with Python 3:

```bash
me@mylaptop:~$ virtualenv -p /usr/bin/python3 nomics_env
```

* Activate the virtualenv:

```bash
me@mylaptop:~$ source nomics_env/bin/activate
(nomics_env) me@mylaptop:~$
```
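
Besides the `(nomics_env)` prefix in the prompt, the activation can be checked from Python itself. The snippet below is a minimal sketch, not part of the DBNOMICS tooling: the function name `in_virtualenv` is an assumption made for illustration. It relies on the fact that inside a virtual environment `sys.prefix` differs from `sys.base_prefix` (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and recent
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run with the `python` of the activated environment, the reported prefix should point inside `nomics_env`.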

### Architecture of a DBNOMICS Fetcher

A Fetcher is composed of modules or bricks:

* Converter: dbnomics-converters
* Source-data: dbnomics-source-data
* JSON-data: dbnomics-json-data
* Fetcher: dbnomics-fetchers

For teaching purposes, we will build the tree of the DBNOMICS project by creating the `dbnomics-source-data`, `dbnomics-json-data` and `dbnomics-fetchers` folders, and by downloading `dbnomics-converters`:

```bash
(nomics_env) me@mylaptop:~$ mkdir dbnomics-source-data
(nomics_env) me@mylaptop:~$ mkdir dbnomics-json-data
(nomics_env) me@mylaptop:~$ mkdir dbnomics-fetchers
```

At the end of the procedure, the DBNOMICS tree on your computer should look like this:

```bash
(nomics_env) me@mylaptop:~$ tree . -L 2
.
├── dbnomics-converters
│   ├── dbnomics_converters
│   ├── setup.cfg
│   └── setup.py
├── dbnomics-fetchers
│   └── <provider_slug>-fetcher
├── dbnomics-json-data
│   └── <provider_slug>-json-data
└── dbnomics-source-data
    └── <provider_slug>-source-data
```

All folders inside `dbnomics-source-data`, `dbnomics-json-data` and `dbnomics-fetchers` MUST follow the naming conventions:

- `<provider_slug>-source-data`
- `<provider_slug>-json-data`
- `<provider_slug>-fetcher`
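
The layout and naming conventions above can be expressed as a small helper that derives the expected folders from a provider slug and reports the ones that are still missing. This is only a sketch under stated assumptions: `provider_paths`, `missing_folders` and the slug pattern (lowercase letters, digits and hyphens) are hypothetical and not part of DBNOMICS.

```python
import re
from pathlib import Path

# Assumption: slugs are lowercase letters, digits and hyphens; this guide
# does not state the exact rule.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def provider_paths(provider_slug: str) -> dict:
    """Map each brick to the relative folder expected for one provider."""
    if not SLUG_RE.fullmatch(provider_slug):
        raise ValueError(f"invalid provider slug: {provider_slug!r}")
    return {
        "source-data": Path("dbnomics-source-data") / f"{provider_slug}-source-data",
        "json-data": Path("dbnomics-json-data") / f"{provider_slug}-json-data",
        "fetcher": Path("dbnomics-fetchers") / f"{provider_slug}-fetcher",
    }


def missing_folders(root: Path, provider_slug: str) -> list:
    """Return the expected folders that do not exist yet under root."""
    return [
        path
        for path in provider_paths(provider_slug).values()
        if not (root / path).is_dir()
    ]
```

For example, `missing_folders(Path.home(), "myprovider")` lists the folders still to be created or cloned for a provider whose slug is the placeholder `myprovider`.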

#### Converter

Converters are a set of utilities to convert raw data to DBNOMICS formats.

* Clone the `dbnomics/dbnomics-converters` repository:

* Over SSH: you must first add your SSH key to your profile on git.nomics.world.

```bash
(nomics_env) me@mylaptop:~$ git clone git@git.nomics.world:dbnomics/dbnomics-converters.git
```

On the first connection, you will be asked to accept the server's fingerprint.

* Over HTTPS:

```bash
(nomics_env) me@mylaptop:~$ git clone https://git.nomics.world/dbnomics/dbnomics-converters
```

* Check that the clone succeeded:

```bash
(nomics_env) me@mylaptop:~$ ls dbnomics-converters/
dbnomics_converters  setup.cfg  setup.py
```
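
The two clone commands above follow a regular pattern: `git@git.nomics.world:<group>/<project>.git` over SSH and `https://git.nomics.world/<group>/<project>` over HTTPS. As an illustration only, a hypothetical helper `clone_url` (not provided by DBNOMICS) could build either form:

```python
GIT_HOST = "git.nomics.world"


def clone_url(group: str, project: str, protocol: str = "ssh") -> str:
    """Build the clone URL of a project hosted on git.nomics.world."""
    if protocol == "ssh":
        return f"git@{GIT_HOST}:{group}/{project}.git"
    if protocol == "https":
        return f"https://{GIT_HOST}/{group}/{project}"
    raise ValueError(f"unsupported protocol: {protocol!r}")


# The converter repository used in this section.
print(clone_url("dbnomics", "dbnomics-converters"))
print(clone_url("dbnomics", "dbnomics-converters", protocol="https"))
```

The same helper applies to the provider repositories created in the following sections, using their group and project names.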

#### Source Data

Source data is a folder and a Git repository where the raw datasets are stored, i.e. a raw deposit for the files downloaded from the provider.

* Create an empty repository for source data:

Inside https://git.nomics.world/dbnomics-source-data click on `New Project`.

Name your project following the naming convention:

`<provider_slug>-source-data`

Add a description for the project following this pattern:

`Series from <provider_name> (acronym explanation) macro-economic database in source format`

Set the visibility to public.

* Clone it inside your `dbnomics-source-data` folder:

```bash
(nomics_env) me@mylaptop:~$ cd dbnomics-source-data/
(nomics_env) me@mylaptop:~/dbnomics-source-data$ git clone git@git.nomics.world:dbnomics-source-data/<provider_slug>-source-data.git
(nomics_env) me@mylaptop:~/dbnomics-source-data$ cd <provider_slug>-source-data
```

Once your fetcher is operational, it will download the raw provider files into this repository. You will then have to add, commit and push them:

```bash
(nomics_env) me@mylaptop:~/dbnomics-source-data/<provider_slug>-source-data$ git add .
(nomics_env) me@mylaptop:~/dbnomics-source-data/<provider_slug>-source-data$ git commit -m "Source-data: <provider_slug>"
(nomics_env) me@mylaptop:~/dbnomics-source-data/<provider_slug>-source-data$ git push
```
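
If you prefer to trigger this add, commit and push sequence from a script, it can be wrapped with `subprocess`. The following is a sketch only: the `publish` function and its `run` parameter are hypothetical names, and the guide itself only prescribes the three Git commands.

```python
import subprocess
from pathlib import Path


def publish(repo_dir: Path, message: str, run=subprocess.run) -> None:
    """Stage, commit and push every change found in repo_dir."""
    for args in (
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ):
        # check=True stops at the first failing command; note that
        # `git commit` also fails when there is nothing new to commit.
        run(args, cwd=repo_dir, check=True)
```

A call such as `publish(Path("dbnomics-source-data/myprovider-source-data"), "Source-data: myprovider")` mirrors the commands above, with `myprovider` standing in for your provider slug. The `run` parameter exists only so that the Git calls can be replaced by a stub when testing.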

#### JSON Data

JSON data is a folder and a Git repository where the results of the fetching and conversion process are stored, i.e. all the datasets in DBNOMICS JSON format.

* Create an empty repository for JSON data:

Inside https://git.nomics.world/dbnomics-json-data click on `New Project`.

Name your project following the naming convention:

`<provider_slug>-json-data`

Add a description for the project following this pattern:

`Series from <provider_name> (acronym explanation) macro-economic data converted to DB.nomics JSON format`

Set the visibility to public.

* Clone it inside the `dbnomics-json-data` folder:

```bash
(nomics_env) me@mylaptop:~$ cd dbnomics-json-data/
(nomics_env) me@mylaptop:~/dbnomics-json-data$ git clone git@git.nomics.world:dbnomics-json-data/<provider_slug>-json-data.git
(nomics_env) me@mylaptop:~/dbnomics-json-data$ cd <provider_slug>-json-data
```

Once your fetcher and converter scripts are operational, they will write the converted datasets into this repository. You will then have to add, commit and push them:

```bash
(nomics_env) me@mylaptop:~/dbnomics-json-data/<provider_slug>-json-data$ git add .
(nomics_env) me@mylaptop:~/dbnomics-json-data/<provider_slug>-json-data$ git commit -m "JSON-data: <provider_slug>"
(nomics_env) me@mylaptop:~/dbnomics-json-data/<provider_slug>-json-data$ git push
```

#### Fetcher

* Create an empty repository for your fetcher:

Inside https://git.nomics.world/dbnomics-fetchers click on `New Project`.

Name your project following the naming convention:

`<provider_slug>-fetcher`

Add a description for the project following this pattern:

`DB.nomics fetcher for series from <provider_name> (acronym explanation if needed) macro-economic database`

Set the visibility to public.

* Clone it inside the `dbnomics-fetchers` folder:

```bash
(nomics_env) me@mylaptop:~$ cd dbnomics-fetchers/
(nomics_env) me@mylaptop:~/dbnomics-fetchers$ git clone git@git.nomics.world:dbnomics-fetchers/<provider_slug>-fetcher.git
(nomics_env) me@mylaptop:~/dbnomics-fetchers$ cd <provider_slug>-fetcher
```

A fetcher is composed of:

* `<provider_slug>_to_source_data.py`
* `<provider_slug>_source_data_to_dbnomics.py`

##### Writing the Fetcher

* Create a file `<provider_slug>_to_source_data.py` inside `dbnomics-fetchers/<provider_slug>-fetcher/`
* Read the analysis that specifies which datasets we want to store and how to access them
* Define the targeted datasets
* Specify the source-data repository for your provider in your `<provider_slug>_to_source_data.py`
* Start by coding the routine that downloads one dataset, in flat mode, into the dedicated `<provider_slug>-source-data` directory
* `<provider_slug>_to_source_data.py` will be executed from the CLI and should take at least one argument: the path of the source-data directory for this provider
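
The steps above can be sketched as a small command-line script. This is a minimal illustration, not the reference DBNOMICS implementation: `DATASET_URL`, `DATASET_FILE_NAME`, `download_dataset` and the `target_dir` argument name are all assumptions, and the real URL must come from the provider analysis. It uses only the standard library (`argparse` for the CLI argument, `urllib.request` for the download).

```python
"""Sketch of <provider_slug>_to_source_data.py: download one dataset."""
import argparse
import shutil
import urllib.request
from pathlib import Path

# Hypothetical placeholders: the real URL and file name come from the
# provider analysis, not from this guide.
DATASET_URL = "https://example.org/dataset.csv"
DATASET_FILE_NAME = "dataset.csv"


def download_dataset(url: str, target_dir: Path, file_name: str) -> Path:
    """Download url to target_dir/file_name (flat layout); return the path."""
    if not target_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {target_dir}")
    target_path = target_dir / file_name
    # Stream the response to disk instead of loading it all in memory.
    with urllib.request.urlopen(url) as response, target_path.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target_path


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "target_dir",
        type=Path,
        help="path of the source-data directory for this provider",
    )
    args = parser.parse_args(argv)
    path = download_dataset(DATASET_URL, args.target_dir, DATASET_FILE_NAME)
    print(f"downloaded {path}")
    return 0


# When saved as a script, finish with:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

Saved in the fetcher repository, such a script would be called with the path of the cloned `<provider_slug>-source-data` directory as its only argument. Starting with a single dataset keeps the first version easy to check before extending it to every targeted dataset.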

Once the script works, add, commit and push it:

```bash
(nomics_env) me@mylaptop:~/dbnomics-fetchers/<provider_slug>-fetcher$ git add .
(nomics_env) me@mylaptop:~/dbnomics-fetchers/<provider_slug>-fetcher$ git commit -m "ADD Fetcher: <provider_slug>"
(nomics_env) me@mylaptop:~/dbnomics-fetchers/<provider_slug>-fetcher$ git push
```

### Data-source