Test a fetcher on the pre-production server
This documentation applies to the pre-production server. More info on the servers page.
See also: https://git.nomics.world/dbnomics/fetchers-envs
Pyenv is used to get the latest stable Python version when the OS does not provide it.
Super quick guide
In a nutshell:
- Clone, download and convert:
  export PROVIDER_SLUG=xxxx
  cd ~cepremap/fetchers-envs
  ./create-fetcher-env.sh $PROVIDER_SLUG
  # or, for a specific Python version:
  ./create-fetcher-env.sh --pyenv x.y.z $PROVIDER_SLUG
  cd $PROVIDER_SLUG/$PROVIDER_SLUG-fetcher
  pyenv activate ${PROVIDER_SLUG}-fetcher
  ../../download.py
  ../../convert.py
- Validation & Solr import:
  pyenv activate dbnomics
  dbnomics-validate ~/json-data/${PROVIDER_SLUG}-json-data/
  dbnomics-solr index-provider ~/json-data/${PROVIDER_SLUG}-json-data
Detailed guide
Create a new fetcher environment
As a prerequisite, the repositories for the fetcher source code, source-data and json-data must already exist. If necessary, create them with the script create-repositories-for-provider.py.
The create-fetcher-env.sh script can be found in this repo.
We take ecb as an example.
ssh cepremap@eros.nomics.world
cd ~/fetchers-envs
./create-fetcher-env.sh ecb # Replace ecb with the real provider slug.
Mount remote data via sshfs
./mount-fetchers-envs-eros.sh
# unmount
./umount-fetchers-envs-eros.sh
Running download or convert
We take ecb as an example.
ssh cepremap@eros.nomics.world
PROVIDER_SLUG=ecb
cd ~/fetchers-envs/${PROVIDER_SLUG}
source ${PROVIDER_SLUG}-venv/bin/activate
# or using pyenv
pyenv activate ${PROVIDER_SLUG}-fetcher
cd ${PROVIDER_SLUG}-fetcher
"Manual" method
rm -rf ../${PROVIDER_SLUG}-source-data/*; python download.py ../${PROVIDER_SLUG}-source-data/
# or
rm -rf ../${PROVIDER_SLUG}-json-data/*; python convert.py ../${PROVIDER_SLUG}-source-data/ ../${PROVIDER_SLUG}-json-data/
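The "clean the target directory, then run the step" pattern above can also be written as a small reusable helper; this is a sketch, and clean_dir is an illustrative name, not part of the fetcher tooling.

```python
import shutil
import tempfile
from pathlib import Path

def clean_dir(path: Path) -> None:
    """Remove everything inside `path` (like `rm -rf path/*`),
    without deleting the directory itself."""
    for entry in path.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()

# Demo on a throwaway directory.
demo = Path(tempfile.mkdtemp())
(demo / "stale.json").write_text("{}")
(demo / "subdir").mkdir()
clean_dir(demo)
print(sorted(demo.iterdir()))  # → []
```

Cleaning before each run guarantees that datasets deleted upstream do not linger as stale files in source-data or json-data.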
"Assisted" method - using bduye scripts
You can also use the bduye scripts, which automate:
- cleaning the source/json directory (depending on whether the script is download or convert)
- calling the script with the correct directories, plus any custom arguments you pass to it
Example:
~/fetchers-envs/eurostat$ ../../convert.py --datasets teicp290 --full
is equivalent to:
~/fetchers-envs/eurostat$ ./convert.py ../eurostat-source-data ../eurostat-json-data --datasets teicp290 --full
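The directory logic behind that equivalence can be sketched as follows. This is only an illustration of the idea, assuming the wrapper derives the slug from the provider directory name; build_convert_command is a hypothetical name and the real bduye scripts may differ.

```python
from pathlib import Path

def build_convert_command(provider_dir: Path, extra_args: list[str]) -> list[str]:
    """Derive the source/json directories from the provider directory
    name and forward any custom arguments to the real convert script."""
    slug = provider_dir.name  # e.g. "eurostat" for ~/fetchers-envs/eurostat
    return [
        "./convert.py",
        f"../{slug}-source-data",
        f"../{slug}-json-data",
        *extra_args,
    ]

cmd = build_convert_command(
    Path("/home/cepremap/fetchers-envs/eurostat"),
    ["--datasets", "teicp290", "--full"],
)
print(" ".join(cmd))
# → ./convert.py ../eurostat-source-data ../eurostat-json-data --datasets teicp290 --full
```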
Using pyenv to choose specific Python version
pyenv allows you to choose a specific Python version.
Create a virtualenv using pyenv:
pyenv versions # list installed versions
pyenv virtualenv ${PYTHON_VERSION} ${PROVIDER_SLUG}-fetcher
pyenv activate ${PROVIDER_SLUG}-fetcher
Delete a virtualenv using pyenv:
pyenv virtualenv-delete ${PROVIDER_SLUG}-fetcher
Install a specific Python version:
pyenv install --list # list available versions to install
CONFIGURE_OPTS=--enable-shared pyenv install $PYTHON_VERSION
Validating converted data
We take the ecb provider as an example.
ssh cepremap@eros.nomics.world
PROVIDER_SLUG=ecb
pyenv activate dbnomics
# optional: update packages
pip install -U dbnomics-data-model dbnomics-solr
dbnomics-validate ~/json-data/${PROVIDER_SLUG}-json-data/
# No error should be displayed. Use --log=debug to see more details.
Importing converted data into Apache Solr
We take the ecb provider as an example.
ssh cepremap@eros.nomics.world
PROVIDER_SLUG=ecb
pyenv activate dbnomics
# optional: update packages
pip install -U dbnomics-data-model dbnomics-solr
dbnomics-solr index-provider ~/json-data/${PROVIDER_SLUG}-json-data
# If the indexation raises an error like:
# "msg":"ERROR: [doc=ELSTAT/DKT15/DKT15-2-1_0020_F_A] unknown field 'dimensions_values_labels'",
# then you must reset the Solr core (see section below).
Now you may verify that ecb is visible in the API and the UI:
- http://pre.db.nomics.world/providers
- http://api.pre.db.nomics.world/v22/providers
- http://api.pre.db.nomics.world/v22/providers/ECB
- http://pre.db.nomics.world/ECB/MIR (check that dimensions search and charts are OK)
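If you check several providers regularly, the URLs above can be generated from the slug. A minimal sketch, assuming provider codes are the uppercased slug (as the ECB URLs above suggest); provider_check_urls is an illustrative name.

```python
API_BASE = "http://api.pre.db.nomics.world/v22"
UI_BASE = "http://pre.db.nomics.world"

def provider_check_urls(provider_slug: str) -> list[str]:
    """URLs to visit (browser or curl) after indexing a provider."""
    code = provider_slug.upper()  # e.g. ecb -> ECB
    return [
        f"{UI_BASE}/providers",
        f"{API_BASE}/providers",
        f"{API_BASE}/providers/{code}",
    ]

for url in provider_check_urls("ecb"):
    print(url)
```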
If an internal error is returned by the Web API, follow the "Errors handling" section below.
Update the Web API and UI
You may first want an up-to-date Web API and UI. Follow the documentation of the respective projects.
Error handling
If the Web API returns an internal error, you can check the server logs.
As root:
tail -f /var/log/uwsgi/app/dbnomics-api-uwsgi-v21.log
See also: troubleshooting
Test Solr queries locally
You can forward the Solr port to your machine and run queries as plain HTTP requests.
From your machine:
ssh -N -L 8983:localhost:8983 cepremap@eros.nomics.world
Then open URLs like:
- http://localhost:8983/solr/dbnomics/query?q=*
- http://localhost:8983/solr/dbnomics/query?q=provider_code:ECB
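If you script such queries, percent-encoding the q parameter avoids shell-quoting surprises. A small sketch; solr_query_url is an illustrative helper name.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/dbnomics/query"

def solr_query_url(q: str) -> str:
    """Build a query URL for the forwarded Solr port, encoding `q`
    (the ':' in provider_code:ECB becomes %3A)."""
    return f"{SOLR_BASE}?{urlencode({'q': q})}"

print(solr_query_url("provider_code:ECB"))
# → http://localhost:8983/solr/dbnomics/query?q=provider_code%3AECB
```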
Reset the Solr core
Only to be done if the converted data follow a data model that is more recent than the Solr core schema, for example when the import script adds a new field to the Solr schema.
Warning: this deletes absolutely everything in Solr about DBnomics! Be sure to check that you run this on the pre-production server (eros).
As solr user:
./bin/solr delete -c dbnomics
./bin/solr create -c dbnomics
./bin/solr config -c dbnomics -p 8983 -property update.autoCreateFields -value false
rm /var/solr/data/dbnomics/conf/managed-schema
# Get files from https://git.nomics.world/dbnomics/dbnomics-solr/-/tree/master/solr_core_config
cp /path/to/solr_core_config/* /var/solr/data/dbnomics/conf/
As root:
systemctl restart solr.service