Test a fetcher on the pre-production server
This documentation applies to the pre-production server. More info on the servers page.
See also: https://git.nomics.world/dbnomics/fetchers-envs
Super quick guide
For a quick and clean guide, follow /home/cepremap/fetchers-envs/README.md
In a nutshell:
- Clone, download and convert:
  export PROVIDER_SLUG=xxxx
  cd ~cepremap/fetchers-envs
  ./create-fetcher-env.sh $PROVIDER_SLUG
  cd $PROVIDER_SLUG/$PROVIDER_SLUG-fetcher
  ../../download.py
  ../../convert.py
- Validation & Solr import:
  source ~/virtualenvs/dbnomics/bin/activate
  dbnomics-validate ~/json-data/${PROVIDER_SLUG}-json-data/
  ~/dbnomics-importer/import_storage_dir.py ~/json-data/${PROVIDER_SLUG}-json-data
Create a new fetcher environment
As a prerequisite, the repositories for the fetcher source code, source data and JSON data must have been created. If necessary, use the create-repositories-for-provider.py script.
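For example, a hypothetical invocation (check the script's help for its actual arguments):

./create-repositories-for-provider.py ecb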
The create-fetcher-env.sh script can be found in this repo. We take ecb as an example.
ssh cepremap@eros.nomics.world
cd ~/fetchers-envs
./create-fetcher-env.sh ecb # Replace ecb with the real provider slug.
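The script should leave you with a layout like this (inferred from the paths used in the sections below):

~/fetchers-envs/ecb/
  ecb-fetcher/      # fetcher source code (download.py, convert.py)
  ecb-source-data/  # downloaded source data
  ecb-json-data/    # converted JSON data
  ecb-venv/         # Python virtualenv for the fetcher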
Running download or convert
We take ecb as an example.
ssh cepremap@eros.nomics.world
PROVIDER_SLUG=ecb
cd ~/fetchers-envs/${PROVIDER_SLUG}
source ${PROVIDER_SLUG}-venv/bin/activate
cd ${PROVIDER_SLUG}-fetcher
"Manual" method
rm -rf ../${PROVIDER_SLUG}-source-data/*; python download.py ../${PROVIDER_SLUG}-source-data/
# or
rm -rf ../${PROVIDER_SLUG}-json-data/*; python convert.py ../${PROVIDER_SLUG}-source-data/ ../${PROVIDER_SLUG}-json-data/
"Assisted" method - using bduye scripts
You can also use the bduye scripts, which automate:
- cleaning the source or JSON directory (depending on whether the script is download or convert)
- calling the fetcher script with the correct directories, plus any custom arguments you pass through
Example:
~/fetchers-envs/eurostat/eurostat-fetcher$ ../../convert.py --datasets teicp290 --full
is equivalent to:
~/fetchers-envs/eurostat/eurostat-fetcher$ ./convert.py ../eurostat-source-data ../eurostat-json-data --datasets teicp290 --full
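The download wrapper presumably behaves the same way:

~/fetchers-envs/eurostat/eurostat-fetcher$ ../../download.py --log info

would be equivalent to emptying ../eurostat-source-data and running:

~/fetchers-envs/eurostat/eurostat-fetcher$ ./download.py ../eurostat-source-data --log info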
"docker" method
Python3 version on eros
is 5.5.3. To take advantage of Python3.7:
- create a file named Dockerfile in the ${PROVIDER_SLUG}-fetcher directory with this content:
# From https://pythonspeed.com/articles/activate-virtualenv-dockerfile/
FROM python:3.7
ENV VIRTUAL_ENV=/venv
RUN pip install virtualenv && virtualenv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
# Run the application:
COPY *.py ./
CMD ["python"]
- build the Docker image (do this each time your fetcher code changes):
docker build --tag ${PROVIDER_SLUG}-fetcher:latest .
- launch your download or convert job:
docker run -v $PWD/../${PROVIDER_SLUG}-source-data:/source-data:rw ${PROVIDER_SLUG}-fetcher:latest python download.py /source-data --log info
docker run -v $PWD/../${PROVIDER_SLUG}-source-data:/source-data:ro -v $PWD/../${PROVIDER_SLUG}-json-data:/json-data:rw ${PROVIDER_SLUG}-fetcher:latest python convert.py /source-data /json-data --log info
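For convenience, here is the whole Docker workflow in one go (a sketch based on the commands above; source-data is mounted read-only for convert, since conversion only reads it):

PROVIDER_SLUG=ecb
cd ~/fetchers-envs/${PROVIDER_SLUG}/${PROVIDER_SLUG}-fetcher
# rebuild the image whenever the fetcher code changes
docker build --tag ${PROVIDER_SLUG}-fetcher:latest .
# download, then convert
docker run -v $PWD/../${PROVIDER_SLUG}-source-data:/source-data:rw ${PROVIDER_SLUG}-fetcher:latest python download.py /source-data --log info
docker run -v $PWD/../${PROVIDER_SLUG}-source-data:/source-data:ro -v $PWD/../${PROVIDER_SLUG}-json-data:/json-data:rw ${PROVIDER_SLUG}-fetcher:latest python convert.py /source-data /json-data --log info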
Validating converted data
We take ecb as an example.
ssh cepremap@eros.nomics.world
PROVIDER_SLUG=ecb
source ~/virtualenvs/dbnomics/bin/activate
# optional: update dbnomics-data-model
pip install -U dbnomics-data-model
dbnomics-validate ~/json-data/${PROVIDER_SLUG}-json-data/
# No error should be displayed. Use --log=debug to see more details.
Importing converted data into Apache Solr
We take ecb as an example.
ssh cepremap@eros.nomics.world
PROVIDER_SLUG=ecb
source ~/virtualenvs/dbnomics/bin/activate
# optional: update dbnomics-data-model
pip install -U dbnomics-data-model
# optional: update dbnomics-importer
cd ~/dbnomics-importer
git pull
pip install -r requirements.txt
cd ..
~/dbnomics-importer/import_storage_dir.py ~/json-data/${PROVIDER_SLUG}-json-data
# (the API will look for data in ~/json-data; a symlink should exist)
# If the indexation raises an error like:
# "msg":"ERROR: [doc=ELSTAT/DKT15/DKT15-2-1_0020_F_A] unknown field 'dimensions_values_labels'",
# then you must reset the Solr core (see section below).
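# If the symlink is missing, you can create it. A sketch, assuming the JSON
# data lives in the fetcher environment directory:
ln -s ~/fetchers-envs/${PROVIDER_SLUG}/${PROVIDER_SLUG}-json-data ~/json-data/${PROVIDER_SLUG}-json-data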
Now you may verify that ecb is visible in the API and the UI:
- http://pre.db.nomics.world/providers
- http://api.pre.db.nomics.world/v22/providers
- http://api.pre.db.nomics.world/v22/providers/ECB
- http://pre.db.nomics.world/ECB/MIR (check that dimensions search and charts are OK)
If an internal error is returned by the Web API, follow the "Errors handling" section below.
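You can also check from the command line; a quick sketch using curl, with optional pretty-printing:

curl -s http://api.pre.db.nomics.world/v22/providers/ECB | python3 -m json.tool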
Update the Web API and UI
You may first want an up-to-date Web API and UI. Follow the documentation of the respective projects.
Error handling
If the Web API returns an internal error, you can check the server logs.
As root:
tail -f /var/log/uwsgi/app/dbnomics-api-uwsgi-v21.log
See also: troubleshooting
Test Solr queries locally
You can forward the port used by Solr to run queries with HTTP requests.
From your machine:
ssh -N -L 8983:localhost:8983 cepremap@eros.nomics.world
Then open URLs like:
- http://localhost:8983/solr/dbnomics/query?q=*
- http://localhost:8983/solr/dbnomics/query?q=provider_code:ECB
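The same queries work from the command line with curl, for example (rows limits the number of returned documents):

curl -s 'http://localhost:8983/solr/dbnomics/query?q=provider_code:ECB&rows=5'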
Reset the Solr core
Only do this if the converted data follow a data model more recent than the Solr core schema, for example when the import script adds a new field to the Solr schema.
Warning: this deletes absolutely everything in Solr about DBnomics! Be sure to check that you run this on the pre-production server (eros).
As solr user:
./bin/solr delete -c dbnomics
./bin/solr create -c dbnomics
./bin/solr config -c dbnomics -p 8983 -property update.autoCreateFields -value false
rm /var/solr/data/dbnomics/conf/managed-schema
cp ~cepremap/dbnomics-importer/solr_core_config/* /var/solr/data/dbnomics/conf/
As root:
systemctl restart solr.service
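The core is now empty, so re-import the converted data (see "Importing converted data into Apache Solr" above). As cepremap:

source ~/virtualenvs/dbnomics/bin/activate
~/dbnomics-importer/import_storage_dir.py ~/json-data/${PROVIDER_SLUG}-json-data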