test fetcher on pre prod

Last edited by Pierre Dittgen Dec 05, 2019

Test a fetcher on the pre-production server

This documentation applies to the pre-production server. More info on the servers page.

See also: https://git.nomics.world/dbnomics/fetchers-envs

Super quick guide

For a quick'n'clean guide, follow /home/cepremap/fetchers-envs/README.md

In a nutshell:

  • Clone, download and convert
    • export PROVIDER_SLUG=xxxx
    • cd ~cepremap/fetchers-envs
    • ./create-fetcher-env.sh $PROVIDER_SLUG
    • cd $PROVIDER_SLUG/$PROVIDER_SLUG-fetcher
    • ../../download.py
    • ../../convert.py
  • Validation & Solr import
    • source ~/virtualenvs/dbnomics/bin/activate
    • dbnomics-validate ~/json-data/${PROVIDER_SLUG}-json-data/
    • ~/dbnomics-importer/import_storage_dir.py ~/json-data/${PROVIDER_SLUG}-json-data
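The commands above rely on a naming convention built around the provider slug. A minimal illustration (nothing is created; the paths come from this page):

```shell
# Print the paths that the quick guide expects for a given provider slug.
PROVIDER_SLUG=ecb

echo "fetcher env:  ~cepremap/fetchers-envs/${PROVIDER_SLUG}/${PROVIDER_SLUG}-fetcher"
echo "source data:  ~cepremap/fetchers-envs/${PROVIDER_SLUG}/${PROVIDER_SLUG}-source-data"
echo "json data:    ~/json-data/${PROVIDER_SLUG}-json-data/"
```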

Create a new fetcher environment

As a prerequisite, the repositories for the fetcher source code, source-data and json-data must already have been created. If necessary, use the script create-repositories-for-provider.py.

The create-fetcher-env.sh script can be found in this repo.

We take ecb as an example.

ssh cepremap@eros.nomics.world

cd ~/fetchers-envs
./create-fetcher-env.sh ecb  # Replace ecb with the real provider slug.

Running download or convert

We take ecb as an example.

ssh cepremap@eros.nomics.world

PROVIDER_SLUG=ecb

cd ~/fetchers-envs/${PROVIDER_SLUG}
source ${PROVIDER_SLUG}-venv/bin/activate

cd ${PROVIDER_SLUG}-fetcher

"Manual" method

rm -rf ../${PROVIDER_SLUG}-source-data/*; python download.py ../${PROVIDER_SLUG}-source-data/
# or
rm -rf ../${PROVIDER_SLUG}-json-data/*; python convert.py ../${PROVIDER_SLUG}-source-data/ ../${PROVIDER_SLUG}-json-data/

"Assisted" method - using bduye scripts

You can also use the bduye scripts, which automate:

  • cleaning the source or JSON directory (depending on whether the script is download or convert)
  • calling the script with the correct directories, plus any custom arguments you pass to it

Example:

~/fetchers-envs/eurostat$ ../../convert.py --datasets teicp290 --full

is equivalent to:

~/fetchers-envs/eurostat$ ./convert.py ../eurostat-source-data ../eurostat-json-data --datasets teicp290 --full
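A minimal sketch of the convention the wrapper scripts appear to rely on, assuming the provider slug is simply the name of the current fetcher-env directory (an assumption based on the example above; done in a throwaway directory, nothing real is touched):

```shell
# Derive the data directories from the current directory name,
# as the wrapper does in the eurostat example.
demo="$(mktemp -d)/eurostat"   # stand-in for ~/fetchers-envs/eurostat
mkdir -p "$demo"
cd "$demo"

PROVIDER_SLUG="$(basename "$PWD")"
CMD="python convert.py ../${PROVIDER_SLUG}-source-data ../${PROVIDER_SLUG}-json-data"
echo "$CMD"
# → python convert.py ../eurostat-source-data ../eurostat-json-data
```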

"Docker" method

The Python 3 version on eros is 3.5.3. To take advantage of Python 3.7:

  • create a file named Dockerfile in the ${PROVIDER_SLUG}-fetcher directory with this content:
# From https://pythonspeed.com/articles/activate-virtualenv-dockerfile/

FROM python:3.7

ENV VIRTUAL_ENV=/venv
RUN pip install virtualenv && virtualenv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt

# Run the application:
COPY *.py ./
CMD ["python"]
  • build the Docker image (do it each time your fetcher code changes)
docker build --tag ${PROVIDER_SLUG}-fetcher:latest .
  • Launch your download/convert job:
docker run -v $PWD/../${PROVIDER_SLUG}-source-data:/source-data:rw ${PROVIDER_SLUG}-fetcher:latest python download.py /source-data --log info

docker run -v $PWD/../${PROVIDER_SLUG}-source-data:/source-data:ro -v $PWD/../${PROVIDER_SLUG}-json-data:/json-data:rw ${PROVIDER_SLUG}-fetcher:latest python convert.py /source-data /json-data --log info

Validating converted data

We take ecb as an example.

ssh cepremap@eros.nomics.world

PROVIDER_SLUG=ecb

source ~/virtualenvs/dbnomics/bin/activate

# optional: update dbnomics-data-model
pip install -U dbnomics-data-model

dbnomics-validate ~/json-data/${PROVIDER_SLUG}-json-data/
# No error should be displayed. Use --log=debug to see more details.

Importing converted data into Apache Solr

We take ecb as an example.

ssh cepremap@eros.nomics.world

PROVIDER_SLUG=ecb

source ~/virtualenvs/dbnomics/bin/activate

# optional: update dbnomics-data-model
pip install -U dbnomics-data-model

# optional: dbnomics-importer
cd ~/dbnomics-importer
git pull
pip install -r requirements.txt
cd ..

~/dbnomics-importer/import_storage_dir.py ~/json-data/${PROVIDER_SLUG}-json-data
# (the API looks for data in ~/json-data; a symlink should exist)
# If the indexation raises an error like:
#    "msg":"ERROR: [doc=ELSTAT/DKT15/DKT15-2-1_0020_F_A] unknown field 'dimensions_values_labels'",
# then you must reset the Solr core (see section below).
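If the symlink mentioned above is missing, it can be created along these lines (a sketch run against a throwaway directory standing in for the real home; the paths are assumptions based on the layout described earlier on this page):

```shell
# Make ~/json-data/<slug>-json-data point at the fetcher env's output directory.
FAKE_HOME="$(mktemp -d)"   # stand-in for the real $HOME on eros
PROVIDER_SLUG=ecb
SRC="$FAKE_HOME/fetchers-envs/${PROVIDER_SLUG}/${PROVIDER_SLUG}-json-data"
DST="$FAKE_HOME/json-data/${PROVIDER_SLUG}-json-data"

mkdir -p "$FAKE_HOME/json-data"
[ -L "$DST" ] || ln -s "$SRC" "$DST"   # create the symlink if absent
```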

Now you may verify that ecb is visible in the API and the UI:

  • http://pre.db.nomics.world/providers
  • http://api.pre.db.nomics.world/v22/providers
  • http://api.pre.db.nomics.world/v22/providers/ECB
  • http://pre.db.nomics.world/ECB/MIR (check that dimensions search and charts are OK)

If an internal error is returned by the Web API, follow the "Error handling" section below.

Update the Web API and UI

You may first want to have an up-to-date Web API and UI. Follow the documentation of the respective projects.

Error handling

If the Web API returns an internal error, you can check the server logs.

As root:

tail -f /var/log/uwsgi/app/dbnomics-api-uwsgi-v21.log

See also: troubleshooting

Test Solr queries locally

You can forward the port used by Solr to run queries with HTTP requests.

From your machine:

ssh -N -L 8983:localhost:8983 cepremap@eros.nomics.world

Then open URLs like:

  • http://localhost:8983/solr/dbnomics/query?q=*
  • http://localhost:8983/solr/dbnomics/query?q=provider_code:ECB

Reset the Solr core

Only do this if the converted data follows a data model newer than the Solr core schema, for example when the import script adds a new field to the Solr schema.

Warning: this deletes absolutely everything in Solr about DBnomics! Double-check that you are running this on the pre-production server (eros).

As solr user:

./bin/solr delete -c dbnomics
./bin/solr create -c dbnomics
./bin/solr config -c dbnomics -p 8983 -property update.autoCreateFields -value false
rm /var/solr/data/dbnomics/conf/managed-schema
cp ~cepremap/dbnomics-importer/solr_core_config/* /var/solr/data/dbnomics/conf/

As root:

systemctl restart solr.service