Try runner autoscaling with Kubernetes
This issue is part of #666
Goal
- General goal: reduce data availability delay.
- Specific goal of this issue: reduce the time a job spends waiting in the queue by using the GitLab Runner autoscaling feature via the Kubernetes executor (cf. docs)
Context
- previously, we tried autoscaling with the docker-machine executor and the Scaleway driver
- mostly because the Docker driver had issues, and the fact that it is not officially supported by Docker does not help
- my intuition tells me to try Kubernetes first because it is more widely adopted than docker-machine
- Scaleway has a commercial Kubernetes offering named Kapsule, and we're going to start with it
Tasks
- create a Kubernetes cluster on Scaleway
- follow the GitLab Runner on Kubernetes docs
- set up cluster scale-up
- set up cluster scale-down
- set up CI pipeline cache via S3 distributed cache (see the values sketch after this list)
- refactor the GitLab CI pipeline into a single pipeline with download, convert and index jobs (related to #523 (closed) and #557)
- consider adapting or removing the current dashboard
- add Prometheus and collect GitLab Runner metrics (enabled by default with the GitLab Runner chart in `values.yaml`)
- set up a dashboard presenting data as a timeline (Grafana?)
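As a starting point for the cache and metrics tasks above, here is a minimal sketch of GitLab Runner Helm chart values, assuming a recent `gitlab/gitlab-runner` chart where `runners.config` embeds a `config.toml` fragment; the Scaleway endpoint, bucket name, namespace and token are placeholders, and key names may differ between chart versions.

```yaml
# values.yaml sketch for the gitlab/gitlab-runner Helm chart
# (placeholders throughout; chart key names may vary by version)
gitlabUrl: https://git.nomics.world/
runnerRegistrationToken: "REPLACE_ME"   # placeholder registration token

concurrent: 10        # max number of jobs the runner manager runs in parallel

metrics:
  enabled: true       # expose Prometheus metrics for the Grafana dashboard

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "gitlab-runner"     # placeholder namespace
      [runners.cache]
        Type = "s3"
        Shared = true
        [runners.cache.s3]
          ServerAddress = "s3.fr-par.scw.cloud"  # placeholder: Scaleway S3 endpoint
          BucketName = "dbnomics-ci-cache"       # placeholder bucket
          BucketLocation = "fr-par"              # placeholder region
```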
New CI pipeline
- cf https://git.nomics.world/dbnomics/dbnomics-fetcher-pipeline
- write the pipeline with 3 stages: download, convert, index (see the sketch after this list)
- add a validate stage between convert and index, but keep in mind that warnings will need to be introduced
- test with one big fetcher
- test with many fetchers at the same time
- remove the test triggers (afdb, ecb)
- do not use the `dev` branch in `wget ... git-pull-or-clone.py`
- remove the `new-pipeline` branch name in the Index job
- add "git push" info to the dashboard (cf. this JSON)
- add support for the `errors.json` artifact
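A minimal sketch of the unified `.gitlab-ci.yml`, assuming one job per stage; the image, script entry points and artifact paths are placeholders, not the actual content of dbnomics-fetcher-pipeline.

```yaml
# .gitlab-ci.yml sketch: one pipeline with download, convert, index
# (image, scripts and artifact paths are placeholders)
stages:
  - download
  - convert
  - index

download:
  stage: download
  image: python:3.9          # placeholder image
  script:
    - python download.py     # placeholder: fetcher download entry point
  artifacts:
    paths:
      - source-data/         # placeholder: raw data handed to convert

convert:
  stage: convert
  image: python:3.9
  script:
    - python convert.py      # placeholder: fetcher convert entry point
  artifacts:
    paths:
      - json-data/           # placeholder: converted data handed to index

index:
  stage: index
  image: python:3.9
  script:
    - python index.py        # placeholder: indexation step
```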
Migrating fetchers to k8s
- update `fetchers.yml` to remove the `legacy_pipeline` flag (a hypothetical sketch follows)
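A hypothetical sketch of that change; the issue only confirms the `legacy_pipeline` flag itself, so the surrounding structure of `fetchers.yml` (entry layout, key names) is an assumption.

```yaml
# fetchers.yml sketch (hypothetical structure; only legacy_pipeline is
# confirmed by this issue)

# before: provider still runs the legacy pipeline
- slug: some-provider      # hypothetical key and value
  legacy_pipeline: true

# after: drop the flag so the provider uses the new k8s pipeline
- slug: some-provider
```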
Then use `switch-provider-to-k8s-pipeline.py`:

```
python switch-provider-to-k8s-pipeline.py -v --dry-run PROVIDER_SLUG
# if everything seems OK
python switch-provider-to-k8s-pipeline.py -v PROVIDER_SLUG
```
About k8s resource requests and limits:
- start with the default settings provided by `fetcher-gitlab-ci.yml`
- look at the metrics in Grafana
- adjust memory and CPU requests and limits based on what is really used (see the sketch after this list)
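Once Grafana shows what a job actually uses, requests and limits can be tuned per job through the Kubernetes executor overwrite variables, provided the runner config allows the overwrites; a sketch with placeholder values and a placeholder job name:

```yaml
# .gitlab-ci.yml sketch: per-job resource tuning via the Kubernetes executor
# overwrite variables (values are placeholders; the runner must allow them
# through the *_overwrite_max_allowed settings in its config.toml)
convert:
  stage: convert
  variables:
    KUBERNETES_CPU_REQUEST: "500m"
    KUBERNETES_CPU_LIMIT: "1"
    KUBERNETES_MEMORY_REQUEST: "512Mi"
    KUBERNETES_MEMORY_LIMIT: "2Gi"
  script:
    - python convert.py      # placeholder entry point
```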
Rollback?
If needed, it's possible to go back to the old pipeline by reverting the following steps and running the `configure-ci-for-provider.py` script:
- revert the commit about `.gitlab-ci.yml` in the fetcher source code
- update `fetchers.yml` to add the `legacy_pipeline: true` flag