This page lists some useful tools to know for development.
Since Git 2.13, it is possible to define config files per directory, using conditional includes.

Example: the file ~/dev/dbnomics/gitconfig defines the user email for all DBnomics repositories. A sketch of the configuration (reconstructed here using Git's conditional-include syntax):

In ~/.gitconfig:

```
[user]
    useConfigOnly = true

[includeIf "gitdir:~/dev/"]
    path = ~/dev/gitconfig

[includeIf "gitdir:~/dev/dbnomics/"]
    path = ~/dev/dbnomics/gitconfig
```

In ~/dev/dbnomics/gitconfig:

```
[user]
    email = firstname.lastname@example.org
```

With `useConfigOnly = true`, Git refuses to guess your identity and only commits when `user.name` and `user.email` are explicitly configured, so the per-directory email always applies. You can check which value is in effect with `git config --show-origin user.email`.
After a merge request is approved and the branch merged, the branch is deleted on the server. On the developer's machine, there is no point in keeping the corresponding stale references around.

The following setting makes `git fetch` (and `git pull`) delete remote-tracking branches whose remote branches were deleted on the server:

```
[fetch]
    prune = true
```

(Local branches are not deleted automatically; remove them yourself with `git branch -d <branch>` once they are merged.)
To use virtualenvs with the Visual Studio Code editor:
from your terminal, while the virtualenv is activated, identify the path of the Python executable:

```
which python
# Should print something like /home/username/.local/share/virtualenvs/virtualenvname/bin/python
```
in Visual Studio Code, edit your workspace settings and paste this block, adapting the path to your actual value:
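The settings block itself is missing from this page; here is a sketch of what it likely contained, assuming the Microsoft Python extension (older versions of the extension used `python.pythonPath`; newer ones use `python.defaultInterpreterPath`). The path is the example value from above — adapt it:

```json
{
    "python.pythonPath": "/home/username/.local/share/virtualenvs/virtualenvname/bin/python"
}
```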
We use autopep8 to reformat Python code automatically and avoid bikeshedding.

It works very well with the Visual Studio Code editor, and is enabled by default as soon as autopep8 is installed in your venv:

```
pip install autopep8
```
First you must configure Visual Studio Code to work with virtualenvs (see above).
To define a limit of 120 characters per line, edit your user settings, and paste this block:
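The block is missing from this page; a sketch of what it likely contained, assuming the Microsoft Python extension's autopep8 integration:

```json
{
    "python.formatting.autopep8Args": ["--max-line-length", "120"]
}
```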
We use pylint to check code quality directly from our source code editor.
It works very well with the Visual Studio Code editor.
From your virtualenv in the shell:

```
pip install pylint
```
By default you'll be annoyed by the huge amount of warnings, so follow the instructions on the Code Style page.
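As an illustration only (the actual list of checks to disable is defined on the Code Style page), warnings can be silenced project-wide with a `.pylintrc` file; the disabled check below is a hypothetical example:

```
[MESSAGES CONTROL]
disable = missing-docstring
```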
To sort imports automatically and separate them in 3 sections (Python, third-party, local), use isort.
Install in your virtualenv:
pip install isort
To sort imports in Visual Studio Code, press F1 and execute "Sort imports" action.
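For illustration, here is what isort's three-section layout looks like; the third-party and local imports are hypothetical and commented out so the snippet stays self-contained:

```python
# Section 1: Python standard library
import os
import sys

# Section 2: third-party packages (hypothetical example)
# import requests

# Section 3: local packages (hypothetical example)
# from my_project import utils

# Sections are separated by blank lines, and imports are sorted
# alphabetically within each section.
```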
View big CSV files in the terminal.

Powerful operations on CSV files, made quickly.
Useful function: sample. To test a script parsing a HUGE CSV source file, instead of testing it on the first n lines of the file, we can generate a sample of the file representing a "rich" mix of the values found in it:
```
# generates a 10,000-line random sample of services_annual_dataset.csv
xsv sample 10000 services_annual_dataset.csv > services_annual_dataset-sample.csv
```
Example: collect (and count) all the Flag values of a file:

```
xsv select Flag merchandise_indices_annual_dataset.csv | xsv frequency
```
When git repositories are big, you may want to avoid cloning them on your development machine. Sometimes it is more convenient to access them remotely from the server.
Use sshfs as described here.
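A minimal sketch of the sshfs workflow (the host name and paths are hypothetical; adapt them to your server):

```shell
# Mount the remote repositories directory locally over SSH
mkdir -p ~/mnt/repos
sshfs user@server.example.org:/srv/git ~/mnt/repos

# ... browse and edit the files as if they were local ...

# Un-mount when done
fusermount -u ~/mnt/repos
```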
When a directory grows in size or number of files, disk I/O becomes slower, even with SSD disks.
A quick and dirty solution during development is to use tmpfs like so (paths are simplified):
```
# mount an 8 GB RAM-backed filesystem on the output directory
sudo mount -t tmpfs -o size=8g tmpfs wto-json-data-tmpfs

# run the conversion script against it
./wto_to_dbnomics.py wto-source-data wto-json-data-tmpfs

# un-mount when done
sudo umount wto-json-data-tmpfs
```
Warning: after un-mounting, the data is lost! Once your script is OK, run it against the real wto-json-data directory.