This page lists some useful tools for development.
Git
Relative config files
Since Git 2.13, a config file can be included conditionally, depending on the directory in which a repository is located.
Example: define the user email for all DBnomics repositories located under ~/dev/dbnomics:
- in ~/.gitconfig:
  [includeIf "gitdir:~/dev/dbnomics/"]
  path = ~/dev/dbnomics/gitconfig
- in ~/dev/dbnomics/gitconfig:
  [user]
  email = jc.dus@cepremap.org
Multiple config files (one in the sub-tree of the other)
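Example, in ~/.gitconfig (when both conditions match, values from the file included last take precedence, so list the most specific directory last):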
[includeIf "gitdir:~/dev/"]
path = ~/dev/gitconfig
[includeIf "gitdir:~/dev/dbnomics/"]
path = ~/dev/dbnomics/gitconfig
Forcing git to use a manually configured email address
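With this setting, Git no longer guesses the user name and email from the system user and host names; committing fails until they are configured explicitly.
In ~/.gitconfig: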
[user]
useConfigOnly = true
Delete local image of remote branches when fetching
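After a merge request is approved and the branch merged, the branch is deleted on the server. On the developer's machine, the remote-tracking branches (for example origin/my-feature) that still point to it only add clutter.
This setting makes git delete those remote-tracking branches during fetch when the corresponding branches were deleted on the server. Local branches that were created from them are not removed automatically.
In ~/.gitconfig: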
[fetch]
prune = true
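To find the local branches left behind once their upstream has been pruned, a small helper along these lines may help. It is a minimal sketch, assuming git is available on the PATH; the names gone_branches and list_gone_branches are hypothetical, and the parsing relies on git for-each-ref reporting [gone] through the %(upstream:track) field.

```python
import subprocess

# Format understood by `git for-each-ref`: branch name, a space, then the
# tracking status ("[gone]" when the upstream ref no longer exists).
REF_FORMAT = "%(refname:short) %(upstream:track)"


def gone_branches(for_each_ref_output):
    """Return the local branches flagged as [gone] in for-each-ref output."""
    branches = []
    for line in for_each_ref_output.splitlines():
        name, _, track = line.partition(" ")
        if track.strip() == "[gone]":
            branches.append(name)
    return branches


def list_gone_branches(repo_dir="."):
    """Hypothetical helper: run git in repo_dir and return the stale local branches."""
    result = subprocess.run(
        ["git", "for-each-ref", "--format=" + REF_FORMAT, "refs/heads"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return gone_branches(result.stdout)
```

Each branch returned this way can be reviewed and then deleted with git branch -d.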
Good commit messages
https://chris.beams.io/posts/git-commit/
Python
Useful dev tools
Virtualenvs
- We recommend using virtualenvwrapper to ease shell integration
- To automatically activate venvs: https://github.com/kennethreitz/autoenv
To use virtualenvs with the Visual Studio Code editor:
- From your terminal, while the virtualenv is activated, identify the path of the python executable:
  which python # Should answer something like /home/username/.local/share/virtualenvs/virtualenvname/bin/python
- In Visual Studio Code, edit your workspace settings and paste this block, replacing the path with your actual value:
  {
  "python.pythonPath": "/home/username/.local/share/virtualenvs/virtualenvname/bin/python"
  }
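The interpreter path can also be checked from within Python itself, for instance from the editor's integrated terminal. This is a minimal sketch using only the standard library; the function name in_virtualenv is hypothetical, and the sys.prefix comparison assumes an environment created with venv or a recent virtualenv (version 20 or later).

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points to the environment while
    # sys.base_prefix still points to the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print(sys.executable)   # path to paste into the editor settings
    print(in_virtualenv())
```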
autopep8
We use autopep8 to reformat Python code automatically, and avoid bikeshedding.
It works very well with the Visual Studio Code editor and is enabled by default as soon as you run pip install autopep8 in your venv.
First you must configure Visual Studio Code to work with virtualenvs (see above).
To define a limit of 120 characters per line, edit your user settings, and paste this block:
"python.formatting.formatOnSave": true,
"python.formatting.autopep8Args": [
"--max-line-length=120"
],
pylint
We use pylint to check code quality directly from our source code editor.
It works very well with the Visual Studio Code editor.
First you must configure Visual Studio Code to work with virtualenvs (see above).
From your virtualenv in the shell, type pip install pylint.
By default you'll be annoyed by the large number of warnings, so follow the instructions on the Code Style page.
isort
To sort imports automatically and separate them into 3 sections (standard library, third-party, local), use isort.
Install in your virtualenv:
pip install isort
To sort imports in Visual Studio Code, press F1 and execute the "Sort imports" action.
CSV tools
Tabview
View big CSV files in terminal.
https://github.com/TabViewer/tabview
Xsv
Powerful operations on CSV files, performed quickly.
https://github.com/BurntSushi/xsv
Sample
A useful command is sample. To test a script that parses a HUGE CSV source file, instead of running it on the first n lines of the file, generate a random sample of the file, which gives a "rich" mix of the values found in it:
xsv sample 10000 services_annual_dataset.csv > services_annual_dataset-sample.csv
This generates a 10000-line sample of the services_annual_dataset.csv file.
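To illustrate what such a sample is, here is a minimal Python sketch of reservoir sampling, which picks rows uniformly at random in a single pass while holding only the sample in memory. It is an illustration, not xsv's actual implementation; the function names sample_rows and sample_csv are hypothetical, and the first line of the file is assumed to be the header.

```python
import csv
import random


def sample_rows(rows, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for index, row in enumerate(rows):
        if index < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a kept row with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = row
    return reservoir


def sample_csv(source, target, k, rng=None):
    """Copy the header and k randomly chosen rows from source to target (file objects)."""
    reader = csv.reader(source)
    writer = csv.writer(target)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to sample
    writer.writerow(header)
    writer.writerows(sample_rows(reader, k, rng))
```

When calling sample_csv on real files, open both with newline="" as recommended by the csv module. The sampled rows are not guaranteed to keep their original order.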
Select & frequency
Example: collect and count all the values of the Flag column of a file:
> xsv select Flag merchandise_indices_annual_dataset.csv | xsv frequency
field,value,count
Flag,(NULL),26860
Flag,E,9924
Flag,B,8
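For scripts that need the same counts programmatically, a rough Python equivalent can be sketched with collections.Counter. This is only a sketch, not a reimplementation of xsv: the function name column_frequency is hypothetical, and empty cells are reported as (NULL) to mirror the output above.

```python
import csv
from collections import Counter


def column_frequency(source, column):
    """Count the values of one column of a CSV file object, most frequent first."""
    reader = csv.DictReader(source)
    # Empty cells are labelled "(NULL)" so they show up in the counts.
    counts = Counter(row[column] or "(NULL)" for row in reader)
    return counts.most_common()
```

The function returns a list of (value, count) pairs, which can be printed or written back to CSV as needed.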
sshfs - mount remote directory from server
When git repositories are big, you may want to avoid cloning them on your development machine. Sometimes it is more convenient to access them remotely from the server.
Use sshfs as described here.
tmpfs - mount directory in RAM
When a directory grows in size or in number of files, disk I/O becomes slower, even with SSDs.
A quick and dirty solution during development is to use tmpfs like so (paths are simplified):
cd json-data
sudo mount -t tmpfs -o size=8g tmpfs wto-json-data-tmpfs
cd wto-fetcher
./wto_to_dbnomics.py wto-source-data wto-json-data-tmpfs
cd json-data
sudo umount wto-json-data-tmpfs
Warning: after unmounting, the data is lost! Once your script works, run it against the real wto-json-data directory.