Validation script fails when series has no observations (or other cases)
I encountered these errors while developing a fetcher that converts https://www.ipp.eu/baremes-ipp/ to json-data, starting by producing sample files manually.
They are probably easy to fix, but let's take the opportunity to add non-regression tests.
When the TSV file is empty:
Traceback (most recent call last):
  File "/home/cbenz/.local/share/virtualenvs/dbnomics-data-model/bin/dbnomics-validate", line 11, in <module>
    load_entry_point('dbnomics-data-model', 'console_scripts', 'dbnomics-validate')()
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 135, in main
    max_series=args.max_series, max_observations=args.max_observations)
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 711, in validate_series
    ignore_errors=ignore_errors)
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 418, in validate_observations
    header = observations[0]
IndexError: list index out of range
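The `IndexError` comes from `header = observations[0]` being evaluated on an empty list. Below is a minimal sketch of a guard that could double as a non-regression test. It assumes the validator collects errors as a list of dicts. The function name `validate_observations_sketch` and the error codes are hypothetical and not part of `dbnomics-data-model`.

```python
from typing import Dict, List, Sequence


def validate_observations_sketch(
    observations: Sequence[Sequence[str]],
) -> List[Dict[str, str]]:
    """Return validation errors instead of raising when there are no rows.

    Hypothetical helper: the name and the error format are assumptions.
    """
    errors: List[Dict[str, str]] = []
    if not observations:
        # An empty TSV file yields no rows at all, not even a header.
        errors.append({
            "error_code": "empty-observations",
            "message": "Observations file has no header row",
        })
        return errors
    header = observations[0]  # Safe: at least one row exists.
    if not header:
        errors.append({
            "error_code": "empty-header",
            "message": "Header row has no columns",
        })
    return errors


# Non-regression checks: the empty case must report an error, not crash.
assert validate_observations_sketch([])[0]["error_code"] == "empty-observations"
assert validate_observations_sketch([["PERIOD", "VALUE"], ["2019", "1.5"]]) == []
```

Returning an error entry rather than raising keeps the script usable on partially built sample files, which is the situation described above.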
When the TSV file does not exist:
Traceback (most recent call last):
  File "/home/cbenz/.local/share/virtualenvs/dbnomics-data-model/bin/dbnomics-validate", line 11, in <module>
    load_entry_point('dbnomics-data-model', 'console_scripts', 'dbnomics-validate')()
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 135, in main
    max_series=args.max_series, max_observations=args.max_observations)
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 704, in validate_series
    for _, series_code, observations in observations_iterator:
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/storages/abstract.py", line 150, in iter_observations
    yield from self.iter_observations_from_jsonl(series_codes, offset_by_series_code)
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/storages/filesystem.py", line 130, in iter_observations_from_jsonl
    series_jsonl_file_path = self.path / self.get_series_jsonl_file_name()
  File "/usr/lib64/python3.7/pathlib.py", line 908, in __truediv__
    return self._make_child((key,))
  File "/usr/lib64/python3.7/pathlib.py", line 695, in _make_child
    drv, root, parts = self._parse_args(args)
  File "/usr/lib64/python3.7/pathlib.py", line 649, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
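The `TypeError` shows that `self.get_series_jsonl_file_name()` returned `None`, which was then joined to `self.path` with the `/` operator. Here is a minimal sketch of a guard that fails with an explicit message. The exception `SeriesFileNotFound` and the function `resolve_series_jsonl_path` are hypothetical names used for illustration only.

```python
from pathlib import Path
from typing import Optional


class SeriesFileNotFound(Exception):
    """Hypothetical exception: a dataset directory has no series data file."""


def resolve_series_jsonl_path(dataset_dir: Path, file_name: Optional[str]) -> Path:
    """Join the directory and file name, failing clearly when the name is None."""
    if file_name is None:
        raise SeriesFileNotFound(
            "No TSV file and no series JSON Lines file found in {}".format(dataset_dir)
        )
    return dataset_dir / file_name


# Non-regression checks for both the normal and the missing-file case.
assert resolve_series_jsonl_path(Path("ds"), "series.jsonl") == Path("ds") / "series.jsonl"
try:
    resolve_series_jsonl_path(Path("ds"), None)
except SeriesFileNotFound as exc:
    assert "ds" in str(exc)
else:
    raise AssertionError("expected SeriesFileNotFound")
```

The validation script could catch such an exception and report it as a validation error for the dataset instead of stopping with a traceback.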
When the category tree JSON top-level element is an Object instead of an Array:
Traceback (most recent call last):
  File "/home/cbenz/.local/share/virtualenvs/dbnomics-data-model/bin/dbnomics-validate", line 11, in <module>
    load_entry_point('dbnomics-data-model', 'console_scripts', 'dbnomics-validate')()
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 115, in main
    category_tree_errors = validate_category_tree(storage, ignore_errors=args.ignore_errors)
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 257, in validate_category_tree
    datasets_codes_in_category_tree = set(list(category_tree_dataset_code_iter(category_tree_json)))
  File "/home/cbenz/Dev/dbnomics/dbnomics-data-model/dbnomics_data_model/validate_storage.py", line 231, in category_tree_dataset_code_iter
    elif elt.get('code'):
AttributeError: 'str' object has no attribute 'get'
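Iterating over a Python dict yields its keys, so when the top-level element is an Object each `elt` is a `str` and `elt.get('code')` fails. The sketch below checks the top-level type before walking the tree. It assumes that nodes are dicts with a `code` key and an optional `children` key; both function names are hypothetical.

```python
from typing import Any, Iterator, List


def check_category_tree_root(category_tree_json: Any) -> List[str]:
    """Return error messages if the top-level element is not an array."""
    if not isinstance(category_tree_json, list):
        return [
            "Category tree top-level element must be an array, got {}".format(
                type(category_tree_json).__name__
            )
        ]
    return []


def iter_dataset_codes(nodes: List[Any]) -> Iterator[str]:
    """Yield the codes of leaf nodes, skipping anything that is not a dict.

    Assumption: a node with non-empty "children" is a category, and a node
    without children but with a "code" is a dataset.
    """
    for node in nodes:
        if not isinstance(node, dict):
            continue
        children = node.get("children")
        if children:
            yield from iter_dataset_codes(children)
        elif node.get("code"):
            yield node["code"]


# Why the original code failed: iterating a dict yields its string keys.
assert list({"code": "c1"}) == ["code"]

# Non-regression checks for the Object and Array cases.
assert check_category_tree_root({"code": "c1"}) != []
tree = [{"code": "c1", "children": [{"code": "d1"}]}, {"code": "d2"}]
assert check_category_tree_root(tree) == []
assert list(iter_dataset_codes(tree)) == ["d1", "d2"]
```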
Edited by Christophe Benz