@@ -170,4 +170,40 @@ Work Experience of the Population (Annual)

## Data samples

## Useful parts in existing code
* **location**: https://git.nomics.world/dbnomics/dlstats/blob/master/dlstats/fetchers/bls.py

### Datasets download
1. parsing of HTML page
   https://git.nomics.world/dbnomics/dlstats/blob/master/dlstats/fetchers/bls.py
   - function parse_bls_site(), line 115
2. building category tree
   - function build_data_tree(), line 200
   - function make_node(), line 205, must be rewritten for the new
     infrastructure. Add logic for excluded datasets
3. add logic for downloading all datasets in category trees
4. Download all files for a desired dataset
   - get directory content with function get_data_directory(), line 545
   - get the \*.series file with the list of series and dimensions with
     functions get_series_filepath(), line 694, and
     get_series_fields(), line 705
   - using the header of the \*.series file, get dimension names with
     get_dimensions_keys(), line 576
   - download dimension files with get_code_list(), line 582, which
     calls get_dimension_data(), line 652. This
     function handles all peculiarities in the naming of the
     dimensions and the format of the dictionary files:
     - fmt = 1, two columns, code - label
     - fmt = 2, three columns, ignore first column - code - label
     - fmt = 3, double classification, code - secondary code - label
     - fmt = 4, double classification, code1 - code2 - ignore - label
   - data file names are obtained with get_data_filenames(), line 683.
   - data files are handled in iterator SeriesIterator, line 283.
   - data files are downloaded at line 293
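
The four dictionary file formats listed under step 4 can be illustrated with a small parser. This is only a sketch and not the code in `bls.py`: the function name `parse_code_list`, the `LAYOUTS` table, the tab delimiter, and the single header row are all assumptions made for illustration. Formats 1 and 2 are keyed by a single code, while the double classifications (formats 3 and 4) are keyed by a `(code1, code2)` tuple.

```python
# Hypothetical sketch: column layout for each dictionary file format.
# Each entry is (indexes of the key columns, index of the label column).
LAYOUTS = {
    1: ((0,), 1),    # code, label
    2: ((1,), 2),    # ignored column, code, label
    3: ((0, 1), 2),  # code, secondary code, label
    4: ((0, 1), 3),  # code1, code2, ignored column, label
}


def parse_code_list(text, fmt, skip_header=True):
    """Map codes to labels for one dictionary file.

    Assumes tab-separated columns and, by default, one header row;
    both are assumptions, not facts taken from the notes.
    """
    key_cols, label_col = LAYOUTS[fmt]
    lines = text.splitlines()
    if skip_header:
        lines = lines[1:]
    codes = {}
    for line in lines:
        row = [cell.strip() for cell in line.split("\t")]
        if len(row) <= label_col:
            continue  # skip blank or truncated rows
        key = tuple(row[i] for i in key_cols)
        # Single-code formats use the bare code as key, double ones a tuple.
        codes[key[0] if len(key) == 1 else key] = row[label_col]
    return codes
```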

TO BE CONTINUED....