
WIP: #501: add series names

Closed Bruno Duyé requested to merge ticket-501-series_names_are_not_right into master
1 unresolved thread


Closed by Bruno Duyé (Oct 1, 2019 5:28pm UTC)


Activity

    • Author Contributor

      @cbenz I wanted to test this on preprod to close #501, but when indexing I get this error:

      (dbnomics) cepremap@eros:~/fetchers-envs$ PROVIDER_SLUG='buba'; ~/dbnomics-importer/import_storage_dir.py ~/fetchers-envs/${PROVIDER_SLUG}/${PROVIDER_SLUG}-json-data                                                                        
      INFO:__main__:2019-09-25 16:56:36,804:Received args: Namespace(bare_repo_fallback=False, datasets=None, exclude_datasets=None, full=False, log='INFO', print_json_lines=False, solr_core='dbnomics', solr_hostname='localhost', solr_port=8983, solr_post=PosixPath('/opt/solr/bin/post'), start_from=None, storage_dir=PosixPath('/home/cepremap/fetchers-envs/buba/buba-json-data'))
      INFO:__main__:2019-09-25 16:56:36,804:Using indexed_at '2019-09-25T14:56:36.804848Z' for all documents                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:36,805:Provider code: 'BUBA'                                                                                                                                                                                  
      INFO:__main__:2019-09-25 16:56:36,811:provider.created_at is unknown. Running command 'git log --reverse --format="format:%at" | head -n1'                                                                                                   
      ERROR:__main__:2019-09-25 16:56:36,814:Could not find provider document in Solr. Indexing all datasets.                                                                                                                                      
      INFO:__main__:2019-09-25 16:56:36,815:Mode: full                                                                                                                                                                                             
      INFO:__main__:2019-09-25 16:56:36,815:Running command '/opt/solr/bin/post -c dbnomics -type application/json -url http://localhost:8983/solr/dbnomics/update/json/docs -'                                                                    
      INFO:__main__:2019-09-25 16:56:36,817:Processing 45 datasets...                                                                                                                                                                              
      INFO:__main__:2019-09-25 16:56:36,817:Indexing dataset 'BBAI3' (1/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:36,829:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBAI3 | head -n1'                                                                                           
      java -classpath /opt/solr/dist/solr-core-7.5.0.jar -Dauto=yes -Dtype=application/json -Durl=http://localhost:8983/solr/dbnomics/update/json/docs -Dc=dbnomics -Ddata=stdin org.apache.solr.util.SimplePostTool                               
      SimplePostTool version 5.0.0                                                                                                                                                                                                                 
      POSTing stdin to http://localhost:8983/solr/dbnomics/update/json/docs...                                                                                                                                                                     
      INFO:__main__:2019-09-25 16:56:37,055:Indexing dataset 'BBAPV' (2/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,089:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBAPV | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,095:Indexing dataset 'BBASV' (3/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,163:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBASV | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,720:Indexing dataset 'BBBP1' (4/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,729:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBBP1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,750:Indexing dataset 'BBBP2' (5/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,761:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBBP2 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,766:Indexing dataset 'BBBPS' (6/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,776:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBBPS | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,781:Indexing dataset 'BBBU2' (7/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,793:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBBU2 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,827:Indexing dataset 'BBBZ1' (8/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,836:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBBZ1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,844:Indexing dataset 'BBDA1' (9/45)                                                                                                                                                                        
      INFO:__main__:2019-09-25 16:56:37,853:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDA1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,866:Indexing dataset 'BBDB2' (10/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:37,878:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDB2 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,905:Indexing dataset 'BBDE1' (11/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:37,928:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDE1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,961:Indexing dataset 'BBDG1' (12/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:37,970:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDG1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,975:Indexing dataset 'BBDL1' (13/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:37,983:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDL1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:37,991:Indexing dataset 'BBDP1' (14/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:38,001:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDP1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:38,009:Indexing dataset 'BBDR1' (15/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:38,018:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDR1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:38,028:Indexing dataset 'BBDY1' (16/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:38,038:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDY1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:38,044:Indexing dataset 'BBDZ1' (17/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:38,054:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBDZ1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:38,061:Indexing dataset 'BBEE1' (18/45)                                                                                                                                                                       
      INFO:__main__:2019-09-25 16:56:38,070:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBEE1 | head -n1'                                                                                           
      INFO:__main__:2019-09-25 16:56:38,075:Indexing dataset 'BBEX3' (19/45)
      INFO:__main__:2019-09-25 16:56:38,109:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBEX3 | head -n1'
      INFO:__main__:2019-09-25 16:56:38,237:Indexing dataset 'BBFB1' (20/45)
      INFO:__main__:2019-09-25 16:56:38,278:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBFB1 | head -n1'
      INFO:__main__:2019-09-25 16:56:38,545:Indexing dataset 'BBFDV' (21/45)
      INFO:__main__:2019-09-25 16:56:38,616:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBFDV | head -n1'
      INFO:__main__:2019-09-25 16:56:39,645:Indexing dataset 'BBFI1' (22/45)
      INFO:__main__:2019-09-25 16:56:39,673:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBFI1 | head -n1'
      INFO:__main__:2019-09-25 16:56:39,892:Indexing dataset 'BBFI3' (23/45)
      INFO:__main__:2019-09-25 16:56:39,901:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBFI3 | head -n1'
      INFO:__main__:2019-09-25 16:56:39,911:Indexing dataset 'BBFN1' (24/45)
      INFO:__main__:2019-09-25 16:56:39,934:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBFN1 | head -n1'
      INFO:__main__:2019-09-25 16:56:40,203:Indexing dataset 'BBK01' (25/45)
      INFO:__main__:2019-09-25 16:56:40,375:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBK01 | head -n1'
      INFO:__main__:2019-09-25 16:56:42,497:Indexing dataset 'BBMF1' (26/45)
      INFO:__main__:2019-09-25 16:56:42,506:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBMF1 | head -n1'
      INFO:__main__:2019-09-25 16:56:42,539:Indexing dataset 'BBMME' (27/45)
      INFO:__main__:2019-09-25 16:56:42,549:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBMME | head -n1'
      INFO:__main__:2019-09-25 16:56:42,561:Indexing dataset 'BBMMS' (28/45)
      INFO:__main__:2019-09-25 16:56:42,569:dataset.created_at is unknown. Running command 'git log --reverse --format="format:%at" -- BBMMS | head -n1'
      Traceback (most recent call last):
        File "/home/cepremap/dbnomics-importer/import_storage_dir.py", line 525, in <module>
          sys.exit(main())
        File "/home/cepremap/dbnomics-importer/import_storage_dir.py", line 483, in main
          indexed_at, desired_datasets_codes_actions)
        File "/home/cepremap/dbnomics-importer/import_storage_dir.py", line 253, in process_datasets
          dataset_solr = build_dataset_solr(solr, provider_json, dataset_json, indexed_at, repo)
        File "/home/cepremap/dbnomics-importer/import_storage_dir.py", line 135, in build_dataset_solr
          commit_datetime = datetime.utcfromtimestamp(int(commit_timestamp_str))
      ValueError: invalid literal for int() with base 10: ''
      (dbnomics) cepremap@eros:~/fetchers-envs$ COMMITting Solr index changes to http://localhost:8983/solr/dbnomics/update/json/docs...
      Time spent: 0:00:08.430

      The import crashed on dataset 28/45 (BBMMS), so I suppose that only the first 27 of the 45 datasets have been imported. Do you have an idea how to solve this?
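      The traceback suggests that, for BBMMS, `git log --reverse --format="format:%at" -- BBMMS | head -n1` printed nothing, so `int(commit_timestamp_str)` received an empty string. One plausible (unverified) cause is that the BBMMS directory exists in the working tree but no commit touches it; `git status -- BBMMS` in the storage dir would show whether it is untracked. Below is a minimal sketch of a guard, assuming the importer can live with an unknown `created_at` for such a dataset. The helper names `parse_first_timestamp` and `first_commit_datetime` are hypothetical, not part of dbnomics-importer.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def parse_first_timestamp(git_log_output: str) -> Optional[datetime]:
    """Parse the first line of `git log --reverse --format=format:%at` output.

    Returns None when git printed nothing, i.e. no commit touches the path.
    """
    lines = git_log_output.splitlines()
    first = lines[0].strip() if lines else ""
    if not first:
        # Empty output is the case that made int('') raise ValueError in build_dataset_solr.
        return None
    # Timezone-aware UTC datetime (the original utcfromtimestamp returns a naive one).
    return datetime.fromtimestamp(int(first), tz=timezone.utc)


def first_commit_datetime(repo_dir: Path, path: Optional[str] = None) -> Optional[datetime]:
    """Hypothetical helper: same query as the logged shell command, without `| head -n1`."""
    cmd = ["git", "log", "--reverse", "--format=format:%at"]
    if path is not None:
        cmd += ["--", path]  # restrict to commits touching this dataset directory
    result = subprocess.run(cmd, cwd=repo_dir, capture_output=True, text=True, check=True)
    return parse_first_timestamp(result.stdout)
```

      With this, `first_commit_datetime(storage_dir, "BBMMS")` would return None instead of crashing; whether the caller then skips the field or falls back to `indexed_at` is a design choice for the importer.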

    • Author Contributor

      In fact, this question can wait: I just realized that the generated series don't pass the validation script, so I have to take another look.

  • Bruno Duyé added 3 commits


  • closed

  • Author Contributor

    Closing this MR because issue management#501 (closed) has been fixed by !2 (merged).
