dbnomics-fetchers / dares-fetcher
Commit 68dcde78, authored 7 years ago by Constance de Quatrebarbes
insert headers to_source_data
Parent: 91e324f2
Merge request: !1 Implement download script and more
1 changed file: dares_to_source_data.py (24 additions, 4 deletions)
 #!/usr/bin/env python3
-# dares-fetcher -- Fetch series from DRESS
-#By Constance de Quatrebarbes <constance.24barbes@jailbreak.paris>
+# dares-fetcher -- Fetch series from DARES
+# By Constance de Quatrebarbes <constance.de.quatrebarbes@cepremap.org>
+# Copyright (C) 2017 Cepremap
+# https://git.nomics.world/dbnomics-fetchers/daess-fetcher
+#
+# This is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as
+# published by the Free Software Foundation, either version 3 of the
+# License, or (at your option) any later version.
+#
+# This software is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
 """
 DARES Fetcher.
 Usage:
...
@@ -79,13 +96,16 @@ def fetch(dataset):
     doc_list = sidebar.find("ul", {"class": "docs-joints__liste"})
     target_files = []
     for doc in doc_list.findAll("li"):
         ext, title = [n.text for n in doc.findAll("span")[0:2]]
         if ext in ["xls", "xlsx"]:
             raw_url = doc.find('a', {"class": ext}).get("href")
             f_url = os.path.join(ROOT_PROVIDER_URL, raw_url)
             f_name = f_url.split("/")[-1]
-            target_files.append({"f_name": f_name, "f_url": f_url, "f_title": title, "f_ext": ext})
+            target_files.append({"f_name": f_name, "f_url": f_url, "f_title": title, "f_ext": ext})
     assert len(target_files) == dataset["file_nb"], \
         "Fetcher Error: url %s should retrieve %i xls docs instead of %i" % (url, dataset["file_nb"], len(target_files))
...
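The hunk in `fetch(dataset)` builds each download URL with `os.path.join` and validates the number of collected files with `assert`. As a hedged illustration only (not part of this commit), the same two steps could be sketched with `urllib.parse.urljoin` and an explicit exception. The helper names `build_file_entry` and `check_file_count` are hypothetical, and the value of `ROOT_PROVIDER_URL` below is a placeholder, since the real constant is not visible in the diff.

```python
import posixpath
from urllib.parse import urljoin, urlparse

# Hypothetical placeholder: the real ROOT_PROVIDER_URL is not shown in the diff.
ROOT_PROVIDER_URL = "https://example.org/"


def build_file_entry(raw_url, title, ext, root_url=ROOT_PROVIDER_URL):
    """Return a dict shaped like the entries appended to target_files."""
    # urljoin resolves relative and root-relative hrefs against the base URL,
    # whereas os.path.join uses the OS path separator and, on POSIX, drops
    # the base entirely when the second argument starts with "/".
    f_url = urljoin(root_url, raw_url)
    # Take the file name from the URL path only, ignoring any query string.
    f_name = posixpath.basename(urlparse(f_url).path)
    return {"f_name": f_name, "f_url": f_url, "f_title": title, "f_ext": ext}


def check_file_count(target_files, expected, url):
    """Raise ValueError when the number of collected files is unexpected."""
    # Unlike an assert statement, this check still runs under `python -O`.
    if len(target_files) != expected:
        raise ValueError(
            "Fetcher Error: url %s should retrieve %i xls docs instead of %i"
            % (url, expected, len(target_files))
        )
```

Under these assumptions, `build_file_entry(raw_url, title, ext)` would stand in for the `f_url`/`f_name` lines inside the loop, and `check_file_count(target_files, dataset["file_nb"], url)` for the trailing `assert`.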