Michel Juillard / bceao-fetcher

Commit 16d48830, authored 5 years ago by Michel Juillard

refactoring category_tree

Parent: 3935d068
1 changed file: download.py (14 additions, 11 deletions)
```diff
@@ -76,14 +76,20 @@ def download_datasets(target_dir):
         _file.write(resp.text)
     headers = resp.headers
     soup = bs(resp.text, "lxml")
-    categories = {}
+    categories = []
     for ul in soup.find_all("ul", class_="extend"):
-        cat = ul.parent.a.text.encode("utf-8").decode("utf-8", errors="ignore")
-        categories[cat] = []
+        # ugly fix of broken unicode char
+        cat_name = clean_special_char(ul.parent.a.text)
+        category = {
+            "name": cat_name,
+            "code": slugify.slugify(clean_special_char(ul.parent.a.text))
+        }
+        category["datasets"] = []
         for li in ul.find_all("li"):
             series = list(ast.literal_eval(li.a.get("onclick").replace("soumettreTab", "")))
-            categories[cat].append(series)
+            category["datasets"].append(series)
+        categories.append(category)
 
     # no need to have country names
     # encoding problems: broken utf-8 from website app: countries and categories é and ô same
     countries = {n.text.strip().encode('utf-8').decode("utf-8", errors="ignore"): n.find("input")
```
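In this first hunk, categories stop being a dict keyed by the link text and become a list of records, each holding a display `name`, a slugified `code`, and a `datasets` list of the values parsed from each link's `onclick` attribute. The sketch below reproduces that record shape using only the standard library, under stated assumptions: the `soumettreTab('M', 12, 'ipc')` onclick format and the sample category are hypothetical; `make_slug` is a simplified, hypothetical stand-in for `slugify.slugify` and does not reproduce python-slugify's algorithm; `clean_special_char` is left out because its body is not part of this diff.

```python
import ast
import re
import unicodedata


def make_slug(text):
    """Hypothetical stand-in for slugify.slugify: ASCII-fold, lowercase, hyphenate."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def parse_onclick(onclick):
    """Turn "soumettreTab('M', 12, 'ipc')" into ['M', 12, 'ipc'], as the diff does."""
    # Removing the JavaScript function name leaves a Python tuple literal,
    # which ast.literal_eval evaluates without executing arbitrary code.
    return list(ast.literal_eval(onclick.replace("soumettreTab", "")))


def build_category(name, onclick_values):
    """Build one record in the new list-of-dicts layout."""
    category = {"name": name, "code": make_slug(name)}
    category["datasets"] = [parse_onclick(v) for v in onclick_values]
    return category


# Hypothetical input: one category with two series links.
categories = [
    build_category(
        "Prix à la consommation",
        ["soumettreTab('M', 12, 'ipc')", "soumettreTab('A', 7, 'ipc_an')"],
    )
]
print(categories[0]["code"])  # prix-a-la-consommation
print(categories[0]["datasets"][0])  # ['M', 12, 'ipc']
```

Keeping a list of records, rather than a dict keyed by a possibly mis-encoded name, lets later code read the cleaned `name` and `code` without recomputing them.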
```diff
@@ -122,16 +128,13 @@ def download_datasets(target_dir):
     post_data.update(country)
 
     categories_tree = []
-    for cat, series_l in categories.items():
-        cat_name = clean_special_char(ul.parent.a.text)
+    for cat in categories:
         category = {
-            # ugly fix of broken unicode char
-            "name": clean_special_char(ul.parent.a.text),
-            "code": slugify.slugify(clean_special_char(ul.parent.a.text))
+            "code": cat["code"],
+            "name": cat["name"]
         }
         category["children"] = []
-        for series in series_l:
-            freq, id_tab, s_name_id = series
+        for freq, id_tab, s_name_id in cat["datasets"]:
             post_data["idTab"] = id_tab
             data = [(k, v) for k, v in post_data.items()]
             # html_download
```
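This second hunk then walks the list of records directly and unpacks each `(freq, id_tab, s_name_id)` triple in the `for` statement. Reading `cat["code"]` and `cat["name"]` from the record also removes the old loop's use of `ul`. Unless `ul` is reassigned in lines this diff does not show, it would still refer to the last element of the earlier `for ul in ...` loop, so every category would have received the same name and code. A minimal sketch of the new loop follows. The parts of the real code that fill `children` and that append `category` to `categories_tree` are in the collapsed portion of the diff, so the child layout and the final append here are assumptions, as are the sample `post_data` field names.

```python
def build_tree(categories, post_data):
    """Sketch of the refactored loop: one tree node per category record."""
    categories_tree = []
    for cat in categories:
        # Name and code come from the record built earlier, not from `ul`.
        category = {"code": cat["code"], "name": cat["name"]}
        category["children"] = []
        for freq, id_tab, s_name_id in cat["datasets"]:
            post_data["idTab"] = id_tab
            # Snapshot the form fields as (key, value) pairs, as in the diff.
            data = [(k, v) for k, v in post_data.items()]
            # The real code downloads HTML here. This sketch only records the
            # form it would send (hypothetical child layout).
            category["children"].append({"code": s_name_id, "freq": freq, "form": data})
        # Assumed: not shown in the visible part of the diff.
        categories_tree.append(category)
    return categories_tree


# Hypothetical usage with one category and one series.
tree = build_tree(
    [{"name": "Prix", "code": "prix", "datasets": [["M", 12, "ipc"]]}],
    {"hypothetical_field": "value"},
)
print(tree[0]["children"][0]["form"])  # [('hypothetical_field', 'value'), ('idTab', 12)]
```

Because `post_data` is mutated on every iteration, `data` is rebuilt each time so that each request carries the current `idTab`.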