Do not split pagenames with slashes in Main: #361
Why don't I check `make test`...

Apparently Main: pages can have subpages, like head/translations... So this is not correct. Or possibly the tests are wrong and the output shouldn't be stripped.
Wiktextract issue tatuylonen/wiktextract#1009

Pagename data passed into templates or modules was split on slashes, so you'd get output like `Bing` and `Bed` instead of `A/Bing` and `A/Bed`.

We had some code copied from Scribunto that handles slashes in titles, but it didn't check whether slashes should be treated as subpage separators in the namespace at hand, as in Module:mainmodule/submodule. `Main:`, which is our main concern, doesn't have subpages, so slashes there should be considered part of the title.

I've only done a quick and dirty fix here, because we only concern ourselves with Main:, Template: and Module:. All other namespaces should hopefully either have subpages (the default assumption, except for Main:) or not have slashes in their titles.
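To make the intended behaviour concrete, here is a minimal Python sketch of a namespace-aware split. It is not the code from this PR: the `NAMESPACES` table and the `subpage_text` helper are hypothetical names, and only the `subpages` flags mirror what is described above.

```python
# Hypothetical namespace table; only the "subpages" flags follow the PR text.
NAMESPACES = {
    "Main": {"subpages": False},
    "Template": {"subpages": True},
    "Module": {"subpages": True},
}


def subpage_text(full_title):
    """Return the last "/"-separated segment of a title, but only when the
    title's namespace actually supports subpages."""
    prefix, sep, rest = full_title.partition(":")
    if sep and prefix in NAMESPACES:
        namespace, text = prefix, rest
    else:
        # No recognised namespace prefix: treat the whole string as a Main: title.
        namespace, text = "Main", full_title
    if NAMESPACES[namespace]["subpages"]:
        return text.rsplit("/", 1)[-1]
    # Main: has no subpages, so the slash is part of the title.
    return text
```

With this sketch, `subpage_text("A/Bing")` stays `A/Bing`, while `subpage_text("Module:mainmodule/submodule")` gives `submodule`.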
a0a34cb to 29d27b7
I only updated two tests. I also tested the "A/B" page, and the extracted forms are correct.
29d27b7 to e2d6a1f
"Main": {
  "id": 0,
  "name": "Main",
  "subpages": false,
This might be a little confusing. I guess they consider "Template:a/doc" a subpage but not "eye/translations".

I think we don't access translation pages by partitioning "xxx/translations"; instead we build the title as "xxx" + "/translations", so this hopefully shouldn't break anything. Thank you for checking this!
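The difference between the two approaches can be sketched as follows. Both function names are hypothetical and are not taken from the wiktextract code base; the point is only that concatenation never has to guess where a title ends.

```python
TRANSLATIONS_SUFFIX = "/translations"


def translations_page_title(entry_title):
    # Build the subpage title by concatenation; the entry title is never split.
    return entry_title + TRANSLATIONS_SUFFIX


def naive_base_title(subpage_title):
    # Partitioning approach, shown for contrast: it cuts at the first slash
    # and therefore loses part of any entry title that itself contains "/".
    return subpage_title.partition("/")[0]
```

For an entry titled `A/B`, concatenation yields `A/B/translations`, whereas partitioning that string at the first slash would recover only `A`.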

I've tested the "eye" page in the en edition, and the translation data in the "eye/translations" page is extracted correctly. Our Python code shouldn't be affected; I just hope all the Lua code also works.