Do not split pagenames with slashes in Main: #361

Merged
xxyzz merged 2 commits into main from slashes-in-titles on Jan 29, 2025

kristian-clausal commented Jan 28, 2025

Wiktextract issue tatuylonen/wiktextract#1009

Pagename data passed into templates or modules was split on slashes, so you'd get output like `Bing` and `Bed` instead of `A/Bing` and `A/Bed`.

We had some code copied from Scribunto for handling slashes in titles, but it didn't check whether slashes should be treated as 'subpage' separators, as in `Module:mainmodule/submodule`. `Main:`, which is our main concern, doesn't have subpages, so slashes there should be considered part of the title.

I've only done a quick and dirty fix here, because we only concern ourselves with `Main:`, `Template:` and `Module:`. All other namespaces should hopefully either have subpages (the default assumption, except for `Main:`) or not have slashes in their titles.
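
To illustrate the check described above, here is a minimal sketch, not the actual code from this PR. The `NAMESPACE_HAS_SUBPAGES` table and the `split_title` helper are hypothetical names; the `Template` and `Module` entries are assumptions, and only the `Main` entry mirrors the `"subpages": false` flag shown in the diff. The returned pair is meant to resemble Scribunto's `baseText` and `subpageText` title fields.

```python
# Hypothetical namespace table; only "Main" reflects the flag seen in the diff,
# the other entries are assumptions made for illustration.
NAMESPACE_HAS_SUBPAGES = {
    "Main": False,
    "Template": True,
    "Module": True,
}


def split_title(namespace: str, text: str) -> tuple[str, str]:
    """Return (base_text, subpage_text) for a title without its namespace prefix.

    A slash separates a subpage only in namespaces that have subpages enabled;
    elsewhere the slash is simply part of the title.
    """
    # Namespaces missing from the table are assumed to have subpages.
    if not NAMESPACE_HAS_SUBPAGES.get(namespace, True):
        return text, text
    base, sep, sub = text.rpartition("/")
    if not sep:
        # No slash at all: the page is not a subpage.
        return text, text
    return base, sub
```

With this sketch, `split_title("Main", "A/Bing")` keeps the whole title in both fields, while `split_title("Module", "mainmodule/submodule")` separates the parent from the subpage.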

@kristian-clausal
Collaborator Author

Why don't I check `make test`...

@kristian-clausal
Collaborator Author

Apparently `Main:` pages can have subpages, like `head/translations`... So this is not correct. Or possibly the tests are wrong and the output shouldn't have stripped `/translations` from the 'title'.

@xxyzz xxyzz force-pushed the slashes-in-titles branch from a0a34cb to 29d27b7 Compare January 29, 2025 03:15

xxyzz commented Jan 29, 2025

I only updated two tests. I also tested the "A/B" page, and the extracted forms are correct.

@xxyzz xxyzz force-pushed the slashes-in-titles branch from 29d27b7 to e2d6a1f Compare January 29, 2025 03:23
@xxyzz xxyzz merged commit 9dbd323 into main Jan 29, 2025
10 checks passed
@xxyzz xxyzz deleted the slashes-in-titles branch January 29, 2025 03:26
"Main": {
  "id": 0,
  "name": "Main",
  "subpages": false,

Might be a little confusing; I guess they consider "Template:a/doc" a subpage but not "eye/translations".

@kristian-clausal
Collaborator Author

I think we don't access translation pages by partitioning "xxx/translations"; instead we do "xxx" + "/translations", so this hopefully shouldn't break anything. Thank you for checking this!
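
As a rough illustration of that difference (both helpers are hypothetical and are not taken from wiktextract): building the subpage title by concatenation never has to decide which slash is a separator, whereas recovering the word by partitioning would misread a title such as "A/B".

```python
def translations_title(word: str) -> str:
    """Build the translations subpage title by appending a fixed suffix."""
    return word + "/translations"


def word_from_partition(title: str) -> str:
    """Fragile alternative: treat everything before the first slash as the word."""
    return title.partition("/")[0]


# Concatenation keeps a slash that belongs to the word itself.
print(translations_title("A/B"))
# Partitioning on the first slash would cut such a word short.
print(word_from_partition("A/B/translations"))
```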

xxyzz commented Jan 29, 2025

I've tested the "eye" page in the en edition, and the translation data in the "eye/translations" page is extracted correctly. Our Python code shouldn't be affected; I just hope all that Lua code also works.
