Conversation

@xxyzz
Collaborator

@xxyzz xxyzz commented Oct 21, 2025

On Wiktionary, `||` at the beginning of a line inside a table is expanded to a single `<td>` HTML element. The previous code produced an extra empty table cell node, which made it harder for extractor code to decide whether to discard the cell.

`test_table_hdr4` was added in commit 1f48e90 for the English edition page "山 歩き", and the extracted forms list is unchanged by this change.
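
To illustrate the difference described above, here is a minimal sketch in plain Python. It is not the parser code from this PR: `split_cells_naive` and `split_cells` are hypothetical helper names, and the sketch ignores cell attributes, header cells (`!`), and row markers such as `|-`. It only shows how treating a leading `||` as an ordinary separator yields an extra empty cell, while treating it as a single cell opener does not.

```python
def split_cells_naive(line: str) -> list[str]:
    """Hypothetical 'before' behaviour: every `||` separates two cells,
    even when it appears at the very start of the line."""
    if line.startswith("|") and not line.startswith("||"):
        body = line[1:]  # a single leading `|` opens the first cell
    else:
        body = line  # a leading `||` is left in place and split like any other
    return [cell.strip() for cell in body.split("||")]


def split_cells(line: str) -> list[str]:
    """Hypothetical 'after' behaviour: a leading `|` or `||` opens exactly
    one cell; `||` elsewhere on the line separates cells."""
    if line.startswith("||"):
        body = line[2:]
    elif line.startswith("|"):
        body = line[1:]
    else:
        raise ValueError("not a table cell line")
    return [cell.strip() for cell in body.split("||")]


print(split_cells_naive("|| a || b"))  # ['', 'a', 'b'] (extra empty cell)
print(split_cells("|| a || b"))        # ['a', 'b']
print(split_cells("| a || b"))         # ['a', 'b']
```

With the second approach, extractor code no longer has to guess whether a leading empty cell is real content or a parsing artefact.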

@kristian-clausal
Collaborator

You might need to upgrade ruff; I had the same problem with the linter being mean to me. The format specs have changed.

@xxyzz
Collaborator Author

xxyzz commented Oct 21, 2025

I forgot to run `ruff format`...

@xxyzz xxyzz merged commit f26afeb into tatuylonen:main Oct 21, 2025
6 checks passed
@xxyzz xxyzz deleted the table_cell branch October 21, 2025 04:51