Post Translation: Add translation of posts based on GlotPress #90
dd32 wants to merge 18 commits into WordPress:trunk from …
…pecify the string was actually translated (even if it remained the same).
…the work for /gutenberg/, /patterns/ pattern translation, etc.
Really nice, this is quite something to pull off. I've gone through the code and noted how it works along the way (maybe useful for others) and added some remarks:
Further Remarks

The display locale must be set through code right now, potentially with a …

Out of ignorance how block themes work exactly but will …

Using projects for each page in GlotPress has pros and cons:
As already mentioned in WordPress/wporg-main-2022#15 (comment), if (RTL) languages need changes to the HTML, it might be necessary to do those transforms on the PHP level individually.

I see invalidation of translated pages upon translation save as quite necessary to validate good translations. It can be really hard to be sure that a new translation works if you cannot see it in context. It would be infeasible to have to wait for the cache to expire.

Overall, the approach seems technically feasible and already exposes some of the difficulties we'll face in Gutenberg Phase 4, but it also shows potential solutions for those. It will be exciting to see how this holds up in production; it looks quite ready for first experiments.
Thanks @akirk! The code that throws everything into a single project is a hold-over from early prototypes (this has actually been built 3-4 times now). I would anticipate that it would be a project-per-site at a minimum.
This is actually handled by the MakePot import class; the same code is in use for translations on …

I would love to see this scenario handled natively by GlotPress rather than by code like this, though I'm not sure how. I suspect it would end up very similar to this. It might be that storing it as project-per-page, with a translation UI that just pulled strings from "current project, and all sub projects", would work here though.
Handling context of strings that need to be translated differently on different pages (or different blocks on the same page!) is a shortcoming of this. I suspect the best way to handle it will be to add a "context flag" to the Block in the editor, so that the strings within that Block (or Block Group) get that flag as the original's context. I haven't fleshed that idea out yet. E.g., in pseudo-block, these would be two strings with one having …

The URL reference here is mostly to provide translators the ability to click through to the pages where it's used, not to provide string context (at least not in gettext context terms).
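To illustrate the "context flag" idea, here is a minimal sketch in Python (not the PHP used in this PR). Everything in it is an assumption for illustration only: the `translationContext` attribute name, the `strings` key, and the dict shape of a block are hypothetical and do not come from Gutenberg or from this PR. It only shows how a flag set on a Block Group could be inherited by inner blocks, so the same text yields two distinct originals.

```python
def collect_originals(blocks, inherited_context=None):
    """Return (context, text) pairs for every string in a block tree.

    `translationContext` is a hypothetical block attribute; a block without
    it inherits the context of its nearest flagged ancestor.
    """
    originals = []
    for block in blocks:
        context = block.get("attrs", {}).get("translationContext", inherited_context)
        for text in block.get("strings", []):
            originals.append((context, text))
        # Inner blocks see the (possibly updated) context of their parent.
        originals.extend(collect_originals(block.get("innerBlocks", []), context))
    return originals


blocks = [
    {"strings": ["Download"]},
    {
        "attrs": {"translationContext": "button"},
        "innerBlocks": [{"strings": ["Download"]}],
    },
]
pairs = collect_originals(blocks)
```

Here `pairs` holds the unflagged "Download" and the "button"-context "Download" as separate entries, which is what would let translators treat them differently.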
This is an overly cautious approach, I admit. I chose to clear the whole translation cache upon any page alteration, as the cache here is only being used for speed: it avoids the block parsing needing to be run on every page load. Clearing the cache every time a page edit is made is not too bad. The 6hr cache timeframe could also be lowered to 1hr for the same reason; it's only there to avoid parsing on every page load.
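The caching behaviour described here can be sketched roughly as follows. This is an in-memory Python stand-in under stated assumptions, not the PR's implementation: the class name, the `render` callback, and the injectable `clock` are all hypothetical, and the real code would sit on top of the site's own object cache.

```python
import time


class TranslationCache:
    """Tiny TTL cache keyed by (post_id, locale); hypothetical illustration."""

    def __init__(self, ttl_seconds=6 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # (post_id, locale) -> (expires_at, html)

    def get(self, post_id, locale, render):
        """Return cached HTML, calling render() only on a miss or after expiry."""
        key = (post_id, locale)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        html = render(post_id, locale)  # the expensive block parsing step
        self._store[key] = (now + self.ttl, html)
        return html

    def clear_all(self):
        """Cautious invalidation: drop every entry when any page is edited."""
        self._store.clear()
```

Because the cache is only a speed-up and GlotPress stays the source of truth, dropping every entry on an edit costs one re-render per page and locale, nothing more.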
That would be beneficial for the
That was an option I considered, and if it was the source-of-truth for the translations, it would make perfect sense. In this case however, we're pulling from the source-of-truth which is GlotPress and on-the-fly converting the page, just with caching. On the Pattern Directory we do use similar code, but instead of on-the-fly-and-cache we created DB posts, as that was the quickest way to implement Searching of the content. That may be needed for Translated Pages in the future for sure, but for now, we can skip the DB step IMHO.
Correct, currently we run a single theme …

On other sites, such as https://learn.wordpress.org/, we have a Language switcher, which also filters …
Good question! I'm not actually sure of the answer.. In practice, I don't understand how Block themes work under the hood either.
At this point, that's an issue that already exists AFAIK, this isn't going to make that any better or worse.
This feels like two different issues.
I would love to improve the loop-time for translations to clear the cache for sure!
In a recent Polyglots meeting, @naokomc talked about how the Japanese team is translating the handbooks and how they have them updated:
I think it could be interesting to try the translation process presented in this PR on the handbooks, with those translators as beta testers, because they have this need. What do you think?
Is there an angle to explore around the way Gutenberg saves block markup? Generate special classes or IDs on translatable elements, for example. Wrap translatable text in a special tag. Store data in JSON blobs. Or even use formats as a mechanism: require content authors to explicitly identify text to be translated.
100% there is - however this is something that's probably going to be best considered as part of Gutenberg Phase 4, purely because those who have their minds deep within the Gutenberg internals and have been thinking about it in the back of their minds forever, are sure to have a better idea of how that could be done.
The main problem I can see there is that translations can be within HTML tags, or within attributes, or possibly even within block meta-data in a custom field that's later inserted into or used by a dynamic block. I could see using an ASCII control character (there's a literal Start/End of text pair, STX/ETX) as a viable flag to wrap textual pieces though.
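A minimal sketch of the STX/ETX idea, in Python and purely as an illustration: the helper names are hypothetical, and it assumes the content author (or editor tooling) has already wrapped each translatable piece in the control characters. It shows that the same marker works inside an attribute and around text that itself contains inline HTML.

```python
import re

STX, ETX = "\x02", "\x03"  # ASCII "start of text" / "end of text"
_MARKED = re.compile(f"{STX}(.*?){ETX}", re.DOTALL)


def mark(text):
    """Wrap a translatable piece in the STX/ETX pair."""
    return f"{STX}{text}{ETX}"


def extract_strings(content):
    """List every marked piece, in document order."""
    return _MARKED.findall(content)


def translate_marked(content, translations):
    """Swap each marked piece for its translation and drop the markers.

    Untranslated pieces fall back to the original text.
    """
    return _MARKED.sub(lambda m: translations.get(m.group(1), m.group(1)), content)


content = '<p title="' + mark("Hello") + '">' + mark("Get <em>WordPress</em>") + "</p>"
strings = extract_strings(content)
rendered = translate_marked(content, {"Hello": "Hallo"})
```

One caveat of this sketch is that nested markers are not handled; a real implementation would also need to make sure the control characters survive whatever sanitisation the content passes through.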
Handbooks were the original target for the code I started working on for this, and then Developer Documentation (i.e. the Code Reference), then w.org/gutenberg/ and now this. I've started and stopped writing this code so many times it's not funny :D Realistically though, Handbooks probably require a little more effort than what I'm currently targeting (Homepage/Downloads), as a Document needs more context from the surrounding text for good translations, whereas with this it's mostly short strings that are separate from one another.
This! I could see us ending up with some sort of XPath definition file per Gutenberg post that acts like an array of C pointers to the translatable strings, although the text inside meta-data (very valid and real point, @dd32) might need some extra work. The tricky part is not only that the text can be inside HTML tags, but that the translatable strings could themselves contain HTML, so you can't just say "let's translate all DOM text nodes".
…rtcode as a string.
…her than running it within a cron task that will never fire on a sandbox
…block templates).
…innerblocks with the strings in a bit
This includes the updates from WordPress/wordpress.org#90, a new AttributeParser for attribute-only blocks, and a fix for the ListItem block to allow child lists. Fixes #211
…nifest (#247)
* Update parsers
  This includes the updates from WordPress/wordpress.org#90, a new AttributeParser for attribute-only blocks, and a fix for the ListItem block to allow child lists. Fixes #211
* Add phpunit test infrastructure and tests
  Pulled from #181
* Remove swag page from manifest
* Update content with new parser
* Add a phpunit workflow
Closing in favour of #602
This PR is a combination of the work done to translate Patterns, WordPress.org/gutenberg/, and now https://github.com/WordPress/wporg-main-2022
The block parsing classes in this PR differ from those used for Patterns (see https://github.com/WordPress/pattern-directory/tree/trunk/public_html/wp-content/plugins/pattern-translations/includes): in testing I found issues with them, and started writing my own parsers that seemed to work more reliably for the purpose of post translation.
The strings for the wporg-main-2022 theme are currently in this project:
https://translate.wordpress.org/projects/disabled/posttranslation/wordpress-org-main-test
(The cron job is not run upon post update, it must be run manually through CLI)
See WordPress/wporg-main-2022#15
A list of TODOs from code and review:
* … `the_content` filters.
* … and `<br>` tags, it might be better to standardise some of these prior to inserting into GlotPress, for example, replacing `<br>` with a literal `\n`, although that will make retrieving them harder.
* … `<a href="https://...../">` and `<span class="has-many-attributes">` with `<a>` and `<span>` respectively in translations, filling them back in on retrieval. This is likely a v2/v3 feature though, but it ties in to the above reference to `<br>` in the content.
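The last TODO (showing translators bare `<a>` and `<span>` tags and filling the attributes back in on retrieval) could look roughly like the sketch below. It is written in Python for illustration and is not code from this PR; the function names are hypothetical. It assumes attribute values never contain `>` and that the translation keeps the same number of tags per tag name, although their order across names may change.

```python
import re

# An opening tag that carries at least one attribute, e.g. <a href="...">.
_OPEN_WITH_ATTRS = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\s+[^<>]*?>")
# A bare opening tag as shown to translators, e.g. <a>.
_OPEN_BARE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)>")


def simplify(html):
    """Strip attributes from opening tags; return (simplified, originals)."""
    originals = []

    def strip(match):
        originals.append((match.group(1).lower(), match.group(0)))
        return f"<{match.group(1)}>"

    return _OPEN_WITH_ATTRS.sub(strip, html), originals


def restore(translated, originals):
    """Put the recorded opening tags back, matching by tag name in order."""
    queues = {}
    for name, tag in originals:
        queues.setdefault(name, []).append(tag)

    def fill(match):
        queue = queues.get(match.group(1).lower())
        return queue.pop(0) if queue else match.group(0)

    return _OPEN_BARE.sub(fill, translated)


source = '<a href="https://example.org/">Download</a> and <span class="has-many-attributes">enjoy</span>'
simple, originals = simplify(source)
translated = "<span>Geniessen</span> nach dem <a>Herunterladen</a>"
final = restore(translated, originals)
```

Matching by tag name rather than by position lets a translator reorder the `<a>` and `<span>` parts of a sentence. Two tags of the same name that swap places would still be restored in source order, which is one reason this is only a sketch.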