Skip to content

[el] add form_of parsing for inflections (nouns, adjectives)#1125

Merged
kristian-clausal merged 4 commits intotatuylonen:masterfrom
daxida:master
Apr 16, 2025
Merged

[el] add form_of parsing for inflections (nouns, adjectives)#1125
kristian-clausal merged 4 commits intotatuylonen:masterfrom
daxida:master

Conversation

@daxida
Copy link
Contributor

@daxida daxida commented Apr 15, 2025

Closes #1122

  • Adds form_of and raw_tags for [gender of] + πτώσ(η|εις) templates.
    This should cover most nouns and adjectives.
  • Test it.
  • Support verbs.
  • Localize to English and populate tags instead of raw_tags

Notes:

  • Glosses are unchanged based on this [structure]
    (Greek edition redirects #1122 (comment)).
  • Glosses are not tested in order to avoid expanding templates. I did verify locally that indeed they were unaffected by the changes.
  • Some raw_tags (cf. θηλυκό) are parsed from raw_tags into tags at some other points of the code. I didn't add any extra logic myself.
  • I didn't see how to apply @xxyzz comment. Everything is done at parse_gloss_inflections.

Let me know what you think.

@kristian-clausal
Copy link
Collaborator

I will be on vacation for two weeks starting next week, so I will not be looking at these then, and we've got some other things going on during this shorter week so things will have to wait until then. Please don't do too much work during that time, I suspect much of it is not going to be appropriate / idiomatic and would need to be changed a lot. Checking and working these PR requests would be easier to do it in small portions where I can massage it together (if possible) with what I'm currently working at, which overlaps with this.

The most useful bits from this (at a quick glance) is the translation of how template names are structured. If you can give us data on that and other things, like tag translations, that would be useful. I'll comment on this, but I'm not promising to integrate it.

raw = """* {{θηλ_του-πτώσειςΟΑΚεν|μικρός}}"""
expected = [
{
"tags": ["θηλυκό"], # This is parsed somewhere else I guess
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be in tags, tags should only be valid, English tag-data. Might be my fault.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recon this is weird but it's not something I implemented. I just add the θηλυκό (feminine) tag to raw_tags and I imagine that somewhere else it is promoted to tag.

I had to do this to pass the test, but for instance the ουδέτερο (neuter) tag that appears in another test does not receive the same treatment.

@kristian-clausal
Copy link
Collaborator

Ok, this actually looked good. The return value stuff isn't vital here, looking at other sense entries we still have to parse for glosses so just inserting the FormOf data straight into the sense should be fine.

@daxida
Copy link
Contributor Author

daxida commented Apr 15, 2025

The problem with tags instead of raw_tags is that they are deleted with the current code. One would need to change this

def convert_tags_in_sense(sense: Sense) -> None:
    tags, raw_tags, poses = convert_tags(sense.raw_tags)
    # sense.tags = tags
    sense.tags.extend(tags) # <-
    sense.raw_tags = raw_tags
    sense.tags.extend(poses)

and I am not entirely sure of the side effects.


Ended up having to modify that line as described.

@daxida
Copy link
Contributor Author

daxida commented Apr 15, 2025

Addressed most of your suggestions. Let me know if you still want me to change the names of the functions/tests.

I will put this as ready because I won't be implementing more logic on my side other than the suggestions that you may still have.

@daxida daxida marked this pull request as ready for review April 15, 2025 09:32
@daxida
Copy link
Contributor Author

daxida commented Apr 15, 2025

Note that I'm not entirely convinced about having to translate tags to English since this is supposed to be a monolingual el-el dictionary but oh well.

@kristian-clausal
Copy link
Collaborator

Note that I'm not entirely convinced about having to translate tags to English since this is supposed to be a monolingual el-el dictionary but oh well.

The Wiktextract project is for processing different Wiktionary editions and producing output with equivalent tags. This means translating tags into English.

@kristian-clausal
Copy link
Collaborator

I'll take a look at tomorrow. 👍

@kristian-clausal kristian-clausal merged commit 8090f34 into tatuylonen:master Apr 16, 2025
5 checks passed
@kristian-clausal
Copy link
Collaborator

This works well enough for now, and I think we can merge it and handle any issue that come up later.

Something more useful than raw .largs should be implemented, for example.

@kristian-clausal
Copy link
Collaborator

Wait, what the hell, there was already template_parameters()... Why haven't I noticed this before??? Why have I been messing around with .largs all this time? I must have forgotten this exists, when xxyzz implemented it!

@daxida
Copy link
Contributor Author

daxida commented Apr 16, 2025

Ah, I didn't know about that. I saw your PR and I can understand the problem it solves. I will try to remember it for the future.

Thank you for the reviews and feedback, and have a nice vacation!

@xxyzz
Copy link
Collaborator

xxyzz commented Apr 17, 2025

there was already template_parameters()... Why haven't I noticed this before???

Err, you made some changes to that class method two months ago... I guess you still influenced by en edition code?

@kristian-clausal
Copy link
Collaborator

there was already template_parameters()... Why haven't I noticed this before???

Err, you made some changes to that class method two months ago... I guess you still influenced by en edition code?

I already said I had forgotten about it, you don't have to mention it again.

@xxyzz
Copy link
Collaborator

xxyzz commented Apr 17, 2025

Sorry, just feel a bit surprising. I think I also forget many things...

@kristian-clausal
Copy link
Collaborator

Sorry about being snippy, I got up on the wrong side of the bed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Greek edition redirects

3 participants