[el] add form_of parsing for inflections (nouns, adjectives)#1125
kristian-clausal merged 4 commits into tatuylonen:master
|
I will be on vacation for two weeks starting next week, so I will not be looking at these then, and we've got some other things going on during this shorter week, so things will have to wait until then. Please don't do too much work during that time; I suspect much of it is not going to be appropriate / idiomatic and would need to be changed a lot. Checking and working through these PRs would be easier in small portions, where I can massage them together (if possible) with what I'm currently working on, which overlaps with this. The most useful bit from this (at a quick glance) is the translation of how template names are structured. If you can give us data on that and other things, like tag translations, that would be useful. I'll comment on this, but I'm not promising to integrate it.
tests/test_el_inflection.py
Outdated
    raw = """* {{θηλ_του-πτώσειςΟΑΚεν|μικρός}}"""
    expected = [
        {
            "tags": ["θηλυκό"],  # This is parsed somewhere else I guess
Should not be in tags; tags should only contain valid English tag data. Might be my fault.
I reckon this is weird, but it's not something I implemented. I just add the θηλυκό (feminine) tag to raw_tags, and I imagine that somewhere else it is promoted to tags.
I had to do this to pass the test, but for instance the ουδέτερο (neuter) tag that appears in another test does not receive the same treatment.
|
Ok, this actually looked good. The return value stuff isn't vital here; looking at other sense entries, we still have to parse for glosses, so just inserting the
|
The problem with tags instead of raw_tags is that they are deleted with the current code. One would need to change this:

    def convert_tags_in_sense(sense: Sense) -> None:
        tags, raw_tags, poses = convert_tags(sense.raw_tags)
        # sense.tags = tags
        sense.tags.extend(tags)  # <-
        sense.raw_tags = raw_tags
        sense.tags.extend(poses)

and I am not entirely sure of the side effects. Ended up having to modify that line as described.
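To make the difference concrete, here is a minimal, self-contained sketch of why assigning to sense.tags drops tags that were set earlier, while extending keeps them. The Sense dataclass, the TAG_MAP table, and the toy convert_tags below are hypothetical stand-ins written for this illustration, not the actual wiktextract code, and "form-of" is used only as an example of a tag set before conversion.

```python
from dataclasses import dataclass, field

# Hypothetical translation table; the real mapping lives elsewhere in the extractor.
TAG_MAP = {"θηλυκό": "feminine", "ουδέτερο": "neuter"}


@dataclass
class Sense:
    """Simplified stand-in for the extractor's sense model."""
    tags: list[str] = field(default_factory=list)
    raw_tags: list[str] = field(default_factory=list)


def convert_tags(raw_tags: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Toy converter returning (tags, untranslated raw_tags, poses)."""
    tags = [TAG_MAP[t] for t in raw_tags if t in TAG_MAP]
    leftover = [t for t in raw_tags if t not in TAG_MAP]
    return tags, leftover, []


def convert_overwrite(sense: Sense) -> None:
    """Original behaviour: tags already on the sense are replaced."""
    tags, raw_tags, poses = convert_tags(sense.raw_tags)
    sense.tags = tags
    sense.raw_tags = raw_tags
    sense.tags.extend(poses)


def convert_extend(sense: Sense) -> None:
    """Modified behaviour: tags already on the sense are kept."""
    tags, raw_tags, poses = convert_tags(sense.raw_tags)
    sense.tags.extend(tags)
    sense.raw_tags = raw_tags
    sense.tags.extend(poses)


overwritten = Sense(tags=["form-of"], raw_tags=["θηλυκό"])
convert_overwrite(overwritten)

extended = Sense(tags=["form-of"], raw_tags=["θηλυκό"])
convert_extend(extended)
```

One side effect to watch for with the extend variant: calling the conversion twice on a sense whose raw_tags were not cleared could append duplicates, so callers may want to deduplicate.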
|
Addressed most of your suggestions. Let me know if you still want me to change the names of the functions/tests. I will mark this as ready because I won't be implementing more logic on my side other than the suggestions that you may still have.
|
Note that I'm not entirely convinced about having to translate tags to English since this is supposed to be a monolingual el-el dictionary but oh well. |
The Wiktextract project is for processing different Wiktionary editions and producing output with equivalent tags. This means translating tags into English. |
|
I'll take a look at it tomorrow. 👍
|
This works well enough for now, and I think we can merge it and handle any issues that come up later. Something more useful than raw .largs should be implemented, for example.
|
Wait, what the hell, there was already |
|
Ah, I didn't know about that. I saw your PR and I can understand the problem it solves. I will try to remember it for the future. Thank you for the reviews and feedback, and have a nice vacation! |
Err, you made some changes to that class method two months ago... I guess you're still influenced by the en edition code?
I already said I had forgotten about it, you don't have to mention it again. |
|
Sorry, I was just a bit surprised. I think I also forget many things...
|
Sorry about being snippy, I got up on the wrong side of the bed. |
Closes #1122
form_of and raw_tags for [gender of] + πτώσ(η|εις) templates. This should cover most nouns and adjectives.
tags instead of raw_tags
Notes:
(Greek edition redirects #1122 (comment)).
raw_tags (cf. θηλυκό) are parsed from raw_tags into tags at some other points of the code. I didn't add any extra logic myself.
parse_gloss_inflections.
Let me know what you think.