Description
While DKPro has UIMA types for Cardinal and Ordinal, it seems there are no annotators that can produce them.
So I implemented my own CardOrdAnnotator for English based on the Stanford NLP QuantifiableEntityNormalizer class.
If you are interested, I could roll that into dkpro-core-api-ner-asl, or whatever module you think is appropriate.
I attach the classes and tests that I wrote for that. Note that you won't be able to run them, as they use some utilities that I wrote for myself, but they should give you an idea of how they work.
Basically, the annotator uses a class CardOrdParser, which I wrote based on QuantifiableEntityNormalizer. This means that the annotator would have to be licensed under the GPL.
Note that at the moment the parser is only available for English, but it would probably be relatively easy to implement it for other languages. To do that, however, we would have to rewrite (or extend) QuantifiableEntityNormalizer, because its current implementation stores the words for cardinals and ordinals (e.g. "first", "one") in static variables. As a result, you cannot have different instances of QuantifiableEntityNormalizer for different languages. I guess we could rewrite QuantifiableEntityNormalizer altogether (using its code as "inspiration"), but I am not sure that would be sufficient to remove the GPL constraint on CardOrdParser.
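To illustrate the static-variable problem, here is a minimal sketch of what a per-language word table could look like if the words were held in instance fields instead of static ones. Everything in it is hypothetical: the class `NumberWordTable`, its methods, and the tiny word lists are made up for illustration and are not taken from Stanford NLP, DKPro, or my attached classes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical sketch: one table instance per language, no static word lists.
final class NumberWordTable {
    private final Locale locale;
    private final Map<String, Integer> cardinals = new HashMap<>();
    private final Map<String, Integer> ordinals = new HashMap<>();

    NumberWordTable(Locale locale) {
        this.locale = locale;
    }

    // Registers a cardinal word such as "one" -> 1; returns this for chaining.
    NumberWordTable cardinal(String word, int value) {
        cardinals.put(word.toLowerCase(locale), value);
        return this;
    }

    // Registers an ordinal word such as "first" -> 1; returns this for chaining.
    NumberWordTable ordinal(String word, int value) {
        ordinals.put(word.toLowerCase(locale), value);
        return this;
    }

    // Case-insensitive lookup; empty if the word is not a known cardinal.
    OptionalInt cardinalValue(String word) {
        Integer value = cardinals.get(word.toLowerCase(locale));
        return value == null ? OptionalInt.empty() : OptionalInt.of(value);
    }

    // Case-insensitive lookup; empty if the word is not a known ordinal.
    OptionalInt ordinalValue(String word) {
        Integer value = ordinals.get(word.toLowerCase(locale));
        return value == null ? OptionalInt.empty() : OptionalInt.of(value);
    }

    // Illustrative, deliberately incomplete English table.
    static NumberWordTable english() {
        return new NumberWordTable(Locale.ENGLISH)
                .cardinal("one", 1).cardinal("two", 2)
                .ordinal("first", 1).ordinal("second", 2);
    }

    // Illustrative, deliberately incomplete French table.
    static NumberWordTable french() {
        return new NumberWordTable(Locale.FRENCH)
                .cardinal("un", 1).cardinal("deux", 2)
                .ordinal("premier", 1).ordinal("deuxième", 2);
    }
}
```

With something like this, a parser could take a `NumberWordTable` in its constructor, so an English and a French instance could coexist in the same JVM. It only covers single-word lookup; compound expressions such as "twenty-one" would still need the actual parsing logic.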
Let me know if you are interested.