I built a test that looks like this:
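```python
import os

from symspellpy import SymSpell, Verbosity  # assumed import; adjust to the package under test


# Method of a unittest.TestCase subclass that defines self.fortests_path
def test_umlauts(self):
    dictionary_path = os.path.join(self.fortests_path, "umlaut_dict.txt")
    edit_distance_max = 1
    prefix_length = 5
    sym_spell = SymSpell(edit_distance_max, prefix_length)
    # Term is in column 0, count in column 1 (the file holds the single line "damen 1")
    sym_spell.load_dictionary(dictionary_path, 0, 1)
    # Note: the max edit distance passed here (2) is larger than edit_distance_max (1)
    result = sym_spell.lookup("dämen", Verbosity.TOP, 2)
    self.assertEqual(1, len(result))
    self.assertEqual("damen", result[0].term)
```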
With a dictionary (`umlaut_dict.txt`) that contains only this line: `damen 1`

However, this test fails with `edit_distance_max = 1` and passes with `edit_distance_max = 2`, even though only one character changes from `dämen` to `damen`.

It seems like there is a bug that causes umlauts like `ä` to be interpreted as `ae`, or something like that?

If anyone has an idea where to look, I'd gladly try to fix it, but I haven't found anything yet.
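One quick way to check the "`ä` counts as more than one character" theory outside the library is to compare plain Levenshtein distances. The standalone sketch below is not the library's code; `levenshtein` is a hypothetical helper. It computes the distance once over Unicode code points and once over UTF-8 bytes. Since `ä` is a single code point but two bytes in UTF-8, a byte-level comparison would turn this one-character change into a distance of 2. That is only one possible explanation for the symptom, not a confirmed diagnosis.

```python
import unicodedata


def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


query, term = "dämen", "damen"

# Over code points (precomposed 'ä', U+00E4): a single substitution
print(levenshtein(unicodedata.normalize("NFC", query), term))  # 1

# Over UTF-8 bytes: 'ä' is 0xC3 0xA4, so the same change costs 2
print(levenshtein(query.encode("utf-8"), term.encode("utf-8")))  # 2
```

If the library's distance function behaves like the second call for this input, that would match the failure seen with `edit_distance_max = 1`.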