Hi Allen,
Regarding the example at https://de.dariah.eu/tatom/preprocessing.html#every-1-000-words:
    def split_text(filename, n_words):
        """Split a text into chunks approximately n_words words in length."""
        input = open(filename, 'r')
        words = input.read().split(' ')
        input.close()
At the line "input = open(filename, 'r')":
I wonder whether "input = open(filename, 'r', encoding='UTF-8')" would be better.
Otherwise, on platforms where the default encoding is not UTF-8 (Windows, for example), you may get the error message: "UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 10: character maps to <undefined>".
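In case it helps, here is a minimal sketch of the suggested change. It is only an illustration, not the tutorial's actual code: the "encoding" parameter with its default, the "with" block, the chunking step at the end (the snippet quoted above stops before that point), and the demo file "sample.txt" are all my own assumptions.

```python
import tempfile
from pathlib import Path


def split_text(filename, n_words, encoding="utf-8"):
    """Split a text into chunks approximately n_words words in length."""
    # An explicit encoding keeps open() from falling back to the platform
    # default (for example a Windows code page), which can fail on UTF-8 input.
    with open(filename, "r", encoding=encoding) as f:
        words = f.read().split(" ")
    # Assumed chunking step: group the words n_words at a time.
    return [words[i:i + n_words] for i in range(0, len(words), n_words)]


# Hypothetical demo file containing non-ASCII characters.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text("Émile Zola écrit déjà très bien", encoding="utf-8")
    chunks = split_text(path, 2)
    print(chunks)
```

Keeping the encoding as a keyword argument with a UTF-8 default would let readers with differently encoded corpora override it without editing the function body.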