UnicodeDecodeError

Hi Allen,

I'm looking at the split_text function in this section of the tutorial: https://de.dariah.eu/tatom/preprocessing.html#every-1-000-words

def split_text(filename, n_words):
    """Split a text into chunks approximately `n_words` words in length."""
    input = open(filename, 'r')
    words = input.read().split(' ')
    input.close()

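My question is about the line "input = open(filename, 'r')".

I wonder whether "input = open(filename, 'r', encoding='utf-8')" would be better. Without an explicit encoding, open() falls back to the platform's default encoding, which on Windows is often a code page such as cp1252 rather than UTF-8.

Otherwise you may get the error message: "UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 10: character maps to <undefined>".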
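
For illustration, here is a minimal sketch of the reading step with an explicit encoding. `read_words` is a hypothetical helper name of my own, not something from the tutorial, and I am assuming the input files are UTF-8 encoded:

```python
def read_words(filename, encoding="utf-8"):
    """Read a text file and split it on single spaces.

    Hypothetical helper: it covers only the file-reading part of
    split_text. The default of UTF-8 is an assumption about the corpus.
    """
    # Passing the encoding explicitly means the result no longer depends
    # on the platform default (for example a Windows code page).
    # The with-block also closes the file if read() raises.
    with open(filename, "r", encoding=encoding) as f:
        return f.read().split(" ")
```

If some files are not valid UTF-8, this will still raise UnicodeDecodeError, but at least the behavior is then the same on every platform.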


UnicodeDecodeError #11
