JeffCarpenter commented Nov 24, 2021

Unlike pathlib, which decodes a file with one fixed encoding, BeautifulSoup can guess and handle several text encodings, so we let it work its magic.

Addresses issue #5
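
For readers following along, here is a minimal sketch of the idea described above, not the actual diff of this PR. It assumes the `bs4` package is installed and uses the built-in `html.parser`; the sample markup and the variable names are made up for illustration. The document is handed to BeautifulSoup as raw bytes, so BeautifulSoup does the decoding instead of pathlib.

```python
from bs4 import BeautifulSoup

# Hypothetical input: Latin-1 encoded HTML. The byte 0xf3 is "ó" in Latin-1
# and is not valid UTF-8 in this position.
raw = (
    b'<html><head><meta charset="iso-8859-1"></head>'
    b"<body><table><tr><td>Regi\xf3n</td></tr></table></body></html>"
)

# Passing bytes (for example from Path.read_bytes()) instead of an already
# decoded str lets BeautifulSoup pick the encoding, here from the <meta> tag.
soup = BeautifulSoup(raw, "html.parser")

cell_text = soup.td.get_text()
detected = soup.original_encoding  # the encoding BeautifulSoup settled on

print(cell_text, detected)
```

When the document declares no encoding, BeautifulSoup falls back to guessing, so the result is a best effort rather than a guarantee.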

fernandomora commented Apr 21, 2023

Any chance of getting this merged?
This PR solved my problem reading non-UTF-8 input:

Traceback (most recent call last):
  File "/opt/homebrew/bin/html2csv", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/html2csv/__main__.py", line 41, in main
    html_doc = path.read_text()
               ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/pathlib.py", line 1059, in read_text
    return f.read()
           ^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 532: invalid continuation byte
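
The failure above can be reproduced with the standard library alone. This is only a sketch: the real encoding of the input file is not known from the traceback, so Latin-1 is an assumption here (0xf3 is "ó" in both Latin-1 and Windows-1252), and the sample string is invented.

```python
# Hypothetical sample text; the actual file contents are unknown.
raw = "Región Sur".encode("latin-1")  # contains the byte 0xf3

# Decoding as UTF-8 fails at the 0xf3 byte, as read_text() did in the
# traceback when it decoded the file as UTF-8.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the assumed source encoding succeeds.
latin1_text = raw.decode("latin-1")

print(utf8_error)
print(latin1_text)
```

Reading the file as bytes and leaving the decoding to a component that can detect the encoding avoids hard-coding a guess like the one made in this sketch.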