`require-ascii` doesn’t do what it says on the tin

[According to the README](https://github.com/jumanjihouse/pre-commit-hooks#require-ascii):

> ### `require-ascii`
>
> **What it does**
>
> Requires that text files have ascii-encoding, including the
> [extended ascii set](https://theasciicode.com.ar/).
> This is useful to detect files that have unicode characters.

In practice, however, `require-ascii` fails on files that are encoded in extended ASCII if:

1. the file uses characters in the 128–255 range, and
2. those characters aren’t followed by other characters that coincidentally make the sequence valid UTF-8 (see [this table](https://en.wikipedia.org/wiki/UTF-8#Encoding)).
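
The two conditions above can be illustrated with a short sketch. It only uses Python's built-in codecs and does not call `require-ascii` itself; the helper name `is_valid_utf8` is made up for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Condition 1: a lone byte in the 128-255 range is never valid UTF-8 by itself.
lone_high_bytes_valid = [is_valid_utf8(bytes([b])) for b in range(128, 256)]

# Condition 2: the bytes 0xC3 0xA9 are two ordinary cp437 characters, but they
# also happen to form the UTF-8 encoding of U+00E9, so they decode either way.
coincidence = bytes([0xC3, 0xA9])
coincidence_as_cp437 = coincidence.decode("cp437")
coincidence_is_utf8 = is_valid_utf8(coincidence)
```

Here every entry of `lone_high_bytes_valid` is `False`, while `coincidence_is_utf8` is `True`.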

This script will generate a bunch of files that contain valid extended ASCII but fail when tested by `require-ascii`:

```python
# The README links to <https://theasciicode.com.ar/>. There are many different
# ways you could extend ASCII, but that site in particular says "In 1981,
# IBM developed an extension of 8-bit ASCII code, called 'code page 437'..."
extended_ascii = "cp437"

for code_point in range(128, 256):
	# Create a file that should pass require-ascii, but won't.
	with open(f"{code_point}.cp437.txt", mode='wb') as file:
		file.write(code_point.to_bytes(1, 'little'))
	# Make sure that that file really does contain valid extended ASCII.
	with open(f"{code_point}.cp437.txt", mode='rt', encoding=extended_ascii) as file:
		# This raises a UnicodeDecodeError if the file contains
		# invalid extended ASCII.
		file.read()
```

A more accurate description of `require-ascii` would be:

> ### `require-ascii`
>
> **What it does**
>
> Requires that text files use UTF-8 and only use code points ≤ 255.
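
As a rough model of that proposed wording, here is a minimal sketch. It is not the hook's actual implementation; the function name `passes_described_check` is hypothetical and simply encodes "valid UTF-8, and every code point ≤ 255".

```python
def passes_described_check(data: bytes) -> bool:
    """Model of the proposed description: UTF-8 input, code points <= 255 only."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all, e.g. a lone cp437 byte such as 0xE9.
        return False
    return all(ord(ch) <= 255 for ch in text)


plain_ascii_ok = passes_described_check(b"hello")
latin1_range_ok = passes_described_check("\u00e9".encode("utf-8"))  # U+00E9
euro_sign_ok = passes_described_check("\u20ac".encode("utf-8"))     # U+20AC
lone_cp437_byte_ok = passes_described_check(b"\xe9")
```

Under this model, plain ASCII and UTF-8 text limited to U+0000 through U+00FF pass, while higher code points and single-byte extended ASCII fail.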
