This repository provides a Python script for proofreading .docx documents using OpenAI models.
The script checks for spelling, grammar, and agreement errors in any language, splits large texts into manageable chunks, and saves a detailed error report in plain text format.
- Proofreads
.docxfiles for spelling, typo, grammar, and agreement errors using advanced OpenAI models. - Handles large documents by automatically splitting them into model-sized chunks.
- Error findings are clearly listed and explained in the output.
- Model selection is easy: choose your preferred model with a command-line flag.
- Each run produces a report with the model name, date, and source file included in the header and filename.
- Robust API error handling: Retries API calls on rate limits and network issues.
- Easy to use: Just specify your
.docxinput file, and receive a full report as a.txtfile.
-
Python 3.7 or higher
-
Required libraries:
openaipython-docxtiktoken
-
A valid OpenAI API key.
To install the requirements, run:
pip install openai python-docx tiktokenCreate a config.json file in the same directory as the script and add your OpenAI API key:
{
"OPENAI_API_KEY": "your-api-key-here"
}-
Place your input
.docxfile (e.g.,input.docx) in the script directory. -
Run the script from the command line. Basic usage:
python proofreader.py
Specify a different model or input/output file if you want:
python proofreader.py --model o3 --input myfile.docx
-
The script will create an output file with a name like:
proofreading_report_gpt-4o_20250702_1334.txtcontaining a detailed error report, the model name, the date, and the name of the original file.
- The report is saved as a plain text
.txtfile. - The file header shows the date, model name, and source file.
- Each section corresponds to a chunk of your document, with errors listed and explained.
- If no errors are found, the report will state so.
- To change which model is used by default, edit the
DEFAULT_MODELvariable at the top of the script. - To adjust chunk size or API retry behavior, modify the
CHUNK_TOKENSorRETRIESvariables. - You can further refine the proofreading prompt for other languages or criteria in the
SYSTEM_PROMPTvariable in the script.
python proofreader.py --model o3 --input document_to_check.docx --output my_report.txt