This repository contains a rigorous evaluation of available Czech off-the-shelf grammar error correction (GEC) tools on a subset of the test data of the GECCC corpus.
The quantitative findings will be published at TSD 2025 as *Refining Czech GEC: Insights from a Multi-Experiment Approach* (Pechman et al., 2025).
Scores by GECCC domain (NF = Natives Formal, NWI = Natives Web Informal, R = Romani, SL = Second Learners) and over all domains:

| System | NF | NWI | R | SL | All Domains |
|---|---|---|---|---|---|
| Opravidlo | 32.95 | 45.97 | 31.51 | 22.13 | 32.76 |
| Korektor | 36.90 | 24.66 | 48.86 | 54.66 | 44.71 |
| GoogleDocs | 39.56 | 29.03 | 52.23 | 47.13 | 45.45 |
| MSWord | 52.25 | 46.20 | 51.63 | 55.22 | 51.54 |
| DeepSeek R1 70B zero-shot (see disclaimer) | 36.06 | 52.34 | 58.46 | 58.11 | 53.58 |
| GPT4o zero-shot (see disclaimer) | 59.06 | 78.88 | 77.16 | 75.64 | 74.60 |
| Naplava2022_synthetic | 45.92 | 38.14 | 51.14 | 61.79 | 51.81 |
| Naplava2022_ag_finetuned | 66.45 | 55.02 | 74.39 | 71.81 | 69.82 |
| Naplava2022_geccc_finetuned | 73.15 | 70.95 | 77.17 | 74.64 | 74.68 |
| Ours | 70.82 | 82.15 | 77.45 | 75.38 | 77.34 |
- Clone this repository:
```sh
git clone https://github.com/strakova/gec_tool_comparison
```
- Download the GECCC corpus into the `GECCC` directory and unzip it:
```sh
mkdir GECCC
cd GECCC
curl --remote-name-all https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-4861{/geccc.zip}
unzip geccc.zip
```
- Install dependencies:
```sh
python3 -m venv venv
venv/bin/pip install -r requirements.txt
```
- Select the test sentences from GECCC for evaluation. With the current default values of the script, 10.36% of the test sentences are selected, and you should get exactly the same stats as in the file `stats.txt`:
```sh
venv/bin/python ./select_sentences_for_evaluation.py
```
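  The selection itself is done by the script above; purely for illustration, a reproducible selection of a fixed fraction of sentences could look like the following sketch (the file name, fraction argument, and seed are placeholders, not the actual defaults of `select_sentences_for_evaluation.py`):

```python
# Illustration only: a seeded selection of a fixed fraction of test sentences.
# The file name, fraction, and seed below are placeholders, not the defaults
# of select_sentences_for_evaluation.py.
import random

def select_sentences(path, fraction=0.1036, seed=42):
    """Return a reproducible subset of the sentences in `path`."""
    with open(path, encoding="utf-8") as f:
        sentences = [line.rstrip("\n") for line in f]
    rng = random.Random(seed)             # fixed seed => same subset every run
    k = round(len(sentences) * fraction)  # target number of sentences
    indices = sorted(rng.sample(range(len(sentences)), k))
    return [sentences[i] for i in indices]

if __name__ == "__main__":
    selected = select_sentences("GECCC/test_source.txt")  # placeholder path
    print(f"selected {len(selected)} sentences")
```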
- Upload/open the documents in the GEC tools of your choice, accept all the suggested GEC corrections, and save the results into `GECCC_corrections`. We used the following:
  - Opravidlo Betaverze, accessed 2024-11-14, postprocessed with `postprocess_googledocs_and_opravidlo.sh`,
  - Korektor, accessed 2024-11-19; you can reproduce the results by running `korektor.sh`,
  - Google Docs, accessed 2024-11-20, postprocessed with `postprocess_googledocs_and_opravidlo.sh`,
  - MSWord, accessed 2025-01-31, using the `final_vba.txt` macro to go through the data,
  - the open-source large language model (LLM) DeepSeek R1 70B, prompted in a zero-shot setting (see the data exposure disclaimer below); see `deepseek.py`,
  - the large language model (LLM) GPT4o, prompted in a zero-shot setting (see the data exposure disclaimer below), accessed 2025-05-02 (a minimal sketch of such a zero-shot call is shown after the disclaimer),
  - to get the predictions by Náplava et al. (2022), run the script `select_predictions.py --predictions=Naplava2022`. The script will copy the predictions corresponding to the selected test sentences from `Naplava2022` to `GECCC_corrections/Naplava2022_*`,
  - to get the predictions by our system, run the script `select_predictions.py --predictions=Ours`. The script will copy the predictions corresponding to the selected test sentences from `Ours` to `GECCC_corrections/Ours`.
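  The two `select_predictions.py` calls above copy the prediction lines that correspond to the selected test sentences. As a rough illustration of that kind of filtering (the file names and the index-file format are assumptions, not the script's actual interface):

```python
# Illustration only: copy the prediction lines whose sentence indices were
# selected for evaluation. File names and the index-file format are
# assumptions; the real logic lives in select_predictions.py.
from pathlib import Path

def copy_selected(pred_path, indices_path, out_path):
    predictions = Path(pred_path).read_text(encoding="utf-8").splitlines()
    # assumed format: one 0-based index of a selected test sentence per line
    indices = [int(line) for line in Path(indices_path).read_text().split()]
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(predictions[i] for i in indices) + "\n",
                   encoding="utf-8")

copy_selected("Naplava2022/test.pred",       # hypothetical predictions file
              "selected_indices.txt",        # hypothetical index list
              "GECCC_corrections/Naplava2022_synthetic/test.pred")
```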
Data exposure disclaimer: Since the Czech GEC training, development, and even test data have been freely available online since 2019 (AKCES-GEC) and 2022 (GECCC), and the training corpora of large language models (LLMs) are typically undisclosed, it is impossible to determine whether the evaluation setting is genuinely zero-shot, that is, to what extent the GECCC data may have been seen during pretraining. More concerningly, the test data itself may have been included in the LLMs’ training sets.
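For concreteness, a zero-shot correction request of the kind described above might look like the sketch below; the prompt wording, decoding settings, and single-sentence interface are assumptions, not the exact setup behind the reported numbers (see `deepseek.py` for the actual DeepSeek pipeline).

```python
# Illustrative zero-shot GEC call; the prompt wording and parameters are
# assumptions, not the exact configuration used for the reported results.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def correct_czech(sentence: str) -> str:
    """Ask the model to return only the corrected Czech sentence."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # keep the output as deterministic as possible
        messages=[
            {"role": "system",
             "content": "Correct the grammar and spelling errors in the following "
                        "Czech sentence. Return only the corrected sentence."},
            {"role": "user", "content": sentence},
        ],
    )
    return response.choices[0].message.content.strip()

print(correct_czech("Včera sme šli do kyna."))  # expected correction: "Včera jsme šli do kina."
```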
- Evaluate the system corrections with the m2scorer. The evaluations will be printed to `*.eval` files in the directory `GECCC_evals`:
```sh
./evaluate_corrections.sh
```
- Generate LaTeX table rows from the evaluations in `GECCC_evals`:
```sh
./make_table.py
```
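  As a rough illustration of how such a row can be assembled from m2scorer output (the per-domain file names are assumptions and the actual assembly is done by `make_table.py`; the `F_0.5` line is the standard summary printed by the m2scorer):

```python
# Illustrative sketch: turn one system's m2scorer outputs into a LaTeX table row.
# The file layout and domain suffixes are assumptions; see make_table.py for
# the actual logic.
import re
from pathlib import Path

def f05_from_eval(path):
    """Extract the F_0.5 value from a standard m2scorer output file."""
    match = re.search(r"F_0\.5\s*:\s*([\d.]+)", Path(path).read_text())
    return 100 * float(match.group(1))  # report as a percentage

def latex_row(system, eval_files):
    scores = [f"{f05_from_eval(p):.2f}" for p in eval_files]
    return f"{system} & " + " & ".join(scores) + r" \\"

# Hypothetical file names: one .eval file per domain plus the overall score.
files = [f"GECCC_evals/Ours_{d}.eval" for d in ("NF", "NWI", "R", "SL", "all")]
print(latex_row("Ours", files))
```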
Jana Straková, [email protected]