# Comparison of Czech Off-the-Shelf GEC Tools

This repository contains a rigorous evaluation of the available Czech off-the-shelf grammar error correction (GEC) tools on part of the test data of the GECCC corpus.

The quantitative findings will be published at TSD 2025 as *Refining Czech GEC: Insights from a Multi-Experiment Approach* (Pechman et al., 2025).

| System | NF | NWI | R | SL | All Domains |
|---|---:|---:|---:|---:|---:|
| Opravidlo | 32.95 | 45.97 | 31.51 | 22.13 | 32.76 |
| Korektor | 36.90 | 24.66 | 48.86 | 54.66 | 44.71 |
| Google Docs | 39.56 | 29.03 | 52.23 | 47.13 | 45.45 |
| MS Word | 52.25 | 46.20 | 51.63 | 55.22 | 51.54 |
| DeepSeek R1 70B zero-shot (see disclaimer) | 36.06 | 52.34 | 58.46 | 58.11 | 53.58 |
| GPT4o zero-shot (see disclaimer) | 59.06 | 78.88 | 77.16 | 75.64 | 74.60 |
| Naplava2022_synthetic | 45.92 | 38.14 | 51.14 | 61.79 | 51.81 |
| Naplava2022_ag_finetuned | 66.45 | 55.02 | 74.39 | 71.81 | 69.82 |
| Naplava2022_geccc_finetuned | 73.15 | 70.95 | 77.17 | 74.64 | 74.68 |
| Ours | 70.82 | 82.15 | 77.45 | 75.38 | 77.34 |

## How to Reproduce the Results

1. Clone this repository:

   ```sh
   git clone https://github.com/strakova/gec_tool_comparison
   ```
2. Download the GECCC corpus into the `GECCC` directory and unzip it:

   ```sh
   mkdir GECCC
   cd GECCC
   curl --remote-name-all https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-4861{/geccc.zip}
   unzip geccc.zip
   ```
3. Install the dependencies:

   ```sh
   python3 -m venv venv
   venv/bin/pip install -r requirements.txt
   ```
4. Select the test sentences from GECCC for evaluation. With the script's current default values, 10.36% of the test sentences are selected, and you should get exactly the same statistics as in `stats.txt`:

   ```sh
   venv/bin/python ./select_sentences_for_evaluation.py
   ```
5. Upload or open the documents in the GEC tools of your choice, accept all suggested corrections, and save the results into `GECCC_corrections`. We used the following:

   - Opravidlo Betaverze, accessed 2024-11-14, postprocessed with `postprocess_googledocs_and_opravidlo.sh`,
   - Korektor, accessed 2024-11-19; the results can be reproduced by running `korektor.sh`,
   - Google Docs, accessed 2024-11-20, postprocessed with `postprocess_googledocs_and_opravidlo.sh`,
   - MS Word, accessed 2025-01-31, using the `final_vba.txt` macro to go through the data,
   - the open-source large language model (LLM) DeepSeek R1 70B, prompted in a zero-shot setting (see the disclaimer below); see `deepseek.py`,
   - the large language model (LLM) GPT4o, prompted in a zero-shot setting (see the disclaimer below), accessed 2025-05-02,
   - to get the predictions of Náplava et al. (2022), run `select_predictions.py --predictions=Naplava2022`; the script copies the predictions corresponding to the selected test sentences from `Naplava2022` to `GECCC_corrections/Naplava2022_*`,
   - to get the predictions of our system, run `select_predictions.py --predictions=Ours`; the script copies the predictions corresponding to the selected test sentences from `Ours` to `GECCC_corrections/Ours`.

**Data exposure disclaimer:** Since the Czech GEC training, development, and even test data have been freely available online since 2019 (AKCES-GEC) and 2022 (GECCC), and the training corpora of large language models (LLMs) are typically undisclosed, it is impossible to determine whether the evaluation setting is genuinely zero-shot, that is, to what extent the GECCC data may have been seen during pretraining. More concerningly, the test data itself may have been included in the LLMs’ training sets.
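The exact prompts used for the LLMs live in this repository (see `deepseek.py`); the sketch below only illustrates what "zero-shot" means in this setting — the prompt carries an instruction and the input sentence, but no example (erroneous, corrected) pairs. The Czech instruction wording here is hypothetical, not the one actually used:

```python
def build_zero_shot_prompt(sentence: str) -> str:
    """Build a zero-shot GEC prompt for one Czech sentence.

    Zero-shot: only an instruction plus the input sentence,
    with no demonstration corrections included in the prompt.
    """
    # "Fix all grammatical and spelling errors in the following
    # Czech sentence and return only the corrected sentence."
    instruction = (
        "Oprav všechny gramatické a pravopisné chyby v následující "
        "české větě a vrať pouze opravenou větu."
    )
    return f"{instruction}\nVěta: {sentence}"
```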

6. Evaluate the system corrections with the m2scorer. The evaluations are written to `*.eval` files in the `GECCC_evals` directory:

   ```sh
   ./evaluate_corrections.sh
   ```
7. Generate the LaTeX table rows from the evaluations in `GECCC_evals`:

   ```sh
   ./make_table.py
   ```
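How step 4 arrives at its 10.36% is defined by `select_sentences_for_evaluation.py` itself; the snippet below is only a hypothetical illustration of why such a selection is reproducible — sampling with a fixed random seed returns the same subset on every run (the function name and seed are invented, not taken from the script):

```python
import random

def select_sentence_ids(n_total: int, fraction: float, seed: int = 42) -> list:
    """Pick a reproducible subset of sentence indices.

    Seeding the RNG makes every run return the same subset, which is
    what keeps selection statistics stable across machines.
    """
    rng = random.Random(seed)
    k = round(n_total * fraction)
    return sorted(rng.sample(range(n_total), k))
```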
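The m2scorer used in step 6 compares system edits against gold-standard edits and reports precision, recall, and an F-score; in GEC the customary score is F0.5, which weights precision twice as heavily as recall, since proposing a wrong correction is considered worse than missing one. Assuming the standard F_beta definition, for reference:

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 emphasizes precision over recall; beta = 0.5 is the
    usual choice for grammar error correction.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```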

## Contact

Jana Straková [email protected]
