CS433_Project_2_ChessReader

This repository contains the work we did for the CS-433 Project on handwritten chess-notation recognition, under the supervision of the NLP research group at Zurich University of Applied Sciences (ZHAW).

Since little annotated data was available, we first rebuilt and extended a synthetic data generation pipeline based on earlier student work. The idea was to produce a large and diverse dataset of handwritten chess moves by combining thousands of real moves with hundreds of handwriting-style fonts and a range of image transformations.

On top of this dataset, we trained and compared two models: a CNN-BiLSTM architecture adapted for handwriting recognition, and a ViT-based approach adjusted for variable-length chess notation.

More information in the report: CS433_Project2_Report.pdf

The repo includes the full data-generation code, the training and inference scripts for both models, and the configuration files needed to reproduce everything.

Data Generation Pipeline

This guide provides an overview of the data generation pipeline and instructions for using it effectively. All relevant code is located in the src/data_generation_pipeline folder.

Overview

The data generation pipeline is designed to generate data efficiently and save it to a specified folder. The main components of the pipeline include:

Main Script: data_generation.py (entry point for the pipeline)
Utility Scripts: image_setup.py, image_generation.py, image_processing.py (helper scripts for generating data)
Configuration Files: config_setup.json, config_process.json, config_characters.json (modular configurations for customization)
Jupyter Notebook: Data_Generation.ipynb (alternative data generation method for interactive use)

Getting Started

Prerequisites:
- Ensure you have Python installed (version >= 3.x).
- Install the required dependencies using:
```
pip install -r requirements.txt
```
- For the CNN-BiLSTM model, you will need to install the following dependencies:
```
pip install -r requirements_cnn-bilstm.txt
```
- For the ViT model, you will need to install the following dependencies:
Setup:
- Navigate to the src/data_generation_pipeline folder:
```
cd src/data_generation_pipeline
```
- Review and modify the configuration files as needed (see Configuration Files).

Running the Pipeline

To generate data using the main script:

Run the data_generation.py script:
```
python data_generation.py
```
The data will be generated and saved to the output folder specified in the configuration files.

Configuration Files

The pipeline uses the following configuration files for customization:

config_setup.json: Defines initial setup parameters for the data generation process.
config_process.json: Specifies processing settings, such as transformations or filters.
config_characters.json: Contains character-specific configurations for data generation.

To modify the configurations:

Open the respective JSON file in a text editor or IDE.
Update the values according to your requirements.
Save the changes before running the pipeline.

Alternative Method

If you prefer an interactive approach, use the provided Jupyter notebook:

Open Data_Generation.ipynb in Jupyter Notebook or Jupyter Lab.
```
jupyter notebook Data_Generation.ipynb
```
Follow the step-by-step instructions in the notebook to generate data.
The notebook provides a quick and flexible way to experiment with the pipeline.

CNN-BiLSTM Training and Inference

Training the model

You can configure some of the model parameters in the src/models/cnn-bilstm/configs.py file, including the dataset_path which should point to the location of the generated dataset.

To run the training script, make sure you are in the project root directory and run the following command:

python src/models/cnn-bilstm/train_torch.py

The model will be saved in the models/cnn-bilstm/ directory, in a folder named with the current timestamp. To use the model for inference you should put that model folder name as the MODEL_NAME variable in the src/models/cnn-bilstm/inferenceModel.py script.

Inference with the model

To run the inference script, make sure you are in the project root directory and run the following command:

python src/models/cnn-bilstm/inferenceModel.py

The training and inference scripts store results using wandb, if you do not have an account set up with wandb, you can comment out the wandb related code in the scripts.

ViT Training and Inference

Training/Test/Validation the model

Install environment_vit.yml using conda
Download pretrained .pth model and put into src/models/VIT/saved: https://drive.google.com/file/d/1Wp2YWZF2wsnmVqITLQI8F9tlhly8jEvn/view?usp=sharing
Use jupyter notebook: src/models/VIT/main_pretrained.ipynb

Note: run the notebook from the src/models/VIT directory (or add it to your PYTHONPATH) so the internal utils imports resolve correctly.

├── environment_vit.yml
├── models
│   └── cnn-bilstm
│       └── best
│           ├── best_model.onnx
│           ├── best_model.pt
│           ├── configs.yaml
│           ├── error_analysis
│           │   ├── error_analysis.csv
│           │   └── preds_and_labels_padded.csv
│           ├── logs
│           │   ├── test
│           │   │   └── events.out.tfevents.1733556571.i01.796664.1
│           │   └── train
│           │       └── events.out.tfevents.1733556571.i01.796664.0
│           ├── model.pt
│           ├── train.csv
│           └── val.csv
├── README.md
├── requirements_cnn-bilstm.txt
├── requirements.txt
├── src
│   ├── data_generation_pipeline
│   │   ├── all_moves_proba.txt
│   │   ├── config_characters.json
│   │   ├── config_process.json
│   │   ├── config_setup.json
│   │   ├── Data_Generation.ipynb
│   │   ├── data_generation.py
│   │   ├── fonts
│   │   │   ├── aAccountantSignature.ttf
│   │   │   ├── Yellowtail-Regular.ttf
│   │   │   └── ylee Mortal Heart, Immortal Memory v.1.11 (TTF).ttf
│   │   ├── image_generation.py
│   │   ├── image_processing.py
│   │   ├── image_setup.py
│   │   ├── Pipeline_Test.ipynb
│   ├── models
│   │   ├── cnn-bilstm
│   │   │   ├── configs.py
│   │   │   ├── ctc_beam_search_decoder.py
│   │   │   ├── error_analysis.py
│   │   │   ├── inferenceModel.py
│   │   │   ├── model_architecture.txt
│   │   │   ├── model.py
│   │   │   ├── train_job.run
│   │   │   └── train_torch.py
│   │   └── VIT
│   │       ├── architecture
│   │       │   ├── main_scratch.ipynb
│   │       │   ├── __pycache__
│   │       │   │   └── transformer.cpython-310.pyc
│   │       │   └── transformer.py
│   │       ├── logs
│   │       │   └── output_2024-12-19_0804.csv
│   │       ├── main_pretrained.ipynb
│   │       ├── saved
│   │       ├── utils
│   │       │   ├── config.py
│   │       │   ├── datasets.py
│   │       │   ├── model.py
│   │       │   ├── plotting.py
│   │       │   ├── __pycache__
│   │       │   │   ├── config.cpython-310.pyc
│   │       │   │   ├── datasets.cpython-310.pyc
│   │       │   │   ├── model.cpython-310.pyc
│   │       │   │   ├── plotting.cpython-310.pyc
│   │       │   │   └── tokeniser.cpython-310.pyc
│   │       │   └── tokeniser.py
│   │       └── visualisation.ipynb
│   └── utils
│       └── tests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CS433_Project_2_ChessReader

Data Generation Pipeline

Overview

Getting Started

Running the Pipeline

Configuration Files

Alternative Method

CNN-BiLSTM Training and Inference

Training the model

Inference with the model

ViT Training and Inference

Training/Test/Validation the model

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
models/cnn-bilstm/best		models/cnn-bilstm/best
src		src
CS433_Project2_Report.pdf		CS433_Project2_Report.pdf
README.md		README.md
environment_vit.yml		environment_vit.yml
requirements.txt		requirements.txt
requirements_cnn-bilstm.txt		requirements_cnn-bilstm.txt

vfiszbin/ChessReader

Folders and files

Latest commit

History

Repository files navigation

CS433_Project_2_ChessReader

Data Generation Pipeline

Overview

Getting Started

Running the Pipeline

Configuration Files

Alternative Method

CNN-BiLSTM Training and Inference

Training the model

Inference with the model

ViT Training and Inference

Training/Test/Validation the model

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages