This repository contains materials from a 5-hour WFST tutorial I delivered to LU Lab interns on 13 July 2025. The tutorial focuses on applying Weighted Finite-State Transducers (WFSTs) to core NLP tasks: word segmentation, POS tagging, and machine translation. Whether you are new to WFSTs or looking for practical implementations, the notebooks provide a working foundation for each of these tasks.
- Finite State Machines for NLP, UTYCC NLP Class, 2019, Ye Kyaw Thu, https://github.com/ye-kyaw-thu/wfst_nlp_tutorials/blob/main/slide/11-fsm4nlp.pdf
- Invited Lecture@University of Taxila (30 July 2025), https://github.com/ye-kyaw-thu/wfst_nlp_tutorials/blob/main/slide/WFST_Talk_at_Taxila_Univ_Myanmar.pdf
1.WFST_Word_Segmentation_Small_Corpus.ipynb
2.WFST_POS_Tagging_Small_Corpus.ipynb
3.WFST_MT_Small_Corpus.ipynb
4.WFST_Word_Segmentation.ipynb
5.WFST_POS_Tagging.ipynb
6.WFST_MT.ipynb
The Bash shell scripts, Python code, and Jupyter notebooks in this WFST-NLP tutorial repository are licensed under the MIT License.
However, the datasets used in these tutorials follow their original licenses:
- Word Segmentation & POS Tagging Tutorial: Uses myPOS (Version 3.0), licensed as per its original source.
- Machine Translation Tutorial: Uses the Myanmar-Rakhine Parallel Corpus (full version not yet publicly released).
- For research purposes, I’ve included aligned phrase pairs (Myanmar-Rakhine) generated using the anymalign alignment toolkit.
If you use these Jupyter notebooks for teaching or R&D, please cite them as follows. Thank you.
@misc{wfst_nlp_tutorials_2025,
author = {Ye Kyaw Thu},
title = {wfst_nlp_tutorials},
month = {7},
year = {2025},
url = {https://github.com/ye-kyaw-thu/wfst_nlp_tutorials},
note = {Accessed Date: yyyy-mm-dd},
institution = {LU Lab., Myanmar}
}