comanchegenerate/ComancheNLP


ComancheNLP

Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language

Authors: Jesus Alvarez C, Daua Karajeanes, Ashley Prado, John Ruttan, Ivory Yang, Sean O’Brien, Vasu Sharma, Kevin Zhu

Explore how we accelerate Comanche NLP by combining synthetic text pipelines and language ID to overcome data scarcity in endangered languages.

🔗 Read the full paper (AmericasNLP 2025)


🚀 Clone the Repo

git clone https://github.com/comanchegenerate/ComancheSynthetic.git
cd ComancheSynthetic

📂 What’s Inside

  • Datasets/: A 412-phrase Comanche-English corpus, the first for this language.
  • comanche_synthetic_generation.py: Generates validated synthetic Comanche text via GPT-4 few-shot prompting.
  • language_identification.ipynb: Language identification experiments showing that few-shot examples increase accuracy.
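
To make the few-shot idea concrete, here is a minimal sketch of how a few-shot language identification prompt could be assembled. It is not the code from this repository: the function name `build_few_shot_prompt`, the prompt wording, and the placeholder phrases are all assumptions for illustration, and the resulting string would still have to be sent to a model of your choice.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot language identification prompt (hypothetical helper).

    `examples` holds (text, language_label) pairs shown to the model;
    `query` is the text whose language should be identified.
    """
    lines = ["Identify the language of each text."]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Language: {label}")
    # The final entry is left unlabeled so the model completes it.
    lines.append(f"Text: {query}")
    lines.append("Language:")
    return "\n".join(lines)


# Placeholder strings stand in for real corpus entries (assumption:
# in practice these would be drawn from the files under Datasets/).
examples = [
    ("<Comanche phrase from the corpus>", "Comanche"),
    ("Good morning, how are you?", "English"),
]
prompt = build_few_shot_prompt(examples, "<text to classify>")
print(prompt)
```

Varying the number of pairs passed in `examples` is one simple way to compare zero-shot and few-shot behavior.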

🤝 Contributing

Feedback and pull requests welcome!

About

The first computational modelling for the critically endangered language Comanche
