turkish-syllable is a library for syllabifying Turkish text, written in C and accessible from Python through bindings. It is fast, produces results that follow Turkish syllabification rules, and can optionally include punctuation in its output.
Important Note: This library separates words of Turkish origin into syllables according to the rules of the Turkish Language Association (TDK), but it does not provide a definitive solution for words of foreign origin. Such words are often syllabified correctly, but errors can occur because their structure differs from that of native Turkish words.
- Turkish Syllabification: Works according to the syllabification rules specific to the Turkish language (for example, “merhaba” → ['mer', 'ha', 'ba']).
- Punctuation Support: Optionally adds punctuation marks and spaces to the syllable list (with_punctuation parameter).
- Fast Performance: The C-based algorithm provides fast results even for large texts.
- Platform Compatibility: The library is platform independent as of version 0.2.0.
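As a small illustration of what the punctuation option makes possible, the sketch below regroups a flat syllable list into per-word lists. The helper `group_into_words` is hypothetical and not part of the library; the token list is copied from the sample output shown in this README, so the snippet runs without the package installed.

```python
def group_into_words(tokens):
    """Group a flat syllable list into one syllable list per word.

    Hypothetical helper: any token that is not purely alphabetic
    (punctuation, spaces) is treated as a word boundary and dropped.
    """
    words, current = [], []
    for token in tokens:
        if token.isalpha():
            current.append(token)
        elif current:
            words.append(current)
            current = []
    if current:
        words.append(current)
    return words


# Sample output of syllabify("Merhaba, dünya!") with punctuation included.
tokens = ['Mer', 'ha', 'ba', ',', ' ', 'dün', 'ya', '!']
words = group_into_words(tokens)
print(words)                    # [['Mer', 'ha', 'ba'], ['dün', 'ya']]
print([len(w) for w in words])  # [3, 2]
```

Without the punctuation and space tokens the word boundaries are lost, so this kind of regrouping only works on output produced with with_punctuation=True.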
You can install it via PyPI:
pip install turkish-syllable

from turkish_syllable import syllabify
# with punctuation
result = syllabify("Merhaba, dünya!") # default value of with_punctuation is True
print(result)
# output: ['Mer', 'ha', 'ba', ',', ' ', 'dün', 'ya', '!']
# without punctuation
result = syllabify("Merhaba, dünya!", with_punctuation=False)
print(result)
# output: ['Mer', 'ha', 'ba', 'dün', 'ya']

Or process a file directly:
from turkish_syllable.csyllable_tr import process_input_output
input_file = "input.txt"
output_file = "output.txt"
"""
function:
- process_input_output: syllabifies the contents of a file
parameters:
- input_file: path of the file containing the text to syllabify
- output_file: path of the file the syllabified text is written to
- with_punctuation: whether punctuation and space characters are included in the output (default=True)
"""
process_input_output(input_file=input_file, output_file=output_file, with_punctuation=True)
with open(output_file, "r", encoding="utf-8") as f:
    print("With punctuation:")
    print(f.read())
process_input_output(input_file=input_file, output_file=output_file, with_punctuation=False)
with open(output_file, "r", encoding="utf-8") as f:
    print("\nWithout punctuation:")
    print(f.read())

From the command line:
# with punctuation (default)
python3 -m turkish_syllable -i input.txt -o output.txt -p
# or enter the text directly:
python3 -m turkish_syllable -p
# sample input: "Merhaba, dünya!"
# output: Mer ha ba , dün ya !
# without punctuation
python3 -m turkish_syllable -i input.txt -o output.txt --no-punctuation
# or:
python3 -m turkish_syllable --no-punctuation
# sample input: "Merhaba, dünya!"
# output: Mer ha ba dün ya

- Language: The algorithm is written in C and linked to Python with ctypes.
- Syllabification Algorithm: Splits words at the natural boundaries between vowels and consonants, following Turkish syllabification rules. Special cases (for example, words with 3 or 4 letters) are optimized.
- Dependencies: No extra Python dependencies are required, only standard libraries are used.
- File Structure:
  - syllable.c: C source code containing the syllabification logic.
  - libsyllable.so: Compiled shared library (Linux, manylinux).
  - libsyllable.dll: Compiled shared library (Windows).
  - libsyllable.dylib: Compiled shared library (macOS).
  - csyllable_en.py: Python wrapper.
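To make the vowel and consonant rule mentioned above more concrete, here is a minimal pure-Python sketch of the textbook Turkish rule: every syllable contains exactly one vowel, and when consonants sit between two vowels, the last of them opens the next syllable. This is an illustration only. It is not the library's C implementation, it handles single words without punctuation, and the name `syllabify_sketch` is hypothetical.

```python
# Turkish vowels, including circumflexed forms, in both cases.
VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def syllabify_sketch(word):
    """Split one word into syllables using the textbook rule (sketch only)."""
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowel_positions:
        # No vowel: nothing to split on, return the word unchanged.
        return [word] if word else []

    syllables = []
    start = 0
    for current, following in zip(vowel_positions, vowel_positions[1:]):
        if following - current == 1:
            # Two adjacent vowels: break between them (sa-at).
            boundary = following
        else:
            # One or more consonants in between: the last consonant
            # starts the next syllable (mer-ha-ba, Türk-çe).
            boundary = following - 1
        syllables.append(word[start:boundary])
        start = boundary
    syllables.append(word[start:])
    return syllables


print(syllabify_sketch("merhaba"))  # ['mer', 'ha', 'ba']
print(syllabify_sketch("dünya"))    # ['dün', 'ya']
print(syllabify_sketch("Türkçe"))   # ['Türk', 'çe']
print(syllabify_sketch("saat"))     # ['sa', 'at']
```

Words of foreign origin with unusual consonant clusters are exactly where a simple rule like this can go wrong, which matches the caveat in the important note at the top.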
- Python 3.6 or higher
- It can run on all operating systems.
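One plausible way a ctypes-based package can support several operating systems is to pick the matching compiled library at import time. The sketch below assumes that approach; the file names come from the file structure list above, while `library_name`, `load_library`, and the lookup logic are hypothetical and may differ from what the package actually does.

```python
import ctypes
import platform
from pathlib import Path

# Library file names as listed in this README, keyed by platform.system().
LIBRARY_NAMES = {
    "Linux": "libsyllable.so",
    "Windows": "libsyllable.dll",
    "Darwin": "libsyllable.dylib",
}


def library_name(system=None):
    """Return the shared library file name for the given (or current) OS."""
    system = system or platform.system()
    try:
        return LIBRARY_NAMES[system]
    except KeyError:
        raise OSError(f"no prebuilt libsyllable for platform {system!r}") from None


def load_library(directory):
    """Load the shared library found in `directory` (hypothetical helper)."""
    return ctypes.CDLL(str(Path(directory) / library_name()))


print(library_name("Linux"))   # libsyllable.so
print(library_name("Darwin"))  # libsyllable.dylib
```

`load_library` is defined but not called here, since it needs the compiled library to be present on disk.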
This project is distributed under the MIT License.
If you want to contribute:
- Fork the repository on GitHub.
- Make your changes and send a pull request.
For questions or suggestions: [email protected]
- 0.2.1: Platform independence, README improved.
- 0.1.4: README improved.
- 0.1.3: README improved and some bugs fixed.
- 0.1.2: Fixed some bugs.
- 0.1.1: Added with_punctuation parameter, shortened function name to syllabify.
- 0.1.0: Initial release.