- `cat` together a corpus into a single file (e.g., `copcorp.txt`).
- Edit the first line of `freq.sh` to contain the Unicode characters that you need (unicode-table.com is good for this).
- Run `freq.sh`. This will take a while and use a lot of CPU.
- Make sure there aren't any errors. Because the run takes a while, `freq.sh` generates a file at each step for error checking.
- Run `parser.pl` on the last output of `freq.sh`. This normalizes the frequency numbers to get them ready for ASK.

`parser.pl` is taken from this repository, licensed under Apache 2.0!
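
The first step (concatenating the corpus) could look roughly like the sketch below. It is only an illustration: `build_corpus` is a hypothetical helper, not part of this project, and it assumes the corpus is a directory of plain `.txt` files.

```shell
# Hypothetical helper (not part of this project): concatenate every .txt
# file under a directory into one output file.
# Usage: build_corpus <corpus_dir> <output_file>
build_corpus() {
    corpus_dir=$1
    out_file=$2

    # Truncate the output first so reruns do not append duplicate text.
    : > "$out_file" || return 1

    # Sorted order keeps the result reproducible between runs.
    find "$corpus_dir" -type f -name '*.txt' | sort | while IFS= read -r f; do
        cat "$f" >> "$out_file"
    done

    # Fail early if nothing was collected, before starting the long freq.sh run.
    [ -s "$out_file" ]
}
```

For example, `build_corpus corpus copcorp.txt` would write `copcorp.txt` in the current directory. Keep the output file outside the corpus directory so it is not picked up as one of its own inputs.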