String compression
This page is not up to date.
Let's say I want to store the strings ["Some", "Random", "Strings"]. There are three methods of compression:
Method 1: The strings are concatenated in strings without a separator character, and the length of each string is stored in a separate array.
lengths = [4, 6, 7]
strings = ["SomeRandomStrings"]
The decompressor then iterates over the lengths array, slicing strings[0] by each length in turn, and moves to strings[1] once the total length of the strings decompressed so far exceeds 128 characters.
This is usually the least effective of the three methods; consider method 3 instead.
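The length-array approach above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: the function names `pack_with_lengths` and `unpack_with_lengths` are hypothetical, and the sketch assumes everything fits in strings[0], so the move to strings[1] is left out.

```python
def pack_with_lengths(items):
    """Concatenate the items and record each item's length (hypothetical helper)."""
    lengths = [len(s) for s in items]
    return lengths, ["".join(items)]


def unpack_with_lengths(lengths, strings):
    """Slice strings[0] back into the original items, one length at a time."""
    blob = strings[0]  # assumption: a single chunk holds all the data
    items = []
    pos = 0
    for n in lengths:
        items.append(blob[pos:pos + n])
        pos += n
    return items


lengths, strings = pack_with_lengths(["Some", "Random", "Strings"])
restored = unpack_with_lengths(lengths, strings)
```

Here `lengths` and `strings` match the two arrays shown above, and `restored` is the original list.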
Method 2: The strings are concatenated in strings, this time using a separator character. Here it is ";", but you can use any Unicode character, even an unassigned one (as long as it is valid), provided it never appears inside the strings themselves.
strings = ["Some;Random;Strings"]
The decompressor then iterates through strings[0] character by character, appending each character to the current decompressed string, and starts the next string when it encounters the separator character.
This method is the most space-efficient, but it is much slower than method 3 because the decompressor must iterate over every character.
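A minimal Python sketch of the separator approach, written character by character to mirror the description above. The names `SEPARATOR`, `pack_with_separator` and `unpack_with_separator` are hypothetical, and the sketch assumes the separator never occurs inside an item.

```python
SEPARATOR = ";"  # any character that never appears in the items


def pack_with_separator(items, sep=SEPARATOR):
    """Join the items with the separator (hypothetical helper)."""
    if any(sep in s for s in items):
        raise ValueError("separator occurs inside an item")
    return [sep.join(items)]


def unpack_with_separator(strings, sep=SEPARATOR):
    """Walk strings[0] one character at a time, cutting at each separator."""
    items = []
    current = ""
    for ch in strings[0]:
        if ch == sep:
            items.append(current)
            current = ""
        else:
            current += ch
    items.append(current)  # the last item has no trailing separator
    return items


packed = pack_with_separator(["Some", "Random", "Strings"])
unpacked = unpack_with_separator(packed)
```

The per-character loop is what makes this approach slow compared with fixed-size slicing.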
Method 3: Here, the length of the longest string is 7. We can pad the shorter strings with zero-width spaces (written \z below) so that they are all 7 characters long, then decode as in method 1 but with a constant length.
length = 7
strings = ["Some\z\z\zRandom\zStrings"]
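The fixed-width approach might look like this in Python. It is only a sketch under stated assumptions: `PAD`, `pack_padded` and `unpack_padded` are hypothetical names, the item list is non-empty, and no original string ends with the padding character.

```python
PAD = "\u200b"  # zero-width space, shown as \z in the example above


def pack_padded(items, pad=PAD):
    """Pad every item to the length of the longest one, then concatenate."""
    width = max(len(s) for s in items)  # assumption: items is not empty
    return width, ["".join(s.ljust(width, pad) for s in items)]


def unpack_padded(width, strings, pad=PAD):
    """Cut strings[0] into fixed-width slices and drop the trailing padding."""
    blob = strings[0]
    return [blob[i:i + width].rstrip(pad) for i in range(0, len(blob), width)]


length, padded = pack_padded(["Some", "Random", "Strings"])
decoded = unpack_padded(length, padded)
```

Because every slice has the same width, the decoder needs neither a lengths array nor a scan for separators.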
However, a few strings may be much longer than the rest (for example, one string of 100 characters while all the others are under 10 characters), which would force a lot of padding. One way to handle this is to split the long string into several 10-character strings, then concatenate the pieces again after decoding.
There may also be problems with the string byte limit, which is 511 characters. If so, you can use \u0001 in place of the zero-width space as the padding character, which takes two fewer bytes.
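Splitting an oversized string into fixed-size pieces can be sketched as below. `split_fixed` is a hypothetical helper, and the sketch deliberately leaves out how the decoder learns which pieces belong together, since the text does not specify it.

```python
def split_fixed(text, width=10):
    """Cut text into consecutive pieces of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


long_string = "x" * 100           # stand-in for the one very long string
pieces = split_fixed(long_string)  # stored like any other 10-character entries
rejoined = "".join(pieces)         # concatenated again after decoding
```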
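To see the effect of the padding character on size, the sketch below counts bytes for the example strings. It assumes the byte limit is measured on UTF-8 encoded text, which the page does not state; `padded_byte_size` is a hypothetical helper.

```python
ZWSP = "\u200b"  # zero-width space: 3 bytes in UTF-8
SOH = "\u0001"   # control character: 1 byte in UTF-8


def padded_byte_size(items, pad):
    """UTF-8 size of the fixed-width encoding of items using `pad`."""
    width = max(len(s) for s in items)
    blob = "".join(s.ljust(width, pad) for s in items)
    return len(blob.encode("utf-8"))


items = ["Some", "Random", "Strings"]
size_zwsp = padded_byte_size(items, ZWSP)  # 17 text bytes + 4 pads * 3 bytes
size_soh = padded_byte_size(items, SOH)    # 17 text bytes + 4 pads * 1 byte
```

Under that assumption each padding character costs two bytes less with \u0001, so the saving grows with the amount of padding.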
This method will likely be the best of all 3 for compressing plaintext strings.
(Old wiki. Not up to date. See Readme instead)