On Fri, Oct 03, 2014 at 04:54:55PM -0500, Ben Myers wrote:
> From: Olaf Weber <olaf@xxxxxxx>
>
> Remove the Hangul decompositions from the utf8data trie, and do
> algorithmic decomposition to calculate them on the fly. To store
> the decomposition the caller of utf8lookup()/utf8nlookup() must
> provide a 12-byte buffer, which is used to synthesize a leaf with
> the decomposition. Trie size is reduced from 245kB to 90kB.
>
> This change also contains a number of robustness fixes to the
> trie generator mkutf8data.c.
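The algorithmic decomposition described above can be done with the Hangul syllable arithmetic from the Unicode Standard (section 3.12, Conjoining Jamo Behavior). What follows is a minimal sketch, not the patch's code. hangul_decompose() and encode3() are hypothetical names, and the output format (plain NUL-terminated UTF-8) is an assumption, because the layout of the synthesized leaf is not shown here. Each precomposed syllable decomposes into two or three conjoining jamo in U+1100..U+11FF. Each of those takes 3 bytes in UTF-8, so the decomposition text needs at most 9 bytes plus a terminator, which fits in the 12-byte buffer the patch requires.

```c
#include <assert.h>

/* Constants from the Unicode Standard, section 3.12. */
#define SBASE  0xAC00              /* first precomposed syllable */
#define LBASE  0x1100              /* first leading consonant */
#define VBASE  0x1161              /* first vowel */
#define TBASE  0x11A7              /* one before the first trailing consonant */
#define LCOUNT 19
#define VCOUNT 21
#define TCOUNT 28
#define NCOUNT (VCOUNT * TCOUNT)   /* 588 syllables per leading consonant */
#define SCOUNT (LCOUNT * NCOUNT)   /* 11172 precomposed syllables */

/* Hypothetical helper: encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes. */
static int encode3(unsigned char *s, unsigned int cp)
{
	assert(cp >= 0x800 && cp <= 0xFFFF);
	s[0] = 0xE0 | (cp >> 12);
	s[1] = 0x80 | ((cp >> 6) & 0x3F);
	s[2] = 0x80 | (cp & 0x3F);
	return 3;
}

/*
 * Hypothetical helper: write the NUL-terminated UTF-8 decomposition of
 * syllable sc into buf (at least 10 bytes). Returns the number of jamo
 * written (2 or 3), or 0 if sc is not a precomposed Hangul syllable.
 */
static int hangul_decompose(unsigned int sc, unsigned char *buf)
{
	unsigned int si, l, v, t;
	int len = 0, njamo = 2;

	if (sc < SBASE || sc >= SBASE + SCOUNT)
		return 0;

	si = sc - SBASE;
	l = LBASE + si / NCOUNT;
	v = VBASE + (si % NCOUNT) / TCOUNT;
	t = TBASE + si % TCOUNT;

	len += encode3(buf + len, l);
	len += encode3(buf + len, v);
	if (t != TBASE) {           /* TBASE means no trailing consonant */
		len += encode3(buf + len, t);
		njamo = 3;
	}
	buf[len] = '\0';
	return njamo;
}
```

For example, U+D55C decomposes to U+1112 U+1161 U+11AB. U+AC00 has no trailing consonant, so it yields only U+1100 U+1161.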
Please separate out the robustness fixes, or merge them back into the
original patch. For example, a bulk renaming of code like this:
> static int
> -utf8key(unsigned int key, char keyval[])
> -{
> - int keylen;
> -
> - if (key < 0x80) {
> - keyval[0] = key;
> - keylen = 1;
> - } else if (key < 0x800) {
> - keyval[1] = key & UTF8_V_MASK;
> - keyval[1] |= UTF8_N_BITS;
> - key >>= UTF8_V_SHIFT;
....
> +utf8encode(char *str, unsigned int val)
> +{
> + int len;
> +
> + if (val < 0x80) {
> + str[0] = val;
> + len = 1;
> + } else if (val < 0x800) {
> + str[1] = val & UTF8_V_MASK;
> + str[1] |= UTF8_N_BITS;
> + val >>= UTF8_V_SHIFT;
doesn't belong in a patch that introduces special Hangul character
handling.
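The quoted hunk suggests that utf8key() and utf8encode() perform the same UTF-8 encoding under different names. For reference, here is a standalone sketch of that encoding, not the patch's code. The name utf8encode_sketch() is hypothetical. The values of UTF8_V_MASK, UTF8_N_BITS and UTF8_V_SHIFT are assumptions (the standard UTF-8 continuation-byte layout), because their definitions are not quoted.

```c
#include <assert.h>

/* Assumed values; the patch's own definitions are not shown in the quote. */
#define UTF8_V_MASK  0x3F  /* payload bits of a continuation byte */
#define UTF8_N_BITS  0x80  /* continuation-byte marker, 10xxxxxx */
#define UTF8_V_SHIFT 6     /* payload bits per continuation byte */

/*
 * Hypothetical sketch: encode val (at most 0x10FFFF) as UTF-8 into str,
 * which must have room for 4 bytes. Returns the number of bytes written;
 * no NUL terminator is added.
 */
static int
utf8encode_sketch(unsigned char *str, unsigned int val)
{
	int len;

	assert(val <= 0x10FFFF);
	if (val < 0x80) {
		str[0] = val;       /* ASCII is a single byte */
		return 1;
	} else if (val < 0x800) {
		len = 2;
		str[0] = 0xC0;      /* lead byte 110xxxxx */
	} else if (val < 0x10000) {
		len = 3;
		str[0] = 0xE0;      /* lead byte 1110xxxx */
	} else {
		len = 4;
		str[0] = 0xF0;      /* lead byte 11110xxx */
	}

	/* Fill continuation bytes from the last one backwards. */
	for (int i = len - 1; i > 0; i--) {
		str[i] = (val & UTF8_V_MASK) | UTF8_N_BITS;
		val >>= UTF8_V_SHIFT;
	}
	str[0] |= val;              /* remaining high bits go in the lead byte */
	return len;
}
```

Because it writes continuation bytes from the end, one loop serves every length. The original spells each case out, which yields the same bytes.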
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx