[PATCH 04/16] lib/utf8norm.c: reduce the size of utf8data[]

Dave Chinner david at fromorbit.com
Sun Oct 5 16:52:44 CDT 2014


On Fri, Oct 03, 2014 at 04:54:55PM -0500, Ben Myers wrote:
> From: Olaf Weber <olaf at sgi.com>
> 
> Remove the Hangul decompositions from the utf8data trie, and do
> algorithmic decomposition to calculate them on the fly. To store
> the decomposition the caller of utf8lookup()/utf8nlookup() must
> provide a 12-byte buffer, which is used to synthesize a leaf with
> the decomposition. Trie size is reduced from 245kB to 90kB.
> 
> This change also contains a number of robustness fixes to the
> trie generator mkutf8data.c.
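
For readers following along, the algorithmic decomposition mentioned in
the commit message can be sketched roughly as below. This is the
standard arithmetic from the Unicode Standard working on code points,
not the code from the patch: the name hangul_decompose() and the
HANGUL_* macros are made up for illustration, and the patch itself
synthesizes a UTF-8 leaf in the caller's buffer rather than returning
code points.

```c
/* Constants from the Unicode Standard's Hangul syllable arithmetic. */
#define HANGUL_SBASE	0xAC00	/* first precomposed syllable */
#define HANGUL_LBASE	0x1100	/* first leading consonant jamo */
#define HANGUL_VBASE	0x1161	/* first vowel jamo */
#define HANGUL_TBASE	0x11A7	/* one before first trailing jamo */
#define HANGUL_LCOUNT	19
#define HANGUL_VCOUNT	21
#define HANGUL_TCOUNT	28
#define HANGUL_NCOUNT	(HANGUL_VCOUNT * HANGUL_TCOUNT)	/* 588 */
#define HANGUL_SCOUNT	(HANGUL_LCOUNT * HANGUL_NCOUNT)	/* 11172 */

/*
 * Hypothetical helper (not from the patch): decompose a precomposed
 * Hangul syllable into its jamo code points.
 *
 * Returns the number of code points written to out[] (2 or 3), or 0
 * if cp is not a precomposed Hangul syllable.
 */
static int
hangul_decompose(unsigned int cp, unsigned int out[3])
{
	unsigned int si;

	if (cp < HANGUL_SBASE || cp >= HANGUL_SBASE + HANGUL_SCOUNT)
		return 0;

	si = cp - HANGUL_SBASE;
	out[0] = HANGUL_LBASE + si / HANGUL_NCOUNT;
	out[1] = HANGUL_VBASE + (si % HANGUL_NCOUNT) / HANGUL_TCOUNT;

	/* A trailing index of 0 means the syllable has no final consonant. */
	if (si % HANGUL_TCOUNT == 0)
		return 2;

	out[2] = HANGUL_TBASE + si % HANGUL_TCOUNT;
	return 3;
}
```

Because all 11172 syllables follow this arithmetic, none of their
decompositions need to be stored in the trie, which is presumably where
most of the size reduction comes from.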

Please separate out the robustness fixes, or merge them back into the
original patch. For example, bulk renaming of code like this:


>  static int
> -utf8key(unsigned int key, char keyval[])
> -{
> -	int keylen;
> -
> -	if (key < 0x80) {
> -		keyval[0] = key;
> -		keylen = 1;
> -	} else if (key < 0x800) {
> -		keyval[1] = key & UTF8_V_MASK;
> -		keyval[1] |= UTF8_N_BITS;
> -		key >>= UTF8_V_SHIFT;
....
> +utf8encode(char *str, unsigned int val)
> +{
> +	int len;
> +
> +	if (val < 0x80) {
> +		str[0] = val;
> +		len = 1;
> +	} else if (val < 0x800) {
> +		str[1] = val & UTF8_V_MASK;
> +		str[1] |= UTF8_N_BITS;
> +		val >>= UTF8_V_SHIFT;

doesn't belong in a patch that introduces special Hangul character
handling.
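
For context, since the hunk above is trimmed: both the old and the new
name cover a plain code point to UTF-8 encoder. A minimal standalone
sketch of such an encoder follows. It is not the code from
mkutf8data.c; the name sketch_utf8encode() is hypothetical, the bit
patterns are written out as literals instead of the UTF8_* macros, and
rejecting surrogates and values above U+10FFFF is a choice made for
this sketch.

```c
/*
 * Hypothetical sketch (not the patch's utf8encode()): encode one
 * code point as UTF-8 into str, which must have room for 4 bytes.
 *
 * Returns the number of bytes written, or 0 if val is a surrogate
 * or lies beyond U+10FFFF.
 */
static int
sketch_utf8encode(char *str, unsigned int val)
{
	if (val < 0x80) {
		/* 0xxxxxxx */
		str[0] = (char)val;
		return 1;
	}
	if (val < 0x800) {
		/* 110xxxxx 10xxxxxx */
		str[0] = (char)(0xC0 | (val >> 6));
		str[1] = (char)(0x80 | (val & 0x3F));
		return 2;
	}
	if (val < 0x10000) {
		/* Surrogates are not valid scalar values. */
		if (val >= 0xD800 && val <= 0xDFFF)
			return 0;
		/* 1110xxxx 10xxxxxx 10xxxxxx */
		str[0] = (char)(0xE0 | (val >> 12));
		str[1] = (char)(0x80 | ((val >> 6) & 0x3F));
		str[2] = (char)(0x80 | (val & 0x3F));
		return 3;
	}
	if (val < 0x110000) {
		/* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
		str[0] = (char)(0xF0 | (val >> 18));
		str[1] = (char)(0x80 | ((val >> 12) & 0x3F));
		str[2] = (char)(0x80 | ((val >> 6) & 0x3F));
		str[3] = (char)(0x80 | (val & 0x3F));
		return 4;
	}
	return 0;
}
```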

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com


