
Re: [RFC v2] Unicode/UTF-8 support for XFS

To: Andi Kleen <andi@xxxxxxxxxxxxxx>
Subject: Re: [RFC v2] Unicode/UTF-8 support for XFS
From: Olaf Weber <olaf@xxxxxxx>
Date: Fri, 26 Sep 2014 16:06:22 +0200
Cc: Ben Myers <bpm@xxxxxxx>, <linux-fsdevel@xxxxxxxxxxxxxxx>, <tinguely@xxxxxxx>, <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <5422A5F8.5040703@xxxxxxx>
Organization: SGI
References: <20140918195650.GI19952@xxxxxxx> <87lhpbhfgg.fsf@xxxxxxxxxxxxxxxxxxxx> <20140922184145.GH4482@xxxxxxx> <20140922192958.GJ4120@xxxxxxxxxxxxxxxxxx> <54219C17.3090104@xxxxxxx> <20140923201540.GB15923@xxxxxxxxxxxxxxxxxx> <5422A5F8.5040703@xxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2
On 24-09-14 13:07, Olaf Weber wrote:
> On 23-09-14 22:15, Andi Kleen wrote:
>
>>> A big part of the table does decompositions for Korean: eliminating
>>> the Hangul decompositions removes 156320 bytes, leaving 89936 bytes.
>>
>> Are there regular ranges or other redundancies in the Korean encoding
>> that could be used to compress paths?
>
> Yes, though at the expense of more complicated code and interfaces. In
> particular, lookups that want a normalized string would need to provide a
> 10-byte buffer to store it in.

I spent some time working on this, and the effect on the lookup code isn't as bad as I'd thought. The updated code should be posted early next week.
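The regularity that lets the Hangul entries be dropped is that precomposed syllables decompose arithmetically (Unicode Standard, section 3.12, "Conjoining Jamo Behavior"). A syllable splits into at most three jamo (leading consonant, vowel, optional trailing consonant), each in U+1100..U+11FF and therefore 3 bytes in UTF-8: 9 bytes plus a terminating NUL, which is where a 10-byte buffer comes from. The following is a minimal sketch in C, assuming the decomposition is computed on the fly instead of being stored in the trie; `hangul_decompose` and `utf8_put3` are hypothetical names, not taken from the posted patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hangul constants from the Unicode Standard, section 3.12. */
#define SBASE  0xAC00u
#define LBASE  0x1100u
#define VBASE  0x1161u
#define TBASE  0x11A7u
#define LCOUNT 19u
#define VCOUNT 21u
#define TCOUNT 28u
#define NCOUNT (VCOUNT * TCOUNT) /* 588 */
#define SCOUNT (LCOUNT * NCOUNT) /* 11172 */

/* Hypothetical helper: encode cp (U+0800..U+FFFF) as 3 UTF-8 bytes. */
static unsigned char *utf8_put3(unsigned char *p, uint32_t cp)
{
	assert(cp >= 0x800 && cp <= 0xFFFF);
	*p++ = (unsigned char)(0xE0 | (cp >> 12));
	*p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
	*p++ = (unsigned char)(0x80 | (cp & 0x3F));
	return p;
}

/*
 * Hypothetical helper: write the canonical decomposition of a precomposed
 * Hangul syllable as NUL-terminated UTF-8 into buf (10 bytes: up to three
 * 3-byte jamo plus NUL). Returns the length without the NUL, or 0 if cp
 * is not a precomposed Hangul syllable (buf is then left untouched).
 */
static size_t hangul_decompose(uint32_t cp, unsigned char buf[10])
{
	uint32_t si, l, v, t;
	unsigned char *p = buf;

	if (cp < SBASE || cp >= SBASE + SCOUNT)
		return 0;
	si = cp - SBASE;
	l = LBASE + si / NCOUNT;            /* leading consonant */
	v = VBASE + (si % NCOUNT) / TCOUNT; /* vowel */
	t = TBASE + si % TCOUNT;            /* trailing consonant, if any */

	p = utf8_put3(p, l);
	p = utf8_put3(p, v);
	if (t != TBASE)
		p = utf8_put3(p, t);
	*p = '\0';
	return (size_t)(p - buf);
}
```

With something along these lines the 11172 syllables need no table entries at all; the cost is the caller-supplied buffer, since the decomposed form no longer exists anywhere in the table to point at.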

With this change, the table size for the full trie becomes 89952 bytes. Of this, 66400 bytes are spent on NFKD + Ignorables, and an additional 20992 bytes on NFKD + Ignorables + Case Fold. The remaining 2560 bytes hold additional information for older Unicode versions.

Note that the NFKD + Ignorables + Case Fold trie forwards to the NFKD + Ignorables trie where the two overlap. A stand-alone version would be 71750 bytes.
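The forwarding can be modelled as two lookup tables, where the case-folded table holds only the entries whose result differs and defers every other code point to the NFKD table. This is a minimal sketch in C, assuming sorted arrays with binary search in place of the actual trie; the types, the function `map_lookup`, and the sample entries are hypothetical illustrations, not the real table layout or contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct map_entry {
	uint32_t cp;         /* code point */
	const char *mapping; /* UTF-8 replacement string */
};

struct map_table {
	const struct map_entry *entries;  /* sorted by cp */
	size_t count;
	const struct map_table *fallback; /* table to forward to, or NULL */
};

/*
 * Look cp up in t; if t has no entry for it, forward to t->fallback.
 * Returns NULL when no table maps cp (it maps to itself).
 */
static const char *map_lookup(const struct map_table *t, uint32_t cp)
{
	while (t) {
		size_t lo = 0, hi = t->count;

		while (lo < hi) {
			size_t mid = lo + (hi - lo) / 2;

			if (t->entries[mid].cp == cp)
				return t->entries[mid].mapping;
			if (t->entries[mid].cp < cp)
				lo = mid + 1;
			else
				hi = mid;
		}
		t = t->fallback; /* not overridden here: forward */
	}
	return NULL;
}

/* Illustrative sample entries only, not the real table contents. */
static const struct map_entry nfkd_entries[] = {
	{ 0x00C5, "A\xCC\x8A" }, /* A WITH RING ABOVE -> A + U+030A */
	{ 0xFB01, "fi" },        /* LATIN SMALL LIGATURE FI -> "fi" */
};
static const struct map_table nfkd = { nfkd_entries, 2, NULL };

/* Case-folded overlay: only entries whose result differs from nfkd. */
static const struct map_entry nfkd_cf_entries[] = {
	{ 0x0041, "a" },         /* A -> a */
	{ 0x00C5, "a\xCC\x8A" }, /* A WITH RING ABOVE -> a + U+030A */
};
static const struct map_table nfkd_cf = { nfkd_cf_entries, 2, &nfkd };
```

Here the ligature needs no entry in the case-folded table because its NFKD result is already lower case, so a lookup in `nfkd_cf` falls through to `nfkd`. Sharing of this kind is why the case-fold part costs 20992 bytes instead of a stand-alone 71750.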

As noted before, these tables also contain the Canonical Combining Class and Unicode version information for each code point. The latter allows multiple Unicode versions to be supported using a single combined table.
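One way to picture how the version information serves several Unicode versions: a lookup made on behalf of an older version treats code points assigned later as unassigned. This is a minimal sketch in C, assuming each entry records the version that first assigned the code point (its "age") alongside its CCC; the struct, the `UNICODE_AGE` packing, and `ccc_for_version` are hypothetical, a linear scan stands in for the trie, and the extra per-version mapping changes that the 2560 bytes cover are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed version number, e.g. UNICODE_AGE(6, 2, 0). */
#define UNICODE_AGE(maj, min, upd) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(upd))

struct cp_info {
	uint32_t cp;  /* code point */
	uint32_t age; /* Unicode version that first assigned it */
	uint8_t ccc;  /* Canonical Combining Class */
};

/* Tiny illustrative sample; real data would come from the UCD. */
static const struct cp_info info[] = {
	{ 0x0041, UNICODE_AGE(1, 1, 0), 0 },   /* LATIN CAPITAL LETTER A */
	{ 0x0301, UNICODE_AGE(1, 1, 0), 230 }, /* COMBINING ACUTE ACCENT */
	{ 0x0327, UNICODE_AGE(1, 1, 0), 202 }, /* COMBINING CEDILLA */
	{ 0x20BA, UNICODE_AGE(6, 2, 0), 0 },   /* TURKISH LIRA SIGN */
};

/*
 * Return the CCC of cp as seen by Unicode version 'version',
 * or -1 if cp is unknown or was not yet assigned in that version.
 */
static int ccc_for_version(uint32_t cp, uint32_t version)
{
	size_t i;

	for (i = 0; i < sizeof(info) / sizeof(info[0]); i++) {
		if (info[i].cp == cp)
			return info[i].age <= version ? info[i].ccc : -1;
	}
	return -1;
}
```

For example, a filesystem created under Unicode 6.1 would see U+20BA as unassigned, while one created under 7.0 would see it with CCC 0, both answered from the same table.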


Olaf Weber                 SGI               Phone:  +31(0)30-6696796
                           Veldzigt 2b       Fax:    +31(0)30-6696799
Technical Lead             3454 PW de Meern  Vnet:   955-6796
Storage Software           The Netherlands   Email:  olaf@xxxxxxx
