Re: [PATCH 07/10] xfs: add trie generator and supporting code for UTF-8.

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [PATCH 07/10] xfs: add trie generator and supporting code for UTF-8.
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 23 Sep 2014 06:57:14 +1000
Cc: linux-fsdevel@xxxxxxxxxxxxxxx, tinguely@xxxxxxx, olaf@xxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140918201518.GJ4482@xxxxxxx>
References: <20140918195650.GI19952@xxxxxxx> <20140918201518.GJ4482@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Sep 18, 2014 at 03:15:19PM -0500, Ben Myers wrote:
> From: Olaf Weber <olaf@xxxxxxx>
> 
> mkutf8data.c is the source for a program that generates utf8data.h, which
> contains the trie that utf8norm.c uses. The trie is generated from the
> Unicode 7.0.0 data files. The format of the utf8data[] table is described
> in utf8norm.c.
> 
> Supporting functions for UTF-8 normalization are in utf8norm.c with the
> header utf8norm.h. Two normalization forms are supported: nfkdi and nfkdicf.
> 
>   nfkdi:
>    - Apply unicode normalization form NFKD.
>    - Remove any Default_Ignorable_Code_Point.
> 
>   nfkdicf:
>    - Apply unicode normalization form NFKD.
>    - Remove any Default_Ignorable_Code_Point.
>    - Apply a full casefold (C + F).
> 
> For the purposes of the code, a string is valid UTF-8 if:
> 
>  - The values encoded are 0x1..0x10FFFF.
>  - The surrogate codepoints 0xD800..0xDFFF are not encoded.
>  - The shortest possible encoding is used for all values.
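
As an illustration only, the three validity rules quoted above could be
checked roughly as follows for a length-limited string. This is a minimal
sketch and not code from the patch: sketch_utf8n_valid() is a hypothetical
name, and it assumes the surrogate range is U+D800..U+DFFF and that any
zero byte inside the given length violates the 0x1 lower bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): returns true if the
 * len bytes at s satisfy the three rules listed above.
 */
static bool sketch_utf8n_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	assert(s != NULL);
	while (i < len) {
		unsigned int c = s[i];
		unsigned int cp, min;
		size_t need, k;

		/* Decode the lead byte: payload bits, number of
		 * continuation bytes, smallest value this length may hold. */
		if (c < 0x80) {
			cp = c; need = 0; min = 0x1;	/* 0x0 is excluded */
		} else if ((c & 0xE0) == 0xC0) {
			cp = c & 0x1F; need = 1; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			cp = c & 0x0F; need = 2; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			cp = c & 0x07; need = 3; min = 0x10000;
		} else {
			return false;	/* stray continuation or invalid lead byte */
		}

		if (need > len - i - 1)
			return false;	/* sequence runs past the length limit */

		for (k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;	/* not a continuation byte */
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		if (cp < min || cp > 0x10FFFF)
			return false;	/* overlong encoding or out of range */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return false;	/* surrogate codepoint */

		i += need + 1;
	}
	return true;
}
```

With this sketch, the two-byte sequence C0 80 (an overlong encoding of 0)
and the three-byte sequence ED A0 80 (the surrogate 0xD800) are both
rejected, while F4 8F BF BF (0x10FFFF) is accepted.
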
> 
> The supporting functions work on null-terminated strings (utf8 prefix) and
> on length-limited strings (utf8n prefix).
> 
> Signed-off-by: Olaf Weber <olaf@xxxxxxx>
> 
> ---
> [v2: the trie is now separated into utf8norm.ko;
>      utf8version is now a function and exported;
>      introduced CONFIG_XFS_UTF8. -bpm]
> ---
>  fs/xfs/Kconfig               |    8 +
>  fs/xfs/Makefile              |    2 +-
>  fs/xfs/utf8norm/Makefile     |   37 +
>  fs/xfs/utf8norm/mkutf8data.c | 3239 ++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/utf8norm/utf8norm.c   |  649 +++++++++
>  fs/xfs/utf8norm/utf8norm.h   |  116 ++

Again, nothing here is XFS specific. It's being built as a separate
module, and the only things XFS uses are the exported functions, so
it really should be generic library code....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx