Re: Tomorrow

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: Tomorrow
From: Michael Sinz <msinz@xxxxxxxxx>
Date: Tue, 27 May 2003 06:58:08 -0400
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20030524093103.GA12181@wotan.suse.de>
References: <1053694002.2887.1.camel@localhost.localdomain> <1053697162.21472.51.camel@jen.americas.sgi.com> <20030523134438.GC30288@wotan.suse.de> <20030523150530.A31022@infradead.org> <20030524071709.GK27626@plato.local.lan> <20030524095245.A24074@infradead.org> <20030524091516.GM27626@plato.local.lan> <20030524093103.GA12181@wotan.suse.de>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030507
Andi Kleen wrote:
>> i wouldn't call them v3 dirs either, that implies its an `upgrade' to
>> v2, when in fact its a downgrade (non-broken -> broken).  maybe call
>> them v0 (afaik xfs only has two dir formats v1 and v2). or call it
>> something entirely different, like broken_dirs ;-)
>
> I would not call them broken, but what is a bit worrying is that it can
> be quite complicated to lower-case letters. In the American ASCII
> subset it's easy, but for other languages it usually needs huge lookup
> tables and, worse, there are different character sets.

When we did this for the Amiga (oh so many years ago) it was a royal
PITA.  We ended up punting for the most part on anything that was
outside of the ISO-Latin-1 code page, and even there we had a problem
due to some "differences" of opinion among certain language groups
about what was supposed to happen.

This gets worse once you consider that a file, especially one accessed
over the network, may be accessed by a machine with different locale
settings, and thus slightly different rules as to what the lowercase
form of an uppercase letter or word is.

While I can fully understand the need to do this somewhere closer to the
filesystem (as the performance impact can be massive otherwise) there
is no really good solution to this in the international space when you
start to network machines across locale settings.

(A pair of files that are correctly unique names in one locale may not
be unique in another locale!)
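
A minimal sketch of that uniqueness problem, using the well-known
Turkish dotted/dotless "I" as the example.  The helper names and the
"tr" locale tag are made up for illustration; a real implementation
would need full per-locale tables, not a single special case.

```python
# Hypothetical helpers, not taken from any filesystem code.

def lower_name(name: str, locale: str = "default") -> str:
    """Lowercase a file name under the given (assumed) locale rules."""
    if locale == "tr":
        # Turkish pairs I with dotless ı, and dotted İ with i.
        name = name.replace("I", "ı").replace("İ", "i")
    # Everything else falls back to the default Unicode mapping.
    return name.lower()


def same_name(a: str, b: str, locale: str = "default") -> bool:
    """True if a and b are the same name when case is ignored."""
    return lower_name(a, locale) == lower_name(b, locale)


print(same_name("FILE", "file"))        # True:  collide under default rules
print(same_name("FILE", "file", "tr"))  # False: distinct under Turkish rules
print(same_name("FILE", "fıle", "tr"))  # True:  collide under Turkish rules
print(same_name("FILE", "fıle"))        # False: distinct under default rules
```

So a directory that is legal on one machine can contain a name clash
when looked at from another.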

> You either only support UTF-8 Unicode (shifting the burden of
> conversion to user space) or you need to store a "codepage" per
> filesystem. Linux seems to go towards the UTF-8 route. The kernel
> already has some code for this (JFS does it), but it will not be
> pretty.

I have not looked at the JFS code at all, but this cannot be very
pretty if they supported the locale preferences.  (Unless, in the last
10 years, there was some new agreement such that case conversion for
all locales is consistent with each other.)
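
To make the codepage point concrete, here is a rough sketch of why raw
8-bit names cannot be case-folded without knowing which codepage they
were written in.  The function name is invented for the example, and it
leans on Python's built-in codecs rather than anything in the kernel.

```python
# Hypothetical helper, for illustration only.

def names_collide(a: bytes, b: bytes, codepage: str) -> bool:
    """Compare two on-disk names case-insensitively under one codepage."""
    return a.decode(codepage).lower() == b.decode(codepage).lower()


# The same two byte strings, as they might sit in a directory.
first = b"\xd7.txt"
second = b"\xf7.txt"

# cp1251: 0xD7 and 0xF7 are upper- and lowercase Cyrillic "che".
print(names_collide(first, second, "cp1251"))   # True

# latin-1: 0xD7 and 0xF7 are the multiply and divide signs, no case pair.
print(names_collide(first, second, "latin-1"))  # False
```

With UTF-8 names the bytes at least identify the characters, which
leaves only the locale-specific folding rules to argue about.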

--
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as
        the master's ability to explain them to others.

