Message-ID: <3ED361C7.5080601@wgate.com>
Date: Tue, 27 May 2003 09:01:59 -0400
From: Michael Sinz
To: Andi Kleen
CC: linux-xfs@oss.sgi.com
Subject: Re: Tomorrow
In-Reply-To: <20030527120650.GA22306@wotan.suse.de>

Andi Kleen wrote:
> On Tue, May 27, 2003 at 06:58:08AM -0400, Michael Sinz wrote:
>
>>When we did this for the Amiga (oh so many years ago) it was a royal
>>PITA.
>>We ended up punting for the most part on anything that was
>>outside of the ISO-Latin-1 code page and even there we had a problem
>>due to some "differences" of opinion by certain language groups as to
>>what was supposed to happen.
>
> I wrote a C Library for the Amiga a long time ago and in the end I
> left it all for locale.library because it was too nasty to do by itself.

That is why we wrote the locale.library - it was nasty.  Even worse
when you add in the sorting issues.

>>This gets worse when you look at behavior patterns due to the fact that
>>a file, especially one accessed over the network, may be accessed by
>>a machine with different locale settings and thus have slightly different
>>rules as to what is the lowercase form of an uppercase letter or wordform.
>
> AFAIK the SMB protocol handles this.

I would have to look at how it deals with uniqueness vs non-uniqueness
between different clients.  That was the really hard problem for us.

>>there was some new agreement such that case conversions for all locales
>>are consistent with each other)
>
> Yes there is: Unicode/UTF-8. That is where all the Linux distributions are
> going too. For legacy SMB support you will still need to support codepages,
> but that could be done by samba. For XFS I guess it would be enough to just
> support UTF-8. Supporting different code pages is probably not too useful
> anymore.

Does UNICODE actually define the case-ness of characters now?  I have
been out of UNICODE stuff for some time (working at different levels of
system design - not the OS guru I used to be :-()  It used to just define
the glyphs and give no semantic meaning to them.  In fact, the 16-bit
UNICODE had the problem of not even keeping all of the glyphs for a
locale together.
It was just a way of enumerating glyphs and some "compatibility" stuff
for ASCII and ECMA/ISO Latin-1.

-- 
Michael Sinz -- Director, Systems Engineering -- Worldgate Communications
A master's secrets are only as good as the master's ability to explain them to others.
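The locale problem discussed in this message, where machines with different locale settings disagree about the lowercase form of a letter, can be made concrete with a small sketch. This example is not from the thread. It assumes the Java standard library's `String.toLowerCase(Locale)`, which applies Unicode case mappings, including language-sensitive ones such as the Turkish rule that lowercases `I` to dotless `ı` (U+0131). The helpers `lowerFor` and `sameNameIgnoringCase` are hypothetical names. Comparing lowercased strings is only one simple way a case-insensitive filesystem might match names.

```java
import java.util.Locale;

public class CaseByLocale {
    // Hypothetical helper: lowercase a file name under the given locale's rules.
    static String lowerFor(String name, Locale locale) {
        return name.toLowerCase(locale);
    }

    // Hypothetical helper: case-insensitive name match done by lowercasing both sides.
    static boolean sameNameIgnoringCase(String a, String b, Locale locale) {
        return lowerFor(a, locale).equals(lowerFor(b, locale));
    }

    public static void main(String[] args) {
        Locale neutral = Locale.ROOT;
        Locale turkish = Locale.forLanguageTag("tr");
        String upper = "FILE.TXT";
        String lower = "file.txt";

        // Locale-neutral mapping: 'I' -> 'i', so the two names match.
        System.out.println(sameNameIgnoringCase(upper, lower, neutral));  // true
        // Turkish mapping: 'I' -> dotless 'ı' (U+0131), so the same pair no longer matches.
        System.out.println(sameNameIgnoringCase(upper, lower, turkish));  // false
        // Print the differing character as a code point to avoid console-encoding issues.
        System.out.printf("U+%04X%n", (int) lowerFor(upper, turkish).charAt(1)); // U+0131
    }
}
```

Running it prints `true`, `false` and `U+0131`, one per line. The same two names are "the same file" for one client and different files for another, which is the uniqueness problem raised above for clients with different locales.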