xfs

Subject: Re: Linux 2.4.17-xfs vs previous XFS versions and certain non-us characters in filenames
From: "D. Stimits" <stimits@xxxxxxxxxx>
Date: Sun, 27 Jan 2002 11:26:06 -0700
Cc: Linux XFS Mailing List <linux-xfs@xxxxxxxxxxx>
References: <1012101803.1045.28.camel@steelnest> <1012102374.1045.35.camel@steelnest> <3C536F44.1020301@xxxxxxx> <20020127152120.A1490@xxxxxxxxxx> <20020127154745.A20990@xxxxxxxxxxxxx> <1012143898.923.1.camel@steelnest> <1012146858.923.6.camel@steelnest> <20020127172958.A8796@xxxxxxxxxxxxx> <3C5437F5.3553AAC5@xxxxxxxxxx> <3C543B45.3000308@xxxxxxx>
Reply-to: stimits@xxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
Stephen Lord wrote:
> 
> D. Stimits wrote:
> 
> >Andi Kleen wrote:
> >
> >>On Sun, Jan 27, 2002 at 04:54:18PM +0100, Håkan Lindqvist wrote:
> >>
> >>>There is, however, another (perhaps also related) issue!
> >>>If I create a file named "?" I can't remove it afterwards.
> >>>
> >>Is this a real ? or an ASCII unprintable character >127?
> >>
> >>-Andi
> >>
> >
> >I'm curious whether the extended characters above 127 in the Latin-1
> >character set require support at the filesystem level. I imagine that
> >some of the wide character requirements of Japanese kanji would make it
> >rather "interesting" if the filesystem has to actually deal with it. I'm
> >beginning to discover the pain of trying to internationalize software on
> >X11, and curious about how far character sets actually "invade" the
> >system.
> >
> >D. Stimits, stimits@xxxxxxxxxx
> >
> 
> This is something of an xfs special in this case: the hash algorithm used
> in xfs directories does math on the names, and in removing the
> -funsigned-char from the Makefiles, I forgot about this.

To some degree it reminds me of PostgreSQL. Somewhere in the docs I
recall seeing a mention that it works with non-C locales (basically,
non-English ones), but that it would then run slower because it could
not use hashing. I guess that instead of trying to support other locales
at full performance, PostgreSQL (at least the version whose docs I read
a year or so ago) completely disabled hashing when other character sets
were used.

With all the internationalization going on, and the "world economy"
being so much more important in the tech industry, I have to wonder how
long it will be before hash routines for all these different character
sets become common. I tend to do more C++ than C these days, and I love
the STL containers; hashed versions are common, even if not officially
part of any standard, but they lack built-in hash functions for anything
but fundamental types and char*. It's hard to realize how important hash
functions are until you use a non-ASCII character set. Some of the wide
character sets that use 16 bits also use forms of shifting on top of
that (check out kanji). Then there are mixed uses, where more than one
character set must be displayed at the same time in the same
application, so every character has to be tagged with the character set
it belongs to, and the application has to switch sets on the fly. Worse,
such mixes often require a special scheme not only for combining
character sets, but also for mixing 7-bit, 8-bit, and 16-bit sets
without losing the boundaries between characters. I am very curious
whether anyone running a Japanese-locale machine has reported using
XFS. I shudder to think about the difficulty.

D. Stimits, stimits@xxxxxxxxxx

> 
> Steve

