xfs

To: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
Subject: utf-8' chars from Winxp machine may be problem related (was Re: xfs file system in process of becoming corrupt; though xfs_repair...)
From: "Linda A. Walsh" <xfs@xxxxxxxxx>
Date: Wed, 07 Jul 2010 14:40:52 -0700
In-reply-to: <4C2BB0C4.4060800@xxxxxxxxxxxxxxxxx>
References: <4C26A51F.8020909@xxxxxxxxx> <20100628022744.GX6590@dastard> <4C2A749E.4060006@xxxxxxxxx> <20100629232532.GA24712@dastard> <4C2A8948.3030008@xxxxxxxxx> <20100630010622.GC24712@dastard> <4C2AA36F.2070905@xxxxxxxxx> <4C2BB0C4.4060800@xxxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666


Stan Hoeppner wrote:

It is interesting that all of these "corrupt" files originate from Japan.  The
filenames have likely passed through many different character encodings on
their journey from their source to your XFS filesystems.  And they probably
originated on a MS Windows machine.
----
        Actually, while the Japanese comment is true for most, it's not true
for all -- some have the 'copyright' or 'trademark' symbol in them.

How can you be so certain that there isn't a wonky bit somewhere that's
wreaking havoc with BabelMap?
----
        Sorry, I didn't mean to come across with certainty that everything was
'unwonky'. Just that normally these files work correctly -- I have good compatibility across all my tools:

1) logged into linux with a tty (secureCRT that supports unicode), and use
  "ls" to view them in bash (or use "echo *" in bash)
2) use the tty version of Vim in that tty window
3) the 'X' version of Vim (displayed through cygwin's X server, which also
  handles unicode)
4) over the net using Samba on the linux server, in windows7
5) editing the files on Win7 using 'Gvim'

The "broken" files don't work anywhere.  And it is not name or character
specific.  I had 3-4 occurrences of 2-3 names broken in 4 copies of 1 directory,
but I also had 2 other copies of that directory that were 'fine' -- same
names, same characters -- some corrupt, some not.


Y access cycles flips a bit, changes a character, or something along these
lines?  Did you update this program recently, or any other programs that might
affect character encoding/displaying, or anything remotely related to such?
Have you done any software updates recently, period?
----
        I have been forced to do file system copies, which I did with
an "xfsdump|mbuffer|xfsrestore" pipe running in the background.  It was there
that I really began to notice a pattern of problems, though some nightly
backups were giving errors as far back as a few weeks ago.  I first NOTICED
it (I'm often not attentive to automatic processes that have been working
fine for months or years) a few weeks ago, or shortly after upgrading to
2.6.34.  Due to an upgrade to SuSE 11.2 about 6-8 months back, my normal logs
were lost, as it changed, **AGAIN**, the system logger (first from syslog to
syslog-ng, which was a good thing, but now from syslog-ng to rsyslog -- a
step backwards in flexibility), with the result that all my logfile patterns
were no longer used and much logging was simply thrown away.  After I caught
it, I switched back to syslog-ng, and that's when I began noticing multiple
oddities in my log files.



Given the entirety of what we're looking at, and that you're apparently not
seeing this with files created in a native English language encoding, I'd say
Dave is probably on the right track here.
---
        Yeah...something to do with character encoding...I'd agree there.
But not just foreign names -- anything in "utf-8" that isn't plain ASCII,
including some English names with special symbols:
Favorites/Cannabis, EO's & Plant info sources/Plant, Tree sources/The Online Nursery » buckeye tree.URL
Favorites/Cannabis, EO's & Plant info sources/Plant, Tree sources/The Online Nursery » Black Walnut.URL
Favorites/Hw/Intel® Xeon® Processor Numbers.URL
Favorites/Hw/Intel® 5000X Chipset Overview.URL
Favorites/Hw/Computer(s), peripherals, parts/Intel CPU and chips.../Intel® Xeon® Processor 5000 Sequence - Technical Documents.URL
Favorites/Hw/Computer(s), peripherals, parts/Intel CPU and chips.../Intel® Xeon® Processor Numbers.URL
Favorites/Hw/Computer(s), peripherals, parts/Intel CPU and chips.../Intel® Core™ Microarchitecture.URL
Favorites/Hw/Computer(s), peripherals, parts/Intel CPU and chips.../How to Choose between Hardware and Software Prefetch on 32-Bit Intel® Architecture - Intel® Software Network.URL
Favorites/Hw/Computer(s), peripherals, parts/Intel CPU and chips.../Preparing Applications for Intel® Core™ Microarchitecture.URL
Favorites/Microsoft/JSI, INC. - Your Windows Server 2003 - Windows NT - Windows 2000 - Windows XP ® Resource.URL
Favorites/Web Technologies/Ajaxian » Behold the, um, Beholder!.URL
Favorites/Web Technologies/mezzoblue § css Zen Garden Resources.URL
hw/misc+interest/Freedom to Tinker » Blog Archive » Making and Breaking HDCP Handshakes_files
hw/misc+interest/Freedom to Tinker » Blog Archive » Making and Breaking HDCP Handshakes.htm
Receipts_n_inf_etc/WinZip® Order Confirmation-v14-2009.pdf

-----
So (R), the right angle quote "»", the section sign "§"...
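
For reference, a minimal Python sketch (not part of the original report) of how the symbols in the names above are encoded in UTF-8 versus the Windows-1252 code page. That a Windows-side tool might have written single-byte cp1252 names is only an assumption made for illustration, and the helper names `encodings` and `is_valid_utf8` are hypothetical.

```python
# Symbols seen in the affected names: registered sign, right angle quote,
# section sign, trademark sign.
SYMBOLS = ["\u00ae", "\u00bb", "\u00a7", "\u2122"]


def encodings(ch):
    """Return (utf8_bytes, cp1252_bytes) for a single character."""
    return ch.encode("utf-8"), ch.encode("cp1252")


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for ch in SYMBOLS:
    u8, cp = encodings(ch)
    # Assumption being illustrated: a lone cp1252 byte is not valid UTF-8.
    print(f"U+{ord(ch):04X}  utf-8={u8.hex()}  cp1252={cp.hex()}  "
          f"cp1252 bytes valid as utf-8: {is_valid_utf8(cp)}")
```

If a name had been stored with the single cp1252 byte, tools expecting UTF-8 would see an invalid sequence. That alone would not explain why some copies of the same name work and others do not.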

All of the above are filenames that can't be accessed, among several french,
spanish, greek and japanese filenames.
The french/spanish are from Adobe documentation.

Even the proper Knuth spelling of "Latex" with the lowered 't' (theta I
believe)...etc.

So yup...foreign char delight.

I can easily imagine most or all of these having been imported from my WinXP
machine at one point, as I only recently started using Win7.  Many of the
troublesome japanese filenames were 'downloaded japanese anime-related stuff'
that I got on my old XP machine, which I used as a download client while I
did work on my Win7 machine...
That gave a huge influx of foreign names from a WinXP machine.  That could be
what made the problem jump out so noticeably -- before, it was only maybe
20-30 files out of about a million or more.  But in the new batch it was
hundreds out of several thousand, so they stand out a lot more.
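
One way to test the "names imported in a non-UTF-8 encoding" idea would be to walk a tree and list every name whose raw bytes are not valid UTF-8. The following is a rough Python sketch under that assumption, not something that was run on the affected filesystems; `find_non_utf8_names` is a hypothetical helper name.

```python
import os


def find_non_utf8_names(root):
    """Yield full paths (as bytes) whose final component is not valid UTF-8.

    Walking with a bytes path makes os.walk return the raw on-disk name
    bytes instead of decoded strings.
    """
    root_b = os.fsencode(root)
    for dirpath, dirnames, filenames in os.walk(root_b):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)


# Example use (the path is a placeholder):
#   for path in find_non_utf8_names("/some/directory"):
#       print(path)   # printed as bytes, so the offending bytes stay visible
```

This only checks the encoding of the names. It would not detect on-disk directory corruption, and a name that cannot be looked up at all may still be reported as valid UTF-8.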



--------------------

That said -- and I note
