Steve +,
No problems, if only I could find the time to look at all the problems
that come my way the same day or week (or often month!).
I'll compile the latest CVS XFS kernel later with CONFIG_DEBUG_SLAB set
and hopefully set up a test load over the weekend.
I had another crash on my production server this morning (~6.5 days up). This
seemed to produce a similar Oops to the last one, but I failed to capture the
full details. What was interesting (?) were the following messages in the
system log:
Feb 8 09:32:56 blue00 kernel: 08:01: rw=1, want=1791743140, limit=560282908
Feb 8 09:32:56 blue00 kernel: attempt to access beyond end of device
Feb 8 09:32:56 blue00 kernel: 08:01: rw=1, want=1791743160, limit=560282908
These are from just before it crashed (~1 minute). I can't remember seeing any
similar messages before (I checked back through some logs and can't find
anything). The configuration of the filesystem (which is on a GFORCE RI
hardware FC/IDE RAID-5 unit) is:
# xfs_info /scratch
meta-data=/scratch               isize=256    agcount=535, agsize=261815 blks
data     =                       bsize=4096   blocks=140070727, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=0
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=17098
realtime =none                   extsz=65536  blocks=0, rtextents=0
This I guess agrees with the limit referred to in the error message
(140070727 * 4K blocks = 560282908K), though why would the kernel try
to reference something at 1791743140K? (I guess I need some downtime to
run an offline xfs_check.) Maybe this gives you some pointers?
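As a rough sanity check of that arithmetic, here is a minimal Python sketch. The variable and function names are my own, and it assumes the want/limit figures in the kernel message are in 1K units, which is what the matching limit suggests:

```python
# Values copied from the xfs_info output and the kernel log above.
BLOCK_SIZE = 4096            # bsize of the data section, in bytes
DATA_BLOCKS = 140070727      # blocks in the data section
KERNEL_LIMIT_KB = 560282908  # "limit=" from the kernel message
WANTS_KB = [1791743140, 1791743160]  # "want=" from the kernel messages


def device_size_kb(blocks, block_size):
    """Size of the data section in 1K units (assumed unit of the log)."""
    return blocks * block_size // 1024


size_kb = device_size_kb(DATA_BLOCKS, BLOCK_SIZE)
print("filesystem size: %d K, matches limit: %s"
      % (size_kb, size_kb == KERNEL_LIMIT_KB))

for want in WANTS_KB:
    # How far past the end of the device each request lands.
    print("want=%d K is %d K beyond the limit (%.2fx the device size)"
          % (want, want - size_kb, want / size_kb))
```

Both requested offsets land far beyond the end of the device rather than just past it, which looks more like a bad block number than a request slightly overrunning the end.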
Your observation of an Oops in a memory allocation is interesting. As
in your case, my system has lots of memory (1GB), though the majority (~910MB)
is used as filesystem cache (Cached):
# cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  1054031872 1044246528  9785344        0  6295552 936423424
Swap: 1052794880  1261568 1051533312
MemTotal: 1029328 kB
MemFree: 9556 kB
MemShared: 0 kB
Buffers: 6148 kB
Cached: 914356 kB
SwapCached: 120 kB
Active: 54540 kB
Inactive: 872776 kB
HighTotal: 131008 kB
HighFree: 1028 kB
LowTotal: 898320 kB
LowFree: 8528 kB
SwapTotal: 1028120 kB
SwapFree: 1026888 kB
I have noticed, though, that at times the amount of memory in use for
'Cached' (and Buffers) decreases (down to ~400MB), with no increase in
free memory or observable increase in memory used by processes. It does
recover (but normally back to 600-800MB against Cached, rather than
the 900MB+ that we can see on a recently booted system). Could this be
connected with the memory allocation problems?
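To catch those dips as they happen, something like the following could be run periodically. This is only a sketch: the parse_meminfo helper is hypothetical (my own name), and the sample text is cut down from the /proc/meminfo output above so the snippet runs anywhere:

```python
def parse_meminfo(text):
    """Return {field: kB} for the 'Name: value kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines of the form "Name: <number> kB"; this skips the
        # byte-count table (the Mem:/Swap: rows) at the top of the file.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


# Sample taken from the output above; on a live system this would be
# open("/proc/meminfo").read() instead.
SAMPLE = """\
Mem: 1054031872 1044246528 9785344 0 6295552 936423424
MemTotal: 1029328 kB
MemFree: 9556 kB
Buffers: 6148 kB
Cached: 914356 kB
"""

info = parse_meminfo(SAMPLE)
cache_kb = info["Cached"] + info["Buffers"]
print("cache+buffers: %d kB (%.0f%% of MemTotal), free: %d kB"
      % (cache_kb, 100.0 * cache_kb / info["MemTotal"], info["MemFree"]))
```

Logging that line every minute or so with a timestamp should show whether the drops in Cached line up with the Oopses.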
--
Thanks
Ian
Steve Lord wrote:
>
>
> You have not been forgotten, just trying to do too many things at once
> around here right now. But both of you ended up with an oops in kfree,
> would it be possible to turn on CONFIG_DEBUG_SLAB?
> This will turn on a number of memory checking features and might make
> things fall over at a different - and more insightful - point.
>
> In Chip's case I suspect the config flag does not exist, so hand edit
> mm/slab.c and turn on the DEBUG options in there.
>
> On a side note, today I experienced an oops due to what appeared to be
> a failure to allocate a buffer - we had been assuming these were caused
> by being out of memory, but in my case I had plenty of available memory,
> it turns out to be a bug in the pagebuf code when we reallocate metadata
> space. I am thrashing the fix on some test boxes now, but it is possible
> that those really were not out of memory cases people were seeing, but
> due to this bug.
>
> Steve
>
> --
>
> Steve Lord voice: +1-651-683-3511
> Principal Engineer, Filesystem Software email: lord@xxxxxxx
--
/////////////Technical Coordination, Research Services////////////////////
Ian Hardy
Computing Services
Southampton University email: idh@xxxxxxxxxxx
Southampton S017 1BJ, UK. i.d.hardy@xxxxxxxxxxx
\\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\