
Re: vmap allocation for size 1048576 failed

To: Michael Weissenbacher <mw@xxxxxxxxxxxx>
Subject: Re: vmap allocation for size 1048576 failed
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 10 Nov 2010 23:30:52 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4CDA54E3.308@xxxxxxxxxxxx>
References: <4CD924A6.6040200@xxxxxxxxxxxx> <20101109120701.GN2715@dastard> <4CDA54E3.308@xxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)

On Wed, Nov 10, 2010 at 09:16:35AM +0100, Michael Weissenbacher wrote:
> Hi Dave!
> > I didn't think anything other than log recovery tries to vmap
> > buffers. This is clearly not in log recovery. Can you post an
> > unedited error log, how much data you are rsyncing, the
> > configuration of your filesystem (xfs_info, mount options, loop dev
> > config, etc) to give us an idea of what you are doing to trigger
> > this?
>  * I'm attaching the latest kern.log, unedited
>  * I am syncing the gentoo portage tree, not much data but many small
> files (currently 228MiB in 117880 files). This sync is done twice per hour.
>  * I already posted my xfs_info and mount options in another post. Maybe
> i should note that the loop file system was deliberately created with
> blocksize=512 to accommodate the fs to the nature of the portage tree
> (many small files...)
>  * TBH i don't know about any special configuration of the loop device.
> I just created an empty file with "dd if=/dev/zero ..." and then did
> mkfs.xfs on it.

Ok, thanks for the info. I'm struggling to work out what is actually
consuming vmap space given what you are doing and the storage
config. None of the operations you are doing should be triggering
mapping of multipage metadata buffers given the filesystem

> > Can't you run on a 64-bit machine?
> 80% of my machines are 64-bit and i never saw anything like that on
> them. But otoh i don't use loop devices very much. Unfortunately this
> machine is old hardware (P4 class) which can't run a 64-bit kernel.
> > Can you downgrade your kernel and run the loop device there to tell
> > us whether this is actually a regression or not? If it is a
> > regression, then if you could run a bisect to find the exact patch
> > that causes it would be very helpful....
> Already did that yesterday and i can confirm it has the same problem -
> so no regression. The kern.log i attached to this mail is from Kernel

Ok, I haven't looked at this yet, but having looked at the code I
don't think it will tell me anything useful. However, if you can
reproduce the problem easily, the tracing output (google for
trace-cmd and/or kernelshark) of the xfs trace events for the short
period across the error condition (just a couple of seconds before
and after the error occurs) would tell me a lot more about what is
going wrong...
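
For reference, a capture of that kind might look roughly like the
sketch below. It assumes trace-cmd is installed, the kernel was built
with the xfs trace events, and the commands are run as root. The
function name, the default output file name and the use of "sleep" as
the traced command are illustrative placeholders, not anything
prescribed here; in practice the rsync that triggers the error would
run during the recording window.

```shell
# Sketch only: assumes trace-cmd is installed, the running kernel
# exposes the xfs trace events, and the caller is root.
# capture_xfs_trace is a hypothetical helper name.
capture_xfs_trace() {
    # $1: output file (placeholder default), $2: seconds to record
    out="${1:-xfs-trace.dat}"
    secs="${2:-10}"

    # Record every event in the xfs subsystem for as long as the
    # given command runs. "sleep" just holds the window open; start
    # the failing workload in another shell during that time.
    trace-cmd record -o "$out" -e xfs sleep "$secs" || return 1

    # Convert the binary trace into readable text for posting.
    trace-cmd report -i "$out" > "${out%.dat}.txt"
}
```

Calling "capture_xfs_trace vmap-fail.dat 5" would then leave a text
report in vmap-fail.txt, which can be trimmed to the few seconds
around the allocation failure before sending it to the list.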


Dave Chinner
