Re: [xfs_check Out of memory: ]

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [xfs_check Out of memory: ]
From: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
Date: Sat, 28 Dec 2013 00:20:39 +0100
Cc: xfs@xxxxxxxxxxx, "Stor??" <289471341@xxxxxx>, Jeff Liu <jeff.liu@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20131227224212.GK20579@dastard>
References: <tencent_3F12563342ED1D4E049D1123@xxxxxx> <201312270907.22638.arekm@xxxxxxxx> <20131227224212.GK20579@dastard>
User-agent: KMail/1.13.7 (Linux/3.12.6-dirty; KDE/4.12.0; x86_64; ; )
On Friday 27 of December 2013, Dave Chinner wrote:
> On Fri, Dec 27, 2013 at 09:07:22AM +0100, Arkadiusz Miśkiewicz wrote:
> > On Friday 27 of December 2013, Jeff Liu wrote:
> > > On 12/27 2013 14:48 PM, Stor?? wrote:
> > > > Hey:
> > > > 
> > > > 20T xfs file system
> > > > 
> > > > 
> > > > 
> > > > /usr/sbin/xfs_check: line 28: 14447 Killed
> > > > xfs_db$DBOPTS -i -p xfs_check -c "check$OPTS" $1
> > > 
> > > xfs_check is deprecated and please use xfs_repair -n instead.
> > > 
> > > The following back traces show that your system seems to have run out
> > > of memory when executing xfs_check; thus snmp daemon/xfs_db were killed.
> > 
> > This reminds me of a question...
> > 
> > Could xfs_repair store its temporary data (some of that data, the biggest
> > part) on disk instead of in memory?
> Where on disk? 

In a directory/file that I'll tell it to use (since I usually have a few xfs 
filesystems on a single server and so far only one at a time breaks).

> We can't write to the disk until we've verified all
> the free space is really free space, and guess what uses all the
> memory? Besides, if the information is not being referenced
> regularly (and it usually isn't), then swap space is about as
> efficient as any database we might come up with...

It's not about efficiency. It's about not killing the system (by not eating 
all memory, OOM). If I can (optionally) trade repair speed for not eating ram 
then it's desired sometimes. Better to have slow repair than no repair 8)

Could xfs_repair perhaps tell the kernel that this data should always end up 
on swap first (allowing other programs/daemons to use regular memory)? (I 
don't know of a kernel interface that would allow that, though.) That would 
be a half-baked solution.
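
A different but equally half-baked option is to cap the process from outside. 
Below is a minimal sketch, assuming Linux and Python 3. The helper name 
run_capped and the limit value are hypothetical; nothing here is an 
xfs_repair feature. It does not push anything to swap and does not make a 
repair succeed. It only limits the child's virtual address space 
(RLIMIT_AS), so an oversized allocation fails inside that process instead of 
the whole machine going into OOM.

```python
import resource
import subprocess
import sys


def run_capped(cmd, max_bytes):
    """Run cmd with its virtual address space capped at max_bytes.

    Allocations beyond the cap fail inside the child instead of
    exhausting memory for every other service on the machine.
    Returns the child's exit status.
    """
    def set_limit():
        # Runs in the child between fork() and exec().
        # Only the soft limit is lowered; the hard limit is left alone,
        # which an unprivileged process is always allowed to do.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard == resource.RLIM_INFINITY:
            soft = max_bytes
        else:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    rc = subprocess.run(cmd, preexec_fn=set_limit).returncode
    if rc != 0:
        print(f"{cmd[0]} exited with status {rc}", file=sys.stderr)
    return rc
```

Hypothetical usage: run_capped(["some_command", "arg"], 8 * 1024**3) would 
run the command with an 8 GiB address-space cap. Whether the capped program 
copes gracefully with a failed allocation is up to that program.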

> > I don't know if that would make sense, so asking. Not sure if xfs_repair
> > needs to access that data frequently (so on disk makes no sense) or
> > maybe it needs only for iteration purposes in some later phase (so on
> > disk should work).
> > 
> > Anyway memory usage of xfs_repair was always a problem for me (like 16GB
> > not enough for 7TB fs due to huge amount of files being stored). With
> > parallel scan it's even worse obviously.
> Yes, your problem is that the filesystem you are checking contains
> 40+GB of metadata and a large amount of that needs to be kept in
> memory from phase 3 through to phase 6.

Is that data (or most of that data) frequently accessed? Or is it something 
that's iterated over, let's say, once in each phase? 

Anyway current "fun" with repair and huge filesystems looks like this:
- 16GB of memory, run xfs_repair, system goes into unusable state because 
whole ram is eaten (ends up with OOM); wait several hours
- reboot, add 20GB of swap, run xfs_repair, the same happens again; wait half 
a day
- reboot, add another 20GB of swap space, run xfs_repair - success!; wait 
another day
- in all steps system is simply unusable for other services. Nothing else will 
work since entire ram gets eaten by repair. So doesn't help me to have 4 xfs 
filesystems and only one broken - have to shut down all services only for that 
repair to work
- with parallel git repair it is even worse obviously (OOM happens sooner than 
- can't add more RAM easily, machine is at remote location, uses obsolete 
DDR2, have no more ram slots and so on
- total repair time for all those steps is a few times longer than necessary 
(successful repair took 7.5h while all these steps took 2 days)
- what's worse, the tools give no estimate of the RAM needed, but that's afaik 
unfixable. This means that it is not known how much memory will be needed. You 
need to run repair and see. Also, if more files get stored, then the next 
repair in a few months could require twice as much RAM. You never know what to 
expect.
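
To put numbers on the list above, here is a small back-of-the-envelope 
calculation. It uses only the figures from this message and assumes that 
"2 days" means 48 hours; the variable names are just for illustration.

```python
# Figures taken from the list above.
ram_gb = 16
swap_added_gb = [20, 20]        # added before the 2nd and 3rd attempts
successful_repair_h = 7.5
total_elapsed_h = 2 * 24        # assumption: "2 days" taken as 48 hours

# RAM plus swap that was available when the repair finally succeeded.
memory_at_success_gb = ram_gb + sum(swap_added_gb)

# How much longer the whole exercise took than the successful run alone.
overhead = total_elapsed_h / successful_repair_h

print(f"RAM + swap at the successful run: {memory_at_success_gb} GB")
print(f"Elapsed time vs. successful run: {overhead:.1f}x")
```

So the repair only went through with 56 GB of RAM plus swap available, and 
the whole exercise took about 6.4 times as long as the successful run alone.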

Now how to prevent these problems? Currently I see only one "solution" - add 
more RAM.

Unfortunately that's not a solution - won't work in many cases described 

So it looks like my future backup servers will need to have 64GB, 128GB or 
maybe even more RAM that will be there only for xfs_repair usage. That's a 
gigantic waste of resources. And there are modern processors that don't work 
with more than 32GB of RAM - like "Intel Xeon E3-1220v2" 
( http://tnij.org/tkqas9e ). So adding RAM means replacing the CPU and likely 
replacing the mainboard. Fun :)

> If you really want to add
> some kind of database interface to store this information somewhere
> else, then I'll review the patches. ;)

Right. So the only "easy" task left is finding someone who understands the 
code and can write such an interface. Anyone?

IMO ram usage is a real problem for xfs_repair and there has to be some 
upstream solution other than "buy more" (and waste more) approach.

> Cheers,
> Dave.

Arkadiusz Miśkiewicz, arekm / maven.pl
