<div dir="ltr"><div>Dear Dave:<br><br></div>Thanks very much for your explanations<br><div><div><div class="gmail_extra"><br><div class="gmail_quote">2016-05-30 1:20 GMT+02:00 Dave Chinner - <a href="mailto:david@fromorbit.com">david@fromorbit.com</a> <span dir="ltr"><<a href="mailto:xfs.pkoch.2540fe3cfd.david#fromorbit.com@ob.0sg.net" target="_blank">xfs.pkoch.2540fe3cfd.david#fromorbit.com@ob.0sg.net</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">.... <br>
Oh, dear. There's a massive red flag. I'll come back to it...<br></blockquote><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

> > 5: xfsdump the temporary xfs fs to /dev/null. took 20 hours
>
> Nothing to slow down xfsdump reading from disk. Benchmarks lie.

> dump is fast - restore is the slow point because it has to recreate
> everything. That's what limits the speed of dump - the pipe has a
> bound limit on data in flight, so dump is throttled to restore
> speed when you run this.
>
> And, as I said I'll come back to, restore is slow because:

> The filesystem is not exactly as you described. Did you notice that
> xfs_restore realises that it has to restore 20 million directories
> and *274 million* directory entries? i.e. for those 7 million inodes
> containing data, there is roughly 40 hard links pointing to each
> inode. There are also 3 directory inodes for every regular file.
> This is not a "data mostly" filesystem - it has vastly more metadata
> than it has data, even though the data takes up more space.

Our backup server keeps 46 versions of our home directories and 158
versions of our mail server, so if a file has not been changed for
more than a year it exists once on the backup server, together with
45 or 157 hard links pointing to it.

I'm astonished myself - first by those numbers, and also by the fact
that our backup strategy works this well.

rsync also does a very good job: it was able to copy all these hard
links in 6 days from a 16TB ext3 filesystem on a RAID10 volume to a
15TB XFS filesystem on a RAID5 volume.

And right now 4 rsync processes are copying the 15TB XFS filesystem
back to a 20TB XFS filesystem, and it looks as if this will finish
today (after only 3 days). Very nice.
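
For the archives: preserving those hard links is what rsync's -H
option is for. A minimal sketch of the kind of command line involved
(the mount points here are placeholders, not our real paths):

    # copy one subtree, keeping hard links, ownership and permissions
    rsync -aH --numeric-ids /mnt/xfs15/subtree/ /mnt/xfs20/subtree/

Running several of these in parallel presumes that each one gets a
disjoint subtree and that hard-linked files always live inside the
same subtree - rsync can only reconnect links between files that the
same process has seen.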

> Keep in mind that it took dump the best part of 7 hours just to read
> all the inodes and the directory structure to build the dump
> inventory. This matches with the final ext3 rsync pass of 10 hours
> which should have copied very little data. Creating 270 million
> hard links in 20 million directories from scratch takes a long time,
> and xfs_restore will be no faster at that than rsync....

That was my misunderstanding: I believed (or hoped) that a tool built
for a specific filesystem would outperform a generic tool like rsync.
I thought xfsdump would write all used filesystem blocks into a data
stream and xfsrestore would just read those blocks from stdin and
write them back to the destination filesystem - much like a dd
process that knows about the device contents and can skip unused
blocks.
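
To make the comparison concrete, this is roughly the pipeline I had
in mind versus the block-level copy I imagined it would resemble (the
mount points and device names are placeholders, and the option
letters are from memory, so treat it as a sketch, not a recipe):

    # file level: dump walks the tree, restore has to recreate every
    # inode, directory and hard link on the target
    xfsdump -J -L tmp -M tmp - /mnt/xfs15 | xfsrestore -J - /mnt/xfs20

    # block level: copies the device contents regardless of what is used
    dd if=/dev/vol15 of=/dev/vol20 bs=16M

The first is limited by metadata recreation on the target, as you
explained; the second only by sequential disk bandwidth.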

> > Seems like 2 days was a little optimistic
>
> Just a little. :/

It would have taken approx. 1000 hours.

> Personally, I would have copied the data using rsync to the
> temporary XFS filesystem of the same size and shape of the final
> destination (via mkfs parameters to ensure stripe unit/width match
> final destination) and then used xfs_copy to do a block level copy
> of the temporary filesystem back to the final destination. xfs_copy
> will run *much* faster than xfsdump/restore....

Next time I will do it as you suggest, with one minor change: instead
of xfs_copy I would use dd, which makes sense if the filesystem is
almost full. Or do you believe that xfs_copy is faster than dd? Or
will the xfs_growfs create any problems?

I used dd on Saturday to copy the 15TB XFS filesystem back onto the
20TB RAID10 volume and enlarged the filesystem with xfs_growfs. The
result was an XFS filesystem with layout parameters matching the
temporary RAID5 volume built from 16 1TB disks with a 256K chunk
size. But the new RAID10 volume consists of 20 2TB disks using a
512K chunk size, and growing the filesystem raised the allocation
group count from 32 to 45.

So I reformatted the 20TB volume with a fresh XFS filesystem and let
mkfs.xfs decide about the layout.
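
For concreteness, the explicit alternative would presumably look
roughly like this - /dev/md0 and the mount point are placeholders,
and sw=10 is only my guess at the right stripe width for 20 disks in
RAID10 with a 512K chunk, so please correct me if that is wrong:

    # spell out the RAID geometry instead of relying on autodetection
    mkfs.xfs -d su=512k,sw=10 /dev/md0
    mount /dev/md0 /mnt/backup20
    xfs_info /mnt/backup20    # check the sunit/swidth and agcount it picked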

Does that give me an optimal layout? I will enlarge the filesystem in
the future, which will increase the allocation group count further.
Is that a problem I should better have avoided in advance by reducing
the agcount?

Kind regards, and thanks very much for the useful info.

Peter Koch

--
Peter Koch
Passauer Strasse 32, 47249 Duisburg
Tel.: 0172 2470263