<div dir="ltr">Hi Eric,<br><div class="gmail_extra"><br><div class="gmail_quote">On 28 October 2014 18:42, Eric Sandeen <span dir="ltr"><<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On 10/28/14 10:38 AM, Jan Kokoska wrote:<br>
> Hi,<br>
><br>
> I'm running OpenVZ (OS container) kernel variant of RHEL6 kernel on<br>
<br>
</span>... for which we have no source code? ;)<br></blockquote><div><br></div><div>Right, sorry about that. The source patches against the vanilla kernel are linked from</div><div><a href="http://openvz.org/Download/kernel/rhel6/042stab084.20" target="_blank">http://openvz.org/Download/kernel/rhel6/042stab084.20</a><br></div><div>and</div><div><a href="http://openvz.org/Download/kernel/rhel6/042stab092.3" target="_blank">http://openvz.org/Download/kernel/rhel6/042stab092.3</a><br></div><div>for the two kernel versions.</div><div><br></div><div>xfs_aops.c differs a bit between the older and the newer version (released 6 months apart), but both kernels crash. </div><div><br></div><div><div><div>1031,1032c1031,1034</div><div>< <span class="" style="white-space:pre"> </span> * Just skip the page if it is fully outside i_size, e.g. due</div><div>< <span class="" style="white-space:pre"> </span> * to a truncate operation that is in progress.</div><div>---</div><div>> <span class="" style="white-space:pre"> </span> * Skip the page if it is fully outside i_size, e.g. due to a</div><div>> <span class="" style="white-space:pre"> </span> * truncate operation that is in progress. We must redirty the</div><div>> <span class="" style="white-space:pre"> </span> * page so that reclaim stops reclaiming it. 
Otherwise</div><div>> <span class="" style="white-space:pre"> </span> * xfs_vm_releasepage() is called on it and gets confused.</div><div>1034,1037c1036,1037</div><div>< <span class="" style="white-space:pre"> </span>if (page->index >= end_index + 1 || offset_into_page == 0) {</div><div>< <span class="" style="white-space:pre"> </span>unlock_page(page);</div><div>< <span class="" style="white-space:pre"> </span>return 0;</div><div>< <span class="" style="white-space:pre"> </span>}</div><div>---</div><div>> <span class="" style="white-space:pre"> </span>if (page->index >= end_index + 1 || offset_into_page == 0)</div><div>> <span class="" style="white-space:pre"> </span>goto redirty;</div></div></div><div><br></div><div>OpenVZ devs unfortunately don't publish their git tree anymore.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
I don't know what's in "2.6.32-openvz-amd64" so can't help much.<br>
<br>
What is at line 86 of xfs_aops.c in that kernel?<br></blockquote><div><br></div><div>Stefan is right in that it's the line</div><div>bh = head = page_buffers(page);<br></div><div>from </div><div>xfs_count_page_state()<br></div><div><br></div><div>Eric, thanks for the pointer to ef5d437f71afdf4afdbab99213add99f4b1318fd. I'll raise it with the OpenVZ devs or with RHEL so the fix trickles downstream to OpenVZ. I simply didn't know how much the XFS code in the kernel trees you maintain differs from what ships in, e.g., RHEL, and thought it could be a generally occurring bug. I also wanted to get in touch with the mailing list, as I've been using XFS mostly happily for a decade.</div><div><br></div><div>Jan</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
-Eric<br>
<div><div><br>
> several amd64 machines by different manufacturers (HP and Supermicro)<br>
> and different RAID cards (HP and Areca).<br>
><br>
> I've started seeing kernel crashes in October, as per the netconsole<br>
> logs attached, on two of the machines (one HP, one Supermicro). The<br>
> traces look quite similar, the machine in question cannot write<br>
> anything to its own filesystem when this happens so the logs are made<br>
> over the network. The XFS filesystem is not root (that's ext4), but<br>
> one for data (OS containers), on both machines. When I run xfs_check<br>
> and xfs_repair on the filesystem after the kernel crash & reboot, no<br>
> issue is ever found.<br>
><br>
> This may very well have nothing to do with XFS kernel code you wrote<br>
> and maintain, but in that case, could you, from looking at the traces,<br>
> tell me whether it maybe looks like something issue related to<br>
> vm/paging just ending up in XFS related code path?<br>
><br>
> I'm happy to test any suggestions/fixes for this if it is XFS related.<br>
><br>
> Thank you,<br>
> --<br>
> Jan Kokoska<br>
> Glow Internet s.r.o.<br>
><br>
><br>
</div></div>> _______________________________________________<br>
> xfs mailing list<br>
> <a href="mailto:xfs@oss.sgi.com" target="_blank">xfs@oss.sgi.com</a><br>
> <a href="http://oss.sgi.com/mailman/listinfo/xfs" target="_blank">http://oss.sgi.com/mailman/listinfo/xfs</a><br>
><br>
<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>Best regards<br><br>Jan Kokoska<br>Glow Internet s.r.o.<br>
</div></div>