[Top] [All Lists]

Re: [PATCH] xfs: v2 fix node forward in xfs_node_toosmall

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] xfs: v2 fix node forward in xfs_node_toosmall
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Tue, 24 Sep 2013 16:34:53 -0500
Cc: stable@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130924210653.GB26872@dastard>
References: <20130920220519.585903357@xxxxxxx> <20130923171911.273669684@xxxxxxx> <20130923234819.GW9901@dastard> <5241CD70.7050800@xxxxxxx> <20130924210653.GB26872@dastard>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 09/24/13 16:06, Dave Chinner wrote:
On Tue, Sep 24, 2013 at 12:35:44PM -0500, Mark Tinguely wrote:
On 09/23/13 18:48, Dave Chinner wrote:
On Mon, Sep 23, 2013 at 12:18:58PM -0500, Mark Tinguely wrote:
Commit f5ea1100 cleans up the disk to host conversions for
node directory entries, but because a variable is reused in
xfs_node_toosmall() the next node is not correctly found.
If the original node is small enough (<= 3/8 of the node size),
this change may incorrectly cause a node collapse when it should

The comment about the size of the node triggering a collapse is
irrelevant - nodes always collapse at that given size. What this
doesn't tell us is why the crash occurs - "the next node is not
correctly found" is not particularly obvious, and would require
quite a bit of code reading to work out from first principles a
couple of years down the track.

The commit message should be more precise and describe what the
underlying cause of the failure was. i.e. that the node is finding itself as 
the merge
candidate because we go forward, overwrite the pointers and the new
block's backward sibling is the original block which is where we end
up on teh second loop.  And vice versa if we go backwards first...

Also, the "next node" is correctly termed a "sibling", and it's
either the forwards or backwards sibling, not the "next" sibling as
the direction of movement is important. So perhaps this
is better written as:

"When a node is considered for a merge with a sibling, it overwrites
the sibling pointers of the original node with the sibling's
pointers. This leads to loop considering the original node as a
merge candidate with itself in the second pass, and so it
incorrectly determines a merge should occur."

Are you done ranting? Get the @#$% bug patched.

I'm deeply sorry you feel that way about the review process - it's
not just code that matters. Experience has shown us time and time
again that accurate and complete commit messages are extremely
valuable as they document the symptoms of a problem being fixed and
why the fix was needed.

If someone needs to look at this commit in a couple of years to
determine if it matches a problem that a customer reported, they
shouldn't have to work out what the problem was and guess at it's
symptoms and impact from code analysis. The commit message should
tell them all the information they need(*).

It took you quite some time and effort to find the problem, so it's
worthy of spending a few minutes to document that effort for
posterity. That way when someone asks you a question about the
problem, all you need to do is point them at the commit and all the
information is right at their fingertips.

I'm not asking you to do anything I don't do already.  Have you ever
wondered why I write long, verbose commit messages and changes with
verbose comments? They aren't for the reviewer - I can answer
questions in real-time about the change. The message is for someone
looking at the change in 2-3 years time when when nobody remembers
the exact circumstances of the fix anymore.

IOWs, by clearly documenting the problem being fixed, the root cause
analysis and verification that was performed *using standard
terminology*, we make it far easier for someone to come along in 2-3
years time and understand the fix without needing any further
information about it.

Software engineering best practices have come a long way since the
early 90's - writing meaningful commit messages to go along with
your code changes has been considered a best practise for at least
the last 10 years....



(*) If you've ever spent any time looking at the old XFS archives,
then you'll understand exactly why what I've asked for is important.
Trying to reverse-engineer why a change was made in the old XFS code
is just about impossible because all they generally have is single
line commit messages and nothing else to describe the change.
Sometimes they just point at a bug number, without any other
information at all.....

You do not always practice what you preach - here is the commit message from the patch that caused this whole problem:

commit f5ea110044fa858925a880b4fa9f551bfa2dfc38
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Wed Apr 24 18:58:02 2013 +1000

    xfs: add CRCs to dir2/da node blocks

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Ben Myers <bpm@xxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

Did this patch just add CRCs to node blocks? no.

And how about your patches that do not need a commit message at all?!
Are these exceptions? No, I have done enough reviews of your patches, and some of your commit message are less than stunning.

when your preaching, and not so subtle digs at me or SGI gets to the point of rewriting commit messages even I can't hold my tongue any more.

PS. I told Ben that I do not appreciate commit messages being changed.


<Prev in Thread] Current Thread [Next in Thread>