
Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: Data can't be wrote to XFS RIP [<ffffffffa041a99a>] xfs_dir2_sf_get_parent_ino+0xa/0x20
From: Kuo Hugo <tonytkdk@xxxxxxxxx>
Date: Mon, 20 Jul 2015 22:30:31 +0800
Cc: Hugo Kuo <hugo@xxxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>, Darrell Bishop <darrell@xxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx

Hi Brian,

> I don't know much about the Swift bug. A BUG() or crash in the kernel is generally always a kernel bug, regardless of what userspace is doing. It certainly could be that whatever userspace is doing to trigger the kernel bug is a bug in the userspace application, but either way it shouldn't cause the kernel to crash. By the same token, if Swift is updated to fix the aforementioned bug and the kernel crash no longer reproduces, that doesn't necessarily mean the kernel bug is fixed (just potentially hidden).

Understood.

[Previous Message]

The valid inode has an inode number of 13668207561.
- The fsname for this inode is "sdb."
- The inode does appear to have a non-NULL if_data:

    if_u1 = {
      if_extents = 0xffff88084feaf5c0,
      if_ext_irec = 0xffff88084feaf5c0,
      if_data = 0xffff88084feaf5c0 "\004"
    },

        find <mntpath> -inum 13668207561

Q1: Were you able to track down the directory inode mentioned in the previous message?

Ans: Yes, it's the directory shown below. /srv/node/d224 is the mount point of /dev/sdb. This is the original location of the path. The directory now contains the file 1436266052.71893.ts, and that .ts file is 0 bytes.


[root@r2obj01 ~]# find /srv/node/d224 -inum 13668207561
/srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32

[root@r2obj01 ~]# ls -lrt /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32
-rw------- 1 swift swift 0 Jul 7 22:37 1436266052.71893.ts
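
For reference, the same inode-to-path lookup can be approximated from Python. This is only a minimal sketch: `find_by_inode` is a hypothetical helper name, and unlike a careful `find` invocation it does not stop at filesystem boundaries, so it should be pointed at a single mount point.

```python
import os


def find_by_inode(mount_path, inum):
    """Return every path under mount_path whose inode number equals inum.

    Hypothetical helper that mimics `find <mntpath> -inum <inum>`.
    """
    matches = []
    # Check the starting directory itself, as find does.
    if os.lstat(mount_path).st_ino == inum:
        matches.append(mount_path)
    for dirpath, dirnames, filenames in os.walk(mount_path):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are reported, not followed.
                if os.lstat(path).st_ino == inum:
                    matches.append(path)
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
    return matches
```

For example, `find_by_inode("/srv/node/d224", 13668207561)` would be the equivalent of the `find` command above.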

Q2: Is it some kind of internal directory used by the application (e.g., perhaps related to the quarantine mechanism mentioned in the bug)?

Ans: Yes, it's a directory that is accessed by the application.


 37 ffff8810718343c0 ffff88105b9d32c0 ffff8808745aa5e8 REG  [eventpoll]
 38 ffff8808713da780 ffff880010c9a900 ffff88096368a188 REG  /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts
 39 ffff880871cb03c0 ffff880495a8b380 ffff8808a5e6c988 REG  /srv/node/d224/tmp/tmpSpnrHg
 40 ffff8808715b4540 ffff8804819c58c0 ffff8802381f8d88 DIR  /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32

The swift-object-server process shown above was making a Python function call to rename the file /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts to /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts:

os.rename(old, new)

And it crashed at this point. In Q1, we found that the inode number points to the directory /srv/node/d224/objects/45382/b32/b146865bf8034bfc42570b747c341b32.

We found multiple (more than 10) DELETE requests from the application against the target file at almost the same moment. A DELETE removes the original file in the directory and creates a new empty .ts file in that directory. I suspect that multiple os.rename calls on the same file in that directory cause the kernel panic.
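
The suspected access pattern can be sketched as follows. This is a minimal illustration, not Swift's actual code: the function name `racing_renames`, the temporary directory layout, and the worker count are all assumptions made for the example. Several threads call os.rename on the same source path at the same moment; on a healthy filesystem exactly one rename should succeed and the rest should fail with ENOENT.

```python
import os
import tempfile
import threading


def racing_renames(workers=10):
    """Race several os.rename calls on one file; return (successes, misses)."""
    results = []
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for the objects/ and quarantined/ directories.
        src_dir = os.path.join(root, "objects")
        dst_dir = os.path.join(root, "quarantined")
        os.makedirs(src_dir)
        os.makedirs(dst_dir)
        old = os.path.join(src_dir, "1436266042.57775.ts")
        new = os.path.join(dst_dir, "1436266042.57775.ts")
        open(old, "w").close()  # empty file, like the 0-byte .ts file

        # The barrier releases all workers together to maximise the overlap.
        barrier = threading.Barrier(workers)

        def worker():
            barrier.wait()
            try:
                os.rename(old, new)
                results.append("ok")
            except FileNotFoundError:
                # Another worker already moved the file away.
                results.append("missing")

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results.count("ok"), results.count("missing")
```

Running this against a scratch directory on the affected XFS mount, instead of the default temporary directory, might help show whether the rename race alone is enough to reproduce the crash. That is untested here.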

And the file /srv/node/d224/quarantined/objects/b146865bf8034bfc42570b747c341b32/1436266042.57775.ts was not created.

Regards // Hugo
