On 4/18/13 8:23 AM, 符永涛 wrote:
> Hi Brian and Eric,
> The shutdown is not easy to reproduce, but right now xfs has finally shut
> down on 2 of the servers in our test cluster.
>
> The trace output is as follows:
> https://docs.google.com/file/d/0B7n2C4T5tfNCLXRYUWJ0b19JcWc/edit?usp=sharing
>
Here's something interesting: for 2 inodes we have double/racing calls to
xfs_iunlink, i.e. a second xfs_iunlink with no xfs_iunlink_remove in between:
=== 0x5cc0b ===
<...>-8336 [004] 6931.372924: xfs_iunlink: dev 8:16 ino 0x5cc0b
<...>-8336 [004] 6931.372965: xfs_iunlink_remove: dev 8:16 ino 0x5cc0b
<...>-27541 [001] 35061.349747: xfs_iunlink: dev 8:16 ino 0x5cc0b
<...>-3356 [001] 36449.762504: xfs_iunlink_remove: dev 8:16 ino 0x5cc0b
<...>-3300 [003] 41013.398566: xfs_iunlink: dev 8:16 ino 0x5cc0b
<...>-26115 [012] 41013.399884: xfs_iunlink: dev 8:16 ino 0x5cc0b
<...>-26115 [012] 41013.399935: xfs_iunlink_remove: dev 8:16 ino 0x5cc0b
<...>-28961 [000] 68977.951208: xfs_iunlink: dev 8:16 ino 0x5cc0b
<...>-3364 [021] 81616.210533: xfs_iunlink_remove: dev 8:16 ino 0x5cc0b
=== 0x7ef8c ===
<...>-13169 [001] 118751.536025: xfs_iunlink: dev 8:16 ino 0x7ef8c
<...>-13169 [001] 118751.536049: xfs_iunlink_remove: dev 8:16 ino 0x7ef8c
<...>-3594 [015] 119027.006161: xfs_iunlink: dev 8:16 ino 0x7ef8c
<...>-3594 [015] 119027.006186: xfs_iunlink_remove: dev 8:16 ino 0x7ef8c
<...>-3591 [001] 121423.286004: xfs_iunlink: dev 8:16 ino 0x7ef8c
<...>-4141 [019] 121423.288518: xfs_iunlink: dev 8:16 ino 0x7ef8c
<...>-4141 [019] 121423.288541: xfs_iunlink_remove: dev 8:16 ino 0x7ef8c
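For anyone who wants to pick this pattern out of a large trace mechanically,
here is a rough sketch in Python. It assumes one event per line in the format
shown above (unwrapped); the function name find_double_iunlink and the SAMPLE
list are made up for illustration and are not part of any XFS tooling.

```python
import re

# Matches trace lines such as:
#   <...>-8336 [004] 6931.372924: xfs_iunlink: dev 8:16 ino 0x5cc0b
EVENT_RE = re.compile(
    r"-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>xfs_iunlink(?:_remove)?):\s+"
    r"dev\s+(?P<dev>\S+)\s+ino\s+(?P<ino>0x[0-9a-fA-F]+)"
)


def find_double_iunlink(lines):
    """Report each xfs_iunlink that follows another xfs_iunlink on the same
    (dev, ino) with no xfs_iunlink_remove in between.

    Returns a list of (dev, ino, first_event, second_event) tuples.
    """
    pending = {}  # (dev, ino) -> xfs_iunlink event not yet matched by a remove
    doubles = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not an iunlink event, skip it
        key = (m["dev"], m["ino"])
        event = {"pid": int(m["pid"]), "cpu": int(m["cpu"]), "ts": m["ts"]}
        if m["event"] == "xfs_iunlink":
            if key in pending:
                doubles.append((key[0], key[1], pending[key], event))
            pending[key] = event
        else:
            # xfs_iunlink_remove clears the outstanding add, if any
            pending.pop(key, None)
    return doubles


# Hypothetical input: four lines taken from the trace above, unwrapped.
SAMPLE = [
    "<...>-3356 [001] 36449.762504: xfs_iunlink_remove: dev 8:16 ino 0x5cc0b",
    "<...>-3300 [003] 41013.398566: xfs_iunlink: dev 8:16 ino 0x5cc0b",
    "<...>-26115 [012] 41013.399884: xfs_iunlink: dev 8:16 ino 0x5cc0b",
    "<...>-26115 [012] 41013.399935: xfs_iunlink_remove: dev 8:16 ino 0x5cc0b",
]

for dev, ino, first, second in find_double_iunlink(SAMPLE):
    print(f"dev {dev} ino {ino}: pid {first['pid']} cpu {first['cpu']} "
          f"then pid {second['pid']} cpu {second['cpu']}")
```

This only tracks the most recent unmatched xfs_iunlink per inode, which is
enough to flag the two cases here; it says nothing about why the second call
happened.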
In both cases, 2 threads on 2 different CPUs add the same inode to the unlinked
list in a race (pids 3300 and 26115 for 0x5cc0b, pids 3591 and 4141 for
0x7ef8c). This will corrupt the list and lead to the failure to find the other
inode we're looking for. So, progress! We'll take a look at the iunlink paths.
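To illustrate why a double add is fatal, here is a toy model in Python. It is
a simplification, not the kernel code: it assumes a bucket is a singly linked
list with insertion at the head, and the class UnlinkedBucket and the inode
number 0x100 are made up for the example.

```python
NULLAGINO = None  # stand-in for the end-of-list marker


class UnlinkedBucket:
    """Toy model of one singly linked unlinked-list bucket (head insertion)."""

    def __init__(self):
        self.head = NULLAGINO
        self.next = {}  # ino -> next ino in the bucket

    def iunlink(self, ino):
        # Point the new inode at the current head, then make it the head.
        self.next[ino] = self.head
        self.head = ino

    def iunlink_remove(self, ino):
        if self.head == ino:
            self.head = self.next.pop(ino)
            return
        # Otherwise walk the list looking for the predecessor of ino.
        prev = self.head
        seen = set()  # guards the toy model against cycles
        while prev is not NULLAGINO and prev not in seen:
            seen.add(prev)
            if self.next[prev] == ino:
                self.next[prev] = self.next.pop(ino)
                return
            prev = self.next[prev]
        raise LookupError(f"inode {ino:#x} not found on unlinked list")


bucket = UnlinkedBucket()
bucket.iunlink(0x100)    # some other inode already on the list (hypothetical)
bucket.iunlink(0x5cc0b)  # first add:  0x5cc0b -> 0x100
bucket.iunlink(0x5cc0b)  # second add: 0x5cc0b now points at itself

try:
    bucket.iunlink_remove(0x100)
    found = True
except LookupError:
    found = False  # 0x100 is no longer reachable from the head
print("found 0x100:", found)
```

In this model the second add overwrites the inode's next pointer with itself,
so everything that used to sit behind it drops off the list and a later remove
of one of those inodes cannot find it.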
-Eric