| To: | "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>, lists@xxxxxxxxxxxxxxxxx |
|---|---|
| Subject: | Re: xfs Digest, Vol 79, Issue 19 |
| From: | David Raffelt <david.raffelt@xxxxxxxxxxxxx> |
| Date: | Wed, 7 Jan 2015 18:05:14 +1100 |
| Delivered-to: | xfs@xxxxxxxxxxx |
| In-reply-to: | <9cb00ded133a452e9fed635bd0094885@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> |
| References: | <9cb00ded133a452e9fed635bd0094885@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> |
Hi Chris,

Thanks for your time. I have responded to your suggestions below.

> Date: Tue, 6 Jan 2015 19:35:34 -0700

Yes, after the 2 disks were dropped I definitely had a working degraded array with 5/7 drives. I only see XFS errors in the kernel log soon AFTER the hot spare finished syncing. Here are the errors:

Jan 06 00:00:27 server kernel: XFS (md0): Corruption detected. Unmount and run xfs_repair
Jan 06 00:00:27 server kernel: XFS (md0): metadata I/O error: block 0x36b106c00 ("xfs_trans_read_buf_map") error 117 numblks 16
Jan 06 00:00:27 server kernel: XFS (md0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.

If it helps, about 1 min before the drives were dropped from the array I got many of the errors below in the log. Here is a link to the complete log if needed: https://dl.dropboxusercontent.com/u/1156508/journalctl_dump.txt

Jan 05 11:40:45 server kernel: ata11.00: status: { DRDY }
Jan 05 11:40:45 server kernel: ata11.00: cmd 60/10:00:70:60:0c/00:00:71:00:00/40 tag 0 ncq 8192 in
                                         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
Jan 05 11:40:45 server kernel: ata11.00: failed command: READ FPDMA QUEUED
Jan 05 11:40:45 server kernel: ata11.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
...
Jan 05 11:41:15 server kernel: ata12: limiting SATA link speed to 3.0 Gbps
Jan 05 11:41:15 server kernel: ata12: hard resetting link
Jan 05 11:41:16 server kernel: ata12: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jan 05 11:41:38 server kernel: ata11.00: qc timeout (cmd 0xec)
Jan 05 11:41:38 server kernel: ata11.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 05 11:41:38 server kernel: ata11.00: revalidation failed (errno=-5)
Jan 05 11:41:38 server kernel: ata11.00: disabled
Jan 05 11:41:38 server kernel: ata11.00: device reported invalid CHS sector 0
...
Jan 05 11:41:40 server kernel: sd 11:0:0:0: [sdf] Unhandled error code
Jan 05 11:41:40 server kernel: sd 11:0:0:0: [sdf]
Jan 05 11:41:40 server kernel: Result: hostbyte=0x04 driverbyte=0x00
Jan 05 11:41:40 server kernel: sd 11:0:0:0: [sdf] CDB:
Jan 05 11:41:40 server kernel: cdb[0]=0x88: 88 00 00 00 00 00 71 0c 60 40 00 00 00 10 00 00
Jan 05 11:41:40 server kernel: end_request: I/O error, dev sdf, sector 1896636480
...
Jan 05 11:41:40 server kernel: sd 11:0:0:0: [sdf] Unhandled error code
Jan 05 11:41:40 server kernel: sd 11:0:0:0: [sdf]
Jan 05 11:41:40 server kernel: Result: hostbyte=0x04 driverbyte=0x00
Jan 05 11:41:40 server kernel: sd 11:0:0:0: [sdf] CDB:
Jan 05 11:41:40 server kernel: cdb[0]=0x88: 88 00 00 00 00 00 7f 00 0b 00 00 00 00 08 00 00
Jan 05 11:41:40 server kernel: md: super_written gets error=-5, uptodate=0
Jan 05 11:41:40 server kernel: md/raid:md0: Disk failure on sdf1, disabling device.
                               md/raid:md0: Operation continuing on 6 devices.

> It's not rare for smart to not test for certain failure vectors so it ...

I get all the information from all 8 drives (7 + the hot spare). The 6 drives currently in the array all have the same event number (80664). However, the 2 drives that were dropped have the event number 80327.
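For reference, this is roughly how I pulled the event counts from the md superblocks (the device names are just examples for my box):

```
# Print the Events counter (plus a couple of other fields) from the md
# superblock on each member partition, to compare across drives.
for dev in /dev/sd[b-i]1; do
    echo "== $dev =="
    mdadm --examine "$dev" | grep -E 'Events|Update Time|Device Role'
done
```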
Not sure if I'm missing something, but the array is already assembled with 6/7 drives, all having the same event number. In any case, I have stopped the array and assembled it again with no luck.
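In case the exact steps matter, the stop/reassemble sequence was roughly this (device names are examples, and I may be misremembering the exact arguments):

```
# Stop the running degraded array, then assemble it again from the
# existing member superblocks.
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/sd[b-i]1
```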
I'm ashamed to say that this command only works on 1 of the 8 drives, since this is the only enterprise-class drive (we are funded by small science grants). We have been gradually replacing the desktop-class drives as they fail.

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
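For reference, I queried (and, on the drive that supports it, set) SCT ERC roughly like this (the device name is a placeholder):

```
# Query the drive's SCT Error Recovery Control timers.
smartctl -l scterc /dev/sdX

# Set read/write ERC to 7.0 seconds on drives that support it
# (the values are given in tenths of a second).
smartctl -l scterc,70,70 /dev/sdX
```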
All devices are set to 30 seconds.
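If I understood the suggestion correctly, that is the kernel's per-device SCSI command timer; this is roughly how I checked it:

```
# The SCSI command timer is exposed in sysfs, in seconds (30 is the default).
for t in /sys/block/sd?/device/timeout; do
    echo "$t: $(cat "$t")"
done
```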
Just to confirm, we have 3 x 15TB bricks in a 45TB volume. Don't we need complete duplication in a distributed-replicated Gluster volume, or can we get away with only 1 more brick? Those HGST 8TB drives do look very tempting!

Thanks again,
Dave
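PS: for the record, my (possibly wrong) understanding of how a distributed-replicated volume pairs bricks, with hypothetical host and brick names:

```
# With "replica 2", gluster pairs the bricks in the order given, so each
# file lands on exactly one of the two-brick replica pairs below.
gluster volume create gv0 replica 2 \
    serverA:/data/brick1 serverB:/data/brick1 \
    serverA:/data/brick2 serverB:/data/brick2
```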