Hacker News

Your math is completely off. A 100x10TB RAID6 with a failed disk needs to read 990TB of data to rebuild. With a URE rate of 1 in 10^14 bits read you will see 79.2 URE events on average during a single rebuild, if the URE rate is correct (again, I don't believe it is). This is the reason no serious engineer recommends RAID6 for large arrays.
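For anyone wanting to check the 79.2 figure, here is a minimal sketch of the arithmetic. It assumes decimal terabytes (1 TB = 10^12 bytes) and that the quoted rate means one URE per 10^14 bits read, with errors independent of each other; the function name `expected_ures` is just an illustrative choice.

```python
# Expected number of unrecoverable read errors (UREs) for a given read volume.
# Assumptions: decimal terabytes (1 TB = 10**12 bytes) and independent errors
# at the vendor-quoted rate of one URE per 10**14 bits read.

BITS_PER_TB = 8 * 10**12


def expected_ures(terabytes_read: float, bits_per_ure: float = 1e14) -> float:
    """Mean URE count when reading `terabytes_read` TB."""
    return terabytes_read * BITS_PER_TB / bits_per_ure


# 100-disk RAID6 of 10 TB drives with one failed disk: 99 survivors are read.
raid6_read_tb = 99 * 10
print(f"{expected_ures(raid6_read_tb):.1f} expected UREs per rebuild")
```

Passing `bits_per_ure=1e16` instead shows how sensitive the result is to the assumed rate: the expected count drops by a factor of 100.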

In the case of RAID1, no one uses 100 mirrored drives. You use RAID10, and in the case of a failed disk, you must read 10TB to recover. With the same URE rate, we'd see on average 8 UREs for every 10 rebuilds, or around 2 orders of magnitude lower failure rate compared to the RAID6 example.
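An average of 0.8 UREs per rebuild is not the same as 80% of rebuilds hitting one. A rough sketch of the per-rebuild probability, assuming each bit read fails independently with probability 1 in 10^14 (the function name and the independence model are assumptions, not something the drive vendors specify):

```python
import math


def p_at_least_one_ure(terabytes_read: float, bits_per_ure: float = 1e14) -> float:
    """Probability of one or more UREs while reading `terabytes_read` TB.

    Assumes decimal terabytes and independent per-bit errors.
    """
    bits = terabytes_read * 8e12
    # 1 - (1 - p)**bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_ure))


# RAID10: read the one surviving 10 TB mirror partner.
print(f"RAID10 rebuild: {p_at_least_one_ure(10):.2f}")
# RAID6 with 100 disks: read the 99 surviving 10 TB disks.
print(f"RAID6 rebuild:  {p_at_least_one_ure(990):.2f}")
```

Under these assumptions a bit over half of the 10TB mirror rebuilds would see at least one URE, while the 990TB read is practically certain to see one.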



Your logic is sadly incorrect.

During a RAID6 rebuild, a URE is non-critical, as the array can recover the data with one lost disk and a URE on any other disk during the stripe rebuild.

The only critical error would be a URE on two disks in the same stripe. 80 UREs during a 990TB rebuild have an amazingly low chance of putting two UREs in the same stripe on two separate disks.
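To put a rough number on "amazingly low", one can treat it as a birthday problem: scatter the UREs uniformly over the stripes and ask how likely it is that any two share a stripe. This is only a sketch; the 512 KiB per-disk chunk size is a hypothetical value picked for illustration, and counting two UREs on the same disk as a collision overstates the risk slightly.

```python
import math


def p_two_ures_share_stripe(n_ures: int, n_stripes: int) -> float:
    """Birthday-problem estimate that at least two of n_ures land in one stripe."""
    pairs = n_ures * (n_ures - 1) / 2
    # 1 - exp(-pairs / n_stripes), the usual birthday approximation.
    return -math.expm1(-pairs / n_stripes)


disk_bytes = 10 * 10**12      # 10 TB drive, decimal terabytes
chunk_bytes = 512 * 1024      # hypothetical per-disk chunk size
n_stripes = disk_bytes // chunk_bytes

print(f"{p_two_ures_share_stripe(80, n_stripes):.2e}")
```

With these assumed numbers the chance of a fatal double hit during one rebuild comes out on the order of 1 in several thousand. A smaller chunk size means more stripes and an even lower probability.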

In the case of RAID10, you get 8 UREs over 10 rebuilds, which aren't recoverable unless you mirror across 3 disks. So you'll corrupt data.

edit: A URE rate of 1 in 10^14 is what most vendors specify for consumer hard drives; 1 in 10^16 is closer to what people encounter in the real world, but 10^14 is considered the worst-case URE rate.


Good point about UREs on a RAID6, but that still doesn't make it superior. The strain of a rebuild has been known to kill many arrays, both RAID5 and 6.

A URE does not have to corrupt data if you use a proper filesystem with checksumming, such as ZFS.

When a disk fails, a RAID10 is simply in a far better position, as it only has to read a single disk and it doesn't have any complicated striping to worry about. Just clone a disk.


>A URE does not have to corrupt data if you use a proper filesystem with checksumming, such as ZFS.

No, but AFAIK there is no way to recover data once ZFS has declared it corrupted (i.e., no parity).

>The strain of a rebuild has been known to kill many arrays, both RAID5 and 6.

I haven't actually encountered that yet. Even so, a RAID 6 can lose a disk, so as long as you don't encounter further UREs after losing another disk, it's fine.

If you're worried about that, go for RAIDZ3 or equivalent. With something like SnapRAID you can even go up to six parity disks (a "RAIDZ6", so to speak), losing 6 disks without losing data. The chances of that happening are relatively low.

>When a disk fails, a RAID10 is simply in a far better position, as it only has to read a single disk

A RAID 10 is in no position to recover from UREs once a disk has failed unless you go to 3-way mirrors, which reduces your space efficiency to 33%.
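The 33% figure is just one usable copy out of three. A small sketch of the comparison, using the standard definitions of usable capacity (the function names are illustrative, and the 100-disk RAID6 is the array from earlier in the thread):

```python
def mirror_efficiency(copies: int) -> float:
    """Usable fraction of raw capacity for an n-way mirror (RAID1/RAID10)."""
    if copies < 2:
        raise ValueError("a mirror needs at least 2 copies")
    return 1 / copies


def raid6_efficiency(disks: int) -> float:
    """Usable fraction of raw capacity for RAID6, which spends 2 disks on parity."""
    if disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return (disks - 2) / disks


print(f"2-way mirror:   {mirror_efficiency(2):.0%}")
print(f"3-way mirror:   {mirror_efficiency(3):.0%}")
print(f"100-disk RAID6: {raid6_efficiency(100):.0%}")
```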

I personally favor not corrupting data over rebuild speeds.

Striping might be complicated but that doesn't make it worse.

It might be acceptable to lose a music file, but once the family image collection gets corrupted or even lost on ZFS because a disk in a RAID 1 encountered a URE, it's personal.

I'd rather live with the thought that even if a disk has a URE, the others can cover for it. Even during a rebuild.



