[Tfug] ECC (was Re: Using a Laptop as a server)

Zack Williams zdwzdw at gmail.com
Thu Mar 14 17:12:02 MST 2013


On Thu, Mar 14, 2013 at 4:55 PM, Bexley Hall <bexley401 at yahoo.com> wrote:
> How would you "set policy" so a "flunky" would know when the error
> rate is "too much" and could take corrective action?  (i.e., how
> would you have the *machine* decide when its integrity is
> compromised -- or, soon to be?)

That's a value judgement, so everyone's would be different.  My
criterion for replacement would be something like: any identical
correctable error that recurs more than once every 3 months.  Multiple
non-correctable errors in the same unit, even if they aren't
identical, would probably prompt replacement as well.  This could be
written up as policy, assuming that said flunky could interpret the
log information correctly.
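
To make that concrete, here is a rough sketch of how such a policy
might be encoded.  Everything in it is an assumption for illustration:
the EccEvent record, the "signature" field (whatever your logs use to
tell identical errors apart, e.g. an address or syndrome), the 90-day
stand-in for "3 months", and reading "multiple" as two or more.  It
does not parse any real log format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EccEvent:
    """One logged ECC error (hypothetical record, not a real log format)."""
    unit: str          # label of the DIMM/CPU that reported the error
    correctable: bool  # True for corrected errors, False for uncorrectable
    signature: str     # what makes two errors "identical" (assumed field)
    when: datetime


# "3 months" approximated as 90 days (assumption).
WINDOW = timedelta(days=90)


def should_replace(events, unit, window=WINDOW):
    """Return True if `unit` meets either replacement rule."""
    mine = sorted((e for e in events if e.unit == unit), key=lambda e: e.when)

    # Rule 2: multiple non-correctable errors in the same unit,
    # identical or not ("multiple" taken to mean two or more).
    if sum(1 for e in mine if not e.correctable) >= 2:
        return True

    # Rule 1: the same correctable error seen again within the window.
    times_by_signature = defaultdict(list)
    for e in mine:
        if e.correctable:
            times_by_signature[e.signature].append(e.when)
    for times in times_by_signature.values():
        # Times are already in order, so comparing neighbours is enough.
        for earlier, later in zip(times, times[1:]):
            if later - earlier <= window:
                return True

    return False
```

A lone correctable error, or two unrelated ones, returns False here,
which is the point: the occasional one-off bit flip should not get
perfectly good hardware pulled.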

It's pointless to fix something that might not be at fault.  In the
Sun example, all the swapping of parts never fixed the root problem
until they isolated it.  In the same way, swapping out perfectly fine
memory/CPUs that happened to hit the cosmic-ray/radon bit-flip
jackpot once in a while isn't going to solve anything.

- Zack



