[Tfug] ECC (was Re: Using a Laptop as a server)
Zack Williams
zdwzdw at gmail.com
Thu Mar 14 14:39:07 MST 2013
On Thu, Mar 14, 2013 at 1:35 PM, Bexley Hall <bexley401 at yahoo.com> wrote:
>
> So, what does this tell you in terms of the quality/reliability of your
> system? When do you start getting nervous? Statistically, a device
> that throws an error is more likely to throw *more* errors in the
> future. [Unless the source of the errors is the memory infrastructure
> and not the memory (device) itself.]
I chalk them up to environmental hazards: cosmic rays (btw, I've always
wanted to build a cloud chamber after seeing one at a science museum),
radon, etc. Unless I see the same systematic, repetitive error 2 or more
times, I don't view it as a hardware flaw. The errors I'm seeing are the
isolated, one-off kind.
I've also had cases where I did need to replace memory that was
throwing ECC errors on a daily basis - that's where ECC is doing its
job: keeping the system functioning properly until a scheduled
replacement can happen (see also: RAID).
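The rule of thumb above (a one-off corrected error is environmental, a
repeat at the same spot points at hardware) could be sketched roughly
like this. The function name, the (DIMM label, address) event tuples, and
the sample data are all made up for illustration; this is not the output
format of any particular ECC/EDAC tool.

```python
from collections import Counter


def flag_repeat_offenders(events, threshold=2):
    """Return locations that logged at least `threshold` corrected errors.

    `events` is an iterable of hypothetical (dimm_label, address) tuples,
    one per corrected ECC error. The default threshold of 2 mirrors the
    "2 or more times" rule of thumb; tune it to taste.
    """
    counts = Counter(events)
    return sorted(loc for loc, n in counts.items() if n >= threshold)


# Made-up sample log: one location repeats, the other is a one-off.
events = [
    ("DIMM_A1", 0x1F400),
    ("DIMM_B2", 0x00A80),
    ("DIMM_A1", 0x1F400),
]

for dimm, addr in flag_repeat_offenders(events):
    print(f"{dimm} @ {addr:#x}: repeated error, suspect hardware")
```

Anything that never shows up in the flagged list gets treated as a
transient hit and ignored.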
One interesting, tangentially related story: in the early 2000s Sun
released a bunch of processors whose cache chips had radioactive
casings, which caused these sorts of errors.
http://www.sparcproductdirectory.com/artic-2001-dec-1.html
http://nighthacks.com/roller/jag/entry/at_the_mercy_of_suppliers
- Zack