[Tfug] ECC (was Re: Using a Laptop as a server)
Bexley Hall
bexley401 at yahoo.com
Thu Mar 14 13:35:37 MST 2013
Hi Zack,
On 3/14/2013 9:01 AM, Zack Williams wrote:
> On Thu, Mar 14, 2013 at 7:44 AM, Louis Taber<ltaber at gmail.com> wrote:
>> Does Linux, by default, log ECC errors? If so, where? If not, how can
>> logging be turned on?
>
> Shows up in the system log - the Linux kernel driver that reads error
> codes is named "edac". I've logged a fair number of main DRAM and L2
> and L3 cache ECC errors on my system, probably once every 2-3 months
> per 64GB of active memory across all systems.
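Besides the log messages, the edac driver keeps running counters in sysfs, which are easier to track over time than grepping logs. A minimal sketch, assuming the usual /sys/devices/system/edac/mc layout with per-controller ce_count (correctable) and ue_count (uncorrectable) files; the function name read_edac_counts is made up for illustration, and on a box without the driver loaded it simply returns nothing:

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} from EDAC sysfs counters.

    Assumes one mcN directory per memory controller, each holding
    ce_count and ue_count files. Returns {} if the directory is absent
    (driver not loaded, or no ECC support).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, name in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for mc, c in read_edac_counts().items():
        print(f"{mc}: correctable={c['ce']} uncorrectable={c['ue']}")
```

Run from cron and diffed against the previous reading, this gives a per-controller error history rather than a one-off log line.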
So, what does this tell you in terms of the quality/reliability of your
system? When do you start getting nervous? Statistically, a device
that throws an error is more likely to throw *more* errors in the
future. [Unless the source of the errors is the memory infrastructure
and not the memory (device) itself.]
How do you develop/implement *policy* for dealing with these numbers?
Or, do they just serve a "blinkenlichten" role?
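To make the question concrete, here is the sort of thing I mean by policy: a rough sketch, not a recommendation. The 30-day window, the limit of 3 correctable errors, and the three outcome names are all arbitrary assumptions picked for illustration, as is the function name assess.

```python
from datetime import datetime, timedelta


def assess(ce_times, ue_count, now,
           window=timedelta(days=30), ce_limit=3):
    """Classify one memory device from its error history.

    ce_times: datetimes of logged correctable errors
    ue_count: number of uncorrectable errors seen
    window, ce_limit: hypothetical thresholds, tune to taste
    """
    if ue_count > 0:
        return "replace"        # any uncorrectable error is treated as fatal
    recent = [t for t in ce_times if now - t <= window]
    if len(recent) >= ce_limit:
        return "investigate"    # correctable errors are clustering
    return "ok"


if __name__ == "__main__":
    now = datetime(2013, 3, 14)
    history = [now - timedelta(days=d) for d in (2, 9, 20)]
    print(assess(history, 0, now))
```

With these made-up thresholds, a device that logs one correctable error every 2-3 months stays "ok", while one that logs three in a month gets flagged.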
> I view ECC as a safety net. Not needed if you're walking the high
> wire, but invaluable if you do need it.