http://www.cs.toronto.edu/~bianca/paper ... rics09.pdf
It claims to be "the first large-scale study of DRAM memory errors in the field," and it "covers the majority of machines in Google's fleet and spans nearly 2.5 years," which adds up to "many millions of DIMM days."
Here are some excerpts from their conclusions:
- The incidence of memory errors was much higher than was found by previous (smaller-scale) studies.
- About a third of machines have ECC-correctable memory errors in a given year.
- A DIMM that has one correctable error is much more likely to have additional errors.
- Error rates increase as components age.