Article 5JQRK Silent Data Corruption in Multicore CPUs?

by
business_kid
from LinuxQuestions.org on (#5JQRK)
Here's a link with two references and a summary, but Google Scholar has plenty more:
https://hardware.slashdot.org/story/...re-modern-cpus

Basically, it seems that modern high-density fabrication is causing sporadic errors in CPUs, and these are being noticed in big data centres.

As a hardware guy, I know it's a testing nightmare. Testing at maximum temperature and minimum voltage may help, but may not. These would typically be heavily cooled 250W or 280W packages, and temperature uniformity throughout the die can only be modeled, not measured. Also, the uniformity of doping could be an issue. "Doping" mixes pure silicon with a tiny percentage of atoms that have one electron more (negative, or n-type, doping) or one less (positive, or p-type, doping). Lastly, any manufacturing imperfection would do it. CPUs have an extremely low manufacturing pass rate anyhow.

What I can also imagine is the staggering amount of time required to decide core 59 is dodgy, but not 58 or 60. I'm interested in proposed solutions, because nobody seems to have any. I thought about options to disable cores, but once you find the suspect box, the cheapest practical thing is to replace the CPU or indeed the box.
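To make the "core 59 but not 58 or 60" problem concrete, here is a rough sketch of a per-core check. It pins the process to one core at a time, runs the same deterministic workload there, and flags any core whose result disagrees with the majority. The names workload and check_cores are made up for illustration, it is Linux-only (it relies on os.sched_setaffinity), and a Python loop exercises only a narrow slice of the CPU, so a clean run proves very little. It shows the shape of the test, not a real screening tool.

```python
import hashlib
import os
from collections import Counter


def workload(rounds=2000):
    """Deterministic integer and hashing work.

    Every correctly working core must return the same hex digest.
    """
    state = 0x9E3779B97F4A7C15
    digest = hashlib.sha256()
    for i in range(rounds):
        # 64-bit linear congruential step, then feed the state to SHA-256.
        state = (state * 6364136223846793005 + 1442695040888963407 + i) % (1 << 64)
        digest.update(state.to_bytes(8, "little"))
    return digest.hexdigest()


def check_cores(rounds=2000, repeats=3):
    """Run the workload pinned to each allowed core.

    Returns {core_number: True/False}, where True means every run on that
    core matched the digest seen most often across all cores.
    """
    original = os.sched_getaffinity(0)  # cores this process may use
    per_core = {}
    try:
        for core in sorted(original):
            os.sched_setaffinity(0, {core})  # pin to a single core
            per_core[core] = [workload(rounds) for _ in range(repeats)]
    finally:
        os.sched_setaffinity(0, original)  # always restore the affinity mask

    # Assumption: faulty cores are the minority, so the most common
    # digest is treated as the correct answer.
    votes = Counter(d for digests in per_core.values() for d in digests)
    reference = votes.most_common(1)[0][0]
    return {core: all(d == reference for d in digests)
            for core, digests in per_core.items()}


if __name__ == "__main__":
    results = check_cores()
    suspects = [core for core, ok in results.items() if not ok]
    print("suspect cores:", suspects if suspects else "none found")
```

Because these faults are sporadic and often depend on temperature, voltage and the exact instructions used, a check like this would have to run for a very long time under load to catch anything, which is exactly the time cost mentioned above. On the disabling side, Linux can take a core offline by writing 0 to /sys/devices/system/cpu/cpuN/online as root (cpu0 often cannot be offlined), though that only hides the core and doesn't fix the silicon.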