Finding DIMM with ECC Error

by America's Sweetheart, from LinuxQuestions.org (#5FBBB)

Hi,

One of the DIMMs in my system had an ECC error:

Code:
[ 5015.808246] mce: [Hardware Error]: Machine check events logged
[ 5015.808250] [Hardware Error]: Corrected error, no action required.
[ 5015.808254] [Hardware Error]: CPU:2 (17:31:0) MC18_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|Scrub]: 0x9c2041000000011b
[ 5015.808260] [Hardware Error]: Error Addr: 0x000000074f879740
[ 5015.808261] [Hardware Error]: IPID: 0x0000009600550f00, Syndrome: 0xe4da80000a800603
[ 5015.808263] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
[ 5015.808279] EDAC MC0: 1 CE on mc#0csrow#3channel#5 (csrow:3 channel:5 page:0x1d7e1e5 offset:0xd40 grain:64 syndrome:0x8000)
[ 5015.808280] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

So, the error was corrected. However, out of curiosity, I tried to find the DIMM this refers to, in case I have to replace a bad DIMM in the future, but I was unable to do so.
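
The location EDAC reports (memory controller, csrow, and channel) can be pulled out of the kernel log mechanically. The following is a minimal sketch, assuming the EDAC message keeps the layout of the one line quoted above; the function name `edac_ce_location` is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the
# mc/csrow/channel triple from each EDAC corrected-error (CE) report.
# Lines that do not match the assumed layout are silently skipped.
edac_ce_location() {
    sed -n 's/.*EDAC MC\([0-9][0-9]*\): .* CE on .*(csrow:\([0-9][0-9]*\) channel:\([0-9][0-9]*\) .*/mc=\1 csrow=\2 channel=\3/p'
}

# Example using the line from the log above. On a live system one
# might pipe the output of `dmesg` into the function instead.
line='[ 5015.808279] EDAC MC0: 1 CE on mc#0csrow#3channel#5 (csrow:3 channel:5 page:0x1d7e1e5 offset:0xd40 grain:64 syndrome:0x8000)'
printf '%s\n' "$line" | edac_ce_location
```

This only extracts what EDAC already printed. It does not by itself say which physical slot csrow 3, channel 5 is.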

My motherboard is an ASUS ROG Zenith II Extreme Alpha. It has eight DIMM slots (A1, A2, B1, B2, C1, C2, D1, and D2). The manual says the channels are A, B, C, and D. I have DIMMs installed in slots A1, B1, C1, and D1. Here is a link to the manual. Page 1-5 goes over memory.

I looked through /sys/ and found two csrow directories. They're csrow2 and csrow3. Here is a full listing of ls /sys/devices/system/edac/mc/mc0/:

Code:
# ls /sys/devices/system/edac/mc/mc0/
total 0
58586 drwxr-xr-x 13 root root 0 Mar 14 23:07 .
18965 drwxr-xr-x 4 root root 0 Mar 14 23:07 ..
58595 -r--r--r-- 1 root root 4.0K Mar 15 00:40 ce_count
58593 -r--r--r-- 1 root root 4.0K Mar 15 00:40 ce_noinfo_count
58773 drwxr-xr-x 3 root root 0 Mar 15 00:40 csrow2
58799 drwxr-xr-x 3 root root 0 Mar 15 00:34 csrow3
58600 -rw-r--r-- 1 root root 4.0K Mar 15 00:40 inject_ecc_vector
58602 --w------- 1 root root 4.0K Mar 15 00:40 inject_read
58598 -rw-r--r-- 1 root root 4.0K Mar 15 00:40 inject_section
58599 -rw-r--r-- 1 root root 4.0K Mar 15 00:40 inject_word
58601 --w------- 1 root root 4.0K Mar 15 00:40 inject_write
58596 -r--r--r-- 1 root root 4.0K Mar 15 00:40 max_location
58589 -r--r--r-- 1 root root 4.0K Mar 15 00:40 mc_name
58603 drwxr-xr-x 2 root root 0 Mar 15 00:40 power
58613 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank18
58633 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank19
58653 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank20
58673 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank21
58693 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank26
58713 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank27
58733 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank28
58753 drwxr-xr-x 3 root root 0 Mar 15 00:40 rank29
58588 --w------- 1 root root 4.0K Mar 15 00:40 reset_counters
58597 -rw-r--r-- 1 root root 4.0K Mar 15 00:40 sdram_scrub_rate
58591 -r--r--r-- 1 root root 4.0K Mar 15 00:40 seconds_since_reset
58590 -r--r--r-- 1 root root 4.0K Mar 15 00:40 size_mb
58594 -r--r--r-- 1 root root 4.0K Mar 15 00:40 ue_count
58592 -r--r--r-- 1 root root 4.0K Mar 15 00:40 ue_noinfo_count
58587 -rw-r--r-- 1 root root 4.0K Mar 15 00:40 uevent

# ls /sys/devices/system/edac/mc/mc0/csrow2
total 0
58773 drwxr-xr-x 3 root root 0 Mar 15 00:46 .
58586 drwxr-xr-x 13 root root 0 Mar 14 23:07 ..
58780 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ce_count
58785 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch2_ce_count
58781 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch2_dimm_label
58786 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch3_ce_count
58782 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch3_dimm_label
58787 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch4_ce_count
58783 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch4_dimm_label
58788 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch5_ce_count
58784 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch5_dimm_label
58775 -r--r--r-- 1 root root 4.0K Mar 15 00:46 dev_type
58777 -r--r--r-- 1 root root 4.0K Mar 15 00:46 edac_mode
58776 -r--r--r-- 1 root root 4.0K Mar 15 00:46 mem_type
58789 drwxr-xr-x 2 root root 0 Mar 15 00:46 power
58778 -r--r--r-- 1 root root 4.0K Mar 15 00:46 size_mb
58779 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ue_count
58774 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 uevent
# ls /sys/devices/system/edac/mc/mc0/csrow3
total 0
58799 drwxr-xr-x 3 root root 0 Mar 15 00:46 .
58586 drwxr-xr-x 13 root root 0 Mar 14 23:07 ..
58806 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ce_count
58811 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch2_ce_count
58807 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch2_dimm_label
58812 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch3_ce_count
58808 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch3_dimm_label
58813 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch4_ce_count
58809 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch4_dimm_label
58814 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ch5_ce_count
58810 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 ch5_dimm_label
58801 -r--r--r-- 1 root root 4.0K Mar 15 00:46 dev_type
58803 -r--r--r-- 1 root root 4.0K Mar 15 00:46 edac_mode
58802 -r--r--r-- 1 root root 4.0K Mar 15 00:46 mem_type
58815 drwxr-xr-x 2 root root 0 Mar 15 00:46 power
58804 -r--r--r-- 1 root root 4.0K Mar 15 00:46 size_mb
58805 -r--r--r-- 1 root root 4.0K Mar 15 00:46 ue_count
58800 -rw-r--r-- 1 root root 4.0K Mar 15 00:46 uevent

Is "ch2" really Channel A? I would have assumed that csrow2 corresponds to the first DIMM slot of each channel, but the second slot of each channel is empty on my motherboard, so I'm not sure what to make of the csrow numbers. I tried to install mcelog because I hear it makes locating things easier, but it reported: "mcelog: ERROR: AMD Processor family 23: mcelog does not support this processor. Please use the edac_mce_amd module instead. CPU is unsupported."
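
One way to see which counter the error landed in is to print every chN_ce_count file next to its chN_dimm_label. The sketch below relies only on the file names visible in the listings above. The function name `edac_channel_counts` and its optional directory argument are assumptions added for illustration. Going by the log line (csrow:3 channel:5), the non-zero counter would presumably be csrow3/ch5_ce_count.

```shell
#!/bin/sh
# Print each per-channel corrected-error counter and its DIMM label
# under one EDAC memory controller directory.
# The directory argument is a convenience added for this sketch; it
# defaults to the mc0 path shown in the listing above.
edac_channel_counts() {
    mc_dir=${1:-/sys/devices/system/edac/mc/mc0}
    for count_file in "$mc_dir"/csrow*/ch*_ce_count; do
        [ -r "$count_file" ] || continue    # glob matched nothing
        csrow=$(basename "$(dirname "$count_file")")
        channel=$(basename "$count_file" _ce_count)
        label_file="${count_file%_ce_count}_dimm_label"
        label=$(cat "$label_file" 2>/dev/null)
        printf '%s %s ce_count=%s label=%s\n' \
            "$csrow" "$channel" "$(cat "$count_file")" "${label:-<unset>}"
    done
}

edac_channel_counts "$@"
```

The chN_dimm_label files are listed as writable, so once the channel-to-slot mapping is known (for example by moving one DIMM and watching which entry changes), the slot names could be written into them to make later reports easier to read.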

Thanks,

Rich