Does this mean the boot SSD is about to fail?

by road hazard from LinuxQuestions.org (#6J08J)

I was helping a friend with a weird Linux problem; he's using Debian 12.

Firefox would load but crash immediately when you tried to open a web page. I installed Chrome and Brave, and both refused to start, giving only a generic 'bus error'.

I checked in dmesg and saw TONS of entries like this:

Code:
[ 21.460501] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.460512] ata1.00: irq_stat 0x40000001
[ 21.460515] ata1.00: failed command: READ DMA
[ 21.460516] ata1.00: cmd c8/00:20:90:18:c4/00:00:00:00:00/e7 tag 1 dma 16384 in
res 51/40:20:90:18:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.460522] ata1.00: status: { DRDY ERR }
[ 21.460523] ata1.00: error: { UNC }
[ 21.497642] ata1.00: configured for UDMA/133
[ 21.497661] sd 0:0:0:0: [sda] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.497665] sd 0:0:0:0: [sda] tag#1 Sense Key : Medium Error [current]
[ 21.497666] sd 0:0:0:0: [sda] tag#1 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.497669] sd 0:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 07 c4 18 90 00 00 20 00
[ 21.497674] I/O error, dev sda, sector 130291856 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ 21.497693] ata1: EH complete
[ 21.564458] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.564464] ata1.00: irq_stat 0x40000001
[ 21.564466] ata1.00: failed command: READ DMA
[ 21.564466] ata1.00: cmd c8/00:20:90:18:c4/00:00:00:00:00/e7 tag 17 dma 16384 in
res 51/40:20:90:18:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.564471] ata1.00: status: { DRDY ERR }
[ 21.564472] ata1.00: error: { UNC }
[ 21.591996] ata1.00: configured for UDMA/133
[ 21.592006] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.592009] sd 0:0:0:0: [sda] tag#17 Sense Key : Medium Error [current]
[ 21.592010] sd 0:0:0:0: [sda] tag#17 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.592011] sd 0:0:0:0: [sda] tag#17 CDB: Read(10) 28 00 07 c4 18 90 00 00 20 00
[ 21.592012] I/O error, dev sda, sector 130291856 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 21.592021] ata1: EH complete
[ 21.676370] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.676375] ata1.00: irq_stat 0x40000001
[ 21.676377] ata1.00: failed command: READ DMA
[ 21.676377] ata1.00: cmd c8/00:e0:b0:0d:c4/00:00:00:00:00/e7 tag 4 dma 114688 in
res 51/40:e0:b0:0d:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.676381] ata1.00: status: { DRDY ERR }
[ 21.676382] ata1.00: error: { UNC }
[ 21.703646] ata1.00: configured for UDMA/133
[ 21.703661] sd 0:0:0:0: [sda] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.703665] sd 0:0:0:0: [sda] tag#4 Sense Key : Medium Error [current]
[ 21.703666] sd 0:0:0:0: [sda] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.703668] sd 0:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 07 c4 0d b0 00 00 e0 00
[ 21.703669] I/O error, dev sda, sector 130289072 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[ 21.703680] ata1: EH complete
[ 21.788436] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.788446] ata1.00: irq_stat 0x40000001
[ 21.788450] ata1.00: failed command: READ DMA
[ 21.788453] ata1.00: cmd c8/00:08:08:0e:c4/00:00:00:00:00/e7 tag 13 dma 4096 in
res 51/40:08:08:0e:c4/00:00:00:00:00/e7 Emask 0x9 (media error)
[ 21.788463] ata1.00: status: { DRDY ERR }
[ 21.788465] ata1.00: error: { UNC }
[ 21.815857] ata1.00: configured for UDMA/133
[ 21.815877] sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 21.815883] sd 0:0:0:0: [sda] tag#13 Sense Key : Medium Error [current]
[ 21.815886] sd 0:0:0:0: [sda] tag#13 Add. Sense: Unrecovered read error - auto reallocate failed
[ 21.815889] sd 0:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 07 c4 0e 08 00 00 08 00
[ 21.815891] I/O error, dev sda, sector 130289160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 21.815911] ata1: EH complete
[ 21.876511] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 21.876520] ata1.00: irq_stat 0x40000001
[ 21.876526] ata1.00: failed command: READ DMA
[ 21.876528] ata1.00: cmd c8/00:08:08:0e:c4/00:00:00:00:00/e7 tag 27 dma 4096 in
res 51/40:08:08:0e:c4/00:00:00:00:00/e7 Emask 0x9 (media error)

I then installed smartctl:

Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-17-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: WD Blue SA510 2.5 250GB
Serial Number: 22464U800081
LU WWN Device Id: 5 001b44 8bd7281be
Firmware Version: 52020100
User Capacity: 250,059,350,016 bytes [250 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic
Device is: Not in smartctl database 7.3/5319
ATA Version is: ACS-4, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jan 19 19:52:12 2024 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 48) A fatal error or unknown test error
occurred while the device was executing
its self-test routine and the device
was unable to complete the self-test
routine.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 10) minutes.
Conveyance self-test routine
recommended polling time: ( 1) minutes.

SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       316
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
165 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       9994578627391
166 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       107
167 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       15
168 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       153
169 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       50
170 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   005    Old_age   Always       -       130
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   097    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       418
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   100   100   014    Old_age   Always       -       29 (Min/Max 18/45)
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       3655155712851
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       95
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       31008
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       17794
241 Total_LBAs_Written      0x0030   253   253   000    Old_age   Offline      -       16729
242 Total_LBAs_Read         0x0030   253   253   000    Old_age   Offline      -       8823
244 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 0
Warning: ATA error count 418 inconsistent with error log pointer 5

ATA Error Count: 418 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 418 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 e8 26 db ee Error: UNC at LBA = 0x0edb26e8 = 249243368

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e8 26 db ee 08 23:00:54.750 READ DMA
47 00 01 00 00 00 a0 08 23:00:54.750 READ LOG DMA EXT
47 00 01 30 08 00 a0 08 23:00:54.750 READ LOG DMA EXT
47 00 01 30 00 00 a0 08 23:00:54.750 READ LOG DMA EXT
47 00 01 00 00 00 a0 08 23:00:54.750 READ LOG DMA EXT

Error 417 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 e8 26 db ee Error: UNC at LBA = 0x0edb26e8 = 249243368

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 e8 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 e0 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 d8 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 d0 26 db ee 08 23:00:54.630 READ DMA
c8 00 08 c8 26 db ee 08 23:00:54.630 READ DMA

Error 416 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 f8 26 db ee Error: UNC at LBA = 0x0edb26f8 = 249243384

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 18 f8 26 db ee 08 23:00:54.525 READ DMA
47 00 01 00 00 00 a0 08 23:00:54.525 READ LOG DMA EXT
47 00 01 30 08 00 a0 08 23:00:54.525 READ LOG DMA EXT
47 00 01 30 00 00 a0 08 23:00:54.525 READ LOG DMA EXT
47 00 01 00 00 00 a0 08 23:00:54.525 READ LOG DMA EXT

Error 415 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 d8 e8 26 db ee Error: UNC 216 sectors at LBA = 0x0edb26e8 = 249243368

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 e8 10 26 db ee 08 23:00:54.415 READ DMA
c8 00 00 78 1b db ee 08 23:00:54.415 READ DMA
c8 00 00 90 19 db ee 08 23:00:54.415 READ DMA
c8 00 00 20 14 db ee 08 23:00:54.415 READ DMA
c8 00 a8 78 13 db ee 08 23:00:54.415 READ DMA

Error 414 occurred at disk power-on lifetime: 316 hours (13 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 a8 2a db ee Error: UNC at LBA = 0x0edb2aa8 = 249244328

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 a8 2a db ee 08 23:00:42.675 READ DMA
ca 00 08 d8 0a e3 e0 08 23:00:42.675 WRITE DMA
47 00 01 00 00 00 a0 08 23:00:42.675 READ LOG DMA EXT
47 00 01 30 08 00 a0 08 23:00:42.675 READ LOG DMA EXT
47 00 01 30 00 00 a0 08 23:00:42.675 READ LOG DMA EXT

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 30% 316 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

He's only had this WD SSD for a couple of months. Do all these signs point to a drive that's about to die?
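
For anyone cross-checking the dmesg entries, here is an illustrative Python sketch (not output from the affected machine). It assumes the standard SCSI READ(10) layout: opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, and a 2-byte transfer length in bytes 7-8. The helper names `decode_read10` and `failing_reads` are made up for this example. It decodes the CDB lines from a few of the entries above and compares them with the sector numbers in the kernel's I/O error lines.

```python
import re

# A few lines copied verbatim from the dmesg excerpt in this post.
DMESG = """\
[ 21.497669] sd 0:0:0:0: [sda] tag#1 CDB: Read(10) 28 00 07 c4 18 90 00 00 20 00
[ 21.497674] I/O error, dev sda, sector 130291856 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ 21.703668] sd 0:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 07 c4 0d b0 00 00 e0 00
[ 21.703669] I/O error, dev sda, sector 130289072 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[ 21.815889] sd 0:0:0:0: [sda] tag#13 CDB: Read(10) 28 00 07 c4 0e 08 00 00 08 00
[ 21.815891] I/O error, dev sda, sector 130289160 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
"""


def decode_read10(cdb_hex):
    """Return (lba, sector_count) for a READ(10) CDB given as spaced hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in sectors
    return lba, count


def failing_reads(text):
    """Collect decoded CDBs and kernel-reported failing sectors from dmesg text."""
    cdb_pattern = r"CDB: Read\(10\) ((?:[0-9a-f]{2} ?){10})"
    cdbs = [decode_read10(m) for m in re.findall(cdb_pattern, text)]
    sectors = [int(s) for s in re.findall(r"I/O error, dev \w+, sector (\d+)", text)]
    return cdbs, sectors


cdbs, sectors = failing_reads(DMESG)
for (lba, count), sector in zip(cdbs, sectors):
    # 512-byte logical sectors, per the smartctl information section above.
    print(f"read of {count} sectors ({count * 512} bytes) at LBA {lba}, "
          f"kernel reported sector {sector}")
```

In these three entries the LBA decoded from each CDB equals the sector in the matching I/O error line, and the byte counts (16384, 114688, 4096) agree with the `dma ... in` sizes on the `cmd` lines.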