help determining RAID fail event

by dimm0k from LinuxQuestions.org (#51R6A)
I recently received an email stating that a "FAIL event has been detected on md device /dev/md0", after which the array appears to have gone back to resyncing everything from the good drive. However, smartctl does not indicate any issues with the supposedly bad drive, so I'm wondering whether this needs to be investigated further or whether it was a false alarm. If it was a false alarm, how can I get rid of the flag in mdadm that marks the drive as faulty?

Here's some relevant information from when this event was triggered:
Code:
A Fail event has been detected on md device /dev/md0.

The device /dev/sdc1 may be involved.

Contents of /proc/mdstat:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid1 sdb1[0] sdc1[1](F)
2930162552 blocks super 1.2 [2/1] [U_]
[=====>...............] resync = 28.3% (830933760/2930162552) finish=7399.8min speed=4728K/sec

unused devices: <none>

Contents of mdadm --detail
/dev/md0:
Version : 1.2
Creation Time : Tue Aug 2 10:36:53 2011
Raid Level : raid1
Array Size : 2930162552 (2794.42 GiB 3000.49 GB)
Used Dev Size : 2930162552 (2794.42 GiB 3000.49 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Fri Apr 3 00:17:37 2020
State : active, degraded, resyncing
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0

Resync Status : 99% complete

Name : defiant:0 (local to host defiant)
UUID : a043a371:530d4c99:daed879a:904c0e11
Events : 1382

Number   Major   Minor   RaidDevice   State
   0       8       17        0        active sync   /dev/sdb1
   1       8       33        1        faulty        /dev/sdc1

Contents of dmesg:
[ 6034.544287] ata2.01: configured for UDMA/133
[ 6036.687034] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 6038.086733] ata1.00: configured for UDMA/133
[ 6046.837423] PM: resume of devices complete after 13097.208 msecs
[ 6046.861968] Restarting tasks ... done.
[ 6046.905404] md: checkpointing resync of md0.
[ 6046.970037] RAID1 conf printout:
[ 6046.970045] --- wd:1 rd:2
[ 6046.970053] disk 0, wo:0, o:1, dev:sdb1
[ 6046.970062] disk 1, wo:1, o:0, dev:sdc1

Also, here's what smartctl has to say for the drive in question:
Code:
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.208] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 5K3000
Device Model: Hitachi HDS5C3030ALA630
Serial Number: MJ1321YNG17PEA
LU WWN Device Id: 5 000cca 228c0913e
Firmware Version: MEAOA580
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5700 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Apr 5 12:30:53 2020 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (38166) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 636) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       109
  3 Spin_Up_Time            0x0007   220   220   024    Pre-fail  Always       -       273 (Average 362)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       86
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   132   132   020    Pre-fail  Offline      -       32
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       58611
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       48
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1817
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1817
194 Temperature_Celsius     0x0002   193   193   000    Old_age   Always       -       31 (Min/Max 19/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
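
The `(F)` marker after `sdc1[1]` and the `[2/1] [U_]` field in the /proc/mdstat contents quoted above are what identify the faulty member and the degraded state. As a minimal sketch, assuming the simple layout shown in that output, the Python below pulls those fields out. The function name `parse_mdstat` and the regular expressions are illustrative choices, not part of mdadm, and a real /proc/mdstat can contain variants (bitmap lines, `(auto-read-only)` states, inactive arrays) that this sketch does not handle.

```python
import re

# Sample copied from the /proc/mdstat contents quoted in the post.
SAMPLE = """\
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid1 sdb1[0] sdc1[1](F)
2930162552 blocks super 1.2 [2/1] [U_]
[=====>...............] resync = 28.3% (830933760/2930162552) finish=7399.8min speed=4728K/sec

unused devices: <none>
"""

# Assumed line shapes, based only on the sample above.
ARRAY_RE = re.compile(r"^(md\w+) : (\S+) (\S+) (.+)$")        # md0 : active raid1 <members>
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\]((?:\(\w\))*)")         # sdc1[1](F)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")         # [2/1] [U_]
PROGRESS_RE = re.compile(r"(resync|recovery|check|repair)\s*=\s*([\d.]+)%")


def parse_mdstat(text):
    """Return a dict of array name -> summary parsed from mdstat-style text."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ARRAY_RE.match(line)
        if header:
            name, state, level, members = header.groups()
            found = list(MEMBER_RE.finditer(members))
            current = {
                "state": state,
                "level": level,
                "members": [m.group(1) for m in found],
                # A member flagged (F) has been marked faulty by md.
                "failed": [m.group(1) for m in found if "(F)" in m.group(3)],
            }
            arrays[name] = current
            continue
        if current is None:
            continue
        status = STATUS_RE.search(line)
        if status:
            current["expected"] = int(status.group(1))  # devices the array should have
            current["active"] = int(status.group(2))    # devices currently in sync
            current["slots"] = status.group(3)          # U = up, _ = missing/failed
        progress = PROGRESS_RE.search(line)
        if progress:
            current["operation"] = progress.group(1)
            current["percent"] = float(progress.group(2))
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        degraded = info.get("active", 0) < info.get("expected", 0)
        print(name, "degraded" if degraded else "ok", "failed members:", info["failed"])
```

On a live system the same function could be fed the contents of `/proc/mdstat` instead of `SAMPLE`. It only reports what md has already recorded and does not say why the member was marked faulty.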