Unable to import pool due to mdadm #11349
-
The Linux system was working and a VM was running on the zpool before reboot. After the reboot the zpool did not import (state: UNAVAIL). It's not the end of the world to lose this data (I even have some older backups), but it's a pain nonetheless. If the zpool was running fine before the reboot, would it have been impossible for ZFS to take whatever in-memory state was keeping the pool running, copy it to some non-ZFS partition, and reuse that information to re-import the pool after the reboot? Even a warning that the zpool was in trouble and might not survive a reboot would have been helpful. I (wrongly?) assumed that ZFS would actively warn us of any serious problems.
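For anyone hitting the same symptom, a minimal first round of diagnostics, before assuming the data is gone, might look like the sketch below; it only reads state and modifies nothing:

```sh
# List pools that are visible but not yet imported, with per-device status
zpool import

# Show which disks and partitions the kernel currently sees,
# and what signatures (zfs_member, linux_raid_member, ...) they carry
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
```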
Replies: 6 comments
-
To me it looks like sda3 and sdb3 are not available. Can you check whether sda and sdb are present and whether each has a third partition? Also, it looks like you are using a striped pool. While that is doable, I would recommend frequent backups, since losing one disk means losing all the data.
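One way to check that, assuming the devices really are /dev/sda and /dev/sdb as reported by the pool:

```sh
# Confirm both disks and their third partitions exist
lsblk /dev/sda /dev/sdb

# Print the partition tables, including the type codes
fdisk -l /dev/sda /dev/sdb

# Show what on-disk signatures the partitions carry
blkid /dev/sda3 /dev/sdb3
```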
-
Thanks for the reply. I actually thought UNAVAIL meant the data was corrupted and the two halves couldn't be put back together. Also, smartctl on sda was saying FAILED, i.e. replace immediately, going to die within 24 hours. But guess what? smartctl still says sda is dying, yet I was able to import the pool with no problem after all. So what happened? Apparently, after the reboot, mdadm snapped up sda3 and sdb3 and assembled them into a striped array! Luckily, after simply stopping that unused mdadm stripe array, which I hadn't noticed, zpool import worked.
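For reference, a sketch of that sequence (the md device name and the pool name `tank` are placeholders; use whatever `/proc/mdstat` and `zpool import` actually report):

```sh
# See what mdadm auto-assembled after the reboot
cat /proc/mdstat
mdadm --detail --scan

# Stop the unwanted array so the partitions are released
mdadm --stop /dev/md0        # md0 is a placeholder

# With the partitions free again, the pool imports normally
zpool import tank            # "tank" is a placeholder pool name
```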
-
Presumably sda3 and sdb3 were previously part of an md striped array, which is why mdadm claimed them. Both md and ZFS want exclusive access to the block devices, which is why they ended up being reported as UNAVAIL.
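A non-destructive way to check that theory is to look for a leftover md superblock on the partitions; both commands below only read metadata:

```sh
# Report any md RAID superblock present on the partitions
mdadm --examine /dev/sda3 /dev/sdb3

# List every filesystem/RAID signature found (zfs_member, linux_raid_member, ...)
wipefs /dev/sda3 /dev/sdb3
```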
-
I don't believe they were previously part of an mdadm array (though I can't 100% rule it out). When the partitions were made, perhaps the partition type label had something to do with it. What partition type should be used for ZFS (or which numerical ID, as per fdisk)? (Update: it seems the partitions were set to type "fd", Linux RAID autodetect, not "83".)
Also, could ZFS somehow be impacting network speed? After the reboot, before bringing the zpool back with zpool import, bandwidth was back to the usual 88+ MB/s; now, after zpool import, it's back to a strange 1.1 MB/s limit. I'm not sure whether this is related to the zpool having a failing disk in the array, but could ZFS affect network bandwidth in some strange way, or is it related to smartctl saying sda is failing? FYI, for testing I was using wget -O /dev/null http... It seemed strange that bandwidth was good before the zpool import, but perhaps it's a coincidence and some other issue.
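For what it's worth, MBR type "fd" is Linux RAID autodetect while "83" is plain Linux, and ZFS itself does not require a particular MBR type code (when given whole disks it labels its own GPT partitions). A sketch of the cleanup, assuming the goal is just to stop md from grabbing these partitions again (device names and config path are assumptions):

```sh
# Show the current type codes ("fd" = Linux RAID autodetect, "83" = Linux)
fdisk -l /dev/sda /dev/sdb

# Change partition 3's type code non-interactively (same effect as fdisk's 't');
# this only rewrites the type byte, not the partition contents
sfdisk --part-type /dev/sda 3 83
sfdisk --part-type /dev/sdb 3 83

# Optionally tell mdadm not to auto-assemble arrays that are not listed in its
# config (Debian/Ubuntu path shown; other distros use /etc/mdadm.conf, and the
# initramfs copy may need regenerating for this to apply at boot)
echo 'AUTO -all' >> /etc/mdadm/mdadm.conf
```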
-
It is most likely the disk. Remember, you have a STRIPED pool, so every read touches both disks, and a failing disk may need to retry defective sectors over and over to return data. My recommendation: back up all the important data NOW, do not waste time, and then replace the disk. Chances are high that you will encounter unreadable sectors, which will then corrupt some data.
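A rough sketch of what that looks like once a replacement disk is physically installed (the pool name `tank` and the new device `/dev/sdc` are placeholders). Note that on a striped pool, `zpool replace` has to copy from the failing disk itself, so any sectors it cannot read will show up as permanent errors:

```sh
# Confirm the SMART verdict and the raw error counters
smartctl -H /dev/sda
smartctl -A /dev/sda

# Swap the failing device out of the pool and watch the copy/resilver
zpool replace tank /dev/sda3 /dev/sdc3   # placeholders: pool "tank", new disk sdc
zpool status -v tank
```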
-
I was busy, so I forgot to update: yeah, you're right that write or even read operations on the failing disk could have caused an issue that slowed down networking (via a general slowdown). I did confirm that zpool import itself does not kill the bandwidth, so it must have been coincidental and secondary to load or other issues related to the dying disk. I'm finishing up backups before replacing the drive (it's in a datacenter, so I can't just add a spare quickly).
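If it happens again, one way to confirm the dying disk (rather than ZFS or the network) is the bottleneck, assuming sysstat is installed for iostat:

```sh
# Per-device utilisation and latency while the slowdown is happening;
# a drive stuck retrying bad sectors tends to sit near 100% util with huge await
iostat -x sda sdb 1

# Kernel-level I/O errors from the failing disk usually show up here too
dmesg | grep -iE 'sda|i/o error' | tail -n 20
```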
I'm trying to copy over the files now (getting some I/O errors) before swapping out the failing sda drive.
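In case it helps someone else, a sketch of that copy step: rsync reports files it cannot read and keeps going rather than aborting, and `zpool status -v` lists the files ZFS already knows are damaged (the paths and the pool name `tank` are placeholders):

```sh
# Copy whatever is still readable; unreadable files are reported at the end
# and rsync exits non-zero (code 23, partial transfer) instead of stopping
rsync -aHv --progress /tank/vm/ /backup/vm/     # placeholder paths

# List the files ZFS has recorded permanent errors for
zpool status -v tank                            # placeholder pool name
```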