
The problem of using kernel RAID autodetect for / on /dev/md0 together with superblock v1.2 on other /dev/md devices, or how to bring down (and bring back up) a server after updating it


Thank you for reading the title. It was a test.


Today, after a routine update of my favorite Gentoo server and a preventive reboot, /dev/md1 suddenly dropped out, with these words from the wise kernel: sdc1 does not have a valid v0.90 superblock, not importing!

Shock! Panic! Well, not a kernel panic, at least...

And what's the matter?


To begin with, I'll describe the server configuration to make the problem and its solution easier to follow. The kernel is 3.10.7 with RAID autodetect enabled, and there are two RAID1 (mirror) arrays.
The root filesystem is mounted on /dev/md0, and the database (Percona) lives on /dev/md1:
 db13 ~ # cat /etc/fstab | grep md
 /dev/md0 / ext3 noatime 0 1
 /dev/md1 /mnt/db reiser4 noatime 0 0

And a fragment of /boot/grub/grub.conf:
 title Gentoo Linux 3.10.7 md0
 root (hd0,0)
 kernel /boot/kernel-3.10.7 root=/dev/md0

For the kernel to assemble md devices successfully at boot time, two conditions must be met:
  1. The partitions the RAID is built on have type 0xFD
  2. The /dev/md device created with mdadm uses superblock format version 0.90
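The two conditions can be written down as a small decision function. This is only a sketch of the rule as described in this post, not of the kernel code: the function name `kernel_autodetect_status` and its result words are made up for illustration. The partition type would come from the Id column of `fdisk -l`, and the superblock version from the "Version" line of `mdadm --examine`.

```shell
#!/bin/sh
# Sketch (hypothetical helper): what in-kernel autodetect does with a member.
#   $1 = MBR partition type in hex, e.g. fd or 83 (Id column of `fdisk -l`)
#   $2 = md superblock version, e.g. 0.90 or 1.2 (`mdadm --examine`)
kernel_autodetect_status() {
    ptype=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$ptype" in
        fd) ;;                                 # condition 1 holds, keep checking
        *)  echo "skipped"; return 0 ;;        # kernel never scans this partition
    esac
    case "$2" in
        0.90*) echo "assembled" ;;             # condition 2 holds as well
        *)     echo "rejected" ;;              # the "not importing!" case
    esac
}

kernel_autodetect_status fd 0.90   # members of md0
kernel_autodetect_status fd 1.2    # members of md1 as first configured
kernel_autodetect_status 83 1.2    # a non-0xFD partition with a v1.2 superblock
```

A "skipped" member is not lost: it is simply left for userspace (udev and mdadm) to assemble later in the boot.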

Item 1 was fine in my configuration, but the superblock format turned out to be 1.2. I suspect that I created /dev/md1 after a newer version of mdadm arrived, which uses this format by default. As a result, the kernel complains loudly:
dmesg | grep md
 [0.000000] Command line: root=/dev/md0 raid=/dev/md0
 [0.000000] Kernel command line: root=/dev/md0 raid=/dev/md0
 [1.063603] md: raid1 personality registered for level 1
 [1.266420] md: Waiting for all devices to be available before autodetect
 [1.266494] md: If you don't use raid, use raid=noautodetect
 [1.266781] md: Autodetecting RAID arrays.
 [1.293670] md: invalid raid superblock magic on sdc1
 [1.294210] md: sdc1 does not have a valid v0.90 superblock, not importing!
 [1.312482] md: invalid raid superblock magic on sdd1
 [1.312556] md: sdd1 does not have a valid v0.90 superblock, not importing!
 [1.312579] md: Scanned 4 and added 2 devices.
 [1.312595] md: autorun ...
 [1.312610] md: considering sdb3 ...
 [1.312626] md: adding sdb3 ...
 [1.312641] md: adding sda3 ...
 [1.312657] md: created md0
 [1.312665] md: bind <sda3>
 [1.312754] md: bind <sdb3>
 [1.312770] md: running: <sdb3> <sda3>
 [1.313064] md/raid1:md0: active with 2 out of 2 mirrors
 [1.313166] md0: detected capacity change from 0 to 7984840704
 [1.313262] md: ... autorun DONE.
 [1.320413] md0: unknown partition table
 [1.338528] EXT3-fs (md0): mounted filesystem with ordered data mode
 [2.581420] systemd-udevd[861]: starting version 208
 [3.122748] md: bind <sdc1>
 [4.896331] EXT3-fs (md0): using internal journal
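In a long boot log the rejected members are easy to miss. As a rough sketch, the filter below pulls the device names out of the "not importing!" lines; `rejected_members` is a hypothetical helper name, and the sample input is copied from the log above.

```shell
#!/bin/sh
# Sketch: print the partitions that kernel autodetect refused to import.
# Reads dmesg-style text on stdin; works the same on `dmesg | rejected_members`.
rejected_members() {
    sed -n 's/.*md: \([^ ]*\) .*valid v0\.90 superblock, not importing!.*/\1/p'
}

# Demo on a few lines taken from the log above.
rejected_members <<'EOF'
[1.293670] md: invalid raid superblock magic on sdc1
[1.294210] md: sdc1 does not have a valid v0.90 superblock, not importing!
[1.312556] md: sdd1 does not have a valid v0.90 superblock, not importing!
[1.312579] md: Scanned 4 and added 2 devices.
EOF
```

For this log it lists sdc1 and sdd1, the two halves of /dev/md1.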


When Google doesn't help


The choice is quite limited: either disable array autodetection in the kernel (recompiling it and editing grub.conf), or change the superblock format (a full data backup, destroying the mirror, and then restoring it). Neither is really an option: both are inherently destructive, can lead to data loss, and can take a lot of time. (As it turned out during the search, kernel autodetect is a deprecated feature.)

By the way, after the server boots, /dev/md1 starts up perfectly well with the command
  mdadm --manage /dev/md1 --run
Of course, one could put this line somewhere in the rc scripts, but you must admit that this is hardly sporting.
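For completeness, here is roughly what that unsporting workaround could look like. It is a sketch only: the file name /etc/local.d/md1.start is an assumption (OpenRC's local service runs *.start files from that directory), `start_array` is a hypothetical helper, and the `MDADM` and `MDSTAT` variables exist only so the logic can be tried without touching a real array.

```shell
#!/bin/sh
# Hypothetical /etc/local.d/md1.start: start an array the kernel left inactive.
MDADM=${MDADM:-mdadm}

start_array() {
    dev=$1
    # /proc/mdstat lists running arrays as "md1 : active raid1 ...".
    if grep -q "^${dev##*/} : active" "${MDSTAT:-/proc/mdstat}" 2>/dev/null; then
        return 0                       # already running, nothing to do
    fi
    "$MDADM" --manage "$dev" --run     # the command from the text above
}

# In the real script the last line would be:
#   start_array /dev/md1
```

Running it with `MDADM=echo` prints the mdadm command instead of executing it, which is a cheap way to check the logic first.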

Eureka!


The solution did not come immediately, although it was lying on the surface: all that is needed is to remove the 0xFD type (replacing it with 0x83) from the partitions that make up /dev/md1. The kernel then no longer tries, and fails, to assemble this array, and stops getting in the way of udevd doing its job. Indeed, after I changed the partition type on both halves of the mirror with fdisk and rebooted the server, everything came up as if by magic:
dmesg | grep md
 [0.000000] Command line: root=/dev/md0 raid=/dev/md0
 [0.000000] Kernel command line: root=/dev/md0 raid=/dev/md0
 [1.063924] md: raid1 personality registered for level 1
 [1.248078] md: Waiting for all devices to be available before autodetect
 [1.248201] md: If you don't use raid, use raid=noautodetect
 [1.248504] md: Autodetecting RAID arrays.
 [1.265058] md: Scanned 2 and added 2 devices.
 [1.265243] md: autorun ...
 [1.265258] md: considering sda3 ...
 [1.265274] md: adding sda3 ...
 [1.265290] md: adding sdb3 ...
 [1.265305] md: created md0
 [1.265321] md: bind <sdb3>
 [1.265331] md: bind <sda3>
 [1.265428] md: running: <sda3> <sdb3>
 [1.265865] md/raid1:md0: active with 2 out of 2 mirrors
 [1.265891] md0: detected capacity change from 0 to 7984840704
 [1.266068] md: ... autorun DONE.
 [1.276627] md0: unknown partition table
 [1.294892] EXT3-fs (md0): mounted filesystem with ordered data mode
 [2.713383] systemd-udevd[860]: starting version 208
 [3.128295] md: bind <sdc1>
 [3.159107] md: bind <sdd1>
 [3.170320] md/raid1:md1: active with 2 out of 2 mirrors
 [3.170333] md1: detected capacity change from 0 to 17170300928
 [3.178113] md1: unknown partition table
 [4.911712] EXT3-fs (md0): using internal journal
 [5.027077] reiser4: md1: found disk format 4.0.0.
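I made the partition type change interactively in fdisk. A non-interactive equivalent is sketched below, with several assumptions: it uses sfdisk's `--part-type` option (util-linux 2.26 or newer) rather than fdisk, it expects MBR partition tables and sdX-style device names, and `retype_commands` is a hypothetical helper. It deliberately only prints the commands so they can be reviewed before anything touches a partition table.

```shell
#!/bin/sh
# Sketch: print (not run) the commands that switch array members to type 0x83.
# Handles names like /dev/sdc1 only; nvme-style names would need other parsing.
retype_commands() {
    for part in "$@"; do
        disk=${part%%[0-9]*}      # /dev/sdc1 -> /dev/sdc
        num=${part#"$disk"}       # /dev/sdc1 -> 1
        echo "sfdisk --part-type $disk $num 83"
    done
}

retype_commands /dev/sdc1 /dev/sdd1
```

Only the type byte in the partition table changes, not the data inside the partition, which matches the log above: md1 came back with both mirrors after the reboot.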


I would be glad if Google, having found this text, shows it to fellow admins who end up in a similar situation.

Source: https://habr.com/ru/post/203652/

