[GTALUG] war story: power failure makes CentOS system unbootable
William Park
opengeometry at yahoo.ca
Sat Jul 8 02:20:19 EDT 2017
Another reason why I don't use UEFI is that the boot entry is written
to the motherboard's non-volatile memory (what used to be "BIOS"). So, you
can't simply take the hard disk and put it into another machine. Found out
the hard way...
--
William
On Thu, Jul 06, 2017 at 01:12:11PM -0400, D. Hugh Redelmeier via talk wrote:
> tl;dr / spoiler: change fstab so that /boot/efi is not automatically
> mounted. See the recommendation at the end of this message.
>
> My gateway computer is a little PC running CentOS.
> It does not come back after a power failure.
> The reason (as best I can tell) is interesting and I think that I have a
> fix.
>
> My system is UEFI. It boots from a UEFI system partition, known as
> /boot/efi to Linux. If this gets corrupted, it won't boot. It is a
> VFAT partition.
>
> On my gateway (a Zotac Zbox RI321, a cute little box with two ethernet
> ports), the UEFI firmware apparently won't boot if the dirty bit is
> set in the system partition.
>
> CentOS normally runs with /boot/efi mounted. So when the system isn't
> shut down cleanly, the dirty bit will be on.
>
> Consequence: the system is not going to boot after a power failure.
>
> Odd observation 1: if I put a live Fedora 25 USB stick in the system
> and try to boot in this otherwise-unbootable state, CentOS boots from
> the main disk. So this really looks like a firmware bug/quirk.
>
> Odd observation 2: fsck doesn't seem to automatically fix the system
> partition. Once CentOS 7 booted, I unmounted /boot/efi and
> did an fsck on it.
>
> $ sudo fsck /dev/sda1
> fsck from util-linux 2.23.2
> fsck.fat 3.0.20 (12 Jun 2013)
> 0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
> 1) Remove dirty bit
> 2) No action
> ? 1
> Leaving filesystem unchanged.
> /dev/sda1: 16 files, 2420/51145 clusters
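
A guess at why fsck said "Leaving filesystem unchanged" even after option 1
was picked: as far as I know, fsck.fat from dosfstools 3.x only writes
changes when it is run with -a (automatic repair) or -r (interactive
repair), and a plain "fsck /dev/sda1" passes neither. That is my reading of
dosfstools, not something the message above states. Under that assumption,
a repair run would look roughly like this; repair_esp and DRY_RUN are names
invented for the sketch, and /dev/sda1 is simply the device from the
transcript.

```shell
# Sketch: unmount the ESP and let fsck.fat repair it without prompting.
# Assumptions: the ESP is mounted at /boot/efi, the device defaults to
# /dev/sda1 (taken from the transcript), and fsck.fat is the dosfstools
# one, where -a means "automatically repair the filesystem".
repair_esp() {
    dev=${1:-/dev/sda1}
    run=${DRY_RUN:+echo}    # with DRY_RUN set, print the commands instead of running them
    $run sudo umount /boot/efi
    $run sudo fsck.fat -a "$dev"
}
```

Running "DRY_RUN=1 repair_esp /dev/sda1" only prints the two commands,
which is a cheap way to double-check the device name before doing it for
real.
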
>
> Googling got me to <https://www.centos.org/forums/viewtopic.php?t=50917>
> In particular, this advice seemed quite good:
>
> ==== recommendation ====
>
> Two other changes I recommend for UEFI systems, to each OS's /etc/fstab.
>
> - For the /boot/efi mountpoint, add the mount options
> x-systemd.automount,noauto
>
> - Change fs_passno (last column) to a 1 or 2; the canonical fstab
> instructions suggest 2, but systemd treats 1 and 2 as the same.
>
> The first change means the EFI System partition will not be automatically
> mounted read-write at boot time. Mounting it by default is a bad idea
> because it puts the ESP at risk, especially if there are crashes: FAT has
> no journal, so it will always be marked dirty in such a case. No other
> UEFI OS mounts the ESP by default. Second, if anything tries to access
> /boot/efi (read or write), systemd will automatically mount it and,
> because of the fs_passno of 1 or 2, fsck it first, which fixes and
> clears the dirty bit if it is set.
>
> Right now, without these changes, it only takes enough badly timed
> crashes to render the EFI System partition corrupt.
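
For what it's worth, here is a rough sketch of the fstab change recommended
above, written as a small shell function that prints an adjusted copy of an
fstab instead of editing anything in place. The function name esp_fstab and
the sample UUID in the note below are made up for illustration, and it
assumes the ESP is mounted at /boot/efi as in the message.

```shell
# Sketch: print a copy of an fstab with the /boot/efi entry adjusted.
# Adds x-systemd.automount and noauto to the options (if missing) and
# sets fs_passno to 2. Nothing is written back to the input file.
esp_fstab() {
    awk '
        # Pass comments and blank lines through untouched.
        /^[ \t]*#/ || /^[ \t]*$/ { print; next }
        # Fields: 1 device, 2 mountpoint, 3 type, 4 options, 5 dump, 6 fs_passno
        $2 == "/boot/efi" {
            if (index("," $4 ",", ",x-systemd.automount,") == 0)
                $4 = $4 ",x-systemd.automount"
            if (index("," $4 ",", ",noauto,") == 0)
                $4 = $4 ",noauto"
            if (NF < 5) $5 = 0   # dump column is optional; fill it in so column 6 lines up
            $6 = 2               # fs_passno 2: fsck the partition before it is mounted
            # Note: assigning fields makes awk rejoin this line with single spaces.
        }
        { print }
    ' "$1"
}
```

Something like "esp_fstab /etc/fstab > /tmp/fstab.new" followed by a diff
against the original lets you review the result before copying anything
into place. Given an entry such as
"UUID=ABCD-1234 /boot/efi vfat umask=0077,shortname=winnt 0 0", the printed
line becomes
"UUID=ABCD-1234 /boot/efi vfat umask=0077,shortname=winnt,x-systemd.automount,noauto 0 2".
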
> ---
> Talk Mailing List
> talk at gtalug.org
> https://gtalug.org/mailman/listinfo/talk