[GTALUG] war story: power failure makes CentOS system unbootable

William Park opengeometry at yahoo.ca
Sat Jul 8 02:20:19 EDT 2017


Another reason why I don't use UEFI is that the boot manager's entries are
written to the motherboard's non-volatile memory (what used to be "BIOS").
So, you can't simply take the hard disk and put it into another machine.
Found out the hard way...
-- 
William

On Thu, Jul 06, 2017 at 01:12:11PM -0400, D. Hugh Redelmeier via talk wrote:
> tl;dr / spoiler: change fstab so that /boot/efi is not automatically
> mounted.  See the recommendation at the end of this message.
> 
> My gateway computer is a little PC running CentOS.
> It does not come back after a power failure.
> The reason (as best I can tell) is interesting and I think that I have a 
> fix.
> 
> My system is UEFI.  It boots from a UEFI system partition, known as
> /boot/efi to Linux.  If this gets corrupted, it won't boot.  It is a
> VFAT partition.
> 
> On my gateway (a Zotac Zbox RI321, a cute little box with two ethernet
> ports), the UEFI firmware apparently won't boot if the dirty bit is
> set in the system partition.
> 
> CentOS normally runs with /boot/efi mounted.  So when the system isn't 
> shut down cleanly, the dirty bit will be on.
> 
> Consequence: the system is not going to boot after a power failure.
> 
> Odd observation 1: if I put a live Fedora 25 USB stick in the system
> and try to boot in this otherwise-unbootable state, CentOS boots from
> the main disk.  So this really looks like a firmware bug/quirk.
> 
> Odd observation 2: fsck doesn't seem to automatically fix the system 
> partition.  Once CentOS 7 booted, I unmounted /boot/efi and
> ran fsck on it.
> 
>     $ sudo fsck /dev/sda1
>     fsck from util-linux 2.23.2
>     fsck.fat 3.0.20 (12 Jun 2013)
>     0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
>     1) Remove dirty bit
>     2) No action
>     ? 1
>     Leaving filesystem unchanged.
>     /dev/sda1: 16 files, 2420/51145 clusters
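
For anyone who wants to look at the flag without going through fsck's
prompts, here is a minimal sketch.  It assumes a FAT12/FAT16 partition,
where the flag is bit 0 of the byte at offset 0x25 (the offset fsck.fat
printed above); FAT32 keeps it elsewhere (offset 0x41, as far as I know).
The function name fat16_is_dirty is made up for this illustration.

```shell
# Hypothetical helper: succeed (exit status 0) if the dirty flag is set in a
# FAT12/FAT16 boot sector.  Assumes the flag is bit 0 of the byte at offset
# 0x25 (decimal 37), which is what the fsck.fat output above reports.
fat16_is_dirty() {
    # Read exactly one byte at offset 37 and print it as an unsigned decimal.
    fat_state=$(dd if="$1" bs=1 skip=37 count=1 2>/dev/null | od -An -tu1 | tr -d ' \n')
    # An empty result means the read failed; treat that as "not dirty".
    [ -n "$fat_state" ] && [ $(( fat_state & 1 )) -eq 1 ]
}

# Usage (as root, ideally with the partition unmounted):
#   fat16_is_dirty /dev/sda1 && echo "dirty bit is set"
```

The function only reads the device, so it cannot make things worse.
Clearing the flag is still a job for fsck.fat; its -a option requests
automatic repair without prompts, though I have not verified that this
dosfstools version clears the flag that way.
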
> 
> Googling got me to <https://www.centos.org/forums/viewtopic.php?t=50917>
> In particular, this advice seemed quite good:
> 
> ==== recommendation ====
> 
> Two other changes I recommend for UEFI systems, to each OS's /etc/fstab.
> 
> - For the /boot/efi mountpoint, add the mount options 
>   x-systemd.automount,noauto
> 
> - Change fs_passno (last column) to a 1 or 2; the canonical fstab 
>   instructions suggest 2, but systemd treats 1 and 2 as the same.
> 
> The first change means the EFI System partition will not be automatically 
> mounted read-write at boot time. It's a bad idea that this is the default, 
> because it puts the ESP at risk, especially if there are crashes: FAT has no 
> journal and will therefore always be marked dirty in such a case. No other 
> UEFI OS mounts the ESP by default. Second, if anything tries to access 
> /boot/efi (read or write), systemd will automatically mount it, and 
> because of the fs_passno of 1 or 2, it will fsck it first, which fixes and 
> clears the dirty bit in case it's set.
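
The two fstab changes described above can be sketched as a small filter.
This is only an illustration: the function name esp_fstab_fix is made up,
it assumes the ESP is mounted at /boot/efi, and it picks fs_passno 2 (the
quoted advice allows 1 or 2).  It reads fstab text on stdin and writes the
edited text to stdout, so nothing is changed until you review the result.

```shell
# Hypothetical helper: rewrite the /boot/efi line of an fstab read from stdin.
# Adds the options x-systemd.automount and noauto (if missing) and sets
# fs_passno (sixth field) to 2.  All other lines pass through unchanged.
esp_fstab_fix() {
    awk '
        # Leave comments and lines for other mount points untouched.
        /^[[:space:]]*#/ || $2 != "/boot/efi" { print; next }
        {
            if ($4 !~ /(^|,)x-systemd\.automount(,|$)/) $4 = $4 ",x-systemd.automount"
            if ($4 !~ /(^|,)noauto(,|$)/)               $4 = $4 ",noauto"
            if (NF < 5) $5 = 0   # fill in a missing dump field before setting passno
            $6 = 2
            print                # awk rejoins the fields with single spaces
        }
    '
}

# Usage: write the result to a scratch file and compare before installing it.
#   esp_fstab_fix < /etc/fstab > /tmp/fstab.new && diff /etc/fstab /tmp/fstab.new
```

For an entry such as "UUID=ABCD-1234 /boot/efi vfat umask=0077,shortname=winnt 0 0"
(an invented example line), the filter appends the two options and changes
the final 0 to 2.
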
> 
> Right now without these changes, it's just a matter of having the right 
> number and bad timing of crashes to render the EFI System partition 
> corrupt.
> ---
> Talk Mailing List
> talk at gtalug.org
> https://gtalug.org/mailman/listinfo/talk

