[GTALUG] war story: power failure makes CentOS system unbootable
D. Hugh Redelmeier
hugh at mimosa.com
Thu Sep 29 12:22:08 EDT 2022
I'm continuing a very old thread of mine.
Because of the age, I'm top-posting.
Summary up until now:
- system fails to reboot after power failure
- problem seems to be firmware distaste for dirty bit on ESP (/boot/efi)
- dirty bit always happens because /boot/efi is always mounted, even
though it isn't needed 99% of the time
- mounting-on-demand doesn't fix this because packagekit accesses the ESP
almost immediately. This feels like a bug.
- I disabled packagekit to prevent /boot/efi from being automounted.
Well, it grew back, somehow. Updates, I guess.
Now:
So now I'll use a different approach. I'll get automount to also
autodismount. Oh, the wonders and mysteries provided by systemd!
1. Modify the /etc/fstab entry for /boot/efi to include
"x-systemd.idle-timeout=600"
This asks for dismounting after 600 seconds of inactivity.
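   For reference, a complete fstab line with this option might look like
   the following sketch. The UUID is my ESP's (it appears in the
   journalctl log quoted below); the umask and automount options are
   typical values shown for illustration, so adapt them to your existing
   entry:

```
UUID=A9B4-1C74  /boot/efi  vfat  umask=0077,x-systemd.automount,x-systemd.idle-timeout=600  0 2
```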
2. Get the relevant daemon(s) to pay attention:
systemctl daemon-reload
systemctl restart boot-efi.automount
Note that boot-efi.automount's name is synthesized automatically from
the mount point. So is the corresponding mount unit, boot-efi.mount.
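   The mangling is simple for ordinary paths. Here is a rough sketch of
   the naming rule in Python (my own illustration, not systemd's actual
   code; real systemd also escapes dashes, dots, and non-ASCII bytes,
   per systemd.unit(5)):

```python
def mount_unit_name(mount_point: str, suffix: str = "automount") -> str:
    """Roughly how systemd derives a unit name from a mount point:
    drop the leading slash and turn remaining slashes into dashes.
    This sketch covers plain paths only."""
    return mount_point.strip("/").replace("/", "-") + "." + suffix

print(mount_unit_name("/boot/efi"))           # boot-efi.automount
print(mount_unit_name("/boot/efi", "mount"))  # boot-efi.mount
```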
3. Hope this works.
See systemd.mount(5).
You can ignore or be confused by systemd.automount(5).
| From: D. Hugh Redelmeier <hugh at mimosa.com>
| Date: Sun, 8 Aug 2021 09:47:42 -0400 (EDT)
| Subject: Re: war story: power failure makes CentOS system unbootable
|
| I'm replying to a really old posting I made.
|
| [Mystery: I'm replying to the copy I sent. Formatting is screwed up when
| I reply to the copy I got back from the list server.]
|
| [I'm top posting because I'm just including it whole, for reference. I'm
| sure most of you have deleted it from your mailboxes after 4 years.]
|
| I followed the instructions on how to make /boot/efi mounted on demand,
| but I still have problems with my CentOS box not rebooting after a
| crash because the ESP is marked as dirty.
|
| "journalctl -b" says that it is mounting /boot/efi because packagekitd
| is requesting it
|
| Aug 07 10:38:35 reddoor.mimosa.com PackageKit[1855]: daemon start
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Got automount request for /boot/efi, triggered by 1855 (packagekitd)
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Starting File System Check on /dev/disk/by-uuid/A9B4-1C74...
| Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: fsck.fat 3.0.20 (12 Jun 2013)
| Aug 07 10:38:35 reddoor.mimosa.com systemd-fsck[1866]: /dev/sda1: 25 files, 2913/51145 clusters
| Aug 07 10:38:35 reddoor.mimosa.com chronyd[724]: Selected source 207.34.48.31
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Started File System Check on /dev/disk/by-uuid/A9B4-1C74.
| Aug 07 10:38:35 reddoor.mimosa.com systemd[1]: Mounting /boot/efi...
| Aug 07 10:38:36 reddoor.mimosa.com spice-vdagent[1942]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
| Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
| Aug 07 10:38:36 reddoor.mimosa.com gnome-session-binary[1460]: Entering running state
| Aug 07 10:38:36 reddoor.mimosa.com gnome-session[1460]: gnome-session-binary[1460]: WARNING: App 'spice-vdagent.desktop' exited with code 1
| Aug 07 10:38:36 reddoor.mimosa.com systemd[1]: Mounted /boot/efi.
|
| Fix: why the heck am I running packagekit? Stop that.
|
| $ sudo systemctl disable packagekit
| $ sudo systemctl stop packagekit
| $ sudo umount /boot/efi
|
| See https://bugzilla.redhat.com/show_bug.cgi?id=1991228
|
| | From: D. Hugh Redelmeier <hugh at mimosa.com>
| | To: Toronto Linux Users Group <talk at gtalug.org>
| | Date: Thu, 6 Jul 2017 13:12:11 -0400 (EDT)
| | Subject: war story: power failure makes CentOS system unbootable
| |
| | tl;dr / spoiler: change fstab so that /boot/efi is not automatically
| | mounted. See the recommendation at the end of this message.
| |
| | My gateway computer is a little PC running CentOS.
| | It does not come back after a power failure.
| | The reason (as best I can tell) is interesting and I think that I have a
| | fix.
| |
| | My system is UEFI. It boots from a UEFI system partition, known as
| | /boot/efi to Linux. If this gets corrupted, it won't boot. It is a
| | VFAT partition.
| |
| | On my gateway (a Zotac Zbox RI321, a cute little box with two ethernet
| | ports), the UEFI firmware apparently won't boot if the dirty bit is
| | set in the system partition.
| |
| | CentOS normally runs with /boot/efi mounted. So when the system isn't
| | shut down cleanly, the dirty bit will be on.
| |
| | Consequence: the system is not going to boot after a power failure.
| |
| | Odd observation 1: if I put a live Fedora 25 USB stick in the system
| | and try to boot in this otherwise-unbootable state, CentOS boots from
| | the main disk. So this really looks like a firmware bug/quirk.
| |
| | Odd observation 2: fsck doesn't seem to automatically fix the system
| | partition. Once CentOS 7 booted, I dismounted /boot/efi and
| | did an fsck on it.
| |
| | $ sudo fsck /dev/sda1
| | fsck from util-linux 2.23.2
| | fsck.fat 3.0.20 (12 Jun 2013)
| | 0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
| | 1) Remove dirty bit
| | 2) No action
| | ? 1
| | Leaving filesystem unchanged.
| | /dev/sda1: 16 files, 2420/51145 clusters
| |
| | Googling got me to <https://www.centos.org/forums/viewtopic.php?t=50917>
| | In particular, this advice seemed quite good:
| |
| | ==== recommendation ====
| |
| | Two other changes I recommend for UEFI systems, to each OS's /etc/fstab.
| |
| | - For the /boot/efi mountpoint, add the mount options
| | x-systemd.automount,noauto
| |
| | - Change fs_passno (last column) to a 1 or 2; the canonical fstab
| | instructions suggest 2, but systemd treats 1 and 2 as the same.
| |
| | The first change means the EFI System Partition will not be automatically
| | read-write mounted at boot time. Making that the default is a bad idea
| | because it puts the ESP at risk, especially across crashes: FAT has no
| | journal, so the partition will always be marked dirty in such a case. No
| | other UEFI OS mounts the ESP by default. Second, if anything tries to
| | access /boot/efi (read or write), systemd will automatically mount it,
| | and because of the fs_passno of 1 or 2, it will fsck it first, which
| | fixes and clears the dirty bit in case it's set.
| |
| | Right now without these changes, it's just a matter of having the right
| | number and bad timing of crashes to render the EFI System partition
| | corrupt.
| |
|