[GTALUG] War Story: UEFI Boot Failure and Fix
Scott Sullivan
scott at ss.org
Wed Jan 18 23:57:14 EST 2017
I was recently issued a new laptop from work. Of course, being in a
sysadmin position, and given the latitude to use the tools I deem
necessary, I replaced the disk and installed Fedora 25.
I felt it was also time to take the plunge with UEFI booting.
The Problem
===========
After installation, the first boot was successful. After applying
updates, boot failed.
The failure repeated after a second re-install and the updates that followed.
Symptoms
========
The system boots and lands in a blue and white UI called 'MokManager.efi'.
Research revealed this to be a UI for enrolling keys used by
Secure Boot.
http://www.rodsbooks.com/efi-bootloaders/secureboot.html
MOKs—A Machine Owner Key (MOK) is a type of key that a user generates
and uses to sign an EFI binary. The point of a MOK is to give users the
ability to run locally-compiled kernels, boot loaders not delivered by
the distribution maintainer, and so on.
Troubleshooting
===============
First, I confirmed that Secure Boot was disabled in the laptop's firmware.
Next, I booted a rescue image, mounted the EFI partition, and investigated.
mkdir /mnt/boot
mount /dev/sda2 /mnt/boot # grub2 boot partition
mount /dev/sda1 /mnt/boot/efi # EFI partition
We then find the MokManager.efi under the following path in our rescue
environment.
/mnt/boot/efi/EFI/fedora/MokManager.efi
I removed this executable to see what I could discover from the boot
process.
Upon reboot, I was greeted with a message reporting grub2-x64.efi as
corrupt, then falling through to MokManager.efi, which was now missing, and
to another optional EFI executable that was also missing.
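
A quick way to spot grossly damaged EFI executables from the rescue environment is to check for the "MZ" magic bytes that every PE image starts with. The sketch below is only an illustration: the function names are hypothetical, and the check catches zeroed or truncated files, not subtle corruption or bad signatures.

```shell
# Hypothetical helper: succeeds if the file begins with the "MZ" magic
# that PE/COFF images (which EFI executables are) start with.
looks_like_pe() {
    [ -f "$1" ] || return 1
    magic=$(head -c 2 < "$1" 2>/dev/null)
    [ "$magic" = "MZ" ]
}

# Hypothetical helper: list every *.efi file in a directory that fails
# the magic check. Passing this check does not prove a file is intact.
report_suspect_efi() {
    dir=$1
    for f in "$dir"/*.efi; do
        [ -e "$f" ] || continue   # glob matched nothing
        looks_like_pe "$f" || echo "suspect: $f"
    done
}

# Example (path taken from the rescue mounts above):
#   report_suspect_efi /mnt/boot/efi/EFI/fedora
```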
Repair
======
fsck.vfat /dev/sda1
- performed removal of the dirty bit
- FATs didn't agree; selected the first one arbitrarily, since I was going
to reinstall the EFI executables anyway.
mkdir /mnt/chroot
mount /dev/sda3 /mnt/chroot # root filesystem
mount /dev/sda2 /mnt/chroot/boot # grub2 boot partition
mount /dev/sda1 /mnt/chroot/boot/efi # EFI partition
# Make devices and process table available
for j in /dev /dev/pts /sys /proc; do mount -B $j /mnt/chroot$j; done
# Make DNS available inside the chroot
cp /etc/resolv.conf /mnt/chroot/etc/resolv.conf
chroot /mnt/chroot
# re-install grub2 EFI executables
yum reinstall grub2-efi shim
# leave the chroot
exit
umount /mnt/chroot/boot/efi
fsck.vfat /dev/sda1 # just to sanity check that it's still clean.
reboot
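
The mount sequence above, plus a matching teardown in reverse order, can be sketched as a dry run that only prints the commands. This is an illustration, not the procedure I ran: the function names are hypothetical, and the device names are the simplified ones from this post.

```shell
# Dry-run sketch: print (do not execute) the mounts for a rescue chroot.
# Arguments: root device, boot device, EFI device, chroot target directory.
chroot_mount_plan() {
    root_dev=$1; boot_dev=$2; efi_dev=$3; target=$4
    echo "mount $root_dev $target"
    echo "mount $boot_dev $target/boot"
    echo "mount $efi_dev $target/boot/efi"
    for j in /dev /dev/pts /sys /proc; do
        echo "mount -B $j $target$j"
    done
}

# Dry-run sketch: print the unmounts in the reverse of the mount order,
# so nested mount points are released before their parents.
chroot_umount_plan() {
    target=$1
    for j in /proc /sys /dev/pts /dev /boot/efi /boot ""; do
        echo "umount $target$j"
    done
}

# Example:
#   chroot_mount_plan /dev/sda3 /dev/sda2 /dev/sda1 /mnt/chroot
#   chroot_umount_plan /mnt/chroot
```

Review the printed commands before running any of them by hand.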
Success
=======
Machine booted normally with the latest kernel from updates.
Notes
=====
1) I've simplified my device names for convenience in conveying the process
followed. Adjust appropriately for your circumstances and disk layout.
2) In Fedora 25, /etc/resolv.conf is a symlink to
/var/run/NetworkManager/resolv.conf, so I had to copy the target file
instead. Your distro may vary.
3) There was no left over EFI partition from a previous system. This was
all done on a blank hard drive.
4) Frankly, I found all of this far simpler to manage than a classical
MBR based boot loader. I was able to use standard tools to investigate
and repair the boot chain. No special grub-install commands or dinking
with config files.
5) None of this explains why a normal update left the boot partition in
an unclean state to begin with. But if it happens again, it's an easy
enough repair, just a little time consuming.
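
Regarding note 2, one way to sidestep the symlink on the rescue side is to have cp follow it and copy the file it points to. A minimal sketch, assuming a cp that supports -L; the function name is hypothetical. If the destination inside the chroot is itself a dangling symlink, cp may refuse to write through it, and you will need to pick the real destination path yourself.

```shell
# Hypothetical helper: copy a resolv.conf into a chroot, following the
# source if it is a symlink so the real file contents are copied.
copy_resolv_conf() {
    src=$1
    dest=$2
    [ -e "$src" ] || { echo "no such file: $src" >&2; return 1; }
    cp -L "$src" "$dest"
}

# Example:
#   copy_resolv_conf /etc/resolv.conf /mnt/chroot/etc/resolv.conf
```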
--
Scott Sullivan