Showing posts with label recovery.

Wednesday, February 7, 2024

Zpool-upgrade loader.efi fail

I recently bought a Sony Walkman Android device, which I hoped would permit me to use an app to control my Deco S4 wifi devices.  The wifi hardware was added in hopes of improving local wifi from a single Xfinity device, and in a more secure way, by connecting via my OPNsense box.  Once I solved the wifi device app issue, I still had the Sony Walkman for audio.

Perhaps foolishly, I wanted to use larger, better-quality FLAC files, and knowing I still had quite a large number of CDs to run through ripperX, I tried to switch the dataset holding my music to better compression.  The compression algorithms that looked better to me were unavailable because my ZFS pool needed an upgrade.  I had done as many as two such upgrades in the past, one of which switched me to 'feature flags', and each time I was warned about compatibility but never had an issue.

Without any further investigation, and assuming that it would be just as easy a process as in the past, I upgraded the pool and then changed the compression of the dataset for my music files.  Everything ran fine, no issues.  Later, I was playing minetest and, some time into what I was attempting on the minetest server, my display, mouse, and PC froze up.  There has been an issue with something somewhere which has caused me a panic and reboot before, so I expected nothing more than the usual inconvenient interruption.  This time it didn't reboot itself, it just seemed to remain frozen, so I rebooted.

This is when I discovered that something was wrong.  It started the boot, but before it got very far I was greeted with errors including "ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2" and a message that it couldn't find any bootable drives.  Luckily there was a specific term which I could search online for more information: klarasystems:vdev_zaps_v2.

One search result was FreeBSD forum post 14-0-release-zfs-features-gotcha.91085 but I also looked at the 14.0 release notes, and /usr/src/UPDATING:

20160708:
	The stable/11 branch has been created from head@r302406.

After branch N is created, entries older than the N-2 branch point are removed
from this file. After stable/14 is branched and current becomes FreeBSD 15,
entries older than stable/12 branch point will be removed from current's
UPDATING file.

COMMON ITEMS:

	General Notes
	-------------
	Sometimes, obscure build problems are the result of environment
	poisoning.  This can happen because the make utility reads its
	environment when searching for values for global variables.  To run
	your build attempts in an "environmental clean room", prefix all make
	commands with 'env -i '.  See the env(1) manual page for more details.
	Occasionally a build failure will occur with "make -j" due to a race
	condition.  If this happens try building again without -j, and please
	report a bug if it happens consistently.

	When upgrading from one major version to another it is generally best to
	upgrade to the latest code in the currently installed branch first, then
	do an upgrade to the new branch. This is the best-tested upgrade path,
	and has the highest probability of being successful.  Please try this
	approach if you encounter problems with a major version upgrade.  Since
	the stable 4.x branch point, one has generally been able to upgrade from
	anywhere in the most recent stable branch to head / current (or even the
	last couple of stable branches). See the top of this file when there's
	an exception.

	The update process will emit an error on an attempt to perform a build
	or install from a FreeBSD version below the earliest supported version.
	When updating from an older version the update should be performed one
	major release at a time, including running `make delete-old` at each
	step.

	When upgrading a live system, having a root shell around before
	installing anything can help undo problems. Not having a root shell
	around can lead to problems if pam has changed too much from your
	starting point to allow continued authentication after the upgrade.

	This file should be read as a log of events. When a later event changes
	information of a prior event, the prior event should not be deleted.
	Instead, a pointer to the entry with the new information should be
	placed in the old entry. Readers of this file should also sanity check
	older entries before relying on them blindly. Authors of new entries
	should write them with this in mind.

	ZFS notes
	---------
	When upgrading the boot ZFS pool to a new version (via zpool upgrade),
	always follow these three steps:

	1) recompile and reinstall the ZFS boot loader and boot block
	(this is part of "make buildworld" and "make installworld")

	2) update the ZFS boot block on your boot drive (only required when
	doing a zpool upgrade):

	When booting on x86 via BIOS, use the following to update the ZFS boot
	block on the freebsd-boot partition of a GPT partitioned drive ada0:
		gpart bootcode -p /boot/gptzfsboot -i $N ada0
	The value $N will typically be 1.  For EFI booting, see EFI notes.

	3) zpool upgrade the root pool. New bootblocks will work with old
	pools, but not vice versa, so they need to be updated before any
	zpool upgrade.

	Non-boot pools do not need these updates.

	EFI notes
	---------

	There are two locations the boot loader can be installed into. The
	current location (and the default) is \efi\freebsd\loader.efi and using
	efibootmgr(8) to configure it. The old location, that must be used on
	deficient systems that don't honor efibootmgr(8) protocols, is the
	fallback location of \EFI\BOOT\BOOTxxx.EFI. Generally, you will copy
	/boot/loader.efi to this location, but on systems installed a long time
	ago the ESP may be too small and /boot/boot1.efi may be needed unless
	the ESP has been expanded in the meantime.

	Recent systems will have the ESP mounted on /boot/efi, but older ones
	may not have it mounted at all, or mounted in a different
	location. Older arm SD images with MBR used /boot/msdos as the
	mountpoint. The ESP is a MSDOS filesystem.

	The EFI boot loader rarely needs to be updated. For ZFS booting,
	however, you must update loader.efi before you do 'zpool upgrade' the
	root zpool, otherwise the old loader.efi may reject the upgraded zpool
	since it does not automatically understand some new features.

	See loader.efi(8) and uefi(8) for more details.


and then to the manpage for loader.efi, specifically:

EXAMPLES
   Updating loader.efi on the ESP
       The  following  examples	 shows	how to install a new loader.efi	on the
       ESP.

       First, find the partition of type "efi":

	     # gpart list | grep -Ew '(Name|efi)'
	     1.	Name: nvd0p1
		type: efi
	     2.	Name: nvd0p2
	     3.	Name: nvd0p3
	     4.	Name: nvd0p4
	     1.	Name: nvd0

       The name	of the ESP on this system is nvd0p1.

       Second, let's mount the ESP, copy loader.efi to	the  special  location
       reserved	for FreeBSD EFI	loaders, and unmount once finished:

	     # mount_msdosfs /dev/nvd0p1 /boot/efi
	     # cp /boot/loader.efi /boot/efi/efi/freebsd/loader.efi
	     # umount /boot/efi

SEE ALSO
       loader(8), uefi(8)

Since I had experienced similar issues with my box which meant I had to adjust what was on the hard drive, I knew that I would need bootable media so that I could reach the hard drive.  This became a more significant and time-consuming problem, at least partly due to my own stubborn foolishness.  I had a number of micro SD cards and a USB reader.  Out of three which had the potential to function as I needed, only one had an old NomadBSD installed upon it.  My mistake was that I insisted upon upgrading it from 12.x to 14.0, which ran into storage constraints, then broke its normal startup process, and eventually left it unable to boot at all.

Later I noticed an old Kingston Digital DataTraveler SE9 64GB USB 2.0 stick, which I discovered had FreeNAS installed.  I could boot it and get to a shell, which allowed me to fetch the img file.  It was good enough for some attempts to create a bootable micro SD card, but it would always fail too soon during an ftp transfer or dd write, so I had to find another option.  I was pretty sure that I had some CD-ROMs or DVDs with some kind of FreeBSD on them, so once I found my cache, I decided to use a PC-BSD 8.2rc2 DVD.

By this point I was tired of the whole process, but possibly also more rested or something, and after the better part of two days of attempts I finally made progress.  I booted the PC-BSD disc and, after some tries for the GUI, realized I only truly needed a shell.  From the shell, I was able to mount my target micro SD card, which I used to store the img file downloaded via ftp -a download.freebsd.org.  I could also format an SSD which I bought more than 5 years ago, in a group of four, for adding zfs cache devices to my box (obviously still not accomplished).  I shifted the img file to the SSD so I could dd it to the micro SD card.  Once I had the FreeBSD 14.0 installer on the micro SD card, I booted it and began the process of the repair.

My initial plan was to first test the process by updating the loader.efi on the micro SD card, but I got stymied.  I followed the steps above (in the loader.efi manpage example).  I discovered that the appropriate partition was ada2p1, which I mounted to /boot/efi, and then found that there was no /boot/efi/efi/freebsd path.  What I had was /boot/efi/EFI/BOOT/BOOTxxx.EFI, which I left alone.  Instead I created the needed freebsd directory in /boot/efi/EFI.  Once the path was set up, I could copy the loader.efi into the correct location.
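Those steps can be sketched as a small script.  Treat it as a rough sketch under assumptions rather than a tested tool: ada2p1 was the ESP on my disk and yours will differ, /boot/efi is assumed to be a free mount point, and the run helper and update_esp_loader function are names made up for illustration.  By default it only prints the commands; set DRY_RUN=0 (as root) to actually run them.

```sh
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set (then run as root).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# update_esp_loader ESP_DEVICE [MOUNT_POINT]
update_esp_loader() {
    esp_dev="$1"                 # e.g. /dev/ada2p1 (an assumption, check with gpart)
    esp_mnt="${2:-/boot/efi}"

    run mount_msdosfs "$esp_dev" "$esp_mnt" || return 1
    # The path from the manpage did not exist on my ESP, so create it first.
    run mkdir -p "$esp_mnt/EFI/freebsd" || return 1
    # /boot/loader.efi is the loader of whatever system you booted from;
    # it has to be new enough to understand the upgraded pool.
    run cp /boot/loader.efi "$esp_mnt/EFI/freebsd/loader.efi"
    status=$?
    run umount "$esp_mnt"
    return "$status"
}

update_esp_loader /dev/ada2p1
```

On a healthy system the same steps, run before zpool upgrade, are the order that UPDATING asks for.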

This was my very first experience with this sort of repair or upgrade, so I was not sure of success until my reboot.  At some point during or shortly after this repair, I decided to properly install FreeBSD 14.0 onto the SSD as a failsafe against future similar problems, and I kept the micro SD card as the FreeBSD 14.0 installer.  I connected the SSD using the presently unavailable cables2go version of Generic-Adapter-Converter-Optical-External.  Aside from all of the above, I was able to connect to my local network with my Sony Walkman to look for answers and help.  If I can keep my Walkman usable for such emergencies, this may be an easier method than keeping an entire network and a box with FreeBSD installed functioning; all I will likely need is wifi and access.  I have no plan to ever modify the Walkman away from its functional install.

Definite relief after I was able to properly boot up my box and use it on my last of a series of three days off from work.  I hope that you update your loader.efi BEFORE you upgrade zfs so that you can avoid the excitement of doing the repair above.

Saturday, February 26, 2022

Port breaks kernel breaks port

So many of us chug happily along without completely realizing or recognizing how some of the present FreeBSD build mechanisms have become a bit more complex.  Those who never have any need of graphics and remain in a text mode commandline interface for the duration of their use of FreeBSD would not know that there is indeed at least one situation, now, which ties a port and the kernel together.  When everything is working perfectly, this would likely never come up, but a relatively small problem inflated itself to cause my kernel build to fail.

During this troubleshooting quest, I first tried the obvious things: rebuild world again just to be sure it was OK, then re-enable some possibly related things in my custom kernconf, and, when that was to no avail, build a GENERIC kernel instead.  After beating my figurative head against the wall for quite a while, I went to Twitter to see if @FreeBSDHelp had any ideas.

The details and comments on that discussion thread didn't solve my issue, as I could not comprehend how our kernel build was now in any way tied to the build of a port, though the comment that was made may not have explicitly indicated this.  So my next thought was to get a different, earlier version of /usr/src from git in some way, then rebuild the kernel from before the error seemed to appear in /usr/src.  I didn't know then that this was not the path to a solution, and I wasted far too much time trying to go backward with git to an earlier commit.  I am definitely not a particularly big fan of git, and this exertion didn't help me love it any more.

The reason, besides that it had been in excess of 10 days since the last time I rebuilt my kernel and world, was to get virtualbox working which needed bits from the kernel build which I didn't have, and those need to be the same version as the running OS.  This meant my long journey to rebuild my kernel and world (multiple times each) so that I could use Virtualbox to try the game Veloren which due to whatever is different than expected (FVWM3 and Radeon graphics probably, or similar) does build and install but does NOT run.  I am sure that if I could startup Virtualbox, put tinycore Linux in there, and install Veloren for Linux, I would probably succeed where I was prevented otherwise.  I could not install another OS in Virtualbox because I could not boot the iso, and this due to the virtualbox kernel object not having been loaded.  I couldn't load the needed kernel object since it needed to be built, and now you know why I got stuck down this rabbit hole.

It has been some time since setting that whole "Play Veloren in Virtualbox via tinycore Linux" idea on a back burner or in a box on a shelf somewhere.  One of my incomplete projects is to get my port attempt of Reshade rebuilt, which I was attempting when it ran into some conflicts with python items.  The various python things it needed were installed as the 3.10 (py310) versions while I already had the 3.9 (py39) versions of those same ports.  The only way forward was to remove each of the python 3.9 ports to let the reshade build then install what it needed as version 3.10.  Among all of the things that were removed as a consequence of this were vlc and firefox, both of which I use daily.  So I gave poudriere a gross list of everything I had installed on my system, let it build, and then discovered a number of things that failed.  The graphics/drm-fbsd13-kmod port was among the fairly long list of things that didn't get built, and it was in the smallish group of "lynchpin" ports, meaning that others failed (were skipped) because it failed.
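Feeding poudriere everything that is installed can be sketched roughly as follows.  The jail name 13amd64 and ports tree default are read off the poudriere log path shown further down; the list file location and the function name are made up for illustration, and the script only prints what it would run unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set (then run as root).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rebuild_installed JAIL PORTS_TREE LIST_FILE
rebuild_installed() {
    jail="$1"
    tree="$2"
    list="$3"

    # pkg query %o prints the origin (category/port) of each installed package.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ pkg query %o | sort -u > $list"
    else
        pkg query %o | sort -u > "$list" || return 1
    fi
    # Build every origin in the list inside the given jail and ports tree.
    run poudriere bulk -j "$jail" -p "$tree" -f "$list"
}

rebuild_installed 13amd64 default /tmp/installed-origins.txt
```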

And so, I thought, that's not a problem, I'll go investigate what happened with graphics/drm-fbsd13-kmod to make it fail.  I remembered that poudriere keeps logs of all (or nearly all) it does, and I just had to find them.  Since I often end up trying to remember where any certain important thing is and its path, I have been keeping a directory of symbolic links with sometimes more descriptive names.  The appropriate one was,

root@ichigo:~ # ls -l Symbolic_Links/p-keg-logs_bulk_13amd64_latest-per-pkg
lrwxr-xr-x  1 root  wheel  66 May  5  2021 Symbolic_Links/p-keg-logs_bulk_13amd64_latest-per-pkg -> /usr/local/poudriere/data/logs/bulk/13amd64-default/latest-per-pkg

and from there I could do

root@ichigo:~ # tail -n 15 Symbolic_Links/p-keg-logs_bulk_13amd64_latest-per-pkg/drm-fbsd13-kmod-5.4.144.g20220223.log
===> Checking for items in STAGEDIR missing from pkg-plist
Error: Orphaned: %%KMODSRC%%/linuxkpi/dummy/include/linux/random.h
Error: Orphaned: %%KMODSRC%%/linuxkpi/dummy/include/linux/suspend.h
===> Checking for items in pkg-plist which are not in STAGEDIR
===> Error: Plist issues found.
*** Error code 1

Stop.
make: stopped in /usr/ports/graphics/drm-fbsd13-kmod
=>> Error: check-plist failures detected
=>> Cleaning up wrkdir
===>  Cleaning for drm-fbsd13-kmod-5.4.144.g20220223
build of graphics/drm-fbsd13-kmod | drm-fbsd13-kmod-5.4.144.g20220223 ended at Sat Feb 26 00:30:21 CST 2022
build time: 00:04:43
!!! build failure encountered !!!

Firstly, the build failure is due to my choice to be a bit more stringent on builds, to test for various things, so it is possible that this might not appear to most users, although it truly should be visible to all the port maintainers and various FreeBSD developers.  It tells me, as did the small highlighted concise reason in the failed build list output from after I ran poudriere, that it is an issue with the pkg-plist.  This I correctly believed was a simple issue, and easy to fix since this process is something I have repeated many times with my own repos for FreeBSD Port Tree Leaf items such as for Minetest-dev which I wrote about in another blog post.

What I needed to do was go to /usr/ports/graphics/drm-fbsd13-kmod and rename the pkg-plist to pkg-plist-old, and then do a fresh build of it.  Once the build completes, I create a fresh pkg-plist by make makeplist > pkg-plist in order to do a comparison between this new fresh list and the old original list.  This is accomplished by diff -y pkg-plist pkg-plist-old | more to step through the output, looking for something that is present or absent in the newly generated pkg-plist as compared to the old one.  Since a pkg-plist that is in the ports tree may have %%text%% type tags which are often not generated by the make makeplist script, I usually modify the pkg-plist-old to match the one freshly generated.  Once the edits are made, I rename the pkg-plist-old to pkg-plist and then rebuild once more to prove no errors related to the file remain.
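The routine above can be sketched as a script.  This is only a sketch: make clean stage is my guess at a reasonable "fresh build" step, the function name is made up for illustration, and the hand-editing of the %%text%% tags still has to be done by eye afterwards.  It only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set (then run as root).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# regen_plist PORT_DIR
regen_plist() {
    port_dir="$1"

    run cd "$port_dir" || return 1
    # Keep the original list around for comparison.
    run mv pkg-plist pkg-plist-old || return 1
    # Fresh build and stage, so makeplist has a STAGEDIR to list.
    run make clean stage || return 1
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ make makeplist > pkg-plist"
    else
        make makeplist > pkg-plist || return 1
    fi
    # Side by side: fresh list on the left, original on the right.
    # diff exits non-zero when the files differ, which is expected here.
    run diff -y pkg-plist pkg-plist-old
    return 0
}

regen_plist /usr/ports/graphics/drm-fbsd13-kmod
```

When running it for real, pipe the diff through more as described above, since the side-by-side output is long.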

Now that graphics/drm-fbsd13-kmod successfully builds and installs, I thought from the back of my mind, that I would try to update kernel and world, due to that vague mention of these two things being related.  World builds as expected, so I go on to the kernel, and then it fails.  It complains that kconfig.mk was missing.  I remember that that was one of the things that I had removed from the graphics/drm-fbsd13-kmod pkg-plist for a reason I am already uncertain about now-- and this is being written within hours of having done it.  I go back to that port tree directory and either the pkg-plist-old was still present or I went through the steps to generate fresh and make the needed edits to fix it.  Whatever actually happened seems to have fallen out of my mind but the result was "Hey! graphics/drm-fbsd13-kmod needs to be built in order for the kernel to build, gee that is weird."

I have been writing about all of this within a relatively short period after succeeding to build kernel and world when it had been broken some week(s) ago.  My new kernel has not yet been installed and I have to build the virtualbox thing(s) that are dependent upon the source.  It is nice now to have this mess cleared up and better understood.  I'll be adding this nit to the lists of build issues for kernel or world, and add emphasis on the relationship between this port and the kernel which likely many of us had not known.  The kernel failure meant the virtualbox port for a kernel object couldn't be built, but the kmod graphics port is what broke the kernel.

Saturday, November 14, 2020

Now I can't boot

There have been plenty of times by now that I have made some sort of adjustment on my system and then it doesn't boot.  We know the usual suspects are /etc/rc.conf and /boot/loader.conf, but I'm sure there are others, possibly even a badly thought out, recently built and installed custom kernel.  So now we are stuck: we have one box and it fails to boot, but the way to solve the problem is to get online from it.  The situation with the broken kernel might be sidestepped easily: simply choose the option from the boot menu to use a different kernel.  If there were also mistakes with the buildworld, and missing items mean no booting, then there needs to be another way.

If you can get to single-user mode (another boot menu item), the changes will be easy to apply.  First, mount -u / and then if your filesystem is ZFS rather than UFS, zfs mount -a and now you can re-edit the typo out of your /boot/loader.conf or some other file, but what if your situation is a bit more complex?
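For the simple case, the single-user steps can be sketched like this.  The function name and its zfs/ufs argument are made up for illustration; the script only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# single_user_rw FSTYPE   (FSTYPE is "zfs" or "ufs")
single_user_rw() {
    fstype="$1"

    # Single-user mode starts with / read-only; remount it read-write.
    run mount -u / || return 1
    if [ "$fstype" = "zfs" ]; then
        # Mount the remaining ZFS datasets as well.
        run zfs mount -a
    fi
}

single_user_rw zfs
```

After that, the broken file can be opened in whatever editor is available and the typo removed.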

You can use a usb stick to boot and from there mount the drives in your pc, make needed adjustments and get everything back to normal again.  This is where it can be fun, and when I say fun I mean not quite a nightmare though it is a real special pain.  You can probably use any bootable BSD which offers shell access to the machine, but since discovering NomadBSD, it has become my preference.  What NomadBSD has is a complete system which is self-contained within the usb media.  So if you are short of time and cannot fix your system, you can use it to get online to do something important, such as check your work schedule.  Of course, this immediate need situation means that you previously set up a web browser and installed and configured an addon (such as blur by Abine) which stores your passwords, and you had made any other needed adjustments to suit your needs.  So, having gotten online, you have the solution to the problem, you've written it down, and now you need to fix whatever is wrong on the HDD of your system.

What you need to do is mount the HDD of your system into the usb media that is loaded.  There should be a directory /media present already, if not, create via mkdir /media because this will be the mount point to reach inside your system HDD.  I would assume that you are already in a shell window (xterm perhaps) or you used shell access from the menu when you booted the usb.  We have two pieces of the puzzle, the solution and the system running with a shell, what we need to do in order to make the changes is get to the drive.

With ZFS, there is a special command which will do what we need; change zroot to the name of your pool.

zpool import -f -R /media zroot

Many times the mistake was made, or is corrected, in either /etc or /boot, so now to reach those directories or any others on your HDD, you would prefix the desired directory with /media such as below.  Work slowly, re-read the command you've typed before committing to it by pressing return or enter.  While your HDD is mounted to your usb stick, /media is the / (root) directory of your HDD, and / is the root directory of the usb stick itself.  Entering the specific directory in order to edit a file, such as rc.conf or loader.conf, may be better than remembering every time to prefix with /media, but always pay attention to your current working directory or path.

cd /media/etc

or

cd /media/boot
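If /media/etc turns out to be empty after the import, the root dataset itself may not have been mounted.  On the installer's default root-on-ZFS layout the root dataset is zroot/ROOT/default with canmount=noauto, so a plain import skips it.  The sketch below is a variant of the command above, not the exact command I used: it imports without mounting (-N), mounts the root dataset first, then the rest.  The dataset name is an assumption about your system (check with zfs list), the function name is made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set (then run as root).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rescue_import [POOL] [ALTROOT] [ROOT_DATASET]
rescue_import() {
    pool="${1:-zroot}"
    altroot="${2:-/media}"
    root_ds="${3:-$pool/ROOT/default}"   # assumption: default installer layout

    run mkdir -p "$altroot" || return 1
    # -f: force, since the pool was last used by another system
    # -N: import without mounting anything yet
    # -R: temporary alternate root, gone after a reboot
    run zpool import -f -N -R "$altroot" "$pool" || return 1
    # Mount the root dataset first, then everything else beneath it.
    run zfs mount "$root_ds" || return 1
    run zfs mount -a
}

rescue_import zroot /media
```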

Only you can know what the problem and solution are.  Now that you have access to your HDD you can make the corrections and reboot.  That zpool import command is only in effect until you reboot and does not need to be turned off or disabled.  We have not made any permanent changes to how your drives are mounted in order to fix the problem, unless of course your problem and solution specifically involve a permanent adjustment to how your drives are mounted.  Right now I do not have any examples handy of the dumb things I have done which resulted in being unable to boot, along with how they were fixed.

While tinkering with your system in ways that can only truly be done when it is open source, because you use FreeBSD for your need of control over all of it, you are setting yourself up with the potential for mistakes.  There is nothing wrong with unintentionally doing something incorrectly, it is the surest way to learn.  We may read somewhere how to do something but unknowingly miss a step or configure something wrong or assume their technique will work on our system.  The worst of these experiences involve Boot blockers and unfortunately they can at least temporarily halt all further progress.
