So, I broke my VM cluster… Oops.
Updates
This kinda started a couple weeks ago. Proxmox released version 9 of their virtual environment server. My plan was to slowly, one by one, reinstall Proxmox on each of my nodes, removing each one from the cluster and re-adding it afterwards. This looked to be the safest way to do the update. However, they had instructions for doing the upgrade in place… So I did that.
Fast forward to a couple days ago, I was installing some updates on node two. Nothing out of the ordinary. Just run apt upgrade -y && reboot in tmux, and leave. The node will update things, migrate the VMs elsewhere for the reboot, and take them back when it’s done… It didn’t do that. It migrated most of the VMs, but MiniFlux got stuck, and then the node gave up and rebooted, leaving MiniFlux locked in migration without it ever actually migrating.
That took a bit to fix, but it left me not overly trusting that node. And that’s when I noticed I couldn’t get a shell up on node two, or three… This is bad.
Reinstalling
So, I need to do a clean install on node two and three. That’s not overly hard. So I shut down node two, took it to my desk, got it all hooked up for a reinstall, and tossed on some music.
Halfway through the install, and past the point of no return, I realized something. I didn’t remove node two from the cluster. Something you need to do, BEFORE taking it offline… Shit. So some searching, and time with the magic eight ball, told me what files I needed to manually edit.
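For next time, here’s a rough sketch of what removing the node first might look like. The node name node2 and the remove_node wrapper are made up for illustration; pvecm is the Proxmox cluster tool, and this would be run from a node that is staying in the cluster. By default it only prints what it would do, since I clearly shouldn’t be trusted.

```shell
#!/bin/sh
# Sketch only: drop a node from a Proxmox cluster before reinstalling it.
# remove_node and the node name passed to it are hypothetical, for illustration.
# With DRY_RUN=1 (the default) the commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

remove_node() {
    node="$1"
    run pvecm nodes            # list the current cluster members
    run pvecm delnode "$node"  # remove the node from the cluster config
    run pvecm status           # check quorum and membership afterwards
}

# Example: remove_node node2
```

Calling remove_node node2 prints the three commands; setting DRY_RUN=0 would run them for real, so read the Proxmox docs on node removal before trusting my version of it.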

So with that fixed, I went on with the reinstall. After that, it took a bit to get the node fully up to date. But after all that, Proxmox was happily doing live migrations to it. Now just to repeat all this with node three. Thankfully, that went smoothly. And everything is happy again.
Lessons Maybe Learned
There’s a reason I don’t host my main websites on this cluster. I don’t trust myself (or Comcast). This wasn’t fun. In-place upgrades are dangerous. And if you cosplay as a sys-admin, your hardware might test that.