Intro
I run a three-node cluster, and recently the PSU in one of the nodes failed, leaving only two of the three nodes running.
Sure enough, the cluster dropped out and syncing stopped, and some VMs stopped as well because the cluster was no longer quorate.
Temporary solution
I worked around this temporarily by lowering the expected vote count to 2 with the command below.
root@cyndane5:~# pvecm expected 2
After waiting a few minutes, I ran the command below to check the cluster status.
root@cyndane5:~# pvecm status
Cluster information
-------------------
Name: skynet
Config Version: 14
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Fri Mar 22 10:10:55 2024
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000003
Ring ID: 1.8439
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.0.5
0x00000003 1 192.168.0.4 (local)
root@cyndane5:~#
In the Proxmox web-GUI the quorum status was now green.
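The "Quorum: 2" line in the output above follows from how the threshold is derived from the expected votes. As far as I understand it, votequorum requires a strict majority: the expected votes divided by two (rounded down), plus one. Here is a minimal sketch of that arithmetic; quorum_for is a helper name I made up for illustration and is not part of pvecm or corosync.

```shell
#!/bin/sh
# quorum_for is a hypothetical helper (not part of pvecm or corosync).
# It assumes the strict-majority rule: floor(expected / 2) + 1.
quorum_for() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

quorum_for 3   # three expected votes
quorum_for 2   # two expected votes, as after `pvecm expected 2`
```

Both calls print 2. With the expected votes lowered to 2, both remaining nodes therefore have to stay up for the cluster to remain quorate.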
Picking up the pieces - restoring to normal
Once the third node is fixed and back online, I'll run the command below to set the expected votes back to three, and I expect everything to return to normal.
root@cyndane5:~# pvecm expected 3
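Before raising the expected votes again, it may be worth confirming that all three nodes are actually members. Below is a minimal sketch, assuming the "Nodes:" line appears in the pvecm status output as it does earlier in this post. node_count and restore_expected are hypothetical helper names of my own, not pvecm subcommands, and the sketch only prints the command rather than running it.

```shell
#!/bin/sh
# node_count: hypothetical helper; reads `pvecm status` output on stdin
# and prints the numeric value of the "Nodes:" line.
node_count() {
    awk -F': *' '/^Nodes:/ { print $2 + 0; exit }'
}

# restore_expected: hypothetical helper; prints the command to run only
# when the reported node count matches the wanted count.
restore_expected() {
    wanted=$1
    current=$2
    if [ "$current" -eq "$wanted" ]; then
        echo "pvecm expected $wanted"
    else
        echo "only $current of $wanted nodes present, not changing expected votes" >&2
        return 1
    fi
}

# Usage on a cluster node (commented out so the sketch has no side effects):
# restore_expected 3 "$(pvecm status | node_count)"
```

Printing the command instead of executing it keeps the final step manual, which seems the safer choice for something that changes quorum behaviour.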
Future enhancements
One way to avoid losing quorum like this would be to add a QDevice, an external host that contributes a vote to the cluster. Once the failed node has been fixed, I'll look into that.
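Going by the Cluster Manager documentation linked under Sources, the setup comes down to three steps. The sketch below only prints them and runs nothing. qdevice_plan is a hypothetical helper name, and 192.0.2.10 is a placeholder address from the documentation range, not a real host of mine. As I read that same documentation, QDevices are mainly recommended for clusters with an even node count, so that section is worth checking before adding one to a three-node cluster.

```shell
#!/bin/sh
# qdevice_plan: hypothetical helper that only prints the steps; it runs nothing.
# The argument is a placeholder for the address of the external QDevice host.
qdevice_plan() {
    qdevice_ip=$1
    echo "on the external host:  apt install corosync-qnetd"
    echo "on every cluster node: apt install corosync-qdevice"
    echo "on one cluster node:   pvecm qdevice setup $qdevice_ip"
}

qdevice_plan 192.0.2.10   # 192.0.2.10 is a documentation-range placeholder
```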
Sources
https://forum.proxmox.com/threads/another-cluster-not-ready-no-quorum-500-case.56104/
https://pve.proxmox.com/pve-docs/pvecm.1.html
https://forum.proxmox.com/threads/2-node-ha-with-external-qdevice.135429/
https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support