1. What happens if a node fails
Let’s start from the layout with three nodes: the PBS (Proxmox Backup Server) node, Node A and Node B.
Now assume that one of the three nodes fails.
At the cluster level nothing changes: the cluster keeps its quorum, because from a quorum perspective you have to think of the 2-node cluster as if it were a 3-node cluster.
For example, if Proxmox Backup Server suddenly fails, the cluster keeps working as before. The important thing is to keep at least two of the three nodes running, as per Proxmox specifications.
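The "two out of three" rule is a plain majority vote, which can be sketched in a few lines. This is only an illustration that assumes each node carries exactly one vote; the function names are hypothetical and are not part of any Proxmox tool.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(nodes_up: int, total_votes: int = 3) -> bool:
    """True if the nodes still running hold a majority of the votes.

    Assumption: every node (PBS, Node A, Node B) has exactly one vote.
    """
    return nodes_up >= quorum_threshold(total_votes)


if __name__ == "__main__":
    # Walk through 3, 2, 1 and 0 surviving nodes.
    for up in range(3, -1, -1):
        print(f"{up} of 3 nodes up -> quorum: {has_quorum(up)}")
```

With three votes the threshold is two, so any single node can fail without the cluster losing quorum; a second failure would drop it below the majority.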
Furthermore, you can easily migrate a virtual machine from Node A to Node B, and from Node B to Node A, in live mode, i.e. without shutting down the resource.
A running VM can be migrated live from Node A to Node B regardless of whether replicas are scheduled for it or not.
If replicas are scheduled, migrating a virtual machine from Node A to Node B automatically reverses its replication tasks, so they will then run from Node B to Node A.
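To make the reversal concrete, here is a toy model of it. The `ReplicatedVM` class is purely hypothetical and does not talk to Proxmox; it only mirrors the behaviour described above, where the old source node becomes the new replication target.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedVM:
    """Toy model of a VM with a scheduled replica (not a Proxmox API)."""

    name: str
    node: str  # node the VM currently runs on
    peer: str  # node that receives the replicas

    def replication_direction(self) -> str:
        """Describe the replication flow as 'source -> target'."""
        return f"{self.node} -> {self.peer}"

    def migrate(self) -> None:
        """Move the VM to its peer; the old source becomes the target."""
        self.node, self.peer = self.peer, self.node


if __name__ == "__main__":
    vm = ReplicatedVM(name="vm-example", node="Node A", peer="Node B")
    print("before migration:", vm.replication_direction())
    vm.migrate()
    print("after migration: ", vm.replication_direction())
```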
2. How to determine the amount of storage
Let’s take a practical example to understand how much storage to allocate.
Suppose we have 6 virtual machines on the Cluster: 3 VMs on Node A and 3 VMs on Node B, and each virtual machine (for simplicity) is 100 GB.
Overall, about 300 GB will be allocated on Node A’s storage. The same goes for Node B.
Now suppose we schedule replicas of all VMs, from Node A to Node B and vice versa. On each node we must then account for 300 GB of VMs plus 300 GB of replicas.
The total on each node will therefore be 600 GB.
The file system will try to optimize the space, so actual usage will be a little less than 600 GB. To be safe we round up and plan for 600 GB.
It is not required to have replicas of all resources (VM or CT) from Node A to Node B.
For example, one of the virtual machines on Node A may run unimportant services, so you don’t want to replicate it to Node B (Node B could perhaps be considered purely a backup node).
In this case, no replication task is created for that resource, and consequently 100 GB must be subtracted from the replicas. So on Node B (the destination host) there will be 500 GB occupied instead of 600 GB.
It therefore depends on how the Cluster is set up, and in particular on the replicas: that is, whether you want everything to be redundant from Node A to Node B and vice versa.
As a rough rule of thumb, you can always plan for double the allocated storage.
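The arithmetic of this example can be written down as a small sketch. The helper name `node_storage_gb` is hypothetical, and the calculation deliberately ignores any space the file system saves, so it gives the upper bound used above.

```python
def node_storage_gb(local_vm_sizes_gb, incoming_replica_sizes_gb):
    """Storage to plan on one node: its own VMs plus the replicas it receives."""
    return sum(local_vm_sizes_gb) + sum(incoming_replica_sizes_gb)


# Example from the text: 3 VMs of 100 GB on each node.
node_a_vms = [100, 100, 100]
node_b_vms = [100, 100, 100]

# Case 1: every VM on Node A is replicated to Node B.
node_b_full = node_storage_gb(node_b_vms, node_a_vms)

# Case 2: one VM on Node A is not replicated, so Node B receives only two.
node_b_partial = node_storage_gb(node_b_vms, node_a_vms[:2])

if __name__ == "__main__":
    print("Node B, all VMs replicated:", node_b_full, "GB")
    print("Node B, one VM skipped:    ", node_b_partial, "GB")
```

The first case gives 600 GB and the second 500 GB, matching the figures above; with full replication in both directions the result is always double the locally allocated storage.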
3. Resource recovery: how long it takes to get back up and running
The time it takes to restore a backup from Proxmox Backup Server to the nodes depends on the hardware used and in particular on the switches, network cards and above all on the types of disks implemented for PBS storage.
However, as a very rough order of magnitude, it can range from one minute to 20 minutes.
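For a back-of-the-envelope check, restore time can be approximated as the size of the VM divided by the sustained throughput. The sketch below is only an estimate under stated assumptions: the throughput values are hypothetical examples, not measurements, and effects such as deduplication, compression or verification are ignored.

```python
def restore_minutes(size_gb: float, throughput_mb_s: float) -> float:
    """Rough restore time in minutes for a given sustained throughput.

    Assumptions: 1 GB = 1000 MB, and the throughput stays constant
    for the whole restore.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_gb * 1000 / throughput_mb_s
    return seconds / 60


if __name__ == "__main__":
    # Hypothetical throughputs for a 100 GB VM, slowest to fastest.
    for mb_s in (100, 500, 1000):
        minutes = restore_minutes(100, mb_s)
        print(f"100 GB at {mb_s} MB/s: about {minutes:.1f} minutes")
```

Under these assumptions a 100 GB VM lands inside the range mentioned above: slower disks or a slower network push the result toward the upper end, faster hardware toward the lower end.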