In this article, we will briefly cover Hyper-V replication using only the tools built into Windows. Unfortunately, this mechanism is used extremely rarely. Instead, people steal or "buy" products such as Veeam Backup and then use them only to create VM replicas.
Hyper-V replication makes a complete copy of a VM on another hypervisor (server) and keeps this copy up to date continuously. If the main hypervisor fails completely, you, as the administrator of the virtualization environment, decide that the main server cannot be restored and the VM must run on the backup hypervisor. Turning the VM on there takes just a few minutes, and you lose no more than 0.5, 5 or 15 minutes of data (depending on the setting). Moreover, if the guest OS is also Windows (of a generation no lower than version 9), the VSS service together with the integration components keeps even DBMS databases intact if they were running when the copy for the backup hypervisor was created.
After replication is configured, the primary hypervisor creates delta files for the virtual disks and transfers them to the replica server (the backup server). The replica server receives each delta and merges it into the main virtual disk. Depending on the settings, you can also get application-consistent replicas (through the same VSS service). Such "consistent" copies guarantee that all data in them is consistent: even heavily loaded MS SQL, MySQL, PostgreSQL and other DBMS instances arrive on the replica server with their databases in a consistent state. After an unexpected crash of the main server, you can either take the latest state of the replica (minimum data loss, but with the risk that the database is corrupted) or roll back to an earlier, application-consistent (VSS) point, in which all database transactions are 100% complete.
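The choice between the latest point and the latest application-consistent point can be sketched as follows. This is a minimal Python model, not Hyper-V's own API: the `RecoveryPoint` class, the `pick_recovery_point` function and the sample timestamps are all hypothetical and exist only to illustrate the trade-off.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    """Hypothetical model of one recovery point on the replica server."""
    created: datetime
    app_consistent: bool  # True if the point was created through VSS


def pick_recovery_point(points: List[RecoveryPoint],
                        require_consistent: bool) -> Optional[RecoveryPoint]:
    """Return the newest point, optionally restricted to VSS points."""
    candidates = [p for p in points
                  if p.app_consistent or not require_consistent]
    # max() with default=None handles the case of no matching points.
    return max(candidates, key=lambda p: p.created, default=None)


# Sample data (assumed): one VSS point followed by two ordinary points.
points = [
    RecoveryPoint(datetime(2024, 1, 1, 8, 0), app_consistent=True),
    RecoveryPoint(datetime(2024, 1, 1, 11, 0), app_consistent=False),
    RecoveryPoint(datetime(2024, 1, 1, 11, 55), app_consistent=False),
]

latest = pick_recovery_point(points, require_consistent=False)
safe = pick_recovery_point(points, require_consistent=True)
print("least data lost:", latest.created)
print("consistent database:", safe.created)
```

In this sample, the latest point loses the least data, while the consistent point is almost four hours older but carries no risk of a corrupted database.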
Depending on your Hyper-V replication settings, the (non-consistent) replica is transferred every 30 seconds, 5 minutes or 15 minutes. Other values can be set if you dig deeper into Windows, but we do not recommend doing this. Consistent (VSS) rollback points are available only if the guest OS is Windows, the VSS service is installed and running in it, and the replication settings specify how often to create a "consistent" rollback point. By default, Microsoft suggests creating a consistent point once every 4 hours and an inconsistent point every hour. These parameters can also be modified deep inside Windows, but we strongly advise against it. At most, change the frequency of consistent copies to every 3, 2 or 1 hour. Keep in mind that while a consistent point is being created, all applications in the guest OS slow down greatly and almost completely stop responding to requests.
Requirements for setup:
- Your hypervisors must be members of a domain. If your domain controllers are themselves VMs, you must have at least two of them, on different hypervisors. Otherwise, a hypervisor will not find an available domain controller when it boots, and if the virtualization environment or the domain controller VM fails, you will not get access to the hypervisor either. For the hypervisors, we recommend deploying a separate AD domain that is not related to the main enterprise domain in any way, and creating accounts in it only for administrators of the virtualization environment. Put the hypervisor network segment in a separate VLAN at the very least, or better, give it dedicated switches of at least 10GbE. Restrict access to this segment so that even administrators of the virtualization environment cannot connect to the hypervisors from an arbitrary point on the local network, let alone from the Internet.
- The disk subsystem of your hypervisor must be fast enough, since replication almost doubles the number of disk operations on the main hypervisor.
- In the local firewall settings on the hypervisors, you must allow connections on TCP/80 and/or TCP/443 for replica traffic (Microsoft will remind you of this during configuration).
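Before configuring replication, it can be useful to confirm that the replica server actually accepts connections on the chosen port. The following is a minimal sketch in Python using only the standard `socket` module; the function name `port_open` is an assumption made for illustration, and a successful TCP connection only shows that the port is reachable, not that replication itself is configured correctly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, calling `port_open("replica01.example.local", 443)` from the main hypervisor (the host name here is hypothetical) returns `False` if the firewall rule for TCP/443 is still disabled on the replica server.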
Procedure for setting up replication:
Select the data transfer mode: with or without encryption. We strongly recommend against enabling encryption if your data travels over a private segment. If you do need encryption, you must understand what asymmetric encryption is, configure the CA in the domain correctly, and obtain correct certificates on both servers. You also need to take the additional CPU load on both servers into account.
Next, you can choose which virtual disks are replicated. For example, if the partner server is short on free space and you do not want to replicate the WSUS service cache, you can exclude the WSUS disk from replication. Note that you cannot add it back later "on the fly"; you will need to recreate the replication settings. There are two ways to do that:
- Delete the VM on the replica server.
- Delete the replication settings, recreate them, and use the previously replicated VM as the initial replica. In this case the initial replica is not transferred in full, but the disk that was not previously replicated arrives on the replica server with a long GUID as its file name instead of the name you gave it on the main server.
When configuring additional recovery points, keep in mind that the replica server must have free space equal to the amount of changes made over the selected number of hours (the first parameter). Also remember that the VSS service is available only on Windows operating systems and works correctly only with Hyper-V and a Windows guest OS of version 9 or higher. Another important factor is a noticeable drop in guest OS performance while a VSS point is being created. If the guest system is heavily loaded and creates such a point every hour, you will see performance degrade every hour.
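The free-space rule above is simple arithmetic and can be sketched as a small Python helper. The function name `extra_space_gb` and the change rate of 2 GB per hour are assumptions for illustration; the real change rate has to be measured on your own VM.

```python
def extra_space_gb(change_rate_gb_per_hour: float, recovery_hours: int) -> float:
    """Estimate the free space the replica server needs for recovery points.

    Follows the rule of thumb from the text: space equal to the amount
    of changes accumulated over the selected number of hours.
    """
    if change_rate_gb_per_hour < 0 or recovery_hours < 0:
        raise ValueError("change rate and hours must be non-negative")
    return change_rate_gb_per_hour * recovery_hours


# Assumed example: the VM changes about 2 GB per hour, points kept for 24 hours.
needed = extra_space_gb(2.0, 24)
print(f"Reserve about {needed} GB on the replica server")
```

With these assumed numbers, the replica server needs about 48 GB of free space in addition to the virtual disks themselves.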
Transferring the initial replica over the network is straightforward, but transferring it on media and the "Use the existing ..." option have some nuances to be aware of.
- To transfer the initial replica on media, you first need to write the data to that media. You can do this by simply exporting the VM to the media. But there is a "nuance", which will be discussed below.
- If you already have a copy of your VM on the replica server, you can start replication with the "Use an existing VM ..." option to speed up the process. Note that Hyper-V identifies the matching VM on the replica not by its name in the manager but by its GUID (an internal VM parameter), so it does not matter how the VM on the replica is named. If some virtual disks are missing on the replica server when such replication starts, they are transferred to the replica server, but the file name of each such virtual disk contains the GUID instead of the name used on the source server. This mechanism is useful when you have added another disk to the original VM and need to add this disk to replication.
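The matching by GUID rather than by name can be illustrated with a short Python model. This is only a sketch of the idea: the `VM` class, the `find_existing_replica` function and the sample GUIDs are hypothetical and are not part of any Hyper-V interface.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VM:
    """Hypothetical model of a VM: an internal GUID plus a display name."""
    vm_id: uuid.UUID
    name: str


def find_existing_replica(primary: VM, replica_vms: List[VM]) -> Optional[VM]:
    """Find the VM on the replica server with the same GUID as the primary."""
    for vm in replica_vms:
        if vm.vm_id == primary.vm_id:  # the display name is ignored on purpose
            return vm
    return None


# Made-up GUIDs for the example.
guid_a = uuid.UUID("11111111-1111-1111-1111-111111111111")
guid_b = uuid.UUID("22222222-2222-2222-2222-222222222222")

primary = VM(guid_a, "SQL01")
on_replica = [
    VM(guid_b, "SQL01"),           # same name, different GUID: not a match
    VM(guid_a, "SQL01-old-copy"),  # different name, same GUID: a match
]

match = find_existing_replica(primary, on_replica)
print(match.name)
```

The VM that was renamed on the replica server is still found, while a different VM that merely shares the display name is not.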
After replication is set up and the initial replica has been sent, the "snapshot" (recovery point) created for it on the main server is deleted automatically, and the automatic transfer of all changes to the replica server begins.
The advantages of this solution are obvious, but I will still note them:
- Available in the base installation of Windows Server.
- Extremely easy to set up even for a beginner.
- Ensures data integrity inside the VM for Windows guest OSes.
- In the event of a failure of the primary server or the VM itself, you will need 1-5 minutes to check the status of recovery points on the replica server and perform a failover.
- Recovery points cover the last 24 hours, so you can return not to the last state (when the data inside the VM was already corrupt) but to a recovery point of your choice.
- The ability to configure "Extended replication" to a third server, which gives you confidence that even under the worst circumstances you have another copy of the entire VM.
But this solution also has drawbacks:
- In the event of a failure of the main server, the VM on the replica server will not turn on by itself (human intervention required).
- Slightly reduced performance, because the file subsystem does extra work to create the data files that are transferred to the replica.
- Replication may spontaneously stop functioning.
Nobody will have questions about the first point: if you need an even higher level of fault tolerance, you need a "Failover Cluster". The second point is also clear: it is a feature of the technology, and the performance decrease is not significant. The third point, however, is a real problem. You are not going to hire a dedicated person to come in and check every 15 minutes.
To solve this, we have developed a sensor for the PRTG monitoring system and will publish it on our GitHub soon. A Zabbix version will also be ready, but a little later. The sensor is written in PowerShell, so you can take it and rewrite it yourself.
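The kind of decision such a check has to make can be sketched in Python. This is not the PowerShell sensor mentioned above, only an illustration under stated assumptions: the `replication_status` function, the state and health strings, and the threshold of three missed cycles are all hypothetical, and the input values are expected to come from whatever tool you use to query the hypervisor.

```python
from datetime import datetime, timedelta
from typing import Tuple


def replication_status(state: str, health: str,
                       last_replicated: datetime, now: datetime,
                       frequency: timedelta,
                       missed_cycles_allowed: int = 3) -> Tuple[str, str]:
    """Classify replication as OK, WARNING or DOWN with a short message."""
    # Assumed convention: a healthy VM reports "Replicating" and "Normal".
    if state != "Replicating":
        return "DOWN", f"replication state is {state}"
    if health != "Normal":
        return "WARNING", f"replication health is {health}"
    lag = now - last_replicated
    # Replication that silently stopped shows up as a growing lag.
    if lag > frequency * missed_cycles_allowed:
        return "DOWN", f"no replica transferred for {lag}"
    return "OK", f"last replica transferred {lag} ago"


now = datetime(2024, 1, 1, 12, 0)
five_minutes = timedelta(minutes=5)

fresh = replication_status("Replicating", "Normal",
                           datetime(2024, 1, 1, 11, 58), now, five_minutes)
stale = replication_status("Replicating", "Normal",
                           datetime(2024, 1, 1, 11, 0), now, five_minutes)
print(fresh)
print(stale)
```

Comparing the time of the last transferred replica with the configured frequency catches the case where replication stops on its own while the reported state still looks fine.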