Microsoft Hyper-V is one of the most common solutions for creating and managing virtual machines (VMs). Sometimes, however, VM replication in Hyper-V fails, which can seriously disrupt the whole virtualization environment. One cause is misbehavior of the DFSR service, which shows up only in particular cases and is hard to detect. This article takes a closer look at the problem and at possible solutions.
Terms:
- DFS — Distributed File System. A component of Microsoft Windows used to make it easier to access and manage files that are physically distributed over a network. When using it, files distributed across servers appear to be located in one place.
- DFSR — DFS Replication. This is a Microsoft Windows Server service used to synchronize files on different servers (replicas).
- VM — Virtual machine. In computing, a "virtual machine" is a virtualization or emulation of a computer system.
Symptoms:
The replication status becomes Critical for no apparent reason (disk space did not run out, there were no reboots, and so on). When you try to resume replication, a full initial replication starts again on its own. The event logs contain no meaningful description, only "A critical error has occurred" or a CRC error. The resynchronization restarts before it reaches completion, and this repeats until the Hyper-V host runs out of disk space. Recreating the replication from scratch changes nothing: the initial synchronization succeeds, and after a few cycles the status is Critical again.
Reason:
The Hyper-V replica exceeds its maximum size of 2 TB. The event logs give no indication that this is the problem. And even once you have established that the 2 TB replica size limit is being exceeded, the question remains: what is causing the overflow? In very similar cases discussed online, including on many foreign resources, the recommendations all boil down to "reinstall Windows".
In our case, DFS and DFSR run inside the VM. We use them to give users convenient access to network resources. This setup is particularly effective when a company has many branches, especially in different countries.
DFSR has a staging area (the "Intermediate storage" setting, called Staging in the English-language interface) whose quota defaults to 4 GB. If you upload a file larger than 4 GB, DFSR gets stuck in a loop, endlessly trying to transfer it to the replica. Inside the VM, this first pushes the staging area past its quota. You can check this by opening the hidden DFSRPrivate folder and measuring the size of the Staging folder, which should not exceed the quota you set. If it does, either raise the staging quota to at least the size of the largest file in the replicated folder, or delete the file that is blocking replication (for example, a 60 GB 1C database backup). The constant DFSR retries, in turn, cause the Hyper-V replica to overflow its limit.
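The manual check described above (measure the Staging folder, then find the file that is too large for the quota) can be sketched as a small script. This is only an illustration, not a Microsoft tool: the function names and the report layout are made up for this example, the 4 GB default is taken from the text above, and the script only reads file sizes without touching any DFSR settings.

```python
import os

# Default staging quota mentioned in the article (4 GB); adjust to your setting.
DEFAULT_STAGING_QUOTA_BYTES = 4 * 1024**3


def iter_file_sizes(root):
    """Yield (path, size_in_bytes) for every file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, os.path.getsize(path)
            except OSError:
                # The file disappeared or cannot be read; skip it.
                continue


def folder_size(root):
    """Return the total size in bytes of all files under root."""
    return sum(size for _path, size in iter_file_sizes(root))


def largest_file(root):
    """Return (path, size) of the largest file, or (None, 0) if none exist."""
    return max(iter_file_sizes(root), key=lambda item: item[1], default=(None, 0))


def check_staging(staging_dir, replicated_dir,
                  quota_bytes=DEFAULT_STAGING_QUOTA_BYTES):
    """Compare staging usage and the largest replicated file with the quota.

    Hypothetical helper for illustration: both directory arguments are
    supplied by the caller, nothing is read from the DFSR configuration.
    """
    used = folder_size(staging_dir)
    path, size = largest_file(replicated_dir)
    return {
        "staging_used": used,
        "staging_over_quota": used > quota_bytes,
        "largest_file": path,
        "largest_file_size": size,
        "largest_file_exceeds_quota": size > quota_bytes,
    }
```

For example, calling check_staging(r"D:\Shares\Docs\DFSRPrivate\Staging", r"D:\Shares\Docs") with paths from your own server (the ones shown here are placeholders) returns a dictionary whose largest_file entry points at the candidate to remove, or whose largest_file_size suggests a minimum for the new quota. Reading the hidden DFSRPrivate folder normally requires elevated permissions.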
Note that the Hyper-V replication failure may never occur if the replication frequency is set to 30 seconds or 5 minutes. In our case, it was set to 15 minutes.