How to determine the minimum size required for a DFSR replication staging folder

[Translator's note: this article covers Windows Server 2003 / 2003 R2 / 2008 / 2008 R2, but most of it also applies to later versions of the OS.]

Warren here again. This article is a quick reference on how to correctly calculate the minimum size of the staging folder that DFSR needs to operate normally. Setting a lower value may slow replication down or even stop it. Keep in mind that these are only minimum values. When deciding on the size of the staging folder, remember: the larger the staging folder, the better, up to the size of the replicated folder itself. For more on why the staging folder size matters, see the section on identifying staging folder problems below and the blog posts linked at the end of this article.

Update: Warren is very persuasive! There is now a hotfix that helps you calculate the staging folder size:
https://support.microsoft.com/kb/2607047

Rules of thumb

Windows Server 2003 R2 - the staging folder quota must be at least as large as the combined size of the 9 largest files in the replicated folder.

Windows Server 2008 and 2008 R2 - the staging folder quota must be at least as large as the combined size of the 32 largest files in the replicated folder. [Translator's note: this number also applies to Windows Server 2012 / 2012 R2.]

Initial replication uses much more staging space than regular day-to-day replication. If disk space allows, it is strongly recommended to set the quota above the required minimum before initial replication starts.

Where to get PowerShell?

PowerShell is included with Windows Server 2008 and later. On Windows Server 2003 you have to download and install it separately.

How to find these biggest files?

Use PowerShell to find the 32 (or 9) largest files and determine how many gigabytes they occupy (thanks to Ned Pyle for the PowerShell commands). Here are three PowerShell commands. Each is useful in its own way, but the third is the most useful.

Run:
```
Get-ChildItem c:\temp -recurse | Sort-Object length -descending | select-object -first 32 | ft name,length -wrap -auto 
```
This command returns the file names and their sizes in bytes. It is useful for finding out which 32 files are the largest in the replicated folder, so you can pay their owners a visit.

Run:

```
Get-ChildItem c:\temp -recurse | Sort-Object length -descending | select-object -first 32 | measure-object -property length -sum
```

This command returns the total number of bytes for the 32 largest files in a folder without specifying their names.

Run:
```
$big32 = Get-ChildItem c:\temp -recurse | Sort-Object length -descending | select-object -first 32 | measure-object -property length -sum
$big32.sum / 1gb
```
This command gets the total number of bytes for the 32 largest files in the folder and divides it by 1gb, PowerShell's built-in constant for 1073741824 bytes, to express the result in gigabytes. The command consists of two separate lines. You can paste both into the PowerShell console at once or run them one at a time.
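The three commands above can be wrapped into a reusable helper. The following is only a minimal sketch: the function names `Get-LargestFileBytes` and `Get-MinimumStagingQuotaGB` are made up for illustration and are not part of DFSR or PowerShell. It filters out directories explicitly and rounds the result up to a whole gigabyte.

```powershell
# Sketch only: both function names are hypothetical helpers, not DFSR cmdlets.

# Total size in bytes of the N largest files under a folder.
function Get-LargestFileBytes {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        # 32 for Windows Server 2008 and later, 9 for Windows Server 2003 R2
        [int]$FileCount = 32
    )

    $largest = Get-ChildItem -Path $Path -Recurse |
        Where-Object { -not $_.PSIsContainer } |       # files only, skip folders
        Sort-Object -Property Length -Descending |
        Select-Object -First $FileCount

    if (-not $largest) { return 0 }                    # no files found
    ($largest | Measure-Object -Property Length -Sum).Sum
}

# Minimum staging quota in whole gigabytes, rounded up.
function Get-MinimumStagingQuotaGB {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [int]$FileCount = 32
    )

    $bytes = Get-LargestFileBytes -Path $Path -FileCount $FileCount
    [int][math]::Ceiling($bytes / 1GB)
}
```

For example, `Get-MinimumStagingQuotaGB -Path c:\temp` would apply the 32-file rule, and `Get-MinimumStagingQuotaGB -Path c:\temp -FileCount 9` the Windows Server 2003 R2 rule.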

Manual analysis

To demonstrate the process and give a better understanding of what we are doing, let's go through each step manually.

The first command returns results similar to those shown below. For brevity, this example uses only 16 files. Always use 32 files for Windows Server 2008 and later and 9 files for Windows Server 2003 R2.

Sample output from PowerShell:

Name	Length
File5.zip	10286089216
archive.zip	6029853696
BACKUP.zip	5751522304
file9.zip	5472683008
MENTOS.zip	5241586688
File7.zip	4321264640
file2.zip	4176765952
frd2.zip	4176765952
BACKUP.zip	4078994432
File44.zip	4058424320
file11.zip	3858056192
Backup2.zip	3815138304
BACKUP3.zip	3815138304
Current.zip	3576931328
Backup8.zip	3307488256
File999.zip	3274982400

How to use this data to determine the minimum size of the staging folder:

Name = file name
Length = size in bytes
One gigabyte = 1073741824 bytes

First, add up the total number of bytes. Then divide the result by 1073741824. I recommend using Excel, or whatever spreadsheet application you prefer, for these calculations.

Calculations based on an example

In the example above, the total number of bytes is 75241684992. To get the minimum required staging quota, divide 75241684992 by 1073741824.

75241684992/1073741824 = 70.07 (GB)

Based on this data, I would set the staging folder quota to 71 GB, rounding up to the nearest whole number.
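If no spreadsheet is at hand, the same arithmetic can be checked in PowerShell. This is a small sketch that uses the 16 sample lengths from the listing above; the variable names are illustrative only.

```powershell
# Lengths in bytes, copied from the 16-file sample output above
$lengths = 10286089216, 6029853696, 5751522304, 5472683008,
           5241586688, 4321264640, 4176765952, 4176765952,
           4078994432, 4058424320, 3858056192, 3815138304,
           3815138304, 3576931328, 3307488256, 3274982400

# Step 1: add up the bytes
$totalBytes = ($lengths | Measure-Object -Sum).Sum

# Step 2: divide by one gigabyte (1GB is PowerShell's constant for 1073741824)
$exactGB = $totalBytes / 1GB

# Step 3: round up so the quota is never below the calculated minimum
$quotaGB = [math]::Ceiling($exactGB)

"Total bytes: $totalBytes"
"Minimum staging quota: $quotaGB GB"
```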

Practical use

Manual analysis is interesting, but it is not the best use of your time. To automate the process, use the third command from the examples above.

Using the third command, I can determine without any calculation (other than rounding up) that the d:\docs folder requires a staging quota of 6 GB.

Do I need to restart the server or restart the service to apply the changes?

You do not need to restart the server or the DFSR service for changes to the staging folder quota to take effect. You only need to wait for AD replication to complete and for DFSR to finish its next polling cycle of the DFSR objects in AD.
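On Windows Server 2012 R2 and later, which ship the DFSR PowerShell module, the quota can also be set from the command line, and the member can be asked to poll AD right away instead of waiting for its next polling cycle. The sketch below rests on several assumptions: the DFSR module is installed, the helper names `ConvertTo-StagingQuotaMB` and `Set-StagingQuota` are invented for this example, and the group, folder and server names in the sample call are placeholders. The quota is given in megabytes, so the byte total is converted and rounded up first. Polling AD sooner does not remove the need for AD replication itself to complete.

```powershell
# Sketch only: helper names are hypothetical; requires the DFSR module
# (Windows Server 2012 R2 or later) when Set-StagingQuota is actually called.

# Convert a byte total to whole megabytes, rounding up.
function ConvertTo-StagingQuotaMB {
    param([Parameter(Mandatory = $true)][double]$Bytes)
    [uint32][math]::Ceiling($Bytes / 1MB)
}

# Set the staging quota for one member and trigger an immediate AD poll.
function Set-StagingQuota {
    param(
        [Parameter(Mandatory = $true)][string]$GroupName,
        [Parameter(Mandatory = $true)][string]$FolderName,
        [Parameter(Mandatory = $true)][string]$ComputerName,
        [Parameter(Mandatory = $true)][double]$Bytes
    )

    $quotaMB = ConvertTo-StagingQuotaMB -Bytes $Bytes

    Set-DfsrMembership -GroupName $GroupName -FolderName $FolderName `
        -ComputerName $ComputerName -StagingPathQuotaInMB $quotaMB -Force

    # Ask the member to read its configuration from AD now
    Update-DfsrConfigurationFromAD -ComputerName $ComputerName
}

# Example call (all names are placeholders):
# Set-StagingQuota -GroupName 'RG01' -FolderName 'Docs' -ComputerName 'SRV01' -Bytes 75241684992
```

On the older operating systems this article was written for, the quota is set in the DFS Management console, and `dfsrdiag pollad` serves the same purpose as the polling cmdlet.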

How to identify problems with the staging folder

Problems with the staging folder are detected by watching for specific event IDs in the DFS Replication event log. The events are 4202, 4204, 4206, 4208 and 4212; their descriptions are given below. It is important to understand how events 4202 and 4204 differ from the others. Events 4202 and 4204 can be logged in large numbers even during normal operation. Think of 4202 and 4204 as a heartbeat, whereas 4206, 4208 and 4212 are more like chest pain. Below I explain how to interpret events 4202 and 4204.

Events related to the staging folder

[Translator's note: the event texts below are translated from the Russian localization of Windows Server 2012 R2, so the wording may differ from the English originals.]

Code: 4202
Level: Warning
The DFS Replication service detected that the staging space used by the replicated folder at local path <path> exceeded its upper limit. The service will attempt to delete the oldest staging files. This may affect performance.

Code: 4204
Level: Information
The DFS Replication service successfully deleted old staging files for the replicated folder at local path <path>. The staging space is now below the upper limit.

Code: 4206
Level: Warning
The DFS Replication service was unable to clean up old staging files for the replicated folder at local path <path>. The service may not be able to replicate some large files, and the replicated folder may become out of sync. The service will automatically retry the staging cleanup in <X> minutes. The service may start cleanup earlier if it detects that some staging files have been unlocked.

Code: 4208
Level: Warning
The DFS Replication service detected that staging space usage has exceeded the staging quota for the replicated folder at local path <path>. The service may not be able to replicate some large files, and the replicated folder may become out of sync. The service will automatically attempt to clean up the staging space.

Code: 4212
Level: Error
The DFS Replication service could not replicate the replicated folder at local path <path> because the staging path is invalid or inaccessible.
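The five events listed above can be pulled from the DFS Replication event log with `Get-WinEvent`. The following is a rough sketch: the `$StagingEvents` table and the `Get-StagingEvent` function are names made up for this example, and the 'Routine' / 'ActionRequired' labels simply restate the heartbeat versus chest pain distinction made in this article.

```powershell
# Sketch only: variable and function names are illustrative.
# Staging-related event IDs from this article and how to treat them.
$StagingEvents = @{
    4202 = 'Routine'          # cleanup started; normal unless very frequent
    4204 = 'Routine'          # cleanup succeeded
    4206 = 'ActionRequired'   # cleanup failed
    4208 = 'ActionRequired'   # quota exceeded
    4212 = 'ActionRequired'   # staging path invalid or inaccessible
}

# Read the staging events of the last N days from one server (Windows only).
function Get-StagingEvent {
    param(
        [string]$ComputerName = $env:COMPUTERNAME,
        [int]$Days = 7
    )

    $filter = @{
        LogName   = 'DFS Replication'
        Id        = [int[]]@($StagingEvents.Keys)
        StartTime = (Get-Date).AddDays(-$Days)
    }

    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter |
        Select-Object TimeCreated, Id,
            @{ Name = 'Severity'; Expression = { $StagingEvents[[int]$_.Id] } },
            Message
}

# Example call on a DFSR member:
# Get-StagingEvent -Days 3 | Where-Object { $_.Severity -eq 'ActionRequired' }
```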

What is the difference between events 4202 and 4208?

Events 4202 and 4208 have similar descriptions: in both cases DFSR detects that staging space usage exceeds a limit. The difference is that event 4208 is logged after staging cleanup has run and the staging quota is still exceeded. Event 4202 is a sign of normal day-to-day operation, while 4208 indicates an abnormal condition and requires intervention.

How many 4202 and 4204 events are too many?

There is no single answer to this question. Unlike events 4206, 4208 and 4212, which are always bad news and call for action, events 4202 and 4204 also occur during normal operation. Frequent 4202 and 4204 events, however, may indicate a problem. Things to consider:

Are the 4202 events logged for a replicated folder (RF) during its initial replication? If so, events 4202 and 4204 are normal. If you want to keep the number of these events to a minimum during initial synchronization, increase the size of the staging folder.
Simply counting the total number of 4202 events is not enough. You need to know how many of them relate to a particular RF. Twenty 4202 events for one folder in 24 hours is a lot. But if you have 20 replicated folders and one event for each of them, everything is fine.
To identify trends, analyze the data collected over several days.

I usually advise customers to allow no more than one 4202 event per replicated folder per day during normal operation. “Normal” means that no initial replication is in progress. My reasoning is as follows:

Time spent cleaning up the staging folder is time taken away from replicating files. Replication is paused while the staging folder is being cleaned up.
DFSR works more efficiently when the staging folder has enough space, because it uses the staged files for RDC and cross-file RDC, as well as for replicating the same files to other replication members.
The more 4202 and 4204 events are logged, the more likely you are to hit a situation where DFSR cannot clean up the staging folder or is forced to delete files from it prematurely.
In my experience, events 4206, 4208 and 4212 have always been preceded and accompanied by a large number of 4202 and 4204 events.

Following the rule of “no more than one 4202 event per day per RF” significantly reduces the likelihood of staging folder problems and helps the DFSR server spend its resources on what they are meant for: file replication.
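The one-per-day rule can be checked mechanically once the 4202 events have been collected. The sketch below is only an illustration: the function name `Get-Noisy4202Folder` is invented, and it assumes each input record already carries a `Folder` property (the replicated folder name) and a `TimeCreated` property. How the folder name is extracted from the event text is left to the caller, and events logged during initial replication should be excluded beforehand.

```powershell
# Sketch only: function name and the shape of the input records are assumptions.
# Each record needs a Folder (replicated folder name) and a TimeCreated ([datetime]).
function Get-Noisy4202Folder {
    param(
        [Parameter(Mandatory = $true)][object[]]$Records,
        # Allowed number of 4202 events per folder per day
        [int]$MaxPerDay = 1
    )

    $Records |
        Group-Object -Property Folder, { $_.TimeCreated.Date } |   # per folder, per calendar day
        Where-Object { $_.Count -gt $MaxPerDay } |
        ForEach-Object {
            New-Object PSObject -Property @{
                Folder = $_.Group[0].Folder
                Day    = $_.Group[0].TimeCreated.Date
                Count  = $_.Count
            }
        }
}
```

Running it over several days of data makes trends visible: a folder that shows up day after day is a candidate for a larger staging quota.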

Additional Information

https://blogs.technet.com/b/askds/archive/2010/03/31/tuning-replication-performance-in-dfsr-especially-on-win2008-r2.aspx
https://blogs.technet.com/b/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx

Warren “Way over my Oud quota” Williams

Source: https://habr.com/ru/post/426237/
