Introduction: this note shows how to get a 5-10x (and larger) speed-up when processing a large number of strings by using the StringBuilder object instead of String.
Call the System.Text.StringBuilder constructor:
$SomeString = New-Object System.Text.StringBuilder
Convert back to a String:
$Result = $Str.ToString()
While writing a script that processes many text files, I ran into a peculiarity of string handling in PowerShell: parsing speed drops significantly if you process strings with the standard String object.
Initial data: a file filled with lines of the form:
key;888;0xA9498353,888_FilialName
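To make the parsing below concrete: each pass extracts the third `;`-separated field and then takes the first `,`-separated part of it. A minimal illustration on the sample line (variable names here are illustrative, not from the original script):

```powershell
# Split the sample line the same way the script does
$line  = 'key;888;0xA9498353,888_FilialName'
$field = $line.Split(';')[2]    # '0xA9498353,888_FilialName'
$token = $field.Split(',')[0]   # '0xA9498353' - this is the value we keep
$token
```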
In the first version of the script, intermediate text files were used to control the processing. Handling a 1000-line file cost 24 seconds, and the delay grows quickly as the file size increases. Example:
function test {
    $Path = 'C:\Powershell\test\test.txt'
    $PSGF = Get-Content $Path

    $PSGFFileName = $Path + '-compare.txt'
    Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
    New-Item $PSGFFileName -Type File -ErrorAction SilentlyContinue | Out-Null

    # Parse each line and write the extracted token to the intermediate file
    foreach ($Key in $PSGF) {
        $Val = $Key.ToString().Split(';')
        $test = $Val[2]
        $Val = $test.ToString().Split(',')
        $test = $Val[0]
        Add-Content $PSGFFileName -Value $test
    }

    $Result = Get-Content $PSGFFileName
    Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
    return $Result
}
The result of the run:
99 lines - 1.8 seconds
1000 lines - 24.4 seconds
2000 lines - 66.17 seconds
Optimization #1
Clearly this is no good. Let's replace the file output with operations in memory:
function test {
    $Path = 'C:\Powershell\test\test.txt'
    $PSGF = Get-Content $Path
    $Result = ''

    # Accumulate the result by string concatenation
    foreach ($Key in $PSGF) {
        $Val = $Key.ToString().Split(';')
        $test = $Val[2]
        $Val = $test.ToString().Split(',')
        $test = $Val[0]
        $Result = $Result + "$test`r`n"
    }
    return $Result
}
Measure-Command { test }
The result of the run:
99 lines - 0.0037 seconds
1000 lines - 0.055 seconds
2000 lines - 0.190 seconds
It seems all right, the acceleration is there, but let's see what happens when the object holds more lines:
10,000 lines - 1.92 seconds
20,000 lines - 8.07 seconds
40,000 lines - 26.01 seconds
This processing method is suitable for lists of no more than 5-8 thousand lines; beyond that, the cost of the String object itself begins to dominate. Because .NET strings are immutable, every append forces the memory manager to allocate a new, larger string and copy the entire accumulated result into it, so the total work grows quadratically with the number of lines.
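The effect is easy to see in isolation. A minimal sketch comparing the two approaches on synthetic data (the line count and variable names are illustrative, not from the original script):

```powershell
$N = 20000

# Naive concatenation: the whole accumulated string is copied on every append
$t1 = Measure-Command {
    $s = ''
    for ($i = 0; $i -lt $N; $i++) { $s = $s + "line$i`r`n" }
}

# StringBuilder: appends go into an internal growable buffer
$t2 = Measure-Command {
    $sb = New-Object System.Text.StringBuilder
    for ($i = 0; $i -lt $N; $i++) { [void]$sb.Append("line$i`r`n") }
}

"String:        $($t1.TotalSeconds) s"
"StringBuilder: $($t2.TotalSeconds) s"
```

On any machine the gap between the two timings widens rapidly as $N grows, which matches the measurements above.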
Optimization #2
Let's try to do better and take the "programmer's" approach:
function test {
    $Path = 'C:\Powershell\test\test.txt'
    $PSGF = Get-Content $Path

    $Str = New-Object System.Text.StringBuilder
    foreach ($Key in $PSGF) {
        $Val = $Key.ToString().Split(';')
        $temp = $Val[2].ToString().Split(',')
        $Val = $temp[0]
        # Append returns the StringBuilder itself; [void] suppresses the pipeline output
        [void]$Str.Append("$Val`r`n")
    }
    $Result = $Str.ToString()
    return $Result
}
Measure-Command { test }
The result of the run: 40,000 lines - 1.8 seconds.
Further tweaks, such as replacing foreach with for or eliminating the intermediate $test variable, did not give a significant speed increase.
Briefly:
For efficient work with a large number of strings, use the System.Text.StringBuilder object. Constructor call:
$SomeString = New-Object System.Text.StringBuilder
Conversion to string:
$Result = $Str.ToString()
The explanation of StringBuilder's speed is simple: it appends into an internal growable buffer, so the memory manager does not have to allocate and copy a new string on every append; reallocation happens only occasionally, when the buffer runs out.
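You can observe this buffer behaviour directly: Length tracks the content exactly, while Capacity jumps in larger steps, showing that memory is reallocated only occasionally. A small sketch (exact Capacity values depend on the .NET version, so they are not shown):

```powershell
$sb = New-Object System.Text.StringBuilder
foreach ($i in 1..5) {
    [void]$sb.Append('x' * 20)
    "Length = $($sb.Length), Capacity = $($sb.Capacity)"
}
```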