
PowerShell String Optimization

Introduction: this note describes how to speed up the processing of a large number of strings by a factor of 5-10 (and more) by using the StringBuilder object instead of String.

Call the System.Text.StringBuilder constructor:

$SomeString = New-Object System.Text.StringBuilder 

Converting back to String:

 $Result = $Str.ToString() 
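
One PowerShell-specific detail worth knowing: Append returns the StringBuilder itself, so each call echoes the object to the output stream unless you suppress it, for example:

 [void]$SomeString.Append('some text')   # discard the returned StringBuilder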

While writing a script that processes many text files, I ran into a peculiarity of string handling in PowerShell: parsing slows down dramatically if you process strings with the standard String object.

The input data is a file filled with lines of the following form:

key;888;0xA9498353,888_FilialName
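
To make the parsing concrete, here is what the two Split calls in the scripts below extract from such a line (a minimal illustration, not part of the original script):

 $line = 'key;888;0xA9498353,888_FilialName'
 $line.Split(';')[2].Split(',')[0]   # -> 0xA9498353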


In the raw version of the script, intermediate text files were used to keep track of the processing. Processing a file of 1000 lines wasted 24 seconds, and the delay grows rapidly as the file size increases. Example:

 function test {
     $Path = 'C:\Powershell\test\test.txt'
     $PSGF = Get-Content $Path
     # Intermediate file used to keep track of the processing
     $PSGFFileName = $Path + '-compare.txt'
     Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
     New-Item $PSGFFileName -Type File -ErrorAction SilentlyContinue | Out-Null
     # ToDo: get rid of the intermediate file and the slow Add-Content calls
     foreach ($Key in $PSGF) {
         $Val = $Key.ToString().Split(';')
         $test = $Val[2]
         $Val = $test.ToString().Split(',')
         $test = $Val[0]
         Add-Content $PSGFFileName -Value $test
     }
     $Result = Get-Content $PSGFFileName
     Remove-Item -Path $PSGFFileName -ErrorAction SilentlyContinue | Out-Null
     return $Result
 }

The result of the run:

99 lines - 1.8 seconds
1000 lines - 24.4 seconds
2000 lines - 66.17 seconds
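
The timings here and below can be taken with Measure-Command, as the later examples do:

 Measure-Command { test }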

Optimization #1


Clearly this is no good. Let's replace the file output with operations in memory:

 function test {
     $Path = 'C:\Powershell\test\test.txt'
     $PSGF = Get-Content $Path
     $Result = ''
     foreach ($Key in $PSGF) {
         $Val = $Key.ToString().Split(';')
         $test = $Val[2]
         $Val = $test.ToString().Split(',')
         $test = $Val[0]
         # Plain string concatenation: allocates a new string on every pass
         $Result = $Result + "$test`r`n"
     }
     return $Result
 }
 Measure-Command { test }

The result of the run:

99 lines - 0.0037 seconds
1000 lines - 0.055 seconds
2000 lines - 0.190 seconds

It seems fine, we got our speedup, but let's see what happens when there are more lines in the file:

10,000 lines - 1.92 seconds
20,000 lines - 8.07 seconds
40,000 lines - 26.01 seconds

This approach is suitable for lists of no more than 5-8 thousand lines; beyond that, the cost of building the string takes over. Because String is immutable, the memory manager allocates new memory and copies the whole object every time a line is appended, so the total work grows quadratically with the number of lines.
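
A quick way to see the difference for yourself is a side-by-side benchmark on synthetic data (a minimal sketch; the line count and contents are arbitrary, not the original test file):

 # 10000 synthetic lines of the same shape as the input file
 $lines = 1..10000 | ForEach-Object { 'key;888;0xA9498353,888_FilialName' }

 (Measure-Command {
     $s = ''
     foreach ($l in $lines) { $s += "$l`r`n" }              # copies the whole string every pass
 }).TotalSeconds

 (Measure-Command {
     $sb = New-Object System.Text.StringBuilder
     foreach ($l in $lines) { [void]$sb.Append("$l`r`n") }  # appends into a buffer
 }).TotalSeconds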

Optimization #2


Let's try to do better and take the "programmer's" approach:

 function test {
     $Path = 'C:\Powershell\test\test.txt'
     $PSGF = Get-Content $Path
     # Accumulate the output in a StringBuilder instead of a String
     $Str = New-Object System.Text.StringBuilder
     foreach ($Key in $PSGF) {
         $Val = $Key.ToString().Split(';')
         $temp = $Val[2].ToString().Split(',')
         $Val = $temp[0]
         # Assigning the result suppresses Append's return value in the output
         $temp = $Str.Append( "$Val`r`n" )
     }
     $Result = $Str.ToString()
     return $Result
 }
 Measure-Command { test }

The result of the run: 40,000 lines - 1.8 seconds.

Further tweaks, such as replacing foreach with for or dropping the intermediate $temp variable, did not give any noticeable additional speedup.
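
For reference, the for variant of the same loop looks like this (a sketch of the change described above; it ran at about the same speed):

 for ($i = 0; $i -lt $PSGF.Count; $i++) {
     $Val = $PSGF[$i].Split(';')[2].Split(',')[0]
     [void]$Str.Append("$Val`r`n")
 }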

Briefly:

To work efficiently with a large number of strings, use the System.Text.StringBuilder object. Constructor call:

 $SomeString = New-Object System.Text.StringBuilder 

Conversion to string:

 $Result = $Str.ToString() 

An explanation of how StringBuilder works: the whole secret is that it uses the memory manager far more efficiently, appending into a preallocated internal buffer instead of allocating and copying the entire string on every operation.
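
If the final size is roughly known in advance, the buffer can also be preallocated with the capacity constructor overload (not used in the measurements above; the value here is illustrative):

 # StringBuilder(Int32) sets the initial capacity in characters
 $Str = New-Object System.Text.StringBuilder( 1048576 )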

Source: https://habr.com/ru/post/273619/

