Foreword
First of all, I should say that I am not a programmer. I'm an admin, for now. Of course, I would like to be called an architect, but there are no suitable vacancies within reach that have reasonable requirements and, most importantly, salaries to match those requirements. A pity.
In this note I want to describe some handy features of the new PowerShell version. In particular, it can parse web pages quickly and reliably, and do it "in parallel".
Task
The task in front of me was fairly simple. There is a certain site with an initial form where you choose a start date and an end date. After submitting it, you land on a results page.

Within one date range there can be many such pages, but never more than 999. So if you need data for, say, 5 years, it will not fit into 999 pages. This page is a directory. I was only interested in the data it links to through the Permit NO column.
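One way to stay under the 999-page cap is to query a long period as a series of shorter windows. The article does not say how the author handled this. The sketch below is only an illustration. `Split-DateRange` is a hypothetical helper, and the one-month default is an assumption: whether a month stays under the cap depends on how many permits the site holds.

```powershell
# Hypothetical helper (not from the article): split a long date range into
# consecutive, non-overlapping windows of $Months months each.
function Split-DateRange {
    param(
        [Parameter(Mandatory)] [datetime] $StartDate,
        [Parameter(Mandatory)] [datetime] $EndDate,
        [int] $Months = 1
    )
    $cursor = $StartDate.Date
    while ($cursor -le $EndDate) {
        # A window ends the day before the next one starts, capped at $EndDate.
        $next = $cursor.AddMonths($Months)
        $windowEnd = $next.AddDays(-1)
        if ($windowEnd -gt $EndDate) { $windowEnd = $EndDate }
        [pscustomobject]@{ StartDate = $cursor; EndDate = $windowEnd }
        $cursor = $next
    }
}
```

Each window could then be fed to the directory function from this article, for example `Split-DateRange -StartDate '2012-01-01' -EndDate '2016-12-31' | ForEach-Object { Get-AppList -startDate $_.StartDate -endDate $_.EndDate }`.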

Since I am not a programmer, I didn't know enough to use C# or anything like it. My favorite tool, PowerShell, came to the rescue.
Solution
I decided to work in two stages. First, download and parse the directory with its links. Then go through the directory and fetch the documents it refers to. For a programmer this is a trivial task; it took me about 16 hours. In fairness, I was using commands that were new to me. My goal was not only to solve the problem but also to learn those commands and the new PowerShell 3 features, which had just been released at the time.
I was lucky: the site accepted its parameters directly in the URL, like this:
http://[skip]/[skip]?allcount=$allcount&allstartdate_month=$allstartdate_month[skip]
That mattered, because I don't know how to work with HTML forms. So I simply requested the pages I needed and changed the request parameters each time. For this I used the Invoke-WebRequest cmdlet. It sends a request in the simplest way and returns the result, with no need to use .NET classes directly or Internet Explorer's COM objects. The result includes an already parsed HTML document that can be processed further.
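Because everything travels in the query string, building a request is just string formatting. Here is a minimal sketch. The parameter names are copied from the URL above. The real address is elided in the article, so `-BaseUri` must be supplied by the caller. `New-ListQueryUri` is a hypothetical name.

```powershell
# Sketch: build the directory-page query string from two dates.
# $BaseUri is a caller-supplied placeholder; the real address is not given.
function New-ListQueryUri {
    param(
        [Parameter(Mandatory)] [string] $BaseUri,
        [Parameter(Mandatory)] [datetime] $StartDate,
        [Parameter(Mandatory)] [datetime] $EndDate,
        [string] $PermitType = 'SG',
        [string] $Count = '0000'
    )
    # Ordered, so the query string comes out in a predictable order.
    # Months and days are zero-padded, as in the article's code.
    $query = [ordered]@{
        allcount           = $Count
        allstartdate_month = '{0:d2}' -f $StartDate.Month
        allstartdate_day   = '{0:d2}' -f $StartDate.Day
        allstartdate_year  = $StartDate.Year
        allenddate_month   = '{0:d2}' -f $EndDate.Month
        allenddate_day     = '{0:d2}' -f $EndDate.Day
        allenddate_year    = $EndDate.Year
        allpermittype      = $PermitType
    }
    # Escape each value so spaces or symbols cannot break the URL.
    $pairs = foreach ($key in $query.Keys) {
        '{0}={1}' -f $key, [uri]::EscapeDataString([string]$query[$key])
    }
    '{0}?{1}' -f $BaseUri, ($pairs -join '&')
}
```

The article's full URL also carries fixed parameters such as `go13`, `navflag` and `requestid`. Those could be added to the same ordered table before the result is passed to `Invoke-WebRequest -Uri`.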
Another convenient feature of this page was what it returned besides the HTML code of the table. It also contained the table's contents as text in an already structured form, with every value tagged as [index:FieldName]{value}.
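The structured text makes each record easy to turn into an object with a single regular expression. This is a minimal sketch. The exact layout is an assumption reconstructed from the `[n:Field]{value}` patterns used in this article, and the sample string is made up. `ConvertFrom-PermitRecord` is a hypothetical name. It uses `[^}]*` instead of `.+` so that several fields on one line are not merged into one match.

```powershell
# Sketch: convert one "[index:Field]{value}" record into an object.
# The record layout is assumed from the article's regular expressions.
function ConvertFrom-PermitRecord {
    param([Parameter(Mandatory)] [string] $Text)
    $props = [ordered]@{}
    # [^}]* stops at the first closing brace, keeping adjacent fields apart.
    $pattern = '\[\d+:(?<name>\w+)\]\{(?<value>[^}]*)\}'
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        $props[$m.Groups['name'].Value] = $m.Groups['value'].Value.Trim()
    }
    [pscustomobject]$props
}
```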

Parsing: first stage
In this part I fetched only the directory. The main problem at this stage was to walk through every page the system returned and detect the last one. To do this, I checked whether the page had a Next button.
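The stop condition (keep requesting while the page still has a Next button) can be separated from the HTTP details. In this minimal sketch, the hypothetical `$FetchPage` scriptblock stands in for `Invoke-WebRequest` plus parsing, so the loop can be tried offline. It is assumed to return an object with `Rows` and `NextRequest`, where `NextRequest` is `$null` on the last page.

```powershell
# Sketch of the paging pattern: request pages until there is no "Next" form.
# $FetchPage is a hypothetical stand-in for the real request and parsing.
function Get-AllPages {
    param(
        [Parameter(Mandatory)] [scriptblock] $FetchPage,
        [Parameter(Mandatory)] $FirstRequest
    )
    $request = $FirstRequest
    do {
        $page = & $FetchPage $request
        $page.Rows                    # emit this page's records
        $request = $page.NextRequest  # $null when the Next button is absent
    } while ($null -ne $request)
}
```

Against the live site, `$FetchPage` would call `Invoke-WebRequest`, parse the rows, and build the next request from the `frmnext` form when one exists.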
I also wanted this part to produce a flat CSV file containing the directory itself, which would then be passed to the next stage. That is how the code below came about. It requests every directory page for the date range and parses each page's contents with regular expressions, relying on the structured text described above. It returns objects containing all the listed fields.
function Get-AppList {
    [CmdletBinding()]
    param(
        [datetime] $startDate = '2012-01-01',
        [datetime] $endDate = '2012-01-01',
        [string] $allpermittype = "SG",
        [string] $allcount = "0000",
        [string] $requestid = "1"
    )
    begin {
        # The site expects zero-padded month and day values
        [string] $allstartdate_month = "{0:d2}" -f $startDate.Month
        [string] $allstartdate_day = "{0:d2}" -f $startDate.Day
        [string] $allstartdate_year = $startDate.Year
        [string] $allenddate_month = "{0:d2}" -f $endDate.Month
        [string] $allenddate_day = "{0:d2}" -f $endDate.Day
        [string] $allenddate_year = $endDate.Year
        # One regular expression per column of the structured table text
        $fields = @{Regex="\[0:PtAppFirstName\]\{(?<PtAppFirstName>.+)\}"; Column="PtAppFirstName"},
            @{Regex="\[1:PtAppLastName\]\{(?<PtAppLastName>.+)\}"; Column="PtAppLastName"},
            @{Regex="\[2:PtAppMI\]\{(?<PtAppMI>.+)\}"; Column="PtAppMI"},
            @{Regex="\[3:PtJobNum\]\{(?<PtJobNum>.+)\}"; Column="PtJobNum"},
            @{Regex="\[4:PtJobDocNum\]\{(?<PtJobDocNum>.+)\}"; Column="PtJobDocNum"},
            @{Regex="\[5:PtJobType\]\{(?<PtJobType>.+)\}"; Column="PtJobType"},
            @{Regex="\[6:PtPermitType\]\{(?<PtPermitType>.+)\}"; Column="PtPermitType"},
            @{Regex="\[7:PtPermitSubtype\]\{(?<PtPermitSubtype>.+)\}"; Column="PtPermitSubtype"},
            @{Regex="\[8:PtPermitSeqNum\]\{(?<PtPermitSeqNum>.+)\}"; Column="PtPermitSeqNum"},
            @{Regex="\[9:PtIssuanceDate\]\{(?<PtIssuanceDate>.+)\}"; Column="PtIssuanceDate"},
            @{Regex="\[10:PtFilingDate\]\{(?<PtFilingDate>.+)\}"; Column="PtFilingDate"},
            @{Regex="\[11:PtExpirationDate\]\{(?<PtExpirationDate>.+)\}"; Column="PtExpirationDate"},
            @{Regex="\[12:PtBin\]\{(?<PtBin>.+)\}"; Column="PtBin"},
            @{Regex="\[13:JHouseNumber\]\{(?<JHouseNumber>.+)\}"; Column="JHouseNumber"},
            @{Regex="\[14:JStreetName\]\{(?<JStreetName>.+)\}"; Column="JStreetName"},
            @{Regex="\[15:PermitIsn\]\{(?<PermitIsn>.+)\}"; Column="PermitIsn"}
        $uri = "http://[skip]/bisweb/[skip]?allcount=$allcount&allstartdate_month=$allstartdate_month&allstartdate_day=$allstartdate_day&allstartdate_year=$allstartdate_year&allenddate_month=$allenddate_month&allenddate_day=$allenddate_day&allenddate_year=$allenddate_year&allpermittype=$allpermittype&go13=+GO+&requestid=0&navflag=T&requestid=$requestid"
    }
    process {
        do {
            # Request the current directory page
            $a = Invoke-WebRequest -Uri $uri -SessionVariable sv
            $s = $a.ParsedHtml.childNodes | ForEach-Object data
            # Split the structured text into individual records
            $s2 = $s[3] -split "\[\d+\]"
            $obj = @{}
            $s2 | ForEach-Object {
                $item = $_
                if ($item) {
                    # Fill every column, or set it to $null when the field is missing
                    $fields | ForEach-Object {
                        if ($item -match $_.Regex) {
                            $obj[$_.Column] = $matches[$_.Column]
                        } else {
                            $obj[$_.Column] = $null
                        }
                    }
                    # Emit only records that carry a permit type
                    if (($null -ne $obj.PtPermitType) -and ($obj.PtPermitType -ne " ")) {
                        New-Object psobject -Property $obj
                    }
                }
            }
            # If the page has a "Next" form, take the next page's parameters from it
            $form = $a.Forms | Where-Object Id -EQ "frmnext"
            if ($form) {
                $allstartdate_month = $form.Fields["allstartdate_month"]
                $allstartdate_day = $form.Fields["allstartdate_day"]
                $allstartdate_year = $form.Fields["allstartdate_year"]
                $allenddate_month = $form.Fields["allenddate_month"]
                $allenddate_day = $form.Fields["allenddate_day"]
                $allenddate_year = $form.Fields["allenddate_year"]
                $allpermittype = $form.Fields["allpermittype"]
                $allcount = $form.Fields["allcount"]
                $requestid = $form.Fields["requestid"]
                $uri = "http://[skip]/[skip]?allcount=$allcount&allstartdate_month=$allstartdate_month&allstartdate_day=$allstartdate_day&allstartdate_year=$allstartdate_year&allenddate_month=$allenddate_month&allenddate_day=$allenddate_day&allenddate_year=$allenddate_year&allpermittype=$allpermittype&go13=+GO+&requestid=0&navflag=T&requestid=$requestid"
            }
        } while ($form)
    }
}
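The hand-off of the directory to the second stage through a flat CSV file is not part of the code above. This is a minimal sketch using the built-in `Export-Csv` and `Import-Csv` cmdlets. The two sample rows and the temp-file name are made up for illustration.

```powershell
# Sketch: pass the stage-one directory to stage two through a flat CSV file.
# -NoTypeInformation drops the "#TYPE" header Windows PowerShell otherwise writes.
$path = Join-Path ([IO.Path]::GetTempPath()) 'permit-directory.csv'
$directory = @(
    [pscustomobject]@{ PermitIsn = '1001'; PtPermitType = 'SG'; JStreetName = 'MAIN ST' }
    [pscustomobject]@{ PermitIsn = '1002'; PtPermitType = 'SG'; JStreetName = 'ELM ST' }
)
# Stage one: in the article, these objects would come from Get-AppList.
$directory | Export-Csv -Path $path -NoTypeInformation
# Stage two: every column comes back as a string property.
$list = Import-Csv -Path $path
$list | ForEach-Object { $_.PermitIsn }
Remove-Item -Path $path
```

With the article's own functions, this would be `Get-AppList ... | Export-Csv` followed by `Get-AppDetails2 -list (Import-Csv ...)`.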
Parsing: second stage
In the second part a different problem came up. The number of pages to request became much larger, roughly 30 times more. Going through the results of the first stage and requesting the pages one by one took a lot of time. So I decided to use another PowerShell v3 feature: PowerShell workflow, or more precisely, its foreach -parallel construct. Workflows are really designed for something quite different, but here it came in handy. I'll say up front that this is not a tool for parallelizing tasks to boost performance, so don't expect that from it. The idea was simply to use it to run the request for each line of the directory "in parallel". This construct runs its iterations in separate processes, and their number is limited. I did not check whether that maximum can be changed. The mechanism just makes it easy to get "parallelism" into the code. The quotes are not there because the requests aren't parallel. They are parallel, but they run in heavyweight processes under .NET Workflow rather than in lightweight threads, and the results have to be passed across process boundaries. So it isn't very fast. But, as our favorite chef says, "cheap, convenient and practical", and most importantly for an admin, it takes just 2 lines of code. Losing a few seconds on an individual request doesn't matter compared with the job as a whole. All in all, a good thing.
The code came out like this.
workflow Get-AppDetails2 ($list) {
    # Iterations may run in parallel; each one's output is returned to the caller
    foreach -parallel ($i in $list) {
        $PermitIsn = $i.PermitIsn
        $queryUri = "http://[skip]/bisweb/[skip]?allisn=$PermitIsn&allbin=&requestid=1"
        Invoke-WebRequest -Uri $queryUri
    }
}
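For comparison, the same idea (process-based parallelism with an explicit cap on how many processes run at once) can be sketched without a workflow, using background jobs. This is a swapped-in technique, not the approach used in this article. Workflows exist only in Windows PowerShell 3.0 through 5.1, while `Start-Job` also works in later versions. `Invoke-Throttled` and its parameters are hypothetical, and the work item in the example only builds a string so it runs offline.

```powershell
# Sketch: fan work out to background jobs (one process each) with a cap.
# Not the article's workflow; Invoke-Throttled is a hypothetical helper.
function Invoke-Throttled {
    param(
        [Parameter(Mandatory)] [object[]] $InputList,
        [Parameter(Mandatory)] [scriptblock] $Work,
        [int] $ThrottleLimit = 4
    )
    $jobs = @()
    foreach ($item in $InputList) {
        # While the cap is reached, wait until any active job finishes.
        while ($true) {
            $active = @($jobs | Where-Object { $_.State -eq 'Running' -or $_.State -eq 'NotStarted' })
            if ($active.Count -lt $ThrottleLimit) { break }
            $null = Wait-Job -Job $active -Any
        }
        # Each job runs in its own process; $item and results are serialized.
        $jobs += Start-Job -ScriptBlock $Work -ArgumentList $item
    }
    # Wait for the remaining jobs, return their output, then clean up.
    $jobs | Wait-Job | Receive-Job
    $jobs | Remove-Job
}

# Example: the work item only formats a string, so no network is needed.
$items = 1..5 | ForEach-Object { [pscustomobject]@{ PermitIsn = "ISN$_" } }
$out = @(Invoke-Throttled -InputList $items -ThrottleLimit 2 -Work { param($i) "done:$($i.PermitIsn)" })
```

As with the workflow, results come back as deserialized copies and not necessarily in input order. It is therefore safer to extract the needed fields inside each job than to return a whole web response.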
Conclusions
All in all, this shows that PowerShell is a powerful and useful tool, well suited to a wide range of important, practical tasks.