
LinkInspector: a homegrown console tool for checking broken links

I needed a weekly check of my sites for broken and non-existent links. After spending half an hour searching the Internet, I found a few decent console applications (I wanted a console tool because the server ran Windows and I planned to drive the check with Task Scheduler). All of them were paid. And since I could spare some free time, and at first glance the task didn't look difficult, I decided to write my own.

I decided to take this implementation as a starting point: WebSpider, but, as usually happens, in the end I rewrote almost everything to my own taste.

I made myself a small list of what I needed, and I've been gradually crossing tasks off it:

| Task | Description | Status |
| --- | --- | --- |
| Recursively gather all links | Crawl every page within one site and collect all links | Done |
| Check N links | Mainly for debugging: stop after checking N links | Done |
| Save result to file | Save to txt | Done |
| Save result using an HTML template | For readability; plus bolt on the jQuery DataTables plugin for filtering and sorting | Done |
| Show only errors | Show only broken links in the report file | Done |
| Option to archive the report file | Add 7-Zip support | Not done |
| Send result by mail | Add support for a console mailer | Not done |
| Show redirects in the report | Correctly handle all redirects and display information about them in the report | Not done |
| Add logging | Add the log4net library | Not done |
| General information about the process in the HTML template | Show when processing started, when it ended, and other general information in the HTML template | Not done |
| Check and configure the correct handling of redirects | | Not done |
| app.config default configuration | The utility ended up with too many parameters, so defaults should come from app.config (see the sketch after this table) | Not done |
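The app.config item deserves a word: the idea is to keep defaults in app.config and let command-line switches override them. A minimal sketch of how that could look; the setting keys and the Defaults helper are hypothetical, not from the actual project:

```csharp
using System.Configuration; // requires a reference to System.Configuration.dll

// Hypothetical sketch: read default option values from app.config, e.g.
//   <appSettings>
//     <add key="reportFormat" value="html" />
//     <add key="maxLinks" value="10" />
//   </appSettings>
// Command-line switches would then override whatever is read here.
internal static class Defaults
{
    public static string ReportFormat
    {
        get { return ConfigurationManager.AppSettings["reportFormat"] ?? "txt"; }
    }

    public static int MaxLinks
    {
        get
        {
            int n;
            return int.TryParse(ConfigurationManager.AppSettings["maxLinks"], out n) ? n : 0; // 0 = no limit
        }
    }
}
```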


The program is embarrassingly simple:

1. The input is a URI; its content is downloaded and searched for links with a regular expression:
```csharp
public const string UrlExtractor = @"(?: href\s*=)(?:[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(?<url>.*?)(?:[\s>""'])";
```
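As an illustration of how that constant would be applied to a downloaded page, here is a small sketch; the LinkParser class and its method are my naming, not the project's actual code:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch: run the UrlExtractor pattern over a page body and yield the
// named "url" capture group of every href that matched.
public static class LinkParser
{
    public const string UrlExtractor = @"(?: href\s*=)(?:[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(?<url>.*?)(?:[\s>""'])";

    public static IEnumerable<string> ExtractLinks(string content)
    {
        foreach (Match match in Regex.Matches(content, UrlExtractor, RegexOptions.IgnoreCase))
            yield return match.Groups["url"].Value;
    }
}
```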

2. Every link found, if it belongs to the same site, is put into a hashtable keyed by its absolute URI, so there is no duplication.
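A quick sketch of that normalization and same-site filter; UriHelper and its methods are my names, not the project's:

```csharp
using System;

// Sketch: turn a raw href into an absolute URI and decide whether it
// belongs to the site being crawled.
public static class UriHelper
{
    // Resolve a possibly relative href against the page it was found on;
    // returns null for malformed values.
    public static Uri ToAbsolute(Uri pageUri, string rawHref)
    {
        Uri absolute;
        return Uri.TryCreate(pageUri, rawHref, out absolute) ? absolute : null;
    }

    // Only links on the same host get queued for crawling.
    public static bool IsInternal(Uri baseUri, Uri candidate)
    {
        return string.Equals(baseUri.Host, candidate.Host, StringComparison.OrdinalIgnoreCase);
    }
}
```

Keying the pending table by the absolute URI means the relative and absolute spellings of the same link collapse into a single entry.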
3. For each link in the hash table we create a Request, try to get a Response, and read the status code that comes back:
```csharp
public bool Process(WebPageState state)
{
    state.ProcessSuccessfull = false;

    HttpWebRequest request = (HttpWebRequest) WebRequest.Create(state.Uri);
    request.Method = "GET";
    WebResponse response = null;

    try
    {
        response = request.GetResponse();

        if (response is HttpWebResponse)
            state.StatusCode = ((HttpWebResponse) response).StatusCode;
        else if (response is FileWebResponse)
            state.StatusCode = HttpStatusCode.OK;

        if (state.StatusCode.Equals(HttpStatusCode.OK))
        {
            // Read the page body so its links can be extracted in turn.
            var sr = new StreamReader(response.GetResponseStream());
            state.Content = sr.ReadToEnd();

            if (ContentHandler != null)
                ContentHandler(state);

            state.ProcessSuccessfull = true;
        }
    }
    catch (Exception ex)
    {
        // todo: handle the exception (e.g. record the failure in the report)
    }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }

    return state.ProcessSuccessfull;
}
```
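To show how these pieces fit together, here is a rough sketch of the crawl loop that would drive Process. The WebSpider type, the WebPageState constructor, and the report list are assumptions on my part; LinkParser and UriHelper come from the sketches above:

```csharp
using System;
using System.Collections.Generic;

public static class Crawler
{
    // Rough sketch: check each pending URI, then feed links found on
    // successful pages back through the dedup set.
    public static void Crawl(Uri rootUri, WebSpider spider, List<WebPageState> report)
    {
        var pending = new Queue<Uri>();
        var seen = new HashSet<Uri>(); // plays the role of the article's hashtable

        seen.Add(rootUri);
        pending.Enqueue(rootUri);

        while (pending.Count > 0)
        {
            var state = new WebPageState(pending.Dequeue()); // assumed constructor
            spider.Process(state);                           // the method shown above

            if (state.ProcessSuccessfull)
            {
                foreach (string href in LinkParser.ExtractLinks(state.Content))
                {
                    Uri link = UriHelper.ToAbsolute(state.Uri, href);

                    // HashSet.Add returns false for duplicates, so each
                    // absolute URI is queued at most once.
                    if (link != null && UriHelper.IsInternal(rootUri, link) && seen.Add(link))
                        pending.Enqueue(link);
                }
            }

            report.Add(state); // the status code ends up in the txt/html report
        }
    }
}
```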

Everything else is just window dressing and entropy.

One interesting detail: for convenient parsing of console parameters I used this package: https://nuget.org/packages/ManyConsole

As a result, all that parameter handling requires of me is to create a class like this:

```csharp
public class GetTime : ConsoleCommand
{
    public GetTime()
    {
        Command = "get-time";
        OneLineDescription = "Returns the current system time.";
    }

    public override int Run()
    {
        Console.WriteLine(DateTime.UtcNow);
        return 0;
    }
}
```
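ManyConsole then discovers and dispatches the commands; wiring it up looks roughly like this (modulo version differences in the 2011-era API):

```csharp
using System;
using System.Collections.Generic;
using ManyConsole;

public class Program
{
    public static int Main(string[] args)
    {
        // Discover every ConsoleCommand subclass in this assembly...
        IEnumerable<ConsoleCommand> commands =
            ConsoleCommandDispatcher.FindCommandsInSameAssemblyAs(typeof(Program));

        // ...and let ManyConsole match argv against their command names.
        return ConsoleCommandDispatcher.DispatchCommand(commands, args, Console.Out);
    }
}
```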


PS Since I'm writing this project for myself and it's still a work in progress, I've put it on GitHub: https://github.com/alexsuslin/LinkInspector

Oh, and for anyone who'd like to see what the result actually looks like, here is the console output:
D:\WORK\Projects\Own\LinkInspector\LinkInspector\bin\Debug>LinkInspector.exe -u www.google.com -n=10 -ff=html -e

Executing -u (Specify the Url to inspect for broken links.):

======================================================================================================
Proccess URI: www.google.com
Start At : 2011-12-21 04:56:09
------------------------------------------------------------------------------------------------------

0/1 : [ 2.98s] [200] : www.google.com
1/7 : [ 0.47s] [200] : accounts.google.com/ServiceLogin?hl=be&continue=http://www.google.by/
2/6 : [ 0.22s] [200] : www.google.com/preferences?hl=be
3/5 : [ 0.27s] [200] : www.google.com/advanced_search?hl=be
4/7 : [ 0.55s] [200] : www.google.com/language_tools?hl=be
5/341 : [ 0.21s] [200] : www.google.by/setprefs?sig=0_OmYw86q6Bd9tjRx1su-C4ZbrJUU=&hl=ru
6/340 : [ 0.09s] [200] : www.google.com/intl/be/about.html
7/361 : [ 0.30s] [200] : www.google.com/ncr
8/361 : [ 0.21s] [200] : accounts.google.com/ServiceLogin?hl=be&continue=http://www.google.com/advanced_search?hl=be
9/360 : [ 0.13s] [200] : www.google.com/webhp?hl=be
------------------------------------------------------------------------------------------------------
Pages Processed: 10
Pages Pending : 0
End At : 2011-12-21 04:56:14
Elasped Time : 0h 0m 5s 456ms
======================================================================================================


And this is what the report looks like in the HTML template (screenshot in the original post).

PPS Some of you asked for compiled binaries, so here you go: download Link Inspector 0.1 alpha

Source: https://habr.com/ru/post/135055/

