
Ultra Efficient Text Processing

Whether you are writing a book, laying out a web page, or editing program source code, you will sometimes face routine tasks that take a lot of time but still have to be done.

In this article I want to demonstrate a few scenarios that can help in such situations. The examples are not the most difficult ones, but they are representative, and you can build something more elaborate on top of them.

I think many people are familiar with situations where you need to change the date format throughout a large text, normalize indents and spaces in a document, count the occurrences of a word in a text fragment, convert an xml-document or a server response into a class for deserialization, or translate a piece of code from one programming language into another... Everyone handles such cases differently: some look for suitable utilities, some write their own, and some simply do the work by hand, head-on!

The bravest start learning regular expressions... and the even braver try substitutions. Yes, the entry threshold for these tools is very high, but when they are applied properly their efficiency beats all records!

One of the factors that hinders learning regular expressions is, I believe, that existing programs and development environments are somewhat underdeveloped in this area.

So one day I decided to create my own text editor with regexes and substitutions.
It is called Poet (website: poet.of.by), and with its help we will work small wonders today!

Counting matches



As soon as the user starts typing a search pattern, the program immediately highlights the matches it finds and counts them. Moreover, you can navigate through the results using the scroll bar! Try it, it is sometimes very convenient! I have not come across such a feature in other programs.
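For readers who want to reproduce the counting outside the editor, here is a minimal Python sketch. It is not how Poet is implemented; the function name `count_matches` and the sample text are made up for illustration.

```python
import re

def count_matches(pattern: str, text: str) -> int:
    """Count non-overlapping matches of a regex pattern in text."""
    # finditer yields one match object per non-overlapping match
    return sum(1 for _ in re.finditer(pattern, text))

# Hypothetical sample text, not taken from the article
sample = "one fish, two fish, red fish, blue fish"
print(count_matches(r"\bfish\b", sample))  # prints 4
```

The same match objects also carry `start()` and `end()` positions, which is the kind of information a result navigator would need.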

Multiline search with special characters

The editor makes it easy to search for multi-line matches. Special characters, spaces and newlines are always clearly visible in the search bar, so you don't have to tell tabs from spaces or count them by hand.
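The idea can be approximated in a few lines of Python. This is only a sketch under assumptions: the marker characters in `VISIBLE`, the helper `show_whitespace` and the sample text are hypothetical and not taken from Poet.

```python
import re

# Hypothetical markers for otherwise invisible characters
VISIBLE = {" ": "·", "\t": "→", "\n": "¶\n"}

def show_whitespace(text: str) -> str:
    """Replace spaces, tabs and newlines with visible markers."""
    return "".join(VISIBLE.get(ch, ch) for ch in text)

# Two similar-looking blocks: one indented with a tab, one with spaces
text = "if (x)\n\treturn 1;\nif (y)\n    return 2;\n"

# A pattern that spans two lines: ')' + newline + tab + 'return'
pattern = r"\)\n\treturn"
matches = [m.group(0) for m in re.finditer(pattern, text)]

print(len(matches))                 # only the tab-indented block matches
print(show_whitespace(matches[0]))
```

Making the whitespace visible shows why the second block does not match: it is indented with four spaces rather than a tab.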



Date conversion

Enable the use of regular expressions and substitutions.



If, for example, we want to convert dates in the document from a format like 01.08.14 to the form 1 August 2014, then we need a small regular expression and a simple substitution:

0*(?<Day>\d{1,2}).08.(?<Year>\d{1,2})

${Day} August 20${Year}
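The same transformation can be sketched in Python. Note the assumptions: Python's `re` module writes named groups as `(?P<Name>...)` and references them as `\g<Name>` instead of `${Name}`, the dots are escaped here so that they match only a literal period, and the function name and sample sentence are invented for illustration.

```python
import re

# Day and two-digit year around a literal ".08." (August).
# 0* drops leading zeros from the day.
DATE_PATTERN = re.compile(r"0*(?P<Day>\d{1,2})\.08\.(?P<Year>\d{1,2})")

# Python spelling of the substitution "${Day} August 20${Year}"
REPLACEMENT = r"\g<Day> August 20\g<Year>"

def convert_august_dates(text: str) -> str:
    """Rewrite dd.08.yy dates as 'd August 20yy'."""
    return DATE_PATTERN.sub(REPLACEMENT, text)

# Hypothetical sample sentence
print(convert_august_dates("Released on 01.08.14, updated on 15.08.14."))
# prints: Released on 1 August 2014, updated on 15 August 2014.
```

Like the pattern above, this sketch handles only August and two-digit years; a four-digit year such as 01.08.2014 would need a different pattern.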








Converting an xml-document to a C# class
<?xml version="1.0" encoding="UTF-8"?>
<Result>
  <Deposit>
    <Synonym></Synonym>
    <Curr>BYR - </Curr>
    <CurrCode>BYR</CurrCode>
    <State>2 - </State>
    <Sum>250,000.00</Sum>
    <Rest>88,505,000.00</Rest>
    <PercentSum>4,579,405.00</PercentSum>
    <PercentRest>4,579,405.00</PercentRest>
    <LastPercentDate>22.03.2013</LastPercentDate>
    <CurrentPercent>36.000000</CurrentPercent>
    <PercentSetupDate>18.03.2013</PercentSetupDate>
    <NextPercentDate>22.04.2013</NextPercentDate>
    <NextPercentSumma>2,650,603.00</NextPercentSumma>
    <ContractPost>1170646001265</ContractPost>
    <OpenDate>22.11.2012</OpenDate>
    <ReopenDate>22.03.2013</ReopenDate>
    <FinishDate>22.04.2013</FinishDate>
  </Deposit>
</Result>

Regular expression and substitution:

<(?<TagName>\w+)>.+</(?<TagName>\w+)>

[XmlElement("${TagName}")]
public string ${TagName} { get; set; }
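As a rough illustration of what this substitution produces, here is a Python sketch. It rests on several assumptions: Python does not allow two groups with the same name, so the closing tag is matched with the backreference `(?P=TagName)`; the replacement uses `\g<TagName>` instead of `${TagName}`; and the function name and three-line input are chosen only for the example.

```python
import re

# One XML element per line: <Tag>value</Tag>.
# '.' does not match newlines by default, so each match stays on one line.
TAG_PATTERN = re.compile(r"<(?P<TagName>\w+)>.+</(?P=TagName)>")

# Python spelling of the two-line C# substitution
REPLACEMENT = (
    r'[XmlElement("\g<TagName>")]' "\n"
    r"public string \g<TagName> { get; set; }"
)

def xml_to_properties(xml_lines: str) -> str:
    """Turn simple <Tag>value</Tag> lines into C# auto-properties."""
    return TAG_PATTERN.sub(REPLACEMENT, xml_lines)

# A few elements taken from the sample document above
fragment = (
    "<CurrCode>BYR</CurrCode>\n"
    "<Sum>250,000.00</Sum>\n"
    "<OpenDate>22.11.2012</OpenDate>"
)
print(xml_to_properties(fragment))
```

Because `.+` requires at least one character, an empty element such as `<Synonym></Synonym>` is left unchanged, and every generated property is typed as `string`.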






Source: https://habr.com/ru/post/231929/

