
Haskell in the real world

A lot has already been written on this blog about the Haskell language itself, and there have been several articles about its practical application. Inspired by those, I want to talk about one more real use of the language in production.



Data description



I work in telecom: fixed-line telephony, Internet access, IP telephony. My tasks include processing traffic from the PBXs and from the IP telephony servers, supporting the current billing, writing software, and administering the networks and the domain (yes, that kind of "programmer"). Recently new equipment was installed in my organization, working on completely different principles, and the traffic changed accordingly. It now arrives in two flavors: from the old exchanges and from the new ones. The new traffic is duplicated in binary and text formats. The binaries are of no use to us at all (although some people, not knowing better, said: "What's the problem? Just take them and drop them into billing!"), while the text files take an order of magnitude more space, but they are the ones you can actually work with.

Traffic from the old equipment keeps coming, and my colleague and I load it using proven schemes built on the services of a local DBMS. We could have configured the same services for the traffic from the new exchanges, but several peculiarities turned up. The new exchanges write traffic into separate files every 15 minutes; in total that is about 3000 files per month (ideally 2976 for a 31-day month and 2880 for a 30-day one). Importing each file separately would be insane. They can, and should, be merged into one, since they are all text and the call records are laid out line by line. Manual merging looks like this: select only the files for the last month and feed them to the simplest merge script on the command line. The file name format is fixed, so the merge can be automated; you only have to parse out the year and the month.

A Linux person would grab Bash, Perl or Python, quick and cheerful, but you can't install them on a Windows machine for the sake of a single operation, and the same goes for PowerShell. And cmd, as we well know, is a perversion. ;) Finally, there were also surprises in the traffic itself, because of which, even after merging and importing with the DBMS tools, a lot of manual SQL work was still required. All in all, the factors lined up nicely into a task for Haskell, which I was studying at the time (April-May 2011).



So, roughly 3000 fifteen-minute files per month. The interval can be changed on the equipment: not 15 minutes but any other value: 5, 10, 30, 45... The number of files and their sizes would change accordingly. Here is an example of a file name (for 2011-05-09 09:30:00):



  999992011050909300081.txt
  99999 - identifier hard-coded into the equipment (replaced with nines here, for obvious reasons)
  2011  - year
  05    - month
  09    - day
  09    - hours
  30    - minutes
  00    - seconds
  81    - a random number, possibly tenths of a second
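Just to illustrate, parsing the year and month out of such a name is straightforward. This is my own sketch (the function name and the layout assumptions are mine, not code from the programs described below):

import Data.Char (isDigit)

-- Extract (year, month) from a traffic file name like "999992011050909300081.txt".
-- The first 5 characters are the equipment identifier, the next 4 the year, then the month.
fileYearMonth :: FilePath -> Maybe (Int, Int)
fileYearMonth name
    | all isDigit stamp && length stamp >= 6 =
        Just (read (take 4 stamp), read (take 2 (drop 4 stamp)))
    | otherwise = Nothing
  where
    stamp = drop 5 (takeWhile (/= '.') name)   -- skip the 5-digit identifier

-- fileYearMonth "999992011050909300081.txt" == Just (2011, 5)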



The number of subscribers keeps growing, and each file grows steadily in size. Right now we average about 240 lines per file, but there was a seasonal summer dip: people went on holiday and called less. For September we expect activity to increase one and a half to two times.



The call records vary. There are several record types with different numbers of fields. R210 records are very rare, and we never found out what they mean (here and below the traffic data has been replaced with random values):



  |R210|2011-06-24 21:43:53|2011-06-24 01:43:52|1|




As you can easily see, there are only 4 fields here: the record type identifier, the start date, the end date (ISO 8601 / SQL format) and, for some reason, a one. The fields are separated by a vertical bar, which also stands at the beginning and at the end of the record, so strictly speaking there is one more field, that is, 5. It is convenient to assume that the field with index 0 is the empty one in front of the first vertical bar. Then the significant fields are counted starting from 1.



Regular calls are recorded in R200 records. Those have as many as 152 fields, and this can be reconfigured on the equipment: some fields can be added, some removed, others changed.



  |R200|99999|111111|CR,CS,AM|1|1|3022|222222|333333|||2011-06-23 11:33:58|C|2011-06-23 11:34:22|S|0|16|1||||||1|1||||||3|162|17|1|12|24||||||||||16|0||||||192.168.1.172||192.168.1.12|||||8|8|20|20|64|64|20|0|0|OS|7777|8888|555555|666666|0|8|9||||OS|19|||30|10|42|43|||||||||||1||||||1|1|0|3||222222|||||||2|1||333333|||||||||||||||||||||||||||||||




We are interested in the fields with indices [7, 8, 9, 12, 14, 36, 112, 122], and in the final result I would like to filter out everything else, so as not to import anything superfluous into the DBMS. Picking only what we need out of the raw data, we get this line:



  Record:  3022 | 222222 | 333333 | 2011-06-23 11:33:58 | 2011-06-23 11:34:22 | 24 | 222222 | 333333
  Indices:    7 |      8 |      9 |                  12 |                  14 | 36 |    112 |    122

  Index   | Explanation
  --------+-------------------------------------
  7       | city code
  8, 112  | outgoing (calling) number
  9, 122  | incoming (called) number
  12      | date and time the call started
  14      | date and time the call ended
  36      | call duration in seconds
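If one wanted to give these eight fields names in code, a record type like the following could be used. This is my own sketch for illustration; the actual programs described below work with plain lists of fields:

-- Hypothetical record for the fields of interest (not from the original sources).
data Call = Call
    { cityCode  :: String   -- index 7
    , srcNumber :: String   -- indices 8 / 112
    , dstNumber :: String   -- indices 9 / 122
    , startTime :: String   -- index 12
    , endTime   :: String   -- index 14
    , duration  :: Int      -- index 36, in seconds
    } deriving Show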




None of the other fields are particularly needed. Some, as you can see, are simply empty, and the meaning of the others is unknown. Except perhaps for the IP addresses (changed here), which belong to two boards in the telephone network infrastructure between which the RTP traffic flows. By importing these fields one could study the load on the boards. Perhaps it will come in handy some day.



Traffic records are packed densely, line by line. There may be other record types as well, but they are of no interest to us: for billing, the R200 records alone are enough. However, while visually inspecting the traffic, another interesting fact emerged. Sometimes there were calls from the same number that started at the same time but had different durations. At first, with incomplete information, I thought it was some kind of glitch, because a person cannot make parallel calls from the same number. Then a pattern started to show, and finally I understood what was going on. Here is an example of such records for one phone number; I have thrown out all the extra fields for clarity:



  | 7    | 8      | 9      | 12                  | 14                  | 36   |
  | 3022 | 222222 | 333333 | 2011-05-23 13:07:54 | 2011-05-23 13:37:54 | 1800 |
  | 3022 | 222222 | 333333 | 2011-05-23 13:07:54 | 2011-05-23 13:59:40 | 3106 |

  | 3022 | 444444 | 555555 | 2011-05-23 14:53:52 | 2011-05-23 15:23:52 | 1800 |
  | 3022 | 444444 | 555555 | 2011-05-23 14:53:52 | 2011-05-23 15:53:52 | 3600 |
  | 3022 | 444444 | 555555 | 2011-05-23 14:53:52 | 2011-05-23 16:00:50 | 4018 |

  | 3022 | 666666 | 777777 | 2011-05-23 19:15:55 | 2011-05-23 19:45:54 | 1800 |
  | 3022 | 666666 | 777777 | 2011-05-23 19:15:55 | 2011-05-23 20:15:54 | 3600 |
  | 3022 | 666666 | 777777 | 2011-05-23 19:15:55 | 2011-05-23 20:45:54 | 5400 |
  | 3022 | 666666 | 777777 | 2011-05-23 19:15:55 | 2011-05-23 20:47:17 | 5483 |




Now it is easy to see what the trick is, but back then finding these records among thousands like them, buried in a heap of extra fields, letters and numbers, was not easy. Honestly, it was either luck, or intuition, or magic. :) And the explanation is simple: every half hour (1800 seconds) the equipment writes a "milestone" record for the conversation, just in case. Even if the last 29 minutes of a call were somehow lost, the entire preceding three-hour discourse has been recorded many times over, for greater reliability. The last record carries the actual data. Perhaps the milestone interval can be changed on the equipment, but for now the series looks like this: [1800, 3600, 5400, 7200, 9000, over 9000..]. It is also notable that a milestone record differs from the final record in several of the "unimportant" fields. It may be worth taking this into account later to filter out the noise more precisely, but for now I decided to simply throw out all records whose duration falls in this series. In theory a small share of normal calls will be lost this way, because that would require someone to talk for an exact multiple of half an hour, to the second. The probability of that is very, very small, and our volumes are not large enough for the law of large numbers to affect the sample. We gave the phenomenon a creative name: "the 1800-seconds problem".
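For illustration, here is how such a filter could look. This is my own sketch, not the project's code (which, as shown later, compares field 36 against an explicit list of durations):

-- A minimal illustration of the "1800-seconds" filter: a milestone record is one
-- whose duration is an exact multiple of half an hour.
isMilestone :: Int -> Bool
isMilestone dur = dur > 0 && dur `mod` 1800 == 0

-- Keeping only the final records of each call (pseudo-usage; 'duration' is the
-- hypothetical accessor from the sketch above):
-- filter (not . isMilestone . duration) records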



Briefly about the programs



In total I wrote four programs that merged the necessary files and filtered out the useful information. Originally there were two separate programs, parser and merger, later combined into a third one, also called merger. They all worked extremely slowly and consumed a lot of memory, although they did get the job done. One month of data (3000 files with 240 lines each = 720,000 lines) took them at least 10 minutes and 500 MB of memory (probably more, since swap kicked in). A very, very scary result. And although the task only had to be run once a month, my colleague wrinkled his nose at Haskell with contempt. In truth, Haskell had nothing to do with it: I had made a number of typical beginner functional-programmer mistakes in those programs, and the crooked algorithms performed terribly. But they worked! Even more: the programs (the largest of them is only about 150 useful lines) could be configured from the command line. Here are the available features:



1. Operation without parameters: the default fields are taken, and the files of the previous month are processed.

2. Operation with parameters (a rough sketch of such argument handling follows this list):

- Fields [<list of field indices>] - which fields to take (e.g. parser Fields [1, 24, 55]);

- (yyyy, mm) - which month to process (e.g. merger (2011, 5));

- -W - do not close the console window after processing (from the word "wait").

3. Three files are produced:

- yyyy.mm.txt - all files with raw traffic for the month, merged together;

- processed.txt - a file containing only the required fields for the month;

- yyyy.mm.txt.log - a log file listing the raw files involved and summary information (number of lines, number of files, the date range).

4. The programs print statistics and samples of the processed traffic.
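As an aside, here is a rough sketch of how such parameter handling might look. The names and the exact matching are my assumptions; the real code (shown in part below) does it differently:

import System.Environment (getArgs)

-- Hypothetical sketch of the parameter handling described above.
data Options = Options
    { optFields :: [Int]             -- field indices to keep
    , optMonth  :: Maybe (Int, Int)  -- (yyyy, mm) to process
    , optWait   :: Bool              -- keep the console window open
    } deriving Show

defaultOptions :: Options
defaultOptions = Options [7, 8, 9, 12, 14, 36, 112, 122] Nothing False

parseArgs :: [String] -> Options
parseArgs = foldl step defaultOptions
  where
    step o a
        | a `elem` ["-w", "-W"] = o { optWait = True }
        | take 6 a == "Fields"  = o { optFields = read (drop 6 a) }
        | take 1 a == "("       = o { optMonth  = Just (read a) }
        | otherwise             = o
        -- In reality multi-word parameters would have to be re-joined first,
        -- as the original merger does with unwords.

main :: IO ()
main = do
    args <- getArgs
    print (parseArgs args)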



We used what there was a couple of times, but then, of course, I rewrote the program from scratch. The old code contained a great deal of crooked code, needless reinvented wheels, dumb algorithms and strange decisions. As a result, the fourth program, NgnParser, with the same functionality and on the same data set, runs not in 10 minutes but in 10 seconds, consuming only 10 MB of memory. That is almost two orders of magnitude in speed and at least one in memory! What could possibly slow a program down that much? I suspect there are people who stepped on the same rake as I did and blamed the slowness of the language rather than their own crooked hands; it is not for nothing that the Internet is full of such complaints... Haskell is a wonderful language. Writing these programs was easy for me: I spent no more than two working days on each, and enjoyed it every time. I cannot imagine what torture it would have been to do the same thing in C.



First program



I started the first program simply. For the time being I merged the necessary files into one (merged.txt) on the command line, and parser's job was to parse the traffic and keep only the records with the R200 identifier. For processing a large amount of text it is better to use the special string type ByteString. But it is not as convenient to work with as the ordinary String type: while ByteString is an optimized string implementation, String is a list of characters, that is, [Char]. String, by its nature, is very convenient, but on large data the performance of any list drops sharply. In the first versions of the program it was String, together with a few dumb decisions, that caused the heavy slowdown and the huge memory appetite. Back then, though, I cared about development speed, not performance, and I wrote the prototype very quickly. Here is what it looked like (revision 2):



import Data.List (intercalate)

replaceChars :: Char -> Char -> Char -> Char
replaceChars whatC withC c = if c == whatC then withC else c

interestFields :: [String] -> [Int] -> [String]
interestFields s takeWhat = undefined  -- Stub

isR200 :: [String] -> Bool
isR200 s = (head s) == "R200"

processLine :: String -> String
processLine s = if isR200 sInWords then unwords (interestFields sInWords [1, 2, 3]) else []  -- [1,2,3] - test fields
    where sInWords = words (map (replaceChars '|' ' ') s)

processString :: String -> [String]
processString s = map processLine (lines $ s)

main :: IO ()
main = do
    str <- readFile "merged.txt"
    putStrLn (intercalate "\r\n" (processString $ str))




You may immediately notice the strange replaceChars function with the type Char -> Char -> Char -> Char. The idea was this: take a record line, replace the vertical bar '|' with a space, and use the words function to split the line into words:



sInWords = words (map (replaceChars '|' ' ') s)




Running sInWords will result in the following conversion:



"| R200 | 99999 | 111111 | CR, CS, AM | 1 | 1 | 3022 | 222222 | 333333 ||| 2011-06-23 11: 33: 58 |" ->

"R200 99999 111111 CR, CS, AM 1 1 3022 222222 333333 2011-06-23 11:33:58" ->

[ "R200" , "99999" , "111111" , "CR, CS, AM" , "1" , "1" , "3022" , "222222" , "333333" , "2011-06-23" , " 11:33:58 " ]




Unfortunately, the date-time field also gets split into two separate fields. Later, to avoid this, I complicated the construction even further: first replacing the space in that field with an asterisk and then putting it back. As a result, a record line went through even more transformations (a sketch of this round trip follows the list):



1. Replace ' ' -> '*';

2. Replace '|' -> ' ';

3. Split into words with the words function;

4. Process the list of fields, taking the fields with the desired indices;

5. Merge the fields back with the unwords function;

6. Replace ' ' -> '|';

7. Replace '*' -> ' '.
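Reconstructed as a sketch (this is my approximation of the approach, not the original code), the splitting half of that round trip looked roughly like this:

-- An approximation of the asterisk trick described above (not the original code).
protectSpaces, restoreSpaces :: String -> String
protectSpaces = map (\c -> if c == ' ' then '*' else c)
restoreSpaces = map (\c -> if c == '*' then ' ' else c)

-- Split a record into fields: protect the space inside the date-time field,
-- turn bars into spaces, split with words, then put the spaces back.
-- Note: empty fields ("||") still vanish here, exactly as described below.
splitRecord :: String -> [String]
splitRecord = map restoreSpaces . words . map (\c -> if c == '|' then ' ' else c) . protectSpaces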



Moreover, after these terrible transformations the number of fields obtained did not match the original number, because empty fields simply disappeared (see the transformation example above). It is lucky that in all the records the empty and filled fields were in the same places, otherwise I would have gotten unpleasant artifacts. The code, as you can see, is not just redundant but ugly. I do not even dare estimate its asymptotic complexity; I think it far exceeded O(n^2). What is more, closer to revision 12 I realized that something had to be done about the fields lost because of double vertical bars. So I added yet another transformation:



-- So that empty fields, denoted by "||", are processed correctly,
-- a space is inserted between the bars.
refieldDoubles :: String -> String
refieldDoubles []           = []
refieldDoubles ('|':[])     = "|"
refieldDoubles ('|':'|':ss) = "| |" ++ refieldDoubles ('|':ss)
refieldDoubles (s:[])       = [s]
refieldDoubles (s:ss)       = s : refieldDoubles ss




That added yet another full pass over each line! Real monkey work. Yet very little was actually needed: instead of all this, use the split function from the Data.String.Utils module, or write your own version. Then, in just a single pass over a record line, I would have gotten the correct splitting into fields:



split "|" "| R200 | 99999 | 111111 | CR, CS, AM | 1 | 1 | 3022 | 222222 | 333333 ||| 2011-06-23 11: 33: 58 |" ->

[ "" , "R200" , "99999" , "111111" , "CR, CS, AM" , "1" , "1" , "3022" , "222222" , "333333" , "" , "" , "2011-06-23 11:33:58" , "" ]


What can I say... facepalm. No words, just emotions.



What improvements are possible



Experienced Haskellers will already have noticed that the code makes no use of point-free style (I had not gotten a feel for it yet), and there are almost no case constructions, pattern matching or guards. As a result, even such simple code is hard to read in places. Here is how several of the functions would be better written:



replaceSymbols s = map (replaceChar '|' ' ') (map (replaceChar ' ' '*') s)
-- ->
replaceSymbols = map (replaceChar '|' ' ' . replaceChar ' ' '*')





isR200 s = (head s) == "R200"
-- ->
isR200 ("R200":_) = True
isR200 _          = False





replaceChars whatC withC c = if c == whatC then withC else c
-- ->
replaceChars whatC withC c | c == whatC = withC
                           | otherwise  = c





processLine s = if isR200 sInWords then unwords (interestFields sInWords [1, 2, 3]) else []
    where sInWords = words (map (replaceChars '|' ' ') s)
-- ->
processLine s | isR200 sInWords = unwords (interestFields sInWords [1, 2, 3])
              | otherwise       = []
    where sInWords = words . map (replaceChars '|' ' ') $ s





processString s = map processLine (lines $ s)
-- ->
processString = map processLine . lines




The intercalate "\r\n" call should simply have been replaced with unlines. It would be both shorter and clearer, and in the tests unlines also showed noticeably better performance, by at least 30%:



  ItemsCnt | testUnlines (ns) | testIntercalate (ns) | Percent
  ---------+------------------+----------------------+--------
  10       | 23.84            | 34.05                | 29.9
  100      | 22.70            | 34.62                | 34.4
  1000     | 23.28            | 35.48                | 34.3
  10000    | 22.17            | 35.48                | 37.5
  50000    | 22.13            | 33.26                | 33.4
  100000   | 21.06            | 35.47                | 40.6
  200000   | 22.70            | 34.05                | 33.3
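The article does not say how these numbers were obtained; as an assumption, here is one way such a micro-benchmark could be set up with the criterion library (the sample data and names are mine):

import Criterion.Main (bench, defaultMain, nf)
import Data.List (intercalate)

-- Hypothetical benchmark harness; 'itemsCnt' and the sample line are made up.
main :: IO ()
main = do
    let itemsCnt = 10000
        ls       = replicate itemsCnt "|R200|99999|111111|..."
    defaultMain
        [ bench "testUnlines"     $ nf unlines ls
        , bench "testIntercalate" $ nf (intercalate "\r\n") ls
        ]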




Still being inexperienced, I did not know the standard functions well, even those from the Prelude, which is why I kept reinventing wheels. Even now, though, I do not really know how to pick the elements with the required indices out of a list with minimal effort. Compare the code from the old and the new programs:



-- Old code:
-- Argument 1 - list of raw fields
-- Argument 2 - list of required indices
takeInterest :: [String] -> [Int] -> [String]
takeInterest _ []      = []
takeInterest ss (n:ns) = [ss !! n] ++ takeInterest ss ns

-- New code:
-- Argument 1 - accumulator (current index)
-- Argument 2 - list of required indices
-- Argument 3 - list of raw fields
collectFields :: Int -> [Int] -> [String] -> [String]
collectFields _ _ [] = []
collectFields idx fis (s:ss) | idx `elem` fis = s : collectFields (idx + 1) fis ss
collectFields idx fis (s:ss) | otherwise      = collectFields (idx + 1) fis ss




In the first case we iterate over the index list and pull out each item with the unsafe !! function. In the second case we carry an accumulator along while walking the list of fields, and whenever the accumulator is in the index list we take the current field into the collection. Both use tail recursion. And yet, according to the tests, the new code turned out about 40% slower. Perhaps it is worth bringing the old code back to speed the new program up even more (a sketch of yet another approach follows the table below).



  ItemsCnt | takeInterest (ns) | collectFields (ns) | Percent
  ---------+-------------------+--------------------+--------
  10       | 17.33             | 36.84              | 52.9
  100      | 20.58             | 36.84              | 44.1
  1000     | 21.67             | 37.92              | 42.8
  10000    | 21.13             | 36.84              | 42.6
  50000    | 21.67             | 37.92              | 42.8
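For what it is worth, here is yet another way to pick out fields by index in a single pass. This is my own sketch, not code from either program:

-- Pair every field with its index and keep only the wanted ones.
selectByIndices :: [Int] -> [a] -> [a]
selectByIndices wanted xs = [x | (i, x) <- zip [0 ..] xs, i `elem` wanted]

-- selectByIndices [1, 3] ["", "R200", "99999", "111111"] == ["R200", "111111"]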




Moving on



The next program, merger, was supposed to merge all the text files of the previous month into one. (I did not want to do that by hand every time; laziness, as we know, is the engine of progress.) The question arose: how do we find out which month was the last one? In general it is simple: take the current date and subtract one month. The program was meant to be used only in the first days of a month, so no problems were expected there. The current date, and date-time operations in general, live in the Time module; operations on the file system structure live in the System.Directory module. These and other functions work mostly in the IO monad, and at that time I still could not tidy up monadic code, so it looks creepy (merger, revision 14):



main :: IO ()
main = do
    args <- getArgs
    curDir <- getCurrentDirectory
    dirContents <- getDirectoryContents curDir
    curTime <- T.getClockTime
    monthAgoTime <- return $ T.addToClockTime (T.TimeDiff 0 (-1) 0 0 0 0 0) curTime
    calendarMonthAgoTime <- T.toCalendarTime monthAgoTime
    let maybeDateRange = case args of
            (a:b:_) -> readDateRange (unwords [a, b])
            _       -> Just $ defaultDateRange calendarMonthAgoTime
    case maybeDateRange of
        Just dr -> do
            let fsToMerge = filesToMerge dirContents dr
            let fsToMergeCountStr = show $ length fsToMerge
            let mergeLog = newFileName dr ++ ".log"
            let dateRangeMsg = "DateRange: " ++ show dr
            fsContents <- merge fsToMerge
            writeFile (newFileName dr) (unlines fsContents)
            writeFile mergeLog (unlines fsToMerge ++ printf "\n%s\nTotal files: %s" dateRangeMsg fsToMergeCountStr)
            putStrLn (unlines fsContents)
            putStrLn dateRangeMsg
            --putStrLn ("Files to merge: " ++ unlines fsToMerge)
            putStrLn (printf "Count of files: %s. See %s for file list." fsToMergeCountStr mergeLog)
        Nothing -> putStrLn "Invalid date range."




I would not even recommend trying to make sense of what this code does... But it was right here that the seed of a huge future mistake crept in, the one that made the last version of the old program devour an enormous amount of memory. Consider the merge function called from this code:



merge :: [String] -> IO [String]
merge fsToMerge = mapM readFile fsToMerge




It takes the list of files to merge, reads them and returns a list of their contents. Then there are these two lines in the code:



do
    ...
    fsContents <- merge fsToMerge
    writeFile (newFileName dr) (unlines fsContents)
    ...




The key point is unlines fsContents: the raw contents of all the files were glued together in one sitting and pushed into the result file. Later, when parser and merger were combined, it was this huge blob of data that was handed to the parser part for processing, where, as you remember, there is a whole pile of replacements, passes and other overhead. Here is what that part of the code looks like in the old program assembled from parser and merger:



do
    ...
    fsContents <- readFilesToMerge fsToMerge
    let mergedContents = unlines fsContents
    writeFile (newFileName dr) mergedContents
    let processedContentStr = unlines $ processData nedeedFields mergedContents
    ...




And that is not just bad, it is a gross violation of the dataflow concept. It should work like this:



  Scheme 1

  A1 -> [ F(A1) ] -> B1 -> [ G(B1) ] -> C1 -> [ H(C1) ] -> RESULT 1 -> SAVE
  A2 -> [ F(A2) ] -> B2 -> [ G(B2) ] -> C2 -> [ H(C2) ] -> RESULT 2 -> SAVE
  A3 -> [ F(A3) ] -> B3 -> [ G(B3) ] -> C3 -> [ H(C3) ] -> RESULT 3 -> SAVE
  ...   ...          ...   ...          ...   ...       -> RESULT n -> SAVE




And it turned out like this:



  Scheme 2

  A -> [        F(A)        ] -> B -> [        G(B)        ] -> C -> [        H(C)        ] -> RESULT -> SAVE




I really felt the difference between processing the data piece by piece and processing everything at once. Now I know: do not grab the whole workload at once; it is better to build a mechanism that hands out the data in chunks, which is better both for memory and for speed. And, on top of that, a program built according to scheme 2 cannot be parallelized. In a word, horror...



Later I tried to optimize the old program by replacing String with ByteString almost everywhere. The gain, of course, was small: you cannot make fundamentally crooked code fast.



Need I say that I wrote the new program, NgnTraffic, already having a decent stock of knowledge? And it shows. The program is split into modules: Main, Constants, DataProcess, FileListProces, Options, Tools, Types. Naturally, scheme 1 is used. Instead of String, the ByteString type (the strict variant) is used from the start. The code is tidier; point-free style even appears in places. And, most importantly, the principle of operation has changed. First a list of the files to process is built; this part is similar to the old program. But then the files are not read all at once into one big variable: each one is read separately, its contents are processed immediately, the lines are parsed, the records are filtered, the needed fields are pulled out, and the result is appended to the result file. So we have the cycle "read from disk (F(An)) - process the lines (G(Bn)) - write the result to disk (H(Cn))", repeated as many times as there are files. Plus, of course, there is no longer any substitution of one character for another; there is the simple split function from the Data.ByteString.Char8 module, which breaks a record line into fields in a single pass and by itself wins back a monstrous share of the performance. Here are the functions that implement scheme 1:



process' :: ResFilePath -> FieldIndexes -> FilePath -> IO ()
process' resFile fis targetFile = do
    fileContents <- C.readFile targetFile
    let processResult = processData fis predicates fileContents
    C.appendFile resFile processResult

process :: ResFilePath -> [FilePath] -> FieldIndexes -> IO String
process _ [] _ = return "No files to process."
process resFile fs fieldIndexes = do
    C.writeFile resFile C.empty
    mapM_ (process' resFile fieldIndexes) fs
    return "All ok."




Here process takes the name of the result file, the list of files to process and the indices of the required fields. The indices can come from the command line, which is why they are passed in as an argument. process takes each file name in turn and applies the process' function to it (the line mapM_ (process' resFile fieldIndexes) fs); that function does the main per-file work. The file contents are handled by the processData function. Interestingly, besides filtering for R200 records, the new program gained a predicate mechanism: any record can be checked against the available predicates, and the predicates themselves are easy to extend. So far there are two: "the field with index x is in a given list of values" and "is not in it". This is the predicate type:



data Predicate = NotInList [C.ByteString]
               | InList    [C.ByteString]

type PredicateMap = [(FieldIndex, Predicate)]




And here are the predicate constants themselves:



predicates :: PredicateMap
predicates = [ (1,  InList [C.pack "R200"])
             , (36, NotInList (map C.pack [ "1800", "3600", "5400", "7200", "9000", "10800", "12600"
                                          , "14400", "16200", "18000", "19800", "21600", "23400" ]))
             ]




It is not hard to guess that only those records satisfy the predicates whose field 1 equals "R200" and whose field 36 is not in the "1800-seconds problem" list. The filtering happens in the checkPredicate and examineFields functions:



checkPredicate :: Predicate -> C.ByteString -> Bool
checkPredicate (NotInList l) str = (not . elem str) l
checkPredicate (InList l)    str = elem str l

examineFields :: Int -> PredicateMap -> Fields -> Bool
examineFields _ _ [] = True
examineFields idx preds (s:ss) = case L.lookup idx preds of
    Just pred -> checkPredicate pred s && examineFields (idx + 1) preds ss
    Nothing   -> examineFields (idx + 1) preds ss




Now I can see that the examineFields function could have been written with foldr, but that is a minor detail.
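Just as a sketch of that idea (my own variant, assuming the same imports and types as the functions above and that the traversal starts at index 0):

-- A foldr-based variant of examineFields; laziness of (&&) still stops the
-- traversal at the first failing predicate.
examineFields' :: PredicateMap -> Fields -> Bool
examineFields' preds fields = foldr step True (zip [0 ..] fields)
  where
    step (idx, s) acc = case L.lookup idx preds of
        Just p  -> checkPredicate p s && acc
        Nothing -> acc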



In general, I am pleased with the work done. It rarely happens, but I like the NgnTraffic code. I especially like the progress that becomes visible when you compare the same program written at different times with different knowledge. And what pleases me "even more especially" is that this is a real Haskell program used in real production.



No matter what anyone says, Haskell is a terrific language, the best one I've ever worked with.



Parser source code by revisions: [2], [6], [12]

Merger sources (including the combined merger program): [13], [14], [16], [23]

NgnTraffic Sources



P.S. The translation of the next part of Yet Another Monad Tutorial is coming soon.

P.P.S. Corrected the links to the programs, thanks to comrade steck.

P.P.P.S. Fixed a couple of errors, thanks to comrades goad and jack128.

Source: https://habr.com/ru/post/129235/


