
Node.js in Fidonet: reading the headers of echomail stored in the JAM format

Today I have two reasons to sit down at the keyboard.

First, last week I translated the jParser documentation (after reviewing RReverser's example of using jParser to analyze BMP files). It now seems appropriate to take the next step and develop the theme by sharing my own example of using jParser to analyze a somewhat more complex data structure. (In part, this also answers alekciy, who was interested in further examples of jParser's practical use.)

Secondly, about half a year ago (November 26, 2011), ertaquo asked why I wanted to use Node.js in Fidonet. At the time I said that I simply liked the name (I remember the days when the word "node", used without clarification in the Russian computer world, meant a Fidonet node by default), but I could not give any good example of working code. Now I can.
So the example will serve both purposes. I present an analysis of the headers of Fidonet echomail messages stored in the JAM format. This format has been popular in Fidonet since time immemorial (Wikipedia dates the appearance of JAM to 1993). I'll say right away that I have long preferred JAM to the other popular format (Squish). The latter stores in a message header the identifiers of no more than nine replies to that message, whereas JAM uses a more flexible data structure (a linked list) instead of a fixed-length array. This makes it possible to build a complete reply tree even for the liveliest and most extensive discussions.
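
To illustrate the linked list, here is a minimal sketch of building such a reply tree. It relies on my reading of the JAM documentation: ReplyTo holds the number of the message being answered, Reply1st holds the number of the first reply, and Replynext holds the number of the next reply to the same parent, with 0 meaning "none". The input is assumed to be plain objects carrying those four header fields. The helper names (replyChildren, buildReplyTree) are hypothetical and are not part of the module below.

```javascript
// Children of one message: start at its Reply1st, then follow Replynext.
// Message number 0 means "none".
function replyChildren(byNumber, msgNum) {
  const children = [];
  const seen = new Set(); // guards against loops in a damaged base
  const parent = byNumber.get(msgNum);
  let next = parent ? parent.Reply1st : 0;
  while (next !== 0 && !seen.has(next) && byNumber.has(next)) {
    seen.add(next);
    children.push(next);
    next = byNumber.get(next).Replynext;
  }
  return children;
}

// Builds a reply tree from parsed headers (objects with MessageNumber,
// ReplyTo, Reply1st and Replynext, named as in the JAM MessageHeader).
function buildReplyTree(headers) {
  const byNumber = new Map(headers.map((h) => [h.MessageNumber, h]));
  const visited = new Set(); // each message appears in the tree at most once
  const subtree = (msgNum) => {
    visited.add(msgNum);
    const replies = [];
    for (const n of replyChildren(byNumber, msgNum)) {
      if (!visited.has(n)) replies.push(subtree(n));
    }
    return { msg: msgNum, replies: replies };
  };
  // Roots: messages that do not reply to any message present in this base.
  return headers
    .filter((h) => h.ReplyTo === 0 || !byNumber.has(h.ReplyTo))
    .map((h) => subtree(h.MessageNumber));
}
```

Because the list is followed through Replynext, a message may have any number of replies. A fixed array like Squish's would stop at nine.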

The JAM documentation is easy to find on various Fidonet BBSes, but BBSes tend to close or change addresses over time. For reliability I therefore refer to my own message from five years ago, in which I quoted this documentation verbatim and in its entirety. (The Czech BBS that served as my source back then is now closed. Everything is ephemeral in this raging world.)

As you can see there, the headers of Fidonet echomail messages are stored in the JHR file. This file consists of a fixed-length header (FixedHeaderInfoStruct) followed by the message headers themselves (MessageHeader). Each of these in turn consists of a fixed-size structure (MessageFixedHeader) and a variable-length tail made of several subfields (SubFieldXX), whose total length is given by the SubfieldLen field of the MessageFixedHeader structure. Each SubFieldXX again consists of a fixed-size header followed by a string of bytes whose length is given by the preceding number datlen. (This resembles the way strings were implemented in the Pascal dialects common in the same nineties, such as Turbo Pascal and UCSD Pascal. In Pascal, however, the length was stored in a single byte, whereas in JAM datlen is a ulong, i.e. thirty-two bits. A prudent choice.)
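
To make the layout concrete, here is a minimal sketch that decodes one MessageFixedHeader with plain Node.js Buffer reads instead of jParser. It assumes little-endian integers (as written by the x86 software JAM was designed for) and the field order used in the module below. The function name and the helper properties subfieldsStart/subfieldsEnd are hypothetical. Counting the fields gives 76 bytes for the fixed part.

```javascript
// Names of the ulong fields of MessageFixedHeader, in file order (after
// Signature, Revision and ReservedWord), as in the JAM documentation.
const ULONG_FIELDS = [
  'SubfieldLen', 'TimesRead', 'MSGIDcrc', 'REPLYcrc', 'ReplyTo', 'Reply1st',
  'Replynext', 'DateWritten', 'DateReceived', 'DateProcessed', 'MessageNumber',
  'Attribute', 'Attribute2', 'Offset', 'TxtLen', 'PasswordCRC', 'Cost'
];
// 4-byte Signature + two ushorts + seventeen ulongs = 76 bytes.
const FIXED_HEADER_SIZE = 4 + 2 + 2 + 4 * ULONG_FIELDS.length;

// Decodes the MessageFixedHeader starting at `pos` in a JHR buffer,
// assuming little-endian integers. Returns null if the buffer is too short.
function readMessageFixedHeader(buf, pos) {
  if (pos + FIXED_HEADER_SIZE > buf.length) return null;
  const h = {
    Signature: buf.toString('latin1', pos, pos + 4), // expected "JAM\0"
    Revision: buf.readUInt16LE(pos + 4),
    ReservedWord: buf.readUInt16LE(pos + 6)
  };
  ULONG_FIELDS.forEach((name, i) => {
    h[name] = buf.readUInt32LE(pos + 8 + 4 * i);
  });
  // The SubFieldXX area (SubfieldLen bytes) follows the fixed part directly.
  h.subfieldsStart = pos + FIXED_HEADER_SIZE;
  h.subfieldsEnd = h.subfieldsStart + h.SubfieldLen;
  return h;
}
```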

Much less obvious is another important fact: inside the JHR file, the MessageHeader headers do not necessarily follow one another back to back. The subsection "Updating message headers" states that if a message header grows after the message is edited or processed, the new header is written at the end of the file and the old one is marked as deleted. The documentation says nothing about headers that shrink instead of growing. In practice, however, many Fidonet programs write such a new header over the old one, adjusting SubfieldLen (and, where necessary, individual datlen values). Garbage then remains between this header and the next MessageHeader: the contents of the former last SubFieldXX fields. That is why, after reading one MessageHeader, there is no more reasonable way to reach the next one than to search for the three ASCII characters "JAM" followed by a null byte. This is the Signature sequence with which every MessageFixedHeader must begin.
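
A minimal sketch of that search, assuming the whole JHR file is already in a Node.js Buffer: Buffer#indexOf performs the byte search natively, so no JavaScript loop is needed. This is a swapped-in technique, not the jParser-based loop the module below uses. The function name nextHeaderOffset is hypothetical.

```javascript
// The Signature that every MessageFixedHeader begins with.
const JAM_SIGNATURE = Buffer.from('JAM\0', 'latin1');

// Returns the offset of the next header signature at or after `from`
// in a JHR buffer, or -1 if there is none. Buffer#indexOf does the
// byte-by-byte search natively instead of in a JavaScript loop.
function nextHeaderOffset(jhr, from) {
  return jhr.indexOf(JAM_SIGNATURE, from);
}
```

Leftover garbage could, in principle, itself contain the bytes "JAM\0". A stricter reader might therefore sanity-check the fields that follow a match before trusting it.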

The code of a Node.js module that reads the echomail headers from a JHR file into memory can therefore be sketched as follows:

    var fs = require('fs');
    var jParser = require('jParser');

    // JAM type names as plain JavaScript variables (see below)
    var ulong = 'uint32';
    var ushort = 'uint16';

    var JAM = function(echotag){
        if (!(this instanceof JAM)) return new JAM(echotag);
        this.echotag = echotag;
        // Buffers:
        this.JHR = null;
        /*
        this.JDT = null;
        this.JDX = null;
        this.JLR = null;
        */
    };

    JAM.prototype.readJHR = function(callback){ // (err)
        var self = this; // `this` is not the JAM object inside fs callbacks
        if (self.JHR !== null) return callback(null); // already cached
        fs.readFile(self.echotag + '.JHR', function(err, data){
            if (err) return callback(err);
            self.JHR = data;
            callback(null);
        });
    };

    JAM.prototype.ReadHeaders = function(callback){ // (err, struct)
        var thisJAM = this;
        thisJAM.readJHR(function(err){
            if (err) return callback(err);
            var result;
            try {
                var parser = new jParser(thisJAM.JHR, {
                    'reserved1000uchar': function(){ this.skip(1000); return true; },
                    'JAM0': ['string', 4],
                    'FixedHeaderInfoStruct': {
                        'Signature': 'JAM0',
                        'datecreated': ulong,
                        'modcounter': ulong,
                        'activemsgs': ulong,
                        'passwordcrc': ulong,
                        'basemsgnum': ulong,
                        'RESERVED': 'reserved1000uchar',
                    },
                    'SubField': {
                        'LoID': ushort,
                        'HiID': ushort,
                        'datlen': ulong,
                        'Buffer': ['string', function(){ return this.current.datlen; }],
                        /*
                        'type': function(){
                            switch (this.current.LoID) {
                                case 0: return 'OADDRESS';
                                case 1: return 'DADDRESS';
                                case 2: return 'SENDERNAME';
                                case 3: return 'RECEIVERNAME';
                                case 4: return 'MSGID';
                                case 5: return 'REPLYID';
                                case 6: return 'SUBJECT';
                                case 7: return 'PID';
                                case 8: return 'TRACE';
                                case 9: return 'ENCLOSEDFILE';
                                case 10: return 'ENCLOSEDFILEWALIAS';
                                case 11: return 'ENCLOSEDFREQ';
                                case 12: return 'ENCLOSEDFILEWCARD';
                                case 13: return 'ENCLOSEDINDIRECTFILE';
                                case 1000: return 'EMBINDAT';
                                case 2000: return 'FTSKLUDGE';
                                case 2001: return 'SEENBY2D';
                                case 2002: return 'PATH2D';
                                case 2003: return 'FLAGS';
                                case 2004: return 'TZUTCINFO';
                                default: return 'UNKNOWN';
                            }
                        }
                        */
                    },
                    'MessageHeader': {
                        'Signature': 'JAM0',
                        'Revision': ushort,
                        'ReservedWord': ushort,
                        'SubfieldLen': ulong,
                        'TimesRead': ulong,
                        'MSGIDcrc': ulong,
                        'REPLYcrc': ulong,
                        'ReplyTo': ulong,
                        'Reply1st': ulong,
                        'Replynext': ulong,
                        'DateWritten': ulong,
                        'DateReceived': ulong,
                        'DateProcessed': ulong,
                        'MessageNumber': ulong,
                        'Attribute': ulong,
                        'Attribute2': ulong,
                        'Offset': ulong,
                        'TxtLen': ulong,
                        'PasswordCRC': ulong,
                        'Cost': ulong,
                        // fast variant: the subfields as one raw string
                        'Subfields': ['string', function(){ return this.current.SubfieldLen; }],
                        /*
                        // full variant: split the subfields with jParser
                        'Subfields': function(){
                            var final = this.tell() + this.current.SubfieldLen;
                            var sfArray = [];
                            while (this.tell() < final) {
                                sfArray.push(this.parse('SubField'));
                            }
                            return sfArray;
                        },
                        */
                        // skip garbage up to the next "JAM\0" signature
                        'AfterSubfields': function(){
                            var initial = this.tell();
                            var bytesLeft = thisJAM.JHR.length - initial - 4;
                            var seekJump = 0;
                            var sigFound = false;
                            var raw = this;
                            if (bytesLeft <= 0) return 0;
                            do {
                                // seek() with a callback returns to the current position afterwards
                                this.seek(initial + seekJump, function(){
                                    var moveSIG = raw.parse('JAM0');
                                    if (moveSIG === 'JAM\0') {
                                        sigFound = true;
                                        /*
                                        if (seekJump > 0){
                                            console.log('initial = ' + initial +
                                                ', seekJump = ' + seekJump + ', moveSIG = ' + moveSIG);
                                        }
                                        */
                                    }
                                });
                                seekJump++;
                            } while (!sigFound && (seekJump < bytesLeft));
                            this.skip(seekJump - 1);
                            return seekJump - 1;
                        }
                    },
                    'JHR': {
                        'FixedHeader': 'FixedHeaderInfoStruct',
                        'MessageHeaders': function(){
                            var mhArray = [];
                            // 69 is a "magic" margin before the end of the file
                            while (this.tell() < thisJAM.JHR.length - 69) {
                                mhArray.push(this.parse('MessageHeader'));
                            }
                            return mhArray;
                        }
                    }
                });
                result = parser.parse('JHR');
            } catch (e) {
                return callback(e);
            }
            callback(null, result);
        });
    };

    module.exports = JAM;

This sketch caches the raw data of the JHR file inside the exported JAM object (in its JHR field). Given the module's current design this is not economical, but it will be useful if, alongside the ReadHeaders method, you need a simpler method that reads, for example, only the FixedHeaderInfoStruct header. There are also fields for the other three JAM files (JDT, JDX and JLR), but they are commented out. (Ideally the cache should also be kept up to date by calling stat() rather than watchFile(), but for a first draft of the module the code will do without it.)
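
A minimal sketch of such a stat()-based cache, assuming that a change in file size or modification time is a good enough signal that the file has changed. A rewrite of the same size within the file system's timestamp resolution would go unnoticed. The name makeFileCache is hypothetical. For brevity the sketch uses the synchronous fs.statSync() and fs.readFileSync(); the callback versions follow the same logic.

```javascript
const fs = require('fs');

// Hypothetical cache: keeps a file's contents and re-reads the file only
// when fs.statSync() reports a different size or modification time.
function makeFileCache(filePath) {
  let cached = null; // { size, mtimeMs, data }
  return function read() {
    const st = fs.statSync(filePath);
    if (cached === null || cached.size !== st.size || cached.mtimeMs !== st.mtimeMs) {
      cached = { size: st.size, mtimeMs: st.mtimeMs, data: fs.readFileSync(filePath) };
    }
    return cached.data;
  };
}
```

Unlike fs.watchFile(), which keeps polling in the background, this approach checks the file only when the data is actually requested.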

The data types from the JAM documentation (for example, ulong) are not declared through jParser itself (for example, as "'ulong': 'uint32'"). Instead they are declared as JavaScript variables (for example, "var ulong = 'uint32'"), whose values are then used in the structure descriptions. This is for speed: the V8 engine resolves a variable much faster than jParser's own code resolves one more named type.

In the description of the SubField structure you will find the commented-out type field. It holds a JavaScript function that maps LoID values to the mnemonic names borrowed from the JAM documentation, and it can be used for debugging.
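
The same mapping can also be written as a lookup table instead of a switch statement. This is a minimal sketch that reproduces the LoID list from the commented-out function. The names SUBFIELD_TYPES and subfieldType are hypothetical.

```javascript
// LoID values of SubFieldXX and their mnemonic names, following the JAM
// documentation (the same list as the commented-out 'type' function).
const SUBFIELD_TYPES = {
  0: 'OADDRESS', 1: 'DADDRESS', 2: 'SENDERNAME', 3: 'RECEIVERNAME',
  4: 'MSGID', 5: 'REPLYID', 6: 'SUBJECT', 7: 'PID', 8: 'TRACE',
  9: 'ENCLOSEDFILE', 10: 'ENCLOSEDFILEWALIAS', 11: 'ENCLOSEDFREQ',
  12: 'ENCLOSEDFILEWCARD', 13: 'ENCLOSEDINDIRECTFILE',
  1000: 'EMBINDAT', 2000: 'FTSKLUDGE', 2001: 'SEENBY2D', 2002: 'PATH2D',
  2003: 'FLAGS', 2004: 'TZUTCINFO'
};

// Returns the mnemonic for a LoID, or 'UNKNOWN' for values not in the table.
function subfieldType(loId) {
  return Object.prototype.hasOwnProperty.call(SUBFIELD_TYPES, loId)
    ? SUBFIELD_TYPES[loId]
    : 'UNKNOWN';
}
```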

The Subfields field of the MessageHeader structure is defined in two ways. The first (fast) variant reads this field as a string of SubfieldLen bytes. The second (commented-out) variant processes the field completely, splitting it into subfields with jParser. If the application using the module needs the metadata from the variable part of the Fidonet message header anyway, there is no reason to postpone parsing it.
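
For comparison, here is a minimal sketch of the same split done with plain Buffer reads rather than jParser. It assumes little-endian integers and an 8-byte subfield header (LoID and HiID as ushort, datlen as ulong). It decodes each payload at one character per byte ('latin1'), which leaves the actual text encoding (often CP866 in Russian echoes) to the caller. The function name splitSubfields is hypothetical.

```javascript
const SUBFIELD_HEADER_SIZE = 8; // LoID (ushort) + HiID (ushort) + datlen (ulong)

// Splits the bytes [start, end) of one message header's subfield area into
// records shaped like the SubField structure in the module above.
function splitSubfields(buf, start, end) {
  const result = [];
  let pos = start;
  while (pos + SUBFIELD_HEADER_SIZE <= end) {
    const LoID = buf.readUInt16LE(pos);
    const HiID = buf.readUInt16LE(pos + 2);
    const datlen = buf.readUInt32LE(pos + 4);
    const dataStart = pos + SUBFIELD_HEADER_SIZE;
    if (dataStart + datlen > end) break; // damaged subfield: stop instead of overrunning
    result.push({ LoID, HiID, datlen, Buffer: buf.toString('latin1', dataStart, dataStart + datlen) });
    pos = dataStart + datlen;
  }
  return result;
}
```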

The AfterSubfields field contains a simple search for the three ASCII characters "JAM" followed by a null byte; the reason for this is given a few paragraphs earlier. The commented-out console.log() call is there for debugging only. (The name of the internal variable moveSIG alludes to the "All your base are belong to us" meme.)

The number 69 in the description of the MessageHeaders field of the JHR structure is "magic". Its purpose is to keep the parser from getting too close to the end of the file, where garbage data may also be expected.
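
One possible alternative to the magic margin is to accept a header only if its 76-byte fixed part and its declared SubfieldLen bytes fit entirely inside the file. This is a minimal sketch of that idea using plain Buffer reads rather than jParser. It assumes little-endian integers, a 1024-byte FixedHeaderInfoStruct and SubfieldLen at offset 8 of each header. A signature match that does not fit is treated as garbage, and the scan continues past it. The name listHeaderOffsets is hypothetical.

```javascript
const JHR_FIXED_HEADER_SIZE = 1024; // FixedHeaderInfoStruct
const MSG_FIXED_HEADER_SIZE = 76;   // MessageFixedHeader
const SIGNATURE = Buffer.from('JAM\0', 'latin1');

// Returns the offsets of all message headers in a JHR buffer, accepting only
// headers whose fixed part and declared subfields fit inside the buffer.
function listHeaderOffsets(jhr) {
  const offsets = [];
  let pos = jhr.indexOf(SIGNATURE, JHR_FIXED_HEADER_SIZE);
  while (pos !== -1 && pos + MSG_FIXED_HEADER_SIZE <= jhr.length) {
    const subfieldLen = jhr.readUInt32LE(pos + 8);
    const end = pos + MSG_FIXED_HEADER_SIZE + subfieldLen;
    if (end > jhr.length) {
      // Does not fit: treat it as a false match and keep scanning.
      pos = jhr.indexOf(SIGNATURE, pos + 1);
      continue;
    }
    offsets.push(pos);
    // Skip any garbage left behind by a header that was rewritten shorter.
    pos = jhr.indexOf(SIGNATURE, end);
  }
  return offsets;
}
```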

I checked the parsing speed with the following test script:

    var JAM = require('../');
    var util = require('util');

    console.log( new Date().toLocaleString() );
    var blog = JAM('blog-MtW');
    blog.ReadHeaders(function(err, data){
        if (err) throw err;
        //console.log( util.inspect(data, false, Infinity, false) );
        console.log( new Date().toLocaleString() );
    });

The script lives in the test subdirectory, so its first line requires the parent directory, where the main module's code is in the index.js file. Since Node.js looks for this file name by default, specifying just the parent directory is enough.
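
Two toLocaleString() timestamps resolve only to whole seconds. For finer measurements, process.hrtime() can be used instead. This is a minimal sketch for synchronous code; the helper name timeIt is hypothetical.

```javascript
// Measures the wall-clock time of a synchronous function in milliseconds,
// using Node's high-resolution timer.
function timeIt(fn) {
  const start = process.hrtime();
  const result = fn();
  const [sec, nsec] = process.hrtime(start); // elapsed time since `start`
  return { result, ms: sec * 1e3 + nsec / 1e6 };
}
```

For the callback-based ReadHeaders(), the same idea applies: take process.hrtime() before the call and pass that value to process.hrtime() inside the callback.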

The test data in the blog-MtW.jhr file contains the headers of the entries of my Fidonet blog (Ru.Blog.Mithgol), accumulated since March 2007.

Test runs on a single-core Pentium IV (2.2 GHz) show that the headers are processed in three to four seconds. If the simple reading of the Subfields field is replaced by its full parsing (the commented-out variant), this time roughly doubles.

That is a lot for a single echo conference, because a Fidonet node can easily carry more than a hundred of them, and the total time to parse their echomail headers would run to several minutes.

But Fidonet users hardly need reminding that the popular Fidonet mail editor GoldED (GoldED+, GoldED-NSF) scans the echo conferences at startup much faster. Their names flash across the status line of its splash screen so quickly that each clearly takes a fraction of a second, no more. One is forced to an unpleasant conclusion: JavaScript parsing of binary data, even on the fast V8 engine, is an order of magnitude slower, and possibly more than that.

It only remains to suspect, cynically, that for speed GoldED reads at startup not the entire file but only the FixedHeaderInfoStruct header. Its data would suffice to display the number of messages in each echo conference, and GoldED does nothing more than that at startup. I can neither confirm nor deny this suspicion, because I have not had time to dig into the GoldED+ CVS.
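
To show how little such a startup scan would need to read, here is a minimal sketch that reads only the first 1024 bytes of a JHR file and decodes the FixedHeaderInfoStruct. It assumes little-endian integers and the field order from the module above. Whether GoldED actually works this way remains the unconfirmed suspicion stated above, and the name readJhrInfo is hypothetical.

```javascript
const fs = require('fs');

// Reads only the 1024-byte FixedHeaderInfoStruct at the start of a JHR file
// and decodes its numeric fields (little-endian assumed).
function readJhrInfo(filePath) {
  const buf = Buffer.alloc(1024);
  const fd = fs.openSync(filePath, 'r');
  try {
    const n = fs.readSync(fd, buf, 0, buf.length, 0);
    if (n < 24) throw new Error(filePath + ': too short for a JAM header');
  } finally {
    fs.closeSync(fd);
  }
  if (buf.toString('latin1', 0, 4) !== 'JAM\0') {
    throw new Error(filePath + ': no JAM signature');
  }
  return {
    datecreated: buf.readUInt32LE(4),
    modcounter: buf.readUInt32LE(8),
    activemsgs: buf.readUInt32LE(12), // number of active messages
    passwordcrc: buf.readUInt32LE(16),
    basemsgnum: buf.readUInt32LE(20)
  };
}
```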

I have put the code of my module (a JAM header reader) on GitHub under the free MIT license.

Source: https://habr.com/ru/post/144268/

