
[Go] [JS] Once again on processing MARC formats

Hello. I have already written two articles about the MARC formats (both on Geektimes).

Today's article covers the technical details. I cleaned up the code of my solution, removed the magic from it, and even tidied it up a little.

Under the cut: the friendship of Go and JS, and a hatred of MARC formats.


So, let's start with the "core": a package for working with MARC formats. It is written in Go and has 63% test coverage.
https://github.com/t0pep0/marc21

The "head" of the whole package is the MarcRecord structure:

    type MarcRecord struct {
        Leader         *Leader
        directory      []*directory
        VariableFields []*VariableField
    }


Only two entry points work with it, the ReadRecord function and the Write method:

    func ReadRecord(r io.Reader) (record *MarcRecord, err error)
    func (mr *MarcRecord) Write(w io.Writer) (err error)


To be honest, I see no point in dwelling on them. The only thing worth noting is that ReadRecord returns err == io.EOF when it reaches the end of the Reader.

Moving on, we are interested in the Leader and VariableField structures, and in why VariableFields is a slice rather than a hash map. The reason is that, contrary to all sorts of standards and to common sense, a record can contain two fields with the same tag but different content. Looking ahead, the same is true for SubField.

    type Leader struct {
        length               int
        Status               byte
        Type                 byte
        BibLevel             byte
        ControlType          byte
        CharacterEncoding    byte
        IndicatorCount       byte
        SubfieldCodeCount    byte
        baseAddress          int
        EncodingLevel        byte
        CatalogingForm       byte
        MultipartLevel       byte
        LengthOFFieldPort    byte
        StartCharPos         byte
        LengthImplemenDefine byte
        Undefine             byte
    }


The Leader structure is, frankly, nothing interesting: it is just a set of flags, and the unexported fields (length and baseAddress) are used only for serialization and deserialization. Two methods are attached to it, one for serialization and one for deserialization, and they are called from ReadRecord and Write (the same is true for the other structures).
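To make the role of length and baseAddress more concrete, here is a minimal sketch of decoding a leader. It assumes the usual 24-byte MARC 21 leader layout (record length in positions 00-04, base address of data in positions 12-16). The leaderInfo type and the parseLeader function are hypothetical names made up for this illustration; this is not the package's own deserialization code.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// leaderInfo is a hypothetical, trimmed-down view of a leader.
// It is not the Leader type from the marc21 package.
type leaderInfo struct {
	Length      int  // positions 00-04: total record length
	Status      byte // position 05
	Type        byte // position 06
	BibLevel    byte // position 07
	BaseAddress int  // positions 12-16: offset where the field data starts
}

// parseLeader decodes the numeric parts and a few flags of a 24-byte leader.
func parseLeader(raw []byte) (leaderInfo, error) {
	if len(raw) != 24 {
		return leaderInfo{}, errors.New("leader must be exactly 24 bytes")
	}
	length, err := strconv.Atoi(string(raw[0:5]))
	if err != nil {
		return leaderInfo{}, fmt.Errorf("record length: %w", err)
	}
	base, err := strconv.Atoi(string(raw[12:17]))
	if err != nil {
		return leaderInfo{}, fmt.Errorf("base address: %w", err)
	}
	return leaderInfo{
		Length:      length,
		Status:      raw[5],
		Type:        raw[6],
		BibLevel:    raw[7],
		BaseAddress: base,
	}, nil
}

func main() {
	// A made-up leader used only for demonstration.
	l, err := parseLeader([]byte("00714cam a2200205 a 4500"))
	if err != nil {
		panic(err)
	}
	fmt.Println(l.Length, string(l.Status), string(l.Type), string(l.BibLevel), l.BaseAddress)
}
```

The two integers are the only values a reader needs in order to locate the directory and the field data, which would explain why a package can keep them private.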

    type VariableField struct {
        Tag           string
        HasIndicators bool
        Indicators    []byte
        RawData       []byte
        Subfields     []*SubField
    }


This is the structure of a "variable field". I want to point out a few interesting details. Tags are three characters long. RawData could have been a string, but I personally found a byte slice more convenient to work with. On serialization, if the field has no subfields (len(Subfields) == 0), RawData is written; otherwise RawData is ignored.
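Since one tag can occur several times, looking a field up means scanning the slice and collecting every match. A minimal sketch, assuming a trimmed copy of the structure; the fieldsByTag helper is hypothetical and is not part of the package:

```go
package main

import "fmt"

// VariableField copies only the parts of the article's structure needed here.
type VariableField struct {
	Tag     string
	RawData []byte
}

// fieldsByTag is a hypothetical helper: it returns every field that carries
// the given tag, preserving the order in which the fields appear.
func fieldsByTag(fields []*VariableField, tag string) []*VariableField {
	var found []*VariableField
	for _, f := range fields {
		if f.Tag == tag {
			found = append(found, f)
		}
	}
	return found
}

func main() {
	// Made-up data: the tag 650 appears twice with different content.
	fields := []*VariableField{
		{Tag: "650", RawData: []byte("Libraries")},
		{Tag: "245", RawData: []byte("A title")},
		{Tag: "650", RawData: []byte("Cataloging")},
	}
	for _, f := range fieldsByTag(fields, "650") {
		fmt.Printf("%s: %s\n", f.Tag, f.RawData)
	}
}
```

A map keyed by tag would silently keep only one of the two 650 fields, which is exactly the situation the slice avoids.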

    type SubField struct {
        Name string
        Data []byte
    }


Name: a single character.
Data: again, a string could have been used here, but I decided on a byte slice.
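The serialization rule for VariableField (RawData is written only when there are no subfields) can be sketched roughly as follows. This is an illustration, not the package's code: the fieldBody helper is hypothetical, the delimiter bytes follow the ISO 2709 convention (0x1F before each subfield, 0x1E at the end of a field), and writing the indicators before the data is my assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	subfieldDelimiter = 0x1F // ISO 2709 subfield delimiter
	fieldTerminator   = 0x1E // ISO 2709 field terminator
)

// SubField and VariableField repeat the structures from the article.
type SubField struct {
	Name string
	Data []byte
}

type VariableField struct {
	Tag           string
	HasIndicators bool
	Indicators    []byte
	RawData       []byte
	Subfields     []*SubField
}

// fieldBody is a hypothetical helper that builds the bytes of one field
// (without its directory entry).
func fieldBody(f *VariableField) []byte {
	var buf bytes.Buffer
	if f.HasIndicators {
		buf.Write(f.Indicators) // assumption: indicators precede the data
	}
	if len(f.Subfields) == 0 {
		// No subfields: the raw data is written as is.
		buf.Write(f.RawData)
	} else {
		// Subfields present: RawData is ignored completely.
		for _, sf := range f.Subfields {
			buf.WriteByte(subfieldDelimiter)
			buf.WriteString(sf.Name)
			buf.Write(sf.Data)
		}
	}
	buf.WriteByte(fieldTerminator)
	return buf.Bytes()
}

func main() {
	control := &VariableField{Tag: "001", RawData: []byte("12345")}
	title := &VariableField{
		Tag:           "245",
		HasIndicators: true,
		Indicators:    []byte("10"),
		RawData:       []byte("ignored"),
		Subfields:     []*SubField{{Name: "a", Data: []byte("A title")}},
	}
	fmt.Printf("%q\n%q\n", fieldBody(control), fieldBody(title))
}
```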

The package has no special subtleties. I will say just one thing right away: before adding a field, make sure it contains at least something besides the tag. Otherwise you risk spending a lot of time pondering lofty matters and trying to understand why the export to OPAC/IRBIS fails.
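One way to protect yourself from tag-only fields is a small guard in your own code. A minimal sketch, assuming trimmed copies of the structures; the addField function and the errEmptyField value are hypothetical names, the package itself offers nothing like this as far as the article describes:

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed copies of the article's structures, enough for the check.
type SubField struct {
	Name string
	Data []byte
}

type VariableField struct {
	Tag       string
	RawData   []byte
	Subfields []*SubField
}

type MarcRecord struct {
	VariableFields []*VariableField
}

// errEmptyField is a hypothetical error value for fields that carry only a tag.
var errEmptyField = errors.New("field has a tag but no data")

// addField is a hypothetical guard: it appends the field only when the field
// has raw data or at least one subfield.
func addField(rec *MarcRecord, f *VariableField) error {
	if len(f.RawData) == 0 && len(f.Subfields) == 0 {
		return errEmptyField
	}
	rec.VariableFields = append(rec.VariableFields, f)
	return nil
}

func main() {
	rec := &MarcRecord{}
	fmt.Println(addField(rec, &VariableField{Tag: "245"}))
	fmt.Println(addField(rec, &VariableField{Tag: "001", RawData: []byte("12345")}))
	fmt.Println(len(rec.VariableFields))
}
```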

Here is an example that does not change the data; it simply copies the records from one file to another:

    package main

    import (
        "io"
        "os"

        "github.com/t0pep0/marc21"
    )

    func main() {
        // Usage: <program> <source file> <result file>
        origFile, err := os.Open(os.Args[1])
        if err != nil {
            panic(err)
        }
        defer origFile.Close()

        resultFile, err := os.Create(os.Args[2])
        if err != nil {
            panic(err)
        }
        defer resultFile.Close()

        for {
            rec, err := marc21.ReadRecord(origFile)
            if err != nil {
                if err == io.EOF {
                    break // end of the input file
                }
                panic(err)
            }
            // ... do something with the record here ...
            if err = rec.Write(resultFile); err != nil {
                panic(err)
            }
        }
    }


Now let's move on to https://github.com/HerzenLibRu/BatchMarc

In essence, it is the JS interpreter https://github.com/robertkrimen/otto/ with the library described above plugged into it.

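    func main() {
        // Usage: <program> <MARC file> <output file> <JS rules file>
        marcFile, err := os.Open(os.Args[1])
        if err != nil {
            panic(err)
        }
        outFile, err := os.Create(os.Args[2])
        if err != nil {
            panic(err)
        }
        jsFile, err := os.Open(os.Args[3])
        if err != nil {
            panic(err)
        }
        jsBytes, err := ioutil.ReadAll(jsFile)
        if err != nil {
            panic(err)
        }
        jsRules := string(jsBytes)

        for {
            rec, err := marc21.ReadRecord(marcFile)
            if err != nil {
                if err == io.EOF {
                    break
                }
                panic(err)
            }
            if rec == nil {
                break
            }
            // Each record gets a fresh destination record and a fresh JS machine.
            res := new(marc21.MarcRecord)
            js := NewJSMachine(rec, res)
            if err = js.Run(jsRules); err != nil {
                panic(err)
            }
            if err = res.Write(outFile); err != nil {
                panic(err)
            }
        }
    }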


The only difference from the previous code is that here we also open a file with the JS rules, create a JS machine for each record, and run the rules in it.

Let's take a closer look at the js machine and its constructor.

    type jsMachine struct {
        otto        *otto.Otto
        source      *marc21.MarcRecord
        destination *marc21.MarcRecord
    }

    func NewJSMachine(source, destination *marc21.MarcRecord) (js *jsMachine) {
        js = new(jsMachine)
        js.otto = otto.New()
        js.otto.Run(classJS)
        js.otto.Set("LoadSource", js.fillSource)
        js.otto.Set("WriteResult", js.getResult)
        js.source = source
        js.destination = destination
        return js
    }

    func (js *jsMachine) Run(src string) (err error) {
        _, err = js.otto.Run(src)
        if err != nil {
            return err
        }
        return nil
    }


As you can see, everything is simple and mundane; embedding is deliberately not used.

Two functions are added on top of the standard otto distribution, LoadSource and WriteResult, along with class constructors (MarcRecord, Leader, VariableField, VariableSubField).

I will not describe the implementations of these functions in detail, but I will draw attention to one interesting point. otto has an Object type, to which any JS variable can be converted. The Object type has a Call method (alongside the Set and Get methods) that lets you call a method of the variable. Here is the catch: Object.Call cannot call a method on a nested object.
    source := call.Argument(0)
    if !source.IsObject() {
        return otto.FalseValue()
    }
    object := source.Object()

    // This works: fetch the nested object first, then call its method.
    jsValue, _ := object.Get("VariableField")
    jsVariableFields := jsValue.Object()
    jsValue, _ = jsVariableFields.Call("length")

    // This does not work: Call is given a dotted path to a nested method.
    jsValue, _ = object.Call("VariableField.length")

Notably, otto complains about a type error in this case, which is why the right solution took a long time to occur to me.

A few words about the JS side. There are no artificially pre-created variables: simply create an instance with the MarcRecord constructor and load it with LoadSource(instance). To hand the changes back to Go, call WriteResult(instance) at the end of the script.

Pull requests and issues are welcome.

Source: https://habr.com/ru/post/303782/

