
Reverse engineering a binary format, using Korg SNG files as an example. Part 2



In the previous article I described the line of reasoning used when parsing an unknown binary data format. Using the Synalyze It! hex editor, I showed how to parse the header of a binary file and pick out the main data blocks. Since in the case of the SNG format these blocks form a hierarchical structure, I was able to use recursion in the grammar to automatically build a human-readable tree representation of them.

In this article I will describe a similar approach, which I used to analyze the musical data itself. Using the built-in capabilities of the hex editor, I will create a prototype converter of that data into the widespread and simple MIDI format. We will run into a number of pitfalls and rack our brains over the seemingly simple task of converting time values. Finally, I will explain how the results and the binary file grammar can be used to generate part of the code of the future converter.

Parsing music data


So, it's time to figure out how music data is stored in .SNG files. I partly touched on this in the previous article. The synthesizer documentation states that an SNG file can contain up to 128 "songs", each consisting of 16 tracks plus one master track (for recording global events and master effect changes). Unlike the MIDI format, where musical events simply follow one another separated by time deltas, the SNG format groups events into musical measures.
A measure (bar) is a kind of container for a sequence of notes. The time signature of a measure is indicated in musical notation. For example, 4/4 means the measure contains 4 beats, each equal in duration to a quarter note. Simply put, such a measure holds 4 quarter notes, or 2 half notes, or 8 eighth notes.

Here's what it looks like in a musical score.


Measures in the SNG file are used to edit tracks in the synthesizer's built-in sequencer. Using the menu, you can delete, add, and duplicate measures anywhere in a track. You can also loop measures or change their time signatures. Finally, you can simply start recording a track from any measure.

Let's see how all this is stored in the binary file. The common container for "songs" is the SGS1 block. Each song's data is stored in an SDT1 block:



The SPR1 and BMT1 blocks store general song settings (tempo, metronome settings) and individual track settings (patches, effects, arpeggiator parameters, and so on). We are interested in the TRK1 block: that is where the musical events live. But we need to descend a couple more levels of the hierarchy, to the MTK1 block.



Finally, we have found our tracks: these are the MTE1 blocks. Let's record an empty short track on the synthesizer, plus another slightly longer one, to understand how information about measures is stored in binary form.



It seems that measures are stored as eight-byte structures. Let's add a couple of notes:



So, we can assume that all events are stored in the same form. The beginning of the MTE1 block contains as-yet-unknown information, and from there to the end runs a sequence of eight-byte structures. Let's open the grammar editor and create an event structure 8 bytes in size.

Add an mte1Chunk structure that inherits childChunk, and place a reference to event in its data structure. Specify that event can repeat an unlimited number of times. Next, experiments reveal the size and purpose of the few bytes preceding the track's event stream. Here is what I got:



The beginning of the MTE1 block stores the number of events in the track, the track's number, and, presumably, the size of an event. After applying the grammar, the block looks like this:
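To make sure the layout is understood correctly, here is a minimal Python sketch of slicing an MTE1 payload into records. The field order and widths in the header are my guesses from the experiments above, and all the names are mine:

import struct

def parse_mte1(payload):
    # Assumed header: event count (UInt16 LE), track number, event size.
    # Adjust the format string if the real widths turn out different.
    event_count, track_number, event_size = struct.unpack_from("<HBB", payload, 0)
    header_size = 4  # assumed: everything after it is the event stream
    return event_count, track_number, [
        payload[off:off + event_size]
        for off in range(header_size, len(payload), event_size)
    ]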



Now for the event stream itself. After analyzing several files with different note sequences, the following picture emerges:
#   Event type     Binary representation
1   Measure 1      01 00 00 ...
2   Note           09 00 3C ...
3   Note           09 00 3C ...
4   Note           09 00 3C ...
5   Measure 2      01 C3 90 ...
6   Note           09 00 3C ...
7   End of track   03 88 70 ...

It looks like the first byte encodes the event type. Add a type field to the event structure. Create two more structures inheriting from event: measure and note, and specify the corresponding fixed values for each of them. Finally, add references to these structures to the data of the mte1Chunk block.
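Judging by the table above, the first byte distinguishes the event kinds. A tiny hedged sketch of that dispatch (the constant names are mine):

# Type codes observed so far: 0x01 measure, 0x09 note, 0x03 end of track.
EVENT_KINDS = {0x01: "measure", 0x09: "note", 0x03: "end of track"}

def classify(event_bytes):
    # The first byte of an 8-byte record selects the event type.
    return EVENT_KINDS.get(event_bytes[0], "unknown")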



Apply the changes:



We have made good progress. It remains to understand how a note's pitch and velocity (key press force) are encoded, as well as each event's time offset relative to the others. Let's again compare our files with the result of the MIDI export made through the synthesizer's menu. This time we are specifically interested in the note-on events.



The same events in the SNG file


Great! It seems that pitch and velocity are encoded with just a couple of bytes, the same way as in the MIDI format. Add the corresponding fields to the grammar.

With the time offset, unfortunately, things are not so simple.

Dealing with duration and delta


In the MIDI format, NoteOn and NoteOff are separate events. The duration of a note is determined by the time delta between them. In the SNG format, where there is no analogue of the NoteOff event, the duration and time delta values must be stored in the same structure.
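For illustration, this is what a single note looks like on the MIDI side with the python-midi toolkit used later in this article; a sketch assuming 480 ticks per quarter note (see below), where the duration exists only as the tick delta before the matching NoteOff:

import midi

track = midi.Track()
# Middle C (pitch 60), velocity 100, starting immediately.
track.append(midi.NoteOnEvent(tick=0, pitch=60, velocity=100))
# The quarter-note duration is encoded purely as the NoteOff delta.
track.append(midi.NoteOffEvent(tick=480, pitch=60))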

To find out exactly how they are stored, I recorded several sequences of notes of different lengths on the synthesizer.



Obviously, the data we need is in the last 4 bytes of the event structure. No patterns are visible to the naked eye, so let's select the bytes of interest in the editor and use the Data Panel tool.



Apparently, both the note's duration and its time offset are each encoded by a pair of bytes (UInt16) in little-endian byte order. After comparing a sufficient amount of data, I found out that the time delta here is counted not from the previous event, as in MIDI, but from the start of the measure. If a note ends in the next measure, its duration in the current one is 0x7FFF, and in the next measure it is repeated with a delta of 0x7FFF and a duration counted from the start of the new measure. Accordingly, if a note sounds across several measures, then in every intermediate measure both the duration and the delta equal 0x7FFF.
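Putting these observations together, a note record can be unpacked roughly like this. Only the trailing pair of little-endian UInt16 values and the 0x7FFF sentinel are confirmed by the experiments; the exact position of the other bytes is my assumption:

import struct

SPAN_SENTINEL = 0x7FFF  # the note continues beyond the current measure

def decode_note(event_bytes):
    # Assumed layout: type, unknown byte, pitch, velocity, then
    # delta (UInt16 LE, from the start of the measure) and duration (UInt16 LE).
    _, _, pitch, velocity, delta, duration = struct.unpack("<BBBBHH", event_bytes)
    spans = delta == SPAN_SENTINEL or duration == SPAN_SENTINEL
    return pitch, velocity, delta, duration, spans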

A small diagram

Time delta and duration values are shown in the cells. Note 1 sounds normally, while note 2 continues sounding through the 2nd and 3rd measures.

In my opinion, this all looks like a bit of a hack. On the other hand, musical notation denotes notes that sound continuously across several measures in a similar way, using ties (legato).

In what units is the duration measured? As in MIDI, ticks are used here. From the documentation it is known that one beat lasts 480 ticks. At a tempo of 100 beats per minute and a 4/4 time signature, a quarter note lasts 60/100 = 0.6 seconds. Accordingly, one tick lasts 0.6 / 480 = 0.00125 seconds. A standard 4/4 measure is 4 * 480 = 1920 ticks long, or 2.4 seconds at 100 beats per minute.
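The same arithmetic in code:

TICKS_PER_BEAT = 480  # from the synthesizer documentation

def tick_seconds(bpm):
    # One beat lasts 60/bpm seconds and contains 480 ticks.
    return 60.0 / bpm / TICKS_PER_BEAT

# At 100 bpm: a tick is 0.00125 s, and a 4/4 measure (4 * 480 = 1920 ticks) lasts 2.4 s.
assert abs(tick_seconds(100) - 0.00125) < 1e-9
assert abs(1920 * tick_seconds(100) - 2.4) < 1e-9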

All this will come in handy later. For now, add the duration and delta fields to our note structure. Note also that the measure structure has a field storing its number of events, and another field containing the measure's ordinal number; add both to the measure structure.



Prototype Converter


Now we have enough information to attempt converting the data. The Pro version of the Synalyze It! hex editor lets you write scripts in Python or Lua. When creating a script, you need to decide what to work with: the grammar itself, individual files on disk, or the parsed data. Unfortunately, each of the script templates has its limitations. The program provides a number of classes and methods, but not all of them are accessible from every template. Perhaps it is just a gap in the documentation, but I did not find a way to load a grammar for a list of files, parse them, and use the resulting structures to export data.

Therefore, we will create a script that works with the parse result of the current file. This template implements three methods: init, terminate, and processResult. The latter is called automatically and recursively walks all the structures and data obtained during parsing.

We use the python-midi toolkit (https://github.com/vishnubob/python-midi) to write the converted data to a MIDI file. Since this is a proof of concept, we will not convert note durations and deltas; instead, we set fixed values. Notes with a duration of 0x7FFF, or with such a delta, are simply discarded for now.
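Here is a minimal sketch of that proof-of-concept logic with the Synalyze It! plumbing stripped away (the full script is in the gist below); it assumes the note tuples have already been pulled out of the parse result:

import midi  # https://github.com/vishnubob/python-midi

FIXED_STEP = 240  # arbitrary fixed delta/duration in ticks
SPAN_SENTINEL = 0x7FFF

def notes_to_midi(notes, path):
    # notes: iterable of (pitch, velocity, delta, duration) tuples.
    pattern = midi.Pattern(resolution=480)
    track = midi.Track()
    pattern.append(track)
    for pitch, velocity, delta, duration in notes:
        if delta == SPAN_SENTINEL or duration == SPAN_SENTINEL:
            continue  # discard measure-spanning notes for now
        # Real timing is ignored: every note gets the same fixed length.
        track.append(midi.NoteOnEvent(tick=0, pitch=pitch, velocity=velocity))
        track.append(midi.NoteOffEvent(tick=FIXED_STEP, pitch=pitch))
    track.append(midi.EndOfTrackEvent(tick=1))
    midi.write_midifile(path, pattern)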

The capabilities of the built-in script editor are very limited, so all the code will have to be placed in one file.

gist.github.com/bkotov/71d7dfafebfe775616c4bd17d6ddfe7b

So, let's try to convert a file and hear what we get.


Hmm... It actually turned out rather interesting. The first thing that came to mind when I tried to formulate what it sounds like was "structureless music". Let me attempt a definition:

Structureless music is a musical piece with a reduced structure, built on harmony. The durations of notes and the intervals between them are abolished or reduced to identical values.

A sort of harmonious noise. Let it be pearlescent noise (by analogy with white, blue, red, pink, and so on), since no one seems to have taken this color yet.

Perhaps it would be worth training a neural network on my data; the result might be interesting.

Warming up the mind


This is all great, but the main task is still unsolved. We need to convert note durations into NoteOff events, and the offset of each event from the start of its measure into time deltas between neighboring events. Let me state the problem more formally.

Task

Given: a stream of events of the form

Measure 1
    Note 1
    Note 2
    Note 3
    ...
    Note N
Measure 2
    ...
Measure N
    Note 1
    ...

Measure:
    Type: 1
    Duration: 1920
    Number: Int
    Event count: Int

Note:
    Type: 9
    Pitch: 0-127
    Velocity: 0-127
    Delta: 0-1920 or 0xFF
    Duration: 0-1920 or 0xFF

If a note continues into the next measure, its duration in the current measure is the sentinel 0xFF, and in the next measure the note is repeated with delta = 0xFF and the remaining duration. If a note occupies a whole intermediate measure, then delta = duration = 0xFF.

Required: convert this stream into MIDI events of the form:

NoteOn:
    Type: 9
    Pitch: 0-127
    Velocity: 0-127
    Delta: Int

NoteOff:
    Type: 8
    Pitch: 0-127
    Velocity: 0-127
    Delta: Int
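The same input and output structures expressed as Python dataclasses, for those who prefer reading code (all names are mine):

from dataclasses import dataclass

@dataclass
class Measure:            # input event, type 1
    number: int           # ordinal number of the measure
    event_count: int      # how many events the measure contains
    duration: int = 1920  # measure length in ticks

@dataclass
class Note:               # input event, type 9
    pitch: int            # 0-127
    velocity: int         # 0-127
    delta: int            # ticks from the start of the measure, or the sentinel
    duration: int         # ticks, or the sentinel

@dataclass
class NoteOn:             # output MIDI event, type 9
    pitch: int
    velocity: int
    delta: int            # ticks from the previous event

@dataclass
class NoteOff:            # output MIDI event, type 8
    pitch: int
    velocity: int
    delta: int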


The task is somewhat simplified here. In a real SNG file, each measure can have its own time signature. And besides Note On/Off events, the stream also contains other events, such as sustain pedal presses or pitch changes via pitchBend.

I will present my solution to this problem in the next article (if there is one).

Interim results


Since the script-based solution does not scale to an arbitrary number of files, I decided to write a console converter in Swift. Since I was writing a two-way converter, the grammar structures I had created would be useful in the code. You can export them as C structures, or to any other language, using the same scripting functionality built into Synalyze It! A file with an example of such an export is created automatically when you select the Grammar script template.



At the moment, the converter is 99% complete (in a form that satisfies me functionally). I plan to publish the code and the grammar on GitHub.

The example for the sake of which all this was started can be heard here.

And here is how this fragment sounds in its finished form.

Source: https://habr.com/ru/post/442740/

