In search of the fat (The Quest For FAT)

When developing a certain software and hardware complex, it was necessary to create a client device, which for other devices should look like a regular USB flash drive, or if more formally, then a USB Mass Storage Device. The device’s unusualness is that it must imitate for the outside world the FAT file system with files of sufficiently large size (2GB and more), while the files themselves on the device are, of course, absent and are on the network. And in general these are not files, but some audio streams.

The task, at first glance, is simple: for every request to read a block (a SCSI command), we return the contents of this block. The block may either belong to any of the “files” or contain the FAT service information.

The first thought was, of course, to pack the FAT image using, for example, bzip2 and unpack it on the device as needed. There are three problems at once:

A compressed file system image of the heel of 100-megabyte files and several directories takes up something around 125Kb, which exceeds the amount of RAM on the device.
Bzip2, apparently, does not know anything about the sectors and works at the level of flows, therefore, it will have to be packed sector-by-sector, hence the image size will be quite outrageous: as you know, packing in small blocks is unprofitable - the overhead is large compared to the compression gain.
Preparing an image (mounting an empty file, creating a file system, packing) takes a lot of time and is hardly acceptable for implementation on a server with thousands of users. And the image itself in the process of creation requires some rather noticeable disk space.

')
Well, this is not to mention generally that you have to port bzip2 to the microcontroller.

Thus, it was necessary to invent something else.

The task can be put as follows: it is necessary to write a code that will take some kind of file system description as an input, and return its contents to each request for a sector number. The content is either service information or file data, which is taken from the corresponding audio stream at a given URL.

This formulation of the question leads us to the system of rules:

=>

Note that we are talking about sectors, not “clusters”, since a cluster is a concept of the FAT file system itself. The devices operate at the block level, they are sectors. Suppose our “playlist” contains 10 “files” of 2Gb each (2Gb is a practical approach to infinity). If each rule has a size of one byte, which, of course, is impossible, then we succeed.

2*1024*1024*1024 * 10 / 512 = 41 943 040

bytes to all the rules. Somewhat more sensible. But, of course, the rules are not unique to each sector. We will set the rules for the ranges of sectors. This leads us to a set of rules:

(A) =>
(A,B) =>

Let's try to pack also the sectors themselves. Since data compression tasks are not in front of us - the data itself is missing in the device and taken from the Web, then you just need to somehow more or less compactly present the service data of the file system itself. At first glance, there are many duplicate sequences in this data, so we will encode as follows: repeat sequences are presented as

( RLE, , )

The non-repeating sequences will be represented as

( Sequence, )

In addition, those sequences that we have already encoded, or parts of them would be good not to re-insert, but to refer to them. We probably have another sequence.

( , )

Perhaps in the implementation process, other sequences may appear for a more compact representation of the file system structures.

All this is very similar to the command system of a virtual machine, and since there are challenges, that is, the stack. The simplest known virtual machine is one of the varieties of the fort. In fact, this is the reverse.
Polish entry on steroids, with the added stack of return addresses from calls, which eliminates the need to tinker with the organization of frame functions: everything is extremely simple - when returning from a call, we remove
the upper word from the stack R and go to the address to which it points.

In addition, the token threaded code (and this will be it) for the two-stack machine has a very good density, which, in this case, is very convenient for us.

Interpreting such a code quickly, it turns out only an average of five times slower than the native code, and also very simple.

So, we have some kind of coding system, rule system and some virtual machine on which these rules should be played.

It remains to generate these rules from some description, get bytecode and implement the machine for its interpretation. And only then we will see what happened.

With the implementation of a virtual machine, the situation is simple: it will work on a microcontroller, respectively, so far there are no C options here. True, it is possible that there is nothing to write there - it will turn out to be generated by some
in a way.

It remains the generation of the system of rules from the description, the description itself, the generation of the code and the description of the commands of this code. In addition, it would be good to check the rules not consistently, but somehow more rationally: organize checks in the form of
tree comparisons, so that the number of comparisons for each sector was of the order of the binary logarithm of the number of comparisons.

We finished with the primary analysis, we should make a prototype and see what we can do.

We will need to generate and, possibly, read binary data of various dimensions and finiteness (FAT service data is written in low-endian format) and work with nested data structures.

What would it do? C, C ++ or maybe Python? Or Ruby? Joke.
Of course, we will do on Haskell: the task is not the easiest, some kind of performance is required, and we do not have much time. Well, anyway, the server that will call this code is also implemented on
Haskell, so the choice is quite natural.

Let's get started

The central thing in the system is the “rules”. The file system description is converted into them, code is generated from them. We describe them:

data Rule = REQ Int [Chunk] | RANGE Int Int [Chunk] deriving Show
data Chunk = SEQ BS.ByteString
| RLE Int Word8
deriving (Eq, Ord)

In addition, there is a description of the file system itself, which consists of directories and files, with some specific features of the FAT itself.

 data Entry = DirRoot Int [Entry] | DirDot Int | DirDotDot Int | Dir Int String [Entry] | File Int String Int BS.ByteString deriving (Eq, Ord, Data, Typeable, Show)

data Entry = DirRoot Int [Entry] | DirDot Int | DirDotDot Int | Dir Int String [Entry] | File Int String Int BS.ByteString deriving (Eq, Ord, Data, Typeable, Show)

Here we will stop in more detail. The strange constructors DirDor and DirDotDot are nothing more than directories '.' and '..', which are a surprise - are first-class, physically present directory entries. Fortunately they
they are only links and do not require cluster allocation.

Everything else is pretty obvious: the first attribute of type constructors is a unique identifier. He obviously can be useful to us in order to understand the firmware from which "file" the data was requested.

The second attribute is the file name. In the case of a file, add the same size and data. This, of course, is not the data of the file itself, but some indication of the device firmware, where this data is taken from. There you can write, for example, sishnuyu structure or URL stream. Therefore ByteString.

Now we need to somehow construct the Entry, taking into account the requirements of the file system: each directory, except the root directory, must contain the entries '.' and '..', they must refer to their respective directories; there should be no
identical record names, titles should not contain prohibited characters, and so on and so forth. It can be seen that it is hard to create this structure manually, besides, the user of the API must do this, and he will surely confuse something and everything will break, but the matter is serious. So it’s better to ban the import of Entry type content from our module, and provide the user with some more convenient and error-proof solution. Something like:

 fileContents = ... fatSample2 = filesystem $ do file "file0" (16384) fileContents dir "A" $ do file "file1" (megs 100) fileContents dir "C" $ do file "file3" (megs 100) fileContents file "file4" (megs 100) fileContents file "file5" (megs 100) fileContents dir "E" $ emptyDir dir "B" $ do file "file2" (megs 50) emptyFile

fileContents = ... fatSample2 = filesystem $ do file "file0" (16384) fileContents dir "A" $ do file "file1" (megs 100) fileContents dir "C" $ do file "file3" (megs 100) fileContents file "file4" (megs 100) fileContents file "file5" (megs 100) fileContents dir "E" $ emptyDir dir "B" $ do file "file2" (megs 50) emptyFile

It looks good, even those who do not know the language can understand what is being described here.
It is easy to implement: to generate something horrible, there is already a ready-made Writer monad.

In addition, we will need to distribute unique identifiers, so State is also useful, where we will put some kind of counter. Since we want to cross the State and Writer, we will not interfere with the monad transformer. Something like this:

 newtype EntryIdT ma = EntryIdT { runF :: (WriterT [Entry] (StateT (Int, Int) m)) a } deriving (Monad, MonadWriter [Entry], MonadState (Int, Int)) type EntryIdM = EntryIdT Identity runEntryIdM :: (Int, Int) -> EntryIdM a -> ([Entry], (Int, Int)) runEntryIdM init f = runState (execWriterT (runF f)) init filesystem :: EntryIdM () -> Entry filesystem f = DirRoot 0 dirs where dirs = fst $ runEntryIdM (1,0) f dir :: String -> EntryIdM () -> EntryIdM () file :: String -> Int -> (EntryInfo -> BS.ByteString) -> EntryIdM ()

newtype EntryIdT ma = EntryIdT { runF :: (WriterT [Entry] (StateT (Int, Int) m)) a } deriving (Monad, MonadWriter [Entry], MonadState (Int, Int)) type EntryIdM = EntryIdT Identity runEntryIdM :: (Int, Int) -> EntryIdM a -> ([Entry], (Int, Int)) runEntryIdM init f = runState (execWriterT (runF f)) init filesystem :: EntryIdM () -> Entry filesystem f = DirRoot 0 dirs where dirs = fst $ runEntryIdM (1,0) f dir :: String -> EntryIdM () -> EntryIdM () file :: String -> Int -> (EntryInfo -> BS.ByteString) -> EntryIdM ()

Each function accepts parameters like name, size and another monadic value for constructing nested records. Each such calculation will be launched in a separate Writer, and State will be pulled through a common one to guarantee the uniqueness of identifiers.

So, the directory structure was set, now we need to get some rules out of it.

To do this, you need to somehow place the data files and directories on the "disk".
We assume that they are placed sequentially, first the directories, then the files:

 data AllocEntry = AllocEntry { beginSect :: Int , endSect :: Int , entry :: Entry } deriving (Show) allocate :: ClustSize32 -> Int -> Entry -> [AllocEntry] allocate cl from = eFix . eAlloc . eOrder . filter eFilt . universe where eFilt (File _ _ _ _) = True eFilt (Dir _ _ _) = True eFilt (DirRoot _ _) = True eFilt _ = False eOrder = uncurry (++) . partition (not.isFile) eAlloc = reverse . snd . foldl fentry (from, []) fentry (n, xs) e = let sectors = entryLen cl e `div` fatSectLen begin = n end = begin + sectors - 1 n' = n + sectors allocated = AllocEntry begin end e in (n', allocated : xs) eFix = id

data AllocEntry = AllocEntry { beginSect :: Int , endSect :: Int , entry :: Entry } deriving (Show) allocate :: ClustSize32 -> Int -> Entry -> [AllocEntry] allocate cl from = eFix . eAlloc . eOrder . filter eFilt . universe where eFilt (File _ _ _ _) = True eFilt (Dir _ _ _) = True eFilt (DirRoot _ _) = True eFilt _ = False eOrder = uncurry (++) . partition (not.isFile) eAlloc = reverse . snd . foldl fentry (from, []) fentry (n, xs) e = let sectors = entryLen cl e `div` fatSectLen begin = n end = begin + sectors - 1 n' = n + sectors allocated = AllocEntry begin end e in (n', allocated : xs) eFix = id

The code as a whole is fairly obvious: take all the entries, remove the '.' and '..' which do not have their own clusters, but only point to others, do so that the directories go first, then the files (there is no difference, but it is more logical,
and even the table of contents of the volume will be read faster), we select sectors (it’s more convenient for us to work with sectors, “clusters” is an artificial concept) and that’s all.

It is worth noting the function of the universe module uniplate. It allows you to list all the elements of a nested structure as a list (optionally with a list comprehension), in order to avoid the routine writing of recursive walk functions.

For her sake, we declared the type Entry deriving (Data, Typeable) above.

Having the files placed by sectors, we already have nothing to generate rules for them:

 generateData :: Maybe CalendarTime -> ClustSize32 -> [AllocEntry] -> [Rule] generateData ct cl es = mergeRules $ execWriter $ do forM_ es $ \(AllocEntry {beginSect = a, endSect = b, entry = e}) -> do case e of DirRoot _ es -> writeEntries ab es Dir _ _ es -> writeEntries ab es File _ _ _ bs -> tell [RANGE ab (encodeBlock (BS.take (fatSectLen) bs))] where ...

generateData :: Maybe CalendarTime -> ClustSize32 -> [AllocEntry] -> [Rule] generateData ct cl es = mergeRules $ execWriter $ do forM_ es $ \(AllocEntry {beginSect = a, endSect = b, entry = e}) -> do case e of DirRoot _ es -> writeEntries ab es Dir _ _ es -> writeEntries ab es File _ _ _ bs -> tell [RANGE ab (encodeBlock (BS.take (fatSectLen) bs))] where ...

The encodeBlock function here is able to encode a ByteString into a sequence of rules, writeEntries generates directory entries and encodes them, and mergeRule attempts to merge sector ranges of successive rules.

Generating a single directory entry looks like this:

 entryRecordShort :: String -> Int -> Int -> Maybe CalendarTime -> [ATTR] -> BS.ByteString entryRecordShort nm size clust clk a = runPut $ do putNameASCII nm -- Name putWord8 (fatAttrB a) -- Attr putWord8 0 -- NTRes putWord8 0 -- CrtTimeTenth putWord16le cT -- CrtTime putWord16le cD -- CrtDate putWord16le cD -- LstAccDate putWord16le cHi -- FstClusHI putWord16le cT -- WrtTime putWord16le cD -- WrdDate putWord16le cLo -- FstClusLO putWord32le (fromIntegral size) -- FileSize where ...

entryRecordShort :: String -> Int -> Int -> Maybe CalendarTime -> [ATTR] -> BS.ByteString entryRecordShort nm size clust clk a = runPut $ do putNameASCII nm -- Name putWord8 (fatAttrB a) -- Attr putWord8 0 -- NTRes putWord8 0 -- CrtTimeTenth putWord16le cT -- CrtTime putWord16le cD -- CrtDate putWord16le cD -- LstAccDate putWord16le cHi -- FstClusHI putWord16le cT -- WrtTime putWord16le cD -- WrdDate putWord16le cLo -- FstClusLO putWord32le (fromIntegral size) -- FileSize where ...

It uses the extremely useful PutM monad from Data.Binary.Put, which allows you to output data of any capacity and finish to a lazy byte string.

So, we have the directory structure of the FAT volume, we have their placement by sector and the corresponding rules. What is left for us?

Here you need to step aside a bit and remember the device FAT. If you do not go into the extra details, widely available on the web and literature, then FAT32 is designed as follows:

     | BootSect | FAT32 Info | FAT1 | FAT2 | DATA |

So far we only have rules for DATA. FAT1 and FAT2 are cluster allocation tables. Each file or directory (which is also a file) occupies a chain of clusters in a data region, and each cluster of a data region is represented by a 32-bit value in FAT1 and FAT2 (they are identical).

The number of the next file cluster is recorded in each FAT cell, the last cluster is marked with a special value. The number of the first file cluster is indicated in the directory entry. Our data is placed sequentially, so that each cell of the chain will contain the number N + 1, where N is the previous value.

Here, the first problem arises: for our calculated 10 x 20Gb, this table will occupy as many as 655360 32-bit values, which again exceeds the available RAM. However, these rules cannot be compressed.
our primitive packaging algorithm RLE, since there are no duplicate values. However, since we were able to generate this sequence once, then we can probably generate it again on the device.

A close look at it showed that the values in one sector of the allocation table depend on the maximum value in the previous one, and in general the sequence is determined by the expression:

     Na = BASE + (Nsect - M) * STEP
     Ni <- [Na, Na + 1 ..]

where Na is the first value for this sector, Nsect is the number of the requested sector (will be at the top of the stack of our fort machine), M, BASE and STEP are constants calculated statically, Ni is the i-th number of the sequence, and only in the sector, obviously 512/4.

So we got a new sequence that generates a series of values based on dynamic data (sector number). Add types for this sequence and contiguous ones:

 data Chunk = SEQ BS.ByteString | RLE Int Word8 | SER Word32 Word32 | NSER Word32 Int Word32 -- base offset step | CALLBACK Word8 deriving (Eq, Ord)

data Chunk = SEQ BS.ByteString | RLE Int Word8 | SER Word32 Word32 | NSER Word32 Int Word32 -- base offset step | CALLBACK Word8 deriving (Eq, Ord)

Looking ahead, we will add another rule for callback, which should be called after generating the file data sector, so that the device firmware would take the buffer and fill it with real data.

It would have been possible to immediately generate the table as a set of rules, but for some reason I needed it in binary form, besides, there is already a debugged function for encoding binary strings, and in direct generation it is easy
make mistakes.

This table is quite large, and in the case of a large data area and a small cluster size, poor Haskell is tight.

At some point, the application became very bad from a large lazy Word32 list, so I had to quickly rewrite to lazy byte strings and use runPut / runGet to put 32-bit values there and extract them from there.

Surprisingly, this led to an acceleration of about ten times and everything began to work at an acceptable speed, although, of course, you should rewrite it in such a way that it would immediately generate rules and not create data.
But for the concept goes well.

The functions of generating the table and the rules for it are omitted; they are rather large, but quite obvious:

 type ClusterTable = BS.ByteString genFAT :: ClustSize32 -> Int -> [AllocEntry] -> ClusterTable encodeFAT :: Int -> ClusterTable -> [Rule]

type ClusterTable = BS.ByteString genFAT :: ClustSize32 -> Int -> [AllocEntry] -> ClusterTable encodeFAT :: Int -> ClusterTable -> [Rule]

The coding function of the table first assigns one sector REQ a (NSER _ _ _) to each sector, then considers the sectors in pairs, and if two sectors form a common sequence of values, the rule for the sector is replaced by the rule for a range of sectors, the result is quite compact for so you can bring it here:

 REQ 32 [SEQ [F8], RLE 2 255, SEQ [0F], RLE 3 255, SEQ [0F],
         RLE 3 255, SEQ [0F], RLE 3 255, SEQ [0F], RLE 3 255,
         SEQ [0F], RLE 3 255, SEQ [0F], RLE 3 255, SEQ [0F],
         SEQ [08], RLE 3 0, SEQ [09], RLE 3 0, SEQ [0A],
         RLE 3 0, RLE 3 255, SEQ [0F], SER 12 128]
 RANGE 33,231 [NSER 129 33,128]
 REQ 232 [SER 25601 25610, RLE 3 255, SEQ [0F], SER 25612 25728]
 RANGE 233 431 [NSER 25729 233 128]
 REQ 432 [SER 51201 51210, RLE 3 255, SEQ [0F], SER 51212 51328]
 RANGE 433 631 [NSER 51329 433 128]
 REQ 632 [SER 76801 76810, RLE 3 255, SEQ [0F], SER 76812 76928]
 RANGE 633 831 [NSER 76929 633 128]
 REQ 832 [SER 102401 102410, RLE 3 255, SEQ [0F], SER 102412 102528]
 RANGE 833 931 [NSER 102529 833 128]
 REQ 932 [SER 115201 115210, RLE 3 255, SEQ [0F], RLE 468 0]
 RANGE 933 1056 [RLE 512 0]

It looks promising, clearly better than a two megabyte piece of data.
The second copy of the table coincides up to constants, so that in the long run you can replace this sequence by subtracting the constant from the offset and calling the first table. But this later.

So, we have FAT1, FAT2 and DATA. It remains to get only BootSect and FAT32 Info. This is static binary data, so again we use Data.Binary.Put, and then we pack into the rules.

These two modules (Put and Get) are literally irreplaceable and I personally quote them higher than the binary patterns in Erlang, although this is subjective.

 fatGenBoot32 :: FAT32GenInfo -> BS.ByteString fatGenBoot32 info = addRsvd $ runPut $ do -- BOOT AREA sect0 putBytes [0xEB, 0x58, 0x90] -- 0 JmpBoot putBytes bsOEMName -- OEMName putWord16le bps -- BytesPerSec putWord8 spc -- SectPerClust putWord16le rsvd -- ReservedSecCnt putWord8 2 -- NumFATs putWord16le 0 -- RootEntCnt putWord16le 0 -- TotSec16 putWord8 0xF8 -- Media putWord16le 0 -- FAT16Sz putWord16le 0x3F -- SectPerTract putWord16le 0xFF -- NumHeads putWord32le 0 -- HiddSec putWord32le sectNum -- TotSec32 -- FAT32 Structure putWord32le fsect -- FATSz32 -- ... --

fatGenBoot32 :: FAT32GenInfo -> BS.ByteString fatGenBoot32 info = addRsvd $ runPut $ do -- BOOT AREA sect0 putBytes [0xEB, 0x58, 0x90] -- 0 JmpBoot putBytes bsOEMName -- OEMName putWord16le bps -- BytesPerSec putWord8 spc -- SectPerClust putWord16le rsvd -- ReservedSecCnt putWord8 2 -- NumFATs putWord16le 0 -- RootEntCnt putWord16le 0 -- TotSec16 putWord8 0xF8 -- Media putWord16le 0 -- FAT16Sz putWord16le 0x3F -- SectPerTract putWord16le 0xFF -- NumHeads putWord32le 0 -- HiddSec putWord32le sectNum -- TotSec32 -- FAT32 Structure putWord32le fsect -- FATSz32 -- ... --

We let our packer on the result, merge the rules into ranges, and get the final list of rules describing our entire file system.

So, we have a set of rules. It remains to generate for them a tree of comparisons and
compile it all into bytecode.

Let's start with the tree:

 data CmpTree = GEQ Int CmpTree CmpTree | CODE [Rule] deriving (Show) mkCmpTree :: [Rule] -> CmpTree mkCmpTree r = mkTree' rulemap where rulemap = M.fromList $ map (\x -> (fsect x, x)) r splitGeq nm = let (a, b, c) = M.splitLookup nm in (a, c `M.union` (maybe M.empty (M.singleton n) b)) mkTree' xs | M.null xs = CODE [] | M.size xs < 3 = CODE (map snd (M.toList xs)) | otherwise = let ks = map fst $ M.toAscList xs n = ks !! (length ks `div` 2) (le, geq) = splitGeq n xs in GEQ n (mkTree' le) (mkTree' geq)

data CmpTree = GEQ Int CmpTree CmpTree | CODE [Rule] deriving (Show) mkCmpTree :: [Rule] -> CmpTree mkCmpTree r = mkTree' rulemap where rulemap = M.fromList $ map (\x -> (fsect x, x)) r splitGeq nm = let (a, b, c) = M.splitLookup nm in (a, c `M.union` (maybe M.empty (M.singleton n) b)) mkTree' xs | M.null xs = CODE [] | M.size xs < 3 = CODE (map snd (M.toList xs)) | otherwise = let ks = map fst $ M.toAscList xs n = ks !! (length ks `div` 2) (le, geq) = splitGeq n xs in GEQ n (mkTree' le) (mkTree' geq)

Maybe not the best option, but the rules turned out less than a hundred, you can not worry yet.

The case for the virtual machine, a set of commands and a compiler:

 -  ,     -     class OpcodeCL a where isRLE :: a -> Bool arity0 :: a -> Bool arity1 :: a -> Bool arity2 :: a -> Bool arity3 :: a -> Bool firstCode :: a lastCode :: a data Opcode = DUP | DROP | CONST | CRNG | JNZ | JZ | JGQ | JNE | JMP | CALLT | CALL | RET | NOT | EQ | NEQ | GT | LE | GQ | LQ | RNG | LOADS2 | LOADS3 | LOADS4 | LOADS5 | LOADS6 | LOADS7 | LOADS8 | LOADS9 | LOADS10 | LOADSN | SER | NSER | NSER128 | RLE1 | RLE2 | RLE3 | RLE4 | RLE5 | RLE6 | RLE7 | RLE8 | RLE16 | RLE32 | RLE64 | RLE128 | RLE256 | RLE512 | RLEN | OUTLE | OUTBE | OUTB | NOP | CALLN | DEBUG | EXIT deriving (Eq, Ord, Enum, Show) data CmdArg = W32 Word32 | W16 Word16 | W8 Word8 | ADDR Addr data Addr = ALabel Label | AOffset Int data Cmd = Cmd0 Opcode | CmdConst Word32 | Cmd1 Opcode CmdArg | Cmd2 Opcode CmdArg CmdArg | Cmd3 Opcode CmdArg CmdArg CmdArg | CmdJmp Opcode Addr | CmdCondJmp Opcode Addr | CmdLabel Label | RawByte Word8 type Label = Int type Block = (Label, [Cmd])

- , - class OpcodeCL a where isRLE :: a -> Bool arity0 :: a -> Bool arity1 :: a -> Bool arity2 :: a -> Bool arity3 :: a -> Bool firstCode :: a lastCode :: a data Opcode = DUP | DROP | CONST | CRNG | JNZ | JZ | JGQ | JNE | JMP | CALLT | CALL | RET | NOT | EQ | NEQ | GT | LE | GQ | LQ | RNG | LOADS2 | LOADS3 | LOADS4 | LOADS5 | LOADS6 | LOADS7 | LOADS8 | LOADS9 | LOADS10 | LOADSN | SER | NSER | NSER128 | RLE1 | RLE2 | RLE3 | RLE4 | RLE5 | RLE6 | RLE7 | RLE8 | RLE16 | RLE32 | RLE64 | RLE128 | RLE256 | RLE512 | RLEN | OUTLE | OUTBE | OUTB | NOP | CALLN | DEBUG | EXIT deriving (Eq, Ord, Enum, Show) data CmdArg = W32 Word32 | W16 Word16 | W8 Word8 | ADDR Addr data Addr = ALabel Label | AOffset Int data Cmd = Cmd0 Opcode | CmdConst Word32 | Cmd1 Opcode CmdArg | Cmd2 Opcode CmdArg CmdArg | Cmd3 Opcode CmdArg CmdArg CmdArg | CmdJmp Opcode Addr | CmdCondJmp Opcode Addr | CmdLabel Label | RawByte Word8 type Label = Int type Block = (Label, [Cmd])

Alas, here a simple type system of Haskell begins to be missed: I want to set compile time invariants for commands and their classes, so that, for example, you cannot create a command with the wrong opcode. But you can’t just do it, but you don’t want to introduce a separate type for each opcode, an existential data type for the team and still use metaprogramming to generate opcodes.

Postpone until better times, let's do what we have. Anyway, in order to implement a virtual machine, you will have to write tests, so the errors planted will pop up there.

So, the virtual machine command system is there, now we need to compile into it a tree of comparisons built from our rules:

 mkVMCode :: CmpTree -> [Block] mkVMCode xs = normalize maxl code -- skip scanT :: CmpTree -> GenM () scanT (GEQ n left right) = do s <- newLabel l <- runGen' (scanT left) >>= withLabel r <- runGen' (scanT right) >>= withLabel _ex <- newLabel label s dup cnst n jgq (labelOf r) block l >> jmp _ex block r >> label _ex scanT (CODE []) = op0 EXIT scanT (CODE rules) = mapM_ scanR rules scanR :: Rule -> GenM () scanR ( REQ n code ) = do s <- newLabel code' <- runGen' (scanmC code) >>= withLabel ex <- newLabel label s dup cnst n jne ex block code' label ex scanR ( RANGE ab code ) = do s <- newLabel code' <- runGen' (scanmC code) >>= withLabel ex <- newLabel label s dup crng ab jz ex block code' label ex -- skip

mkVMCode :: CmpTree -> [Block] mkVMCode xs = normalize maxl code -- skip scanT :: CmpTree -> GenM () scanT (GEQ n left right) = do s <- newLabel l <- runGen' (scanT left) >>= withLabel r <- runGen' (scanT right) >>= withLabel _ex <- newLabel label s dup cnst n jgq (labelOf r) block l >> jmp _ex block r >> label _ex scanT (CODE []) = op0 EXIT scanT (CODE rules) = mapM_ scanR rules scanR :: Rule -> GenM () scanR ( REQ n code ) = do s <- newLabel code' <- runGen' (scanmC code) >>= withLabel ex <- newLabel label s dup cnst n jne ex block code' label ex scanR ( RANGE ab code ) = do s <- newLabel code' <- runGen' (scanmC code) >>= withLabel ex <- newLabel label s dup crng ab jz ex block code' label ex -- skip

This is the favorite way to generate all sorts of things using eDSL, built on top of the Writer monad.

The generation of a flat code from the comparison tree leads to a set of "snot", for example, to long chains of exit from the blocks:

 L1: ... JMP L2 L2: JMP L3 L3: JMP L4 L4: EXIT

L1: ... JMP L2 L2: JMP L3 L3: JMP L4 L4: EXIT

jumping into the next blocks and so on. normalize eliminates these outrages, and breaks the code into blocks, each of which begins with a label and ends with an unconditional command to go to the next block. There are no conditional or unconditional jump commands inside the block, they are valid only at the end. We need such blocks to calculate the label offset. After it, you can merge the blocks, getting rid of unnecessary transitions completely.

Let's write the Show instance for our baytkod for a beautiful print of the fort and see what we get after optimizing the blocks:

 ... L215: DUP CONST 2122 JGQ L220 DUP CRNG 00000843 00000849 JZ L235 RLE512 00 EXIT L220: DUP CRNG 0000084A 000C8869 JZ L223 LOADS2 BYTE 48 BYTE 45 RLE2 4C LOADS7 BYTE 4F BYTE 20 BYTE 57 BYTE 4F BYTE 52 BYTE 4C BYTE 44 RLE2 21 CALLN 00 EXIT L223: DUP CRNG 000C886A 000E1869 JZ L235 RLE512 00 CALLN 00 ;;       --- ,    EXIT ;;  L235: EXIT ... L0: LOADS5 BYTE 02 BYTE 08 BYTE 20 BYTE 00 BYTE 02 RET ...

... L215: DUP CONST 2122 JGQ L220 DUP CRNG 00000843 00000849 JZ L235 RLE512 00 EXIT L220: DUP CRNG 0000084A 000C8869 JZ L223 LOADS2 BYTE 48 BYTE 45 RLE2 4C LOADS7 BYTE 4F BYTE 20 BYTE 57 BYTE 4F BYTE 52 BYTE 4C BYTE 44 RLE2 21 CALLN 00 EXIT L223: DUP CRNG 000C886A 000E1869 JZ L235 RLE512 00 CALLN 00 ;; --- , EXIT ;; L235: EXIT ... L0: LOADS5 BYTE 02 BYTE 08 BYTE 20 BYTE 00 BYTE 02 RET ...

Not perfect, but there are no snot, the general code is partially highlighted in the procedures, a branch tree is available. Will go.

It remains for something to run, for this we need to implement, finally, the virtual machine itself.

You can simply write it in C, since only opcodes change significantly, but experience shows that it is better to generate it all than to monitor the consistency of opcodes and C code. There is no way to check this, and the situation when the compiler produces one thing, and vm wants to interpret something completely different, is quite likely. So it's better to generate everything. Sketch again the mini-eDSL to generate C, so as not to bother closing the brackets, indents and semicolons.

Again Writer, no variety ...

 stubs :: String stubs = envFile $ do comment "top of the file" put "#include <stdint.h>" put "#include \"emufatstubs.h\"" defines ... stmt (pt codeType ++ op `assign` "code") endl push a "n" put "for(;;)" braces $ indented $ do put "switch(*op)" braces $ do forM_ codes $ \op -> do put (printf "case %s:" (show op)) indented $ decode op endl put "default:" indented $ exit exitLabel indented $ stmt "return 0" ... decode (CRNG) = do skip "1" stmt (tmp0 `assign` pop a) stmt (tmp1 `assign` decode32) >> skip "4" stmt (tmp2 `assign` decode32) >> skip "4" push a ( _and (tmp0 `gq` tmp1) (tmp0 `lq` tmp2) ) next decode (CALL) = do skip "1" stmt (tmp0 `assign` decode32) >> skip "4" stmt (push' r pc') jump tmp0 ...

stubs :: String stubs = envFile $ do comment "top of the file" put "#include <stdint.h>" put "#include \"emufatstubs.h\"" defines ... stmt (pt codeType ++ op `assign` "code") endl push a "n" put "for(;;)" braces $ indented $ do put "switch(*op)" braces $ do forM_ codes $ \op -> do put (printf "case %s:" (show op)) indented $ decode op endl put "default:" indented $ exit exitLabel indented $ stmt "return 0" ... decode (CRNG) = do skip "1" stmt (tmp0 `assign` pop a) stmt (tmp1 `assign` decode32) >> skip "4" stmt (tmp2 `assign` decode32) >> skip "4" push a ( _and (tmp0 `gq` tmp1) (tmp0 `lq` tmp2) ) next decode (CALL) = do skip "1" stmt (tmp0 `assign` decode32) >> skip "4" stmt (push' r pc') jump tmp0 ...

Let's see what we got:

 #define DEFSTACK(n, t, l) ... #define RESET(a) ... #define PTOP(a) ... #define TOP(a) ... #define POP(a) ... #define PUSH(a,v) ... #define NEXT(x) ... #define JUMP(x, b, o) ... #define SKIP(x, n) ... #define PC(x, b) ... #define DECODE32(op) ... #define DECODE8(op) ... ... DEFSTACK(a, uint32_t, 16); DEFSTACK(r, uint32_t, 8); uint32_t tmp0; uint32_t tmp1; uint32_t tmp2; uint32_t tmp3; ... uint8_t *op = code; PUSH(a, n); for(;;) { switch(*op) { ... case CRNG: SKIP(op, (1)); tmp0 = POP(a); tmp1 = DECODE32(op); SKIP(op, (4)); tmp2 = DECODE32(op); SKIP(op, (4)); PUSH(a, ((tmp0 >= tmp1) && (tmp0 <= tmp2))); NEXT(op); ... case CALL: SKIP(op, (1)); tmp0 = DECODE32(op); SKIP(op, (4)); PUSH(r, PC(op, code)); JUMP(op, code, tmp0); ... case EXIT: goto _exit; default: goto _exit; } } _exit: return 0; ...

#define DEFSTACK(n, t, l) ... #define RESET(a) ... #define PTOP(a) ... #define TOP(a) ... #define POP(a) ... #define PUSH(a,v) ... #define NEXT(x) ... #define JUMP(x, b, o) ... #define SKIP(x, n) ... #define PC(x, b) ... #define DECODE32(op) ... #define DECODE8(op) ... ... DEFSTACK(a, uint32_t, 16); DEFSTACK(r, uint32_t, 8); uint32_t tmp0; uint32_t tmp1; uint32_t tmp2; uint32_t tmp3; ... uint8_t *op = code; PUSH(a, n); for(;;) { switch(*op) { ... case CRNG: SKIP(op, (1)); tmp0 = POP(a); tmp1 = DECODE32(op); SKIP(op, (4)); tmp2 = DECODE32(op); SKIP(op, (4)); PUSH(a, ((tmp0 >= tmp1) && (tmp0 <= tmp2))); NEXT(op); ... case CALL: SKIP(op, (1)); tmp0 = DECODE32(op); SKIP(op, (4)); PUSH(r, PC(op, code)); JUMP(op, code, tmp0); ... case EXIT: goto _exit; default: goto _exit; } } _exit: return 0; ...

Well, that should be. An important caveat: in order for the switch to compile into the transition table, it is necessary that the values of its labels go in sequence and have no holes. And probably fit in bytes. In case of violation of these heuristics, C compilers can generate a tree of comparisons, which in this case does not suit us at all. We provided the sequence of opcodes by defining an Enum instance for our type of Opcode (see above).

What a pity that such a low-level, it would seem, C does not have in the standard ways to go to a variable address, even though GCC supports this extension. But not all interesting platforms have GCC, so we restrict ourselves to a switch-based interpretation.

Our virtual machine is ready. We write tests for her. This is easy — let the test VM accept the bytecode stream as input, generate the contents of the buffer as a result of their interpretation, and give it to the output stream. Each test case, therefore, will be considered passed if the contents of the buffer ultimately meet expectations.

Let's write tests ...

 testJne = makeTest $ do [l1, l2] <- replicateM 2 newLabel cnst 1 cnst 2 jne l1 exit label l1 cnst 0xCAFEBABE -- 1 outle cnst 1 cnst 1 jne l2 cnst 0xCAFEBABE -- 2 outle exit label l2 cnst 0xFFFFFFFF outle

testJne = makeTest $ do [l1, l2] <- replicateM 2 newLabel cnst 1 cnst 2 jne l1 exit label l1 cnst 0xCAFEBABE -- 1 outle cnst 1 cnst 1 jne l2 cnst 0xCAFEBABE -- 2 outle exit label l2 cnst 0xFFFFFFFF outle

... and test cases:

 tests = testSuite $ do ... test "testJne" testJne (assert $ do a <- getWord32le b <- getWord32le return $ a == 0xCAFEBABE && b == 0xCAFEBABE)

tests = testSuite $ do ... test "testJne" testJne (assert $ do a <- getWord32le b <- getWord32le return $ a == 0xCAFEBABE && b == 0xCAFEBABE)

and the shell to run them:

 runTest :: String -> Test -> IO Bool runTest path (T{tname=nm, tcode=code, tcheck = tc})= do let bin = toBinary code (inp,out,err,pid) <- runInteractiveProcess path [] Nothing Nothing BS.hPut inp bin hClose inp res <- BS.hGetContents out let r = tc res hPutStrLn stderr (printf "test %-24s : %s" nm (if r then "PASSED" else "FAILED !")) return r ... case args of ... ... -> mapM_ (runTest path) tests ... ...

runTest :: String -> Test -> IO Bool runTest path (T{tname=nm, tcode=code, tcheck = tc})= do let bin = toBinary code (inp,out,err,pid) <- runInteractiveProcess path [] Nothing Nothing BS.hPut inp bin hClose inp res <- BS.hGetContents out let r = tc res hPutStrLn stderr (printf "test %-24s : %s" nm (if r then "PASSED" else "FAILED !")) return r ... case args of ... ... -> mapM_ (runTest path) tests ... ...

Run, fix all problems and crashes in the core (surprisingly few)

 ... test testJgq : PASSED test testJne : PASSED test testCallRet1 : PASSED ...

... test testJgq : PASSED test testJne : PASSED test testCallRet1 : PASSED ...

and run it all together:

 ... helloFile = const $ BS8.pack "HELLO WORLD!!" fatSample2 = filesystem $ do file "file0" (16384) helloFile dir "A" $ do file "file1" (megs 100) helloFile dir "C" $ do file "file3" (megs 100) helloFile file "file4" (megs 100) helloFile file "file5" (megs 100) helloFile dir "E" $ emptyDir dir "B" $ do file "file2" (megs 50) emptyFile ... $ ./FatGen bin | cbits/genfat 1000000 > fat.img 521106 / 1000000 ( 13027 kb/s) $ fsck.vfat ./fat.img dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN Free cluster summary uninitialized (should be 15863) ./fat.img: 10 files, 115209/131072 clusters $ sudo mount -o loop ./fat.img /mnt/test2/ $ find /mnt/test2/ /mnt/test2/ /mnt/test2/FILE0 /mnt/test2/A /mnt/test2/A/FILE1 /mnt/test2/A/C /mnt/test2/A/C/FILE3 /mnt/test2/A/C/FILE4 /mnt/test2/A/C/FILE5 /mnt/test2/A/C/E /mnt/test2/B /mnt/test2/B/FILE2

... helloFile = const $ BS8.pack "HELLO WORLD!!" fatSample2 = filesystem $ do file "file0" (16384) helloFile dir "A" $ do file "file1" (megs 100) helloFile dir "C" $ do file "file3" (megs 100) helloFile file "file4" (megs 100) helloFile file "file5" (megs 100) helloFile dir "E" $ emptyDir dir "B" $ do file "file2" (megs 50) emptyFile ... $ ./FatGen bin | cbits/genfat 1000000 > fat.img 521106 / 1000000 ( 13027 kb/s) $ fsck.vfat ./fat.img dosfsck 3.0.9, 31 Jan 2010, FAT32, LFN Free cluster summary uninitialized (should be 15863) ./fat.img: 10 files, 115209/131072 clusters $ sudo mount -o loop ./fat.img /mnt/test2/ $ find /mnt/test2/ /mnt/test2/ /mnt/test2/FILE0 /mnt/test2/A /mnt/test2/A/FILE1 /mnt/test2/A/C /mnt/test2/A/C/FILE3 /mnt/test2/A/C/FILE4 /mnt/test2/A/C/FILE5 /mnt/test2/A/C/E /mnt/test2/B /mnt/test2/B/FILE2

Everything works as expected: a file system image is generated, tested, and mounted. Content corresponds to the description on our eDSL.

The size of the compiled rule file is slightly more than 2Kb and can be further optimized, 2Kb is quite an acceptable size for dynamic download even via GSM / EDGE, not to mention 3G.

The performance of the fort is also optimized, not to mention that in the most extreme case it can be compiled into C and then into native processor code.

Here is a small story about the benefits of Haskella in the national economy.

Source: https://habr.com/ru/post/134432/

All Articles

In search of the fat (The Quest For FAT)

More articles: