📜 ⬆️ ⬇️

Star Wars in source code


I have long had the idea to implement something interesting related to quineas. And finally it turned out to bring this thing to the end. I once saw the implementation of the game "Life" in the source code: Self Printing Game of Life and I wanted to do something similar. After some thought, I remembered that there are ASCII graphics . But it would be even more interesting and more difficult to make animation in ASCII graphics. It did not take long to find an example, almost immediately the Star Wars animation was found, more precisely, 4 episodes of “New Hope” called Asciimation . The work of the Japanese programmer Yusuke Endoh "Quine Relay" also spurred me to implement this idea. In the process of implementing such a “film”, many other problems had to be solved, which, however, was not to be regretted.

The resulting source is: gist . To make it easier to “view,” it also contains a script that will twist the entire animation. The sources of the entire project for its generation, the interface and other utilities are laid out on the githaba: Freaky-Sources (The quine-palindromes, which I already wrote about, are included there). It is clear that such a thing does not carry any practical benefits, it is just just for fun.

Introduction


Animation in a quine can only be output in one way: output each subsequent frame to the console at each compilation. Accordingly, the speed of the “movie” in this case depends on the size of the code (the larger the code, the longer it will compile and flip through the console), the decompression algorithm, the compilation speed (of iron). Moreover, the code is mainly occupied by compressed animation frames.

The aforementioned “Game of Life” uses strings as the field, which are initialized and displayed at the beginning of the program source code. Such an approach is valid if the amount of code is small, and it will always be visible during the compilation process. However, in my case, the code size is a few hundred kilobytes, and therefore frames must be output at the end so that they can be seen. It was decided to display the frames not in the form of real lines, but in the form of single-line comments (multi-line ones can also be used), which made it possible to output them without unnecessary tail characters (the final curly bracket) and quotes.
')
Since I have a low IQ, I do not have special abilities and cannot operate with a large number of parameters in my head, I decided to automate the process of writing a quine to the maximum. In addition, it allowed for a better understanding of the Quine structure and gaining experience in related fields (parsing and processing C # source code using the NRefactory).

Quine generation stages


Thus, I decided to break the process of writing any quine into the following steps:

All these steps are performed for a specially crafted template. By “code,” as applied to star wars, is meant a code that unpacks compressed data. And the data is the packed array of animation frames. Minification is needed to make the code unreadable to reduce the text size of the code.

The truth is, at the beginning, a version with the usual .NET GZipStream class was written to decompress the compressed data. But I thought it was not sporty, and besides, such a class exists only for this platform. Thus, I wrote a more complex version with my own archiver.

In order to make it easier to navigate in these stages, I wrote the interface on WinForms.




To highlight C # syntax, the BigObfuscator FastColoredTextBox component was used . Unfortunately, it works rather slowly, so the resulting “movie” is more convenient to look through a regular console (although it, unfortunately, is also not very fast).

Quine template

The template is the starting point of such a complex quine, and it may look like this:
Quine template
using System; using System.Text; using System.Collections.Generic; namespace Asciimation_1_3 { class Program { /*#Asciimation_1_3*/ /*Asciimation_1_3#*/ /*#DecodeBase64*/ /*DecodeBase64#*/ /*#HuffmanTree*/ /*HuffmanTree#*/ /*#HuffmanRleDecode2*/ /*HuffmanRleDecode2#*/ /*#Enums*/ /*Enums#*/ /*#Utils*/ /*Utils#*/ static string Data = /*%Data_1_3*/""/*Data_1_3%*/; static int CurrentFrame = /*$CurrentFrame*/0/*CurrentFrame$*/; static void Main() { var output = Decompress_v_1_3(Data, CurrentFrame++); if (CurrentFrame == 3591) CurrentFrame = 3590; /*$@$*/ } } } /*$Output_1_3$*/ 


As you can see, in the above code a bunch of incomprehensible comments. But in reality, they are markers that indicate where a particular code or data should be inserted. The advantage of this approach is that the code can be inserted from other full versions of classes, and they, in turn, can be tested (I use NUnit). I classified these markers as follows:



Thus, combining information about the stages and markers, you can build the following graphical scheme:

It is worth noting that the minifier leaves the parameters of the quine itself unchanged, otherwise a situation will arise in which the parameter will be assigned a different name, and because of this, the quine will not compile or will output what is not necessary. (True, the parameters are still minified, but in a more cunning way).

More details about all the stages listed below.

Code generation

When generating the code, all source files from the specified folder are analyzed. And the code between the markers / * # ... .. * / ... / * ... ... # * / is copied to the appropriate places in the original template. For example, in the following example, the HuffmanRle2 class is entirely copied to the area between / * # HuffmanRleDecode2 * // * HuffmanRleDecode2 # * / . In order to ignore some code fragments inside these markers (for example, .ToString () methods are not needed in Quine, they are needed for debugging) and not to split this fragment into two, markers were entered / * # Ignore * // * Ignore # * / , which allow you to ignore all the code inside them.

An example of markers in the source code
 /*#HuffmanRleDecode2*/ public class HuffmanRle2 { public static byte[] Decode(HuffmanTree tree, byte[] bytes, ref int curBit, int bytesCount, int bitsCountPerRepLength = 8, int bitsCountPerNotRepLength = 8) { int minLength = Math.Min(bitsCountPerRepLength, bitsCountPerNotRepLength); var maxCount = 1 << (minLength - 1); var root = tree.Root; var result = new List<byte>(bytes.Length * 2); int curBytesCount = 0; int i = 0; while (curBytesCount < bytesCount) { var length = Utils.GetInt(bytes, ref curBit, minLength); if ((length & maxCount) == 0) { curBit -= minLength; length = Utils.GetInt(bytes, ref curBit, bitsCountPerRepLength); var repeatCount = length + 2; byte value = Utils.GetValue(root, bytes, ref curBit); for (int j = 0; j < repeatCount; j++) { result.Add(value); curBytesCount++; } } else { curBit -= minLength; length = Utils.GetInt(bytes, ref curBit, bitsCountPerNotRepLength); var notRepeatCount = (((1 << (bitsCountPerNotRepLength - 1)) - 1) & length) + 1; for (int j = 0; j < notRepeatCount; j++) { byte value = Utils.GetValue(root, bytes, ref curBit); result.Add(value); curBytesCount++; } } i++; } return result.ToArray(); } } /*HuffmanRleDecode2#*/ 


Although the code collected from different files may not be optimal from the point of view of the criterion of the minimum length, since it was written in the standard style, however, this approach allowed us to write and run tests (NUnit) for compressing and extracting Quine data immediately before its generation, which simplifies the development process and makes it less tedious. Although code minification, which is described below, allows to virtually neutralize the disadvantage of this approach.

Data generation

The data compression used the simplest lossless compression algorithms, such as RLE and Huffman coding. However, this animation, like any movie, can be divided into main and intermediate frames. Intermediate frames contain information that has changed since the last frame, while the main frames contain all information about the frame. Thus, in order to get all the intermediate frame characters, you must decode all previous frames to the main frame, and then render all frames to the current frame, taking into account the changes received. It turned out such a pseudo-MPEG. The structure of the main and intermediate frames is presented below. In addition to normal and intermediate frames, I decided to also include intermediate frames obtained by shifting to the left, right, up and down, since they are not so rare.

Frame:
0..3 - type:

Change:
0..2 - type:


Unfortunately, the final compression was not as big as we would like, but quite acceptable. And I would like to at the level of lzma in 7-zip, i.e. less than 100 Kb. Most likely, with the help of more advanced methods (interval coding, vocabulary methods), an even greater degree of compression could be achieved.



Code Minification

Minification is needed in order to reduce the size of the code in text format. For processing C # code, the NRefactory library was used . Its advantage over Roslyn is that the parse tree can be modified in the process, which is required in the task of the minifator.

The shortening of the names of classes, methods, fields, local variables

The most important task of the minifier was precisely this stage. To bypass the constructed syntactic tree, the “Visitor” pattern was used, implemented as an overload of the necessary methods of traversing the nodes during a depth crawl.

In order to use the minimum number of identifiers on the one hand, and on the other, so that they do not overlap and do not lead to compilation errors, the shortening algorithm was divided into the following steps:


To generate substitutions, a class is used that can generate either minimal identifiers or random ones. This project uses the first option. When generating a replacement at each stage, any identifier names are ignored (so that there are no intersections and potential errors).

It is worth noting that the library supports ignored identifiers, i.e. those that for some reason should not be shortened, which is used in this project (Quine parameters).

To obtain a list of all references to used identifiers (Find All References in Visual Studio), the NRefactory mechanism is used. To do this, you first need to compile the tree (at each stage: local variables, type members, types):
 var unresolvedFile = SyntaxTree.ToTypeSystem(); var projectContent = projectContent.AddOrUpdateFiles(_unresolvedFile); var compilation = projectContent.CreateCompilation(); var resolver = new CSharpAstResolver(_compilation, SyntaxTree, _unresolvedFile); 


Then you can search for links as follows:
 var resolveResult = resolver.Resolve(node); var findReferences = new FindReferences(); ... findReferences.FindLocalReferences(); ... findReferences.FindReferencesInFile(); 


Other minification

The following small minifikatsii were also implemented:


Remove whitespace and line breaks

After all kinds of minifikatsii made, from the code it remains to remove only extra whitespace characters and translations, and so that the length of the lines does not exceed a certain value (for beauty).

Minifiers implemented a separate project and it can be downloaded here: CSharp-Minifier . It was primarily developed with the aim of shortening the code in quineas, so there is nothing surprising in the fact that it has some strange functionality. However, I will gladly accept your comments and / or pull-ricksta.

Code formatting

At this stage, the minified code can be brought to mind so that it looks like some kind of picture. Formatting goes exactly after the minification, because the generated code is basically an array of single characters (they can be broken), from which it is easier to build a picture. The picture should have only black and white format, so that the symbol represents a white pixel, and black - its absence. For a better match, you can use any heuristic selection algorithms (genetic). My formatting was not implemented.

Quine generation

The following was used as a marker where the final line of the quine should be printed: / * @ * /
Step by step everything looks like this. All code can be completely arbitrary, but in the example line breaks are added for better readability.

  1. Template initialization:
    template =
     class P { static void Main() { var a = 6; /*@*/ int b = 10; } } 

  2. Generate the line pattern used for quine printing. Quotation marks, backslashes and line breaks must be escaped, so that their presentation is separate parameters when outputting a formatted string.
    kernel = "var s = {1}{0}{1}; System.Console.Write(s,s,'{1}','{2}{2}',{1}{2}r{2}n{1});"
  3. Replaces the marker in the template with the string used to print from the previous step. Since in C #, curly brackets are used when displaying parameters, they must also be escaped. This is done by duplicating them.
    str = template.Replace("{", "{{").Replace("}", "}}").Replace("/*@*/", kernel)
    str =
     class P {{ static void Main() {{ var a = 6; var s = {1}{0}{1}; System.Console.Write(s,s,'{1}','{2}{2}',{1}{2}r{2}n{1}); int b = 10; }} }} 

  4. Generating the resulting string used to print a quine:
    insertToResult = "var s=\"" + str + "\""
    insertToResult =
     var s=" class P {{ static void Main() {{ int a=6; var s={1}{0}{1}; Console.Write(s,s,'{1}','{2}{2}',{1}{2}r{2}n{1}); int b=10; }} }}"; Console.Write(s,s,'"', '\\', "\r\n"); 

  5. Replacing the marker in the original template with the resulting string:
    result = template.Replace("/*@*/", insertToResult)
    result =
     using System; class c { static void Main() { int a=6; var s="using System;class c{{static void Main(){{int a=6;var s={1}{0}{1};Console.Write(s,s,'{1}','{2}{2}',{1}{2}r{2}n{1});int b=10;}}}}"; Console.Write(s,s,'"', '\\', "\r\n"); int b=10; } } 


Done: Received a program that can print itself.
The described approach can be generalized before generating quine palindromes , but so far I have not implemented it.

Launch


After the quine is generated, for example, you can save it into the Asciimation.cs file and output one frame with the following command:
"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\csc.exe" Asciimation.cs && (Asciimation > Asciimation.cs) && Asciimation

And to see the entire generated "movie", you need to use the following script:
 echo off SET /ai=0 :LOOP IF %i%==3591 GOTO END "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\csc.exe" Asciimation.cs && (Asciimation > Asciimation.cs) && Asciimation SET /ai=%i%+1 GOTO LOOP :END pause 


Conclusion


The principles described in the article are applicable in the case of the implementation of various quineas in other programming languages. Unfortunately, using the online services ( ideone.com , compileonline ), it is impossible to test the developed code, because it still exceeds the maximum size, despite the compression.

The generated files and batch files can be downloaded here: AsciimationQuine_1_3.7z . In the future, you can try to engage in chain quines and, perhaps, repeat the feat of Yusuke Endoh, only by manual generation, not with manual generation.

Especially for those who do not have the ability to compile, but want to see the process:

Source: https://habr.com/ru/post/190616/


All Articles