Happy Programmer's Day to everyone who translates human language into machine language! I wish you fewer bugs and more cool ideas. As a themed gift, I offer a solution to one beautiful problem: writing code that prints its own text, is valid for the interpreters and compilers of several languages, and still runs correctly when the source is reversed.
Not long ago I learned about code that can be both interpreted as PHP and compiled as Java: PhpJava.java. As it turns out, the idea is not new: code that is valid for several compilers or interpreters at once is called a polyglot. Such code is possible because different interpreters and compilers process strings and comments differently.
For example, in Java you can write a character either in the usual way, such as '/', or as a Unicode escape: '\u002F'. Java translates such escapes everywhere in the source, including comments, before the code is tokenized, whereas C# recognizes them only inside identifiers and character and string literals. So if the escapes are "hidden" in a comment, they do not interfere with compiling the C# code, yet they still take effect in the Java code. For example, if you want code A to compile only in C# and code B to compile only in Java, use the following snippet:
//\u000A\u002F\u002A
A//\u002A\u002FB
The C# compiler reads this code literally, so the comments remain comments:
//\u000A\u002F\u002A
A//\u002A\u002FB
Java is more interesting, because the Unicode escapes are translated first and the compiler sees the following:
//
/*
A//*/B
By combining the instructions common to both languages and separating the ones that differ, you can write any program.
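The Java half of this trick can be tried in isolation. The sketch below is only an illustration (the class and method names are made up for it): it hides an assignment behind a line comment that contains the escape for a line feed, \u000A. A compiler that leaves escapes in comments alone would skip the assignment, while javac turns the escape into a real line break and compiles it.

```java
class UnicodeEscapeDemo {
    // Hypothetical helper: reports which reading of the commented line won.
    static String compiledAs() {
        String who = "C#";
        // The next line looks like a plain comment. javac translates the
        // escape into a line feed before tokenizing, so the comment ends
        // right after the two slashes and the assignment becomes live code.
        //\u000A who = "Java";
        return who;
    }

    public static void main(String[] args) {
        System.out.println("compiled as " + compiledAs());
    }
}
```

Compiled with javac, the program prints "compiled as Java". Note that the explanatory comments inside the block deliberately avoid spelling out the escape, since it would be translated there too.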
However, the tricks are not limited to Unicode escapes. C-like languages, for example, have preprocessor directives that determine which code is compiled and which is not, and the excluded region can hold code in a completely different language. Below is a polyglot in C++ and Python (from here) in which the Python code is placed in the #if false section, and the ''' sequences open and close a multi-line string that works as a comment in the Python code.
#include <stdio.h>
#if false
print "Hello world"
'''
#endif
int main() {
    printf("Hello world");
    return 0;
}
#if false
'''
#endif
In HTML, comments begin with the character sequence <!--, which can be used in the same way. There are also more complex polyglots that work in 6 and in 16 languages at once. The first prints the message "hitforum" in every language; the second is more interesting: it prints the name of the language in which it is compiled or interpreted.
# define xu /* v # :::::::::::::::::::>>>>>>>$$$a"muroftih"#[>:#,_@] eval 'echo "hitforum";exit';sub echo { print "@_\n"} __END__>++++++++++>++++++++++[>+++++++++++>++++++++++ +<<-]>------.+.>++++++.<---.+++++++++.>--.+++ .<--.<<. */ main() { printf ("hitforum\n"); }
# /* [ <!-- */ include <stdio.h> /* \ #`{{coding=utf-8\ "true" if 0 != 0 and q != """0" ; ` \ \ if [ -n "$ZSH_VERSION" ]; then \ \ echo exec echo I\'ma zsh script.; \ \ elif [ -n "$BASH_VERSION" ]; then \ \ echo exec echo I\'ma bash script.; \ else \ echo exec echo I\'ma sh script.; \ fi`; #!;#\ BEGIN{print"I'm a ", 0 ? "Ruby" :"Perl", " program.\n"; exit; } #\ %q~ set =dummy 0; puts [list "I'm" "a" "tcl" "script."]; exit all: ; @echo "I'm a Makefile." \ #*/ /*: */ enum {a, b}; \ \ static int c99(void) { #ifndef __cplusplus /* bah */ unused1: if ((enum {b, a})0) \ (void)0; #endif unused2: return a; \ } \ static int trigraphs(void) { \ \ return sizeof "??!" == 2; \ } \ char X; \ \ int main(void) { \ \ struct X { \ \ char a[2]; \ };\ if (sizeof(X) != 1) { \ \ printf("I'm a C++ program (trigraphs %sabled).\n", \ \ trigraphs() ? "en" : "dis");\ \ }else if (1//**/2 )unused3 : { ; \ printf("I'm a C program (C%s, trigraphs %sabled).\n", \ c99() ? "89 with // comments" : "99", \ trigraphs() ? "en" : "dis"); \ } else { \ printf("I'm a C program (C89, trigraphs %sabled).\n", \ trigraphs() ? 
"en" : "dis"); \ } \ return 0; \ } /* # \ \begin{code} import Prelude hiding ((:)); import Data.List (intercalate); import Language.Haskell.TH; import Data.String; default (S, String, Integer, Double); data S = S; instance Eq S where { _ == _ = False }; instance IsString S where { fromString = const S }; ifThenElse cte = case c of True -> t; False -> e cPP = False; {- #define cPP True -} main :: IO () main = putStr ("I'm a Literate Haskell program" ++ bonus ++ ".\n") where _ = (); bonus | null details = "" | otherwise = " (" ++ details ++ ")" details = intercalate ", " [ name | (True, name) <- extensions ] :: String extensions = (bangPatterns, "BangPatterns" ) : (templateHaskell, "TemplateHaskell" ) : (rebindableSyntax, "RebindableSyntax" ) : (magicHash, "MagicHash" ) : (overloadedStrings, "OverloadedStrings" ) : (noMonomorphismRestriction, "NoMonomorphismRestriction") : (scopedTypeVariables, "ScopedTypeVariables" ) : (cPP, "CPP" ) : (unicodeSyntax, "UnicodeSyntax" ) : (negativeLiterals, "NegativeLiterals" ) : (binaryLiterals, "BinaryLiterals" ) : (numDecimals, "NumDecimals" ) : [] (!) = (!!) bangPatterns = [True] ! 0 where foo !bar = False templateHaskell = thc $(return (TupE []) :: ExpQ) rebindableSyntax = null (do { [()]; [()] }) where _ >> _ = [] :: [()] magicHash = foo# () where foo = ['.']; "." # _ = False; foo# _ = True overloadedStrings = "" /= "" noMonomorphismRestriction = show foo == "0" where foo = 0 bar = foo :: Double unicodeSyntax = let (★) = True in (*) where (*) = False negativeLiterals = -1 == NNa binaryLiterals = let b1 = 1 in 0b1 == 1 numDecimals = show 0e0 == "0" scopedTypeVariables = stv (0 :: Double) == "0.0" data{- = -} NN = NNa | NNb deriving Eq; instance Num NN where { fromInteger _ = NNa; negate _ = NNb; _ + _ = NNa; _ * _ = NNa; abs _ = NNa; signum _ = NNa } instance{- = -} (Num a) => Num (e -> a) where { fromInteger = const . fromInteger; negate = (.) negate; abs = (.) abs; signum = (.) 
signum; x + y = \e -> xe + ye; x * y = \e -> xe * ye } class THC a where { thc :: a -> Bool }; instance THC () where { thc _ = True }; instance THC (Q a) where { thc _ = False }; class (Show a, Num a) => STV a where stv :: a -> String stv = const $ show (f 0) where f = id :: a -> a instance STV Double -- : \ \end{code} # \ ]>++++++++[<+++++++++>-]<+.>>++++[<++++++++++>-]<-.[-]>++++++++++ \ [<+++++++++++>-]<-.>>++++[<++++++++>-]<.>>++++++++++[<++++++++++> \ -]<- - -.<.>+.->>++++++++++[<+++++++++++>-]<++++.<.>>>++++++++++[ \ <++++++++++>-]<+++++.<<<<+.->>>>- - -.<+++.- - -<++.- ->>>>>+++++ \ +++++[<+++++++++++>-]<- - -.<<<<<.<+++.>>>.<<<-.- ->>>>+.<.<.<<.> \ ++++++++++++++.[-]++++++++++""" else 0 # \ from platform import * # \ print("I'm a Python program (%s %s)." % # [-][ \ (python_implementation(), python_version())); """--><html><head> <!--:--><title>I'm a HTML page</title></head><body> <!--:--><h1>I'm a <marquee><blink>horrible HTML</blink></marquee> page</h1> <!--:--><script language="JavaScript"> <!--: # \ setTimeout( // \ function () { // \ document.body.innerHTML = "<h1>I'm a javascript-generated HTML page</h1>"; // \ }, 10000); // \ //--> </script><!--: \ </body></html><!-- }} # \ say "I'm a Perl6 program."; # """ # */ #define FOO ]-->~
To write a quine polyglot in C# and Java, you need to combine the principles of quine and polyglot development. This topic is not new either: on the Code Golf site, users compete to write the shortest quine polyglot, or polyquine: Write a Polyquine. However, that question had no answers in the more verbose C# and Java languages, which I wanted to correct.
To write such a quine polyglot, you need to escape the characters that cannot appear unescaped in string literals of both languages but do occur in the text of the program itself: the quotation mark ", the line break \n, and the backslash \. To reduce the code size further, repeated character sequences such as \u000A, \u002F, and \u002A are also replaced with single characters in the encoded string. Here is the resulting quine polyglot, valid for both the C# and Java compilers:
//\u000A\u002F\u002A
using System;//\u002A\u002F
class Program{public static void//\u000A\u002F\u002A
Main//\u002A\u002Fmain
(String[]z){String s="//@#'^using System;//'#^class Program{public static void//@#'^Main//'#main^(String[]z){String s=!$!,t=s;int[]a=new int[]{33,94,38,64,35,39,36};String[]b=new String[]{!&!!,!&n!,!&&!,!&@!,!&#!,!&'!,s};for(int i=0;i<7;i++)t=t.//@#'^Replace//'#replace^(!!+(char)a[i],b[i]);//@#'^Console.Write//'#System.out.printf^(t);}}",t=s;int[]a=new int[]{33,94,38,64,35,39,36};String[]b=new String[]{"\"","\n","\\","\\u000A","\\u002F","\\u002A",s};for(int i=0;i<7;i++)t=t.//\u000A\u002F\u002A
Replace//\u002A\u002Freplace
(""+(char)a[i],b[i]);//\u000A\u002F\u002A
Console.Write//\u002A\u002FSystem.out.printf
(t);}}
Let's complicate the task further and write a quine polyglot palindrome. As a reminder, a palindrome is a number or text that reads the same in both directions. I described the principle of developing quine palindromes in an article three years ago, also on Programmer's Day. The idea is that the mirrored part of the program is placed in a single-line or multi-line comment, where it does not interfere with compilation. The simplest code palindrome in C# looks like this:
/**/class P{static void Main(){}};/*/;}}{)(niaM diov citats{P ssalc/**/
As you can see, the multi-line comment /* starts in the middle and continues to the end, where it is closed by the sequence */. The empty comment at the beginning is needed to make the line completely symmetrical.
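The symmetry of such a source is easy to check mechanically. Here is a minimal Java sketch (the class and method names are hypothetical, introduced only for this illustration) that tests whether a source string reads the same in both directions, applied to the C# one-liner above:

```java
class PalindromeCheck {
    // True when the text is identical to its own reverse.
    static boolean isPalindrome(String source) {
        return new StringBuilder(source).reverse().toString().equals(source);
    }

    public static void main(String[] args) {
        // The simplest C# code palindrome from the article, as a plain string.
        String code = "/**/class P{static void Main(){}};/*/;}}{)(niaM diov citats{P ssalc/**/";
        System.out.println(isPalindrome(code));
    }
}
```

A check like this is handy while editing by hand, because a single stray character in either half silently breaks the mirror.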
So, combining the principles of creating a palindrome, polyglot and quine, I wrote the following code:
/**///\u000A\u002F\u002A
using System;//\u002A\u002F
class Program{public static void//\u000A\u002F\u002A
Main//\u002A\u002Fmain
(String[]z){String s="`**?`@#_^using System;?_#^class Program{public static void?@#_^Main?_#main^(String[]z){String s=!$!,t=s;int i;int[]a=new int[]{33,94,38,64,35,95,96,63,36};String[]b=new String[]{!&!!,!&n!,!&&!,!&@!,!&#!,!&_!,!`!,!?!,s};for(i=0;i<9;i++)t=t.?@#_^Replace?_#replace^(!!+(char)a[i],b[i]);t+='*';for(i=872;i>=0;i--)t=t+t?@#_^[i];Console.Write?_#.charAt(i);System.out.printf^(t);}}/",t=s;int i;int[]a=new int[]{33,94,38,64,35,95,96,63,36};String[]b=new String[]{"\"","\n","\\","\\u000A","\\u002F","\\u002A","/","//",s};for(i=0;i<9;i++)t=t.//\u000A\u002F\u002A
Replace//\u002A\u002Freplace
(""+(char)a[i],b[i]);t+='*';for(i=872;i>=0;i--)t=t+t//\u000A\u002F\u002A
[i];Console.Write//\u002A\u002F.charAt(i);System.out.printf
(t);}}/*/}};)t(
ftnirp.tuo.metsyS;)i(tArahc.F200u\A200u\//etirW.elosnoC;]i[
A200u\F200u\A000u\//t+t=t)--i;0=>i;278=i(rof;'*'=+t;)]i[b,]i[a)rahc(+""(
ecalperF200u\A200u\//ecalpeR
A200u\F200u\A000u\//.t=t)++i;9<i;0=i(rof;}s,"//","/","A200u\\","F200u\\","A000u\\","\\","n\",""\"{][gnirtS wen=b][gnirtS;}63,36,69,59,53,46,83,49,33{][tni wen=a][tni;i tni;s=t,"/}};)t(^ftnirp.tuo.metsyS;)i(tArahc.#_?etirW.elosnoC;]i[^_#@?t+t=t)--i;0=>i;278=i(rof;'*'=+t;)]i[b,]i[a)rahc(+!!(^ecalper#_?ecalpeR^_#@?.t=t)++i;9<i;0=i(rof;}s,!?!,!`!,!_&!,!#&!,!@&!,!&&!,!n&!,!!&!{][gnirtS wen=b][gnirtS;}63,36,69,59,53,46,83,49,33{][tni wen=a][tni;i tni;s=t,!$!=s gnirtS{)z][gnirtS(
niamF200u\A200u\//niaM
A200u\F200u\A000u\//diov citats cilbup{margorP ssalc
F200u\A200u\//;metsyS gnisu
A200u\F200u\A000u\///**/
Unfortunately, the monster turned out rather large (1747 characters), but that is due to the long chains of Unicode escapes and the verbosity of C# and Java. I am sure that a much smaller quine-palindrome-polyglot can be written in other languages.
Now let's look at how this program executes. To start, let's strip all the comments and format the code properly (this is the program as the C# compiler sees it):
using System;

class Program
{
    public static void Main(String[] z)
    {
        String s = "`**?`@#_^using System;?_#^class Program{public static void?@#_^Main?_#main^(String[]z){String s=!$!,t=s;int i;int[]a=new int[]{33,94,38,64,35,95,96,63,36};String[]b=new String[]{!&!!,!&n!,!&&!,!&@!,!&#!,!&_!,!`!,!?!,s};for(i=0;i<9;i++)t=t.?@#_^Replace?_#replace^(!!+(char)a[i],b[i]);t+='*';for(i=872;i>=0;i--)t=t+t?@#_^[i];Console.Write?_#.charAt(i);System.out.printf^(t);}}/", t = s;
        int i;
        int[] a = new int[] { 33, 94, 38, 64, 35, 95, 96, 63, 36 };
        String[] b = new String[] { "\"", "\n", "\\", "\\u000A", "\\u002F", "\\u002A", "/", "//", s };
        for (i = 0; i < 9; i++)
            t = t.Replace("" + (char)a[i], b[i]);
        t += '*';
        for (i = 872; i >= 0; i--)
            t = t + t[i];
        Console.Write(t);
    }
}
The listing shows that the variable s contains the program code with the characters that are forbidden in strings escaped and the repeated sequences compressed. The table below lists each character code, the symbol it stands for, and its replacement, which are used to build the output string t.
code | symbol | replacement |
---|---|---|
33 | ! | " |
94 | ^ | \n |
38 | & | \ |
64 | @ | \u000A |
35 | # | \u002F |
95 | _ | \u002A |
96 | ` | / |
63 | ? | // |
36 | $ | s |
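To see the table in action, the substitution loop can be pulled out of the quine and run on a short fragment. This is only a sketch: the class and method names are hypothetical, and the sample input is the first few characters of the encoded string s from the listing.

```java
class PlaceholderDecode {
    // Applies the replacements from the table, in the same order as the quine.
    static String decode(String s) {
        int[] a = {33, 94, 38, 64, 35, 95, 96, 63, 36};
        String[] b = {"\"", "\n", "\\", "\\u000A", "\\u002F", "\\u002A", "/", "//", s};
        String t = s;
        for (int i = 0; i < a.length; i++) {
            // Each placeholder character is swapped for its expansion.
            t = t.replace("" + (char) a[i], b[i]);
        }
        return t;
    }

    public static void main(String[] args) {
        // Opening fragment of the encoded program text.
        System.out.println(decode("`**?`@#_^using System;?_#"));
    }
}
```

The order of the replacements matters: the backslash placeholder is expanded before the placeholders for the Unicode escapes, which is how a sequence like !&@! ends up as a correctly escaped string literal, and the placeholder for s itself comes last so that the inserted copy stays in its raw, encoded form.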
At the next stage, the string t is concatenated with its own reverse, so that the output is a palindrome. Note that the number 872 (the index of the last character of t, that is, its length minus one, before the '*' is appended) can only be calculated after the quine has already been written. To get it you have to run the already written code, which presents certain difficulties of its own.
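The mirroring step on its own can be sketched as follows. This is an illustration rather than the code from the quine: the names are hypothetical, and the start index is computed from the string length, whereas the quine has to hard-code it as a literal because that literal is part of the text being reproduced.

```java
class MirrorLoop {
    // Appends '*' and then the first half in reverse, as the quine's loop does.
    static String mirror(String firstHalf) {
        int last = firstHalf.length() - 1; // plays the role of 872 in the quine
        String t = firstHalf + '*';
        for (int i = last; i >= 0; i--) {
            t = t + t.charAt(i);
        }
        return t;
    }

    public static void main(String[] args) {
        // A tiny first half in the style of the C# palindrome.
        System.out.println(mirror("/**/class P{};/"));
    }
}
```

With a first half of 873 characters the result is 873 + 1 + 873 = 1747 characters long, which matches the size quoted for the full program.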
The main criterion when writing this code was the smallest size in text form, not cleanliness, so it naturally looks a bit strange.
As you can see, there is nothing difficult about writing such quines. For those who do not believe that all this really works, there are tests in the Freaky-Sources repository (Java must be installed to run them).
Post your own quine-palindrome-polyglots in other languages, and any other freaky code, in the comments; I will be happy to study them.
Once again, congratulations to all programmers and sympathizers!
Source: https://habr.com/ru/post/309702/