
Jancy: a scripting language for system/network programmers

Why create yet another programming language? There are already an incredible number of them, in my firm conviction far more than necessary. Certainly, the fact that building a compiler is an incredibly exciting process plays a significant role here. Allowing for differences in taste (some prefer watermelon, some pork cartilage), it is generally one of the most “delicious” jobs an enthusiastic programmer can dream of.

The candy-and-flowers period is indescribably great: first you study compiler theory from thick, smart books, and then, right away, you put it into practice in a language of your own. Even the sad prospect that the creator of the language may well be its only user cannot outweigh the joy of creation and stop the spherical-in-a-vacuum compiler Kulibin (the archetypal Russian self-taught inventor). Of course, if satisfying your own curiosity is not just an important but the sole driving force of the whole process, the prospect described above will inevitably come true. But even if it is not the only reason for creating a new language, the prospect of becoming the sole user of your creation still has a chance to materialize.

My candy-and-flowers period with compilers and the joy of sculpting my first languages ended long ago. You could say I have legitimized the relationship in holy matrimony: compilers, debuggers and development tools are my main job at Tibbo, with all the ensuing consequences (yes, including being fed up with the subject, a growing share of routine tasks, etc.). So my motivation for creating a scripting language of my own was something other than satisfying my own curiosity.

So why?


In short, the practical side of my personal motivation, and ours as a company, is this: we wanted an embeddable scripting engine with pointers to structures and safe address arithmetic. We could not find one. So we made Jancy ("between-Java-and-C"), which has C-compatible structures, pointers with safe arithmetic, and much, much more:
Unique features

Design Principles

Other significant features

A more complete list of features with examples of use can be found here: http://tibbo.com/jancy/features.html

Who might find it useful?


First of all, we wrote the language for ourselves: Jancy is used as the embedded scripting language in the IO Ninja project. However, since it has been useful to us, we humbly hope it may well help others too. This hope rests primarily on Jancy's three greatest strengths, where our language has a real advantage over its counterparts.

1. High compatibility with C/C++
This applies both to binary (ABI) compatibility and to source-level compatibility. The advantages are many: seamless use of existing C libraries; porting C/C++ code by copy-pasting followed by cosmetic edits (and sometimes no edits at all); easy creation of new C/C++ libraries for use from Jancy scripts; efficient embedding of the Jancy engine into a C/C++ application; and so on.

2. Convenient tools for IO programming
Here I mean, first, support for pointers and address arithmetic, which is ideally suited to parsing and generating binary packets, and second, the lexer generator (an incremental one, i.e. applicable to parsing IO streams that arrive piece by piece). This also includes partial application and the scheduling operator, which together let you, for example, create a completion routine with captured context arguments that is automatically called in the required thread.

3. Convenient tools for UI programming
Two words: reactive programming. I am sure that in the near future, support for reactivity (at the language level or through crutches such as preprocessors and libraries) will become an integral part of any user interface (UI) development system. Jancy offers reactivity out of the box, and in my opinion in a completely intuitive way. Besides reactivity, Jancy supports all sorts of variations on properties and events, which also helps to build elegant UI frameworks.
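
To make the partial application and scheduling idea from item 2 a little more concrete, here is a minimal C++ sketch. It is not Jancy syntax and not Jancy's implementation: `Scheduler`, `onSendCompleted` and `makeCompletion` are hypothetical names. The scheduler is a simple single-threaded queue that stands in for the thread on which the completion routine must run.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical scheduler: posted tasks run only when the owning thread
// drains the queue (think of a UI or IO thread's event loop).
class Scheduler {
public:
    void post(std::function<void()> task) { m_queue.push(std::move(task)); }

    // Runs all pending tasks on the calling thread; returns how many ran.
    int drain() {
        int count = 0;
        while (!m_queue.empty()) {
            std::function<void()> task = std::move(m_queue.front());
            m_queue.pop();
            task();
            ++count;
        }
        return count;
    }

private:
    std::queue<std::function<void()>> m_queue;
};

// Completion routine: needs context (which connection, where to log)
// plus the result only the IO layer knows (bytes transferred).
void onSendCompleted(std::string* log, int connectionId, int bytesSent) {
    *log += "conn " + std::to_string(connectionId) + ": " +
            std::to_string(bytesSent) + " bytes\n";
}

// Partial application: bind the context arguments now and leave bytesSent open.
// Scheduling: calling the result does not run the routine; it posts it instead.
std::function<void(int)> makeCompletion(Scheduler* scheduler, std::string* log,
                                        int connectionId) {
    return [scheduler, log, connectionId](int bytesSent) {
        scheduler->post([log, connectionId, bytesSent] {
            onSendCompleted(log, connectionId, bytesSent);
        });
    };
}
```

Calling the returned function only queues the completion routine together with its captured context. The routine actually runs when the target thread calls `drain()`.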

At the same time, despite the remarkable features of item three, we are not yet positioning Jancy as a language for UI development. Our main goal for now is for it to become a scripting language for low-level IO programming, i.e. a tool for system/network programmers and hackers.

And now - slides! ©

ABI Compatibility with C/C++


Compatibility is always good, and compatibility with the de facto standard language of system programming is simply great, isn't it?

Jancy scripts are JIT-compiled. They can be called directly from a C/C++ program, and they can call C/C++ functions directly. This means that once data types and function prototypes are described correctly in both the Jancy script and the C++ application, data can be passed naturally, through function arguments and return values.

We declare and use a function in a Jancy script:
    bool foo (
        char charArg,
        int intArg,
        double doubleArg
    );

    bar (int x)
    {
        bool result = foo ('a', x, 3.1415);
        // ...
    }

We write the implementation in C/C++:
    bool foo (
        char charArg,
        int intArg,
        double doubleArg
    )
    {
        // ...
        return true;
    }

We map the function before JIT-compiling the script:
    class MyLib: public jnc::StdLib
    {
    public:
        JNC_BEGIN_LIB ()
            JNC_FUNCTION ("foo", foo)
            JNC_LIB (jnc::StdLib)
        JNC_END_LIB ()
    };

    // ...

    MyLib::mapFunctions (&module);

Done! There is no packing or unpacking of variant-like containers and no explicit pushing of arguments onto a virtual machine stack: everything works directly. To date, Jancy supports all major calling conventions.

Calling Jancy from C++ in the opposite direction is just as easy:
    typedef void Bar (int);

    Bar* bar = (Bar*) module.getFunctionByName ("bar")->getMachineCode ();
    bar (100);

What about calling system functions and dynamic libraries (.dll/.so)? No problem! Jancy offers seamless integration with dynamic libraries:
    library User32
    {
        int stdcall MessageBoxA (
            intptr hwnd,
            char const thin* text,
            char const thin* caption,
            int flags
        );

        // ...
    }

    // ...

    User32 user32;
    user32.load ("user32.dll");
    user32.lib.MessageBoxA (0, "Message Text", "Message Caption", 0x00000040);

In this case, names are resolved on first access and the addresses found are cached (similar to DELAYLOAD, except that the module itself is loaded explicitly). Errors during loading and name resolution are handled by Jancy's standard pseudo-exception mechanism (see the next section for details).
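
The delay-load behavior described above (resolve a name on first access, then cache the address) can be sketched in C++ as follows. This is only an illustration under stated assumptions: `FakeModule` and `LazyLibrary` are hypothetical. A map stands in for `GetProcAddress`/`dlsym`, so the example does not depend on a real DLL.

```cpp
#include <cassert>
#include <map>
#include <string>

using Fn = int (*)(int);

// Hypothetical stand-in for a loaded module. A real implementation would call
// GetProcAddress (Windows) or dlsym (POSIX); here a map plays that role and
// counts how many real lookups were made.
struct FakeModule {
    std::map<std::string, Fn> exports;
    int lookupCount = 0;

    Fn lookup(const std::string& name) {
        ++lookupCount;
        auto it = exports.find(name);
        return it == exports.end() ? nullptr : it->second;
    }
};

// Delay-load style resolver: each name is looked up on first use only and
// the address found is cached. A failed lookup returns nullptr and is not
// cached, which leaves error reporting to the caller.
class LazyLibrary {
public:
    explicit LazyLibrary(FakeModule* module) : m_module(module) {}

    Fn get(const std::string& name) {
        auto it = m_cache.find(name);
        if (it != m_cache.end())
            return it->second;  // cached: no second lookup

        Fn fn = m_module->lookup(name);
        if (fn)
            m_cache[name] = fn;
        return fn;
    }

private:
    FakeModule* m_module;
    std::map<std::string, Fn> m_cache;
};

int twice(int x) { return 2 * x; }  // example "exported" function
```

Calling `get("twice")` twice triggers only one underlying lookup. An unknown name yields `nullptr`, and a Jancy-style runtime would turn that into a pseudo-exception.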

Dynamic name lookup (GetProcAddress/dlsym) is, of course, also possible, although it is not as elegant as the approach above.
Example
    typedef int cdecl Printf (
        char const thin* format,
        ...
    );

    jnc.Library msvcrt;
    msvcrt.load ("msvcrt.dll");

    Printf thin* printf;

    unsafe
    {
        printf = (Printf thin*) msvcrt.getFunction ("printf");
    }

    printf ("function 'printf' is found at 0x%p\n", printf);

Another important consequence of the close compatibility between Jancy and C/C++ is that you can copy-paste C definitions of protocol headers from publicly available sources (such as Linux, ReactOS or other open-source projects) and use them directly:
    enum IpProtocol: uint8_t
    {
        Icmp = 1,
        Tcp  = 6,
        Udp  = 17,
    }

    struct IpHdr
    {
        uint8_t m_headerLength : 4;
        uint8_t m_version      : 4;
        uint8_t m_typeOfService;
        bigendian uint16_t m_totalLength;
        uint16_t m_identification;
        uint16_t m_flags;
        uint8_t m_timeToLive;
        IpProtocol m_protocol;
        bigendian uint16_t m_headerChecksum;
        uint32_t m_srcAddress;
        uint32_t m_dstAddress;
    }

By the way, note the support for integer types with reversed (big-endian) byte order. This is far from a large-scale innovation, of course, but it greatly simplifies describing and working with network protocol headers, where big-endian byte order is ubiquitous.
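
For comparison, here is roughly what the `bigendian` qualifier saves you from writing by hand in plain C++. This is a minimal sketch: `readBigEndian16` and `Ip4View` are hypothetical helpers, and only a few of the `IpHdr` fields are covered.

```cpp
#include <cassert>
#include <cstdint>

// What a 'bigendian uint16_t' field saves you from writing by hand:
// network byte order puts the most significant byte first.
inline std::uint16_t readBigEndian16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Minimal IPv4 header accessors over raw bytes. Reading byte by byte avoids
// assumptions about struct layout, alignment and host byte order.
struct Ip4View {
    const std::uint8_t* data;

    unsigned version() const { return data[0] >> 4; }         // high nibble of byte 0
    unsigned headerLength() const { return data[0] & 0x0f; }  // low nibble, in 32-bit words
    std::uint16_t totalLength() const { return readBigEndian16(data + 2); }
    std::uint8_t protocol() const { return data[9]; }         // 1 = ICMP, 6 = TCP, 17 = UDP
};
```

In Jancy, the same information is obtained simply by reading `m_version`, `m_headerLength`, `m_totalLength` and `m_protocol` through an `IpHdr` pointer.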

Pseudo-exceptions


Paradoxically, one consequence of ABI compatibility with C/C++ was abandoning the exception model familiar to C++ programmers. Such exceptions are completely unsuitable for a call stack that mixes several languages. This is, of course, not the only objective complaint about C++ exceptions: heated debates for and against them pop up on programming forums with enviable regularity.

Either way, Jancy uses a hybrid model. It is based on checking return values, but the compiler relieves you of doing that manually. As a result, everything looks almost like exceptions in C++ or Java. At the same time, program behavior on error is an order of magnitude more transparent and predictable, and supporting exceptions across language boundaries (such as calling C++ functions from Jancy scripts and vice versa) becomes as simple as possible.
    bool foo (int a) throws
    {
        if (a < -100 || a > 200) // invalid argument
        {
            jnc.setStringError ("detailed-description-of-error");
            return false;
        }

        // ...
        return true;
    }

Return values of functions marked with the throws modifier are treated as error codes. Jancy adopts intuitive error conditions for standard types: false for bool, null for pointers, -1 for unsigned integers, and any negative value for signed ones. Other types are cast to bool (if that is impossible, a compilation error is generated). Obviously, a function returning void cannot report errors in this model.
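
The error conditions above can be summarized as a small C++ sketch that dispatches on the return type at compile time (C++17 `if constexpr`). This is our illustration of the rules, not code from the Jancy compiler, and `isErrorReturn` is a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// Error conditions for functions marked 'throws', expressed as a
// compile-time dispatch on the return type (C++17).
template <typename T>
bool isErrorReturn(T value) {
    if constexpr (std::is_same_v<T, bool>)
        return !value;                       // bool: false means failure
    else if constexpr (std::is_pointer_v<T>)
        return value == nullptr;             // pointers: null means failure
    else if constexpr (std::is_integral_v<T> && std::is_unsigned_v<T>)
        return value == static_cast<T>(-1);  // unsigned: -1 (all bits set) means failure
    else if constexpr (std::is_integral_v<T>)
        return value < 0;                    // signed: any negative value means failure
    else
        return !static_cast<bool>(value);    // other types: cast to bool, else no compile
}
```

As with the rule for "other types", instantiating the template for a type that cannot be converted to bool fails at compile time.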

In addition, this model leaves the developer free to choose how to handle errors in each case. Sometimes it is more convenient to check the return code manually, sometimes to use exception semantics. In Jancy you can do either, even when calling the same function, depending on the situation.
    bar ()
    {
        foo (10); // can use exception semantics...
        foo (-20);

    catch:
        printf ($"error caught: $(jnc.getLastError ().m_description)\n");
        // handle error
    }

    baz (int x)
    {
        bool result = try foo (x); // ...or manual error-code check
        if (!result)
        {
            printf ($"error: $(jnc.getLastError ().m_description)\n");
            // handle error
        }
    }
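
Under the hood, exception semantics in this model come down to return-value checks. The following C++ sketch shows, roughly and under our own assumptions, what a body like `bar ()` above could be lowered to. Every call to a `throws` function is followed by a check, and a failure jumps to the `catch:` label without any stack unwinding. `g_lastError` is a hypothetical stand-in for the state behind `jnc.setStringError`/`jnc.getLastError`.

```cpp
#include <cassert>
#include <string>

// Hypothetical runtime error slot, standing in for jnc.setStringError /
// jnc.getLastError; this is not Jancy's actual runtime API.
static std::string g_lastError;

bool foo(int a) {  // a 'throws' function: false means error
    if (a < -100 || a > 200) {
        g_lastError = "invalid argument";
        return false;
    }
    return true;
}

// Roughly what can be generated for:
//     bar () { foo (first); foo (second); catch: <handler> }
// Each call is followed by a return-value check that jumps to 'catch'.
std::string bar(int first, int second) {
    if (!foo(first))
        goto catch_label;
    if (!foo(second))
        goto catch_label;
    return "ok";

catch_label:
    return "error caught: " + g_lastError;
}
```

For example, `bar(10, 20)` returns `"ok"`, while `bar(10, 300)` takes the catch path because 300 is outside the accepted range.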

In most languages, the finally construct is traditionally associated with exceptions. In Jancy, however, finally can be added to any block at the developer's discretion. After all, you need to clean up after yourself even if no errors occurred, right?
    foo ()
    {
        // nothing to do with exceptions here, just a 'finally' block to clean up
    finally:
        printf ("foo () finalization\n");
    }

Of course, the more traditional use of finally, in cases where exceptions are expected, is also allowed.
Example
    foo (char const* address)
    {
        try
        {
            open (address);
            transact (1);
            transact (2);
            transact (3);

        catch:
            addErrorToLog (jnc.getLastError ());

        finally:
            close ();
        }
    }
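
In the same spirit, a `try` block with both `catch:` and `finally:` can be pictured as the C++ sketch below. A failed step jumps to the catch code, and every path then falls through to the finally code. `Session` and `run` are hypothetical names; this illustrates the idea, not Jancy's actual code generation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical resource standing in for open/transact/close; each fallible
// step reports failure through its return value.
struct Session {
    std::vector<std::string> log;
    int failAt = -1;  // id of the transaction that fails; -1 means none

    bool open() { log.push_back("open"); return true; }
    bool transact(int id) {
        if (id == failAt) {
            log.push_back("fail " + std::to_string(id));
            return false;
        }
        log.push_back("transact " + std::to_string(id));
        return true;
    }
    void close() { log.push_back("close"); }
};

// One way to lower try/catch/finally onto error codes: the first failure
// jumps to 'catch', and both the success and error paths reach 'finally'.
void run(Session& s) {
    if (!s.open() || !s.transact(1) || !s.transact(2) || !s.transact(3))
        goto catch_label;
    goto finally_label;

catch_label:
    s.log.push_back("logged error");

finally_label:
    s.close();
}
```

When transaction 2 fails, transaction 3 is skipped, the error is logged, and `close()` still runs.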


Safe pointers and address arithmetic


Address arithmetic in a scripting language is what all of this was started for.

Pointers, for all their inherent insecurity, are explicitly or implicitly part of every language. By restricting the kinds of pointers available to the developer, a language can be made significantly safer: handling adverse situations at run time becomes simpler, and some incorrect operations can even be caught at compile time through static analysis. But once address arithmetic comes into play, it is simply impossible to shift all of the analysis to the compilation stage.

So that the correctness of operations can always be verified, pointers in Jancy are thick (fat) by default. Besides the address, they contain a validator: a special metadata structure that provides the allowed address range, the data type, and an integer nesting level (scope level).

The formula for the safety of pointers and address arithmetic in Jancy is:
  1. range checks when dereferencing pointers;
  2. scope-level checks when assigning pointers;
  3. castability checks when casting pointers.
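
A rough C++ model of a thick pointer may help to show what rules 1 and 2 cost. In this sketch (our own, with the hypothetical names `Validator`, `ThickPtr` and `canStore`), the validator holds the valid address range and the scope level. Rule 3 is not modeled, since it concerns which casts are permitted at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical validator: the metadata a thick pointer carries next to the
// address. Field names and layout are ours, not Jancy's internals.
struct Validator {
    std::uintptr_t rangeBegin;  // first valid byte
    std::uintptr_t rangeEnd;    // one past the last valid byte
    int scopeLevel;             // 0 = global/heap; larger = deeper stack frame
};

template <typename T>
struct ThickPtr {
    std::uintptr_t addr;  // kept as an integer so out-of-range arithmetic is well defined
    const Validator* validator;

    // Address arithmetic itself is never checked...
    ThickPtr operator+(std::ptrdiff_t n) const {
        return {addr + static_cast<std::uintptr_t>(n) * sizeof(T), validator};
    }

    // ...rule 1: the range is checked on access (two integer comparisons).
    bool inRange() const {
        return addr >= validator->rangeBegin &&
               addr + sizeof(T) <= validator->rangeEnd;
    }

    // Checked access: nullptr instead of touching memory outside the range.
    // (Jancy raises an error here; returning nullptr keeps the sketch small.)
    T* get() const { return inRange() ? reinterpret_cast<T*>(addr) : nullptr; }
};

// Rule 2: a pointer may be stored in a location at scope level dstLevel
// only if its target does not live in a deeper (shorter-lived) scope.
template <typename T>
bool canStore(const ThickPtr<T>& src, int dstLevel) {
    return src.validator->scopeLevel <= dstLevel;
}
```

Pointer arithmetic stays cheap, and the cost is paid on access or assignment: two comparisons for the range check, one for the scope check.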

What about performance?
This mechanism is not free and does incur some run-time overhead.

But first, even in the most naive version, without any optimizations, two integer comparisons for a range check (or one for a scope-level check) is not so bad, especially given JIT compilation and the fact that Jancy is, after all, a scripting language.

Second, static analysis will eventually make it possible to eliminate many redundant checks at compile time. And third, unsafe thin pointers without validators can already be used in performance-critical sections of code: no checks are performed on operations with thin pointers.

Valid-range checks are performed both when a pointer is dereferenced explicitly and when the indexing operator is used:
    foo (
        char* p,
        size_t i
    )
    {
        p += i;
        *p = 10; // <-- range is checked

        static int a [] = { 10, 20, 30 };
        int x = a [i]; // <-- range is checked
    }

For pointers to stack and thread-local variables, scope-level checks are also needed, to prevent addresses from leaking beyond the variables' lifetime. This mechanism works even with multi-level pointers, such as pointers to structures containing pointers to structures, and so on:
    int* g_p;

    bar (
        int** dst,
        int* src
    )
    {
        *dst = src; // <-- scope level is checked
    }

    baz ()
    {
        int x;
        bar (&g_p, &x); // <-- runtime error: scope level mismatch
    }

Finally, castability checks are designed to prevent the validators themselves from being destroyed. Indeed, what if we create a pointer to a pointer, cast it to a pointer to char and then overwrite the validator byte by byte with garbage? Jancy simply won't allow this: the compiler and runtime only permit casts where they are safe.
Read more
Jancy divides all types into two categories: POD (plain old data) and non-POD. The concept of POD in Jancy differs slightly from that in C++. Perhaps a new term would have avoided confusion, but in the end I decided not to multiply entities. Besides, it seems to me that POD in Jancy reflects the meaning of "plain old data" much more accurately.

In Jancy, POD is data without metadata. It can be safely copied and modified byte by byte without breaking anything. Aggregating POD data, whether by including fields, by inheritance (this is where it differs from C++) or by combining into arrays, also yields POD. Anything that contains metadata, namely classes, safe data pointers and any aggregates of them, is non-POD.

The Jancy compiler allows casts of non-POD types if and only if the cast leaves no way to destroy or replace metadata. For situations where this cannot be determined at compile time (for example, a cast to a derived type, the so-called downcast), there is a special dynamic cast operator. It compiles into a call to a built-in function that returns a pointer to the requested type, or null if the cast is not possible.
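
The core of a dynamic cast driven by the validator's type information can be sketched in C++ as follows. This rests on two assumptions. First, the validator lets us find the start of the whole object and its dynamic type (`ObjectInfo`). Second, each hypothetical `TypeInfo` lists its direct bases with their offsets. The real Jancy runtime structures are certainly different.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type descriptor: direct bases and their byte offsets.
struct TypeInfo {
    const char* name;
    std::vector<std::pair<const TypeInfo*, std::ptrdiff_t>> bases;
};

// Depth-first search for 'target' inside the layout of 'actual'.
// On success, stores the subobject offset and returns true.
bool findSubobject(const TypeInfo* actual, const TypeInfo* target,
                   std::ptrdiff_t* offset) {
    if (actual == target) {
        *offset = 0;
        return true;
    }
    for (const auto& base : actual->bases) {
        std::ptrdiff_t inner;
        if (findSubobject(base.first, target, &inner)) {
            *offset = base.second + inner;
            return true;
        }
    }
    return false;
}

// What the validator is assumed to provide: object start and dynamic type.
struct ObjectInfo {
    char* objectBegin;
    const TypeInfo* dynamicType;
};

// dynamic (T*) p: a pointer to the requested subobject, or nullptr.
void* dynamicCast(const ObjectInfo& info, const TypeInfo* target) {
    std::ptrdiff_t offset;
    if (!findSubobject(info.dynamicType, target, &offset))
        return nullptr;
    return info.objectBegin + offset;
}
```

With the types A, B, C and D from the example below (and B assumed to sit at offset 4 inside C), a `D` object can be cast to `D`, `C`, `B` or `A`, while a cast to an unrelated type yields null.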

For example, let's prepare some test types to cast between:
    struct A { int m_a; }
    struct B { int m_b; }
    struct C: A, B { int m_c; }
    struct D: C { char const* m_s; }

Here A, B and C are POD (C would not be POD in C++), while D is non-POD, because it contains metadata in the form of the validator of the m_s pointer. Now let's consider the possible casts.

Upcasts are always allowed and require no explicit cast operator, for POD and non-POD types alike:
    foo (D* d)
    {
        C* c = d;
        A* a = d;
    }

POD types can be cast to each other freely using the cast operator:
    bar (B* b)
    {
        char* p = (char*) b;
        C* c = (C*) b; // <-- unlike C++ no pointer shift
    }

Casts from non-POD types to POD types are allowed only if the resulting pointer is const:
    foo (D* d)
    {
        char* p = (char*) d;              // <-- error
        char const* p2 = (char const*) d; // OK
    }

Casting to a derived type (downcast) is possible with the dynamic cast operator:
    bar (B* b)
    {
        D* d = dynamic (D*) b;
        A* a = dynamic (A*) b; // not a downcast, but still OK
    }

Dynamic casting is possible thanks to the validator contained in the pointer, and hence the type information. Besides dynamic casting, Jancy also offers dynamic size operations, which use the same validator. Although this is not related to pointer safety, it is very convenient in certain situations:
    foo (int* p)
    {
        size_t size = dynamic sizeof (*p);
        size_t count = dynamic countof (*p);
    }

    //...

    bar ()
    {
        int a [100];
        foo (a);
    }
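
Assuming that `dynamic sizeof` measures from the pointer to the end of its valid range (our reading of the description, not a documented guarantee), the computation reduces to the C++ sketch below. `RangeValidator`, `dynamicSizeof` and `dynamicCountof` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical validator reduced to what dynamic sizeof/countof need:
// the end of the valid range.
struct RangeValidator {
    std::uintptr_t rangeEnd;  // one past the last valid byte
};

// dynamic sizeof (*p): bytes from p to the end of its valid range.
template <typename T>
std::size_t dynamicSizeof(const T* p, const RangeValidator& v) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return addr < v.rangeEnd ? static_cast<std::size_t>(v.rangeEnd - addr) : 0;
}

// dynamic countof (*p): how many whole elements of T fit in that range.
template <typename T>
std::size_t dynamicCountof(const T* p, const RangeValidator& v) {
    return dynamicSizeof(p, v) / sizeof(T);
}
```

With `int a [100]` as in `bar ()` above, the sketch yields a count of 100 for `a` and 90 for `a + 10`.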


Dear Habr readers, you are invited to play with our online compiler and see for yourselves how it all works (read: try to slip the compiler a script with pointers that crashes it ;)

Read more about pointers in Jancy here: http://tibbo.com/jancy/features/pointers.html

Automaton functions


Safe pointers and address arithmetic are ideal for parsing and generating binary packets:
    dissectEthernet (void const* p)
    {
        io.EthernetHdr const* hdr = (io.EthernetHdr const*) p;

        switch (hdr.m_type)
        {
        case io.EthernetType.Ip:
            dissectIp (hdr + 1);
            break;

        case io.EthernetType.Ip6:
            dissectIp6 (hdr + 1);
            break;

        case io.EthernetType.Arp:
            dissectArp (hdr + 1);
            break;

        // ...
        }
    }
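
For comparison, the same dispatch written in plain C++ must carry the buffer size around and check it explicitly, which is exactly what thick pointers do implicitly. This is a minimal sketch: the EtherType values are the standard ones (0x0800 IPv4, 0x0806 ARP, 0x86DD IPv6), while this hypothetical `dissectEthernet` only reports what it would dispatch to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard EtherType values for the protocols dispatched below.
enum EtherType : std::uint16_t { EtherIp = 0x0800, EtherArp = 0x0806, EtherIp6 = 0x86dd };

constexpr std::size_t kEthernetHdrSize = 14;  // dst MAC (6) + src MAC (6) + type (2)

std::string dissectEthernet(const std::uint8_t* p, std::size_t size) {
    if (size < kEthernetHdrSize)  // a thick pointer would catch this read
        return "truncated";

    std::uint16_t type = static_cast<std::uint16_t>((p[12] << 8) | p[13]);  // big-endian

    // The payload starts at p + kEthernetHdrSize ('hdr + 1' in the Jancy code);
    // a real dissector would pass it on together with the remaining size.
    std::size_t payloadSize = size - kEthernetHdrSize;

    switch (type) {
    case EtherIp:  return "ip, " + std::to_string(payloadSize) + " bytes";
    case EtherIp6: return "ip6, " + std::to_string(payloadSize) + " bytes";
    case EtherArp: return "arp, " + std::to_string(payloadSize) + " bytes";
    default:       return "unknown";
    }
}
```

The explicit `size` parameter is the part that thick pointers make unnecessary: in Jancy the range travels with the pointer itself.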

But there is another class of protocols: those that do not rely on binary headers and instead use some kind of query/response language. To parse such IO streams, you need to write a parser for that language. You also have to buffer the data first, because there is often no guarantee that the transport delivered a message in its entirety rather than in pieces.

Since this task is typical in IO programming, Jancy offers a built-in tool for it. Jancy automatons are designed to simplify the first and most routine stage of writing any parser: creating the lexer/scanner. They work on the same principle as well-known lexer generators such as Lex, Flex and Ragel:
    jnc.AutomatonResult automaton scanRx (jnc.Recognizer* recognizer)
    {
    %% "getOption"
        createToken (Token.GetOption);

    %% "setOption"
        createToken (Token.SetOption);

    %% "exit"
        createToken (Token.Exit);

    %% [_\w][_\w\d]*
        createToken (Token.Identifier, recognizer.m_lexeme);

    // ...
    }

Inside an automaton function, the recognized lexemes are listed as regular expressions. Each lexeme description is followed by a block of code to execute when that lexeme is found in the input stream. All of this is compiled into a table-driven DFA whose state is stored in an external jnc.Recognizer object (a pointer to this object is passed in the recognizer argument). The recognizer accumulates the characters of a potential lexeme and implicitly calls our automaton function while performing the necessary state transitions.

The automaton function combined with this controlling recognizer forms our lexer. Moreover, this lexer is incremental, i.e. able to parse messages that arrive in parts:
    jnc.Recognizer recognizer (scanRx); // create recognizer object

    try
    {
        recognizer.write ("ge");
        recognizer.write ("tOp");
        recognizer.write ("tion");
        recognizer.eof (); // notify recognizer about eof (this can trigger actions or errors)

    catch:
        // handle recognition error
    }
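
Why the buffering matters is easiest to see in code. Below is a hand-written incremental scanner in C++. It stands in for the table-driven DFA Jancy generates and is not its implementation; `Recognizer` and `Token` here are hypothetical. The unfinished lexeme survives between `write` calls, so "ge" + "tOp" + "tion" still yields a single GetOption token once the input ends.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class Token { GetOption, SetOption, Exit, Identifier };

// Hand-written incremental scanner (a stand-in for a generated DFA): the
// characters of an unfinished lexeme are kept between write() calls, so a
// keyword split across several chunks is still recognized as one token.
class Recognizer {
public:
    std::vector<Token> tokens;
    std::vector<std::string> identifiers;  // lexemes of Identifier tokens

    void write(const std::string& chunk) {
        for (char c : chunk) {
            if (std::isalnum(static_cast<unsigned char>(c)) || c == '_')
                m_lexeme += c;  // still inside a lexeme
            else
                flush();        // any other character ends the current lexeme
        }
    }

    void eof() { flush(); }  // input ended: finish the pending lexeme

private:
    void flush() {
        if (m_lexeme.empty())
            return;
        if (m_lexeme == "getOption")
            tokens.push_back(Token::GetOption);
        else if (m_lexeme == "setOption")
            tokens.push_back(Token::SetOption);
        else if (m_lexeme == "exit")
            tokens.push_back(Token::Exit);
        else {
            tokens.push_back(Token::Identifier);
            identifiers.push_back(m_lexeme);
        }
        m_lexeme.clear();
    }

    std::string m_lexeme;  // characters of the lexeme being accumulated
};
```

Unlike a generated DFA, this sketch simply treats every non-identifier character as a delimiter, but it shows the key property: state is kept between chunks.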

Note that, as in Ragel, you can switch between different automaton functions. Among other things, this makes it possible to create context-sensitive keywords (or, put differently, to parse multi-language input).
Example
    jnc.AutomatonResult automaton scanGlobal (jnc.Recognizer* recognizer)
    {
    %% '#'
        recognizer.m_automatonFunc = scanPreprocessor; // switch to pp-specific keywords

    // ...
    }

    jnc.AutomatonResult automaton scanPreprocessor (jnc.Recognizer* recognizer)
    {
    %% "if"
        createToken (Token.If);

    %% "ifdef"
        createToken (Token.Ifdef);

    // ...

    %% '\n'
        recognizer.m_automatonFunc = scanGlobal; // switch back
    }

Automaton functions on the one hand, and safe pointers with address arithmetic on the other, make it easy to parse protocols and IO streams of any type.

Conclusion


Although UI programming is not Jancy's main purpose at the moment, we would still like to demonstrate the approach to reactive programming used in Jancy. I think we managed to find the optimal compromise for the coexistence of imperative and declarative elements in reactive programming. That will be the subject of the next article.

In the meantime, we invite you to try out the features of the Jancy language (both those described in this article and many others) on the compiler's live demo page. You can also download, build and play with the Jancy JIT compiler library and examples of its use: all of this is available on the downloads page.

Source: https://habr.com/ru/post/258427/

