Fifteen years ago there was no Habrahabr, no Facebook and, tellingly, no C++ compiler that printed its diagnostic messages in Russian. Since then several new C++ standards have been released, development tools have made a giant leap, and today it may take less time to write your own programming language or code analyzer on top of existing frameworks. This post is about how I started my career and how, through self-education and writing a C++ compiler, I reached an expert level. The general implementation details, how long it took, what came out in the end and the point of the whole idea are all inside.

How it all began
Back in 2001, when I was bought my first computer (Duron 800 MHz / 128 MB RAM / 40 GB HDD), I quickly took up programming. Well, not quite: at first I was constantly tormented by the question of what to install, Red Hat Linux, FreeBSD or Windows 98/Me. My guide in this endless world of technology was Hacker magazine.
Good old, tongue-in-cheek magazine. Incidentally, its style has hardly changed since then.
Windows lusers, lamers, trojans, the elite, Linux: all of it blew my mind. I really wanted to master the whole stack they printed there as fast as possible and hack the Pentagon (without the Internet).
The internal struggle over whether to become a Linux user or to play games on Windows lasted until the Internet was brought into the house. A screeching 56 kbit/s modem that tied up the phone line while connected and downloaded an mp3 song in about half an hour. At roughly $0.10/MB, one song cost 40-50 cents. That was during the day.
At night, though, the rates were very different. From 23:00 to 6:00 you could browse any site without turning off images in the browser! So everything that could be downloaded overnight went onto the hard drive and was read during the day.
On the first day, when the line was brought in and the network set up, the technician opened IE 5 and Yandex in front of me and quickly left. Wondering what to look for first, I typed something like "a site for programmers". The first result was the recently launched rsdn.ru. I hung around there for a long time, dissatisfied that I understood so little. At that time the flagship and most popular language on the forum (and in general) was C++. The challenge had been thrown down, and there was nothing left to do but catch up with the bearded gurus in their knowledge of C++.
Another equally interesting site of the time was firststeps.ru. I still consider its way of presenting material the best: small portions (steps), each with a small working result. And everything worked!
Buying books at a flea market, I tried to learn all the basics of programming. One of the first books I bought was "The Art of Computer Programming" by D. Knuth. I don't remember exactly why I bought that book rather than some "C++ for Dummies"; the seller probably recommended it. With all a student's diligence I took up the first volume, doing the exercises at the end of every chapter without fail. It was exactly the right thing: although math at school was not going well for me, I made progress with Knuth's math, because I had a great desire and motivation to write programs and to write them correctly. After mastering the algorithms and data structures, I bought volume 3 of "The Art of Computer Programming", Sorting and Searching. It was a blast. Heapsort, quicksort, binary search, trees and lists, stacks and queues. I wrote all of it out on paper, interpreting the result in my head. I read at home, I read on vacation at the seaside, I read everywhere. Pure theory, no implementation. At the time I had no idea how much this basic knowledge would pay off in the future.
Now, when I interview developers, I have yet to meet a candidate who can write a binary search or a quicksort on a piece of paper. It's a pity.
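For readers who want to test themselves against that interview question, here is a minimal sketch of both algorithms in C++. It is written from the textbook descriptions, not taken from the post; the function names are made up, and the choice of Lomuto partitioning with the last element as pivot is mine (Knuth and Hoare describe other partition schemes).

```cpp
#include <utility>
#include <vector>

// Returns the index of `key` in the sorted vector `a`, or -1 if it is absent.
// Works on the half-open range [lo, hi), which avoids off-by-one errors.
long binary_search_index(const std::vector<int>& a, int key) {
    std::vector<int>::size_type lo = 0, hi = a.size();
    while (lo < hi) {
        std::vector<int>::size_type mid = lo + (hi - lo) / 2;  // no overflow of lo + hi
        if (a[mid] < key)
            lo = mid + 1;
        else if (key < a[mid])
            hi = mid;
        else
            return static_cast<long>(mid);
    }
    return -1;
}

// Quicksort on the closed range [lo, hi] with Lomuto partitioning:
// the last element is the pivot, smaller elements are moved to its left.
void quick_sort(std::vector<int>& a, long lo, long hi) {
    if (lo >= hi) return;
    int pivot = a[hi];
    long store = lo;
    for (long i = lo; i < hi; ++i) {
        if (a[i] < pivot) std::swap(a[i], a[store++]);
    }
    std::swap(a[store], a[hi]);  // the pivot lands in its final position
    quick_sort(a, lo, store - 1);
    quick_sort(a, store + 1, hi);
}
```

Sorting a whole vector is `quick_sort(v, 0, static_cast<long>(v.size()) - 1)`; an empty vector is handled because `hi` becomes -1.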
But back to the topic of the post. Having mastered Knuth, I had to move on. Along the way I attended Turbo Pascal courses, read Kernighan and Ritchie, and after them "Teach Yourself C++ in 21 Days". I did not understand everything in C and C++; I simply retyped code from the books. There was no Google and no one to ask, but I had plenty of time at the computer, since I had left regular school for evening school, where you could barely attend and show up for 3-4 lessons a week.
As a result, I coded fanatically from morning to night, learning more and more new topics. I could write a calculator or a simple WinAPI application, and I could knock something together in Delphi 6 too. By the time I received my secondary school diploma, I was already at the level of a third- or fourth-year university student, and of course there was no question about which major to choose.
When I entered the Department of Computer Systems and Networks, I could already solve C and C++ problems of any university-level difficulty. Still, visiting the same rsdn.ru, I understood how much remained to be learned and how far the experienced forum members outclassed me in C++. This hurt, and the lack of understanding, together with a burning desire to know everything, led me to the book "Compilers: Principles, Techniques, and Tools" by A. Aho, R. Sethi and J. Ullman, popularly known as the Dragon Book. This is where the fun began. Before it, I had read Herbert Schildt's "Theory and Practice of C++", which covered advanced topics such as encryption, data compression and, most interesting of all, writing your own parser.
As I worked meticulously through the Dragon Book, moving from lexical analysis to syntax analysis and finally to semantic checking and code generation, a fateful decision came to me: to write my own C++ compiler.
- Why not? I asked myself.
- Go for it, answered the part of the brain that with age grows more and more skeptical of everything new. And so compiler development began.
Preparation
By then the dial-up Internet had been cut off because the telephone lines were being switched to digital, so I had downloaded the 1998 edition of the ISO C++ standard for reference. Visual C++ 6.0 had already become my favorite, familiar tool.
So the task essentially came down to implementing what is written in the C++ standard. The Dragon Book served as a guide to compiler development, and the starting point was the calculator parser from Schildt's book. All the pieces of the puzzle came together, and development began.
Preprocessor
nrcpp/KPP_1.1/
Clause 2 of the ISO C++98 standard sets out the lexical conventions, including the phases of translation and preprocessing tokens (the preprocessing directives themselves are specified in clause 16). Good, I thought, because this is the simplest part and can be implemented separately from the compiler itself. Preprocessing runs first: its input is a C++ file in the form you are used to seeing. Its output is a transformed C++ file with comments removed, files from #include inserted, macros from #define substituted, #pragma directives preserved and conditional compilation (#if/#ifdef/#endif) resolved.
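To make the "without comments" step concrete, here is a minimal sketch of comment removal. It is not code from nrcpp, and the name strip_comments is hypothetical. It replaces each comment with one space, as translation phase 3 requires, copies string and character literals verbatim, and, as a simplification, ignores backslash-newline splicing and trigraphs, which a real preprocessor handles in earlier phases.

```cpp
#include <cstddef>
#include <string>

// Removes // and /* */ comments, replacing each with a single space.
// String and character literals are copied verbatim, including escapes,
// so "// inside a string" is not treated as a comment.
// Simplification: line splicing and trigraphs are not handled.
std::string strip_comments(const std::string& src) {
    std::string out;
    std::size_t i = 0, n = src.size();
    while (i < n) {
        char c = src[i];
        if (c == '"' || c == '\'') {
            char quote = c;
            out += src[i++];                          // opening quote
            while (i < n && src[i] != quote) {
                if (src[i] == '\\' && i + 1 < n) out += src[i++];  // keep escape pair
                out += src[i++];
            }
            if (i < n) out += src[i++];               // closing quote
        } else if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            while (i < n && src[i] != '\n') ++i;      // the newline itself is kept
            out += ' ';
        } else if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
            out += ' ';
        } else {
            out += src[i++];
        }
    }
    return out;
}
```

Replacing a comment with a space rather than nothing matters: `a/**/b` must stay two tokens, `a b`, not become `ab`.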
Before preprocessing:

```cpp
#include <stdio.h>

#define MAX(a, b) \
    ((a) > (b) ? (a) : (b))
#define STR(s) #s

int main() {
    printf("%s: %d", STR(This is a string), MAX(4, 5));
}
```

After preprocessing (the expanded contents of <stdio.h> omitted):

```cpp
int main() {
    printf("%s: %d", "This is a string", ((4) > (5) ? (4) : (5)));
}
```
In addition, the preprocessor did a lot of other useful work, such as evaluating constant expressions in #if, concatenating adjacent string literals and emitting #warning and #error messages. Oh, and have you ever seen digraphs and trigraphs in C code? If not, know that they exist!
An example of digraphs and trigraphs:

```cpp
// Digraphs
int a<:10:>;          // equivalent to: int a[10];
if (x != 0) <% %>     // equivalent to: if (x != 0) { }

// Trigraph example
??=define arraycheck(a, b) a??(b??) ??!??! b??(a??)
// is mapped to
#define arraycheck(a, b) a[b] || b[a]
```
Read more on Wikipedia. (Trigraphs were removed from the language in C++17; digraphs remain.)
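Trigraph replacement is the very first thing a pre-C++17 preprocessor does (translation phase 1), and it is a plain text substitution. The sketch below is illustrative, not nrcpp's code; the name replace_trigraphs is hypothetical. It maps each of the nine standard three-character sequences to its single character.

```cpp
#include <cstddef>
#include <string>

// Translation phase 1 in C++98 through C++14: each of the nine trigraph
// sequences (two question marks plus one character) is replaced by the
// single character it stands for. C++17 removed trigraphs from the language.
std::string replace_trigraphs(const std::string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '?' && i + 2 < src.size() && src[i + 1] == '?') {
            char mapped = 0;
            switch (src[i + 2]) {
                case '=':  mapped = '#';  break;
                case '/':  mapped = '\\'; break;
                case '\'': mapped = '^';  break;
                case '(':  mapped = '[';  break;
                case ')':  mapped = ']';  break;
                case '!':  mapped = '|';  break;
                case '<':  mapped = '{';  break;
                case '>':  mapped = '}';  break;
                case '-':  mapped = '~';  break;
            }
            if (mapped) {
                out += mapped;
                i += 2;  // skip the remaining two characters of the trigraph
                continue;
            }
        }
        out += src[i];
    }
    return out;
}
```

Because the scan moves one character at a time, three question marks followed by `=` become `?#`, which matches the standard's left-to-right replacement.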
Of course, the main job of the C++ preprocessor is substituting macros and inserting the files named in #include.
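Macro substitution has one subtle rule worth showing: the replacement text is rescanned for more macros, but a macro is not expanded again inside its own expansion. Here is a minimal sketch for object-like macros only. The names MacroTable and expand are hypothetical, and as simplifications string literals are not skipped and function-like macros, # and ## are not supported.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical macro table: name -> replacement text (object-like macros only).
using MacroTable = std::map<std::string, std::string>;

// Expands object-like macros in `text`. Each replacement is rescanned, but a
// macro currently being expanded (listed in `active`) is left alone, which is
// what stops "#define LOOP LOOP + 1" from recursing forever.
std::string expand(const std::string& text, const MacroTable& macros,
                   std::set<std::string> active = {}) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
                ++i;
            std::string name = text.substr(start, i - start);
            auto it = macros.find(name);
            if (it != macros.end() && active.count(name) == 0) {
                std::set<std::string> inner = active;
                inner.insert(name);
                out += expand(it->second, macros, inner);  // rescan the replacement
            } else {
                out += name;
            }
        } else if (std::isdigit(c)) {
            // Copy a number such as 1e5 or 0x1F whole, so its letters are not
            // mistaken for identifiers.
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '.'))
                out += text[i++];
        } else {
            out += text[i++];
        }
    }
    return out;
}
```

With `SIZE` defined as `(WIDTH * 2)` and `WIDTH` as `80`, `int buf[SIZE];` becomes `int buf[(80 * 2)];`, showing both rescanning and the parentheses that make such macros safe.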
What did I learn while writing the C++ preprocessor?
- How the lexical structure and syntax of the language work
- The precedence of C++ operators and, more generally, how expressions are evaluated
- Strings, characters, constants and constant suffixes
- Code structure
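Operator precedence is exactly what a preprocessor needs to evaluate `#if` conditions. A common way to do this by hand is precedence climbing; the sketch below illustrates the technique and is not how nrcpp's preprocessor is written. The class name ConstExprEvaluator is hypothetical, only decimal literals and a subset of operators are supported (no `defined`, shifts, bitwise operators or `?:`), and both operands of `&&` and `||` are always evaluated.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Evaluates integer constant expressions like those on #if lines by
// precedence climbing: higher binding power means the operator binds tighter.
class ConstExprEvaluator {
public:
    explicit ConstExprEvaluator(const std::string& s) : src_(s) {}

    long evaluate() {
        long v = parse(1);
        skip_spaces();
        if (pos_ != src_.size()) throw std::runtime_error("trailing tokens");
        return v;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skip_spaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
    }

    // Returns the binding power of the binary operator at the cursor (0 if none)
    // and stores its spelling in `op`. Longer spellings are tried first.
    int peek_binary(std::string& op) {
        skip_spaces();
        static const std::pair<const char*, int> table[] = {
            {"||", 1}, {"&&", 2}, {"==", 3}, {"!=", 3}, {"<=", 4}, {">=", 4},
            {"<", 4},  {">", 4},  {"+", 5},  {"-", 5},  {"*", 6},  {"/", 6}, {"%", 6}};
        for (const auto& entry : table) {
            std::string candidate = entry.first;
            if (src_.compare(pos_, candidate.size(), candidate) == 0) {
                op = candidate;
                return entry.second;
            }
        }
        return 0;
    }

    // Parses a unary operand, then absorbs binary operators binding at least
    // as tightly as `min_prec`. Passing p + 1 makes operators left-associative.
    long parse(int min_prec) {
        long lhs = parse_unary();
        std::string op;
        int p;
        while ((p = peek_binary(op)) >= min_prec) {
            pos_ += op.size();
            long rhs = parse(p + 1);
            lhs = apply(op, lhs, rhs);
        }
        return lhs;
    }

    long parse_unary() {
        skip_spaces();
        if (pos_ < src_.size() && src_[pos_] == '-') { ++pos_; return -parse_unary(); }
        if (pos_ < src_.size() && src_[pos_] == '!') { ++pos_; return !parse_unary(); }
        if (pos_ < src_.size() && src_[pos_] == '(') {
            ++pos_;
            long v = parse(1);
            skip_spaces();
            if (pos_ >= src_.size() || src_[pos_] != ')') throw std::runtime_error("expected )");
            ++pos_;
            return v;
        }
        std::size_t start = pos_;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stol(src_.substr(start, pos_ - start));
    }

    static long apply(const std::string& op, long a, long b) {
        if (op == "||") return a || b;
        if (op == "&&") return a && b;
        if (op == "==") return a == b;
        if (op == "!=") return a != b;
        if (op == "<=") return a <= b;
        if (op == ">=") return a >= b;
        if (op == "<")  return a < b;
        if (op == ">")  return a > b;
        if (op == "+")  return a + b;
        if (op == "-")  return a - b;
        if (op == "*")  return a * b;
        if (b == 0) throw std::runtime_error("division by zero");
        return op == "/" ? a / b : a % b;
    }
};
```

For example, `ConstExprEvaluator("1 + 2 * 3").evaluate()` yields 7, while `10 - 4 - 3` yields 3 because subtraction groups from the left.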
All in all, writing the preprocessor took about a month. Not too difficult, but still a non-trivial task.
Meanwhile, my classmates were trying to write their first "Hello, world!", or at least get it to compile. Not all of them succeeded. And the next clauses of the C++ standard were waiting for me, this time with the implementation of the compiler proper.
Lexical analyzer
nrcpp/LexicalAnalyzer.cpp
Everything here was simple: I had already written the core of lexical analysis in the preprocessor. The lexical analyzer's job is to split the code into lexemes, or tokens, which the parser then consumes.
What was written at this stage?
- A state machine for recognizing integer, floating-point and character constants. Think it's easy? Only once you've done it.
- A state machine for recognizing string literals
- Recognizing identifiers and C++ keywords
- Surely something else too. I'll add it when I remember.
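As an illustration of the first item, here is a minimal state machine for C++98 integer literals: decimal, octal and hexadecimal, each optionally followed by one `u`/`U` and one `l`/`L` in either order. It is a sketch, not nrcpp's lexer; the names IntKind and classify_int_literal are hypothetical, and floating-point literals, character constants and `long long` (not part of C++98) are left out.

```cpp
#include <cctype>
#include <string>

enum class IntKind { Invalid, Decimal, Octal, Hex };

// Recognizes a C++98 integer literal with an explicit state machine.
// Note that a lone "0" is, by the standard's grammar, an octal literal.
IntKind classify_int_literal(const std::string& s) {
    enum State { Start, Zero, Dec, Oct, HexPrefix, Hex, Suffix } state = Start;
    IntKind kind = IntKind::Invalid;
    bool seen_u = false, seen_l = false;

    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c == 'u' || c == 'U' || c == 'l' || c == 'L') {
            // A suffix may follow any complete number, each letter at most once.
            if (state == Start || state == HexPrefix) return IntKind::Invalid;
            bool& seen = (c == 'u' || c == 'U') ? seen_u : seen_l;
            if (seen) return IntKind::Invalid;
            seen = true;
            state = Suffix;
            continue;
        }
        switch (state) {
        case Start:
            if (c == '0') { state = Zero; kind = IntKind::Octal; }
            else if (std::isdigit(c)) { state = Dec; kind = IntKind::Decimal; }
            else return IntKind::Invalid;
            break;
        case Zero:
            if (c == 'x' || c == 'X') { state = HexPrefix; kind = IntKind::Hex; }
            else if (c >= '0' && c <= '7') state = Oct;
            else return IntKind::Invalid;
            break;
        case Dec:
            if (!std::isdigit(c)) return IntKind::Invalid;
            break;
        case Oct:
            if (c < '0' || c > '7') return IntKind::Invalid;
            break;
        case HexPrefix:
        case Hex:
            if (!std::isxdigit(c)) return IntKind::Invalid;
            state = Hex;
            break;
        case Suffix:
            return IntKind::Invalid;  // only suffix letters may follow a suffix
        }
    }
    // "0x" with no digits, or an empty string, is not a literal.
    return (state == Start || state == HexPrefix) ? IntKind::Invalid : kind;
}
```

The point of writing it as explicit states is that errors such as `089` or `0x` fall out naturally: each state lists exactly which characters may come next.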
Syntax analyzer
nrcpp/parser.cpp
The parser's job is to check that the tokens produced by lexical analysis are arranged correctly.
The parser was based on the simple parser from Schildt's book, extended to C++ syntax and given a guard against stack overflow. For example, if we write:

```cpp
((((((((((((((((((((((((((((((0))))))))))))))))))))))))))))));
```

then a naive recursive parser would eat up the stack, so mine stops and reports that the expression is too complex.
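A recursive-descent parser recurses once per nesting level, so a simple depth counter is enough to turn a potential stack overflow into a diagnostic. The sketch below only illustrates that guard: the toy grammar (numbers, `+` and parentheses, no whitespace), the class name DepthLimitedParser and the limit values are all assumptions, not nrcpp's actual parser.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical toy grammar:
//   expr    := primary ('+' primary)*
//   primary := number | '(' expr ')'
// Every '(' adds one level of recursion; the depth guard reports an error
// instead of letting deeply nested input exhaust the call stack.
class DepthLimitedParser {
public:
    DepthLimitedParser(const std::string& src, int max_depth)
        : src_(src), max_depth_(max_depth) {}

    long parse() {
        long v = expr();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected character");
        return v;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;
    int depth_ = 0;
    int max_depth_;

    long expr() {
        long v = primary();
        while (pos_ < src_.size() && src_[pos_] == '+') {
            ++pos_;
            v += primary();
        }
        return v;
    }

    long primary() {
        if (pos_ < src_.size() && src_[pos_] == '(') {
            if (++depth_ > max_depth_)
                throw std::runtime_error("expression is too complex");
            ++pos_;
            long v = expr();
            if (pos_ >= src_.size() || src_[pos_] != ')')
                throw std::runtime_error("expected ')'");
            ++pos_;
            --depth_;
            return v;
        }
        std::size_t start = pos_;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stol(src_.substr(start, pos_ - start));
    }
};
```

The limit is checked before recursing, so the parser fails with a readable message while the stack is still far from full.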
An attentive reader may ask: why reinvent the wheel when yacc and lex already existed? Yes, they did. But at that stage I wanted my own wheel, with full control over the code. Of course, its performance was inferior to code generated by those tools. But technical excellence was not the goal. The goal was to understand everything.
Semantics
nrcpp/checker.cpp, nrcpp/Coordinator.cpp, nrcpp/overload.cpp
Semantics takes up clauses 3 through 14 of the ISO C++98 standard. This is the hardest part, and I am sure that more than 90% of C++ developers do not know all the rules described in these clauses. For example:
Did you know that a function can be declared twice, like this?

```cpp
void f(int x, int y = 7);
void f(int x = 5, int y);
```

There are pointer declarations like this one (because p itself is const, it has to be initialized):

```cpp
const volatile int *const volatile *const p = 0;
```

And this is a pointer to a member function of class X:

```cpp
void (X::*mf)(int &);
```
That's just what came to mind first. Needless to say, when I tested code from the standard in Visual C++ 6, I quite often got an Internal Compiler Error.
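To show that these rules really are standard C++, here is a small translation unit that uses all three kinds of declarations; a conforming modern compiler should accept it. Here f returns int instead of void so the result can be checked, and the member bump and the helpers read_through_cv_chain and call_through_member_pointer are made up for illustration.

```cpp
// 1. Default arguments may be accumulated over several declarations, as long
//    as each new default is for a parameter to the left of existing ones.
int f(int x, int y = 7);
int f(int x = 5, int y);
int f(int x, int y) { return x * 10 + y; }  // the definition repeats no defaults

// 2. A const pointer to a const volatile pointer to a const volatile int.
int read_through_cv_chain() {
    static const volatile int target = 42;
    const volatile int *const volatile inner = &target;
    const volatile int *const volatile *const p = &inner;
    return **p;  // reads the volatile int through two levels of indirection
}

// 3. A pointer to a member function of class X that takes int&.
struct X {
    int step;
    void bump(int &v) { v += step; }
};

int call_through_member_pointer() {
    X obj;
    obj.step = 3;
    void (X::*mf)(int &) = &X::bump;
    int value = 4;
    (obj.*mf)(value);  // same as obj.bump(value)
    return value;
}
```

Calling `f()` uses both defaults and gives 57, `f(1)` gives 17, and the call through `mf` adds 3 to 4.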
Developing the semantic analyzer took me a year and a half, that is, three university semesters. During that time I was almost expelled: in subjects other than programming I was happy to scrape a C (well, okay, a B), while the compiler kept growing and gaining functionality.
Code generator
nrcpp/Translator.cpp
At this stage, when the enthusiasm starts to fade a little, we already have a fully working compiler front end. What to do with it next is up to the developer. You can distribute it as is, use it to build a code analyzer, or use it to create a translator such as C++ -> C# or C++ -> C. At this point we have a syntactically and semantically validated AST (abstract syntax tree).
At this stage the compiler developer realizes that he has comprehended Zen, reached enlightenment, and can tell at a glance why code behaves the way it does. To reach my goal of creating a C++ compiler, I decided to stop at generating C code, which could then be translated into any existing assembly language or fed to existing C compilers (as Stroustrup did in the first versions of C with Classes).
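The core idea of lowering C++ to C can be shown on the simplest case, a class with data members and member functions. This hypothetical sketch is not how nrcpp's Translator.cpp works: it takes a hand-built class description (the types Field, Method and ClassDecl are made up, not nrcpp's AST) and emits a C struct plus one free-function prototype per member function, passing the object explicitly as a this_ pointer and naming functions with a naive Class_method scheme instead of real name mangling.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified class description (not nrcpp's real AST).
struct Field  { std::string type, name; };
struct Method { std::string ret, name; std::vector<Field> params; };
struct ClassDecl {
    std::string name;
    std::vector<Field> fields;
    std::vector<Method> methods;
};

// Emits C declarations for a C++ class: data members become a C struct, and
// each member function becomes a free function taking the object explicitly.
std::string emit_c(const ClassDecl& c) {
    std::string out = "struct " + c.name + " {\n";
    for (const Field& f : c.fields)
        out += "    " + f.type + " " + f.name + ";\n";
    out += "};\n";
    for (const Method& m : c.methods) {
        out += m.ret + " " + c.name + "_" + m.name + "(struct " + c.name + " *this_";
        for (const Field& p : m.params)
            out += ", " + p.type + " " + p.name;
        out += ");\n";
    }
    return out;
}
```

For a class Point with members x and y and a method move(int dx, int dy), this produces a `struct Point` and `void Point_move(struct Point *this_, int dx, int dy);`; a call `pt.move(1, 2)` would correspondingly be lowered to `Point_move(&pt, 1, 2)`.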
What's missing from nrcpp?
- Templates. From an implementation standpoint, C++ templates are such an intricate system that I had to admit: without reaching into the parser and mixing it with semantic analysis, templates will not work properly.
- namespace std. You can't write the standard library without templates. Besides, it would take many, many months, since the library makes up the lion's share of the standard.
- Internal compiler errors. If you play with the code, you may see messages like:
internal compiler error: in.txt (20, 14): “theApp.IsDiagnostic ()” -> (Translator.h, 484)
This means either unimplemented functionality or a semantic rule that was not taken into account.
Why reinvent the wheel?
In conclusion, I want to mention the reason this post was written. Building my own wheel, even though it took more than two years, still feeds me today. It is invaluable knowledge, a foundation that stays with you throughout your development career. Technologies, frameworks and new languages will change, but they are built on the foundations of the past, so understanding and mastering them takes very little time.
github.com/nrcpp/nrcpp: the source code of the compiler. You can edit the in.txt file and see the output in out.txt.
github.com/nrcpp/nrcpp/tree/master/KPP_1.1: the source code of the preprocessor. It builds with Visual C++ 6.