
Interview with the founder of the project www.checkmycode.org

Today's interview is with Andrzej Wasylkowski. Since 2005, he has been working on his Ph.D. in the Software Engineering Department at the University of Saarbrücken, Germany. His research area is software development, with a focus on code analysis methods and their applicability to automated bug-finding systems.

One of the projects he is involved in is checkmycode. It is a service that lets you compare your code against the “wisdom of the collective mind”: more than 200 million lines of C code from the Gentoo distribution.

Read on and you’ll learn everything about this project and how it uses Gentoo!

Hi Andrzej, and thanks for the interview!

Hello! I would like to thank you for the interview in turn; it is a real pleasure to be your virtual guest.

1. When I go to www.checkmycode.org, I see a form where I can enter my code. In return, it gives me a list of “anomalies” in the code, with an explanation of why each piece of code is “anomalous” and may therefore contain errors. Tell us what is happening “behind the scenes” at that point?

In short, we combed through the entire Gentoo Linux distribution looking for typical ways of using component interfaces, that is, the ways Linux components are usually used. If you use an interface in a way that departs from common practice, it is flagged as an anomaly.

Looking at the whole picture, the process can be divided into three stages. First, the code you submit is parsed, and so-called “sequential constraints” are generated from it. These are two-element sequences of function calls annotated with data flow information, for example, “the return value of socket() -> first argument to listen()”. They are an abstraction of how your code uses functions to perform operations on “objects”.
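To make the first stage more concrete, here is a minimal Python sketch of extracting two-element call sequences annotated with data flow. It is an illustration only, not the project's parser: the `Call` record, the `call_pairs` function, and the toy input are all hypothetical, and the sketch assumes the program has already been reduced to a flat list of calls in which variables are never reassigned.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Call:
    """One call in a toy, already-parsed program (hypothetical format)."""
    func: str                     # name of the called function
    args: Tuple[str, ...]         # names of the variables passed as arguments
    result: Optional[str] = None  # variable that receives the return value, if any


def call_pairs(calls: List[Call]) -> Set[str]:
    """Collect two-call sequences in which a value flows from the first call to the second."""
    pairs = set()
    for i, first in enumerate(calls):
        for second in calls[i + 1:]:
            for pos, arg in enumerate(second.args, start=1):
                # The return value of the first call is passed to the second call.
                if first.result is not None and arg == first.result:
                    pairs.add(
                        f"return value of {first.func}() -> "
                        f"argument {pos} of {second.func}()"
                    )
                # The same variable is passed to both calls.
                for first_pos, first_arg in enumerate(first.args, start=1):
                    if first_arg == arg:
                        pairs.add(
                            f"argument {first_pos} of {first.func}() -> "
                            f"argument {pos} of {second.func}()"
                        )
    return pairs


# Toy input standing in for: fd = socket(...); bind(fd, ...); listen(fd, ...);
calls = [
    Call("socket", ("domain", "type", "protocol"), result="fd"),
    Call("bind", ("fd", "addr", "addrlen")),
    Call("listen", ("fd", "backlog")),
]

for pair in sorted(call_pairs(calls)):
    print(pair)
```

For this input the sketch finds three pairs: the descriptor returned by socket() reaching bind() and listen(), and the same descriptor being passed first to bind() and then to listen().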

Then we look for all the projects in the Gentoo Linux distribution that may be relevant to your code, to see how they use the same functions as you do. Without going into details: if you call socket(), the code of every project that also calls socket() is taken into account when looking for the sequential constraints that involve the socket() call.

Finally, we check whether your code violates any of the patterns found in the second stage (the tutorial contains examples of such violations). Any violations found are reported to you through the website interface.

In reality, much more is going on, but that is the overall picture.
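The second and third stages can be sketched in the same spirit. The Python code below is a deliberate simplification under stated assumptions, not the service's algorithm: each project is reduced to a set of hypothetical `(source_function, target_function, argument_position)` tuples, a “pattern” is any tuple present in at least `min_support` of the relevant projects, and a violation is a pattern missing from the submitted code. The names `find_anomalies`, `min_support`, and the toy corpus are made up for the example.

```python
from collections import Counter


def functions_in(constraints):
    """Return every function name mentioned in a set of constraint tuples."""
    return {name for src, dst, _pos in constraints for name in (src, dst)}


def find_anomalies(user, corpus, min_support=0.9):
    """Report common corpus constraints that the user's code lacks.

    user:   set of (src, dst, pos) tuples extracted from the submitted code
    corpus: dict mapping a project name to its set of (src, dst, pos) tuples
    """
    anomalies = {}
    for func in functions_in(user):
        # Stage 2: only projects that call the same function are relevant.
        relevant = [c for c in corpus.values() if func in functions_in(c)]
        if not relevant:
            continue
        # Count in how many relevant projects each constraint on func occurs.
        counts = Counter(
            con for project in relevant for con in project if func in con[:2]
        )
        # Stage 3: a frequent constraint absent from the user's code is flagged.
        for con, n in counts.items():
            support = n / len(relevant)
            if support >= min_support and con not in user:
                anomalies[con] = max(support, anomalies.get(con, 0.0))
    return anomalies


# Toy corpus: every project that calls socket() also passes the result to close().
corpus = {
    "project_a": {("socket", "bind", 1), ("socket", "listen", 1), ("socket", "close", 1)},
    "project_b": {("socket", "bind", 1), ("socket", "listen", 1), ("socket", "close", 1)},
    "project_c": {("socket", "connect", 1), ("socket", "close", 1)},
    "project_d": {("fopen", "fclose", 1)},
}
user_code = {("socket", "bind", 1), ("socket", "listen", 1)}

print(find_anomalies(user_code, corpus))
```

In this toy run, the only anomaly reported is the missing flow from socket() to close(), which all three socket-using projects contain. Raising `min_support` reports fewer, more certain violations; lowering it reports more, at the cost of more false alarms.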

2. When did this project start, and why?

Today we have several sophisticated methods for checking code for errors. What we lack is specifications to check the code against, so we wanted to extract these specifications from real-life code. The project grew gradually, and it is rather difficult to name an exact start date. We started with a lightweight parser written by my student Natalie Gruska as part of her bachelor's degree. The parser was completed in July 2009, but the original plan had nothing to do with analyzing large amounts of code: we just wanted a language-independent frontend for one of my tools, JADET. The parser turned out to be very fast, and soon Professor Andreas Zeller came up with the idea of using it to analyze large amounts of code. In the remaining few months before the web service was launched, it was mostly me working to make this actually work and scale to the size of an entire Linux distribution.

3. Who else is involved in the project?

As I said, the parser used by the site was written by Natalie Gruska, who is now a student at the Royal University of Canada. The original idea belongs to my boss, Professor Andreas Zeller. The web interface and the programming of the site are the work of my colleague Kevin Streit, who, like me, is working on a doctoral thesis at the University of Saarbrücken.

4. What is Gentoo's role in the project?

All the source code used to search for patterns is taken from the Gentoo distribution (in other words, the snippet you paste into the form is compared with the source code of the Gentoo distribution).

5. Why was Gentoo chosen?

It gives us access to all the projects included in the distribution. This, in turn, lets us use our parser to determine how these projects use particular functions.

6. What were the hardest parts of using Gentoo?

There were none :) Working with Gentoo was a pure pleasure, really, and so far the easiest part of the whole project.

7. What would you change in Gentoo to make it easier to use within your project?

There is one thing I would have liked to have access to but could not get (or simply did not find): a web interface to the source trees of all the projects. When a pattern violation is found, the user is also shown three examples of places where “correct” source code can be found. We now provide a web interface for this ourselves, but before checkmycode existed I had to unpack the source archives by hand and search them for the code, which was tedious.

8. What do you like about Gentoo?

I like the fact that Portage is fairly easy to use, and that Gentoo follows a “rolling release” strategy. Anyone who has used a non-rolling-release Linux distribution and run into unresolvable version conflicts while trying to install the newest version of a package will understand what I'm talking about.

In addition, for obvious reasons, I like the fact that I have access to the source code :) In fact, the machine on which our site is located is running Gentoo.

9. Thanks again for taking the time to talk about your project. Would you like to add something else?

Thank you for asking me all these questions, it was very nice! I would just like to point out that www.checkmycode.org is a small interface to a tool that can analyze quite large programs as a whole and find violations in them. For that reason, we put a lot of effort into making it filter out what it considers to be “false alarms”, and this greatly reduces the number of violations reported. Unfortunately, a side effect is that some real errors caused by incorrect use of functions will go undetected. So, to paraphrase Edsger W. Dijkstra, the tool can only show the presence, not the absence, of potentially problematic places in your code.

Source: https://habr.com/ru/post/89761/

