
Before the advent of gccxml, there was only one way to extract meta-information from C/C++ code: you first had to write a parser that can handle the C++ grammar. This is not a task you usually solve at home over a weekend.
Now, writing a parser is no longer necessary. The modified gcc compiler analyzes your code and produces a description of all namespaces, types, classes, and functions encountered in the program. The data is emitted in XML format and, in principle, is ready for further automatic analysis and processing.
To parse the XML data produced by gccxml, the pygccxml library is useful. It is more than a reader for the gccxml format: the library provides interfaces for exploring the collected metadata. In particular, it has ready-made functions that answer questions like “are the types T1 and T2 compatible?” or “does class C1 inherit from C2?”. The library is written in Python.
Introduction to gccxml
Gccxml was developed at Kitware (the company that also created CMake). It is a modified version of the C++ parser from GCC.
You probably haven't installed gccxml yet. Personally, I installed gccxml with the package manager, and I see no need to dwell on this step. If your OS has no package manager, I'm afraid I can't help you.
Let's start with a simple definition of a function.
namespace test {
    int fn(int a, int b);
}
Compile:
gccxml -fxml=test.xml test.cpp
The output is test.xml with the following content (fragment):
<GCC_XML>
  <Namespace id="_1" name="::" members="… _96 …" mangled="_Z2::" demangled="::"/>
  <Namespace id="_96" name="test" context="_1" members="_141 " mangled="_Z4test" demangled="test"/>
  <FundamentalType id="_128" name="int" size="32" align="32"/>
  <Function id="_141" name="fn" returns="_128" context="_96" mangled="_ZN4test2fnEii" demangled="test::fn(int, int)" location="f1:2" file="f1" line="2" extern="1" >
    <Argument name="a" type="_128" location="f1:2" file="f1" line="2"/>
    <Argument name="b" type="_128" location="f1:2" file="f1" line="2"/>
  </Function>
  <File id="f1" name="test.cpp"/>
</GCC_XML>
Everything here is clear even without documentation. I will not give examples for other C++ constructs; they look similar. The primary goal is achieved: the meta-information has been extracted in a format suitable for further automatic processing.
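To give a feel for what "further automatic processing" can look like without any extra library, here is a minimal sketch that reads such a fragment with Python's standard xml.etree module. The embedded XML is a trimmed copy of the output above (mangled names and locations omitted), and the helper name list_functions is my own, not part of gccxml. The only thing the script relies on is that elements refer to each other by id.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the gccxml fragment shown above (illustration only).
GCCXML_OUTPUT = """
<GCC_XML>
  <Namespace id="_1" name="::" members="_96"/>
  <Namespace id="_96" name="test" context="_1" members="_141"/>
  <FundamentalType id="_128" name="int" size="32" align="32"/>
  <Function id="_141" name="fn" returns="_128" context="_96">
    <Argument name="a" type="_128"/>
    <Argument name="b" type="_128"/>
  </Function>
  <File id="f1" name="test.cpp"/>
</GCC_XML>
"""


def list_functions(xml_text):
    """Return a readable signature for every <Function> element."""
    root = ET.fromstring(xml_text)
    # Every declaration carries a unique id; other elements refer to it.
    by_id = {node.get("id"): node for node in root if node.get("id")}

    signatures = []
    for function in root.iter("Function"):
        return_type = by_id[function.get("returns")].get("name")
        namespace = by_id[function.get("context")].get("name")
        arguments = ", ".join(
            "{} {}".format(by_id[arg.get("type")].get("name"), arg.get("name"))
            for arg in function.findall("Argument")
        )
        signatures.append(
            "{} {}::{}({})".format(
                return_type, namespace, function.get("name"), arguments
            )
        )
    return signatures


for signature in list_functions(GCCXML_OUTPUT):
    print(signature)
```

This sketch only handles fundamental types and a single level of namespaces; real output also contains pointers, typedefs, and nested contexts, and each of those needs its own resolution step.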
Extracting even more meta-information
Sometimes the source contains more information than the semantics of the C++ language can express. Example:
SAL annotations in Windows (__in, __out, and so on).
BOOL WINAPI CreateProcess(
    __in_opt    LPCTSTR lpApplicationName,
    __inout_opt LPTSTR lpCommandLine,
    __in_opt    LPSECURITY_ATTRIBUTES lpProcessAttributes,
    __in_opt    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    __in        BOOL bInheritHandles,
    __in        DWORD dwCreationFlags,
    __in_opt    LPVOID lpEnvironment,
    __in_opt    LPCTSTR lpCurrentDirectory,
    __in        LPSTARTUPINFO lpStartupInfo,
    __out       LPPROCESS_INFORMATION lpProcessInformation
);
Example: information about the minimum version of Mac OS X in which the API function is available.
CFErrorRef CFErrorCreate(
    CFAllocatorRef allocator,
    CFStringRef domain,
    CFIndex code,
    CFDictionaryRef userInfo
) AVAILABLE_MAC_OS_X_VERSION_10_5_AND_LATER;
Even this additional meta-information can be extracted with gccxml. For this we can use a gcc-specific extension of the C++ syntax, the __attribute__ construct. Let's experiment with the declaration of the function fn:
#define __foo __attribute__((gccxml("__foo")))
#define __bar __attribute__((gccxml("__bar")))

namespace test {
    __foo int fn(__bar int a, int b);
}
Attributes “apply” to the nearest semantic unit in the source text. So the first attribute refers to the function fn, and the second to the parameter a. Gcc understands various attributes, but in this case we are only interested in the gccxml attribute.
Gccxml provides the following information on the fn function. As we can see, all the annotations are saved and available for further processing.
<Function id="_141" name="fn" returns="_128" context="_96" mangled="_ZN4test2fnEii" demangled="test::fn(int, int)" location="f1:7" file="f1" line="7" extern="1" attributes="gccxml(__foo)">
  <Argument name="a" type="_128" location="f1:7" file="f1" line="7" attributes="gccxml(__bar)"/>
  <Argument name="b" type="_128" location="f1:7" file="f1" line="7"/>
</Function>
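The annotations end up in the attributes XML attribute as gccxml(...) entries, so pulling them back out is a matter of a small regular expression. Below is a rough sketch in plain Python; the XML is a shortened copy of the output above, and the names annotations and collect_annotations are mine. I am assuming that several annotations on one declaration would appear as several gccxml(...) entries in the same string, which is why findall is used.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the annotated <Function> element shown above.
ANNOTATED_FUNCTION = """
<Function id="_141" name="fn" returns="_128" context="_96" attributes="gccxml(__foo)">
  <Argument name="a" type="_128" attributes="gccxml(__bar)"/>
  <Argument name="b" type="_128"/>
</Function>
"""

# Matches gccxml(NAME) and captures NAME.
GCCXML_ATTRIBUTE = re.compile(r"gccxml\(([^)]*)\)")


def annotations(node):
    """Return the gccxml(...) annotation names attached to one element."""
    return GCCXML_ATTRIBUTE.findall(node.get("attributes", ""))


def collect_annotations(xml_text):
    """Return (function annotations, {argument name: annotations})."""
    function = ET.fromstring(xml_text)
    per_argument = {
        argument.get("name"): annotations(argument)
        for argument in function.findall("Argument")
    }
    return annotations(function), per_argument


function_annotations, argument_annotations = collect_annotations(ANNOTATED_FUNCTION)
print("fn:", function_annotations)
for name, found in argument_annotations.items():
    print(name + ":", found)
```

An argument without annotations simply gets an empty list, so a generator can treat annotated and plain parameters the same way.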
Introduction to pygccxml
Pygccxml is developed by Roman Yakovenko and contributors. The goal of the project is the automatic generation of C++/Python bindings using Boost.Python. I wonder why SWIG did not suit them.
Pygccxml can be installed either through the package manager or manually (download here; installation instructions are in README.txt).
The pygccxml documentation is poor. It is enough to get started, but if you need something beyond the basic capabilities, you will have to look into the library's source code. Strangely, the documentation is not available for online viewing; it can only be downloaded.
Below is an example of a simple C ++ code analyzer using the pygccxml library.
The script prints all the functions declared in the test namespace.
import pygccxml

# Run gccxml on test.cpp and load the resulting declarations.
db = pygccxml.parser.parse(['test.cpp'])
global_ns = pygccxml.declarations.get_global_namespace(db)

# Print every function declared in the test namespace.
for test_ns in global_ns.namespaces('test'):
    for function in test_ns.calldefs():
        pygccxml.declarations.print_declarations(function)
Here is the result of the script:
free_function_t: 'fn'
    location: [./test.cpp]:4
    artificial: 'False'
    attributes: gccxml(__foo)
    demangled: test::fn(int, int)
    mangled: _ZN4test2fnEii
    return type: int
    arguments type: int a, int b
Ideas for your code
“Why would I need to analyze C++ code programmatically if I am not interested in languages and compilers?” the pragmatic reader will ask. Here are a few very real tasks that required automatic code analysis.
Here is what Alexey Pakhunov (aka notakernelguy) writes:
I was recently wondering why there are no libraries emulating UTF-8 support at the Win32 API level. That is, such a library would implement, say, CreateFileUtf8 in addition to the system-provided CreateFileA and CreateFileW, and the CreateFile macro would select the desired implementation from the three options.
In 2007, Alexey decided to create such a library to add transparent Unicode support to the Notepad2 program. The plan was to process the Windows header files automatically and generate the required library programmatically. Alexey does not use gccxml, and in 2012 his library is still not ready.
The following two examples are from my own practice.
Using gccxml, I made a C++ wrapper for CoreFoundation, the basic object-oriented C API on Mac OS X. The goal of this project is to manage the lifetime of CF objects automatically. Yes, I know about ARC.
And here is the second example. I have a data processing system written in C++. The system was originally single-threaded; to increase its speed, the plan is to distribute some of the interacting objects across different threads. To do this, I intend to create a set of proxy classes that turn a method call into a message sent to another thread, where, once the message is received, the method of the object hidden behind the proxy is called. No changes to the existing code are required, since each object is still accessed from a single thread. This requires writing a lot of repetitive code, and that task is best left to an automatic generator.
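To make the idea of such a generator more concrete, here is a deliberately simplified sketch, not the actual generator from my project. It takes method descriptions (hard-coded here; in practice they would come from the gccxml output) and prints C++ proxy methods. Everything in the generated code is hypothetical: the Proxy suffix, the queue_ member with its post method, and the target_ pointer are placeholder names. The sketch also assumes that all methods return void, so a call can be sent and forgotten.

```python
# Hypothetical method descriptions: name plus (type, name) pairs for arguments.
METHODS = [
    {"name": "process", "args": [("int", "chunk"), ("double", "weight")]},
    {"name": "flush", "args": []},
]

# C++ template for one proxy method; doubled braces are literal braces.
PROXY_METHOD_TEMPLATE = (
    "void {cls}Proxy::{name}({params}) {{\n"
    "    queue_.post([{captures}] {{ target_->{name}({names}); }});\n"
    "}}\n"
)


def render_proxy_method(class_name, method):
    """Render C++ source for one proxy method that forwards the call as a message."""
    arg_names = [name for _, name in method["args"]]
    return PROXY_METHOD_TEMPLATE.format(
        cls=class_name,
        name=method["name"],
        params=", ".join("{} {}".format(t, n) for t, n in method["args"]),
        # The lambda copies the arguments so they outlive the caller's stack frame.
        captures=", ".join(["this"] + arg_names),
        names=", ".join(arg_names),
    )


for method in METHODS:
    print(render_proxy_method("Worker", method))
```

A real generator would also have to deal with return values, reference parameters, and const methods, which is exactly the kind of detail the type information from gccxml helps to sort out.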
Limitations of gccxml
Unfortunately, gccxml has some drawbacks. Only declarations are extracted from the code; function bodies are not available. Template declarations are not available either. Gccxml is based on a fairly old version of gcc, and its development is not very active.