A good opportunity for web developers to learn the C programming language is Gumbo, an HTML5 parser implemented as a small C99 library with no external dependencies. The parser is intended as a building block for other tools and libraries, such as validators, template languages, and refactoring and code analysis tools.
Features:
- Full conformance with the HTML5 specification.
- Robustness against malformed input.
- A simple API that is easy to wrap from other programming languages.
- Support for source positions and pointers back to the original text while navigating the parse tree.
- Passes all html5lib-0.95 tests.
- Tested on more than 2.5 billion pages from the Google index.
The developers did not set out to optimize the parser for performance; it was not written in C to make the code run ten times faster.
In the future, the developers plan to add support for the latest HTML5 features, parsing of HTML fragments, full error reporting, and so on.
To use the Gumbo parser, include the gumbo.h file and then call gumbo_parse.
#include "gumbo.h"

int main(int argc, char** argv) {
  GumboOutput* output = gumbo_parse(argv[1]);
  // Do stuff with output->root
  gumbo_destroy_output(&kGumboDefaultOptions, output);
}
Useful examples are available in the project repository.
The library is published under the Apache 2.0 license.