What for?
Everyone keeps saying: “Want to learn how to write professional programs? Look at how others do it!” So I decided to follow this advice, especially since my university studies are coming to an end. It is particularly interesting to compare how we were taught to do things with how they are done in the real world. I chose the GNU Coreutils package as the example to follow. It has everything:
- Strict requirements for portability.
- A long life cycle.
- A large development team.
- Code of varying complexity: from the trivial echo to the far more sophisticated sort, from the purely applied wc to mkdir, which sits closer to the OS.
GNU Coreutils
GNU Core Utilities is a set of utilities for basic user operations: creating a directory, displaying a file on the screen, and so on. According to the developers, these utilities should be available on every operating system, and that is largely what we see today: Cygwin brings them to Windows, and on *nix systems they go without saying. Uniform behavior across systems is supported by the POSIX standard, which Coreutils tries to comply with. Coreutils contains such commonly used utilities as cat, tail, echo, wc, and many others.
To begin, let's pick the most trivial program, yes. Its simplicity lets us concentrate on the tools and libraries used in Coreutils.
The yes utility
As the man page states, all the yes utility can do is endlessly write "y\n" to stdout. If we pass yes some arguments, it prints those arguments, separated by spaces, instead of "y". Surely everyone who has started learning C has written a similar program, so many people can compare their own approach with how the stern, bearded folks at GNU do it. Wikipedia says a little about the practical uses of yes.
Source
Let's move on to the source code. You can get it either with apt-get source, which fetches the version used on your system by default, or by cloning the latest version from the repositories. We will take the second option: it is more convenient and familiar.
- Coreutils:
git clone git://git.sv.gnu.org/coreutils
- Gnulib (we will look into it a couple of times):
git clone git://git.savannah.gnu.org/gnulib.git
The source code of yes fits in a single file, coreutils/src/yes.c, so let's open it.
Coding style
The first thing you notice is the unusual formatting of the code. You can read about it in the relevant chapter of the GNU Coding Standards. For example, in a function definition the return type goes on a separate line, and so does the opening brace:
int
main (int argc, char **argv)
{
  foo ();
  ...
}
Only spaces are used for indentation and alignment, and each nesting level is indented by 2 spaces. Braces around compound statements take a particularly peculiar form:
if (x < foo (y, z))
  haha = bar[4] + 5;
else
  {
    while (z)
      {
        haha += foo (z, z);
        z--;
      }
    return ++x + bar ();
  }
12 lines
yes.c begins with a comment that every GPL program carries. It had long been an eyesore for me in other programs, and why it had to be there was a mystery. It turns out that the text of this comment is fixed in the instructions for using the GPL. They say that anyone who wants to release software under the GPL should add these 12 lines of copyright notice to the beginning of each source file.
initialize_main
The first thing the program does is call initialize_main. This hook lets a program perform its own specific processing of the arguments. In practice, not a single utility in Coreutils uses it for anything useful; they all rely on the stub defined in coreutils/src/system.h:
#ifndef initialize_main
# define initialize_main(ac, av)
#endif
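To see why such a stub costs nothing, here is a minimal sketch. The macro has the same shape as the one in system.h; the count_args function is hypothetical and only stands in for main. Because the macro body is empty, the call expands to a lone semicolon.

```c
/* Same shape as the stub in system.h: a platform header may define
   initialize_main first; otherwise the call expands to nothing.  */
#ifndef initialize_main
# define initialize_main(ac, av)
#endif

/* Hypothetical stand-in for main: returns the number of arguments
   that follow the program name.  */
int
count_args (int argc, char **argv)
{
  initialize_main (&argc, &argv);   /* expands to an empty statement */
  (void) argv;                      /* unused when the hook is a no-op */
  return argc - 1;
}
```

A platform that needs to massage its arguments can define initialize_main before this point, and the same call site then does real work.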
The name of the program
Coreutils utilities distinguish two program names:
- The official name, which the user cannot change.
- The real name of the executable file.
The official name is used when displaying application version information:
user@laptop:~$ yes --version
yes (GNU coreutils) 8.5
Usage: yes [STRING]...
  or:  yes OPTION
Moreover, this name does not depend on the name of the executable file:
user@laptop:~$ /usr/bin/yes --version
yes (GNU coreutils) 8.5
user@laptop:~$ cp /usr/bin/yes ./foo
user@laptop:~$ ./foo --version
yes (GNU coreutils) 8.5
This behavior comes from the PROGRAM_NAME macro, defined at the beginning of the file:
#define PROGRAM_NAME "yes"
The real name is taken, without any tricks, from argv[0] and is used in error messages and usage hints:
user@laptop:~$ yes --help
Usage: yes [STRING]...
  or:  yes OPTION
user@laptop:~$ /usr/bin/yes --help
Usage: /usr/bin/yes [STRING]...
  or:  /usr/bin/yes OPTION
The value of argv[0] is stored in the global variable program_name by calling set_program_name in the second line of main:
set_program_name (argv[0]);
The set_program_name function is provided by the Gnulib library. The corresponding code is located in the gnulib/lib/ directory, in progname.h and progname.c. It is interesting to note that set_program_name not only saves the value of argv[0] in the global variable program_name declared in progname.h, but also performs additional conversions related to the subtleties of using GNU Libtool, a tool for building shared libraries.
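The split between the two names can be sketched as follows. This is a simplified illustration, not Gnulib's code: remember_program_name, format_version and format_usage are hypothetical helpers, and the Libtool-related adjustments are deliberately left out.

```c
#include <stdio.h>

#define PROGRAM_NAME "yes"          /* official name, fixed at compile time */

static const char *program_name;    /* real name, taken from argv[0] */

/* Simplified stand-in for set_program_name: it only stores the
   pointer and skips the Libtool-related adjustments.  */
void
remember_program_name (const char *argv0)
{
  program_name = argv0;
}

/* Format the version banner into BUF; only the official name is used.  */
int
format_version (char *buf, size_t size)
{
  return snprintf (buf, size, "%s (GNU coreutils) 8.5", PROGRAM_NAME);
}

/* Format the first usage line into BUF; only the real name is used.  */
int
format_usage (char *buf, size_t size)
{
  return snprintf (buf, size, "Usage: %s [STRING]...", program_name);
}
```

Copying the binary to ./foo changes what format_usage produces but leaves format_version untouched, which matches the transcripts above.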
Internationalization
Coreutils is used all over the world, so every utility supports localization. This comes at minimal cost thanks to the GNU gettext package. Few will be surprised to see gettext here, since it has spread far beyond the GNU project. For example, internationalization in my favorite web framework, Django, is built on gettext. Using gettext with various languages and frameworks has already been covered on Habr.
A great feature of gettext is that it is used in roughly the same way in every language, and C is no exception. Here, too, we find the standard magic function _, which is used in the usage function:
void
usage (int status)
{
  if (status != EXIT_SUCCESS)
    fprintf (stderr, _("Try `%s --help' for more information.\n"),
             program_name);
  ...
}
The definition of _ is in the already familiar system.h file:
#define _(msgid) gettext (msgid)
Coreutils initializes the internationalization machinery with three function calls in main:
setlocale (LC_ALL, "");
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
- setlocale makes the locale from the environment the application's working locale.
- bindtextdomain tells gettext where to look for the translation files of a given message domain.
- textdomain sets the current message domain.
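Put together, the three calls and the _ macro can be tried in isolation. This is a minimal sketch that assumes a glibc-like system where gettext is part of libc. The values of PACKAGE and LOCALEDIR are my assumptions (in Coreutils they come from the build system), and init_i18n and translated_hint are hypothetical helpers.

```c
#include <libintl.h>
#include <locale.h>

/* Assumed values; in Coreutils the build system supplies them.  */
#define PACKAGE "coreutils"
#define LOCALEDIR "/usr/share/locale"

#define _(msgid) gettext (msgid)

/* The same three initialization steps that main performs.  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");               /* adopt the user's locale */
  bindtextdomain (PACKAGE, LOCALEDIR);  /* where the catalogs live */
  textdomain (PACKAGE);                 /* default domain for gettext */
}

/* Look up a message; falls back to the English msgid when no
   translation is found.  */
const char *
translated_hint (void)
{
  return _("standard output");
}
```

If no catalog is found for the current locale, gettext simply returns its argument, so the program keeps working in English.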
Error handling
Moving further through main, we come across the following line:
atexit (close_stdout);
Intuitively, close_stdout closes the standard output stream, which prevents data loss when stdout is redirected to a file and output is buffered. However, I could not find the source code of this function, so I do not know what actually happens there or whether any additional resource cleanup is performed.
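For what it is worth, a function with this name is defined in Gnulib's lib/closeout.c. The sketch below is not that code. It only illustrates the general idea under my own assumptions: an atexit handler closes the stream, and a failed flush becomes a non-zero exit status instead of silently lost output. The names close_checked and close_stdout_sketch are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Close STREAM and report whether any write to it failed.
   Returns 0 on success, EOF on failure.  */
int
close_checked (FILE *stream)
{
  int had_error = ferror (stream);           /* earlier write errors */
  int close_failed = fclose (stream) != 0;   /* flushes, then closes */
  return (had_error || close_failed) ? EOF : 0;
}

/* Hypothetical atexit handler in the spirit of close_stdout.  */
void
close_stdout_sketch (void)
{
  if (close_checked (stdout) != 0)
    {
      fputs ("write error\n", stderr);
      _exit (EXIT_FAILURE);   /* exit () must not be called from here */
    }
}
```

It would be registered with atexit (close_stdout_sketch); at the start of main. Without such a check, a program whose buffered output fails to reach a full disk could still exit with status 0.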
Command line arguments
This is the last topic that does not concern the work of the program itself. As with internationalization, a time-tested solution that has made its way into many projects (Python, for example) is used here: the getopt module. It is very simple: the developer essentially just has to call getopt or getopt_long in a loop. You can read more about getopt on the Internet, and it has been covered on Habr as well.
Gnulib has a special function, parse_long_options, for handling the --version and --help arguments, which every GNU application must support. It is located in the gnulib/lib/long-options.c file and relies on getopt_long.
The source code of yes is a great example of working with getopt: there is no need to wade through the parsing of dozens of arguments, yet all the getopt tools are in use. First, of course, parse_long_options is called. Then the code checks that no other options were passed and that the remaining arguments, if any, are just arbitrary strings:
parse_long_options (argc, argv, PROGRAM_NAME, PACKAGE_NAME, Version,
                    usage, AUTHORS, (char const *) NULL);
if (getopt_long (argc, argv, "+", NULL, NULL) != -1)
  usage (EXIT_FAILURE);
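The effect of the "+" option string can be checked separately. In GNU getopt a leading '+' stops the scan at the first non-option argument, and since no option letters follow it, anything that looks like an option is reported as unknown. This is a minimal sketch: has_leading_option is a hypothetical helper, and I pass an empty long-option table where yes.c passes NULL.

```c
#include <getopt.h>
#include <stddef.h>

/* Return 1 if the arguments begin with something that looks like an
   option, 0 otherwise.  The leading '+' makes getopt stop at the
   first non-option argument instead of permuting ARGV.  */
int
has_leading_option (int argc, char **argv)
{
  /* Empty table: no long options are accepted in this sketch.  */
  static const struct option no_long_options[] = { {NULL, 0, NULL, 0} };

  optind = 1;   /* restart the scan so the helper can be called again */
  opterr = 0;   /* keep the sketch quiet on unknown options */
  return getopt_long (argc, argv, "+", no_long_options, NULL) != -1;
}
```

With this check, yes -x is rejected, while yes hello -x and yes -- -x are accepted and print their operands verbatim.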
The following code reads in plain English as: “If no operands are left on the command line after option processing, we will output "y" to stdout”:
if (argc <= optind)
  {
    optind = argc;
    argv[argc++] = bad_cast ("y");
  }
Writing to argv[argc] is not an error: the ANSI C standard requires that the argv[argc] element exist and be a null pointer, so the array has room for one more pointer.
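The same trick can be tried on an ordinary array. This is a minimal sketch, assuming the caller passes a null-terminated argv as main receives it. The function apply_default_operand is hypothetical, and a writable static array is used where yes.c uses bad_cast.

```c
/* If no operands remain, reuse the terminating null slot argv[argc]
   for a default "y" operand.  Returns the new argc.  */
int
apply_default_operand (int argc, char **argv, int *first_operand)
{
  static char default_operand[] = "y";   /* writable, so no cast is needed */

  if (argc <= *first_operand)
    {
      *first_operand = argc;
      argv[argc++] = default_operand;   /* overwrites the null pointer */
    }
  return argc;
}
```

Note that after the substitution the array is no longer null-terminated, so everything that follows must rely on argc alone.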
Main loop
At last we reach what the program actually does. Here it is in full:
while (true)
  {
    int i;
    for (i = optind; i < argc; i++)
      if (fputs (argv[i], stdout) == EOF
          || putchar (i == argc - 1 ? '\n' : ' ') == EOF)
        error (EXIT_FAILURE, errno, _("standard output"));
  }
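To experiment with this loop without flooding the terminal, it can be wrapped in a bounded variant. This is a sketch under my own assumptions: write_lines is a hypothetical function, it writes to an arbitrary FILE instead of stdout, and it returns -1 on a write error instead of calling error.

```c
#include <stdio.h>

/* Write ARGV[first..argc-1], separated by spaces and ended by a
   newline, ROUNDS times.  Returns 0 on success, -1 on write error.  */
int
write_lines (FILE *out, int argc, char **argv, int first, int rounds)
{
  while (rounds-- > 0)
    {
      int i;
      for (i = first; i < argc; i++)
        if (fputs (argv[i], out) == EOF
            || putc (i == argc - 1 ? '\n' : ' ', out) == EOF)
          return -1;
    }
  return 0;
}
```

The short-circuit || matters here: the separator is written only if the operand before it was written successfully.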
Note that all the work is done inside the if condition rather than in its body. So Kernighan and Ritchie were not lying when they wrote that an experienced C programmer copies strings like this:
while (*dst++ = *src++)
  ;
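For completeness, here is the idiom wrapped in a function that can be compiled and called. The name copy_string is my own, and the explicit comparison with '\0' is added only to keep compilers from warning about an assignment used as a condition.

```c
/* Copy the string SRC, including its terminating '\0', into DST.
   DST must have room for it.  All the work happens in the loop
   condition; the loop body is empty.  */
char *
copy_string (char *dst, const char *src)
{
  char *start = dst;

  while ((*dst++ = *src++) != '\0')
    ;
  return start;
}
```

The loop ends right after the terminator has been copied, because the value of the assignment is the character just stored.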