When developing algorithms, I constantly catch myself worrying: what if a change that works on a small example wreaks havoc on the results for other, larger data sets? That is when the command line comes to my rescue. The trouble is that writing an argument parser from scratch every time gets tedious, so the program_options package from the Boost library is far from the least useful tool for a C++ programmer.
Let's start with an example. Suppose I am developing a trainable recognition algorithm and we have the following data: data files with the extension .dat (data), files with training information with the extension .trn (train), and parameter files with the extension .prs (parameters). Parameter files are the result of training and are used for recognition. So we have three actions: train, recognize, and score (evaluate the quality of recognition). In this case, a script that calls the chain of training, recognition, and evaluation looks, for example, like this:
recognizer --type=train --input=train.dat --info=train.trn --output=best.prs
recognizer --type=recognize --input=test1.dat --input=test2.dat --params=best.prs --output=./
recognizer --type=score --ethanol=test1_expected.trn --test=test1.trn --output=scores.txt
recognizer --type=score --ethanol=test2_expected.trn --test=test2.trn --output=scores.txt
In this example, a parameter file is created from a data file and a training file. The parameter file is then used to recognize other data files, and each recognition result is compared with the reference and appended to the end of the results file. Programming all of this command-line parsing logic with program_options takes almost nothing:
#include <boost/program_options.hpp>
#include <string>
#include <vector>

namespace po = boost::program_options;

// Options common to every task.
po::options_description desc("General options");
std::string task_type;
desc.add_options()
    ("help,h", "Show help")
    ("type,t", po::value<std::string>(&task_type), "Select task: train, recognize, score")
    ;

// Options for --type=train.
po::options_description train_desc("Train options");
train_desc.add_options()
    ("input,I", po::value<std::string>(), "Input .dat file")
    ("info,i", po::value<std::string>(), "Input .trn file")
    ("output,O", po::value<std::string>(), "Output parameters file .prs")
    ;

// Options for --type=recognize; input may be given several times.
po::options_description recognize_desc("Recognize options");
recognize_desc.add_options()
    ("input,I", po::value<std::vector<std::string> >(), "Input .dat file")
    ("params,p", po::value<std::string>(), "Input .prs file")
    ("output,O", po::value<std::string>(), "Output directory")
    ;

// Options for --type=score.
// Note: -t is also the short name of --type above, so use the long names for these two.
po::options_description score_desc("Score options");
score_desc.add_options()
    ("ethanol,e", po::value<std::string>(), "Etalon .trn file")
    ("test,t", po::value<std::string>(), "Testing .trn file")
    ("output,O", po::value<std::string>(), "Output comparison file")
    ;
The description of the valid command-line arguments includes their types, a brief textual description of each, and some grouping. Type checking of the argument values reduces worries about incorrect data. The brief descriptions keep the information organized and make comments practically unnecessary, and grouping lets you separate the required arguments from the optional ones. Let's take a closer look at a specific line:
("input,I", po::value<std::string>(), "Input .dat file")
The first argument, "input,I", actually holds two variants of the argument name: input is the long name and I is the short one (case matters). A peculiarity of boost::program_options is that the short name must always be a single letter (it can, however, be omitted). On the command line, the long name is used like this:
--input=train.dat
The short form is less readable at first glance, but I prefer it:
-Itrain.dat
The second parameter, po::value<std::string>(), defines the type of the argument value (the part after the equals sign) and can be omitted if the option takes no value. That is how help is declared, so the following calls are equivalent:
recognizer --help
recognizer -h
If you look more closely, you can see that in the group recognize, the input argument is of type:
po::value<std::vector<std::string> >()
std::vector<std::string> means that input may appear on the command line more than once; in our case, this makes it possible to recognize several files in one run. For example:
recognizer --type=recognize -Itest1.dat -Itest2.dat -pbest.prs -O./
The third and last parameter is the description. It is very useful, especially when you need to find something six months after writing the last line of recognizer. In our case, the help output looks something like this:
me@my: ./recognizer -h
General options:
  -h [ --help ]         Show help
  -t [ --type ] arg     Select task: train, recognize, score
Train options:
  -I [ --input ] arg    Input .dat file
  -i [ --info ] arg     Input .trn file
  -O [ --output ] arg   Output parameters file .prs
Recognize options:
  -I [ --input ] arg    Input .dat file
  -p [ --params ] arg   Input .prs file
  -O [ --output ] arg   Output directory
Score options:
  -e [ --ethanol ] arg  Etalon .trn file
  -t [ --test ] arg     Testing .trn file
  -O [ --output ] arg   Output comparison file
Let's move on to parsing the command line arguments. The first thing you need to do is find out the task that should be performed by the program recognizer:
namespace po = boost::program_options;

po::variables_map vm;
po::parsed_options parsed =
    po::command_line_parser(ac, av).options(desc).allow_unregistered().run();
po::store(parsed, vm);
po::notify(vm);
We pass only the General options as the template. Without the call to allow_unregistered, boost::program_options would throw on any argument not described in the template, which contains only the task type and help. After this code runs, the task_type variable is filled in and we can write the "switch":
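if (task_type == "train") {
    // Second pass: now the train options are known as well.
    desc.add(train_desc);
    po::store(po::parse_command_line(ac, av, desc), vm);
    train(vm);
} else if (task_type == "recognize") {
    // ... the remaining tasks are handled the same way
}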
The corresponding group is added to the template, and the command line is now parsed in full without exceptions. The vm variable is a dictionary with string keys and values that wrap boost::any. Help, as you can see, comes almost for free.
Let's look at the train(vm) function more closely to see how to get values out of the resulting dictionary.
void train(const po::variables_map& vm) {
    std::string input_path, info_path, output_path;
    // Each value is looked up by the long option name.
    if (vm.count("input")) {
        input_path = vm["input"].as<std::string>();
    }
    if (vm.count("info")) {
        info_path = vm["info"].as<std::string>();
    }
    if (vm.count("output")) {
        output_path = vm["output"].as<std::string>();
    }
    // ...
}
As you can see, everything is simple. Note, however, that arguments must be looked up by their long name, not by the whole string passed in the description. Compare "info,i" with simply "info".
Conclusion
A full version of the example can be found on pastebin. This is far from everything the library can do, but those who got interested have probably left halfway through to read the official documentation.
Benefits:
- intuitiveness (at least for me)
- self-sufficiency (comments, types, names and groups out of the box)
- working with arguments and configuration files (although this was not covered)
Disadvantages:
- meager documentation
- requires linking against a compiled library (unlike many other Boost packages, which are header-only)
- only single letter short argument names