This article was written under the influence of the book "Build Awesome Command-Line Applications in Ruby" by David Copeland (buy, download, and study the additional materials). Most of it is devoted to designing CLI applications regardless of the language used. Along the way, Ruby-specific things will be discussed, but it is not a problem if you do not know the language: there will not be too much code. You can consider this article a rather detailed review of the above-mentioned book, interspersed with my own experience. I recommend the book!
To begin, let me ask a question. If you look around IT communities, you will notice that, despite the abundance of programs with beautiful graphical user interfaces, command-line applications remain quite popular. Why?
There are several answers. First, it is convenient: if a task can be described with a command on the command line, it is much easier to automate than if you had to replay mouse movements and clicks on various menu items. Second, the command line makes it possible to combine programs in an incredible number of ways, which is difficult to achieve with graphical interfaces.
To a large extent the Unix philosophy rests on the principle that many small utilities, each able to do its own specific task, are better than one multifunctional universal program. And this is one of the reasons for the success of Unix systems in the IT world.
Probably everyone understands that the average user can hardly be lured from the GUI to the CLI, so let's focus on us, "computer folk," and spell out our wishes for CLI applications.
General requirements
In a nutshell, we want them to be simple to use, yet effective. David Copeland has written an exhaustive list of requirements an application should meet to achieve this:
- Easy to use. The application should be easy to use and have a clear goal, preferably just one. Tools that, like a Swiss Army knife, can do everything are usually hard to use, and nobody knows all of their capabilities. That said, we will also talk about how to design multi-purpose applications so that they are easier to use and to maintain. The minimum you can do to simplify working with your program is to follow the conventions for option formats. Do not force your users to relearn! I will describe in detail how options are conventionally specified and named.
- Helpful. The user should have easy access to help on what the application does, how to launch it, and how to configure it. It is desirable that the application add its own page to man. In addition, integration with the shell at the level of command completion will not hurt.
- Plays well with others. The application must be able to interact with other applications; this means modularity, as is customary in Unix. Do not neglect return codes, well-designed work with input/output streams, and more.
- Has sensible defaults but is configurable. Standard usage scenarios should be available without specifying a thousand options. Non-standard scenarios do not need to be as easy, but should still be possible. In addition, the default set of options must be customizable.
- Installs painlessly. Easy to install together with all dependencies, and puts the application on the PATH for easier launching. Updates should be just as easy.
- Fails gracefully. If the application is invoked incorrectly, it should say what the error was and how to fix it. In addition, the application must be non-destructive, i.e. it should not overwrite or erase files when there is a mistake in the arguments (and ideally it should not perform dangerous operations at all without confirmation).
- Gets new features and bug fixes easily. The application must be maintainable. Break it into modules spread across different files. Write tests. Use semantic versioning. Use a version control system.
- Delights users. The output of the application should look nice: colors and formatting (for example, tabular or HTML) are at your service. This also includes interactive communication with the user.
Now I will go through these points in more detail.
Easy to use
Utilities and command suites. Which is more convenient?
So, all applications can be divided into two types: single-purpose utilities and command suites (Command Suite in the book's original terminology).
The first type is applications with one goal, one mode of operation. There are countless examples: almost all Unix commands are like this: ls, grep, diff, ... (Rubyists may recall, for example, the rspec command). The convenience of such programs is that their capabilities are easier to remember and harder to confuse. In addition, they are easier to glue into chains for sequential processing. The following analogy is relevant here. Imagine that you are building a house, and not a standard-design house at that. It is much more convenient to build it from bricks rather than from monolithic blocks, because in some places you would have to saw those blocks down and in others fill in the joints. Besides, blocks can only be moved by crane, while bricks can be laid by hand.
The second type of program can be compared to a Swiss Army knife or a food processor. Sometimes they are extremely convenient. Look at git (in the Ruby world, gem, rails and bundle immediately come to mind): one program, and how much it can do. It can commit and checkout, search through history itself, and compute the changes between files. grep, diff and the rest are effectively built in; there is no need to combine git with anything, it can do everything itself. Returning to the house analogy, git has a ready-made standard design for every occasion in life (just try to remember them all).
And yet not all programs should be multifunctional: you will never cover all use cases anyway. To support this thesis, I suggest you imagine a "multitool" that can do cd, ls, pwd, diff, df and a pile of other useful operations with a single command, with only the options varying (for example, filesystem change, filesystem show, filesystem where, etc.). Would you use it? I think you would throw it out for excessive bulkiness. So although command suites are extremely convenient, it makes sense to think carefully before writing your own eight-tentacled centaur.
By the way, if you have written a dozen utilities and then realized you want them to be a command suite, it is not so hard to fix: it is enough to write a wrapper that routes the commands. You may not know it, but git consists of dozens of utilities like git-commit, git-show, git-hash-object, git-cat-file, git-update-index and so on, to which the git command dispatches based on the subcommand and options (of course, a whole chain of utility calls can hide behind a single command). So even large projects are better started as a set of small programs that you will combine later. They are easier to write, debug, maintain and use.
The syntax of command-line arguments, or "this house has its own traditions"
I'll start with the terminology. When you run a command line application, you specify a set of parameters. They are divided into the following types: options and arguments. David Copeland additionally divides options into two subtypes: flags and switches.
Schematically it can be depicted as follows:
executable [options] <arguments>
I think everyone knows this already, but just in case: parameters in square brackets are optional, and those in angle brackets are mandatory.
Arguments (or positional arguments) are the parameters that must be specified for the program to work. Their order is strictly defined.
Options are optional parameters. They can be listed in any order.
Switches are boolean options; the program only checks for their presence or absence. Example: --ignore-case (or, sometimes, --no-ignore-case).
Flags are options with a parameter. The parameter may have a default value; for example, grep -C ... and grep -C 2 ... are equivalent in my version of the grep utility.
Both arguments and options may have default values, but they need not.
As an example, in grep --ignore-case -r -C 4 "some string" /tmp the arguments are "some string" and /tmp, and the options are --ignore-case, -r and -C 4. Here --ignore-case and -r are switches, while -C 4 is a flag.
Conventions for using options
Options can be specified both in short (-i) and in long (--ignore-case) form. In principle, nothing prevents you from doing whatever you please with the format of options, because you can intercept the command-line parameters in the script directly. But it is better to adhere to the rules developed by the Unix community: they have been honed by time, and most people are used to them. Besides, there are ready-made libraries that let you work with such options conveniently.
These rules are:
- A long option (--long-option) starts with two hyphens. There can be no spaces in the option name, but single hyphens inside it are quite acceptable.
- A short option (-l) consists of one letter (as a rule, case matters) and a single preceding hyphen. It is convenient when the short option is the first letter of its long counterpart: its meaning is easier to remember.
- Several short options can be combined: ls -a -l is equivalent to ls -al. After a single hyphen there may be any number of options that take no arguments.
- If a short option has a parameter, it can usually go either immediately after the option or separated from it by a space: -C4 or -C 4. I did not say "usually" by accident. For example, the standard optparse Ruby library handles these two cases identically, and the ls utility handles similar options identically too. But the grep utility, for example, treats -C 4 as the -C option on its own plus a separate argument 4. Maybe this is a bug. I could not fail to mention that exceptions exist, but users are unlikely to be grateful if your program becomes yet another one.
- I do not know what the custom is when a short option has a non-numeric parameter with a default value (-c [param]), because -cxyz can be interpreted both as -c xyz and as -c -x -y -z. It is better for the user to always write a space if the option has an optional parameter, and better for the programmer to think in advance about how to minimize the problems caused by options written together.
- For a long option, a space before the parameter is usually allowed, but it is advisable to use an equals sign: ls --width=10 or ls --width 10. Without a separating character after a long option the parameter is never written (judge for yourself what confusion would result, especially if the parameter is not numeric).
- An option can have both a short form and a long one; it may also have only a short or only a long form. However, having a long form for every option is highly desirable.
- For boolean options you can provide an optional no- prefix, for example --[no-]pager. The --pager option turns pagination on, --no-pager says there should be no pagination, and omitting the option keeps the default value. This is especially important when the default options are configurable: without the prefix there would be no way to override an option that is on by default.
Why is it always advisable to have a long option
The long form of options is usually used when writing scripts that call your application: long options are self-documenting. Imagine a sysadmin looking into cron and seeing a database backup task launched with the cryptic options -zsf -m 100. If he is not the author of the script, he will have to climb into the help to understand what they mean. Agree that the set of options --scheme --max-size 100 --gzip --force will tell him much more and will not make him waste extra time.
In addition, it is not recommended to create short forms for rarely used options. First, there are not that many letters in the alphabet to spend on every option. Second, and more importantly, the absence of a short form hints to the user that the option is secondary, or even undesirable in normal operation, and should not be used thoughtlessly simply because it exists.
So, frequently used options get both a short form and a long one, while rarely used options get only a long form.
Differences in command-line parameters between single-purpose applications and command suites
These two types of applications have a slightly different argument order when invoked.
For the first type, the call usually looks like this:
executable [options] <arguments>
For command suites the format is somewhat more complex:
executable [global options] <command> [command options] <arguments>
Here is an example from the book:
git --no-pager push -v origin master
Here --no-pager is an option that can be applied to any git command, and -v is an option specific to the push command. Note that a global option and a command-specific option may share the same name. But in that case not every tool will be able to parse them; for example, the standard OptionParser Ruby library will not cope, since it does not distinguish where the option occurred. If you are building a command suite, better use the GLI library.
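For illustration, here is a sketch of a tiny GLI-based suite mirroring the git example above (the names and defaults are mine, not from the book; see GLI's documentation for the authoritative API):

    require 'gli'   # gem install gli
    include GLI::App

    program_desc 'A tiny command suite in the spirit of the git example'
    version '0.1.0'

    switch :pager, default_value: true   # global: --pager / --no-pager

    desc 'Push something somewhere'
    command :push do |c|
      c.switch [:v, :verbose]            # command-specific: push -v
      c.action do |global, options, args|
        puts "pager=#{global[:pager]}, verbose=#{options[:v]}, args=#{args.inspect}"
      end
    end

    exit run(ARGV)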
Just in case, let me remind the reader that technically a program receives not a parameter string but an array of parameters already split into elements. In Ruby this array is called ARGV; it is what we work with, directly or through special libraries. Note that the string is not split into elements simply by spaces (otherwise file names with spaces would be impossible): the shell has somewhat more complex rules involving character escaping and quoting for grouping. If you need to escape parameter strings, or split them into arrays and glue them back, look at the standard shellwords library. It provides three methods for the purpose: String#shellsplit, String#shellescape and Array#shelljoin.
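A quick sketch of what these methods do (return values shown in comments):

    require 'shellwords'

    'tail -n 1 "my file.txt"'.shellsplit         # => ["tail", "-n", "1", "my file.txt"]
    'my file.txt'.shellescape                    # => "my\\ file.txt"
    ['tail', '-n', '1', 'my file.txt'].shelljoin # => "tail -n 1 my\\ file.txt"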
In the Ruby community, most command-line applications use either OptionParser or a library based on it. In the section on configuration, once we know a little more, I will give an example of code written with OptionParser.
Helpful
Imagine that you are seeing the program awesome_program, which you are about to use, for the first time. As an experienced user, you will probably type awesome_program --help in the hope of seeing the order of arguments, the set of options, and usage examples. When you release your own program, remember that a user seeing it for the first time will probably do exactly that first, so have the -h and --help keys ready. If you use a library like OptionParser, the help message will automatically include a list of all the options the program recognizes, with the descriptions you supply.
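A minimal sketch (the program name and options here are made up):

    require 'optparse'

    options = {}
    OptionParser.new do |opts|
      opts.banner = 'Usage: awesome_program [options] <file>'
      opts.on('-i', '--ignore-case', 'Ignore case distinctions') do
        options[:ignore_case] = true
      end
      opts.on('-C', '--context N', Integer, 'Print N lines of context') do |n|
        options[:context] = n
      end
    end.parse!

    # `awesome_program --help` now prints the banner and both options.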
Besides the help message, it makes sense to write extended help as a man page. As far as I know, however, rubygems does not install man pages automatically. But there is the gem-man gem, which can display man documentation pages for the gems installed in the system.
To create man documentation you have to write a file in the intricate nroff format. To simplify the task, use the ronn converter library, which lets you write documentation pages in a simpler format. When everything is ready, you can run gem man awesome_gem and see the help page. In addition, you can write alias man='gem man -s': the man command is then replaced by gem man, which searches for man pages among gems, and any request that gem-man cannot handle is automatically redirected to the corresponding page of the regular man.
If you are thinking of making your own man pages, look into the book; much more attention is paid to this there.
To make launching commands even easier for the user, you can provide command completion at the shell level (it does not work in every shell). Pressing the Tab key will then automatically complete command names, file names, and so on: the user is less likely to make a typo and will spend much less time typing the command.
To get tab-completion and command history inside a program (in interactive programs like irb), it is enough to use the readline library that ships with Ruby. It saves the history of all entered commands automatically, provided you use Readline.readline instead of gets.
Tab-completion is done as follows: you assign a block to Readline.completion_proc; the block receives the text entered so far and returns an array of possible completions, and that is the whole task. For example:

    Readline.completion_proc = proc { |input| allowed_commands.grep(/^#{input}/) }
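Putting the two together, a minimal interactive loop might look like this (the command list and prompt are illustrative):

    require 'readline'

    def allowed_commands
      %w[list add complete help exit]
    end

    Readline.completion_proc = proc { |input| allowed_commands.grep(/^#{Regexp.escape(input)}/) }

    # The second argument (true) appends each line to the history,
    # so the Up/Down arrows work as well as Tab.
    while (line = Readline.readline('todo> ', true))
      break if line.strip == 'exit'
      puts "you entered: #{line}"
    end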
If you need tab-completion not inside an already running program but at the shell level, things are somewhat more complicated: you will have to tinker with .bashrc. First, add the line complete -F get_my_app_completions my_app to .bashrc.
Now every time you type my_app and then press the Tab key, the get_my_app_completions function will be called. This function must put the possible completions into the COMPREPLY variable, which the shell uses to offer the user its suggestions. You also need to define this function in .bashrc. Here is an example from the book for a todo application:
    function get_todo_completions() {
      if [ -z "$2" ] ; then
        COMPREPLY=( $(todo help -c) )
      else
        COMPREPLY=( $(todo help -c $2) )
      fi
    }
    complete -F get_todo_completions todo
Now the program needs to implement the following behavior (let's leave it as an easy exercise):
1) if you type todo help -c on the command line, you should see the list of the application's commands: list, add, complete, help (each on its own line);
2) if you type todo help -c complete, you should see the list of all tasks that have been started but not yet completed (those to which the complete command can be applied).
help -c [...] is a service command; its existence need not be advertised in the brief help. It is assumed that it will be used not by the user but by the script in .bashrc.
In this script, the function asks the application itself what can be substituted, based on what has already been typed (it is passed after the help -c option). The program watches for this special set of parameters and prints the list of all completions to standard output (as you have just seen), from where they go straight into the shell script's COMPREPLY.
In the book the author uses his own GLI library, which tracks such a set of options automatically, but you can easily implement this capability without GLI. As you can see, there is no magic here.
Plays well with others
We will not discuss the need for convenient interaction between programs: anyone who has worked in Unix can judge for himself how important it is. The main question is how to achieve it.
Return codes
First, use return codes: 0 on successful completion, various non-zero codes on errors. This is important because shell scripts can find out whether a program finished normally by reading the status of the last completed command from the $? variable. To return a status, Ruby uses the exit(exit_status) method.
If you assign different return codes to different errors, there is a chance that another program using yours will be able to decide whether the problem is fixable and whether it deserves attention at all. From outside the program it is better to be able to tell how scary the failure was. Different return codes are like different exception classes: maybe the error is that you ran out of RAM, or maybe the network just disappeared for a second and it is worth trying again. Some programs can fail due to several errors at once; if you need to report several problems simultaneously, use bit masks. There are recommendations on which return codes to use for which errors on the GNU (very general) and FreeBSD (very specific) sites. If you do not want to bother with error codes, make at least the minimal effort and return at least some non-zero value on error. Otherwise other programs will not even be able to find out whether yours worked.
By the way, you can launch a program not only from a shell script but also from a Ruby script. There are several ways to do this, for example Kernel#system or IO.popen; read more about them in the documentation. If you call another program using system, you can find out its return code in $?, a Ruby variable similar to the shell one.
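A sketch of both sides of this convention (db_backup here stands in for your application; 64 is the sysexits code for usage errors from the FreeBSD recommendations mentioned above):

    # In the called program: exit with a meaningful status.
    if ARGV.empty?
      $stderr.puts 'usage: db_backup <database>'
      exit 64
    end

    # In a calling Ruby script:
    system('db_backup', 'my_db')
    puts "backup failed with status #{$?.exitstatus}" unless $?.success?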
Input/output streams and the error stream. Pipes
The main way programs launched from the command line interact is the pipe. A pipe is a way to redirect the output of one program to the input of another, denoted by the vertical bar |. For example, in the command ls | sort, the first part, ls, prints nothing to the screen and instead redirects its output to the input of the sorting program; sort picks up the text from its input stream line by line and prints the sorted list. On the one hand, ls could sort the files itself, but that is not what it is for, while sort is meant for exactly this and has many options. For example, it can sort lines not quite lexicographically when names start with a number (otherwise the order would be: 1.jpg, 10.jpg, 100.jpg, 2.jpg, ...), or sort in reverse order. In addition, special programs (such as awk and sed) can fix up each line before sorting (for example, strip prefixes). Note that a pipe can be built from any number of programs, so ls | sort -n | tail will print the last ten lines of the file list sorted numerically.
Think about why your program may need input and output streams, and what to write to the output stream versus the error stream. Let's deal with the latter first: what is the difference between the output stream (stdout) and the error stream (stderr)? The difference is that one goes into the pipe and the other does not. The stderr stream is used not only for errors but for any information about the work in progress: the current stage of execution, debug information, and so on. Such information should not reach the input of another program; it exists only for the user's convenience. The stdout stream is used for everything else, i.e. the program's actual output.
You also need to think about the output format, because the output of your program will be consumed not only by humans but also by machines. It makes sense to provide an option --format=<plain|csv|pretty|html|table|...>. When a human-readable format (pretty/html/table) is requested, you can present the information pleasingly without worrying about how easy it is to parse. When a machine-readable format (plain/csv) is requested, it does not matter at all whether the result looks nice; the main thing is that it is easy to parse. For parsing convenience use tab-separated or comma-separated (csv) values, or one value per line, or whatever format suits you best. We will talk more about making output pleasant for humans in Delights users.
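A sketch of dispatching on such an option (the rows and format names are illustrative):

    require 'csv'

    # rows is an array of arrays; format comes from --format.
    def print_rows(rows, format)
      case format
      when 'csv'   then rows.each { |row| puts row.to_csv }
      when 'plain' then rows.each { |row| puts row.join("\t") }
      else raise ArgumentError, "unknown format: #{format}"
      end
    end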
The user cannot be expected to specify the output format on every launch, and a single format cannot always be chosen for all occasions. But there is a trick that helps determine automatically whether the output is intended for the eyes of a human or of a machine: the IO#tty? method. Calling $stdout.tty? will tell you whether your output is directed to the terminal. If not, the output of your program is going into a pipe or has been redirected to a file (like this: ls > output.txt). So for output sent to the terminal and for output sent to a redirected stream you can pick different default formats:

    options[:format] = $stdout.tty? ? 'table' : 'csv'

And if you want, say, to write the human-oriented format to a file after all, just specify the format explicitly.
Now let's talk about the input stream. What data should the program receive from it? That, of course, depends on the specifics of the program. What data can arrive on the input stream at all? The obvious answer: whatever another program produces on its output. For example, I have a program that converts a matrix file into another format and writes it to a file with a different extension. Does it make sense for it to take matrices from the input stream? In my opinion, no: how and why would a matrix end up there? It is much more convenient to take a bunch of file names, so as to process many files at once; that is exactly the kind of data the ls command can supply.
You can argue with this point of view: after all, you can write a script that walks the list of files and runs the program several times. But then you lose the ability to do it straight from the command line with a single I/O redirection; you have to write a loop. Besides, some programs take a long time to start and then run quickly, so you may spend not one second but a hundred and one on a hundred matrices (alas, a very real, and even optimistic, estimate when a script from a gem is launched repeatedly on a Windows system). In any case, the choice is yours: do what is appropriate and do not forget to describe it in the manual. By the way, you can find out whether another program is piping data into your script's input by calling the already familiar $stdin.tty? method.
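A sketch of that idiom, assuming the program takes file names either as arguments or from a pipe (the names here are illustrative):

    # Use the arguments if given; otherwise, if stdin is not a terminal,
    # something like `ls *.mtx | convert_matrix` is feeding us file names.
    files = if !ARGV.empty?
              ARGV
            elsif !$stdin.tty?
              $stdin.read.split("\n")
            else
              abort 'no input files given'
            end

    files.each { |f| puts "processing #{f}" }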
Signals
Finally, one more way programs can communicate (leaving aside sockets and everything related to them) deserves mention: signals. Suppose you have a long-running process, for example a web server, and you need to ask it to re-read its new configuration without a restart. You can send it a signal (usually SIGHUP), and it will intercept it and do what is asked. Or you can hang a handler on the SIGINT signal that exits the program neatly when Ctrl+C is pressed. All this is achieved with the Signal.trap method, which takes as arguments the name of a signal and a block executed when that signal arrives. This works (like stream redirection, by the way) on all POSIX systems, i.e. both on Unix and on Windows. It is possible, however, that the set of supported signals on Windows is smaller than on Unix, so if you are aiming for a cross-platform application, signals are a place to test thoroughly. Here is an example of making the program clean up unfinished files on Ctrl+C and then exit with an error code:

    Signal.trap("SIGINT") do
      FileUtils.rm output_file
      exit 1
    end
Has sensible defaults but is configurable
Again: standard usage scenarios should be available without specifying a thousand options; non-standard scenarios do not need to be as easy, but should still be possible; and the default set of options must be customizable.
With standard scenarios everything is clear: think over what the program will be used for and make the most popular parameters the defaults. All users have slightly different needs, so imagine as many use cases for your script as possible. If your application performs one task but can be configured flexibly (within reason), users will thank you. Non-standard scenarios should exist if they have a use; it does not matter that many options must be specified to run them, since the user will think twice about whether your script is meant for that. Let me repeat the recommendation to make rarely used options long and not provide a short form for them (--use-nonstandard-mode instead of -u).
I will dwell on the last point in more detail. What does "the default option set must be customizable" mean? Imagine that many people use your program, and do so often. Say you wrote a database backup utility. Your sysadmin uses it every day with the default set of options (for example, --no-scheme --gzip), just typing db_backup my_db. But besides the admin, the program is used by your fellow developers, whose database schema changes every day; they are forced to write db_backup --scheme my_db daily and cannot afford to forget this key. You may be right to say that the sysadmin's settings are the important ones and should be the defaults... but in reality there will be more options (--login, --password, --host, --force), and such a set of parameters is already hard to reproduce without mistakes even for a sysadmin whose other settings are the defaults. Do not force the sysadmin or the programmer to type all these parameters every time and think about settings, because you can make the default values configurable.
This is what files like ~/.myapp.rc are for. There is no magic in the file, it is just a convention: each user can create a file with his own default settings in his home directory. It lives in the home directory so that different users can have different defaults; the dot at the beginning makes the file hidden; the .rc suffix is a tribute to tradition. What should be stored in this file? Just the list of options that differ from the standard defaults. The YAML format is extremely convenient for such a configuration file. An example:

    ---
    :gzip: false
    :force: true
    :user: "Bob"
    :password: "Secr3t!"
Now consider how these options are loaded.
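A minimal sketch of the loading code, in the spirit of the book (the file name and options are illustrative; note that on modern Rubies, where YAML.load_file is safe by default, the symbol keys would require permitted_classes: [Symbol]):

    require 'yaml'
    require 'optparse'

    options = {
      gzip: true,    # hard-coded defaults
      force: false,
    }

    # Settings from ~/.db_backup.rc override the hard-coded defaults...
    rc_file = File.join(ENV['HOME'], '.db_backup.rc')
    options.merge!(YAML.load_file(rc_file)) if File.exist?(rc_file)

    # ...and options given on the command line override both.
    OptionParser.new do |opts|
      opts.on('--[no-]gzip', 'Compress the dump')          { |v| options[:gzip] = v }
      opts.on('--force',     'Overwrite an existing dump') { options[:force] = true }
    end.parse!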
If your task is to configure a command suite, the same kind of configuration file is used, only this time the hash should contain both the global options and the options of each command, the latter in nested hashes:

    ---
    :filename: ~/.todo.txt
    :url: http:
There are other ways to store data in a configuration file; YAML is simply one of the most readable. In any case, configuration via configuration files is a widespread technique: the gem and rspec utilities, as well as git, are configured this way.

Installs painlessly
Even a very good application will not be used by anyone if the installation process is too complicated. Fortunately, the Ruby world has rubygems, a package manager that lets you install and update programs in one line:

    gem install gemname
    gem update gemname
A gem is a package containing source code, along with the version number, the author, a package description and a set of dependencies (which versions of which gems are used by your library or application). By default all gems are published on the rubygems.org server. Thanks to this, when installing a gem you do not need to look for where to download the package: it lives on a server common to everyone (of course, you can separately set up a corporate gem server so as not to hand your gems to strangers), and the gem program uses it to fetch gems automatically.
When you run gem install, the package manager searches the rubygems server for the package with the given name and finds out which other gems (and which specific versions of them) it needs to work. It then downloads all the necessary packages and installs them, compiles native code, and makes the files listed as executables runnable (these can be Ruby scripts, not only binaries). Any number of versions of one gem can be installed on the system, and each version can have its own set of dependency versions without causing conflicts. For example, if you have Rails 2.3.8 and Rails 3.2, each of them will refer to its own version of activesupport, the one it agrees with.
I will not write about how to create gems: there has already been a great article about this on Habr. If you still do not know how, break away from my article right now and give half an hour of your time to the question. It is very simple, very convenient, and vital if you are going to continue with Ruby.
When you create your gem, you have to specify a version number. There is a very sensible convention here called semantic versioning. The version format consists of three numbers: Major.Minor.Patch. The lowest number, the patch level, changes only for bug fixes. The middle number changes with API changes that are backward compatible. And the major number changes when you make changes that break backward compatibility. Note that these are numbers, not digits, so a version can easily look like 1.13.2.
Semantic versioning is useful because dependencies can be specified not as an exact gem version but to the precision of the patch level or the minor version. This way you get fixes for bugs in dependency packages without doing anything to your own package, while still preventing dependency versions from changing so much that the next update brings API changes incompatible with your package.
Now a couple of words about executable files. All files you place in the bin folder of your gem are considered executable (if you use bundler in its default configuration to create the gem). When the package is installed, something like links to these files is created in the .../ruby/bin folder (in fact, a special Ruby script is generated that knows where to find the real executable). The trick is that during Ruby installation this folder gets into the PATH environment variable and thus becomes one of the places where executable files are searched for. So all executables from the gem become available from anywhere in the system.
Under Windows the process, as I understand it, is a bit more complicated: a .bat wrapper is additionally created around the file, which hands control to the script itself. However, all these details are hidden from both the programmer and the user.
What should be in the executable file? First, although it is a Ruby script, it does not need the .rb extension, neither on Unix nor on Windows. The extension would only confuse the user, and it is not needed at all for the script to run. Second, the first line should read:

    #!/usr/bin/env ruby

Note that there is no path like /usr/bin/ruby in this line: using env allows ruby to be found even if it is located in another folder, which is simply necessary when rvm is installed. Third, it is better to move all the script's logic into a separate file, for example lib/my_exec.rb, and pull it in from the executable with require. You can read more about this in the article on making gems mentioned above.
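As a result, the executable file looks something like this (a sketch; the file name and entry point are assumptions):

    #!/usr/bin/env ruby

    require 'my_exec'   # lib/my_exec.rb contains the actual logic

    MyExec.run(ARGV)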
What troubles await you if you decide not to build a gem package? Apart from the obvious extra headache of dependency management, there is one more unpleasant surprise in store. Imagine you wrote a script and type my_app value1 value2 at the command line. What argument list do you expect? Probably ['value1', 'value2']. What do you actually get? On Unix, everything is as you expect. But when running the script on Windows you get an empty argument list, because Windows does not perceive my_app as a program that can receive arguments (the arguments usually go to the ruby.exe program instead). To pass arguments to the script, you must run it with the word ruby prepended, i.e. every launch looks like ruby my_app value1 value2, and if you forget the word ruby you have a chance of not even understanding why nothing works.
Remember I mentioned that rubygems creates a .bat wrapper? It is needed precisely because a .bat file understands command-line arguments and can pass them on by invoking the script appropriately; this is how rubygems solves the problem. But there is a fly in the ointment: such a cascade of calls slows down application startup. Sometimes a Hello World program takes five seconds to start. Once started, everything works at normal speed, but the launch itself is unpleasantly long (as far as I understand, Windows processes are in general more heavyweight than Unix ones). This can be annoying when you run, say, rspec after every code change, or spend five seconds every five minutes waiting for a reaction from git (which also suffers from this problem, although it is not written in Ruby). But this is the price to be paid for compatibility with Windows; there is no other way.

Fails gracefully
Here everything is very simple.
Your script has crashed? Print an error message to stderr and suggest possible solutions (for example, say which argument is missing or which options conflict). If necessary, print a help line describing the usage. And the program certainly should not attempt to perform any actions if it lacks the arguments to perform them.
Does your script write to files or perhaps erase them? If so, make sure it does not overwrite existing files silently: if the file exists, the script should say so and ask for confirmation of the modification, or suggest using a --force key. And let the program report which file it has overwritten or deleted.
The user specified strange options? Ask him to confirm that he meant exactly that. Consider the command rm -rf * .log: that accidental space will cost you dearly, so ask the user whenever there is a suspicion that the program may behave too destructively.

Gets new features and bug fixes easily
Code maintainability is too broad a topic to describe fully. In a nutshell: make the application modular, split it into separate files and, if necessary, into several separate gems. To eliminate bugs and prevent new ones, you need to write tests. And here problems may arise...
The thing is, testing usually assumes test isolation, and when working with the file system (which is often exactly what utilities are for) this is not easy. You may need to create, delete and overwrite files, and to restore the state of the file system after each test. All this can be an extremely unpleasant and slow business (working with an HDD is not a quick affair at all). There are at least two solutions to the problem.
The first solution is the aruba gem, which provides ready-made Cucumber steps for CLI testing: steps for filling files with content and for cleaning up, for checking a file's existence, as well as steps for launching the application with given arguments and checking the return code and the contents of the input/output streams.
The second solution (which I personally like much more) is ordinary TestUnit or rspec tests with the fakefs gem. This gem replaces the classes that work with the file system and creates a virtual file system in RAM, with its own folder structure and its own files with whatever content you wish to put there. No mocks need to be created: the File and Dir classes (and their associates) temporarily turn into one big fake, so the program code does not need to change at all for its behavior to be tested. And no traces are left in the file system afterwards. Beautiful!
We enable the fake file system mode, load the script from our lib/my_exec file (load it, rather than launching the application via Kernel.system) and check the results.
How do we check what the program prints without using aruba? To do this, replace the stdout and stderr streams with objects of the StringIO class; after the program has run, you can inspect the contents of these "streams". You can take a ready-made solution, out, err = MiniTest::Assertions.capture_io { ... } from the standard testing library, or try writing the stream-interception code yourself; it is not complicated at all.
An important note! Ruby has two variables for each I/O stream: a constant (STDOUT) and a global variable ($stdout). When substituting streams with StringIO, use the global variable; do not try to change the constant. The trouble is not so much the warning you will see as the fact that it will not have the expected effect: commands like puts are apparently tied to the global variable by default, and the constant merely refers to the same object. Of course, this remark does not apply when the stream is specified explicitly (STDOUT.puts 'smth'), and that, by the way, is a reason not to use the constants STDOUT, STDIN and STDERR at all.
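The do-it-yourself interception really is short. A sketch:

    require 'stringio'

    # Temporarily swap the global $stdout for a StringIO
    # and return everything written while the block ran.
    def capture_stdout
      old_stdout = $stdout
      $stdout = StringIO.new
      yield
      $stdout.string
    ensure
      $stdout = old_stdout   # restore the real stream even if the block raises
    end

    output = capture_stdout { puts 'hello' }
    # output == "hello\n"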
There is one more thing. We want the testing code to be as close to reality as possible. The script receives its arguments as the ARGV array, yet in tests we pass the arguments as a string, not an array, right? So we face the question of how to get an array from an argument string. Recall the shellwords library mentioned above and split the string into elements with the String#shellsplit method.
Now that we have an array of command-line arguments, we can, for example, replace the contents of ARGV with it: ARGV.replace(new_array). But it is even better to replace the array passed to the OptionParser#parse! method: instead of ARGV, it is enough to hand our new array to OptionParser#parse!(new_array), and it will extract the options from it rather than from ARGV. Do not forget that the positional arguments will also have to be taken from the new array, after the options have been extracted.
A separate question is how to test correct work with configuration files. The author advises making it possible to override the settings from these files through environment variables. I personally find this inelegant and leave the curious reader the opportunity to open the book and see the related recommendations on his own.

Delights users
A few remarks about beauty in output formatting.

Tables
Suppose your utility prints a list of the most popular bloggers: name, number of posts, comments, friends. The obvious output format suggests itself: draw a table. The solution is just as straightforward: use the terminal-table gem.
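A sketch of its use (the data is made up):

    require 'terminal-table'   # gem install terminal-table

    rows = [
      ['alice', 120, 340, 56],
      ['bob',    80, 210, 34],
    ]
    puts Terminal::Table.new(
      headings: ['Name', 'Posts', 'Comments', 'Friends'],
      rows: rows
    )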
Colors
You will find another kind of prettiness if you use one of the file-comparison utilities: you will see lines with pluses and minuses marking added and deleted lines, and for convenience these lines are additionally colored red and green. Coloring in the console is done by inserting special escape sequences at the points where the color changes. Two popular solutions are the rainbow and term-ansicolor gems. They too are used very straightforwardly; read their manuals. Note that, alas, not all terminals support colors properly: for some programs the standard Windows terminal prints the numbers encoding the colors instead of colored lines, while for others it works correctly. So check how the gems behave in different terminals before relying on them in your code.
David Copeland reminds us that almost 10% of people are color-blind. It follows that color should only help to navigate the program's output, and not take on the role of the sole channel of information. If the diff utility dropped its pluses and minuses, a significant share of people would lose the ability to use the results of its work. Colored output should therefore contain both colors and other cues, so that the colors are not strictly necessary.
An important note! When your utility returns text in a machine-readable format, it is desirable to turn color formatting off; otherwise the side receiving the input has a chance of getting "indigestion" from the special characters in the strings.
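A sketch with the rainbow gem that also ties in the note above (Rainbow.enabled and the Rainbow() wrapper are from the gem's current API; check its manual):

    require 'rainbow'

    # Disable coloring when the output is piped to another program.
    Rainbow.enabled = $stdout.tty?

    puts Rainbow('+ added line').green
    puts Rainbow('- deleted line').red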
Interactive communication with the user
One more library designed for interactivity is readline, which I have already mentioned. At your service: remembering the history of the user's answers, and autocompletion. rubygems and thor also contain special modules responsible for interaction with the user, providing methods such as say and ask.
What kinds of interactive applications are there? Remember irb and rails console. The author of the book gives another example: suppose a large JSON object comes your way and you want to explore what it is. You could write an interactive JSON viewer that lets you browse the hierarchy with cd and ls commands and modify it with rm and mknode commands. The example is given solely to awaken your imagination; you can come up with hundreds more uses for interactive applications.

Take care of the user's nerves
Imagine yourself in the place of a user who is downloading a large file over a mobile connection and does not know how much has already been transferred. For the first ten minutes the user waits, knowing that the file is large. Then he starts biting his nails: what if the connection dropped? what if the program hung? what if the file is so large that by the end of the month the mobile account will be cut off once and for all? If the program can run for a long time, it would not hurt to print a progress indicator to stderr or to a log file (and do not forget that file operations are buffered, so call flush from time to time, otherwise you have a chance of seeing the results in the log file only after the program has finished).
It would be ridiculous to print every percent of progress on a new line! Instead, let's rewrite a single line, updating the numbers in place. The special carriage-return character \r will help us here: when \r is encountered, the terminal cursor moves to the beginning of the line and starts typing characters over the old ones (careful: the "tail" of the old line is not overwritten automatically). But beware, the puts method does not suit us, because it automatically moves the cursor to a new line. We need print, and here is an example of its use:

    (0..100).each do |i|
      $stderr.print("\r#{i}% done")
      sleep(0.1)
    end
A bit of caution is in order, though: in the terminal window the stderr and stdout streams are mixed, so you may start overwriting the wrong data.
In conclusion, I cannot help telling you about several popular libraries.
If you have worked with Ruby projects at least a little, you have probably already met rake, and perhaps thor. These are specialized libraries that use a special DSL to describe a set of utilities as a set of automation tasks.
Rake is an improved analogue of the make program for Ruby. It looks into the Rakefile of the current or a parent directory and searches for task descriptions there. For example, bundler creates a Rakefile for every new gem with a set of tasks (build, install, release) that let you install and publish your gems with one command. One of rake's distinctive features is its system of dependencies between tasks: rake release will first execute the build task and only then release. Unfortunately, passing arguments to rake tasks is either impossible or non-trivial. On Habr, by the way, there has already been an introductory post about rake.
Thor is a rake-like system. It also allows you to "install" tasks into the system; for more, it is better to look at other sources.
I mention these libraries because they can make your life easier when you do not need any complicated handling of options and arguments. Simple command suites are fully describable in terms of these two well-known libraries. In particular, Ruby on Rails uses both of them for tasks such as running generators, running migrations, clearing the application cache, and so on.
For most of the narrative I used OptionParser, the option parser from the Ruby standard library. It is a very convenient library, but some consider it rather heavyweight and write wrappers around it. Some wrappers concentrate on simplifying the declaration of options, some on making the options hash globally available, and so on. If it seems to you that OptionParser slows you down, you can pick one of the ready-made wrappers (a list can be found on the book's website; see the beginning of the article) or make your own.
OptionParser has other drawbacks: it cannot distinguish global options from local ones in a command suite (we did that separation ourselves; in general it does not follow from anywhere, except that the concept has been successfully applied in some large projects such as git). Another OptionParser peculiarity (it is not spelled out in the specification, and I suppose it is a bug) is that it understands negative numbers in arguments as options. I believe that sooner or later this bug will be fixed, but if your program accepts numeric arguments, be careful and test the program thoroughly.
For building command suites the author of the book, David Copeland, made quite a good gem, GLI. It distinguishes global options from command options and recognizes the command itself. The project is alive and periodically receives updates.
In addition, I cannot fail to mention a rather raw but extremely curious project, docopt. It is a library that generates an option parser from the help message, whereas OptionParser and related libraries do the opposite. The library was originally written in Python and has been ported to a fairly large number of languages. You can read about its features here. I think that, given proper attention from the community, it can turn into an extremely convenient and powerful library.
. If all the arguments of your script are file names, then ARGF
simply concatenate the contents of all files.
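A tiny sketch (save it as concat.rb and run, e.g., ruby concat.rb a.txt b.txt):

    # Prints every line of every file named in ARGV, with numbering;
    # given no arguments, ARGF reads from standard input instead.
    ARGF.each_line.with_index(1) do |line, i|
      print "#{i}: #{line}"
    end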