
The birth of Software Tools: how and why grep and awk appeared

This summer I came across an essay by Brian Kernighan, "Sometimes the Old Ways Are Best", published in 2008 in honor of the 25th anniversary of IEEE Software magazine. In it, Professor Kernighan described the tools he uses in his work.

At the time, he was busy with two complex projects. One involved an expert review of a 100,000-line codebase written in C and assembler in 1990, running under Windows XP; the other was porting code from an exotic language L1 to a no less exotic language L2 using a program written in an unnamed scripting language on Linux. Surprisingly, for such different tasks Professor Kernighan used not an IDE but the same set of tools: grep, diff, sort, awk, wc, and other "old acquaintances" from the early Unix era. Moreover, he criticized many of the tools and IDEs of the late 2000s for being inconvenient and imperfect.

Indeed, we have become so accustomed to some things in our lives that we take them for granted; it does not even occur to us to question them, because it seems they have always existed. This way of thinking helps us cope with the flow of information and is inevitable in the modern world, but today let's not deny ourselves the pleasure of "going down a level" to see how the very idea of software tools (hereafter also "utilities" or "Unix commands") appeared.

The creators


Of course, it all began within the walls of Bell Labs, owned by AT&T: it was here that Unix, C, and (somewhat later) C++ were born. Since the appearance of Unix commands was inextricably linked with the creation of the OS itself, the main actors here are Ken Thompson, Brian Kernighan, and Alfred Aho, all well known to Habrahabr readers.
Ken Thompson, who created Unix together with Dennis Ritchie, left Bell Labs in 1975 and returned to his alma mater, the University of California at Berkeley, where he contributed to the emergence of BSD; he later came back to Bell Labs, where he worked on Plan 9 with Rob Pike and others.

Alfred Aho is known to Russian-speaking programmers for the famous "Dragon Book" (also published as "Compilers: Principles, Techniques, and Tools"), as well as for the textbook "Data Structures and Algorithms". Aho worked at Bell Labs from 1967 to 1991, and at the turn of the millennium he returned once more, this time as vice president of the Computing Sciences Research Center.

Brian Kernighan is the co-author of the timeless "The C Programming Language" (he is the "K" in K&R), and of a number of other excellent programming books published over the past thirty years, including the recent "The Go Programming Language".

Kernighan began his career in 1966 at MIT, where he was tangentially involved in the Multics project. A year later, while in graduate school at Princeton University, he became a Bell Labs employee, and in 1969 he was one of the first developers to take part in the work on Unix. Kernighan was assigned user ID #9, but his first notable contribution to Unix dates to 1973: the creation of eqn, a troff preprocessor implementing a language for describing mathematical expressions.

[Image: said to be the very first "Hello, World" in the world, written by Brian Kernighan. (source)]

For these and many other scientists who worked at Bell Labs in the late 1960s and early 1970s, Unix was much more than just another OS.

Software tools


Today we take the idea of Unix commands and pipelines for granted; in reality, though, it was not always so.

The idea itself, that tools should be used to increase programmer productivity, by analogy with what humanity had done in other areas of life, is certainly not new. One of the pioneers of computer science, Alan J. Perlis, predicted that the emergence of a large number of purpose-built tools could allow programmers to create much larger projects.

The first real step in this direction was the invention of time-sharing systems, which made it possible to share computing resources among many users. The second was Multics, an attempt to implement such a system. The third was Unix, which grew up on the ruins of Multics.

In the 1970s, outside Bell Labs and the Unix operating system, the idea of creating such tools was virtually unknown. But the authors of Unix knew well what they were dealing with: for them, the key feature of Unix was flexibility; if desired, Unix turned from a simple OS into a full-fledged IDE. And, of course, this idea remains applicable today (see the article "Unix as IDE").

The flexibility of Unix made it possible to take a fresh look at programming itself. "You need this feature? We don't have it, but thanks to utilities #1 and #2 we can implement it right now, and tomorrow you'll be able to use it." In years when agile development methodologies were not yet practiced and tasks could take months, such words were worth a lot.

Few people know that the pipe, one of the most important features of Unix, could have become part of the OS as early as 1969. Ritchie and Thompson considered such an idea while designing the file system, but, alas, "imagination failed them", as Ritchie later lamented. Imagination did not fail Douglas McIlroy, head of the research department at Bell Labs, who championed the concept and described it in the man pages of Unix v3.

When pipes first appeared in Unix on January 15, 1973, they not only captivated all of Bell Labs ("The day after pipes were implemented, everyone considered it his duty to try to write a one-liner," McIlroy would later recall), but also gave the final, most powerful push needed to realize the idea of software tools that could be combined.
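One-liners in that spirit still read the same way today. As an illustration (not taken from the article's sources, just a classic example of the genre, with an invented input file), here is a pipeline that chains five small tools to print the most frequent words in a text:

```shell
# Build a word-frequency report by piping small tools together.
printf 'the cat and the dog and the bird\n' > input.txt

tr -cs 'A-Za-z' '\n' < input.txt |  # split text into one word per line
tr 'A-Z' 'a-z'                   |  # normalize case
sort                             |  # bring identical words together
uniq -c                          |  # count each run of identical lines
sort -rn                         |  # most frequent first
head -5                             # keep the top five
```

For this input, the top line of the output is "3 the": each stage does one small job, and the pipe does the composing.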

Grep appeared next, and after it the conclusion became inescapable: the future belonged to this idea. It was later formulated in the famous "UNIX philosophy":

Let each program do one thing, and do it well. To perform a new function, it is better to build a new program than to complicate an old one by adding new features. Use tools to lighten a task for yourself and for others, even if you have to write new utilities that you will later throw away.

Grep


For a long time grep was Ken Thompson's personal tool; the first public version was included in Unix v4, and it immediately became an everyday tool for everyone else as well.

The name grep is today often decoded as "general regular expression parser"; however, according to Dennis Ritchie and Ken Thompson, the utility's name arose differently: in the qed/ed editors, g/re/p was simply the editor command it performed, global regular expression print (search globally for a regular expression and print the lines containing matches).
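The correspondence is easy to see even today. A small sketch (`sample.txt` is an invented file for illustration): the ed command g/foo/p and a standalone grep invocation print exactly the same lines.

```shell
printf 'foo\nbar\nfoobar\n' > sample.txt

# In ed, the command "g/foo/p" prints every line matching the regex "foo":
#   printf 'g/foo/p\nq\n' | ed -s sample.txt
# The standalone grep performs exactly the same job, no editor session needed:
grep 'foo' sample.txt
```

Both print the lines "foo" and "foobar"; grep simply extracted that one editor command into a program of its own.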

Douglas McIlroy, author of spell, diff, sort, join, and other commands, would later claim that grep appeared at his request: he was working on a speech synthesizer and needed Ken Thompson to extract ed's regular-expression search into a separate program. Alas, how true this is, nobody knows.

In any case, grep became the starting point for all the other software tools: after it, Bell Labs began moving toward developing a variety of tools that could be combined with one another. Not everything made it into the utilities directory, however; the developers had an agreement to refrain from any unnecessary "junk", so before commands became publicly available, they usually went through a long private shakedown period.

In addition, the authors of Unix did not record anywhere when a command was created, so we have to rely on the dates given in man: the moment a command appeared in the reference manual.

When Brian Kernighan and P. J. Plauger released the book Software Tools in 1976, it introduced a wide audience to the idea of software tools. Borrowing the idea from Unix, the book demonstrated how a small set of text utilities can make programmers far more productive. An interesting detail: the programs in the book are written in a dialect of Fortran (Ratfor); since C was barely three years old at the time, the authors bet on Fortran, hoping thereby to sell more copies.

Although the phrase "software tools" had not been used within the walls of Bell Labs before the book's release, it is hard to say who authored the concept itself. Kernighan flatly refuses to claim the idea as his own; he considers himself rather a popularizer of the approach, and it is hard to argue with that: starting with eqn and the numerous Unix manuals he wrote in the 1970s, he has been bringing this idea to the masses.

Two years later, Kernighan became co-author of another utility, or rather language, that still serves programmers faithfully: awk.

Awk


AWK was born out of necessity. No one ever imagined it would be used outside a group of a few programmers at Bell Labs. Alfred Aho, who worked at Bell Labs at the time, had to keep track of budgets, correspondence, and the grades of students at the nearby university where he was then teaching.

Of course, the best way to handle such tasks would have been to write programs one or two lines long; the trouble was that a language in which such programs could be written did not yet exist. In those years Brian Kernighan worked in the office next to Aho's, and his own routine tasks prompted a similar desire. Day after day they discussed these ideas, which eventually led them to want to create, together, a pattern-matching language suitable for simple data-processing tasks.

The inspiration for AWK was grep. But while all grep could do was search a file for matches against a fairly limited class of regular expressions and print the matching lines, the AWK authors wanted more. First, the program had to work with both numbers and strings; second, it had to allow more varied data-processing actions than simply printing lines.
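Both requirements show up in even the smallest awk programs. A sketch (the data file here is invented for illustration): awk matches a string pattern and at the same time does arithmetic on a field, something grep could never do.

```shell
printf 'alice 10\nbob 20\nalice 5\n' > scores.txt

# Select lines whose first field is "alice" and sum their second field;
# the END block runs once, after all input lines have been processed.
awk '$1 == "alice" { total += $2 } END { print total }' scores.txt   # prints 15
```

The pattern-action structure (pattern { action }) is the core of the language: the pattern plays grep's role, and the action goes beyond printing.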

Aho and Kernighan had long worked on data-processing algorithms. These problems held particular interest for Aho, so he turned to lex and yacc, which Bell Labs used for building compilers (only later would they spread elsewhere). Brian Kernighan was familiar with these programs as well, so things like tokenization were taken for granted in AWK's design.

Peter Weinberger was in on it from the very beginning. He joined Aho and Kernighan just as they finished the grammar specification, and within a week he had built a working prototype. Thanks to his work, the language could continue to evolve.

Merely agreeing on which constructs the language should and should not have took a whole year. The very first version of AWK was finished at the end of 1977. The name of the program and the language came about on its own: since the trio were constantly seen together, colleagues had grown used to referring to them as "AWK", after the first letters of their last names!

The resulting language turned out so successful that it drew into the ranks of programmers people who had never thought about programming before. Aho later recalled meeting people who were doing absolutely breathtaking projects with AWK; one enthusiast, for example, implemented his own CAD system in it and lamented that a bug in AWK had cost him three weeks of work. (Incidentally, after this complaint Aho and Kernighan decided it was time for quality control: from then on, to add new functionality to AWK, a developer first had to write a test for it.)

Past & Present


The tools that Unix was gradually acquiring attracted an ever wider audience to the OS. They were quick to learn and convenient to use, saving time on previously difficult tasks.

Learning awk today is not particularly difficult. The command-line repository on GitHub offers hundreds of example awk and grep programs. Don't forget to read about xargs, which can be used to run multiple processes in parallel.
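A minimal sketch of that xargs usage (the file names are invented; -P is supported by GNU and BSD xargs, though it is not strictly POSIX): -P sets the maximum number of parallel processes, and -n sets how many arguments each invocation receives.

```shell
# Run "touch" over three file names, up to four processes at a time:
# each name becomes one invocation (-n 1), run in parallel (-P 4).
printf 'a.txt\nb.txt\nc.txt\n' | xargs -n 1 -P 4 touch

ls a.txt b.txt c.txt   # all three files now exist
```

The same pattern scales to slower commands (compressing files, fetching URLs), which is where the parallelism actually pays off.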

If you need a more thorough treatment, at your service is The GNU Awk User's Guide, an extremely detailed reference that works well as a textbook.

If you are curious to see what the awk source code looked like over the years, a pleasant surprise awaits you. Dan Bornstein, the creator of Dalvik, collected in his repository every version of the One True Awk he could find! The repository contains only the original awk, which Brian Kernighan went on to maintain and actively support, which is why it is often called bwk.

If you don't want to dive that deep, you can look at the current bwk sources in another repository.

The idea of "software tools" continued to gain momentum: by 1981, Unix included more than 300 utilities. Nor did AWK vanish without a trace; it long remained one of the most popular languages, and its principles inspired Perl.

As Kernighan notes, much water has flowed under the bridge since then: Java and Python gained their popularity, offering programmers expressiveness and safety in exchange for time and memory, a trade-off that these days is often affordable.

Epilogue


Returning to where we began: after reading the essay, I decided to ask Professor Kernighan how his workflow has changed over the past ten years and what he uses in his work today. His answer struck me as quite interesting:

Little has changed since then. I mostly use sam and vi to edit text; I used them for the Go book, while Alan Donovan is an "emacs wizard". Among those writing Go in New York, I have rarely met anyone who uses a full-fledged IDE, and I am not sure I have met such people at all.

What else has changed? These days I probably write much more code in Python than I did 10 years ago. It scales better than AWK, though it's not as if I have to write large programs.

Brian K.

At work, Brian still uses a 27-inch iMac, and at home he has several MacBook Pros and MacBook Airs, plus an old Lenovo running Windows XP. The Macs serve mostly as terminals for accessing the Linux servers of the university where he works.

Professor Kernighan still reads his mail in Alpine, and as editors he uses sam, written by Rob Pike, and vi, which he knows very well (every time he reaches for emacs, something keeps him from getting used to it); but most of all he would like to be rid of the unimaginable number of wires and connectors that annoy him at home and on trips. If you want more details, he gave a short interview on this topic here; although it is already four years old, not much has changed since then.

So the ideas that appeared half a century ago continue to serve us today; one can only hope that modern software will find its own place in the future.

Source: https://habr.com/ru/post/333780/
