In programming, a grammar is a set of rules for parsing text. Grammars are very useful: for example, a grammar can check whether a line of text conforms to a particular standard. Perl 6 has built-in support for grammars, and they are so easy to create that once you start, you will find yourself using them everywhere.
Recently I worked on Module::Minter, a simple application that creates the basic structure of a Perl 6 module. I needed to check that a proposed module name complies with the Perl 6 naming rules.
Module names are identifiers separated by two colons. An identifier must begin with an alphabetic character (a-z) or an underscore, followed by zero or more alphanumeric characters. Some modules have just one identifier and no colons, while others have many (HTTP::Server::Async::Plugins::Router::Simple).
Defining the grammar
In Perl 6, grammars are built on regexes. I need two: one for identifiers and one for the double-colon separator. For identifiers, I used:
<[A..Za..z_]> <[A..Za..z0-9]>*
Remember that these are Perl 6 regexes, so they look a little different. A character class is written as <[...]>, and a range uses the .. operator instead of a dash. This regex matches a leading letter or underscore followed by zero or more alphanumeric characters.
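As a quick check of the character-class syntax described above, here is a minimal sketch that tries the identifier regex against a few strings. The variable name $identifier and the sample strings are illustrative assumptions, and the regex is anchored with ^ and $ so that the whole string has to match.

```raku
# Identifier regex from the text, anchored to the whole string.
# $identifier is a name chosen for this sketch only.
my $identifier = / ^ <[A..Za..z_]> <[A..Za..z0-9]>* $ /;

# Sample candidates (illustrative): two that should pass, two that should not.
for <Module _private 9lives has-dash> -> $candidate {
    my $verdict = $candidate ~~ $identifier ?? 'ok' !! 'rejected';
    say $candidate, ' => ', $verdict;
}
```

The first two candidates should be accepted. The third starts with a digit and the fourth contains a dash, so both should be rejected.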
With two colons everything is simpler:
\:\:
Grammars are declared with the grammar keyword, followed by a name. I'll call this grammar Legal::Module::Name:
grammar Legal::Module::Name { ... }
Now I can add the regexes to it as tokens:
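grammar Legal::Module::Name {
    # leading letter or underscore, then zero or more alphanumerics
    token identifier { <[A..Za..z_]> <[A..Za..z0-9]>* }
    token separator  { \:\: }
}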
Every grammar also needs a TOP token, which is where parsing starts by default.
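grammar Legal::Module::Name {
    # an identifier followed by zero or more separator-identifier pairs
    token TOP        { ^ <identifier> [ <separator> <identifier> ]* $ }
    token identifier { <[A..Za..z_]> <[A..Za..z0-9]>* }
    token separator  { \:\: }
}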
The TOP token says that a legal module name begins with an identifier, followed by zero or more pairs of separator and identifier tokens. A grammar like this is easy to maintain: if I wanted to change the rules so that a separator could also be a dash, I would only have to update the regex in one token.
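To illustrate the maintenance point, here is a sketch of that hypothetical rule change. The grammar name Dashed::Module::Name and the test strings are assumptions made up for this example. Only the separator token differs from the grammar described above.

```raku
# Hypothetical variant of the grammar: a separator may be '::' or a single dash.
grammar Dashed::Module::Name {
    token TOP        { ^ <identifier> [ <separator> <identifier> ]* $ }
    token identifier { <[A..Za..z_]> <[A..Za..z0-9]>* }
    # Only this token changes.
    token separator  { '::' | '-' }
}

# "so" turns the match object (or Nil on failure) into True/False.
say so Dashed::Module::Name.parse('Super-New::Module');
say so Dashed::Module::Name.parse('Super--Module');
```

The first name mixes both separators and should parse. The second has two dashes in a row, which leaves an empty identifier between them, so the parse should fail.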
Using the grammar
The parse method runs the grammar against a string and, if it succeeds, returns a match object. The following code parses the string in $proposed_module_name and prints either the match object or an error message.
my $proposed_module_name = 'Super::New::Module';
my $match_obj = Legal::Module::Name.parse($proposed_module_name);
if $match_obj {
    say $match_obj;
}
else {
    say 'Invalid module name';
}
Output:
「Super::New::Module」
 identifier => 「Super」
 separator => 「::」
 identifier => 「New」
 separator => 「::」
 identifier => 「Module」
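The failure branch is worth seeing as well. The sketch below wraps the check in a small helper and runs it over a few names. The helper is-legal-name and the sample names are assumptions for illustration, and the grammar is repeated so the block runs on its own.

```raku
grammar Legal::Module::Name {
    token TOP        { ^ <identifier> [ <separator> <identifier> ]* $ }
    token identifier { <[A..Za..z_]> <[A..Za..z0-9]>* }
    token separator  { \:\: }
}

# Hypothetical helper: True when the grammar accepts the whole name.
# A failed parse returns Nil, which "so" turns into False.
sub is-legal-name(Str $name --> Bool) {
    return so Legal::Module::Name.parse($name);
}

for 'Super::New::Module', 'Super:New', '1st::Module', 'Module::' -> $name {
    say $name, ' => ', is-legal-name($name) ?? 'legal' !! 'illegal';
}
```

Only the first name should be reported as legal. The others fail on a single colon, a leading digit, and a trailing separator respectively.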
Extracting the contents of the match object
Instead of dumping the entire match object, you can retrieve just the tokens that matched. Named tokens are available as hash keys on the match object:
say $match_obj<identifier>[0].Str; # Super
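Because the identifier token can match several times, its hash entry holds a list of matches. A minimal sketch of collecting all of them as plain strings follows. The variable names $match and @parts are assumptions for this example, and the grammar is repeated so the block is self-contained.

```raku
grammar Legal::Module::Name {
    token TOP        { ^ <identifier> [ <separator> <identifier> ]* $ }
    token identifier { <[A..Za..z_]> <[A..Za..z0-9]>* }
    token separator  { \:\: }
}

my $match = Legal::Module::Name.parse('Super::New::Module');

# $match<identifier> is a list of match objects, one per identifier.
my @parts = $match<identifier>.map(*.Str);

say @parts;                    # [Super New Module]
say @parts.elems;              # 3
say $match<separator>.elems;   # 2
```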
Action Classes
Perl 6 also lets you attach an action class that defines extra behaviour for matched tokens. Suppose I want to issue a warning when a module name contains too many identifiers. First, I define an action class:
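class Module::Name::Actions {
    method TOP($/) {
        # $<identifier> is the list of identifier matches
        if $<identifier>.elems > 5 {
            warn 'Module name has many identifiers, consider simplifying it';
        }
    }
}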
This is an ordinary Perl 6 class definition. I added a TOP method, which corresponds to the grammar's TOP token. It counts the identifier matches and issues a warning if there are more than 5. The warning does not interrupt execution, but it tells the user that the module name may be worth reconsidering.
Then we initialize the action class and pass it to parse as an argument:
my $actions = Module::Name::Actions.new;
my $match_obj = Legal::Module::Name.parse($proposed_module_name, :actions($actions));
Each time a token matches during parsing, the grammar calls the action class method of the same name, if there is one. In our case that happens once, for TOP.
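To show a method being called more than once, here is a sketch of a different action class with a method named after the identifier token. The class name Collect::Identifiers and its seen attribute are hypothetical and exist only for this example. The grammar is repeated so the block runs on its own.

```raku
grammar Legal::Module::Name {
    token TOP        { ^ <identifier> [ <separator> <identifier> ]* $ }
    token identifier { <[A..Za..z_]> <[A..Za..z0-9]>* }
    token separator  { \:\: }
}

# Hypothetical action class: records each identifier as it is matched.
class Collect::Identifiers {
    has @.seen;
    # Called once for every successful match of the identifier token.
    method identifier($/) { @!seen.push: ~$/ }
}

my $actions = Collect::Identifiers.new;
Legal::Module::Name.parse('Super::New::Module', :actions($actions));

say $actions.seen;   # [Super New Module]
```

The separator and TOP tokens have no matching method in this class, so nothing is called for them.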
Grammars in Perl 5
You can write grammars in Perl 5 too. For something similar to the Perl 6 approach, take a look at Regexp::Grammars or Ingy döt Net's Pegex. Chapter 1 of Mastering Perl by brian d foy also has excellent examples, including a grammar for JSON.