Date: Sun, 23 May 1999 18:44:36 +0200 (CEST)
From: "Felix A. Croes"
To: dgd@list.imaginary.com
Subject: Re: [DGD]parse_string() problems

Geir Harald Hansen wrote:

> finally I'm starting to work on my command parser.  And being a
> parse_string() newbie, I'm running into some problems.  I include
> a minimal piece of code to illustrate my problem:
>
> static create()
> {
>   grammar = "\
> whitespace = / +/ \
> word = /[^ ]+/ \
> verbrule: 'say' str \
> substr: word \
> substr: word substr \
> str: substr ? parse_str \
> ";
> }
>
> static mixed *parse_str(string *str)
> {
>   return ({ ({ "_str", implode(str, " ") }) });
> }
>
> What I'm having trouble with at the moment is this:
> I intended for the str rule to include any string.
> The problem is that tokens are implicitly created from string constants,
> like 'say' above, and these take precedence over the 'word' token rule.
> >From the parser documentation:
>
>    String constants take precedence over regular expressions, and any
>    token that matches both a string constant and a regular expression
>    will always match the string constant, only.
>                                           ^^^^
> The result is that my str production can never contain a 'say'.
> So I cannot parse commands like:
>   "say If only I could say what I want."
> Any suggestions how to solve this?

Replace the substr rules with:

    substr: wordorsay
    substr: wordorsay substr
    wordorsay: word
    wordorsay: 'say'

Further comments: verbs like say which take arbitrary arguments that
do not follow rules established elsewhere in the grammar are handled
most easily by a single token, /say .*/

Left-recursive rules are parsed more efficiently than right-recursive
ones, so other things being equal the following should be preferred:

    substr: word
    substr: substr word


> Another question:
> I want to have a verb production rule, that should match all known verbs.
> I can see two ways of doing this, either:
>   verb: 'look' verb: 'say' verb: 'get' verb: 'drop' etc. etc.
> Or:
>   verb: word ? check_verb
> Where check_verb() would check whether the word is part of some verb
> array, returning the word unchanged if it is, otherwise nil.
>
> The last option would make the grammar smaller, but result in more execution
> of LPC code.  Since the function does not modify the word, perhaps
> the first solution is best?

Which one is best depends on other things.  I'll assume that the
rule considered is the only one in which explicit verbs actually
occur, i.e. elsewhere in the grammar the verb rule is used rather
than 'look'.

The first rule is good in a grammar where verbs may occur in places
where other words can occur as well, since knowing early on that
something is a verb reduces the cost of parsing.  As an example,
consider the following partial grammar:

    command: verb
    command: verb object
    command: adverb verb
    command: adverb verb object

    verb: word    ? check_verb
    adverb: word  ? check_adverb
    object: word  ? check_object

A command consisting of two words can match either "verb object"
or "adverb verb", meaning that the parser internally has to
construct the parse trees for both of them before it can call
an LPC function to determine if the first word is a verb or an
adverb.  By making the verbs explicit in the grammar, the
ambiguity disappears and the cost of parsing drops.  The
cost of parsing an ambiguous grammar can be exponentially
higher than that of parsing a non-ambiguous grammar.

The cost of parse_string() can be measured in ticks.
Parsing with ambiguous grammars takes more ticks than
parsing with equivalent non-ambiguous grammars.

If verbs only occur non-ambiguously in the grammar (for instance,
if the first word of every command is the verb), then your
second way is cheaper as far as the parser is concerned --
whether it is more efficient overall depends on the cost
of the LPC function called.


> For adjectives pretty much the same thing, although I may want to
> allow abbreviations.  ("smile hap")  Which means I need an LPC function.
>
> As an aside, what is the fastest way to check whether something is an
> element of an array?  Is "if (sizeof(({ element }) & array))" faster
> than those loops that compare with every element?

Slower for arrays with only a few elements, faster to much faster
for large arrays.  Mapping lookup is faster yet.


> Finally; if anyone knows of publicly released DGD parser code/grammars or
> would let me see theirs, just to get some idea of how things can be done,
> it would be greatly appreciated. :)

There is an example grammar that parses LPC (though it doesn't handle
nil yet) in the list archive, posting date Wed Feb 11 1998.

Regards,
Dworkin