From: DGD Mailing List (Robert Forshaw)
Date: Sun Mar 28 11:22:01 2004
Subject: [DGD] parse string difficulties

Despite my best efforts I can't understand how parse string is supposed to 
work. I thought I *might* know what I was doing, when I put it to the test, 
but got some unexpected results.

Right now the reason for me employing parse_string is to interpret 
datafiles. The datafiles are meant to contain three things: operators, a 
prefix-parameter and a suffix-parameter. Operators consist of one symbol 
only and can be: '+', '-', '=', or '.'. A prefix-parameter is optional and 
appears behind the operator, a suffix-parameter is also optional and appears 
in front of it.

So an example data file might look like this:

.food
.chocolate
weight=8
layers+3
fattening=yes
strange-
smell-noxious

The prefix-parameter, operator and suffix-parameter collectively make a 
line, with newlines acting as seperators for these lines.

I want parse_string to interpret the entire datafile in one string, and 
allocate an array in the following format: ({ ({ operator, prefix-parameter, 
suffix-parameter }), ({ operator, prefix-parameter, suffix-parameter }) ... 
}). If a prefix-parameter or suffix parameter is occluded, that respective 
element in the returned array will be nil.


So in the example:

.food
.chocolate
weight=8
layers+3
fattening=yes
strange-
smell-noxious

it would return:


({ ({ ".", nil, "food" }), ({ ".", "nil", "chocolate" }), ({ "=", "weight", 
"8" }), ({ "+", "layers", "3"}),
    ({ "=", "fattening", "yes" }), ({ "-", "strange", nil }), ({ "-", 
"smell", "noxious" }) })

Call me stupid, call me what you like, but I can't figure out how to 
accomplish this using parse_string. I can't even do something much simpler, 
like parseing a single line. Here:

ar = parse_string(
        "whitespace = /[\b\r\n\t ]+/" +
        "word       = /[a-zA-Z]+/" +
        "operator   = /[\.\+\=\-]+/" +
        "SENTENCE   : operator word operator",
        "property+value");

Now this returns nil, but I want it to return ({ "property", "+", "value" 
}).

And for some bizzare reason, changing the operator regexp so that it only 
detects "+":

ar = parse_string(
        "whitespace = /[\b\r\n\t ]+/" +
        "word       = /[a-zA-Z]+/" +
        "operator   = /++/" +
        "SENTENCE   : operator word operator",
        "property+value");

Causes a malformed rule error.

I seriously don't know what I'm doing here. It would be most helpful if 
someone could show me line for line how to write a grammar that interprets 
my datafiles, that would help me relate to what the function is doing. 
Anyway, all help is appreciated...

_________________________________________________________________
Express yourself with cool new emoticons http://www.msn.co.uk/specials/myemo


From: DGD Mailing List (Erwin Harte)
Date: Sun Mar 28 12:55:02 2004
Subject: [DGD] Re: parse string difficulties

On Sun, Mar 28, 2004 at 05:20:56PM +0000, Robert Forshaw wrote:
> Despite my best efforts I can't understand how parse string is supposed to 
> work. I thought I *might* know what I was doing, when I put it to the test, 
> but got some unexpected results.
[...]
> I seriously don't know what I'm doing here. It would be most helpful if 
> someone could show me line for line how to write a grammar that interprets 
> my datafiles, that would help me relate to what the function is doing. 
> Anyway, all help is appreciated...

I like a challenge like that and did some experimenting.  This is the
grammar I came up with:

    string query_grammar()
    {   
	return
	    "whitespace = /[\b\r\t ]+/\n" +
	    "newline    = /\n/\n" +
	    "word       = /[a-zA-Z0-9]+/\n" +
	    "operator   = /[\\.\\+\\=\\-]+/\n" +

	    "SENTENCE   : OPERATION          ? fun_a\n" +
	    "SENTENCE   : SENTENCE OPERATION ? fun_b\n" +

	    "OPERATION  : word operator word newline ? fun_1\n" +
	    "OPERATION  : word operator      newline ? fun_2\n" +
	    "OPERATION  :      operator word newline ? fun_3\n";
    }

You need to double-escape the ., +, = and - so that the parse_string()
kfun actually _sees_ \. while "\." is identical to "." (hope that made
sense).  You didn't include digits in your original word regexp.

I took the newline out of the whitespace regexp so that it could be
used separately and avoid grammar confusion between

  word operator word
  operator word

and

  word operator
  word operator word

which would otherwise be impossible to distinguish reliably.

The fun_a and fun_b functions create and append to lists of
word/operator/word combinations.

    static mixed *fun_a(mixed *tree)
    {   
	return ({ tree });
    }

    static mixed *fun_b(mixed *tree)
    {   
	return ({ tree[0] + ({ tree[1] }) });
    }

The fun_1, fun_2 and fun_3 functions fill in the blanks (nils) where
appropriate and create 3-tuples (3-sized arrays) in the order you
wanted.

    static mixed *fun_1(mixed *tree)
    {   
	return ({ ({ tree[1], tree[0], tree[2] }) });
    }

    static mixed *fun_2(mixed *tree)
    {   
	return ({ ({ tree[1], tree[0], nil }) });
    }

    static mixed *fun_3(mixed *tree)
    {   
	return ({ ({ tree[0], nil, tree[1] }) });
    }

Throwing something like ".food\nweight=8\n.chocolate\n" at it, it
returns to me with:

  ({ ({ ({ ".", nil, "food" }),
        ({ "=", "weight", "8" }),
        ({ ".", nil, "chocolate" }) }) })

In general:

    static mixed *parse_text(string text)
    {
        mixed result;

        result = parse_string(query_grammar(), text);
        return result ? result[0] : nil;
    }

Hope that helps,

Erwin.
-- 
Erwin Harte