ANTLR performance

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

ANTLR performance

Chrobot, Stefan
Hi,

 

I'm using ANTLR with the C# target. The generated parser performs too
slow for my needs. My grammar uses k = 6.

Does it have a performance impact? What value should I target to get
optimum performance - 1 or *? Would changing the grammar to 1/* give
significant performance boost?

 

 

Stefan


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

anteusz
5/11/2010 11:17 AM keltezéssel, Chrobot, Stefan írta:

> Hi,
>
>
>
> I'm using ANTLR with the C# target. The generated parser performs too
> slow for my needs. My grammar uses k = 6.
>
> Does it have a performance impact? What value should I target to get
> optimum performance - 1 or *? Would changing the grammar to 1/* give
> significant performance boost?
>
>
>
>
>
> Stefan
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
>    
Could you try it yourself?
I mean test it. I would be interested in your results too..


Marton


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

Chrobot, Stefan
>> I'm using ANTLR with the C# target. The generated parser performs too
>> slow for my needs. My grammar uses k = 6.
>>
>> Does it have a performance impact? What value should I target to get
>> optimum performance - 1 or *? Would changing the grammar to 1/* give
>> significant performance boost?
>>
>>    
>Could you try it yourself?
>I mean test it. I would be interested in your results too..


It would probably take a good amount of time to change the grammar and
the actions. I can't invest my time in that. Even more, since I found
that the real performance bottleneck is in my case the use of rewrite
rules, TokenRewriteStream and StringTemplate. I got about 100x
performance boost after disabling the rewriting (leaving my actions in
place). I guess I'll have to do the outputting myself. This will be a
costly task (both implementation and performance-wise), but I suspect
(and truly hope) to get something like 50x performance improvement from
the original solution.


Stefan

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

anteusz
5/11/2010 12:26 PM keltezéssel, Chrobot, Stefan írta:

>>> I'm using ANTLR with the C# target. The generated parser performs too
>>> slow for my needs. My grammar uses k = 6.
>>>
>>> Does it have a performance impact? What value should I target to get
>>> optimum performance - 1 or *? Would changing the grammar to 1/* give
>>> significant performance boost?
>>>
>>>
>>>        
>> Could you try it yourself?
>> I mean test it. I would be interested in your results too..
>>      
>
> It would probably take a good amount of time to change the grammar and
> the actions. I can't invest my time in that. Even more, since I found
> that the real performance bottleneck is in my case the use of rewrite
> rules, TokenRewriteStream and StringTemplate. I got about 100x
> performance boost after disabling the rewriting (leaving my actions in
> place). I guess I'll have to do the outputting myself. This will be a
> costly task (both implementation and performance-wise), but I suspect
> (and truly hope) to get something like 50x performance improvement from
> the original solution.
>
>
> Stefan
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
>    
What kind of speed is slow for you? How big are the files that you analyse?

Marton


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

Chrobot, Stefan
>>>> I'm using ANTLR with the C# target. The generated parser performs
too
>>>> slow for my needs. My grammar uses k = 6.
>>>>
>>>> Does it have a performance impact? What value should I target to
get
>>>> optimum performance - 1 or *? Would changing the grammar to 1/*
give
>>>> significant performance boost?
>>>>        
>>> Could you try it yourself?
>>> I mean test it. I would be interested in your results too..
>>>      
>>
>> It would probably take a good amount of time to change the grammar
and
>> the actions. I can't invest my time in that. Even more, since I found
>> that the real performance bottleneck is in my case the use of rewrite
>> rules, TokenRewriteStream and StringTemplate. I got about 100x
>> performance boost after disabling the rewriting (leaving my actions
in
>> place). I guess I'll have to do the outputting myself. This will be a
>> costly task (both implementation and performance-wise), but I suspect
>> (and truly hope) to get something like 50x performance improvement
from
>> the original solution.
>>
>What kind of speed is slow for you? How big are the files that you
analyse?

For my needs, 10 seconds is definitely too much for a 25KB input. I'm
shooting for up to 0.5sec.


Stefan

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

Marcin Rzeźnicki
On Tue, May 11, 2010 at 1:34 PM, Chrobot, Stefan
<[hidden email]> wrote:

>>>>> I'm using ANTLR with the C# target. The generated parser performs
> too
>>>>> slow for my needs. My grammar uses k = 6.
>>>>>
>>>>> Does it have a performance impact? What value should I target to
> get
>>>>> optimum performance - 1 or *? Would changing the grammar to 1/*
> give
>>>>> significant performance boost?
>>>>>
>>>> Could you try it yourself?
>>>> I mean test it. I would be interested in your results too..
>>>>
>>>
>>> It would probably take a good amount of time to change the grammar
> and
>>> the actions. I can't invest my time in that. Even more, since I found
>>> that the real performance bottleneck is in my case the use of rewrite
>>> rules, TokenRewriteStream and StringTemplate. I got about 100x
>>> performance boost after disabling the rewriting (leaving my actions
> in
>>> place). I guess I'll have to do the outputting myself. This will be a
>>> costly task (both implementation and performance-wise), but I suspect
>>> (and truly hope) to get something like 50x performance improvement
> from
>>> the original solution.
>>>
>>What kind of speed is slow for you? How big are the files that you
> analyse?
>
> For my needs, 10 seconds is definitely too much for a 25KB input. I'm
> shooting for up to 0.5sec.
>
>

Hi
I have never cared about parsing performance that much, so there is a
chance that my comment here will be useless to you, measure yourself:
So, having said that, I have generally observed that automatic tree
construction is kind of slow (though it's been ok for my use cases) -
basically, if that's feasible, I rather try to implement my own trees
using visitor pattern/and or specific tree structures that are aligned
to what I need. Also, default CharStream/TokenStream implementations
may not be what you want. See for example how it implements
mark/release. I gained once a lot of speedup implementing my own line
counting and got rid of its state keeping in mark/release, I used
simple table of line endings positions with binary search. There are
lot of things to tailor. Also I try not to use mechanism which buffers
file input at once - but that might not prove to be big gain to you
(if you assume that most of your input is correct then it will
probably not be, if you assume otherwise than it may be). Let us know
how it goes.


--
Greetings
Marcin Rzeźnicki

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

Marcin Rzeźnicki
>
> Hi

PS. I am talking about Java parser, you need to check if c# parser is
different. Sorry, I did not notice this in the first place.

--
Greetings
Marcin Rzeźnicki

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

Lorenzo de Lara
In reply to this post by Chrobot, Stefan
I have noticed the same thing with rewrite=true and came upon this bug report from 2008, which is currently still open:

http://www.antlr.org/jira/browse/ANTLR-371

The problem is parsers with rewrite rules run in non-linear time on any inputs above a few hundred rewrites. I've verified this in both Java and C#. You can verify this for yourself by commenting out your rewrite rules and running the parser and observing much closer to linear runtime. (5 minutes with rewrite rules on vs. 5 seconds rewrite rules off on a typical 1500 line input for us) The offending method is GetKindOfOps in TokenRewriteStream taking up to 100% of the runtime according to a Java profiling tool.

I've implemented the proposed fix (in Java) which does away with calling GetKindOfOps completely and can confirm it does result in much more reasonable, linear-like performance, without introducing any new problems, as far as I can tell.

-Lorenzo

On 2010-05-11, at 5:17 , Chrobot, Stefan wrote:

Hi,



I'm using ANTLR with the C# target. The generated parser performs too
slow for my needs. My grammar uses k = 6.

Does it have a performance impact? What value should I target to get
optimum performance - 1 or *? Would changing the grammar to 1/* give
significant performance boost?





Stefan


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address



List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
Reply | Threaded
Open this post in threaded view
|

Re: ANTLR performance

Chrobot, Stefan
Thanks for your response, Lorenzo!

This is exactly what's happening with my code.
I dropped the rewriting and created my own mechanism. The running time
dropped from ~10.00sec to ~00.10sec. Below I present my solution.


Stefan


1) I created a custom token class:
internal class CustomToken : CommonToken
{
    private string myText;

    public CustomToken(ICharStream input, int type, int channel, int
start, int stop)
        : base(input, type, channel, start, stop)
    {
    }

    public void ParseAs(string text)
    {
        myText = text;
    }

    public override string Text
    {
        get
        {
            return myText ?? base.Text;
        }
        set
        {
            base.Text = value;
        }
    }
}

2) Made lexer emit CustomTokens:
public override IToken Emit()
{
    var token = new CustomToken(this.input, base.state.type,
base.state.channel, base.state.tokenStartCharIndex, this.CharIndex - 1);
    token.Line = base.state.tokenStartLine;
    token.Text = base.state.text;
    token.CharPositionInLine = base.state.tokenStartCharPositionInLine;
    this.Emit(token);
    return token;
}

3) Added "rewrite" method to the parser:
private void ParseAs(CustomToken start, string text)
{
    start.ParseAs(text);

    var stop = (CustomToken)input.LT(-1);
    for (int i = start.TokenIndex + 1; i <= stop.TokenIndex; ++i)
    {
        var token = (CustomToken)input.Get(i);
        token.ParseAs("");
    }
}

4) Set grammar option:
TokenLabelType = CustomToken;


Usage:

assignment
        : ID '=' INT { ParseAs($assignment.start,
"<assignment>"); }
        ;



-----Original Message-----
From: Lorenzo de Lara [mailto:[hidden email]]
Sent: Tuesday, May 11, 2010 4:35 PM
To: Chrobot, Stefan
Cc: [hidden email]
Subject: Re: [antlr-interest] ANTLR performance

I have noticed the same thing with rewrite=true and came upon this bug
report from 2008, which is currently still open:

http://www.antlr.org/jira/browse/ANTLR-371

The problem is parsers with rewrite rules run in non-linear time on any
inputs above a few hundred rewrites. I've verified this in both Java and
C#. You can verify this for yourself by commenting out your rewrite
rules and running the parser and observing much closer to linear
runtime. (5 minutes with rewrite rules on vs. 5 seconds rewrite rules
off on a typical 1500 line input for us) The offending method is
GetKindOfOps in TokenRewriteStream taking up to 100% of the runtime
according to a Java profiling tool.

I've implemented the proposed fix (in Java) which does away with calling
GetKindOfOps completely and can confirm it does result in much more
reasonable, linear-like performance, without introducing any new
problems, as far as I can tell.

-Lorenzo

On 2010-05-11, at 5:17 , Chrobot, Stefan wrote:

Hi,



I'm using ANTLR with the C# target. The generated parser performs too
slow for my needs. My grammar uses k = 6.

Does it have a performance impact? What value should I target to get
optimum performance - 1 or *? Would changing the grammar to 1/* give
significant performance boost?





Stefan


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address



List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address