-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache ATNConfig hashCode in Java, C#, Python and JavaScript #4546
Conversation
Signed-off-by: Eric Vergnaud <[email protected]>
@kaby76 Do you have a benchmark we could use to see if this change genuinely improves performance, and at what cost (memory) ? |
I don't have any scripts offhand to compare the performance between before and after versions of the Antlr runtime. It'll have to be done by hand, or a script developed. But, trgen could be used to run the test. Let me write something up. |
For grouped parsing of the 373 files for sql/plsql, 4.13.1 CSharp target vs your PR, there was no improvement. In fact, it was slightly worse on a couple of runs. I double checked everything. I think this may be because Net Core likely caches hash codes. It may help with TypeScript, but I haven't yet checked it. |
Thanks. Are you able to run this against other targets ?Envoyé de mon iPhoneLe 4 mars 2024 à 21:54, Ken Domino ***@***.***> a écrit :
@kaby76 Do you have a benchmark we could use to see if this change genuinely improves performance, and at what cost (memory) ?
For grouped parsing of the 373 files for sql/plsql, 4.13.1 CSharp target vs your PR, there was no improvement. In fact, it was slightly worse on a couple of runs. I double checked everything.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
I don't see any improvements with the TypeScript target either for the cpp grammar. Have you seen a perf increase with these changes? |
No, I haven’t checked.Looks like the perf improvement achieved by antlr4ng is caused by something else…It’s significantly faster during warmup but slightly slower later so not amazing but still worth understanding…Thanksn.b. I think it’d be great to put together some benchmark so we can compare successive versions of antlr5 (and also vs antlr4)Envoyé de mon iPhoneLe 5 mars 2024 à 03:18, Ken Domino ***@***.***> a écrit :
I don't see any improvements with the TypeScript target either for the cpp grammar.
Have you seen a perf increase with these changes?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Supposedly faster. But, I cannot reproduce Mike's result. Here's my code in a zip file. It doesn't run any faster at least how I'm building and running the code. But, nobody ever seems to want to write tools to generate a working example start to finish of a parser with driver from a grammar. Maybe there may be special settings and versions of node that are presumed to perform optimizations? In my generated example for java/java/, Antlr4 parses the 14 files in grouped parse mode aka "warm up" faster than Antlr4ng.
People can make claims, but unless a .zip file is provided that includes the source, drivers, build files, etc etc etc, I don't trust it. There is my code. Why is Antlr4ng slower in it? Someone, tell me what is wrong with this code or prove with a .zip that I can run showing the claim. |
Maybe it's only faster for the MySQL grammar... |
I will check it for the mysql grammar. |
Here's a run of the mysql parser grammar on bitrix_queries_cut.sql. The parse is performed twice on same input. The first time is the "before warm up", the second time is the "after warm up":
Antlr4ng is slower than the original 4.13.1 TypeScript runtime. Note, the Oracle mysql grammar is not in grammars-v4. This is the alternative, non-Oracle mysql grammar. But, in all the tests I've done, Antlr4ng 3.0.3/Antlr4ng-cli 2.0.0 is slower than Antlr 4.13.1 TypeScript. I'm not sure what to make of the claim at this point. Going back to https://github.com/mike-lischke/antlr4ng?tab=readme-ov-file#running-the-tests , I try to replicate his tests, but it doesn't work at all:
|
I'm now able to reproduce the claimed speed-up for Antlr4ng. Apparently, one should not use version 16.x of NodeJS. You must use at least version 20.x in order to get the speed up. That was the only thing I changed, and afterwards, I see Antlr4ng now works much faster as claimed.
|
I don't know how to test this PR because antlr4 is in an npm package which the package.json references. I don't know how to replace the package reference with a copy of the JavaScript/TypeScript source tree. I get
In addition, there are probably other changes that should be considered. mike-lischke/antlr4ng@c5c8bea |
Well done on working that out!
I really think that the sql grammar should not be considered as a model for optimizations, especially when a change speeds up that grammar's generated code but slows down "real" grammars. We should use the SQL grammar as an example of how not to write grammars.
In general programming we think about the worst case scenarios of course, but here we should be rewarding good grammars, and not worrying about the very poor ones. I got into the saem loop with the Go runtime, then realized changes to make the execution time of that grammar better wer of little use to anyone else, so I stopped.
Analysis is still useful though as whatever advantage was discovered by using the latest node.js may be useful elsewhere and for more useful grammars.
My ten cents at least.
Jim
…On Mar 5 2024, at 5:04 am, Ken Domino ***@***.***> wrote:
I'm now able to reproduce the claimed speed-up for Antlr4ng. Apparently, one should not use version 16.x of NodeJS. You must use at least version 20.x in order to get the speed up.
$ bash run.sh ../examples/bitrix_queries_cut.sql ../examples/bitrix_queries_cut.sql
TypeScript 0 ../examples/bitrix_queries_cut.sql success 155.915
TypeScript 1 ../examples/bitrix_queries_cut.sql success 146.365
Total Time: 302.294
03/05-05:51:49 ~/test/grammars-v4/sql/mysql/Positive-Technologies/Generated-TypeScript
$ pushd
~/test/grammars-v4/sql/mysql/Positive-Technologies/Generated-Antlr4ng ~/test/grammars-v4/sql/mysql/Positive-Technologies/Generated-TypeScript
03/05-06:01:09 ~/test/grammars-v4/sql/mysql/Positive-Technologies/Generated-Antlr4ng
$ bash run.sh ../examples/bitrix_queries_cut.sql ../examples/bitrix_queries_cut.sql
TypeScript 0 ../examples/bitrix_queries_cut.sql success 74.543
TypeScript 1 ../examples/bitrix_queries_cut.sql success 49.375
Total Time: 123.935
03/05-06:03:28 ~/test/grammars-v4/sql/mysql/Positive-Technologies/Generated-Antlr4ng
$ node --version
v20.11.1
03/05-06:03:36 ~/test/grammars-v4/sql/mysql/Positive-Technologies/Generated-Antlr4ng
$
—
Reply to this email directly, view it on GitHub ***@***.***/0?redirect=https%3A%2F%2Fgithub.com%2Fantlr%2Fantlr4%2Fpull%2F4546%23issuecomment-1978504607&recipient=cmVwbHkrQUFKN1RNQVdHQlRUWTNEU1YzUE1GVDZENkxPU1hFVkJOSEhJQ0xaU0E0QHJlcGx5LmdpdGh1Yi5jb20%3D), or unsubscribe ***@***.***/1?redirect=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAAJ7TMDDOQZDM7SSLA4UWSLYWWRCXAVCNFSM6AAAAABEFTHQFSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZYGUYDINRQG4&recipient=cmVwbHkrQUFKN1RNQVdHQlRUWTNEU1YzUE1GVDZENkxPU1hFVkJOSEhJQ0xaU0E0QHJlcGx5LmdpdGh1Yi5jb20%3D).
You are receiving this because you are subscribed to this thread.
|
Here’s how to do it from the console:cd to the antlr folder containing package.jsonnpm linkcd to the program folder that uses antlrnpm link antlr4Envoyé de mon iPhoneLe 5 mars 2024 à 12:34, Ken Domino ***@***.***> a écrit :
I don't know how to test this PR because antlr4 is in an npm package which the package.json references. I don't know how to replace the package reference with a copy of the JavaScript/TypeScript source tree. I get Cannot find module 'antlr4' or its corresponding type declarations..
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
I tried out the current 4.13.1 CSharp and TypeScript targets along with the Antlr4ng TypeScript target on the java/java grammar, which is one of our better grammars. I ran the parser on grouped parse mode with the 14 input files. The Antlr4ng is on par with the CSharp target. Yes, Antlr4ng is pretty good. I haven't yet tested this PR, and I haven't figured out why Antlr4ng is so much better, and why with Node 20. (Ran 10 times, bar graph computed using Octave, errors are 1 SD; test.sh.txt) |
The fact that it requires Node 20 is evidence that it relies on a recent improvement in V8.
I’ll dig into this step by step.
Is the below test.sh sufficient to run your tests ?
… Le 6 mars 2024 à 14:16, Ken Domino ***@***.***> a écrit :
I tried out the current 4.13.1 CSharp and TypeScript targets along with the Antlr4ng TypeScript target on the java/java grammar, which is one of our better grammars. I ran the parser on grouped parse mode with the 14 input files. The Antlr4ng is on par with the CSharp target. Yes, Antlr4ng is pretty good.
I haven't yet tested this PR, and I haven't figured out why Antlr4ng is so much better, and why with Node 20.
java.svg (view on web) <https://github.com/antlr/antlr4/assets/11389451/a3b243d3-eb08-4de9-b882-dea1b128c2ff>
(Ran 10 times, bar graph computed using Octave, errors are 1 SD; test.sh.txt <https://github.com/antlr/antlr4/files/14510311/test.sh.txt>)
—
Reply to this email directly, view it on GitHub <#4546 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAZNQJDV76HRXL5E6MFL2UDYW4JKRAVCNFSM6AAAAABEFTHQFSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBQHA2TGNJVGE>.
You are receiving this because you authored the thread.
|
The test.sh is what I wrote to collect and print the results. But there are dependencies. Trash, dotnet, octave, etc. |
The Antlr4ng code actually contains a number of optimizations which look significant.
Since Antlr4ng and Antlr 4.13.1 TypeScript are so different in implementation, I wanted to check parser tracing and parser atn tracing. Unfortunately, The TypeScript 4.13.1 API does not define setTrace(), and the Antlr4ng does not define the parser atn tracing flag. In fact, Mike removed all that incredibly valuable code for tracing. (People might not use it, but I do, and it really works in sorting out bugs in the API.) So, I cannot directly compare ATN config sets between CSharp and Antlr4ng, nor parser tracing between CSharp and TypeScript. But, between the stuff I can compare, the parsers across CSharp, TypeScript 4.13.1, and Antlr4ng work identically. I just wish both setTrace() and trace_atn_sim were available everywhere. |
I finally figured out how to test this PR. It does not result in any improvement for TypeScript runtime. I have a drop-in replacement of BitSet.js adapted from Antlr4ng. It does improve on performance, but it is only ~5%. |
5% is enormous! I'm closing this PR. Do you want to submit a PR with BitSet.js ? (It's missing unit tests and newbitCount would gain providing references to the algorithm) |
If 5% is good, then you'll like the 20% increase with this drop-in replacement for HashSet.js (with MurmurHash.js as a required dependency). I'm thinking HashMap will see an additional performance increase. I'm thinking why Antlr4? Why not put these changes in Antlr5 alone? |
Because antlr5 uses Kotlin to generate wasm. These optimisations don't make sense there. |
Btw I'm using your BitSet.js code but sticking to the old api for now. I'd rather refactor than add an indirection. |
Yes HashSet is using an object for key/value mappings, not great! |
Shouldn't the JavaScript/TypeScript templates and runtime be removed in Antlr5? And, does it make sense to modify the Antlr4 code any further, vis-a-vis limited resources to develop both Antlr4 and Antlr5, and Mike's work on Antlr4ng? |
We're keeping the TS template as a reference until we have something kinda working in wasm. Then it will be removed. |
This proposal follows findings in mike-lischke/antlr4ng@77d0d47 where caching ATNConfig hashCode seems to improve performance significantly during warmup