Safe memoization during parse graph flattening #2136

PieterOlivier · 2025-01-30T15:50:34Z

This PR improves the memoization strategy during parse graph flattening. The original strategy proved both incorrect and too slow for the highly ambiguous and cyclic parse forests produced by error recovery.

Special thanks to @arnoldlankamp who came up with the following three heuristics specifying when nodes can be safely cached for memoization:

1. Any non-zero length node that is part of a link with a non-zero length prefix.
2. Any node that is part of a prefix for which the above holds true.
3. Any nullable node that is part of a link for which all prefixes adhere to rule one at some point.
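
To make the first heuristic concrete, here is a toy sketch of one possible reading of it. The `Node` and `Link` records are hypothetical stand-ins, not the classes in `org.rascalmpl.parser.gtd.result`. The sketch assumes that "a non-zero length prefix" means at least one alternative prefix consumed input. The second and third heuristics are not modelled.

```java
import java.util.List;

public class RuleOneSketch {
    /** Hypothetical stand-in for a parse graph node; only its matched length matters here. */
    record Node(String name, int length) {}

    /** Hypothetical stand-in for a link: a node plus the alternative prefixes that may precede it. */
    record Link(Node node, List<Link> prefixes) {}

    /** Assumption: a prefix has non-zero length if its node, or anything before it, consumed input. */
    static boolean hasNonZeroLength(Link prefix) {
        if (prefix.node().length() > 0) {
            return true;
        }
        return prefix.prefixes().stream().anyMatch(RuleOneSketch::hasNonZeroLength);
    }

    /** Rule one under the assumptions above: a non-zero length node with a non-zero length prefix. */
    static boolean cacheableByRuleOne(Link link) {
        return link.node().length() > 0
            && link.prefixes().stream().anyMatch(RuleOneSketch::hasNonZeroLength);
    }

    public static void main(String[] args) {
        Link a = new Link(new Node("a", 1), List.of());
        Link empty = new Link(new Node("empty", 0), List.of());
        Link bAfterA = new Link(new Node("b", 1), List.of(a));
        Link bAfterEmpty = new Link(new Node("b", 1), List.of(empty));

        System.out.println(cacheableByRuleOne(bAfterA));     // true: "b" follows a prefix of length 1
        System.out.println(cacheableByRuleOne(bAfterEmpty)); // false: the only prefix is empty
        System.out.println(cacheableByRuleOne(a));           // false: no prefix at all
    }
}
```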

This PR implements a new caching strategy based on these heuristics.

We have gone to great lengths to test the correctness and performance of the code in this PR.
We already had a test suite that tested error recovery on all characters in all Rascal source files in the rascal repo using
two tests: delete the single character, and delete all characters until the end of the line.

We modified these tests to:

1. First try to do the error recovery parse without memoization during flattening.
2. If this succeeded within two seconds, do the same with memoization.
3. Compare the results.
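
The comparison procedure above can be sketched roughly as follows. This is not the actual test code from the PR: the `parse` function, its boolean "use memoization" flag, and the `Outcome` enum are hypothetical names chosen for illustration. Only the two-second budget comes from the description.

```java
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class MemoDifferentialCheck {
    enum Outcome { SKIPPED, SAME, DIFFERENT }

    /**
     * Runs the hypothetical parse function without memoization under a time budget.
     * Only if that reference run finishes in time is the memoized run executed and compared.
     */
    static <T> Outcome compare(Function<Boolean, T> parse, long timeoutMillis) {
        // Daemon thread so a runaway reference parse cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable);
            thread.setDaemon(true);
            return thread;
        });
        try {
            Callable<T> withoutMemoization = () -> parse.apply(false);
            Future<T> reference = executor.submit(withoutMemoization);
            T expected;
            try {
                expected = reference.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                return Outcome.SKIPPED; // reference parse too slow or failed: nothing to compare
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return Outcome.SKIPPED;
            }
            T actual = parse.apply(true);
            return Objects.equals(expected, actual) ? Outcome.SAME : Outcome.DIFFERENT;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Toy "parser" that ignores the flag, standing in for a real parse call.
        Function<Boolean, String> toyParser = memoize -> "tree";
        System.out.println(compare(toyParser, 2000)); // SAME
    }
}
```

Skipping inputs whose reference parse times out keeps the check focused on cases where there is a trusted tree to compare against.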

In all cases where the parse without memoization succeeded, the parse with memoization resulted in
exactly the same tree as the tree built without memoization. This is in contrast with the old memoization approach, where in the same test the result with memoization was often different from the result without memoization.

We have also set up a performance benchmark to compare the speed of "normal" parsing (no error recovery and no ambiguities)
of all Rascal source files in the rascal repo with the speed before this PR. We could not find any speed differences between
the two versions, so we are confident this PR does not degrade the performance of "normal" parses.

Nodes are now counted using long instead of int: we can cross the 2^31 boundary because of sharing.
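
As an illustration of why sharing can push a node count past the int range, the sketch below builds a small shared graph whose unfolded size does not fit in 32 bits. The `Node` class and helper names are hypothetical and unrelated to the parser's own node types. It assumes the count in question is the size of the unfolded tree.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SharedNodeCount {
    /** Hypothetical graph node; children may be shared between parents. */
    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    /** Size of the unfolded tree; the memo table ensures each distinct node is computed once. */
    static long unfoldedSize(Node node, Map<Node, Long> memo) {
        Long known = memo.get(node);
        if (known != null) {
            return known;
        }
        long size = 1;
        for (Node child : node.children) {
            size += unfoldedSize(child, memo);
        }
        memo.put(node, size);
        return size;
    }

    /** Builds a chain of `levels` distinct nodes in which every parent references its child twice. */
    static Node sharedChain(int levels) {
        Node current = new Node();
        for (int i = 1; i < levels; i++) {
            Node parent = new Node();
            parent.children.add(current);
            parent.children.add(current);
            current = parent;
        }
        return current;
    }

    public static void main(String[] args) {
        // Only 33 objects in memory, but the unfolded tree has 2^33 - 1 nodes.
        long size = unfoldedSize(sharedChain(33), new IdentityHashMap<>());
        System.out.println(size);                     // 8589934591
        System.out.println(size > Integer.MAX_VALUE); // true
        System.out.println((int) size);               // -1: what an int counter would have wrapped to
    }
}
```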

In RecoveryTestSupport, a config object replaces the ever-growing list of parameters.
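
For readers unfamiliar with the pattern, a config object of this kind might look roughly like the sketch below. The record name, its fields, and the default values are purely illustrative assumptions and are not taken from the PR.

```java
public class RecoveryTestConfigSketch {
    /** Hypothetical settings bundle; each "with" method returns a modified copy. */
    record Config(boolean memoize, long timeoutMillis, boolean verbose) {
        static Config defaults() {
            return new Config(true, 2000, false);
        }

        Config withMemoize(boolean value) {
            return new Config(value, timeoutMillis, verbose);
        }

        Config withTimeoutMillis(long value) {
            return new Config(memoize, value, verbose);
        }

        Config withVerbose(boolean value) {
            return new Config(memoize, timeoutMillis, value);
        }
    }

    /** Hypothetical test entry point: one parameter instead of a growing list of flags. */
    static String describe(Config config) {
        return "memoize=" + config.memoize()
            + ", timeoutMillis=" + config.timeoutMillis()
            + ", verbose=" + config.verbose();
    }

    public static void main(String[] args) {
        Config config = Config.defaults().withMemoize(false).withVerbose(true);
        System.out.println(describe(config)); // memoize=false, timeoutMillis=2000, verbose=true
    }
}
```

Adding a new setting then only touches the config type and the code that reads it, rather than every call site.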


codecov · 2025-01-30T15:55:13Z

Codecov Report

Attention: Patch coverage is 47.84314% with 133 lines in your changes missing coverage. Please review.

Project coverage is 49%. Comparing base (8c405fc) to head (d647f70).
Report is 254 commits behind head on feat/error-recovery.

Files with missing lines	Patch %	Lines
src/org/rascalmpl/library/util/ErrorRecovery.java	0%	66 Missing ⚠️
...c/org/rascalmpl/parser/gtd/result/struct/Link.java	71%	11 Missing and 9 partials ⚠️
...rg/rascalmpl/parser/util/ParseStateVisualizer.java	0%	15 Missing ⚠️
src/org/rascalmpl/parser/util/DebugUtil.java	0%	14 Missing ⚠️
src/org/rascalmpl/util/visualize/dot/DotGraph.java	0%	9 Missing ⚠️
...pl/parser/gtd/result/out/DefaultNodeFlattener.java	90%	2 Missing and 1 partial ⚠️
...ascalmpl/parser/gtd/result/out/INodeFlattener.java	50%	1 Missing and 1 partial ⚠️
...ser/gtd/result/out/ListContainerNodeFlattener.java	91%	2 Missing ⚠️
...pl/parser/gtd/result/out/SkippedNodeFlattener.java	0%	0 Missing and 1 partial ⚠️
...ser/gtd/result/out/SortContainerNodeFlattener.java	92%	1 Missing ⚠️

Additional details and impacted files

@@                  Coverage Diff                   @@
##             feat/error-recovery   #2136    +/-   ##
======================================================
  Coverage                     49%     49%            
+ Complexity                  6619    6572    -47     
======================================================
  Files                        687     696     +9     
  Lines                      61218   61043   -175     
  Branches                    8874    8910    +36     
======================================================
+ Hits                       30369   30383    +14     
+ Misses                     28620   28410   -210     
- Partials                    2229    2250    +21


DavyLandman · 2025-01-31T08:41:35Z

Hi @arnoldlankamp, your comments on #2100 really helped out. Do you have some time to take a look at this? Thanks.

ps. @PieterOlivier I think this means you should close #2100 ?

PieterOlivier · 2025-01-31T09:22:58Z

> ps. @PieterOlivier I think this means you should close #2100 ?

Definitely, I have done so now.

arnoldlankamp · 2025-02-04T23:16:51Z

> Hi @arnoldlankamp, your comments on #2100 really helped out. Do you have some time to take a look at this? Thanks.

I'll have a look at it as soon as I'm able.

PieterOlivier added 17 commits January 10, 2025 09:33

Started working on safe node memoization

5642993

Implemented safe node memoization

89aaee6

Removed CycleMark stuff, fixed bugs in safe memoization

413e091

Now using long instead of int to count nodes

9d2c4c5

We can cross the 2^31 boundary because of sharing.

Improved caching ratio

b291504

Implemented sharing for nodes with side effects

97752e1

Fixed safe memoization issue

3dd4c58

Refactored RecoveryTestSupport to support a config object

96924b5

Config object replaces ever growing list of parameters

Refactored caching logic

4961516

Fixed compiler error

0a90eeb

Fixed warning

5459c1a

Added standard benchmark test

17c8c09

Replaced flattener map lookup with field in AbstractNode

2cd67ab

Merge branch 'recovery/recover-all-productions' into recovery/safe-me…

8a8dbc1

…moization

Merge branch 'recovery/safe-memoization' into recovery/cacheable-refa…

93a9f58

…ctoring

Merge branch 'recovery/cacheable-refactoring' into recovery/memo-map-…

db51968

…elimination

Added parse benchmark

d0f1b98

Cleaned up code for PR

d647f70
