Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reformatting all grammars #3843

Merged
merged 5 commits into from
Nov 29, 2023
Merged

Reformatting all grammars #3843

merged 5 commits into from
Nov 29, 2023

Conversation

mike-lischke
Copy link
Member

@mike-lischke mike-lischke commented Nov 22, 2023

This patch applies common formatting rules to all grammars in the repository. Naturally, it's very large - probably the largest patch you will ever get 😅 and if you don't like that, I can open multiple PRs. However, that will not lower the work of reviewing the changes (quite the opposite).

All grammars also got the used formatting rules attached, so if you want, you can do your own formatter run using antlr-format-cli.

@mike-lischke
Copy link
Member Author

Looks like there are 2-3 grammars that got errors. I'll investigate over the weekend.

@kaby76
Copy link
Contributor

kaby76 commented Nov 23, 2023

Looks like there are 2-3 grammars that got errors. I'll investigate over the weekend.

antlr/antlr4 failed because the .errors needs to be remastered. The example .g4's were reformatted.

apt failed because there's a symbol conflict with "options" in the first rule -- it's a keyword. The Antlr4 tool does not seem to have a problem with "options=" but does with "options =". Bug in the Antlr4 Tool.

For sql/tsql, the error is again a symbol conflict. "options+=" vs "options +=". Bug in Antlr4 Tool.

@mike-lischke
Copy link
Member Author

Ah thanks Ken, for the analysis. Shall I revert the formatting of the example .g4's or rather adjust what you call "the .errors" (no idea what that is)? For the other 2: we can just ignore them? I wonder why these symbol conflicts do not show up in other PRs, but maybe that is because these specific grammars are seldomly touched...

@kaby76
Copy link
Contributor

kaby76 commented Nov 23, 2023

Ah thanks Ken, for the analysis. Shall I revert the formatting of the example .g4's or rather adjust what you call "the .errors" (no idea what that is)? For the other 2: we can just ignore them? I wonder why these symbol conflicts do not show up in other PRs, but maybe that is because these specific grammars are seldomly touched...

I guess just revert the formatting for the examples. The .errors files is the output from the parse. The example file itself has an error in it, and we're just checking that the parser works the same across all targets and hasn't changed from one version of the tool to the next.

The symbol conflict wasn't detected because "options=" gets through the Antlr4 tool, but "options =" does not. I think just rename the element label "options" to "options_" since the grammar should work regardless of formatting.

I added an Issue over in the antlr4 repo to have this corrected at some point. antlr/antlr4#4474

@mike-lischke
Copy link
Member Author

I reverted the examples and fixed the symbol conflicts. However, I don't fully understand why options is actually a conflict. Is that target language specific? I saw the JS target fail, but don't remember options has a special meaning there.

@kaby76
Copy link
Contributor

kaby76 commented Nov 25, 2023

I reverted the examples and fixed the symbol conflicts. However, I don't fully understand why options is actually a conflict. Is that target language specific? I saw the JS target fail, but don't remember options has a special meaning there.

It's not target specific. It's actually from the Antlr4 tool. It's kind of hard to see the error, but the build error is here: https://github.com/antlr/grammars-v4/actions/runs/6988622530/job/19016369650#step:22:1336. The Antlr4 tool just doesn't like the grammar with the space after the "options", whereas before reformatting there was no space after "options" and the tool seems to like the grammar.

@mike-lischke
Copy link
Member Author

Simply jaw-dropping how long the validation takes in this repo... There are still a few failures, where I don't know what to do. It believe this patch is fine.

@kaby76
Copy link
Contributor

kaby76 commented Nov 25, 2023

Simply jaw-dropping how long the validation takes in this repo... There are still a few failures, where I don't know what to do. It believe this patch is fine.

Changing every grammar more or less means every grammar is tested for every target. And even though the work is split up on dozens of machines, a lot of the grammars--frankly--suck. That's because they contains terrible and numerous ambiguities and many fall back to full context. In some ways, this is actually a good thing because it tested a lot of the ATN interp engine code for the parser and lexer.

Believe or not, the testing is better nowadays than a year or two ago because we now only test those grammars that change in a PR, not the whole 350+ grammars. The MacOS servers on Github Actions are still slow, but better.

So, here's a rundown on the failures.

  • asm/ptx/ptx-isa-1.0 with Cpp: NAN is a symbol conflict for Cpp--apparently now for MacOS.
  • ./icon ./lark .sieve ./smtlibv2 ./wavefront with Go target--symbol conflict apparently now with "start" as a parser rule.
  • ./scala with Go target--timeout. This grammar is super slow. Go target should be removed from desc.xml to indicate it doesn't work.
  • elixir with PHP target--stack overflow. The PHP has logical errors in the code. I tried to address some of the problems, but it was too much, and needed to work on other things. Fix for #12 -- Allowed memory size of 134217728 bytes exhausted. antlr-php-runtime#34

I think it's fine to go. I'll clean up these grammars at some point.

@mike-lischke
Copy link
Member Author

Hmm, probably some of the grammars should be removed if they don't work. It's great to have such an exhaustive test framework, but taking hours to run is really a pain. Anyway, we cannot fix all problems in a single PR.

@teverett any objections to merge this PR?

@teverett
Copy link
Member

@mike-lischke no objection to merging the PR. I prefer not to remove any grammars however. Thanks for this contribution.

I know the new QA tools @kaby76 has worked on are great, and have the ability to exclude grammar tests for certain combinations of grammar and language. Can we use those to exclude the grammars that are having trouble so that we have a clean build?

@mike-lischke
Copy link
Member Author

Right, we should not do anything in addition to this already large patch. @kaby76 can you please review the patch? Thanks!

@kaby76
Copy link
Contributor

kaby76 commented Nov 26, 2023

Right, we should not do anything in addition to this already large patch. @kaby76 can you please review the patch? Thanks!

It's fine. I created an issue to keep track of correcting these grammars in a separate PR. It's best to get the bulk of the changes in before git conflicts start to become an issue. The code looks great. Nice and consistent.

@mike-lischke
Copy link
Member Author

Is a formal approval required here to merge a patch (i.e. via the review process)? If not @teverett please proceed.

@teverett
Copy link
Member

Can we correct the buld errors before merging this?

@mike-lischke
Copy link
Member Author

That would require to add changes unrelated to the patch. As I understand it these errors existed before already, but some did not show until now ("options=") or have been ignored (?)(slow PHP and Go grammars, stack overflow). I assume fixing those issues will take quite a while and I cannot do that. Additionally, with every new patch you merge I have to touch this PR again to fix merge conflicts.

I would prefer to merge this PR, because it is not really the reason why some tests fail. But if someone who knows the nature of these problems better than me (@kaby76?) volunteers to fix them I'm ready to wait.

@kaby76
Copy link
Contributor

kaby76 commented Nov 27, 2023

Yes, these are problems that existed before this PR, #3847 I can make a PR asap to fix these problems. Then, you can merge the changes in to your PR.

@kaby76
Copy link
Contributor

kaby76 commented Nov 27, 2023

Yes, these are problems that existed before this PR, #3847 I can make a PR asap to fix these problems. Then, you can merge the changes in to your PR.

The PR containing the fixes is #3849 . It contains changes for these grammars: apt/ asm/ptx/ptx-isa-1.0/ elixir/ icon/ lark/ scala/ smtlibv2/ sql/tsql/ wavefront/

@mike-lischke Should I reformat the grammars with your tool in my PR, or leave as is before your PR?

@mike-lischke
Copy link
Member Author

mike-lischke commented Nov 27, 2023

This patch is simpler than I expected. Let me just cherry pick it into my PR here.

@kaby76
Copy link
Contributor

kaby76 commented Nov 27, 2023

Wow, the build is taking a while. Maybe I should fork a job for each OS x target x grammar x build-script, but I don't know what Github Actions would do with 3 OSes * 8 targets * 350 grammars * 2 build scripts + 1 static workflow for testing grammar issues = 16801 jobs being submitted all at once?!

@mike-lischke
Copy link
Member Author

That's what I meant above :-D Maybe a job matrix helps?

@kaby76
Copy link
Contributor

kaby76 commented Nov 27, 2023

That's what I meant above :-D Maybe a job matrix helps?

It is already in a job matrix. Currently, the matrix is 3 OSes x 8 targets x 2 build scripts (bash and pwsh). Each job tests whatever grammars were changed, with a git diff --name-only to select which grammars to test. So, that's why each job takes a long time, ~2 hours. Maybe I should try parallelize each grammar and serialize testing of all 8 targets in each job. So, instead the max jobs would be 3 OSes x 350 grammars = 1050 jobs and run only the pwsh test.

Damn, now why the fail? Looking... Ah, I see, you need to remove the "Go;" from scala/desc.xml so that testing is not done for that target.

I now see Github Actions has a limit of the number of concurrent jobs. https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits . There goes that idea.

@kaby76
Copy link
Contributor

kaby76 commented Nov 27, 2023

Sorry, I just realized I didn't set cancel-in-progress. That's a big reason why the build is so slow. #3851

@mike-lischke
Copy link
Member Author

mike-lischke commented Nov 27, 2023

¯\_(ツ)_/¯

@mike-lischke
Copy link
Member Author

We are close now. Only PHP on Ubuntu failed. @kaby76 can you please take another look what's wrong this time?

@kaby76
Copy link
Contributor

kaby76 commented Nov 28, 2023

The difference between the last clean build of this PR (here) and the last build (here) is because we now have my PR build change testing PHP on Ubuntu with Powershell. There's a bug I now see in the Powershell script template, so I cannot tell why PHP failed for grammars ./ada/ada2012 ./bicep ./c ./sql/hive/v4. The grammars work for Ubuntu using the Bash script, so it's likely there is something wrong with the Powershell script or the environment. I just checked each of these grammars on my Ubuntu machine, and they all pass.

I suggest we merge this PR in ASAP so that additional merges/rebase DO NOT cause more issues.

@kaby76
Copy link
Contributor

kaby76 commented Nov 28, 2023

Oh geezus H.... Sorry, my bug.

The environment is not setting up PHP for Powershell. So, I have no idea what PHP it is running. Works fine in Bash, where it does this step. There are no steps to install and test the version of PHP in the Powershell run https://github.com/antlr/grammars-v4/actions/runs/7010383029/job/19070929695. Yeah, sorry.

Let's get this in ASAP @teverett . I'll fix the workflow in a separate PR.

@mike-lischke
Copy link
Member Author

So, shall we wait for your fix for PHP or still merge?

@kaby76
Copy link
Contributor

kaby76 commented Nov 28, 2023

So, shall we wait for your fix for PHP or still merge?

Let's get your PR in ASAP because it's huge, and may get whacked by another merge. I'll clean up the workflow issue afterwards. The grammars work fine for PHP; the Bash and Powershell tests are redundant. (I took the Bash scripts, asked ChatGPT to translate them to yucky Powershell, fixed up the Powershell scripts because Chat doesn't know what it's doing.)

@kaby76
Copy link
Contributor

kaby76 commented Nov 28, 2023

So, I fixed this problem. @mike-lischke If you want to rebase, that's up to you. Or, @teverett could merge my PR in first, then merge PR #3843 (there shouldn't be any merge conflicts) and it should all build and test fine. Or, just merge #3843 first, ignore the error with PHP/Ubuntu, and then PR #3857,

@mike-lischke
Copy link
Member Author

I'd prefer to merge my patch first.

@teverett
Copy link
Member

Well, there is a merge conflict now with

sql/mysql/Positive-Technologies/MySqlParser.g4

@mike-lischke
Copy link
Member Author

mike-lischke commented Nov 28, 2023

@harveyyue committed something, even though we said we don't want any commit until this patch is in. How can that be? How can he push something without a review process?

@teverett What if someone else decides to commit while we are waiting for the next validation and then the next and the next? Where should that end? I assumed you are the owner of this repo and have the last word what goes in 🤔

With this patch we introduce a common set of rules for the grammars in this repository. All future changes in a grammar must follow these rules.
They serve special purposes and are not normal grammars.
These have been there for a while already, but were never detected due to a bug in ANTLR.
@kaby76
Copy link
Contributor

kaby76 commented Nov 28, 2023

@harveyyue committed something, even though we said we don't want any commit until this patch is in. How can that be? How can he push something without a review process?

@teverett What if someone else decides to commit while we are waiting for the next validation and then the next and the next? Where should that end? I assumed you are the owner of this repo and have the last word what goes in 🤔

@KvanTTT merged PR #3858, and doesn't seem to be in the loop on this PR. Hopefully, no more merges into this repo until this PR is merged first.

@mike-lischke
Copy link
Member Author

OK, my apologies for the wrong accusation. It wasn't clear to me from the patch that this was a merged PR.

@mike-lischke
Copy link
Member Author

In addition to the already mentioned PHP on Ubuntu problem now the Dart on macOS timed out. How to proceed @teverett? Maybe you can restart the task and maybe this time it does not time out. Or just merge straight away?

@kaby76
Copy link
Contributor

kaby76 commented Nov 29, 2023

The dart2 problem is a network issue. Github actions are pounding on servers for dart libraries and they cannot keep up with the requests. Sometimes builds work sometimes not because of load. We've seen this many times before. Merge please.

@teverett
Copy link
Member

@kaby76 @mike-lischke thanks so much for this.

@teverett teverett merged commit 7535367 into antlr:master Nov 29, 2023
44 of 47 checks passed
@mike-lischke mike-lischke deleted the reformat branch November 29, 2023 15:20
: GRANT OWNERSHIP (
ON (
object_type_name object_name
| ALL object_type_plural IN ( DATABASE id_ | SCHEMA schema_name)
Copy link
Contributor

@mlorek mlorek Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mike-lischke any reason why the space characters after ( and before DATABASE were kept?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, no specific reason actually. Might be a formatter bug. I believe that's coming up when an alt is placed entirely on one line and contains groups.

Copy link
Contributor

@kaby76 kaby76 Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why the space characters after ( DATABASE were kept?

@mlorek Say again? The space characters after the string "( DATABASE"???

Line 263:

| ALL object_type_plural IN ( DATABASE id_ | SCHEMA schema_name)

The grammar must have at least one space after the string ( DATABASE, which is the same as one space after DATABASE, which is the space currently at line 263, column 51. And, in fact, there is exactly one space at line 263, column 51 not multiple.

https://github.com/antlr/grammars-v4/blob/c36b614c1f0882ab54ecbb6dc608ac646b3a3cc6/sql/snowflake/SnowflakeParser.g4#L263C51-L263C52

Otherwise, the code would be DATABASEid_, which is surely wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sry, space character after ( and before DATABASE

| ( schema_privileges | ALL PRIVILEGES?) ON ( FUTURE SCHEMAS IN DATABASE id_)
| (schema_object_privileges | ALL PRIVILEGES?) ON (
object_type object_name
| ALL object_type_plural IN ( DATABASE id_ | SCHEMA schema_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here (

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants