Don't try killing processes if we already know the command finished + reduce error logging noise on Windows #4231
// and the user is flicking through a bunch of items.
self.throttle = time.Since(startTime) < THROTTLE_TIME && timeToStart > COMMAND_START_THRESHOLD
if err := oscommands.Kill(cmd); err != nil {
	if !strings.Contains(err.Error(), "process already finished") {
When navigating through files in the Files panel, we always enter this Kill branch, no matter how slowly we navigate. Even if the previous diff command completed long ago, we still try to kill its (now gone) pid.
There is a string check here to suppress kill-failure logs for this case, but on Windows the string is different (the call fails in GetCreationTime -> GetWindowsHandle -> error: "The parameter is incorrect.").
Fix: Don't try killing processes needlessly if we already know the command finished.
I think this is the likely culprit behind #3008.
In that thread, the theory proposed is:
- Unrelated process X spawns process Y.
- Process X exits.
- Lazygit spawns a git process with the same pid as X.
- Lazygit kills the running git process and then process Y as collateral due to process Y having ppid = X.
However, this doesn't seem to me like it matches up with the observation in that thread that this usually only happens after lazygit has been running in the background for a while. Why would it be any less likely to happen upon lazygit starting and killing its first git command if it's due to unrelated long-running processes with missing parents?
By contrast, the above issue does match up well with the symptoms. Imagine:
- lazygit is started and runs git diff in process X which completes immediately and exits.
- lazygit is left in the background for several hours, by which time process X's pid has been reused by an unrelated process.
- lazygit is brought back into focus and runs another git diff. It first runs this stop logic, which will kill process X and its children.
Checking the children's creation time won't help here, because these are legitimate child processes of the unrelated process that now owns X's pid.
@@ -269,9 +277,13 @@ func (self *ViewBufferManager) NewCmdTask(start func() (*exec.Cmd, io.Reader), p
refreshViewIfStale()

if err := cmd.Wait(); err != nil {
	// it's fine if we've killed this program ourselves
	if !strings.Contains(err.Error(), "signal: killed") {
When navigating through branches with a lot of commits in the Branches panel, we always enter this error branch. Because the git log output is buffered, the process keeps running indefinitely, so when we run the next git log, the previous process is killed.
There is a string check here to suppress the failure logs for this case, but on Windows the string is different ("exit status 1").
Fix: Check whether we sent a kill signal and if so, suppress the error logs.
Now that we've made this change, do we still need to check on signal: killed?
I saw another usage of oscommands.Kill in RunAndProcessLines when the command decides it's time to stop reading. I think that could be another source of the kill signal here and should still be suppressed in logs on non-Windows where the string match is possible.
Makes sense to me. I'm not super familiar with the intricacies of the concurrency handling of this area of the code, so I'd like @jesseduffield to have a look too. One thought about the commit history: it seems that the two changes in this commit are independent, so I'd prefer them to be two separate commits. And your github comments explaining the rationale for them could then be the commit messages, so that people blaming the code in the future don't have to go to github to learn about why the changes were made.
Sounds plausible to me. Shouldn't the PR title be reworded then? It's no longer just about reducing logging noise, but about fixing an actual bug.
This may lead to unrelated processes being killed on Windows (jesseduffield#3008). Imagine:
1. lazygit is started and runs git diff in process X which completes immediately and exits.
2. lazygit is left in the background for several hours by which process X pid is reused by an unrelated process.
3. lazygit is focused back on and runs another git diff. It first runs this stop logic which will kill process X and its children.
There is a string check here to suppress the failure logs due to this reason but on Windows, the string is different ("exit status 1").
4c9731d to 7d50692
Nice, thanks for cleaning up the history. LGTM, I think this is ready to merge, but I'd still like @jesseduffield to have a look too.
Good catch. One question
Now that we've made this change, do we still need to check on signal: killed?
I was tinkering around with the code and then checking associated logs but even with LOG_LEVEL=error, I found there was a lot of noise on Windows.
This PR fixes two such sources:
More details in the comments below.