Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't try killing processes if we already know the command finished + reduce error logging noise on Windows #4231

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

brandondong
Copy link
Contributor

  • PR Description

I was tinkering around with the code and then checking associated logs but even with LOG_LEVEL=error, I found there was a lot of noise on Windows.

This PR fixes two such sources:

  1. Navigating through files in the Files panel
  2. Navigating through branches in the Branches panel when there are a lot of commits (e.g. this repo)

More details in the comments below.

  • Please check if the PR fulfills these requirements
  • Cheatsheets are up-to-date (run go generate ./...)
  • Code has been formatted (see here)
  • Tests have been added/updated (see here for the integration test guide)
  • Text is internationalised (see here)
  • If a new UserConfig entry was added, make sure it can be hot-reloaded (see here)
  • Docs have been updated if necessary
  • You've read through your own file changes for silly mistakes etc

// and the user is flicking through a bunch of items.
self.throttle = time.Since(startTime) < THROTTLE_TIME && timeToStart > COMMAND_START_THRESHOLD
if err := oscommands.Kill(cmd); err != nil {
if !strings.Contains(err.Error(), "process already finished") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When navigating through files in the Files panel, we always enter this Kill branch. It doesn't matter how slowly this is done. Even if the previous diff command completed however long ago, we will still try killing its (now gone) pid.

There is a string check here to suppress kill failure logs due to this reason but on Windows, the string is different (fails in GetCreationTime -> GetWindowsHandle -> error: "The parameter is incorrect.").

Fix: Don't try killing processes needlessly if we already know the command finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be the likely culprit behind #3008.

In that thread, the theory proposed is:

  1. Unrelated process X spawns process Y.
  2. Process X exits.
  3. Lazygit spawns a git process with the same pid as X.
  4. Lazygit kills the running git process and then process Y as collateral due to process Y having ppid = X.

However, this doesn't seem to me like it matches up with the observation in that thread that this usually only happens after lazygit has been running in the background for a while. Why would it be any less likely to happen upon lazygit starting and killing its first git command if it's due to unrelated long-running processes with missing parents?

By contrast, the above issue does match up well with the symptoms. Imagine:

  1. lazygit is started and runs git diff in process X which completes immediately and exits.
  2. lazygit is left in the background for several hours by which process X pid is reused by an unrelated process.
  3. lazygit is focused back on and runs another git diff. It first runs this stop logic which will kill process X and its children.

Checking for children creation time won't help here if these are legitimate child processes of unrelated process X.

@@ -269,9 +277,13 @@ func (self *ViewBufferManager) NewCmdTask(start func() (*exec.Cmd, io.Reader), p
refreshViewIfStale()

if err := cmd.Wait(); err != nil {
// it's fine if we've killed this program ourselves
if !strings.Contains(err.Error(), "signal: killed") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When navigating through branches with a lot of commits in the Branches panel, we always enter this error branch. The process will still be running indefinitely due to the buffered git log output so when we run the next git log, the previous process will be killed.

There is a string check here to suppress the failure logs due to this reason but on Windows, the string is different ("exit status 1").

Fix: Check whether we sent a kill signal and if so, suppress the error logs.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we've made this change, do we still need to check on signal: killed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw another usage of oscommands.Kill in RunAndProcessLines when the command decides it's time to stop reading. I think that could be another source of the kill signal here and should still be suppressed in logs on non-Windows where the string match is possible.

@stefanhaller stefanhaller added the bug Something isn't working label Feb 4, 2025
@stefanhaller
Copy link
Collaborator

Makes sense to me. I'm not super familiar with the intricacies of the concurrency handling of this area of the code, so I'd like @jesseduffield to have a look too.

One thought about the commit history: it seems that the two changes in this commit are independent, so I'd prefer them to be two separate commits. And your github comments explaining the rationale for them could then be the commit messages, so that people blaming the code in the future don't have to go to github to learn about why the changes were made.

I think this may be the likely culprit behind #3008.

Sounds plausible to me. Shouldn't the PR title be reworded then? It's no longer just about reducing logging noise, but about fixing an actual bug.

This may lead to unrelated processes being killed on Windows (jesseduffield#3008). Imagine:
1. lazygit is started and runs git diff in process X which completes immediately and exits.
2. lazygit is left in the background for several hours by which process X pid is reused by an unrelated process.
3. lazygit is focused back on and runs another git diff. It first runs this stop logic which will kill process X and its children.
There is a string check here to suppress the failure logs due to this reason but on Windows, the string is different ("exit status 1").
@brandondong brandondong changed the title Reduce error logging noise on Windows Don't try killing processes if we already know the command finished + reduce error logging noise on Windows Feb 5, 2025
@stefanhaller
Copy link
Collaborator

Nice, thanks for cleaning up the history. LGTM, I think this is ready to merge, but I'd still like @jesseduffield to have a look too.

@stefanhaller stefanhaller added the blocks-release Must be addressed before we can cut the next release label Feb 6, 2025
Copy link
Owner

@jesseduffield jesseduffield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. One question

@@ -269,9 +277,13 @@ func (self *ViewBufferManager) NewCmdTask(start func() (*exec.Cmd, io.Reader), p
refreshViewIfStale()

if err := cmd.Wait(); err != nil {
// it's fine if we've killed this program ourselves
if !strings.Contains(err.Error(), "signal: killed") {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we've made this change, do we still need to check on signal: killed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocks-release Must be addressed before we can cut the next release bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants