-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bundle: avoid closing file descriptor twice #1857
base: maint-2.48
Are you sure you want to change the base?
Conversation
Already when introduced in c7a8a16 (Add bundle transport, 2007-09-10), the `bundle` transport had a bug where it would open a file descriptor to the bundle file and then close it _twice_: First, the file descriptor (`data->fd`) is passed to `unbundle()`, which would use it as the `stdin` of the `index-pack` process, which as a consequence would close it via `start_command()`. However, `data->fd` would still hold the numerical value of the file descriptor, and `close_bundle()` would see that and happily close it again. This seems not to have caused too many problems in almost two decades, but I encountered a situation today where it _does_ cause problems: In i686 variants of Git for Windows, it seems that file descriptors are reused quickly after they have been closed. In the particular scenario I faced, `git fetch <bundle> <ref>` gets the same file descriptor value when opening the bundle file and importing its embedded packfile (which implicitly closes the file descriptor) and then when opening a pack file in `fetch_and_consume_refs()` while looking up an object's header. Later on, after the bundle has been imported (and the `close_bundle()` function erroneously closes the file descriptor that has _already_ been closed when using it as `stdin` for `git index-pack`), the same file descriptor value has now been reused via `use_pack()`. Now, when either the recursive fetch (which defaults to "on", unfortunately) or a commit-graph update needs to `mmap()` the packfile, it fails due to a now-invalid file descriptor that _should_ point to the pack file but doesn't anymore. To fix that, let's invalidate `data->fd` after calling `unbundle()`. That way, `close_bundle()` does not close a file descriptor that may have been reused for something different. While at it, document that `unbundle()` closes the file descriptor, and ensure that it also does that when failing to verify the bundle. Luckily, this bug does not affect the bundle URI feature, it only affects the `git fetch <bundle>` code path. Note that this patch does not _completely_ clarifies who is responsible to close that file descriptor, as `run_command()` may fail _without_ closing `cmd->in`. Addressing this issue thoroughly, however, would require a rather thorough re-design of the `start_command()` and `finish_command()` functionality to make it a lot less murky who is responsible for what file descriptors. At least this here patch is relatively easy to reason about, and addresses a hard failure (`fatal: mmap: could not determine filesize`) at the expense of leaking a file descriptor under very rare circumstances in which `git fetch` would error out anyway. Signed-off-by: Johannes Schindelin <[email protected]>
As part of the snapshot builds via the `git-artifacts` workflow, we try to import the tag from a bundle. This uncovered a bug that is over 17 years old, and which I am trying to fix via gitgitgadget/git#1857. The symptom is that just after fetching the refs, when looking through the fetched revisions whether a recursive fetch is needed, Git fails to `mmap()` the newly-imported packfile. The symptom looks like this: From [...]/git.bundle * tag <tag> -> FETCH_HEAD * [new tag] <tag> -> <tag> fatal: mmap: could not determine filesize Curiously, this only happens on 32-bit Windows, not on 64-bit Windows, nor on Linux or macOS. The explanation is most likely to be found in how quickly file descriptor values are used again after closing them, which would appear to be _really_ quickly on i686 Windows. Be that as it may, to work around this issue, we simply avoid any operation that would need to access the just-imported packfile in `git fetch <bundle>`. Signed-off-by: Johannes Schindelin <[email protected]>
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Junio C Hamano wrote (reply to this): "Johannes Schindelin via GitGitGadget" <[email protected]>
writes:
> This seems not to have caused too many problems in almost two decades,
> but I encountered a situation today where it _does_ cause problems: In
> i686 variants of Git for Windows, it seems that file descriptors are
> reused quickly after they have been closed.
Nice finding. It pays not to be in too uniform a monoculture.
> This is a really old bug. Deviating from Git's practice, I did not apply
> this on top of the bugged commit. In fact, I did not even apply this on
> top of an older maint-* branch because it would cause conflicts even
> cherry-picking it on top of maint-2.47.
Just for fun, attached is my backport to maint-2.44 track. It
merges, with conflict which is rather trivial to resolve, back to
v2.48.1, on which your original patch is based, and gives the
merge result identical to the result of direct application of your
patch to v2.48.1.
I am not sure what to feel about the FD leak mentioned in the log
message, though.
Thanks, will queue.
--- >8 ---
From: Johannes Schindelin <[email protected]>
Date: Sat, 25 Jan 2025 23:57:36 +0000
Subject: [PATCH] bundle: avoid closing file descriptor twice
Already when introduced in c7a8a16239 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, ...
... At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.
Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
---
bundle.c | 4 +++-
bundle.h | 2 ++
transport.c | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
Obviously this is hard to test ;-)
diff --git a/bundle.c b/bundle.c
index a9744da255..7b3f2da40f 100644
--- a/bundle.c
+++ b/bundle.c
@@ -616,8 +616,10 @@ int unbundle(struct repository *r, struct bundle_header *header,
{
struct child_process ip = CHILD_PROCESS_INIT;
- if (verify_bundle(r, header, flags))
+ if (verify_bundle(r, header, flags)) {
+ close(bundle_fd);
return -1;
+ }
strvec_pushl(&ip.args, "index-pack", "--fix-thin", "--stdin", NULL);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb..8b187be7cf 100644
--- a/bundle.h
+++ b/bundle.h
@@ -50,6 +50,8 @@ int verify_bundle(struct repository *r, struct bundle_header *header,
*
* Before unbundling, this method will call verify_bundle() with the
* given 'flags'.
+ *
+ * Note that the `bundle_fd` will be closed as part of the operation.
*/
int unbundle(struct repository *r, struct bundle_header *header,
int bundle_fd, struct strvec *extra_index_pack_args,
diff --git a/transport.c b/transport.c
index df518ead70..25e2233da0 100644
--- a/transport.c
+++ b/transport.c
@@ -184,6 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
get_refs_from_bundle_inner(transport);
ret = unbundle(the_repository, &data->header, data->fd,
&extra_index_pack_args, 0);
+ data->fd = -1; /* `unbundle()` closes the file descriptor */
transport->hash_algo = data->header.hash_algo;
return ret;
}
--
2.48.1-293-gbb90fdfe3c
|
This patch series was integrated into seen via git@d69419e. |
As part of the snapshot builds via the `git-artifacts` workflow, we try to import the tag from a bundle. This uncovered a bug that is over 17 years old, and which I am trying to fix via gitgitgadget/git#1857. The symptom is that just after fetching the refs, when looking through the fetched revisions whether a recursive fetch is needed, Git fails to `mmap()` the newly-imported packfile. The symptom looks like this: From [...]/git.bundle * tag <tag> -> FETCH_HEAD * [new tag] <tag> -> <tag> fatal: mmap: could not determine filesize Curiously, this only happens on 32-bit Windows, not on 64-bit Windows, nor on Linux or macOS. The explanation is most likely to be found in how quickly file descriptor values are used again after closing them, which would appear to be _really_ quickly on i686 Windows. Be that as it may, to work around this issue, we simply avoid any operation that would need to access the just-imported packfile in `git fetch <bundle>`. Signed-off-by: Johannes Schindelin <[email protected]>
This branch is now known as |
This patch series was integrated into seen via git@31d6ede. |
This patch series was integrated into seen via git@c3c5cc1. |
This patch series was integrated into seen via git@25b6870. |
There was a status update in the "New Topics" section about the branch The code path used when "git fetch" fetches from a bundle file closed the same file descriptor twice, which sometimes broke things unexpectedly when the file descriptor was reused, which has been corrected. Will merge to 'next'. source: <[email protected]> |
This patch series was integrated into seen via git@64afbe2. |
This patch series was integrated into next via git@e490587. |
This patch series was integrated into seen via git@9dd3406. |
This is a really old bug. Deviating from Git's practice, I did not apply this on top of the bugged commit. In fact, I did not even apply this on top of an older
maint-*
branch because it would cause conflicts even cherry-picking it on top ofmaint-2.47
.