cat-file: add remote-object-info to batch-command
Since the `info` command in cat-file --batch-command prints object info
for a given object, it is natural to add another command in cat-file
--batch-command to print object info for a given object from a remote.
Add `remote-object-info` to cat-file --batch-command.

`info` takes object ids one at a time, which would create overhead when
making requests to a server, so `remote-object-info` instead takes
multiple object ids at once.

cat-file --batch-command is generally implemented in the following
manner:

 - Receive and parse input from user
 - Call respective function attached to command
 - Get object info, print object info

In --buffer mode, this changes to:

 - Receive and parse input from user
 - Store respective function attached to command in a queue
 - After flush, loop through commands in queue
    - Call respective function attached to command
    - Get object info, print object info

Notice how the getting and printing of object info is accomplished one
at a time. As described above, this creates a problem for making
requests to a server. Therefore, `remote-object-info` is implemented in
the following manner:

 - Receive and parse input from user
 If command is `remote-object-info`:
    - Get object info from remote
    - Loop through and print each object info
 Else:
    - Call respective function attached to command
    - Parse input, get object info, print object info

And finally, for `remote-object-info` in --buffer mode:

 - Receive and parse input from user
 - Store respective function attached to command in a queue
 - After flush, loop through commands in queue:
    If command is `remote-object-info`:
        - Get object info from remote
        - Loop through and print each object info
    Else:
        - Call respective function attached to command
        - Get object info, print object info

To summarize, `remote-object-info` gets the object info from the remote
in one request, then loops through the object ids passed in and prints
the info for each.
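
As a rough illustration of the difference between the two flows described above, here is a minimal, self-contained sketch. It is not Git's code: `fake_remote_sizes`, `per_object_flow`, `batched_flow`, `round_trips` and the made-up sizes are all hypothetical names and values, used only to count how many simulated requests each flow makes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_OIDS 16

/* Counts simulated network round trips (hypothetical, for illustration). */
static int round_trips;

/*
 * Hypothetical stand-in for the remote request: fills sizes[] for all
 * nr ids in a single round trip. The sizes are made up: (index + 1) * 100.
 */
static void fake_remote_sizes(const char *const *oids, size_t nr,
			      unsigned long *sizes)
{
	(void)oids;
	round_trips++;
	for (size_t i = 0; i < nr; i++)
		sizes[i] = (unsigned long)(i + 1) * 100;
}

/* `info`-style flow: get and print object info one object at a time. */
static int per_object_flow(const char *const *oids, size_t nr)
{
	round_trips = 0;
	for (size_t i = 0; i < nr; i++) {
		unsigned long size;

		fake_remote_sizes(&oids[i], 1, &size);
		printf("%s %lu\n", oids[i], size);
	}
	return round_trips;
}

/* `remote-object-info`-style flow: one request, then loop and print. */
static int batched_flow(const char *const *oids, size_t nr)
{
	unsigned long sizes[MAX_OIDS];

	assert(nr <= MAX_OIDS);
	round_trips = 0;
	fake_remote_sizes(oids, nr, sizes);
	for (size_t i = 0; i < nr; i++)
		printf("%s %lu\n", oids[i], sizes[i]);
	return round_trips;
}
```

Under these assumptions, calling `per_object_flow` with three ids makes three simulated requests, while `batched_flow` makes one; the printing loop is the same in both.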

In order for remote-object-info to avoid remote communication overhead
in the non-buffer mode, the objects are passed in as such:

remote-object-info <remote> <oid> <oid> ... <oid>

rather than

remote-object-info <remote> <oid>
remote-object-info <remote> <oid>
...
remote-object-info <remote> <oid>
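
To make the single-line form concrete, the sketch below splits the argument part of such a line into the remote and its object ids. This is an assumption-laden illustration, not the patch's implementation: the diff further down uses `split_cmdline()` for this step, whereas `split_remote_object_info` here is a hypothetical helper that only splits on spaces and tabs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "<remote> <oid> <oid> ..." in place.
 * The first token becomes *remote, the remaining tokens are stored in
 * oids[] (at most max_oids of them; extra tokens are ignored).
 * Returns the number of object ids found, or -1 if the remote or all
 * object ids are missing.
 */
static int split_remote_object_info(char *args, const char **remote,
				    const char **oids, size_t max_oids)
{
	size_t nr = 0;
	char *tok;

	assert(args && remote && oids);

	tok = strtok(args, " \t");
	if (!tok)
		return -1;
	*remote = tok;

	while (nr < max_oids && (tok = strtok(NULL, " \t")))
		oids[nr++] = tok;

	return nr ? (int)nr : -1;
}
```

Returning an error when no object ids follow the remote mirrors the documented behaviour ("Error when no object references are provided"); how the real command reports that error is not modelled here.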

Helped-by: Jonathan Tan <[email protected]>
Helped-by: Christian Couder <[email protected]>
Signed-off-by: Calvin Wan <[email protected]>
Signed-off-by: Eric Ju  <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
calvin-wan-google authored and gitster committed Sep 27, 2024
1 parent df8c5dd commit ef30c45
Showing 5 changed files with 889 additions and 5 deletions.
22 changes: 18 additions & 4 deletions Documentation/git-cat-file.txt
@@ -149,6 +149,13 @@ info <object>::
Print object info for object reference `<object>`. This corresponds to the
output of `--batch-check`.

remote-object-info <remote> <object>...::
Print object info for object references `<object>` at specified <remote> without
downloading objects from remote. If the object-info capability is not
supported by the server, the objects will be downloaded instead.
Error when no object references are provided.
This command may be combined with `--buffer`.

flush::
Used with `--buffer` to execute all preceding commands that were issued
since the beginning or since the last flush was issued. When `--buffer`
@@ -290,21 +297,23 @@ newline. The available atoms are:
The full hex representation of the object name.

`objecttype`::
The type of the object (the same as `cat-file -t` reports).
The type of the object (the same as `cat-file -t` reports). See
`CAVEATS` below. Not supported by `remote-object-info`.

`objectsize`::
The size, in bytes, of the object (the same as `cat-file -s`
reports).

`objectsize:disk`::
The size, in bytes, that the object takes up on disk. See the
note about on-disk sizes in the `CAVEATS` section below.
note about on-disk sizes in the `CAVEATS` section below. Not
supported by `remote-object-info`.

`deltabase`::
If the object is stored as a delta on-disk, this expands to the
full hex representation of the delta base object name.
Otherwise, expands to the null OID (all zeroes). See `CAVEATS`
below.
below. Not supported by `remote-object-info`.

`rest`::
If this atom is used in the output string, input lines are split
@@ -314,7 +323,9 @@ newline. The available atoms are:
line) are output in place of the `%(rest)` atom.

If no format is specified, the default format is `%(objectname)
%(objecttype) %(objectsize)`.
%(objecttype) %(objectsize)`, except for `remote-object-info` commands which use
`%(objectname) %(objectsize)` for now because "%(objecttype)" is not supported yet.
When "%(objecttype)" is supported, default format should be unified.

If `--batch` is specified, or if `--batch-command` is used with the `contents`
command, the object information is followed by the object contents (consisting
@@ -396,6 +407,9 @@ scripting purposes.
CAVEATS
-------

Note that since %(objecttype), %(objectsize:disk) and %(deltabase) are currently not supported by the
`remote-object-info` command, we will error and exit when they are in the format string.

Note that the sizes of objects on disk are reported accurately, but care
should be taken in drawing conclusions about which refs or objects are
responsible for disk usage. The size of a packed non-delta object may be
108 changes: 107 additions & 1 deletion builtin/cat-file.c
@@ -24,6 +24,9 @@
#include "promisor-remote.h"
#include "mailmap.h"
#include "write-or-die.h"
#include "alias.h"
#include "remote.h"
#include "transport.h"

enum batch_mode {
BATCH_MODE_CONTENTS,
@@ -42,9 +45,12 @@ struct batch_options {
char input_delim;
char output_delim;
const char *format;
int use_remote_info;
};

static const char *force_path;
static struct object_info *remote_object_info;
static struct oid_array object_info_oids = OID_ARRAY_INIT;

static struct string_list mailmap = STRING_LIST_INIT_NODUP;
static int use_mailmap;
@@ -528,7 +534,7 @@ static void batch_one_object(const char *obj_name,
enum get_oid_result result;

result = get_oid_with_context(the_repository, obj_name,
flags, &data->oid, &ctx);
flags, &data->oid, &ctx);
if (result != FOUND) {
switch (result) {
case MISSING_OBJECT:
@@ -576,6 +582,59 @@ static void batch_one_object(const char *obj_name,
object_context_release(&ctx);
}

static int get_remote_info(struct batch_options *opt, int argc, const char **argv)
{
int retval = 0;
struct remote *remote = NULL;
struct object_id oid;
struct string_list object_info_options = STRING_LIST_INIT_NODUP;
static struct transport *gtransport;

/*
* Change the format to "%(objectname) %(objectsize)" when
* remote-object-info command is used. Once we start supporting objecttype
* the default format should change to DEFAULT_FORMAT
*/
if (!opt->format)
opt->format = "%(objectname) %(objectsize)";

remote = remote_get(argv[0]);
if (!remote)
die(_("must supply valid remote when using remote-object-info"));

oid_array_clear(&object_info_oids);
for (size_t i = 1; i < argc; i++) {
if (get_oid_hex(argv[i], &oid))
die(_("Not a valid object name %s"), argv[i]);
oid_array_append(&object_info_oids, &oid);
}

gtransport = transport_get(remote, NULL);
if (gtransport->smart_options) {
CALLOC_ARRAY(remote_object_info, object_info_oids.nr);
gtransport->smart_options->object_info = 1;
gtransport->smart_options->object_info_oids = &object_info_oids;

/* 'objectsize' is the only option currently supported */
if (!strstr(opt->format, "%(objectsize)"))
die(_("%s is currently not supported with remote-object-info"), opt->format);

string_list_append(&object_info_options, "size");

if (object_info_options.nr > 0) {
gtransport->smart_options->object_info_options = &object_info_options;
gtransport->smart_options->object_info_data = remote_object_info;
retval = transport_fetch_refs(gtransport, NULL);
}
} else {
retval = -1;
}

string_list_clear(&object_info_options, 0);
transport_disconnect(gtransport);
return retval;
}

struct object_cb_data {
struct batch_options *opt;
struct expand_data *expand;
@@ -667,6 +726,52 @@ static void parse_cmd_info(struct batch_options *opt,
batch_one_object(line, output, opt, data);
}

static void parse_cmd_remote_object_info(struct batch_options *opt,
const char *line,
struct strbuf *output,
struct expand_data *data)
{
int count;
const char **argv;

char *line_to_split = xstrdup_or_null(line);
count = split_cmdline(line_to_split, &argv);
if (get_remote_info(opt, count, argv))
goto cleanup;

opt->use_remote_info = 1;
data->skip_object_info = 1;
for (size_t i = 0; i < object_info_oids.nr; i++) {

data->oid = object_info_oids.oid[i];

if (remote_object_info[i].sizep) {
data->size = *remote_object_info[i].sizep;
} else {
/*
* When reaching here, it means remote-object-info can't retrieve
* information from server without downloading them, and the objects
* have been fetched to client already.
* Print the information using the logic for local objects.
*/
data->skip_object_info = 0;
}

opt->batch_mode = BATCH_MODE_INFO;
batch_object_write(argv[i+1], output, opt, data, NULL, 0);

}
opt->use_remote_info = 0;
data->skip_object_info = 0;

cleanup:
for (size_t i = 0; i < object_info_oids.nr; i++)
free_object_info_contents(&remote_object_info[i]);
free(line_to_split);
free(argv);
free(remote_object_info);
}

static void dispatch_calls(struct batch_options *opt,
struct strbuf *output,
struct expand_data *data,
@@ -698,6 +803,7 @@ static const struct parse_cmd {
} commands[] = {
{ "contents", parse_cmd_contents, 1},
{ "info", parse_cmd_info, 1},
{ "remote-object-info", parse_cmd_remote_object_info, 1},
{ "flush", NULL, 0},
};

11 changes: 11 additions & 0 deletions object-file.c
@@ -3020,3 +3020,14 @@ int read_loose_object(const char *path,
munmap(map, mapsize);
return ret;
}

void free_object_info_contents(struct object_info *object_info)
{
if (!object_info)
return;
free(object_info->typep);
free(object_info->sizep);
free(object_info->disk_sizep);
free(object_info->delta_base_oid);
free(object_info->type_name);
}
3 changes: 3 additions & 0 deletions object-store-ll.h
@@ -548,4 +548,7 @@ int for_each_object_in_pack(struct packed_git *p,
int for_each_packed_object(each_packed_object_fn, void *,
enum for_each_object_flags flags);

/* Free pointers inside of object_info, but not object_info itself */
void free_object_info_contents(struct object_info *object_info);

#endif /* OBJECT_STORE_LL_H */