gc: add --expire-to option #1843

Open: wants to merge 1 commit into base: master

@adlternative commented Dec 24, 2024

I want to perform a "safe" garbage collection for the Git repository
on the server, which avoids data corruption issues caused by
concurrent pushes during git GC. To achieve this, I currently need to
use git repack --cruft --expire-to=<dir> and git prune
in combination. However, it would be simpler if we could directly use
--expire-to=<dir> with the git-gc command.

v1: add --expire-to option to gc
v1 -> v2: fix git gc --prune=now with --expire-to
v2 -> v3: squash the two patches into one
v3 -> v4: modify docs and commit message, and add more tests

cc: [email protected]
cc: [email protected]
cc: [email protected]

@adlternative (Author)

/submit

gitgitgadget bot commented Dec 24, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1843/adlternative/zh/gc-expire-to-v1

To fetch this version to local tag pr-1843/adlternative/zh/gc-expire-to-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1843/adlternative/zh/gc-expire-to-v1

gitgitgadget bot commented Dec 30, 2024

There are issues in commit 4254269:
fix(gc): make --prune=now compatible with --expire-to
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

gitgitgadget bot commented Dec 31, 2024

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1843/adlternative/zh/gc-expire-to-v2

To fetch this version to local tag pr-1843/adlternative/zh/gc-expire-to-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1843/adlternative/zh/gc-expire-to-v2

@@ -69,6 +69,12 @@ be performed as well.
the `--max-cruft-size` option of linkgit:git-repack[1] for

On the Git mailing list, ZheNing Hu wrote (reply to this):

ZheNing Hu via GitGitGadget <[email protected]> wrote on Tue, Dec 31, 2024 at 10:18:
>
> From: ZheNing Hu <[email protected]>
>
> This commit extends the functionality of `git gc`
> by adding a new option, `--expire-to=<dir>`. Previously,
> this feature was implemented in `git repack` (see 91badeb),
> allowing users to specify a directory where unreachable and
> expired cruft packs are stored during garbage collection.
> However, users had to run `git repack --cruft --expire-to=<dir>`
> followed by `git prune` to achieve similar results within `git gc`.
>
> By introducing `--expire-to=<dir>` directly into `git gc`,
> we simplify the process for users who wish to manage their
> repository's cleanup more efficiently. This change involves
> passing the `--expire-to=<dir>` parameter through to `git repack`,
> making it easier for users to set up a backup location for cruft
> packs that will be pruned.
>
> Signed-off-by: ZheNing Hu <[email protected]>
> ---
>  Documentation/git-gc.txt | 6 ++++++
>  builtin/gc.c             | 6 +++++-
>  t/t6500-gc.sh            | 6 ++++++
>  3 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 370e22faaeb..b4c0cf02972 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -69,6 +69,12 @@ be performed as well.
>         the `--max-cruft-size` option of linkgit:git-repack[1] for
>         more.
>
> +--expire-to=<dir>::
> +       When packing unreachable objects into a cruft pack, write a cruft
> +       pack containing pruned objects (if any) to the directory `<dir>`.
> +       See the `--expire-to` option of linkgit:git-repack[1] for
> +       more.
> +
>  --prune=<date>::
>         Prune loose objects older than date (default is 2 weeks ago,
>         overridable by the config variable `gc.pruneExpire`).
> diff --git a/builtin/gc.c b/builtin/gc.c
> index d52735354c9..77904694c9f 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -136,6 +136,7 @@ struct gc_config {
>         char *prune_worktrees_expire;
>         char *repack_filter;
>         char *repack_filter_to;
> +       char *repack_expire_to;
>         unsigned long big_pack_threshold;
>         unsigned long max_delta_cache_size;
>  };
> @@ -441,6 +442,8 @@ static void add_repack_all_option(struct gc_config *cfg,
>                 if (cfg->max_cruft_size)
>                         strvec_pushf(&repack, "--max-cruft-size=%lu",
>                                      cfg->max_cruft_size);
> +               if (cfg->repack_expire_to)
> +                       strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
>         } else {
>                 strvec_push(&repack, "-A");
>                 if (cfg->prune_expire)
> @@ -675,7 +678,6 @@ struct repository *repo UNUSED)
>         const char *prune_expire_sentinel = "sentinel";
>         const char *prune_expire_arg = prune_expire_sentinel;
>         int ret;
> -
>         struct option builtin_gc_options[] = {
>                 OPT__QUIET(&quiet, N_("suppress progress reporting")),
>                 { OPTION_STRING, 0, "prune", &prune_expire_arg, N_("date"),
> @@ -694,6 +696,8 @@ struct repository *repo UNUSED)
>                            PARSE_OPT_NOCOMPLETE),
>                 OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack,
>                          N_("repack all other packs except the largest pack")),
> +               OPT_STRING(0, "expire-to", &cfg.repack_expire_to, N_("dir"),
> +                          N_("pack prefix to store a pack containing pruned objects")),
>                 OPT_END()
>         };
>
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index ee074b99b70..d4b0653a9b7 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -339,6 +339,12 @@ test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
>         test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
>  '
>
> +test_expect_success '--expire-to sets appropriate repack options' '
> +       mkdir expired &&
> +       GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to=./expired/pack &&
> +       test_subcommand $cruft_max_size_opts --expire-to=./expired/pack <trace2.txt
> +'
> +
>  run_and_wait_for_gc () {
>         # We read stdout from gc for the side effect of waiting until the
>         # background gc process exits, closing its fd 9.  Furthermore, the
> --
> gitgitgadget
>

Hi Jeff King, could you take a look at this patch when you have time?
I would be very grateful.

ZheNing Hu

On the Git mailing list, ZheNing Hu wrote (reply to this):

This patch has been sitting for weeks with no review. Does anyone want
to help take a look?

@@ -432,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
static void add_repack_all_option(struct gc_config *cfg,

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Dec 31, 2024 at 02:18:33AM +0000, ZheNing Hu via GitGitGadget wrote:

> diff --git a/builtin/gc.c b/builtin/gc.c
> index 77904694c9f..8656e1caff0 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -433,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
>  static void add_repack_all_option(struct gc_config *cfg,
>  				  struct string_list *keep_pack)
>  {
> -	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
> +	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
> +		&& !(cfg->cruft_packs && cfg->repack_expire_to))
>  		strvec_push(&repack, "-a");

I expected to see a mention of repack_expire_to here, but not
cfg->cruft_packs. These two are AND-ed together so we are only disabling
"repack -a" when both options ("--expire-to" and "--cruft") are passed.
Can we use --expire-to without cruft? I.e., what should happen with:

  git gc --expire-to=some-path --prune=now --no-cruft

Looking at the underlying git-repack, it seems that we only respect
--expire-to at all when used with "--cruft", and don't otherwise
consider it. Which is what the manpage says ("Only useful with --cruft
-d").

But if we look at this proposed patch for example:

  https://lore.kernel.org/git/48438876fb42a889110e100a6c42ca84e93aac49.1733011259.git.me@ttaylorr.com/

then it is expanding how --expire-to is used during the pruning step.
OTOH, I think the way your patch 1 is structured means that we'd always
pass --expire-to to git-repack anyway, and I _think_ even with the patch
linked above that "repack -a -d --expire-to=whatever" would do the right
thing.

In which case the problem really is the combination of cruft packs and
expire-to. Just cruft packs by themselves do not need to override using
"-a" for "--prune=now" because we know that any such cruft pack would be
empty.

So I think this logic is correct. Taylor might have more thoughts,
though (and ideas on whether he intends to revisit that earlier patch).

I do think this change should probably be done as part of patch 1,
rather than introducing a buggy state and then fixing it in patch 2.

-Peff
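
The condition discussed above can be modeled in isolation. The following is a minimal sketch, not the code in `builtin/gc.c`: `struct gc_model`, `enum repack_mode`, `choose_repack_mode()` and `forwards_expire_to()` are hypothetical names introduced here for illustration, and only the three fields the condition reads are represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, trimmed-down stand-in for the fields of struct gc_config
 * that the condition reads. This is not the real definition.
 */
struct gc_model {
	const char *prune_expire;     /* e.g. "now", "2.weeks.ago", or NULL */
	bool cruft_packs;             /* true for --cruft, false for --no-cruft */
	const char *repack_expire_to; /* value of --expire-to, or NULL */
};

/* Which top-level repack mode the branches in the patch would select. */
enum repack_mode {
	REPACK_SMALL_A, /* "repack -a": unreachable objects are dropped */
	REPACK_CRUFT,   /* "repack --cruft" */
	REPACK_BIG_A    /* "repack -A" */
};

static enum repack_mode choose_repack_mode(const struct gc_model *cfg)
{
	assert(cfg != NULL);

	bool prune_now = cfg->prune_expire && !strcmp(cfg->prune_expire, "now");
	/* Only the combination of --cruft and --expire-to disables "-a". */
	bool keep_expired = cfg->cruft_packs && cfg->repack_expire_to;

	if (prune_now && !keep_expired)
		return REPACK_SMALL_A;
	if (cfg->cruft_packs)
		return REPACK_CRUFT;
	return REPACK_BIG_A;
}

/* --expire-to is only forwarded from inside the cruft branch. */
static bool forwards_expire_to(const struct gc_model *cfg)
{
	return choose_repack_mode(cfg) == REPACK_CRUFT &&
	       cfg->repack_expire_to != NULL;
}
```

Under this model, `--prune=now --no-cruft --expire-to=some-path` still selects the `-a` mode and never forwards `--expire-to`, which matches the reading that the option only matters together with `--cruft`.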

On the Git mailing list, ZheNing Hu wrote (reply to this):

Jeff King <[email protected]> wrote on Mon, Jan 13, 2025 at 17:17:
>
> On Tue, Dec 31, 2024 at 02:18:33AM +0000, ZheNing Hu via GitGitGadget wrote:
>
> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index 77904694c9f..8656e1caff0 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
> > @@ -433,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
> >  static void add_repack_all_option(struct gc_config *cfg,
> >                                 struct string_list *keep_pack)
> >  {
> > -     if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
> > +     if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
> > +             && !(cfg->cruft_packs && cfg->repack_expire_to))
> >               strvec_push(&repack, "-a");
>
> I expected to see a mention of repack_expire_to here, but not
> cfg->cruft_packs. These two are AND-ed together so we are only disabling
> "repack -a" when both options ("--expire-to" and "--cruft") are passed.
> Can we use --expire-to without cruft? I.e., what should happen with:
>
>   git gc --expire-to=some-path --prune=now --no-cruft
>
> Looking at the underlying git-repack, it seems that we only respect
> --expire-to at all when used with "--cruft", and don't otherwise
> consider it. Which is what the manpage says ("Only useful with --cruft
> -d").
>

Yes, this is the current state of git-repack. The --expire-to option can
only be used with --cruft, which is why I use cruft_packs && repack_expire_to
as a double safeguard.

When using --no-cruft, the --expire-to option becomes irrelevant, so
leaving `git gc --prune=now` as it is in that case (passing -a to
repack) seems reasonable.

> But if we look at this proposed patch for example:
>
>   https://lore.kernel.org/git/48438876fb42a889110e100a6c42ca84e93aac49.1733011259.git.me@ttaylorr.com/
>
> then it is expanding how --expire-to is used during the pruning step.
> OTOH, I think the way your patch 1 is structured means that we'd always
> pass --expire-to to git-repack anyway, and I _think_ even with the patch
> linked above that "repack -a -d --expire-to=whatever" would do the right
> thing.
>

I've taken a look at the patch, and I believe Taylor's changes are primarily
aimed at extending the --expire-to functionality within the --cruft feature,
rather than expecting --expire-to to be used on its own.

> In which case the problem really is the combination of cruft packs and
> expire-to. Just cruft packs by themselves do not need to override using
> "-a" for "--prune=now" because we know that any such cruft pack would be
> empty.
>
> So I think this logic is correct. Taylor might have more thoughts,
> though (and ideas on whether he intends to revisit that earlier patch).
>
> I do think this change should probably be done as part of patch 1,
> rather than introducing a buggy state and then fixing it in patch 2.
>

Yes, I agree with that, and perhaps a single patch will suffice.

> -Peff

- ZheNing Hu

@adlternative (Author)

/submit

gitgitgadget bot commented Jan 16, 2025

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1843/adlternative/zh/gc-expire-to-v3

To fetch this version to local tag pr-1843/adlternative/zh/gc-expire-to-v3:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1843/adlternative/zh/gc-expire-to-v3

gitgitgadget bot commented Jan 16, 2025

On the Git mailing list, Junio C Hamano wrote (reply to this):

"ZheNing Hu via GitGitGadget" <[email protected]> writes:

> From: ZheNing Hu <[email protected]>
>
> This commit extends the functionality of `git gc`
> by adding a new option, `--expire-to=<dir>`. Previously,
> this feature was implemented in `git repack` (see 91badeb),
> allowing users to specify a directory where unreachable and
> expired cruft packs are stored during garbage collection.
> However, users had to run `git repack --cruft --expire-to=<dir>`
> followed by `git prune` to achieve similar results within `git gc`.
>
> By introducing `--expire-to=<dir>` directly into `git gc`,
> we simplify the process for users who wish to manage their
> repository's cleanup more efficiently. This change involves
> passing the `--expire-to=<dir>` parameter through to `git repack`,
> making it easier for users to set up a backup location for cruft
> packs that will be pruned.

Today I do not have enough time to do my usual commit log message
critique.  Please use "git show -s --format=reference" when
referring to an earlier commit.

> Note: When git-gc is used with both `--cruft` and `--expire-to`,
> it does not pass `-a` to git-repack to delete all unreachable
> objects as `git gc --prune=now` originally did. Instead, it
> generates a cruft pack in the directory specified by expire-to.

Is this less important than "we added --expire-to to gc that is
passed down to underlying repack" in the previous paragraph?

Not removing the unreachables too early with "repack -a" is an
essential part of the design of this new feature to allow us not to
lose the cruft objects, so I was a bit surprised that this was
described as a "Note:".

> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 370e22faaeb..b4c0cf02972 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -69,6 +69,12 @@ be performed as well.
>  	the `--max-cruft-size` option of linkgit:git-repack[1] for
>  	more.
>  
> +--expire-to=<dir>::
> +	When packing unreachable objects into a cruft pack, write a cruft
> +	pack containing pruned objects (if any) to the directory `<dir>`.
> +	See the `--expire-to` option of linkgit:git-repack[1] for
> +	more.

Does "When packing unreachable objects into a cruft pack" mean that
this option is only meaningful with "--cruft"?  As "--cruft" is on
by default, is it an error to pass "--no-cruft" when you use this
option?

"for more" -> "for more information" or something?

> diff --git a/builtin/gc.c b/builtin/gc.c
> index d52735354c9..8656e1caff0 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -136,6 +136,7 @@ struct gc_config {
>  	char *prune_worktrees_expire;
>  	char *repack_filter;
>  	char *repack_filter_to;
> +	char *repack_expire_to;
>  	unsigned long big_pack_threshold;
>  	unsigned long max_delta_cache_size;
>  };
> @@ -432,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
>  static void add_repack_all_option(struct gc_config *cfg,
>  				  struct string_list *keep_pack)
>  {
> -	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
> +	if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
> +		&& !(cfg->cruft_packs && cfg->repack_expire_to))
>  		strvec_push(&repack, "-a");

Hmph.  When "--expire-to=<there>" is given, we are dropping these
unreachable objects right away, but we said "--no-cruft", then we
say "repack -a".  If we have both "--cruft" and "--expire-to=<there>",
then ...

>  	else if (cfg->cruft_packs) {
>  		strvec_push(&repack, "--cruft");
> @@ -441,6 +443,8 @@ static void add_repack_all_option(struct gc_config *cfg,
>  		if (cfg->max_cruft_size)
>  			strvec_pushf(&repack, "--max-cruft-size=%lu",
>  				     cfg->max_cruft_size);
> +		if (cfg->repack_expire_to)
> +			strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);

... we do the usual "repack --cruft --expire-to=<there>" in the next
block.

> @@ -675,7 +679,6 @@ struct repository *repo UNUSED)
>  	const char *prune_expire_sentinel = "sentinel";
>  	const char *prune_expire_arg = prune_expire_sentinel;
>  	int ret;
> -
>  	struct option builtin_gc_options[] = {
>  		OPT__QUIET(&quiet, N_("suppress progress reporting")),
>  		{ OPTION_STRING, 0, "prune", &prune_expire_arg, N_("date"),

OK.

> @@ -694,6 +697,8 @@ struct repository *repo UNUSED)
>  			   PARSE_OPT_NOCOMPLETE),
>  		OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack,
>  			 N_("repack all other packs except the largest pack")),
> +		OPT_STRING(0, "expire-to", &cfg.repack_expire_to, N_("dir"),
> +			   N_("pack prefix to store a pack containing pruned objects")),
>  		OPT_END()
>  	};

OK.

> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index ee074b99b70..d4b0653a9b7 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -339,6 +339,12 @@ test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
>  	test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
>  '
>  
> +test_expect_success '--expire-to sets appropriate repack options' '
> +	mkdir expired &&
> +	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to=./expired/pack &&
> +	test_subcommand $cruft_max_size_opts --expire-to=./expired/pack <trace2.txt
> +'

As "--cruft" is on by default, the command line does not have to
have it, but being explicit is good.

Should we also see what happens when "--no-cruft" is given?

Thanks.
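
To make the `--no-cruft` question concrete, here is a small standalone sketch of the argument list that the reviewed branches would hand to `git repack` for each combination of options. It is an illustration under stated assumptions, not the real implementation: `struct repack_args`, `push_arg()`, `build_repack_args()` and `has_arg_prefix()` are hypothetical helpers standing in for Git's `strvec` calls, and other options the real function adds (such as `--max-cruft-size`) are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 4
#define ARG_LEN 128

/* Hypothetical fixed-size replacement for Git's strvec, for illustration. */
struct repack_args {
	char argv[MAX_ARGS][ARG_LEN];
	int argc;
};

/* Append "prefix" followed by "value" (value may be NULL for plain flags). */
static void push_arg(struct repack_args *out, const char *prefix,
		     const char *value)
{
	assert(out->argc < MAX_ARGS);
	snprintf(out->argv[out->argc++], ARG_LEN, "%s%s",
		 prefix, value ? value : "");
}

/*
 * Mirror the three branches shown in the quoted diff:
 *   prune_now  - nonzero when --prune=now was given
 *   cruft      - nonzero for --cruft, zero for --no-cruft
 *   expire_to  - value of --expire-to, or NULL when absent
 */
static void build_repack_args(int prune_now, int cruft, const char *expire_to,
			      struct repack_args *out)
{
	out->argc = 0;
	if (prune_now && !(cruft && expire_to)) {
		push_arg(out, "-a", NULL);
	} else if (cruft) {
		push_arg(out, "--cruft", NULL);
		if (expire_to)
			push_arg(out, "--expire-to=", expire_to);
	} else {
		push_arg(out, "-A", NULL);
	}
}

/* Return 1 if any collected argument starts with the given prefix. */
static int has_arg_prefix(const struct repack_args *args, const char *prefix)
{
	for (int i = 0; i < args->argc; i++)
		if (!strncmp(args->argv[i], prefix, strlen(prefix)))
			return 1;
	return 0;
}
```

In this sketch, calling `build_repack_args(0, 0, "./expired/pack", &args)` yields only `-A`, so a `--no-cruft` test could check that no `--expire-to=` argument reaches the subcommand.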

gitgitgadget bot commented Jan 16, 2025

This patch series was integrated into seen via git@aa1682c.

@gitgitgadget gitgitgadget bot added the seen label Jan 16, 2025

gitgitgadget bot commented Jan 17, 2025

This branch is now known as zh/gc-expire-to.

gitgitgadget bot commented Jan 17, 2025

This patch series was integrated into seen via git@9984f53.

gitgitgadget bot commented Jan 18, 2025

This patch series was integrated into seen via git@4ae0c92.

gitgitgadget bot commented Jan 18, 2025

There was a status update in the "New Topics" section about the branch zh/gc-expire-to on the Git mailing list:

"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".

Needs review.
source: <[email protected]>

gitgitgadget bot commented Jan 21, 2025

This patch series was integrated into seen via git@d36a896.

gitgitgadget bot commented Jan 22, 2025

This patch series was integrated into seen via git@c716f47.

gitgitgadget bot commented Jan 22, 2025

This patch series was integrated into seen via git@7f3e7d1.

gitgitgadget bot commented Jan 22, 2025

There was a status update in the "Cooking" section about the branch zh/gc-expire-to on the Git mailing list:

"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".

Needs review.
source: <[email protected]>

gitgitgadget bot commented Jan 23, 2025

On the Git mailing list, ZheNing Hu wrote (reply to this):

Junio C Hamano <[email protected]> wrote on Fri, Jan 17, 2025 at 02:23:
>
> "ZheNing Hu via GitGitGadget" <[email protected]> writes:
>
> > From: ZheNing Hu <[email protected]>
> >
> > This commit extends the functionality of `git gc`
> > by adding a new option, `--expire-to=<dir>`. Previously,
> > this feature was implemented in `git repack` (see 91badeb),
> > allowing users to specify a directory where unreachable and
> > expired cruft packs are stored during garbage collection.
> > However, users had to run `git repack --cruft --expire-to=<dir>`
> > followed by `git prune` to achieve similar results within `git gc`.
> >
> > By introducing `--expire-to=<dir>` directly into `git gc`,
> > we simplify the process for users who wish to manage their
> > repository's cleanup more efficiently. This change involves
> > passing the `--expire-to=<dir>` parameter through to `git repack`,
> > making it easier for users to set up a backup location for cruft
> > packs that will be pruned.
>
> Today I do not have enough time to do my usual commit log message
> critique.  Please use "git show -s --format=reference" when
> referring to an earlier commit.
>

Okay, I will change to using this format.

> > Note: When git-gc is used with both `--cruft` and `--expire-to`,
> > it does not pass `-a` to git-repack to delete all unreachable
> > objects as `git gc --prune=now` originally did. Instead, it
> > generates a cruft pack in the directory specified by expire-to.
>
> Is this less important than "we added --expire-to to gc that is
> passed down to underlying repack" in the previous paragraph?
>

I had thought that adding --expire-to to gc was key in this patch,
but the change to the implementation of --prune=now should
indeed be mentioned more.

> Not removing the unreachables too early with "repack -a" is an
> essential part of the design of this new feature to allow us not to
> lose the cruft objects, so I was a bit surprised that this was
> described as a "Note:".
>

You're right. This section shouldn't use a note; it should provide
a more detailed explanation instead.

> > diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> > index 370e22faaeb..b4c0cf02972 100644
> > --- a/Documentation/git-gc.txt
> > +++ b/Documentation/git-gc.txt
> > @@ -69,6 +69,12 @@ be performed as well.
> >       the `--max-cruft-size` option of linkgit:git-repack[1] for
> >       more.
> >
> > +--expire-to=<dir>::
> > +     When packing unreachable objects into a cruft pack, write a cruft
> > +     pack containing pruned objects (if any) to the directory `<dir>`.
> > +     See the `--expire-to` option of linkgit:git-repack[1] for
> > +     more.
>
> Does "When packing unreachable objects into a cruft pack" mean that
> this option is only meaningful with "--cruft"?  As "--cruft" is on
> by default, is it an error to pass "--no-cruft" when you use this
> option?
>

It (--expire-to) can currently only be used together with --cruft.
Using --no-cruft together with --expire-to will not result in an error,
but --expire-to will not take effect either.

I should mention in the documentation that --expire-to and --cruft
need to be used together; otherwise --expire-to will not
have any effect.

> "for more" -> "for more information" or something?
>

OK, "for more information".

> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index d52735354c9..8656e1caff0 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
> > @@ -136,6 +136,7 @@ struct gc_config {
> >       char *prune_worktrees_expire;
> >       char *repack_filter;
> >       char *repack_filter_to;
> > +     char *repack_expire_to;
> >       unsigned long big_pack_threshold;
> >       unsigned long max_delta_cache_size;
> >  };
> > @@ -432,7 +433,8 @@ static int keep_one_pack(struct string_list_item *item, void *data UNUSED)
> >  static void add_repack_all_option(struct gc_config *cfg,
> >                                 struct string_list *keep_pack)
> >  {
> > -     if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now"))
> > +     if (cfg->prune_expire && !strcmp(cfg->prune_expire, "now")
> > +             && !(cfg->cruft_packs && cfg->repack_expire_to))
> >               strvec_push(&repack, "-a");
>
> Hmph.  When "--expire-to=<there>" is given, we are dropping these
> unreachable objects right away, but we said "--no-cruft", then we
> say "repack -a".  If we have both "--cruft" and "--expire-to=<there>",
> then ...
>
> >       else if (cfg->cruft_packs) {
> >               strvec_push(&repack, "--cruft");
> > @@ -441,6 +443,8 @@ static void add_repack_all_option(struct gc_config *cfg,
> >               if (cfg->max_cruft_size)
> >                       strvec_pushf(&repack, "--max-cruft-size=%lu",
> >                                    cfg->max_cruft_size);
> > +             if (cfg->repack_expire_to)
> > +                     strvec_pushf(&repack, "--expire-to=%s", cfg->repack_expire_to);
>
> ... we do the usual "repack --cruft --expire-to=<there>" in the next
> block.
>
> > @@ -675,7 +679,6 @@ struct repository *repo UNUSED)
> >       const char *prune_expire_sentinel = "sentinel";
> >       const char *prune_expire_arg = prune_expire_sentinel;
> >       int ret;
> > -
> >       struct option builtin_gc_options[] = {
> >               OPT__QUIET(&quiet, N_("suppress progress reporting")),
> >               { OPTION_STRING, 0, "prune", &prune_expire_arg, N_("date"),
>
> OK.
>
> > @@ -694,6 +697,8 @@ struct repository *repo UNUSED)
> >                          PARSE_OPT_NOCOMPLETE),
> >               OPT_BOOL(0, "keep-largest-pack", &keep_largest_pack,
> >                        N_("repack all other packs except the largest pack")),
> > +             OPT_STRING(0, "expire-to", &cfg.repack_expire_to, N_("dir"),
> > +                        N_("pack prefix to store a pack containing pruned objects")),
> >               OPT_END()
> >       };
>
> OK.
>
> > diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> > index ee074b99b70..d4b0653a9b7 100755
> > --- a/t/t6500-gc.sh
> > +++ b/t/t6500-gc.sh
> > @@ -339,6 +339,12 @@ test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
> >       test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
> >  '
> >
> > +test_expect_success '--expire-to sets appropriate repack options' '
> > +     mkdir expired &&
> > +     GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size gc --cruft --expire-to=./expired/pack &&
> > +     test_subcommand $cruft_max_size_opts --expire-to=./expired/pack <trace2.txt
> > +'
>
> As "--cruft" is on by default, the command line does not have to
> have it, but being explicit is good.
>
> Should we also see what happens when "--no-cruft" is given?
>

With --no-cruft, --expire-to has no effect and gc will still run
repack -a. I will add corresponding tests.
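
To make the combinations discussed above easier to follow, here is a minimal standalone sketch of the branch selection in add_repack_all_option() as changed by this patch. It is not the real builtin/gc.c code: `struct gc_choice` and `repack_all_options()` are hypothetical names introduced for illustration, the options are joined into one string instead of being pushed onto a strvec, `--max-cruft-size` and the keep-pack handling are left out, and the final `-A` branch is not part of the quoted hunk, so it is an assumption about the surrounding code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the real struct gc_config. */
struct gc_choice {
	const char *prune_expire;     /* e.g. "now", "2.weeks.ago", or NULL */
	int cruft_packs;              /* 1 for --cruft (the default), 0 for --no-cruft */
	const char *repack_expire_to; /* value given to --expire-to, or NULL */
};

/*
 * Write the repack options that gc would choose into buf (space separated)
 * and return buf. Mirrors the if/else-if chain from the patch.
 */
const char *repack_all_options(const struct gc_choice *cfg, char *buf, size_t len)
{
	int prune_now = cfg->prune_expire && !strcmp(cfg->prune_expire, "now");
	int keep_expired = cfg->cruft_packs && cfg->repack_expire_to;

	if (prune_now && !keep_expired) {
		/* Unreachable objects are dropped right away. */
		snprintf(buf, len, "-a");
	} else if (cfg->cruft_packs) {
		snprintf(buf, len, "--cruft");
		if (cfg->prune_expire)
			snprintf(buf + strlen(buf), len - strlen(buf),
				 " --cruft-expiration=%s", cfg->prune_expire);
		if (cfg->repack_expire_to)
			snprintf(buf + strlen(buf), len - strlen(buf),
				 " --expire-to=%s", cfg->repack_expire_to);
	} else {
		/* Assumed non-cruft fallback; not shown in the quoted diff. */
		snprintf(buf, len, "-A");
		if (cfg->prune_expire)
			snprintf(buf + strlen(buf), len - strlen(buf),
				 " --unpack-unreachable=%s", cfg->prune_expire);
	}
	return buf;
}
```

Under these assumptions, `--prune=now --cruft --expire-to=./expired/pack` selects the cruft branch with `--cruft-expiration=now`, while `--prune=now --no-cruft --expire-to=./expired/pack` still yields a plain `-a`, which is the case the extra tests would cover.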

> Thanks.

Thanks.

gitgitgadget bot commented Jan 23, 2025

This patch series was integrated into seen via git@77d4d83.

This commit extends `git gc` with a new option, `--expire-to=<dir>`.
The underlying feature was implemented in 91badeb (builtin/repack.c:
implement `--expire-to` for storing pruned objects, 2022-10-24),
which allows users to specify a directory in which a cruft pack
containing the pruned objects is stored. Until now, however, users
had to run `git repack --cruft --expire-to=<dir>` followed by
`git prune` to get a comparable result, because `git gc` did not
expose the option.

By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.

Originally, `git gc --prune=now` deleted all unreachable objects by
passing `-a` to `git repack`. When both `--cruft` and `--expire-to`
are in effect, this behavior has to change: instead of being deleted,
the unreachable objects should be collected into a cruft pack in the
specified directory. Therefore, in that case we do not pass `-a` to
the repack command but instead pass `--cruft`, `--expire-to`, and
`--cruft-expiration=now`.

Signed-off-by: ZheNing Hu <[email protected]>
@adlternative
Author

/submit

gitgitgadget bot commented Jan 24, 2025

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1843/adlternative/zh/gc-expire-to-v4

To fetch this version to local tag pr-1843/adlternative/zh/gc-expire-to-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1843/adlternative/zh/gc-expire-to-v4

gitgitgadget bot commented Jan 24, 2025

This patch series was integrated into seen via git@099b60c.

gitgitgadget bot commented Jan 24, 2025

There was a status update in the "Cooking" section about the branch zh/gc-expire-to on the Git mailing list:

"git gc" learned the "--expire-to" option and passes it down to
underlying "git repack".

Needs review.
source: <[email protected]>
