Repo Auth and URL Redirects Not Compatible #25068

Open
CauhxMilloy opened this issue Jan 25, 2025 · 0 comments
Labels: team-Configurability, team-Core, team-Rules-API, type: bug, untriaged


Description of the bug:

A repo dep (e.g. http_archive() or similar) supports auth (e.g. via NETRC, auth_patterns, etc). URLs given (e.g. via url, urls, etc) can also support redirects. However, these two mechanisms do not properly function together.

This is caused by the APIs of use_netrc() and download_and_extract() (in bzl) and by the implementation of auth headers in the Bazel runtime. Namely, auth data is not passed or processed per domain (as one would expect from the auth_patterns dict). Instead, the exact URLs are used to determine which headers to apply.

This data transformation (from domain to exact URLs) happens in util.bzl (in use_netrc()). This transformed data (map of exact URL -> pattern) is expected by the auth parameter for ctx.download_and_extract() (e.g. as used in http_archive()).
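To make the transformation concrete, here is a minimal sketch written in the subset of Python that is also valid Starlark. It is a simplified model, not the actual util.bzl code: host_of() and build_auth_map() are hypothetical names, and netrc lookup and <password> substitution are deliberately left out.

```python
def host_of(url):
    # Hypothetical helper: "https://example.com:443/a/b" -> "example.com".
    rest = url.split("://", 1)
    if len(rest) < 2:
        return ""
    return rest[1].split("/")[0].split(":")[0]

def build_auth_map(urls, patterns):
    # Simplified model of the domain -> exact-URL transformation:
    # one entry per *listed* URL whose host has a pattern.
    auth = {}
    for url in urls:
        host = host_of(url)
        if host in patterns:
            auth[url] = {"type": "pattern", "pattern": patterns[host]}
    return auth

auth = build_auth_map(
    ["https://example.com/url/which/returns/302/to/actual/url/file.tar.gz"],
    {"example.com": "Bearer <password>"},
)
```

The resulting dict has the listed URL as its only key. The domain information in auth_patterns is gone by the time the dict is handed to the download call, so any URL that was not listed up front has no entry.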

While redirects are supported in HttpConnector, the headers are applied based on the exact URL (given that is what the dict/map keys are from the input data); see here, here, here, and here. This means that a redirected URL will not get auth headers added when connecting to the new location, likely leading to a 404 (or similar).
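The effect on a redirect can be modeled in a few lines (again in Python that is also valid Starlark). This only illustrates the lookup behavior described above and is not the HttpConnector code. headers_for() is a hypothetical name, and the auth map is written by hand in the shape described earlier.

```python
def headers_for(auth_map, url):
    # Exact-URL lookup: no entry for this URL means no Authorization header.
    entry = auth_map.get(url)
    if entry == None:
        return {}
    return {"Authorization": entry["pattern"]}

ORIGINAL = "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz"
REDIRECTED = "https://example.com/some/url/af123de/for/actual/file"

# Only the URL listed in the rule is present as a key.
auth_map = {ORIGINAL: {"type": "pattern", "pattern": "Bearer <password>"}}

first_hop = headers_for(auth_map, ORIGINAL)     # header is applied
second_hop = headers_for(auth_map, REDIRECTED)  # empty: same host, different URL
```

Both URLs are on example.com, yet only the first request carries the header, because the key is the full URL rather than the host.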

This seems to stem from being tied to the com.google.auth.Credentials API, where getRequestMetadata() takes in the entire URI.

An example repro setup is to use something like:

http_archive(
    name = "some_dep_repo",
    urls = [
        "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz",
    ],
    auth_patterns = {
        "example.com": "Bearer <password>",
    },
)

This results in a 404 (or similar, depending on the host site), because the auth headers are missing after the redirect is followed. This obviously breaks fetching, but it's also really confusing, especially since debugging with curl works as expected.

Technically speaking, there is a work-around where the redirected URL can also be explicitly listed in urls -- but that ignores the whole point of a redirect URL. This would look like:

http_archive(
    name = "some_dep_repo",
    urls = [
        "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz",
        "https://example.com/some/url/af123de/for/actual/file",
    ],
    auth_patterns = {
        "example.com": "Bearer <password>",
    },
)

This work-around ensures that, when https://example.com/url/which/returns/302/to/actual/url/file.tar.gz 302-redirects to https://example.com/some/url/af123de/for/actual/file, the auth headers get applied because the URL is found and mapped as necessary. But, as mentioned, this explicit listing is kinda silly.

I was also able to confirm with Wireshark that the auth headers were missing. To decrypt the TLS traffic, I ran curl with export SSLKEYLOGFILE="${PWD}/sslkeylog.log" and ran Bazel with "--host_jvm_args=-javaagent:${PWD}/extract-tls-secrets-4.0.0.jar=${PWD}/sslkeylog.log" (see https://github.com/neykov/extract-tls-secrets).

Seeing as this is pretty ingrained in the APIs for use_netrc(), download_and_extract(), etc., it's not clear how this could best be addressed in a backwards-compatible way. Perhaps simply having a flag (default false) to fall back to checking only the domain? I figured this should at least be documented.
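As a rough illustration of the flag idea, the sketch below (Python that is also valid Starlark) adds an opt-in host fallback to the exact-URL lookup. Everything here is hypothetical: headers_with_host_fallback(), the fallback_to_host parameter, and host_of() are illustrative names, not existing Bazel APIs or flags.

```python
def host_of(url):
    # Hypothetical helper: "https://example.com:443/a/b" -> "example.com".
    rest = url.split("://", 1)
    if len(rest) < 2:
        return ""
    return rest[1].split("/")[0].split(":")[0]

def headers_with_host_fallback(auth_map, url, fallback_to_host):
    # Exact-URL match first, which is the behavior described in this issue.
    entry = auth_map.get(url)
    # Opt-in fallback: reuse the entry of any listed URL on the same host.
    if entry == None and fallback_to_host:
        target = host_of(url)
        for known_url in auth_map:
            if target != "" and host_of(known_url) == target:
                entry = auth_map[known_url]
                break
    if entry == None:
        return {}
    return {"Authorization": entry["pattern"]}

auth_map = {
    "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz": {
        "type": "pattern",
        "pattern": "Bearer <password>",
    },
}
redirected = "https://example.com/some/url/af123de/for/actual/file"

without_flag = headers_with_host_fallback(auth_map, redirected, False)
with_flag = headers_with_host_fallback(auth_map, redirected, True)
other_host = headers_with_host_fallback(auth_map, "https://cdn.example.org/file", True)
```

With the flag off, the redirected URL still gets no header, so existing behavior is unchanged. With it on, a redirect that stays on example.com gets the same header, while a redirect to a different host still gets nothing, so credentials are not forwarded to other domains.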

Which category does this issue belong to?

Core, Rules API, Configurability

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

As mentioned in the bug description, create an http_archive() that points to a redirecting URL which requires auth.

http_archive(
    name = "some_dep_repo",
    urls = [
        "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz",
    ],
    auth_patterns = {
        "example.com": "Bearer <password>",
    },
)

Then running something like bazel fetch @some_dep_repo//... will result in:

WARNING: Download from https://example.com/url/which/returns/302/to/actual/url/file.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found
ERROR: An error occurred during the fetch of repository 'some_dep_repo':
   Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/aaabbb123123/external/bazel_tools/tools/build_defs/repo/http.bzl", line 132, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://example.com/url/which/returns/302/to/actual/url/file.tar.gz] to /root/.cache/bazel/_bazel_root/aaabbb123123/external/some_dep_repo/temp1111111222222/file.tar.gz: GET returned 404 Not Found
ERROR: /root/some/path/to/my/workspace/WORKSPACE:17:13: fetching http_archive rule //external:some_dep_repo: Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/aaabbb123123/external/bazel_tools/tools/build_defs/repo/http.bzl", line 132, column 45, in _http_archive_impl
                download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error downloading [https://example.com/url/which/returns/302/to/actual/url/file.tar.gz] to /root/.cache/bazel/_bazel_root/aaabbb123123/external/some_dep_repo/temp1111111222222/file.tar.gz: GET returned 404 Not Found

I happened to find this bug when using GitLab's Release links API redirecting to the Markdown uploads API. But this isn't a GitLab issue. It is an HTTP redirect + auth issue in Bazel.

My use case happens to focus on http_archive() (as shown in these examples), but this really affects all download calls with auth.

I tested this with Bazel 6 and Bazel 7.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

tested with release 6.5.0-0 and release 7.4.1-0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

N/A

What's the output of git remote get-url origin; git rev-parse HEAD ?

N/A

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

Looking through Bazel's git history, it seems this has always been an issue (for as long as auth and redirects have been supported for remote repo fetching).

Have you found anything relevant by searching the web?

https://bazel.build/rules/lib/repo/http#http_archive mentions that "Redirections are followed." for url/urls (which is true, but there is not much mention of how this interacts with auth_patterns).

Similar bugs/PRs (but not the same problem):
#14866
#14922

I also obviously found all the various code pointers linked above for how auth data is piped/processed/applied for HTTP calls.

Any other information, logs, or outputs that you want to share?

I don't think there's anything else.. 😅 Feel free to ask or let me know, if there is.

@github-actions bot added the team-Configurability, team-Core, and team-Rules-API labels on Jan 25, 2025