Repo Auth and URL Redirects Not Compatible #25068
Labels
team-Configurability (platforms, toolchains, cquery, select(), config transitions)
team-Core (Skyframe, bazel query, BEP, options parsing, bazelrc)
team-Rules-API (API for writing rules/aspects: providers, runfiles, actions, artifacts)
type: bug
untriaged
Description of the bug:
A repo dep (e.g. `http_archive()` or similar) supports auth (e.g. via `NETRC`, `auth_patterns`, etc.). URLs given (e.g. via `url`, `urls`, etc.) can also support redirects. However, these two mechanisms do not function properly together.

This is caused by the APIs of `use_netrc()` and `download_and_extract()` (in bzl) and by the implementation of auth headers in the Bazel runtime. Namely, data is not passed or processed based on domains (as one would intuit given the `auth_patterns` dict). Instead, the exact URLs are used to determine which headers to apply.

This data transformation (from domain to exact URLs) happens in `utils.bzl` (in `use_netrc()`). The transformed data (a map of exact URL -> pattern) is expected by the `auth` parameter of `ctx.download_and_extract()` (e.g. as used in `http_archive()`).

While redirects are supported in `HttpConnector`, the headers are applied based on the exact URL (given that is what the dict/map keys are from the input data); see here, here, here, and here. This means that a redirected URL will not get auth headers added when connecting to the new location, likely leading to a 404 (or similar).

This seems to be due to being tied to the `com.google.auth.Credentials` API, where `getRequestMetadata()` takes in the entire URI.

An example repro setup is to use something like:
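A minimal sketch of such a setup, assuming a WORKSPACE-style `http_archive()`. The repository name and URL are the ones used elsewhere in this report; the `auth_patterns` value is an illustrative assumption (credentials would come from `.netrc`), and `sha256` and other attributes are omitted for brevity:

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "some_dep_repo",
    # This URL answers with a 302 redirect to the actual file location.
    urls = ["https://example.com/url/which/returns/302/to/actual/url/file.tar.gz"],
    # Assumed pattern: the host key is matched against the URLs listed above,
    # and the password is filled in from the .netrc entry for that host.
    auth_patterns = {"example.com": "Bearer <password>"},
)
```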
This results in a 404 (or similar, depending on the host site), because the auth headers are missing after following the redirect. This obviously prevents fetching, but it is also really confusing, especially as debugging with `curl` works as expected.

Technically speaking, there is a work-around: the redirected URL can also be explicitly listed in `urls`, so that `urls` contains both the redirecting URL and its redirect target. But that ignores the whole point of a redirect URL.

This work-around ensures that, when `https://example.com/url/which/returns/302/to/actual/url/file.tar.gz` 302-redirects to `https://example.com/some/url/af123de/for/actual/file`, the auth headers get applied because the URL is found and mapped as necessary. But, as mentioned, this explicit listing is kinda silly.

I was also able to confirm with Wireshark that the auth headers were missing, using `curl` with `export SSLKEYLOGFILE="${PWD}/sslkeylog.log"` and Bazel with `"--host_jvm_args=-javaagent:${PWD}/extract-tls-secrets-4.0.0.jar=${PWD}/sslkeylog.log"` (see https://github.com/neykov/extract-tls-secrets).

Seeing as this is pretty ingrained in the APIs for `use_netrc()`, `download_and_extract()`, etc., it's not clear how this could best be addressed in a backwards-compatible way. Perhaps simply having a flag (default false) to fall back to checking only the domain? I figured this should at least be documented.

Which category does this issue belong to?
Core, Rules API, Configurability
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
As mentioned in the bug description, create an `http_archive()` which points to a redirecting URL that requires auth.

Then run something like `bazel fetch @some_dep_repo//...`; the fetch fails with the 404 (or similar) described above.

I happened to find this bug when using GitLab's Release links API redirecting to the Markdown uploads API. But this isn't a GitLab issue. It is an HTTP redirect + auth issue in Bazel.

My use case happens to focus on `http_archive()` (as shown in these examples), but this really affects all download calls with auth.

I tested this with Bazel 6 and Bazel 7.
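For comparison, the work-around from the bug description can be sketched as follows. This is only an illustration: both URLs are the ones quoted in the description, while the `auth_patterns` value is an assumption and `sha256` and other attributes are omitted.

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "some_dep_repo",
    urls = [
        # Redirecting URL that the dependency is meant to use.
        "https://example.com/url/which/returns/302/to/actual/url/file.tar.gz",
        # Redirect target, listed only so that an exact-URL auth entry exists for it.
        "https://example.com/some/url/af123de/for/actual/file",
    ],
    # Assumed pattern; the password would be filled in from .netrc.
    auth_patterns = {"example.com": "Bearer <password>"},
)
```

The second entry would have to be updated by hand whenever the redirect target changes, which is why the description calls this listing impractical.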
Which operating system are you running Bazel on?
Linux
What is the output of `bazel info release`?
Tested with `release 6.5.0-0` and `release 7.4.1-0`.
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
N/A
What's the output of `git remote get-url origin; git rev-parse HEAD`?
If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.
Looking through Bazel's git history, it seems this has always been an issue (for as long as auth and redirects have been supported for remote repo fetching).
Have you found anything relevant by searching the web?
https://bazel.build/rules/lib/repo/http#http_archive mentions that "Redirections are followed." for `url`/`urls` (which they are, but there is not much mention of how this interacts with `auth_patterns`).
Similar bugs/PRs (but not the same problem):
#14866
#14922
I also obviously found all the various code pointers linked above for how auth data is piped/processed/applied for HTTP calls.
Any other information, logs, or outputs that you want to share?
I don't think there's anything else.. 😅 Feel free to ask or let me know, if there is.