-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move vendored modules into a vendor
directory
#129222
Comments
Android doesn't do anything special with these libraries, so it shouldn't be affected. |
Gentoo used to remove these libraries entirely, but we don't do that anymore. Which reminds me that I was supposed to fix the issue that prevented us from doing so. Also, I was supposed to see if we can remove them. Days are too short. |
s/nedbat/ned-deily/ Such a change should have no impact on macOS builds, macOS installer builds, or iOS builds. |
Oups, sorry for mistagging you and for the miswording (forgive my lack of macOS knowledge). |
Er... this is funny but... who should I ask for Ubuntu actually? |
Bikeshedding: call it For example, pip and setuptools have |
Mimalloc has substantial modifications from upstream. Daan Leijen was working on changes to mimalloc to support the features needed by free-threaded CPython upstream, but it's probably going to take a while. Even then, it will require CPython specific build configuration -- it's never going to be something where you can just drop in a shared library provided by the OS. |
@doko42 had/has been the Debian/Ubuntu contact |
I think the point of |
Thank you, I wasn't aware of it! I've removed from the list of possible vendored modules.
I don't mind. At least, vendoring modules could be included last (or first) and be well-separated from other includes thanks to their name starting with |
It may still make sense to include it in the vendored modules directory. I don't have a strong opinion one way or the other. It depends on what you are primarily trying to communicate with the vendored directory. For example, it's third-party code in terms of license, attribution, and code style. But Linux distributions should probably treat it more like internal CPython code. For example, they should not substitute their own mimalloc package. |
Thanks for the ping! I think there would be some minor pain downstream because we have a weird custom build system for the extension modules, but it's fine. What about changes like #127932? |
For me
I'm not sure I understand your question but I'll try to reply with what I understood. If the upstream has issues, we can't really change the vendored copy even if we want to because we have some SBOM checks. So we can't just "hotfix" HACL* when we want to (we need to wait for upstream updates). So whether we move it to |
Historically, the cpython copies of some of the long-time "vendored" modules have been patched and modified in the cpython repo without regard to upstream. That has lead, of course, to maintenance issues when new upstream versions are released. I think you need to clarify here whether your intent is to move to a model of strictly vanilla vendored versions with separate necessary patch files in the repo so that SBOMs for the vendored versions could be used. That's potentially a much bigger change, i.e. reviewing all the cpython copies and extracting our changes into separate patches, updating the devguide to explain the process, etc etc, and one that needs to be carefully done. But it would likely be of long-term benefit to do so. |
On Windows (and it's open to other platforms) we already debundle sources into https://github.com/python/cpython-source-deps and pre-built binaries into https://github.com/python/cpython-bin-deps. They are referenced by build script, rather than submodules, but at least it means our only build-time network access is to the same service that holds the rest of our code (i.e. if it's down, our build is failing already). We do also patch these from time to time, but usually in individual commits to preserve the history. Are you proposing to move them there? Or just to another directory in the current source tree? |
My point is that we already did apply a patch to HACL* — is the idea here that we would revert that change? I think the subsequent comment from @ned-deily (at #129222 (comment)) captures the broader concern. |
That's my main intent. Namely, try to separate what's being vendored (and can be updated if the vendor's upstream has changed) from what's being "our" code. I guess we can also allow separate patches to be applied on our vendored copies if needs arise (e.g., if the vendored copy has issues and the upstream takes a long time for making the change or does not want to make the changes).
I'm just proposing to moving it in another directory.
Err, no? we did apply a patch to HACL* using the upstream HACL* or am I wrong? I mean, what we did is just update the bundled HACL* library once the necessary patch has been written out there or am I missing something? |
Cloning the cpython git repo already requires an internet connection. Doing a recursive clone doesn't require more of one. The release tarballs should absolutely include the contents of a submodule as well -- in fact you have to since you can't later clone the submodules, Internet connection or not, since there is no submodule metadata. The advantage of using submodules is that it becomes very very very clear what is vendored versus what is patched. The Meson build system does something similar using "wrap files" built into the core build system, that contain an URL download location for the upstream release tarballs and a checksum, which works even without git metadata. This also means that Meson can distribute your choice of "fat" release tarballs with vendored dependency fallbacks, or "thin" release tarballs with just the wrap definition, and at build time it will download and checksum the vendored dependency only if you don't have a system copy, or conditional on configure options such as "force vendor dependencies for standalone builds" or "forbid vendor dependencies for distro policy". The code to handle this robustly is not trivial, which is why we built it into the core so that it only has to be maintained in a single place. You probably want to stick with submodules. |
Moving the vendored projects to an explicit directory would not impact Fedora/RHEL. We would need to change the paths for those that we rm, but that should be it. |
If you're bikeshedding the name, github code search already excludes directorys named
Ideal: We'd make a top level Doing that would be good for project health IMNSHO rather than scattering stuff around as we do today. re-packagers should appreciate it as well as they each have their own ideas of what they do and don't want to include vs pull from external sources and it makes it more obvious what we've got that could be considered for that. Non-goal 1: Vendored things each need their own management in terms of how we get them, update them, and patch them if necessary for our repo. Don't try for a single approach there, it won't fit everything. There may be some commonalities on how that gets done but that kind of thing can be worked out on a case by case basis later. Non-goal 2: Vendored things in our tree do NOT need to match upstream. applying patches is part of vendoring. the important thing is to track those patches and automate recreating such changes when updates are needed rather than expecting the next person doing it to fully understand. |
BTW, the reason |
... or |
Oh... then maybe we can use that name :') If it's supported by Github, I would vote for 'third-party' because I don't need to press shift for
Yes, that's what I think as well. I don't think we can have a unified approach.
I'm ok with this as well. I don't mind having each vendored module with its own patching system (I mean, we're already doing it somehow). Now, AFAICT, there doesn't seem to many issues about just moving files which was my primary concern here. I'm no expert in redistribution but since I raised the issue on Discord, I thought it was also good to actually first ask whether this is something that would be interesting for CPython in the long term. And I'd like to thank everyone involved in this (constructive) discussion. |
By the way, Meson explicitly mandates that all "subprojects" (thirdparty code with their own build system that can be autoconfigured) must exist in a single top-level directory (default: |
Feature or enhancement
Proposal:
In CPython, we have some vendored libraries namely
libmpdec
,hacl
, andexpat
. Those libraries are meant to be clone of their upstream (think of them as git submodules) and many times have I been exluding them from code search as they usually have non-CPython code involved.We have a plan to remove vendored
libmpdec
(#115119) and we're almost here. In the long term, we could also try to unvendorexpat
(probably nothacl
as it's used to implement hash functions fallbacks when OpenSSL is not present).Affected modules:
libmpdec
expat
_hacl
Some advantages:
hacl
).Python/Programs
splitSome inconvenients:
We can start with some modules that should be kept untouched such as HACL* sources and progressively move the others to reduce the work and conflicts. It doesn't need to happen in one go (for instance, we may well ignore the
libmpdec
case if we manage to make it unvendored before). I don't think we have much open PRs withexpat
(by the way, we could have a refresh script forexpat
to ease maintenance like #126623).Now, the question is how this could affect downstream redistributors. I'm asking first on Github since I don't know whether they are active on Discourse or not. If everyone tells "it's fine", then I'll ask on Discourse to see if there are more redistributors that could be concerned.
cc
I don't know how moving
mimalloc
related stuff would affect the free-threaded build in particular, so I'm also going to ask @kumaraditya303 and @colesbury about it. EDIT: Turns out it's a no go formimalloc
as there is some CPython dedicated stuff, so we can put it out of the list (see #129222 (comment)).For the
hacl
includes, I can take care of it.The text was updated successfully, but these errors were encountered: