-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new spec for go
package URLs
#338
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change that affects all software utilizing PURL for Go. Personally, I don't think there's anything fundamentally wrong with pkg:golang
except that the description is outdated, and I'm sure it can be fixed without making this level of breaking change. Maintaining the separation of namespace and name and putting the entire Go package ID into the PURL name makes PURLs difficult for human users to work with.
PURL-TYPES.rst
Outdated
------ | ||
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the field empty or does it imply the go mod proxy? It can't be both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done now, see the new commit (sorry for that).
PURL-TYPES.rst
Outdated
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. | ||
- The ``version`` will be a valid go version or pseudoversion, or empty. | ||
- Additional Build information for binaries can be included as ``qualifiers`` (i.e VCS info, go version info, GoArch/GoOS info etc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional information should be explicitly defined here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactlty. be specific in the spec, so we all are on the same page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL in the new commit (sorry for that).
PURL-TYPES.rst
Outdated
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably specify that it is case sensitive. pkg:golang
incorrectly states that it is not case sensitive and must be lowercased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. this is what the whole #308 is about.
Please don't repeat the mistakes from the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done now, see the new commit (sorry for that).
Thanks for pointing at these! This is essentially a combination of #196 and #308 (with the addition of Say you have a module with the path |
I think the better solution to this problem is that However, "a go module cannot be represented by different PURLs" is not generally the case:
|
I'd disagree. In fact, it is non-breaking, as it adds a completely new purl type. Therefore, no breaking changes are introduced. |
It is breaking because no existing PURL software expects |
this is true to every newly proposed PURL Type :-) this PR is trying to add a new type |
The problem is that this is not a new type. The |
it is not? Could you point me to the existing
I wonder how you come to this conclusion. |
From the PR description:
|
which is a behavioural change in a downstream application. This is out of scope of this spec, and not in our hands at all - we have no authority there.
exactly this paragraph makes it clear: this is a non-breaking change. Causing no breaking change is the whole point of introducing a new purl type, instead of modifying an exising one. |
i don't think so. #308 makes this clear: the existing spec has flaws that require breaking changes to fix them The only way to fix
|
Introducing a new type for an existing type is a breaking change to the PURL ecosystem. Implementations that use If you start writing SBOMs that have Keeping You cannot just fix a PURL type by introducing a new type. Even if PURL libraries are updated to support transparently upgrading the old type into the new type on read, any software that is comparing pre-canonicalized PURL strings will need updates.
What are the flaws that require breaking changes? #308 is about the path being incorrectly converted to lowercase, which is much more easily fixed by just not doing that. |
how? If a tool that produced purls would change it's behaviour by using the new purl-type, where they've used the other one before - this would be a breaking change in that very tool.
So?
A PR tells a story, and the effective patch gets updated along with the discussions on a PR. (PS: I review the content of the PR. and at the time of review, I saw no breaking change.
how comes?
the curerent |
PURL-TYPES.rst
Outdated
``go`` for Go modules: | ||
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``name`` will be the full module path. | |
- The ``name`` is the full module path. It MUST be unmodified, and follow the `Go Module Reference <https://go.dev/ref/mod#go-mod-file-ident>`_. |
this change would close #308
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- - The ``name`` will be the full module path. + - The ``name`` is the full module path. In case of an URL: protocol MUST be lowercased; host-part MUST be lowercased; path-part MUSTbe unmodified, as it is case-sensitive.this change would close #308
I don't think this is correct.
- I don't think it's legal to include a protocol in the module path. Go makes some HTTPS requests to resolve a VCS URL to download the package from (usually this is delegated to the proxy).
- The host part is also part of the case sensitive module path. It should not be lowercased. Uppercase characters are currently forbidden by Go for modules. I don't think it's worthwhile or really correct for the PURL spec to be specifying how to convert an invalid module path into a valid module path, I don't think it's worthwhile for the PURL spec to be specifying how to validate Go module paths, this doesn't cover all the restrictions, and this may cause problems if Go ever changes the restrictions for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re 1: I see. i was wrong there. Adjusted my suggestion for the protocol.
re 2: the host-part is, per URL-spec case-insensitive, and is normalized to lowercase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as Go is concerned, it's usually a host-part but it has additional restrictions and it is case sensitive: https://go.dev/ref/mod#go-mod-file-ident
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I will modify my change-suggestion accordingly. does it fit better, now?
PURL-TYPES.rst
Outdated
|
||
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``subpath`` will represent the package path within a module. | |
- The ``subpath`` is the unmodified package path within a module. |
PURL-TYPES.rst
Outdated
- The ``namespace`` field is empty and implies the go mod proxy. | ||
- The ``name`` will be the full module path. | ||
- The ``subpath`` will represent the package path within a module. | ||
- The ``version`` will be a valid go version or pseudoversion, or empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The ``version`` will be a valid go version or pseudoversion, or empty. | |
- The ``version`` may be a valid go version or pseudoversion, omitted when empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why may
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because version
is optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be done now, see the new commit (sorry for that).
Adding a new type for a new type is much different than adding a new type for an existing type. An old tool not recognizing a truly new type is expected, but an old tool not recognizing Go PURLs anymore because a tool producing the data says that Changing "MUST be lowercased" to "MUST NOT be lowercased" is a much less impactful change than this. From what I've seen, names with uppercase characters are uncommon, and an outdated implementation that is incorrectly lowercasing is still working correctly for all names that do not contain uppercase characters to lowercase. I would even say that on a larger scale it is not a breaking change because:
In both cases, the PURL is still parsed successfully and the meaning of the PURL is unchanged with respect to the current "MUST be lowercased" spec. The only differences would be that the canonical form changes¹ and a new consumer receiving a PURL from an old producer might be more likely to expect that the ID refers to the correct package, but since there is no good way for an outdated consumer to recover the correct ID after an outdated producer lowercases it, any consumer that relies on getting the correct ID (eg to resolve the package files) is likely already broken and not lowercasing the name can only improve the behavior in that situation. This causes the same alignment problems as introducing a ¹ Due to underspecification in the text and tests, I wouldn't trust incoming PURLs to be in the canonical form as my implementation understands it. There are numerous minor differences in which characters are escaped when (and sometimes how), so if you're accepting PURLs from an external source, even if you don't expect user-entered, non-canonical PURLs in that source, you should be canonicalizing those PURLs yourself if your application depends on them all being canonical for the same definition of canonical. |
Go isn't the only ecosystem that has this problem of incorrect name normalization rules in this repo. I'm also aware of:
|
If this is indeed true, then there is something really wrong with PURL: it does not allow for evolution. On the one hand, we cannot add modifications to the existing specification that could introduce breaking changes. On the other hand, we cannot introduce a new type because somehow that is a breaking change as well. So one is pretty much stuck with slight variations of the initial spec. Specs should be allowed to evolve just the way the software does. There should really be a way to add versioning on top of PURL itself. What is being proposed here might in essence be just that for the go spec. |
@maceonthompson Thanks for putting this together! this makes a lot sense, and we have an issue with Go alright. Let me look at the comments in details and come back with my 2 cents! |
@matt-phylum re:
I am not sure that's hte case, but a new type vs. updating the existing type demands some careful thinking :) |
PURL-TYPES.rst
Outdated
|
||
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations | ||
pkg:go/github.com%2Fjmorion%[email protected]#api | ||
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a likely problem with the use of subpath: there is no way to determine where the module ends and the package starts in the general case, is there?
For instance, in the path google.golang.org/genproto/googleapis/api/annotations
how can I determine safely that google.golang.org/genproto
is a module and that googleapis/api/annotations
is a package inside this module? I need either a go proxy lookup or a full filesystem to locate a go.mod/go.sum file, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a way, if the module's code is available to you, to determine from a package import where module path ends and the package path begins by making HTTP requests.
I think the use of the subpath here is good because it puts the burden of determining this on whatever generates the PURL, which is likely aware of Go and either has the module paths or is most likely to be able to find the module path from the full package path. Then if you want to use a tool that checks PURLs against a database of information about modules (eg vulnerabilities), the tool already has all the information it needs. Otherwise, either the tool would need to make external API calls to figure out the module path of the PURL or the database would need to have an entry for every package in the module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to add to @matt-phylum's comment. If a tool is producing a PURL for a Go artifact, then it can use go version
, Debug.BuildInfo, or packages.Load to get information about the package and its corresponding module. The encoding proposed here then makes it clear what the modules and packages are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this is true for the general use case of PURLs. E.g. we do static analysis of binaries and while we can get information about linked packages, there's no indication of which part of the paths correspond to modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if you are looking at a Go symbol from the symbol table, you can get its package. You can get the module correctly by prefix-matching it with module information from debug.Buildinfo
of the binary, unless there are several modules that are prefixes of the package. My inclination is that it should not affect what is proposed here. (Arguably, there should be a way to get module information for a symbol in the binary, just the way one can do it for the source analysis.)
BTW, an elephant in the room is whether the distinction between a namespace and name makes sense not only here, but also in the whole spec, globally. I found myself using a variable with a "namespace/name" substring more often than not.
With this the whole It could have a minimal impact on the spec. |
PURL-TYPES.rst
Outdated
|
||
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations | ||
pkg:go/github.com%2Fjmorion%[email protected]#api | ||
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to include all the buildinfo structure as qualifiers?
If so, this would only apply in a built binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.
If so, all the qualifiers MUST be documented in the type-spec.
currently it reads:
Additional Build information for binaries can be included as
qualifiers
(i.e VCS info, go version info, GoArch/GoOS info etc)
I am afraid this documentation is insufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will expand on this.
PURL-TYPES.rst
Outdated
pkg:go/google.golang.org%2Fgenproto#googleapis/api/annotations | ||
pkg:go/github.com%2Fjmorion%[email protected]#api | ||
pkg:go/golang.org%2Fx%2Fvuln?goversion=1.23.2&vcs=git&vcs_modified=true#cmd/govulncheck | ||
pkg:go/golang.org%2Fx%[email protected]?goversion=1.23.2#cmd/govulncheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the Go module versions always to be prefixed with a v
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A version identifies an immutable snapshot of a module, which may be either a release or a pre-release. Each version starts with the letter v, followed by a semantic version.
-- https://go.dev/ref/mod#versions
version
could also be a pseudo-version -- a git-tag, a git-commit-hash, or something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A pseudoversion is a special kind of version that also starts with a v: https://go.dev/doc/modules/version-numbers#pseudo-version-number
I think for Go modules, including when using the Go module system to refer to something that predates modules, the version always starts with a v. In which case, versions that don't start with v would only be used with older tools like Dep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a version exists, it should be a valid Go module version. It should start with a v
.
Note that hashes should not be permitted, they are not a valid Go version (resolution of hash commits in go tooling is a convenience feature).
@matt-phylum you wrote
Thanks for the links! I tend to think along the same lines, and we can likely salvage the golang type.
I need to pounder this. See my other comment wrt. the namespace/name above in #338 (comment) |
It is fine that PURL spec allows for more flexibility, but there should be only one way the Go module and package information is encoded. This simplifies the work for clients. It is easy to drop qualifiers from a PURL. It is annoying to generate multiple module+package encodings to see if the incoming PURL applies to your code. In general, this proposal tries to make it simple and clear to generate and accurately check against PURLs. It might not be the most user-friendly solution, but tools that render PURLs can easily prettify the output. We believe this is worth the sacrifice. |
Could it also be clarified how standard library packages are to be represented? Go has special handling for these, and the 'module' is never explicitly required when using them. But the module does exist for Go uses but the exact module name would make more sense we believe. |
OSV is already using |
I'm not sure I follow how it would make it easier for tools? |
Are they modules? They have go.mod files in their source code, but they aren't included in your go.mod and when you import their packages the compiler recognizes that you're trying to use the standard library and provides its own copy of the code from |
The I think we can safely use |
I believe I get the argument, and I will defer to your judgement as the more knowledgeable about Go concepts and internals. However, I do still feel that it would make sense for PURLs to use the "modules". PURLs are primarily unique identifiers so does it really matter what Go tooling does and how the internals work?
But I'm guessing that would cause issues, and it also doesn't follow the current convention for PURLs. As I said, I'll defer to better judgements, just giving a mostly uneducated opinion. :) |
Hi folks. I am also working on this proposal and will try to drive the discussion here going forward. (Sorry for the extra commit added.) To keep the conversation going, I wanted to summarize the discussion so far: 1) what this proposal brings to the table and 2) major concerns as well as how those concerns might be addressed. (If it helps, we can add and cross things in this post to keep track of the progress.) Proposal main pointsAdding a new PURL type
Concerns
|
I work on software that deals with Go PURLs and I can assure you that this is a breaking change. If users start submitting
If #308 is fixed for Less than half of all public PURL implementations I'm aware of consistently lowercase as specified, and half of all public PURL implementations have already fixed #308 by allowing uppercase characters, and it hasn't been the breaking change it's being made out to be when talking about starting over with a new
anchore/packageurl-go supports producing |
Introducing a new improved PURL type for go, is a non-breaking change.
So do I, and I can assure you, for me it is not a breaking-change of the PUTL-type
What's the matter? this update would be a non-breaking change, right? it's just a feature.
Finally, @matt-phylum ! So now we come to the actual issue: your software would introduce breaking changes. It is not - like you claimed before - that the PURL-type spec introduced any breaking changes. Well, breaking changes are common in software. If you don't want to deal with them, then your software can simply stay with the old/faulty #338 on the other hand actually introduces breaking changes. PURL generators/ingestors MUST lowercase
that is exactly the fault behavior described in the current faulty state of the PURL-type spec for
Nope. that is exactly implied by the current state faulty of the PURL-type spec for |
This simply not true. Even if I do nothing, there is still a compatibility problem if any other software being used in conjunction starts using All software, not just mine, must be updated to accept It's likely that even some new software will forever be implementing support for reading both
I don't think you understand. The behavior of anchore/packageurl-go is unique, so if it's the only one doing it correctly that means all others are doing it incorrectly. If you provide it a module path
Those generators/ingestors must not have done so in the past either. Given the current state of the spec, PURLs cannot refer to packages that contain uppercase characters. If you ask one of the seven spec compliant or partially spec compliant libraries to produce a PURL for a package with uppercase characters, it produces a PURL for a different package which probably does not exist, but could. At least |
I'd consider the gain of an capability to understand/ingest
What does support even mean? Ingesting it -- gaining this capability is not a breaking change. Producing it -- it depends, see above. Usual software lifecycle, not a problem.
Adding new capabilities - this is a non-breaking change. No need to translate, but at least understand it. But all of this is to be considered software lifecycle downstream, not related to the PURL type spec here. adding a new PURL type to the spec is not a breaking change. to all your other points: The PURL type spec for It is a fact, that changes to existing PURL type spec for Therefore, we intend to create a new PURL type. Creating new purl types never has been a breaking change, and is still no such breaking change, when if the scope is the same as an existing one. Period. Regarding your point with other non-related PURL types: Please adhere to the scope of this PR. We are talking about go, with actual problem, not about the issues others might have. |
Yes. If you only add to software that reads PURLs support for reading However
Managing it through a feature flag could work, assuming all the implementations producing
This seems irresponsible. Somebody reading the PURL spec without having read through all the issues is going be confused as to why there are both Implementations of the new spec using
Adding a new capability is not a breaking change, but having to add a "new capability" to understand Translating
I still disagree with this. Yes, changing a MUST¹ to a MUST NOT requires a change to existing implementations of ¹ The current spec says "must be lowercased" with "must" in lowercase, with no explanation of when or why it should be lowercased. Failure to do so when outputting a PURL makes the PURL non-canonical, not invalid, which doesn't seem like a "MUST" to me. This problem is not unique to A terrible third option would be to say that the current spec has always been correct and the name really must be lowercased, but the spec author just forgot to mention that the module path is supposed to have already had all the uppercase characters escaped using
What non-related PURL types? I only mentioned related PURL types. If there are at least 6 other types which likewise have critical errors in name normalization rules, deciding that to fix a name normalization rule is a breaking change and requires introduction of a new package type and uncontrolled deprecation and replacement of the existing package type means the same thing will be done at least 6 more times creating There is precedence for making this kind of "breaking" change to an existing type in #220 which changed an implied "MUST NOT be lowercased" into an explicit "must be lowercased." |
Here is my understanding of your concerns. Introducing It seems that this can happen since we expect that tools will support only one PURL type when ingesting things. In other words, in order for this proposal to not break the ecosystem in general, tools will have to be able to ingest both Go PURL types. This might be too much to ask from them. It is much simpler when a completely new PURL type is introduced, say, for a new language. Is this the crux of your concerns? |
Yes. If an end user is using a tool that outputs How exactly the problems will manifest will depend on where the problem occurs and how the tool where the problem first occurs is implemented. I'm expecting in most cases either the For example, if you are using a simple vulnerability scanner that just sees PURLs and sends them to the OSV API,
|
So the concern can be boiled down to the question: |
New types are to be expected, but new types for packages that were already supported under a different type cause regressions when existing software receives the new type instead of the old type. If it were just a naming and documentation problem, the naming and documentation provided in this PR is not sufficient for me because it has two ways of expressing Go dependencies and there is no explanation of why or which should be used or that you may need to implement support for both. Considering there are other types with the same package name normalization problem but not the same problem of using an unusual package type name like "golang", it might be better to go with the uglier "pkg:golang2" which implies a relation between the two types, although that could be confusing if Go ever releases a Go version 2, which is already something that has happened for cases like "pkg:npm2". |
a010ff5
to
fc7f73c
Compare
I think this is where a lot of confusion is coming from. This proposal is not suggesting that clients should stop ingesting
I have added some very minimal documentation. We can iterate over it. |
The current PURL specification for Go was created before Go 1.11 modules and thus has namespace inconsistencies and lacks semantic versioning.
Although in many cases a module path corresponds directly to the URL of the hosting repository, that is not always true. The URL formed from the module path may be an endpoint that serves a redirect to the true host. This indirection protects projects that for whatever reason must change their hosting provider: their module names will continue to work. Consequently, it is undesirable to encode any aspect of the underlying hosting system as part of the PURL.
In essence, all Go modules form a single namespace. Since it is used by the majority of Go programmers, we propose to represent this namespace by the empty string. Though not included in this commit, other namespaces could be possible and would represent package managers and/or build tools that are alternatives to the go command.
The
go
type proposed here fixes the current issues by removing the namespace, using valid Go module versions (including pseudoversions), and adds some extra functionality to encode optional information about specific builds (GOOS, GOARCH, etc).If accepted, all tools maintained by the Go project (such as govulncheck and pkg.go.dev) that surface PURLs will use this new type to provide canonical PURLs for Go modules and packages