Missing extensions from vnd.comicbook+zip and vnd.comicbook-rar registered with IANA #321

Open
rluetzner opened this issue Apr 24, 2024 · 2 comments

rluetzner commented Apr 24, 2024

The two MIME types have clearly defined file extensions.

https://www.iana.org/assignments/media-types/application/vnd.comicbook+zip
https://www.iana.org/assignments/media-types/application/vnd.comicbook-rar

However, when I compared them with a few entries that do have extensions listed in src/iana-types.json, I noticed a difference: unlike the entries I looked at, these two MIME type definitions give their file extensions in a numbered list, e.g.

Additional information:

1. Deprecated alias names for this type: application/x-cbr
2. Magic number(s): none
3. File extension(s): .cbz
4. Macintosh file type code: N/A
5. Object Identifiers: N/A

(excerpt from vnd.comicbook+zip). I guess the parsing logic needs to be adjusted to match these, but I'm not good enough with JS to do that myself.

@rluetzner
Author

Regexes make my brain hurt. However, I've figured out at least a few things.

  1. The layout from my summary above could be parsed with an older regex /^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:\s+(?:\*\.|\.|)([0-9a-z_-]+)\s*(?:\(|$)/im, which was replaced in commit be9ca41 sometime in 2018.
  2. Going by the new variable name and what I've seen, the old regex was not able to parse file extensions with quotes.
  3. The new regex to handle quotes does not work for file extensions that have no quotes, e.g. https://www.iana.org/assignments/media-types/application/atom+xml .
  4. Both regexes fail when multiple comma-separated file extensions are given.
  5. I have no idea how something like https://www.iana.org/assignments/media-types/application/mp4 is parsed, because the two file extensions are given in prose text.
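
Items 1, 2 and 4 can be checked in Node with a minimal sketch like the one below. The regex is the older pattern from item 1, copied verbatim. The helper name `oldExtension` and the `.abc`/`.def` sample inputs are made up for illustration and do not come from the repository or from IANA.

```javascript
// The older pattern quoted in item 1, copied verbatim.
const OLD_EXTENSION_REGEXP = /^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:\s+(?:\*\.|\.|)([0-9a-z_-]+)\s*(?:\(|$)/im;

// Hypothetical helper: returns the captured extension, or null when nothing matches.
function oldExtension(text) {
  const match = OLD_EXTENSION_REGEXP.exec(text);
  return match ? match[1] : null;
}

// Numbered-list layout, as in the vnd.comicbook+zip excerpt.
const numbered = [
  '2. Magic number(s): none',
  '3. File extension(s): .cbz',
  '4. Macintosh file type code: N/A',
].join('\n');

console.log(oldExtension(numbered));                        // cbz
// Illustrative inputs (not real IANA text):
console.log(oldExtension('File extension(s): ".abc"'));     // null, quoted value
console.log(oldExtension('File extension(s): .abc, .def')); // null, comma-separated
```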

I've played around a bit with a regex tester and was able to fix some of these things. Making the quotes optional in particular is quite easy. But I'm very uncertain as to how this will affect a full rebuild. There doesn't seem to be a clear scheme to the IANA MIME type declarations, so I don't think there's a way to handle all cases anyway.

@rluetzner
Author

For what it's worth, here's the regex I came up with that works with and without quoted file extensions:

/^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:[\s]*['"]?(?:\.?([0-9a-z_-]+))['"]?$/im

I used https://regex101.com/ to test things and copied the declaration for atom+xml and modified it manually.
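
The same check can be reproduced outside a regex tester. This is a minimal sketch: the regex is the one proposed above, copied verbatim, while the helper name `proposedExtension` and the sample lines are assumptions modeled on the layouts discussed in this thread, not the exact IANA text.

```javascript
// The proposed pattern from this comment, copied verbatim.
const PROPOSED_EXTENSION_REGEXP = /^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:[\s]*['"]?(?:\.?([0-9a-z_-]+))['"]?$/im;

// Hypothetical helper: returns the captured extension, or null when nothing matches.
function proposedExtension(text) {
  const match = PROPOSED_EXTENSION_REGEXP.exec(text);
  return match ? match[1] : null;
}

// Sample lines are assumptions for illustration.
console.log(proposedExtension('File extension(s): .atom'));      // atom
console.log(proposedExtension('File extension(s): ".atom"'));    // atom
console.log(proposedExtension('3. File extension(s): .cbz'));    // cbz
console.log(proposedExtension('File extension(s): .abc, .def')); // null
```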

This does not work properly with multiple comma-separated file extensions, but none of the other regexes do either, so I'd count it as an improvement.
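
One possible way to cover the comma-separated case is to capture the whole value after the label and split it afterwards. The sketch below is only an illustration of that idea: `EXTENSION_LINE`, `SINGLE_EXTENSION` and `parseExtensions` are hypothetical names that do not exist in the project, and the effect on a full rebuild is untested.

```javascript
// Hypothetical: capture everything after "File extension(s):" on that line.
const EXTENSION_LINE = /^\s*(?:\d\.\s+)?File extension(?:\(s\)|s|)\s?:[ \t]*(.*)$/im;
// Hypothetical: one entry, optionally quoted, optionally prefixed with "*." or ".".
const SINGLE_EXTENSION = /^['"]?\*?\.?([0-9a-z_-]+)['"]?$/i;

function parseExtensions(text) {
  const line = EXTENSION_LINE.exec(text);
  if (!line) return [];
  return line[1]
    .split(',')                                        // one entry per comma
    .map((part) => SINGLE_EXTENSION.exec(part.trim())) // validate each entry
    .filter((match) => match !== null)                 // drop anything unparseable
    .map((match) => match[1].toLowerCase());
}

// Sample inputs are assumptions for illustration.
console.log(parseExtensions('3. File extension(s): .cbz'));      // [ 'cbz' ]
console.log(parseExtensions('File extensions: ".abc", .def'));   // [ 'abc', 'def' ]
console.log(parseExtensions('File extension(s): N/A'));          // []
```

Like the single-regex approaches, this would still report a literal value such as "none" as an extension, and it does nothing for declarations that describe their extensions in prose.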
