Support direct HTTP retrieval from /https providers #72
Comments
Based on ipfs/helia#439 (comment), once ipfs/helia#483 is released, I believe we will get this
I would be afraid to ship this under a Shipyard-owned domain like inbrowser.dev without having some story for applying https://badbits.dwebops.pub. I've added the task "BLOCKER for deployment to inbrowser.dev" to the list above. @SgtPooki @hacdias @aschmahmann we need to decide what to do here before we enable direct downloads. Options I see:
In both cases, we could enable filtering by default only when Service Worker is deployed from For someguy, we could have a query parameter, an HTTP header, or a dedicated domain. Prior art: DNS services use a dedicated domain/IP: https://quad9.net/service/service-addresses-and-features My vote would be to wire up denylist support into someguy so it applies it only on specific domains (based on Thoughts?
I feel like both A and B need to be done eventually. With that said, we discussed why we shouldn't enable badbits by default: we don't know the contents, and some countries may not consider certain bits bad, i.e. there's no way for us to tell what is truly, objectively bad content. Still, since we're developing in open source, the easiest, safest thing to do is to implement filtering by default and allow folks to fork and un-filter where they need to. I think a simple blockstore wrapper that double-hashes and checks for badbits would be fairly easy to do in JS. It shouldn't take long, but I think the challenging part is configuring the update mechanism so the JS badbits lib is always up to date without a significant slowdown on clients.
These two things seem like something Kubo/boxo may already have resolved, and it may be faster or easier to do something similar in someguy.
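For illustration, the blockstore wrapper idea could look roughly like the sketch below. It assumes the legacy badbits entry format, i.e. the hex-encoded sha256 of `<CIDv1 base32>/<path>`, and it uses a hypothetical `MinimalBlockstore` interface rather than Helia's real blockstore types. `doubleHash`, `DenylistBlockstore` and `BlockedError` are made-up names, and how the denylist set gets loaded and kept fresh is left out entirely.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical minimal blockstore shape; Helia's real interface is richer.
interface MinimalBlockstore {
  get(cid: string): Promise<Uint8Array>
}

// Assumption: a denylist entry is hex(sha256("<CIDv1 base32>/<path>")).
function doubleHash(cidV1: string, path: string = ''): string {
  return createHash('sha256').update(`${cidV1}/${path}`).digest('hex')
}

class BlockedError extends Error {}

// Wraps another blockstore and refuses to return denylisted blocks.
class DenylistBlockstore implements MinimalBlockstore {
  private readonly inner: MinimalBlockstore
  private readonly denylist: ReadonlySet<string>

  constructor(inner: MinimalBlockstore, denylist: ReadonlySet<string>) {
    this.inner = inner
    this.denylist = denylist
  }

  async get(cid: string): Promise<Uint8Array> {
    // Check the double-hash before touching the wrapped store.
    if (this.denylist.has(doubleHash(cid))) {
      throw new BlockedError(`content blocked by denylist: ${cid}`)
    }
    return this.inner.get(cid)
  }
}
```

The wrapper only checks the root form `<cid>/`; path-level entries would need the request path passed down as well.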
The denylist from https://badbits.dwebops.pub/badbits.deny is ~15 MiB (gzipped). If we go the JS blockstore route, no matter how it is fetched (remotely, or embedded in the same DAG as the SW), a ~16 MiB penalty for initial page load is tough. Perhaps instead of moderating routing responses, we could have a dedicated delegated denylist endpoint that could be queried?
I wish we could get out of the business of mandating blocklists here, rather than letting them be user controlled (whether opt-in/out). Maybe we're stuck with this reality for now, but it'd be useful to explore with some of the legal + open internet folks (IPFS Foundation should have some contacts) how much this is needed. Ideally it'd be possible for someone to deploy a public resource without having a specific denylist hardcoded given that different legal jurisdictions and individuals feel different things should be blocked. Perhaps there's an expendable domain name we can use here if we're concerned about say being legally all clear, but running into issues with the technical middlemen of the web (i.e. curators of other resource blocking lists that don't understand or would disagree with our position).
If going this route, we could limit the blocklist to be closer to the HTTP/request layer (i.e. not caring about blocked blocks/subdags) and consider something like https://security.googleblog.com/2022/08/how-hash-based-safe-browsing-works-in.html. A few notes:
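To make the hash-prefix idea from the linked Safe Browsing post concrete, here is a rough sketch: the client keeps only short hash prefixes locally and asks a remote service for full hashes when a prefix matches. Everything here is an assumption for illustration: the 4-byte prefix length, the `<cid>/<path>` request key, and the `FullHashLookup` callback standing in for a denylist endpoint that does not exist yet.

```typescript
import { createHash } from 'node:crypto'

// Assumed prefix length; a real deployment would tune this.
const PREFIX_BYTES = 4

function sha256Hex(input: string): string {
  return createHash('sha256').update(input).digest('hex')
}

// Hypothetical remote call: returns all full denylist hashes starting with the prefix.
type FullHashLookup = (prefixHex: string) => Promise<string[]>

async function isBlocked(
  requestKey: string, // assumed to be "<cid>/<path>" at the HTTP/request layer
  localPrefixes: ReadonlySet<string>,
  lookupFullHashes: FullHashLookup
): Promise<boolean> {
  const full = sha256Hex(requestKey)
  const prefix = full.slice(0, PREFIX_BYTES * 2)
  // Common case: no local prefix match, so no network round trip at all.
  if (!localPrefixes.has(prefix)) return false
  // A prefix match may be a false positive, so confirm against the full hashes.
  const candidates = await lookupFullHashes(prefix)
  return candidates.includes(full)
}
```

The remote service only ever sees a short prefix, not the full hash of what was requested, and the client only downloads prefixes instead of the whole list.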
I think this approach sounds good. One thing we may run into is someone creating a static site of all the badbits and then our Would that cause us to get blocked? If not, let's do it.
I believe that direct HTTP retrieval is now supported in Helia and @helia/verified-fetch with the sessions work. The main thing left here is to add badbits blocking support. Based on the comments above, it seems that the blocking could be implemented in (either or both):
@lidel @aschmahmann Any thoughts on this?
@2color I agree, been marinating on this for a while and I think we can enable it on I think we could have 3 stages, where (1) can be done TODAY, and is not controversial. (2) and (3) could be discussed / tackled later (TBD order/priorities) Step 1: badbits on
That sounds like a good plan @lidel. Regarding Step 2, I suppose if we add the denylist functionality in Someguy, we could automatically apply it for content routing requests to avoid an additional roundtrip to check the denylist. Any thoughts on that?
@2color we could do it, but I'm not sure we want to do that by default. We had false positives in the past, so there should be a way of disabling the denylist. In #72 (comment) I had the idea of applying the denylist only on requests to a specific domain, allowing users to switch between delegated routing with/without the denylist applied, if it ever causes trouble. PS: "Step 1" is done: https://github.com/ipshipyard/waterworks-infra/pull/160 landed and https://bafybeib536upvgn7bflest7hqjvz4247ltqxn4hvjnjgatenmmufdak6aa.ipfs.inbrowser.dev returns 410
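A minimal sketch of the per-domain switch described above, assuming the routing server can see the request's `Host` header. The hostnames and the `shouldApplyDenylist` helper are hypothetical and not part of someguy today.

```typescript
// Hypothetical: hostnames on which the delegated routing server applies the denylist.
const FILTERED_HOSTS: ReadonlySet<string> = new Set(['filtered-routing.example.com'])

// Decides per request, based on the Host header, whether filtering applies.
function shouldApplyDenylist(
  hostHeader: string,
  filteredHosts: ReadonlySet<string> = FILTERED_HOSTS
): boolean {
  // Normalize: lowercase and drop an optional ":port" suffix.
  const host = hostHeader.trim().toLowerCase().replace(/:\d+$/, '')
  return filteredHosts.has(host)
}
```

With this shape, a user hitting false positives could switch to the unfiltered hostname without any client-side code change beyond the routing URL.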
Example:
They have /https providers such as /dns4/dag.w3s.link/tcp/443/https.
TODO
dag.w3s.link supports [Block Responses (application/vnd.ipld.raw)] https://specs.ipfs.tech/http-gateways/trustless-gateway/#block-responses-application-vnd-ipld-raw
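As a rough illustration of what direct retrieval from such a provider involves, the sketch below turns an `/https` multiaddr into an origin and builds a trustless gateway block request (`?format=raw` with `Accept: application/vnd.ipld.raw`). The hand-rolled parser and the helper names `httpsMultiaddrToOrigin` and `buildRawBlockRequest` are assumptions for this example; real code would more likely use a multiaddr library.

```typescript
// Converts "/dns4/<host>/tcp/<port>/https" (also dns, dns6) into an https origin.
function httpsMultiaddrToOrigin(ma: string): string {
  const parts = ma.split('/').filter((p) => p.length > 0)
  // Expected shape: [dns|dns4|dns6, host, "tcp", port, "https"]
  if (
    parts.length !== 5 ||
    !/^dns[46]?$/.test(parts[0]) ||
    parts[2] !== 'tcp' ||
    !/^\d+$/.test(parts[3]) ||
    parts[4] !== 'https'
  ) {
    throw new Error(`unsupported multiaddr: ${ma}`)
  }
  const host = parts[1]
  const port = parts[3]
  // Omit the default HTTPS port from the origin.
  return port === '443' ? `https://${host}` : `https://${host}:${port}`
}

interface BlockRequest {
  url: string
  headers: Record<string, string>
}

// Builds a trustless gateway request for a single raw block.
function buildRawBlockRequest(providerAddr: string, cid: string): BlockRequest {
  return {
    url: `${httpsMultiaddrToOrigin(providerAddr)}/ipfs/${cid}?format=raw`,
    headers: { Accept: 'application/vnd.ipld.raw' }
  }
}
```

The response bytes still have to be hashed and compared against the CID by the caller, since the gateway is not trusted.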