
✨ Bring your own network #1472

Open: wants to merge 10 commits into base: main

johannesfrey (Contributor) commented Aug 31, 2024

What this PR does / why we need it:
This PR makes it possible to "adopt" a pre-existing network by passing its ID in hetznerCluster.spec.hcloudNetwork.id, instead of having the network created during cluster creation. Furthermore, during cluster deletion the attached network is only deleted if it carries the owned label (currently the only way to distinguish a CAPH-managed network from an unmanaged one).
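Because the owned label is what marks a network as CAPH-managed, the deletion guard comes down to a label lookup. The sketch below is hypothetical: the label key format (`caph-cluster-<name>`), the value `owned`, and the helper names are assumptions for illustration, not the code in this PR.

```go
package main

import "fmt"

// ownedLabelValue is the value assumed to mark a resource as CAPH-managed.
// Assumption: the real label key/value used by CAPH may differ.
const ownedLabelValue = "owned"

// ownedLabelKey builds a hypothetical per-cluster label key.
func ownedLabelKey(clusterName string) string {
	return "caph-cluster-" + clusterName
}

// shouldDeleteNetwork reports whether a network with the given labels was
// created by CAPH for this cluster. Networks without the owned label
// (for example, adopted via hcloudNetwork.id) are not deleted.
func shouldDeleteNetwork(clusterName string, labels map[string]string) bool {
	return labels[ownedLabelKey(clusterName)] == ownedLabelValue
}

func main() {
	managed := map[string]string{"caph-cluster-demo": "owned"}
	adopted := map[string]string{"team": "networking"}

	fmt.Println(shouldDeleteNetwork("demo", managed)) // true
	fmt.Println(shouldDeleteNetwork("demo", adopted)) // false
}
```

Exposing the labels in the NetworkStatus, as described below, is what would let the controller run such a check without an extra API call.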

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #762

Special notes for your reviewer:
This had been sitting untouched on my fork for a while, so I rebased it onto the current main branch. Please consider this a first attempt at approaching the topic as a whole. I have also already added some unit tests. It may also need some e2e tests. I'm not sure whether this is the desired approach, or whether it has side effects I have missed, so I'm looking forward to feedback or any pointers. Also feel free to push changes to the PR, as I'll be pretty occupied with other things for almost all of September. I just wanted to get this out there for you to take a look at 🙂

The most controversial changes so far:

  • Making hcloudNetwork.id mutually exclusive with cidrBlock, subnetCidrBlock and networkZone
  • Replacing the kubebuilder defaulting/validation markers with custom defaulting/validation, which makes the webhook more complex
  • Changing cidrBlock, subnetCidrBlock and networkZone to pointers (this could probably also be done with empty strings, but pointers allow the fields to be omitted entirely when not provided)
  • Exposing the network's labels in the NetworkStatus so that the controller can check whether the network is managed or unmanaged

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squash commits
  • include documentation
  • add unit tests

janiskemper (Contributor) left a comment

In general this approach seems good to me. Thanks a lot for this contribution!

@guettli @batistein what's your opinion?

api/v1beta1/hetznercluster_webhook.go (resolved)
api/v1beta1/hetznercluster_webhook.go (outdated, resolved)
janiskemper (Contributor) left a comment

Thanks a lot again for this PR @johannesfrey. I think we can merge it if you follow the suggestions I gave. It's really good work!

controllers/hetznercluster_controller.go (outdated, resolved)
pkg/services/hcloud/network/network.go (outdated, resolved)
api/v1beta1/types.go (outdated, resolved)
johannesfrey force-pushed the bring-your-own-network branch from 3f6b8f6 to b27b859 on November 13, 2024 17:07
johannesfrey marked this pull request as ready for review on November 13, 2024 17:08
johannesfrey (Contributor, Author) commented Nov 13, 2024

Sorry for the long delay 🙏. Thx for the reviews! I hope I addressed your suggestions correctly. PTAL. Thx!

syself-bot (bot) added the labels area/test (Changes made in the test directory), area/code (Changes made in the code directory), area/api (Changes made in the api directory) and size/L (Denotes a PR that changes 200-800 lines, ignoring generated files.) on Nov 13, 2024

janiskemper (Contributor) left a comment

Thanks @johannesfrey! I went over the details once more and found a few things.

api/v1beta1/hetznercluster_webhook.go (outdated, resolved)
api/v1beta1/hetznercluster_webhook.go (outdated, resolved)
api/v1beta1/hetznercluster_webhook.go (outdated, resolved)
api/v1beta1/hetznercluster_webhook.go (resolved)
api/v1beta1/hetznercluster_webhook.go (outdated, resolved)
api/v1beta1/hetznercluster_webhook.go (resolved)
api/v1beta1/hetznercluster_webhook.go (outdated, resolved)
api/v1beta1/types.go (outdated, resolved)
pkg/services/hcloud/network/network.go (outdated, resolved)
syself-bot (bot) added the label size/XL (Denotes a PR that changes 800-2000 lines, ignoring generated files.) and removed size/L (Denotes a PR that changes 200-800 lines, ignoring generated files.) on Nov 24, 2024
johannesfrey force-pushed the bring-your-own-network branch from bdf6ca4 to 4806de2 on November 24, 2024 19:03
janiskemper (Contributor) commented

Thanks @johannesfrey! Do you think anything is still missing? If not, I'd propose the following path:

  • I will do a final review
  • You squash and rebase
  • Someone from our team will again do a functionality test (which I haven't done at all) that the actual behavior is as expected

johannesfrey (Contributor, Author) commented Nov 26, 2024

That sounds awesome. Thx!
One thing that would be great to take a look at is my (probably too naive) way of testing the feature in controllers/hetznercluster_controller_test.go. There seems to be a data race, especially in the first test, where the cluster should attach to the previously created network. Most of the time it succeeds, but sometimes the reconciler fails with an error saying that the network with the given ID cannot be found. I'm struggling to see why this happens, because the sequence of execution should be as follows (with Ginkgo running serially):
create network with fake client -> check that there is no error -> add the ID to the HetznerCluster spec -> trigger creation of the HetznerCluster -> wait until the HetznerCluster is ready and has the correct network condition and status

The reconciler should then use its internal client (which should be the same shared fake client as before) to find the network. But it cannot find it, so either something in that sequence deletes it, or there is a race in the fake client, possibly related to how it uses mutexes. I tried several variations of the locking, but to no avail, so I wasn't able to deflake the test. It would be really cool if you could take another look there. And if the test does more harm than good, we could also think about removing or changing it. WDYT?

janiskemper (Contributor) commented

Mmh, that's an important observation, thanks @johannesfrey. We will have a look; I can't spot anything in the code right now.

Labels: area/api (Changes made in the api directory), area/code (Changes made in the code directory), area/test (Changes made in the test directory), size/XL (Denotes a PR that changes 800-2000 lines, ignoring generated files.)

Successfully merging this pull request may close this issue: Make it possible to use a pre-created private network

3 participants