SSH connectivity issues #498
Thanks for sharing your experience! Just to clear things up about the firewall: by default, when there's no firewall attached to a server, all ports are open. So, as long as you don't have an existing firewall blocking the selected SSH port, everything should be fine. You're right about the SSH keys, though. That said, I create clusters regularly and haven't run into any issues with SSH connectivity yet, due to keys or anything else. I'll check if it's possible to use the SSH shard for Crystal to verify that the key used by the agent matches the one in your config. I'll also see if it can detect whether the key is protected by a passphrase. Regarding the firewall, you shouldn't set up an additional one in your Hetzner project. Ideally, the project should be dedicated solely to the cluster managed by hetzner-k3s. I'll make sure this information is clearer in the docs. |
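For context, an agent-vs-config key check like the one mentioned above can be done by comparing fingerprints, since `ssh-keygen -lf <key>.pub` and `ssh-add -l` print the same format. This is only a sketch in Python, not the tool's actual Crystal code; the demo key blob is made up:

```python
import base64
import hashlib

def ssh_fingerprint(pubkey_line: str) -> str:
    """SHA256 fingerprint of an OpenSSH public key line.

    OpenSSH fingerprints are base64(SHA256(key blob)) with the '='
    padding removed and a 'SHA256:' prefix -- the same format printed
    by `ssh-keygen -lf` and `ssh-add -l`, so the two outputs can be
    compared directly to spot an agent/config key mismatch.
    """
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")

# Demo with a made-up key blob (a real line comes from the .pub file):
blob_b64 = base64.b64encode(b"\x00\x00\x00\x0bssh-ed25519" + b"\x00" * 32).decode()
config_key = f"ssh-ed25519 {blob_b64} user@host"
print(ssh_fingerprint(config_key))  # fingerprint to compare with `ssh-add -l` output
```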
I think I have the same problem with a simple example. After the master node has been created I am able to connect with an SSH client without a problem using my private key.

```yaml
hetzner_token: <<API-KEY>>
cluster_name: test
kubeconfig_path: "./kubeconfig"
k3s_version: v1.30.8+k3s1
networking:
  ssh:
    port: 22
    use_agent: true # set to true if your key has a passphrase
    public_key_path: "./Documents/ssh/gl-new/gl-root.pub"
    private_key_path: "./Documents/ssh/gl-new/gl-root"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api: # this will firewall port 6443 on the nodes
      - 0.0.0.0/0
  public_network:
    ipv4: true
    ipv6: true
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
    existing_network_name: ""
  cni:
    enabled: true
    encryption: false
    mode: flannel
  # cluster_cidr: 10.244.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for pod IPs
  # service_cidr: 10.43.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for service IPs. Warning: if you change this, you should also change cluster_dns!
  # cluster_dns: 10.43.0.10 # optional: IPv4 cluster IP for the CoreDNS service. Needs to be an address from the service_cidr range
manifests:
  cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.21.0/ccm-networks.yaml"
  csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.11.0/deploy/kubernetes/hcloud-csi.yml"
  # system_upgrade_controller_deployment_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml"
  # system_upgrade_controller_crd_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml"
  # cluster_autoscaler_manifest_url: "https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/hetzner/examples/cluster-autoscaler-run-on-master.yaml"
datastore:
  mode: etcd # etcd (default) or external
  # external_datastore_endpoint: postgres://....
schedule_workloads_on_masters: false
image: debian-12 # optional: default is ubuntu-24.04
# autoscaling_image: 103908130 # optional, defaults to the `image` setting
# snapshot_os: microos # optional: specifies the OS type when using a custom snapshot
masters_pool:
  instance_type: cx22
  instance_count: 1
  location: nbg1
worker_node_pools:
  - name: test-node-pool
    instance_type: cx22
    instance_count: 3
    location: nbg1
    # image: debian-11
    # labels:
    #   - key: purpose
    #     value: blah
    # taints:
    #   - key: something
    #     value: value1:NoSchedule
  # - name: medium-autoscaled
  #   instance_type: cpx31
  #   instance_count: 2
  #   location: nbg1
  #   autoscaling:
  #     enabled: true
  #     min_instances: 0
  #     max_instances: 3
embedded_registry_mirror:
  enabled: false # Check if your k3s version is compatible before enabling this option. You can find more information at https://docs.k3s.io/installation/registry-mirror
additional_packages:
  - htop
post_create_commands:
  - apt update
  - apt upgrade -y
  - apt autoremove -y
```

Log output:

```
mh@ione56 hetzner-k8s % hetzner-k3s create --config ./cluster-config.yml
[Configuration] Validating configuration...
[Configuration] ...configuration seems valid.
[Private Network] Creating private network...
[Private Network] ...private network created
[SSH key] Creating SSH key...
[SSH key] ...SSH key created
[Placement groups] Creating placement group test-masters...
[Placement groups] ...placement group test-masters created
[Placement groups] Creating placement group test-test-node-pool-2...
[Placement groups] ...placement group test-test-node-pool-2 created
[Instance test-master1] Creating instance test-master1 (attempt 1)...
[Instance test-master1] Instance status: starting
[Instance test-master1] Powering on instance (attempt 1)
[Instance test-master1] Waiting for instance to be powered on...
[Instance test-master1] Instance status: running
[Instance test-master1] Waiting for successful ssh connectivity with instance test-master1...
[Instance test-master1] Instance test-master1 already exists, skipping create
[Instance test-master1] Instance status: running
[Instance test-master1] Waiting for successful ssh connectivity with instance test-master1...
[Instance test-master1] Instance test-master1 already exists, skipping create
[Instance test-master1] Instance status: running
[Instance test-master1] Waiting for successful ssh connectivity with instance test-master1...
Error creating instance: timeout after 00:01:00
Instance creation for test-master1 failed. Try rerunning the create command.
```

Let me know if you need more information or more logs.

Environment: I installed it with brew on macOS Sonoma (14.6.1), upgraded to Sequoia (15.2), still the same problem. Tested with: Also with different

```
mh@ione56 hetzner-k8s % hetzner-k3s --version
2.0.9
```

Is there something wrong in my config? Thanks in advance |
@MarcelHaldimann do you have a passphrase on your key? |
@vitobotta |
Did you add the SSH key to Keychain? |
Wow, I am an idiot. Now it works like a charm, thank you!

`ssh-add --apple-use-keychain gl-root "~/Documents/ssh/gl-new/gl-root"`

I thought I would be asked for the password. Is there more documentation than in this example? Thanks for the work and the support! |
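For anyone else hitting this on macOS: besides running `ssh-add --apple-use-keychain` once, the Keychain behaviour can be made persistent via `~/.ssh/config`. A sketch using standard OpenSSH/macOS options, with the key path taken from the config above:

```
# ~/.ssh/config
Host *
  AddKeysToAgent yes
  UseKeychain yes          # macOS-only option: read the passphrase from the Keychain
  IdentityFile ~/Documents/ssh/gl-new/gl-root
```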
Good point, looks like I forgot to add a mention about this for macOS. Would you mind making a small PR for this? :) |
Hi! I'm having the same problem, using an SSH key with no passphrase. The main difference is that I'm running it from a pipeline, in a Dockerfile I created previously. The first time I ran the same cluster config on a pipeline it worked perfectly without any problems, but this time I get this SSH error. Any idea what could be happening? |
Hi @FernandoJCa It's difficult to give you suggestions without knowing more about the setup. What kind of pipeline is it, and how do you supply the key for use with hetzner-k3s? |
Hello @vitobotta It's a GitLab CI pipeline.

```yaml
---
stages:
  - deploy
  - delete
default:
  tags:
    - gitlab-org
  before_script:
    - apk update && apk upgrade && apk add openssh-client
    - eval $(ssh-agent -s)
    - chmod 400 "$SSH_PRIVATE_KEY"
    - ssh-add "$SSH_PRIVATE_KEY"
    - mkdir -p ~/.ssh
    - chmod 700 ~/.ssh
    - mv $SSH_PRIVATE_KEY ~/.ssh/id_ed25519
    - mv $SSH_PUBLIC_KEY ~/.ssh/id_ed25519.pub
    - ls ~/.ssh/
deploy-cluster:
  image: registry.gitlab.com/snoopy-group/hetzner-cli-docker:v1.0.1
  stage: deploy
  script:
    - hetzner-k3s create --config cluster_config.yaml | tee create.log
  artifacts:
    paths:
      - kubeconfig.yaml
    expire_in: 2 h
delete-cluster:
  image: registry.gitlab.com/snoopy-group/hetzner-cli-docker:v1.0.1
  stage: delete
  script:
    - hetzner-k3s delete --config cluster_config.yaml | tee delete.log
  when: manual
```

The Docker image is a custom one I created, using the Docker image from your repo as a base; it just installs the dependencies and the hetzner-k3s CLI. The SSH key is provided via CI/CD variables, as you can see in the It worked once on one of my tests, and now for some reason it does not work anymore. It's nothing critical since I deployed the cluster using my host, but I'm still curious what the problem might be. If you need more information let me know! |
You are running the tool via docker but you are not mounting the SSH keypair in the container, right? |
GitLab is running all of the commands on the container, so the SSH key pair will be mounted. I already checked that just in case. |
I'll try to add some additional debugging info to the SSH connection. I'll see if I can make a temp release in a bit. |
Let me know, so I can do a quick test! |
I have added some debugging info for when it fails to open an SSH session. Try the build when it's ready https://github.com/vitobotta/hetzner-k3s/actions/runs/12969090467 |
New test with the new build, no luck. This is the cluster config I'm using:

```yaml
---
cluster_name: snoopy-cluster
kubeconfig_path: "./kubeconfig.yaml"
k3s_version: v1.32.0+k3s1
networking:
  ssh:
    port: 22
    use_agent: false
    public_key_path: "~/.ssh/id_ed25519.pub"
    private_key_path: "~/.ssh/id_ed25519"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api:
      - 0.0.0.0/0
  public_network:
    ipv4: true
    ipv6: true
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
    existing_network_name: ""
  cni:
    enabled: true
    encryption: false
    mode: flannel
schedule_workloads_on_masters: false
masters_pool:
  instance_type: cx22
  instance_count: 1
  location: hel1
worker_node_pools:
  - name: snoopy-house
    instance_type: cx32
    instance_count: 2
    location: hel1
```

And here is the pipeline log: Should I try with a different cluster config or SSH key? Well, I can log in with ssh with that SSH keypair on my host without any issue |
Did you try with |
Did a few more tries, here are the results:
|
Since I am not familiar with GitLab I am confused by this:

```yaml
deploy-cluster:
  image: registry.gitlab.com/snoopy-group/hetzner-cli-docker:v1.0.1
  stage: deploy
  script:
    - hetzner-k3s create --config cluster_config.yaml | tee create.log
  artifacts:
    paths:
      - kubeconfig.yaml
    expire_in: 2 h
```

This is running a new container inside the main container, I guess, or something like that. Where is it mounting the SSH keypair at ~/.ssh inside the hetzner-k3s container? |
I read that GitLab stores files in Secure Files and that they are accessible at the `$CI_SECURE_FILES_DIR` location. How are you handling this? |
The mounting part is in the GitLab uses three ways to send commands to the container To be 100% sure that the SSH keypair is being added to the container, I did a cat on both the public and private keys just before the So they are on the container and can be accessed through the CLI, as far as I understand. And just to be clear, the GitLab runners use my Docker container to execute all the commands. As for files, it treats them like a normal Linux env variable (https://docs.gitlab.com/ee/ci/variables/#use-file-type-cicd-variables) |
Since you can otherwise SSH into the hosts, can you monitor `/var/log/auth.log` while hetzner-k3s is running to see if you can find useful info there? |
I am gonna try to get rid of the ssh2 library and just use the SSH binary via shell command. This library has been causing several headaches already. Bear with me. I will make a new build soon. |
Well, found something funny. A couple of Asian IP's trying to reach my master node, but nothing else. I'm doing more research/trying stuff to see any more logs |
I'm almost done removing that library. |
Take your time, there's no rush! |
Hopefully this will work better https://github.com/vitobotta/hetzner-k3s/actions/runs/12969882972 Try the new build when ready. I have removed the library I mentioned and now I am just using regular ssh via shell. Should no longer run into weird issues with keys. |
Looks like one job failed |
Sorry, this one https://github.com/vitobotta/hetzner-k3s/actions/runs/12969906432 I forgot to remove one reference. |
Still failing 😅 |
I had to remove something else https://github.com/vitobotta/hetzner-k3s/actions/runs/12969938981 |
Did you try? |
hetzner-k3s is now using plain ssh binary via shell, not anymore a library which was a bit problematic in some cases. If this doesn't work then it must be something with the environment or settings. |
Could be. I will try to polish the Dockerfile a bit and see if I can find a solution. I'll let you know! Thank you so much for all your help! |
Np. |
Hi @vitobotta Came back with new info. I was trying to run the create command on my host and found something interesting in auth.log while hetzner-k3s was running. Did a test with I did the following tests:
Same thing happens with v2.1.1.rc6. Trying with a brand new config, same result. I can still log in normally with ssh from my host, but hetzner-k3s is not able to log in. |
@FernandoJCa I'm not sure what else we can try. I've never had issues with SSH connections with hetzner-k3s, though a few people have mentioned problems here and there. I thought it might be due to the library I was using, but now hetzner-k3s just uses the plain `ssh` binary. hetzner-k3s uses these ssh options: Can you try SSHing into the nodes manually using the same options? I doubt it will make a difference, but it's worth a shot. Also, can you remind me which OS you're using on the host? I use macOS but develop hetzner-k3s in an Alpine dev container, and I have tested it a lot on Ubuntu. I've never had any issues on any of these systems. |
I am gonna test some things and get back to you in a bit. |
I think I may have just reproduced the issue! Investigating... |
NVM.. I had transferred the wrong key. I am testing from a Ubuntu server now and all good. If you tell me your host OS I can try to reproduce on the same system. |
This is based on Ubuntu so it should be the same. Let me know if you find something. |
No luck, tried a bunch of stuff but still the same issue. Could it be a problem with my OS? Not sure; I will test later with a friend on his PC to see if I'm the issue or what is happening. One thing is sure: hetzner-k3s is trying to log in with ssh but is instantly disconnected for some reason that I can not understand. It's funny that for some reason it only worked one time and never worked again hahaha. PS: Did a few tries with new ssh keys deploying a new cluster, but no luck either. |
@vitobotta Switched back to v2.0.9 and works. Any idea why? |
On the same host, with the same keys and configuration? Is the version of hetzner-k3s the only difference? |
Yep, the version of hetzner-k3s is the only difference. Still not working in the pipeline, but that looks like a GitLab issue, since it is not able to... for some reason... send the connection. I was monitoring the auth logs and there was no connection attempt by any GitLab runner. |
Can you please test again with a new cluster, in a new hetzner project, with both 2.0.9 and 2.1.1.rc6? Ensure you use a new project with nothing inside each time. |
New project + v2.0.9: works. New project + v2.1.1.rc6: fails with the same issue. |
Can you try v2.1.1.rc7 with DEBUG=true? |
Could you please guide me on how to pass the DEBUG flag? I tried with: |
It's just an env variable, so `DEBUG=true hetzner-k3s create --config cluster_config.yaml` should work. Or try |
Interesting, that seems to suggest that the connection is OK, so there may be a problem with comparing that string with the expected one. It's just a simple check to verify that SSH commands can be executed correctly. Wait for rc8 to be ready (https://github.com/vitobotta/hetzner-k3s/actions/runs/12978554154) and try again with |
Don't worry @vitobotta, I'm happy to help. Unfortunately I'm not good enough with Crystal to help with the code. I'll test it later today when I have a bit more time, and I'll let you know the results! |
Sounds good |
I was just reading that there may be an issue with extra new lines/carriage returns in some cases, so that would make that string comparison fail. I have made a change in rc9 (https://github.com/vitobotta/hetzner-k3s/actions/runs/12978828792) to remove extra characters. Please test with this one. |
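To illustrate the failure mode just described, here is a small Python sketch; the marker string is hypothetical, not the tool's actual check:

```python
# A connectivity check of this kind runs a command over SSH and compares
# the output to an expected marker string. A trailing CR/LF from the
# remote shell breaks a naive equality check.
expected = "ssh-check-ok"            # hypothetical marker string
raw_output = "ssh-check-ok\r\n"      # what the remote command may actually return

print(raw_output == expected)            # False: trailing characters differ
print(raw_output.strip() == expected)    # True: whitespace stripped before comparing
```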
I recently started having issues with SSH connectivity (again, but this time also with my existing cluster) and looked into it in more detail. This may or may not be a duplicate of or related to #443 #415, but I wanted to document my findings in case someone else runs into similar problems.

For me the connectivity issues were a complex mix of different things that overlaid each other. All of these lead to hetzner-k3s hanging (silently) in "Waiting for successful ssh connectivity":

- `use_agent: true` will ignore the configured keys, use the wrong keys, and fail silently
- `use_agent: false` did also fail silently, because the key pair was using a password (this is documented somewhere but still a caveat)

Here are the workarounds I used:

- With `use_agent: true`, use `ssh-add -D` and `ssh-add <key>` to add the right key to the ssh agent
- With `use_agent: false`, use a key without a password (not recommended)
- `cluster` label to enable port 22 even before hetzner-k3s creates the firewall rules; using a completely clean hetzner cloud project might also work

My main debugging tool in the end was to log in to a node, set the debug logging level in `/etc/ssh/sshd_config`, and observe the logs during the attempts (this will show blocking, wrong keys, etc.).

Suggestions for hetzner-k3s:

- `use_agent: true`:
- `use_agent: false`:
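For reference, the sshd debug-logging workaround mentioned above is a one-line change on the node, using the standard OpenSSH `LogLevel` directive; restart sshd after editing and revert when done:

```
# /etc/ssh/sshd_config
LogLevel DEBUG3   # default is INFO; DEBUG3 shows offered keys, rejections, etc.
```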