gRPC stream is closed after 60 seconds of idle even with timeout annotations set #12434
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/remove-kind bug Can you please write a step-by-step guide that someone can copy/paste from to reproduce this on a kind cluster, including the gRPC application?
Sure. I will work on a sample app and share the details soon.
Try setting client-body-timeout.
/kind bug Let us know if the client body timeout works; it looks like the client header timeout should be set as well.
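For reference, client-body-timeout and client-header-timeout are global ingress-nginx ConfigMap options rather than per-Ingress annotations. A minimal sketch of setting them, assuming a standard install where the controller's ConfigMap is called ingress-nginx-controller in the ingress-nginx namespace (the name, namespace, and 120-second values are illustrative, not taken from this issue):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed name; check your install
  namespace: ingress-nginx         # assumed namespace
data:
  # ingress-nginx ConfigMap timeout keys take plain numbers of seconds
  client-body-timeout: "120"
  client-header-timeout: "120"

The controller watches its ConfigMap and reloads NGINX on changes, so this should not require recreating the pod.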
@0x113 are you still having issues? If not, can you post the resolution and/or close the ticket?
This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach out.
I think I've been dealing with the same issue. Unfortunately I don't have a reproduction I can share, but I can share more details. Brief context: we have been attempting to support idle gRPC streams of up to 120s, but we have been experiencing timeouts at 60s. I'm on a slightly older version of the controller:
Annotations:

metadata:
  name: lumenvox-api-ingress-grpc
  namespace: {{ default .Release.Namespace .Values.global.defaultNamespace }}
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/server-snippet: |
      grpc_read_timeout 120s;
      grpc_send_timeout 120s;
      client_body_timeout 120s;

Here's the server block that is generated from those annotations:

## start server lumenvox-api.testmachine.com
server {
server_name lumenvox-api.testmachine.com ;
http2 on;
listen 80 ;
listen [::]:80 ;
listen 443 ssl;
listen [::]:443 ssl;
set $proxy_upstream_name "-";
ssl_certificate_by_lua_block {
certificate.call()
}
# Custom code snippet configured for host lumenvox-api.testmachine.com
grpc_read_timeout 120s;
grpc_send_timeout 120s;
client_body_timeout 120s;
location / {
set $namespace "lumenvox";
set $ingress_name "lumenvox-api-ingress-grpc";
set $service_name "lumenvox-api-service";
set $service_port "grpc";
set $location_path "/";
set $global_rate_limit_exceeding n;
rewrite_by_lua_block {
lua_ingress.rewrite({
force_ssl_redirect = false,
ssl_redirect = true,
force_no_ssl_redirect = false,
preserve_trailing_slash = false,
use_port_in_redirects = false,
global_throttle = { namespace = "", limit = 0, window_size = 0, key = { }, ignored_cidrs = { } },
})
balancer.rewrite()
plugins.run()
}
# be careful with `access_by_lua_block` and `satisfy any` directives as satisfy any
# will always succeed when there's `access_by_lua_block` that does not have any lua code doing `ngx.exit(ngx.DECLINED)`
# other authentication method such as basic auth or external auth useless - all requests will be allowed.
#access_by_lua_block {
#}
header_filter_by_lua_block {
lua_ingress.header()
plugins.run()
}
body_filter_by_lua_block {
plugins.run()
}
log_by_lua_block {
balancer.log()
plugins.run()
}
port_in_redirect off;
set $balancer_ewma_score -1;
set $proxy_upstream_name "lumenvox-lumenvox-api-service-grpc";
set $proxy_host $proxy_upstream_name;
set $pass_access_scheme $scheme;
set $pass_server_port $server_port;
set $best_http_host $http_host;
set $pass_port $pass_server_port;
set $proxy_alternative_upstream_name "";
client_max_body_size 0;
grpc_set_header Host $best_http_host;
# Pass the extracted client certificate to the backend
# Allow websocket connections
grpc_set_header Upgrade $http_upgrade;
grpc_set_header Connection $connection_upgrade;
grpc_set_header X-Request-ID $req_id;
grpc_set_header X-Real-IP $remote_addr;
grpc_set_header X-Forwarded-For $remote_addr;
grpc_set_header X-Forwarded-Host $best_http_host;
grpc_set_header X-Forwarded-Port $pass_port;
grpc_set_header X-Forwarded-Proto $pass_access_scheme;
grpc_set_header X-Forwarded-Scheme $pass_access_scheme;
grpc_set_header X-Scheme $pass_access_scheme;
# Pass the original X-Forwarded-For
grpc_set_header X-Original-Forwarded-For $http_x_forwarded_for;
# mitigate HTTPoxy Vulnerability
# https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
grpc_set_header Proxy "";
# Custom headers to proxied server
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
proxy_buffering off;
proxy_buffer_size 4k;
proxy_buffers 4 4k;
proxy_max_temp_file_size 1024m;
proxy_request_buffering on;
proxy_http_version 1.1;
proxy_cookie_domain off;
proxy_cookie_path off;
# In case of errors try the next upstream server before returning an error
proxy_next_upstream error timeout;
proxy_next_upstream_timeout 0;
proxy_next_upstream_tries 3;
# Grpc settings
grpc_connect_timeout 5s;
grpc_send_timeout 60s;
grpc_read_timeout 60s;
# Custom Response Headers
grpc_pass grpc://upstream_balancer;
proxy_redirect off;
}
}
## end server lumenvox-api.testmachine.com

The snippet from our annotations can be seen with the correct values just above the start of the location block. I've been able to support longer idle streams by manually updating those values:
However, when the pod restarts, those values are reset back to 60s, and the longer streams start to fail again. I also tried using the configuration-snippet annotation:

metadata:
  annotations:
    meta.helm.sh/release-name: lumenvox
    meta.helm.sh/release-namespace: lumenvox
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/server-snippet: |
      grpc_read_timeout 120s;
      grpc_send_timeout 120s;
      client_body_timeout 120s;
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 120s;
      grpc_send_timeout 120s;
      client_body_timeout 120s;
    nginx.ingress.kubernetes.io/ssl-redirect: "true"

However, this causes an error:
So to sum up, it looks like the nginx-ingress-controller pod is auto-generating the configuration, and in that auto-generated configuration there are values that override the gRPC-related timeouts. This can be fixed manually with the steps I described, but I haven't been able to find a good long-term solution.
I had better luck with proxy_send_timeout/proxy_read_timeout (and body-timeout) than with the gRPC-specific ones. Also, server-snippets are now disabled by default, so you should use the annotations (and the configmap for the body timeout) instead.
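To make that concrete, here is a sketch of the annotation-based approach described above (the 120-second values are illustrative). When backend-protocol is GRPC, the controller appears to reuse the proxy read/send timeout values for the generated grpc_read_timeout/grpc_send_timeout directives, which matches the 60s defaults visible in the server block above and is why these annotations help:

metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # proxy-* timeout annotations take plain numbers of seconds;
    # for gRPC backends these feed the generated grpc_send_timeout/grpc_read_timeout
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"

There is no per-Ingress annotation for the client body timeout, so that one stays in the controller ConfigMap as in the earlier sketch.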
What happened:
The gRPC bi-directional stream is interrupted after 60 seconds of idle even after the necessary annotations are set.
Annotations:
I verified that these values are set correctly by exec-ing into the pod and checking nginx.conf directly. However, the bi-directional stream between the server and the agent is still closed after 60 seconds.
What you expected to happen:
I expected the stream to be closed after 5 minutes of idle, not after 60 seconds.
I think the default value of 60s is used whenever the annotation values are greater than 60s. If I set these 3 annotations to a value less than 60, then the timeout is applied properly. For instance, when I set them to "10", the stream was interrupted after 10 seconds of idle.
NGINX Ingress controller version (exec into the pod and run /nginx-ingress-controller --version):
Kubernetes version (use kubectl version): v1.29.10
Environment:
Cloud provider or hardware configuration: Managed AKS
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Basic cluster related info:
How was the ingress-nginx-controller installed:
kubectl describe ... of any custom configmap(s) created and in use:
How to reproduce this issue:
Anything else we need to know: