[GCS FT] Redis e2e cleanup check #2773

Merged
merged 3 commits into from
Jan 20, 2025
Merged
Show file tree
Hide file tree
Changes from 2 commits
ray-operator/controllers/ray/raycluster_controller.go (2 changes: 1 addition, 1 deletion)
@@ -1230,7 +1230,7 @@ func (r *RayClusterReconciler) buildRedisCleanupJob(ctx context.Context, instanc
"redis_address = os.getenv('RAY_REDIS_ADDRESS', '').split(',')[0]; " +
"redis_address = redis_address if '://' in redis_address else 'redis://' + redis_address; " +
"parsed = urlparse(redis_address); " +
-"sys.exit(1) if not cleanup_redis_storage(host=parsed.hostname, port=parsed.port, password=os.getenv('REDIS_PASSWORD', parsed.password), use_ssl=parsed.scheme=='rediss', storage_namespace=os.getenv('RAY_external_storage_namespace')) else None\"",
+"sys.exit(1) if not cleanup_redis_storage(host=parsed.hostname, port=parsed.port, password=os.getenv('REDIS_PASSWORD', parsed.password or ''), use_ssl=parsed.scheme=='rediss', storage_namespace=os.getenv('RAY_external_storage_namespace')) else None\"",
Member

Would you mind explaining the change?

Contributor Author

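Without the change, None could be passed as the password parameter: when REDIS_PASSWORD is unset and the Redis address carries no password, os.getenv falls back to parsed.password, which is None. That results in this error:

[Screenshot of the error]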
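For reference, the address handling in this one-liner can be sketched in Go. This is a hypothetical translation for illustration only (resolveRedisConn and redisConn are made-up names; the cleanup Job itself runs the Python shown in the diff above). It keeps the first comma-separated address, defaults the scheme to redis://, enables TLS for rediss://, and resolves the password in the same order as the fixed line: REDIS_PASSWORD if it is set, else the password embedded in the URL, else an empty string.

```go
package main

import (
	"net/url"
	"strings"
)

// redisConn holds the connection parameters derived from RAY_REDIS_ADDRESS.
// Hypothetical type, for illustration only.
type redisConn struct {
	Host     string
	Port     string
	Password string
	UseSSL   bool
}

// resolveRedisConn is a hypothetical Go rendering of the Python one-liner.
// envPassword/envSet model os.getenv('REDIS_PASSWORD', ...): when the
// variable is set (even to ""), it wins over the password in the URL.
func resolveRedisConn(rayRedisAddress, envPassword string, envSet bool) (redisConn, error) {
	// Only the first address of a comma-separated list is used.
	addr := strings.Split(rayRedisAddress, ",")[0]
	// Default to the plain redis:// scheme when none is given.
	if !strings.Contains(addr, "://") {
		addr = "redis://" + addr
	}
	u, err := url.Parse(addr)
	if err != nil {
		return redisConn{}, err
	}

	// Userinfo.Password is nil-safe and returns "" when no password is set,
	// playing the role of `parsed.password or ''` in the fixed line.
	password, _ := u.User.Password()
	if envSet {
		password = envPassword
	}

	return redisConn{
		Host:     u.Hostname(),
		Port:     u.Port(),
		Password: password,
		UseSSL:   u.Scheme == "rediss",
	}, nil
}
```

One design difference worth noting: Go's Userinfo.Password already yields an empty string when the URL carries no password, so the None case has no direct Go counterpart; in the Python version, parsed.password or '' is what supplies that empty-string default. The sketch also returns the port as a string, whereas Python's parsed.port is an int.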

}

// Disable liveness and readiness probes because the Job will not launch processes like Raylet and GCS.
ray-operator/test/e2e/raycluster_gcsft_test.go (10 changes: 8 additions, 2 deletions)
@@ -81,7 +81,7 @@ func TestGcsFaultToleranceOptions(t *testing.T) {
g := NewWithT(t)
namespace := test.NewTestNamespace()

-deployRedis(test, namespace.Name, tc.redisPassword)
+defer deployRedis(test, namespace.Name, tc.redisPassword)()
Member

What will the stack trace look like if the check in the deferred function fails? If it is not easy to read, maybe we can return the cleanup function and call it explicitly with Eventually at the end of the test logic?

Contributor Author

This is the trace for the defer approach:

=== RUN   TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password
Modified Ray Image to: rayproject/ray:2.40.0-aarch64 for ARM chips
    raycluster_gcsft_test.go:211: Created RayCluster test-ns-72s65/raycluster-gcsft successfully
    raycluster_gcsft_test.go:213: Waiting for RayCluster test-ns-72s65/raycluster-gcsft to become ready
    raycluster_gcsft_test.go:217: Verifying environment variables on Head Pod
    support.go:253: failed cleanup redis expect 0 but got: 3
    test.go:110: Retrieving Pod Container test-ns-72s65/redis/redis logs
    test.go:98: Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:101: Output directory has been created at: /var/folders/sw/cyfhnvns2hv82r3fj1sgwsb00000gn/T/TestGcsFaultToleranceAnnotationsGCS_FT_without_redis_password2730163886/001
--- FAIL: TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password (48.50s)

And this is the trace for ExecPodCmd + Eventually:

=== RUN   TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password
Modified Ray Image to: rayproject/ray:2.40.0-aarch64 for ARM chips
    raycluster_gcsft_test.go:214: Created RayCluster test-ns-s257l/raycluster-gcsft successfully
    raycluster_gcsft_test.go:216: Waiting for RayCluster test-ns-s257l/raycluster-gcsft to become ready
    raycluster_gcsft_test.go:220: Verifying environment variables on Head Pod
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    ...
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    raycluster_gcsft_test.go:236: 
        Timed out after 30.072s.
        Expected
            <string>: 3
        to be equivalent to
            <string>: 0
    test.go:110: Retrieving Pod Container test-ns-s257l/redis/redis logs
    test.go:98: Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:101: Output directory has been created at: /var/folders/sw/cyfhnvns2hv82r3fj1sgwsb00000gn/T/TestGcsFaultToleranceAnnotationsGCS_FT_without_redis_password315262234/001
--- FAIL: TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password (51.40s)
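The first trace reports the failure from support.go:253 because of how defer deployRedis(...)() evaluates: deployRedis itself runs immediately at the defer statement, and only the closure it returns is deferred until the subtest function returns. The standalone sketch below illustrates that evaluation order; setup, runTest, and the events slice are hypothetical names, stdlib only.

```go
package main

// events records the order in which setup, the test body, and the
// deferred check run.
var events []string

// setup mimics the shape of deployRedis after this PR: it does its work
// immediately and returns a function to be run later.
func setup(name string) func() {
	events = append(events, "setup "+name)
	return func() {
		events = append(events, "check "+name)
	}
}

// runTest mirrors a test body that uses `defer setup(...)()`.
func runTest() {
	// setup("redis") is evaluated right here; only the returned closure
	// is deferred until runTest returns.
	defer setup("redis")()
	events = append(events, "test body")
}
```

Calling runTest leaves events as setup, test body, check, in that order. Because the check runs inside the returned closure, its Fatalf is reported at its own location in support.go rather than at a line in the test file; the Eventually version in the second trace instead points at raycluster_gcsft_test.go:236, and the ExecPodCmd logs show each DBSIZE attempt.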


if tc.createSecret {
test.T().Logf("Creating Redis password secret")
@@ -116,6 +116,9 @@ func TestGcsFaultToleranceOptions(t *testing.T) {
} else {
g.Expect(utils.EnvVarExists(utils.REDIS_PASSWORD, headPod.Spec.Containers[utils.RayContainerIndex].Env)).Should(BeTrue())
}
+
+err = test.Client().Ray().RayV1().RayClusters(namespace.Name).Delete(test.Ctx(), rayCluster.Name, metav1.DeleteOptions{})
+g.Expect(err).NotTo(HaveOccurred())
})
}
}
@@ -171,7 +174,7 @@ func TestGcsFaultToleranceAnnotations(t *testing.T) {
redisPassword = tc.redisPasswordInRayStartParams
}

-deployRedis(test, namespace.Name, redisPassword)
+defer deployRedis(test, namespace.Name, redisPassword)()

// Prepare RayCluster ApplyConfiguration
podTemplateAC := headPodTemplateApplyConfiguration()
@@ -224,6 +227,9 @@ func TestGcsFaultToleranceAnnotations(t *testing.T) {
} else {
g.Expect(utils.EnvVarExists(utils.REDIS_PASSWORD, headPod.Spec.Containers[utils.RayContainerIndex].Env)).Should(BeTrue())
}
+
+err = test.Client().Ray().RayV1().RayClusters(namespace.Name).Delete(test.Ctx(), rayCluster.Name, metav1.DeleteOptions{})
+g.Expect(err).NotTo(HaveOccurred())
})
}
}
ray-operator/test/e2e/support.go (69 changes: 54 additions, 15 deletions)
@@ -1,14 +1,17 @@
package e2e

import (
"bytes"
"embed"
"strings"
"time"

"github.com/stretchr/testify/assert"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
appsv1ac "k8s.io/client-go/applyconfigurations/apps/v1"
corev1ac "k8s.io/client-go/applyconfigurations/core/v1"
metav1ac "k8s.io/client-go/applyconfigurations/meta/v1"
"k8s.io/client-go/kubernetes/scheme"
"k8s.io/client-go/tools/remotecommand"

rayv1ac "github.com/ray-project/kuberay/ray-operator/pkg/client/applyconfiguration/ray/v1"
. "github.com/ray-project/kuberay/ray-operator/test/support"
@@ -177,26 +180,20 @@ func jobSubmitterPodTemplateApplyConfiguration() *corev1ac.PodTemplateSpecApplyC
}))))
}

-func deployRedis(t Test, namespace string, password string) {
+func deployRedis(t Test, namespace string, password string) func() {
redisContainer := corev1ac.Container().WithName("redis").WithImage("redis:7.4").
WithPorts(corev1ac.ContainerPort().WithContainerPort(6379))
+dbSizeCmd := []string{"redis-cli", "--no-auth-warning", "DBSIZE"}
if password != "" {
redisContainer.WithCommand("redis-server", "--requirepass", password)
+dbSizeCmd = []string{"redis-cli", "--no-auth-warning", "-a", password, "DBSIZE"}
}

-_, err := t.Client().Core().AppsV1().Deployments(namespace).Apply(
+_, err := t.Client().Core().CoreV1().Pods(namespace).Apply(
t.Ctx(),
-appsv1ac.Deployment("redis", namespace).
-WithSpec(appsv1ac.DeploymentSpec().
Contributor Author

No need to use a Deployment for Redis in tests; just use a Pod instead.

-WithReplicas(1).
-WithSelector(metav1ac.LabelSelector().WithMatchLabels(map[string]string{"app": "redis"})).
-WithTemplate(corev1ac.PodTemplateSpec().
-WithLabels(map[string]string{"app": "redis"}).
-WithSpec(corev1ac.PodSpec().
-WithContainers(redisContainer),
-),
-),
-),
+corev1ac.Pod("redis", namespace).
+WithLabels(map[string]string{"app": "redis"}).
+WithSpec(corev1ac.PodSpec().WithContainers(redisContainer)),
TestApplyOptions,
)
assert.NoError(t.T(), err)
@@ -213,4 +210,46 @@ func deployRedis(t Test, namespace string, password string) {
TestApplyOptions,
)
assert.NoError(t.T(), err)
+
+checkDBSize := func() string {
Member

Use ExecPodCmd instead?

Contributor Author

👍

+req := t.Client().Core().CoreV1().RESTClient().Post().Resource("pods").
+Namespace(namespace).Name("redis").SubResource("exec").
+VersionedParams(&corev1.PodExecOptions{
+Container: "redis",
+Command: dbSizeCmd,
+Stdin: false,
+Stdout: true,
+Stderr: true,
+}, scheme.ParameterCodec)
+
+stdout := bytes.NewBuffer(nil)
+stderr := bytes.NewBuffer(nil)
+
+config := t.Client().Config()
+
+executor, err := remotecommand.NewSPDYExecutor(&config, "POST", req.URL())
+if err != nil {
+t.T().Fatalf("failed to create executor: %v", err)
+}
+
+err = executor.StreamWithContext(t.Ctx(), remotecommand.StreamOptions{
+Stdout: stdout,
+Stderr: stderr,
+})
+if err != nil {
+t.T().Fatalf("failed to execute: %v", err)
+}
+
+return strings.TrimSpace(stdout.String() + stderr.String())
+}
+return func() {
Member

Maybe only return checkDBSize and use Eventually in raycluster_gcsft_test.go?

Contributor Author

👍

+var output string
+for i := 0; i < 30; i++ {
+if output = checkDBSize(); output == "0" {
+return
+}
+time.Sleep(time.Second)
Contributor Author

Wait for the cleanup job.

+}
+t.T().Fatalf("failed cleanup redis expect 0 but got: %s", output)
+}
}