Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GCS FT] Redis e2e cleanup check #2773

Merged
merged 3 commits into from
Jan 20, 2025

Conversation

rueian
Copy link
Contributor

@rueian rueian commented Jan 18, 2025

Why are these changes needed?

Follow-ups for #2696 (comment) and resolves #2764.

  • Add e2e tests to check if Redis is empty at the end of each GCS FT test.
  • Fix Redis cleanup job to accept empty passwords.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@rueian rueian changed the title Redis e2e cleanup check [GCS FT] Redis e2e cleanup check Jan 18, 2025
t.Ctx(),
appsv1ac.Deployment("redis", namespace).
WithSpec(appsv1ac.DeploymentSpec().
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for us to use Deployment for Redis in tests. just use Pod instead.

if output = checkDBSize(); output == "0" {
return
}
time.Sleep(time.Second)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for the cleanup job.

@rueian rueian marked this pull request as ready for review January 18, 2025 20:55
@@ -213,4 +210,46 @@ func deployRedis(t Test, namespace string, password string) {
TestApplyOptions,
)
assert.NoError(t.T(), err)

checkDBSize := func() string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ExecPodCmd instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


return strings.TrimSpace(stdout.String() + stderr.String())
}
return func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only return checkDBSize and use Eventually in raycluster_gcsft_test.go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -1230,7 +1230,7 @@ func (r *RayClusterReconciler) buildRedisCleanupJob(ctx context.Context, instanc
"redis_address = os.getenv('RAY_REDIS_ADDRESS', '').split(',')[0]; " +
"redis_address = redis_address if '://' in redis_address else 'redis://' + redis_address; " +
"parsed = urlparse(redis_address); " +
"sys.exit(1) if not cleanup_redis_storage(host=parsed.hostname, port=parsed.port, password=os.getenv('REDIS_PASSWORD', parsed.password), use_ssl=parsed.scheme=='rediss', storage_namespace=os.getenv('RAY_external_storage_namespace')) else None\"",
"sys.exit(1) if not cleanup_redis_storage(host=parsed.hostname, port=parsed.port, password=os.getenv('REDIS_PASSWORD', parsed.password or ''), use_ssl=parsed.scheme=='rediss', storage_namespace=os.getenv('RAY_external_storage_namespace')) else None\"",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind explaining the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the change, we could pass None to the password parameter and resulting in this error:
image

@@ -81,7 +81,7 @@ func TestGcsFaultToleranceOptions(t *testing.T) {
g := NewWithT(t)
namespace := test.NewTestNamespace()

deployRedis(test, namespace.Name, tc.redisPassword)
defer deployRedis(test, namespace.Name, tc.redisPassword)()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will the stack trace look like if the check in defer fails? If it is not easy to read, maybe we can return the clean up function and explicitly call it with Eventually at the end of the test logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trace is the defer approach:

=== RUN   TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password
Modified Ray Image to: rayproject/ray:2.40.0-aarch64 for ARM chips
    raycluster_gcsft_test.go:211: Created RayCluster test-ns-72s65/raycluster-gcsft successfully
    raycluster_gcsft_test.go:213: Waiting for RayCluster test-ns-72s65/raycluster-gcsft to become ready
    raycluster_gcsft_test.go:217: Verifying environment variables on Head Pod
    support.go:253: failed cleanup redis expect 0 but got: 3
    test.go:110: Retrieving Pod Container test-ns-72s65/redis/redis logs
    test.go:98: Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:101: Output directory has been created at: /var/folders/sw/cyfhnvns2hv82r3fj1sgwsb00000gn/T/TestGcsFaultToleranceAnnotationsGCS_FT_without_redis_password2730163886/001
--- FAIL: TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password (48.50s)

This one is the trace for ExecPodCmd + Eventually.

=== RUN   TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password
Modified Ray Image to: rayproject/ray:2.40.0-aarch64 for ARM chips
    raycluster_gcsft_test.go:214: Created RayCluster test-ns-s257l/raycluster-gcsft successfully
    raycluster_gcsft_test.go:216: Waiting for RayCluster test-ns-s257l/raycluster-gcsft to become ready
    raycluster_gcsft_test.go:220: Verifying environment variables on Head Pod
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    ...
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    raycluster_gcsft_test.go:236: 
        Timed out after 30.072s.
        Expected
            <string>: 3
        to be equivalent to
            <string>: 0
    test.go:110: Retrieving Pod Container test-ns-s257l/redis/redis logs
    test.go:98: Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:101: Output directory has been created at: /var/folders/sw/cyfhnvns2hv82r3fj1sgwsb00000gn/T/TestGcsFaultToleranceAnnotationsGCS_FT_without_redis_password315262234/001
--- FAIL: TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password (51.40s)

@kevin85421 kevin85421 self-assigned this Jan 20, 2025
@rueian rueian force-pushed the redis-e2e-cleanup-check branch from 8b02924 to b73254d Compare January 20, 2025 02:12
@rueian rueian force-pushed the redis-e2e-cleanup-check branch from b73254d to 8f9a030 Compare January 20, 2025 02:18
@kevin85421 kevin85421 merged commit a81ea81 into ray-project:master Jan 20, 2025
24 checks passed
win5923 pushed a commit to win5923/kuberay that referenced this pull request Jan 20, 2025
win5923 pushed a commit to win5923/kuberay that referenced this pull request Jan 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Check Redis cleanup in Golang e2e tests
2 participants