Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(test): configuration changes and fixes needed to scale-test #1085

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alexcastilio
Copy link
Contributor

@alexcastilio alexcastilio commented Nov 29, 2024

Description

Changes:

  • move env var name OUTPUT_FILEPATH to common/common.go
  • adjust timeout for diferent steps
  • enable operator in retina legacy helm chart installation
  • add retries to kube api requests to make it more robust to eventual failures in responses
  • add podStatus to metrics exported
  • add retina operator to get metrics from
  • other small fixes and improvements

Related Issue

If this pull request is related to any issue, please mention it here. Additionally, make sure that the issue is assigned to you before submitting this pull request.

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes made.

Additional Notes

Add any additional notes or context about the pull request here.


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@alexcastilio alexcastilio requested a review from a team as a code owner November 29, 2024 16:02
Copy link

This PR will be closed in 7 days due to inactivity.

@github-actions github-actions bot added the meta/waiting-for-author Blocked and waiting on the author label Dec 30, 2024
@alexcastilio alexcastilio removed the meta/waiting-for-author Blocked and waiting on the author label Jan 2, 2025
@alexcastilio alexcastilio force-pushed the alexcastilio/fix-scale-test branch from 9791c43 to 419dc4d Compare January 2, 2025 16:16
matmerr
matmerr previously approved these changes Jan 6, 2025
Copy link
Member

@matmerr matmerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like linter has some feedback though

@alexcastilio alexcastilio force-pushed the alexcastilio/fix-scale-test branch from 419dc4d to c2ef8e9 Compare January 7, 2025 08:54
@alexcastilio alexcastilio force-pushed the alexcastilio/fix-scale-test branch 8 times, most recently from bf79ef1 to c57c17a Compare January 7, 2025 15:50
@@ -30,7 +34,7 @@ func (d *DeleteNamespace) Run() error {
return fmt.Errorf("error creating Kubernetes client: %w", err)
}

ctx, cancel := context.WithTimeout(context.Background(), defaultTimeoutSeconds*time.Second)
ctx, cancel := context.WithTimeout(context.Background(), deleteNamespaceTimeoutSeconds*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, just inline this. The const makes this less readable (mute the linter if it complains). Also, (even though we're not doing this) the best practice with Go durations is const deleteNamespaceTimeout = 1200 * time.Second. Embedding a unit like Seconds in the name is an antipattern.

Suggested change
ctx, cancel := context.WithTimeout(context.Background(), deleteNamespaceTimeoutSeconds*time.Second)
ctx, cancel := context.WithTimeout(context.Background(), 1200 * time.Second)

Comment on lines +47 to +50
numberOfSteps := 9

backoff := wait.Backoff{
Steps: 6,
Steps: numberOfSteps,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, the constant just obscures what's going on more than it helps.

Steps: 9

metav1.PatchOptions{},
)
if err != nil {
return fmt.Errorf("error patching pod: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrap errors, the word "error" is typically redundant (it's an error, so it will be logged as such). It's sufficient to just write what you were trying to do when the error occurred, as you've done:

Suggested change
return fmt.Errorf("error patching pod: %w", err)
return fmt.Errorf("patching pod: %w", err)

Copy link
Collaborator

@rbtr rbtr Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that "error" is redundant, but I prefer to write <what happened> and think that that reads easier when scanning logs.
something like

Suggested change
return fmt.Errorf("error patching pod: %w", err)
return fmt.Errorf("failed to patch Pod: %w", err)

Path: "/metadata/labels/uni-lab-" + fmt.Sprintf("%05d", *counter),
Value: "val",
})
(*counter)++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just return this? Or let the caller deal with JSON marshalling so they can just look at the len?

@@ -48,11 +53,18 @@ func (c *CreateResources) Run() error {
return fmt.Errorf("error creating Kubernetes client: %w", err)
}

ctx, cancel := context.WithTimeout(context.Background(), defaultTimeoutSeconds*time.Second)
ctx, cancel := context.WithTimeout(context.Background(), timeoutCreateResourcesSeconds*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find the other instance, but I also want to point out that value * time.Second is dangerous math to do--you should eliminate it every chance you get. The value can easily overflow as you can see here if someone tries to give you a duration: https://go.dev/play/p/tZhE9KQPdju . time.Duration is a first-class type and it should be passed around and used.

Copy link
Collaborator

@rbtr rbtr Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1000000
please use time.Duration for time instead of the primitive numerical types, and use it as soon as possible: at the interface of parsing config; in the func args; whenever you declare a time var or const

Copy link
Contributor Author

@alexcastilio alexcastilio Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timraymond @rbtr This and other places that is following this kind of pattern (variable to store a single number) was requested by the linter, e.g.:

Error: test/e2e/framework/kubernetes/delete-namespace.go:48:13: Magic number: 9, in <assign> detected (mnd)
  		Steps:    9,
  		          ^

Should we disable this mnd linter?

g.stop = make(chan struct{})
g.wg.Add(1)

go func() {

t := time.NewTicker(5 * time.Minute)
t := time.NewTicker(defaultInterval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leaks a goroutine.

defer t.Stop()

Not a huge deal in tests, but it's a habit you should drill.

Comment on lines +86 to +87
log.Fatalf("error getting and publishing number of restarts: %v", err)
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no point in returning since log.Fatalf will panic. That said, you should not panic :)

@@ -88,53 +125,61 @@ func (g *GetAndPublishMetrics) Prevalidate() error {
if _, ok := g.AdditionalTelemetryProperty["retinaVersion"]; !ok {
return fmt.Errorf("retinaVersion is required in AdditionalTelemetryProperty")
}

if os.Getenv(common.OutputFilePathEnv) == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not crazy about ad-hoc os.Getenv. The trouble is that it gets difficult to reason about "what is the entire configuration for this process" if you do it often enough. Is there some place where you can put this where it will get loaded in a config struct you can pass around?

Comment on lines +175 to +176
permissions := 0o644
file, err := os.OpenFile(g.outputFilePath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, fs.FileMode(permissions))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline permissions

@@ -12,6 +13,10 @@ import (
"k8s.io/client-go/tools/clientcmd"
)

const (
timeoutToLabelAllPodsMinutes = 120
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please use time.Duration:

Suggested change
timeoutToLabelAllPodsMinutes = 120
podLabelTimeout = 120 * time.Minute

@alexcastilio alexcastilio force-pushed the alexcastilio/fix-scale-test branch 3 times, most recently from 2a7c193 to bb3a29f Compare January 8, 2025 16:06
@alexcastilio alexcastilio force-pushed the alexcastilio/fix-scale-test branch from bb3a29f to bcb2d2d Compare January 8, 2025 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants