feat: support ThroughputLimit in samplers #1300

Draft · wants to merge 6 commits into main from yingrong.throughput_limit
Conversation

VinozzZ (Contributor, Author) commented on Aug 23, 2024:

Which problem is this PR solving?

relates to #956

Short description of the changes

  • Create an EMAThroughputCalculator to calculate cluster-wide throughput and publish each node's individual throughput on an interval (a sketch of the calculator's shape follows this list)
  • Change the Sampler interface to separate sample-rate calculation from the sampling-decision logic
  • Add a new configuration group, EMAThroughputLimit, to the rules config
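
To make the first bullet concrete, here is a minimal sketch of what the calculator's shape could look like, inferred from the diff fragments quoted later in this thread (hostID, mut, throughputs, and the EMA-to-limit ratio appear there; everything else, including the method name, is an illustrative assumption):

```go
package sample

import (
	"sync"
	"time"
)

// throughputReport is a per-peer throughput observation. Only the type
// name appears in the PR's diff; the fields here are assumptions.
type throughputReport struct {
	throughput uint
	updatedAt  time.Time
}

// EMAThroughputCalculator tracks per-peer throughput reports and compares
// the cluster-wide EMA against a configured limit.
type EMAThroughputCalculator struct {
	hostID          string
	throughputLimit uint
	intervalLength  time.Duration

	mut         sync.RWMutex
	throughputs map[string]throughputReport
	clusterEMA  uint
}

// SamplingRateMultiplier returns the ratio of the current cluster EMA to
// the configured limit; a value above 1.0 means the cluster is over its
// budget and samplers should sample more aggressively. (The method name
// is hypothetical; the ratio expression is taken from the diff.)
func (c *EMAThroughputCalculator) SamplingRateMultiplier() float64 {
	c.mut.RLock()
	defer c.mut.RUnlock()
	currentEMA := c.clusterEMA
	return float64(currentEMA) / float64(c.throughputLimit)
}
```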

@VinozzZ force-pushed the yingrong.throughput_limit branch from 3bdcd4a to 78069b3 on August 23, 2024 21:58
@VinozzZ force-pushed the yingrong.throughput_limit branch from 78069b3 to 574e6e0 on August 23, 2024 22:01
@kentquirk added this to the v2.8 milestone on Aug 27, 2024
@VinozzZ marked this pull request as ready for review on August 29, 2024 19:58
@VinozzZ requested a review from a team as a code owner on August 29, 2024 19:58
return float64(currentEMA) / float64(c.throughputLimit)
}

type throughputReport struct {
VinozzZ (Contributor, Author) commented:

This code could be refactored into shared logic used by both stress relief and the throughput calculator.

Contributor replied:

Yes, they're quite similar.

Would it make sense to go even further and bundle the updates into the same messages? The system would maintain a map of named values that each peer updates internally, and the peers would send the whole map through pubsub.
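
If that direction were pursued, a rough sketch might look like the following (every name here is hypothetical, not Refinery's API):

```go
package sample

import "encoding/json"

// peerValues bundles all of a peer's gauges into one pubsub message, so
// stress relief and throughput updates share a single message type.
type peerValues struct {
	PeerID string             `json:"peerID"`
	Values map[string]float64 `json:"values"` // e.g. "stressLevel", "throughput"
}

func marshalPeerValues(v peerValues) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func unmarshalPeerValues(msg string) (peerValues, error) {
	var v peerValues
	err := json.Unmarshal([]byte(msg), &v)
	return v, err
}
```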

- name: EMAThroughputLimit
  sortorder: 1
  title: EMAThroughput Limit
  description: >
VinozzZ (Contributor, Author) commented:

I put this config option in the rules config because it mostly impacts the samplers rather than Refinery operations.

	c.Pubsub.Subscribe(context.Background(), stressReliefTopic, c.onThroughputUpdate)

	go func() {
		ticker := c.Clock.NewTicker(c.intervalLength)
VinozzZ (Contributor, Author) commented:

We should only publish if the throughput differs from the previous calculation.

Contributor replied:
Agreed.
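
One way to implement that guard, sketched with the standard library ticker for illustration (the PR uses the injected Clock shown above; the helper and its parameters are assumptions):

```go
package sample

import "time"

// publishOnChange ticks at the calculator's interval and only calls
// publish when the computed throughput differs from the last value sent.
func publishOnChange(interval time.Duration, current func() uint, publish func(uint)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var last uint
	first := true
	for range ticker.C {
		v := current()
		if !first && v == last {
			continue // unchanged since the last tick; skip the pubsub write
		}
		first, last = false, v
		publish(v)
	}
}
```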

if rule.Drop {
	// If we dropped because of an explicit drop rule, then increment that too.
	s.Metrics.Increment(s.prefix + "num_dropped_by_drop_rule")
	rate = 0
VinozzZ (Contributor, Author) commented:

We don't have the rule's config in the MakeSamplingDecision function, so I changed the logic to return a rate of 0 to signal that the drop decision came from the Drop config.
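
A minimal sketch of that convention (illustrative only; the PR's actual decision logic also involves the sampler's own randomness and metrics):

```go
package sample

import "math/rand"

// makeSamplingDecision treats rate == 0 as "dropped by an explicit Drop
// rule" and otherwise keeps roughly 1 out of every `rate` traces.
func makeSamplingDecision(rate uint) bool {
	if rate == 0 {
		return false // drop decision signaled by the Drop config
	}
	return rand.Intn(int(rate)) == 0
}
```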


	hostID string

	mut         sync.RWMutex
	throughputs map[string]throughputReport
Contributor commented:

How about using a generics.SetWithTTL here? That would avoid the explicit timeout checks.
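
For illustration, here is a hand-rolled TTL map showing how expiry-on-read removes the explicit timestamp checks; Refinery's actual generics.SetWithTTL API may differ:

```go
package sample

import "time"

// ttlMap expires entries on read, so callers never write their own
// timeout checks against report timestamps.
type ttlMap[V any] struct {
	ttl     time.Duration
	entries map[string]ttlEntry[V]
}

type ttlEntry[V any] struct {
	value   V
	expires time.Time
}

func newTTLMap[V any](ttl time.Duration) *ttlMap[V] {
	return &ttlMap[V]{ttl: ttl, entries: make(map[string]ttlEntry[V])}
}

func (m *ttlMap[V]) Set(key string, v V) {
	m.entries[key] = ttlEntry[V]{value: v, expires: time.Now().Add(m.ttl)}
}

func (m *ttlMap[V]) Get(key string) (V, bool) {
	e, ok := m.entries[key]
	if !ok || time.Now().After(e.expires) {
		delete(m.entries, key)
		var zero V
		return zero, false
	}
	return e.value, true
}
```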


	return msg.peerID + "|" + fmt.Sprint(msg.throughput)
}

func unmarshalThroughputMessage(msg string) (*throughputMessage, error) {
Contributor commented:

This gives me an idea. Instead of taking a string for a message type, Pubsub could take a PubsubMessage, which might just embed encoding.TextMarshaler and encoding.TextUnmarshaler.

That would normalize the way we pack and unpack these messages for pubsub.

Or we could build a general-purpose PubsubMessage type that supports adding named fields.
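
A sketch of what that interface could look like, with throughputMessage implementing it (the interface is hypothetical; the "peerID|throughput" wire format mirrors the code above):

```go
package sample

import (
	"encoding"
	"fmt"
	"strconv"
	"strings"
)

// PubsubMessage is the hypothetical interface floated above: any payload
// that can pack and unpack itself as text.
type PubsubMessage interface {
	encoding.TextMarshaler
	encoding.TextUnmarshaler
}

type throughputMessage struct {
	peerID     string
	throughput uint
}

// MarshalText reproduces the "peerID|throughput" format from the diff.
func (m *throughputMessage) MarshalText() ([]byte, error) {
	return []byte(m.peerID + "|" + fmt.Sprint(m.throughput)), nil
}

func (m *throughputMessage) UnmarshalText(text []byte) error {
	peer, val, ok := strings.Cut(string(text), "|")
	if !ok {
		return fmt.Errorf("malformed throughput message: %q", text)
	}
	n, err := strconv.ParseUint(val, 10, 64)
	if err != nil {
		return err
	}
	m.peerID, m.throughput = peer, uint(n)
	return nil
}
```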

description: >
  The duration after which the EMA Dynamic Sampler should recalculate
  its internal counters. It should be specified as a duration string.
  For example, "30s" or "1m". Defaults to "15s".
Contributor commented:

Suggested change:
-  For example, "30s" or "1m". Defaults to "15s".
+  For example, `30s` or `1m`. Defaults to `15s`.

@@ -143,8 +143,15 @@ func (v *RulesBasedDownstreamSampler) NameMeaningfulRate() string {
 }
 
 type V2SamplerConfig struct {
-	RulesVersion int                         `json:"rulesversion" yaml:"RulesVersion" validate:"required,ge=2"`
-	Samplers     map[string]*V2SamplerChoice `json:"samplers" yaml:"Samplers,omitempty" validate:"required"`
+	RulesVersion int `json:"rulesversion" yaml:"RulesVersion" validate:"required,ge=2"`
Contributor commented:
In my head, this is a configuration option more than a sampler option, because it's global and doesn't depend on the samplers. I also understand doing it this way, though, so I think we should talk it through.

	return rate, "dynamic", key
}

func (d *DynamicSampler) MakeSamplingDecision(rate uint, trace *types.Trace) bool {
Contributor commented:

I'm pretty happy with the removal of the keep parameter, but I thought I understood that we were not going to add a new function to every sampler. MakeSamplingDecision looks the same for all samplers except for the metrics. Why is this better than centralizing that logic in the call from collect()?

VinozzZ (Contributor, Author) replied:

Unfortunately, the DeterministicSampler's decision-making logic is quite different from the rest of the samplers'.

Contributor replied:

Ugh. You're right. But deterministic sampling breaks with the throughput throttle anyway, so maybe we should consider treating it differently or exempting it from the throttle.
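
For context on why the deterministic case is different: its keep/drop decision is a pure function of the trace ID, so every peer reaches the same answer with no shared state, and scaling the rate by a cluster-wide throughput multiplier would break that property. A sketch of the classic hash-threshold approach (illustrative, not copied from Refinery):

```go
package sample

import (
	"crypto/sha1"
	"encoding/binary"
	"math"
)

// deterministicShouldKeep hashes the trace ID and keeps the trace when
// the hash falls below the 1/sampleRate threshold, so the decision is
// stable across peers without any coordination.
func deterministicShouldKeep(traceID string, sampleRate uint) bool {
	if sampleRate <= 1 {
		return true
	}
	sum := sha1.Sum([]byte(traceID))
	v := binary.BigEndian.Uint32(sum[:4])
	return v < math.MaxUint32/uint32(sampleRate)
}
```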

@VinozzZ modified the milestones: v2.8, v2.9 on Aug 30, 2024
@VinozzZ marked this pull request as draft on September 11, 2024 15:03
@VinozzZ removed their assignment on Sep 11, 2024
@MikeGoldsmith modified the milestones: v2.9, vNext on Sep 16, 2024