feat: support ThroughputLimit in samplers #1300

Draft · wants to merge 6 commits into main from yingrong.throughput_limit
Conversation

VinozzZ (Contributor, Author) commented on Aug 23, 2024:

Which problem is this PR solving?

relates to #956

Short description of the changes

  • Create an EMAThroughputCalculator to calculate cluster-wide throughput and publish each node's individual throughput on an interval (a sketch of the calculator's shape follows this list)
  • Change the Sampler interface to separate sample-rate calculation from the sampling-decision logic
  • Add a new configuration group, EMAThroughputLimit, to the rules config
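
To make the first bullet concrete, here is a minimal sketch of what the calculator's shape could look like, inferred from the diff fragments quoted later in this thread (hostID, mut, throughputs, and the EMA-to-limit ratio appear there; everything else, including the method name, is an illustrative assumption):

```go
package sample

import (
	"sync"
	"time"
)

// throughputReport is a per-peer throughput observation. Only the type
// name appears in the PR's diff; the fields here are assumptions.
type throughputReport struct {
	throughput uint
	updatedAt  time.Time
}

// EMAThroughputCalculator tracks per-peer throughput reports and compares
// the cluster-wide EMA against a configured limit.
type EMAThroughputCalculator struct {
	hostID          string
	throughputLimit uint
	intervalLength  time.Duration

	mut         sync.RWMutex
	throughputs map[string]throughputReport
	clusterEMA  uint
}

// SamplingRateMultiplier returns the ratio of the current cluster EMA to
// the configured limit; a value above 1.0 means the cluster is over its
// budget and samplers should sample more aggressively. (The method name
// is hypothetical; the ratio expression is taken from the diff.)
func (c *EMAThroughputCalculator) SamplingRateMultiplier() float64 {
	c.mut.RLock()
	defer c.mut.RUnlock()
	currentEMA := c.clusterEMA
	return float64(currentEMA) / float64(c.throughputLimit)
}
```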

@VinozzZ force-pushed the yingrong.throughput_limit branch from 3bdcd4a to 78069b3 on August 23, 2024 21:58
@VinozzZ force-pushed the yingrong.throughput_limit branch from 78069b3 to 574e6e0 on August 23, 2024 22:01
@kentquirk added this to the v2.8 milestone on Aug 27, 2024
@VinozzZ marked this pull request as ready for review on August 29, 2024 19:58
@VinozzZ requested a review from a team as a code owner on August 29, 2024 19:58
return float64(currentEMA) / float64(c.throughputLimit)
}

type throughputReport struct {
VinozzZ (Contributor, Author) commented:

This code could be refactored into shared logic used by both stress relief and the throughput calculator.

Contributor replied:

Yes, they're quite similar.

Would it make sense to go even further and bundle the updates into the same messages? The system would maintain a map of named values that each peer updates internally, and the peers would send the whole map through pubsub.
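
If that direction were pursued, a rough sketch might look like the following (every name here is hypothetical, not Refinery's API):

```go
package sample

import "encoding/json"

// peerValues bundles all of a peer's gauges into one pubsub message, so
// stress relief and throughput updates share a single message type.
type peerValues struct {
	PeerID string             `json:"peerID"`
	Values map[string]float64 `json:"values"` // e.g. "stressLevel", "throughput"
}

func marshalPeerValues(v peerValues) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func unmarshalPeerValues(msg string) (peerValues, error) {
	var v peerValues
	err := json.Unmarshal([]byte(msg), &v)
	return v, err
}
```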

- name: EMAThroughputLimit
  sortorder: 1
  title: EMAThroughput Limit
  description: >
VinozzZ (Contributor, Author) commented:

I put this config option in the rules config because it mostly impacts the samplers rather than Refinery operations.

	c.Pubsub.Subscribe(context.Background(), stressReliefTopic, c.onThroughputUpdate)

	go func() {
		ticker := c.Clock.NewTicker(c.intervalLength)
VinozzZ (Contributor, Author) commented:

We should only publish if the throughput differs from the previous calculation.

Contributor replied:
Agreed.
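
One way to implement that guard, sketched with the standard library ticker for illustration (the PR uses the injected Clock shown above; the helper and its parameters are assumptions):

```go
package sample

import "time"

// publishOnChange ticks at the calculator's interval and only calls
// publish when the computed throughput differs from the last value sent.
func publishOnChange(interval time.Duration, current func() uint, publish func(uint)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var last uint
	first := true
	for range ticker.C {
		v := current()
		if !first && v == last {
			continue // unchanged since the last tick; skip the pubsub write
		}
		first, last = false, v
		publish(v)
	}
}
```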

if rule.Drop {
	// If we dropped because of an explicit drop rule, then increment that too.
	s.Metrics.Increment(s.prefix + "num_dropped_by_drop_rule")
	rate = 0
VinozzZ (Contributor, Author) commented:

We don't have the rule's config in the MakeSamplingDecision function, so I changed the logic to return a rate of 0 to signal that the drop decision came from the Drop config.
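
A minimal sketch of that convention (illustrative only; the PR's actual decision logic also involves the sampler's own randomness and metrics):

```go
package sample

import "math/rand"

// makeSamplingDecision treats rate == 0 as "dropped by an explicit Drop
// rule" and otherwise keeps roughly 1 out of every `rate` traces.
func makeSamplingDecision(rate uint) bool {
	if rate == 0 {
		return false // drop decision signaled by the Drop config
	}
	return rand.Intn(int(rate)) == 0
}
```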


	hostID string

	mut         sync.RWMutex
	throughputs map[string]throughputReport
Contributor commented:

How about using a generics.SetWithTTL here? That would avoid the explicit timeout checks.
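
For illustration, here is a hand-rolled TTL map showing how expiry-on-read removes the explicit timestamp checks; Refinery's actual generics.SetWithTTL API may differ:

```go
package sample

import "time"

// ttlMap expires entries on read, so callers never write their own
// timeout checks against report timestamps.
type ttlMap[V any] struct {
	ttl     time.Duration
	entries map[string]ttlEntry[V]
}

type ttlEntry[V any] struct {
	value   V
	expires time.Time
}

func newTTLMap[V any](ttl time.Duration) *ttlMap[V] {
	return &ttlMap[V]{ttl: ttl, entries: make(map[string]ttlEntry[V])}
}

func (m *ttlMap[V]) Set(key string, v V) {
	m.entries[key] = ttlEntry[V]{value: v, expires: time.Now().Add(m.ttl)}
}

func (m *ttlMap[V]) Get(key string) (V, bool) {
	e, ok := m.entries[key]
	if !ok || time.Now().After(e.expires) {
		delete(m.entries, key)
		var zero V
		return zero, false
	}
	return e.value, true
}
```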


	return msg.peerID + "|" + fmt.Sprint(msg.throughput)
}

func unmarshalThroughputMessage(msg string) (*throughputMessage, error) {
Contributor commented:

This gives me an idea. Instead of taking a string for a message type, Pubsub could take a PubsubMessage, which might just embed encoding.TextMarshaler and encoding.TextUnmarshaler.

That would normalize the way we pack and unpack these messages for pubsub.

Or we could build a general-purpose PubsubMessage type that supports adding named fields.
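
A sketch of what that interface could look like, with throughputMessage implementing it (the interface is hypothetical; the "peerID|throughput" wire format mirrors the code above):

```go
package sample

import (
	"encoding"
	"fmt"
	"strconv"
	"strings"
)

// PubsubMessage is the hypothetical interface floated above: any payload
// that can pack and unpack itself as text.
type PubsubMessage interface {
	encoding.TextMarshaler
	encoding.TextUnmarshaler
}

type throughputMessage struct {
	peerID     string
	throughput uint
}

// MarshalText reproduces the "peerID|throughput" format from the diff.
func (m *throughputMessage) MarshalText() ([]byte, error) {
	return []byte(m.peerID + "|" + fmt.Sprint(m.throughput)), nil
}

func (m *throughputMessage) UnmarshalText(text []byte) error {
	peer, val, ok := strings.Cut(string(text), "|")
	if !ok {
		return fmt.Errorf("malformed throughput message: %q", text)
	}
	n, err := strconv.ParseUint(val, 10, 64)
	if err != nil {
		return err
	}
	m.peerID, m.throughput = peer, uint(n)
	return nil
}
```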

description: >
  The duration after which the EMA Dynamic Sampler should recalculate
  its internal counters. It should be specified as a duration string.
  For example, "30s" or "1m". Defaults to "15s".
Contributor commented:

Suggested change:
-  For example, "30s" or "1m". Defaults to "15s".
+  For example, `30s` or `1m`. Defaults to `15s`.

@@ -143,8 +143,15 @@ func (v *RulesBasedDownstreamSampler) NameMeaningfulRate() string {
 }
 
 type V2SamplerConfig struct {
-	RulesVersion int                         `json:"rulesversion" yaml:"RulesVersion" validate:"required,ge=2"`
-	Samplers     map[string]*V2SamplerChoice `json:"samplers" yaml:"Samplers,omitempty" validate:"required"`
+	RulesVersion int `json:"rulesversion" yaml:"RulesVersion" validate:"required,ge=2"`
Contributor commented:
In my head, this is a configuration option more than a sampler option, because it's global and doesn't depend on the samplers. I also understand doing it this way, though, so I think we should talk it through.

	return rate, "dynamic", key
}

func (d *DynamicSampler) MakeSamplingDecision(rate uint, trace *types.Trace) bool {
Contributor commented:

I'm pretty happy with the removal of the keep parameter, but I thought I understood that we were not going to add a new function to every sampler. MakeSamplingDecision looks the same for all samplers except for the metrics. Why is this better than centralizing that logic in the call from collect()?

VinozzZ (Contributor, Author) replied:

Unfortunately, the DeterministicSampler's decision-making logic is quite different from the rest of the samplers'.

Contributor replied:

Ugh. You're right. But deterministic sampling breaks with the throughput throttle anyway, so maybe we should consider treating it differently or exempting it from the throttle.
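
For context on why the deterministic case is different: its keep/drop decision is a pure function of the trace ID, so every peer reaches the same answer with no shared state, and scaling the rate by a cluster-wide throughput multiplier would break that property. A sketch of the classic hash-threshold approach (illustrative, not copied from Refinery):

```go
package sample

import (
	"crypto/sha1"
	"encoding/binary"
	"math"
)

// deterministicShouldKeep hashes the trace ID and keeps the trace when
// the hash falls below the 1/sampleRate threshold, so the decision is
// stable across peers without any coordination.
func deterministicShouldKeep(traceID string, sampleRate uint) bool {
	if sampleRate <= 1 {
		return true
	}
	sum := sha1.Sum([]byte(traceID))
	v := binary.BigEndian.Uint32(sum[:4])
	return v < math.MaxUint32/uint32(sampleRate)
}
```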

@VinozzZ modified the milestones: v2.8, v2.9 on Aug 30, 2024
@VinozzZ marked this pull request as draft on September 11, 2024 15:03
@VinozzZ removed their assignment on Sep 11, 2024
@MikeGoldsmith modified the milestones: v2.9, vNext on Sep 16, 2024