Add MeasurementProcessor specification to Metrics SDK #4318

Open. Wants to merge 14 commits into base: main.


@Blinkuu Blinkuu commented Dec 3, 2024

Fixes #4298

This PR adds the MeasurementProcessor concept to the Metrics SDK specification.

The goal is to allow use cases such as:

  • Dynamic injection of additional attributes to measurements based on Context
  • Dropping attributes
  • Dropping individual measurements
  • Modifying measurements


linux-foundation-easycla bot commented Dec 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

specification/metrics/sdk.md (outdated)
@pellared: This comment was marked as resolved.

Add status field

Co-authored-by: Robert Pająk <[email protected]>
specification/metrics/sdk.md (outdated)
Comment on lines +1035 to +1036
For a `MeasurementProcessor` registered directly on SDK `MeterProvider`, the `measurement` mutations MUST be visible in next registered processors.

Member

Do we allow the processor to "drop" the measurement (e.g. the processor decided that it doesn't want the measurement) or other operations beyond modifications on the value and attributes?

Member

@pellared, Dec 3, 2024

Related question (thus decided to put it here).
Shouldn't the processor also be used when evaluating Enabled?
Shouldn't we also add an OnEnabled hook?

Related comment in other issue:

Author

To allow processors to "drop" measurements, they must be somehow connected to the `MetricReader`. I agree that it would be a cool feature to have, providing great flexibility.

Contributor

The Lightstep Metrics SDK implements a MeasurementProcessor interface which was narrowly scoped to allow modifying the set of attributes for a measurement. In that use-case, we would take the incoming gRPC metadata from the context, look up specific headers, and apply header values as attribute values.

I admit I am not sure what reasons a user would have to modify measured values. Are there well-known use-cases? I found @jack-berg mentioned "unit conversion" here, but I am not sure how that would work--the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

That SDK does not permit dropping measurements. Speaking also to @pellared's question about Enabled and whether measurement processors should intercept Enabled calls, I would recommend No. See my position on passing context to the metrics enabled method, #4256 (comment), which states the same. I am nervous about letting measurement processors change measurements and selectively enable/disable call sites because IMO it will make interpreting the resulting data very difficult.

As an example, suppose we have a measurement processor that is designed to redact sensitive attribute values. IMO it would be better to change attributes, not to drop events, because otherwise a user can be easily misled. Suppose we have a counter which counts requests with an attribute for success (boolean) and a client ID (string). We have a policy that says client IDs should not resemble e-mail addresses, otherwise they are invalid. The two options are to redact the client ID (e.g., give it a value like "redacted") or to drop the measurement. If we drop the measurement, all sorts of queries might be impacted. What's my success rate? I have no idea because an unknown number of redacted measurements were dropped.

Therefore, I would propose that measurement processors can only modify attributes, not values, and not drop events.
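The redaction argument above can be made concrete with a small sketch. Everything here is an assumption made for illustration: the `Measurement` struct with typed `Success` and `ClientID` fields, the rule that "resembles an e-mail address" means "contains `@`", and the helper names. It compares the success rate computed after redacting client IDs with the rate computed after dropping the offending measurements.

```go
package main

import (
	"fmt"
	"strings"
)

// Measurement is a hypothetical counter increment with two attributes.
type Measurement struct {
	Value    int64
	Success  bool
	ClientID string
}

// looksLikeEmail is a deliberately crude policy check for this sketch.
func looksLikeEmail(id string) bool { return strings.Contains(id, "@") }

// redactAll keeps every measurement but replaces e-mail-like client IDs.
func redactAll(in []Measurement) []Measurement {
	out := make([]Measurement, 0, len(in))
	for _, m := range in {
		if looksLikeEmail(m.ClientID) {
			m.ClientID = "redacted"
		}
		out = append(out, m)
	}
	return out
}

// dropInvalid discards measurements whose client ID looks like an e-mail.
func dropInvalid(in []Measurement) []Measurement {
	var out []Measurement
	for _, m := range in {
		if !looksLikeEmail(m.ClientID) {
			out = append(out, m)
		}
	}
	return out
}

// successRate is the share of counted requests that succeeded.
func successRate(ms []Measurement) float64 {
	var ok, total int64
	for _, m := range ms {
		total += m.Value
		if m.Success {
			ok += m.Value
		}
	}
	if total == 0 {
		return 0
	}
	return float64(ok) / float64(total)
}

func main() {
	recorded := []Measurement{
		{Value: 1, Success: true, ClientID: "svc-a"},
		{Value: 1, Success: false, ClientID: "bob@example.com"},
		{Value: 1, Success: true, ClientID: "alice@example.com"},
		{Value: 1, Success: true, ClientID: "svc-b"},
	}
	fmt.Println("redacted:", successRate(redactAll(recorded)))
	fmt.Println("dropped: ", successRate(dropInvalid(recorded)))
}
```

With this sample data the redacted stream still yields the true rate of 0.75, while the dropped stream reports 1, which is the kind of misleading result the comment warns about.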

Member

> the measurement processor does not change the instrument definition, and the measurement does not include a unit. Are there really use-cases for modifying the value?

Providing this feature without the ability to do unit conversion or drop measurements would be a miss. We can solve the lack of knowledge about the unit by giving the processor access to instrument metadata. I think it could make sense to allow measurement processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.
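One way to picture the view-level composition described here is sketched below. It is an assumption-heavy illustration, not the View API from the specification: the `Instrument` and `View` structs, the `Process` field, and the `Record` method are hypothetical. The processor sees the instrument's unit and converts the value, and the view declares the unit of the resulting stream.

```go
package main

import "fmt"

// Instrument is hypothetical instrument metadata exposed to a processor.
type Instrument struct {
	Name string
	Unit string
}

// View is a hypothetical view that pairs a value processor with the unit
// of the stream it produces.
type View struct {
	Process func(inst Instrument, value float64) float64
	Unit    string
}

// Record runs the processor and reports the value together with the
// stream unit declared by the view.
func (v View) Record(inst Instrument, value float64) (float64, string) {
	return v.Process(inst, value), v.Unit
}

// millisToSeconds converts values of instruments declared in "ms".
// Values of other instruments pass through untouched.
func millisToSeconds(inst Instrument, value float64) float64 {
	if inst.Unit == "ms" {
		return value / 1000
	}
	return value
}

func main() {
	view := View{Process: millisToSeconds, Unit: "s"}
	inst := Instrument{Name: "request.duration", Unit: "ms"}
	value, unit := view.Record(inst, 1500)
	fmt.Println(value, unit)
}
```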

Contributor

OK, I'll come around on this topic. I see how dropping metric events is a useful feature, despite the potential for difficult consequences. Dropping metric events is not very different than sampling traces at 0%. Just like 0% sampling (which we call "non probabilistic"), there is a loss of information, but that is intentional.

@jack-berg Given your statement, I think it means that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API. Tangentially, I see a potential for us to form new APIs (like OpenCensus) which accept a list of measurements atomically and apply a single timestamp (e.g., or process the dynamic context once for multiple events).

Let me pose a thought experiment. What does a MeasurementProcessor do better than you could achieve simply by wrapping a MeterProvider with a new instance containing the desired logic? I'm looking at the complexity trade-off here. I see how the desire to modify units comes about -- especially with the base-2 exponential histogram -- we see a desire to change seconds to/from milliseconds w/o loss of information as a compelling use-case. In the wrapped-MeterProvider scenario, the units-conversion wrapper would ("simply") register a new instrument with the delegate MeterProvider having different units and divide/multiply the value on its way through.
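The wrapped-MeterProvider scenario can be sketched as follows. The `MeterProvider` and `Histogram` interfaces here are tiny hypothetical stand-ins, not the OpenTelemetry Go API, and `memoryProvider` is a toy delegate that only remembers what it receives. The wrapper registers the instrument on the delegate with unit `s` and divides each value on its way through.

```go
package main

import "fmt"

// Histogram and MeterProvider are minimal hypothetical interfaces.
type Histogram interface {
	Record(value float64)
}

type MeterProvider interface {
	Histogram(name, unit string) Histogram
}

// memoryProvider is a toy delegate that stores units and recorded values.
type memoryProvider struct {
	units  map[string]string
	values map[string][]float64
}

func newMemoryProvider() *memoryProvider {
	return &memoryProvider{units: map[string]string{}, values: map[string][]float64{}}
}

func (p *memoryProvider) Histogram(name, unit string) Histogram {
	p.units[name] = unit
	return memoryHistogram{provider: p, name: name}
}

type memoryHistogram struct {
	provider *memoryProvider
	name     string
}

func (h memoryHistogram) Record(value float64) {
	h.provider.values[h.name] = append(h.provider.values[h.name], value)
}

// secondsProvider wraps a delegate and re-registers "ms" histograms as "s".
type secondsProvider struct{ delegate MeterProvider }

func (p secondsProvider) Histogram(name, unit string) Histogram {
	if unit != "ms" {
		return p.delegate.Histogram(name, unit)
	}
	return dividingHistogram{inner: p.delegate.Histogram(name, "s"), divisor: 1000}
}

// dividingHistogram scales each value before forwarding it.
type dividingHistogram struct {
	inner   Histogram
	divisor float64
}

func (h dividingHistogram) Record(value float64) { h.inner.Record(value / h.divisor) }

func main() {
	delegate := newMemoryProvider()
	var provider MeterProvider = secondsProvider{delegate: delegate}
	provider.Histogram("request.duration", "ms").Record(250)
	fmt.Println(delegate.units["request.duration"], delegate.values["request.duration"])
}
```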

I thought of another case that I'm aware of, which calls for modifying the instrument kind, i.e., more than just a change of unit. I'm aware of use-cases for synchronous UpDownCounter instruments where the user would like to separate positive from negative values as two Counters. In this case, the two absolute value instruments convey the rate of ups and downs as separate information. Still, the input-to-output mapping is 1:1.

I prefer to think of MeasurementProcessor as something like syntactic sugar for the example I described above, meaning that it can be defined abstractly as a wrapper of meter providers with a per-instrument event translation rule. There seems to be a potential -- do we know any use-cases? -- for one metric API event to translate into more than one metric API event on the wrapped meter provider. In this sense, we could define MeasurementProcessor as a per-instrument function that maps one input measurement into a list of zero or more output measurements, enabling both dropping and proliferation of events.
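The idea of a per-instrument function from one input measurement to zero or more output measurements could be sketched like this. The `Measurement` struct, the `Translate` function type, and the `.up`/`.down` naming are assumptions for illustration only. The example rule splits UpDownCounter-style deltas into two counter streams and drops zero deltas.

```go
package main

import "fmt"

// Measurement is a hypothetical (instrument name, value) pair.
type Measurement struct {
	Instrument string
	Value      int64
}

// Translate maps one input measurement to zero or more output measurements.
type Translate func(Measurement) []Measurement

// splitUpDown routes positive deltas to "<name>.up" and negative deltas to
// "<name>.down" (as absolute values). A zero delta produces no output.
func splitUpDown(m Measurement) []Measurement {
	switch {
	case m.Value > 0:
		return []Measurement{{Instrument: m.Instrument + ".up", Value: m.Value}}
	case m.Value < 0:
		return []Measurement{{Instrument: m.Instrument + ".down", Value: -m.Value}}
	default:
		return nil
	}
}

// translateAll applies t to every input and concatenates the results.
func translateAll(t Translate, in []Measurement) []Measurement {
	var out []Measurement
	for _, m := range in {
		out = append(out, t(m)...)
	}
	return out
}

func main() {
	in := []Measurement{
		{Instrument: "queue.delta", Value: 3},
		{Instrument: "queue.delta", Value: -2},
		{Instrument: "queue.delta", Value: 0},
	}
	fmt.Println(translateAll(splitUpDown, in))
}
```

A rule that returns two or more elements for a single input would cover the proliferation case in the same shape.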

Author

@Blinkuu, Dec 6, 2024

> I think it means that the Measurement type should be defined as a 3-tuple (Value, Attributes, Instrument). This model works for me--and it resembles the OpenCensus "stats" API.

@jmacd I think this makes sense. Having access to an Instrument inside the processor makes it very powerful.

> I think it could make sense to allow measurements processors to be configurable at the view level, in which case we might also consider allowing views to modify the unit of the resulting stream. Users could then compose a view which: 1. Adds a processor for unit conversion. 2. Adjusts the resulting stream's unit.

@jack-berg I'm reading the View specification, which explicitly mentions that views work on the "metric" level. Therefore, configuring processors on the Views (instead of on MeterProvider) would require updating the View specification as well, unless I'm misunderstanding something.


Regarding dropping Measurements, changing instrument kinds, modifying the value, or even creating new Measurements on the fly (e.g., split UpDownCounter into two counters), we could make the proposed Measure() method return an array of Measurements instead of Void.


This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 14, 2024
Author

Blinkuu commented Dec 19, 2024

> This PR was marked stale due to lack of activity. It will be closed in 7 days.

Still working on this; will try to provide another iteration early next year.

@pellared pellared removed the Stale label Dec 19, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 27, 2024
@Blinkuu Blinkuu changed the title from "[WIP] Add MeasurementProcessor specification to Metrics SDK" to "Add MeasurementProcessor specification to Metrics SDK" Jan 2, 2025
@Blinkuu Blinkuu marked this pull request as ready for review January 2, 2025 12:07
@Blinkuu Blinkuu requested review from a team as code owners January 2, 2025 12:07
@Blinkuu Blinkuu requested a review from jmacd January 2, 2025 12:07
Author

Blinkuu commented Jan 2, 2025

I have updated the PR. The current proposal should facilitate all use cases we talked about.

specification/metrics/sdk.md (outdated)

A `MeasurementProcessor` may freely modify `measurement` for the duration of the `OnMeasure` call.

A `MeasurementProcessor` MUST invoke `OnMeasure` on the next registered processor.
Member

How would a MeasurementProcessor do this? Is there a mechanism for the processor to get the "next registered processor"?

Author

Indeed, this wasn't specified. I have added an explicit spec around this issue in 225000a.

@Blinkuu Blinkuu requested a review from reyang January 7, 2025 15:20

* `context` - the resolved `Context` (the explicitly passed `Context` or the current `Context`)
* `measurement` - a [Measurement](./api.md#measurement) that was recorded
* `next` - the `OnMeasure` function from the next processor in the chain
Member

Is next the "next processor" or "the OnMeasure function of the next processor"?


A `MeasurementProcessor` MAY freely modify `measurement` for the duration of the `OnMeasure` call.

A `MeasurementProcessor` SHOULD invoke `OnMeasure` on the next registered processor. A `MeasurementProcessor` MAY decide to drop the `Measurement` by not invoking the next processor.
Member

How would a MeasurementProcessor do that? For example, if it needs to invoke OnMeasure (which requires "next"), how would it figure out all the arguments?

Author

It's possible by making the SDK "bind" arguments such that calling next does not require a reference to the processor itself, e.g., https://go.dev/play/p/wPZRm5xk3nO.

This makes the API easy to implement and doesn't require users to store any state inside the processor (although they still can if they want).

@Blinkuu Blinkuu requested a review from reyang January 9, 2025 13:30
Development

Successfully merging this pull request may close these issues.

Support measurement processors in Metrics SDK
6 participants