Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add implementation of new AvailableComponents message #340

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

mrsillydog
Copy link

Implements the specification added to opamp-spec in open-telemetry/opamp-spec#201

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 95.38462% with 3 lines in your changes missing coverage. Please review.

Project coverage is 78.29%. Comparing base (e6fac32) to head (53d8cd0).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
client/internal/clientcommon.go 92.30% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #340      +/-   ##
==========================================
+ Coverage   77.81%   78.29%   +0.47%     
==========================================
  Files          25       25              
  Lines        2321     2386      +65     
==========================================
+ Hits         1806     1868      +62     
- Misses        408      410       +2     
- Partials      107      108       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// If components.Hash is nil or an empty []byte, errNoAvailableComponentHash will be returned.
// This method is subject to agent status compression - if components is not
// different from the cached agent state, this method is a no-op.
SetAvailableComponents(components *protobufs.AvailableComponents) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add to the comment that ReportsAvailableComponents capability must be set in StartSettings otherwise this call with fail?

prepareClient(t, &settings, client)

// Client --->
assert.NoError(t, client.Start(context.Background(), settings))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am probably misunderstanding something but shouldn't this Start() fail for test case "apply AvailableComponents without required capability" because that test case does not set ReportsAvailableComponents capability? Why isn't it failing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not failing because AvailableComponents is ignored during PrepareStart if the capability is not set - think this comment is along a similar vein to the one below.

})
} else {
// Verify that no second status report is delivered (polling is too infrequent for this to happen in 3 seconds)
require.Never(t, func() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this block the test for 3 seconds? If it does, can we find a way to avoid waiting for so long? We don't want test durations to increase, especially if they are simply idling.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced all blocking calls to assert/require.Never() to 50ms, including one other test besides the ones I added.

@@ -88,6 +89,16 @@ func (c *ClientCommon) PrepareStart(
return ErrHealthMissing
}

if c.Capabilities&protobufs.AgentCapabilities_AgentCapabilities_ReportsAvailableComponents != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to check for the other related case, where ReportsAvailableComponents is 0 but AvailableComponents is not nil? Is that considered a valid situation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider that a valid situation - since the only real way to affect ReportsAvailableComponents and AvailableComponents before Start() is via the StartSettings, I think this case is covered by the StartSettings.AvailableComponents documentation:

	// Defines the available components of the Agent.
	// Required if the ReportsAvailableComponents capability is set.
	// If the ReportsAvailableComponents capability is not set, this option has no effect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with this option since there seemed to be precedent based on StartSettings.HeartbeatInterval -

	// If the ReportsHeartbeat capability is disabled, this option has no effect.
	HeartbeatInterval *time.Duration

@@ -134,4 +134,14 @@ type OpAMPClient interface {
// If no error is returned, the channel returned will be closed after the specified
// message is sent.
SendCustomMessage(message *protobufs.CustomMessage) (messageSendingChannel chan struct{}, err error)

// SetAvailableComponents modifies the set of components that are available for configuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is 2 ways to define AvailableComponents: 1) via StartSettings and 2) by calling SetAvailableComponents. Can we eliminate one of these and keep only one? If we need to allow re-defining AvailableComponents after Start (do we?) can we change SetAvailableComponents() to be allowed to be called before or after Start() and eliminate StartSettings.AvailableComponents?

See for example how SetAgentDescription() can be called (actually must be) before Start() and can be called after Start() too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was actually my original plan, but I encountered a problem which made me adjust my approach.

Basically, the defining difference between this and SetAgentDescription() is that SetAgentDescription is not contingent on an AgentCapability being enabled, whereas SetAvailableComponents is. Before Start(), our OpAmpClient has no idea what its agent capabilities are, since those are defined for the client during PrepareStart(), which means that any method which is dependent on those capabilities must be called after Start() -> PrepareStart().

I thought adding the starting AvailableComponents to the StartSettings was the simplest route around this issue, rather than attempting to rework the larger flow of the client. If that's not your preference, I'm happy to take another attempt at it using only SetAvailableComponents.

Copy link
Author

@mrsillydog mrsillydog Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple other alternatives I could think of:

  1. getting rid of the SetAvailableComponents method entirely and just utilizing the StartSettings option - I don't love this option, since it makes it impossible for the client to dynamically change the AvailableComponents after Start(), but maybe that's not a concern given that AvailableComponents is unlikely to change for a given agent after Start()?
    Edit: see you actually already described this option: If we need to allow re-defining AvailableComponents after Start (do we?)
  2. adding some logic to SetAvailableComponents that could somehow (not clear yet) detect whether Start() has been called already and behave differently based on that (i.e. if Start() has been called, care about the Capability, if it hasn't, don't). I do not love this for multiple reasons, but it is an option.

Copy link
Author

@mrsillydog mrsillydog Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: There may be value in allowing SetAvailableComponents to continue existing when a proxy OpAMP server is involved.

Discovered a slight problem with alternative 1: my planned implementation (which could change, to be fair) using the supervisor in the OpenTelemetry collector requires SetAvailableComponents. Here's the planned implementation, with new additions in bold:

  1. Supervisor starts, starts bootstrap OpAMP server to get agent description and the initial AvailableComponents.
  2. Collector starts, sends bootstrap information, but because of the OpAMP client implementation here, only sends the hash, and not the full component list.
  3. Supervisor receives the bootstrap information, stores it. This only includes the AvailableComponents hash.
  4. Supervisor starts up its own OpAMP client, sends the stored AvailableComponents information to the external OpAMP Server - again, this only includes the hash.
  5. Supervisor starts up its own OpAMP server.

Everything is good - external OpAMP server has the component hash, so does the supervisor OpAMP server, but only the collector client has the full component list.

  1. External OpAMP server decides it needs the full component list. Sends a ServerToAgent message to the supervisor client with the ReportAvailableComponents flag set.
  2. Supervisor OpAMP client proxies the server request flags along to the collector OpAMP client via the supervisor OpAMP server.
  3. Collector OpAMP client receives the message with its flags, returns the full component list to the supervisor OpAMP server.
  4. Supervisor uses its OpAMP server to receive the component list, stores the component list, uses its OpAMP client to send it to the external OpAMP server by calling client.SetAvailableComponents(). The value here is that the Supervisor client didn't have complete knowledge of the available components from the start, just the hash - so having a way to modify the client's known list of available components after Start() does have value.

An alternative approach would be to more heavily modify the supervisor bootstrap process to not only retrieve the initial hash from the collector client, but also request the full list of available components after receiving the initial hash. The upside would be that we would not have to proxy any ServerToAgent flags back to the collector client - the downside would be that the original source of truth, the collector, would be cut off from these messages. It would also complicate the bootstrap process.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to go with the "modify the supervisor bootstrap process" route and eliminate SetAvailableComponents, I've discovered some issues with the process during end-to-end testing that make the existing workflow less than ideal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on alternate 2 and why you don't like it? It seems that would allow to redefine available components dynamically and would eliminate the problem you have with bootstrapping.

Copy link
Author

@mrsillydog mrsillydog Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think a couple aspects of it just don't feel very clean to me.

  1. It seems like it's overloading that single method with two responsibilities
  2. It seems like a divergence from the way that the other Set____ interface methods are implemented, which are agnostic to whether Start() has been called or not.

One solution that I do think solves both these issues in a way that I feel better about is including two new methods in the OpAMPClient interface, SetAvailableComponents and InitializeAvailableComponents - one which would need to be called before Start() if the ReportAvailableComponents capability was defined (but would only set up client state, nothing more), one which could only be called after Start(), which would allow for dynamic available component redefinition. With that being said, I'm not sure that my implementation within the supervisor and OpAMP extension would change substantially - if a case emerges where the available components for a collector can change dynamically, then it will be useful to have the SetAvailableComponents method already defined, but until that case emerges, I think it would likely be unused.

With all that being said, I'm not opposed to being overridden here if getting rid of StartSettings.AvailableComponents and adding this code to (c *ClientCommon) SetAvailableComponents seems like the strongly better option to you.

	if !c.isStarted {
		return c.ClientSyncedState.SetAvailableComponents(components)
	}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reconsidered. Think I'm going to go with the approach.

@mrsillydog mrsillydog marked this pull request as ready for review January 15, 2025 15:56
@mrsillydog mrsillydog requested a review from a team as a code owner January 15, 2025 15:56
@mrsillydog
Copy link
Author

Comfortable moving this into Ready for Review - have successfully manually tested this implementation end-to-end with a custom build of the OpAMP Supervisor and OpAMP extension in open-telemetry/opentelemetry-collector-contrib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants