-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add implementation of new AvailableComponents message #340
base: main
Are you sure you want to change the base?
feat: Add implementation of new AvailableComponents message #340
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #340 +/- ##
==========================================
+ Coverage 77.81% 78.29% +0.47%
==========================================
Files 25 25
Lines 2321 2386 +65
==========================================
+ Hits 1806 1868 +62
- Misses 408 410 +2
- Partials 107 108 +1 ☔ View full report in Codecov by Sentry. |
// If components.Hash is nil or an empty []byte, errNoAvailableComponentHash will be returned. | ||
// This method is subject to agent status compression - if components is not | ||
// different from the cached agent state, this method is a no-op. | ||
SetAvailableComponents(components *protobufs.AvailableComponents) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add to the comment that ReportsAvailableComponents capability must be set in StartSettings otherwise this call with fail?
prepareClient(t, &settings, client) | ||
|
||
// Client ---> | ||
assert.NoError(t, client.Start(context.Background(), settings)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably misunderstanding something but shouldn't this Start() fail for test case "apply AvailableComponents without required capability" because that test case does not set ReportsAvailableComponents
capability? Why isn't it failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not failing because AvailableComponents
is ignored during PrepareStart
if the capability is not set - think this comment is along a similar vein to the one below.
client/httpclient_test.go
Outdated
}) | ||
} else { | ||
// Verify that no second status report is delivered (polling is too infrequent for this to happen in 3 seconds) | ||
require.Never(t, func() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this block the test for 3 seconds? If it does, can we find a way to avoid waiting for so long? We don't want test durations to increase, especially if they are simply idling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduced all blocking calls to assert/require.Never() to 50ms, including one other test besides the ones I added.
client/internal/clientcommon.go
Outdated
@@ -88,6 +89,16 @@ func (c *ClientCommon) PrepareStart( | |||
return ErrHealthMissing | |||
} | |||
|
|||
if c.Capabilities&protobufs.AgentCapabilities_AgentCapabilities_ReportsAvailableComponents != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need to check for the other related case, where ReportsAvailableComponents
is 0 but AvailableComponents
is not nil? Is that considered a valid situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider that a valid situation - since the only real way to affect ReportsAvailableComponents
and AvailableComponents
before Start()
is via the StartSettings
, I think this case is covered by the StartSettings.AvailableComponents
documentation:
// Defines the available components of the Agent.
// Required if the ReportsAvailableComponents capability is set.
// If the ReportsAvailableComponents capability is not set, this option has no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with this option since there seemed to be precedent based on StartSettings.HeartbeatInterval
-
// If the ReportsHeartbeat capability is disabled, this option has no effect.
HeartbeatInterval *time.Duration
@@ -134,4 +134,14 @@ type OpAMPClient interface { | |||
// If no error is returned, the channel returned will be closed after the specified | |||
// message is sent. | |||
SendCustomMessage(message *protobufs.CustomMessage) (messageSendingChannel chan struct{}, err error) | |||
|
|||
// SetAvailableComponents modifies the set of components that are available for configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there is 2 ways to define AvailableComponents: 1) via StartSettings and 2) by calling SetAvailableComponents. Can we eliminate one of these and keep only one? If we need to allow re-defining AvailableComponents after Start (do we?) can we change SetAvailableComponents() to be allowed to be called before or after Start() and eliminate StartSettings.AvailableComponents?
See for example how SetAgentDescription() can be called (actually must be) before Start() and can be called after Start() too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually my original plan, but I encountered a problem which made me adjust my approach.
Basically, the defining difference between this and SetAgentDescription() is that SetAgentDescription is not contingent on an AgentCapability being enabled, whereas SetAvailableComponents is. Before Start(), our OpAmpClient has no idea what its agent capabilities are, since those are defined for the client during PrepareStart(), which means that any method which is dependent on those capabilities must be called after Start() -> PrepareStart().
I thought adding the starting AvailableComponents to the StartSettings was the simplest route around this issue, rather than attempting to rework the larger flow of the client. If that's not your preference, I'm happy to take another attempt at it using only SetAvailableComponents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple other alternatives I could think of:
- getting rid of the SetAvailableComponents method entirely and just utilizing the StartSettings option - I don't love this option, since it makes it impossible for the client to dynamically change the AvailableComponents after Start(), but maybe that's not a concern given that AvailableComponents is unlikely to change for a given agent after Start()?
Edit: see you actually already described this option:If we need to allow re-defining AvailableComponents after Start (do we?)
- adding some logic to SetAvailableComponents that could somehow (not clear yet) detect whether Start() has been called already and behave differently based on that (i.e. if Start() has been called, care about the Capability, if it hasn't, don't). I do not love this for multiple reasons, but it is an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR: There may be value in allowing SetAvailableComponents to continue existing when a proxy OpAMP server is involved.
Discovered a slight problem with alternative 1: my planned implementation (which could change, to be fair) using the supervisor in the OpenTelemetry collector requires SetAvailableComponents. Here's the planned implementation, with new additions in bold:
- Supervisor starts, starts bootstrap OpAMP server to get agent description and the initial AvailableComponents.
- Collector starts, sends bootstrap information, but because of the OpAMP client implementation here, only sends the hash, and not the full component list.
- Supervisor receives the bootstrap information, stores it. This only includes the AvailableComponents hash.
- Supervisor starts up its own OpAMP client, sends the stored AvailableComponents information to the external OpAMP Server - again, this only includes the hash.
- Supervisor starts up its own OpAMP server.
Everything is good - external OpAMP server has the component hash, so does the supervisor OpAMP server, but only the collector client has the full component list.
- External OpAMP server decides it needs the full component list. Sends a ServerToAgent message to the supervisor client with the ReportAvailableComponents flag set.
- Supervisor OpAMP client proxies the server request flags along to the collector OpAMP client via the supervisor OpAMP server.
- Collector OpAMP client receives the message with its flags, returns the full component list to the supervisor OpAMP server.
- Supervisor uses its OpAMP server to receive the component list, stores the component list, uses its OpAMP client to send it to the external OpAMP server by calling client.SetAvailableComponents(). The value here is that the Supervisor client didn't have complete knowledge of the available components from the start, just the hash - so having a way to modify the client's known list of available components after Start() does have value.
An alternative approach would be to more heavily modify the supervisor bootstrap process to not only retrieve the initial hash from the collector client, but also request the full list of available components after receiving the initial hash. The upside would be that we would not have to proxy any ServerToAgent flags back to the collector client - the downside would be that the original source of truth, the collector, would be cut off from these messages. It would also complicate the bootstrap process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to go with the "modify the supervisor bootstrap process" route and eliminate SetAvailableComponents, I've discovered some issues with the process during end-to-end testing that make the existing workflow less than ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand on alternate 2 and why you don't like it? It seems that would allow to redefine available components dynamically and would eliminate the problem you have with bootstrapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think a couple aspects of it just don't feel very clean to me.
- It seems like it's overloading that single method with two responsibilities
- It seems like a divergence from the way that the other
Set____
interface methods are implemented, which are agnostic to whether Start() has been called or not.
One solution that I do think solves both these issues in a way that I feel better about is including two new methods in the OpAMPClient
interface, SetAvailableComponents
and InitializeAvailableComponents
- one which would need to be called before Start() if the ReportAvailableComponents capability was defined (but would only set up client state, nothing more), one which could only be called after Start(), which would allow for dynamic available component redefinition. With that being said, I'm not sure that my implementation within the supervisor and OpAMP extension would change substantially - if a case emerges where the available components for a collector can change dynamically, then it will be useful to have the SetAvailableComponents method already defined, but until that case emerges, I think it would likely be unused.
With all that being said, I'm not opposed to being overridden here if getting rid of StartSettings.AvailableComponents and adding this code to (c *ClientCommon) SetAvailableComponents seems like the strongly better option to you.
if !c.isStarted {
return c.ClientSyncedState.SetAvailableComponents(components)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reconsidered. Think I'm going to go with the approach.
Comfortable moving this into Ready for Review - have successfully manually tested this implementation end-to-end with a custom build of the OpAMP Supervisor and OpAMP extension in open-telemetry/opentelemetry-collector-contrib. |
This reverts commit 9e2acf3.
Implements the specification added to opamp-spec in open-telemetry/opamp-spec#201