Terminate the EC2 on receiving an exception #952
Problem:
Currently, our infrastructure executes jobs on EC2 nodes. In the node initialisation script we fetch one of the mandatory auth files. If the file is not present or is empty, we use `exit 1` to raise an exception. Ideally, on receiving `exit 1` the node should be gracefully terminated, but we have noticed that, as per the script EC2UnixLauncher.java, it just logs a warning message and continues. No termination action is taken.
Proposed Solution:
This PR makes a simple change to the EC2 termination logic so that it takes account of the exception raised.
We would like the EC2 node to be terminated gracefully whenever it encounters any `IOException`, while also logging a warning message.
Testing done
None yet - EC2UnixLauncher.java seems like the right place to add a test case for this specific scenario.
Right now this issue can impact our end users, although it occurs infrequently. We would at least like the proposed solution to be analysed and reviewed for discussion.
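To make the intended behaviour easier to discuss, here is a minimal, self-contained sketch of the idea. It is not the actual code in EC2UnixLauncher.java: the `LaunchSketch` class, the `Node` and `InitScript` interfaces, and the `launch` method are hypothetical stand-ins, and it assumes that a non-zero exit status from the init script surfaces as an `IOException`.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LaunchSketch {
    private static final Logger LOGGER = Logger.getLogger(LaunchSketch.class.getName());

    /** Hypothetical stand-in for the EC2 node; not the plugin's real type. */
    interface Node {
        void terminate();
    }

    /** Hypothetical stand-in for running the node initialisation script. */
    interface InitScript {
        int run() throws IOException;
    }

    /**
     * Runs the init script and terminates the node if it fails.
     * Returns true only when the script exits with status 0.
     */
    static boolean launch(Node node, InitScript script) {
        try {
            int exitStatus = script.run();
            if (exitStatus != 0) {
                // Assumption: a failed init script (e.g. "exit 1") is reported as an IOException.
                throw new IOException("init script failed with exit status " + exitStatus);
            }
            return true;
        } catch (IOException e) {
            // Log a warning as before, but also terminate instead of continuing.
            LOGGER.log(Level.WARNING, "Init script failed; terminating node", e);
            node.terminate();
            return false;
        }
    }
}
```

With this shape, a test case could pass in a script that returns 1 (or throws an `IOException`) and check that `terminate()` was called, and pass in a script that returns 0 and check that it was not.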