-
Notifications
You must be signed in to change notification settings - Fork 521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional error handling to CosmosHealthCheck #4781
base: main
Are you sure you want to change the base?
Add additional error handling to CosmosHealthCheck #4781
Conversation
src/Microsoft.Health.Fhir.CosmosDb.UnitTests/Features/Health/CosmosHealthCheckTests.cs
Fixed
Show resolved
Hide resolved
ex, | ||
"Failed to connect to the data store. Request has timed out."); | ||
|
||
return HealthCheckResult.Degraded( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should timeout requests to CosmosDB result in Degraded or ServiceUnavailable? 408 status code can mean the database is overloaded from client requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @fhibf - degraded is the proper behavior here.
VerifyErrorInResult(result.Data, "Error", FhirHealthErrorCode.Error408.ToString()); | ||
} | ||
|
||
private void VerifyErrorInResult(IReadOnlyDictionary<string, object> dictionary, string key, string expectedMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this helper due to code scanning errors.
// Reference: https://learn.microsoft.com/azure/cosmos-db/nosql/conceptual-resilient-sdk-applications#should-my-application-retry-on-errors | ||
static bool IsRetryableException(Exception ex) => | ||
ex is CosmosOperationCanceledException || | ||
(ex is CosmosException cex && (cex.StatusCode == HttpStatusCode.ServiceUnavailable || cex.StatusCode == (HttpStatusCode)449)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we include HttpStatusCode 408 as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code now I see that HTTP408 is treated differently. That's fine.
|
||
void LogAdditionalRetryableExceptionDetails(Exception exception) | ||
{ | ||
if (exception is CosmosException cosmosException && cosmosException.StatusCode == HttpStatusCode.ServiceUnavailable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add the logginc for HTTP449 and HTTP408?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this can be removed because it's added when useful to the exception message in the CosmosSDK.
src/Microsoft.Health.Fhir.CosmosDb.UnitTests/Features/Health/CosmosHealthCheckTests.cs
Fixed
Show resolved
Hide resolved
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Description
Related issues
AB#137391
Testing
Unit testing.
FHIR Team Checklist
Semver Change (docs)
Patch|Skip|Feature|Breaking (reason)