Extension randomly fails to deliver logs #521

Open
aaliakseyenka opened this issue Jan 24, 2025 · 3 comments

aaliakseyenka commented Jan 24, 2025

Hi,

Two days ago we started getting random errors from our Lambda, which seem to be related to the extension. No changes or deployments were made; the errors simply appeared at some point and now continue to pop up randomly. Our Lambda uses the following configuration:
Memory 256 MB
Runtime Amazon Linux 2023
Architecture arm64

Extension layer arn:aws:lambda:eu-central-1:464622532012:layer:Datadog-Extension-ARM:68

DD_ENV prod
DD_LOGS_CONFIG_PROCESSING_RULES [{"type": "exclude_at_match","name": "exclude_lambda_metadata","pattern": "^(START|END|INIT|REPORT).*"}]
DD_SERVERLESS_LOGS_ENABLED true
DD_SERVICE service
DD_SITE datadoghq.eu
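
As an illustration of what the DD_LOGS_CONFIG_PROCESSING_RULES value above does, here is a minimal Python sketch that parses the rule and checks which lines it would exclude. The sample log lines and the helper name `is_excluded` are made up for illustration, and Python's `re` module is only a stand-in for whatever regex engine the extension actually uses; the pattern is simple enough that the result should be the same.

```python
import json
import re

# Value of DD_LOGS_CONFIG_PROCESSING_RULES, copied from the configuration above.
RULES_JSON = '[{"type": "exclude_at_match","name": "exclude_lambda_metadata","pattern": "^(START|END|INIT|REPORT).*"}]'

rules = json.loads(RULES_JSON)


def is_excluded(line: str) -> bool:
    """Return True if any exclude_at_match rule matches the line (hypothetical helper)."""
    return any(
        rule["type"] == "exclude_at_match" and re.search(rule["pattern"], line)
        for rule in rules
    )


# Illustrative lines only; the request ID is a placeholder.
samples = [
    "START RequestId: 00000000-0000-0000-0000-000000000000 Version: $LATEST",
    "END RequestId: 00000000-0000-0000-0000-000000000000",
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000 Duration: 2.00 ms",
    '{"level": "info", "message": "request handled"}',
]

for line in samples:
    print("excluded" if is_excluded(line) else "kept    ", line)
```

A line that begins with DD_EXTENSION does not match this pattern, so the rule would not hide the extension's own error messages.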

The errors that appear:

DD_EXTENSION | ERROR | Failed to send logs to datadog: error sending request for url (https://http-intake.logs.datadoghq.eu/api/v2/logs)
Exiting: timeout, deadline: 1737666472995

The Lambda has a timeout of 5 seconds, but its runtime duration is only 2 ms. Based on CloudWatch Logs, we see that, for example, the Lambda finished at

2025-01-23T12:00:43.030Z

and then the error log from the extension appears at

2025-01-23T12:00:48.058Z
DD_EXTENSION | ERROR | Failed to send logs to datadog: error sending request for url (https://http-intake.logs.datadoghq.eu/api/v2/logs)
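
To make the gap between those two CloudWatch timestamps explicit, here is a small Python sketch. It only does arithmetic on the values quoted above; the function and variable names are illustrative, and comparing the gap against the 5-second function timeout is an assumption about what is relevant here.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_utc(ts: str) -> datetime:
    """Parse a CloudWatch-style UTC timestamp such as 2025-01-23T12:00:43.030Z."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


finished = parse_utc("2025-01-23T12:00:43.030Z")   # Lambda finished
ext_error = parse_utc("2025-01-23T12:00:48.058Z")  # extension error logged

gap_seconds = (ext_error - finished).total_seconds()
timeout_seconds = 5  # function timeout stated in the report

print(f"gap: {gap_seconds:.3f} s, timeout: {timeout_seconds} s")
# prints "gap: 5.028 s, timeout: 5 s"
```

The gap is just over the configured timeout, which would be consistent with the extension still trying to send logs when the deadline is reached.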

This doesn't happen during cold starts; based on CloudWatch Logs, it happens when the Lambda is already warm.

Does anyone have an idea what could be going wrong?

astuyve (Contributor) commented Jan 24, 2025

Hi @aaliakseyenka - thanks for the report.
It's really hard to tell from incomplete logs, but it looks like the extension repeatedly tries to send data to Datadog and fails for some reason. The last request is then aborted because the function times out: there are 5 seconds between the timestamp you posted and the end.

Can you open a support ticket so that we can share the full debug logs in a secure place? You can email [email protected] to get started, then ask them to escalate it to AJ so that I can find it.

Thanks!
AJ

aaliakseyenka (Author) commented

Thanks for the answer, @astuyve. What is the best way to capture more logs from the Datadog extension so that I can provide them to you? In CloudWatch we have only 2 entries right now:

  • app's result log
  • DD error I shared

astuyve (Contributor) commented Jan 24, 2025

You'll want to set DD_LOG_LEVEL: debug. The support team will walk you through anything additional we may need.
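
A minimal sketch of that change in Python, assuming the function's environment is managed as a plain key/value map. The helper name `with_debug_logging` is hypothetical, and how the resulting map gets applied (console, infrastructure-as-code, or API) is deliberately left out. The new variable is merged into the existing ones because, as far as I know, updating a Lambda function's environment through the API replaces the whole set of variables rather than patching it.

```python
# Environment variables from the original report.
current_env = {
    "DD_ENV": "prod",
    "DD_LOGS_CONFIG_PROCESSING_RULES": '[{"type": "exclude_at_match","name": "exclude_lambda_metadata","pattern": "^(START|END|INIT|REPORT).*"}]',
    "DD_SERVERLESS_LOGS_ENABLED": "true",
    "DD_SERVICE": "service",
    "DD_SITE": "datadoghq.eu",
}


def with_debug_logging(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with DD_LOG_LEVEL=debug added (hypothetical helper)."""
    return {**env, "DD_LOG_LEVEL": "debug"}


new_env = with_debug_logging(current_env)
for key in sorted(new_env):
    print(f"{key}={new_env[key]}")
```

The original map is left untouched, so the debug setting can be dropped again once the logs have been captured.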
