
[elasticsearchexporter] Direct serialization without objmodel in OTel mode #37032

Status: Open. Wants to merge 21 commits into base: main.
Conversation

felixbarny (Contributor) commented Jan 6, 2025

Directly serializes pdata to JSON in OTel mode
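For context, a minimal sketch of the approach, assuming hypothetical helper names (`serializeLogRecord` and `writeJSONString` are illustrative, not the PR's actual API): JSON is written straight from the pdata structures into a bytes.Buffer, skipping the intermediate objmodel.Document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/plog"
)

// serializeLogRecord is a hypothetical helper: it streams one log record as a
// JSON object straight into buf, with no intermediate document representation.
func serializeLogRecord(record plog.LogRecord, buf *bytes.Buffer) {
	buf.WriteByte('{')
	buf.WriteString(`"@timestamp":`)
	writeJSONString(buf, record.Timestamp().AsTime().UTC().Format("2006-01-02T15:04:05.000000000Z"))
	buf.WriteString(`,"body":`)
	writeJSONString(buf, record.Body().AsString())
	buf.WriteString(`,"attributes":{`)
	first := true
	record.Attributes().Range(func(k string, v pcommon.Value) bool {
		if !first {
			buf.WriteByte(',')
		}
		first = false
		writeJSONString(buf, k)
		buf.WriteByte(':')
		writeJSONString(buf, v.AsString())
		return true
	})
	buf.WriteString("}}")
}

// writeJSONString writes s as a JSON string literal; json.Marshal of a plain
// string never fails, so the error is ignored.
func writeJSONString(buf *bytes.Buffer, s string) {
	b, _ := json.Marshal(s)
	buf.Write(b)
}

func main() {
	logs := plog.NewLogs()
	lr := logs.ResourceLogs().AppendEmpty().ScopeLogs().AppendEmpty().LogRecords().AppendEmpty()
	lr.Body().SetStr("hello")
	lr.Attributes().PutStr("foo", "bar")

	var buf bytes.Buffer
	serializeLogRecord(lr, &buf)
	// Prints something like:
	// {"@timestamp":"1970-01-01T00:00:00.000000000Z","body":"hello","attributes":{"foo":"bar"}}
	fmt.Println(buf.String())
}
```

Skipping the map-based intermediate document avoids building and tearing down one allocation-heavy representation per event, which is presumably where much of the B/op reduction in the benchmarks below comes from.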

felixbarny (Contributor, Author) commented Jan 8, 2025

Benchmark results:

TL;DR: throughput roughly doubles (2x) for metrics and triples (3x) for logs and traces. Allocated bytes/op are reduced by about 82% for metrics and 95% for logs and traces.

                                      │   old.txt   │            new_pool.txt             │
                                      │   sec/op    │   sec/op     vs base                │
Exporter/logs/otel/small_batch-10       79.16µ ± 1%   29.35µ ± 2%  -62.93% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      757.0µ ± 2%   250.9µ ± 1%  -66.86% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       7.392m ± 1%   2.357m ± 1%  -68.12% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      70.50m ± 1%   23.17m ± 1%  -67.14% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    414.8µ ± 1%   197.8µ ± 5%  -52.32% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   3.960m ± 0%   1.867m ± 0%  -52.86% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    39.97m ± 1%   19.67m ± 2%  -50.79% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   421.3m ± 1%   206.5m ± 1%  -50.99% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     79.64µ ± 0%   28.10µ ± 1%  -64.72% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    765.5µ ± 1%   240.2µ ± 5%  -68.62% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     7.341m ± 1%   2.226m ± 1%  -69.67% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    71.74m ± 1%   21.92m ± 3%  -69.44% (p=0.000 n=10)
geomean                                 4.171m        1.554m       -62.74%

                                      │   old.txt   │             new_pool.txt             │
                                      │  events/s   │  events/s    vs base                 │
Exporter/logs/otel/small_batch-10       126.3k ± 1%   340.7k ± 2%  +169.72% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      132.1k ± 2%   398.6k ± 1%  +201.71% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       135.3k ± 1%   424.3k ± 1%  +213.68% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      141.8k ± 1%   431.6k ± 1%  +204.32% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    168.8k ± 1%   354.0k ± 5%  +109.72% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   176.7k ± 0%   375.0k ± 0%  +112.15% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    175.1k ± 1%   355.9k ± 2%  +103.23% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   166.1k ± 1%   339.0k ± 1%  +104.04% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     125.6k ± 0%   355.9k ± 1%  +183.44% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    130.6k ± 1%   416.3k ± 4%  +218.70% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     136.2k ± 0%   449.2k ± 1%  +229.72% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    139.4k ± 1%   456.2k ± 3%  +227.27% (p=0.000 n=10)
geomean                                 145.0k        389.2k       +168.39%

                                      │    old.txt    │             new_pool.txt             │
                                      │     B/op      │     B/op      vs base                │
Exporter/logs/otel/small_batch-10       80.579Ki ± 0%   4.692Ki ± 0%  -94.18% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      793.10Ki ± 0%   36.90Ki ± 0%  -95.35% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       7912.7Ki ± 0%   354.2Ki ± 1%  -95.52% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      77.155Mi ± 0%   3.365Mi ± 0%  -95.64% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    403.89Ki ± 0%   71.15Ki ± 0%  -82.38% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   4020.0Ki ± 0%   686.9Ki ± 0%  -82.91% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    39.512Mi ± 0%   7.095Mi ± 0%  -82.04% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   390.35Mi ± 0%   67.93Mi ± 0%  -82.60% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     80.745Ki ± 0%   5.317Ki ± 0%  -93.42% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    794.59Ki ± 0%   42.42Ki ± 1%  -94.66% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     7924.7Ki ± 0%   408.4Ki ± 1%  -94.85% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    77.343Mi ± 0%   3.945Mi ± 0%  -94.90% (p=0.000 n=10)
geomean                                  4.219Mi        334.2Ki       -92.26%

                                      │   old.txt   │            new_pool.txt             │
                                      │  allocs/op  │  allocs/op   vs base                │
Exporter/logs/otel/small_batch-10        553.0 ± 0%    112.0 ± 0%  -79.75% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      5.430k ± 0%   1.019k ± 0%  -81.23% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       54.19k ± 0%   10.08k ± 0%  -81.40% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      541.7k ± 0%   100.6k ± 0%  -81.43% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    4.301k ± 0%   1.568k ± 0%  -63.54% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   42.83k ± 0%   15.50k ± 0%  -63.81% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    428.0k ± 0%   154.8k ± 0%  -63.83% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   4.278M ± 0%   1.546M ± 0%  -63.85% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10      594.0 ± 0%    132.0 ± 0%  -77.78% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    5.830k ± 0%   1.218k ± 0%  -79.11% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     58.19k ± 0%   12.07k ± 0%  -79.26% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    581.7k ± 0%   120.6k ± 0%  -79.28% (p=0.000 n=10)
geomean                                 35.09k        8.573k       -75.57%

Review thread on the following diff lines:

    scopeMetrics := scopeMetrics.At(j)
    scope := scopeMetrics.Scope()
    groupedDataPointsByIndex := make(map[string]map[uint32][]dataPoint)
felixbarny (Contributor, Author) commented on the diff above:
Note to reviewer: I made it so that documents from different scopes are never merged. This simplifies the serialization logic and also fixes a subtle bug in the current implementation where we hash only the scope attributes but not the scope name, which can group different scopes into the same document. As a consequence, we should probably also add the scope name as a dimension in the mappings.
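To make the subtle bug concrete, here is an illustrative sketch (not the exporter's actual hashing code; `scopeKey` is a hypothetical helper) that folds both the scope name and the scope attributes into the grouping hash, so scopes differing only in name no longer collide:

```go
package main

import (
	"fmt"
	"hash/fnv"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// scopeKey is a hypothetical helper that hashes both the scope name and its
// attributes. Hashing only the attributes would give two scopes with equal
// attributes but different names the same grouping key.
func scopeKey(scope pcommon.InstrumentationScope) uint32 {
	h := fnv.New32a()
	h.Write([]byte(scope.Name()))
	scope.Attributes().Range(func(k string, v pcommon.Value) bool {
		h.Write([]byte(k))
		h.Write([]byte(v.AsString()))
		return true
	})
	return h.Sum32()
}

func main() {
	a := pcommon.NewInstrumentationScope()
	a.SetName("scope-a")
	b := pcommon.NewInstrumentationScope()
	b.SetName("scope-b")
	// Same (empty) attributes, different names: the keys differ once the name is hashed.
	fmt.Println(scopeKey(a) != scopeKey(b)) // true
}
```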

A reviewer (Contributor) commented:

I think by moving this here, rather than outside of the scopeMetrics loop, we're assuming that there will never be two identical scopes within a resource. Is that a safe assumption?

A reviewer (Contributor) commented:

I suppose it's no worse than the existing assumption that resourceMetrics is free of duplicate resources.

felixbarny (Contributor, Author) commented:

What makes it safe is that the only consequence of the assumption being wrong is leaving some storage savings on the table. In other words, we should prioritize elastic/elasticsearch#99123, which has turned out to be more of an issue than anticipated in various contexts.

axw (Contributor) commented Jan 10, 2025:

Wouldn't duplicate resources/scopes lead to duplicate _tsid and doc rejections? Definitely agree on prioritising that issue though...

felixbarny (Contributor, Author) commented:

Yes, it would, until we fix the referenced issue.

felixbarny marked this pull request as ready for review January 10, 2025 07:42
felixbarny requested a review from a team as a code owner January 10, 2025 07:42
felixbarny requested a review from songy23 January 10, 2025 07:42
bytes.Buffer.Write is guaranteed to not return an error
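This matches the standard library documentation for bytes.Buffer.Write, which states that the returned error is always nil, so dropping the error check is safe. A tiny illustration:

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	var buf bytes.Buffer
	n, err := buf.Write([]byte("hello"))
	fmt.Println(n, err) // 5 <nil> (err is documented to always be nil)
}
```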
axw (Contributor) left a review:

LGTM. The amount of handwritten serialisation makes me a little uncomfortable, but we can perhaps improve that with code generation later.

exporter/elasticsearchexporter/pdata_serializer.go (review thread: outdated, resolved)
exporter/elasticsearchexporter/pdata_serializer.go (review thread: outdated, resolved)
felixbarny (Contributor, Author) commented Jan 10, 2025

I've introduced pooling for the buffer holding the serialized events in 5e523c5. This reduces allocations by another 60% and increases throughput as well. I suppose most of the remaining allocations come from creating the pdata model itself, which we can't easily optimize, rather than from the ES exporter itself. I've updated the benchmark results in #37032 (comment).
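For readers unfamiliar with the pattern, a minimal sketch of what such pooling can look like with sync.Pool (illustrative only; the actual BufferPool in this PR may differ):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// BufferPool hands out reusable *bytes.Buffer values so each serialized event
// does not allocate a fresh buffer.
type BufferPool struct {
	pool sync.Pool
}

func NewBufferPool() *BufferPool {
	return &BufferPool{pool: sync.Pool{New: func() any { return &bytes.Buffer{} }}}
}

func (p *BufferPool) Get() *bytes.Buffer {
	return p.pool.Get().(*bytes.Buffer)
}

func (p *BufferPool) Put(buf *bytes.Buffer) {
	buf.Reset() // keep the backing array, drop the contents
	p.pool.Put(buf)
}

func main() {
	pool := NewBufferPool()
	buf := pool.Get()
	buf.WriteString(`{"@timestamp":"2025-01-10T00:00:00Z"}`)
	fmt.Println(buf.Len())
	pool.Put(buf) // reused by the next serialization instead of reallocating
}
```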

felixbarny and others added 4 commits January 10, 2025 17:58
exporter/elasticsearchexporter has more than one function: "NewBufferPool,NewFactory"
codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 92.37875% with 33 lines in your changes missing coverage. Please review.

Project coverage is 79.60%. Comparing base (992d3b0) to head (29e9daf).

Files with missing lines Patch % Lines
exporter/elasticsearchexporter/pdata_serializer.go 94.53% 11 Missing and 6 partials ⚠️
exporter/elasticsearchexporter/exporter.go 68.18% 6 Missing and 8 partials ⚠️
exporter/elasticsearchexporter/model.go 96.92% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #37032      +/-   ##
==========================================
- Coverage   79.60%   79.60%   -0.01%     
==========================================
  Files        2252     2254       +2     
  Lines      211920   212032     +112     
==========================================
+ Hits       168704   168781      +77     
- Misses      37549    37576      +27     
- Partials     5667     5675       +8     

