Use item count instead of message count in transform proc #1645
base: main
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1645 +/- ##
=======================================
Coverage 84.14% 84.15%
=======================================
Files 458 458
Lines 129996 129996
=======================================
+ Hits 109384 109397 +13
+ Misses 20078 20065 -13
Partials 534 534
@albertlockett I opened this to have a discussion (I am not fully familiar yet with all the internals!)... Let me know your thoughts on this...
albertlockett left a comment
hey @cijothomas! I understand what you're getting at here: the number of telemetry items we process is more useful than the number of batches, and I think that makes sense.
My one hesitation is that items_transformed, as it's currently calculated, might invite misinterpretation. It's not actually how many rows were processed/modified; it's the number of items that were fed as input into the columnar query engine.
The distinction between these two interpretations becomes clearer when we consider a transformation that renames an attribute:
`logs | project-rename attributes["x"] = attributes["y"]`
In this case, items_transformed would count the number of logs, not how many attributes were scanned/renamed.
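To make the distinction concrete, here is a minimal sketch (hypothetical names, not the actual processor code) of a counter driven by the input batch's row count:

```rust
use arrow::record_batch::RecordBatch;

// Hypothetical sketch: `items_transformed` counts the rows fed into
// the query engine, not the rows the query actually modifies.
fn run_transform(input: RecordBatch, items_transformed: &mut u64) -> RecordBatch {
    // Incremented before the query runs: every input row is an "item",
    // even though `project-rename` rewrites only one attribute per row.
    *items_transformed += input.num_rows() as u64;

    // ... the batch would be handed to the columnar query engine here;
    // a rename returns the same number of rows it was given.
    input
}
```

Under this accounting, a rename over 1000 logs reports 1000 items "transformed" even though no rows were added or removed.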
That said, I understand that in a lot of cases the number of items we feed into the transformation pipeline could be a better proxy for how the pipeline performs than the more opaque number of batches.
I guess I'm not sure what the best immediate solution is.
I can imagine a future where the columnar query engine could output some statistics about how many rows it scanned, filtered, modified etc, and we could expose those as processor metrics.
However, for now, maybe we should just omit items_transformed or keep it in units of messages to avoid confusion? What do you think?
Yes, that makes sense (the overall explanation too!). I'll get back to this PR later.
I believe our goal should be to follow the Collector's standard pipeline metrics plan, documented here: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md. The question raised here is addressed in that proposal by having separate consumer- and producer-oriented counters, so we can see when a component is filtering. Another answer, related to #1633, is that it makes sense to recognize three sizes for all pipeline data: Requests, Items, and Bytes. We do not currently have a bytes metric for OTAP records payloads, but we can come up with an estimate of how much memory a payload uses.
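As an illustration of the bytes estimate, arrow-rs already exposes per-batch memory accounting that could be summed across a payload's record batches. A sketch under that assumption (the function itself is made up):

```rust
use arrow::record_batch::RecordBatch;

/// Hypothetical estimate of the "Bytes" size of an OTAP payload:
/// sums the memory held by each record batch's arrays. This counts
/// allocated buffer capacity, so it over-approximates wire size,
/// but it is cheap to compute.
fn estimate_payload_bytes(batches: &[RecordBatch]) -> usize {
    batches.iter().map(|b| b.get_array_memory_size()).sum()
}
```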
jmacd left a comment
Mixed feelings:
- As mentioned, the Collector has an RFC about this topic (https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md), and IMO a filter processor should be observable in all of these ways: requests/items/bytes and produced/consumed. Hence, why not both?
- I'm also concerned that components should not have to instrument themselves. That is something I believe we will solve in the coming weeks, by instrumenting the channels directly (see the sketch below).
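For illustration, instrumenting the channels might look roughly like a counting wrapper around the sender, so the component itself carries no metric code. A sketch with hypothetical names, assuming a tokio mpsc channel:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use tokio::sync::mpsc;

/// Hypothetical counting sender: every message passing through bumps
/// a shared items counter, so pipeline components need no metric code.
struct CountingSender<T> {
    inner: mpsc::Sender<T>,
    items_produced: Arc<AtomicU64>,
}

impl<T> CountingSender<T> {
    /// `item_count` is the number of telemetry items in `msg`
    /// (e.g. rows in a batch), counted as they leave the component.
    async fn send(&self, msg: T, item_count: u64) -> Result<(), mpsc::error::SendError<T>> {
        self.items_produced.fetch_add(item_count, Ordering::Relaxed);
        self.inner.send(msg).await
    }
}
```

A matching wrapper on the receiving side would give the consumed counter, and comparing produced vs. consumed across a component shows where filtering happens.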
@jmacd sounds good. So would the suggestion here be that we remove these metrics entirely, since they're generic enough that they can be calculated by an instrumented channel? And then, if/when we have the columnar query engine emitting metrics, those are what we'd expose as metrics for this component?
I think this'd be straightforward. We could just add up the size of all the …
Opening for discussion... I am wondering if the metrics should count the number of items rather than the number of messages.
A transform with the query
`logs | where severity_text == "ERROR"`
processing 1000 logs (assume batching_processor made a batch of 1000) will now show 1000 items received and 100 items sent (if only 100 were errors), making it easy to know that 900 were dropped/filtered out... Or do we need both?
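As an illustration of the "both" option, here is a sketch (hypothetical names; the inline predicate stands in for the query engine) where counting items both consumed and produced lets the 900 dropped logs fall out as the difference:

```rust
use arrow::array::{BooleanArray, StringArray};
use arrow::compute::filter_record_batch;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

/// Hypothetical `where severity_text == "ERROR"` step with both
/// counters: items_consumed - items_produced = logs filtered out.
fn filter_errors(
    input: RecordBatch,
    items_consumed: &mut u64,
    items_produced: &mut u64,
) -> Result<RecordBatch, ArrowError> {
    *items_consumed += input.num_rows() as u64;

    // Build the predicate over the `severity_text` column (this toy
    // example assumes the column exists and is a Utf8 array).
    let severity = input
        .column_by_name("severity_text")
        .expect("severity_text column")
        .as_any()
        .downcast_ref::<StringArray>()
        .expect("utf8 column");
    let predicate: BooleanArray =
        severity.iter().map(|v| Some(v == Some("ERROR"))).collect();

    let output = filter_record_batch(&input, &predicate)?;
    *items_produced += output.num_rows() as u64;
    Ok(output)
}
```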