
Conversation

@cijothomas
Member

Opening for discussion... I am wondering if the metrics should count the number of items, rather than the count of messages.

A transform with the query logs | where severity_text == "ERROR" processing 1000 logs (assume the batching_processor made a batch of 1000) would now show 1000 items received and 100 items sent (if only 100 were errors), making it easy to know that 900 were dropped/filtered out...

Or do we need both?
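
As a rough illustration of counting both dimensions, here is a minimal sketch in Rust; the type and field names are hypothetical, not the crate's actual metrics API, and it assumes the processor can cheaply read the number of log records in each batch:

```rust
/// Hypothetical counters for illustration only; not the PR's actual metric names.
#[derive(Default)]
struct FilterMetrics {
    msgs_consumed: u64,
    items_consumed: u64,
    msgs_produced: u64,
    items_produced: u64,
}

impl FilterMetrics {
    /// Record one incoming message carrying `items` log records.
    fn on_consume(&mut self, items: u64) {
        self.msgs_consumed += 1;
        self.items_consumed += items;
    }

    /// Record one outgoing message carrying `items` log records after filtering.
    fn on_produce(&mut self, items: u64) {
        self.msgs_produced += 1;
        self.items_produced += items;
    }

    /// Items filtered out, e.g. 1000 consumed - 100 produced = 900 dropped.
    fn items_dropped(&self) -> u64 {
        self.items_consumed.saturating_sub(self.items_produced)
    }
}
```

With message-level counters alone, this scenario would show 1 message in and 1 message out, hiding the 900 filtered records.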

@cijothomas cijothomas requested a review from a team as a code owner December 17, 2025 16:59
@github-actions github-actions bot added the rust label (Pull requests that update Rust code) Dec 17, 2025
@codecov

codecov bot commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.15%. Comparing base (8657300) to head (a7909b8).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1645   +/-   ##
=======================================
  Coverage   84.14%   84.15%           
=======================================
  Files         458      458           
  Lines      129996   129996           
=======================================
+ Hits       109384   109397   +13     
+ Misses      20078    20065   -13     
  Partials      534      534           
Components Coverage Δ
otap-dataflow 85.84% <100.00%> (+0.01%) ⬆️
query_abstraction 80.61% <ø> (ø)
query_engine 90.01% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)

@cijothomas
Member Author

@albertlockett I opened this for discussion (I am not fully familiar with all the internals yet!)... Let me know your thoughts on this...

Member

@albertlockett albertlockett left a comment


Hey @cijothomas! I understand what you're getting at here: the number of telemetry items we process is more useful than the number of batches, and I think that makes sense.

My one hesitation is that items_transformed might lead to misinterpretation in the way it's currently calculated. It's not actually how many rows were processed/modified; it's the number of items that were fed as input into the columnar query engine.

The distinction between these two interpretations becomes clearer when we consider a transformation that renames an attribute:

logs | project-rename attributes["x"] = attributes["y"]

In this case, items_transformed would count the number of logs, and not how many attributes were scanned/renamed.

That said, I understand that in a lot of cases the number of items we feed into the transformation pipeline could be a better rough proxy for how the pipeline performs than the more opaque number of batches.

I guess I'm not sure what the best immediate solution is.

I can imagine a future where the columnar query engine could output some statistics about how many rows it scanned, filtered, modified etc, and we could expose those as processor metrics.

However for now, maybe we should just omit items_transformed or keep it in units of messages to avoid confusion? What do you think?

@cijothomas
Member Author

However for now, maybe we should just omit items_transformed or keep it in units of messages to avoid confusion?

Yes, that makes sense (the overall explanation too!). I'll get back to this PR later.

@jmacd
Contributor

jmacd commented Dec 17, 2025

I believe our goal should be to follow the Collector's standard pipeline metrics plan, documented here:
https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md

The question raised here is addressed in that proposal by having separate consumer- and producer-oriented counters, so we can see when a component is filtering.

Another answer, related to #1633, is that it makes sense to recognize three sizes for all pipeline data: Requests, Items, and Bytes. We do not currently have a bytes metric for OTAP record payloads, but we can come up with an estimate of how much memory is used by a payload.

Contributor

@jmacd jmacd left a comment


Mixed feelings:
- As mentioned, the Collector has an RFC (https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md) about this topic, and IMO a filter processor should be observable in all of these ways: requests/items/bytes and produced/consumed. Hence, why not both?
- I'm also concerned that components should not have to instrument themselves. That is something I believe we will solve in the coming weeks, by instrumenting the channels directly (see the sketch below).
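
To make that second point concrete, here is a rough sketch of what an instrumented channel could look like. It uses a synchronous std::sync::mpsc channel for simplicity (the real pipeline channels are async), and all names are illustrative rather than the actual otap-dataflow API:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{self, Receiver, SendError, Sender};

/// A sender wrapper that counts messages as they pass through, so the
/// components on either side need no instrumentation of their own.
struct CountingSender<T> {
    inner: Sender<T>,
    msgs_sent: Arc<AtomicU64>,
}

impl<T> CountingSender<T> {
    fn send(&self, msg: T) -> Result<(), SendError<T>> {
        self.msgs_sent.fetch_add(1, Ordering::Relaxed);
        self.inner.send(msg)
    }
}

/// Build a channel whose send side is instrumented; the shared counter can be
/// read out as a pipeline metric.
fn counting_channel<T>() -> (CountingSender<T>, Receiver<T>, Arc<AtomicU64>) {
    let (tx, rx) = mpsc::channel();
    let msgs_sent = Arc::new(AtomicU64::new(0));
    (
        CountingSender { inner: tx, msgs_sent: Arc::clone(&msgs_sent) },
        rx,
        msgs_sent,
    )
}
```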

@albertlockett
Member

I believe our goal should be to follow the Collector's standard pipeline metrics plan, documented here: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md

This question, the one raised here, is addressed in that proposal by having separate consumer- and producer-oriented counters so we can see when a component is filtering.

@jmacd sounds good. So would the suggestion here be that we remove these metrics entirely, since they're generic enough that they can be calculated by an instrumented channel? And then, if/when we have the columnar query engine emitting metrics, those are what we'd expose as metrics for this component?

We do not currently have a bytes metric for OTAP records payloads, but we can come up with an estimate to count how much memory is used by a payload.

I think this'd be straightforward. We could just add up the size of all the RecordBatches for every payload type using https://docs.rs/arrow/latest/arrow/array/struct.RecordBatch.html#method.get_array_memory_size
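
A rough sketch of that estimate, assuming we can borrow the Arrow RecordBatch behind each payload type (how the payloads are iterated is hand-waved here):

```rust
use arrow::array::RecordBatch;

/// Estimate the in-memory size of a payload by summing the memory reported
/// by get_array_memory_size() for each of its Arrow record batches.
fn estimate_payload_bytes<'a>(batches: impl IntoIterator<Item = &'a RecordBatch>) -> usize {
    batches
        .into_iter()
        .map(|batch| batch.get_array_memory_size())
        .sum()
}
```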
