Use item count instead of message count in transform proc #1645
base: main
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1645 +/- ##
=======================================
Coverage 84.14% 84.15%
=======================================
Files 458 458
Lines 129996 129996
=======================================
+ Hits 109384 109397 +13
+ Misses 20078 20065 -13
Partials 534 534
@albertlockett I opened this to have a discussion (I am not fully familiar yet with all the internals!)... Let me know your thoughts on this...
albertlockett left a comment
hey @cijothomas! I understand what you're getting at here: the number of telemetry items we process is more useful than the number of batches, and I think that makes sense.
My one hesitation is that items_transformed, as it's currently calculated, might invite misinterpretation. It's not actually how many rows were processed/modified; it's the number of items that were fed as input into the columnar query engine.
The distinction between these two interpretations becomes clearer when we consider a transformation that renames an attribute:
`logs | project-rename attributes["x"] = attributes["y"]`
In this case, items_transformed would count the number of logs, not how many attributes were scanned/renamed.
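To make the distinction concrete, here is a minimal sketch (hypothetical names, not the actual processor code) of a counter driven by the input batch's row count:

```rust
use arrow::record_batch::RecordBatch;

// Hypothetical sketch: `items_transformed` counts the rows fed into
// the query engine, not the rows the query actually modifies.
fn run_transform(input: RecordBatch, items_transformed: &mut u64) -> RecordBatch {
    // Incremented before the query runs: every input row is an "item",
    // even though `project-rename` rewrites only one attribute per row.
    *items_transformed += input.num_rows() as u64;

    // ... the batch would be handed to the columnar query engine here;
    // a rename returns the same number of rows it was given.
    input
}
```

Under this accounting, a rename over 1000 logs reports 1000 items "transformed" even though no rows were added or removed.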
That said, I understand that in a lot of cases the number of items we feed into the transformation pipeline could be a better proxy for how the pipeline performs than the more opaque number of batches.
I guess I'm not sure what the best immediate solution is.
I can imagine a future where the columnar query engine could output some statistics about how many rows it scanned, filtered, modified etc, and we could expose those as processor metrics.
However, for now, maybe we should just omit items_transformed or keep it in units of messages to avoid confusion? What do you think?
Yes, that makes sense (the overall explanation too!). I'll get back to this PR later.
I believe our goal should be to follow the Collector's standard pipeline metrics plan, documented here: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md. The question raised here is addressed in that proposal by having separate consumer- and producer-oriented counters, so we can see when a component is filtering. Another answer, related to #1633, is that it makes sense to recognize three sizes for all pipeline data: Requests, Items, and Bytes. We do not currently have a bytes metric for OTAP records payloads, but we can come up with an estimate of how much memory a payload uses.
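As an illustration of the bytes estimate, arrow-rs already exposes per-batch memory accounting that could be summed across a payload's record batches. A sketch under that assumption (the function itself is made up):

```rust
use arrow::record_batch::RecordBatch;

/// Hypothetical estimate of the "Bytes" size of an OTAP payload:
/// sums the memory held by each record batch's arrays. This counts
/// allocated buffer capacity, so it over-approximates wire size,
/// but it is cheap to compute.
fn estimate_payload_bytes(batches: &[RecordBatch]) -> usize {
    batches.iter().map(|b| b.get_array_memory_size()).sum()
}
```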
jmacd left a comment
Mixed feelings:
- As mentioned, the Collector has an RFC about this topic (https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md), and IMO a filter processor should be observable in all of these ways: requests/items/bytes and produced/consumed. Hence, why not both?
- I'm also concerned that components should not have to instrument themselves. That is something I believe we will solve in the coming weeks, by instrumenting the channels directly (see the sketch below).
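For illustration, instrumenting the channels might look roughly like a counting wrapper around the sender, so the component itself carries no metric code. A sketch with hypothetical names, assuming a tokio mpsc channel:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use tokio::sync::mpsc;

/// Hypothetical counting sender: every message passing through bumps
/// a shared items counter, so pipeline components need no metric code.
struct CountingSender<T> {
    inner: mpsc::Sender<T>,
    items_produced: Arc<AtomicU64>,
}

impl<T> CountingSender<T> {
    /// `item_count` is the number of telemetry items in `msg`
    /// (e.g. rows in a batch), counted as they leave the component.
    async fn send(&self, msg: T, item_count: u64) -> Result<(), mpsc::error::SendError<T>> {
        self.items_produced.fetch_add(item_count, Ordering::Relaxed);
        self.inner.send(msg).await
    }
}
```

A matching wrapper on the receiving side would give the consumed counter, and comparing produced vs. consumed across a component shows where filtering happens.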
@jmacd sounds good. So would the suggestion here be that we remove these metrics entirely, since they're generic enough that they can be calculated by an instrumented channel? And then, if/when we have the columnar query engine emitting metrics, those are what we'd expose as metrics for this component?
I think this'd be straightforward. We could just add up the size of all the …
Opening for discussion... I am wondering if the metrics should count the number of items rather than the number of messages.
A transform with the query
`logs | where severity_text == "ERROR"`
processing 1000 logs (assume batching_processor made a batch of 1000) will now show 1000 items received and 100 items sent (if only 100 were errors), making it easy to know that 900 were dropped/filtered out... Or do we need both?
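As an illustration of the "both" option, here is a sketch (hypothetical names; the inline predicate stands in for the query engine) where counting items both consumed and produced lets the 900 dropped logs fall out as the difference:

```rust
use arrow::array::{BooleanArray, StringArray};
use arrow::compute::filter_record_batch;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

/// Hypothetical `where severity_text == "ERROR"` step with both
/// counters: items_consumed - items_produced = logs filtered out.
fn filter_errors(
    input: RecordBatch,
    items_consumed: &mut u64,
    items_produced: &mut u64,
) -> Result<RecordBatch, ArrowError> {
    *items_consumed += input.num_rows() as u64;

    // Build the predicate over the `severity_text` column (this toy
    // example assumes the column exists and is a Utf8 array).
    let severity = input
        .column_by_name("severity_text")
        .expect("severity_text column")
        .as_any()
        .downcast_ref::<StringArray>()
        .expect("utf8 column");
    let predicate: BooleanArray =
        severity.iter().map(|v| Some(v == Some("ERROR"))).collect();

    let output = filter_record_batch(&input, &predicate)?;
    *items_produced += output.num_rows() as u64;
    Ok(output)
}
```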