Metric values drop after upgrade to 0.51.0 #24184

@mscanlon72

Description

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

After updating to the 0.51.0-alpine version of Vector, most of my metrics dropped to null or very low values. The internal_metrics are sent to a Datadog sink, and internal_logs and kubernetes_logs are sent to Splunk.

For vector.component_received_events_total:
The metrics for the kubernetes_logs source go to null, showing no data. Even in vector top, the value for this source is N/A.
The metrics for the internal_logs source drop significantly, but do not go to zero or null. Splunk confirms this: Vector is actually receiving fewer logs.
The metrics for the internal_metrics source drop only slightly.

Similarly, the vector.component_sent_events_total metrics also drop significantly for:
console sink
splunk_internal_logs sink

For the splunk_k8s_logs sink, the metrics drop to 0.

The datadog sink metrics drop slightly.

Resource metrics also drop significantly. This is likely because, as the metrics show, Vector is receiving (generating) fewer logs and therefore producing fewer metrics. Examples are the Kubernetes CPU and memory metrics.

Other Kubernetes metrics drop to zero, such as kubernetes.io.write_bytes and kubernetes.io.read_bytes.

We have a lot of data for this, so please do not hesitate to ask. It'll be difficult to obfuscate some data, but we will do our best.

Configuration

sources:
  # Internal logs for Vector
  internal_logs:
    type: internal_logs
  # Internal metrics for Vector
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 10
  # Kubernetes logs
  k8s_logs:
    type: kubernetes_logs
    glob_minimum_cooldown_ms: 500
    max_read_bytes: 2097152
    oldest_first: false

sinks:
  # Log to console
  stdout:
    type: console
    inputs: [internal_logs]
    buffer:
      max_events: 100000
    encoding:
      codec: json
      json:
        pretty: true
  # Send vector metrics to Datadog
  datadog:
    type: datadog_metrics
    inputs: [enrich_internal_metrics]
    default_api_key: "${DATADOG_API_KEY_ENV_VAR}"
  # Send k8s and internal logs to Splunk
  splunk:
    type: splunk_hec_logs
    inputs: [throttle_node_logs]
    buffer:
      max_events: 100000
    compression: gzip
    default_token: "${SPLUNK_TOKEN}"
    # vector.customConfig.sinks.splunk.endpoint needs to be overridden in Argo
    endpoint: ""
    encoding:
      codec: json
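
To help separate a counting problem inside Vector from a reporting problem on the Datadog side, the counters could be cross-checked locally. The fragment below is a minimal sketch and not part of the configuration above: the sink name debug_prometheus and both listen addresses are assumptions, and the keys would need to be merged into the existing top-level sections. It enables the API that vector top connects to and exposes the raw internal_metrics source through a prometheus_exporter sink, bypassing any transforms.

```yaml
# Hypothetical debugging additions; names and addresses are assumptions.
api:
  # vector top reads component metrics through this API.
  enabled: true
  address: "127.0.0.1:8686"

sinks:
  # Merge under the existing `sinks:` key rather than duplicating it.
  debug_prometheus:
    type: prometheus_exporter
    # Read straight from the source so no transform sits in between.
    inputs: [internal_metrics]
    address: "0.0.0.0:9598"
```

Comparing component_received_events_total as scraped from this endpoint on 0.51.0 and on the previous version would show whether the drop is already present at the source.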

Version

0.51.0-alpine

Debug Output


Example Data

No response

Additional Context

Alpine variant running on EKS. Rolling back to <0.51.0 immediately resolves the issue.

References

No response

Metadata

Assignees

No one assigned

    Labels

    meta: regression (This issue represents a regression)
    type: bug (A code related bug.)
