Description
Issue:
When using the OpenTelemetry output plugin to send logs to an OpenTelemetry endpoint, the fields of the output JSON payload are not formatted correctly. They should be formatted according to the OpenTelemetry specification. As a result, the OpenTelemetry endpoint is unable to process the request from Fluent Bit correctly.
Fluent Bit logs showing records being sent to the OpenTelemetry endpoint:
[36] ebiz: [[1704434606.150000000, {}], {"message"=>"This is a test log message for abc application", "loglevel"=>"INFO", "service"=>"helloworld", "clientIp"=>"111.222.888", "timestamp"=>"2024-01-08 18:37:08.150", "testtag"=>"fluentbit", "trace_id"=>"7ada6c95a1bd243fa9013cab515173a9", "span_id"=>"9c1544cc4f7ff369"}]
[2024/01/08 18:37:10] [debug] [upstream] proxy returned 200
[2024/01/08 18:37:10] [debug] [http_client] flb_http_client_proxy_connect connection #32 connected to myproxy.com:8080.
[2024/01/08 18:37:10] [debug] [upstream] proxy returned 200
[2024/01/08 18:37:10] [debug] [http_client] flb_http_client_proxy_connect connection #31 connected to myproxy.com:8080.
[2024/01/08 18:37:10] [debug] [upstream] KA connection #32 to myproxy.com:8080 is connected
[2024/01/08 18:37:10] [debug] [http_client] not using http_proxy for header
[2024/01/08 18:37:10] [debug] [upstream] KA connection #31 to myproxy.com:8080 is connected
[2024/01/08 18:37:10] [debug] [http_client] not using http_proxy for header
[2024/01/08 18:37:10] [ info] [output:opentelemetry:opentelemetry.1] ingest.privateotel.com:443, HTTP status=200
My fluentbit configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentbit
  namespace: otel
data:
  custom_parsers.conf: |
    [MULTILINE_PARSER]
        Name multiline-rules
        Type regex
        Flush_timeout 2000
        rule "start_state" "/(\d{4}-\d{1,2}-\d{1,2})(.*)/" "cont"
        rule "cont" "/^\D(.*)/" "cont"
    [PARSER]
        Name named-captures
        Format regex
        Regex /(?<timestamp>[^ ]* .*):(?<loglevel>DEBUG|ERROR|INFO)([\s\s]*)-\|(?<id>[\w\-]*)\|(?<clientIp>[0-9\.]*)\|(?<trace_id>[0-9A-Za-z]*)\|(?<span_id>[0-9A-Za-z]*)\|(?<message>.*)/m
        Time_key timestamp
        Time_Format %Y-%m-%d %H:%M:%S.%L
        Time_Offset -0600
        Time_Keep On
  fluent-bit.conf: |
    [SERVICE]
        Daemon Off
        Flush 1
        Log_Level debug
        Parsers_File /fluent-bit/etc/parsers.conf
        Parsers_File /fluent-bit/etc/custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
        Health_Check On
    [INPUT]
        Name tail
        Log_Level error
        multiline.parser multiline-rules
        Path /app/logs/*.log
        Tag logs
    [FILTER]
        Name parser
        Match *
        key_name log
        parser named-captures
    [FILTER]
        Name modify
        Match *
        Add service ${SERVICE_NAME}
    [FILTER]
        Name modify
        Match *
        Add testtag fluentbit
    [OUTPUT]
        Name stdout
        Log_Level trace
        Match *
    [OUTPUT]
        Name opentelemetry
        Match *
        Log_Level trace
        Host ingest.privateotel.com
        Port 443
        Header token ***************
        Log_response_payload True
        Tls On
        Tls.verify Off
        add_label app local
My OpenTelemetry endpoint receives the request formatted as follows:
{
  "body": {
    "clientIp": "111.222.888",
    "loglevel": "INFO",
    "message": "This is a test log message for abc application",
    "service": "helloworld",
    "span_id": "9c1544cc4f7ff369",
    "testtag": "fluentbit",
    "timestamp": "2024-01-08 18:37:08.150",
    "trace_id": "7ada6c95a1bd243fa9013cab515173a9"
  },
  "instrumentation.name": "",
  "instrumentation.version": "",
  "observed_timestamp": 0,
  "severity_text": "",
  "span_id": "",
  "trace_id": ""
}
As shown above, every key/value pair is nested under the body, while the top-level span_id and trace_id are left empty. The body should contain only the log message, and all the other fields I choose to send should be nested under "fields". The proper format would look something like the following.
Expected behavior:
{
  "body": {
    "message": "This is a test log message for abc application"
  },
  "clientIp": "111.222.888",
  "loglevel": "INFO",
  "service": "helloworld",
  "instrumentation.name": "",
  "instrumentation.version": "",
  "observed_timestamp": 0,
  "testtag": "fluentbit",
  "severity_text": "",
  "span_id": "9c1544cc4f7ff369",
  "timestamp": "2024-01-08 18:37:08.150",
  "trace_id": "7ada6c95a1bd243fa9013cab515173a9"
}
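To make the requested change concrete, the difference between the received payload and the expected one can be pictured as a small transformation. The Python sketch below is only an illustration and is not Fluent Bit code: `reshape` and `ENVELOPE_DEFAULTS` are hypothetical names, and treating `message` as the body key is an assumption taken from the example above.

```python
from typing import Any, Dict

# Hypothetical envelope keys and defaults, copied from the payload in this issue.
ENVELOPE_DEFAULTS: Dict[str, Any] = {
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
    "span_id": "",
    "trace_id": "",
}


def reshape(received: Dict[str, Any], body_key: str = "message") -> Dict[str, Any]:
    """Keep only `body_key` inside the body and hoist every other field."""
    nested = dict(received.get("body", {}))
    # Start from the envelope values that were actually received.
    out = {key: received.get(key, default) for key, default in ENVELOPE_DEFAULTS.items()}
    # The body keeps just the log message (assumed key: "message").
    out["body"] = {body_key: nested.pop(body_key, "")}
    # Remaining fields move to the top level; span_id and trace_id
    # overwrite the empty envelope values.
    out.update(nested)
    return out


# The payload as received by the endpoint in this issue.
received = {
    "body": {
        "clientIp": "111.222.888",
        "loglevel": "INFO",
        "message": "This is a test log message for abc application",
        "service": "helloworld",
        "span_id": "9c1544cc4f7ff369",
        "testtag": "fluentbit",
        "timestamp": "2024-01-08 18:37:08.150",
        "trace_id": "7ada6c95a1bd243fa9013cab515173a9",
    },
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
    "span_id": "",
    "trace_id": "",
}

reshaped = reshape(received)
```

With the sample record, `reshaped` has the same keys and values as the expected layout above: the body holds only `message`, and `trace_id` and `span_id` carry the parsed values instead of empty strings.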
To Reproduce
Use a configuration similar to mine. Generate a log line and extract some fields from it, including the trace ID, span ID, timestamp, body, etc. Then use the opentelemetry output and point it to an OpenTelemetry endpoint.
Your Environment
I am running in a Kubernetes cluster using a daemonset and the configmap above.
The version of Fluent Bit I am using is 2.1.10.
The image is fluent/fluent-bit:2.1.10-debug
Additional context
OpenTelemetry is unable to correctly process fields such as the trace ID and span ID, since they are not sent in the proper schema format.
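As a sanity check that the extracted values themselves are well formed, the sketch below tests them against the ID sizes OpenTelemetry uses: a 16-byte trace ID (32 hex characters) and an 8-byte span ID (16 hex characters), neither of which may be all zeros. It is a hypothetical Python helper (`is_valid_id` is not part of Fluent Bit or any OpenTelemetry SDK) and only suggests that the problem lies in where the fields are placed, not in their values.

```python
import string


def is_valid_id(value: str, n_bytes: int) -> bool:
    """Return True if `value` is a non-zero hex string of exactly n_bytes bytes."""
    if len(value) != 2 * n_bytes:
        return False
    if any(ch not in string.hexdigits for ch in value):
        return False
    # An all-zero ID is treated as invalid.
    return int(value, 16) != 0


# Values parsed from the sample log line in this issue.
trace_ok = is_valid_id("7ada6c95a1bd243fa9013cab515173a9", 16)
span_ok = is_valid_id("9c1544cc4f7ff369", 8)
# The empty strings found at the top level of the received payload.
empty_ok = is_valid_id("", 16)
```

Here `trace_ok` and `span_ok` are `True`, while `empty_ok` is `False`, which matches the empty top-level `trace_id` and `span_id` seen by the endpoint.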
For more information on the opentelemetry logging data model and schema, please see below:
Please fix the issue, or if there is something I am doing incorrectly in my Fluent Bit config, please advise. Thank you.