Skip to content

Conversation

@adelinowona
Copy link
Contributor

No description provided.

@adelinowona adelinowona added the feature Adds new user-facing functionality. label Dec 9, 2025
Copy link
Member

@sanych-sun sanych-sun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. Also please check failing tests

{
// Check if all requests are of the same type
var firstType = requests[0].ModelType;
var allSameType = requests.All(r => r.ModelType == firstType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this code here will lead to multiple enumerations of the requests. We had a ticket to remove the multiple enumerations. Should we move this code into the CreateBulkWriteOperation and merge it into the mapping loop?

connectionId,
$"{state.CommandName} command failed",
null,
reply);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we construct the exception just to record it and never throw it? Looks suspicious.

Copy link
Contributor Author

@adelinowona adelinowona Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - this does look suspicious at first glance. This pattern actually predates my telemetry changes and is part of the existing architecture.

Pre-existing design:
When the server returns ok: 0, there are two separate exceptions created for different layers:

  1. Event/Observability exception - never thrown
    - Created for the CommandFailedEvent public API
    - Event subscribers (logging infrastructure, TraceSource) seem to need the exception object.
    - This is the exception you're seeing created here
  2. Protocol exception (ProcessReply in wire protocol layer) - actually thrown
    - Created after event processing completes
    - This is what user code catches

With my telemetry changes:
I'm just reusing the first exception for telemetry tags via RecordException.

<PackageReference Include="Snappier" Version="1.0.0" />
<PackageReference Include="ZstdSharp.Port" Version="0.7.3" />
<PackageReference Include="System.Buffers" Version="4.5.1" />
<PackageReference Include="System.Diagnostics.DiagnosticSource" Version="8.0.0" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the System.Diagnostics.DiagnosticSource version should match the target framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And looks like it's not needed for net6.

return new OperationContext(Clock, InitialTimestamp, Timeout, CancellationToken)
{
RootContext = RootContext,
OperationName = operationName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should copy values of all otel related fields in WithTimeout method as well. Otherwise they will be lost if we have to apply the timeout.


internal OperationContext WithOperationMetadata(string operationName, string databaseName, string collectionName, bool isTracingEnabled)
{
return new OperationContext(Clock, InitialTimestamp, Timeout, CancellationToken)
Copy link
Member

@sanych-sun sanych-sun Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providing InitialTimestamp and Timeout here looks wrong to me, because copied operationContext is suppose to "continue from the point where it was copied". We probably should provide RemainingTimeout instead, in the same way as WithTimeout doing..

};
}

internal OperationContext WithOperationMetadata(string operationName, string databaseName, string collectionName, bool isTracingEnabled)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests for the new method.

public bool LoadBalanced => _loadBalanced;
public TimeSpan LocalThreshold { get { return _localThreshold; } }
public LoggingSettings LoggingSettings { get { return _loggingSettings; } }
public TracingOptions TracingOptions { get { return _tracingOptions; } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add comparison of the TracingOptions to the Equals method?

Copy link
Contributor

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, partial review.

<PackageReference Include="Snappier" Version="1.0.0" />
<PackageReference Include="ZstdSharp.Port" Version="0.7.3" />
<PackageReference Include="System.Buffers" Version="4.5.1" />
<PackageReference Include="System.Diagnostics.DiagnosticSource" Version="8.0.1" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include this in BSON lib instead, just in case we'd want to add serialization Otel?

/// </summary>
public static class MongoTelemetry
{
private static readonly string s_driverVersion = ClientDocumentHelper.GetAssemblyVersion(typeof(MongoClient).Assembly);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__driverVersion

/// The ActivitySource used by MongoDB driver for OpenTelemetry tracing.
/// Applications can subscribe to this source to receive MongoDB traces.
/// </summary>
public static readonly ActivitySource ActivitySource = new("MongoDB.Driver", s_driverVersion);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want/need to expose this publicly? Is it the recommended way?

{
var activity = ActivitySource.StartActivity(commandName, ActivityKind.Client);

if (activity == null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: activity is checked for null by the callers, so maybe check in one place only?


SetConnectionTags(activity, connectionId);

if (command != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases is the command null?

else
{
// For command-level exceptions, only set error status without recording details
activity.SetStatus(ActivityStatusCode.Error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: maybe this whole if/else block should move into MongoTelemetry.RecordException?

/// When set to true, no OpenTelemetry activities will be created for this client's operations.
/// Default is false (tracing enabled if configured via TracerProvider).
/// </summary>
public bool Disabled { get; set; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this additional Disabled switch?

_shouldTrackState = _shouldTrackSucceeded || _shouldTrackFailed;

// Check if tracing is disabled for this client
_shouldTrace = _tracingOptions?.Disabled != true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not sufficient.
Do we still want to trace if there are not listeners?

operationContext.OperationName,
operationContext.DatabaseName,
operationContext.CollectionName)
: null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider more compact overload:
MongoTelemetry.StartOperationActivity(operationContext);

And ideally (if it's reasonable of course) everything related to telemetry should be handled in single place, so should
TransactionActivityScope be under MongoTelemetry? Should MongoTelemetry.StartOperationActivity also invoke TransactionActivityScope ?

public CollectionNamespace QueryNamespace;
public ExpectedResponseType ExpectedResponseType;
public bool ShouldRedactReply;
public Activity CommandActivity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this Activity need to be disposed?

_settings.TranslationOptions);

private OperationContext CreateOperationContext(IClientSessionHandle session, TimeSpan? timeout, CancellationToken cancellationToken)
private OperationContext CreateOperationContext(IClientSessionHandle session, TimeSpan? timeout, CancellationToken cancellationToken, string operationName = null)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I know Cancellation token should be the last parameter here but I did it this way for now since this work is still a draft and I didn't want to have to update every call.

Same idea stands for ExecuteReadOperation and ExecuteWriteOperation

Copy link
Contributor

@ajcvickers ajcvickers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think that Activity can be used in this way. Activity maintains a single ambient instance per execution context. (By execution context, I mean the state around a thread of execution that acts like a conceptual thread to the CLR even though the physical thread may change underneath it.) This means that anything that sets Activity.Current (either implicitly or explicitly) must be paired with an associated resetting to the parent in such a way that the reset can only happen when the Activity.Current is pointing at the correct instance.

This has two consequences. First, a given activity pushed to Current when it starts must be executing in the same execution context when it ends. If the thing we are trying to capture into an activity can start in one execution context and end in another, then I don’t think there is a way of implementing that on .NET.

Second, activities can’t be interleaved. So I can do this:

  • Start Activity 1
    • Start Activity 2
    • End Activity 2
  • End Activity 1

But not this:

  • Start Activity 1
    • Start Activity 2
  • End Activity 1
    • End Activity 2

Because when I end Activity 1, it is not the Current.

These are inherent issues with having an ambient Activity.Current and as far as I know cannot be worked around in any meaningful way.

The design of Activity is that you set Current, make some calls, and then reset it before the method that set it goes out of scope–i.e. as the call stack unwinds. This means that the current activity should never need to be stored anywhere by Activity.Current, and doing so indicates an issue with the design.

@ajcvickers
Copy link
Contributor

So, I spent a few hours doing more research on this, talking to Shay, and doing some experiments. Here’s the outcome:

  1. We should not be touching Activity.Current ourselves, no matter what we do. It is handled internally by StartActivity and then the disposal of the Activity object.
  2. But, Activity.Current can only be used when an activity is started and ended in the same execution context, and where activities will be disposed in the reverse order that they were created--see my test code below.
  3. Since we cannot guarantee these things, as indeed Npgsql cannot, then we can’t use Activity.Current. So instead we need to start activities with Source.StartActivity(spanName, ActivityKind.Client) "Client" means it will not touch Activity.Current.
    a. We need to store the Activity and dispose it when the activity ends.
  4. This means that Activity.Current will never be the parent of any of the Activity instances that we create. This is what Npgsql does, and there have not been any complaints.

Test code showing Activity.Current corruption:

var source1 = new ActivitySource("Sample.DistributedTracing1", "1.0.0");
var source2 = new ActivitySource("Sample.DistributedTracing2", "1.0.0");

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("MySample"))
    .AddSource("Sample.DistributedTracing1")
    .AddSource("Sample.DistributedTracing2")
    .AddConsoleExporter()
    .Build();

Console.WriteLine($"Initial> Activity.CurrentId: {Activity.Current?.Id} with parent {Activity.Current?.Parent?.Id}");

var activity1 = source1.StartActivity("My First Activity");
Console.WriteLine($"After starting activity 1> Activity.CurrentId: {Activity.Current?.Id} with parent {Activity.Current?.Parent?.Id}");

var activity2 = source2.StartActivity("My Second Activity");
Console.WriteLine($"After starting activity 2> Activity.CurrentId: {Activity.Current?.Id} with parent {Activity.Current?.Parent?.Id}");

activity1?.Dispose();
Console.WriteLine($"After disposing activity 1> Activity.CurrentId: {Activity.Current?.Id} with parent {Activity.Current?.Parent?.Id}");

activity2?.Dispose();
Console.WriteLine($"After disposing activity 2> Activity.CurrentId: {Activity.Current?.Id} with parent {Activity.Current?.Parent?.Id}");

Output. Notice that:

  • After disposing one activity, Current is null when the other is still active.
  • At the end, even though both activities have been disposed, Activity.Current is still pointing at one of the disposed activities.
Initial> Activity.CurrentId:  with parent 
After starting activity 1> Activity.CurrentId: 00-816662818bab9ca191d40bd5c7464071-891dd16e565f1d39-01 with parent 
After starting activity 2> Activity.CurrentId: 00-816662818bab9ca191d40bd5c7464071-35373ab46a85ece2-01 with parent 00-816662818bab9ca191d40bd5c7464071-891dd16e565f1d39-01
After disposing activity 1> Activity.CurrentId:  with parent 
After disposing activity 2> Activity.CurrentId: 00-816662818bab9ca191d40bd5c7464071-891dd16e565f1d39-01 with parent 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Adds new user-facing functionality.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants