Skip to content

Commit 2907ee5

Browse files
committed
Fix race condition causing crash when setting content filter during spin (#1372)
Investigated and resolved an intermittent crash (`SIGABRT` / "pure virtual method called") occurring in the underlying FastDDS layer. **Crash log** ```bash ✔ setContentFilter (2003ms) pure virtual method called terminate called without an active exception Aborted (core dumped) Error: Process completed with exit code 134. ``` **Callstack** ```bash #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007cfa4744527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007cfa474288ff in __GI_abort () at ./stdlib/abort.c:79 #5 0x00007cfa478a5ff5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95 #6 0x00007cfa478bb0da in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48 #7 0x00007cfa478a5a55 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:58 #8 0x00007cfa478bbfc3 in __cxxabiv1::__cxa_pure_virtual () at ../../../../src/libstdc++-v3/libsupc++/pure.cc:50 #9 0x00007cf9f689e916 in eprosima::fastdds::dds::detail::DataReaderHistory::get_first_untaken_info (this=0x2d825968, info=...) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp:364 #10 0x00007cf9f688d9fb in eprosima::fastdds::dds::DataReaderImpl::get_first_untaken_info (this=<optimized out>, info=info@entry=0x7cf9e57f8d80) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/DataReaderImpl.cpp:793 #11 0x00007cf9f688a2dd in eprosima::fastdds::dds::DataReader::get_first_untaken_info (this=<optimized out>, info=info@entry=0x7cf9e57f8d80) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/DataReader.cpp:278 #12 0x00007cfa44644c46 in rmw_fastrtps_shared_cpp::__rmw_wait (identifier=<optimized out>, subscriptions=<optimized out>, guard_conditions=<optimized out>, services=0x7cf9dc000bc0, clients=0x7cf9dc000ba8, events=0x7cf9dc000bd8, wait_set=0x7cf9dc000f60, wait_timeout=0x7cf9e57f8ea0) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_wait.cpp:235 #13 0x00007cfa446ac27f in rmw_wait (subscriptions=<optimized out>, guard_conditions=<optimized out>, services=<optimized out>, clients=<optimized out>, events=<optimized out>, wait_set=<optimized out>, wait_timeout=0x7cf9e57f8ea0) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_wait.cpp:33 #14 0x00007cfa44adf742 in rcl_wait (wait_set=0x7cf9e57f9890, timeout=<optimized out>) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rcl/rcl/src/rcl/wait.c:670 #15 0x00007cfa44b54625 in rclnodejs::Executor::WaitForReadyCallbacks(rcl_wait_set_s*, int) () from /home/minggang/proj/rclnodejs/build/Release/rclnodejs.node #16 0x00007cfa44b561aa in rclnodejs::Executor::Run(void*) () from /home/minggang/proj/rclnodejs/build/Release/rclnodejs.node #17 0x00007cfa4749caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447 #18 0x00007cfa47529c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 ``` **Root Cause Analysis:** The crash was caused by a race condition between the main JavaScript thread and the background worker thread: 1. The node creates a background thread when `node.spin()` is called to run the `rcl` executor and wait for messages. 2. Calling `subscription.setContentFilter()` from the main thread invokes `rcl_subscription_set_content_filter`, which modifies the underlying DDS DataReader. 3. If the background thread is simultaneously accessing that same DataReader (e.g., in `rcl_wait` or checking for new data), it leads to memory corruption or accessing an object in an invalid state, resulting in the "pure virtual method called" termination. **Fix Implementation:** Updated tests, examples, and documentation to enforce a thread-safe pattern. The node must now explicitly stop the background executor before modifying the content filter, and restart it afterwards: ```javascript node.stop(); // Stop background thread subscription.setContentFilter(filter); // Safe modification node.spin(); // Restart background thread ``` Fix: #1368
1 parent fa58207 commit 2907ee5

File tree

2 files changed

+13
-3
lines changed

2 files changed

+13
-3
lines changed

src/rcl_subscription_bindings.cpp

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -283,18 +283,22 @@ Napi::Value SetContentFilter(const Napi::CallbackInfo& info) {
283283
free(argv);
284284
}
285285

286+
std::string error_string = "";
287+
if (ret != RCL_RET_OK) {
288+
error_string = rcl_get_error_string().str;
289+
rcl_reset_error();
290+
}
291+
286292
rcl_ret_t fini_ret =
287293
rcl_subscription_content_filter_options_fini(subscription, &options);
288294

289295
if (ret != RCL_RET_OK) {
290-
std::string error_string = rcl_get_error_string().str;
291-
rcl_reset_error();
292296
Napi::Error::New(env, error_string).ThrowAsJavaScriptException();
293297
return env.Undefined();
294298
}
295299

296300
if (fini_ret != RCL_RET_OK) {
297-
std::string error_string = rcl_get_error_string().str;
301+
error_string = rcl_get_error_string().str;
298302
rcl_reset_error();
299303
Napi::Error::New(env, error_string).ThrowAsJavaScriptException();
300304
return env.Undefined();

test/test-subscription-content-filter.js

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -295,7 +295,10 @@ describe('subscription content-filtering', function () {
295295
const contentFilter5 = {
296296
expression: 'data = 5',
297297
};
298+
// Stop spinning before changing content filter to avoid race condition
299+
this.subscriberNode.stop();
298300
subscription.setContentFilter(contentFilter5);
301+
this.subscriberNode.spin();
299302
resolve(msgCnt);
300303
}, SUBSCRIBER_WAIT_TIME)
301304
);
@@ -364,7 +367,10 @@ describe('subscription content-filtering', function () {
364367
const p1 = new Promise((resolve) =>
365368
setTimeout(() => {
366369
const result = !msgCnt0 && msgCnt5 && !fail;
370+
// Stop spinning before changing content filter to avoid race condition
371+
this.subscriberNode.stop();
367372
subscription.setContentFilter();
373+
this.subscriberNode.spin();
368374
resolve(result);
369375
}, SUBSCRIBER_WAIT_TIME)
370376
);

0 commit comments

Comments
 (0)