Commit 2907ee5
committed
Fix race condition causing crash when setting content filter during spin (#1372)
Investigated and resolved an intermittent crash (`SIGABRT` / "pure virtual method called") occurring in the underlying FastDDS layer.
**Crash log**
```bash
✔ setContentFilter (2003ms)
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
Error: Process completed with exit code 134.
```
**Callstack**
```bash
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007cfa4744527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007cfa474288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007cfa478a5ff5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#6 0x00007cfa478bb0da in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#7 0x00007cfa478a5a55 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:58
#8 0x00007cfa478bbfc3 in __cxxabiv1::__cxa_pure_virtual () at ../../../../src/libstdc++-v3/libsupc++/pure.cc:50
#9 0x00007cf9f689e916 in eprosima::fastdds::dds::detail::DataReaderHistory::get_first_untaken_info (this=0x2d825968, info=...) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp:364
#10 0x00007cf9f688d9fb in eprosima::fastdds::dds::DataReaderImpl::get_first_untaken_info (this=<optimized out>, info=info@entry=0x7cf9e57f8d80) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/DataReaderImpl.cpp:793
#11 0x00007cf9f688a2dd in eprosima::fastdds::dds::DataReader::get_first_untaken_info (this=<optimized out>, info=info@entry=0x7cf9e57f8d80) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/eProsima/Fast-DDS/src/cpp/fastdds/subscriber/DataReader.cpp:278
#12 0x00007cfa44644c46 in rmw_fastrtps_shared_cpp::__rmw_wait (identifier=<optimized out>, subscriptions=<optimized out>, guard_conditions=<optimized out>, services=0x7cf9dc000bc0, clients=0x7cf9dc000ba8, events=0x7cf9dc000bd8, wait_set=0x7cf9dc000f60, wait_timeout=0x7cf9e57f8ea0)
at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rmw_fastrtps/rmw_fastrtps_shared_cpp/src/rmw_wait.cpp:235
#13 0x00007cfa446ac27f in rmw_wait (subscriptions=<optimized out>, guard_conditions=<optimized out>, services=<optimized out>, clients=<optimized out>, events=<optimized out>, wait_set=<optimized out>, wait_timeout=0x7cf9e57f8ea0)
at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rmw_fastrtps/rmw_fastrtps_cpp/src/rmw_wait.cpp:33
#14 0x00007cfa44adf742 in rcl_wait (wait_set=0x7cf9e57f9890, timeout=<optimized out>) at /home/jenkins-agent/workspace/ci_packaging_linux/ws/src/ros2/rcl/rcl/src/rcl/wait.c:670
#15 0x00007cfa44b54625 in rclnodejs::Executor::WaitForReadyCallbacks(rcl_wait_set_s*, int) () from /home/minggang/proj/rclnodejs/build/Release/rclnodejs.node
#16 0x00007cfa44b561aa in rclnodejs::Executor::Run(void*) () from /home/minggang/proj/rclnodejs/build/Release/rclnodejs.node
#17 0x00007cfa4749caa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#18 0x00007cfa47529c6c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
```
**Root Cause Analysis:**
The crash was caused by a race condition between the main JavaScript thread and the background worker thread:
1. The node creates a background thread when `node.spin()` is called to run the `rcl` executor and wait for messages.
2. Calling `subscription.setContentFilter()` from the main thread invokes `rcl_subscription_set_content_filter`, which modifies the underlying DDS DataReader.
3. If the background thread is simultaneously accessing that same DataReader (e.g., in `rcl_wait` or checking for new data), it leads to memory corruption or accessing an object in an invalid state, resulting in the "pure virtual method called" termination.
**Fix Implementation:**
Updated tests, examples, and documentation to enforce a thread-safe pattern. The node must now explicitly stop the background executor before modifying the content filter, and restart it afterwards:
```javascript
node.stop(); // Stop background thread
subscription.setContentFilter(filter); // Safe modification
node.spin(); // Restart background thread
```
Fix: #13681 parent fa58207 commit 2907ee5
File tree
2 files changed
+13
-3
lines changed- src
- test
2 files changed
+13
-3
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
283 | 283 | | |
284 | 284 | | |
285 | 285 | | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
286 | 292 | | |
287 | 293 | | |
288 | 294 | | |
289 | 295 | | |
290 | | - | |
291 | | - | |
292 | 296 | | |
293 | 297 | | |
294 | 298 | | |
295 | 299 | | |
296 | 300 | | |
297 | | - | |
| 301 | + | |
298 | 302 | | |
299 | 303 | | |
300 | 304 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
295 | 295 | | |
296 | 296 | | |
297 | 297 | | |
| 298 | + | |
| 299 | + | |
298 | 300 | | |
| 301 | + | |
299 | 302 | | |
300 | 303 | | |
301 | 304 | | |
| |||
364 | 367 | | |
365 | 368 | | |
366 | 369 | | |
| 370 | + | |
| 371 | + | |
367 | 372 | | |
| 373 | + | |
368 | 374 | | |
369 | 375 | | |
370 | 376 | | |
| |||
0 commit comments