macos: fix missing pthread mutex init after calloc#21
macos: fix missing pthread mutex init after calloc#21sitano wants to merge 1 commit intoMariaDB:mariadb-4.xfrom
Conversation
calls constructor for a mutex in a struct value init-ed with gu_calloc. in path `gcs_core_create() -> gcs_group_init()`, the first one allocates `gcs_core_t* core` with gu_calloc() whereas `gcs_code_t` has `gcs_group_t group` with `gu::Mutex memb_mtx_`. After memory allocation gu::Mutex constructor was not called that lead to an error on Darwin in a call to pthread mutex lock. Signed-off-by: Ivan Prisyazhnyy <john.koepi@gmail.com>
67981b9 to
ec7d79d
Compare
|
created MDEV-34717 to refer to this. |
| int const appl_proto_ver) | ||
| { | ||
| // here we also create default node instance. | ||
| new (&group->memb_mtx_) gu::Mutex(NULL); |
There was a problem hiding this comment.
This reminds me of the recent MDEV-34625 and MariaDB/server#3408. I understood that macOS would use a clang based compiler by default.
I think that use of placement new after a call to a calloc like operation may lead to surprises when using GCC 6 or later, if the constructor is expecting that some fields were already initialized by a previous write. GCC 6 and later could optimize away such pre-constructor writes, thanks to -flifetime-dse.
That macOS (as well as AIX) are not happy with zero initialization for pthread_mutex_t bit me in MariaDB/server#3433.
There was a problem hiding this comment.
This reminds me of the recent MDEV-34625 and MariaDB/server#3408. I understood that macOS would use a clang based compiler by default.
I think that use of placement new after a call to a calloc like operation may lead to surprises when using GCC 6 or later, if the constructor is expecting that some fields were already initialized by a previous write. GCC 6 and later could optimize away such pre-constructor writes, thanks to -flifetime-dse.
MacOS uses Clang yes. But MDEV-34625 IMHO is different. Moreover, the Godbolt example does not look correct - it is fine that the memset in https://gcc.godbolt.org/z/5n87z1raG is optimized out because it is expected (I suppose) that after a class constructor is called all class variables are initialized. Thus, the compiler may conclude that if we have void *buf = malloc(size S); s = new (buf) buf; is equivalent to S *s = new S;.
But it is different if you have malloc() buf size bigger than the memory that the constructor() is expected to touch:
https://gcc.godbolt.org/z/Y43YW7vKj
struct S {
int i; // uninitialized in consturctor
S() {};
};
struct A {
char a[256];
S s;
};
int bar() {
A *buf = (A *)malloc(sizeof(A));
memset(buf, 0, sizeof(A));
S* s = new(&buf->s) S;
return s->i;
}
became call calloc in assembly or rep stosq. So memset is Not eliminated in this case and will not be eliminated in the case of gcs_core_t* core = GU_CALLOC (1, gcs_core_t); as far as sizeof(gcs_core_t) >> gcs_group_t.
it does not eliminate calloc() even if -flifetime-dse=0 (try saving *buf to external volatile var) (default is 2) (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html).
if you still think it is not safe, I can offer that we could replace the mutex there instead of gu::Mutex memb_mtx_ we could write gu_mutex_t mutable value_ and init without calling class constructors. WDYT?
There was a problem hiding this comment.
if the constructor is expecting that some fields were already initialized by a previous write.
I think its forbidden by the spec (speculating) - constructor must init all class fields
There was a problem hiding this comment.
Yes, it is UB, with no doubt. If all fields of gu::Mutex are initialized by the constructor, then there should be no issue with the GCC -flifetime-dse on any platform.
There was a problem hiding this comment.
from what we can see in gu_mutex.hpp they all are init-ed
namespace gu
{
class Mutex
{
public:
Mutex (const wsrep_mutex_key_t* key) : value_()
#ifdef GU_MUTEX_DEBUG
, owned_()
, locked_()
#endif /* GU_MUTEX_DEBUG */
{
if (gu_mutex_init (key, &value_))
gu_throw_fatal;
}
protected:
gu_mutex_t mutable value_;
#ifdef GU_MUTEX_DEBUG
gu_thread_t mutable owned_;
bool mutable locked_;
#endif /* GU_MUTEX_DEBUG */
};
}
|
This issue is currently worked by Alexey and his comment was "I'd rather make a constructor for group struct." |
|
@janlindstrom shall we close this PR then? Speaking of "I'd rather make a constructor for group struct." the less invasive thing could be just to replace the |
|
@sitano Lets wait for Alexey's decision. Galera and MariaDB currently do not support officially MacOS so this is not very high-priority issue currently. |
|
Sorry, I am a bit short on time. Hope I will do the trials tomorrow. |
|
tested 78f68e9 on MBP M3Max ( works well! I have managed to execute Galera node and join another node. (@janlindstrom, @ayurchen ) |
|
how things are going with #21 (comment) ? |
|
I see those (#21 (comment)) changes are not merged just yet https://github.com/MariaDB/galera/blob/mariadb-4.x/gcs/src/gcs_group.cpp#L60? |
|
close this one as soon as #23 is closed per changes https://github.com/MariaDB/galera/blob/mariadb-4.x/gcs/src/gcs_core.cpp#L128 are already there. See #23 (comment). |
530dcad to
73e530e
Compare
|
Closing as done. |
calls constructor for a mutex in a struct value init-ed with gu_calloc.
in path
gcs_core_create() -> gcs_group_init(), the first one allocatesgcs_core_t* corewith gu_calloc() whereasgcs_code_thasgcs_group_t groupwithgu::Mutex memb_mtx_. After memory allocation gu::Mutex constructor was not called that lead to an error on Darwin in a call to pthread mutex lock.cherry-pick from #20.
Stacktraces from Darwin
Originally it failed on boot in:
where as init previously called from: