vcontrold seems to crash if it receives a signal at the same time a request is processed #141

@l3u

Hi all,

I'm not completely sure yet what causes this. I changed two things recently. First, I switched the machine vcontrold runs on from Devuan to Alpine, which is built on musl rather than glibc. However, I'm pretty sure this is not the cause, because:

The other thing I changed: I noticed that the vcontrold log had grown large, as the daemon had been running for years and I had never cared about the log file. So I set up a logrotate script, and I thought it would be a good idea to restart the daemon via a postrotate script.

What happened: as of today, vcontrold has crashed for the second time. I have been running it since 2015 and had never experienced a single crash before. The first time, I thought it could be due to Alpine/musl, restarted the daemon and moved on. But today, I had a closer look. The crash happened at the very time the logrotate script ran, right when the script stopped the daemon. The log contained "Received SIGTERM" as the last entry. After that, the init system (OpenRC) would not restart the daemon, because it had crashed.

And this could be the very point in time at which my other automated scripts polled the daemon for data. This probably works in most cases, because the stop does not hit the very second a request is processed, but the two times it crashed, both may really have happened simultaneously.

I thus concluded that the daemon may crash if it receives a signal while a request is being processed. Could this be the case? I have write access here and revised the build system back then, but I'm not a C pro and I'm not really into deep stuff like daemon programming. So I fear I can't say much about the code … It would be nice if someone with more insight into this could have a look at the sources and check whether we possibly have a bug here.
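To make the suspected failure mode concrete, here is a minimal sketch of the usual defensive pattern in C: the signal handler only sets a flag, and the main loop checks that flag between requests, so a SIGTERM arriving mid-request cannot interrupt cleanup halfway. This is not taken from the vcontrold sources, and I have not checked how vcontrold actually handles signals. The names `terminate_requested`, `install_sigterm_handler`, `serve_until_terminated` and `demo_request` are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the handler, read by the main loop. A volatile sig_atomic_t is
 * the object type that may safely be written from a signal handler. */
static volatile sig_atomic_t terminate_requested = 0;

/* The handler does nothing but record that SIGTERM arrived. No logging,
 * no free(), no exit() here: those are not async-signal-safe. */
static void on_sigterm(int signo)
{
    (void)signo;
    terminate_requested = 1;
}

/* Install the handler with sigaction(). Returns 0 on success, -1 on error.
 * With sa_flags = 0, a blocking read()/accept() that is interrupted by the
 * signal fails with EINTR, which a real daemon would have to handle. */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical main loop: process requests until termination was requested.
 * The flag is only checked between requests, so a request that is already
 * running always completes. Returns the number of completed requests. */
int serve_until_terminated(void (*handle_request)(void *), void *ctx)
{
    int completed = 0;
    assert(handle_request != NULL);
    while (!terminate_requested) {
        handle_request(ctx);
        completed++;
    }
    return completed;
}

/* Stand-in for a real request: SIGTERM arrives in the middle of the work,
 * and the request still runs to its end. ctx points to an int counter. */
void demo_request(void *ctx)
{
    int *finished = ctx;
    raise(SIGTERM);   /* the signal hits while the request is in progress */
    *finished += 1;   /* the remaining work is still carried out */
}
```

Calling `install_sigterm_handler()` and then `serve_until_terminated(demo_request, &counter)` processes exactly one request and returns, after which the daemon could shut down in an orderly way. If a handler instead did the shutdown work itself (closing descriptors, freeing memory, writing the log), it could run in the middle of a request that is using those same resources, which would match the symptom described above. That is a guess about the cause, not a finding.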

Apart from that, I'm now trying to simply copytruncate the log and not restart the daemon via logrotate, to see if this works. After all, there seems to be no real reason to restart it in the first place. But if we actually have a problem with concurrency, I think we should fix it. And I know that things that happen at the same time through different means can cause a lot of trouble ;-)
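For reference, a logrotate configuration along these lines might look as follows. This is only a sketch: the log path is an assumption, and the rotation interval and count are arbitrary example values.

```
# Hypothetical path; adjust to wherever vcontrold writes its log
/var/log/vcontrold.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    # Copy the log and truncate the original in place, so the daemon
    # keeps its open file handle and needs no restart or signal
    copytruncate
}
```

With `copytruncate`, no `postrotate` script is needed. The trade-off is that log lines written in the short window between the copy and the truncation can be lost.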

Thanks a lot!

Cheers, Tobias
