Conversation
basil
left a comment
There was a problem hiding this comment.
This looks great so far! I like the use of a ShutdownHook, and the backend APIs are pretty much exactly what I expected.
Have you given any thought as to how this functionality might be activated or deactivated? The Java API states:
Shutdown hooks should also finish their work quickly. When a program invokes exit the expectation is that the virtual machine will promptly shut down and exit. When the virtual machine is terminated due to user logoff or system shutdown the underlying operating system may only allow a fixed amount of time in which to shut down and exit. It is therefore inadvisable to attempt any user interaction or to perform a long-running computation in a shutdown hook.
For this reason as well as to preserve backwards compatibility of the existing command-line API, I think this new functionality should probably be opt-in rather than opt-out by default. What do you think?
|
|
||
|
|
||
| public static class ShutdownHook extends Thread { | ||
| private SwarmClient client; |
| try { | ||
| client.takeSlaveOfflineAndWait(target, "JVM is shutting down"); | ||
| } catch (Exception e) { | ||
| logger.log(Level.SEVERE, "Unable to perform graceful slave shutdown: ", e); |
There was a problem hiding this comment.
What would happen if you started a shutdown operation (e.g., by sending a SIGINT with Ctrl-C or a SIGTERM with kill(1)) and then sent another signal (e.g., another SIGINT by pressing Ctrl-C again) while we were in the middle of takeSlaveOfflineAndWait()? I think the expected behavior is that we would stop immediately without continuing to wait in takeSlaveOfflineAndWait(). I think that your current code probably already handles this in that an InterruptedException would be thrown (and you are catching Exception here). It might be worth validating this with a manual test. Even if the code works as expected, might it be worth printing an additional message in this case (e.g., "terminating with extreme prejudice")?
|
|
||
| node.toComputer().setTemporarilyOffline(true, new OfflineCause.UserCause(User.current(), reason)); | ||
|
|
||
| normalResponse(req, rsp, "<response/>"); //TODO: not sure what to send here |
There was a problem hiding this comment.
I'm not sure either offhand. Assuming that the default response status is javax.servlet.http.HttpServletResponse.SC_OK, can we rely on the response status alone and simply avoid calling rsp.getCompressedWriter() in the first place for such responses? I can look into this further if you're having trouble.
| } | ||
|
|
||
| public void doGetSlaveReadyForShutdown(StaplerRequest req, StaplerResponse rsp, @QueryParameter String name) throws IOException { | ||
| Jenkins jenkins = Jenkins.getInstance(); |
There was a problem hiding this comment.
Regarding checking for secrets/permissions: I can't see any disadvantages to checking, and I can't see any advantages to not checking. In contrast, if we don't check, we increase the attack surface for potential security issues. Based on the above reasoning, I think I'd prefer to check secrets/permissions.
I have mixed feeling about the design. Then there is the opt-in. I see two main approaches:
Number 2 is not really opt-in since it will require instrumentation changes. It's also not trivial to do using only java's shutdown hooks (without explicitly registering signal hooks), but we might as well require Which leaves these options:
I'm in favor of choice 3 because I am not confident about the suitability of shutdown hooks for this use case. HTTP listener is probably too heavy and opens some security vulnerabilities, but a command file (a la nagios.cmd) would work, eg |
|
Another half-baked idea:
This might not work with unique client names. Alternatively:
Again unique client names makes finding the right one to work with difficult if multiple instances are present. |
Work in progress. Seems to be working so far.
Question: should
doGetSlaveReadyForShutdownalso check secrets/permissions?