Fix issues with supervisor startup timeout #11626
Open
+25
−24
This change addresses two issues with the kernel supervisor's startup timeout measurement.
First, the timeout countdown starts when we ask for the terminal to be created, not when the terminal is actually established. When the supervisor extension starts early in the boot process, the terminal infrastructure/ptyhost might not be fully online, so the createTerminal call can spend a long time waiting for it to be ready, exhausting most or all of the timeout.
Second, we retry every 100ms to find the connection file, but don't try more than 100 times no matter what that timeout is set to, with the result that you can only ever wait for 10s once we get into the main retry loop.
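The two problems described above can be illustrated with a minimal sketch. This is not the code from this PR: `retryBudget`, `waitForConnectionFile`, and the two callbacks are hypothetical names chosen for illustration, and the real extension's structure may differ. The sketch assumes the fix is to start the clock only after the terminal exists and to derive the number of retries from the configured timeout rather than a fixed 100.

```typescript
// Hypothetical helper: derive the number of polling attempts from the
// configured timeout instead of hard-coding 100 attempts.
function retryBudget(timeoutMs: number, intervalMs: number = 100): number {
    return Math.max(1, Math.ceil(timeoutMs / intervalMs));
}

// Hypothetical polling loop. `createTerminal` and `connectionFileExists`
// are stand-ins supplied by the caller, not the extension's real functions.
async function waitForConnectionFile(
    createTerminal: () => Promise<void>,
    connectionFileExists: () => boolean,
    timeoutMs: number,
    intervalMs: number = 100,
): Promise<boolean> {
    // Wait for the terminal first, so time spent waiting on the terminal
    // infrastructure is not charged against the startup timeout.
    await createTerminal();

    // Only now does the retry budget start being consumed.
    const attempts = retryBudget(timeoutMs, intervalMs);
    for (let i = 0; i < attempts; i++) {
        if (connectionFileExists()) {
            return true;
        }
        await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
    // One last check after the final sleep.
    return connectionFileExists();
}
```

With a fixed cap of 100 attempts at 100ms, a 30s timeout would still give up after about 10s; with the budget derived from the timeout, the same setting allows 300 attempts.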
There are four fixes in this PR:
Note that all of these changes are speculative as I have not been able to reproduce the error myself; they are based on analysis of logs provided by @jennybc.
Addresses #11010
Release Notes
New Features
Bug Fixes
QA Notes
Jenny noted that this happens specifically when updating to a new version of the daily build.