base: master
asyncio extension #930
Conversation
Revert before merge. Allows tests to run on guix/nix.
Includes some shortcuts in the logic. There seems to be an issue with reporting now that it is possible to collect more than one frame per thread.
jaltmayerpizzorno left a comment
Preliminary review while I wait for an answer to a question I posed on Slack.
With the recent refactoring of scalene_profiler.py, this is going to take some work to bring up to date.
Is there room for improving how Scalene reports asyncio code? Attributing CPU time to the event loop internals is not very useful to the user, but that is where the CPU actually spends its time when the event loop sits in the select call with nothing to do (#805 shows how the correct results are still unintuitive). There didn't seem to be an easy way to work "asynchronous time" into the profile results the way other profilers do, unless we are considering adding a new column/flag? If not, this pull request can probably be closed.
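(Not from the PR, just to illustrate the attribution problem: in a program like the one below, almost all of the wall time passes while `main()` awaits, so samples taken during that window point at the event loop's selector internals rather than at the `await` line the user actually wrote.)

```python
# Minimal illustration of the attribution problem discussed above.
# While main() awaits, the process sits inside the event loop's
# selector (selectors.select / epoll), so samples taken during that
# window land in event-loop internals, not in the user's code.
import asyncio


async def fetch_slowly() -> str:
    # Simulates I/O: the event loop parks in its select() call here.
    await asyncio.sleep(2.0)
    return "done"


async def main() -> None:
    result = await fetch_slowly()  # user code "spends" 2s on this line...
    print(result)                  # ...but samples point at the selector


if __name__ == "__main__":
    asyncio.run(main())
```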
Currently, it is impossible for Scalene to report over 100% for a program: native code that executes in parallel is still associated with a single line of Python, and even if that were not the case (this applies to the multiprocessing library too), samples can only be accounted for if their frames are present in the new_frames variable at each sample; each of those frames is currently treated as a separate thread and assigned a normalized time based on how many there are.
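(Roughly, that accounting works like the sketch below; the names `new_frames`, `elapsed`, and `cpu_time_by_line` are illustrative, not Scalene's actual code.)

```python
# Rough sketch of the accounting described above; names are
# illustrative, not Scalene's actual implementation.
from types import FrameType
from typing import Dict, List, Tuple

LineKey = Tuple[str, int]  # (filename, line number)


def attribute_sample(
    new_frames: List[FrameType],
    elapsed: float,
    cpu_time_by_line: Dict[LineKey, float],
) -> None:
    """Split one sample's elapsed time evenly across the frames captured
    for it, treating each frame as if it belonged to a separate thread."""
    if not new_frames:
        return  # samples with no captured frames are not accounted for
    share = elapsed / len(new_frames)  # normalized time per frame
    for frame in new_frames:
        key = (frame.f_code.co_filename, frame.f_lineno)
        cpu_time_by_line[key] = cpu_time_by_line.get(key, 0.0) + share
```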
This makes the reporting problem a new issue: it would be wrong to add the idle frames to new_frames, because that would imply they block the Python interpreter from doing anything else while they're waiting.
When an asynchronous task is suspended, we instead assume it has been waiting for the entire sampling interval. The solution this PR implements for the reporting problem is to treat idle tasks as if they run sequentially after the non-waiting code, one after the other (i.e., nothing truly happens in parallel). This means the total CPU time that has passed is adjusted at every sample to match.
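(A hedged sketch of that scheme, with all names illustrative: each suspended task is charged a full sampling interval, booked after the running code's time, and the total elapsed time for the sample grows accordingly.)

```python
# Sketch of the "idle tasks run sequentially" accounting described
# above; names are illustrative, not the PR's actual code.
from typing import Dict, List, Tuple

LineKey = Tuple[str, int]  # (filename, line number)


def account_interval(
    running_line: LineKey,
    idle_task_lines: List[LineKey],
    interval: float,
    time_by_line: Dict[LineKey, float],
) -> float:
    """Attribute one sampling interval and return the adjusted total
    elapsed time for this sample: one interval for the running code plus
    one full interval per idle task, since nothing is treated as
    overlapping."""
    time_by_line[running_line] = time_by_line.get(running_line, 0.0) + interval
    for line in idle_task_lines:
        # Each suspended task is assumed to wait the entire interval.
        time_by_line[line] = time_by_line.get(line, 0.0) + interval
    return interval * (1 + len(idle_task_lines))
```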
Because asynchronous tasks run regardless of what the GIL is doing, the results are usually biased towards asynchronous code, but this still leads to the behavior most users would likely expect.
Current state: