@bdunahu commented Aug 6, 2025

Currently, it is impossible for Scalene to report over 100% for a program. Native code that executes in parallel is still attributed to a single line of Python, and even if that were not the case (the same applies to the multiprocessing library), samples can only be accounted for if they appear in the new_frames variable at each sampling point. Each of those frames is currently treated as a separate thread and assigned a normalized share of the elapsed time based on how many frames there are.
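
Roughly speaking, that attribution looks like the following sketch (illustrative only, not Scalene's actual code; `attribute_sample` and its arguments are hypothetical names):

```python
# Illustrative sketch of per-sample attribution (not Scalene's actual code).
# Every sampled frame is treated as a separate thread and receives an equal,
# normalized share of the elapsed interval, so the shares can never sum to
# more than the interval itself -- hence the 100% ceiling.
def attribute_sample(new_frames, elapsed):
    if not new_frames:
        return {}
    share = elapsed / len(new_frames)
    return {frame: share for frame in new_frames}
```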

This makes the reporting problem a separate issue: it would be wrong to add the idle frames to new_frames, because that would imply they block the Python interpreter from doing anything else while they are waiting.

When an asynchronous task is suspended, we instead assume it is waiting for the entire sampling interval. The solution this PR implements for the reporting problem is to treat idle tasks as if they ran sequentially after the non-waiting code, one after the other (i.e., nothing truly happens in parallel). This means the total elapsed CPU time is adjusted to match at every sample.
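
One way to read that accounting, as a hypothetical sketch (`adjust_elapsed` and its arguments are illustrative names, not the PR's actual code):

```python
# Hypothetical sketch of the sequential-idle accounting described above.
# Each suspended task is assumed to wait for the full sampling interval, and
# those waits are stacked one after another behind the non-waiting code, so
# the total time a sample accounts for grows with the number of idle tasks.
def adjust_elapsed(elapsed, idle_task_frames):
    return elapsed + elapsed * len(idle_task_frames)
```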

Because asynchronous tasks run regardless of what the GIL is doing, the results are usually biased towards asynchronous code; even so, this leads to the behavior most users would likely expect.

Current state:

  • idle-task frame collection logic is implemented in scalene_asyncio (see the sketch after this list)
  • it is possible to prevent time from being assigned to the asyncio event loop by filtering out frames that belong to a thread which is running an event loop but has no current task. However, I do not do this, because throwing out frames complicates the reporting approach described above.
  • the passing of Scalene.should_trace is hacky. This function is also passed to scalene_utility.add_stack; could the implementation be moved to the utility file?
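
For reference, idle-task frame collection along these lines might look like the following sketch (the real logic lives in scalene_asyncio; `collect_idle_task_frames` and its structure here are assumptions, not the PR's code):

```python
# Hypothetical sketch: collect the suspension frame of every task on a loop
# that is not currently running and not yet finished.
import asyncio

def collect_idle_task_frames(loop):
    frames = []
    current = asyncio.current_task(loop)      # task running on this loop, if any
    for task in asyncio.all_tasks(loop):
        if task is current or task.done():
            continue                          # skip the active task and finished ones
        stack = task.get_stack(limit=1)       # frame where the task is suspended
        if stack:
            frames.append(stack[0])
    return frames
```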

@jaltmayerpizzorno (Collaborator) left a comment

Preliminary review while I wait for an answer to a question I posed on Slack.

@emeryberger (Member) commented

With the recent refactoring of scalene_profiler.py, this is going to take some work to bring up to date.

@bdunahu (Author) commented Dec 22, 2025

> With the recent refactoring of scalene_profiler.py, this is going to take some work to bring up to date.

Is there room for improving how Scalene reports asyncio code? Attributing CPU time to event loop internals is not very useful to the user, but that is where the CPU actually spends time when the event loop sits in the select call with nothing to do (#805 shows how even correct results are still unintuitive).
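
As an illustration of the kind of program involved (this example is mine, not taken from #805):

```python
# A program that is almost entirely idle: while main() awaits, the event loop
# blocks inside its selector's select() call, so CPU-time attribution points
# at event loop internals rather than at this line of user code.
import asyncio

async def main():
    await asyncio.sleep(5)  # 5 seconds of "asynchronous time" on this line

asyncio.run(main())
```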

There didn't seem to be an easy way to work 'asynchronous time' into the profile results the way other profilers do, unless we are considering adding a new column/flag? If not, this pull request can probably be closed.
