Replies: 2 comments
This could be a great addition to the developer documentation.
This kind of stuff is good motivation for making improvements to the symbol names, and perhaps the design, of the generated code.
I think I finally figured out a way to get good profiling information about numba-compiled graphs. Usually I use `perf` to profile most things, but this doesn't work well with numba most of the time, because the JIT compilation doesn't save the necessary information in a way that `perf` can access. It turns out, however, that if we use ahead-of-time compilation in numba and ask it to export debugging symbols, we get everything we need.
Before we import numba, we set the environment variable `NUMBA_DEBUGINFO=1`. Let's say, for instance, that we want to profile a function like this:
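The original snippet is not shown here, so as a stand-in, something along these lines (the function body and names are made up for illustration; the real workload would be the compiled graph function):

```python
import os

# Debug symbols must be requested before numba is imported,
# so the environment variable comes first.
os.environ["NUMBA_DEBUGINFO"] = "1"

import numpy as np

# Hypothetical stand-in workload: the elemwise `square` and `mul`
# operations mentioned below, applied to an array.
def example_graph(x):
    return np.square(x) * x
```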
Now we can extract the numba function and compile it into a shared library with debugging symbols:
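The compilation snippet is missing from the thread, so here is a sketch of what ahead-of-time compilation with `numba.pycc` looks like; the module name, export name, and type signature are assumptions, and the real code would export the extracted graph function:

```python
def compile_with_debug_symbols(py_func, module_name="profiled_graph"):
    """Compile `py_func` ahead of time into a shared library.

    With NUMBA_DEBUGINFO=1 set before numba was first imported, the
    resulting shared library carries the debug symbols perf needs.
    """
    from numba.pycc import CC  # imported lazily so the sketch stays importable

    cc = CC(module_name)
    # Signature is an assumption: float64 array in, float64 array out.
    cc.export("numba_funcified_graph", "f8[:](f8[:])")(py_func)
    cc.compile()  # writes e.g. profiled_graph.cpython-*.so
    return module_name
```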
Now we just have to import the module (using `importlib`, because we defined the name as a string):
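Again a sketch, since the original snippet is missing; the loop keeps the process busy so that `perf` has something to attach to, and the module and function names follow the assumptions above:

```python
import importlib
import os

def run_forever(module_name, func_name, arg, iterations=None):
    """Import a module by its string name and call `func_name` repeatedly.

    With iterations=None this loops until interrupted, which is what
    we want while perf is attached.
    """
    mod = importlib.import_module(module_name)
    func = getattr(mod, func_name)
    print("pid:", os.getpid())  # the pid to pass to `perf record -p`
    result = None
    count = 0
    while iterations is None or count < iterations:
        result = func(arg)
        count += 1
    return result
```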
While this is running, we can run any `perf` command, for instance `perf record -p {the pid}`, CTRL-C after a while, and then `perf report` to see the results. (Pressing `a` on a line shows the assembly with inlined Python code, and an indication of how much time we spend on each instruction.) Using `perf stat -d` we can also get information about cache misses, branch mispredictions, etc.

In this particular case we can see, for instance, that we spend most of our time in `numba_funcified_graph`, because the elemwise operations `square` and `mul` did not get inlined, for some reason I don't understand yet.