Performance additions surrounding log parsing #36
Conversation
Add _cpuinfo_field_value() for model name and cpu MHz lines. It preserves the previous re.sub(r'^.*: *', '', line.rstrip(), count=1) semantics (last colon, since the leading .* is greedy; ASCII spaces only stripped after it; line returned unchanged when it has no colon). Non-equivalent split/strip shortcuts are documented in comments. This runs at startup and for --debug-info only, so the impact is small, but it avoids the regex engine on that path.
Move the 3 parse_stat_file and 9 parse_gpuowl_log_file re.compile patterns to module-level constants (MLUCAS_STAT_* and GPUOWL_LOG_*) so each progress parse avoids repeated compilation. Matching behavior is unchanged; no per-line regex guards.
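The hoisting pattern looks roughly like this; `MLUCAS_STAT_ITER_RE` and `parse_stat_line` are illustrative stand-ins, not the PR's actual constants or parser:

```python
import re

# Compiled once at import time instead of on every parse call.
# The PR uses MLUCAS_STAT_* and GPUOWL_LOG_* prefixes for these.
MLUCAS_STAT_ITER_RE = re.compile(r"Iter# = (\d+)")

def parse_stat_line(line):
    """Per-line parsing reuses the precompiled module-level pattern."""
    match = MLUCAS_STAT_ITER_RE.search(line)
    return int(match.group(1)) if match else None
```
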
Add _gpuowl_log_need_* helpers and GPUOWL_LOG_MAIN_LINE_GUARD_RE so parse_gpuowl_log_file skips Pattern.search when a line cannot match. Keeps the same reversed scan order and match semantics. Document each helper with a short docstring.
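A guard/pattern pair might look like this; `GPUOWL_LOG_PROGRESS_RE`, `_gpuowl_log_need_progress` and `find_progress` are hypothetical examples of the scheme, not the PR's code:

```python
import re

# Hypothetical pattern: a progress fraction like "1000/2000".
GPUOWL_LOG_PROGRESS_RE = re.compile(r"(\d+)/(\d+)")

def _gpuowl_log_need_progress(line):
    """Cheap pre-filter: a progress line must contain a '/'."""
    return '/' in line

def find_progress(line):
    # Skip the regex engine entirely for lines that cannot match,
    # which is the common case when scanning a whole log.
    if not _gpuowl_log_need_progress(line):
        return None
    return GPUOWL_LOG_PROGRESS_RE.search(line)
```
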
Extract _iter_lines_reversed_chunked and forward _iter_lines_forward_normalized. On OSError from getsize/seek/read, iter_lines_reversed reads the file forward then yields reversed lines; read_last_n_lines uses deque(maxlen=n) for bounded memory. Document UTF-8/CRLF handling, fallback behavior, and concurrent-append caveat in a block comment above _LOG_TAIL_CHUNK_SIZE.
…river Add CONFIG blocks for default log corpus paths (repo sibling logfiles), output paths, timing repeats, tail line count, globs, and subprocess entrypoint. Unify GpuOwl default directory with benchmark_log_tail; replace hardcoded Windows paths in gpuowl benchmark and validate.
Replace yield-from with an explicit for/yield loop so the module still parses when the python on PATH is Python 2 or Python 3 < 3.3 (e.g. Ubuntu's default python), where yield from is a syntax error. Documented in the docstring; autoprimenet remains Python 3 overall, so use python3 to run it.
tdulcet
left a comment
Thanks for this major PR! It looks like it should significantly improve the performance in these areas.
Regarding only having access to a Windows system to test on, you could use WSL (Windows Subsystem for Linux) to, for example, test the /proc/cpuinfo changes. Also feel free to temporarily update the GitHub Actions CI workflow to run any of your scripts. The CI currently tests on various versions of Windows, macOS and Linux.
When you are finished, could you please go ahead and remove the separate scripts and benchmark code? GitHub will preserve your commits in this PR for posterity, so we will always be able to access the scripts in the future if needed.
Lastly, could you reduce the AI generated docstrings and comments. Most functions should only need a single sentence to describe what they do.
…dback. Benchmarking showed the guarded-lines code went from 3.03x to 4.27x faster.
…oprimenet.py, benchmarked 64, 256 and 1024 chunk sizes; 256 was optimal.
- Introduced _reversed_complete_lines_in_block for efficient line splitting in reversed order.
- Updated _iter_lines_reversed_chunked to use memoryview for improved performance.
- Added chunk_sweep option in run_benchmark_log_io for benchmarking with varying chunk sizes.
… commit was 8-13%. Removed benchmark code and arguments.
…u 22.04, kernel 6.6.87.2-microsoft-standard-WSL2, x86_64. Tests pass.
Man. Sorry the comments got out of hand. I saw the docstrings but somehow didn't see the wall of comments in a few of those functions. WSL was a good call too. I think I'm going to set up a laptop as dual boot and continue testing. I'll create another PR if I find anything.
This looks great. Thanks for all your work on this! It will be included in the next update, likely the version 2.0 release.
I just made some minor simplifications, to reduce the number of new functions. I also removed the usage of deque and added support for files with \r macOS style line endings.
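For reference, `str.splitlines` is one built-in way to support bare `\r` endings; whether the actual change uses it or normalizes manually is not shown here:

```python
# str.splitlines recognizes \n, \r\n, and bare \r (classic macOS) as
# line boundaries, unlike split('\n'), which leaves stray \r behind.
mixed = "first\rsecond\r\nthird\nfourth"
print(mixed.splitlines())
```
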
To clarify my previous request, feel free to add comments in the future for any complex code. It was just the verbose AI generated comments that were largely unhelpful.
I tried to keep the commits self contained so you could take or leave any of them. The PR is a bit messy as I did check in the benchmarking scripts. I also left some of the benchmarking code in autoprimenet.py should you want to run it. I didn't create modules for anything, as I wanted to keep everything in autoprimenet as you have. The benchmarking code should be easy to strip out if/when the time comes.
The benchmarks for the items on the first few checkins result in a 2-3x improvement but those improvements are already sub millisecond and not really in the hot path.
Reverse read order: I did see a roughly 900x improvement in parsing the largest logfile with the code in the 4th checkin. Everything returns the same values as the original code. The caveat is that I only have a Windows machine to test on. I also manufactured a few other log files, one 8 MB and one 21 MB; parsing those was about 300x faster. I also have a really quick primary disk on my dev machine, so I'm not sure how/if that will sway the results. If I have missed the boat on the reverse log file reading, LMK. If you want to scrap the whole thing, that's also fine. I'm not trying to create more work for you. :)
benchmark_logs.zip