fix: prevent stdlib pyc files from invalidating runtime repos #3661

Open
rickeylev wants to merge 7 commits into bazel-contrib:main from rickeylev:fix.preserve.stdlib.pycache

Conversation

@rickeylev
Collaborator

Under Bazel 9, the runtime repositories are constantly invalidated by pyc creation.
Starting in Bazel 9, glob() calls implicitly register repository_ctx.watch() calls on
the files and directories they match. As a result, the directories where __pycache__
directories are created end up being considered changed (either directly, because their
mtimes change, or indirectly, because their directory listings change), which invalidates
the repo and causes it to re-run.
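
The filesystem side of this can be reproduced outside Bazel. Below is a minimal Python sketch, not code from this PR: it compiles one module in a temporary directory and compares the directory listing before and after. The helper names (`snapshot`, `compile_and_compare`) are made up for illustration, and the sketch assumes the default pyc location (i.e. PYTHONPYCACHEPREFIX is not set).

```python
import os
import py_compile
import tempfile


def snapshot(directory):
    """Return the sorted directory listing, i.e. what a watcher would compare."""
    return sorted(os.listdir(directory))


def compile_and_compare():
    """Compile one module and return the listings before and after."""
    with tempfile.TemporaryDirectory() as lib_dir:
        src = os.path.join(lib_dir, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        before = snapshot(lib_dir)
        # By default this writes lib_dir/__pycache__/mod.<tag>.pyc,
        # creating __pycache__ if it does not exist yet.
        py_compile.compile(src, doraise=True)
        after = snapshot(lib_dir)
    return before, after


before, after = compile_and_compare()
print("before:", before)
print("after: ", after)
```

The new `__pycache__` entry in the second listing is the kind of change that a watched directory would report.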

This glob-induced-watching seems to occur even if an exclude would have excluded the file.

Note that this only seems to occur if reproducible=False, which generally wouldn't occur,
but could occur if a user is registering their own runtime and doesn't care about the sha.
Regardless, this still seems worthwhile because it allows pyc files to be generated more
safely, without causing repo invalidations, while allowing them to persist between
repo-phase invocations.

To fix, create __pycache__ directories ahead of time and symlink them to a location that Bazel
isn't watching, i.e. outside the repository's directory. I tried creating a separate top-level
folder that wasn't matched by any globs and symlinking to it, but Bazel would read through
the symlinks and watch the underlying locations.
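
A rough Python sketch of the idea follows. It is not the Starlark in python_repository.bzl: the function name `create_pycache_symlinks`, the `demo` helper, and the directory layout under `pycache_root` are assumptions for illustration only. It pre-creates each `__pycache__` as a symlink to a directory outside the repo, so that later pyc writes no longer change the listing of the directories inside it. It only shows the filesystem behavior; whether Bazel stops watching is as described in this PR, not something the sketch verifies.

```python
import os
import py_compile
import tempfile


def create_pycache_symlinks(repo_root, pycache_root):
    """Symlink <dir>/__pycache__ to a mirror under pycache_root for every
    directory under repo_root that contains .py files (hypothetical helper)."""
    created = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Never descend into pycache directories themselves.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        if not any(name.endswith(".py") for name in filenames):
            continue
        rel = os.path.relpath(dirpath, repo_root)
        target = os.path.normpath(os.path.join(pycache_root, rel, "__pycache__"))
        os.makedirs(target, exist_ok=True)
        link = os.path.join(dirpath, "__pycache__")
        if not os.path.lexists(link):
            os.symlink(target, link, target_is_directory=True)
            created.append(link)
    return created


def demo():
    """Build a tiny fake repo, link its pycache dirs, then compile a module."""
    with tempfile.TemporaryDirectory() as tmp:
        repo_root = os.path.join(tmp, "repo")
        pycache_root = os.path.join(tmp, "pycache")  # outside repo_root
        lib_dir = os.path.join(repo_root, "lib")
        os.makedirs(lib_dir)
        src = os.path.join(lib_dir, "mod.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        links = create_pycache_symlinks(repo_root, pycache_root)
        before = sorted(os.listdir(lib_dir))
        py_compile.compile(src, doraise=True)  # written through the symlink
        after = sorted(os.listdir(lib_dir))
        external = os.listdir(os.path.join(pycache_root, "lib", "__pycache__"))
        return len(links), before, after, external
```

In this sketch the listing of `lib` is the same before and after compilation, and the pyc file lands in the external directory. On Windows, creating symlinks may require extra privileges, which this sketch does not handle.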

This also has the side benefit of allowing pyc files to be reused between
repository-phase invocations.

Fixes #3643

@rickeylev rickeylev requested a review from aignas as a code owner March 12, 2026 03:40
@gemini-code-assist
Contributor

Summary of Changes

This pull request resolves an issue in Bazel 9 where the implicit watching behavior of glob() functions caused Python runtime repositories to be constantly invalidated by the creation of __pycache__ directories. The solution involves strategically symlinking these __pycache__ directories to locations outside of Bazel's watch scope, which not only prevents unnecessary repository re-runs but also enables the persistence and reuse of compiled Python bytecode (.pyc files) across build phases.

Highlights

  • Prevented Runtime Repository Invalidation: Addressed a Bazel 9 issue where glob() functions implicitly watching __pycache__ directories led to continuous invalidation of runtime repositories.
  • Implemented External __pycache__ Symlinking: Introduced logic to create __pycache__ directories outside the watched repository structure and symlink them back, preventing Bazel from detecting changes.
  • Enabled .pyc File Reuse: The new symlinking strategy allows .pyc files to persist and be reused across different repository-phase invocations, improving build performance.
  • Updated Bazel and rules_go Versions: Bumped the required Bazel version to 9.x and updated the rules_go dependency.
  • Added Reproduction Test: Included new test files (BUILD.bazel, bin.py, repro.sh) to demonstrate and verify the fix for the __pycache__ invalidation problem.

Changelog
  • .bazelversion
    • Updated Bazel version from 8.x to 9.x.
  • MODULE.bazel
    • Updated rules_go dependency version from 0.41.0 to 0.60.0.
  • examples/bzlmod/.bazelversion
    • Updated Bazel version from 7.x to 9.x.
  • examples/bzlmod/tests/version.py
    • Added importlib to the script.
  • python/private/hermetic_runtime_repo_setup.bzl
    • Refactored glob include and exclude patterns into separate variables for clarity and maintainability.
  • python/private/python_repository.bzl
    • Introduced _get_pycache_root function to determine and create the appropriate root directory for __pycache__ files, considering environment variables and OS specifics.
    • Added _create_pycache_symlinks function to traverse directories, identify Python files, and create symlinks for __pycache__ directories to an external, unwatched location or to /dev/null if no root is found.
    • Called _create_pycache_symlinks within _python_repository_impl to apply the pycache symlinking logic.
    • Explicitly set reproducible = False for repository metadata, likely for testing the specific scenario where the issue manifests.
  • python/private/repo_utils.bzl
    • Added _mkdir utility function to create directories, handling both in-repo and external paths.
    • Added _repo_root_relative_path utility function to get a path relative to the repository root.
    • Exposed new utility functions (_mkdir, _repo_root_relative_path) in the repo_utils struct.
  • tests/repro/BUILD.bazel
    • Added a new BUILD file to define a py_binary target for reproduction testing.
  • tests/repro/bin.py
    • Added a simple Python script to print "hello" and Python version for reproduction testing.
  • tests/repro/repro.sh
    • Added a shell script to demonstrate and test the pyc invalidation issue and its fix across different Bazel versions.
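
To illustrate the kind of lookup the changelog describes for `_get_pycache_root`, here is a hedged Python sketch. Only the `RULES_PYTHON_PYCACHE_DIR` variable appears in this PR's visible diff context; the fallback variables (`LOCALAPPDATA`, `XDG_CACHE_HOME`, `HOME`), the `rules_python_pycache` subdirectory name, and the function signature are assumptions for illustration and may not match the actual implementation.

```python
import os


def get_pycache_root(env, is_windows):
    """Pick a directory for pyc files, or return None if none can be found.

    `env` is a plain dict of environment variables so the function is easy
    to test; pass `dict(os.environ)` for real use.
    """
    # 1. Explicit override (this variable name appears in the PR).
    explicit = env.get("RULES_PYTHON_PYCACHE_DIR")
    if explicit:
        return explicit

    # 2. ASSUMPTION: OS-specific cache locations, chosen for illustration.
    if is_windows:
        base = env.get("LOCALAPPDATA")
    else:
        base = env.get("XDG_CACHE_HOME")
        if not base and env.get("HOME"):
            base = os.path.join(env["HOME"], ".cache")

    if not base:
        # Per the summary above, the caller would then link to /dev/null.
        return None
    # ASSUMPTION: hypothetical subdirectory name.
    return os.path.join(base, "rules_python_pycache")
```

Taking the environment as a parameter keeps the lookup order easy to check in isolation, for example `get_pycache_root({"RULES_PYTHON_PYCACHE_DIR": "/tmp/pyc"}, False)`.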

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a fix to prevent repository invalidation from .pyc file creation under Bazel 9 by symlinking __pycache__ directories to a location outside the repository. The approach is sound and well-implemented. My review includes one critical comment about a hardcoded debugging flag that must be removed, and a few medium-severity suggestions for code cleanup related to unused variables and imports.

Comment on lines +109 to +110
root = rctx.path(".")
root_str = str(root)
Contributor

medium

The variable root_str is defined on line 110 but is not used within the function. It should be removed to improve code clarity.

Suggested change
-root = rctx.path(".")
-root_str = str(root)
+root = rctx.path(".")

Collaborator

@aignas aignas left a comment

Pretty neat!

os_name = repo_utils.get_platforms_os_name(rctx)
is_windows = os_name == "windows"

# 1. RULES_PYTHON_PYCACHE_DIR
Collaborator

Please document the env vars that we use here somewhere, and don't forget the changelog notes. This might be worth mentioning there.

Development

Successfully merging this pull request may close these issues.

Bazel sporadically emits warning aboud importlib being modified externally - tends to break automated builds

2 participants