
@jeth-ro

@jeth-ro jeth-ro commented Dec 6, 2025

A number of changes toward the end goal of more granular control over which packages are installed from binary packages rather than ebuilds.

  1. Extend emerge command line interface with the following options:
    --usepkg-include <atoms> defines a whitelist for binary packages (complements the existing --usepkg-exclude option)
    --getbinpkg-exclude <atoms> defines a blacklist for remote binary packages
    --getbinpkg-include <atoms> defines a whitelist for remote binary packages
    --nobindeps uses the install targets as the binary package whitelist
    No effect without -k or -g.

  2. Extend repos.conf syntax with the following attributes:
    usepkg-exclude configures a persistent, repository-specific blacklist for binary packages
    usepkg-include configures a persistent, repository-specific whitelist for binary packages
    No effect without -k.

  3. Extend binrepos.conf syntax with the following attributes:
    getbinpkg-exclude configures a persistent, binhost-specific blacklist for remote binary packages
    getbinpkg-include configures a persistent, binhost-specific whitelist for remote binary packages
    No effect without -g.

  4. Populate an implicit --getbinpkg-include list from any packages with FEATURES="getbinpkg" set in package.env, and also imply -g if that list is not empty (see bug #463964).
    This only works in the absence of -g or FEATURES=getbinpkg (set in make.conf or in the emerge process environment), both of which still have a global effect.

  5. Add a visible indication in emerge output when remote binary packages are being used, by showing a g flag in the same place where f and F appear for fetch-restricted ebuilds.
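To make the intended include/exclude semantics concrete, here is a minimal Python sketch of the per-package decision. It assumes packages are identified by plain category/name strings (no versions, slots or ::repo) and follows the initial behaviour described here, where an exclude match always wins. The `binpkg_allowed` helper is hypothetical; portage itself matches Atom objects through package sets rather than strings.

```python
from typing import AbstractSet


def binpkg_allowed(
    cp: str,
    include: AbstractSet[str] = frozenset(),
    exclude: AbstractSet[str] = frozenset(),
) -> bool:
    """Return True if package `cp` ("category/name") may use a binary package.

    Hypothetical simplification: portage matches Atom objects, which can
    also carry versions, slots and ::repo qualifiers.
    """
    # A blacklist match always wins (the initial behaviour of this PR).
    if cp in exclude:
        return False
    # A non-empty whitelist limits binary packages to the packages it names;
    # an empty one places no restriction.
    if include:
        return cp in include
    return True


# Example: only dev-lang/rust may come from a binary package.
whitelist = {"dev-lang/rust"}
print(binpkg_allowed("dev-lang/rust", include=whitelist))  # True
print(binpkg_allowed("sys-devel/gcc", include=whitelist))  # False
```

Under the same assumptions, --nobindeps amounts to passing the install targets as `include`, and the --getbinpkg-* pair applies the same kind of check to remote binaries only.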

Finally, I have slipped in a change that allows the local cache of a binhost's pkgindex to persist when that host is unreachable, provided the --pretend option is in effect. This might not be desired, but I found the behaviour useful.
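A rough sketch of that fallback, assuming a hypothetical `fetch_remote` callable that raises OSError when the binhost is unreachable and a `cached_index` that is None when nothing has been cached. This is not the PR's code, only an illustration of the rule "keep the stale cache in play, but only under --pretend".

```python
from typing import Callable, Optional


def load_pkgindex(
    fetch_remote: Callable[[], str],
    cached_index: Optional[str],
    pretend: bool,
) -> str:
    """Return the binhost's package index, falling back to the local cache.

    Hypothetical helper: `fetch_remote` raises OSError when the binhost
    cannot be reached; `cached_index` is None when nothing is cached.
    """
    try:
        return fetch_remote()
    except OSError:
        # Under --pretend, a stale cache is still good enough to show
        # what emerge would do once the binhost is reachable again.
        if pretend and cached_index is not None:
            return cached_index
        # Otherwise leave failure handling to the caller, as before.
        raise
```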

No tests yet, but I will look at that if there is interest in actually merging this. For now I have done my best to ensure that existing tests still pass.

There remain a number of obvious problems, and no doubt others:

  • no warnings for no-ops like specifying --usepkg-include without -k or --getbinpkg-include without -g
  • no warnings for conflicting selections, e.g. the same package given to both --usepkg-include and --usepkg-exclude, etc.
  • the order of priority or override between *.conf and the command line is probably not correct (exclude always wins)
  • the package.env implementation is proof-of-concept at best; I suspect passing atoms to config.setcpv() is not sufficient
  • use of ::repo notation or operators is restricted on the command line, but using them in the new repos.conf or binrepos.conf attributes will probably crash emerge

I will tackle these things if it looks like this will go anywhere, but felt I'd taken it far enough to seek feedback.

```python
pkgsettings = portage.config(clone=emerge_config.target_config.settings)
for cp in penvdict:
    for atom, env in penvdict[cp].items():
        pkgsettings.setcpv(atom)
```
Author

I suspect this is not correct, since other callers of setcpv() usually pass a _pkg_str or Package rather than an Atom. Suggestions welcome!

Member

Since it seems that we only access pkgsettings.features, maybe we could add an optimized pkgsettings.features_contains(atom, feature) method to call here.
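Purely to illustrate the shape of that suggestion, a hypothetical `features_contains()` might stack only the FEATURES values from package.env entries that match the package, instead of running all of config.setcpv(). The sketch assumes exact category/name matching, env files already parsed down to their FEATURES strings, and simple incremental semantics (`feature` adds, `-feature` removes; the `-feature` handling is an assumption of the sketch). None of these names are portage API.

```python
from typing import Iterable, Mapping, Sequence, Tuple


def features_contains(
    global_features: Iterable[str],
    package_env: Sequence[Tuple[str, Sequence[str]]],
    env_features: Mapping[str, str],
    cp: str,
    feature: str,
) -> bool:
    """Return True if `feature` is enabled for package `cp` (hypothetical sketch).

    package_env: ordered (atom, [env file names]) pairs, as in package.env.
    env_features: env file name -> the FEATURES value it sets.
    """
    features = set(global_features)
    for atom, env_names in package_env:
        if atom != cp:  # hypothetical: exact category/name match only
            continue
        for name in env_names:
            for token in env_features.get(name, "").split():
                if token.startswith("-"):
                    features.discard(token[1:])
                else:
                    features.add(token)
    return feature in features
```

Only the package.env entries are walked, so the cost scales with the size of package.env rather than with the number of packages in the binary tree.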

@jeth-ro
Author

jeth-ro commented Dec 6, 2025

For what it's worth, this approach was taken for a few reasons:

  • builds on and is consistent with existing interfaces, i.e. --usepkg-exclude
  • separates the business of local and remote binary packages, which may be useful to some
  • is more flexible than package.env as portage currently does not implement things like FEATURES="-getbinpkg"

In discussions in #gentoo-portage it seemed pretty clear there was a desire for FEATURES set via package.env to be the mechanism to achieve this, so I've tried to support that too (yup, it still needs some work). The primary challenge is that package.env is not really in play while the binary tree is populated or during dependency resolution, as I think has already been noted on the bug linked above.

I did also try this, which seemed like an easy win:

```diff
diff --git a/lib/portage/dbapi/bintree.py b/lib/portage/dbapi/bintree.py
index 9ee35ff1bd..d53ac1f07c 100644
--- a/lib/portage/dbapi/bintree.py
+++ b/lib/portage/dbapi/bintree.py
@@ -1778,6 +1778,7 @@ class binarytree:
             have_getbinpkg_exclude = not getbinpkg_exclude.isEmpty()
             have_getbinpkg_include = not getbinpkg_include.isEmpty()
             remote_base_uri = pkgindex.header.get("URI", base_url)
+            pkgsettings = portage.config(clone=self.settings)
             for d in pkgindex.packages:
                 cpv = _pkg_str(
                     d["CPV"],
@@ -1787,6 +1788,12 @@ class binarytree:
                     repoconfig=repo,
                 )

+                # Check for package specific FEATURES=getbinpkg
+                if "getbinpkg" not in self.settings.features:
+                    pkgsettings.setcpv(cpv)
+                    if "getbinpkg" not in pkgsettings.features:
+                        continue
+
                 # Respect remote binary exclude and include lists if defined
                 in_getbinpkg_exclude = (
                     have_getbinpkg_exclude and getbinpkg_exclude.containsCPV(cpv)
```

This works provided -g is also forced. However, it is very slow. On under-powered systems, which are presumably the bigger users of binary packages, binarytree.populate_remote() goes from ~3 seconds to almost 40 seconds (timed on a Core i5 limited to 800 MHz). So config.setcpv() obviously does a lot of work, and I suspect it was never intended to be called across an entire tree, but it is also presumably the correct means to resolve FEATURES for a given package.

Using package.env to imply --getbinpkg-include instead only pays the cost of setcpv() for packages listed in package.env. There also seems to be precedent for mapping FEATURES to command line options like this; after all, it's how FEATURES=getbinpkg is implemented globally.
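The cost argument can be sketched in Python under stated assumptions: `resolve_features` is a hypothetical stand-in for the expensive per-package lookup (config.setcpv() in this PR) and is called once per package.env entry, while each package in the remote index only needs a cheap set lookup. Function names are illustrative, not portage API.

```python
from typing import Callable, Iterable, List, Set


def implied_getbinpkg_include(
    package_env_atoms: Iterable[str],
    resolve_features: Callable[[str], Set[str]],
) -> Set[str]:
    """Atoms from package.env whose resolved FEATURES contain getbinpkg."""
    return {
        atom
        for atom in package_env_atoms
        if "getbinpkg" in resolve_features(atom)  # expensive, once per entry
    }


def filter_remote_index(remote_cps: Iterable[str], include: Set[str]) -> List[str]:
    """Keep remote packages on the implied whitelist (one set lookup each)."""
    return [cp for cp in remote_cps if cp in include]


# Example with a counting stand-in for the expensive lookup.
calls = []


def fake_resolve(atom: str) -> Set[str]:
    calls.append(atom)
    return {"getbinpkg"} if atom == "dev-lang/rust" else set()


include = implied_getbinpkg_include(["dev-lang/rust", "sys-devel/gcc"], fake_resolve)
remote = ["dev-lang/rust", "sys-devel/gcc"] + [f"cat/pkg{i}" for i in range(1000)]
print(filter_remote_index(remote, include))  # ['dev-lang/rust']
print(len(calls))  # 2, not 1002
imply_getbinpkg = bool(include)  # a non-empty list also implies -g
```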

@thesamesam
Member

thesamesam commented Dec 6, 2025

> In discussions in #gentoo-portage it seemed pretty clear there was a desire for FEATURES set via package.env to be the mechanism to achieve this

I think some people felt that way but I didn't at least. FEATURES is really overloaded and it involves a bunch of extra work as you've noted. It's kind of the old way of doing things when we don't have a better way (sometimes it is the best way still, but it shouldn't be the default knob).

Having a proper config file for this doesn't really seem to have any downsides other than needing to document it, and has upsides like making scripting a bit easier, and letting us add in more information/options in future. Having more files with a specific purpose rather than "god files" is something I'm trying to steer us towards.

It's also taken a while to get populate_remote to be decently fast, it'd be a shame if we lost that.

It may well cause other problems if there are contexts where FEATURES needs to be re-evaluated but currently isn't...

Compare this to how we previously had PORTAGE_BINHOST as an environment/make.conf variable, whereas now we have binrepos.conf. We also now recommend setting any USE_EXPAND via package.use (even globally with */* FOO: blah) because of the downsides of the single-variable approach. We're discussing a similar change for mirrors right now too.

TL;DR: I don't think we should concern ourselves with package.env at all for now, possibly ever.

@thesamesam thesamesam requested a review from zmedico December 6, 2025 13:06
@jeth-ro
Author

jeth-ro commented Dec 6, 2025

My perception (as a user) is that package.env is a means to influence the build environment more than emerge itself, but portage(5) isn't particularly clear on that, so maybe I've got it backwards. If a separate interface is the chosen way, then the docs could draw a clearer line between what works via package.env and what doesn't. Ideally emerge would then emit warnings and tips if package environments are set in ways that don't work as expected. If these things are meant to work, then it is potentially a much more invasive change, which I'm willing to look at but which could turn out to be a long road.

Put another way, it would be much easier to close out this change without support for package.env. While I've got the package-specific FEATURES="getbinpkg" thing to sort of work, I cannot yet clearly see the "right" path to implementing it. Without it, the loose ends here are hopefully more like tidy-up and admin.

@eli-schwartz
Member

> I think some people felt that way but I didn't at least. FEATURES is really overloaded and it involves a bunch of extra work as you've noted. It's kind of the old way of doing things when we don't have a better way (sometimes it is the best way still, but it shouldn't be the default knob).

FTR, while I do feel this way, it's less about "it's a good UX" and more that I think it's an absolutely terrible UX for something to be in FEATURES in make.conf but silently ignored in package.env. (The obvious response to this is "Hey Eli betcha didn't know this other thing also gets ignored in package.env" but don't worry, I think that's terrible too. :D)

I daresay I'd also be happy with the compromise option: "we will raise a configuration validation error that aborts emerge if you put certain FEATURES in package.env". And maybe that will run late enough that it's already loaded, so "slowdown problem SOLVED".
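A minimal sketch of such a validation step, assuming package.env entries have already been parsed into (atom, [env file names]) pairs and each env file down to its FEATURES string. Which FEATURES to reject is left open in the thread; getbinpkg is used only because it is the one under discussion. `ConfigValidationError` and `validate_package_env` are hypothetical names.

```python
from typing import Mapping, Sequence, Tuple

# Hypothetical: FEATURES that emerge would not honour per package.
UNSUPPORTED_PACKAGE_ENV_FEATURES = frozenset({"getbinpkg"})


class ConfigValidationError(Exception):
    """Hypothetical error type; emerge would abort when it is raised."""


def validate_package_env(
    package_env: Sequence[Tuple[str, Sequence[str]]],
    env_features: Mapping[str, str],
) -> None:
    """Raise if any package.env entry sets an unsupported FEATURES token."""
    problems = []
    for atom, env_names in package_env:
        for name in env_names:
            for token in env_features.get(name, "").split():
                # Both "getbinpkg" and "-getbinpkg" are reported.
                if token.lstrip("-") in UNSUPPORTED_PACKAGE_ENV_FEATURES:
                    problems.append(
                        f"{atom}: FEATURES={token} in {name} is not supported in package.env"
                    )
    if problems:
        raise ConfigValidationError("\n".join(problems))
```

Collecting every problem before raising means one run reports all offending entries instead of stopping at the first.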

@thesamesam
Member

That could indeed work out (though I wouldn't necessarily ask @jeth-ro to do it as part of this PR).

@jeth-ro
Author

jeth-ro commented Dec 15, 2025

Updates:

  • Command line options now have higher priority than binrepos.conf or repos.conf
  • Warnings now issued for conflicting getbinpkg-exclude with getbinpkg-include in binrepos.conf
  • Warnings now issued for conflicting usepkg-exclude with usepkg-include in repos.conf
  • Warnings now issued for unsupported atom notations in repos.conf or binrepos.conf
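Warnings for unsupported atom notations imply a filtering step. As described further down in this comment, unsupported atoms are ignored after the warning while valid atoms on the same attribute stay in effect. The sketch below assumes "unsupported" means ::repo qualifiers and version operators, the notations named in the original description; the exact set of accepted forms in the PR may differ, and `filter_conf_atoms` is a hypothetical name.

```python
import warnings
from typing import Iterable, List

# Leading characters of version-operator atoms such as ">=foo/bar-1.0".
OPERATORS = ("<", ">", "=", "~", "!")


def filter_conf_atoms(attr: str, tokens: Iterable[str]) -> List[str]:
    """Drop atoms with unsupported adornments, warning about each one."""
    kept = []
    for token in tokens:
        if "::" in token or token.startswith(OPERATORS):
            warnings.warn(f"{attr}: ignoring unsupported atom {token!r}")
            continue
        kept.append(token)  # plain and slot atoms stay in effect
    return kept


value = "foo/bar >=foo/baz-1.0 foo/qux::gentoo dev-lang/python:3.12"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = filter_conf_atoms("usepkg-include", value.split())
print(kept)  # ['foo/bar', 'dev-lang/python:3.12']
print(len(caught))  # 2
```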

When a command line option overrides something from repos.conf or binrepos.conf then a warning is issued to that effect. For example, using --usepkg-include foo/bar on the command line while usepkg-exclude = foo/bar exists in repos.conf.

"Conflicting" means a non-empty intersection of the include and exclude package sets, as dictated by whatever the Atom class considers equality. For example, this means that specifying usepkg-exclude = foo/bar in the repos.conf for [gentoo] and usepkg-include = foo/bar under [local] is not a conflict as these config attributes are repository specific and that is reflected in the Atom instances.

Emerge will continue after all these warnings. For conflicts in repos.conf or binrepos.conf the behaviour may not be as expected (exclude wins), but the user is at least made aware. Atoms with unsupported adornments are ignored after the warning, but valid atoms on the same attribute remain in effect.
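The conflict rule above can be sketched with Python's configparser, assuming a simplified atom type that carries only category/name plus the repository section it came from. `RepoAtom` and `find_conflicts` are hypothetical; real repos.conf parsing is done by RepoConfigLoader. Because the repository is part of each atom's identity, the same package listed under [gentoo] and [local] does not intersect.

```python
import configparser
from typing import NamedTuple, Set


class RepoAtom(NamedTuple):
    """Simplified stand-in for an Atom: category/name plus its ::repo."""

    cp: str
    repo: str


def load_attr(parser: configparser.ConfigParser, key: str) -> Set[RepoAtom]:
    """Collect whitespace-separated atoms for `key`, tagged with their section."""
    atoms = set()
    for section in parser.sections():
        for cp in parser.get(section, key, fallback="").split():
            atoms.add(RepoAtom(cp, section))
    return atoms


def find_conflicts(repos_conf: str) -> Set[RepoAtom]:
    """A non-empty result means the include and exclude sets intersect."""
    parser = configparser.ConfigParser()
    parser.read_string(repos_conf)
    return load_attr(parser, "usepkg-include") & load_attr(parser, "usepkg-exclude")


NO_CONFLICT = """
[gentoo]
usepkg-exclude = foo/bar

[local]
usepkg-include = foo/bar
"""

CONFLICT = """
[gentoo]
usepkg-exclude = foo/bar
usepkg-include = foo/bar baz/qux
"""

print(find_conflicts(NO_CONFLICT))  # set()
print(find_conflicts(CONFLICT))  # {RepoAtom(cp='foo/bar', repo='gentoo')}
```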

No warnings have been added for conflicting include/exclude lists on the command line, or for use of these options on the command line without -k or -g. I can add these if desired, but silence is in line with the current behaviour for --usepkg-exclude.

I believe this covers off my initial list of "obvious problems" :)

@jeth-ro
Author

jeth-ro commented Dec 15, 2025

I will continue looking at additional tests for now, but if nobody objects by the time I finish, I will drop the commit below and so remove the package-specific FEATURES=getbinpkg attempt from this change.

emerge: allow FEATURES=getbinpkg set via package.env (bug 463964)

I'm happy to open a separate pull request for that and to look at what can be done to either support it or at least add warnings as discussed above, but would ideally like to progress this change first.

@jeth-ro
Author

jeth-ro commented Dec 15, 2025

A passing thought... Would @usepkg and @getbinpkg sets (reflecting those packages configured in repos.conf and binrepos.conf respectively) be useful to add?

This might be of limited utility, as it would only reflect the current config, but perhaps there are use cases if said config is relatively static. For example, emerge --update @usepkg where the binpkgs have just been built on another host and you don't (for some reason) want to update all of @world at the same time.

@eli-schwartz
Member

eli-schwartz commented Dec 15, 2025

I don't really care about the list of configured atoms.

But I do fairly often run emerge --pretend --getbinpkg to check which packages are currently available as binaries, and then rerun without --pretend, with those copy pasted packages.

So perhaps I'd like an @availablebinpkg set to stop my need for elaborate sed commands. :) But it would probably be better off as @world --ignore-missing-binpkgs. It would be like --getbinpkgonly except instead of forcing an error it removes packages from the mergelist.
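As a stop-gap for the copy-and-paste workflow described above, the binary entries can be pulled out of emerge --pretend output with a few lines of Python instead of sed. This assumes merge-list lines shaped like `[binary   U  ] category/name-version::repo ...` with colour disabled (--color n); the exact column layout varies with other options such as --verbose, so treat the pattern as an approximation. `binary_atoms` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed merge-list format, e.g.
# "[binary   U  ] dev-lang/rust-1.80.0::gentoo [1.79.0::gentoo]"
BINARY_LINE = re.compile(r"^\[binary[^\]]*\]\s+(\S+)")


def binary_atoms(pretend_output: Iterable[str]) -> List[str]:
    """Return '=category/name-version' atoms for entries shown as binaries."""
    atoms = []
    for line in pretend_output:
        match = BINARY_LINE.match(line)
        if match:
            atoms.append("=" + match.group(1))
    return atoms
```

Feeding it the lines of `emerge --pretend --getbinpkg --color n @world` and joining the result with spaces gives a list that can be passed back to emerge.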

@jeth-ro
Author

jeth-ro commented Dec 16, 2025

I believe that --getbinpkgonly implies --usepkgonly (the man page doesn't say so explicitly, but it's in the region of actions.py:3570), and --usepkgonly is implemented by removing porttree from the dependency calculation entirely. So while this does force an error if no binary package is available for something that is needed, presumably that is because emerge cannot find a solution (without porttree) that won't potentially break your system...?

In the case of --update --getbinpkgonly, vartree can presumably be used for unavailable binaries, i.e. simply don't update them rather than bailing. A quick rudimentary test indicates that this may be the current behaviour.

So I think I'm missing something, as you say you are doing this already. When I run emerge -1gp <x> and then emerge -1gp <y>, where <y> is the binary-only subset of the packages to be installed, the second command still pulls the remaining packages from ebuilds. Of course there's always --nodeps, but I'd worry that adding features that need to imply --nodeps to work is going to lead to nasty surprises for users.

@eli-schwartz
Member

eli-schwartz commented Dec 16, 2025

> So I think I'm missing something, as you say you are doing this already. When I run emerge -1gp <x> and then emerge -1gp <y>, where <y> is the binary-only subset of the packages to be installed, the second command still pulls the remaining packages from ebuilds. Of course there's always --nodeps, but I'd worry that adding features that need to imply --nodeps to work is going to lead to nasty surprises for users.

I think that you're overthinking this a lot.

I have 5 leaf packages in my world file. Two of them are available right now as binaries, and three are not available as binaries. All of them have gotten new version bumps. A theoretical emerge -uDNg @world --ignore-missing-binpkgs would result in only installing the two that have binaries and simply ignoring the other three (which I'll probably install tomorrow or something, after the server gets around to building binaries for them).

Recursive dependencies either don't exist or for the purpose of this discussion already have binaries if the leaf package does.

Edit: emerge -uDN @world --getbinpkgonly seems to do what I wanted, yes. I didn't expect it would, since explicitly specifying a package can and does fail if not satisfiable. It also implies --rebuilt-binaries but I can disable that.

@thesamesam
Member

i.e. it's in the realm of helping bug 924772.

@thesamesam
Member

thesamesam commented Dec 16, 2025

I will give this a closer look shortly (next few days). I think we've converged on the design and functionality (though I can't promise I won't have a thought involving those).

Could you add some tests now for making sure these options (inc. combinations of cmdline and config) do what we want? The ResolverPlayground infra should make this easy as it already has a way to say we want something as a binary, to mark which binaries are available, and to add files in (like binrepos.conf which I have in #1522).

The tests where you want to check for binaries from a specific repo will require a little more work, I think, like I'm doing over there. Note that in that PR, because of what it implements, it does not contain the easier form you can use here too. See test_runtime_cycle_merge_order.py for an example of that.

@jeth-ro
Author

jeth-ro commented Dec 16, 2025

Yeah, I may be overthinking it :) I haven't yet run into bug 924772 myself, so had not appreciated that complexity. My usage of binpkgs is pretty light, hence the desire for a whitelist.

@thesamesam - assuming looking closer means code changes, here are a few things where I'm pretty sure I've not taken the ideal approach. I still don't know the code very well...

  • Extending the binarytree.populate() interface feels messy, in particular the need to prepare kwargs in multiple places. Is there a better way to get the command line lists into binarytree?
  • Whether it's ok to update depgraph._frozen_config in the _resolve() function, unsure if "frozen" implies some degree of immutability? Did it here because there looks to be a lot of machinery to arrive at the desired Atoms which hasn't yet run at the time of _frozen_config.__init__().
  • Whether portage/repository/config.py is an appropriate place to have added _validate_usepkg_list() or not. I could potentially reuse _find_bad_atoms() from _emerge/main.py, but was unsure whether imports from _emerge into portage are desirable.

Might still be overthinking... Thanks both!

@thesamesam
Member

@zmedico Would you mind taking a look at the above?

@zmedico
Member

zmedico commented Dec 27, 2025

>   • Whether it's ok to update depgraph._frozen_config in the _resolve() function, unsure if "frozen" implies some degree of immutability? Did it here because there looks to be a lot of machinery to arrive at the desired Atoms which hasn't yet run at the time of _frozen_config.__init__().

The _frozen_config state is shared for all depgraph instances created during backtracking. If you mutate it in a way that is consistent across all backtracking runs, then it won't do any harm.

>   • Whether portage/repository/config.py is an appropriate place to have added _validate_usepkg_list() or not. I could potentially reuse _find_bad_atoms() from _emerge/main.py, but was unsure whether imports from _emerge into portage are desirable.

Imports from _emerge into portage are perfectly fine.

Comment on lines +757 to +762
```python
with open(os.path.join(self.eprefix, USER_CONFIG_PATH, "repos.conf")) as f:
    env["PORTAGE_REPOSITORIES"] = f.read()
```
Author

I did not think this should be necessary, as repos.conf is now always written to the resolver playground's temporary environment, but many (not all) unrelated tests failed without this environment variable set.

```python
"use.force",
"use.mask",
"use.stable",
"layout.conf",
```
Author

Removing a duplicate entry; otherwise unrelated to this change.

Comment on lines -414 to +465
```python
bintree = binarytree(pkgdir=self.pkgdir, settings=self.settings)
bintree.populate(force_reindex=True)
bintree = binarytree(pkgdir=repo_dir, settings=self.settings)
bintree.populate(force_reindex=True)
```
Author

I did not think it necessary to call populate() for every binpkg created; calling it once at the end seems to work too?

@jeth-ro
Author

jeth-ro commented Jan 19, 2026

So this has required a bit of surgery on ResolverPlayground. Specifically:

  • Adding (optional) binary repositories to the temporary playground environment
  • Merging any repos.conf/binrepos.conf from user_config with internally generated config files
  • Exposing Package.remote in the merge list; I chose "[binary,remote]", but suggestions are welcome
  • Running correctly with -g and -G and the other options those imply

The tests I've included cover the effects of the command line options, and of settings in repos.conf and binrepos.conf, on the resolver, as well as the minor changes to PackageSet subclasses. I did not find existing tests covering RepoConfigLoader or BinRepoConfigLoader; presumably they exist, and if someone would be so kind as to point them out I can add the new config attributes there. I have also not added tests for warnings emitted for bad config entries, nor for the addition of g to the flags shown by emerge --pretend.

A few other things have occurred to me along the way:

  1. emerge --info dumps selected attributes from repos.conf and binrepos.conf, should the newly added ones be among them?
  2. --nobindeps can interact in potentially confusing ways with usepkg-include or usepkg-exclude in repos.conf, especially if you've forgotten those entries are there. I've alluded to this in the man page, but perhaps it would be preferred those config lines be ignored if --nobindeps is given? It's a trivial change.
  3. Having repo and binhost specific options in the config files provides flexibility, but has proved complicated to test and I do wonder if simply duplicating the command line option functionality under the [DEFAULT] section might be saner.

The FEATURES=getbinpkg proof-of-concept commit has been dropped from this branch.

@jeth-ro jeth-ro marked this pull request as ready for review January 19, 2026 12:35
@jeth-ro jeth-ro marked this pull request as draft January 19, 2026 19:29
@thesamesam
Member

> emerge --info dumps selected attributes from repos.conf and binrepos.conf, should the newly added ones be among them?

I can see this being quite long, so I'm inclined to say no.

> I've alluded to this in the man page, but perhaps it would be preferred those config lines be ignored if --nobindeps is given? It's a trivial change.

I agree. Let's start with ignoring it.

> Having repo and binhost specific options in the config files provides flexibility, but has proved complicated to test and I do wonder if simply duplicating the command line option functionality under the [DEFAULT] section might be saner.

I'm not sure I follow this yet? Do you mean just for testing?

@jeth-ro
Author

jeth-ro commented Jan 20, 2026

> emerge --info dumps selected attributes from repos.conf and binrepos.conf, should the newly added ones be among them?
>
> I can see this being quite long, so I'm inclined to say no.

Alternatively, a single line indicating that such options are set? I only suggest it as potentially helpful for support.

> Having repo and binhost specific options in the config files provides flexibility, but has proved complicated to test and I do wonder if simply duplicating the command line option functionality under the [DEFAULT] section might be saner.
>
> I'm not sure I follow this yet? Do you mean just for testing?

I guess I'm just seeking to confirm that we really do want repository- and binhost-specific configuration, or whether the global behaviour of the command line options might (a) be enough for many use cases and (b) be less confusing for users. That said, I ought to get to the bottom of how I've managed to break the tests for CI but not for myself. It all seemed shipshape when I posted the above comment, but it's possible that (b) only applies to me ;)

@thesamesam
Member

thesamesam commented Jan 20, 2026

I don't see a need for the ability to configure this in repos.conf. Going forward we should discourage co-mingling, i.e. separate repos being mixed inside different binrepos, so only being able to configure this in binrepos.conf or on the command line sounds good.

EDIT: We discussed it more on IRC.

@jeth-ro
Author

jeth-ro commented Jan 26, 2026

Latest changes:

  • --nobindeps now cancels --usepkg-exclude, --usepkg-include, or anything set in repos.conf (as above)
  • specifying the same atom for both excludes and includes is now a no-op (was exclude wins before)
  • additional tests to cover slot atoms with command line options, conflicting options, and unmatched atoms
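A compact sketch of the reconciliation rules described in these updates, under simplifying assumptions: atoms are plain strings, a command line entry overrides the opposing config-file entry with a warning, and an atom that ends up in both lists is dropped from both (a no-op). The `reconcile` helper is hypothetical, it uses Python's warnings module where emerge uses its own output functions, and --nobindeps handling is left out.

```python
import warnings
from typing import AbstractSet, Set, Tuple


def reconcile(
    conf_include: AbstractSet[str],
    conf_exclude: AbstractSet[str],
    cli_include: AbstractSet[str],
    cli_exclude: AbstractSet[str],
) -> Tuple[Set[str], Set[str]]:
    """Merge config-file and command line lists into (include, exclude)."""
    # 1. Command line entries override the opposing config-file entries.
    overridden = (set(conf_exclude) & set(cli_include)) | (
        set(conf_include) & set(cli_exclude)
    )
    for atom in sorted(overridden):
        warnings.warn(f"command line overrides config file entry for {atom}")
    include = (set(conf_include) - set(cli_exclude)) | set(cli_include)
    exclude = (set(conf_exclude) - set(cli_include)) | set(cli_exclude)
    # 2. The same atom in both lists is a no-op rather than "exclude wins".
    both = include & exclude
    return include - both, exclude - both


# Example: --usepkg-include foo/bar while repos.conf has usepkg-exclude = foo/bar.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = reconcile(set(), {"foo/bar"}, {"foo/bar"}, set())
print(result)  # ({'foo/bar'}, set())
print(len(caught))  # 1
```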

There are now numerous checks and reconciliations between command line options and *.conf files, and these are currently implemented in depgraph.py (as part of _frozen_depgraph_config) and in bintree.py (as part of populate_remote()). I do wonder if this is a bit late in the process? In particular, printing warnings from within _frozen_depgraph_config clashes with the spinner.

I'm happy to look at moving those checks up the stack, e.g. into actions.py, but have not done it yet as it's not entirely trivial. Specifically, the package sets built immediately prior to said checks are not available earlier.

Facilitate selection of specific binary packages during a build
action, with all others to be satisfied by ebuilds. Has no effect
unless -k is specified or implied.

This is the logical inverse of existing argument --usepkg-exclude.

Signed-off-by: Jethro Donaldson <[email protected]>

Shorthand for emerge --usepkg-include <pkgs> <pkgs> for installing
from a binary package without allowing portage to pull any binary
packages to satisfy dependencies. Has no effect without -k.

Signed-off-by: Jethro Donaldson <[email protected]>

If --pretend is in effect with --getbinpkg then keep the cached binary
package index in play even if the remote repository is unreachable.
This allows emerge to still show what it would do if the binhost
were available.

Signed-off-by: Jethro Donaldson <[email protected]>

Provide visual feedback to the user as to whether a selected binary
package is local or remote. The symbol 'g' is used to indicate that
a remote binary will be fetched due to --getbinpkg, and is shown in
the same place as 'f' or 'F'. These flags represent a similar concept
yet should never be shown for remote binaries, but will still be shown
preferentially should they somehow be applicable.

Signed-off-by: Jethro Donaldson <[email protected]>
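The column logic in that commit message can be sketched as below, assuming the usual emerge meanings of F (fetch-restricted, must be downloaded manually) and f (fetch-restricted, already downloaded). The `fetch_flag` function and its boolean inputs are hypothetical; it only illustrates the precedence of f/F over g.

```python
def fetch_flag(fetch_restricted: bool, distfiles_present: bool, remote_binary: bool) -> str:
    """Character for emerge's fetch-status column (hypothetical sketch).

    f/F are checked first so they still win if they somehow apply to a
    remote binary; otherwise 'g' marks a binary fetched via --getbinpkg.
    """
    if fetch_restricted:
        return "f" if distfiles_present else "F"
    if remote_binary:
        return "g"
    return " "
```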

Allow specification of which packages can or cannot be satisfied
using remote binary packages, with an interface consistent with the
--usepkg-include and --usepkg-exclude options. These additional options
influence fetching of remote binaries only and have no effect unless -g
is also supplied or implied.

Signed-off-by: Jethro Donaldson <[email protected]>
Suggested-by: Sam James <[email protected]>

Allow configuration in repos.conf of specific packages which are to
be satisfied (or not) using binary packages. These attributes appear
under the repository section and are specific to that repository,
with behaviour otherwise identical to the command line arguments of
the same name.

Where atoms specified with usepkg-exclude or usepkg-include in
repos.conf match those for the opposing command line options then
emit a warning and override the repos.conf atoms.

Signed-off-by: Jethro Donaldson <[email protected]>

Allow configuration in binrepos.conf of specific packages which are
to be satisfied (or not) using remote binary packages. These attributes
appear under the repository section and are specific to that binary
package host, with the behaviour otherwise identical to the command
line arguments of the same name.

Where atoms specified with getbinpkg-exclude or getbinpkg-include in
binrepos.conf match those for the opposing command line options then
emit a warning and override the binrepos.conf atoms.

Signed-off-by: Jethro Donaldson <[email protected]>
Suggested-by: Sam James <[email protected]>
@jeth-ro
Author

jeth-ro commented Jan 31, 2026

> ... In particular, printing warnings from within _frozen_depgraph_config clashes with the spinner.

The above is fixed: validation of *.conf files now happens in BinRepoConfig and RepoConfig, and validation of emerge command line options now happens (mostly) earlier, in actions.py. The business of --getbinpkg-* command line options overriding binrepos.conf is still implemented in bintree.py, as the parsed configuration that these options override isn't really exposed.

I might still prefer not to have binarytree doing this last bit of config fix-up, and to avoid the need to change the populate() interface (see my comment of 16 Dec above), but I don't see any minimally invasive way around these points, so I will draw a line under it for now. Unlike the spinner issue above, this isn't user-visible, as binarytree.populate() is called early enough for those warnings to appear up front rather than interleaved with later emerge output.


4 participants