At the moment, benchmark generates two different flavors of exported resources query when --catalog-query-pct is positive. Both filter on the resource type without considering how common that type is. In some testing with a larger system (simulating 100k nodes), the Service type was frequently chosen for the query. That type was extremely common, matching 11m rows in catalog_resources, which caused an expensive full table scan and a very large result.
Consider adjusting benchmark to prefer more selective exported resource queries, for example by taking the frequency of each resource type into account when choosing the type to filter on.
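
One possible shape for this, as a rough sketch only: given per-type row counts, skip types above a row cap and weight the remaining ones inversely by frequency. This is written in Python purely for illustration and is not benchmark's actual code. The function name, the max_rows parameter, and every count other than the 11m Service figure are hypothetical.

```python
import random


def pick_selective_type(type_counts, max_rows, rng=random):
    """Pick a resource type for an exported resources query.

    type_counts maps a resource type name to its (assumed known) row count.
    Types with more than max_rows rows are excluded; if every type exceeds
    the cap, the rarest type is returned instead.
    """
    candidates = [t for t, n in type_counts.items() if n <= max_rows]
    if not candidates:
        # Nothing is under the cap, so fall back to the least common type.
        return min(type_counts, key=type_counts.get)
    # Weight inversely by row count so rarer types are chosen more often.
    weights = [1.0 / max(type_counts[t], 1) for t in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical counts; only the Service figure comes from the testing above.
example_counts = {"Service": 11_000_000, "Nagios_host": 500, "Sshkey": 2_000}
chosen = pick_selective_type(example_counts, max_rows=10_000)
```

With a cap like this, Service would never be selected as long as some rarer type exists. How the counts would be obtained (for instance, tracked while benchmark generates catalogs) is left open here.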