Skip to content

Conversation

@AlecThomson
Copy link

Hey @yymao,

Thanks so much for this awesome tool!

This is just a minor PR to change the number of default threads from the hard-coded 8, and use the number of detected cores instead from os.cpu_count()

@yymao
Copy link
Owner

yymao commented May 13, 2025

Thanks for the PR @AlecThomson! Can you say a bit about the motivation for this change? I limited the default to 8 to ensure the number of requests we send to ADS are not too high within a small period of time, especially given that nowadays many computers comes with a high number of cores.

@AlecThomson
Copy link
Author

Hey @yymao, that's a fair concern. My motivation came from using a 12-core laptop and wanting to get the speed boost by default. I haven't DDoS-ed ADS so far.

If thread limit is a concern, I it would probably be better to enforce a global limit rather than a default. For example, we could set a constant like:

MAX_CORES = 16 # Do not allow more than 16 threads to hit ADS
...
if args.parallel:
    threads = args.threads
    if threads is None:
        threads = os.cpu_count() or 1
        # enforce global maximum
        threads = min(threads, MAX_CORES)
    Parallel(n_jobs=threads, prefer="threads")(delayed(update)(key) for key in keys)

Here I've guessed 16 as a maximum. I'm not sure what the most reasonable limit would be...

@yymao
Copy link
Owner

yymao commented May 14, 2025

Thanks @AlecThomson.

Given this conversation, I am not sure if this edit is actually needed. I think capping the threads to MAX_CORES is functionally identical to hard coding MAX_CORES . If the machine has more cores available, we don't want to use them anyway. If the machine has fewer cores available, setting a higher number would still work ok for joblib and the overhead is not significant in this case.

If I set MAX_CORES to 8, then I'll need to create a new option so that, in your specific use case, you can change the MAX_CORES to 12. This doesn't seem to be easier for the user than just changing the threads option.

In addition, there's now a way to set default options via an environmental variable. For example, if someone wants to always use all available cores, they can set:

export ADSTEX_ARGS="--parallel --threads=-1"

Does this make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants