Parallelize auditing? #14
Comments
Hi @woodruffw. FYI a while back I submitted this PR for the I'm not familiar with the abi3audit code base. If by "specs" you mean "files" that you read and process independently, then this is definitely the sort of work that you can delegate to a process pool and see a noticeable speedup. As for how to properly log / print to stdout:

    import multiprocessing

    # `files` and `audit_file` are placeholders for the inputs and the worker function
    with multiprocessing.Pool() as pool:
        futs = []
        # populate the pool
        for file in files:
            fut = pool.apply_async(audit_file, args=(file,))
            futs.append(fut)
        # collect the workers' results; fut.get() re-raises any worker exception
        for fut in futs:
            try:
                result = fut.get()
            except Exception as err:
                print(err)
            else:
                print(result)

If on the other hand you print progress inside your worker function (

Hope this helps.
Thanks, that is indeed helpful! A "spec" in
Got it. In that case you probably want to fetch all the wheels first and put them in a list, and process that list via the process pool. If the fetching operation consists of downloading files from the internet, you can also use a separate pool just for that, but it should be a thread pool, since it's I/O bound rather than CPU bound. For thread pools you can use https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor. For process pools I personally find "multiprocessing.Pool" more convenient (but forgot why :)).
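A minimal sketch of that two-stage layout, in case it helps. Note that `fetch_wheel`, `audit_wheel`, and `fetch_then_audit` are hypothetical stand-ins invented for illustration, not abi3audit functions: the "download" just fabricates some bytes and the "audit" is a toy checksum.

```python
from concurrent.futures import ThreadPoolExecutor
import multiprocessing


def fetch_wheel(name):
    # Hypothetical stand-in for an I/O-bound download; fabricates some bytes.
    return name, name.encode() * 1000


def audit_wheel(item):
    # Hypothetical stand-in for the CPU-bound audit of one fetched wheel.
    name, data = item
    return name, sum(data) % 251


def fetch_then_audit(names):
    # Stage 1: I/O-bound fetching in a thread pool.
    with ThreadPoolExecutor(max_workers=8) as threads:
        wheels = list(threads.map(fetch_wheel, names))
    # Stage 2: CPU-bound auditing in a process pool.
    with multiprocessing.Pool() as pool:
        return pool.map(audit_wheel, wheels)


if __name__ == "__main__":
    for name, result in fetch_then_audit(["a.whl", "b.whl"]):
        print(name, result)
```

Both `map` calls return results in input order, so the output order does not depend on which worker finishes first. The worker functions live at module level because a process pool has to pickle them.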
I'm not sure if this is a good idea yet.
When dealing with lots of specs (especially full Python package histories), auditing is pretty slow because it's entirely serial. It doesn't need to be this way: auditing is embarrassingly parallel (each step is entirely independent).
The only real obstacles here are UI/UX ones: if we break auditing up into a pool of threads or processes, we'll want to make sure that the current output and progress bars remain about the same (or get nicer).
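One possible way to keep the output coherent is to have workers return results and let only the parent process draw progress. The following is a rough sketch under that assumption, not abi3audit's actual UI code: `audit_spec`, `render_progress`, and `audit_all` are hypothetical names, and the plain-text bar stands in for whatever progress display the tool really uses.

```python
import multiprocessing
import sys


def audit_spec(spec):
    # Hypothetical stand-in for auditing one spec; returns (spec, ok).
    return spec, len(spec) % 2 == 0


def render_progress(done, total, width=20):
    # Build a plain-text progress bar such as "[#####-----] 5/10".
    filled = width * done // total if total else width
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total}"


def audit_all(specs, out=sys.stderr):
    results = []
    with multiprocessing.Pool() as pool:
        # imap_unordered yields each result as soon as its worker finishes,
        # so only the parent process ever writes to the terminal.
        for i, result in enumerate(pool.imap_unordered(audit_spec, specs), 1):
            results.append(result)
            print(render_progress(i, len(specs)), file=out)
    # Sort so the final report is stable regardless of completion order.
    return sorted(results)


if __name__ == "__main__":
    for spec, ok in audit_all(["pkg-a", "pkg-bb", "pkg-ccc"]):
        print(spec, "ok" if ok else "violations")
```

Because workers never print, lines from different processes cannot interleave, and the progress display can be swapped for a nicer one without touching the worker function.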