Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parallel dbuf eviction #16487

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

allanjude
Copy link
Contributor

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.

Motivation and Context

Replace dbuf_evict_one() with dbuf_evict_many() to more efficiently evict dbuf objects without looping over the same locked objects over and over.

Description

In the previous code, dbuf_evict_thread() would called dbuf_evict_one() in a look while dbuf_cache_above_lowater().

dbuf_evict_one() would select a random sublist from the dbuf cache, then walk it from the tail forward, attempting to acquire the lock on each object until it succeeded, then evict that object and return.

As the name suggests, it would evict only a single object from the cache. However, evicting one object is not likely to bring us below the desired low water mark, so dbuf_evict_one() will be called again, where it will loop over all of the same busy objects again, until it founds one it can evict.

This has been replaced with dbuf_evict_many() which takes a specific sublist as a parameter, as well as a desired amount of data to evict. It then walks the sublist from the tail forward, evicting what it can until the number of bytes evicted satisfies the input parameter or the head of the sublist is reached.

The dbuf_evict_thread now runs is parallel as well, allowing it to keep up with demand more easily. For the dbuf cache, if the single thread was not able to keep up, ZFS would shift the work of evicting some items to each incoming I/O thread. While that is still the case it should be seen much less often now that dbuf_evict is more efficient and no longer bottlenecked to a single thread.

How Has This Been Tested?

Performance benchmarks while the dbuf cache is full

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

module/zfs/dbuf.c Outdated Show resolved Hide resolved
module/zfs/dbuf.c Outdated Show resolved Hide resolved
module/zfs/dbuf.c Outdated Show resolved Hide resolved
module/zfs/dbuf.c Outdated Show resolved Hide resolved
module/zfs/dbuf.c Show resolved Hide resolved
@0mp 0mp force-pushed the parallel_dbuf_evict branch 3 times, most recently from 54888d8 to 883cd30 Compare September 12, 2024 11:59
@0mp
Copy link
Contributor

0mp commented Sep 12, 2024

I have updated the patch with a different logic for picking the default maximum number of dbuf eviction threads. The new logic aims to pick the number that is one-eighth of the available CPUs, with a minimum of 2 and a maximum of 16.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 13, 2024
man/man4/zfs.4 Outdated Show resolved Hide resolved
@0mp 0mp mentioned this pull request Oct 23, 2024
13 tasks
@amotin amotin added the Status: Revision Needed Changes are required for the PR to be accepted label Oct 29, 2024
@0mp 0mp force-pushed the parallel_dbuf_evict branch from 883cd30 to 47c490b Compare November 13, 2024 14:57
@github-actions github-actions bot removed the Status: Revision Needed Changes are required for the PR to be accepted label Nov 13, 2024
@0mp
Copy link
Contributor

0mp commented Nov 13, 2024

I've rebased onto latest master. I'll address the feedback soon.

man/man4/zfs.4 Outdated Show resolved Hide resolved
@amotin amotin added the Status: Revision Needed Changes are required for the PR to be accepted label Nov 13, 2024
@alex-stetsenko alex-stetsenko force-pushed the parallel_dbuf_evict branch 2 times, most recently from 1985db5 to 04f57eb Compare November 21, 2024 09:11
@github-actions github-actions bot removed the Status: Revision Needed Changes are required for the PR to be accepted label Nov 21, 2024
@alex-stetsenko alex-stetsenko force-pushed the parallel_dbuf_evict branch 3 times, most recently from 5615676 to 882150d Compare January 21, 2025 13:47
Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <[email protected]>
Co-authored-by: Mateusz Piotrowski <[email protected]>
Signed-off-by: Alexander Stetsenko <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Signed-off-by: Mateusz Piotrowski <[email protected]>
@adamdmoss
Copy link
Contributor

adamdmoss commented Feb 2, 2025

Similarly to my issue with the parallel ARC eviction PR, I can pretty quickly end up with the dbuf_evict thread spinning forever consuming one core with this PR.
shot from perftop (the only tool I had at hand)

dbuf_evict(void)
{
int64_t bytes = (zfs_refcount_count(&dbuf_caches[DB_DBUF_CACHE].size) -
dbuf_cache_lowater_bytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are making every thread to evict everything extra. At best you might end up evicting all the cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants