
Avoid unnecessary fragmentation for small streams #2135

Open · 0x0ece opened this issue Jan 21, 2025 · 3 comments

0x0ece commented Jan 21, 2025

Hi, I wanted to float an idea before possibly submitting a PR.

When sending a large number of small streams (e.g., smaller than the MTU), Quinn currently tends to split each stream into 2 frames across 2 packets. My proposal is to explicitly avoid this behavior and send the entire stream as a single frame, deferring it to the next packet when it doesn't fit in the current one.

The change itself would be relatively small, kind of like this example placed after this line:

// Avoid fragmentation for small streams
let unsent_sz = (stream.pending.offset - stream.pending.unsent) as usize;
// The stream would fit in a single packet (<= 1200 bytes) but not in the
// space left in the packet being built: defer it rather than split it.
if unsent_sz <= 1200 && unsent_sz > max_buf_size {
    continue;
}

Here are some caveats:

  • I think the size threshold should be part of the config; 1200 may be a good default, but it may depend on the use case
  • stream.pending.offset and stream.pending.unsent are not visible in that fn, so we should probably add a method on stream.pending to retrieve unsent_sz (a rough sketch follows after this list)
  • An additional condition could be stream.pending.unsent == 0, i.e. the entire stream is small, not just the remaining portion
  • The behavior could be continue or break. I think continue makes more sense, but I can also see valid reasons to break

(Anyway, this is just to say that I think the change is small, but there are a few “little” details that you’re probably better placed to decide than I am.)
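
To make those caveats concrete, here is a rough standalone sketch of how the check could be packaged. The function name, parameter list, and defaults are all made up for illustration and are not Quinn's actual API:

// Rough standalone sketch, not Quinn's actual API: the skip test with the
// caveats above folded in.
fn should_defer(
    offset: u64,         // total bytes written to the stream (pending.offset)
    unsent: u64,         // offset of the first unsent byte (pending.unsent)
    max_buf_size: usize, // frame space left in the packet being built
    threshold: usize,    // configurable; e.g. 1200, or 1232 for Solana
) -> bool {
    let unsent_sz = (offset - unsent) as usize;
    // Optional extra condition: only defer streams that haven't started
    // sending yet (unsent == 0), i.e. the whole stream is small.
    unsent == 0 && unsent_sz <= threshold && unsent_sz > max_buf_size
}

fn main() {
    // A 900-byte stream with only 400 bytes left in the current packet:
    // defer it to the next packet instead of splitting it into two frames.
    assert!(should_defer(900, 0, 400, 1200));
    // The same stream with 1000 bytes of space available: send it whole, now.
    assert!(!should_defer(900, 0, 1000, 1200));
}

Whether the caller then does continue or break when this returns true is exactly the open question in the last bullet.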

As for the concrete use case and how we found the improvement: the Solana blockchain uses Quinn for ingesting new transactions. Transactions are relatively small (<= 1232 bytes, which is the value I'd propose to configure for Solana), and each one is sent on its own new stream (in principle they could use datagrams, but using streams will make it easier to grow the transaction size in the future).

We gathered some stats over 1 hour of traffic, during which we saw an average of 5k transactions/sec, and of those ~15% were fragmented across 2 packets (note that the max tx size is 1232 bytes, but the average is probably around 600-800 bytes). With the proposed change, transactions that don't fit would spill into the next packet, but the total number of packets would be more or less the same. And on the server side there would be no defragmentation work at all.

In terms of resources, I don't have numbers for Quinn, but we're working on a custom server in C that handles only the specific case of defragmenting one tx split across 2 packets, and it consumes ~1 GB of RAM to sustain 5k tx/s, which is not trivial. Since the Quinn server handles the generic case, I assume it would require the same amount of resources or more.

In summary, I think that fragmenting small streams can be detrimental for use cases with many small streams, and avoiding that would be a relatively small change.

I’d like to hear your opinion and, as I mentioned, I’m happy to submit a PR if you think that’d help.

Ralith commented Jan 22, 2025

This sounds reasonable to me. There's been interest in something like this in the past as well. I'd like to see:

  • An end-to-end benchmark to demonstrate the impact. Probably as simple as modifying the perf client/server to allow the stream fragmentation threshold to be set on the command line (a sketch of such a flag follows after this list), though this won't accurately model varying stream sizes.
  • Data from that benchmark and/or your application to demonstrate that there is a benefit for a realistic workload. Dimensions of interest: memory use and CPU load on both sender and receiver, and perhaps request latency distribution.
  • An argument for the specific choice of heuristic (e.g. whether to only consider small streams)
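
For the benchmark knob, something along these lines would probably do; the flag name and default are made up, but a plain clap derive option on the perf binaries should be enough:

// Hypothetical perf-tool flag (name and default are made up):
use clap::Parser;

#[derive(Parser, Debug)]
struct Opt {
    /// Streams whose unsent data is at most this many bytes are written as a
    /// single frame instead of being split across packets; 0 disables it.
    #[arg(long, default_value_t = 1200)]
    frag_threshold: usize,
}

fn main() {
    let opt = Opt::parse();
    println!("stream fragmentation threshold: {} bytes", opt.frag_threshold);
}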

> I think the size threshold should be part of the config; 1200 may be a good default, but it may depend on the use case

Should this be global or per-stream? Can we come up with a good-enough default, or are there clear cases where different values would be required?
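
Purely for illustration (none of these names exist in Quinn today), a global knob could look like any other builder-style transport option, with None preserving today's behavior:

// Hypothetical sketch, not an existing Quinn API: a global knob in the style
// of the TransportConfig builder setters.
#[derive(Debug, Clone)]
pub struct TransportConfigSketch {
    /// Streams whose unsent data is at most this many bytes are never split
    /// across packets. None keeps today's behavior (always pack tightly).
    small_stream_threshold: Option<usize>,
}

impl Default for TransportConfigSketch {
    fn default() -> Self {
        // None = today's behavior; whether Some(1200) is a good enough
        // default is exactly the open question above.
        Self { small_stream_threshold: None }
    }
}

impl TransportConfigSketch {
    /// Builder-style setter, mirroring the other transport options.
    pub fn small_stream_threshold(&mut self, value: Option<usize>) -> &mut Self {
        self.small_stream_threshold = value;
        self
    }
}

fn main() {
    let mut cfg = TransportConfigSketch::default();
    cfg.small_stream_threshold(Some(1232)); // e.g. Solana's max transaction size
    println!("{cfg:?}");
}

A per-stream override could presumably be exposed the same way stream priority is today, but a single transport-level value seems like the simplest starting point.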

> The behavior could be continue or break. I think continue makes more sense, but I can also see valid reasons to break

If stream A wouldn't fit but stream B would, then halting after encountering stream A is suboptimal. However, scanning the set of pending streams is more complex than a simple continue: depending on priorities and fairness, you might just keep getting the same stream over and over. It might also not be desirable to fit a low-priority stream into extra space when high-priority data is pending. In the interest of simplicity, I'm open to an implementation that bails out on the first stream that fails this test.

> stream.pending.offset and stream.pending.unsent are not visible in that fn, so we should probably add a method on stream.pending to retrieve unsent_sz

Might be cleaner to add a helper function on SendBuffer or similar that performs the test itself, or returns the minimum buffer space required. Generally we prefer to push logic down to avoid huge complex functions.
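
As a standalone sketch with made-up names (not the real SendBuffer), something with roughly this shape would let the packet-building loop ask a single question:

// Standalone sketch with made-up names; the real SendBuffer differs.
struct SendBufferSketch {
    offset: u64, // total bytes written to the stream
    unsent: u64, // offset of the first byte not yet sent
}

impl SendBufferSketch {
    fn unsent_len(&self) -> usize {
        (self.offset - self.unsent) as usize
    }

    /// Minimum contiguous frame space this stream needs before any of it is
    /// written: small streams demand room for all of their unsent data so
    /// they are never fragmented, while larger streams accept any space.
    fn min_frame_space(&self, small_stream_threshold: usize) -> usize {
        let unsent = self.unsent_len();
        if unsent <= small_stream_threshold { unsent } else { 1 }
    }
}

fn main() {
    let small = SendBufferSketch { offset: 700, unsent: 0 };
    let large = SendBufferSketch { offset: 50_000, unsent: 20_000 };
    assert_eq!(small.min_frame_space(1200), 700); // must fit whole, or be skipped
    assert_eq!(large.min_frame_space(1200), 1);   // may be fragmented freely
    // The caller compares min_frame_space against the space left in the packet
    // and skips the stream (continue, or break per the above) when it's short.
}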

> the total number of packets would be more or less the same.

The total number of packets will increase by a proportion roughly equal to the unused portion of the MTU, which will incur some CPU cost. Hard to guess how it'll net out. If this change causes packets to frequently vary in size in your application, then the loss of segmentation offload might substantially increase CPU overhead. This is the main reason why benchmarking will be important to justify the change.

0x0ece commented Jan 22, 2025

Thank you for the detailed response, I'll work on the bench and follow up.

Ralith commented Jan 25, 2025

Thanks! Be sure to play with the fairness parameter as well: with small-but-fragmented streams, unfair scheduling will drastically reduce the amount of buffering required.
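
If I'm reading the current API right, that's the send_fairness toggle on TransportConfig. A minimal sketch, assuming that method name (check the docs for the quinn release you're on):

use std::sync::Arc;

use quinn::TransportConfig;

fn benchmark_transport() -> Arc<TransportConfig> {
    let mut transport = TransportConfig::default();
    // Unfair scheduling: drain one stream before moving on to the next, so
    // small streams are much less likely to sit half-sent in buffers.
    transport.send_fairness(false);
    Arc::new(transport)
}

fn main() {
    let _transport = benchmark_transport();
    // Plug this into quinn::ClientConfig / ServerConfig via transport_config().
}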
