
bug(quinn): making a connection that never gets to a full Connection causes excessive delays in Endpoint::wait_idle #2147

ramfox opened this issue Jan 29, 2025 · 7 comments



ramfox commented Jan 29, 2025

I would appreciate any help.

The bug is illustrated in the endpoint_wait_idle test in draft PR #2146, but I will also copy the permalink:

quinn/quinn/src/tests.rs

Lines 154 to 177 in dd14b01

```rust
    // Create an endpoint that will not make a successful connection
    let endpoint = factory.endpoint();
    // Connect to un-dialable addr
    let connecting = endpoint
        .connect(
            SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), 1234),
            "localhost",
        )
        .unwrap();
    // decide you don't want to keep connecting or something goes wrong
    // while connecting:
    drop(connecting);
    // close the endpoint
    endpoint.close(0u8.into(), b"");
    let duration = Duration::from_millis(2900);
    let res = tokio::time::timeout(duration, endpoint.wait_idle()).await;
    // this fails, `wait_idle` takes around 3s
    assert!(
        res.is_ok(),
        "Endpoint took longer than {duration:?} to close"
    );
}
```

To summarize: if you have an Endpoint that attempts a connection and, for whatever reason, that connection never makes it past the Connecting stage, the Endpoint::wait_idle call (when attempting to gracefully close the endpoint) always takes ~3s. I'm assuming something like the ConnectionHandler created along with Connecting never gets cleaned up, so wait_idle only returns after a timeout.

I'm only familiar enough with quinn to know something is wrong, not how to fix it.

Ralith (Collaborator) commented Jan 29, 2025

What leads you to believe that this behavior is incorrect? My guess would be that this is exactly the draining period of the attempted connection. Waiting for that to complete is exactly the role of wait_idle.

djc (Member) commented Jan 29, 2025

I think the suggestion/surprise is that Connecting instances might not deserve the full draining period?

flub (Contributor) commented Jan 29, 2025

I assume what happens is that on dropping the Connecting a CONNECTION_CLOSE frame is sent, even though only the initial packet was ever sent. Since CONNECTION_CLOSE is ack-eliciting you then have to wait for 3*PTO to expire and the RTT for the PTO calculation will be set to the initial values?

This makes sense from a protocol point of view. From a user point of view I understand that it is a little surprising that the timeout happens at all in this case. So @djc's question is fair: would it be allowed to shorten the draining period? Would it be desirable?
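For reference, here is a rough back-of-the-envelope sketch of that arithmetic (my own numbers, assuming quinn sticks to the RFC 9002 defaults; not lifted from quinn's source), which lines up with the ~3s the test observes:

```rust
use std::time::Duration;

// With no RTT sample yet, the RFC 9002 defaults apply:
fn main() {
    let initial_rtt = Duration::from_millis(333); // kInitialRtt, RFC 9002 §6.2.2
    let smoothed_rtt = initial_rtt; // first estimate is the initial RTT
    let rttvar = initial_rtt / 2; // initial rttvar is half the initial RTT
    let granularity = Duration::from_millis(1); // kGranularity

    // PTO = smoothed_rtt + max(4 * rttvar, kGranularity); max_ack_delay is not
    // included before the handshake is confirmed.
    let pto = smoothed_rtt + (4 * rttvar).max(granularity);

    // The endpoint lingers for three times the PTO when closing (RFC 9000 §10.2).
    let drain = 3 * pto;

    println!("PTO ≈ {pto:?}, drain ≈ {drain:?}"); // PTO ≈ 999ms, drain ≈ ~3s
}
```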

ramfox (Author) commented Jan 29, 2025

> I think the suggestion/surprise is that Connecting instances might not deserve the full draining period?

Yes, thank you, this is what surprised me. We have the "knowledge" that we don't have a Connection yet, since we have not reached that point in the life cycle (we are still at Connecting), so it felt surprising that I would have to wait for it to close.

Ralith (Collaborator) commented Jan 30, 2025

You don't have to; wait_idle is strictly optional. It does serve a useful purpose: it helps the peer dispose of any state it may have already allocated in a timely manner. The connection not having been fully established doesn't preclude the peer having allocated state.
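If you decide the full drain isn't worth waiting for, you can bound it yourself on the caller side. A minimal sketch (the helper name is made up, the rest is just the existing close/wait_idle API plus a tokio timeout, as in the test above):

```rust
use std::time::Duration;

// Hypothetical helper (not part of quinn): close the endpoint and wait for the
// drain, but give up after `max_wait`, since waiting is strictly optional.
async fn close_with_deadline(endpoint: &quinn::Endpoint, max_wait: Duration) {
    endpoint.close(0u8.into(), b"shutting down");
    // If draining takes longer than we are willing to wait, just move on; the
    // peer will eventually time out any state it allocated on its own.
    let _ = tokio::time::timeout(max_wait, endpoint.wait_idle()).await;
}
```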

djc (Member) commented Feb 1, 2025

> You don't have to; wait_idle is strictly optional. It does serve a useful purpose: it helps the peer dispose of any state it may have already allocated in a timely manner. The connection not having been fully established doesn't preclude the peer having allocated state.

But if the goal is to help the peer dispose of any state, arguably we might be able to use a shorter timeout? Like, I think you mean we should allow the Connecting to yield a CONNECTION_CLOSE message, but once that is done we probably don't need to wait for anything -- so a 250-500ms timeout might be sufficient for Connecting instances?

Ralith (Collaborator) commented Feb 2, 2025

Why would we be able to use a shorter timeout? As far as I can tell, a client connection with a handshake in progress isn't in any way special as far as lifetime/connection-level resource concerns go.
