Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't resend envelopes that get rejected by the server #3938

Merged
merged 5 commits into from
Feb 9, 2025
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
### Fixes

- OTel activities that are marked as not recorded are no longer sent to Sentry ([#3890](https://github.com/getsentry/sentry-dotnet/pull/3890))
- Fixed envelopes with oversized attachments getting stuck in __processing ([#3938](https://github.com/getsentry/sentry-dotnet/pull/3938))

## 5.1.0

Expand Down
1 change: 1 addition & 0 deletions src/Sentry/Internal/DiscardReason.cs
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ namespace Sentry.Internal;
{
// See https://develop.sentry.dev/sdk/client-reports/ for list
public static DiscardReason BeforeSend = new("before_send");
public static DiscardReason BufferOverflow = new("buffer_overflow");
public static DiscardReason CacheOverflow = new("cache_overflow");
public static DiscardReason EventProcessor = new("event_processor");
public static DiscardReason NetworkError = new("network_error");
Expand Down
20 changes: 18 additions & 2 deletions src/Sentry/Internal/Http/CachingTransport.cs
Original file line number Diff line number Diff line change
Expand Up @@ -300,13 +300,22 @@ private async Task ProcessCacheAsync(CancellationToken cancellation)
}
}

private static bool IsNetworkError(Exception exception) =>
private static bool IsNetworkUnavailableError(Exception exception) =>
exception switch
{
// TODO: Could we make this more specific? Lots of these errors are unrelated to network availability.
HttpRequestException or WebException or IOException or SocketException => true,
jamescrosswell marked this conversation as resolved.
Show resolved Hide resolved
_ => false
};

private static bool IsRejectedByServer(Exception ex)
{
// When envelopes are too big, the server will reset the connection as soon as the maximum size is exceeded
// (it doesn't wait for us to finish sending the whole envelope).
return ex is SocketException { ErrorCode: 32 /* Broken pipe */ }
|| (ex.InnerException is { } innerException && IsRejectedByServer(innerException));
}

private async Task InnerProcessCacheAsync(string file, CancellationToken cancellation)
{
if (_options.NetworkStatusListener is { Online: false } listener)
Expand Down Expand Up @@ -346,8 +355,15 @@ private async Task InnerProcessCacheAsync(string file, CancellationToken cancell
// Let the worker catch, log, wait a bit and retry.
throw;
}
catch (Exception ex) when (IsNetworkError(ex))
catch (Exception ex) when (IsRejectedByServer(ex))
{
_options.ClientReportRecorder.RecordDiscardedEvents(DiscardReason.BufferOverflow, envelope);
_options.LogError(ex, "Failed to send cached envelope: {0}. The envelope is likely too big and will be discarded.", file);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have a specific "too big" client report? It's possible we'll get that from the server though in which case this is fine.

But when debugging data loss on behalf of customers, when we see "send error" it's often hard to know what to do next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just heard that Relay doesn't report anything btw. So we must

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't no. According to the docs:

The following discard reasons are currently defined for discarded_events:

  • queue_overflow: a SDK internal queue (eg: transport queue) overflowed
  • cache_overflow: an SDK internal cache (eg: offline event cache) overflowed
  • buffer_overflow: an SDK internal buffer (eg. breadcrumbs buffer) overflowed
  • ratelimit_backoff: the SDK dropped events because an earlier rate limit instructed the SDK to back off.
  • network_error: events were dropped because of network errors and were not retried.
  • sample_rate: an event was dropped because of the configured sample rate.
  • before_send: an event was dropped in before_send
  • event_processor: an event was dropped by an event processor; may also be used for ignored exceptions / errors
  • send_error: an event was dropped because of an error when sending it (eg: 400 response)
  • internal_sdk_error: an event was dropped due to an internal SDK error (eg: web worker crash)
  • insufficient_data: an event was dropped due to a lack of data in the event (eg: not enough samples in a profile)
  • backpressure: an event was dropped due to downsampling caused by the system being under load

I couldn't see anything that matched up neatly with the situation we're experiencing so I used send_error. Possibly buffer_overflow is better here though?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there wasn't anything better, I went with buffer_overflow

}
catch (Exception ex) when (IsNetworkUnavailableError(ex))
{
// TODO: Envelopes could end up in an infinite loop here. We should consider implementing some
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could end up with a lot of items in queue causing new data to not get a chance to be pushed on mobile platforms. Can the queue be sorted to send new data first until a retry/circuit breaker can be added?

Copy link
Collaborator Author

@jamescrosswell jamescrosswell Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Priority queues (e.g. prioritise error reports over traces) is one for the todo list:

// kind backoff strategy and a retry limit... then drop the envelopes if the limit is exceeded.
if (_options.NetworkStatusListener is PollingNetworkStatusListener pollingListener)
{
pollingListener.Online = false;
Expand Down
37 changes: 37 additions & 0 deletions test/Sentry.Tests/Internals/Http/CachingTransportTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -711,6 +711,43 @@ public async Task TransportResumesWhenNetworkComesBackOnline()
envelopes.Should().NotBeEmpty();
}

[Fact]
public async Task FlushAsync_RejectedByServer_DiscardsEnvelope()
{
// Arrange
var listener = Substitute.For<INetworkStatusListener>();
listener.Online.Returns(_ => true);

using var cacheDirectory = new TempDirectory();
var options = new SentryOptions
{
Dsn = ValidDsn,
DiagnosticLogger = _logger,
Debug = true,
CacheDirectoryPath = cacheDirectory.Path,
NetworkStatusListener = listener,
ClientReportRecorder = Substitute.For<IClientReportRecorder>()
};

using var envelope = Envelope.FromEvent(new SentryEvent());

var innerTransport = Substitute.For<ITransport>();
innerTransport.SendEnvelopeAsync(Arg.Any<Envelope>(), Arg.Any<CancellationToken>())
.Returns(_ => throw new SocketException(32 /* Bad pipe exception */));
await using var transport = CachingTransport.Create(innerTransport, options, startWorker: false);

// Act
await transport.SendEnvelopeAsync(envelope);
await transport.FlushAsync();

// Assert
foreach (var item in envelope.Items)
{
options.ClientReportRecorder.Received(1)
.RecordDiscardedEvent(DiscardReason.BufferOverflow, item.DataCategory);
}
}

[Fact]
public async Task DoesntWriteSentAtHeaderToCacheFile()
{
Expand Down
Loading