Gmail connector fails to index on malformed date strings #3921

Open
sam-w opened this issue Feb 6, 2025 · 0 comments

Email is not great - it's pretty common to see malformed date strings of the form <datetime> +<tz offset> (<invalid tz name>) in the Date header field, as returned by the Gmail API.

Some concrete examples from our Gmail:

  • Thu, 23 Jan 2025 11:04:48 +0300 (+03), from a DMARC report
  • Sat, 25 Jan 2025 05:15:30 +1100 (AUSNSW), from an automated invoice sent by a partner's legacy system

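These cause Gmail connector indexing to fail at the point where the Date header is parsed. dateutil.parser.parse can't parse those strings because it expects the timezone name (in parentheses) to be at most 5 characters long and composed only of uppercase ASCII letters. Neither example qualifies: "+03" is not alphabetic, and "AUSNSW" is 6 characters.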

The Gmail connector needs some way to handle these almost-ok-but-still-invalid date strings.
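
One possible approach, sketched below, is to drop the trailing parenthesized part before parsing, since the numeric offset already carries the timezone information. This is only a minimal illustration, not the connector's actual code: parse_email_date and _TRAILING_COMMENT are hypothetical names, and it uses the standard library's email.utils.parsedate_to_datetime so that the snippet is self-contained. The same cleaned string could presumably be passed to the existing dateutil call instead.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime

# Matches one or more trailing parenthesized chunks, e.g. " (+03)" or " (AUSNSW)".
_TRAILING_COMMENT = re.compile(r"(\s*\([^()]*\))+\s*$")


def parse_email_date(raw: str) -> datetime:
    """Hypothetical helper: strip trailing "(...)" text, then parse the rest.

    Assumes the numeric UTC offset (e.g. "+0300") is present and is the
    authoritative timezone information; the parenthesized name is discarded.
    """
    cleaned = _TRAILING_COMMENT.sub("", raw).strip()
    return parsedate_to_datetime(cleaned)


if __name__ == "__main__":
    for header in (
        "Thu, 23 Jan 2025 11:04:48 +0300 (+03)",
        "Sat, 25 Jan 2025 05:15:30 +1100 (AUSNSW)",
    ):
        print(parse_email_date(header).isoformat())
```

Stripping rather than interpreting the parenthesized name avoids guessing what a non-standard label such as "AUSNSW" means. A string that is still unparseable after stripping would raise as before, so the connector would still need to decide whether to skip the message or fall back to another timestamp.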
