GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows for ORC #45425

Open
wants to merge 14 commits into
base: main
from

Conversation

Member

amoeba commented Feb 5, 2025

Rationale for this change

We have two Windows issues and this PR is addressing both:

  1. PyArrow's download_tzdata_on_windows can fail due to TLS issues in certain CI environments.
  2. The Python wheel test infrastructure needs a tzinfo database for ORC, and the automation that fetches it started failing because the upstream URL became invalid.

The two issues are separate; they are addressed in one PR only because they surfaced together during the 19.0.1 release process.

What changes are included in this PR?

  1. Makes download_tzdata_on_windows more robust to TLS errors by attempting to use requests if it's available and falling back to urllib otherwise.
  2. Switches our Windows wheel test infrastructure to grab a tzinfo database from the tzdata package on PyPI instead of from a mirror URL. This should be much more stable for us over time.
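
As an illustration of the first change, here is a minimal sketch of the requests-then-urllib pattern. It is not the code in this PR: the helper name `pick_downloader`, the injectable `import_module` parameter, and the 30-second timeout are assumptions made for the example.

```python
import importlib
from urllib.request import urlopen


def pick_downloader(import_module=importlib.import_module):
    """Return (backend_name, fetch) where fetch(url) returns the body as bytes.

    Hypothetical helper: prefers `requests` when it can be imported and
    falls back to the standard library's urllib otherwise.
    """
    try:
        requests = import_module("requests")
    except ImportError:
        # requests is an optional dependency, so fall back to urllib.
        def fetch_with_urllib(url):
            with urlopen(url) as response:
                return response.read()

        return "urllib", fetch_with_urllib

    def fetch_with_requests(url):
        # The timeout value is an arbitrary choice for this sketch.
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.content

    return "requests", fetch_with_requests
```

Making the import injectable keeps the fallback path testable without having to uninstall requests from the environment.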

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

github-actions bot added the awaiting review label Feb 5, 2025

github-actions bot commented Feb 5, 2025

⚠️ GitHub issue #45295 has been automatically assigned in GitHub to PR creator.

Member Author

amoeba commented Feb 5, 2025

@github-actions crossbow submit wheel-windows-cp39-amd64


github-actions bot commented Feb 5, 2025

Unable to match any tasks for `wheel-windows-cp39-amd64`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/13149682761

github-actions bot added awaiting committer review and removed awaiting review labels Feb 5, 2025
Member Author

amoeba commented Feb 5, 2025

@kou is there a way to test this PR?

github-actions bot added awaiting changes and removed awaiting committer review labels Feb 5, 2025
Member

kou commented Feb 5, 2025

@github-actions crossbow submit wheel-windows-cp39-amd64


github-actions bot commented Feb 5, 2025

Unable to match any tasks for `wheel-windows-cp39-amd64`
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/13149993032

Member

kou commented Feb 5, 2025

@github-actions crossbow submit wheel-windows-*


github-actions bot commented Feb 5, 2025

Revision: 29467e2

Submitted crossbow builds: ursacomputing/crossbow @ actions-435df0c2a2

Task Status
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp39-cp39-amd64 GitHub Actions

github-actions bot added awaiting change review and removed awaiting changes labels Feb 5, 2025
Member Author

amoeba commented Feb 5, 2025

Thanks for getting the jobs running, I'll check on them tomorrow.

github-actions bot added awaiting changes and removed awaiting change review labels Feb 5, 2025
Member

@raulcd raulcd left a comment

This seems to be working, and we are back to the original issue: a specific Arrow test that downloads tzdata is failing:

    def test_download_tzdata_on_windows():
        tzdata_path = os.path.expandvars(r"%USERPROFILE%\Downloads\tzdata")
    
        # Download timezone database and remove data in case it already exists
        if (os.path.exists(tzdata_path)):
            shutil.rmtree(tzdata_path)
>       download_tzdata_on_windows()
    ...
    with urlopen('https://data.iana.org/time-zones/tzdata-latest.tar.gz') as response:

Is the function even necessary?
I understand it is a utility to download tzdata, but I am not sure we want to provide it if users can just use importlib.resources to get the data from the tzdata package.
Should we just remove the utility function?
@AlenkaF @jorisvandenbossche thoughts?
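
For context, locating the database shipped by the PyPI `tzdata` package with `importlib.resources` could look like the sketch below. The helper name `find_tzdata_zoneinfo` is hypothetical, the code needs Python 3.9 or later for `importlib.resources.files`, and it only finds the files; whether Arrow's C++ timezone code can read that layout is a separate question.

```python
from importlib import resources


def find_tzdata_zoneinfo():
    """Return the zoneinfo directory shipped by the tzdata package, or None.

    Hypothetical helper: returns None when the tzdata package is not
    installed instead of raising.
    """
    try:
        root = resources.files("tzdata")
    except ModuleNotFoundError:
        return None
    # The tzdata package keeps its compiled zone files under "zoneinfo".
    return root / "zoneinfo"
```

A caller would treat a `None` result as "tzdata is not installed" and fall back to another source for the database.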

ci/scripts/python_wheel_windows_test.bat (outdated, resolved)

github-actions bot commented Feb 6, 2025

Revision: 1d715f9

Submitted crossbow builds: ursacomputing/crossbow @ actions-4b3b781c2e

Task Status
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp39-cp39-amd64 GitHub Actions

Member Author

amoeba commented Feb 6, 2025

@github-actions crossbow submit wheel-windows-*


github-actions bot commented Feb 6, 2025

Revision: 84ab16e

Submitted crossbow builds: ursacomputing/crossbow @ actions-715daf418e

Task Status
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp39-cp39-amd64 GitHub Actions

Member

@raulcd raulcd left a comment

Thanks @kou and @amoeba! This is a great find! The failure on AppVeyor is related to this change, and to the concern I have with it. I think we should fall back to urllib in case of an ImportError.

python/pyarrow/util.py (outdated, resolved)
github-actions bot added awaiting changes and removed awaiting change review labels Feb 6, 2025
github-actions bot added awaiting change review and removed awaiting changes labels Feb 6, 2025
Member Author

amoeba commented Feb 6, 2025

@github-actions crossbow submit wheel-windows-*


github-actions bot commented Feb 6, 2025

Revision: 459376c

Submitted crossbow builds: ursacomputing/crossbow @ actions-b4f931ecfe

Task Status
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp39-cp39-amd64 GitHub Actions

Member Author

amoeba commented Feb 7, 2025

@github-actions crossbow submit wheel-windows-*

Member Author

amoeba commented Feb 7, 2025

The last round of crossbow job failures looks to be due to an issue with GitHub; I'm seeing it on other PRs too. I re-ran the jobs and will keep an eye on them.

@raulcd whenever you have a moment, could you please look at my recent change to util.py?


github-actions bot commented Feb 7, 2025

Revision: 459376c

Submitted crossbow builds: ursacomputing/crossbow @ actions-c246a1f639

Task Status
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp39-cp39-amd64 GitHub Actions

Member

@jorisvandenbossche jorisvandenbossche left a comment

@amoeba thanks for looking into this!

Apologies for the slow reply; an earlier answer might have saved you some time figuring this out. As you summarize in #45425 (comment), our understanding has been that we unfortunately cannot use the tzdata shipped in the Python package. Supporting that format would require changes upstream in https://github.com/HowardHinnant/date.
See #31472 for the issue about it.

@amoeba could you update the title and top comment description of the PR, as I think it no longer reflects the change entirely?

python/pyarrow/util.py (outdated, resolved)
python/pyarrow/tests/conftest.py (outdated, resolved)
ci/scripts/python_wheel_windows_test.bat (resolved)
github-actions bot added awaiting changes and removed awaiting change review labels Feb 7, 2025
@jorisvandenbossche
Member

I also see in the wheel build logs the following message (https://github.com/ursacomputing/crossbow/actions/runs/13191501020/job/36825167067#step:9:1122):

SKIPPED [1] Python310\lib\site-packages\pyarrow\tests\test_compute.py:2301: Timezone database is not installed on Windows

so I'm wondering what is going wrong (or are we not downloading the tzdata files in the wheel builds?)
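
One way to narrow this down is to check, on the CI machine, whether the directory the tests expect actually exists and has content. The sketch below assumes the `%USERPROFILE%\Downloads\tzdata` location used by `test_download_tzdata_on_windows`; the helper name and the "non-empty directory" criterion are assumptions, not PyArrow's actual check.

```python
import os


def tzdata_dir_has_content(path=None):
    """Return True if `path` is an existing, non-empty directory.

    Hypothetical helper; the default path mirrors the location used in
    test_download_tzdata_on_windows and is only meaningful on Windows.
    """
    if path is None:
        path = os.path.expandvars(r"%USERPROFILE%\Downloads\tzdata")
    return os.path.isdir(path) and len(os.listdir(path)) > 0
```

Running this as a step just before the test suite would show whether the skip comes from a missing download or from the tests looking in a different place.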

Co-authored-by: Joris Van den Bossche <[email protected]>
github-actions bot added awaiting change review and removed awaiting changes labels Feb 7, 2025
amoeba changed the title from "GH-45295: [Python][CI] Use tzdata package to get tzinfo database when testing Windows wheels" to "GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows" Feb 7, 2025
amoeba changed the title from "GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows" to "GH-45295: [Python][CI] Make download_tzdata_on_windows more robust and use tzdata package for tzinfo database on Windows for ORC" Feb 7, 2025
Member Author

amoeba commented Feb 7, 2025

Thanks for the review @jorisvandenbossche. Good catch on the message in CI. I'm not sure yet but will investigate.

5 participants