-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Windows CI, fixing tests on Windows, test robustness #825
base: master
Are you sure you want to change the base?
Conversation
@@ -12,39 +12,35 @@ jobs: | |||
strategy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating and simplifying this. Dropping 2.3 as it's EOL and not available in the CI images.
@@ -138,11 +138,13 @@ def test_raises_when_writing_invalid_entries | |||
|
|||
def test_can_write_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an interesting test failure. What was happening here is it accidentally working. The procedure adds two new index entries with identical OIDs. Then, when checking for the order of insertion we sorted by OID. The stability of this sort depends on too many factors, and was in fact in a different order on Windows. Quick fix is create the second index entry with an OID higher than the first.
@@ -554,12 +550,15 @@ def test_clone_bare | |||
end | |||
|
|||
def test_clone_with_transfer_progress_callback | |||
skip 'Callback does not work with filesystem clone on Windows' if Gem.win_platform? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be some sort of difference with Windows where callbacks will not trigger unless a buffer hits a certain size. Other clone tests pass, so we know that cloning actually works, it's only the callback which doesn't trigger.
@@ -804,6 +802,11 @@ def verify_subtrees | |||
end | |||
|
|||
def test_checkout_tree_with_commit | |||
# the test repo has an unclean status. apparently libgit2 on *nix does not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another interesting thing I don't know how to resolve. The libgit2 test repo has modifications initially, confirmed on linux and windows. Apparently on *nix checking out another pointer isn't an issue here, on Windows it is for some reason.
# there are race conditions on Windows where the original test run | ||
# did not release handles to the temporary directories. | ||
# waiting until all tests finish works successfully | ||
Minitest.after_run { Rugged::TestCase::FixtureRepo.teardown } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The largest issue I kept running into with these tests is something about the Windows libgit2 was leaving open file handles to the temporary directories created. It appears the handles close when the test suite finishes. Therefore I think the best thing to do is clean up everything at the end of the suite instead of inline.
@@ -524,11 +518,13 @@ def test_init_with_wrong_argument | |||
class RepositoryCloneTest < Rugged::TestCase | |||
def setup | |||
@tmppath = Dir.mktmpdir | |||
@source_path = "file://" + File.join(Rugged::TestCase::TEST_DIR, 'fixtures', 'testrepo.git') | |||
end | |||
Rugged::TestCase::FixtureRepo.ensure_cleanup @tmppath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making all temporary directory removal go through the same procedure for working around the open file handle issue mentioned later.
@@ -31,5 +31,8 @@ desc | |||
s.add_development_dependency "rake-compiler", ">= 0.9.0" | |||
s.add_development_dependency "pry" | |||
s.add_development_dependency "minitest", "~> 5.0" | |||
s.add_development_dependency "minitest-reporters" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added for junit xml output. github actions does not support this at this time.
@@ -54,7 +54,13 @@ def test_fetch_over_https_with_certificate_callback_fail | |||
}) | |||
end | |||
|
|||
assert_equal "user rejected certificate for github.com", exception.message | |||
response_message = if Gem.win_platform? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the windows response is actually more correct to the scenario. strange how we get different messages for the same action.
@@ -124,6 +124,7 @@ def test_remote_lookup_invalid | |||
|
|||
class RemotePushTest < Rugged::TestCase | |||
def setup | |||
skip 'local files and file:// protocol handled inconsistently with libgit2 on windows' if Gem.win_platform? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm already several layers of yak shaving doing what I originally wanted so I don't have the effort looking into this. This is worthy of someone looking into the libgit2 source and see what's up there. Some situations file://
works and is necessary, others it's rejected by the kernel with an invalid path name.
Looks like linux builds fail for reasons that are not our own. Appears github is transitioning ubuntu repos from ubuntu.com to microsoft.com |
Pinging @carlosmn @tenderlove for visibility |
@carlosmn @tenderlove beep beep |
@carlosmn @tenderlove hello |
Short story: I got this working
This was a journey. Hear my tale.
What started me on this saga was starting work on sonic pi. It depends on this library. Sonic PI's build instructions imply Rugged is the most fragile part in particular on Windows. Looking here I noticed Rugged does not have any Windows CI. This is important due to how fragile the RubyInstaller and msys2 environments are regarding C extensions. It's no fun doing
gem install rugged
which fails with no discernible fix from the user. Lets make sure that doesn't happen.Attempt 1: Github Actions
I tried really, really hard to make this work. Short answer is the Windows images produced by Github are unpredictable and do not have any means of remote access for figuring them out.
Longer answer
The most difficult part of dealing with the Github provided images is getting to the bottom of layers of abstraction. There are definite issues regarding the installation of Ruby on the Windows images. I could not get to the bottom of it.
The recommended way is use the setup-ruby action. This works fine on *nix images. I guess it works fine on Windows images in the fact that the requested version is present. Using it is a matter of frustration.
The most frustrating and baffling thing about these images comes from the unpredictability on which tools run and how performing actions does not apparently affect anything.
For example, the last run where I tried and gave up, I discovered the msys2 system changes from underneath the Rake task that builds the gem and runs tests. You can see this here. Drawing attention to a few lines.
First the build step uses an outdated version of GCC even though earlier in the process I specifically updated the msys2 system which absolutely installed GCC 9.
Further, the actual build step of the gem uses a different tool path in C:\strawberry than the tool which configured the build earlier at C:\hostedtoolcache
This is the point where I gave up. I'm sure it's fixable somehow. There is a 10-15 minute iteration time for understanding these images and there is no remote shell option.
Attempt 2: Appveyor
Appveyor's specialty is it's Windows CI support. I found this true through working with it. I didn't have any troubles with their images. I did have some troubles with Bundler exit non-zero on a harmless warning I worked around. Otherwise it's smooth sailing.
Test breakages
This exposed a few issues with the test suite and potentially with libgit2. I left comments in this PR on where and why I changed things.
Further work
Included is a working appveyor config file. A maintainer of this repo must connect appveyor so it picks up on the reporting status.
I'm curious on the source of the failures I skipped over in the tests. It seems like they're within the libgit2 library and not the C binding code.
I'm also wondering about the open file handles after a test completes. On *nix they're definitely closed, on Windows they still linger until after the suite finishes. I don't feel like it affects day to day usage though.