Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor adjustments to usage of alarm(2) timer in tests #5292

Merged
merged 1 commit into from
Feb 5, 2025

Conversation

jhendersonHDF
Copy link
Collaborator

Make alarm(2) timer per-test program rather than per-subtest

Avoid enabling alarm(2) timer when TestExpress is set to 0

@jhendersonHDF jhendersonHDF added Priority - 3. Low 🔽 Code cleanup, small feature change requests, etc. Component - Testing Code in test or testpar directories, GitHub workflows Type - Improvement Improvements that don't add a new feature or functionality labels Feb 4, 2025
Make alarm(2) timer per-test program rather than per-subtest

Avoid enabling alarm(2) timer when TestExpress is set to 0
@@ -204,7 +204,7 @@ H5TEST_DLLVAR MPI_Info h5_io_info_g; /* MPI INFO object for IO */

/* Macros for the different TestExpress levels for expediting tests */
#define H5_TEST_EXPRESS_EXHAUSTIVE 0 /** Exhaustive run; tests should take as long as necessary */
#define H5_TEST_EXPRESS_FULL 1 /** Full run; tests should take no more than 30 minutes */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a documentation cleanup. The intention for a "full" run may have been 30 minutes at some point, but the current timeout value for both ctest and for the alarm timer is 20 minutes by default, so a "full" run will always be kneecapped at 20 minutes anyway, unless the values are overridden.

@@ -461,10 +467,6 @@ PerformTests(void)
MESSAGE(2, ("Testing -- %s (%s) \n", TestArray[Loop].Description, TestArray[Loop].Name));
MESSAGE(5, ("===============================================\n"));

if (TestAlarmOn() < 0)
MESSAGE(5, ("Couldn't enable test alarm timer for test -- %s (%s) \n",
Copy link
Collaborator Author

@jhendersonHDF jhendersonHDF Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alarm timer set for tests has always (?) been per-subtest rather than per-test program, presumably to catch a particular hanging sub-test while allowing the rest of the tests to run. While this may be helpful for Autotools, where there's no real built-in mechanism for restricting the runtime of tests in general, it effectively goes against CMake's built-in timeout value in ctest. If the alarm timer is set to its default 20 minutes, the test program is going to be killed by ctest before a hanging test even hits this timer's timeout value in the first place. Adding to that, the HPC systems that get tested on often have a ~30 minute limit on jobs in the job queue where testing is usually run, meaning that an Autotools test run would get killed shortly after a single hanging test anyway.

This changes the logic so that the alarm timer is per-test program and is turned on almost immediately on test program start (in TestInit()) so that there's at least a reasonable ability of reacting to testing timeouts in the future. This won't allow for continuing to run tests beyond one that hangs, but we really need functionality beyond what alarm(2) offers to do that nicely in CMake, unless we simply disable testing timeouts there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the alarm timer is set to its default 20 minutes, the test program is going to be killed by ctest before a hanging test even hits this timer's timeout value in the first place.

Looking at the cmake configuration, it seems like the cmake timeouts are derived from the alarm timer value. Is this in reference to CMake starting a count to the same timeout value slightly earlier than the alarm (in the current per-test implementation) and always pre-empting it?

This won't allow for continuing to run tests beyond one that hangs, but we really need functionality beyond what alarm(2) offers to do that nicely in CMake, unless we simply disable testing timeouts there.

The ctest timeout documentation says that moving on to the next test after a timeout is already how it behaves. Are you talking about problems with continuing beyond hangs in entire test programs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the cmake configuration, it seems like the cmake timeouts are derived from the alarm timer value.

There may be some historical precedent for this, but the ctest timer and alarm timer are separate from each other. It's generally likely that, since the ctest timer will be active before the program runs, the ctest timer will expire before the alarm timer since they currently have the same timeout value in seconds.

Are you talking about problems with continuing beyond hangs in entire test programs?

Yes this is referring to continuing to run other subtests in a test program.

* tests to run for as long as necessary, so avoid enabling an
* alarm-style timer here that would, by default, kill the test.
*/
if (GetTestExpress() == H5_TEST_EXPRESS_EXHAUSTIVE)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally handy to be able to perform manual runs of long-running tests like multi-threaded ones, while still allowing them to be restricted by the alarm(2) timer for CI. This just avoids setting any timer at all if TestExpress is set to 0. While this allows a test to run forever if it hangs, there's no particular way of predicting how long of a timer to set to catch hangs in these cases.

Copy link
Contributor

@qkoziol qkoziol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable

@lrknox lrknox merged commit 354994a into HDFGroup:develop Feb 5, 2025
76 checks passed
jhendersonHDF added a commit to LifeboatLLC/hdf5_lifeboat that referenced this pull request Feb 7, 2025
Make alarm(2) timer per-test program rather than per-subtest

Avoid enabling alarm(2) timer when TestExpress is set to 0
jhendersonHDF added a commit to LifeboatLLC/hdf5_lifeboat that referenced this pull request Feb 7, 2025
Make alarm(2) timer per-test program rather than per-subtest

Avoid enabling alarm(2) timer when TestExpress is set to 0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component - Testing Code in test or testpar directories, GitHub workflows Priority - 3. Low 🔽 Code cleanup, small feature change requests, etc. Type - Improvement Improvements that don't add a new feature or functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants