-
-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor adjustments to usage of alarm(2) timer in tests #5292
Conversation
Make alarm(2) timer per-test program rather than per-subtest Avoid enabling alarm(2) timer when TestExpress is set to 0
b5a4e29
to
55d094a
Compare
@@ -204,7 +204,7 @@ H5TEST_DLLVAR MPI_Info h5_io_info_g; /* MPI INFO object for IO */ | |||
|
|||
/* Macros for the different TestExpress levels for expediting tests */ | |||
#define H5_TEST_EXPRESS_EXHAUSTIVE 0 /** Exhaustive run; tests should take as long as necessary */ | |||
#define H5_TEST_EXPRESS_FULL 1 /** Full run; tests should take no more than 30 minutes */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a documentation cleanup. The intention for a "full" run may have been 30 minutes at some point, but the current timeout value for both ctest
and for the alarm timer is 20 minutes by default, so a "full" run will always be kneecapped at 20 minutes anyway, unless the values are overridden.
@@ -461,10 +467,6 @@ PerformTests(void) | |||
MESSAGE(2, ("Testing -- %s (%s) \n", TestArray[Loop].Description, TestArray[Loop].Name)); | |||
MESSAGE(5, ("===============================================\n")); | |||
|
|||
if (TestAlarmOn() < 0) | |||
MESSAGE(5, ("Couldn't enable test alarm timer for test -- %s (%s) \n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alarm timer set for tests has always (?) been per-subtest rather than per-test program, presumably to catch a particular hanging sub-test while allowing the rest of the tests to run. While this may be helpful for Autotools, where there's no real built-in mechanism for restricting the runtime of tests in general, it effectively goes against CMake's built-in timeout value in ctest
. If the alarm timer is set to its default 20 minutes, the test program is going to be killed by ctest
before a hanging test even hits this timer's timeout value in the first place. Adding to that, the HPC systems that get tested on often have a ~30 minute limit on jobs in the job queue where testing is usually run, meaning that an Autotools test run would get killed shortly after a single hanging test anyway.
This changes the logic so that the alarm timer is per-test program and is turned on almost immediately on test program start (in TestInit()
) so that there's at least a reasonable ability of reacting to testing timeouts in the future. This won't allow for continuing to run tests beyond one that hangs, but we really need functionality beyond what alarm(2)
offers to do that nicely in CMake, unless we simply disable testing timeouts there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the alarm timer is set to its default 20 minutes, the test program is going to be killed by ctest before a hanging test even hits this timer's timeout value in the first place.
Looking at the cmake configuration, it seems like the cmake timeouts are derived from the alarm timer value. Is this in reference to CMake starting a count to the same timeout value slightly earlier than the alarm (in the current per-test implementation) and always pre-empting it?
This won't allow for continuing to run tests beyond one that hangs, but we really need functionality beyond what alarm(2) offers to do that nicely in CMake, unless we simply disable testing timeouts there.
The ctest timeout documentation says that moving on to the next test after a timeout is already how it behaves. Are you talking about problems with continuing beyond hangs in entire test programs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the cmake configuration, it seems like the cmake timeouts are derived from the alarm timer value.
There may be some historical precedent for this, but the ctest timer and alarm timer are separate from each other. It's generally likely that, since the ctest timer will be active before the program runs, the ctest timer will expire before the alarm timer since they currently have the same timeout value in seconds.
Are you talking about problems with continuing beyond hangs in entire test programs?
Yes this is referring to continuing to run other subtests in a test program.
* tests to run for as long as necessary, so avoid enabling an | ||
* alarm-style timer here that would, by default, kill the test. | ||
*/ | ||
if (GetTestExpress() == H5_TEST_EXPRESS_EXHAUSTIVE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's generally handy to be able to perform manual runs of long-running tests like multi-threaded ones, while still allowing them to be restricted by the alarm(2)
timer for CI. This just avoids setting any timer at all if TestExpress is set to 0. While this allows a test to run forever if it hangs, there's no particular way of predicting how long of a timer to set to catch hangs in these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable
Make alarm(2) timer per-test program rather than per-subtest Avoid enabling alarm(2) timer when TestExpress is set to 0
Make alarm(2) timer per-test program rather than per-subtest Avoid enabling alarm(2) timer when TestExpress is set to 0
Make alarm(2) timer per-test program rather than per-subtest
Avoid enabling alarm(2) timer when TestExpress is set to 0