Minor adjustments to usage of alarm(2) timer in tests #5292

jhendersonHDF · 2025-02-04T23:11:55Z

Make alarm(2) timer per-test program rather than per-subtest

Avoid enabling alarm(2) timer when TestExpress is set to 0

jhendersonHDF · 2025-02-04T23:13:54Z

test/h5test.h

@@ -204,7 +204,7 @@ H5TEST_DLLVAR MPI_Info h5_io_info_g; /* MPI INFO object for IO */

 /* Macros for the different TestExpress levels for expediting tests */
 #define H5_TEST_EXPRESS_EXHAUSTIVE 0 /** Exhaustive run; tests should take as long as necessary */
-#define H5_TEST_EXPRESS_FULL       1 /** Full run; tests should take no more than 30 minutes    */


This is just a documentation cleanup. The intention for a "full" run may have been 30 minutes at some point, but the current timeout value for both ctest and for the alarm timer is 20 minutes by default, so a "full" run will always be kneecapped at 20 minutes anyway, unless the values are overridden.

jhendersonHDF · 2025-02-04T23:27:10Z

test/testframe.c

@@ -461,10 +467,6 @@ PerformTests(void)
        MESSAGE(2, ("Testing  -- %s (%s) \n", TestArray[Loop].Description, TestArray[Loop].Name));
        MESSAGE(5, ("===============================================\n"));

-        if (TestAlarmOn() < 0)
-            MESSAGE(5, ("Couldn't enable test alarm timer for test -- %s (%s) \n",


The alarm timer set for tests has always (?) been per-subtest rather than per-test program, presumably to catch a particular hanging subtest while allowing the rest of the tests to run. While this may be helpful for Autotools, which has no real built-in mechanism for restricting the runtime of tests in general, it effectively works against ctest's built-in timeout in CMake builds. If the alarm timer is set to its default 20 minutes, the test program is going to be killed by ctest before a hanging test even hits this timer's timeout value in the first place. In addition, the HPC systems used for testing often impose a ~30 minute limit on jobs in the queue where testing is usually run, so an Autotools test run would be killed shortly after a single hanging test anyway.

This changes the logic so that the alarm timer is per-test program and is turned on almost immediately on test program start (in TestInit()), so that there's at least a reasonable chance of reacting to testing timeouts in the future. This won't allow for continuing to run tests beyond one that hangs, but we really need functionality beyond what alarm(2) offers to do that nicely in CMake, unless we simply disable testing timeouts there.
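To make the difference between the two arming strategies concrete, here is a minimal sketch. Every `sketch_*` name and the 1200-second constant are illustrative assumptions and not the actual testframe.c code; only alarm(2) itself is a real call.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed value for illustration: the 20-minute default discussed above */
#define SKETCH_ALARM_SECONDS (20u * 60u)

typedef void (*sketch_subtest_fn)(void);

/* Hypothetical subtest that only records that it ran */
static int sketch_subtests_run = 0;

static void
sketch_noop_subtest(void)
{
    sketch_subtests_run++;
}

/* Per-subtest style (hypothetical): re-arm the timer before every
 * subtest, so each subtest gets its own full timeout window. */
static int
sketch_run_per_subtest(const sketch_subtest_fn *tests, size_t ntests, unsigned seconds)
{
    size_t i;

    assert(tests != NULL);
    for (i = 0; i < ntests; i++) {
        alarm(seconds); /* replaces any previously pending alarm */
        tests[i]();
    }
    alarm(0); /* cancel the timer left over from the last subtest */
    return (int)ntests;
}

/* Per-program style (hypothetical): arm the timer once, early in
 * start-up, and let that single deadline cover every subtest. */
static int
sketch_run_per_program(const sketch_subtest_fn *tests, size_t ntests, unsigned seconds)
{
    size_t i;

    assert(tests != NULL);
    alarm(seconds); /* stays armed until the program exits or cancels it */
    for (i = 0; i < ntests; i++)
        tests[i]();
    return (int)ntests;
}
```

With the per-subtest variant the total runtime is bounded only by the number of subtests times the timeout, while the per-program variant bounds the whole run by a single timeout. That is the trade-off described above.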

mattjala · 2025-02-05

> If the alarm timer is set to its default 20 minutes, the test program is going to be killed by ctest before a hanging test even hits this timer's timeout value in the first place.

Looking at the cmake configuration, it seems like the cmake timeouts are derived from the alarm timer value. Is this in reference to CMake starting a count to the same timeout value slightly earlier than the alarm (in the current per-test implementation) and always pre-empting it?

> This won't allow for continuing to run tests beyond one that hangs, but we really need functionality beyond what alarm(2) offers to do that nicely in CMake, unless we simply disable testing timeouts there.

The ctest timeout documentation says that moving on to the next test after a timeout is already how it behaves. Are you talking about problems with continuing beyond hangs in entire test programs?

jhendersonHDF · 2025-02-05

> Looking at the cmake configuration, it seems like the cmake timeouts are derived from the alarm timer value.

There may be some historical precedent for this, but the ctest timer and the alarm timer are separate from each other. Because the ctest timer is already running before the test program starts, and both timers currently use the same timeout value in seconds, the ctest timer will generally expire before the alarm timer does.

> Are you talking about problems with continuing beyond hangs in entire test programs?

Yes, this refers to continuing to run the other subtests in a test program after one of them hangs.
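The timing argument earlier in this reply can be written out as simple arithmetic. This is only a sketch under stated assumptions: all times are in seconds since ctest launches the test program, ctest's clock is assumed to start at launch, and the `sketch_*` helpers are hypothetical names invented for illustration.

```c
#include <assert.h>

/* All times are seconds measured from the moment ctest launches
 * the test program (assumption made for this sketch). */

/* Deadline of ctest's own timeout: its clock starts at launch. */
static long
sketch_ctest_deadline(long timeout)
{
    return timeout;
}

/* Deadline of an alarm(2) timer armed `armed_at` seconds after launch. */
static long
sketch_alarm_deadline(long armed_at, long timeout)
{
    return armed_at + timeout;
}

/* Returns 1 when ctest's timeout expires no later than the alarm. */
static int
sketch_ctest_fires_first(long armed_at, long timeout)
{
    return sketch_ctest_deadline(timeout) <= sketch_alarm_deadline(armed_at, timeout);
}
```

With equal timeouts, any non-negative arming delay means ctest expires first. A per-subtest alarm armed, say, 300 seconds into the run would not fire until 1500 seconds, well after a 1200-second ctest timeout.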

jhendersonHDF · 2025-02-04T23:30:28Z

test/testframe.c

+     * tests to run for as long as necessary, so avoid enabling an
+     * alarm-style timer here that would, by default, kill the test.
+     */
+    if (GetTestExpress() == H5_TEST_EXPRESS_EXHAUSTIVE)


It's generally handy to be able to perform manual runs of long-running tests, such as multi-threaded ones, while still allowing them to be restricted by the alarm(2) timer in CI. This just avoids setting any timer at all if TestExpress is set to 0. While this allows a test to run forever if it hangs, there's no reliable way to predict how long a timer would need to be to catch hangs in these cases.
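As a rough sketch of that guard: the function name, the `SKETCH_*` macros, and the return convention below are assumptions made for illustration and do not reproduce the code in testframe.c. The value 0 mirrors the exhaustive TestExpress level.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-ins for the TestExpress level and default timeout */
#define SKETCH_EXPRESS_EXHAUSTIVE 0
#define SKETCH_ALARM_SECONDS      (20u * 60u)

/* Arm the program-wide timer unless an exhaustive run was requested.
 * Returns 1 if a timer was armed and 0 if it was skipped. */
static int
sketch_maybe_arm_alarm(int express_level, unsigned seconds)
{
    /* Exhaustive runs may take as long as necessary, so no timer */
    if (express_level == SKETCH_EXPRESS_EXHAUSTIVE)
        return 0;

    alarm(seconds); /* default SIGALRM action terminates the process */
    return 1;
}
```

A manual run at level 0 then never receives SIGALRM from this code path, while a CI run at any other level is still bounded by the timer.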

qkoziol

Looks reasonable

jhendersonHDF added the labels Priority - 3. Low 🔽 (Code cleanup, small feature change requests, etc.), Component - Testing (Code in test or testpar directories, GitHub workflows), and Type - Improvement (Improvements that don't add a new feature or functionality) on Feb 4, 2025

jhendersonHDF requested review from lrknox, derobins, byrnHDF, fortnern, qkoziol, vchoi-hdfgroup, bmribler, glennsong09, mattjala and brtnfld as code owners February 4, 2025 23:11

Minor adjustments to usage of alarm(2) timer in tests

55d094a

jhendersonHDF force-pushed the test_alarm_changes branch from b5a4e29 to 55d094a Compare February 4, 2025 23:13

mattjala approved these changes Feb 5, 2025

qkoziol approved these changes Feb 5, 2025

lrknox approved these changes Feb 5, 2025

lrknox merged commit 354994a into HDFGroup:develop Feb 5, 2025
76 checks passed
