drivers:timer:jh7110: prevent potential race #403

Willmish · 2025-04-14T04:07:18Z

While writing the zcu102 timer driver (which also uses two 32-bit counters), I noticed that we might have a race condition in this driver. The following interleaving would have produced a wrong time reading when processing timeouts, off by up to 2^32-1 ticks in the future (potentially triggering multiple timeouts falsely):

1. IRQ for TIMEOUT counter
2. enter `get_ticks_in_ns()`
3. store `value_l` as the value read from the other counter's register (not the TIMEOUT one)
4. store `value_h` as read from the `counter_timer_elapses` global
5. **!! IRQ of OVERFLOW for the counter !! (pending, not handled)**
6. check if IRQ pending for OVERFLOW counter
   1. yes -> increment `value_h`. But `value_l` was read before the overflow!
7. now the new `value_h` and the old `value_l` give a value in the future, off by up to $2^{32}-1$ ticks!

After the fix, the same scenario does not have an issue:

1. IRQ for TIMEOUT counter
2. enter `get_ticks_in_ns()`
3. store `value_l` as the value read from the other counter's register (not the TIMEOUT one)
4. store `value_h` as read from the `counter_timer_elapses` global
5. **!! IRQ of OVERFLOW for the counter !! (pending, not handled)**
6. check if IRQ pending for OVERFLOW counter
   1. yes -> increment `value_h`. **UPDATE `value_l` with the most recent reading from the overflowed counter.**
7. now the new `value_h` and the new `value_l` give a correct value for the time
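
The two interleavings above can be reproduced in a small host-side simulation. This is only a sketch under stated assumptions: `struct sim_timer`, `sim_advance()` and `sim_get_ticks()` are hypothetical names that do not exist in the driver, and the model counts up, whereas the real driver derives `value_l` from `STARFIVE_TIMER_MAX_TICKS - counter_regs->value`. Only `value_l`, `value_h` and the role of `counter_timer_elapses` are taken from the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one free-running 32-bit counter (not driver code). */
struct sim_timer {
    uint32_t hw_count;   /* models the elapsed-ticks value read from the counter */
    bool irq_pending;    /* models the pending, not yet handled OVERFLOW IRQ */
    uint32_t overflows;  /* models the counter_timer_elapses global */
};

/* Let `ticks` ticks pass; a wrap-around raises the pending overflow IRQ. */
static void sim_advance(struct sim_timer *t, uint32_t ticks)
{
    uint32_t before = t->hw_count;
    t->hw_count += ticks;
    if (t->hw_count < before) {
        t->irq_pending = true;
    }
}

/*
 * Compose a 64-bit tick count following the steps in the description.
 * `gap` ticks elapse between reading value_l and checking for the pending
 * IRQ. `reread_low` selects the fixed behaviour.
 */
static uint64_t sim_get_ticks(struct sim_timer *t, uint32_t gap, bool reread_low)
{
    uint64_t value_l = t->hw_count;   /* step 3 */
    uint64_t value_h = t->overflows;  /* step 4 */

    sim_advance(t, gap);              /* step 5: the overflow may land here */

    if (t->irq_pending) {             /* step 6 */
        value_h += 1;
        if (reread_low) {
            value_l = t->hw_count;    /* the fix: refresh the low word */
        }
    }
    return (value_h << 32) | value_l; /* step 7 */
}
```

Starting from `hw_count = 0xFFFFFFF0` with a gap of `0x20` ticks, the call with `reread_low = false` combines the incremented high word with the stale low word and lands almost $2^{32}$ ticks ahead of the call with `reread_low = true`.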

I have not yet checked the other timer drivers, but I'm pretty sure they all use 64-bit counters, so we should be fine.

Willmish · 2025-04-14T04:32:18Z

I've managed to trigger a scenario where this could happen (I say "could" because the IRQ might have arrived before the first read of value_l, in which case the current code works), but I had to spam timeout IRQs (every 1 us):

...
CLIENT|INFO: Now the time (in nanoseconds) is: 1592162047041 i: 129000000
CLIENT|INFO: Now the time (in nanoseconds) is: 1604504365750 i: 130000000
potential race condition avoided!
CLIENT|INFO: Now the time (in nanoseconds) is: 1616849453875 i: 131000000
CLIENT|INFO: Now the time (in nanoseconds) is: 1629191819625 i: 132000000
...

The print statement was added here:

diff --git a/drivers/timer/jh7110/timer.c b/drivers/timer/jh7110/timer.c
index 321e8b68..80ae9a89 100644
--- a/drivers/timer/jh7110/timer.c
+++ b/drivers/timer/jh7110/timer.c
@@ -88,6 +88,7 @@ static uint64_t get_ticks_in_ns(void)
     if (counter_regs->intclr == 1) {
         value_h += 1;
         value_l = (uint64_t)(STARFIVE_TIMER_MAX_TICKS - counter_regs->value);
+        sddf_dprintf("potential race condition avoided!\n");
     }

     uint64_t value_ticks = (value_h << 32) | value_l;

Ivan-Velickovic · 2025-04-28T00:36:28Z

Good find, thank you for the detailed write-up.

It would be good to include some kind of assert/check in the example that triggers the race condition in case we have it in any future drivers.

Willmish · 2025-04-30T05:45:02Z

As mentioned when we talked, it might be difficult to trigger it intentionally. It requires a 32-bit counter to overflow, which, depending on the device's clock speed, can take a significant amount of time (~5 min for this driver, ~42 s for the zcu102 driver with its 100 MHz clock), and there is very little chance that the race actually happens: I had it happen once in ~15-20 overflows with constant 1 us timeouts.
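
The ~42 s figure for a 100 MHz clock follows directly from the counter width. A minimal sketch of the arithmetic, where `overflow_period_s()` is a hypothetical helper and not part of any driver:

```c
#include <stdint.h>

/* Seconds until a free-running 32-bit counter wraps at `freq_hz` ticks per second. */
static double overflow_period_s(uint64_t freq_hz)
{
    return 4294967296.0 / (double)freq_hz; /* 2^32 ticks per wrap */
}
```

For `freq_hz = 100000000` this gives about 42.9 s per overflow, so one hit in ~15-20 overflows would mean waiting very roughly 11 to 14 minutes on that board. The jh7110 clock frequency is not stated in this thread, so the ~5 min figure is not recomputed here.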

I suggest we either have a separate test to try and catch it, or have design documentation for these 32-bit counter style drivers?
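
One possible shape for such a check, as a sketch only: since the race yields a reading far in the future, a client can flag any reading that jumps ahead by more than an expected bound, or that goes backwards once the next correct reading arrives. `struct time_check`, `time_check_update()` and `max_step_ns` are hypothetical names, and a suitable bound would depend on how often the client reads the time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client-side state for sanity-checking successive time readings. */
struct time_check {
    uint64_t last_ns;  /* previous reading, valid only if have_last is set */
    bool have_last;
};

/*
 * Record `now_ns` and return true if it is consistent with the previous
 * reading: not earlier than it, and not more than `max_step_ns` after it.
 * The first reading is always accepted.
 */
static bool time_check_update(struct time_check *c, uint64_t now_ns, uint64_t max_step_ns)
{
    bool ok = true;

    if (c->have_last) {
        if (now_ns < c->last_ns) {
            ok = false; /* time went backwards */
        } else if (now_ns - c->last_ns > max_step_ns) {
            ok = false; /* suspicious jump into the future */
        }
    }

    c->last_ns = now_ns;
    c->have_last = true;
    return ok;
}
```

A client could call this on every time reading and assert or log when it returns false. It would still need the counter to overflow at the right moment to observe anything, so it does not make the race easier to trigger.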

Ivan-Velickovic · 2025-05-02T00:12:28Z

drivers/timer/jh7110/timer.c

@@ -87,6 +87,7 @@ static uint64_t get_ticks_in_ns(void)
    /* Include unhandled interrupt in value_h */


Suggested change

/* Include unhandled interrupt in value_h */

/* Account for pending counter IRQ */

The following interleaving would have resulted in a wrong time reading when processing timeouts, off by up to 2^32-1 ticks in the future (potentially triggering multiple timeouts falsely):

1. IRQ for TIMEOUT counter
2. enter `get_ticks_in_ns()`
3. store `value_l` as value from the other counter (not the TIMEOUT one) register
4. store `value_h` as read from `counter_timer_elapses` global
5. **!! IRQ of OVERFLOW for the counter !! (pending, not handled)**
6. check if IRQ pending for OVERFLOW counter
   1. yes -> increment `value_h`. But `value_l` read before overflow!
7. now new `value_h` and old `value_l` give a value in the future, off by up to $2^{32}-1$ ticks!

Signed-off-by: Szymon Duchniewicz <s.duchniewicz@unsw.edu.au>
Signed-off-by: Ivan-Velickovic <i.velickovic@unsw.edu.au>

Willmish requested a review from Ivan-Velickovic April 14, 2025 04:17

Willmish mentioned this pull request Apr 30, 2025

drivers:timer:cdns zcu102 timer driver using cadence TTC device #407

Open

6 tasks

Ivan-Velickovic reviewed May 2, 2025


Ivan-Velickovic force-pushed the szymon/jh7110_racecond_fix branch from d12d29c to 1989003 Compare May 2, 2025 03:15

Ivan-Velickovic approved these changes May 2, 2025


Ivan-Velickovic enabled auto-merge (rebase) May 2, 2025 03:16

Ivan-Velickovic merged commit c0407d2 into main May 2, 2025
7 checks passed

Ivan-Velickovic deleted the szymon/jh7110_racecond_fix branch May 2, 2025 03:24
