8000 drivers:timer:jh7110: prevent potential race by Willmish · Pull Request #403 · au-ts/sddf · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

drivers:timer:jh7110: prevent potential race #403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 2, 2025

Conversation

Willmish
Copy link
Contributor

While writing the zcu102 timer driver (also using 2 32-bit counters) I noticed that we might have a race condition in this driver. The following interleaving would have resulted in a wrong time reading, when processing timeouts off by up to 2^32-1 ticks in the future (potentially triggering multiple timeouts falsely):

  1. IRQ for TIMEOUT counter
  2. enter get_ticks_in_ns()
  3. store value_l as value from the other counter (not the TIMEOUT one) register
  4. storevalue_h as read from counter_timer_elapses global
  5. !! IRQ of OVERFLOW for the counter !! (pending, not handled)
  6. check if IRQ pending for OVERFLOW counter
    1. yes -> increment value_h . But value_l read before overflow!
  7. now new value_h and old value_l give a value in the future, off by up to $2^{32}-1$ ticks!

After the fix, the same scenario does not have an issue:

  1. IRQ for TIMEOUT counter
  2. enter get_ticks_in_ns()
  3. store value_l as value from the other counter (not the TIMEOUT one) register
  4. storevalue_h as read from counter_timer_elapses global
  5. !! IRQ of OVERFLOW for the counter !! (pending, not handled)
  6. check if IRQ pending for OVERFLOW counter
    1. yes -> increment value_h. **UPDATE value_l with most recent reading from the overflowed counter.
  7. now new value_h and new value_l give a correct value for time

I have not yet checked other timer drivers, but pretty sure they all use 64-bit counters so we should be fine.

@Willmish
8000 Copy link
Contributor Author
Willmish commented Apr 14, 2025

I've managed to trigger a potential scenario when this could happen (I say could, cause IRQ might have been before the first read of value_l, in which case the current code works), but had to spam timeout IRQs (1 us):

...
CLIENT|INFO: Now the time (in nanoseconds) is: 1592162047041 i: 129000000
CLIENT|INFO: Now the time (in nanoseconds) is: 1604504365750 i: 130000000
potential race condition avoided!
CLIENT|INFO: Now the time (in nanoseconds) is: 1616849453875 i: 131000000
CLIENT|INFO: Now the time (in nanoseconds) is: 1629191819625 i: 132000000
...

the print statement is added here:

diff --git a/drivers/timer/jh7110/timer.c b/drivers/timer/jh7110/timer.c
index 321e8b68..80ae9a89 100644
--- a/drivers/timer/jh7110/timer.c
+++ b/drivers/timer/jh7110/timer.c
@@ -88,6 +88,7 @@ static uint64_t get_ticks_in_ns(void)
     if (counter_regs->intclr == 1) {
         value_h += 1;
         value_l = (uint64_t)(STARFIVE_TIMER_MAX_TICKS - counter_regs->value);
+        sddf_dprintf("potential race condition avoided!\n");
     }

     uint64_t value_ticks = (value_h << 32) | value_l;

@Ivan-Velickovic
Copy link
Collaborator

Good find, thank you for the detailed write-up.

It would be good to include some kind of assert/check in the example that triggers the race condition in case we have it in any future drivers.

@Willmish
Copy link
Contributor Author

As mentioned when we talked, it might be difficult to trigger it intentionally. It requires a 32-bit counter to overflow, and depending on device's clock speed it could take a significant amount of time to happen (for this driver ~5min, for the zcu102 driver with 100MHz clock speed - ~42s) and there is very little chance that the race actually happens, had this happen once in ~15-20 overflows with constant 1us timeouts

I suggest we either have a separate test to try and catch it, or have design documentation for this 32-bit counter style drivers?

@@ -87,6 +87,7 @@ static uint64_t get_ticks_in_ns(void)
/* Include unhandled interrupt in value_h */
Copy link
Collaborator
@Ivan-Velickovic Ivan-Velickovic May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Include unhandled interrupt in value_h */
/* Account for pending counter IRQ */

The following interleaving would have resulted in a wrong time reading,
when processing timeouts off by up to 2^32-1 ticks in the future
(potentially triggering mutiple timeouts falsely):

1. IRQ for TIMEOUT counter
2. enter `get_ticks_in_ns()`
4. store `value_l` as value from the other counter (not the TIMEOUT one) register
3. store`value_h` as read from `counter_timer_elapses` global
5. **!! IRQ of OVERFLOW  for the counter !! (pending, not handled)**
6. check if IRQ pending for OVERFLOW counter
	1. yes -> increment `value_h` . But `value_l` read before overflow!
7. now new `value_h` and old `value_l` give a value in the future, off by up to $2^{32}-1$ ticks!

Signed-off-by: Szymon Duchniewicz <s.duchniewicz@unsw.edu.au>
Signed-off-by: Ivan-Velickovic <i.velickovic@unsw.edu.au>
@Ivan-Velickovic Ivan-Velickovic force-pushed the szymon/jh7110_racecond_fix branch from d12d29c to 1989003 Compare May 2, 2025 03:15
@Ivan-Velickovic Ivan-Velickovic enabled auto-merge (rebase) May 2, 2025 03:16
@Ivan-Velickovic Ivan-Velickovic merged commit c0407d2 into main May 2, 2025
7 checks passed
@Ivan-Velickovic Ivan-Velickovic deleted the szymon/jh7110_racecond_fix branch May 2, 2025 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0