8000 fix: Deadlock during cost calculation by dadrus · Pull Request #173 · jellydator/ttlcache · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: Deadlock during cost calculation #173

8000
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: v3
Choose a base branch
from

Conversation

dadrus
Copy link
Contributor
@dadrus dadrus commented May 16, 2025< 8000 /relative-time>

Unfortunately, the implementation, I've done in #152 can cause a deadlock during cost calculation:

The update method of the Item locks an RW lock and calls the item.calculateCost(item), which if configured, may make use of the available getter functions of the item object. All these functions make however use of the R lock - thus we have a deadlock.

This PR updates the implementation of the Item and performs the calculation of the item cost on a snapshot rather the actual update.

The test has been updated as well to use a public method. Without the update to the Item implementation the corresponding test will run into a deadlock

@coveralls
Copy link
coveralls commented May 16, 2025

Pull Request Test Coverage Report for Build 15447941855

Details

  • 15 of 15 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.001%) to 99.733%

Totals Coverage Status
Change from base Build 15260220298: 0.001%
Covered Lines: 747
Relevant Lines: 749

💛 - Coveralls

@dadrus
Copy link
Contributor Author
dadrus commented May 16, 2025

@swithek: Would you please review?

@swithek
Copy link
Contributor
swithek commented May 21, 2025

@dadrus thanks for the PR. I'll try to review it by the end of this week.

item_test.go Outdated
@@ -152,7 +152,7 @@ func Test_Item_update(t *testing.T) {
uc: "with version calculation and version tracking",
opts: []itemOption[string, string]{
withVersionTracking[string, string](true),
withCostFunc[string, string](func(item *Item[string, string]) uint64 { return uint64(len(item.value)) }),
withCostFunc[string, string](func(item *Item[string, string]) uint64 { return uint64(len(item.Value())) }),
Copy link
Contributor
@davseby davseby May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should modify the CostFunc signature. Can we introduce a new type called CostItem[K, V], which would be passed to the cost function instead of re-using *Item[K,V]? The feature wasn't yet released, we should be fine making this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking about a similar thing, but was reluctant to directly implement it. My idea was, respectively is, however not introducing a special CostItem, but moving the actual content related attributes into an internal struct. That way we could use atomics instead of locks in the update and the getter functions. What do you think about this idea?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @davseby. The current implementation of CostFunc would work better if instead of passing the whole Item, we passed just the Item.Value() along with the key. The CostItem[K, V] would be a wrapper around them without any extra data fields (e.g., expiration time, version etc.). This way we wouldn't have to deal with any deadlocks or snapshot modes.

As an idea, the snapshot mode is quite good. However, it introduces some new complexity that's not needed by the current state of this library. Besides, that would also contribute to the general maintenance overhead.

So to sum up, the fix/changes for this would be:

// add CostItem, it would act as a special snapshot for the cost func
type CostItem[K comparable, V any] struct {
    Key K
    Value V
}

// adjust the CostFunc
type CostFunc[K comparable, V any] func(item *CostItem[K, V]) uint64

// adjust item.calculateCost() calls
item.mu.Lock()
defer item.mu.Unlock()

costItem := &CostItem[K, V]{
    Key: item.key,
    Value: item.value
}

item.calculateCost(costItem)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dadrus is this something you could add to your PR? I could open a new PR if you feel like this is out of scope here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'm however on a conference the entire week and not sure whether I'll be able to get to it until I'm back home.

Copy link
Contributor Author
@dadrus dadrus Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davseby, @swithek: The new commits implement your suggestion. Please take a look.

I've used CostItem as value to guarantee no memory is allocated on the heap, even though no object escape can happen if a pointer to CostItem would have been used in the signature of the calculateCost function - at least as long as the user of the library does not store the object for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense. What do you think @davseby?

@dadrus dadrus requested review from swithek and davseby June 3, 2025 08:27
Copy link
Contributor
@swithek swithek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one small change is needed. Also, you might need to merge/rebase your branch on the latest v3 branch, because there have been some changes there that might affect some of your code (at least the tests).

README.md Outdated
ttlcache.WithMaxCost[string, string](5120, func(item *ttlcache.Item[string, string]) uint64 {
return uint64(size.Of(item))
ttlcache.WithMaxCost[string, string](5120, func(item ttlcache.CostItem[string, string]) uint64 {
// The cache maintains internal structures averaging ~144 bytes per entry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's really nice that you thoroughly explain the internal sizes, I think making these internal structures part of the cost calculation might lead to unexpected results and feels a bit like an anti-pattern. One of the reasons why I suggested using a simple CostItem struct here instead of the whole Item pointer is because some item metadata fields, like the expiration time or version, might change without triggering a cost recalculation. This means that the most accurate calculation of an item's cost would be the one that uses only its key and value (aka, the "accessible" data).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my suggestion here would be to remove the "176" and the explanation (but again, the explanation and your effort are really appreciated).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will update it asap. A small comment from my side to

... some item metadata fields, like the expiration time or version, might change without triggering a cost recalculation

That's definitely true, but it has no effect on the amount of the used memory ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit includes the update, along with a comment to help manage expectations. Let me know if you'd prefer it removed as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. updates from the v3 branch were already included in b5bd076 and there were no new updates since then.

@dadrus dadrus requested a review from swithek June 4, 2025 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0