Description
### Before reporting an issue
- I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.
### Area
infinispan
### Describe the bug
When using the `UserSessionModel.removeAuthenticatedClientSessions(Collection<String> removedClientUUIDS)` API method (e.g., within a custom Keycloak provider), the removal of client sessions from a user session is not consistently and immediately replicated across all nodes in a Keycloak cluster.
Instead, the change appears to be applied only on the node where the operation originated. It is propagated to the other cluster nodes (and persisted to the database, if applicable) only when the size of `removedClientUUIDS` meets or exceeds an internal, undocumented threshold, `MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP`.
This behaviour leads to:
- Inconsistent session state: different nodes in the cluster can have divergent views of a user's active client sessions, potentially leading to incorrect authorisation decisions or UI display.
- Functional issues: operations that rely on an up-to-date session state (e.g., `userSession.authenticatedClientSessions.isNullOrEmpty()`) may return incorrect results on nodes other than the one that processed the removal. This can manifest as users being unexpectedly logged out or being unable to access resources because of stale session information.
- Violation of the API contract: the public API implies that removal is immediate and consistent, but an internal optimization silently breaks that expectation.
- Difficult debugging: replication depends on an internal threshold that callers cannot see, so the behaviour appears non-deterministic, which makes troubleshooting extremely challenging.
The problematic code is the following method:
```java
package org.keycloak.models.sessions.infinispan;

//...

public class UserSessionAdapter<T extends SessionRefreshStore & UserSessionProvider> implements UserSessionModel {

    //..

    @Override
    public void removeAuthenticatedClientSessions(Collection<String> removedClientUUIDS) {
        if (removedClientUUIDS == null || removedClientUUIDS.isEmpty()) {
            return;
        }

        // Performance: do not remove the clientUUIDs from the user session until there is enough of them;
        // an invalid session is handled as nonexistent in UserSessionAdapter.getAuthenticatedClientSessions()
        if (removedClientUUIDS.size() >= MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP) {
            // Update user session
            UserSessionUpdateTask task = new UserSessionUpdateTask() {
                @Override
                public void runUpdate(UserSessionEntity entity) {
                    removedClientUUIDS.forEach(entity.getAuthenticatedClientSessions()::remove);
                }

                @Override
                public boolean isOffline() {
                    return offline;
                }
            };
            update(task);
        }

        // do not iterate the removedClientUUIDS and remove the clientSession directly as the addTask can manipulate
        // the collection being iterated, and that can lead to unpredictable behaviour (e.g. NPE)
        List<UUID> clientSessionUuids = removedClientUUIDS.stream()
                .map(entity.getAuthenticatedClientSessions()::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

        clientSessionUuids.forEach(clientSessionId -> this.clientSessionUpdateTx.addTask(clientSessionId, Tasks.removeSync(offline)));
    }

    //..
}
```
As long as `entity.getAuthenticatedClientSessions()::remove` is not invoked (that is, whenever fewer than `MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP` client UUIDs are passed), the other nodes in the cluster keep an inconsistent list of authenticated clients.
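The gating in the method above can be reproduced in isolation. The following is a simplified, hypothetical model, not Keycloak code: `ThresholdModel`, `replicatedEntity` and `removedClientSessions` are invented stand-ins for the `UserSessionEntity` map that the update task would write to the cluster and for the client session ids queued on `clientSessionUpdateTx`. The threshold value 5 is the one quoted in this issue.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical, simplified model of the branching in
// UserSessionAdapter.removeAuthenticatedClientSessions; not Keycloak code.
class ThresholdModel {
    // Value taken from the constant quoted in this issue (Keycloak 26.1.2).
    static final int MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP = 5;

    // Stand-in for the replicated UserSessionEntity map: clientUUID -> client session id.
    final Map<String, UUID> replicatedEntity = new HashMap<>();
    // Stand-in for the client session ids queued on clientSessionUpdateTx.
    final List<UUID> removedClientSessions = new ArrayList<>();

    void removeAuthenticatedClientSessions(Collection<String> removedClientUUIDS) {
        if (removedClientUUIDS == null || removedClientUUIDS.isEmpty()) {
            return;
        }
        // Resolve the client session ids while the map still contains them.
        List<UUID> clientSessionIds = removedClientUUIDS.stream()
                .map(replicatedEntity::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

        // Only a large enough batch updates the user session entity itself,
        // which is the change the other nodes would see.
        if (removedClientUUIDS.size() >= MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP) {
            removedClientUUIDS.forEach(replicatedEntity::remove);
        }

        // The client sessions themselves are always queued for removal.
        removedClientSessions.addAll(clientSessionIds);
    }

    public static void main(String[] args) {
        ThresholdModel session = new ThresholdModel();
        session.replicatedEntity.put("client-1", UUID.randomUUID());
        session.replicatedEntity.put("client-2", UUID.randomUUID());

        // One UUID is below the threshold of 5.
        session.removeAuthenticatedClientSessions(List.of("client-1"));

        System.out.println("scheduled removals: " + session.removedClientSessions.size());
        System.out.println("entity still has client-1: " + session.replicatedEntity.containsKey("client-1"));
    }
}
```

Running `main` prints `scheduled removals: 1` and `entity still has client-1: true`: the client session is queued for removal, but the map that the other nodes read keeps the stale `client-1` entry.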
### Version
26.1.2
### Regression
- The issue is a regression
### Expected behavior
At the very least, this behaviour should be documented. Ideally, a flag should be introduced to "force propagate" the immediate removal of the authenticated client sessions from the Infinispan `UserSessionEntity` object.
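The suggested flag could take roughly the following shape. This is only a sketch over a simplified stand-in for the entity map; `ForceFlagModel` and the two-argument overload with `forceImmediate` are hypothetical names, not an existing Keycloak API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the "force propagate" flag suggested above.
// The two-argument overload and the forceImmediate parameter do not exist in Keycloak.
class ForceFlagModel {
    static final int MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP = 5;

    // Stand-in for the replicated UserSessionEntity map: clientUUID -> client session id.
    final Map<String, UUID> replicatedEntity = new HashMap<>();

    // Current contract: keep the threshold-based optimization by default.
    void removeAuthenticatedClientSessions(Collection<String> removedClientUUIDS) {
        removeAuthenticatedClientSessions(removedClientUUIDS, false);
    }

    // Proposed overload: forceImmediate bypasses the threshold so the entity update
    // (and therefore replication) happens even for a single client UUID.
    void removeAuthenticatedClientSessions(Collection<String> removedClientUUIDS, boolean forceImmediate) {
        if (removedClientUUIDS == null || removedClientUUIDS.isEmpty()) {
            return;
        }
        if (forceImmediate || removedClientUUIDS.size() >= MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP) {
            // In Keycloak, this is where the UserSessionUpdateTask would be passed to update(task).
            removedClientUUIDS.forEach(replicatedEntity::remove);
        }
        // Queuing the client session removals is omitted; the flag would not change it.
    }

    public static void main(String[] args) {
        ForceFlagModel session = new ForceFlagModel();
        session.replicatedEntity.put("client-1", UUID.randomUUID());

        session.removeAuthenticatedClientSessions(List.of("client-1"));
        System.out.println("after default call: " + session.replicatedEntity.containsKey("client-1"));

        session.removeAuthenticatedClientSessions(List.of("client-1"), true);
        System.out.println("after forced call: " + session.replicatedEntity.containsKey("client-1"));
    }
}
```

Keeping the one-argument method unchanged would preserve the current optimization for existing callers, while callers that need cluster-wide consistency could opt in explicitly.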
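### Actual behavior
Unless the number of client UUIDs passed to the method is at least the value of
```java
private static final int MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP = 5;
```
the client UUIDs are not removed from the `UserSessionEntity`, so the removal of the authenticated client sessions is not propagated to the other nodes.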
### How to Reproduce?
1. Set up a 2-node Keycloak cluster using Docker Compose, with `jdbc-ping` for JGroups discovery. Ensure that JGroups clustering is healthy (e.g., verify that the FD_SOCK2 connections are established and that the GMS view shows all nodes).
2. Create a user and at least two client applications in Keycloak.
3. Log the user in to both client applications. This creates a `UserSessionModel` with two `AuthenticatedClientSessionModel` entries.
4. On Node 1, access the `UserSessionModel` (e.g., via a custom `EventListenerProvider` or a REST endpoint exposed for testing) and call `userSession.removeAuthenticatedClientSessions(listOf(clientId_of_client_1))`, where `clientId_of_client_1` is the internal ID (UUID) of the first client rather than its `clientId`. Then, still on Node 1, immediately inspect `userSession.authenticatedClientSessions` (or check `isNullOrEmpty()` if the removed client was the only one). It should reflect the removal.
5. On Node 2, immediately perform the same check on the same `UserSessionModel`.
On Node 1 (where the removeAuthenticatedClientSessions call was made), the UserSessionModel correctly reflects the removal.
On Node 2, the UserSessionModel does not reflect the removal. It still shows clientId_of_client_1 as present (or isNullOrEmpty() returns false if it was the only one).
The change only propagates to Node 2 (and other nodes) if the removedClientUUIDS collection passed to removeAuthenticatedClientSessions is large enough to meet an internal threshold, MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP. This threshold appears to gate the actual persistence/replication of the UserSessionEntity itself.
### Anything else?
Workaround:
1. Use sticky sessions, so that all requests for a given user session are routed to the same node.
2. "Pad" the `removedClientUUIDS` list with dummy entries (e.g., empty strings) until its size meets or exceeds the `MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP` threshold.
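Workaround 2 can be wrapped in a small helper. This is a sketch under stated assumptions: `ClientUuidPadding`, `padToThreshold` and `ASSUMED_THRESHOLD` are hypothetical names, and the threshold of 5 is copied from the private constant in 26.1.2, so it may not hold in other versions. In the quoted code, an empty string matches no key, so `get("")` yields `null` (filtered out) and `remove("")` does nothing.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for workaround 2; not part of Keycloak.
final class ClientUuidPadding {
    // Assumed to match Keycloak's private MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP
    // (5 in 26.1.2); it may differ in other versions.
    static final int ASSUMED_THRESHOLD = 5;

    private ClientUuidPadding() {}

    // Returns a List (never a Set) so that duplicate "" padding entries are all counted by size().
    static List<String> padToThreshold(Collection<String> clientUuids, int threshold) {
        List<String> padded = new ArrayList<>(clientUuids);
        if (padded.isEmpty()) {
            // Nothing to remove: do not turn a no-op call into an entity update.
            return padded;
        }
        while (padded.size() < threshold) {
            // Empty strings are never client UUIDs, so they match no entry in the map.
            padded.add("");
        }
        return padded;
    }

    public static void main(String[] args) {
        List<String> padded = padToThreshold(List.of("uuid-of-client-1"), ASSUMED_THRESHOLD);
        System.out.println(padded.size());
    }
}
```

A provider would then call `userSession.removeAuthenticatedClientSessions(ClientUuidPadding.padToThreshold(List.of(clientUuid), ClientUuidPadding.ASSUMED_THRESHOLD))`, with `clientUuid` being the client's internal ID. Because the helper depends on a private constant, it is best treated as a stopgap until the behaviour is documented or changed.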