UserSessionModel.removeAuthenticatedClientSessions() does not consistently replicate changes across cluster nodes · Issue #40998 · keycloak/keycloak · GitHub
Open
@atomcat1978

Description


Before reporting an issue

  • I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

Area

infinispan

Describe the bug

When using the UserSessionModel.removeAuthenticatedClientSessions(Collection<String> removedClientUUIDS) API method (e.g., within a custom Keycloak provider), the removal of client sessions from a user session is not consistently and immediately replicated across all nodes in a Keycloak cluster.

Instead, the changes appear to be localized to the node where the operation originated, and are only propagated to other cluster nodes (and persisted to the database, if applicable) once the size of removedClientUUIDS meets or exceeds an internal, undocumented threshold, MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP.

This behaviour leads to:

  • Inconsistent Session State: Different nodes in the cluster can have divergent views of a user's active client sessions, potentially leading to incorrect authorisation decisions or UI display.

  • Functional Issues: Operations that rely on an up-to-date session state (e.g., userSession.authenticatedClientSessions.isNullOrEmpty()) may return incorrect results on nodes other than the one that processed the removal. This can manifest as users being unexpectedly logged out or unable to access resources due to stale session information.

  • Violation of API Contract: The public API implies an immediate and consistent removal, which is not upheld due to an internal optimization.

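  • Difficult Debugging: Whether a removal is replicated depends silently on the size of the input collection relative to an internal threshold, so from the caller's perspective the behaviour looks inconsistent, which makes troubleshooting extremely challenging.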

The problematic code is the following method:

package org.keycloak.models.sessions.infinispan;

//...

public class UserSessionAdapter<T extends SessionRefreshStore & UserSessionProvider> implements UserSessionModel {

//..

    @Override
    public void removeAuthenticatedClientSessions(Collection<String> removedClientUUIDS) {
        if (removedClientUUIDS == null || removedClientUUIDS.isEmpty()) {
            return;
        }

        // Performance: do not remove the clientUUIDs from the user session until there is enough of them;
        // an invalid session is handled as nonexistent in UserSessionAdapter.getAuthenticatedClientSessions()
        if (removedClientUUIDS.size() >= MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP) {
            // Update user session
            UserSessionUpdateTask task = new UserSessionUpdateTask() {
                @Override
                public void runUpdate(UserSessionEntity entity) {
                    removedClientUUIDS.forEach(entity.getAuthenticatedClientSessions()::remove);
                }

                @Override
                public boolean isOffline() {
                    return offline;
                }
            };
            update(task);
        }

        // do not iterate the removedClientUUIDS and remove the clientSession directly as the addTask can manipulate
        // the collection being iterated, and that can lead to unpredictable behaviour (e.g. NPE)
        List<UUID> clientSessionUuids = removedClientUUIDS.stream()
                .map(entity.getAuthenticatedClientSessions()::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

        clientSessionUuids.forEach(clientSessionId -> this.clientSessionUpdateTx.addTask(clientSessionId, Tasks.removeSync(offline)));
    }

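As long as entity.getAuthenticatedClientSessions()::remove is not invoked (i.e. whenever fewer than MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP client UUIDs are passed), the UserSessionEntity is not updated, and the other nodes in the cluster keep an inconsistent list of authenticated clients.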
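The gating described above can be reproduced in isolation with a simplified, dependency-free model. This is a sketch only: the map passed in is a hypothetical stand-in for the replicated UserSessionEntity's client-UUID entries, not Keycloak's actual class, and the threshold value is the one quoted in this issue for 26.1.2.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ThresholdGatedRemoval {
    // Value as quoted in the issue for Keycloak 26.1.2 (a private constant upstream).
    static final int MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP = 5;

    /**
     * Simplified model of the size check in removeAuthenticatedClientSessions().
     * entityClientSessions is a hypothetical stand-in for the replicated
     * UserSessionEntity's clientUUID -> clientSessionId entries.
     * Returns true only if the entity (and therefore the replicated state) was updated.
     */
    static boolean removeAuthenticatedClientSessions(Map<String, String> entityClientSessions,
                                                     Collection<String> removedClientUUIDs) {
        if (removedClientUUIDs == null || removedClientUUIDs.isEmpty()) {
            return false;
        }
        if (removedClientUUIDs.size() >= MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP) {
            removedClientUUIDs.forEach(entityClientSessions::remove);
            return true;
        }
        // Below the threshold: the entity is left untouched, so other nodes keep these entries.
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> entity = new HashMap<>();
        entity.put("client-1", "cs-1");
        entity.put("client-2", "cs-2");

        boolean updated = removeAuthenticatedClientSessions(entity, List.of("client-1"));
        System.out.println(updated);                         // false
        System.out.println(entity.containsKey("client-1")); // true
    }
}
```

Removing a single client UUID leaves the modelled entity unchanged, which corresponds to the stale view reported on the other cluster nodes.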

Version

26.1.2

Regression

  • The issue is a regression

Expected behavior

At a minimum, this behaviour should be documented. Ideally, a flag should be introduced to "force propagate" the immediate removal of the authenticated client sessions from the Infinispan UserSessionEntity object.
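One possible shape for the proposed flag is sketched below as a standalone predicate. Both the method name shouldUpdateEntity and the forcePropagate parameter are hypothetical and do not exist in Keycloak; the sketch only illustrates that the flag would bypass the size check while keeping the existing early return for empty input.

```java
public final class ForcePropagateSketch {
    // Value as quoted in the issue for Keycloak 26.1.2.
    static final int MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP = 5;

    // Hypothetical: decides whether the UserSessionEntity update task should run.
    static boolean shouldUpdateEntity(int removedCount, boolean forcePropagate) {
        if (removedCount <= 0) {
            return false; // nothing to remove
        }
        return forcePropagate || removedCount >= MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP;
    }

    public static void main(String[] args) {
        System.out.println(shouldUpdateEntity(1, false)); // false
        System.out.println(shouldUpdateEntity(1, true));  // true
        System.out.println(shouldUpdateEntity(5, false)); // true
    }
}
```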

Actual behavior

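Unless the number of client UUIDs passed to the method is at least MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP, which is defined as

    private static final int MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP = 5;

the authenticated client sessions are not removed from the UserSessionEntity, so the removal is not replicated to the other cluster nodes.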

How to Reproduce?

1. Set up a 2-node Keycloak cluster using Docker Compose with jdbc-ping for JGroups. Ensure JGroups clustering is healthy (e.g., verify FD_SOCK2 connections and GMS views show all nodes).
2. Create a user and at least two client applications in Keycloak.
3. Log in the user to both client applications. This will create a UserSessionModel with two AuthenticatedClientSessionModel entries.
4. On Node 1: Access the UserSessionModel (e.g., via a custom EventListenerProvider or a REST endpoint exposed for testing) and call userSession.removeAuthenticatedClientSessions(listOf(clientId_of_client_1)), where clientId_of_client_1 is the internal ID (UUID) of the first client rather than its human-readable client ID.
5. Verify on Node 1: Immediately check userSession.authenticatedClientSessions.isNullOrEmpty() or inspect the UserSessionModel on Node 1. It should reflect the removal.
6. On Node 2: Immediately check userSession.authenticatedClientSessions.isNullOrEmpty() or inspect the same UserSessionModel on Node 2.

On Node 1 (where the removeAuthenticatedClientSessions call was made), the UserSessionModel correctly reflects the removal.

On Node 2, the UserSessionModel does not reflect the removal. It still shows clientId_of_client_1 as present (or isNullOrEmpty() returns false if it was the only one).

The change only propagates to Node 2 (and other nodes) if the removedClientUUIDS collection passed to removeAuthenticatedClientSessions is large enough to meet an internal threshold, MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP. This threshold appears to gate the actual persistence/replication of the UserSessionEntity itself.

Anything else?

Workaround:

1. Use sticky sessions, so that all requests belonging to a user session are routed to the same node.
2. "Pad" the removedClientUUIDS list with dummy entries (e.g. empty strings) until its size meets or exceeds the MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP threshold.
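The padding workaround can be wrapped in a small helper. This is a sketch under stated assumptions: the helper name padToThreshold is hypothetical, and the threshold of 5 mirrors the private constant quoted in this issue, which is an internal implementation detail that may change between Keycloak versions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public final class ClientSessionRemovalPadding {
    // Assumption: mirrors the private MINIMUM_INACTIVE_CLIENT_SESSIONS_TO_CLEANUP value (5)
    // quoted in the issue for 26.1.2; it is not public API and may change between versions.
    static final int ASSUMED_CLEANUP_THRESHOLD = 5;

    /** Returns a copy of clientUuids, padded with empty strings until it reaches threshold. */
    static List<String> padToThreshold(Collection<String> clientUuids, int threshold) {
        List<String> padded = new ArrayList<>(clientUuids);
        while (padded.size() < threshold) {
            padded.add(""); // dummy entry: expected to match no authenticated client session
        }
        return padded;
    }

    public static void main(String[] args) {
        List<String> padded = padToThreshold(List.of("client-uuid-1"), ASSUMED_CLEANUP_THRESHOLD);
        System.out.println(padded.size()); // 5
    }
}
```

The padded list would then be passed to userSession.removeAuthenticatedClientSessions(...). In the method quoted above, the empty-string entries are removed from the entity as no-ops and are filtered out when looking up client session IDs, so only the real UUIDs take effect, assuming no client has an empty UUID.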
