Zombie process after Upgrade 2.9.5 to 2.10.1 · Issue #7929 · SSSD/sssd · GitHub

Zombie process after Upgrade 2.9.5 to 2.10.1 #7929


Open
Ergamon opened this issue Apr 20, 2025 · 5 comments

@Ergamon
Ergamon commented Apr 20, 2025

Hi,

I upgraded some of our test servers from Ubuntu 24.10 to 25.04, which is basically a version change for SSSD from 2.9.5 to 2.10.1.

So far everything works as before; I cannot find any problems.

Most of the servers greet me with the message:

  => There is 1 zombie process.

When I look for the process, I find this:

ps aux | grep Z
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      181316  0.0  0.0      0     0 ?        Z    Apr19   0:00 [ldap_child] <defunct>
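A side note on the command above: `grep Z` matches any line that contains a capital Z anywhere, not only processes in state Z. A minimal sketch of a stricter filter, assuming the `ps aux` column layout shown above (STAT is the 8th field); `filter_zombies` is a hypothetical helper name, not an SSSD or procps tool:

```sh
#!/bin/sh
# filter_zombies: read `ps aux` output on stdin and keep the header line plus
# every row whose STAT column (field 8 in the `ps aux` layout) starts with Z.
filter_zombies() {
    awk 'NR == 1 || $8 ~ /^Z/'
}

# Example on a live system (not executed here):
#   ps aux | filter_zombies
```

Matching on the STAT field rather than the whole line avoids false hits from command names or arguments that happen to contain a Z.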

Can anyone explain what is going on and how I can resolve this or how to investigate this further?

As said, everything works as expected so far, but I would still prefer not to have this.

Thanks

@alexey-tikhonov
Member

Hi.

Any errors in /var/log/sssd/sssd.log or sssd_$domain.log?

@Ergamon
Author
Ergamon commented Apr 22, 2025

I have a machine with the old SSSD version (2.9.5, Ubuntu 24.10) and restarted SSSD (systemctl restart sssd) on both machines to get fresh logs.

There is a minor error in the sssd.log complaining about the config_file_version attribute. I tested on a clean install of Ubuntu 25.04: the config_file_version entry is created by the command:

sudo realm join DOMAIN

So obviously these don't match, but I don't think this is a problem (I deleted the line from my config manually, with no change).

The domain log starts with an error:

(2025-04-20 20:45:56): [be[domain]] [ad_gpo_store_policy_settings] (0x0020): [RID#13] [/var/lib/sss/gpo_cache/domain/Policies/{AFD4C257-BB49-4096-8062-E5105170C8A7}/Machine/Microsoft/Windows NT/SecEdit/GptTmpl.inf]: ini_config_parse failed [5][Input/output error]
   *  ... skipping repetitive backtrace ...
(2025-04-20 20:45:56): [be[domain]] [ad_gpo_store_policy_settings] (0x0020): [RID#13] Error (5) on line 7: Equal sign is missing.
   *  ... skipping repetitive backtrace ...
(2025-04-20 20:45:56): [be[domain]] [ad_gpo_store_policy_settings] (0x0020): [RID#13] Error (5) on line 8: Equal sign is missing.
   *  ... skipping repetitive backtrace ...

However, I see the same output with 2.9.5, which does not show the zombie problem.
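For comparing the logs of the two versions, it can help to look only at the high-severity entries. The hex value in parentheses is the SSSD debug level of the message; as far as I recall from the sssd.conf man page, 0x0010, 0x0020 and 0x0040 are the fatal, critical and serious failure levels (treat that mapping as an assumption and check your man page). A small sketch, with `sssd_errors` as a hypothetical helper name:

```sh
#!/bin/sh
# sssd_errors: read an SSSD log on stdin and keep only the lines whose
# debug-level field is 0x0010, 0x0020 or 0x0040.
sssd_errors() {
    grep -E '\] \(0x00(10|20|40)\): '
}

# Example (log name pattern as mentioned earlier in this thread):
#   sssd_errors < /var/log/sssd/sssd_$domain.log
```

Running the same filter on the 2.9.5 and 2.10.1 machines and diffing the message text (without timestamps) would show which errors are new in 2.10.1.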

This seems to me to be the moment when the problem occurs (but I might be wrong):

   *  (2025-04-23  0:06:53): [be[domain]] [child_handler_setup] (0x2000): [RID#20] Setting up signal handler up for pid [1289817]
   *  (2025-04-23  0:06:53): [be[domain]] [child_handler_setup] (0x2000): [RID#20] Signal handler set up for pid [1289817]
   *  (2025-04-23  0:06:53): [be[domain]] [child_sig_handler] (0x1000): [RID#20] Waiting for child [1289817].
   *  (2025-04-23  0:06:53): [be[domain]] [child_sig_handler] (0x0020): [RID#20] waitpid did not find a child with changed status.

Other than that, I don't see any real errors.

@alexey-tikhonov
Member
  • (2025-04-23 0:06:53): [be[domain]] [child_handler_setup] (0x2000): [RID#20] Setting up signal handler up for pid [1289817]
  • (2025-04-23 0:06:53): [be[domain]] [child_handler_setup] (0x2000): [RID#20] Signal handler set up for pid [1289817]
  • (2025-04-23 0:06:53): [be[domain]] [child_sig_handler] (0x1000): [RID#20] Waiting for child [1289817].
  • (2025-04-23 0:06:53): [be[domain]] [child_sig_handler] (0x0020): [RID#20] waitpid did not find a child with changed status.

This looks to be the problem.

Could you please quote the log preceding this, starting with the first mention of the 'child' process, and the PID of [ldap_child] <defunct>?

@Ergamon
Author
Ergamon commented Apr 23, 2025

Hi, thanks for your reply.
There is not much more related to this in the log:

(2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x0020): [RID#7] Error (5) on line 7: Equal sign is missing.
(2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x0020): [RID#7] Error (5) on line 8: Equal sign is missing.
   *  ... skipping repetitive backtrace ...
(2025-04-23 17:42:20): [be[domain]] [child_sig_handler] (0x0020): [RID#7] waitpid did not find a child with changed status.
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] allow_key = SeInteractiveLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeInteractiveLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] deny_key = SeDenyInteractiveLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeDenyInteractiveLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] allow_key = SeRemoteInteractiveLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeRemoteInteractiveLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] deny_key = SeDenyRemoteInteractiveLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeDenyRemoteInteractiveLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] allow_key = SeNetworkLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeNetworkLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] deny_key = SeDenyNetworkLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeDenyNetworkLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] allow_key = SeBatchLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeBatchLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] deny_key = SeDenyBatchLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeDenyBatchLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] allow_key = SeServiceLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeServiceLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_store_policy_settings] (0x4000): [RID#7] deny_key = SeDenyServiceLogonRight
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_extract_policy_setting] (0x4000): [RID#7] section/name not found: [Privilege Rights][SeDenyServiceLogonRight]
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] cse filtered_gpos[1]->gpo_guid is {31B2F340-016D-11D2-945F-00C04FB984F9}
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x4000): [RID#7] cse_filtered_gpos[1]->gpo_cse_guids[0]->gpo_guid is {35378EAC-683F-11D2-A89A-00C04FBBCFA2}
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x4000): [RID#7] cse_filtered_gpos[1]->gpo_cse_guids[1]->gpo_guid is {827D319E-6EAC-11D2-A4EA-00C04F79F83A}
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x4000): [RID#7] cse_filtered_gpos[1]->gpo_cse_guids[2]->gpo_guid is {B1BE8D72-6EAC-11D2-A4EA-00C04F79F83A}
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] smb_server: smb://svrzadom001.domain
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] smb_share: /sysvol
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] smb_path: /domain/Policies/{31B2F340-016D-11D2-945F-00C04FB984F9}
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] gpo_guid: {31B2F340-016D-11D2-945F-00C04FB984F9}
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] retrieving GPO from cache [{31B2F340-016D-11D2-945F-00C04FB984F9}]
   *  (2025-04-23 17:42:20): [be[domain]] [sysdb_gpo_get_gpo_by_guid] (0x4000): [RID#7] cn=gpos,cn=ad,cn=custom,cn=domain,cn=sysdb
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] send_to_child: 1
   *  (2025-04-23 17:42:20): [be[domain]] [ad_gpo_cse_step] (0x0400): [RID#7] cached_gpt_version: 393303
   *  (2025-04-23 17:42:20): [be[domain]] [create_cse_send_buffer] (0x4000): [RID#7] buffer size: 171
   *  (2025-04-23 17:42:20): [be[domain]] [child_handler_setup] (0x2000): [RID#7] Setting up signal handler up for pid [1564372]
   *  (2025-04-23 17:42:20): [be[domain]] [child_handler_setup] (0x2000): [RID#7] Signal handler set up for pid [1564372]
   *  (2025-04-23 17:42:20): [be[domain]] [child_sig_handler] (0x1000): [RID#7] Waiting for child [1564372].
   *  (2025-04-23 17:42:20): [be[domain]] [child_sig_handler] (0x0020): [RID#7] waitpid did not find a child with changed status.
********************** BACKTRACE DUMP ENDS HERE *********************************

From the process side I can see the following:

ps aux | grep Z
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     1292029  0.0  0.0      0     0 ?        Z    00:14   0:00 [ldap_child] <defunct>
root     1292030  0.0  0.0      0     0 ?        Z    00:14   0:00 [ldap_child] <defunct>

Reading the parents:

ps hoppid 1292029
1292028

ps hoppid 1292030
1292028

Querying the parent:

ps aux | grep 1292028
root     1292028  0.0  0.5 166348 41596 ?        S    00:14   0:04 /usr/libexec/sssd/sssd_be --domain domain --logger=files

1292028 is the child of 1292027:

ps aux | grep 1292027
root     1292027  0.0  0.1  74444 10244 ?        Ss   00:14   0:00 /usr/sbin/sssd -i --logger=files
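The manual steps above (look up the parent, then the parent's parent) can be sketched as one loop over /proc/<pid>/status on Linux. This is only an illustration: `ancestry` is a hypothetical helper name, and the `PROC_ROOT` variable is an assumption added so the function can be pointed at a copy of /proc.

```sh
#!/bin/sh
# ancestry PID: print "pid name state ppid" for PID and for each ancestor,
# using the Name, State and PPid lines of /proc/<pid>/status.
# PROC_ROOT (hypothetical) overrides the /proc location.
ancestry() {
    root=${PROC_ROOT:-/proc}
    pid=$1
    while [ "${pid:-0}" -gt 0 ] && [ -r "$root/$pid/status" ]; do
        awk -v pid="$pid" '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            /^PPid:/  { ppid = $2 }
            END       { print pid, name, state, ppid }
        ' "$root/$pid/status"
        # Move one level up; a PPid of 0 ends the loop.
        pid=$(awk '/^PPid:/ { print $2 }' "$root/$pid/status")
    done
}

# Example on a live system (not executed here):
#   ancestry 1292029
```

The state column shows Z for a zombie, so the first line of output confirms the defunct child and the following lines show which process is responsible for reaping it.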

Searching the logs, I cannot find any mention of the PIDs 1292027, 1292028, 1292029 or 1292030.

I hope this helps.
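For reference, a sketch of such a search. The log excerpts in this thread print child PIDs in square brackets (for example "pid [1289817]"), so matching the bracketed form avoids hits inside unrelated numbers. `find_pids_in_logs` is a hypothetical helper name:

```sh
#!/bin/sh
# find_pids_in_logs DIR PID...: search DIR recursively for any of the given
# PIDs written in square brackets, e.g. "pid [1289817]".
find_pids_in_logs() {
    dir=$1
    shift
    # Join the PIDs into an alternation: 1|2|3
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%"|"}
    grep -rnE "\[($pattern)\]" "$dir"
}

# Example (not executed here):
#   find_pids_in_logs /var/log/sssd 1292027 1292028 1292029 1292030
```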

@alexey-tikhonov
Member

> This looks to be the problem.

Hm... no, it is not. It looks like the "waitpid did not find a child with changed status" message was about 'gpo_child', not 'ldap_child'.

Would it be possible to set 'debug_level = 9' in the domain section of sssd.conf, restart sssd, reproduce the issue, and provide the log around the start of the 'ldap_child' process that later hangs in Z state?
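As a rough illustration of that setting: domain sections in sssd.conf are named `[domain/NAME]`. The name EXAMPLE.COM below is a placeholder, not the reporter's domain; add the option to the section that already exists in your file.

```ini
# /etc/sssd/sssd.conf (fragment)
# EXAMPLE.COM is a placeholder for the existing domain section name.
[domain/EXAMPLE.COM]
debug_level = 9
```

After saving, restart the service with `systemctl restart sssd` (as done earlier in this thread) so the backend starts logging at the new level.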
