From a16741dcf32b9ed20ca572d6353c3f1b9212a464 Mon Sep 17 00:00:00 2001
From: Andre Noll
Date: Tue, 21 Sep 2021 14:09:39 +0200
Subject: [PATCH] server: Wait for command handler exit also when afs dies.

When para_server is running in foreground mode in a terminal session
and is signalled by hitting CTRL+C, it is unspecified whether the
server or the afs process receives the resulting SIGINT first. It may
even happen that the afs process dies first, and that the server sees
the resulting SIGCHLD *before* the SIGINT.

In this case we currently don't wait for the command handlers to exit
but proceed right away with the shutdown, closing the signal pipe and
destroying the shared memory area which contains the mmd structure.
This leads to error messages on shutdown such as

	Sep 21 12:38:18 (5) (29166) para_semop: semaphore set 12648470 was removed
	Sep 21 12:38:18 (6) (29166) para_semop: fatal semop error Invalid argument: pid 29166
	Sep 21 12:38:18 (6) (29161) generic_signal_handler: Bad file descriptor
	Sep 21 12:38:18 (6) (29164) para_semop: fatal semop error Invalid argument: pid 29164
	Sep 21 12:38:18 (6) (29165) command_handler_sighandler: terminating on signal 15
	Sep 21 12:38:18 (6) (29165) para_semop: fatal semop error Invalid argument: pid 29165

This commit avoids the issue by letting the server wait for all of its
children in the SIGCHLD case as well, i.e. when it exits because the
afs process has terminated.
---
 server.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/server.c b/server.c
index c86778cd..09087f7a 100644
--- a/server.c
+++ b/server.c
@@ -298,14 +298,14 @@ static int signal_post_select(struct sched *s, __a_unused void *context)
 			if (pid != afs_pid)
 				continue;
 			PARA_EMERG_LOG("fatal: afs died\n");
-			kill(0, SIGTERM);
-			goto cleanup;
+			goto genocide;
 		}
 		break;
 	/* die on sigint/sigterm. Kill all children too. */
 	case SIGINT:
 	case SIGTERM:
 		PARA_EMERG_LOG("terminating on signal %d\n", signum);
+genocide:
 		kill(0, SIGTERM);
 		/*
 		 * We must wait for all of our children to die. For the afs
@@ -320,7 +320,6 @@ static int signal_post_select(struct sched *s, __a_unused void *context)
 		while (wait(NULL) != -1 || errno != ECHILD)
 			; /* still at least one child alive */
 		mutex_lock(mmd_mutex);
-cleanup:
 		free(mmd->afd.afhi.chunk_table);
 		task_notify_all(s, E_DEADLY_SIGNAL);
 		return -E_DEADLY_SIGNAL;
-- 
2.39.5
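The control flow this patch establishes can be illustrated with a small standalone sketch. It is not paraslash code: the names `spawn_sleeper()`, `kill_and_reap()`, `simulate_shutdown()` and the `exit_reason` enum are hypothetical, and instead of `kill(0, SIGTERM)` the sketch signals each child by PID so that it does not also hit its own process group. What it aims to show is the ordering the patch enforces: both exit reasons funnel into one path that terminates the children and then reaps every one of them with the same `wait()`/`ECHILD` loop before any shared resource would be torn down. The pre-patch behaviour corresponds to the afs-died branch skipping `kill_and_reap()` entirely.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the two ways the server leaves its loop. */
enum exit_reason { AFS_DIED, GOT_SIGTERM };

/* Fork a child that sleeps until a signal terminates it. */
static pid_t spawn_sleeper(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid; /* -1 on fork failure */
}

/*
 * Shared shutdown path, playing the role of the "genocide" label:
 * signal every remaining child, then reap until wait() fails with
 * ECHILD. Returns the number of children reaped here.
 */
static int kill_and_reap(const pid_t *kids, int n)
{
	int reaped = 0;

	for (int i = 0; i < n; i++)
		if (kids[i] > 0) /* never pass -1 or 0 to kill() here */
			kill(kids[i], SIGTERM);
	for (;;) {
		if (wait(NULL) != -1)
			reaped++;
		else if (errno == ECHILD)
			break; /* no children left */
		/* otherwise (e.g. EINTR) simply retry */
	}
	/* Only at this point would the shared memory be safe to destroy. */
	return reaped;
}

/* Start n children, then shut down for the given reason. */
static int simulate_shutdown(enum exit_reason why, int n)
{
	pid_t kids[16];

	if (n > 16)
		n = 16;
	for (int i = 0; i < n; i++)
		kids[i] = spawn_sleeper();
	if (why == AFS_DIED && n > 0 && kids[0] > 0) {
		/* Child 0 plays afs: it dies and is reaped by the SIGCHLD path. */
		kill(kids[0], SIGKILL);
		waitpid(kids[0], NULL, 0);
		kids[0] = -1;
	}
	/* Both reasons now take the same path, as in the patched server. */
	return kill_and_reap(kids, n);
}
```

Assuming every `fork()` succeeds, `simulate_shutdown(GOT_SIGTERM, 3)` should return 3 and `simulate_shutdown(AFS_DIED, 3)` should return 2, because the simulated afs child has already been reaped before the common path runs. In both cases no child is left for a later `wait()` to collect.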