When para_server runs in foreground mode in a terminal session and is
signalled by hitting CTRL+C, it is unspecified whether the server or
the afs process receives the resulting SIGINT first. It may even
happen that the afs process dies first, and that the server sees the
resulting SIGCHLD *before* the SIGINT.
In this case we currently don't wait for the command handlers to exit
but proceed right away with the shutdown, closing the signal pipe and
destroying the shared memory area which contains the mmd structure.
This leads to error messages on shutdown such as:
Sep 21 12:38:18 (5) (29166) para_semop: semaphore set 12648470 was removed
Sep 21 12:38:18 (6) (29166) para_semop: fatal semop error Invalid argument: pid 29166
Sep 21 12:38:18 (6) (29161) generic_signal_handler: Bad file descriptor
Sep 21 12:38:18 (6) (29164) para_semop: fatal semop error Invalid argument: pid 29164
Sep 21 12:38:18 (6) (29165) command_handler_sighandler: terminating on signal 15
Sep 21 12:38:18 (6) (29165) para_semop: fatal semop error Invalid argument: pid 29165
This commit avoids the issue by making the server wait for all of
its children in the SIGCHLD case as well, i.e. when it exits because
the afs process has terminated.
if (pid != afs_pid)
continue;
PARA_EMERG_LOG("fatal: afs died\n");
- kill(0, SIGTERM);
- goto cleanup;
+ goto genocide;
}
break;
/* die on sigint/sigterm. Kill all children too. */
case SIGINT:
case SIGTERM:
PARA_EMERG_LOG("terminating on signal %d\n", signum);
+genocide:
kill(0, SIGTERM);
/*
* We must wait for all of our children to die. For the afs
while (wait(NULL) != -1 || errno != ECHILD)
; /* still at least one child alive */
mutex_lock(mmd_mutex);
-cleanup:
free(mmd->afd.afhi.chunk_table);
task_notify_all(s, E_DEADLY_SIGNAL);
return -E_DEADLY_SIGNAL;