Andre Noll [Mon, 30 Mar 2015 16:20:16 +0000 (16:20 +0000)]
daemon.c: Open /dev/null read-write.
While daemonizing we redirect stdin, stdout and stderr to /dev/null,
which is considered good practice. We should, however, open /dev/null
in read-write mode rather than read-only, since not being
able to write to stdout/stderr might confuse rsync and the hooks.
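A minimal sketch of the idea, not the actual daemon.c code. The function names open_devnull() and redirect_stdio_to_devnull() are hypothetical. The point is that O_RDWR lets one descriptor serve stdin as well as stdout and stderr.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open /dev/null for reading and writing. Returns the fd, or -1 on error. */
int open_devnull(void)
{
	return open("/dev/null", O_RDWR);
}

/* Point stdin, stdout and stderr at /dev/null. Returns 0 or -1. */
int redirect_stdio_to_devnull(void)
{
	int target, fd = open_devnull();

	if (fd < 0)
		return -1;
	for (target = STDIN_FILENO; target <= STDERR_FILENO; target++) {
		if (fd != target && dup2(fd, target) < 0) {
			close(fd);
			return -1;
		}
	}
	/* Do not leak the extra descriptor into rsync or the hooks. */
	if (fd > STDERR_FILENO)
		close(fd);
	return 0;
}
```

With a read-only descriptor, the dup2() calls would still succeed, but every later write to stdout or stderr would fail with EBADF.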
Andre Noll [Wed, 25 Feb 2015 10:15:33 +0000 (11:15 +0100)]
Improve signal handler.
The signal handler of dss has two issues: (a) it does not check the
return value of the write(2) call, and (b) it does not restore errno
on exit. The second issue might cause problems on systems where
write(2) sets errno also on success. Those problems would be very
hard to reproduce and debug. So it is probably a good idea to be
conservative here.
This commit fixes (a) by printing an error message and calling exit(3)
if the write to the signal pipe failed or resulted in a short write.
As for (b), we now save a copy of errno before the write(2) call,
and restore the old value on success.
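The pattern described above can be sketched as follows. This is an illustration under assumptions, not the dss source: the names signal_pipe and signal_handler_sketch() are hypothetical, and the sketch uses write(2) and _exit(2) for the error path because those are async-signal-safe, whereas the commit message speaks of exit(3).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical name: [0] is the read end, [1] the write end. */
int signal_pipe[2];

void signal_handler_sketch(int signum)
{
	static const char msg[] = "signal handler: write to pipe failed\n";
	int saved_errno = errno;
	ssize_t ret = write(signal_pipe[1], &signum, sizeof(signum));

	if (ret == (ssize_t)sizeof(signum)) {
		errno = saved_errno; /* hide whatever write(2) did to errno */
		return;
	}
	/* Error or short write: report and terminate. */
	ret = write(STDERR_FILENO, msg, sizeof(msg) - 1);
	(void)ret;
	_exit(EXIT_FAILURE);
}
```

The main loop would read the signal number from signal_pipe[0], typically after select(2) reports the descriptor as readable.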
Andre Noll [Fri, 12 Dec 2014 14:05:21 +0000 (15:05 +0100)]
Rework restart logic, introduce --max-rsync-errors.
It has happened several times in the past that dss made no progress
because the underlying rsync command terminates with exit code 13
(Errors with program diagnostics). Currently dss special cases this
exit code as a non-fatal error, i.e. it does not terminate but restarts
the rsync command after 60 seconds. If the problem is permanent,
no new snapshots will be created, but the exit hook is not called
either, which is unfortunate.
This commit tries to improve on this. With this patch applied, the
only non-fatal exit code from rsync is 24 (Partial transfer due
to vanished source files), which is actually considered success.
All other non-zero exit codes cause dss to restart the rsync command,
but only at most N times, where N is the argument given to the new
--max-rsync-errors option.
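The new policy might look roughly like the sketch below. The enum, the function name judge_rsync_exit() and the decision to reset the error counter on success are assumptions made for illustration; only the exit codes and the "at most N restarts" rule come from the text above.

```c
enum rsync_verdict { RSYNC_OK, RSYNC_RESTART, RSYNC_GIVE_UP };

/*
 * Decide what to do after rsync exited with the given code.
 * *num_errors counts the restarts so far. Resetting it on success
 * is an assumption of this sketch.
 */
enum rsync_verdict judge_rsync_exit(int exit_code, unsigned *num_errors,
		unsigned max_errors)
{
	/* 24: partial transfer due to vanished source files */
	if (exit_code == 0 || exit_code == 24) {
		*num_errors = 0;
		return RSYNC_OK;
	}
	if (*num_errors >= max_errors)
		return RSYNC_GIVE_UP; /* already restarted max_errors times */
	(*num_errors)++;
	return RSYNC_RESTART;
}
```

With max_errors set to zero, the first failing rsync run already yields RSYNC_GIVE_UP, so the exit hook gets a chance to run.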
Andre Noll [Sat, 3 Jan 2015 16:11:06 +0000 (16:11 +0000)]
tv.c: Remove unused functions.
Quite a few public functions of tv.c are not used anywhere in dss,
so let's get rid of them. We can easily add them back in case they
are needed in the future.
Andre Noll [Sat, 3 Jan 2015 16:05:58 +0000 (16:05 +0000)]
Remove non-functional SEE ALSO links from index.html.
It's nice to have references to ssh and rsync in the SEE ALSO section
of the man page. On the web page, however, they do not add much value
since the links generated by man2html do not work. This patch omits
the broken links.
Andre Noll [Wed, 24 Sep 2014 13:28:39 +0000 (15:28 +0200)]
index.html.in: Fix gitweb link.
Apparently the symlink workaround for the gitweb pages on
git.tuebingen.mpg.de does not work any more although the symlink
dss->dss.git is still in place.
This commit changes the link on the web page to include the .git
suffix.
Andre Noll [Tue, 18 Feb 2014 13:16:02 +0000 (14:16 +0100)]
Introduce --min-complete.
Currently dss cowardly refuses to remove the last complete snapshot
even if disk space is low, and fails if there is not enough disk space
left for a second snapshot. However, in some situations it is more
important to have a recent snapshot and to keep dss up and running.
This commit introduces a new integer option, --min-complete, which
defaults to one to resemble the old behaviour.
If it is set to zero, dss will happily remove the last complete
snapshot, even if it is used as the reference directory for rsync's
--link-dest option. This is dangerous, but it's the only way to keep
dss going.
Conversely, --min-complete may be set to a value greater than one
to guarantee there is always a certain number of complete snapshots
available.
Andre Noll [Tue, 21 Jan 2014 15:56:37 +0000 (16:56 +0100)]
Silence clang warnings.
The -Wno-sign-compare option is supposed to not print the noisy
warnings for comparisons between signed and unsigned values.
Currently, in DEBUG_CFLAGS this option is followed by -W which causes
clang (but not gcc) to turn on these warnings again. As CFLAGS contains
-Wall, the -W option was redundant anyway, so this patch removes it.
Andre Noll [Wed, 16 Oct 2013 12:17:46 +0000 (14:17 +0200)]
Kill children on fatal errors.
If dss is about to die because it received SIGINT or SIGTERM, we first
restart the rsync process by sending SIGCONT, then send SIGTERM to
both the rsync and the rm process to get rid of any child processes.
This works fine, but there are other fatal errors for which we fail
to clean up as thoroughly, most importantly if there is not enough
free disk space for a single snapshot.
This patch moves the signal-related cleanup part to the new function
kill_children(), and changes handle_signal() and com_run() to call
this function right before the exit hook is invoked.
Andre Noll [Thu, 20 Dec 2012 13:38:41 +0000 (14:38 +0100)]
rsync: Remove hardcoded --quiet option.
When running in daemon mode, the stdout and stderr stream of dss and
all its child processes are redirected to /dev/null. In particular any
output from the rsync process is discarded. Therefore, whenever a new
snapshot is created, dss currently passes --quiet to the underlying
rsync command, along with --archive and --delete.
However, as was pointed out by Sebastian Schultheiß, if the rsync
command fails for unknown reasons, the --quiet option complicates
debugging for the questionable benefit of saving the I/O for a few
writes to /dev/null.
Andre Noll [Sun, 28 Oct 2012 19:11:16 +0000 (20:11 +0100)]
Reject insane number of intervals.
Nobody needs more than 2^30 snapshots. More importantly, values
larger than 32 for --num_intervals cause an integer overflow in
desired_number_of_snapshots() because the number of snapshots in
interval zero does not fit in an unsigned int in this case.
This patch adds a test to check_config() that rejects values larger
than 30 for the --num_intervals option.
Many thanks to Klaus Kopec for pointing out this bug.
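A sketch of such a sanity check. The names are hypothetical, the lower bound of one is an assumption, and so is the formula 2^(n - 1) for the number of snapshots in interval zero, which is chosen because it matches the overflow threshold mentioned above.

```c
#define MAX_NUM_INTERVALS 30 /* the limit named in the commit message */

/* Returns 1 if n is acceptable, 0 otherwise. */
int num_intervals_is_sane(int n)
{
	return n >= 1 && n <= MAX_NUM_INTERVALS;
}

/*
 * Assumed formula: interval zero holds 2^(n - 1) snapshots.
 * Only call this for values accepted by num_intervals_is_sane(),
 * otherwise the shift may overflow.
 */
unsigned snapshots_in_interval_zero(int n)
{
	return 1U << (n - 1);
}
```

Rejecting the value up front keeps the shift well defined, so the later computation needs no overflow handling of its own.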
Andre Noll [Mon, 1 Oct 2012 17:10:02 +0000 (19:10 +0200)]
Don't create two snapshots in the same second.
This can only happen if all of the following are true:
(a) source and destination directories are small
(b) rsync completes successfully within one second
(c) At least two snapshots are missing
In this case the rename() call which changes the snapshot name from
*-incomplete to the proper name fails for the second snapshot with
EEXIST. This is because the previous snapshot name coincides with
the name of the second snapshot.
The fix is a bit ugly but also non-invasive and simple: Just sleep
one second in this case.
Andre Noll [Wed, 8 Aug 2012 19:47:56 +0000 (21:47 +0200)]
Switch logo from skencil to dia.
The sketch/skencil project appears to have been inactive for years, and
it is no longer shipped on recent Linux distributions. This commit
replaces the sketch source file dss.sk by dss.dia, a source file for
dia, a GTK+ based diagram creation program. The new logo looks very
similar to the old one but was created from scratch.
dia can convert a .dia file to PNG image data. This patch also
adjusts the Makefile to produce the dss.png logo from dss.dia.
Andre Noll [Sat, 11 Aug 2012 18:32:19 +0000 (20:32 +0200)]
Rename source files which also exist as system headers.
As pointed out by Daniel Richard G., some of the dss header files
are named the same as system header files.
This patch renames these headers as well as their corresponding .c
files. Specifically, error.h, fd.h, signal.h, string.h and time.h
become err.h, file.h, sig.h, str.h and tv.h.
Daniel Richard G [Fri, 10 Aug 2012 12:41:22 +0000 (14:41 +0200)]
Make the dss log facility C89 conformant.
Variadic macros were introduced in C99, so they are not supported on
ANSI C compilers. Since currently all DSS_*_LOG macros are variadic,
we need a replacement for these. Moreover, since not all compilers
support __func__ or an equivalent, we need to check for this feature
as well and provide a workaround if necessary.
This patch introduces the new public function dss_log_set_params()
which saves the given log level, filename, line number and the
function name in global variables. The DSS_*_LOG macros are changed
to receive a single argument only, which is the usual variadic list,
enclosed in additional parentheses.
The new DSS_*_LOG macros first set the log parameters by calling
dss_log_set_params(), then call dss_log() with the variadic list as
the argument. dss_log() is patched to print the function name only
if __func__ is supported and fall back to file name and the line
number otherwise.
All DSS_*_LOG() calls are changed to the new syntax.
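The double-parentheses technique can be sketched as follows. This is not the dss implementation: the log levels, the saved_* variables and the DSS_FUNC helper macro are hypothetical, and the __func__ test simply checks for C99 rather than probing the compiler.

```c
#include <stdarg.h>
#include <stdio.h>

enum { LL_DEBUG, LL_INFO, LL_ERROR }; /* hypothetical log levels */

/* Parameters saved by the most recent DSS_*_LOG() invocation. */
int saved_level;
const char *saved_file;
int saved_line;
const char *saved_func;

void dss_log_set_params(int ll, const char *file, int line, const char *func)
{
	saved_level = ll;
	saved_file = file;
	saved_line = line;
	saved_func = func;
}

void dss_log(const char *fmt, ...)
{
	va_list ap;

	/* Fall back to file name and line if no function name is known. */
	if (saved_func)
		fprintf(stderr, "%s: ", saved_func);
	else
		fprintf(stderr, "%s:%d: ", saved_file, saved_line);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* Use __func__ only where the language standard guarantees it. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define DSS_FUNC __func__
#else
#define DSS_FUNC NULL
#endif

/* The single macro argument is a complete, parenthesized argument list. */
#define DSS_INFO_LOG(args) \
	do { \
		dss_log_set_params(LL_INFO, __FILE__, __LINE__, DSS_FUNC); \
		dss_log args; \
	} while (0)
```

A call then reads DSS_INFO_LOG(("created %d snapshots\n", 3)); where the inner parentheses turn the whole argument list into one macro argument, so no variadic macro is needed.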
These gcc extensions help the compiler optimize function calls,
but are unavailable if dss is not compiled with gcc.
This patch defines the corresponding macros to empty if __GNUC__
is not defined, or if the gcc version is too old to support the
particular function attribute.
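The usual shape of such a definition is sketched below. The macro name SKETCH_CONST and the function twice() are hypothetical, and the version test is a placeholder rather than the real cut-off for any particular attribute.

```c
/* Placeholder version test: expand to nothing unless gcc is recent enough. */
#if defined(__GNUC__) && __GNUC__ >= 3
#define SKETCH_CONST __attribute__((const))
#else
#define SKETCH_CONST
#endif

/* The attribute goes on the declaration. Other compilers see plain C. */
int twice(int x) SKETCH_CONST;

int twice(int x)
{
	return 2 * x;
}
```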
This changes the definition of DSS_ERRORS so that it includes the commas,
and removes the comma from both definitions of DSS_ERROR. This
avoids "comma after last element" warnings, which on some compilers
produce an error.
Per-element struct initializers are not supported in ANSI C. This
construct doesn't gain much in terms of readability, and breaks
compatibility with older/stricter compilers.
argv[] can't be declared in this way because the initializers are not
computable at compile time. GCC allows this construct, but stricter
compilers don't.
Andre Noll [Sat, 13 Nov 2010 19:51:28 +0000 (20:51 +0100)]
Add the --kill subcommand.
It works as follows: Whenever a semaphore operation is performed, the
PID of the process is stored in the sempid field of the semaphore.
This PID can be obtained from a different process by calling semctl
with the GETPID command.
com_kill() first tries to acquire the lock by calling the new
mutex_try_lock() function of ipc.c. In contrast to mutex_lock(),
mutex_try_lock() only operates on the first semaphore in the semaphore
set, leaving the sempid field of the second semaphore unchanged. If
mutex_try_lock() succeeds, no running dss process is holding the lock
and the kill command fails. Otherwise, some dss process is running
whose PID can be obtained by calling semctl() on the second semaphore.
Andre Noll [Sat, 13 Nov 2010 19:31:10 +0000 (20:31 +0100)]
Use semaphore locking to avoid starting dss multiple times.
It's trickier than one might expect but it is hopefully also much
better than any pidfile-based approach.
This patch adds ipc.c and ipc.h containing the public lock_dss()
function which acquires a semaphore-based lock whose key is based
on the hash of the resolved path name of the dss config file. This
allows different instances of dss to coexist.
All semaphore operations are called with both the SEM_UNDO and the
IPC_NOWAIT flag. SEM_UNDO guarantees that no stale lock remains after
dss was killed by SIGKILL while IPC_NOWAIT makes the call to lock_dss()
fail if another process is already holding the lock.
The prune/create/run commands simply take the lock at startup and
exit if it could not be acquired.
The underlying semaphore set contains two semaphores. This is necessary
to implement the --kill subcommand which is done in a subsequent patch.
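A rough sketch of the locking scheme, assuming a set of two semaphores that both start at zero (Linux initializes new semaphores to zero, POSIX does not promise this). The function names are hypothetical and this is not the code of ipc.c. The second function shows how a different process could look up the lock holder, which is what the two-semaphore layout is meant to enable.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Try to take the lock on a set of two semaphores without blocking.
 * Returns 0 on success, -1 with errno set to EAGAIN if the lock is held.
 */
int try_lock_sketch(int semid)
{
	struct sembuf sops[4];
	int i;

	for (i = 0; i < 4; i++) {
		sops[i].sem_num = i / 2; /* ops 0, 1: sem 0; ops 2, 3: sem 1 */
		sops[i].sem_op = i % 2;  /* wait for zero, then increment */
		sops[i].sem_flg = SEM_UNDO | IPC_NOWAIT;
	}
	/* All four operations are performed atomically, or none at all. */
	return semop(semid, sops, 4);
}

/* PID of the last process that operated on the second semaphore. */
pid_t lock_holder_sketch(int semid)
{
	return semctl(semid, 1, GETPID);
}
```

Because of SEM_UNDO, the kernel reverts the increments when the lock holder terminates for any reason, so no stale lock can remain.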
Andre Noll [Sat, 13 Nov 2010 19:21:04 +0000 (20:21 +0100)]
Introduce get_config_file_name().
Currently, the config file name is computed in parse_config(). However, for the ipc
stuff we'll need that information as well, so move the computation to a separate
helper function.
Andre Noll [Tue, 9 Nov 2010 17:43:49 +0000 (18:43 +0100)]
Change default program for all hooks from /bin/true to true.
At least on Mac OS, true is /usr/bin/true, not /bin/true. So the old default
/bin/true causes all hooks to fail on these systems. Since we execute external
programs via execvp() anyway, there is no need to hardcode the path.
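To illustrate why the bare name suffices, here is a small sketch of running a hook through execvp(), which searches $PATH for the program. The function name run_hook_sketch() and the return value conventions are assumptions, not taken from dss.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments, searching $PATH for the program.
 * Returns the exit status of the child, or -1 on errors.
 */
int run_hook_sketch(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* only reached if execvp() failed */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Passing { "true", NULL } works regardless of whether the binary lives in /bin or /usr/bin, as long as its directory is in $PATH.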
Andre Noll [Fri, 14 May 2010 12:43:51 +0000 (14:43 +0200)]
Introduce snapshot recycling.
When snapshotting large file systems whose contents do not change much between
snapshots, we end up removing large amounts of files just to recreate (hard
links to) most of them afterwards. This patch changes snapshot creation so that
outdated, redundant and orphaned snapshots are reused as the basis for new
snapshots. A new snapshot is created only if no existing one is suitable for
recycling.
Andre Noll [Wed, 12 May 2010 09:00:55 +0000 (11:00 +0200)]
Unify sending of signals.
This patch introduces dss_kill(), a wrapper for kill(2) which
prints a nice log message and checks the return value of the
underlying call to kill().
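Such a wrapper could look like the sketch below. The name kill_sketch(), the log format and the convention of returning a negative errno value are assumptions. The real dss_kill() may differ in all three.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Returns 0 on success, a negative errno value on failure. */
int kill_sketch(pid_t pid, int sig)
{
	int err;

	fprintf(stderr, "sending signal %d to pid %ld\n", sig, (long)pid);
	if (kill(pid, sig) == 0)
		return 0;
	err = errno; /* save it, fprintf() may change errno */
	fprintf(stderr, "kill: %s\n", strerror(err));
	return -err;
}
```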
Andre Noll [Fri, 16 Apr 2010 11:39:28 +0000 (13:39 +0200)]
Invalidate create_pid if create process has died.
We're checking create_pid against zero at several places, for example before
sending a signal to the create process. So set create_pid to zero in
handle_sigchld() if the create process just died.
Andre Noll [Thu, 25 Mar 2010 13:49:47 +0000 (14:49 +0100)]
Reuse old rsync argv if rsync has to be restarted.
If rsync must be restarted due to an exit code of 12 or 13,
create_rsync_argv() was called even if the old rsync_argv should
be reused in this case. This (correctly) triggers the assertion
assert(!name_of_reference_snapshot);
in create_rsync_argv(). Fix this by not calling create_rsync_argv()
if there is a reference snapshot.
Andre Noll [Fri, 12 Mar 2010 14:47:07 +0000 (15:47 +0100)]
Avoid busy loop on rsync exit status 12 or 13.
Although we set the next snapshot time to now + 60 seconds in case
rsync exits with exit status 12 or 13, we fail to check this time
barrier in case the snapshot creation status is HS_NEEDS_RESTART.
Fix this by adding an additional check in the switch() statement
of the select loop. As this change would trigger the assertion
Andre Noll [Mon, 1 Feb 2010 09:21:35 +0000 (10:21 +0100)]
Introduce --no-resume.
If the dss daemon (or the rsync process) is killed while a snapshot
is being created, e.g. because of a server shutdown, the latest
snapshot remains incomplete until it is removed by the usual snapshot
pruning mechanism.
This patch changes the snapshot creation behaviour if the
most recently created snapshot happens to be incomplete and the
new --no-resume option is not given. In this case the directory
of the incomplete snapshot is reused as the destination directory
for the new snapshot.
This change saves disk space and reduces the snapshot creation time,
depending of course on how far the previous rsync process got before
it was interrupted.
Andre Noll [Fri, 28 Aug 2009 13:23:57 +0000 (15:23 +0200)]
Properly invalidate create_pid also for the post-create hook.
If the process associated with the create_pid dies, handle_sigchld()
investigates snapshot_creation_status to tell whether the pre-create
hook, the rsync process or the post-create hook has died.
In the first two cases, handle_pre_create_hook_exit() and
handle_rsync_exit() are called, respectively. Both functions correctly
invalidate create_pid (by resetting it to zero).
However, the post-create hook handling code fails to reset
create_pid. This causes dss to send SIGTERM to this pid on exit,
which might be fatal as the pid might have been reassigned to some
unrelated process in the meantime.
Fix this bug by moving the invalidation of create_pid to the end of
the "if (pid == create_pid)" clause, which even saves a line of code.
Many thanks to Sebastian Stark who pointed out that bug.
Andre Noll [Fri, 28 Aug 2009 09:28:57 +0000 (11:28 +0200)]
Improve error diagnostics.
When parsing the command line options we must not error out if a
required option was not given because that option might be specified
in the config file. Therefore we have to call cmdline_parser_ext()
with params->check_required = 0.
However, if --config-file is not given and the default config file
(~/.dssrc) does not exist, we end up with no check for required
options at all.
In particular, if the required --dest-dir option is not given,
conf.dest_dir is NULL and we call chdir(NULL) which fails with EFAULT
at least on Linux. This causes dss to print the error message
Aug 28 11:35:07 main: Bad address
which is not really helpful. Fix this shortcoming by calling
cmdline_parser_ext() _again_ if no config file was read by
parse_config_file(). This second call uses params->check_required =
1, so that a proper error message is printed if any required options
are missing.
Andre Noll [Fri, 28 Aug 2009 09:12:30 +0000 (11:12 +0200)]
Improve next_snapshot_is_due().
Currently it's a bit weird how next_snapshot_is_due() decides whether
the next snapshot time has to be (re-)computed:
On startup, next_snapshot_time is zero as it is declared
static.
next_snapshot_is_due() checks whether next_snapshot_time is
greater than the current time. If yes, then next_snapshot_time
need not be updated and the function returns false.
Otherwise (e.g. if it is called for the first time),
next_snapshot_time is recomputed, next_snapshot_is_due()
checks again if it is greater than the current time and
returns false if it is, true otherwise.
Consequently, dss computes the next snapshot time twice per snapshot.
Moreover, it compares next_snapshot_time twice against the current time
where one comparison would suffice. The code is thus less efficient
and harder to understand than necessary. This patch addresses both
issues. It introduces the two trivial helper functions
next_snapshot_time_is_valid() and invalidate_next_snapshot_time().
The former function simply tests next_snapshot_time against zero. It
is called from next_snapshot_is_due(). If it returns false, the new
compute_next_snapshot_time() is called (which makes next_snapshot_time
valid). Next, the usual comparison against the current time is
performed.
invalidate_next_snapshot_time() sets next_snapshot_time to zero. It
is called from pre_create_hook() and from handle_sighup(). The latter
call is necessary because changes in the config file might lead to
different snapshot creation times.
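The resulting control flow can be sketched as follows. The function names are those mentioned above, but the bodies are guesses: in particular, the fixed SNAPSHOT_PERIOD and the explicit now parameter are placeholders that stand in for the real computation and for reading the current time.

```c
#include <stdint.h>

#define SNAPSHOT_PERIOD 10 /* hypothetical fixed period in seconds */

int64_t next_snapshot_time; /* zero means "not valid" */

int next_snapshot_time_is_valid(void)
{
	return next_snapshot_time != 0;
}

void invalidate_next_snapshot_time(void)
{
	next_snapshot_time = 0;
}

/* Placeholder: the real computation depends on the dss configuration. */
void compute_next_snapshot_time(int64_t now)
{
	next_snapshot_time = now + SNAPSHOT_PERIOD;
}

/* Returns 1 if a snapshot should be created at time now, 0 otherwise. */
int next_snapshot_is_due(int64_t now)
{
	if (!next_snapshot_time_is_valid())
		compute_next_snapshot_time(now);
	return next_snapshot_time <= now;
}
```

The time is computed once per snapshot and compared once per call. It is only recomputed after something has called invalidate_next_snapshot_time().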
Andre Noll [Thu, 27 Aug 2009 14:27:29 +0000 (16:27 +0200)]
Simplify computation of next snapshot time.
Using an int64_t rather than a struct timeval for the next snapshot time
makes the code simpler and more readable as we don't have to use the
tv_xxx() functions to perform manipulations.