This was not a good idea because ftok(3) hashes, among other
information, the inode number of the file, and this number changes
every time the configuration file is edited.
The revert conflicted slightly to the commit which renamed
get_key_or_die() to get_key() and changed the type of the return
value to key_t, but the conflict was easy to resolve.
Andre Noll [Sun, 16 Apr 2017 10:48:58 +0000 (12:48 +0200)]
ipc: Improve error diagnostics for kill.
If dss is not running, the kill command prints "No such file or
directory" because the call to semget(2) fails with ENOENT. This
message is a bit misleading, so let's return -E_NOT_RUNNING in this
case instead.
Andre Noll [Fri, 17 Feb 2017 14:40:58 +0000 (15:40 +0100)]
ipc: Prefer key_t over int for System V IPC keys.
get_key() calls ftok(3), which returns a key_t value. key_t is also
the type which semget(2), the only function which receives the key via
mutex_get(), expects. It's stupid to convert the key_t from ftok(3)
into an int, only to convert it back to key_t later.
This patch changes ipc.c to use key_t everywhere. However, in
mutex_get() we print a log message containing the value of the key,
so the format string must be adjusted accordingly. Unfortunately,
on Linux, key_t is the same as int while on FreeBSD and NetBSD it is
defined as long. To avoid a warning from the compiler we use "%lx"
in the format string and cast the value to long.
Andre Noll [Fri, 17 Feb 2017 14:29:52 +0000 (15:29 +0100)]
ipc.c: Use ftok() instead of SuperFastHash.
ftok(3) uses the identity of the named file to generate a key_t type
System V IPC key, which is easier than computing the key by hashing
the (resolved) pathname of the config file. This change allows to
get rid of the realpath() and the super_fast_hash() implementation.
If ftok(3) fails, presumably because the underlying call to stat(2)
fails, we now simply return a phony identifier, similar to what we did
before in this case. This eliminates the only possible failure path
in get_key_or_die(), so this function is renamed to get_key().
Andre Noll [Fri, 17 Jun 2016 07:18:40 +0000 (09:18 +0200)]
ipc: Simplify mutex_try_lock().
There is no need to actually obtain the lock. A single semaphore
operation will do just fine. With sem_op equal to zero and IPC_NOWAIT
the semop() call returns immediately, and the return value tells
whether the semaphore value was zero.
Rename the (static) function to mutex_is_locked() to indicate that
it performs only read-only operations on the semaphore set.
Andre Noll [Thu, 16 Jun 2016 21:06:29 +0000 (23:06 +0200)]
ipc: Make pid pointer optional.
This changes get_dss_pid() to handle the case where the caller passed
a NULL pid pointer. Conversely, if pid is not NULL, we now make sure
to initialize the given address in all cases.
The single caller currently never passes NULL, so this change is just
defensive programming, protecting against future users. Be liberal
in what you accept, be strict in what you return..
Andre Noll [Tue, 7 Jun 2016 14:23:36 +0000 (16:23 +0200)]
build: Add two more warning options.
Both -Wunused-parameter and -Wshadow were added to gcc long ago. In
particular gcc-4.6.3, which ships with Ubuntu-12.04, supports them. It
should thus be safe to enable both warnings unconditionally.
Andre Noll [Tue, 7 Jun 2016 14:29:38 +0000 (16:29 +0200)]
dss.c: Add missing inclusion of <stdio.h>.
This is required for example for rename(2). Compilation succeeds without
the include only because the gengetopt header includes stdio.h as well and
we happen to include this header before fd.h.
Andre Noll [Mon, 16 May 2016 12:55:28 +0000 (14:55 +0200)]
dss: Make argument of parse_config_file() a boolean.
It is used as such, so there is no point to have an int here. Also
rename the argument from override to sighup to indicate that we
need to distinguish whether the function is called at startup or
because the dss process received SIGHUP.
Andre Noll [Fri, 17 Jun 2016 08:17:28 +0000 (10:17 +0200)]
dss: Do not shadow a global declaration.
num_complete_snapshots is a local variable in
compute_next_snapshot_time(), but also the name of a public function
declared in snap.h, causing a warning on some (old) gcc versions.
This patch avoids the ambiguity and thus the warning by renaming the
variable. It was unusually long anyway.
Andre Noll [Mon, 20 Jun 2016 14:34:37 +0000 (16:34 +0200)]
Create html version of the man page with groff.
The html post processor of groff can directly create html, which is
expected to be of higher quality than the html generated by man2html. A
brief look at the "official" web site of man2html (as mentioned in the
description of the Ubuntu-14.04 package)
Andre Noll [Mon, 20 Jun 2016 14:05:31 +0000 (16:05 +0200)]
Convert INSTALL and NEWS to markdown format.
The grutatxt project is dead, so we have to switch to something else
eventually. Fortunately, there are only three files in grutatxt format,
one of which (README) does not need any changes. The other two are
converted to markdown format in this commit. This is a rather simple
matter since only section headings, links and preformatted text need
slight adjustments.
The commands in the Makefile are modified to run markdown(1) instead
of grutatxt(1).
Andre Noll [Mon, 13 Jun 2016 15:54:37 +0000 (17:54 +0200)]
Fix rsync exit handling in create mode.
The logic in handle_rsync_exit() is horribly broken in case dss is
run in create mode and the rsync process terminates unsuccessfully.
First we claim to restart rsync, which is wrong. Next we call the
post-create hook despite the documentation says that this hook is
only run on *successful* termination. Finally, we dereference a NULL
pointer to print the path of the snapshot.
Fortunately, all three issues are easy to fix by special casing create
mode in handle_rsync_exit().
Andre Noll [Tue, 29 Dec 2015 15:28:02 +0000 (15:28 +0000)]
Always try to keep one snapshot for recycling.
Currently, if --keep-redundant is not given, we try to get rid of
outdated and redundant snapshots quickly, even if there is plenty of
free disk space available. However, as these snapshots can be used
for recycling, it seems to be worth to keep them around as long as
there are fewer snapshots available as configured.
This commit changes try_to_free_disk_space() to not remove snapshots
any more in this case. This patch should reduce disk I/O in the common
case where no snapshots need to be removed due to low disk space.
Andre Noll [Tue, 29 Dec 2015 16:10:05 +0000 (16:10 +0000)]
Allow to run in daemon mode without log file.
It's kind of silly to insist in having a log file in daemon mode.
This commit removes the dependency of --daemon on --logfile and makes
/dev/null the default log file. Consequently, running dss --daemon
--run without specifying --logfile no longer fails, and nothing will
be logged by default.
Andre Noll [Tue, 29 Dec 2015 16:42:18 +0000 (16:42 +0000)]
Improve documentation of --keep-redundant.
The help text for --keep-redundant was rather convoluted. This commit
shortens the text with no essential semantic change.
The patch also removes the sentence that encourages to specify this
option if the destination directory is only used for snapshots. After
all, most file systems allow to create an insane number of files,
so keeping snapshots around forever can result in a file system that
can no longer be checked or repaired due to the excessive number of
used inodes.
Andre Noll [Tue, 29 Dec 2015 15:52:32 +0000 (15:52 +0000)]
Improve documentation of interval-related args.
Minor rewording of the help text for the --unit-interval option and
a new sentence which explains that the total number of snapshots
doubles if --num-intervals is increased by one.
Andre Noll [Wed, 16 Dec 2015 13:52:54 +0000 (14:52 +0100)]
README: Explain that there are no incremental backups.
This was unclear to an admin who had used dss for several years! So
maybe it is a good idea to explain the idea behind hardlink-based
backups a bit more.
This commit adds two new sentences to README, one for the admin and
another one for the user.
Andre Noll [Mon, 30 Mar 2015 16:20:16 +0000 (16:20 +0000)]
daemon.c: Open /dev/null read-write.
While daemonizing we redirect stdin, stdout and stderr to /dev/null,
which is considered good practice. We should, however, open these
two devices in read-write mode rather than read-only, since not being
able to write to stdout/stderr might confuse rsync and the hooks.
Andre Noll [Wed, 25 Feb 2015 10:15:33 +0000 (11:15 +0100)]
Improve signal handler.
The signal handler of dss has two issues: (a) it does not check the
return value of the write(2) call, and (b) it does not restore errno
on exit. The second issue might cause problems on systems where
write(2) sets errno also on success. Those problems would be very
hard to reproduce and debug. So it is probably a good idea to be
conservative here.
This commit fixes (a) by printing an error message and calling exit(3)
if the write to the signal pipe failed or resulted in a short write.
As for (b), we now save a copy of errno before the write(2) call,
and restore the old value on success.
Andre Noll [Fri, 12 Dec 2014 14:05:21 +0000 (15:05 +0100)]
Rework restart logic, introduce --max-errors.
It has happened several times in the past that dss made no progress
because the underlying rsync command terminates with exit code 13
(Errors with program diagnostics). Currently dss special cases this
exit code as a non-fatal error, i.e. it does not terminate but restarts
the rsync command after 60 seconds. If the problem is permanent,
no new snapshots will be created, but the exit hook is not called
either, which is unfortunate.
This commit tries to improve on this. With this patch applied, the
only non-fatal exit code from rsync is 24 (Partial transfer due
to vanished source files), which is actually considered success.
All other non-zero exit codes cause dss to restart the rsync command,
but only at most N times, where N is the argument given to the new
--max-rsync-errors option.
Andre Noll [Sat, 3 Jan 2015 16:11:06 +0000 (16:11 +0000)]
tv.c: Remove unused functions.
Quite a few public functions of tv.c are not used anywhere in dss,
so let's get rid of them. We can easily add them back in case they
are neeed in the future.
Andre Noll [Sat, 3 Jan 2015 16:05:58 +0000 (16:05 +0000)]
Remove non-functional SEE ALSO links from index.html.
It's nice to have references to ssh and rsync in the SEE ALSO section
of the man page. On the web page, however, they do not add much value
since the links generated by man2html do not work. This patch omits
the broken links.
Andre Noll [Wed, 24 Sep 2014 13:28:39 +0000 (15:28 +0200)]
index.html.in: Fix gitweb link.
Apparently the symlink workaround for the gitweb pages on
git.tuebingen.mpg.de does not work any more although the symlink
dss->dss.git is still in place.
This commit changes the link on the web page to include the .git
suffix.
Andre Noll [Tue, 18 Feb 2014 13:16:02 +0000 (14:16 +0100)]
Introduce --min-complete.
Currently dss cowardly refuses to remove the last complete snapshot
even if disk space is low, and fails if there is not enough disk space
left for a second snapshot. However, in some situations it is more
important to have a recent snapshot and to to keep dss up and running.
This commit introduces a new integer option, --min-complete, which
defaults to one to resemble the old behaviour.
If it is set to zero, dss will happily remove the last complete
snapshot, even if it is used as the reference directory for rsync's
--link-dest option. This is dangerous, but it's the only way to keep
dss going.
Conversely, --min-complete may be set to a value greater than one
to guarantee there is always a certain number of complete snapshots
available.
Andre Noll [Tue, 21 Jan 2014 15:56:37 +0000 (16:56 +0100)]
Silence clang warnings.
The -Wno-sign-compare option is supposed to not print the noisy
warnings for comparisons between signed and unsigned values.
Currently, in DEBUG_CFLAGS this option is followed by -W which causes
clang (but not gcc) to turn on these warnings again. As CFLAGS contains
-Wall, the -W option was redundant anyway, so this patch removes it.
Andre Noll [Wed, 16 Oct 2013 12:17:46 +0000 (14:17 +0200)]
Kill children on fatal errors.
If dss is about to die because it received SIGINT or SIGTERM, we first
restart the rsync process by sending SIGCONT, then send SIGTERM to
both the rsync and the rm process to get rid of any child processes.
This works fine, but there are other fatal errors for which we miss
to clean up as thoroughly, most importantly if there is not enough
free disk space for a single snapshot.
This patch moves the signal-related cleanup part to the new function
kill_children(), and changes handle_signal() and com_run() to call
this function right before the exit hook is invoked.
Andre Noll [Thu, 20 Dec 2012 13:38:41 +0000 (14:38 +0100)]
rsync: Remove hardcoded --quiet option.
When running in daemon mode, the stdout and stderr stream of dss and
all its child processes are redirected to /dev/null. In particular any
output from the rsync process is discarded. Therefore, whenever a new
snapshot is created, dss currently passes --quiet to the underlying
rsync command, along with --archive and --delete.
However, as was pointed out by Sebastian Schultheiß, if the rsync
command fails for unknown reasons, the --quiet option complicates
debugging for the questionable benefit of saving the I/O for a few
writes to /dev/null.
Andre Noll [Sun, 28 Oct 2012 19:11:16 +0000 (20:11 +0100)]
Reject insane number of intervals.
Nobody needs more than 2^30 snapshots. More importantly, values
larger than 32 for --num_intervals cause an integer overflow in
desired_number_of_snapshots() because the number of snapshots in
interval zero does not fit in an unsigned int in this case.
This patch adds a test to check_config() that rejects values larger
than 30 for the --num_intervals option.
Many thanks to Klaus Kopec for pointing out this bug.
Andre Noll [Mon, 1 Oct 2012 17:10:02 +0000 (19:10 +0200)]
Don't create two snapshots in the same second.
This can only happen if all of the follwing are true:
(a) source and destination directories are small
(b) rsync completes successfully within one second
(c) At most two snapshots are missing
In this case the rename() call which changes the snapshot name from
*-incomplete to the proper name fails for the second snapshot with
EEXIST. This is because the previous snapshot name coincides with
the name of the second snapshot.
The fix is a bit ugly but also non-invasive and simple: Just sleep
one second in this case.
Andre Noll [Wed, 8 Aug 2012 19:47:56 +0000 (21:47 +0200)]
Switch logo from skencil to dia.
The sketch/skencil project appears to be inactive for years, and
it is no longer shipped on recent Linux distributions. This commit
replaces the sketch source file dss.sk by dss.dia, a source file for
dia, an GTK+ based diagram creation program. The new logo looks very
similar to the old one but was created from scratch.
dia allows to convert a .dia file to PNG image data. This patch also
adjusts the Makefile to produce the dss.png logo from dss.dia.
Andre Noll [Sat, 11 Aug 2012 18:32:19 +0000 (20:32 +0200)]
Rename source files which also exist as system headers.
As pointed out by Daniel Richard G. some of the dss header files
are named the same as system header files.
This patch renames these headers as well as their corresponding .c
files. Specifically, error.h, fd.h, signal.h, string.h and time.h
become err.h, file.h, sig.h, str.h and tv.h.
Daniel Richard G [Fri, 10 Aug 2012 12:41:22 +0000 (14:41 +0200)]
Make the dss log facility C89 conform.
Variadic macros were introduced in C99, so they are not supported on
ANSI C compilers. Since currently all DSS_*_LOG macros are variadic,
we need a replacement for these. Moreover, since not all compilers
support __func__ or an equivalent, we need to check for this feature
as well and provide a workaround if necessary.
This patch introduces the new public function dss_log_set_params()
which saves the given log level, filename, line number and the
function name in global variables. The DSS_*_LOG macros are changed
to receive a single argument only, which is the usual variadic list,
enclosed in additional parentheses.
The new DSS_*_LOG macros first set the log parameters by calling
dss_log_set_params(), then call dss_log() with the variadic list as
the argument. dss_log() is patched to print the function name only
if __func__ is supported and fall back to file name and the line
number otherwise.
All DSS_*_LOG() calls are changed to the new syntax.
These gcc extensions help the compiler optimize function calls,
but are unavailable if dss is not compiled with gcc.
This patch defines the corresponding macros to empty if __GNUC__
is not defined, or if the gcc version is too old to support the
particular function attribute.
This changes the definition of DSS_ERRORS so that it includes the commas,
and removes the comma from both definitions of DSS_ERROR. This
avoids "comma after last element" warnings, which on some compilers
produces an error.
Per-element struct initializers are not supported in ANSI C. This
construct doesn't gain much in terms of readability, and breaks
compatibility with older/stricter compilers.
argv[] can't be declared in this way because the initializers are not
computable at compile time. GCC allows this construct, but stricter
compilers don't.
Andre Noll [Sat, 13 Nov 2010 19:51:28 +0000 (20:51 +0100)]
Add the --kill subcommand.
It works as follows: Whenever a semaphore operation is performed, the
PID of the process is stored in the sempid field of the semaphore.
This PID can be obtained from a different process by calling semctl
with the GETPID command.
com_kill() first tries to acquire the lock by calling the new
mutex_try_lock() function of ipc.c. In contrast to mutex_lock(),
mutex_try_lock() only operates on the first semaphore in the semaphore
set, leaving the sempid field of the second semaphore unchanged. If
mutex_try_lock() succeeds, no running dss process is holding the lock
and the kill command fails. Otherwise, some dss process is running
whose PID can be obtained by calling semctl() on the second semaphore.
Andre Noll [Sat, 13 Nov 2010 19:31:10 +0000 (20:31 +0100)]
Use semaphore locking to avoid starting dss multiple times.
It's trickier than one might expect but it is hopefully also much
better than any pidfile-based approach.
This patch adds ipc.c and ipc.h containing the public lock_dss()
function which acquires a semaphore-based lock whose key is based
on the hash of the resolved path name of the dss config file. This
allows different instances of dss to coexist.
All semaphore operations are called with both the SEM_UNDO and the
IPC_NOWAIT flag. SEM_UNDO guarantees that no stale lock remains after
dss was killed by SIGKIlL while IPC_NOWAIT makes the call to lock_dss()
fail if another process is already holding the lock.
The prune/create/run commands simply take the lock at startup and
exit if it could not be acquired.
The underlying semaphore set contains two semaphores. This is necessary
to implement the --kill subcommand which is done in a subsequent patch.
Andre Noll [Sat, 13 Nov 2010 19:21:04 +0000 (20:21 +0100)]
Introduce get_config_file_name().
ATM, the config file name is computed in parse_config(). However, for the ipc
stuff we'll need that information as well, so move the computation to a separate
helper function().