Following a recent discussion on lkml, the choice of the initial
value for the square root is sub-optimal. The change introduced
in this commit was proposed by Peter Zijlstra who also measured a
significant speed improvement for both the hot and the cold cache case.
The speed improvements for the hot-cache case were confirmed on a
32 bit system by running a simple test program which calculates the
square root of
10000000 random numbers. With the new initial value,
the running time went down by 23%. This matters because when a new
mood is loaded, int_sqrt() is called four times per admissible file.
The new initial value is computed in terms of the position of the
most significant bit set in the given argument to int_sqrt(). While
ffs(3) (find first set bit) is in POSIX.1‐2008, there is no fls(3)
(find last set bit), so we have to introduce our own implementation.
We chose an open-coded version because this turned out to be faster
than reversing the bits and calling ffs(3).
*/
static struct mood *current_mood;
+/*
+ * Find the position of the most-significant set bit.
+ *
+ * Copied and slightly adapted from the linux source tree, version 4.9.39
+ * (2017-07).
+ */
+__a_const static uint32_t fls64(uint64_t v)
+{
+ int n = 63;
+ const uint64_t ones = ~(uint64_t)0U;
+
+ if ((v & (ones << 32)) == 0) {
+ n -= 32;
+ v <<= 32;
+ }
+ if ((v & (ones << (64 - 16))) == 0) {
+ n -= 16;
+ v <<= 16;
+ }
+ if ((v & (ones << (64 - 8))) == 0) {
+ n -= 8;
+ v <<= 8;
+ }
+ if ((v & (ones << (64 - 4))) == 0) {
+ n -= 4;
+ v <<= 4;
+ }
+ if ((v & (ones << (64 - 2))) == 0) {
+ n -= 2;
+ v <<= 2;
+ }
+ if ((v & (ones << (64 - 1))) == 0)
+ n -= 1;
+ return n;
+}
+
/*
* Rough approximation to sqrt.
*
op = x;
res = 0;
- one = one << 62;
- while (one > op)
- one >>= 2;
-
+ one = one << (fls64(x) & ~one);
while (one != 0) {
if (op >= res + one) {
op = op - (res + one);