Speeding up v.Reverse[v] or ListConvolve[v,v]

Question

What is the most efficient way to compute v.Reverse[v] or ListConvolve[v,v]? In particular, since most of the products occur twice, could there be a 2x speed improvement? My attempt:

f1[v_] := v . Reverse[v] f2[v_] := First@ListConvolve[v, v] f3[v_] := f3i[v, Length[v]] f3i[v_, s_?EvenQ] := With[{hs = s/2}, (2 Take[v, hs]) . Reverse[Take[v, -hs]]] f3i[v_, s_?OddQ] := f3i[v, s - 1] + v[[(s + 1)/2]]^2 f4[v_] := f4i[v, Length[v]] f4i[v_, s_?EvenQ] := With[{hs = s/2}, First@ListConvolve[2 Take[v, hs], Take[v, -hs]]] f4i[v_, s_?OddQ] := f4i[v, s - 1] + v[[(s + 1)/2]]^2

First, a sanity check:

v = {{}, {a}, {a, b}, {a, b, c}, {a, b, c, d}}; {f1[#], f1[#] === f2[#] === f3[#] === f4[#]} & /@ v

{{0, True}, {a^2, True}, {2 a b, True}, {b^2 + 2 a c, True}, {2 b c + 2 a d, True}}

Timing on symbols:

v = Table[StringForm["a``", i], {i, 5000000}]; First@RepeatedTiming[f1[v]] First@RepeatedTiming[f2[v]] First@RepeatedTiming[f3[v]] First@RepeatedTiming[f4[v]]

2.16298 $\quad$ (plain v.Reverse[v])
2.48454 $\quad$ (plain ListConvolve[v,v])
1.56501 $\quad$ (modified v.Reverse[v])
1.43643 $\quad$ (modified ListConvolve[v,v])

Not a 2x improvement, but an improvement nonetheless! Notice how plain v.Reverse[v] beats ListConvolve[v,v], but the opposite is true with the modified versions. Why?

Timing on integers:

v = Table[RandomInteger[100000], 5000000]; First@RepeatedTiming[f1[v]] First@RepeatedTiming[f2[v]] First@RepeatedTiming[f3[v]] First@RepeatedTiming[f4[v]]

0.00252993 $\quad$ (plain v.Reverse[v])
0.563211 $\quad$ (plain ListConvolve[v,v])
0.00335379 $\quad$ (modified v.Reverse[v])
0.287567 $\quad$ (modified ListConvolve[v,v])

With integers, v.Reverse[v] dramatically beats ListConvolve[v,v], but trying to improve v.Reverse[v] just makes it worse; on the other hand, the sought 2x improvement on ListConvolve[v,v] seems achievable in this case.

Timing on reals:

v = Table[RandomReal[], 5000000]; First@RepeatedTiming[f1[v]] First@RepeatedTiming[f2[v]] First@RepeatedTiming[f3[v]] First@RepeatedTiming[f4[v]]

0.00255105 $\quad$ (plain v.Reverse[v])
0.0221051 $\quad$ (plain ListConvolve[v,v])
0.00357253 $\quad$ (modified v.Reverse[v])
0.0133805 $\quad$ (modified ListConvolve[v,v])

We see v.Reverse[v] speed is the same for integers and reals (and again the attempt at improving it just makes it worse), but ListConvolve[v,v] is much faster with reals than integers, and can again be improved, but that improvement is no longer 2x.

Question/Challenge: What's happening under the hood to explain all this? Can you construct a variant of v.Reverse[v] that's twice as fast on integers and reals?

Try f4i with With[{hs = s/2}, 2 First@ListConvolve[Take[v, hs], Take[v, -hs]]]. Similarly for f3i. — Michael E2, Commented3 hours ago

Henrik Schumacher · Accepted Answer · 2025-04-26 19:44:36Z

For standard types like Real, Integer, and Complex you can use Compile to create optimized versions.

Here is a variant that does that for all three types.

ClearAll[cReverseDot]; cReverseDot[type : Real | Integer | Complex] := cReverseDot[type] = With[{ T = Blank[type], init = Switch[type, Integer, 0, Real, 0., Complex, 0. + 0. I], two = Switch[type, Integer, 2, Real, 2., Complex, 2.] }, Compile[{{v, T, 1}}, Module[{n, sum, k}, n = Length[v]; k = Quotient[n, 2]; sum = If[n == 2 k, init, Compile`GetElement[v, k + 1] Compile`GetElement[v, k + 1]]; Do[ sum += two (Compile`GetElement[v, i] Compile`GetElement[v, n + 1 - i]), {i, k, 1, -1} ]; sum ], CompilationTarget -> "C", RuntimeAttributes -> {Listable}, Parallelization -> True, RuntimeOptions -> "Speed" ] ];

The first run in each Mathematica session will take longer because the function has to be compiled once. Afterwards it is often 10-20 times faster than ordinary Mathematica code.

Here a usage example that highlights that the function is also Listable to some extent:

m = 1000000; n = 10; X = RandomReal[{-5, 5}, {m, n}]; Yc = cReverseDot[Real][X]; // AbsoluteTiming // First Y1 = f1 /@ X; // AbsoluteTiming // First Max[Abs[Y1 - Yc]]/Max[Abs[Y1]]

0.023744
0.593634
3.15487*10^-16

So cReverseDot is 25 times faster, when applied to many short vectors. It won't be better on very long vectors, though.

If the vector length is known already at compile time and if it is not too large, then one can improve on this as follows:

ClearAll[cReverseDotFixedSize]; cReverseDotFixedSize[type : Real | Integer | Complex, size_Integer] := cReverseDotFixedSize[type, size] = With[{ T = Blank[type], init = Switch[type, Integer, 0, Real, 0., Complex, 0. + 0. I], two = Switch[type, Integer, 2, Real, 2., Complex, 2.], k = Quotient[size, 2], n = size }, If[OddQ[n], Compile[{{v, T, 1}}, Module[{sum}, sum = Compile`GetElement[v, k + 1] Compile`GetElement[v, k + 1]; Do[ sum += two (Compile`GetElement[v, i] Compile`GetElement[v, n + 1 - i]), {i, k, 1, -1} ]; sum ], CompilationTarget -> "C", RuntimeAttributes -> {Listable}, Parallelization -> True, RuntimeOptions -> "Speed" ], Compile[{{v, T, 1}}, Module[{sum}, sum = init; Do[ sum += two (Compile`GetElement[v, i] Compile`GetElement[v, n + 1 - i]), {i, k, 1, -1} ]; sum ], CompilationTarget -> "C", RuntimeAttributes -> {Listable}, Parallelization -> True, RuntimeOptions -> "Speed" ] ] ];

Now

Yc = cReverseDot[Real][X]; // AbsoluteTiming // First Yf = cReverseDotFixedSize[Real, n][X]; // AbsoluteTiming // First Max[Abs[Yf - Yc]]/Max[Abs[Yc]]

0.030696
0.007259
0.

So the fixed-size version is about 4.2 times faster. This is because fixed vector size allows the compiler to write shorter code with less control constructs and it makes it also easier to vectorize the code.

It may interest the OP, and maybe others, that # . Reverse[#] & /@ X is quite a bit faster than f1 /@ X (auto-compilation). — Michael E2, Commented4 hours ago

Stack Exchange Network

Speeding up v.Reverse[v] or ListConvolve[v,v]

1 Answer 1

Hot Network Questions

Speeding up v.Reverse[v] or ListConvolve[v,v]

1 Answer 1

Related

Hot Network Questions