I have a 2D lattice of U × U cells. My goal is to build an array of shape U² × U² in which each entry holds the shortest distance between one cell and another, with wrap-around (periodic) boundaries.
Currently, for U = 60, my Python script takes about 5 minutes to finish. As a novice programmer, I suspect there must be a more efficient approach; I have read that Python-level loops over NumPy arrays are slow. My target is U = 1000.
import numpy as np
import matplotlib.pyplot as plt  # not used below
import time

np.random.seed(13)

start_time = time.time()

U = 60

def FromGridToList(i_, j_):
    # (i, j) grid coordinates -> flat row-major index
    return i_*U + j_

def FromListToGrid(l_):
    # flat index -> (i, j) grid coordinates
    i = np.floor(l_/U)
    j = l_ - i*U
    return np.array((i, j))

Ulist = range(U**2)
dist_array = []
for l in Ulist:
    print(l, 'of', U**2)
    di = np.array([np.abs(FromListToGrid(l)[0] - FromListToGrid(i)[0]) for i, x in enumerate(Ulist)])
    di = np.minimum(di, U - di)   # wrap-around along i
    dj = np.array([np.abs(FromListToGrid(l)[1] - FromListToGrid(i)[1]) for i, x in enumerate(Ulist)])
    dj = np.minimum(dj, U - dj)   # wrap-around along j
    d = np.sqrt(di**2 + dj**2)
    dist_array.append(d)
dist_array = np.vstack(dist_array)

print(time.time() - start_time, 'seconds')
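Not part of the original post, but as a sketch of the usual fix: the per-element list comprehensions can be replaced by NumPy broadcasting, which computes all pairwise differences at once (the function name `torus_distance_matrix` is mine):

```python
import numpy as np

def torus_distance_matrix(U):
    # Grid coordinates of every cell, in the same row-major order
    # as FromGridToList (flat index = i*U + j).
    idx = np.arange(U * U)
    i = idx // U
    j = idx % U
    # Pairwise coordinate differences via broadcasting: shape (U^2, U^2).
    di = np.abs(i[:, None] - i[None, :])
    dj = np.abs(j[:, None] - j[None, :])
    # Minimum-image convention: take the shorter of the two paths
    # around the torus along each axis.
    di = np.minimum(di, U - di)
    dj = np.minimum(dj, U - dj)
    return np.sqrt(di**2 + dj**2)
```

This removes all Python-level loops, but note it still materializes the full U² × U² matrix, so it only helps up to moderate U.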
Comment: A 1000² × 1000² array? Do you realize that will take 1–8 TB of memory? Good luck...