
I tried to model the voxels of a 3D cylinder with the following code:

    import math
    import numpy as np

    R0 = 500
    hz = 1
    x = np.arange(-1000, 1000, 1)
    y = np.arange(-1000, 1000, 1)
    z = np.arange(-10, 10, 1)
    xx, yy, zz = np.meshgrid(x, y, z)

    def density_f(x, y, z):
        r_xy = math.sqrt(x ** 2 + y ** 2)
        if r_xy <= R0 and -hz <= z <= hz:
            return 1
        else:
            return 0

    density = np.vectorize(density_f)(xx, yy, zz)

and it took many minutes to compute.

Equivalent, similarly suboptimal Java code runs in 10-15 seconds.

How can I make Python compute the voxels at a comparable speed? Where should I optimize?

  • Your bio lists MATLAB. Originally that was a wrapper for Fortran matrix code; to get good speed you had to think in terms of whole arrays. New MATLAB has a JIT compiler that lets you 'cheat' and write iterative code. numpy is more like the older MATLAB. To get around that you have to use added packages like cython and numba that create custom compiled code. I assume you know enough computer science to know the difference between C, Java, and Python.
    – hpaulj
    Commented Oct 27, 2018 at 0:30
  • The question of how do a fast vectorization comes up often. Here's a simpler recent example: stackoverflow.com/questions/53016316/…
    – hpaulj
    Commented Oct 27, 2018 at 4:23
  • Slightly unrelated to your question, but compare the square of the xy distance with the precomputed square of the radius - you can save a sqrt in each iteration, which should make a big difference.
    Commented Oct 29, 2018 at 12:00

3 Answers


Please do not use np.vectorize(...): it is not efficient, since it still does the processing at the Python level. np.vectorize should only be used as a last resort, for example when the function cannot be calculated in "bulk" because its "structure" is too complex.

But you do not need np.vectorize here; you can implement the function to work over whole arrays:

    r_xy = np.sqrt(xx ** 2 + yy ** 2)
    density = (r_xy <= R0) & (-hz <= zz) & (zz <= hz)

or even a bit faster:

    r_xy = xx * xx + yy * yy
    density = (r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)

This will construct a 2000×2000×20 array of booleans. We can use:

    intdens = density.astype(int)

to construct an array of ints.

Printing the array here is quite cumbersome, but it contains a total of 2,356,047 ones:

    >>> density.astype(int).sum()
    2356047

Benchmarks: If I run this locally 10 times, I get:

    >>> timeit(f, number=10)
    18.040479518999973
    >>> timeit(f2, number=10)  # f2 is the optimized variant
    13.287886952000008

So on average, we calculate this matrix (including casting it to ints) in 1.3-1.8 seconds.
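The wrappers f and f2 are not defined in the answer; here is a plausible reconstruction (the names, the exact structure, and the use of number=1 instead of number=10 to keep the run short are my assumptions):

```python
from timeit import timeit

import numpy as np

R0 = 500
hz = 1
x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)
xx, yy, zz = np.meshgrid(x, y, z)  # three (2000, 2000, 20) arrays: memory-heavy

def f():
    # straightforward variant: take the square root, then cast to int
    r_xy = np.sqrt(xx ** 2 + yy ** 2)
    return ((r_xy <= R0) & (-hz <= zz) & (zz <= hz)).astype(int)

def f2():
    # optimized variant: compare squared distances, avoiding the sqrt
    r_xy = xx * xx + yy * yy
    return ((r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)).astype(int)

print(timeit(f, number=1))   # the answer used number=10
print(timeit(f2, number=1))
```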

  • What about more complex cases with ifs? Can I apply functions elementwise efficiently?
    – Dims
    Commented Oct 26, 2018 at 21:53
  • P.S. Why are you using single ampersands?
    – Dims
    Commented Oct 26, 2018 at 21:54
  • @Dims: because `and` in Python evaluates truthiness, and an array has no truthiness. & is the elementwise "and" of two matrices. It can require some "creativity" to write things in this form. This is something one "masters" over time with experience.
    Commented Oct 26, 2018 at 21:55
  • Thanks, but please explain what to do with more complex functions, containing ifs, exceptions, etc., which I can't write as a single vectorized expression?
    – Dims
    Commented Oct 26, 2018 at 22:01
  • @Dims: there are no ifs, etc.; you need to encode this in a logical way. The same goes for exceptions. It thus requires some "mathematical engineering".
    Commented Oct 26, 2018 at 22:04
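As a concrete illustration of that comment (my example, not from the thread): a scalar function with an if/else can usually be rewritten with np.where, which evaluates both branches for the whole array and then selects elementwise (np.select handles more than two cases):

```python
import numpy as np

# scalar version, usable only on single numbers:
def g_scalar(v):
    if v >= 0:
        return v
    return -2 * v

# array version: both branches are computed for every element,
# then np.where picks the right one per element
def g_array(v):
    return np.where(v >= 0, v, -2 * v)

x = np.arange(-5, 6)
print(g_array(x))  # elementwise: 10, 8, 6, 4, 2, 0, 1, 2, 3, 4, 5
```

Note that because both branches are evaluated everywhere, branches that would raise (e.g. a sqrt of a negative number) need to be masked or clipped first.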

You can also use a compiled version of the function to calculate density, with either cython or numba. I use numba to JIT-compile the density calculation function in this answer, as it is as easy as adding a decorator.

Pros :

  • You can write if conditions, as mentioned in your comments
  • Slightly faster than the numpy version in the answer by @Willem Van Onsem, since that version has to iterate over the boolean array again to compute the sum in density.astype(int).sum()

Cons:

  • You have to write an ugly three-level loop, losing the beauty of the one-liner numpy solution.

Code:

    import numba as nb

    @nb.jit(nopython=True, cache=True)
    def calc_density(xx, yy, zz, R0, hz):
        threshold = R0 * R0
        dimensions = xx.shape
        density = 0
        for i in range(dimensions[0]):
            for j in range(dimensions[1]):
                for k in range(dimensions[2]):
                    r_xy = xx[i][j][k] ** 2 + yy[i][j][k] ** 2
                    if r_xy <= threshold and -hz <= zz[i][j][k] <= hz:
                        density += 1
        return density

Running times:

Willem Van Onsem's solution, f2 variant: 1.28 s without the sum, 2.01 s with the sum.

Numba solution (calc_density, on the second run, to discount the compile time): 0.48 s.

As suggested in the comments, we need not calculate the meshgrid at all; we can pass x, y, z to the function directly. Thus:

    @nb.jit(nopython=True, cache=True)
    def calc_density2(x, y, z, R0, hz):
        threshold = R0 * R0
        dimensions = len(x), len(y), len(z)
        density = 0
        for i in range(dimensions[0]):
            for j in range(dimensions[1]):
                for k in range(dimensions[2]):
                    r_xy = x[i] ** 2 + y[j] ** 2
                    if r_xy <= threshold and -hz <= z[k] <= hz:
                        density += 1
        return density

Now, for a fair comparison, we also include the time of np.meshgrid in @Willem Van Onsem's answer. Running times:

Willem Van Onsem's solution, f2 variant (np.meshgrid time included): 2.24 s.

Numba solution (calc_density2, on the second run, to discount the compile time): 0.079 s.

  • 1) Don't use global variables! Within Numba, every "global" variable is a compile-time constant. If you change a global variable and run your function again, Numba won't recognize the change, which leads to hard-to-debug errors. 2) A small change (passing x, y, z directly) leads to a factor-of-10 better performance. 3) Parallelization would give another factor of the number of cores.
    – max9111
    Commented Oct 29, 2018 at 13:58
  • @max9111 can you please explain or give an example of (2) and (3)?
    – Dims
    Commented Oct 29, 2018 at 14:16
  • @max9111, thanks. 1) Yes, indeed. Slipped my mind. Edited to incorporate 2). Can't seem to get 3) to help. I guess the bookkeeping required for parallelization dwarfs the gains at this specific size.
    Commented Oct 29, 2018 at 14:42
  • @Deepak Saini I have posted quite a duplicate answer, apart from the parallelization. I think it would be better to add the parallelized approach to your answer, and I'll delete mine afterwards... BTW: cache=True does not work if parallel is set to True.
    – max9111
    Commented Oct 29, 2018 at 15:38
  • @max9111, sure. Will do.
    Commented Oct 29, 2018 at 15:46

This is meant as a lengthy comment on the answer of Deepak Saini.

The main change is to avoid the coordinate arrays generated by np.meshgrid, which contain unnecessary repetitions. Avoid them when you can, in terms of both memory usage and performance.

Code

    import numba as nb
    import numpy as np

    @nb.jit(nopython=True, parallel=True)
    def calc_density_2(x, y, z, R0, hz):
        threshold = R0 * R0
        density = 0
        for i in nb.prange(y.shape[0]):
            for j in range(x.shape[0]):
                r_xy = x[j] ** 2 + y[i] ** 2
                for k in range(z.shape[0]):
                    if r_xy <= threshold and -hz <= z[k] <= hz:
                        density += 1
        return density

Timings

    R0 = 500
    hz = 1
    x = np.arange(-1000, 1000, 1)
    y = np.arange(-1000, 1000, 1)
    z = np.arange(-10, 10, 1)
    xx, yy, zz = np.meshgrid(x, y, z)

    # after the first call (compilation overhead):
    # calc_density_2            9.7 ms
    # calc_density_2 parallel   3.9 ms
    # @Deepak Saini             115 ms
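A middle ground between the two styles (my addition, not from the answers): np.meshgrid accepts sparse=True, which returns broadcastable arrays holding a single copy of each axis instead of the fully repeated grids, so the numpy one-liner from the first answer works without the memory overhead of dense grids:

```python
import numpy as np

R0 = 500
hz = 1
x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)

# sparse=True keeps one copy of each axis; broadcasting
# expands them only logically during the comparison
xs, ys, zs = np.meshgrid(x, y, z, sparse=True)
print(xs.shape, ys.shape, zs.shape)  # (1, 2000, 1) (2000, 1, 1) (1, 1, 20)

# same boolean result as with the dense grids
density = (xs * xs + ys * ys <= R0 * R0) & (-hz <= zs) & (zs <= hz)
print(density.sum())
```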
