Faster computation of barycentric coordinates for many points

Question

I'm just starting to understand Python syntax, and I created a module that does what I wanted, but it is really slow.

Here are the stats of cProfile, top 10 ordered by internal time:

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1291716   45.576    0.000  171.672    0.000 geometry.py:10(barycentric_coords)
6460649   31.617    0.000   31.617    0.000 {numpy.core.multiarray.array}
2583432   15.979    0.000   15.979    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   2031   12.032    0.006  193.333    0.095 geometry.py:26(containing_tet)
1291716   10.944    0.000   58.323    0.000 linalg.py:244(solve)
1291716    7.075    0.000    7.075    0.000 {numpy.linalg.lapack_lite.dgesv}
1291716    5.750    0.000    9.865    0.000 linalg.py:99(_commonType)
2583432    5.659    0.000    5.659    0.000 {numpy.core.multiarray._fastCopyAndTranspose}
1291716    5.526    0.000    7.299    0.000 twodim_base.py:169(eye)
1291716    5.492    0.000   12.791    0.000 numeric.py:1884(identity)

Processing the data and creating a 24-bit bitmap of 300*300 pixels takes about 1 1/2 hours on my laptop, which has the latest Intel i5, an SSD and 12 GB of RAM!

I am not sure if it's a good idea to post all the code, but if not how can you get the entire picture?

The first thing colors_PPM.py does is import a module called geometry.

geometry.py itself calls tetgen, tetgen.py calls tetgen.exe.

The module colors_PPM.py reads a list in targets.csv and a list in XYZcolorlist_D65.csv, and excludes some elements of XYZcolorlist_D65.csv. It then reads the rows of targets.csv one by one, performs a Delaunay triangulation via tetgen, and returns 4 names[] and 4 bcoords[].

Then random is used to choose one name through a series of if/elif/else tests.

Finally the result is exported to a bitmap file, epson gamut.ppm.

Do you see any way this could run faster? I know it must look like quite a mess.

Examples of the .csv file structures:

'XYZcolorlist_D65.csv'

255 63 127,35.5344302104,21.380721966,20.3661095969
255 95 127,40.2074945517,26.5282949405,22.7094284437
255 127 127,43.6647438365,32.3482625492,23.6181801523
255 159 127,47.1225628354,39.1780944388,22.9366615044
255 223 159,61.7379149646,62.8387601708,32.3936200864
255 255 159,70.7428790853,78.6134546144,29.5579371353
255 0 127,32.0951763469,18.3503537537,19.0863164396
255 31 127,32.281389139,18.5592317077,18.6802029444
255 191 127,52.6108977261,48.5621713952,21.7645428218
255 223 127,59.7600830083,60.9770436618,20.9338174593
...

'targets.csv'

30,5,3
30.34,5,3
30.68,5,3
31.02,5,3
31.36,5,3
31.7,5,3
32.04,5,3
32.38,5,3
32.72,5,3
33.06,5,3
33.4,5,3
33.74,5,3
...

This is geometry.py:

#geometry.py
import numpy as np
import numpy.linalg as la
import tetgen

def barycentric_coords(vertices, point):
    T = (np.array(vertices[:-1])-vertices[-1]).T
    v = np.dot(la.inv(T), np.array(point)-vertices[-1])
    v.resize(len(vertices))
    v[-1] = 1-v.sum()
    return v

def tetgen_of_hull(points):
    tg_all = tetgen.TetGen(points)
    hull_i = set().union(*tg_all.hull)
    hull_points = [points[i] for i in hull_i]
    tg_hull = tetgen.TetGen(hull_points)
    return tg_hull, hull_i

def containing_tet(tg, point):
    for i, tet in enumerate(tg.tets):
        verts = [tg.points[j] for j in tet]
        bcoords = barycentric_coords(verts, point)
        if (bcoords >= 0).all():
            return i, bcoords
    return None, None

This is tetgen.py:

#tetgen
import tempfile, subprocess, os

class TetGen:
    def __init__(self, points):
        self.points = points
        node_f = tempfile.NamedTemporaryFile(suffix=".node", delete=False);
        node_f.write("%i 3 0 0\n" % len(points))
        for i, point in enumerate(points):
            node_f.write("%i %f %f %f\n" % (i, point[0], point[1], point[2]))
        node_f.close()
        subprocess.call(["tetgen", node_f.name], stdout=open(os.devnull, 'wb'))
        #subprocess.call(["C:\Users\gary\Documents\eclipse\Light Transformer Setup\tetgen", node_f.name], stdout=open(os.devnull, 'wb'))
        ele_f_name = node_f.name[:-5] + ".1.ele"
        face_f_name = node_f.name[:-5] + ".1.face"
        ele_f_lines = [line.split() for line in open(ele_f_name)][1:-1]
        face_f_lines = [line.split() for line in open(face_f_name)][1:-1]
        self.tets = [map(int, line[1:]) for line in ele_f_lines]
        self.hull = [map(int, line[1:]) for line in face_f_lines]

if __name__ == '__main__':
    from pprint import pprint
    points = [(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)]
    tg = TetGen(points)
    pprint(tg.tets)
    pprint(tg.hull)

And this is the module colors_PPM.py:

import geometry
import csv
import numpy as np
import random
import cv2

S = 0

img = cv2.imread("MAP.tif", -1)
height, width = img.shape

ppm = file("epson gamut.ppm", 'w')

ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")  # PPM file header

all_colors = [(name, float(X), float(Y), float(Z)) for name, X, Y, Z in csv.reader(open('XYZcolorlist_D65.csv'))]

tg, hull_i = geometry.tetgen_of_hull([(X,Y,Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]

print ("thrown out: " + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

targets = ((float(X), float(Y), float(Z)) for X, Y, Z in csv.reader(open('targets.csv')))

for target in targets:
    X, Y, Z = target
    target_point = (np.array([X,Y,Z]))
    tet_i, bcoords = geometry.containing_tet(tg, target_point)

    if tet_i == None:
        ppm.write(str("255 255 255") + "\n")
        continue  # not in gamut
    else:
        A = bcoords[0]
        B = bcoords[1]
        C = bcoords[2]
        D = bcoords[3]

        R = random.uniform(0,1)

        names = [colors[i][0] for i in tg.tets[tet_i]]

        if R <= A:
            S = names[0]
        elif R <= A+B:
            S = names[1]
        elif R <= A+B+C:
            S = names[2]
        else:
            S = names[3]

        ppm.write(str(S) + "\n")

print "done"
ppm.close()

Sorry, but I don't know how to do that; I'm still quite a beginner... :( - adrienlucca.net, commented Feb 6, 2014 at 11:11
This code is in such a mess that it's expecting way too much of us to try to figure out what it does and why it's slow, especially since we don't have your tetgen program. You need to do more work here: in particular, which bit is slow? Is it the initial tetgen_of_hull? Or is it the later loop over the targets? As Janne says, it would be worth starting by learning to use the profiler. - Gareth Rees, commented Feb 6, 2014 at 11:48
See the manual, paragraph starting "The file cProfile can also be invoked as a script to profile another script." - Gareth Rees, commented Feb 6, 2014 at 15:33
@Janne Karila @Gareth Rees Thanks, I added the cProfile report: most of the time is spent in the module geometry.py - adrienlucca.net, commented Feb 6, 2014 at 15:57

Gareth Rees · Accepted Answer · 2014-02-07 13:43:55Z

1. Introduction

Thanks for running the profiler. As you can see from the output, most of the runtime is being spent in your containing_tet function.

The first thing to say is that you have made this question unnecessarily difficult for us because your functions have no documentation. We have to read and reverse-engineer your code to try to figure out what the purpose of each function is. Python allows you to write a "docstring" for each function in which you explain what the function does, what arguments it takes, and what values it returns.

It looks to me as though containing_tet tries to find a tetrahedron containing point, and it does so by computing the barycentric coordinates of point within each tetrahedron and checking that all the coordinates are non-negative.

So you could have written:

def containing_tet(tg, point):
    """If point is inside the i'th tetrahedron in tg, return the pair
    (i, bcoords) where bcoords are the barycentric coordinates of
    point within that tetrahedron. Otherwise, return (None, None).

    """

and so on for your other functions. Writing this kind of documentation will help your colleagues (and you in a few months when you have forgotten all the details).

2. Diagnosis

Why is containing_tet slow? Well, you say that you are calling it many times, and each time it has to repeat some work. First, it collects the vertices of each tetrahedron:

verts = [tg.points[j] for j in tet]

and then in barycentric_coords it computes the transform matrix:

T = (np.array(vertices[:-1])-vertices[-1]).T
... la.inv(T) ...

but these computations will be the same every time (they do not depend on point). It would be best to compute them just once.

Then you need to reorganize your code so that each operation is vectorized: that is, you should read all the targets into a NumPy array, and then compute the barycentric coordinates for all the targets at once.
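As a rough illustration of both points, the sketch below inverts one matrix per tetrahedron up front and then obtains the barycentric coordinates of a point in every tetrahedron with a single array operation. The function names `precompute_transforms` and `all_barycentric_coords` are made up for this example, and `points` and `tets` are assumed to be NumPy arrays holding the same data as `tg.points` and `tg.tets`. It also assumes no tetrahedron is degenerate, since `np.linalg.inv` raises an error on a singular matrix.

```python
import numpy as np

def precompute_transforms(points, tets):
    """Return (inverses, origins) for every tetrahedron.

    points: (n, 3) float array; tets: (m, 4) integer array of vertex indices.
    This work does not depend on the query point, so it is done once.
    """
    verts = points[tets]                      # shape (m, 4, 3)
    origins = verts[:, -1]                    # last vertex of each tetrahedron
    # Columns of each T are the edge vectors from the last vertex,
    # matching T in the question's barycentric_coords.
    T = np.transpose(verts[:, :-1] - origins[:, None, :], (0, 2, 1))
    return np.linalg.inv(T), origins          # inverts all m matrices at once

def all_barycentric_coords(inverses, origins, point):
    """Barycentric coordinates of point in every tetrahedron, shape (m, 4)."""
    b = np.einsum('mij,mj->mi', inverses, point - origins)
    return np.c_[b, 1 - b.sum(axis=1)]

# Hypothetical data: two tetrahedra sharing a face.
points = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.],
                   [0., 0., 1.], [1., 1., 1.]])
tets = np.array([[0, 1, 2, 3], [1, 2, 3, 4]])

inverses, origins = precompute_transforms(points, tets)
bcoords = all_barycentric_coords(inverses, origins, np.array([0.1, 0.2, 0.3]))
inside = (bcoords >= 0).all(axis=1)           # which tetrahedra contain the point
```

Here `inside` is true only for the first tetrahedron. The same idea extends to many targets at once, at the cost of an intermediate array with one row per (target, tetrahedron) pair.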

3. Using scipy.spatial

Having written the above, however, I am wondering why you did not use scipy.spatial.Delaunay? Is it because TetGen does a better job?

Since I don't have a copy of TetGen handy, if I had to write this code I would generate the triangulations like this:

>>> import numpy as np
>>> import scipy.spatial
>>> points = np.array([(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)])
>>> tri = scipy.spatial.Delaunay(points)
>>> tri.simplices
array([[3, 2, 4, 0],
       [3, 1, 4, 0],
       [3, 6, 2, 4],
       [3, 6, 7, 4],
       [3, 5, 1, 4],
       [3, 5, 7, 4]], dtype=int32)

and then if I have an array of targets to query:

>>> targets = np.array([[.1,.1,.1], [.9,.9,.9], [.1,.6,.7], [.4,.9,.1]])

I can find which tetrahedron each point belongs to by calling scipy.spatial.Delaunay.find_simplex:

>>> tetrahedra = tri.find_simplex(targets)
>>> tetrahedra
array([0, 3, 1, 2], dtype=int32)

And then I can find the barycentric coordinates of each point within its tetrahedron like this:

>>> X = tri.transform[tetrahedra,:3]
>>> Y = targets - tri.transform[tetrahedra,3]
>>> b = np.einsum('ijk,ik->ij', X, Y)
>>> bcoords = np.c_[b, 1 - b.sum(axis=1)]
>>> bcoords
array([[ 0.1,  0. ,  0.1,  0.8],
       [ 0.1,  0. ,  0.8,  0.1],
       [ 0.6,  0.1,  0.1,  0.2],
       [ 0.1,  0.3,  0.5,  0.1]])

This is essentially following the recipe in the scipy.spatial.Delaunay documentation, except that I transform each point using the affine transformation for the tetrahedron it was found in. Note that in the final result, all the barycentric coordinates are in the range 0–1 as you would expect.
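One way to sanity-check coordinates like these is to use them as weights on the vertices of each containing tetrahedron and confirm that the weighted sum gives back the original target. The sketch below repeats the steps above on the same cube example. The names `verts` and `reconstructed` are introduced only for this check, and it assumes every target was found in some tetrahedron.

```python
import numpy as np
import scipy.spatial

points = np.array([(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
                   (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)], dtype=float)
targets = np.array([[.1, .1, .1], [.9, .9, .9], [.1, .6, .7], [.4, .9, .1]])

tri = scipy.spatial.Delaunay(points)
tetrahedra = tri.find_simplex(targets)

# Barycentric coordinates, computed as in the session above.
X = tri.transform[tetrahedra, :3]
Y = targets - tri.transform[tetrahedra, 3]
b = np.einsum('ijk,ik->ij', X, Y)
bcoords = np.c_[b, 1 - b.sum(axis=1)]

# Vertices of the tetrahedron found for each target, shape (ntargets, 4, 3).
verts = points[tri.simplices[tetrahedra]]

# Weighted sum of the four vertices of each tetrahedron, one row per target.
reconstructed = np.einsum('ij,ijk->ik', bcoords, verts)

print(np.allclose(reconstructed, targets))
```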

The bit of the computation that is tricky to figure out how to vectorize is the computation of b. numpy.dot computes the dot product of two arrays, but here I need an array of dot products. I could loop over the elements of X and Y in Python, like this:

>>> b = np.array([x.dot(y) for x, y in zip(X, Y)])

but that would be using slow Python iteration rather than fast NumPy vector operations. Hence the rather hairy use of numpy.einsum.
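To make the `einsum` subscripts less mysterious, here is a small check on random data that the Python loop and the `einsum` call compute the same thing. The batched `@` product at the end is an alternative spelling that is not part of the answer above; it is included only for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3, 3))   # five 3x3 matrices
Y = rng.normal(size=(5, 3))      # five 3-vectors

# One matrix-vector product per (matrix, vector) pair, in a Python loop.
loop = np.array([x.dot(y) for x, y in zip(X, Y)])

# 'ijk,ik->ij': for each i, sum over k of X[i, j, k] * Y[i, k].
via_einsum = np.einsum('ijk,ik->ij', X, Y)

# Same result with batched matrix multiplication: treat each vector as a
# 3x1 column, multiply, then drop the trailing axis.
via_matmul = (X @ Y[:, :, None])[:, :, 0]

print(np.allclose(loop, via_einsum), np.allclose(loop, via_matmul))
```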

Note that you'll have to do something about the points that were not found in any tetrahedron. scipy.spatial.Delaunay.find_simplex returns -1 for these points. You could mask out the points you need, as described here, or you could try using a masked array. (Or try both and see which is faster.)
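A minimal sketch of the two options, using a made-up `tetrahedra` array in place of a real `find_simplex` result. Bear in mind that NumPy treats an index of -1 as "the last element", so indexing with the unfiltered result does not fail: it silently uses the last tetrahedron for the points that were not found.

```python
import numpy as np

# Hypothetical find_simplex output: targets 1 and 3 were not found.
tetrahedra = np.array([0, -1, 2, -1])
targets = np.arange(12, dtype=float).reshape(4, 3)

# Option 1: a boolean mask that keeps only the targets that were found.
found = tetrahedra >= 0
inside_targets = targets[found]
inside_tets = tetrahedra[found]

# Option 2: a masked array that marks the -1 entries as invalid.
masked = np.ma.masked_equal(tetrahedra, -1)

print(inside_tets, masked.count())
```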

4. Answers to questions

In comments you asked:

What is vectorization? See the "What is NumPy?" section of the NumPy documentation, in particular the section starting:
Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” (in optimized, pre-compiled C code).
Taking advantage of NumPy vector operations is the whole point of using NumPy. For example, this program computes 9 million multiplications and additions in Python:
```
>>> v = np.arange(3000)
>>> sum(i * j for i in v for j in v)
20236502250000
```
It takes about ten seconds on this computer. But the corresponding operation in NumPy:
```
>>> np.sum(np.outer(v, v))
20236502250000
```
is about 100 times faster, because the loops involved in this computation run in fast machine code and not in slow Python code. So when you are working on a program in NumPy, you need to scrutinize every Python loop — that is, every for statement — to see if you can turn it into a single NumPy operation.
How to structure your code? See §5 below. Notice how using NumPy efficiently requires us to turn the code "inside out". Instead of looping over the targets at top level, and in each loop iteration performing a series of operations on that one target, we lift the series of operations up to top level and make each operation run over all elements of a NumPy array.

How to read the colors from the colorlist CSV? In NumPy each array must have a single data type (that's one of the requirements for NumPy to be able to process them quickly). So you must read the names into one array and the points into another array. An easy way to do this is to call numpy.loadtxt twice:

>>> colors = np.loadtxt('XYZcolorlist_D65.csv', usecols=(0,), delimiter=',',
...                     converters={0:lambda s:s.split()}, dtype=np.uint8)
>>> colors[:5]
array([[255,  63, 127],
       [255,  95, 127],
       [255, 127, 127],
       [255, 159, 127],
       [255, 223, 159]], dtype=uint8)
>>> points = np.loadtxt('XYZcolorlist_D65.csv', usecols=(1,2,3), delimiter=',')
>>> points[:5]
array([[ 35.53443021,  21.38072197,  20.3661096 ],
       [ 40.20749455,  26.52829494,  22.70942844],
       [ 43.66474384,  32.34826255,  23.61818015],
       [ 47.12256284,  39.17809444,  22.9366615 ],
       [ 61.73791496,  62.83876017,  32.39362009]])
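If you would rather not rely on a converter that returns a list, the same two arrays can be built in one pass with the standard `csv` module. This is only a sketch: the helper name `load_colors_and_points` is made up, and an in-memory sample stands in for the real file so that the example is self-contained.

```python
import csv
import io
import numpy as np

# Two rows in the same layout as XYZcolorlist_D65.csv.
sample = io.StringIO(
    "255 63 127,35.5344302104,21.380721966,20.3661095969\n"
    "255 95 127,40.2074945517,26.5282949405,22.7094284437\n"
)

def load_colors_and_points(f):
    """Split each row into an RGB triple (uint8) and an XYZ point (float)."""
    rgb, xyz = [], []
    for name, X, Y, Z in csv.reader(f):
        rgb.append([int(c) for c in name.split()])
        xyz.append((float(X), float(Y), float(Z)))
    return np.array(rgb, dtype=np.uint8), np.array(xyz, dtype=float)

colors, points = load_colors_and_points(sample)
print(colors.shape, points.shape)
```

With the real data you would pass `open('XYZcolorlist_D65.csv')` instead of `sample`.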

5. Revised code.

import numpy as np
import scipy.spatial

# Configuration.
POINTS_FILENAME = 'XYZcolorlist_D65.csv'
TARGETS_FILENAME = 'targets.csv'
OUTPUT_FILENAME = 'gamut.ppm'
DEFAULT_COLOR = np.array([[255, 255, 255]], dtype=np.uint8)

# Load colors
colors = np.loadtxt(POINTS_FILENAME, usecols=(0,), delimiter=',',
                    converters={0:lambda s:s.split()}, dtype=np.uint8)

# Load points
points = np.loadtxt(POINTS_FILENAME, usecols=(1, 2, 3), delimiter=',')

# Load targets
targets = np.loadtxt(TARGETS_FILENAME, delimiter=',')
ntargets = len(targets)

# Compute Delaunay triangulation of points.
tri = scipy.spatial.Delaunay(points)

# Find the tetrahedron containing each target (or -1 if not found)
tetrahedra = tri.find_simplex(targets)

# Affine transformation for tetrahedron containing each target
X = tri.transform[tetrahedra, :3]

# Offset of each target from the origin of its containing tetrahedron
Y = targets - tri.transform[tetrahedra, 3]

# First three barycentric coordinates of each target in its tetrahedron.
# The fourth coordinate would be 1 - b.sum(axis=1), but we don't need it.
b = np.einsum('...jk,...k->...j', X, Y)

# Cumulative sum of barycentric coordinates of each target.
bsum = np.c_[b.cumsum(axis=1), np.ones(ntargets)]

# A uniform random number in [0, 1] for each target.
R = np.random.uniform(0, 1, size=(ntargets, 1))

# Randomly choose one of the tetrahedron vertices for each target,
# weighted according to its barycentric coordinates, and get its
# color.
C = colors[tri.simplices[tetrahedra, np.argmax(R <= bsum, axis=1)]]

# Mask out the targets where we failed to find a tetrahedron.
C[tetrahedra == -1] = DEFAULT_COLOR

# Determine width and height of image.
# (Since I don't have your TIFF, this is the best I can do!)
width, height = 1, ntargets

# Write output as image in PPM format.
with open(OUTPUT_FILENAME, 'wb') as ppm:
    ppm.write("P3\n{} {}\n255\n".format(width, height).encode('ascii'))
    np.savetxt(ppm, C, fmt='%d')
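The weighted random choice is the densest step in the code above, so here it is in isolation. The sketch uses made-up barycentric coordinates and fixed values of `R` (instead of random ones) so that the result can be followed by hand. The comparison `R <= bsum` plays the role of the `if R <= A ... elif R <= A+B ...` chain in the question, and `argmax` returns the position of the first true entry in each row.

```python
import numpy as np

# Made-up barycentric coordinates for three targets (each row sums to 1).
bcoords = np.array([[0.10, 0.00, 0.10, 0.80],
                    [0.60, 0.10, 0.10, 0.20],
                    [0.25, 0.25, 0.25, 0.25]])

# Running totals A, A+B, A+B+C, then 1 as the catch-all "else" column.
bsum = np.c_[bcoords[:, :3].cumsum(axis=1), np.ones(len(bcoords))]

# Fixed "random" numbers, one per target, as a column so that each one
# is compared against every entry of its own row.
R = np.array([[0.05], [0.65], [0.99]])

# Index of the first column where R <= running total, for each row.
choice = np.argmax(R <= bsum, axis=1)
print(choice)
```

The three targets pick vertices 0, 1 and 3 of their tetrahedra respectively.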

Thank you for this nice answer and the time you spent on it, and sorry for the lack of documentation. It's the first time I've heard about "vectorizing" code; could you be more explicit? So far, leaving scipy.spatial aside, I understand that in colors_PPM.py I should put tet_i, bcoords = geometry.containing_tet(tg, target_point) outside (before) the for loop, is that correct? Or should I entirely rewrite verts = [tg.points[j] for j in tet] in geometry and make 2 separate def functions out of containing_tet(tg, point)? - adrienlucca.net, commented Feb 6, 2014 at 19:23
I've explained how to load the data. As for a new question, for best results you should wait until you have got the code working before asking a new question here at Code Review. But while you are still struggling with NumPy, I am sure the good folks at Stack Overflow would be happy to help answer your questions. - Gareth Rees, commented Feb 6, 2014 at 23:07
No problem. I hope I have been able to help, and good luck with your project. - Gareth Rees, commented Feb 6, 2014 at 23:18
