4
$\begingroup$

Say want to pick a fixed number of samples from a large 2D dataset, such that they relatively evenly distributed over the whole sample area. Imagine places in a country - so the border of the data is a irregular polygon.

Aware of poisson-disk based sampling, but that has a minimum distance as input, which dont know. The number of points want is the input variable.

Can just divide the area into a grid (as long as sample size is easily divisible), but it tends to favour the edges, parts of the grid with little coverage.

As an exaggerated example, picking 16 samples from a circle, tends to have a lot of samples around the edges, because will still pick one from each square. (the circle is just to demonstrate a kinda of worse case, but the 'grid' will never exactly match the polygon)

example of picking 16 points in samples from a circle

Best idea is at the moment, is to take the samples and perform something like https://en.wikipedia.org/wiki/Lloyd%27s_algorithm so the points 'relax' to be spread out over a rectangle. Then can just pick the points from the regular grid. ... but its never going to be very effient, as will have to do multiple rounds of Lloyds over a large dataset.

Feels like there is algorithm for this, but don't actually know what to look for!

Edit: actully here is a better demo, using real data, of issue with picking points from a grid. Getting lots of points around the coast, as have lots of Ireland that overlaps bits of squares.

enter image description here

$\endgroup$
1
  • 1
    $\begingroup$Well Lloyd's does kinda work (got round to implementing it) but it needs a LOT of rounds to be useful. On my country level datasets, like 50k+ because otherwise the 'coastal' squares still contain mostly coastal points. Needs a lot of rounds, so the inner points diffuse out the the edge squares. Although haven't tried 'over-relaxing' yet. Still looking for a more efficient algorithm$\endgroup$CommentedMar 6, 2024 at 11:01

1 Answer 1

3
$\begingroup$

Well, found an answer. Farthest-first traversal

https://en.wikipedia.org/wiki/Farthest-first_traversal

Embarrassingly it was linked in 'See Also' on wikipedia page for Lloyd's Algorithm. It was only when trying to implement Lloyd's that left the wikipedia page open, and read it again today.

enter image description here

While it is still favouring points along the edge (as furthest from other points) - it is picking them with good distribution.

... you just Traverse until have N points.

--- Edit to add:

And to further close the loop, found a way to avoid favouring coastal points. First add a selection of 'offshore' points, around the edge, so the selected points are kept away from edge (but will still approach the edges)

Sample of points avoiding favouring the coast

The points, actully follow the territorial sea, which is the line plotted on OSM map used in this demo. Its what gave me the idea. If not working with geographical data (where the area polygon, is not a know country) could perhaps just use Concave Hull + Buffer operation to get a 'offshore boundary'.

Although still need to find a nice way of selecting a uniform distribution of points along that boundary. Maybe Further-First Traversal of the points from the boundary would work!

Sample points placed along boundary, to nudge the final points away from coast

$\endgroup$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.