First, this question is not really about binary search, as I have neither sorted data nor any sortable data at all. :-)
W.r.t. the "unsortable" claim, see below; but I think the title term "unsortable" is important here because, afaict, any sorting would require visiting all "nodes", which is exactly what I strive to avoid.
Question: Provide terms for the algorithm/algorithmic search approach taken below.
Long-winded intro by example: What I have is, essentially, a list of names, and I need to find all "valid" names. However, to determine the valid names, I must query a 3rd-party system that, in terms of this simplified problem, only has one interface function:
```
// remote 3rd-party interface
// returns true if any name in the given list is invalid
// [names] may be a list of 1 to n strings
bool has_invalid_names(List names)
```
So, given my list `l := [a, b, c, ...]`, I need to collect all valid names, i.e. every name `i` for which `has_invalid_names([i]) == false`.
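To keep the sketches further down self-contained, here is a purely hypothetical local stand-in for the remote call (Python, the invalid set, and the call counter are my own choices for illustration; in reality this is the expensive 3rd-party call):

```python
# Hypothetical local stand-in for the expensive remote call, for experimenting only
INVALID = {"g*"}   # made-up set of invalid names, purely for illustration

CALLS = 0          # counts how many "remote" calls we make

def has_invalid_names(names):
    """Fake the 3rd-party call: True if any name in the given list is invalid."""
    global CALLS
    CALLS += 1
    return any(name in INVALID for name in names)
```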
The easiest thing is obviously to just do a linear search, but that means calling `has_invalid_names([i])` n times, which can get prohibitively expensive, as each remote call has quite some overhead.
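As a baseline, the linear version would look roughly like this (one remote call per name, i.e. n calls in total):

```python
def valid_names_linear(names):
    # one remote call per name: n calls in total
    return [name for name in names if not has_invalid_names([name])]
```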
Given that we expect only a few invalid names in the initial list, we intuitively came up with the idea of slicing the list into halves and searching over these:
a) Split the list in half.
b) For each half, check whether it contains any invalid names.
c) Rinse and repeat for all slices that report "invalid", until you're down to calls with just single names.
This just builds up a tree of slices of the initial list (a code sketch of the recursion follows below the diagram):
```
            [a, b, c, d, e, f, g*, h]
              /               \
    [a, b, c, d]           [e, f, g*, h]
      /     \                /      \
  [a, b]   [c, d]        [e, f]   [g*, h]
   /  \     /  \          /  \      /  \
 [a]  [b] [c]  [d]      [e]  [f] [g*]  [h]
```
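Here is a minimal sketch of steps a)–c), assuming the hypothetical Python stand-in above (the helper name `valid_names_tree` is mine):

```python
def valid_names_tree(names):
    """Collect all valid names, pruning whole slices that contain no invalid name."""
    if not has_invalid_names(names):
        return list(names)   # whole slice is clean: keep it, no further calls needed
    if len(names) == 1:
        return []            # a single invalid name: drop it
    mid = len(names) // 2
    # only slices that reported "invalid" are split and queried further
    return valid_names_tree(names[:mid]) + valid_names_tree(names[mid:])
```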
If the number of invalid names is small, only very few paths down this tree have to be explored, essentially requiring on the order of `log2(n)` calls per invalid name, whereas the worst case, where all names are invalid, would be `2*n-1` calls.
In the example above, assuming we try sub-lists left-to-right, we'd need to call the remote function 6 times to find the "invalid" `g*`. For larger n, the savings would be more significant, as long as only very few items are invalid.
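For what it's worth, running the sketch above on the example list with the fake, call-counting stand-in from earlier shows the same accounting, plus one extra call to finish off the remaining slice:

```python
names = ["a", "b", "c", "d", "e", "f", "g*", "h"]

CALLS = 0
print(valid_names_tree(names))   # ['a', 'b', 'c', 'd', 'e', 'f', 'h']
print(CALLS)                     # 7: the 6 calls described above to pin down g*,
                                 # plus one final call to confirm the remaining [h]
```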
So, I have a solution to this problem, but what is this stuff called?
- Is there any specific name for splitting up a dataset into a tree of slices?
- What could the search algorithm, or parts of it, be called?
It seems kind of like an unsorted binary tree search, where a predicate is applied to each node to decide which sub-trees to follow, but I'm really weak on the whole algorithms & data structures zoo. :-)
Mechanically, it seems more in the direction of what map, filter (and reduce) achieve, just with a more complex map function.
Put another way: If you want to solve this problem, which libraries, existing algorithms, and data structures would you use?
Note: The whole question was triggered by the fact that I'm lazy and don't want to code up all that slicing, tree'ing and possibly recursing myself, but the Wide Net didn't indulge me with anything prefabricated. :-)
A note on the worst case: if all n names are invalid, a plain linear scan would take n service calls to establish that. With the tree approach you still end up making the n service calls at the bottom of the tree, then n/2 calls at the next level up, and so on (n + n/2 + ... + 1), for a total of about 2*n service calls, i.e. the `2*n-1` mentioned above.
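With the hypothetical stand-in from above, that worst-case figure is easy to confirm by marking every name in the example list as invalid:

```python
INVALID = set(names)             # worst case: every name in the example list is invalid
CALLS = 0
print(valid_names_tree(names))   # []
print(CALLS)                     # 15 == 2*8 - 1
```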