Algorithm to select sets of objects while maximizing number of objects covered

Question

If we have different objects,

[A1, A2, A3, B1, B2, B3, B4, B5]

Some calculations will be performed to find compatible objects. For example, lets assume following 3 sets were formed and every set contains compatible objects:

{A1, B2}
{A3, B2}
{A1, A3, B4, B5}

Now we need to perform filtering. One node can participate only once. For example, if B2 has made a pair with A1, then B2 can not participate in any other set. Which means, if we select set 1, then set 2 will be deleted because B2 has already participated. And set 3 will be deleted because A1 has already participated.

Now we need to do filtering such that we maximize number of objects being utilized. If we select set 1, then we are only utilizing A1 and B2.
Thus, optimal way would be to select set 3 which is utilizing 4 objects.

Right now, i have a complex function which goes through list of sets recursively and keeps on adjusting until no changes are made to existing sets. This is not only inefficient, but it might not work in cases where changing sets can cause ripple effect.

I am not looking for coded solution, but just a guidance that is there any graph algorithm which I should study?

I would suggest that cs.stackexchange.com is a better forum for your question — Steven Eccles, CommentedJan 12, 2018 at 15:19
Graph theory puts a lot of emphasis on the relationships between nodes. Right now I don't quite follow the topography of your collection of nodes - how are they related? If you put them on a whiteboard, what lines would you draw between them, and why? — Dan Pichelman, CommentedJan 12, 2018 at 15:21
Hi Dan, thanks for reply. I think node was wrong word to be used in context. I changed it to 'objects'. If you still think white board image will help, please let me know and I will share white board picture. I can best imagine it as if I am creating a zoo, then different animals might be compatible with certain animals. We need to keep all compatible animals in one set maximizing number of animals in zoo. And one animal cannot be repeated in any other set. I will also share pic. Thanks for your time. — Zohaib, CommentedJan 12, 2018 at 15:52
I think this might be an NP-complete problem: checking whether a selection of sets is valid can be done cheaply, but to find the best combination you have to search every combination. So unless you need an exact solution, you can use a heuristic (e.g. start a search from the largest set), or use a probabilistic algorithm (e.g. a genetic algorithm). Because these are approximate, you can trade solution quality against performance. A heuristic might work very well in practice if you have more info about the structure of your problem, e.g. if you know how the sizes of sets are distributed. — amon, CommentedJan 12, 2018 at 16:01

Erik Eidt · Accepted Answer · 2018-01-15 16:47:43Z

I can't think of any algorithm that computes directly what you want without searching the solution space.

However, you want better performance than your complex recursive algorithm.

I think that can be had by using a non-recursive approach.

Data structures you already have:

a Map, MatchesToMembers, taking us from Match # to Set of Member #s

Let's add to that a complementary structure for reverse lookup:

a Map, MembersToMatches, taking us from Member # to Match #'s, initially empty

And a data structure of possible match grouping candidates, which will be

a Map, Candidates, taking us from Candidate # to Boolean,
- initially all true
- sized of 2^(# Matches)

Let's populate this new data structure (#2):

for each Match# as Key in MatchesToMembers for each Member# in MatchesToMembers[Match#] Update MembersToMatches[Member#] to include Match# next next

We now have (#2) MembersToMatches as follows:

A1: S1, S3 A3: S2, S3 B2: S1, S2 B4: S3 B5: S3

Let's populate the Candidate data structure (#3) from the other (#2):

for each MatchSet as Value in MembersToMatches if MatchSet has more than one member then Assemble a Candidate mask from the set of Match#'s by accumlating (1<<Match#) for each Match# in the MatchSet Reject all Candidates at that mask

And (#3) Candidates as follows

0 true (the empty candidate) 1 true (candidate having match#1 only) 2 true (candidate having match#2 only) 3 false (candidate having match1&2) 4 true (candidate having match#3 only) 5 false (candidate having match#1&3) 6 false (candidate having match#2&3) 7 false (candidate having all matches)

Hopefully you can see that we're using a bit values of the solution space for the Key in the Candiates map. E.g. Candidate #5 (it's index/Key) in binary is 101, meaning consider the solution of taking Set 3 and Set 1 (in your 1-based set #'ering).

Now we compute the score for each non-rejected candidate, and take the max.

Rejection from candidate mask as all the elements in the set are known to they interfere with each other:

for each index as Key in Candidate if index & CandidateIndexMask == CandidateIndexMask then Candidate[index] = false endif next

Max computation

for index as Key in Candidate if Candidate[index] then compute score for Candidate if larger than before, then keep it endif next

if your Candidate#'s are small (e.g. less than 32), then you can do this with a simple for (int index = 0; index < count; index++) { }, however, for a larger set you'd need a bit vector for the index variable (and the ability to "increment" a bit vector).

Also, these algorithms & data structures can be used with bit vectors instead of individual elements or sets or other. if we work on bit vectors we can work on chunks of 64-bits at at time reducing the iterations by a large factor.

Hi Erik, Thanks a lot for your time. Algorithm seems promising. I am having a bit difficulty in understanding some parts. For example, first iteration of Step 3 may be: A1: S1, S3 masking of S1 = 1 masking of S3 = 100 we take OR of these 2 values and that is: 101 which means consider solution of taking S1 and S3. Now what does "Reject all Candidates at that mask" means? Once again, thanks a lot. It was really helpful. — Zohaib, CommentedJan 15, 2018 at 16:14
Ok, at the high level, since A1 has S1 and S3, that means that S1 and S3 are mutually exclusive solutions by your business logic. So, we cannot consider any solution that has both S1 and S3 together. At the bit level, any Candidiate# that matches 1x1 must then be ruled out. That means ruling out both 101 and 111. — Erik Eidt, CommentedJan 15, 2018 at 16:32
Let's say there's two more matches (4. and 5.). Now the Match#'s bit vectors are 5 bits. A1: S1, S3 is 00101. So again, we have to rule out all Candidates where S1 & S3 are together, which is all Candidates that match the pattern xx1x1. This means ruling out Candidate#'s: 00101, 00111, 01101, 01111, 11101, 11111. This is what algorithm with & and == above does. — Erik Eidt, CommentedJan 15, 2018 at 16:34
Hi Erik, apologies for not making my question clear. Although, I get it now. It looks great. Thanks a million. And your last comment helped me to understand some other points also. — Zohaib, CommentedJan 15, 2018 at 16:36
Is my understanding correct that finally we select a set with highest score. Which means there will be one answer. (1 set will be selected). And do you think, same algorithm will work, if we add another Set such that S4:{A2, B1} ? Now Set 4 contains elements which do not appear in any other set. — Zohaib, CommentedJan 15, 2018 at 16:38

ichantz · Accepted Answer · 2018-01-13 09:51:03Z

7

Your problem seems like a combination of maximum coverage and set packing problems. So my guess is that algorithms for them should be helpful. A greedy heuristic would be to choose the set that contains the largest number of uncovered elements over the conflicts with other sets.Another approach would be to use the integer linear program formulation of max coverage and add a constraint where you only accept solutions forming a set packing.

edited Jan 13, 2018 at 9:51

answered Jan 12, 2018 at 16:17

ichantz

3681 silver badge8 bronze badges

1
this reads more like a comment, see How to Answer
– gnat
CommentedJan 12, 2018 at 16:31
4
@gnat "I am not looking for coded solution, but just a guidance that is there any graph algorithm which I should study?" . It very much looks like an answer to this.
– ichantz
CommentedJan 12, 2018 at 18:25
2
@gnat Also read the section Answer the question of the link you sent me. It is matches the part you "try this instead" and i think it is "in the right direction" for the asker.Am i wrong?
– ichantz
CommentedJan 12, 2018 at 18:37
1
The Set Packing Problem is a good pointer (+1), though OP's problem is slightly differnent: in maximum set packing, the number of sets seems to be maximized, whereas OP maximizes for number of objects in the sets. Aside from that, the problem structure (in particular, that sets may not overlap) is identical.
– amon
CommentedJan 12, 2018 at 21:55
@amon that is the reason i also pointed at set cover. In set cover you try to cover all objects as you state, hence a combination of the two problems.Set cover differs in the sence that sets might overlap whereas set packing differs in the objective (asks for maximum number of subsets instead of objects). So you need to provide a set cover for the Universe(list of objects) using set packings( disjoint subsets of the universe).
– ichantz
CommentedJan 12, 2018 at 22:07

Add a comment |

Stack Exchange Network

Algorithm to select sets of objects while maximizing number of objects covered

2 Answers 2

Hot Network Questions

Algorithm to select sets of objects while maximizing number of objects covered

2 Answers 2

Related

Hot Network Questions