We have a new service where an admin can upload a spreadsheet of new users they want to add to our application. During our processing we make a union map of all the users both existing and new additions, and check to make sure that the users will still be in a valid tree structure(s) after the update goes through.
Requirements:
For the user relationship tree structure to be valid,
- There can be no circular references (ex. Users cannot be each other's manager)
- A single branch of the tree cannot exceed 50 users (ex. There should not be 50 middle-management managers between CEO and Janitor).
- Its ok if the tree is severed into multiple trees when a user does not have a manager. Separate warnings are generated for that circumstance.
This may need to scale to up to 1,000,000 user relationships, so far, the current solution supports at least 10,000 user relationships in a reasonable amount of time (less than 1 minute running locally).
FYI this is Groovy/Java
Solution So Far:
I have a method that takes a HashMap of user relationships and it returns a list of invalid users ids.
Map<String, String> userRelationships = [["userId1": "userId2"], ["userId2": "userId3"], ...] def invalidRelationships = findInvalidRelationships(userRelationships)
.
/** * Takes a map of user relationships (user with corresponding manager) * For each relationship, climb up the tree and look to see if the user is repeated in their branch * If user is found again up the tree branch, that is a bad circular reference * If the current branch goes on past 50 users, that is too long and also invalid * Report all users that are part of that bad relationship * userRelationship.key is the user * userRelationship.value is the manager */ def findInvalidUserRelationships(Map<String, String> userRelationships) { Set<String> invalidUserRelationshipUsers = [] userRelationships.each { Entry<String, String> currentUserRelationship -> Set<String> usersInCurrentBranch = [] Entry<String, String> nextIterationRelationship = currentUserRelationship while ( nextIterationRelationship?.value && //while not top of tree (not null manager) !invalidUserRelationshipUsers.contains(nextIterationRelationship.key) && //while user not part of circular relationship !usersInCurrentBranch.contains(nextIterationRelationship.key) //while user not connected to other circular relationship ) { usersInCurrentBranch << nextIterationRelationship.key //if bad circular relationship found or branch has more than max users, report users as invalid if (nextIterationRelationship.value == currentUserRelationship.key || usersInCurrentBranch?.size() > 50) { invalidUserRelationshipUsers += usersInCurrentBranch } //set next user relationship to check in next iteration (manager's manager) - (go upline) nextIterationRelationship = new MapEntry( nextIterationRelationship.value, //manager of current iteration user will be next user userRelationships[nextIterationRelationship.value as String] //manager's manager ) } } return invalidUserRelationshipUsers as List<String> }
I have already made lots of optimizations, at least compared to how bad it was before using only Lists instead of Maps and Sets, but I was wondering if there would be a more efficient solution to this problem. I wondered if a Java TreeMap would be appropriate somewhere here, but could not figure out how it could improve performance.