Looks nice enough, pretty straightforward, except that it implements a MultiSet rather than the stated Dictionary mapping. I agree with ppperry that simplifying get_value() by using hash() makes sense. If reimplementing is necessary, I'd probably go with java's hash function just because it is well documented, well understood, and known not to have ugly collision pathologies. If you really want to keep your own, then the 7 ** i
expression seems more expensive than necessary. Try maintaining an accumulator that starts at 1 and gets acc *= 7
each time through the loop, it may bench faster.
In insert(), this seems like a regrettable design choice:
if self.table[val] == None: self.table[val] = key
If you'd instead assigned [key]
then some special casing simply disappears. For that matter, it would be useful to replace None
with []
.
This line:
self.table[val] = [self.table[val], key]
winds up doing what I suggest, in a deferred way, so the complex invariant you're maintaining is "an entry is None or a single non-colliding key or a list of colliding keys", rather than an easily maintained invariant of "an entry is a list of zero or more keys". If the complexity is justified by reduced memory consumption due to fewer list objects, then make that design decision explicit in a comment.
Caller is prohibited from storing the reserved value None, but insert() doesn't have a comment that only strings (only sequences one can call ord()
on) are acceptable. A bad input of None will be discovered one level down on the call stack, in get_value(). This is not so bad - calls involving illegal input will soon be debugged. But consider adding assert
s for preconditions so that bugs are quickly isolated to the relevant layer, rather than potentially being propagated a few levels down the stack.
In lookup(), this is linear with number of collisions:
if type(self.table[val]) == list: found = key in self.table[val]
You may as well replace list with set, for faster lookups. Curiously enough, insert() does not check for key prior to inserting key, so repeated insertions lead to a long (always colliding) list. So rather than Set, you wind up offering MultiSet (or Bag) semantics. You supplied no unit tests that show the behavior of dup key insertion.
In delete()
i = self.table[val].index(key) self.table[val][i] = None
you accumulate lots of [foo, None, None, bar, None] trash after repeated insertions and deletions, leading to very long running times for lookup(). The .index()
is linear, as with in
above. It would be natural to del
the i-th entry, rather than assign None to it. This also has linear cost, which again becomes smaller if you switch from list to set.