4
\$\begingroup\$

Basic impl of hash_table in python. Supports set, get, and delete. Uses open addressing with linear probing function. Any ideas would be greatly appreciated! Thanks!

class Entry: """ holds key, value pairs """ def __init__(self, key, value): self.key = key self.value = value self.is_deleted = False class Map(object): """ a basic, minimal implementation of a hash map """ def __init__(self): """ constructs a new Map """ self.table = [None] * 10 self.load_factor = .75 self.current_size = 0 def __setitem__(self, key, item): """ stores the key value combo in the table implements open addressing collision resolution """ entry = Entry(key, item) for i in range(len(self.table)): index = self.__get_hash_code(key, i) if self.table[index] is None or self.table[index].is_deleted: self.table[index] = entry self.current_size += 1 if float(self.current_size) / len(self.table) >= self.load_factor: self.__resize_table() break def __getitem__(self, key): """ gets the value associated with the key """ for i in range(len(self.table)): index = self.__get_hash_code(key, i) if self.table[index] is not None: if self.table[index].key == key: if self.table[index].is_deleted: raise KeyError('Key is not in the map') else: return self.table[index].value elif self.table[index] is None: raise KeyError('Key is not in the map') raise KeyError('Hmm something has gone wrong here') def __get_hash_code(self, key, value): return (hash(key) + value) % len(self.table) def __resize_table(self): new_table = [None] * (len(self.table) * 2) for i in range(len(self.table)): new_table[i] = self.table[i] self.table = new_table def delete(self, key): """ deletes a value from the hash table """ for i in range(len(self.table)): index = self.__get_hash_code(key, i) if self.table[index] is not None: if self.table[index].key == key: if self.table[index].is_deleted: raise KeyError('Key is not in the map') else: self.table[index].is_deleted = True self.current_size -= 1 break 
\$\endgroup\$
2
  • 2
    \$\begingroup\$Just to be sure, you do know that Python's dicts are implemented as hash tables, right (stackoverflow.com/questions/114830/…)? If that is the case and this is just for fun (or educational value), we have a tag for that: reinventing-the-wheel.\$\endgroup\$
    – Graipher
    CommentedDec 18, 2017 at 16:06
  • \$\begingroup\$Thanks for the accept! Though, next time you post code on CodeReview, you ought to wait 24-48 hours after your post for answers to come in. Accepting an answer communicates to everybody "I don't want any more answers". It's possible that someone else may have something to improve your code; you don't want to dissuade them.\$\endgroup\$
    – Snowbody
    CommentedDec 19, 2017 at 14:23

1 Answer 1

5
\$\begingroup\$

Your code looks generally quite good. I just have a few suggestions:

  • Don't forget to define __str__() and __repr__() for each class
  • I dislike the use of the "magic number" constants in the __init__() of the Hash_Map. They really ought to be either passed as parameters, read from a config somewhere, or clearly marked in a parameter section as pseudo-constants.
  • The line for i in range(len(self.table)): appears multiple times. This style of loop is considered non-Pythonic. Python has the enumerate() built-in that returns a succession of tuples consisting of both the item and its index. I'd prefer for (i,item) in enumerate(self.table):
  • If there's a clash, hash(key) will unnecessarily be called multiple times. Note that index = self.__get_hash_code(key, i) is called each time through the i loop, but the first thing it does is call hash(key) even though it may have already been computed. Better to compute hash(key) once outside the loop since it won't change.
  • Multiple references to self.table[index]. The interpreter might be able to optimize them away, but the code will be shorter (and less error-prone) if it performs the lookup once and stores in a temp. (Or just use enumerate() like my previous suggestion)

  • In def __get_hash_code(self, key, value): the third parameter is poorly named. It's not the value of the element, it's the linear probing parameter.

  • There's a bug related to the interaction of __get_hash_code() and __resize_table() which can lead to \$O(n)\$ performance, defeating the purpose of using a hashtable.
\$\endgroup\$

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.