58
\$\begingroup\$

I wrote a solution to the Knapsack problem in Python, using a bottom-up dynamic programming algorithm. It correctly computes the optimal value, given a list of items with values and weights, and a maximum allowed weight.

Any critique on code style, comment style, readability, and best-practice would be greatly appreciated. I'm not sure how Pythonic the code is; filling in a matrix seems like a natural way of implementing dynamic programming, but it doesn't "feel" Pythonic (and is that a bad thing, in this case?).

Note that the comments are a little on the verbose side; as I was writing the algorithm, I tried to be as explicit about each step as possible, since I was trying to understand it to the fullest extent possible. If they are excessive (or just plain incorrect), feel free to comment (hah!) on it.

import sys def knapsack(items, maxweight): # Create an (N+1) by (W+1) 2-d list to contain the running values # which are to be filled by the dynamic programming routine. # # There are N+1 rows because we need to account for the possibility # of choosing from 0 up to and including N possible items. # There are W+1 columns because we need to account for possible # "running capacities" from 0 up to and including the maximum weight W. bestvalues = [[0] * (maxweight + 1) for i in xrange(len(items) + 1)] # Enumerate through the items and fill in the best-value table for i, (value, weight) in enumerate(items): # Increment i, because the first row (0) is the case where no items # are chosen, and is already initialized as 0, so we're skipping it i += 1 for capacity in xrange(maxweight + 1): # Handle the case where the weight of the current item is greater # than the "running capacity" - we can't add it to the knapsack if weight > capacity: bestvalues[i][capacity] = bestvalues[i - 1][capacity] else: # Otherwise, we must choose between two possible candidate values: # 1) the value of "running capacity" as it stands with the last item # that was computed; if this is larger, then we skip the current item # 2) the value of the current item plus the value of a previously computed # set of items, constrained by the amount of capacity that would be left # in the knapsack (running capacity - item's weight) candidate1 = bestvalues[i - 1][capacity] candidate2 = bestvalues[i - 1][capacity - weight] + value # Just take the maximum of the two candidates; by doing this, we are # in effect "setting in stone" the best value so far for a particular # prefix of the items, and for a particular "prefix" of knapsack capacities bestvalues[i][capacity] = max(candidate1, candidate2) # Reconstruction # Iterate through the values table, and check # to see which of the two candidates were chosen. We can do this by simply # checking if the value is the same as the value of the previous row. If so, then # we say that the item was not included in the knapsack (this is how we arbitrarily # break ties) and simply move the pointer to the previous row. Otherwise, we add # the item to the reconstruction list and subtract the item's weight from the # remaining capacity of the knapsack. Once we reach row 0, we're done reconstruction = [] i = len(items) j = maxweight while i > 0: if bestvalues[i][j] != bestvalues[i - 1][j]: reconstruction.append(items[i - 1]) j -= items[i - 1][1] i -= 1 # Reverse the reconstruction list, so that it is presented # in the order that it was given reconstruction.reverse() # Return the best value, and the reconstruction list return bestvalues[len(items)][maxweight], reconstruction if __name__ == '__main__': if len(sys.argv) != 2: print('usage: knapsack.py [file]') sys.exit(1) filename = sys.argv[1] with open(filename) as f: lines = f.readlines() maxweight = int(lines[0]) items = [map(int, line.split()) for line in lines[1:]] bestvalue, reconstruction = knapsack(items, maxweight) print('Best possible value: {0}'.format(bestvalue)) print('Items:') for value, weight in reconstruction: print('V: {0}, W: {1}'.format(value, weight)) 

The input file that it expects is as follows:

165 92 23 57 31 49 29 68 44 60 53 43 38 67 63 84 85 87 89 72 82 

The first line contains the maximum weight allowed, and subsequent lines contain the items, represented by value-weight pairs.

\$\endgroup\$
2
  • 5
    \$\begingroup\$An old post, I know, but if you haven't run into it already enumerate allows you to specify the starting index (e.g. for i,item in enumerate(items,1): would have i begin at 1.\$\endgroup\$
    – SimonT
    CommentedOct 22, 2013 at 0:18
  • \$\begingroup\$@SimonT: Fantastic! I've programmed in Python for 2+ years now, and I had never seen enumerate used that way. I'll definitely keep it in mind.\$\endgroup\$
    – voithos
    CommentedOct 22, 2013 at 4:44

3 Answers 3

62
\$\begingroup\$

1. Review

  1. The function knapsack lacks a docstring that would explain what arguments the function takes (what kind of things are in items? must items be a sequence, or can it be an iterable?) and what it returns.

  2. This kind of function is ideal for doctests.

  3. The comments say things like "Create an (N+1) by (W+1) 2-d list". But what is N and what is W? Presumably N is len(items) and W is maxweight, so it would be a good idea to use the same names in the comments and the code:

    N = len(items) W = maxweight 
  4. The comment above bestvalues fails to explain what the values in this table mean. I would write something like this:

    # bestvalues[i][j] is the best sum of values for any # subsequence of the first i items, whose weights sum # to no more than j. 

    This makes it obvious why \$0 ≤ i ≤ N\$ and why \$0 ≤ j ≤ W\$.

  5. In a loop like:

    bestvalues = [[0] * (maxweight + 1) for i in xrange(len(items) + 1)] 

    where the loop variable (here i) is unused, it's conventional to name it _.

  6. These lines could be omitted:

    # Increment i, because the first row (0) is the case where no items # are chosen, and is already initialized as 0, so we're skipping it i += 1 

    and then in the rest of the code, use i + 1 instead of i and i instead of i - 1.

  7. The reconstruction loop:

    i = N while i > 0: # code i -= 1 

    can be written like this:

    for i in reversed(range(1, N + 1)): # code 
  8. The code prints an error message like this:

    print('usage: knapsack.py [file]') 

    Error messages ought to go to standard error (not standard output). And it's a good idea not to hard-code the name of the program, because it would be easy to rename the program but forget to update the code. So write instead:

    sys.stderr.write("usage: {0} [file]\n".format(sys.argv[0])) 
  9. The block of code that reads the problem description and prints the result only runs when __name__ == '__main__'. This makes it hard to test, for example from the interactive interpreter. It's usually best to put this kind of code in its own function, like this:

    def main(filename): with open(filename) as f: # etc. if __name__ == '__main__': if len(sys.argv) != 2: print('usage: knapsack.py [file]') sys.exit(1) main(sys.argv[1]) 

    and now you can run main('problem.txt') from the interpreter to test it.

  10. The code reads the whole of the file into memory as a list of lines:

    lines = f.readlines() 

    this is harmless here because the file is small, but it's usually better to process a file one line at a time, like this:

    with open(filename) as f: maxweight = int(next(f)) items = [[int(word) for word in line.split()] for line in f] 

2. Recursive approach

Any dynamic programming algorithm can be implemented in two ways: by building a table of partial results from the bottom up (as in the code in the post), or by recursively computing the result from the top down, using memoization to avoid computing any partial result more than once.

The top-down approach often results in slightly simpler and clearer code, and it only computes the partial results that are needed for the particular problem instance (whereas in the bottom-up approach it's often hard to avoid computing all partial results even if some of them go unused).

So we could use @functools.lru_cache to implement a top-down solution, like this:

from functools import lru_cache def knapsack(items, maxweight): """Solve the knapsack problem by finding the most valuable subsequence of items that weighs no more than maxweight. items must be a sequence of pairs (value, weight), where value is a number and weight is a non-negative integer. maxweight is a non-negative integer. Return a pair whose first element is the sum of values in the most valuable subsequence, and whose second element is the subsequence. >>> items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)] >>> knapsack(items, 15) (11, [(2, 1), (6, 4), (1, 1), (2, 2)]) """ @lru_cache(maxsize=None) def bestvalue(i, j): # Return the value of the most valuable subsequence of the first # i elements in items whose weights sum to no more than j. if j < 0: return float('-inf') if i == 0: return 0 value, weight = items[i - 1] return max(bestvalue(i - 1, j), bestvalue(i - 1, j - weight) + value) j = maxweight result = [] for i in reversed(range(len(items))): if bestvalue(i + 1, j) != bestvalue(i, j): result.append(items[i]) j -= items[i][1] result.reverse() return bestvalue(len(items), maxweight), result 

To see how many partial solutions this needs to compute, print bestvalue.cache_info() just before returning the result. When solving the example problem in the docstring, this outputs:

CacheInfo(hits=17, misses=42, maxsize=None, currsize=42) 

The 42 entries in the cache are fewer than the 96 partial solutions computed by the bottom up approach.

\$\endgroup\$
8
  • \$\begingroup\$Thanks for your response (especially points 3 and 9)! I can't believe I forgot to consider the memoized recursive method. I'll apply some of your suggestions later today, hopefully. Thanks again.\$\endgroup\$
    – voithos
    CommentedJan 16, 2013 at 22:19
  • \$\begingroup\$Beware of recursion depth limit issues with this proposed solution.\$\endgroup\$CommentedMar 8, 2014 at 15:13
  • 1
    \$\begingroup\$@Frank: yes, this solution uses len(items) levels of stack.\$\endgroup\$CommentedMar 13, 2014 at 8:26
  • \$\begingroup\$@GarethRees Do you need to define bestvalue() inside knapsack()? I would tend to avoid defining a function inside a function to make things clearer.\$\endgroup\$CommentedJan 17, 2017 at 13:43
  • 4
    \$\begingroup\$@greendiod: I defined bestvalue inside knapsack because (i) it's only used there; (ii) it uses items from the outer scope (it would be a pain to have to pass this down through all the recursive calls); (iii) it's memoized so it has an associated cache that we don't want to keep around when knapsack has returned.\$\endgroup\$CommentedJan 17, 2017 at 18:06
2
\$\begingroup\$

For the first section of code involving nested loops, if you should choose to
- use curr instead of i,
- assign prev = curr - 1 and
- use enumerate(items, 1),

the code will literally document itself.

for curr, (value, weight) in enumerate(items, 1): prev = curr - 1 for capacity in range(maxweight + 1): if weight > capacity: bestvalues[curr][capacity] = bestvalues[prev][capacity] else: candidate1 = bestvalues[prev][capacity] candidate2 = bestvalues[prev][capacity - weight] + value bestvalues[curr][capacity] = max(candidate1, candidate2) 
\$\endgroup\$
    1
    \$\begingroup\$

    Following problem can be solved using Dynamic Programming in a much efficient way, in term of lines of code and fastest time to perform computation. On top that , following code perform memoization to cache previously computed results.

    try: from functools import lru_cache except ImportError: # For Python2 # pip install backports.functools_lru_cache from backports.functools_lru_cache import lru_cache class knapsack: """ Maximize sum of selected weight. Sum of selected size is less than capacity. Algorithm: Dynamic Programming ---- >>>items = [(4, 12), (2, 1), (6, 4), (1, 1), (2, 2)] >>>weight, size = zip(*items) >>>weight = list(weight) >>>size = list(size) >>>capacity = 15 >>>knapsack(size, weight).solve(capacity) >>>(11, [1, 2, 3, 4]) """ def __init__(self, size, weight): self.size = size self.weight = weight @lru_cache() def solve(self, cap, i=0): if cap < 0: return -sum(self.weight), [] if i == len(self.size): return 0, [] res1 = self.solve(cap, i + 1) res2 = self.solve(cap - self.size[i], i + 1) res2 = (res2[0] + self.weight[i], [i] + res2[1]) return res1 if res1[0] >= res2[0] else res2 
    \$\endgroup\$
    1
    • 2
      \$\begingroup\$Welcome to Code Review! You have presented an alternative solution, but haven't reviewed the code. Please edit to show what aspects of the question code prompted you to write this version, and in what ways it's an improvement over the original. It may be worth (re-)reading How to Answer.\$\endgroup\$CommentedOct 5, 2018 at 7:54

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.