4
\$\begingroup\$

This post is based on the 0-1 Knapsack problem. I came across this problem in Assignment #4 of Professor Tim Roughgarden's course Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming on Coursera.

Question 1:

In this programming problem and the next you'll code up the knapsack algorithm from lecture.

Let's start with a warm-up. Download the text file below.

knapsack.txt

This file describes a knapsack instance, and it has the following format:

[knapsack_size][number_of_items]

[value_1] [weight_1]

[value_2] [weight_2]

...

For example, the third line of the file is "50074 659", indicating that the second item has value 50074 and size 659, respectively.

You can assume that all numbers are positive. You should assume that item weights and the knapsack capacity are integers.

In the box below, type in the value of the optimal solution.

My program:

def knapSack(W , wt , val , n): # Base Case if n == 0 or W == 0 : return 0 # If weight of the nth item is more than Knapsack of capacity # W, then this item cannot be included in the optimal solution if (wt[n-1] > W): return knapSack(W , wt , val , n-1) # Check for required value in lookup table first if (lookup[n][W]!=-1): return lookup[n][W] # Add to lookup table and return the maximum of two cases: # (1) nth item included # (2) not included else: x = max(val[n-1] + knapSack(W-wt[n-1] , wt , val , n-1), knapSack(W , wt , val , n-1)) lookup[n][W] = x return x #End of function knapSack #To test above function val = [] wt = [] with open("knapsack.txt") as f: first_line = f.readline().split() W = int(first_line[0]) n = int(first_line[1]) for line in f: words = line.split() val.append(int(words[0])) wt.append(int(words[1])) lookup = [[-1 for i in range(W+1)] for j in range(n+1)] print knapSack(W, wt, val, n) 

This program gave me the output as 2493893, which matches with the official solution. So I presume that my code is correct. So far so good. The next question in the same assignment is as follows.

Question 2:

This problem also asks you to solve a knapsack instance, but a much bigger one.

Download the text file below.

knapsack_big.txt

This file describes a knapsack instance, and it has the following format:

[knapsack_size][number_of_items]

[value_1] [weight_1]

[value_2] [weight_2]

...

For example, the third line of the file is "50074 834558", indicating that the second item has value 50074 and size 834558, respectively. As before, you should assume that item weights and the knapsack capacity are integers.

This instance is so big that the straightforward iterative implemetation uses an infeasible amount of time and space. So you will have to be creative to compute an optimal solution. One idea is to go back to a recursive implementation, solving subproblems --- and, of course, caching the results to avoid redundant work --- only on an "as needed" basis. Also, be sure to think about appropriate data structures for storing and looking up solutions to subproblems.

In the box below, type in the value of the optimal solution.

My program (same as before):

def knapSack(W , wt , val , n): # Base Case if n == 0 or W == 0 : return 0 # If weight of the nth item is more than Knapsack of capacity # W, then this item cannot be included in the optimal solution if (wt[n-1] > W): return knapSack(W , wt , val , n-1) # Check for required value in lookup table first if (lookup[n][W]!=-1): return lookup[n][W] # Add to lookup table and return the maximum of two cases: # (1) nth item included # (2) not included else: x = max(val[n-1] + knapSack(W-wt[n-1] , wt , val , n-1), knapSack(W , wt , val , n-1)) lookup[n][W] = x return x #End of function knapSack #To test above function val = [] wt = [] with open("knapsack_big.txt") as f: first_line = f.readline().split() W = int(first_line[0]) n = int(first_line[1]) for line in f: words = line.split() val.append(int(words[0])) wt.append(int(words[1])) lookup = [[-1 for i in range(W+1)] for j in range(n+1)] print knapSack(W, wt, val, n) 

The second question had mentioned that the ordinary iterative approach would not suffice and that we'd have to get back to the recursive approach and use appropriate caching. Fair enough. I had already used the recursive approach in my initial program and also implemented a lookup table for memoization purposes.

It worked perfectly fine on the smaller data set i.e. knapsack.txt and took less than a second to execute. However, when I execute the program on the second file knapsack_big.txt (which is considerably larger than knapsack.txt) it takes forever to execute and in most cases ends up getting killed.

So, any idea how to speed up the program so that it can be executed on larger data sets? I suspect that there might be more efficient data structures (than a simple 2D list) for this problem, but not sure which.

\$\endgroup\$
5
  • 1
    \$\begingroup\$I'd propose that, for posterity, you also include in-text the contents of knapsack.txt as it's relatively short.\$\endgroup\$CommentedDec 24, 2018 at 1:07
  • \$\begingroup\$Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.\$\endgroup\$
    – Mast
    CommentedDec 24, 2018 at 8:01
  • \$\begingroup\$For your speed-up problem, have you tried what happened if, instead of keeping a list val and wt, you keep a list of tuples containing (val, wt)? It appears you only use them for same-index values, so it might be faster on both the append and the look-up.\$\endgroup\$
    – Mast
    CommentedDec 24, 2018 at 8:07
  • \$\begingroup\$@Mast Thanks, I will try that out but not sure if that will be a substantial improvement.\$\endgroup\$
    – user188780
    CommentedDec 24, 2018 at 8:17
  • 1
    \$\begingroup\$Neither am I, hence a comment instead of an answer, but I'd say it's worth a try. Something even more important you should try, is profiling your code to see where the bottlenecks are. It takes some getting used to, but being able to profile is an essential skill in debugging performance.\$\endgroup\$
    – Mast
    CommentedDec 24, 2018 at 8:32

3 Answers 3

2
\$\begingroup\$
def knapSack(W , wt , val , n): 

This is not PEP8-compliant. You should run this code through a linter. For that particular line, it would instead be

def knapsack(W, wt, val, n): 

You have several lines like this:

if (wt[n-1] > W): 

with parens around the if condition. These parens are not needed.

This:

print knapSack(W, wt, val, n) 

uses Python 2-style print. In most cases, using Python 3-style print is compatible with Python 2, and this is such a case; so for forward compatibility you should instead write

print(knapSack(W, wt, val, n)) 

This:

if (lookup[n][W]!=-1): return lookup[n][W] else: 

doesn't need the else, since the previous block would have already returned.

\$\endgroup\$
1
  • 1
    \$\begingroup\$Thanks for the answer! This doesn’t address the speedup issue though (which is the main problem I’m facing).\$\endgroup\$
    – user188780
    CommentedDec 24, 2018 at 6:24
2
\$\begingroup\$

This isn't an answer but more of an update. I ran the code through a linter - Pylint, as suggested by @Reinderien. These are the current versions:


'''Programming Assignment #4: Question 1''' def knapsack(max_weight, weights, values, max_value): '''Recursive method for the knapsack problem (with memoization)''' # Base Case if max_value == 0 or max_weight == 0: return 0 # If weight of the nth item is more than Knapsack of capacity # W, then this item cannot be included in the optimal solution if weights[max_value-1] > max_weight: return knapsack(max_weight, weights, values, max_value-1) # Check for required value in lookup table first if LOOKUP[max_value][max_weight] != -1: return LOOKUP[max_value][max_weight] # Add to lookup table and return the maximum of two cases: # (1) nth item included # (2) not included val1 = knapsack(max_weight-WEIGHTS[max_value-1], weights, values, max_value-1) val1 += VALUES[max_value-1] # val1 stores the maximum possible profit when the last item is included val2 = knapsack(max_weight, weights, values, max_value-1) # val2 stores the maximum possible profit when the last item is excluded temp = max(val1, val2) LOOKUP[max_value][max_weight] = temp return temp #End of function knapsack #To test above function VALUES = [] WEIGHTS = [] #Reading the data file with open("knapsack.txt") as f: FIRST_LINE = f.readline().split() MAX_WEIGHT = int(FIRST_LINE[0]) MAX_VALUE = int(FIRST_LINE[1]) for line in f: words = line.split() VALUES.append(int(words[0])) WEIGHTS.append(int(words[1])) #Declaring and initializing the lookup table LOOKUP = [[-1 for i in range(MAX_WEIGHT+1)] for j in range(MAX_VALUE+1)] #Function knapsack is invoked print knapsack(MAX_WEIGHT, WEIGHTS, VALUES, MAX_VALUE) 

'''Programming Assignment #4: Question 2''' def knapsack(max_weight, weights, values, max_value): '''Recursive method for the knapsack problem (with memoization)''' # Base Case if max_value == 0 or max_weight == 0: return 0 # If weight of the nth item is more than Knapsack of capacity # W, then this item cannot be included in the optimal solution if weights[max_value-1] > max_weight: return knapsack(max_weight, weights, values, max_value-1) # Check for required value in lookup table first if LOOKUP[max_value][max_weight] != -1: return LOOKUP[max_value][max_weight] # Add to lookup table and return the maximum of two cases: # (1) nth item included # (2) not included val1 = knapsack(max_weight-WEIGHTS[max_value-1], weights, values, max_value-1) val1 += VALUES[max_value-1] # val1 stores the maximum possible profit when the last item is included val2 = knapsack(max_weight, weights, values, max_value-1) # val2 stores the maximum possible profit when the last item is excluded temp = max(val1, val2) LOOKUP[max_value][max_weight] = temp return temp #End of function knapsack #To test above function VALUES = [] WEIGHTS = [] #Reading the data file with open("knapsack_big.txt") as f: FIRST_LINE = f.readline().split() MAX_WEIGHT = int(FIRST_LINE[0]) MAX_VALUE = int(FIRST_LINE[1]) for line in f: words = line.split() VALUES.append(int(words[0])) WEIGHTS.append(int(words[1])) #Declaring and initializing the lookup table LOOKUP = [[-1 for i in range(MAX_WEIGHT+1)] for j in range(MAX_VALUE+1)] #Function knapsack is invoked print knapsack(MAX_WEIGHT, WEIGHTS, VALUES, MAX_VALUE) 

The above programs should be fine syntactically now. They returned a 10/10 score on being run through Pylint. However, as mentioned in the original question, the code still gets killed whenever it is executed on large data files like knapsack_big.txt.

\$\endgroup\$
1
  • 3
    \$\begingroup\$Typically capitalized values are only seen on globals. I think the reason they appear in your program is that you have a bunch of global code. You should attempt to put all of your global-scope code into a main function; then pylint will probably change its suggestions to de-capitalize those variables.\$\endgroup\$CommentedDec 24, 2018 at 14:51
1
\$\begingroup\$

Not sure if the question is still relevant. But this line is creating a list that is too big that exceeds most memory limits:

LOOKUP = [[-1 for i in range(MAX_WEIGHT+1)] for j in range(MAX_VALUE+1)] 

Instead of creating the LOOKUP table upfront, it may be better to save the sub-problems in a hash table or a search tree as you solve them.

\$\endgroup\$
1
  • \$\begingroup\$Welcome to Code Review! Is this an observation about the code in the answer by JD? The main objective is to review the OPs code.\$\endgroup\$CommentedAug 6, 2021 at 7:10