\$\begingroup\$

I wrote some code that uses the arcpy module to read values from some tables, check whether those values are in another table, and write the ones that aren't to a txt document. I then changed it so I could write more than one field to the txt document, and the code slowed down tremendously. Before the change it took about 5 minutes to run, I think. After the change I stopped it about 30 minutes in. Why did my code take so much longer the second time around?

The common field has values that are in one table and potentially not in the other.

Here's what I changed it to:

    import arcpy
    import os
    import sets
    import time

    #script takes longer now.. Wonder why
    #Looks like this just isn't working anymore
    pathOne = os.path.join(real path that exists)
    pathOneFields = ["Common field","OBJECTID","Shape.STArea()","Shape.STLength()"]
    listOne = [row for row in arcpy.da.SearchCursor(pathOne, pathOneFields)]
    print "Done with List One"

    pathTwo = os.path.join(this path is also real...or is it? It is)
    pathTwoFields = ["Common field","OBJECTID","SHAPE.STArea()","SHAPE.STLength()"]
    listTwo = [row for row in arcpy.da.SearchCursor(pathTwo, pathTwoFields)]
    print "Done with Path Two List"

    #just counting some stuff to make sure I actually did something
    i = 0
    j = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathOneList:
            i += 1
            if (item[0] not in pathTwoList) and (str(item[0]) != "Null"):
                #add month to this file name as well as full path for the file
                j += 1
                text.write("\n{} not in pathTwo".format(item[1]))
            elif str(item) == "Null":
                j += 1
                text.write("\n{} not in pathTwo {} {} {}".format(item, "How should we", "handle Null", "Values?"))
    print "Done with what's in pathOne and not pathTwo"
    print j

    k = 0
    l = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathTwoList:
            k += 1
            if (item[0] not in pathOneList) and (str(item[0]) != "Null"):
                #add month to this file name as well as full path for the file
                l += 1
                text.write("\n{} not in pathOne".format(item[1]))
            elif str(item[0]) == "Null":
                l += 1
                text.write("\n{} not in pathOne {} {} {}".format(item, "How should we", "handle Null", "Values?"))
    print "Done with what's in pathTwo and not in pathOne"
    print l
    print "Finished"

Here's what I changed it from:

    import arcpy
    import os
    import sets
    import time

    #This is the fast script
    pathOne = os.path.join(real path that exists)
    pathOneFields = ["Common field","OBJECTID","Shape.STArea()","Shape.STLength()"]
    listOne = [row[0] for row in arcpy.da.SearchCursor(pathOne, pathOneFields)]
    print "Done with List One"

    pathTwo = os.path.join(this path is also real...or is it? It is)
    pathTwoFields = ["Common field","OBJECTID","SHAPE.STArea()","SHAPE.STLength()"]
    listTwo = [row[0] for row in arcpy.da.SearchCursor(pathTwo, pathTwoFields)]
    print "Done with Path Two List"

    #just counting some stuff to make sure I actually did something
    i = 0
    j = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathOneList:
            i += 1
            if (item not in pathTwoList) and (str(item) != "Null"):
                #add month to this file name as well as full path for the file
                j += 1
                text.write("\n{} not in pathTwo".format(item))
            elif str(item) == "Null":
                j += 1
                text.write("\n{} not in pathTwo {} {} {}".format(item, "How should we", "handle Null", "Values?"))
    print "Done with what's in pathOne and not pathTwo"
    print j

    k = 0
    l = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathTwoList:
            k += 1
            if (item not in pathOneList) and (str(item) != "Null"):
                #add month to this file name as well as full path for the file
                l += 1
                text.write("\n{} not in pathOne".format(item))
            elif str(item[0]) == "Null":
                l += 1
                text.write("\n{} not in pathOne {} {} {}".format(item, "How should we", "handle Null", "Values?"))
    print "Done with what's in pathTwo and not in pathOne"
    print l
    print "Finished"

I expected the new version to take longer, but not more than 6 times longer. It wasn't even halfway done when I stopped it. How can this be?

\$\endgroup\$
  • Because of the item[0] in your 'from' script, I doubt it was really run. Get your 'from' script, run and time it, and post it.
    – stefan
    Commented Nov 21, 2017 at 22:22
  • @stefan Was the logic not working? I thought it might not have been. Can you not use indexing in if statements?
    – user106363
    Commented Nov 21, 2017 at 22:29

1 Answer

\$\begingroup\$

Assuming pathTwoList is listTwo and pathOneList is listOne:

From what I understand, you've actually broken the initial logic. Look at the item[0] not in pathTwoList expression. pathTwoList is a list of rows returned by the ArcPy query, while item[0] is a value of the "Common field". A single field value never equals a whole row, so the membership test never finds a match: the expression always evaluates to True, and only after a full scan of pathTwoList. In other words, you are hitting the worst case for every row, which explains the slowdown.
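To see the mismatch in isolation, here is a small stand-in example. It uses plain tuples with made-up values in place of real cursor rows, so treat it as an illustration of the comparison only, not of the ArcPy API:

```python
# Stand-in data: plain tuples shaped like cursor rows
# (common field value, OBJECTID). All values are hypothetical.
rows_two = [("A1", 1), ("B2", 2), ("C3", 3)]
item = ("B2", 7)

# A bare field value is compared against whole row tuples, so the
# scan runs to the end of the list without ever finding a match.
value_in_rows = item[0] in rows_two

# Comparing field values with field values finds the match.
values_two = [row[0] for row in rows_two]
value_in_values = item[0] in values_two
```

Here value_in_rows comes out False even though "B2" is present in the second table, while value_in_values is True.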

A different approach would probably be to make sets of common field values and work with the difference of the sets.
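A minimal sketch of that idea, assuming each row is a tuple with the common field first and that NULL field values arrive as None (both are assumptions, and rows_missing_from is a hypothetical helper name). The sample lists stand in for listOne and listTwo:

```python
def rows_missing_from(rows, other_rows, key_index=0):
    """Return the rows whose common-field value never appears in other_rows.

    Rows with a None key are skipped (assumption: NULL comes back as None).
    """
    keys = set(row[key_index] for row in rows)
    other_keys = set(row[key_index] for row in other_rows)
    missing = keys - other_keys      # set difference of the common values
    missing.discard(None)            # leave NULL keys for separate handling
    # Filter the original rows so repeated values and extra fields survive.
    return [row for row in rows if row[key_index] in missing]


# Hypothetical sample data: (common field value, OBJECTID)
list_one = [("A1", 1), ("B2", 2), ("B2", 3), (None, 4)]
list_two = [("A1", 10), ("C3", 11)]

only_in_one = rows_missing_from(list_one, list_two)
only_in_two = rows_missing_from(list_two, list_one)
```

The sets are used only for lookup, so repeated values in the query results are not lost: every original row whose key is missing from the other table is kept, along with its other fields. Each membership check is a hash lookup rather than a scan of the whole list.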

\$\endgroup\$
  • I wanted to use sets, but some of the values that the query returns are repeated.
    – user106363
    Commented Nov 21, 2017 at 23:08
  • I went ahead and corrected some stuff that I had transposed wrong. It should be right now. That was wrong in my actual code too. Thanks for pointing it out. It was wrong in both the slow version and the quick version, though.
    – user106363
    Commented Nov 21, 2017 at 23:10
  • What's wrong with elif str(item) == "Null":?
    – user106363
    Commented Nov 22, 2017 at 23:46
  • @Steve Sorry, I missed your comment. What do you mean by "what's wrong"? Please elaborate.
    – alecxe
    Commented Nov 26, 2017 at 1:27
  • I think I figured it out. For some reason the elif never executes. I think it's because str(item) never equals "Null". Is it worth putting my logic in functions to avoid repetition here?
    – user106363
    Commented Nov 27, 2017 at 22:00