4
\$\begingroup\$

This method takes a text file of my work schedule and parses it. It works, but is kind of kludgey, and I'm looking for feedback and suggestions. My first attempt at a file parser, and my first look at Python (I have dabbled in Java/Android and JavaScript). I'm hoping that between the pseudocode and the comments it will be clear what is going on.

Pseudocode:

def read_body(): f = open('file') tempList = [] d = custom data object() for line in f: if line not empty line: if begins with a macron: #denotes new day in the file prepare new/empty data object for line in f: split line for whitespace if line not empty line if line is work log it in data object else its another activity if pay value, also log it elif begins with an equal sign: #denotes end of day else: clean up f.close() 

Full Code (it works):

macron = '¯' equalSign = '=' underscore = '_' def read_body(path, month_year): '''Reads schedule detail text file and generates pairing data objects for each work day. path: file to open 'monthYEAR.txt' date: MonthYEAREMPNO (month, year, employee number) from file header ''' f = open(path) tempList = [] d = WorkDay() for line in f: tempList = line.rsplit() if len(tempList) > 0: if macron in tempList[0]: #begin work day #prepare new data object d.clear() d.month_year(month_year) for line in f: tempList = line.rsplit() if len(tempList) > 0: if isWorkingDay(tempList[0]): #parse tempList[1] date block MM/dd/YYYY d.activity_name(tempList[0]) d.date_of_month(tempList[1]) line = f.next() tempList = line.rsplit() d.hours_worked(tempList[1]) d.pay(tempList[3]) break else: #is other activity (sick, off, vacation, etc), log date/time/pay d.activity_name(tempList[0]) #tempList[0] failed isWorkingDay(), so just print. Human readable string for line in f: if underscore in line: break elif 'Credit:' in line: d.pay(line[8:12]) break elif equalSign in tempList[0]: #end work day #drop activity object in "write to db" queue add_to_db(d) else: '''end of file, clean up partial data objects. necessary because schedule text file ends with a macron''' #discard d if empty, raise error if non-complete, write to db if complete pass f.close() 
\$\endgroup\$

    2 Answers 2

    4
    \$\begingroup\$

    As a recommendation for readability i suggest you replace

    if len(tempList) > 0: 

    with

    if tempList: 

    as the Empty List resolves to False.

    It's also recommended to use the "with statement" when dealing with files to avoid errors.

    with open('file') as f: ... 

    This will free any locks on the filesystem once this block is exited.

    \$\endgroup\$
      3
      \$\begingroup\$
      • Those constants (macron, equalSign, underscore) are a bit uselessly named. It's no use having const int TEN = 10; use names that reflect what the constants are for. (This also helps avoid the analogic case of const int TEN = 11 when you need to change the constant in a new version...) Better yet, put the constants into predicate functions that encapsulate all the checking in one easy-to-use package.

      • The nested loops over your file handle may work, but they're confusing as heck. I recommend an outer while True loop and manual calls to f.next().

      • Your code is very nested. Try splitting it into multiple functions.

      • It's often preferable to return early (if not tempList: break) instead of nesting (if tempList: ...).

      • Your inner loop could be improved to handle an entire "record" (begin day, stuff happens, end day) at a time, rather than taking each line and clumsily maintaining state. (All this really entails is changing the structure to remove the loops and clearly define a sequence of processing; moving the code into its own function is just bonus points.)

      • Some bits of code common to all branches of execution (d.month_year(month_year) is one spot, d.activity_name(tempList[0]) is another) can be moved out of the branching constructs.

      • If a day is started without ending the previous day, the code throws the previous day away. You might want to change that.

      • You're not awfully clear about whether the macron to start a new day gets its own line; your code acts as if it does (by throwing away the rest of that line), so I'm going with that.

      • Your terminating characters (macron, equals, underscore) - you say they have to be at the beginning of the line, but then test if they're anywhere in the line! Combined with the above leap of logic, you can simplify those tests to checking if the character is the only one on its line.

      • Other-activity sections (terminated with _) don't terminate a record when they end. I'l assume it's impermissible to have anything between the end of an other-section and the end of the record.

      Also taking into account Christoph Hegemann's pointers, your revised code:

      def isStartWorkDay(line): return line == "¯\n" def isEndWorkDay(line): return line == "=\n" def isEndOtherActivitySection(line): return line == "_\n" def processRecord(f, d): # This line should just start a record line = f.next() if line == '\n': return False # Start of record should be start of record if not isStartWorkDay(line): raise Exception("You missed the ¯ at the start of the day") # First line always starts with a day or a special identifier ("sick" etc.) lineList = f.next().rsplit() d.activity_name(lineList[0]) if isWorkingDay(lineList[0]): # Line 1 is day of week, day of month, [boring] d.date_of_month(lineList[1]) # Line 2 is [boring], hours worked, [boring], pay, [boring] lineList = f.next().rsplit() d.hours_worked(lineList[1]) d.pay(lineList[3]) else: # Something special happened; go through lines describing it # I know this is potentially confusing (as I said above), but it's the "least bad" way for line in f: if isEndOtherActivitySection(line): # That's it for this section break elseif "Credit:" in line: # It's a pay credit, we need to record it d.pay(line[8:12]) else: pass # Move along, nothing to see here # All right, this record had better be ending now line = f.next() if isEndWorkDay(line): return True raise Exception("You missed the = at the end of the day") def readBodyIntoDb(path, month_year): '''Reads schedule detail text file and generates pairing data objects for each work day. path: file to open 'monthYEAR.txt' date: MonthYEAREMPNO (month, year, employee number) from file header''' d = None # WorkDay object; required in outer scope with open(path) as f: try: while True: d = WorkDay() d.month_year(month_year) if processRecord(f, d): # d modified by side effect dbWrite(d) except StopIteration: # EOF. I filled this pseudocode in for you if isEmptyRecord(d): pass # Nothing to see here elseif isCompleteRecord(d): dbWrite(d) # Someone left off the last `=` or something else: raise Exception("incomplete record") return True 

      (My Python's a bit rusty, so there may be latent syntax or logic errors in there.)

      \$\endgroup\$
      2
      • \$\begingroup\$Thanks for the thorough look! A couple things: flags like macron etc are only present on lines by themselves, 40 at a time (horizontal line). I originally checked for if tempList[0][0:1] == macron: because its 2 bytes, but that didn't work for the equal or underscore, which needed tempList[0][0] (1 byte), so it got cluttered and I discovered the in statement which simplified it. Thoughts? I understand about the constant names, thanks. I'll rework and post a new question in a few days.\$\endgroup\$CommentedAug 5, 2014 at 1:40
      • \$\begingroup\$if line == '\n' doesn't work with my particular file, company IT uses windows line endings, but line.isspace() (built-in function) works.\$\endgroup\$CommentedAug 5, 2014 at 2:56

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.