5
\$\begingroup\$

First ever code review post, so I hope I'm doing it right. My understanding of this site is that it is a place to post stuff like this for feedback and criticism. If that isn't the case, I apologise for my misunderstanding.

I'm teaching myself Python 3 and I'm pretty near the start. I couldn't find a simple description of a file parser, so I spent a frankly embarrassing amount of time trying to craft one, and this is the result:

def parse(file, startTag, endTag): bears=[] bearFile = open(file, 'r') for bearLine in bearFile: if startTag in bearLine: bear = bearFile.readline() while not (endTag in bear): bears.append(bear.strip()) bear = bearFile.readline() return bears 

I tested it using:

print(parse("bears.txt", "==start==", "==end==")) 

...on a file which looked like this:

This file contains a list of bears. Bears are below, between the start and end tags. ==start== Grizzly Polar Koala Panda Spectacled Sun ==end== These are some more tags, listing some false bears: ==start== Purple Hairless Aquatic Flying ==end== 

...and it works! The output is:

['Grizzly', 'Polar', 'Koala', 'Panda', 'Spectacled', 'Sun', 'Purple', 'Hairless', 'Aquatic', 'Flying'] 

With some obvious flaws, including:

1 - I can't work out how to ignore empty lines between the tags without using another if. I wondered if it might be better to do this later, and strip out all the '' entries, perhaps as an argument option.

2 - It depends on new lines because of the readline method, so using something else like a comma as a delimiter is currently beyond me.

3 - If there is no end tag, it loops forever. try and except are quite new to me right now.

I hope I'm not just wasting everyone's time by being here...

\$\endgroup\$
1
  • \$\begingroup\$Not a full answer, but you never close bearFile.\$\endgroup\$CommentedJan 30, 2018 at 17:02

1 Answer 1

5
\$\begingroup\$
  • Python has a style guide, PEP8. I recommend that you follow it, as it makes your code easier to read. In this case it's mostly just using snake_case, rather than camelCase.
  • You should use with to automatically close the file. This happens in any circumstance.
  • Rather than using if startTag in bearFile, I'd invert the if and continue.
  • Rather than using while you can use for still. This, IMO, is easier to understand too.
  • You could use a generator function so you don't have to write as much.

Along with a couple of name edits, here's how I'd change your code:

def parse(path, start, end): with open(path) as file: for line in file: if start not in line: continue for line in file: if end in line: break yield line.strip() print(list(parse("bears.txt", "==start==", "==end=="))) 

To address your concerns:

  1. Do this out of the function. If you ever do want these empty values, then you'll be duplicating code. You also only need to use:

    [item for item in parse(...) if item] 
  2. You could implement a function, split_delim, that splits on a delimiter. If you make the function simple, then all you'd need to do is add one line to your function:

    def parse(path, start, end): with open(path) as file: file = split_delim(file, '\n') for line in file: ... 
  3. My changes should remove this error, due to how Python's for loops work. However I havn't tested this.

\$\endgroup\$
3
  • \$\begingroup\$Regarding PEP8, this is a lot to study. Worthwhile of course, but my intention was to first focus on learning to make the code work, and then go to PEP8 a bit later on - after the language makes a bit more sense to me, but before I've gone long enough to start forming bad habits. Is there any reason this is a particularly bad idea? I'm sad to see snake_case favoured. I learned camelCase at University about 200 years ago and I find it prettier.\$\endgroup\$
    – fishkake
    CommentedJan 31, 2018 at 8:18
  • \$\begingroup\$Thanks for your feedback - what is the protocol here, do I mark this as "the answer" because I like the feedback, even though this isn't a question-and-answer situation? (Also I haven't seen with yet - the way it is used here doesn't make sense to me, so that's where I'm off to now.)\$\endgroup\$
    – fishkake
    CommentedJan 31, 2018 at 8:22
  • 1
    \$\begingroup\$@fishkake From what I remember, PEP8 points can be split into two things. (1) style - mostly where and where not to put spaces. (2) code improvements, that people are likely not to do - using with with context managers. I think anyone can learn (1), where I agree (2) can be hard to learn. Python uses CamelCase for class names, and after a while it's easy to read both - I started with CamelCase too. The way the check works here is if you think it's helped you the most. And in this case it may also deter some feedback.\$\endgroup\$
    – Peilonrayz
    CommentedJan 31, 2018 at 9:06

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.