Python simple parser using tags

Question

First ever code review post, so I hope I'm doing it right. My understanding of this site is that it is a place to post stuff like this for feedback and criticism. If that isn't the case, I apologise for my misunderstanding.

I'm teaching myself Python 3 and I'm pretty near the start. I couldn't find a simple description of a file parser, so I spent a frankly embarrassing amount of time trying to craft one, and this is the result:

def parse(file, startTag, endTag): bears=[] bearFile = open(file, 'r') for bearLine in bearFile: if startTag in bearLine: bear = bearFile.readline() while not (endTag in bear): bears.append(bear.strip()) bear = bearFile.readline() return bears

I tested it using:

print(parse("bears.txt", "==start==", "==end=="))

...on a file which looked like this:

This file contains a list of bears. Bears are below, between the start and end tags. ==start== Grizzly Polar Koala Panda Spectacled Sun ==end== These are some more tags, listing some false bears: ==start== Purple Hairless Aquatic Flying ==end==

...and it works! The output is:

['Grizzly', 'Polar', 'Koala', 'Panda', 'Spectacled', 'Sun', 'Purple', 'Hairless', 'Aquatic', 'Flying']

With some obvious flaws, including:

1 - I can't work out how to ignore empty lines between the tags without using another if. I wondered if it might be better to do this later, and strip out all the '' entries, perhaps as an argument option.

2 - It depends on new lines because of the readline method, so using something else like a comma as a delimiter is currently beyond me.

3 - If there is no end tag, it loops forever. try and except are quite new to me right now.

I hope I'm not just wasting everyone's time by being here...

\$\begingroup\$Not a full answer, but you never close bearFile.\$\endgroup\$
– internet_user
CommentedJan 30, 2018 at 17:02 — internet_user, CommentedJan 30, 2018 at 17:02

Peilonrayz · Accepted Answer · 2018-01-30 17:25:43Z

Python has a style guide, PEP8. I recommend that you follow it, as it makes your code easier to read. In this case it's mostly just using snake_case, rather than camelCase.
You should use with to automatically close the file. This happens in any circumstance.
Rather than using if startTag in bearFile, I'd invert the if and continue.
Rather than using while you can use for still. This, IMO, is easier to understand too.
You could use a generator function so you don't have to write as much.

Along with a couple of name edits, here's how I'd change your code:

def parse(path, start, end): with open(path) as file: for line in file: if start not in line: continue for line in file: if end in line: break yield line.strip() print(list(parse("bears.txt", "==start==", "==end==")))

To address your concerns:

Do this out of the function. If you ever do want these empty values, then you'll be duplicating code. You also only need to use:
```
[item for item in parse(...) if item] 
```
You could implement a function, split_delim, that splits on a delimiter. If you make the function simple, then all you'd need to do is add one line to your function:
```
def parse(path, start, end): with open(path) as file: file = split_delim(file, '\n') for line in file: ... 
```
My changes should remove this error, due to how Python's for loops work. However I havn't tested this.

Regarding PEP8, this is a lot to study. Worthwhile of course, but my intention was to first focus on learning to make the code work, and then go to PEP8 a bit later on - after the language makes a bit more sense to me, but before I've gone long enough to start forming bad habits. Is there any reason this is a particularly bad idea? I'm sad to see snake_case favoured. I learned camelCase at University about 200 years ago and I find it prettier. — fishkake, CommentedJan 31, 2018 at 8:18
Thanks for your feedback - what is the protocol here, do I mark this as "the answer" because I like the feedback, even though this isn't a question-and-answer situation? (Also I haven't seen with yet - the way it is used here doesn't make sense to me, so that's where I'm off to now.) — fishkake, CommentedJan 31, 2018 at 8:22
@fishkake From what I remember, PEP8 points can be split into two things. (1) style - mostly where and where not to put spaces. (2) code improvements, that people are likely not to do - using with with context managers. I think anyone can learn (1), where I agree (2) can be hard to learn. Python uses CamelCase for class names, and after a while it's easy to read both - I started with CamelCase too. The way the check works here is if you think it's helped you the most. And in this case it may also deter some feedback. — Peilonrayz, CommentedJan 31, 2018 at 9:06

Stack Exchange Network

Python simple parser using tags

1 Answer 1

Hot Network Questions

Python simple parser using tags

1 Answer 1

Related

Hot Network Questions