This is a command-line application that displays the text of an EPUB one sentence at a time.
I am going to make it more robust, including:
- make the segmentation more accurate, since it currently sometimes groups unrelated text together
- make it faster: run the segmentation once, then save the segments to the filesystem so later runs can skip it
- add more reading capabilities, like a progress meter and the ability to take notes on each sentence
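For the caching idea above, one possible shape (a sketch only; the cache-file naming and the `load_sentences` helper are my own inventions, not part of the current code) is to wrap segmentation in a function that checks the filesystem first:

```python
import json
import os


def load_sentences(text, cache_path, segment):
    """Segment `text` once and cache the result as JSON.

    `segment` is any callable that turns a string into a list of
    sentence strings (e.g. a wrapper around spaCy). On later runs the
    cached list is loaded instead of re-running segmentation.
    """
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as f:
            return json.load(f)
    sentences = segment(text)
    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump(sentences, f)
    return sentences
```

In the existing program this might be called as `load_sentences(text, textname + '.sents.json', lambda t: [s.text for s in nlp(t).sents])`. Separately, if startup time is the main pain point, it may be worth trying spaCy's rule-based pipeline (`spacy.blank('en')` with `nlp.add_pipe('sentencizer')`), which loads much faster than the full `en_core_web_sm` model, at some cost in segmentation accuracy.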
However, for now I'm mainly interested in feedback on the code I have. Is there a more elegant design pattern?
Thanks very much.
```python
# Note: this code works, but it's slow to start because spaCy's nlp runs
# for a while before the curses display launches.
#
# This is a Python program which takes the name of an EPUB from the
# command line, extracts the plaintext content from the EPUB, segments
# it with spaCy, then displays each sentence one at a time on-screen.
#
# The controls are "n" for the next sentence, "b" for the previous
# sentence, and "q" to quit the application.

import sys
import curses

import epub2txt
import spacy


def main(stdscr):
    # Get the name of the EPUB from the command line.
    textname = sys.argv[1]

    # Extract the plaintext from the EPUB.
    text = epub2txt.epub2txt(textname)

    # Segment the text into sentences with spaCy.
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    lines = list(doc.sents)

    # Loop through the sentences by index.
    i = 0
    while i < len(lines):
        stdscr.clear()
        stdscr.addstr(str(lines[i]))
        stdscr.refresh()
        c = stdscr.getch()
        if c == ord('q'):
            break
        elif c == ord('b'):
            if i > 0:
                i -= 1
        elif c == ord('n'):
            if i < len(lines) - 1:
                i += 1


curses.wrapper(main)
```
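On the design-pattern question, one common refactoring (a sketch, not the only answer; the `advance` function name and its signature are my own) is to separate the navigation logic from the curses rendering. The pure function can then be unit-tested without a terminal:

```python
# Sketch: split navigation state from rendering. `advance` is pure:
# given the current index, a key, and the sentence count, it returns
# the new index and whether the loop should keep running.
def advance(i, key, total):
    """Return (new_index, keep_running) for one key press."""
    if key == 'q':
        return i, False
    if key == 'b' and i > 0:
        return i - 1, True
    if key == 'n' and i < total - 1:
        return i + 1, True
    return i, True
```

The main loop then reduces to reading a key and calling `i, running = advance(i, chr(c), len(lines))`, so adding features like a progress meter only touches the rendering side.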