This is a command-line application that displays the text of an EPUB one sentence at a time.
I am going to make it more robust, including:
- make the segmentation more accurate, since it currently sometimes groups unrelated text together
- make it faster: run the segmentation once, then save the segments to the filesystem so later runs can skip it
- add more reading capabilities, like a progress meter and the ability to take notes on each sentence
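For the caching idea above, one possible shape (a sketch only; the cache-file naming and the `load_sentences` helper are my own inventions, not part of the current code) is to wrap segmentation in a function that checks the filesystem first:

```python
import json
import os


def load_sentences(text, cache_path, segment):
    """Segment `text` once and cache the result as JSON.

    `segment` is any callable that turns a string into a list of
    sentence strings (e.g. a wrapper around spaCy). On later runs the
    cached list is loaded instead of re-running segmentation.
    """
    if os.path.exists(cache_path):
        with open(cache_path, encoding='utf-8') as f:
            return json.load(f)
    sentences = segment(text)
    with open(cache_path, 'w', encoding='utf-8') as f:
        json.dump(sentences, f)
    return sentences
```

In the existing program this might be called as `load_sentences(text, textname + '.sents.json', lambda t: [s.text for s in nlp(t).sents])`. Separately, if startup time is the main pain point, it may be worth trying spaCy's rule-based pipeline (`spacy.blank('en')` with `nlp.add_pipe('sentencizer')`), which loads much faster than the full `en_core_web_sm` model, at some cost in segmentation accuracy.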
However, for now I'm mainly interested in feedback on the code I have. Is there a more elegant design pattern?
Thanks very much.
```python
# Note: this code works, but it's slow to start because spaCy's nlp runs
# for a while before the curses display launches.
#
# This is a Python program which takes the name of an EPUB from the
# command line, extracts the plaintext content from the EPUB, segments
# it with spaCy, then displays each sentence one at a time on-screen.
#
# The controls are "n" for the next sentence, "b" for the previous
# sentence, and "q" to quit the application.

import sys
import curses

import epub2txt
import spacy


def main(stdscr):
    # Get the name of the EPUB from the command line.
    textname = sys.argv[1]

    # Extract the plaintext from the EPUB.
    text = epub2txt.epub2txt(textname)

    # Segment the text into sentences with spaCy.
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    lines = list(doc.sents)

    # Loop through the sentences by index.
    i = 0
    while i < len(lines):
        stdscr.clear()
        stdscr.addstr(str(lines[i]))
        stdscr.refresh()
        c = stdscr.getch()
        if c == ord('q'):
            break
        elif c == ord('b'):
            if i > 0:
                i -= 1
        elif c == ord('n'):
            if i < len(lines) - 1:
                i += 1


curses.wrapper(main)
```
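On the design-pattern question, one common refactoring (a sketch, not the only answer; the `advance` function name and its signature are my own) is to separate the navigation logic from the curses rendering. The pure function can then be unit-tested without a terminal:

```python
# Sketch: split navigation state from rendering. `advance` is pure:
# given the current index, a key, and the sentence count, it returns
# the new index and whether the loop should keep running.
def advance(i, key, total):
    """Return (new_index, keep_running) for one key press."""
    if key == 'q':
        return i, False
    if key == 'b' and i > 0:
        return i - 1, True
    if key == 'n' and i < total - 1:
        return i + 1, True
    return i, True
```

The main loop then reduces to reading a key and calling `i, running = advance(i, chr(c), len(lines))`, so adding features like a progress meter only touches the rendering side.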