4
\$\begingroup\$

This question now has afollow-up question 1andfollow-up question 2

As an exercise in writing decent code and because I learn best by example, I wrote this program and would ask for a review so that I can see where my problems lie and where I need to do better.

For the sake of terminology, when I talk about flags I mean commandline parameters that contain a "-" as their first character. When I talk about arguments I mean every parameter following a flag (seperated by 1 space) that is not a flag itself.

The task itself is to go through a large .txt file that contains lines with ">" and to remove all spaces in these lines. I am using this program to modify files in FASTA format, a format often used in biology. Due to the way FASTA format is structured, ">" can only occur as first character of a line. Argument parsing is handled by a class called ArgumentHandler that I wrote as well but am not presenting here because that might make things overly complicated, as well as a class called "FastReader" that I modified from the FastReader class made by geeksforgeeks. To reduce effort, below relevant information about the two classes.

Relevant Notes for FastReader: It reads a provided file using a BufferedReader on a FileReader and can read an entire line in the file using its non-static nextLine() method.

Relevant Notes for ArgumentHandler: ArgumentHandler contains a String array of flags and their arguments from the commandline. In general an ArgumentHandler is first instantiated with a String array containing all allowed flags for this program (here -i for input, -o for output, -h for help). Each Flag is followed by an amount of cells that contain the different arguments for this flag - when ArgumentHandler is initialized those contain either default-values if possible or "" to signalize they don't have a default-value but they are not required (because there are or will be default values that can be used for example) or null to signalize they don't have a default-value and are absolutely required.

The methods getFlagStringValue(String flag) and getFlagIntValue(String flag) both return an ArrayList<String>/ArrayList<Integer> that contain all arguments that were given in the commandline between the flag "flag" and the next flag in the commandline call(recognized by having a "-" as first character). In this particular case both always only have 1 argument per flag, but for the sake of reusing this class for other programs, where one flag might have several arguments associated with it, I coded it so it returned an ArrayList.

Algorithm:

  1. Read next line from input file and store in "line"
  2. If "line" contains ">", remove all spaces in "line"
  3. Print "line" to output file
  4. If next line is not null, go back to Step 1

The Code

import java.io.BufferedWriter; import java.io.FileNotFoundException; import java.io.FileOutputStream; import java.io.IOException; import java.io.OutputStreamWriter; public class RemoveSpaces { private static void writeln(BufferedWriter writer, String line) { try { writer.write(line); writer.newLine(); } catch (IOException e) { e.printStackTrace(); } } private static void closeWriter(BufferedWriter writer) { try { writer.close(); } catch (IOException e) { throw new RuntimeException("Failed to close writer!"); } } private static BufferedWriter createWriter(String filename) { BufferedWriter outputWriter; try { outputWriter = new BufferedWriter(new OutputStreamWriter( new FileOutputStream(filename))); } catch (FileNotFoundException e) { System.out.println("Output file name " + filename + " was not accessible. Printing to " + (filename + ".sorted.txt instead")); try { outputWriter = new BufferedWriter(new OutputStreamWriter( new FileOutputStream(filename + ".sorted.txt"))); } catch (FileNotFoundException e2) { throw new RuntimeException( "Way to print to output file could not be established! Try a new output file name"); } } return outputWriter; } /** * Allowed flags: - i : Path and name of input file; - o : Path and name of * output file (default value: value of -i with ".nospace.txt" added to it); * - h : help */ public static void main(String[] args) { /* Define Arguments */ String[] parameterList = { "-i", null, "-o", "", "-h", "Text to display if -h is called" }; ArgumentHandler arguments = new ArgumentHandler(parameterList); /* Parse Arguments */ arguments.parseArguments(args); arguments.setFlagValue("-o", arguments.getFlagStringValue("-i").get(0) + ".nospace.txt", 0); System.out.println("Starting Program with the following arguments: "); arguments.printArguments(); /* Create reader of input file and writer to output file */ FastReader inputReader = new FastReader(arguments.getFlagStringValue( "-i").get(0)); BufferedWriter outputWriter = createWriter(arguments .getFlagStringValue("-o").get(0)); /* * Write every line from input file to output file. If the line is a * name (contains ">"), remove all spaces in it before writing. */ String line = inputReader.nextLine(); int i = 0; while (line != null) { if (line.contains(">")) { line = line.replaceAll(" ", ""); } writeln(outputWriter, line); line = inputReader.nextLine(); /*Display amount of printed lines for user*/ if (i % 1000000 == 0) { System.out.println("Printed " + i / 1000000 + " * 10^6 lines."); } i++; } closeWriter(outputWriter); System.out.println("Finished!"); } } 
\$\endgroup\$
7
  • \$\begingroup\$FWIW, java is probably not the right tool for the job ... sed '/>/s/ //g' inputfile > outputfile does the job alright\$\endgroup\$
    – Vogel612
    CommentedMay 18, 2017 at 13:05
  • \$\begingroup\$@Vogel612 I have to start learning coding entire programs as opposed to single functions at some point, therefore the very simple task here in order to do slowly work my way up to more complex stuff. At least that's the idea behind it.\$\endgroup\$CommentedMay 18, 2017 at 13:09
  • \$\begingroup\$I know ... that's why that is a comment and not an answer :)\$\endgroup\$
    – Vogel612
    CommentedMay 18, 2017 at 13:11
  • \$\begingroup\$@Vogel612 this is practically asking for Perl! Yeah, yeah, I know "sed is enough" and all that, but especially in a community of research biologists, Perl is the recommended language for these stuffs. Source: I am a research biologist, among other things.\$\endgroup\$CommentedMay 19, 2017 at 16:22
  • \$\begingroup\$You say ...contains lines with ">" as their **first character** ...in your description and If "line" contains ">", remove all spaces in "line"in your algorithm and code. If this is not a mistake, please update. The edit will be allowed as it does not invalidate existing answers. I'll not post this as an answer because I don't think it merits being one.\$\endgroup\$CommentedMay 19, 2017 at 16:24

1 Answer 1

4
\$\begingroup\$

Exception Handling

  • writeln catches and prints an exception, but is not thrown.
  • Use try-with-resource and get rid of try-catch-finally-try-catch and manual closing of readers/writers

Use of implementation detail

You work directly with BufferedWriter and FastReader. If there would be an interface, I'd say, always develop against the interface. But in this case, always develop against the super class. Because, if you won't use e.g. a BufferedWriter anymore, you can change it at one place.

other

  • Maybe you want to use the nio, it's a bit more "fluent" to write and read.
  • Maybe you want to move the creation of the ArgumentHandler and its parsing to a separate method createArgumentHandler, because that's too much implementation detail for the main routine. I'd preferably would have used it as a member variable of an instance (instead of performing everything in the main method), and provide a method getSourceFile, since that would be much easier to understand. To be honest, I would have introduced an interface Configuration with a getSourceFile method. I couldn't care less, how the configuration is set up. But that might be a bit over the top (the separate interface, not the additional subroutine)
  • You use replaceAll, which actually parses regex, use replace instead.
  • You can actually do something like for(int i = 0; (line = inputReader.nextLine) != null; i++).

"philosophy excurion"

In general, with testing in mind and the attitude, that good code has a decent test coverage and applying object oriented prinipcles is always a good thing, I'd go further and moved the "LineReplacing-Logic" to a LineProcessor, to test the behaviour separately. I'd provided a LineProvider, so I can mock it in the main routine and can actually unit test without accessing the file system (since if I would, it won't be a unit test, right?) and I can exchange implementation, because maybe the contents of the file are not from a File, maybe from a Queue. Same goes for the Writing, maybe provide a LineWriter, since you maybe want to write into /dev/null with a DevNullLineWriter, to find a bug with production data and you don't enough space on your local harddrive. All that could maybe be "too much effort", but breaking problems into smaller problems, reaching good code coverage with your tests, applying oo principles and make the code "sexy" usually never hurts and it usually pays back in the long term.

Hope this helps...

\$\endgroup\$
2
  • \$\begingroup\$Seeing as I am almost entirely self-taught in terms of programming I have quite a few questions on, for example, terminology, how to get to your line of thinking at points, some specification (What exactly do you mean by LineProcessor? Make an entire object only for the String manipulation of the lines from file?) etc. The commentsection isn't quite enough for that, so would you like to discuss my questions in the codereview chat at some point (it seems impossible to message people otherwise)?\$\endgroup\$CommentedMay 18, 2017 at 7:07
  • \$\begingroup\$Sure, feel free to invite me to a chat room.\$\endgroup\$
    – slowy
    CommentedMay 18, 2017 at 10:04

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.