This question now has a follow-up question 1 and follow-up question 2.
As an exercise in writing decent code and because I learn best by example, I wrote this program and would ask for a review so that I can see where my problems lie and where I need to do better.
For the sake of terminology, when I talk about flags I mean command-line parameters that have a "-" as their first character. When I talk about arguments I mean every parameter following a flag (separated by 1 space) that is not a flag itself.
The task itself is to go through a large .txt file that contains lines with ">" and to remove all spaces in these lines. I am using this program to modify files in FASTA format, a format often used in biology. Due to the way FASTA format is structured, ">" can only occur as the first character of a line. Argument parsing is handled by a class called ArgumentHandler that I wrote as well but am not presenting here because that might make things overly complicated. Reading is handled by a class called "FastReader" that I modified from the FastReader class made by geeksforgeeks. To save effort, the relevant information about the two classes is given below.
Relevant Notes for FastReader: It reads a provided file using a BufferedReader on a FileReader and can read an entire line in the file using its non-static nextLine() method.
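For readers without the class at hand, a reader with the behavior described above could be sketched roughly as follows. This is not the actual FastReader: the class name LineReaderSketch is made up for illustration, and only the nextLine() method and the BufferedReader-on-FileReader setup are taken from the notes above. The error handling (wrapping in UncheckedIOException) is an assumption.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for FastReader, not the original class.
final class LineReaderSketch implements AutoCloseable {
    private final BufferedReader reader;

    LineReaderSketch(String filename) {
        try {
            // BufferedReader on a FileReader, as described in the notes.
            this.reader = new BufferedReader(new FileReader(filename));
        } catch (FileNotFoundException e) {
            // Assumption: surface a missing file as an unchecked exception.
            throw new UncheckedIOException(e);
        }
    }

    // Returns the next line without its line terminator, or null at end of file.
    String nextLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

The null return at end of file is what the loop in the program relies on to stop.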
Relevant Notes for ArgumentHandler: ArgumentHandler contains a String array of flags and their arguments from the command line. In general, an ArgumentHandler is first instantiated with a String array containing all allowed flags for the program (here -i for input, -o for output, -h for help). Each flag is followed by a number of cells that hold the arguments for that flag. When ArgumentHandler is initialized, each of those cells contains one of three things: a default value if one is possible; "" to signal that there is no default value but the argument is not required (for example, because a default can be derived later); or null to signal that there is no default value and the argument is absolutely required.
The methods getFlagStringValue(String flag) and getFlagIntValue(String flag) return an ArrayList<String> and an ArrayList<Integer> respectively, containing all arguments that were given on the command line between the flag "flag" and the next flag in the call (recognized by having a "-" as its first character). In this particular case each flag only ever has 1 argument, but for the sake of reusing this class for other programs, where one flag might have several arguments associated with it, I coded it so that it returns an ArrayList.
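To make the "arguments between a flag and the next flag" rule concrete, here is a minimal sketch of that grouping. It only illustrates the rule as described. ArgumentHandler itself is not shown in this post, and the class name FlagGroupingSketch and the method argumentsFor are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the ArgumentHandler class.
final class FlagGroupingSketch {

    // Collects every argument that follows `flag` up to the next flag,
    // where a flag is any token starting with "-".
    static List<String> argumentsFor(String[] args, String flag) {
        List<String> values = new ArrayList<>();
        boolean collecting = false;
        for (String token : args) {
            if (token.startsWith("-")) {
                // A new flag starts; collect only if it is the one requested.
                collecting = token.equals(flag);
            } else if (collecting) {
                values.add(token);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] commandLine = {"-i", "in.fasta", "-o", "out.fasta", "extra.txt"};
        System.out.println(argumentsFor(commandLine, "-o")); // [out.fasta, extra.txt]
    }
}
```

One consequence of recognizing flags purely by a leading "-" is that an argument such as a negative number would be treated as a flag under this rule.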
Algorithm:
- Read next line from input file and store in "line"
- If "line" contains ">", remove all spaces in "line"
- Print "line" to output file
- If next line is not null, go back to Step 1
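The four steps above can be sketched in isolation, without the argument parsing or file handling. The sketch below works on any BufferedReader and BufferedWriter so it can be tried on in-memory strings. The class name RemoveSpacesSketch and the method copyStrippingHeaders are made up for illustration and are not part of the program under review.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper illustrating the algorithm steps only.
final class RemoveSpacesSketch {

    static void copyStrippingHeaders(BufferedReader in, BufferedWriter out) throws IOException {
        String line;
        // Steps 1 and 4: read the next line and stop once it is null.
        while ((line = in.readLine()) != null) {
            // Step 2: lines containing ">" lose all their spaces.
            if (line.contains(">")) {
                // Same effect as replaceAll(" ", ""), without a regex.
                line = line.replace(" ", "");
            }
            // Step 3: write the (possibly modified) line.
            out.write(line);
            out.newLine();
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        try (BufferedReader in = new BufferedReader(new StringReader(">seq 1 human\nAC GT\n"));
             BufferedWriter out = new BufferedWriter(result)) {
            copyStrippingHeaders(in, out);
        }
        // Prints ">seq1human" and then "AC GT", each on its own line.
        System.out.print(result);
    }
}
```

Only the header line is changed. The space in "AC GT" survives because that line does not contain ">".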
The Code
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class RemoveSpaces {

    private static void writeln(BufferedWriter writer, String line) {
        try {
            writer.write(line);
            writer.newLine();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void closeWriter(BufferedWriter writer) {
        try {
            writer.close();
        } catch (IOException e) {
            throw new RuntimeException("Failed to close writer!");
        }
    }

    private static BufferedWriter createWriter(String filename) {
        BufferedWriter outputWriter;
        try {
            outputWriter = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(filename)));
        } catch (FileNotFoundException e) {
            System.out.println("Output file name " + filename
                    + " was not accessible. Printing to "
                    + (filename + ".sorted.txt instead"));
            try {
                outputWriter = new BufferedWriter(new OutputStreamWriter(
                        new FileOutputStream(filename + ".sorted.txt")));
            } catch (FileNotFoundException e2) {
                throw new RuntimeException(
                        "Way to print to output file could not be established! \nTry a new output file name");
            }
        }
        return outputWriter;
    }

    /**
     * Allowed flags: - i : Path and name of input file; - o : Path and name of
     * output file (default value: value of -i with ".nospace.txt" added to it);
     * - h : help
     */
    public static void main(String[] args) {
        /* Define Arguments */
        String[] parameterList = { "-i", null, "-o", "", "-h",
                "Text to display if -h is called" };
        ArgumentHandler arguments = new ArgumentHandler(parameterList);

        /* Parse Arguments */
        arguments.parseArguments(args);
        arguments.setFlagValue("-o", arguments.getFlagStringValue("-i").get(0)
                + ".nospace.txt", 0);
        System.out.println("Starting Program with the following arguments: ");
        arguments.printArguments();

        /* Create reader of input file and writer to output file */
        FastReader inputReader = new FastReader(arguments.getFlagStringValue(
                "-i").get(0));
        BufferedWriter outputWriter = createWriter(arguments
                .getFlagStringValue("-o").get(0));

        /*
         * Write every line from input file to output file. If the line is a
         * name (contains ">"), remove all spaces in it before writing.
         */
        String line = inputReader.nextLine();
        int i = 0;
        while (line != null) {
            if (line.contains(">")) {
                line = line.replaceAll(" ", "");
            }
            writeln(outputWriter, line);
            line = inputReader.nextLine();

            /* Display amount of printed lines for user */
            if (i % 1000000 == 0) {
                System.out.println("Printed " + i / 1000000 + " * 10^6 lines.");
            }
            i++;
        }
        closeWriter(outputWriter);
        System.out.println("Finished!");
    }
}
sed '/>/s/ //g' inputfile > outputfile does the job alright

... "sed is enough" and all that, but especially in a community of research biologists, Perl is the recommended language for this kind of thing. Source: I am a research biologist, among other things.

...contains lines with ">" as their **first character** ... in your description and If "line" contains ">", remove all spaces in "line" in your algorithm and code. If this is not a mistake, please update. The edit will be allowed as it does not invalidate existing answers. I'll not post this as an answer because I don't think it merits being one.