I wrote a simple huffman coding algorithm for learning and practice. I just used the technique given on the wikipedia page.
Could you tell me my missing points and mistakes?
Node.java
package ged.gont.bst.huffmancode; public class Node implements Comparable<Node> { private char letter; private int freq; private Node leftChild; private Node rightChild; public Node(char letter, int freq, Node leftChild, Node rightChild) { this.letter = letter; this.freq = freq; this.leftChild = leftChild; this.rightChild = rightChild; } public char getLetter() { return letter; } public int getFreq() { return freq; } public Node getLeftChild() { return leftChild; } public Node getRightChild() { return rightChild; } public boolean isLeaf() { return this.leftChild == null && this.rightChild == null; } @Override public int compareTo(Node arg0) { return this.freq - arg0.freq; } }
HuffmanCode.java
package ged.gont.bst.huffmancode; import java.util.LinkedHashMap; import java.util.Map; import java.util.PriorityQueue; public class HuffmanCode { private Node root; Map<Character, String> charMap = new LinkedHashMap<>(); /** * Encodes string in which most used characters have min codeword length * * @param inputString compressed string * @return encoded string * @throws IllegelArgumentException if inputString contains invalid ASCII character */ public String encode(String inputString) { char[] letters = inputString.toCharArray(); Map<Character, Integer> charFreq = new LinkedHashMap<>(); PriorityQueue<Node> priorityQueue = new PriorityQueue<>(); String encodedString = ""; for (char c : letters) { if ((int) c > 255) { throw new IllegalArgumentException("Input contains invalid ASCII character"); } if (charFreq.containsKey(c)) { charFreq.put(c, charFreq.get(c) + 1); } else { charFreq.put(c, 1); } } for (Character c : charFreq.keySet()) { priorityQueue.offer(new Node(c, charFreq.get(c), null, null)); } while (priorityQueue.size() > 1) { Node leftChild = priorityQueue.remove(); Node rightChild = priorityQueue.remove(); priorityQueue.offer(new Node(Character.MIN_VALUE, leftChild.getFreq() + rightChild.getFreq(), leftChild, rightChild)); } root = priorityQueue.remove(); generatePrefix(root, ""); for (int i = 0; i < inputString.length(); i++) { encodedString += (charMap.get(inputString.charAt(i))); } return encodedString; } /** * Generates prefix code in bit string format * * @param root * @param code */ private void generatePrefix(Node root, String prefix) { if (!root.isLeaf()) { generatePrefix(root.getLeftChild(), prefix.concat("0")); generatePrefix(root.getRightChild(), prefix.concat("1")); } else { charMap.put(root.getLetter(), prefix); } } /** * Decodes the given encoded string * * @param encodedString * @return decoded string */ public String decode(String encodedString) { String decodedString = ""; Node currentNode = root; for (int i = 0; i < encodedString.length(); i++) { if (encodedString.charAt(i) == '0') { currentNode = currentNode.getLeftChild(); } else if (encodedString.charAt(i) == '1') { currentNode = currentNode.getRightChild(); } if (currentNode.isLeaf()) { decodedString += currentNode.getLetter(); currentNode = root; } } return decodedString; } }
TestHuffmanCode.java
package ged.gont.testbst.testhuffmancode; import static org.junit.jupiter.api.Assertions.assertEquals; import org.junit.jupiter.api.*; import ged.gont.bst.huffmancode.*; public class TestHuffmanCode { static HuffmanCode huffmanCode; static String inputString; static String expectedEncodedString; @BeforeAll public static void init() { huffmanCode = new HuffmanCode(); inputString = "A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED"; expectedEncodedString = "1000011101001000110010011101100111001001000111110010011111011111100010001111110100111001001011111011101000111111001"; } @Test public void testEncode() { assertEquals(expectedEncodedString, huffmanCode.encode(inputString)); } @Test public void testDecode() { assertEquals(inputString, huffmanCode.decode(huffmanCode.encode(inputString))); } }
code.prefix("0")
? It doesn't look likecode
is defined (but it looks like it used to be a parameter and isn't anymore)\$\endgroup\$for learning and practice
(tagging reinventing-the-wheel is definite where a wheel is well-known).\$\endgroup\$