I've written a huffman compression program that works as intended, however in order to intuitively lay out the processes required to write the compressed file, I used strings in critical functions that I suspect are causing a performance hit. I came here to get ideas from the community on ways that I can go about this with a more efficient approach.
My program reads the input, creates the huffman tree and creates a corresponding dictionary holding the codes for each byte which has the following struct:
struct dictentry { int byte; string enc; size_t depth; dictentry(int byte, string enc) { this->byte = byte; this->enc = enc; this->depth = enc.length(); } }; class HuffDict { vector<dictentry> dict; string b2enc(unsigned char byte); ...... };
So in the end each byte occuring in the input file is assigned a string with it's code for example: 54 - '001', 55 - '010' and so on.
Now in order to create the actual bytes that will be written in the output file, I have created a function that looks in the dictionary and returns the code for a byte argument. VS profiler listed this function highest in execution time:
string HuffDict::b2enc(unsigned char byte) { for (vector<dictentry>::iterator it = dict.begin(); it != dict.end(); ++it) { if ((*it).byte == byte) { return (*it).enc; } } return ""; }
In order to put together the actual bytes to write I am first creating a string of 0-1s by concatenating string codes and then read them in chunks of 8, where I use bitwise operations to eventually convert the string representation of a byte to a number. The following is a class method which does this exact thing and ENC_BUFFER is a constant number which I 've set to 16kB.
int HuffEncoder::encode_chunk() { int bitstreampos = 0; //position in the bitstream unsigned char byte = 0; //the byte to be added to the bitstream string str2enc = ""; //string of at least 8 bits to encode while ((bitstreampos <= ENC_BUFFER) && (!this->done)) { byte = 0; str2enc = ""; str2enc += this->strleftover; while (str2enc.length() < 8) { str2enc += (*dict).b2enc(buffer[this->pos]); this->pos++; if (this->pos >= this->len) { this->done = true; break; } } this->leftover = str2enc.length() - 8; if (this->leftover > 0) { this->strleftover = str2enc.substr(8, 8 + this->leftover); str2enc = str2enc.substr(0, 8); } else if (this->leftover < 0) { this->lastbits = str2enc.length(); } else { this->strleftover = ""; } //bitwise operations to construct byte from str2enc for (string::size_type i = 0; i < str2enc.size(); ++i) { if (str2enc[i] == '1') { byte += 1 << (7 - i); } } this->bitstream[bitstreampos] = byte; bitstreampos++; } return bitstreampos; //bitstreampos actually holds the byte size of the chunk }
It's declaration is a bit hectic but I'll include it in case it helps:
class HuffEncoder { HuffDict *dict; char *buffer; int len; char bitstream[ENC_BUFFER]; //the resulting bitstream after encoding int pos = 0; //current position in buffer int leftover = 0; //how many bits were left over from previous bitstream string strleftover = ""; bool done = false; public: unsigned char lastbits = 0; HuffEncoder(char* buffer, int len, HuffDict* dict) { this->buffer = buffer; this->len = len; this->dict = dict; } int encode_chunk(); //encodes a chunk of the buffer into the bitstream and returns the size of bitstream bool isdone() { return this->done; } char* get_bitstream_ptr() { return bitstream; }; };
So to wrap it all, my question is: is there a different approach I can use to create the resulting encoded bytes (probably with no string involvement)? Or any other kind of improvement that you can think of?