Huffman coding

You are encouraged to solve this task according to the task description, using any language you may know.
Huffman encoding is a way to assign binary codes to symbols that reduces the overall number of bits used to encode a typical string of those symbols.
For example, if you use letters as symbols and have details of the frequency of occurrence of those letters in typical strings, then you could just encode each letter with a fixed number of bits, such as in ASCII codes. You can do better than this by encoding more frequently occurring letters such as e and a, with smaller bit strings; and less frequently occurring letters such as q and x with longer bit strings.
Any string of letters will be encoded as a string of bits that are no-longer of the same length per letter. To successfully decode such as string, the smaller codes assigned to letters such as 'e' cannot occur as a prefix in the larger codes such as that for 'x'.
- If you were to assign a code 01 for 'e' and code 011 for 'x', then if the bits to decode started as 011... then you would not know if you should decode an 'e' or an 'x'.
The Huffman coding scheme takes each symbol and its weight (or frequency of occurrence), and generates proper encodings for each symbol taking account of the weights of each symbol, so that higher weighted symbols have fewer bits in their encoding. (See the WP article for more information).
A Huffman encoding can be computed by first creating a tree of nodes:

- Create a leaf node for each symbol and add it to the priority queue.
- While there is more than one node in the queue:
- Remove the node of highest priority (lowest probability) twice to get two nodes.
- Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
- Add the new node to the queue.
- The remaining node is the root node and the tree is complete.
Traverse the constructed binary tree from root to leaves assigning and accumulating a '0' for one branch and a '1' for the other at each node. The accumulated zeros and ones at each leaf constitute a Huffman encoding for those symbols and weights:
- Task
Using the characters and their frequency from the string:
- this is an example for huffman encoding
create a program to generate a Huffman encoding for each character as a table.
11l
T Element((Int weight, [(Char, String)] symbols)) F <(other) R (.weight, .symbols) < (other.weight, other.symbols) F encode(symb2freq) V heap = symb2freq.map((sym, wt) -> Element(wt, [(sym, ‘’)])) minheap:heapify(&heap) L heap.len > 1 V lo = minheap:pop(&heap) V hi = minheap:pop(&heap) L(&sym) lo.symbols sym[1] = ‘0’sym[1] L(&sym) hi.symbols sym[1] = ‘1’sym[1] minheap:push(&heap, Element(lo.weight + hi.weight, lo.symbols [+] hi.symbols)) R sorted(minheap:pop(&heap).symbols, key' p -> (p[1].len, p)) V txt = ‘this is an example for huffman encoding’ V symb2freq = DefaultDict[Char, Int]() L(ch) txt symb2freq[ch]++ V huff = encode(symb2freq) print("Symbol\tWeight\tHuffman Code") L(p) huff print("#.\t#.\t#.".format(p[0], symb2freq[p[0]], p[1]))
- Output:
Symbol Weight Huffman Code 6 101 n 4 010 a 3 1001 e 3 1100 f 3 1101 h 2 0001 i 3 1110 m 2 0010 o 2 0011 s 2 0111 g 1 00000 l 1 00001 p 1 01100 r 1 01101 t 1 10000 u 1 10001 x 1 11110 c 1 111110 d 1 111111
Ada
huffman.ads:
withAda.Containers.Indefinite_Ordered_Maps;withAda.Containers.Ordered_Maps;withAda.Finalization;generictypeSymbol_Typeisprivate;withfunction"<"(Left,Right:Symbol_Type)returnBooleanis<>;withprocedurePut(Item:Symbol_Type);typeSymbol_Sequenceisarray(Positiverange<>)ofSymbol_Type;typeFrequency_Typeisprivate;withfunction"+"(Left,Right:Frequency_Type)returnFrequency_Typeis<>;withfunction"<"(Left,Right:Frequency_Type)returnBooleanis<>;packageHuffmanis-- bits = booleans (true/false = 1/0)typeBit_Sequenceisarray(Positiverange<>)ofBoolean;Zero_Sequence:constantBit_Sequence(1..0):=(others=>False);-- output the sequenceprocedurePut(Code: Bit_Sequence);-- type for freqency mappackageFrequency_Mapsis newAda.Containers.Ordered_Maps(Element_Type=> Frequency_Type,Key_Type=> Symbol_Type);typeHuffman_Treeisprivate;-- create a huffman tree from frequency mapprocedureCreate_Tree(Tree: outHuffman_Tree;Frequencies: Frequency_Maps.Map);-- encode a single symbolfunctionEncode(Tree: Huffman_Tree;Symbol: Symbol_Type)returnBit_Sequence;-- encode a symbol sequencefunctionEncode(Tree: Huffman_Tree;Symbols: Symbol_Sequence)returnBit_Sequence;-- decode a bit sequencefunctionDecode(Tree: Huffman_Tree;Code: Bit_Sequence)returnSymbol_Sequence;-- dump the encoding tableprocedureDump_Encoding(Tree: Huffman_Tree);private-- type for encoding mappackageEncoding_Mapsis newAda.Containers.Indefinite_Ordered_Maps(Element_Type=> Bit_Sequence,Key_Type=> Symbol_Type);typeHuffman_Node;typeNode_AccessisaccessHuffman_Node;-- a node is either internal (left_child/right_child used)-- or a leaf (left_child/right_child are null)typeHuffman_NodeisrecordFrequency:Frequency_Type;Left_Child:Node_Access:=null;Right_Child:Node_Access:=null;Symbol:Symbol_Type;end record;-- create a leaf nodefunctionCreate_Node(Symbol: Symbol_Type;Frequency: Frequency_Type)returnNode_Access;-- create an internal nodefunctionCreate_Node(Left,Right: Node_Access)returnNode_Access;-- fill the encoding mapprocedureFill(The_Node: Node_Access;Map: inoutEncoding_Maps.Map;Prefix: Bit_Sequence);-- huffman tree has a tree and an encoding maptypeHuffman_TreeisnewAda.Finalization.ControlledwithrecordTree:Node_Access:=null;Map:Encoding_Maps.Map:=Encoding_Maps.Empty_Map;end record;-- free memory after finalizationoverridingprocedureFinalize(Object: inoutHuffman_Tree);endHuffman;
huffman.adb:
withAda.Text_IO;withAda.Unchecked_Deallocation;withAda.Containers.Vectors;packagebodyHuffmanispackageNode_Vectorsis newAda.Containers.Vectors(Element_Type=> Node_Access,Index_Type=> Positive);function"<"(Left,Right: Node_Access)returnBooleanisbegin-- compare frequencyifLeft.Frequency<Right.FrequencythenreturnTrue;elsifRight.Frequency<Left.FrequencythenreturnFalse;endif;-- same frequency, choose leaf nodeifLeft.Left_Child=nulland thenRight.Left_Child/=nullthenreturnTrue;elsifLeft.Left_Child/=nulland thenRight.Left_Child=nullthenreturnFalse;endif;-- same frequency, same node type (internal/leaf)ifLeft.Left_Child/=nullthen-- for internal nodes, compare left children, then right childrenifLeft.Left_Child<Right.Left_ChildthenreturnTrue;elsifRight.Left_Child<Left.Left_ChildthenreturnFalse;elsereturnLeft.Right_Child<Right.Right_Child;endif;else-- for leaf nodes, compare symbolreturnLeft.Symbol<Right.Symbol;endif;end"<";packageNode_Vector_Sortis newNode_Vectors.Generic_Sorting;procedureCreate_Tree(Tree: outHuffman_Tree;Frequencies: Frequency_Maps.Map)isNode_Queue:Node_Vectors.Vector:=Node_Vectors.Empty_Vector;begin-- insert all leafs into the queuedeclareuseFrequency_Maps;Position:Cursor:=Frequencies.First;The_Node:Node_Access:=null;beginwhilePosition/=No_ElementloopThe_Node:=Create_Node(Symbol=>Key(Position),Frequency=>Element(Position));Node_Queue.Append(The_Node);Next(Position);endloop;end;-- sort by frequency (see "<")Node_Vector_Sort.Sort(Node_Queue);-- iterate over all elementswhilenotNode_Queue.Is_EmptyloopdeclareFirst:constantNode_Access:=Node_Queue.First_Element;beginNode_Queue.Delete_First;-- if we only have one node left, it is the root node of the treeifNode_Queue.Is_EmptythenTree.Tree:=First;else-- create new internal node with two smallest frequenciesdeclareSecond:constantNode_Access:=Node_Queue.First_Element;beginNode_Queue.Delete_First;Node_Queue.Append(Create_Node(First,Second));end;Node_Vector_Sort.Sort(Node_Queue);endif;end;endloop;-- fill encoding mapFill(The_Node=>Tree.Tree,Map=>Tree.Map,Prefix=>Zero_Sequence);endCreate_Tree;-- create leaf nodefunctionCreate_Node(Symbol: Symbol_Type;Frequency: Frequency_Type)returnNode_AccessisResult:Node_Access:=newHuffman_Node;beginResult.Frequency:=Frequency;Result.Symbol:=Symbol;returnResult;endCreate_Node;-- create internal nodefunctionCreate_Node(Left,Right: Node_Access)returnNode_AccessisResult:Node_Access:=newHuffman_Node;beginResult.Frequency:=Left.Frequency+Right.Frequency;Result.Left_Child:=Left;Result.Right_Child:=Right;returnResult;endCreate_Node;-- fill encoding mapprocedureFill(The_Node: Node_Access;Map: inoutEncoding_Maps.Map;Prefix: Bit_Sequence)isbeginifThe_Node.Left_Child/=nullthen-- append false (0) for left childFill(The_Node.Left_Child,Map,Prefix&False);-- append true (1) for right childFill(The_Node.Right_Child,Map,Prefix&True);else-- leaf node reached, prefix = code for symbolMap.Insert(The_Node.Symbol,Prefix);endif;endFill;-- free memory after finalizationoverridingprocedureFinalize(Object: inoutHuffman_Tree)isprocedureFreeisnewAda.Unchecked_Deallocation(Name=>Node_Access,Object=>Huffman_Node);-- recursively free all nodesprocedureRecursive_Free(The_Node: inoutNode_Access)isbegin-- free node if it is a leafifThe_Node.Left_Child=nullthenFree(The_Node);else-- free left and right child if node is internalRecursive_Free(The_Node.Left_Child);Recursive_Free(The_Node.Right_Child);-- free node afterwardsFree(The_Node);endif;endRecursive_Free;begin-- recursively free root nodeRecursive_Free(Object.Tree);endFinalize;-- encode single symbolfunctionEncode(Tree: Huffman_Tree;Symbol: Symbol_Type)returnBit_Sequenceisbegin-- simply lookup in mapreturnTree.Map.Element(Symbol);endEncode;-- encode symbol sequencefunctionEncode(Tree: Huffman_Tree;Symbols: Symbol_Sequence)returnBit_Sequenceisbegin-- only one elementifSymbols'Length=1then-- see abovereturnEncode(Tree,Symbols(Symbols'First));else-- encode first element, append result of recursive callreturnEncode(Tree,Symbols(Symbols'First))&Encode(Tree,Symbols(Symbols'First+1..Symbols'Last));endif;endEncode;-- decode a bit sequencefunctionDecode(Tree: Huffman_Tree;Code: Bit_Sequence)returnSymbol_Sequenceis-- maximum length = code lengthResult:Symbol_Sequence(1..Code'Length);-- last used index of resultLast:Natural:=0;The_Node:Node_Access:=Tree.Tree;begin-- iterate over the codeforIinCode'Rangeloop-- if current element is true, descent the right branchifCode(I)thenThe_Node:=The_Node.Right_Child;else-- false: descend left branchThe_Node:=The_Node.Left_Child;endif;ifThe_Node.Left_Child=nullthen-- reached leaf node: append symbol to resultLast:=Last+1;Result(Last):=The_Node.Symbol;-- reset current node to rootThe_Node:=Tree.Tree;endif;endloop;-- return subset of result arrayreturnResult(1..Last);endDecode;-- output a bit sequenceprocedurePut(Code: Bit_Sequence)ispackageInt_IOis newAda.Text_IO.Integer_IO(Integer);beginforIinCode'RangeloopifCode(I)then-- true = 1Int_IO.Put(1,0);else-- false = 0Int_IO.Put(0,0);endif;endloop;Ada.Text_IO.New_Line;endPut;-- dump encoding mapprocedureDump_Encoding(Tree: Huffman_Tree)isusetypeEncoding_Maps.Cursor;Position:Encoding_Maps.Cursor:=Tree.Map.First;begin-- iterate mapwhilePosition/=Encoding_Maps.No_Elementloop-- keyPut(Encoding_Maps.Key(Position));Ada.Text_IO.Put(" = ");-- codePut(Encoding_Maps.Element(Position));Encoding_Maps.Next(Position);endloop;endDump_Encoding;endHuffman;
example main.adb:
withAda.Text_IO;withHuffman;procedureMainispackageChar_Natural_Huffman_Treeis newHuffman(Symbol_Type=> Character,Put=> Ada.Text_IO.Put,Symbol_Sequence=> String,Frequency_Type=> Natural);Tree:Char_Natural_Huffman_Tree.Huffman_Tree;Frequencies:Char_Natural_Huffman_Tree.Frequency_Maps.Map;Input_String:constantString:="this is an example for huffman encoding";begin-- build frequency mapforIinInput_String'RangeloopdeclareuseChar_Natural_Huffman_Tree.Frequency_Maps;Position:constantCursor:=Frequencies.Find(Input_String(I));beginifPosition=No_ElementthenFrequencies.Insert(Key=>Input_String(I),New_Item=>1);elseFrequencies.Replace_Element(Position=>Position,New_Item=>Element(Position)+1);endif;end;endloop;-- create huffman treeChar_Natural_Huffman_Tree.Create_Tree(Tree=>Tree,Frequencies=>Frequencies);-- dump encodingsChar_Natural_Huffman_Tree.Dump_Encoding(Tree=>Tree);-- encode example stringdeclareCode:constantChar_Natural_Huffman_Tree.Bit_Sequence:=Char_Natural_Huffman_Tree.Encode(Tree=>Tree,Symbols=>Input_String);beginChar_Natural_Huffman_Tree.Put(Code);Ada.Text_IO.Put_Line(Char_Natural_Huffman_Tree.Decode(Tree=>Tree,Code=>Code));end;endMain;
- Output:
= 101 a = 1001 c = 01010 d = 01011 e = 1100 f = 1101 g = 01100 h = 11111 i = 1110 l = 01101 m = 0010 n = 000 o = 0011 p = 01110 r = 01111 s = 0100 t = 10000 u = 10001 x = 11110 1000011111111001001011110010010110010001011100111101001001001110011011100101110100110111110111111100011101110100101001000101110000001010001101011111000001100 this is an example for huffman encoding
BBC BASIC
INSTALL@lib$+"SORTSALIB"SortUp%=FN_sortSAinit(0,0):REM AscendingSortDn%=FN_sortSAinit(1,0):REM DescendingText$="this is an example for huffman encoding"DIMtree{(127)ch&,num%,lkl%,lkr%}FORi%=1TOLEN(Text$)c%=ASCMID$(Text$,i%)tree{(c%)}.ch&=c%tree{(c%)}.num%+=1NEXTC%=DIM(tree{()},1)+1CALLSortDn%,tree{()},tree{(0)}.num%FORi%=0TODIM(tree{()},1)IFtree{(i%)}.num%=0EXITFORNEXTsize%=i%linked%=0REPEATC%=size%CALLSortUp%,tree{()},tree{(0)}.num%i%=0:WHILEtree{(i%)}.lkl%ORtree{(i%)}.lkr%i%+=1:ENDWHILEtree{(i%)}.lkl%=size%j%=0:WHILEtree{(j%)}.lkl%ORtree{(j%)}.lkr%j%+=1:ENDWHILEtree{(j%)}.lkr%=size%linked%+=2tree{(size%)}.num%=tree{(i%)}.num%+tree{(j%)}.num%size%+=1UNTILlinked%=(size%-1)FORi%=size%-1TO0STEP-1IFtree{(i%)}.ch&THENh$=""j%=i%REPEATCASETRUEOFWHENtree{(j%)}.lkl%<>0:h$="0"+h$j%=tree{(j%)}.lkl%WHENtree{(j%)}.lkr%<>0:h$="1"+h$j%=tree{(j%)}.lkr%OTHERWISE:EXITREPEATENDCASEUNTILFALSEVDUtree{(i%)}.ch&:PRINT" "h$ENDIFNEXTEND
- Output:
101 n 000 e 1110 f 1101 a 1100 i 1011 s 0110 m 0101 h 0100 o 0011 c 0010 l 0001 r 0000 x 11111 p 11110 d 11101 u 11100 g 11011 t 11010
Bracmat
( "this is an example for huffman encoding":?S & 0:?chars & 0:?p & ( @( !S : ? ( [!p %?char [?p ? & !char+!chars:?chars & ~ ) ) | ) & 0:?prioritized & whl ' ( !chars:?n*%@?w+?chars & (!n.!w)+!prioritized:?prioritized ) & whl ' ( !prioritized:(?p.?x)+(?q.?y)+?nprioritized & (!p+!q.(!p.0,!x)+(!q.1,!y))+!nprioritized:?prioritized ) & 0:?L & ( walk = bits tree bit subtree . !arg:(?bits.?tree) & whl ' ( !tree:(?p.?bit,?subtree)+?tree & ( !subtree:@ & (!subtree.str$(!bits !bit))+!L:?L | walk$(!bits !bit.!subtree) ) ) ) & !prioritized:(?.?prioritized) & walk$(.!prioritized) & lst$L & :?encoded & 0:?p & ( @( !S : ? ( [!p %?char [?p ? & !L:?+(!char.?code)+? & !encoded !code:?encoded & ~ ) ) | out$(str$!encoded) ) & ( decode = char bits . !L : ?+(?char.?bits&@(!arg:!bits ?arg))+? & !char decode$!arg | !arg ) & out$("decoded:" str$(decode$(str$!encoded)));
- Output:
(L= (" ".101) + (a.1001) + (c.01010) + (d.01011) + (e.1100) + (f.1101) + (g.01100) + (h.11111) + (i.1110) + (l.01101) + (m.0010) + (n.000) + (o.0011) + (p.01110) + (r.01111) + (s.0100) + (t.10000) + (u.10001) + (x.11110)); 1000011111111001001011110010010110010001011100111101001001001110011011100101110100110111110111111100011101110100101001000101110000001010001101011111000001100 decoded: this is an example for huffman encoding
C
This code lacks a lot of needed checkings, especially for memory allocation.
#include<stdio.h>#include<stdlib.h>#include<string.h>#define BYTES 256structhuffcode{intnbits;intcode;};typedefstructhuffcodehuffcode_t;structhuffheap{int*h;intn,s,cs;long*f;};typedefstructhuffheapheap_t;/* heap handling funcs */staticheap_t*_heap_create(ints,long*f){heap_t*h;h=malloc(sizeof(heap_t));h->h=malloc(sizeof(int)*s);h->s=h->cs=s;h->n=0;h->f=f;returnh;}staticvoid_heap_destroy(heap_t*heap){free(heap->h);free(heap);}#define swap_(I,J) do { int t_; t_ = a[(I)]; \ a[(I)] = a[(J)]; a[(J)] = t_; } while(0)staticvoid_heap_sort(heap_t*heap){inti=1,j=2;/* gnome sort */int*a=heap->h;while(i<heap->n){/* smaller values are kept at the end */if(heap->f[a[i-1]]>=heap->f[a[i]]){i=j;j++;}else{swap_(i-1,i);i--;i=(i==0)?j++:i;}}}#undef swap_staticvoid_heap_add(heap_t*heap,intc){if((heap->n+1)>heap->s){heap->h=realloc(heap->h,heap->s+heap->cs);heap->s+=heap->cs;}heap->h[heap->n]=c;heap->n++;_heap_sort(heap);}staticint_heap_remove(heap_t*heap){if(heap->n>0){heap->n--;returnheap->h[heap->n];}return-1;}/* huffmann code generator */huffcode_t**create_huffman_codes(long*freqs){huffcode_t**codes;heap_t*heap;longefreqs[BYTES*2];intpreds[BYTES*2];inti,extf=BYTES;intr1,r2;memcpy(efreqs,freqs,sizeof(long)*BYTES);memset(&efreqs[BYTES],0,sizeof(long)*BYTES);heap=_heap_create(BYTES*2,efreqs);if(heap==NULL)returnNULL;for(i=0;i<BYTES;i++)if(efreqs[i]>0)_heap_add(heap,i);while(heap->n>1){r1=_heap_remove(heap);r2=_heap_remove(heap);efreqs[extf]=efreqs[r1]+efreqs[r2];_heap_add(heap,extf);preds[r1]=extf;preds[r2]=-extf;extf++;}r1=_heap_remove(heap);preds[r1]=r1;_heap_destroy(heap);codes=malloc(sizeof(huffcode_t*)*BYTES);intbc,bn,ix;for(i=0;i<BYTES;i++){bc=0;bn=0;if(efreqs[i]==0){codes[i]=NULL;continue;}ix=i;while(abs(preds[ix])!=ix){bc|=((preds[ix]>=0)?1:0)<<bn;ix=abs(preds[ix]);bn++;}codes[i]=malloc(sizeof(huffcode_t));codes[i]->nbits=bn;codes[i]->code=bc;}returncodes;}voidfree_huffman_codes(huffcode_t**c){inti;for(i=0;i<BYTES;i++)free(c[i]);free(c);}#define MAXBITSPERCODE 100voidinttobits(intc,intn,char*s){s[n]=0;while(n>0){s[n-1]=(c%2)+'0';c>>=1;n--;}}constchar*test="this is an example for huffman encoding";intmain(){huffcode_t**r;inti;charstrbit[MAXBITSPERCODE];constchar*p;longfreqs[BYTES];memset(freqs,0,sizeoffreqs);p=test;while(*p!='\0')freqs[*p++]++;r=create_huffman_codes(freqs);for(i=0;i<BYTES;i++){if(r[i]!=NULL){inttobits(r[i]->code,r[i]->nbits,strbit);printf("%c (%d) %s\n",i,r[i]->code,strbit);}}free_huffman_codes(r);return0;}
Alternative
Using a simple heap-based priority queue. Heap is an array, while ndoe tree is done by binary links.
#include<stdio.h>#include<string.h>typedefstructnode_t{structnode_t*left,*right;intfreq;charc;}*node;structnode_tpool[256]={{0}};nodeqqq[255],*q=qqq-1;intn_nodes=0,qend=1;char*code[128]={0},buf[1024];nodenew_node(intfreq,charc,nodea,nodeb){noden=pool+n_nodes++;if(freq)n->c=c,n->freq=freq;else{n->left=a,n->right=b;n->freq=a->freq+b->freq;}returnn;}/* priority queue */voidqinsert(noden){intj,i=qend++;while((j=i/2)){if(q[j]->freq<=n->freq)break;q[i]=q[j],i=j;}q[i]=n;}nodeqremove(){inti,l;noden=q[i=1];if(qend<2)return0;qend--;while((l=i*2)<qend){if(l+1<qend&&q[l+1]->freq<q[l]->freq)l++;q[i]=q[l],i=l;}q[i]=q[qend];returnn;}/* walk the tree and put 0s and 1s */voidbuild_code(noden,char*s,intlen){staticchar*out=buf;if(n->c){s[len]=0;strcpy(out,s);code[n->c]=out;out+=len+1;return;}s[len]='0';build_code(n->left,s,len+1);s[len]='1';build_code(n->right,s,len+1);}voidinit(constchar*s){inti,freq[128]={0};charc[16];while(*s)freq[(int)*s++]++;for(i=0;i<128;i++)if(freq[i])qinsert(new_node(freq[i],i,0,0));while(qend>2)qinsert(new_node(0,0,qremove(),qremove()));build_code(q[1],c,0);}voidencode(constchar*s,char*out){while(*s){strcpy(out,code[*s]);out+=strlen(code[*s++]);}}voiddecode(constchar*s,nodet){noden=t;while(*s){if(*s++=='0')n=n->left;elsen=n->right;if(n->c)putchar(n->c),n=t;}putchar('\n');if(t!=n)printf("garbage input\n");}intmain(void){inti;constchar*str="this is an example for huffman encoding";charbuf[1024];init(str);for(i=0;i<128;i++)if(code[i])printf("'%c': %s\n",i,code[i]);encode(str,buf);printf("encoded: %s\n",buf);printf("decoded: ");decode(buf,q[1]);return0;}
- Output:
' ': 000 'a': 1000 'c': 01101 'd': 01100 'e': 0101 'f': 0010 'g': 010000 'h': 1101 'i': 0011 'l': 010001 'm': 1111 'n': 101 'o': 1110 'p': 10011 'r': 10010 's': 1100 't': 01111 'u': 01110 'x': 01001 encoded: 0111111010011110000000111100000100010100001010100110001111100110100010101000001011101001000011010111000100010111110001010000101101011011110011000011101010000 decoded: this is an example for huffman encoding
C#
usingSystem;usingSystem.Collections.Generic;namespaceHuffman_Encoding{publicclassPriorityQueue<T>whereT:IComparable{protectedList<T>LstHeap=newList<T>();publicvirtualintCount{get{returnLstHeap.Count;}}publicvirtualvoidAdd(Tval){LstHeap.Add(val);SetAt(LstHeap.Count-1,val);UpHeap(LstHeap.Count-1);}publicvirtualTPeek(){if(LstHeap.Count==0){thrownewIndexOutOfRangeException("Peeking at an empty priority queue");}returnLstHeap[0];}publicvirtualTPop(){if(LstHeap.Count==0){thrownewIndexOutOfRangeException("Popping an empty priority queue");}TvalRet=LstHeap[0];SetAt(0,LstHeap[LstHeap.Count-1]);LstHeap.RemoveAt(LstHeap.Count-1);DownHeap(0);returnvalRet;}protectedvirtualvoidSetAt(inti,Tval){LstHeap[i]=val;}protectedboolRightSonExists(inti){returnRightChildIndex(i)<LstHeap.Count;}protectedboolLeftSonExists(inti){returnLeftChildIndex(i)<LstHeap.Count;}protectedintParentIndex(inti){return(i-1)/2;}protectedintLeftChildIndex(inti){return2*i+1;}protectedintRightChildIndex(inti){return2*(i+1);}protectedTArrayVal(inti){returnLstHeap[i];}protectedTParent(inti){returnLstHeap[ParentIndex(i)];}protectedTLeft(inti){returnLstHeap[LeftChildIndex(i)];}protectedTRight(inti){returnLstHeap[RightChildIndex(i)];}protectedvoidSwap(inti,intj){TvalHold=ArrayVal(i);SetAt(i,LstHeap[j]);SetAt(j,valHold);}protectedvoidUpHeap(inti){while(i>0&&ArrayVal(i).CompareTo(Parent(i))>0){Swap(i,ParentIndex(i));i=ParentIndex(i);}}protectedvoidDownHeap(inti){while(i>=0){intiContinue=-1;if(RightSonExists(i)&&Right(i).CompareTo(ArrayVal(i))>0){iContinue=Left(i).CompareTo(Right(i))<0?RightChildIndex(i):LeftChildIndex(i);}elseif(LeftSonExists(i)&&Left(i).CompareTo(ArrayVal(i))>0){iContinue=LeftChildIndex(i);}if(iContinue>=0&&iContinue<LstHeap.Count){Swap(i,iContinue);}i=iContinue;}}}internalclassHuffmanNode<T>:IComparable{internalHuffmanNode(doubleprobability,Tvalue){Probability=probability;LeftSon=RightSon=Parent=null;Value=value;IsLeaf=true;}internalHuffmanNode(HuffmanNode<T>leftSon,HuffmanNode<T>rightSon){LeftSon=leftSon;RightSon=rightSon;Probability=leftSon.Probability+rightSon.Probability;leftSon.IsZero=true;rightSon.IsZero=false;leftSon.Parent=rightSon.Parent=this;IsLeaf=false;}internalHuffmanNode<T>LeftSon{get;set;}internalHuffmanNode<T>RightSon{get;set;}internalHuffmanNode<T>Parent{get;set;}internalTValue{get;set;}internalboolIsLeaf{get;set;}internalboolIsZero{get;set;}internalintBit{get{returnIsZero?0:1;}}internalboolIsRoot{get{returnParent==null;}}internaldoubleProbability{get;set;}publicintCompareTo(objectobj){return-Probability.CompareTo(((HuffmanNode<T>)obj).Probability);}}publicclassHuffman<T>whereT:IComparable{privatereadonlyDictionary<T,HuffmanNode<T>>_leafDictionary=newDictionary<T,HuffmanNode<T>>();privatereadonlyHuffmanNode<T>_root;publicHuffman(IEnumerable<T>values){varcounts=newDictionary<T,int>();varpriorityQueue=newPriorityQueue<HuffmanNode<T>>();intvalueCount=0;foreach(Tvalueinvalues){if(!counts.ContainsKey(value)){counts[value]=0;}counts[value]++;valueCount++;}foreach(Tvalueincounts.Keys){varnode=newHuffmanNode<T>((double)counts[value]/valueCount,value);priorityQueue.Add(node);_leafDictionary[value]=node;}while(priorityQueue.Count>1){HuffmanNode<T>leftSon=priorityQueue.Pop();HuffmanNode<T>rightSon=priorityQueue.Pop();varparent=newHuffmanNode<T>(leftSon,rightSon);priorityQueue.Add(parent);}_root=priorityQueue.Pop();_root.IsZero=false;}publicList<int>Encode(Tvalue){varreturnValue=newList<int>();Encode(value,returnValue);returnreturnValue;}publicvoidEncode(Tvalue,List<int>encoding){if(!_leafDictionary.ContainsKey(value)){thrownewArgumentException("Invalid value in Encode");}HuffmanNode<T>nodeCur=_leafDictionary[value];varreverseEncoding=newList<int>();while(!nodeCur.IsRoot){reverseEncoding.Add(nodeCur.Bit);nodeCur=nodeCur.Parent;}reverseEncoding.Reverse();encoding.AddRange(reverseEncoding);}publicList<int>Encode(IEnumerable<T>values){varreturnValue=newList<int>();foreach(Tvalueinvalues){Encode(value,returnValue);}returnreturnValue;}publicTDecode(List<int>bitString,refintposition){HuffmanNode<T>nodeCur=_root;while(!nodeCur.IsLeaf){if(position>bitString.Count){thrownewArgumentException("Invalid bitstring in Decode");}nodeCur=bitString[position++]==0?nodeCur.LeftSon:nodeCur.RightSon;}returnnodeCur.Value;}publicList<T>Decode(List<int>bitString){intposition=0;varreturnValue=newList<T>();while(position!=bitString.Count){returnValue.Add(Decode(bitString,refposition));}returnreturnValue;}}internalclassProgram{privateconststringExample="this is an example for huffman encoding";privatestaticvoidMain(){varhuffman=newHuffman<char>(Example);List<int>encoding=huffman.Encode(Example);List<char>decoding=huffman.Decode(encoding);varoutString=newstring(decoding.ToArray());Console.WriteLine(outString==Example?"Encoding/decoding worked":"Encoding/Decoding failed");varchars=newHashSet<char>(Example);foreach(charcinchars){encoding=huffman.Encode(c);Console.Write("{0}: ",c);foreach(intbitinencoding){Console.Write("{0}",bit);}Console.WriteLine();}Console.ReadKey();}}}
C++
This code builds a tree to generate huffman codes, then prints the codes.
#include<iostream>#include<queue>#include<map>#include<climits> // for CHAR_BIT#include<iterator>#include<algorithm>constintUniqueSymbols=1<<CHAR_BIT;constchar*SampleString="this is an example for huffman encoding";typedefstd::vector<bool>HuffCode;typedefstd::map<char,HuffCode>HuffCodeMap;classINode{public:constintf;virtual~INode(){}protected:INode(intf):f(f){}};classInternalNode:publicINode{public:INode*constleft;INode*constright;InternalNode(INode*c0,INode*c1):INode(c0->f+c1->f),left(c0),right(c1){}~InternalNode(){deleteleft;deleteright;}};classLeafNode:publicINode{public:constcharc;LeafNode(intf,charc):INode(f),c(c){}};structNodeCmp{booloperator()(constINode*lhs,constINode*rhs)const{returnlhs->f>rhs->f;}};INode*BuildTree(constint(&frequencies)[UniqueSymbols]){std::priority_queue<INode*,std::vector<INode*>,NodeCmp>trees;for(inti=0;i<UniqueSymbols;++i){if(frequencies[i]!=0)trees.push(newLeafNode(frequencies[i],(char)i));}while(trees.size()>1){INode*childR=trees.top();trees.pop();INode*childL=trees.top();trees.pop();INode*parent=newInternalNode(childR,childL);trees.push(parent);}returntrees.top();}voidGenerateCodes(constINode*node,constHuffCode&prefix,HuffCodeMap&outCodes){if(constLeafNode*lf=dynamic_cast<constLeafNode*>(node)){outCodes[lf->c]=prefix;}elseif(constInternalNode*in=dynamic_cast<constInternalNode*>(node)){HuffCodeleftPrefix=prefix;leftPrefix.push_back(false);GenerateCodes(in->left,leftPrefix,outCodes);HuffCoderightPrefix=prefix;rightPrefix.push_back(true);GenerateCodes(in->right,rightPrefix,outCodes);}}intmain(){// Build frequency tableintfrequencies[UniqueSymbols]={0};constchar*ptr=SampleString;while(*ptr!='\0')++frequencies[*ptr++];INode*root=BuildTree(frequencies);HuffCodeMapcodes;GenerateCodes(root,HuffCode(),codes);deleteroot;for(HuffCodeMap::const_iteratorit=codes.begin();it!=codes.end();++it){std::cout<<it->first<<" ";std::copy(it->second.begin(),it->second.end(),std::ostream_iterator<bool>(std::cout));std::cout<<std::endl;}return0;}
- Output:
110 a 1001 c 101010 d 10001 e 1111 f 1011 g 101011 h 0101 i 1110 l 01110 m 0011 n 000 o 0010 p 01000 r 01001 s 0110 t 01111 u 10100 x 10000
Clojure
(Updated to 1.6 & includes pretty-printing). Uses Java PriorityQueue
(require'[clojure.pprint:refer:all])(defn probs[s](let [freqs(frequenciess)sum(apply + (vals freqs))](into {}(map (fn [[kv]][k(/ vsum)])freqs))))(defn init-pq[weighted-items](let [comp (proxy [java.util.Comparator][](compare[ab](compare(:prioritya)(:priorityb))))pq(java.util.PriorityQueue.(count weighted-items)comp)](doseq [[itemprob]weighted-items](.addpq{:symbolitem,:priorityprob}))pq))(defn huffman-tree[pq](while(> (.sizepq)1)(let [a(.pollpq)b(.pollpq)new-node{:priority(+ (:prioritya)(:priorityb)):lefta:rightb}](.addpqnew-node)))(.pollpq))(defn symbol-map([t](symbol-mapt""))([{:keys[symbol priorityleft right]:ast}code](if symbol [{:symbolsymbol :weightpriority:codecode}](concat (symbol-mapleft (str code\0))(symbol-mapright (str code\1))))))(defn huffman-encode[items](-> itemsprobsinit-pqhuffman-treesymbol-map))(defn display-huffman-encode[s](->>shuffman-encode(sort-by :weight>)print-table))(display-huffman-encode"this is an example for huffman encoding")
- Output:
| :symbol | :weight | :code | |---------+---------+--------| | | 2/13 | 111 | | n | 4/39 | 011 | | a | 1/13 | 1001 | | e | 1/13 | 1011 | | i | 1/13 | 1100 | | f | 1/13 | 1101 | | h | 2/39 | 0001 | | s | 2/39 | 0010 | | m | 2/39 | 0100 | | o | 2/39 | 0101 | | d | 1/39 | 00000 | | t | 1/39 | 00001 | | c | 1/39 | 00110 | | x | 1/39 | 00111 | | u | 1/39 | 10000 | | l | 1/39 | 10001 | | r | 1/39 | 10100 | | g | 1/39 | 101010 | | p | 1/39 | 101011 |
Alternate Version
Uses c.d.priority-map. Creates a more shallow tree but appears to meet the requirements.
(require'[clojure.data.priority-map:refer[priority-map-keyfn-by]])(require'[clojure.pprint:refer[print-table]])(defn init-pq[s](let [c(count s)](->>sfrequencies(map (fn [[kv]][k{:symk:weight(/ vc)}]))(into (priority-map-keyfn-by:weight<)))))(defn huffman-tree[pq](letfn[(build-step[pq](let [a(second (peek pq))b(second (peek (pop pq)))nn{:sym(str (:syma)(:symb)):weight(+ (:weighta)(:weightb)):lefta:rightb}](assoc (pop (pop pq))(:symnn)nn)))](->>(iterate build-steppq)(drop-while #(> (count %)1))first vals first)))(defn symbol-map[m](letfn[(sym-step[{:keys[symweightleft right]:asm}code](cond (and left right)#(vector (trampolinesym-stepleft (str code\0))(trampolinesym-stepright (str code\1)))left #(sym-stepleft (str code\0))right #(sym-stepright (str code\1)):else{:symsym:weightweight:codecode}))](trampolinesym-stepm"")))(defn huffman-encode[s](->>sinit-pqhuffman-treesymbol-mapflatten))(defn display-huffman-encode[s](->>shuffman-encode(sort-by :weight>)print-table))(display-huffman-encode"this is an example for huffman encoding")
- Output:
| :sym | :weight | :code | |------+---------+-------| | | 2/13 | 101 | | n | 4/39 | 010 | | a | 1/13 | 1001 | | i | 1/13 | 1101 | | e | 1/13 | 1110 | | f | 1/13 | 1111 | | m | 2/39 | 0000 | | o | 2/39 | 0001 | | s | 2/39 | 0010 | | h | 2/39 | 11001 | | g | 1/39 | 00110 | | l | 1/39 | 00111 | | t | 1/39 | 01100 | | u | 1/39 | 01101 | | c | 1/39 | 01110 | | d | 1/39 | 01111 | | p | 1/39 | 10000 | | r | 1/39 | 10001 | | x | 1/39 | 11000 |
CoffeeScript
huffman_encoding_table=(counts) -># counts is a hash where keys are characters and# values are frequencies;# return a hash where keys are codes and values# are charactersbuild_huffman_tree=-># returns a Huffman tree. Each node has# cnt: total frequency of all chars in subtree# c: character to be encoded (leafs only)# children: children nodes (branches only)q=min_queue()forc,cntofcountsq.enqueuecnt,cnt:cntc:cwhileq.size()>=2a=q.dequeue()b=q.dequeue()cnt=a.cnt+b.cntnode=cnt:cntchildren:[a,b]q.enqueuecnt,noderoot=q.dequeue()root=build_huffman_tree()codes={}encode=(node, code) ->ifnode.c?codes[code]=node.celseencodenode.children[0],code+"0"encodenode.children[1],code+"1"encode(root,"")codesmin_queue=-># This is very non-optimized; you could use a binary heap for better# performance. Items with smaller priority get dequeued first.arr=[]enqueue:(priority, data) ->i=0whilei<arr.lengthifpriority<arr[i].prioritybreaki+=1arr.splicei,0,priority:prioritydata:datadequeue:->arr.shift().datasize:->arr.length_internal:->arrfreq_count=(s) ->cnts={}forcinscnts[c]?=0cnts[c]+=1cntsrpad=(s, n) ->whiles.length<ns+=' 'sexamples=["this is an example for huffman encoding""abcd""abbccccddddddddeeeeeeeee"]forsinexamplesconsole.log"---- #{s}"counts=freq_count(s)huffman_table=huffman_encoding_table(counts)codes=(codeforcodeofhuffman_table).sort()forcodeincodesc=huffman_table[code]console.log"#{rpad(code,5)}: #{c} (#{counts[c]})"console.log()
- Output:
> coffee huffman.coffee ---- this is an example for huffman encoding 000 : n (4) 0010 : s (2) 0011 : m (2) 0100 : o (2) 01010: t (1) 01011: x (1) 01100: p (1) 01101: l (1) 01110: r (1) 01111: u (1) 10000: c (1) 10001: d (1) 1001 : i (3) 101 : (6) 1100 : a (3) 1101 : e (3) 1110 : f (3) 11110: g (1) 11111: h (2) ---- abcd 00 : a (1) 01 : b (1) 10 : c (1) 11 : d (1) ---- abbccccddddddddeeeeeeeee 0 : e (9) 1000 : a (1) 1001 : b (2) 101 : c (4) 11 : d (8)
Common Lisp
This implementation uses a tree built of huffman-node
s, and a hash table mapping from elements of the input sequence to huffman-node
s. The priority queue is implemented as a sorted list. (For a more efficient implementation of a priority queue, see the Heapsort task.)
(defstructhuffman-node(weight0:typenumber)(elementnil:typet)(encodingnil:type(ornullbit-vector))(leftnil:type(ornullhuffman-node))(rightnil:type(ornullhuffman-node)))(defuninitial-huffman-nodes(sequence&key(test'eql))(let*((length(lengthsequence))(increment(/1length))(nodes(make-hash-table:sizelength:testtest))(queue'()))(mapnil#'(lambda(element)(multiple-value-bind(nodepresentp)(gethashelementnodes)(ifpresentp(incf(huffman-node-weightnode)increment)(let((node(make-huffman-node:weightincrement:elementelement)))(setf(gethashelementnodes)nodequeue(list*nodequeue))))))sequence)(valuesnodes(sortqueue'<:key'huffman-node-weight))))(defunhuffman-tree(sequence&key(test'eql))(multiple-value-bind(nodesqueue)(initial-huffman-nodessequence:testtest)(do()((endp(restqueue))(valuesnodes(firstqueue)))(destructuring-bind(n1n2&restqueue-rest)queue(let((n3(make-huffman-node:leftn1:rightn2:weight(+(huffman-node-weightn1)(huffman-node-weightn2)))))(setfqueue(merge'list(listn3)queue-rest'<:key'huffman-node-weight)))))))1(defunhuffman-codes(sequence&key(test'eql))(multiple-value-bind(nodestree)(huffman-treesequence:testtest)(labels((hc(nodelengthbits)(let((left(huffman-node-leftnode))(right(huffman-node-rightnode)))(cond((and(nullleft)(nullright))(setf(huffman-node-encodingnode)(make-arraylength:element-type'bit:initial-contents(reversebits))))(t(hcleft(1+length)(list*0bits))(hcright(1+length)(list*1bits)))))))(hctree0'())nodes)))(defunprint-huffman-code-table(nodes&optional(out*standard-output*))(formatout"~&Element~10tWeight~20tCode")(loopfornodebeingeachhash-valueofnodesdo(formatout"~&~s~10t~s~20t~s"(huffman-node-elementnode)(huffman-node-weightnode)(huffman-node-encodingnode))))
Example:
> (print-huffman-code-table (huffman-codes "this is an example for huffman encoding")) Element Weight Code #\t 1/39 #*10010 #\d 1/39 #*01101 #\m 2/39 #*0100 #\f 1/13 #*1100 #\o 2/39 #*0111 #\x 1/39 #*100111 #\h 2/39 #*1000 #\a 1/13 #*1010 #\s 2/39 #*0101 #\c 1/39 #*00010 #\l 1/39 #*00001 #\u 1/39 #*00011 #\e 1/13 #*1101 #\n 4/39 #*001 #\g 1/39 #*01100 #\p 1/39 #*100110 #\i 1/13 #*1011 #\r 1/39 #*00000 #\Space 2/13 #*111
D
importstd.stdio,std.algorithm,std.typecons,std.container,std.array;autoencode(aliaseq,R)(Group!(eq,R)sf)/*pure nothrow @safe*/{autoheap=sf.map!(s=>tuple(s[1],[tuple(s[0],"")])).array.heapify!q{b<a};while(heap.length>1){autolo=heap.front;heap.removeFront;autohi=heap.front;heap.removeFront;lo[1].each!((refpair)=>pair[1]='0'~pair[1]);hi[1].each!((refpair)=>pair[1]='1'~pair[1]);heap.insert(tuple(lo[0]+hi[0],lo[1]~hi[1]));}returnheap.front[1].schwartzSort!q{tuple(a[1].length,a[0])};}voidmain()/*@safe*/{immutables="this is an example for huffman encoding"d;foreach(constp;s.dup.sort().group.encode)writefln("'%s' %s",p[]);}
- Output:
' ' 101 'n' 010 'a' 1001 'e' 1100 'f' 1101 'h' 0001 'i' 1110 'm' 0010 'o' 0011 's' 0111 'g' 00000 'l' 00001 'p' 01100 'r' 01101 't' 10000 'u' 10001 'x' 11110 'c' 111110 'd' 111111
Eiffel
Adapted C# solution.
classHUFFMAN_NODE[T->COMPARABLE]inheritCOMPARABLEredefinethree_way_comparisonendcreateleaf_node,inner_nodefeature{NONE}leaf_node(a_probability:REAL_64;a_value:T)doprobability:=a_probabilityvalue:=a_valueis_leaf:=trueleft:=voidright:=voidparent:=voidendinner_node(a_left,a_right:HUFFMAN_NODE[T])doleft:=a_leftright:=a_righta_left.parent:=Currenta_right.parent:=Currenta_left.is_zero:=truea_right.is_zero:=falseprobability:=a_left.probability+a_right.probabilityis_leaf:=falseendfeatureprobability:REAL_64value:detachableTis_leaf:BOOLEANis_zero:BOOLEANassignset_is_zeroset_is_zero(a_value:BOOLEAN)dois_zero:=a_valueendleft:detachableHUFFMAN_NODE[T]right:detachableHUFFMAN_NODE[T]parent:detachableHUFFMAN_NODE[T]assignset_parentset_parent(a_parent:detachableHUFFMAN_NODE[T])doparent:=a_parentendis_root:BOOLEANdoResult:=parent=voidendbit_value:INTEGERdoifis_zerothenResult:=0elseResult:=1endendfeature-- comparable implementationis_lessalias"<"(other:likeCurrent):BOOLEANdoResult:=three_way_comparison(other)=-1endthree_way_comparison(other:likeCurrent):INTEGERdoResult:=-probability.three_way_comparison(other.probability)endendclassHUFFMANcreatemakefeature{NONE}make(a_string:STRING)requirenon_empty_string:a_string.count>0locall_queue:HEAP_PRIORITY_QUEUE[HUFFMAN_NODE[CHARACTER]]l_counts:HASH_TABLE[INTEGER,CHARACTER]l_node:HUFFMAN_NODE[CHARACTER]l_left,l_right:HUFFMAN_NODE[CHARACTER]docreatel_queue.make(a_string.count)createl_counts.make(10)acrossa_stringascharloopifnotl_counts.has(char.item)thenl_counts.put(0,char.item)endl_counts.replace(l_counts.at(char.item)+1,char.item)endcreateleaf_dictionary.make(l_counts.count)acrossl_countsaskvloopcreatel_node.leaf_node((kv.item*1.0)/a_string.count,kv.key)l_queue.put(l_node)leaf_dictionary.put(l_node,kv.key)endfromuntill_queue.count<=1loopl_left:=l_queue.iteml_queue.removel_right:=l_queue.iteml_queue.removecreatel_node.inner_node(l_left,l_right)l_queue.put(l_node)endroot:=l_queue.itemroot.is_zero:=falseendfeatureroot:HUFFMAN_NODE[CHARACTER]leaf_dictionary:HASH_TABLE[HUFFMAN_NODE[CHARACTER],CHARACTER]encode(a_value:CHARACTER):STRINGrequireencodable:leaf_dictionary.has(a_value)locall_node:HUFFMAN_NODE[CHARACTER]doResult:=""ifattachedleaf_dictionary.item(a_value)asattached_nodethenl_node:=attached_nodefromuntill_node.is_rootloopResult.append_integer(l_node.bit_value)ifattachedl_node.parentasparentthenl_node:=parentendendResult.mirrorendendendclassAPPLICATIONcreatemakefeature{NONE}make-- entry pointlocall_str:STRINGhuff:HUFFMANchars:BINARY_SEARCH_TREE_SET[CHARACTER]dol_str:="this is an example for huffman encoding"createhuff.make(l_str)createchars.makechars.fill(l_str)fromchars.startuntilchars.offloopprint(chars.item.out+": "+huff.encode(chars.item)+"%N")chars.forthendendend
- Output:
: 101 a: 1001 c: 01110 d: 01111 e: 1111 f: 1100 g: 01001 h: 11101 i: 1101 l: 10001 m: 0010 n: 000 o: 0011 p: 10000 r: 11100 s: 0110 t: 01000 u: 01011 x: 01010
Erlang
The main part of the code used here is extracted from Michel Rijnders' GitHubGist. See also his blog, for a complete description of the original module.
-module(huffman).-export([encode/1,decode/2,main/0]).encode(Text)->Tree=tree(freq_table(Text)),Dict=dict:from_list(codewords(Tree)),Code=<<<<(dict:fetch(Char,Dict))/bitstring>>||Char<-Text>>,{Code,Tree,Dict}.decode(Code,Tree)->decode(Code,Tree,Tree,[]).main()->{Code,Tree,Dict}=encode("this is an example for huffman encoding"),[beginio:format("~s: ",[[Key]]),print_bits(Value)end||{Key,Value}<-lists:sort(dict:to_list(Dict))],io:format("encoded: "),print_bits(Code),io:format("decoded: "),io:format("~s\n",[decode(Code,Tree)]).decode(<<>>,_,_,Result)->lists:reverse(Result);decode(<<0:1,Rest/bits>>,Tree,{L={_,_},_R},Result)->decode(<<Rest/bits>>,Tree,L,Result);decode(<<0:1,Rest/bits>>,Tree,{L,_R},Result)->decode(<<Rest/bits>>,Tree,Tree,[L|Result]);decode(<<1:1,Rest/bits>>,Tree,{_L,R={_,_}},Result)->decode(<<Rest/bits>>,Tree,R,Result);decode(<<1:1,Rest/bits>>,Tree,{_L,R},Result)->decode(<<Rest/bits>>,Tree,Tree,[R|Result]).codewords({L,R})->codewords(L,<<0:1>>)++codewords(R,<<1:1>>).codewords({L,R},<<Bits/bits>>)->codewords(L,<<Bits/bits,0:1>>)++codewords(R,<<Bits/bits,1:1>>);codewords(Symbol,<<Bits/bitstring>>)->[{Symbol,Bits}].tree([{N,_}|[]])->N;tree(Ns)->[{N1,C1},{N2,C2}|Rest]=lists:keysort(2,Ns),tree([{{N1,N2},C1+C2}|Rest]).freq_table(Text)->freq_table(lists:sort(Text),[]).freq_table([],Acc)->Acc;freq_table([S|Rest],Acc)->{Block,MoreBlocks}=lists:splitwith(fun(X)->X==Send,Rest),freq_table(MoreBlocks,[{S,1+length(Block)}|Acc]).print_bits(<<>>)->io:format("\n");print_bits(<<Bit:1,Rest/bitstring>>)->io:format("~w",[Bit]),print_bits(Rest).
- Output:
: 111 a: 1011 c: 10010 d: 100111 e: 1010 f: 1101 g: 100110 h: 1000 i: 1100 l: 00001 m: 0101 n: 001 o: 0100 p: 00000 r: 00011 s: 0111 t: 00010 u: 01101 x: 01100 encoded: 0001010001100011111111000111111101100111110100110010110101000000000110101111101010000011111100001101110111010101101100111110100011001001001001111100001100110 decoded: this is an example for huffman encoding
F#
type'aHuffmanTree=|Leafofint*'a|Nodeofint*'aHuffmanTree*'aHuffmanTreeletfreq=functionLeaf(f,_)|Node(f,_,_)->fletfreqCompareab=compare(freqa)(freqb)letbuildTreecharFreqs=letleaves=List.map(fun(c,f)->Leaf(f,c))charFreqsletfreqSort=List.sortWithfreqCompareletrecaux=function|[]->failwith"empty list"|[a]->a|a::b::tl->letnode=Node(freqa+freqb,a,b)aux(freqSort(node::tl))aux(freqSortleaves)letrecprintTree=function|code,Leaf(f,c)->printfn"%c\t%d\t%s"cf(String.concat""(List.revcode));|code,Node(_,l,r)->printTree("0"::code,l);printTree("1"::code,r)let()=letstr="this is an example for huffman encoding"letcharFreqs=str|>Seq.groupByid|>Seq.map(fun(c,vals)->(c,Seq.lengthvals))|>Map.ofSeqlettree=charFreqs|>Map.toList|>buildTreeprintfn"Symbol\tWeight\tHuffman code";printTree([],tree)
- Output:
Symbol Weight Huffman code p 1 00000 r 1 00001 g 1 00010 l 1 00011 n 4 001 m 2 0100 o 2 0101 c 1 01100 d 1 01101 h 2 0111 s 2 1000 x 1 10010 t 1 100110 u 1 100111 f 3 1010 i 3 1011 a 3 1100 e 3 1101 6 111
Factor
USING:kernelsequencescombinatorsaccessorsassocsmathhashtablesmath.ordersorting.slotsclassesformattingprettyprint;IN:huffman! -------------------------------------! CLASSES -----------------------------! -------------------------------------TUPLE:huffman-nodeweightelementencodingleftright;! For nodes:<huffman-tnode>(leftright--huffman)huffman-nodenew[left<<][swap>>right]bi;! For leafs:<huffman-node>(element--huffman)1 swapfffhuffman-nodeboa;! --------------------------------------! INITIAL HASHTABLE --------------------! --------------------------------------<PRIVATE! Increment node if it already exists! Else make it and add it to the hash-table:huffman-gen(elementnodes--)2dupat[[[1 +]change-weight]change-at][[dup<huffman-node>swap]dipset-at]if;! Curry node-hash. Then each over the seq! to get the weighted values:(huffman)(nodesseq--nodes)dup[[huffman-gen]curryeach]dip;! ---------------------------------------! TREE GENERATION -----------------------! ---------------------------------------:(huffman-weight)(node1node2--weight)[weight>>]dupbi*+;! Combine two nodes into the children of a parent! node which has a weight equal to their collective! weight:(huffman-combine)(node1node2--node3)[(huffman-weight)] [<huffman-tnode>]2biswap>>weight;! Generate a tree by combining nodes! in the priority queue until we're! left with the root node:(huffman-tree)(nodes--tree)duprestempty?[first][ {{weight>><=>}}sort-by [restrest][first] [second]tri(huffman-combine)prefix(huffman-tree) ]if;recursive! --------------------------------------! ENCODING -----------------------------! --------------------------------------:(huffman-leaf?)(node--bool)[left>>huffman-nodeinstance?] [right>>huffman-nodeinstance?]biandnot;:(huffman-leaf)(leafbit--)swapencoding<<;DEFER:(huffman-encoding)! Recursively walk the nodes left and right:(huffman-node)(bitnodes--)[0 suffix][1 suffix]bi[[left>>][right>>]bi]2dip[swap]dip[(huffman-encoding)]2bi@;:(huffman-encoding)(bitnodes--)over(huffman-leaf?) [(huffman-leaf)] [(huffman-node)]if;PRIVATE>! -------------------------------! USER WORDS --------------------! -------------------------------:huffman-print(nodes--)"Element""Weight""Code""\n%10s\t%10s\t%6s\n"printf {{weight>>>=<}}sort-by [[encoding>>][element>>][weight>>]tri"%8c\t%7d\t\t"printfpprint"\n"printf]each;:huffman(sequence--nodes)H{}clone(huffman)values[(huffman-tree){}(huffman-encoding)]keep;! ---------------------------------! USAGE ---------------------------! ---------------------------------! { 1 2 3 4 } huffman huffman-print! "this is an example of a huffman tree" huffman huffman-print! Element Weight Code! 7 { 0 0 0 }! a 4 { 1 1 1 }! e 4 { 1 1 0 }! f 3 { 0 0 1 0 }! h 2 { 1 0 1 0 }! i 2 { 0 1 0 1 }! m 2 { 0 1 0 0 }! n 2 { 0 1 1 1 }! s 2 { 0 1 1 0 }! t 2 { 0 0 1 1 }! l 1 { 1 0 1 1 1 }! o 1 { 1 0 1 1 0 }! p 1 { 1 0 0 0 1 }! r 1 { 1 0 0 0 0 }! u 1 { 1 0 0 1 1 }! x 1 { 1 0 0 1 0 }
Fantom
class Node { Float probability := 0.0f } class Leaf : Node { Int character new make (Int character, Float probability) { this.character = character this.probability = probability } } class Branch : Node { Node left Node right new make (Node left, Node right) { this.left = left this.right = right probability = this.left.probability + this.right.probability } } class Huffman { Node[] queue := [,] Str:Str table := [:] new make (Int[] items) { uniqueItems := items.dup.unique uniqueItems.each |Int item| { num := items.findAll { it == item }.size queue.add (Leaf(item, num.toFloat / items.size)) } createTree createTable } Void createTree () { while (queue.size > 1) { queue.sort |a,b| {a.probability <=> b.probability} node1 := queue.removeAt (0) node2 := queue.removeAt (0) queue.add (Branch (node1, node2)) } } Void traverse (Node node, Str encoding) { if (node is Leaf) { table[(node as Leaf).character.toChar] = encoding } else // (node is Branch) { traverse ((node as Branch).left, encoding + "0") traverse ((node as Branch).right, encoding + "1") } } Void createTable () { if (queue.size != 1) return // error! traverse (queue.first, "") } override Str toStr () { result := "Huffman Encoding Table:\n" table.keys.sort.each |Str key| { result += "$key -> ${table[key]}\n" } return result } } class Main { public static Void main () { example := "this is an example for huffman encoding" huffman := Huffman (example.chars) echo ("From \"$example\"") echo (huffman) } }
- Output:
From "this is an example for huffman encoding" Huffman Encoding Table: -> 101 a -> 1100 c -> 10000 d -> 10001 e -> 1101 f -> 1110 g -> 11110 h -> 11111 i -> 1001 l -> 01101 m -> 0011 n -> 000 o -> 0100 p -> 01100 r -> 01110 s -> 0010 t -> 01010 u -> 01111 x -> 01011
Fortran
! output:! d-> 00000, t-> 00001, h-> 0001, s-> 0010, ! c-> 00110, x-> 00111, m-> 0100, o-> 0101, ! n-> 011, u-> 10000, l-> 10001, a-> 1001, ! r-> 10100, g-> 101010, p-> 101011, ! e-> 1011, i-> 1100, f-> 1101, -> 111!! 00001|0001|1100|0010|111|1100|0010|111|1001|011|! 111|1011|00111|1001|0100|101011|10001|1011|111|! 1101|0101|10100|111|0001|10000|1101|1101|0100|! 1001|011|111|1011|011|00110|0101|00000|1100|011|101010|!module huffmanimplicit nonetype nodecharacter(len=1),allocatable::sym(:)character(len=10),allocatable::code(:)integer::freqcontains procedure::show=>show_nodeend typetype queuetype(node),allocatable::buf(:)integer::n=0contains procedure::extractminprocedure::appendprocedure::siftdownend typecontainssubroutine siftdown(this,a)class(queue)::thisinteger::a,parent,childassociate(x=>this%buf)parent=ado while(parent*2<=this%n)child=parent*2if(child+1<=this%n)then if(x(child+1)%freq<x(child)%freq)thenchild=child+1end if end if if(x(parent)%freq>x(child)%freq)thenx([child,parent])=x([parent,child])parent=childelse exit end if end do end associateend subroutinefunction extractmin(this)result(res)class(queue)::thistype(node)::resres=this%buf(1)this%buf(1)=this%buf(this%n)this%n=this%n-1call this%siftdown(1)end functionsubroutine append(this,x)class(queue),intent(inout)::thistype(node)::xtype(node),allocatable::tmp(:)integer::ithis%n=this%n+1if(.not.allocated(this%buf))allocate(this%buf(1))if(size(this%buf)<this%n)then allocate(tmp(2*size(this%buf)))tmp(1:this%n-1)=this%bufcall move_alloc(tmp,this%buf)end ifthis%buf(this%n)=xi=this%ndo i=i/2if(i==0)exit call this%siftdown(i)end doend subroutinefunction join(a,b)result(c)type(node)::a,b,cinteger::i,n,n1n1=size(a%sym)n=n1+size(b%sym)c%freq=a%freq+b%freqallocate(c%sym(n),c%code(n))do i=1,n1c%sym(i)=a%sym(i)c%code(i)="0"//trim(a%code(i))end do do i=1,size(b%sym)c%sym(i+n1)=b%sym(i)c%code(i+n1)="1"//trim(b%code(i))end doend functionsubroutine show_node(this)class(node)::thisinteger::iwrite(*,"(*(g0,'-> ',g0,:,', '))",advance="no")&(this%sym(i),trim(this%code(i)),i=1,size(this%sym))print*end subroutinefunction create(letter,freq)result(this)character::letterinteger::freqtype(node)::thisallocate(this%sym(1),this%code(1))this%sym(1)=letter;this%code(1)=""this%freq=freqend functionend module program mainuse huffmancharacter(len=*),parameter::txt=&"this is an example for huffman encoding"integer::i,freq(0:255)=0type(queue)::Qtype(node)::xdo i=1,len(txt)freq(ichar(txt(i:i)))=freq(ichar(txt(i:i)))+1end do do i=0,255if(freq(i)>0)then call Q%append(create(char(i),freq(i)))end if end do do i=1,Q%n-1call Q%append(join(Q%extractmin(),Q%extractmin()))end dox=Q%extractmin()call x%show()do i=1,len(txt)do k=1,size(x%sym)if(x%sym(k)==txt(i:i))exit end do write(*,"(a,'|')",advance="no")trim(x%code(k))end do print*end program
FreeBASIC
typeblockfreqasuintegercharsasstringendtypetypecodecharasstring*1codeasstringendtypesubbubble(lst()asblock,n_lasuinteger)forjasinteger=n_l-1to0step-1ifj>0andalsolst(j).freq>lst(j-1).freqthenswaplst(j),lst(j-1)endifnextjendsubdimasstringSample="this is an example for huffman encoding"redimasblockhufflist(0)hufflist(0).freq=1:hufflist(0).chars=mid(Sample,1,1)dimasbooleannewchardimasstring*1currchardimasuintegern_h=1,n_c'read characters in one-by-one and simultaneously bubblesort themforiasuinteger=2tolen(Sample)currchar=mid(Sample,i,1)newchar=trueforjasuinteger=0ton_h-1ifmid(Sample,i,1)=hufflist(j).charsthenhufflist(j).freq+=1newchar=falseendififj>0andalsohufflist(j).freq>hufflist(j-1).freqthenswaphufflist(j),hufflist(j-1)endifnextjifnewcharthenredimpreservehufflist(0ton_h)hufflist(n_h).chars=currcharhufflist(n_h).freq=1n_h+=1endifnexti'one final pass of bubblesort may be necessarybubblehufflist(),n_h'initialise huffman coderedimascodecodelist(0ton_h-1)foriasuinteger=0ton_h-1codelist(i).char=hufflist(i).charscodelist(i).code=""nextin_c=n_hdo'characters in the least common block get "0" appendedforiasuinteger=1tolen(hufflist(n_h-1).chars)forjasuinteger=0ton_c-1ifcodelist(j).char=mid(hufflist(n_h-1).chars,i,1)thencodelist(j).code="0"+codelist(j).codeendifnextjnexti'characters in the second-least common block get "1" appendedforiasuinteger=1tolen(hufflist(n_h-2).chars)forjasuinteger=0ton_c-1ifcodelist(j).char=mid(hufflist(n_h-2).chars,i,1)thencodelist(j).code="1"+codelist(j).codeendifnextjnexti'combine the two least frequent blockshufflist(n_h-2).chars=hufflist(n_h-2).chars+hufflist(n_h-1).charshufflist(n_h-2).freq=hufflist(n_h-2).freq+hufflist(n_h-1).freqredimpreservehufflist(0ton_h-2)n_h-=1'move the new combined block to its proper place in the listbubblehufflist(),n_hloopuntiln_h=1foriasuinteger=0ton_c-1print"'"+codelist(i).char+"'",codelist(i).codenexti
- Output:
' ' 111 'n' 001 'a' 1011 'e' 1010 'f' 1101 'i' 1100 's' 1000 'h' 0101 'm' 0100 'o' 0111 't' 10010 'x' 100111 'p' 100110 'l' 00001 'r' 00000 'u' 00011 'c' 00010 'd' 01101 'g' 01100
Go
packagemainimport("container/heap""fmt")typeHuffmanTreeinterface{Freq()int}typeHuffmanLeafstruct{freqintvaluerune}typeHuffmanNodestruct{freqintleft,rightHuffmanTree}func(selfHuffmanLeaf)Freq()int{returnself.freq}func(selfHuffmanNode)Freq()int{returnself.freq}typetreeHeap[]HuffmanTreefunc(thtreeHeap)Len()int{returnlen(th)}func(thtreeHeap)Less(i,jint)bool{returnth[i].Freq()<th[j].Freq()}func(th*treeHeap)Push(eleinterface{}){*th=append(*th,ele.(HuffmanTree))}func(th*treeHeap)Pop()(poppedinterface{}){popped=(*th)[len(*th)-1]*th=(*th)[:len(*th)-1]return}func(thtreeHeap)Swap(i,jint){th[i],th[j]=th[j],th[i]}funcbuildTree(symFreqsmap[rune]int)HuffmanTree{vartreestreeHeapforc,f:=rangesymFreqs{trees=append(trees,HuffmanLeaf{f,c})}heap.Init(&trees)fortrees.Len()>1{// two trees with least frequencya:=heap.Pop(&trees).(HuffmanTree)b:=heap.Pop(&trees).(HuffmanTree)// put into new node and re-insert into queueheap.Push(&trees,HuffmanNode{a.Freq()+b.Freq(),a,b})}returnheap.Pop(&trees).(HuffmanTree)}funcprintCodes(treeHuffmanTree,prefix[]byte){switchi:=tree.(type){caseHuffmanLeaf:// print out symbol, frequency, and code for this// leaf (which is just the prefix)fmt.Printf("%c\t%d\t%s\n",i.value,i.freq,string(prefix))caseHuffmanNode:// traverse leftprefix=append(prefix,'0')printCodes(i.left,prefix)prefix=prefix[:len(prefix)-1]// traverse rightprefix=append(prefix,'1')printCodes(i.right,prefix)prefix=prefix[:len(prefix)-1]}}funcmain(){test:="this is an example for huffman encoding"symFreqs:=make(map[rune]int)// read each symbol and record the frequenciesfor_,c:=rangetest{symFreqs[c]++}// build treetree:=buildTree(symFreqs)// print out resultsfmt.Println("SYMBOL\tWEIGHT\tHUFFMAN CODE")printCodes(tree,[]byte{})}
- Output:
SYMBOL WEIGHT HUFFMAN CODE n 4 000 m 2 0010 o 2 0011 s 2 0100 u 1 01010 p 1 01011 h 2 0110 d 1 01110 c 1 01111 t 1 10000 l 1 10001 x 1 10010 r 1 100110 g 1 100111 i 3 1010 e 3 1011 6 110 f 3 1110 a 3 1111
packagemainimport("container/heap""fmt")typecodedstruct{symrunecodestring}typecountedstruct{totalintsyms[]coded}typecHeap[]counted// satisfy heap.Interfacefunc(ccHeap)Len()int{returnlen(c)}func(ccHeap)Less(i,jint)bool{returnc[i].total<c[j].total}func(ccHeap)Swap(i,jint){c[i],c[j]=c[j],c[i]}func(c*cHeap)Push(eleinterface{}){*c=append(*c,ele.(counted))}func(c*cHeap)Pop()(poppedinterface{}){popped=(*c)[len(*c)-1]*c=(*c)[:len(*c)-1]return}funcencode(sym2freqmap[rune]int)[]coded{varchcHeapforsym,freq:=rangesym2freq{ch=append(ch,counted{freq,[]coded{{sym:sym}}})}heap.Init(&ch)forlen(ch)>1{a:=heap.Pop(&ch).(counted)b:=heap.Pop(&ch).(counted)fori,c:=rangea.syms{a.syms[i].code="0"+c.code}fori,c:=rangeb.syms{b.syms[i].code="1"+c.code}heap.Push(&ch,counted{a.total+b.total,append(a.syms,b.syms...)})}returnheap.Pop(&ch).(counted).syms}consttxt="this is an example for huffman encoding"funcmain(){sym2freq:=make(map[rune]int)for_,c:=rangetxt{sym2freq[c]++}table:=encode(sym2freq)fmt.Println("Symbol Weight Huffman Code")for_,c:=rangetable{fmt.Printf(" %c %d %s\n",c.sym,sym2freq[c.sym],c.code)}}
Groovy
Implemented and tested with Groovy 2.3.
importgroovy.transform.*@Canonical@Sortable(includes=['freq','letter'])classNode{StringletterintfreqNodeleftNoderightbooleanisLeaf(){left==null&&right==null}}Mapcorrespondance(Noden,Mapcorresp=[:],Stringprefix=''){if(n.isLeaf()){corresp[n.letter]=prefix?:'0'}else{correspondance(n.left,corresp,prefix+'0')correspondance(n.right,corresp,prefix+'1')}returncorresp}MaphuffmanCode(Stringmessage){defqueue=message.toList().countBy{it}// char frequencies.collect{Stringletter,intfreq->// transformed into tree nodesnewNode(letter,freq)}asTreeSet// put in a queue that maintains orderingwhile(queue.size()>1){def(nodeLeft,nodeRight)=[queue.pollFirst(),queue.pollFirst()]queue<<newNode(freq:nodeLeft.freq+nodeRight.freq,letter:nodeLeft.letter+nodeRight.letter,left:nodeLeft,right:nodeRight)}returncorrespondance(queue.pollFirst())}Stringencode(CharSequencemsg,MapcodeTable){msg.collect{codeTable[it]}.join()}Stringdecode(StringcodedMsg,MapcodeTable,Stringdecoded=''){defpair=codeTable.find{k,v->codedMsg.startsWith(v)}pair?pair.key+decode(codedMsg.substring(pair.value.size()),codeTable):decoded}
Usage:
defmessage="this is an example for huffman encoding"defcodeTable=huffmanCode(message)codeTable.each{k,v->println"$k: $v"}defencoded=encode(message,codeTable)printlnencodeddefdecoded=decode(encoded,codeTable)printlndecoded
- Output:
g: 00000 l: 00001 h: 0001 m: 0010 o: 0011 n: 010 p: 01100 r: 01101 s: 0111 t: 10000 u: 10001 a: 1001 : 101 e: 1100 f: 1101 i: 1110 x: 11110 c: 111110 d: 111111 1000000011110011110111100111101100101010111001111010010010011000000111001011101001101101101000110001110111010010100101010111000101111100011111111111001000000 this is an example for huffman encoding
Haskell
Credits go to huffman where you'll also find a non-tree solution. Uses sorted list as a priority queue.
importData.List(group,insertBy,sort,sortBy)importControl.Arrow((&&&),second)importData.Ord(comparing)dataHTreea=Leafa|Branch(HTreea)(HTreea)deriving(Show,Eq,Ord)test::String->IO()test=mapM_(\(a,b)->putStrLn('\'':a:("' : "++b))).serialize.huffmanTree.freqserialize::HTreea->[(a,String)]serialize(Branchlr)=(second('0':)<$>serializel)++(second('1':)<$>serializer)serialize(Leafx)=[(x,"")]huffmanTree::(Ordw,Numw)=>[(w,a)]->HTreeahuffmanTree=snd.head.until(null.tail)hstep.sortBy(comparingfst).fmap(secondLeaf)hstep::(Orda,Numa)=>[(a,HTreeb)]->[(a,HTreeb)]hstep((w1,t1):(w2,t2):wts)=insertBy(comparingfst)(w1+w2,Brancht1t2)wtsfreq::Orda=>[a]->[(Int,a)]freq=fmap(length&&&head).group.sortmain::IO()main=test"this is an example for huffman encoding"
- Output:
'p':00000'r':00001'g':00010'l':00011'n':001'm':0100'o':0101'c':01100'd':01101'h':0111's':1000'x':10010't':100110'u':100111'f':1010'i':1011'a':1100'e':1101' ':111
Using Set
as a priority queue
(might be worth it for bigger alphabets):
importqualifiedData.SetasShtree::(Ordt,Numt,Orda)=>S.Set(t,HTreea)->HTreeahtreets|S.nullts_1=t1|otherwise=htreets_3where((w1,t1),ts_1)=S.deleteFindMints((w2,t2),ts_2)=S.deleteFindMints_1ts_3=S.insert(w1+w2,Brancht1t2)ts_2huffmanTree::(Ordw,Numw,Orda)=>[(w,a)]->HTreeahuffmanTree=htree.S.fromList.map(secondLeaf)
A non-tree version
This produces the output required without building the Huffman tree at all, by building all the trace strings directly while reducing the histogram:
importData.List(sortBy,insertBy,sort,group)importControl.Arrow(second,(&&&))importData.Ord(comparing)freq::Orda=>[a]->[(Int,a)]freq=map(length&&&head).group.sorthuffman::[(Int,Char)]->[(Char,String)]huffman=reduce.map(\(p,c)->(p,[(c,"")])).sortBy(comparingfst)whereadd(p1,xs1)(p2,xs2)=(p1+p2,map(second('0':))xs1++map(second('1':))xs2)reduce[(_,ys)]=sortBy(comparingfst)ysreduce(x1:x2:xs)=reduce$insertBy(comparingfst)(addx1x2)xstests=mapM_(\(a,b)->putStrLn('\'':a:"\' : "++b)).huffman.freq$smain=dotest"this is an example for huffman encoding"
Icon and Unicon
- Output:
Input String : "this is an example for huffman encoding" char freq encoding " " 6 101 "a" 3 1100 "c" 1 10000 "d" 1 10001 "e" 3 1101 "f" 3 1110 "g" 1 11110 "h" 2 11111 "i" 3 1001 "l" 1 01101 "m" 2 0011 "n" 4 000 "o" 2 0100 "p" 1 01100 "r" 1 01110 "s" 2 0010 "t" 1 01010 "u" 1 01111 "x" 1 01011
The following Unicon specific solution takes advantage of the Heap priority queue implementation found in the UniLib Collections package and implements the algorithm given in the problem description. The program produces Huffman codes based on each line of input.
importCollectionsproceduremain(A)everyline:=!&inputdo{every(t:=table(0))[!line]+:=1# Frequency tableheap:=Heap(sort(t),field,"<")# Initial priority queuewhileheap.size()>1do{# Tree constructionevery(p1|p2):=heap.get()heap.add([&null,p1[2]+p2[2],p1,p2])}codes:=treeWalk(heap.get(),"")# Get codes from treewrite("Huffman encoding:")# Display codeseverypair:=!sort(codes)dowrite("\t'",\pair[1],"'-> ",pair[2])}endprocedurefield(node)# selector function for Heapreturnnode[2]# field to use for priority orderingendproceduretreeWalk(node,prefix,codeMap)/codeMap:=table("")if/node[1]then{# interior nodetreeWalk(node[3],prefix||"0",codeMap)treeWalk(node[4],prefix||"1",codeMap)}elsecodeMap[node[1]]:=prefixreturncodeMapend
A sample run:
->huffman this is an example for huffman encoding Huffman encoding: ' '-> 111 'a'-> 1001 'c'-> 00110 'd'-> 00000 'e'-> 1011 'f'-> 1101 'g'-> 101010 'h'-> 0001 'i'-> 1100 'l'-> 10001 'm'-> 0100 'n'-> 011 'o'-> 0101 'p'-> 101011 'r'-> 10100 's'-> 0010 't'-> 00001 'u'-> 10000 'x'-> 00111 aardvarks are ant eaters Huffman encoding: ' '-> 011 'a'-> 10 'd'-> 0010 'e'-> 010 'k'-> 0011 'n'-> 0001 'r'-> 110 's'-> 1111 't'-> 1110 'v'-> 0000 ->
HuffStuff provides huffman encoding routines
J
Solution (drawn from the J wiki):
hc=:4 : 0if.1=#xdo.yelse.((i{x),+/j{x)hc(i{y),<j{y[i=.(i.#x)-.j=.2{./:xend.)hcodes=:4 : 0assert.x-:&$yNB. weights and words have same shapeassert.(0<:x)*.1=#$xNB. weights are non-negativeassert.1>:L.yNB. words are boxed not more than oncew=.,&.>yNB. standardized wordsassert.w-:~.wNB. words are uniquet=.0{::xhcwNB. minimal weight binary tree((<S:0t)i.w){<@(1&=)@;S:1{::t)
Example:
;"1":L:0(#/.~(],.(<' '),.hcodes),&.>@~.)'this is an example for huffman encoding't01010h11111i1001s0010101a1100n000 e 1101x01011m0011p01100l01101f1110o0100 r 01110u01111c10000d10001g11110
Java
This implementation creates an actual tree structure, and then traverses the tree to recover the code.
importjava.util.*;abstractclassHuffmanTreeimplementsComparable<HuffmanTree>{publicfinalintfrequency;// the frequency of this treepublicHuffmanTree(intfreq){frequency=freq;}// compares on the frequencypublicintcompareTo(HuffmanTreetree){returnfrequency-tree.frequency;}}classHuffmanLeafextendsHuffmanTree{publicfinalcharvalue;// the character this leaf representspublicHuffmanLeaf(intfreq,charval){super(freq);value=val;}}classHuffmanNodeextendsHuffmanTree{publicfinalHuffmanTreeleft,right;// subtreespublicHuffmanNode(HuffmanTreel,HuffmanTreer){super(l.frequency+r.frequency);left=l;right=r;}}publicclassHuffmanCode{// input is an array of frequencies, indexed by character codepublicstaticHuffmanTreebuildTree(int[]charFreqs){PriorityQueue<HuffmanTree>trees=newPriorityQueue<HuffmanTree>();// initially, we have a forest of leaves// one for each non-empty characterfor(inti=0;i<charFreqs.length;i++)if(charFreqs[i]>0)trees.offer(newHuffmanLeaf(charFreqs[i],(char)i));asserttrees.size()>0;// loop until there is only one tree leftwhile(trees.size()>1){// two trees with least frequencyHuffmanTreea=trees.poll();HuffmanTreeb=trees.poll();// put into new node and re-insert into queuetrees.offer(newHuffmanNode(a,b));}returntrees.poll();}publicstaticvoidprintCodes(HuffmanTreetree,StringBufferprefix){asserttree!=null;if(treeinstanceofHuffmanLeaf){HuffmanLeafleaf=(HuffmanLeaf)tree;// print out character, frequency, and code for this leaf (which is just the prefix)System.out.println(leaf.value+"\t"+leaf.frequency+"\t"+prefix);}elseif(treeinstanceofHuffmanNode){HuffmanNodenode=(HuffmanNode)tree;// traverse leftprefix.append('0');printCodes(node.left,prefix);prefix.deleteCharAt(prefix.length()-1);// traverse rightprefix.append('1');printCodes(node.right,prefix);prefix.deleteCharAt(prefix.length()-1);}}publicstaticvoidmain(String[]args){Stringtest="this is an example for huffman encoding";// we will assume that all our characters will have// code less than 256, for simplicityint[]charFreqs=newint[256];// read each character and record the frequenciesfor(charc:test.toCharArray())charFreqs[c]++;// build treeHuffmanTreetree=buildTree(charFreqs);// print out resultsSystem.out.println("SYMBOL\tWEIGHT\tHUFFMAN CODE");printCodes(tree,newStringBuffer());}}
- Output:
SYMBOL WEIGHT HUFFMAN CODE d 1 00000 t 1 00001 h 2 0001 s 2 0010 c 1 00110 x 1 00111 m 2 0100 o 2 0101 n 4 011 u 1 10000 l 1 10001 a 3 1001 r 1 10100 g 1 101010 p 1 101011 e 3 1011 i 3 1100 f 3 1101 6 111
JavaScript
for the print()
function.
First, use the Binary Heap implementation from here: http://eloquentjavascript.net/appendix2.html
The Huffman encoder
functionHuffmanEncoding(str){this.str=str;varcount_chars={};for(vari=0;i<str.length;i++)if(str[i]incount_chars)count_chars[str[i]]++;elsecount_chars[str[i]]=1;varpq=newBinaryHeap(function(x){returnx[0];});for(varchincount_chars)pq.push([count_chars[ch],ch]);while(pq.size()>1){varpair1=pq.pop();varpair2=pq.pop();pq.push([pair1[0]+pair2[0],[pair1[1],pair2[1]]]);}vartree=pq.pop();this.encoding={};this._generate_encoding(tree[1],"");this.encoded_string=""for(vari=0;i<this.str.length;i++){this.encoded_string+=this.encoding[str[i]];}}HuffmanEncoding.prototype._generate_encoding=function(ary,prefix){if(aryinstanceofArray){this._generate_encoding(ary[0],prefix+"0");this._generate_encoding(ary[1],prefix+"1");}else{this.encoding[ary]=prefix;}}HuffmanEncoding.prototype.inspect_encoding=function(){for(varchinthis.encoding){print("'"+ch+"': "+this.encoding[ch])}}HuffmanEncoding.prototype.decode=function(encoded){varrev_enc={};for(varchinthis.encoding)rev_enc[this.encoding[ch]]=ch;vardecoded="";varpos=0;while(pos<encoded.length){varkey=""while(!(keyinrev_enc)){key+=encoded[pos];pos++;}decoded+=rev_enc[key];}returndecoded;}
And, using the Huffman encoder
vars="this is an example for huffman encoding";print(s);varhuff=newHuffmanEncoding(s);huff.inspect_encoding();vare=huff.encoded_string;print(e);vart=huff.decode(e);print(t);print("is decoded string same as original? "+(s==t));
- Output:
this is an example for huffman encoding 'n': 000 's': 0010 'm': 0011 'o': 0100 't': 01010 'x': 01011 'p': 01100 'l': 01101 'r': 01110 'u': 01111 'c': 10000 'd': 10001 'i': 1001 ' ': 101 'a': 1100 'e': 1101 'f': 1110 'g': 11110 'h': 11111 0101011111100100101011001001010111000001011101010111100001101100011011101101111001000111010111111011111110111000111100000101110100010000010010001100100011110 this is an example for huffman encoding is decoded string same as original? true
classnode{constructor(freq,char,left,right){this.left=left;this.right=right;this.freq=freq;this.c=char;}};nodes=[];code={};functionnew_node(left,right){returnnewnode(left.freq+right.freq,-1,left,right);;};functionqinsert(node){nodes.push(node);nodes.sort(compareFunction);};functionqremove(){returnnodes.pop();};functioncompareFunction(a,b){returnb.freq-a.freq;};functionbuild_code(node,codeString,length){if(node.c!=-1){code[node.c]=codeString;return;};/* Left Branch */leftCodeString=codeString+"0";build_code(node.left,leftCodeString,length+1);/* Right Branch */rightCodeString=codeString+"1";build_code(node.right,rightCodeString,length+1);};functioninit(string){vari;varfreq=[];varcodeString="";for(vari=0;i<string.length;i++){if(isNaN(freq[string.charCodeAt(i)])){freq[string.charCodeAt(i)]=1;}else{freq[string.charCodeAt(i)]++;};};for(vari=0;i<freq.length;i++){if(freq[i]>0){qinsert(newnode(freq[i],i,null,null));};};while(nodes.length>1){qinsert(new_node(qremove(),qremove()));};build_code(nodes[0],codeString,0);};functionencode(string){output="";for(vari=0;i<string.length;i++){output+=code[string.charCodeAt(i)];};returnoutput;};functiondecode(input){output="";node=nodes[0];for(vari=0;i<input.length;i++){if(input[i]=="0"){node=node.left;}else{node=node.right;};if(node.c!=-1){output+=String.fromCharCode(node.c);node=nodes[0];};};returnoutput};string="this is an example of huffman encoding";console.log("initial string: "+string);init(string);for(vari=0;i<Object.keys(code).length;i++){if(isNaN(code[Object.keys(code)[i]])){}else{console.log("'"+String.fromCharCode(Object.keys(code)[i])+"'"+": "+code[Object.keys(code)[i]]);};};huffman=encode(string);console.log("encoded: "+huffman+"\n");output=decode(huffman);console.log("decoded: "+output);
initial string: this is an example of huffman encoding ' ': 111 'a': 1011 'c': 00101 'd': 00100 'e': 1010 'f': 1101 'g': 00111 'h': 0101 'i': 1100 'l': 00110 'm': 0100 'n': 100 'o': 0111 'p': 00001 's': 0110 't': 00000 'u': 00011 'x': 00010 encoded: 000000101110001101111100011011110111001111010000101011010000001001101010111011111011110101000111101110101001011100111101010000101011100100110010000111 decoded: this is an example of huffman encoding
Julia
abstracttypeHuffmanTreeendstructHuffmanLeaf<:HuffmanTreech::Charfreq::IntendstructHuffmanNode<:HuffmanTreefreq::Intleft::HuffmanTreeright::HuffmanTreeendfunctionmakefreqdict(s::String)d=Dict{Char,Int}()forcinsif!haskey(d,c)d[c]=1elsed[c]+=1endenddendfunctionhuffmantree(ftable::Dict)trees::Vector{HuffmanTree}=[HuffmanLeaf(ch,fq)for(ch,fq)inftable]whilelength(trees)>1sort!(trees,lt=(x,y)->x.freq<y.freq,rev=true)least=pop!(trees)nextleast=pop!(trees)push!(trees,HuffmanNode(least.freq+nextleast.freq,least,nextleast))endtrees[1]endprintencoding(lf::HuffmanLeaf,code)=println(lf.ch==' '?"space":lf.ch,"\t",lf.freq,"\t",code)functionprintencoding(nd::HuffmanNode,code)code*='0'printencoding(nd.left,code)code=code[1:end-1]code*='1'printencoding(nd.right,code)code=code[1:end-1]endconstmsg="this is an example for huffman encoding"println("Char\tFreq\tHuffman code")printencoding(huffmantree(makefreqdict(msg)),"")
- Output:
Char Freq Huffman code p 1 00000 c 1 00001 g 1 00010 x 1 00011 n 4 001 s 2 0100 h 2 0101 u 1 01100 l 1 01101 m 2 0111 o 2 1000 d 1 10010 r 1 100110 t 1 100111 e 3 1010 f 3 1011 a 3 1100 i 3 1101 space 6 111
Kotlin
This implementation creates an actual tree structure, and then traverses the tree to recover the code.
importjava.util.*abstractclassHuffmanTree(varfreq:Int):Comparable<HuffmanTree>{overridefuncompareTo(other:HuffmanTree)=freq-other.freq}classHuffmanLeaf(freq:Int,varvalue:Char):HuffmanTree(freq)classHuffmanNode(varleft:HuffmanTree,varright:HuffmanTree):HuffmanTree(left.freq+right.freq)funbuildTree(charFreqs:IntArray):HuffmanTree{valtrees=PriorityQueue<HuffmanTree>()charFreqs.forEachIndexed{index,freq->if(freq>0)trees.offer(HuffmanLeaf(freq,index.toChar()))}assert(trees.size>0)while(trees.size>1){vala=trees.poll()valb=trees.poll()trees.offer(HuffmanNode(a,b))}returntrees.poll()}funprintCodes(tree:HuffmanTree,prefix:StringBuffer){when(tree){isHuffmanLeaf->println("${tree.value}\t${tree.freq}\t$prefix")isHuffmanNode->{//traverse leftprefix.append('0')printCodes(tree.left,prefix)prefix.deleteCharAt(prefix.lastIndex)//traverse rightprefix.append('1')printCodes(tree.right,prefix)prefix.deleteCharAt(prefix.lastIndex)}}}funmain(args:Array<String>){valtest="this is an example for huffman encoding"valmaxIndex=test.max()!!.toInt()+1valfreqs=IntArray(maxIndex)//256 enough for latin ASCII table, but dynamic size is more funtest.forEach{freqs[it.toInt()]+=1}valtree=buildTree(freqs)println("SYMBOL\tWEIGHT\tHUFFMAN CODE")printCodes(tree,StringBuffer())}
- Output:
SYMBOL WEIGHT HUFFMAN CODE d 1 00000 t 1 00001 h 2 0001 s 2 0010 c 1 00110 x 1 00111 m 2 0100 o 2 0101 n 4 011 u 1 10000 l 1 10001 a 3 1001 r 1 10100 g 1 101010 p 1 101011 e 3 1011 i 3 1100 f 3 1101 6 111
Lua
This implementation proceeds in three steps: determine word frequencies, construct the Huffman tree, and finally fold the tree into the codes while outputting them.
localbuild_freqtable=function(data)localfreq={}fori=1,#datadolocalcur=string.sub(data,i,i)localcount=freq[cur]or0freq[cur]=count+1endlocalnodes={}forw,finnext,freqdonodes[#nodes+1]={word=w,freq=f}endtable.sort(nodes,function(a,b)returna.freq>b.freqend)--- reverse order!returnnodesendlocalbuild_hufftree=function(nodes)whiletruedolocaln=#nodeslocalleft=nodes[n]nodes[n]=nillocalright=nodes[n-1]nodes[n-1]=nillocalnew={freq=left.freq+right.freq,left=left,right=right}ifn==2thenreturnnewend--- insert new node at correct prioritylocalprio=1whileprio<#nodesandnodes[prio].freq>new.freqdoprio=prio+1endtable.insert(nodes,prio,new)endendlocalprint_huffcodesdolocalrec_build_huffcodesrec_build_huffcodes=function(node,bits,acc)ifnode.word==nilthenrec_build_huffcodes(node.left,bits.."0",acc)rec_build_huffcodes(node.right,bits.."1",acc)returnaccelse--- leafacc[#acc+1]={node.freq,node.word,bits}endreturnaccendprint_huffcodes=function(root)localcodes=rec_build_huffcodes(root,"",{})table.sort(codes,function(a,b)returna[1]<b[1]end)print("frequency\tword\thuffman code")fori=1,#codesdoprint(string.format("%9d\t‘%s’\t“%s”",table.unpack(codes[i])))endendendlocalhuffcode=function(data)localnodes=build_freqtable(data)localhuff=build_hufftree(nodes)print_huffcodes(huff)return0endreturnhuffcode"this is an example for huffman encoding"
frequency word huffman code 1 ‘g’ “01111” 1 ‘p’ “01011” 1 ‘d’ “01100” 1 ‘c’ “01101” 1 ‘t’ “01010” 1 ‘r’ “10000” 1 ‘u’ “11110” 1 ‘x’ “10001” 1 ‘l’ “01110” 2 ‘o’ “11111” 2 ‘m’ “0011” 2 ‘h’ “0010” 2 ‘s’ “0100” 3 ‘i’ “1101” 3 ‘f’ “1110” 3 ‘a’ “1100” 3 ‘e’ “1001” 4 ‘n’ “000” 6 ‘ ’ “101”
M2000 Interpreter
Module Huffman { comp=lambda (a, b) ->{ =array(a, 0)<array(b, 0) } module InsertPQ (a, n, &comp) { if len(a)=0 then stack a {data n} : exit if comp(n, stackitem(a)) then stack a {push n} : exit stack a { push n t=2: b=len(a) m=b While t<=b { t1=m m=(b+t) div 2 if m=0 then m=t1 : exit If comp(stackitem(m),n) then t=m+1: continue b=m-1 m=b } if m>1 then shiftback m } } a$="this is an example for huffman encoding" inventory queue freq For i=1 to len(a$) { b$=mid$(a$,i,1) if exist(freq, b$) then Return freq, b$:=freq(b$)+1 : continue append freq, b$:=1 } sort ascending freq b=stack K=each(freq) LenA=len(a$) While k { InsertPQ b, (Round(Eval(k)/lenA, 4), eval$(k, k^)), &comp } While len(b)>1 { Stack b { Read m1, m2 InsertPQ b, (Array(m1)+Array(m2), (m1, m2) ), &comp } } Print "Size of stack object (has only Root):"; len(b) Print "Root probability:";Round(Array(Stackitem(b)), 3) inventory encode, decode Traverse(stackitem(b), "") message$="" For i=1 to len(a$) message$+=encode$(mid$(a$, i, 1)) Next i Print message$ j=1 check$="" For i=1 to len(a$) d=each(encode) While d { code$=eval$(d) if mid$(message$, j, len(code$))=code$ then { check$+=decode$(code$) Print decode$(code$); : j+=len(code$) } } Next i Print Print len(message$);" bits ", if$(a$=check$->"Encoding/decoding worked", "Encoding/Decoding failed") Sub Traverse(a, a$) local b=array(a,1) if type$(b)="mArray" Else { Print @(10); quote$(array$(a, 1));" "; a$,@(20),array(a) Append decode, a$ :=array$(a, 1) Append encode, array$(a, 1):=a$ Exit Sub } traverse(array(b), a$+"0") traverse(array(b,1), a$+"1") End Sub } Huffman
- Output:
"p" 00000 0,0256 "l" 00001 0,0256 "t" 00010 0,0256 "r" 00011 0,0256 "x" 00100 0,0256 "u" 00101 0,0256 "s" 0011 0,0513 "o" 0100 0,0513 "m" 0101 0,0513 "n" 011 0,1026 "h" 1000 0,0513 "c" 10010 0,0256 "g" 100110 0,0256 "d" 100111 0,0256 "e" 1010 0,0769 "a" 1011 0,0769 "i" 1100 0,0769 "f" 1101 0,0769 " " 111 0,1538 0001010001100001111111000011111101101111110100010010110101000000000110101111101010000011111100000101110111010101101101111110100111001001001001111100011100110 this is an example for huffman encoding 157 bits Encoding/decoding worked
Mathematica / Wolfram Language
huffman[s_String]:=huffman[Characters[s]];huffman[l_List]:=Module[{merge,structure,rules},(*merge front two branches. list is assumed to be sorted*)merge[k_]:=Replace[k,{{a_,aC_},{b_,bC_},rest___}:>{{{a,b},aC+bC},rest}];structure=FixedPoint[Composition[merge,SortBy[#,Last]&],Tally[l]][[1,1]];rules=(#->Flatten[Position[structure,#]-1])&/@DeleteDuplicates[l];{Flatten[l/.rules],rules}];
Nim
importtables,sequtilstype# Following range can be changed to produce Huffman codes on arbitrary alphabet (e.g. ternary codes)CodeSymbol=range[0..1]HuffCode=seq[CodeSymbol]Node=refobjectf:intparent:NodecaseisLeaf:booloftrue:c:charelse:childs:array[CodeSymbol,Node]func`<`(a:Node,b:Node):bool=# For min operator.a.f<b.ffunc`$`(hc:HuffCode):string=result=""forsymbolinhc:result&=$symbolfuncfreeChildList(tree:seq[Node],parent:Node=nil):seq[Node]=## Constructs a sequence of nodes which can be adopted## Optional parent parameter can be set to ensure node will not adopt itselffornodeintree:ifnode.parent.isNilandnode!=parent:result.add(node)funcconnect(parent:Node,child:Node)=# Only call this proc when sure that parent has a free child slotchild.parent=parentparent.f+=child.fforiinparent.childs.low..parent.childs.high:ifparent.childs[i]==nil:parent.childs[i]=childreturnfuncgenerateCodes(codes:TableRef[char,HuffCode],currentNode:Node,currentCode:HuffCode=@[])=ifcurrentNode.isLeaf:letkey=currentNode.ccodes[key]=currentCodereturnforiincurrentNode.childs.low..currentNode.childs.high:ifnotcurrentNode.childs[i].isNil:letnewCode=currentCode&igenerateCodes(codes,currentNode.childs[i],newCode)funcbuildTree(frequencies:CountTable[char]):seq[Node]=result=newSeq[Node](frequencies.len)foriinresult.low..result.high:letkey=toSeq(frequencies.keys)[i]result[i]=Node(f:frequencies[key],isLeaf:true,c:key)whileresult.freeChildList.len>1:letcurrentNode=newNoderesult.add(currentNode)forcincurrentNode.childs:currentNode.connect(min(result.freeChildList(currentNode)))ifresult.freeChildList.len<=1:breakwhenisMainModule:importalgorithm,strformatconstSampleString="this is an example for huffman encoding"SampleFrequencies=SampleString.toCountTable()func`<`(code1,code2:HuffCode):bool=# Used to sort the result.ifcode1.len==code2.len:result=falsefor(c1,c2)inzip(code1,code2):ifc1!=c2:returnc1<c2else:result=code1.len<code2.lenlettree=buildTree(SampleFrequencies)root=tree.freeChildList[0]varhuffCodes=newTable[char,HuffCode]()generateCodes(huffCodes,root)for(key,value)insortedByIt(toSeq(huffCodes.pairs),it[1]):echo&"'{key}' → {value}"
- Output:
'n' → 000 ' ' → 101 's' → 0010 'h' → 0011 'm' → 0100 'f' → 1001 'i' → 1100 'a' → 1101 'e' → 1110 'd' → 01010 'x' → 01011 'g' → 01100 'r' → 01101 'c' → 01110 'u' → 01111 't' → 10000 'p' → 10001 'l' → 11110 'o' → 11111
Oberon-2
MODULEHuffmanEncoding;IMPORTObject,PriorityQueue,Strings,Out;TYPELeaf=POINTERTOLeafDesc;LeafDesc=RECORD(Object.ObjectDesc)c:CHAR;END;Inner=POINTERTOInnerDesc;InnerDesc=RECORD(Object.ObjectDesc)left,right:Object.Object;END;VARstr:ARRAY128OFCHAR;i:INTEGER;f:ARRAY96OFINTEGER;q:PriorityQueue.Queue;a:PriorityQueue.Node;b:PriorityQueue.Node;c:PriorityQueue.Node;h:ARRAY64OFCHAR;PROCEDURENewLeaf(c:CHAR):Leaf;VARx:Leaf;BEGINNEW(x);x.c:=c;RETURNxENDNewLeaf;PROCEDURENewInner(l,r:Object.Object):Inner;VARx:Inner;BEGINNEW(x);x.left:=l;x.right:=r;RETURNxENDNewInner;PROCEDUREPreorder(n:Object.Object;VARx:ARRAYOFCHAR);BEGINIFnISLeafTHENOut.Char(n(Leaf).c);Out.String(": ");Out.String(h);Out.LnELSEIFn(Inner).left#NILTHENStrings.Append("0",x);Preorder(n(Inner).left,x);Strings.Delete(x,(Strings.Length(x)-1),1)END;IFn(Inner).right#NILTHENStrings.Append("1",x);Preorder(n(Inner).right,x);Strings.Delete(x,(Strings.Length(x)-1),1)ENDENDENDPreorder;BEGINstr:="this is an example for huffman encoding";(* Collect letter frecuencies *)i:=0;WHILEstr[i]#0XDOINC(f[ORD(CAP(str[i]))-ORD(' ')]);INC(i)END;(* Create Priority Queue *)NEW(q);q.Clear();(* Insert into the queue *)i:=0;WHILE(i<LEN(f))DOIFf[i]#0THENq.Insert(f[i]/Strings.Length(str),NewLeaf(CHR(i+ORD(' '))))END;INC(i)END;(* create tree *)WHILEq.Length()>1DOq.Remove(a);q.Remove(b);q.Insert(a.w+b.w,NewInner(a.d,b.d));END;(* tree traversal *)h[0]:=0X;q.Remove(c);Preorder(c.d,h);ENDHuffmanEncoding.
- Output:
D: 00000 T: 00001 H: 0001 S: 0010 C: 00110 X: 00111 M: 0100 O: 0101 N: 011 U: 10000 L: 10001 A: 1001 R: 10100 G: 101010 P: 101011 E: 1011 I: 1100 F: 1101 : 111
Objective-C
This is not purely Objective-C. It uses Apple's Core Foundation library for its binary heap, which admittedly is very ugly. Thus, this only builds on Mac OS X, not GNUstep.
#import <Foundation/Foundation.h>@interfaceHuffmanTree : NSObject{intfreq;}-(instancetype)initWithFreq:(int)f;@property(nonatomic,readonly)intfreq;@end@implementationHuffmanTree@synthesizefreq;// the frequency of this tree-(instancetype)initWithFreq:(int)f{if(self=[superinit]){freq=f;}returnself;}@endconstvoid*HuffmanRetain(CFAllocatorRefallocator,constvoid*ptr){return(__bridge_retainedconstvoid*)(__bridgeid)ptr;}voidHuffmanRelease(CFAllocatorRefallocator,constvoid*ptr){(void)(__bridge_transferid)ptr;}CFComparisonResultHuffmanCompare(constvoid*ptr1,constvoid*ptr2,void*unused){intf1=((__bridgeHuffmanTree*)ptr1).freq;intf2=((__bridgeHuffmanTree*)ptr2).freq;if(f1==f2)returnkCFCompareEqualTo;elseif(f1>f2)returnkCFCompareGreaterThan;elsereturnkCFCompareLessThan;}@interfaceHuffmanLeaf : HuffmanTree{charvalue;// the character this leaf represents}@property(readonly)charvalue;-(instancetype)initWithFreq:(int)fcharacter:(char)c;@end@implementationHuffmanLeaf@synthesizevalue;-(instancetype)initWithFreq:(int)fcharacter:(char)c{if(self=[superinitWithFreq:f]){value=c;}returnself;}@end@interfaceHuffmanNode : HuffmanTree{HuffmanTree*left,*right;// subtrees}@property(readonly)HuffmanTree*left,*right;-(instancetype)initWithLeft:(HuffmanTree*)lright:(HuffmanTree*)r;@end@implementationHuffmanNode@synthesizeleft,right;-(instancetype)initWithLeft:(HuffmanTree*)lright:(HuffmanTree*)r{if(self=[superinitWithFreq:l.freq+r.freq]){left=l;right=r;}returnself;}@endHuffmanTree*buildTree(NSCountedSet*chars){CFBinaryHeapCallBackscallBacks={0,HuffmanRetain,HuffmanRelease,NULL,HuffmanCompare};CFBinaryHeapReftrees=CFBinaryHeapCreate(NULL,0,&callBacks,NULL);// initially, we have a forest of leaves// one for each non-empty characterfor(NSNumber*chinchars){intfreq=[charscountForObject:ch];if(freq>0)CFBinaryHeapAddValue(trees,(__bridgeconstvoid*)[[HuffmanLeafalloc]initWithFreq:freqcharacter:(char)[chintValue]]);}NSCAssert(CFBinaryHeapGetCount(trees)>0,@"String must have at least one character");// loop until there is only one tree leftwhile(CFBinaryHeapGetCount(trees)>1){// two trees with least frequencyHuffmanTree*a=(__bridgeHuffmanTree*)CFBinaryHeapGetMinimum(trees);CFBinaryHeapRemoveMinimumValue(trees);HuffmanTree*b=(__bridgeHuffmanTree*)CFBinaryHeapGetMinimum(trees);CFBinaryHeapRemoveMinimumValue(trees);// put into new node and re-insert into queueCFBinaryHeapAddValue(trees,(__bridgeconstvoid*)[[HuffmanNodealloc]initWithLeft:aright:b]);}HuffmanTree*result=(__bridgeHuffmanTree*)CFBinaryHeapGetMinimum(trees);CFRelease(trees);returnresult;}voidprintCodes(HuffmanTree*tree,NSMutableString*prefix){NSCAssert(tree!=nil,@"tree must not be nil");if([treeisKindOfClass:[HuffmanLeafclass]]){HuffmanLeaf*leaf=(HuffmanLeaf*)tree;// print out character, frequency, and code for this leaf (which is just the prefix)NSLog(@"%c\t%d\t%@",leaf.value,leaf.freq,prefix);}elseif([treeisKindOfClass:[HuffmanNodeclass]]){HuffmanNode*node=(HuffmanNode*)tree;// traverse left[prefixappendString:@"0"];printCodes(node.left,prefix);[prefixdeleteCharactersInRange:NSMakeRange([prefixlength]-1,1)];// traverse right[prefixappendString:@"1"];printCodes(node.right,prefix);[prefixdeleteCharactersInRange:NSMakeRange([prefixlength]-1,1)];}}intmain(intargc,constchar*argv[]){@autoreleasepool{NSString*test=@"this is an example for huffman encoding";// read each character and record the frequenciesNSCountedSet*chars=[[NSCountedSetalloc]init];intn=[testlength];for(inti=0;i<n;i++)[charsaddObject:@([testcharacterAtIndex:i])];// build treeHuffmanTree*tree=buildTree(chars);// print out resultsNSLog(@"SYMBOL\tWEIGHT\tHUFFMAN CODE");printCodes(tree,[NSMutableStringstring]);}return0;}
- Output:
SYMBOL WEIGHT HUFFMAN CODE g 1 00000 x 1 00001 m 2 0001 d 1 00100 u 1 00101 t 1 00110 r 1 00111 n 4 010 s 2 0110 o 2 0111 p 1 10000 l 1 10001 a 3 1001 6 101 f 3 1100 e 3 1101 c 1 11100 h 2 11101 i 3 1111
OCaml
We use a Set (which is automatically sorted) as a priority queue.
type'ahuffman_tree=|Leafof'a|Nodeof'ahuffman_tree*'ahuffman_treemoduleHSet=Set.Make(structtypet=int*charhuffman_tree(* pair of frequency and the tree *)letcompare=compare(* We can use the built-in compare function to order this: it will order first by the first element (frequency) and then by the second (the tree), the latter of which we don't care about but which helps prevent elements from being equal, since Set does not allow duplicate elements *)end);;letbuild_treecharFreqs=letleaves=HSet.of_list(List.map(fun(c,f)->(f,Leafc))charFreqs)inletrecauxtrees=letf1,a=HSet.min_elttreesinlettrees'=HSet.remove(f1,a)treesinifHSet.is_emptytrees'thenaelseletf2,b=HSet.min_elttrees'inlettrees''=HSet.remove(f2,b)trees'inlettrees'''=HSet.add(f1+f2,Node(a,b))trees''inauxtrees'''inauxleavesletrecprint_treecode=function|Leafc->Printf.printf"%c\t%s\n"c(String.concat""(List.revcode));|Node(l,r)->print_tree("0"::code)l;print_tree("1"::code)rlet()=letstr="this is an example for huffman encoding"inletcharFreqs=Hashtbl.create42inString.iter(func->letold=tryHashtbl.findcharFreqscwithNot_found->0inHashtbl.replacecharFreqsc(old+1))str;letcharFreqs=Hashtbl.fold(funcfacc->(c,f)::acc)charFreqs[]inlettree=build_treecharFreqsinprint_string"Symbol\tHuffman code\n";print_tree[]tree
Ol
(definephrase"this is an example for huffman encoding"); prepare initial probabilities table(definetable(ff->list(fold(lambda(ffx)(putffx(+(ffx0)1))){}(string->runesphrase)))); just sorter...(define(resortl)(sort(lambda(xy)(<(cdrx)(cdry)))l)); ...to sort table(definetable(resorttable)); build huffman tree(definetree(letloop((tabletable))(if(null?(cdrtable))(cartable)(loop(resort(cons(cons{1(cartable)0(cadrtable)}(+(cdartable)(cdadrtable)))(cddrtable))))))); huffman codes(definecodes(map(lambda(i)(call/cc(lambda(return)(letloop((prefix#null)(treetree))(if(ff?(cartree))(begin(loop(cons0prefix)((cartree)0))(loop(cons1prefix)((cartree)1)))(if(eq?(cartree)i)(return(reverseprefix))))))))(mapcartable)))
- Output:
(print"weights: ---------------------------")(for-each(lambda(ch)(print(string(carch))": "(cdrch)))(reversetable))(print"codes: -----------------------------")(map(lambda(charcode)(print(stringchar)": "code))(reverse(mapcartable))(reversecodes))
weights: --------------------------- : 6 n: 4 i: 3 f: 3 e: 3 a: 3 s: 2 o: 2 m: 2 h: 2 x: 1 u: 1 t: 1 r: 1 p: 1 l: 1 g: 1 d: 1 c: 1 codes: ----------------------------- : (0 0 0) n: (1 1 0) i: (0 1 0 0) f: (0 1 0 1) e: (0 0 1 0) a: (0 0 1 1) s: (0 1 1 1) o: (1 0 1 0) m: (1 0 1 1) h: (1 0 0 0) x: (0 1 1 0 1) u: (0 1 1 0 0 0) t: (0 1 1 0 0 1) r: (1 1 1 1 0) p: (1 1 1 1 1) l: (1 1 1 0 0) g: (1 1 1 0 1) d: (1 0 0 1 0) c: (1 0 0 1 1)
Perl
use5.10.0;usestrict;# produce encode and decode dictionary from a treesubwalk{my($node,$code,$h,$rev_h)=@_;my$c=$node->[0];if(ref$c){walk($c->[$_],$code.$_,$h,$rev_h)for0,1}else{$h->{$c}=$code;$rev_h->{$code}=$c}$h,$rev_h}# make a tree, and return resulting dictionariessubmktree{my(%freq,@nodes);$freq{$_}++forsplit'',shift;@nodes=map([$_,$freq{$_}],keys%freq);do{# poor man's priority queue@nodes=sort{$a->[1]<=>$b->[1]}@nodes;my($x,$y)=splice@nodes,0,2;push@nodes,[[$x,$y],$x->[1]+$y->[1]]}while(@nodes>1);walk($nodes[0],'',{},{})}subencode{my($str,$dict)=@_;join'',map$dict->{$_}//die("bad char $_"),split'',$str}subdecode{my($str,$dict)=@_;my($seg,@out)=("");# append to current segment until it's in the dictionaryfor(split'',$str){$seg.=$_;my$x=$dict->{$seg}//next;push@out,$x;$seg='';}die"bad code"iflength($seg);join'',@out}my$txt='this is an example for huffman encoding';my($h,$rev_h)=mktree($txt);for(keys%$h){print"'$_': $h->{$_}\n"}my$enc=encode($txt,$h);print"$enc\n";printdecode($enc,$rev_h),"\n";
- Output:
'u': 10000 'd': 01111 'a': 1101 'l': 10001 'i': 1110 'g': 11110 'h': 0100 'r': 01110 ' ': 101 'p': 01100 't': 01101 'n': 000 'm': 0011 'x': 01011 'f': 1100 'c': 01010 'o': 0010 's': 11111 'e': 1001 0110101001110111111011110111111011101000101100101011110100110110010001100110111000010011101010100100001100110000111101000101100100001010001001111111000011110 this is an example for huffman encoding
Phix
withjavascript_semanticsfunctionstore_nodes(objectkey,objectdata,integernodes)setd({data,key},0,nodes)return1endfunctionfunctionbuild_freqtable(stringdata)integerfreq=new_dict(),nodes=new_dict()fori=1tolength(data)dointegerdi=data[i]setd(di,getd(di,freq)+1,freq)endfortraverse_dict(store_nodes,nodes,freq)destroy_dict(freq)returnnodesendfunctionfunctionbuild_hufftree(integernodes)sequencenodewhiletruedosequencelkey=getd_partial_key({0,0},nodes)integerlfreq=lkey[1]deld(lkey,nodes)sequencerkey=getd_partial_key({0,0},nodes)integerrfreq=rkey[1]deld(rkey,nodes)node={lfreq+rfreq,{lkey,rkey}}ifdict_size(nodes)=0thenexitendifsetd(node,0,nodes)endwhiledestroy_dict(nodes)returnnodeendfunctionprocedurebuild_huffcodes(objectnode,stringbits,integerd){integerfreq,objectdata}=nodeifsequence(data)thenbuild_huffcodes(data[1],bits&'0',d)build_huffcodes(data[2],bits&'1',d)elsesetd(data,{freq,bits},d)endifendprocedurefunctionprint_huffcode(integerkey,sequencedata,integer/*user_data*/){integeri,strings}=dataprintf(1,"'%c' [%d] %s\n",{key,i,s})return1endfunctionprocedureprint_huffcodes(integerd)traverse_dict(print_huffcode,0,d)endprocedurefunctioninvert_huffcode(integerkey,sequencedata,integerrd)setd(data[2],key,rd)return1endfunctionproceduremain(stringdata)iflength(data)<2then?9/0endifintegernodes=build_freqtable(data)sequencehuff=build_hufftree(nodes)integerd=new_dict()build_huffcodes(huff,"",d)print_huffcodes(d)stringencoded=""fori=1tolength(data)doencoded&=getd(data[i],d)[2]endfor?shorten(encoded)integerrd=new_dict()traverse_dict(invert_huffcode,rd,d)stringdecoded=""integerdone=0whiledone<length(encoded)dostringkey=""integernode=0whilenode=0dodone+=1key&=encoded[done]node=getd_index(key,rd)endwhiledecoded&=getd_by_index(node,rd)endwhile?decodedendproceduremain("this is an example for huffman encoding")
- Output:
' ' [6] 101 'a' [3] 1001 'c' [1] 01010 'd' [1] 01011 'e' [3] 1100 'f' [3] 1101 'g' [1] 01100 'h' [2] 11111 'i' [3] 1110 'l' [1] 01101 'm' [2] 0010 'n' [4] 000 'o' [2] 0011 'p' [1] 01110 'r' [1] 01111 's' [2] 0100 't' [1] 10000 'u' [1] 10001 'x' [1] 11110 "10000111111110010010...01101011111000001100 (157 digits)" "this is an example for huffman encoding"
PHP
(not exactly)
<?phpfunctionencode($symb2freq){$heap=newSplPriorityQueue;$heap->setExtractFlags(SplPriorityQueue::EXTR_BOTH);foreach($symb2freqas$sym=>$wt)$heap->insert(array($sym=>''),-$wt);while($heap->count()>1){$lo=$heap->extract();$hi=$heap->extract();foreach($lo['data']as&$x)$x='0'.$x;foreach($hi['data']as&$x)$x='1'.$x;$heap->insert($lo['data']+$hi['data'],$lo['priority']+$hi['priority']);}$result=$heap->extract();return$result['data'];}$txt='this is an example for huffman encoding';$symb2freq=array_count_values(str_split($txt));$huff=encode($symb2freq);echo"Symbol\tWeight\tHuffman Code\n";foreach($huffas$sym=>$code)echo"$sym\t$symb2freq[$sym]\t$code\n";?>
- Output:
Symbol Weight Huffman Code n 4 000 m 2 0010 o 2 0011 t 1 01000 g 1 01001 x 1 01010 u 1 01011 s 2 0110 c 1 01110 d 1 01111 p 1 10000 l 1 10001 a 3 1001 6 101 f 3 1100 i 3 1101 r 1 11100 h 2 11101 e 3 1111
Picat
go => huffman("this is an example for huffman encoding"). huffman(LA) :- LS=sort(LA), packList(LS,PL), PLS=sort(PL).remove_dups(), build_tree(PLS, A), coding(A, [], C), SC=sort(C).remove_dups(), println("Symbol\tWeight\tCode"), foreach(SS in SC) print_code(SS) end. build_tree([[V1|R1], [V2|R2]|T], AF) :- V = V1 + V2, A = [V, [V1|R1], [V2|R2]], ( T=[] -> AF=A ; NT=sort([A|T]), build_tree(NT, AF) ). coding([_A,FG,FD], Code, CF) :- ( is_node(FG) -> coding(FG, [0 | Code], C1) ; leaf_coding(FG, [0 | Code], C1) ), ( is_node(FD) -> coding(FD, [1 | Code], C2) ; leaf_coding(FD, [1 | Code], C2) ), append(C1, C2, CF). leaf_coding([FG,FD], Code, CF) :- CodeR = reverse(Code), CF = [[FG, FD, CodeR]] . is_node([_V, _FG, _FD]). print_code([N, Car, Code]) :- printf("%w:\t%w\t", Car, N), foreach(V in Code) print(V) end, nl. packList([], []). packList([X],[[1,X]]). packList([X|Rest], XRunPacked) :- XRunPacked = [XRun|Packed], run(X, Rest, XRun, RRest), packList(RRest, Packed). run(V, [], VV, []) :- VV=[1,V]. run(V, [V|LRest], [N1,V], RRest) :- run(V, LRest, [N, V], RRest), N1 = N + 1. run(V, [Other|RRest], [1,V], [Other|RRest]) :- different_terms(V, Other).
- Output:
Symbol Weight Code c: 1 01010 d: 1 01011 g: 1 01100 l: 1 01101 p: 1 01110 r: 1 01111 t: 1 10000 u: 1 10001 x: 1 11110 h: 2 11111 m: 2 0010 o: 2 0011 s: 2 0100 a: 3 1001 e: 3 1100 f: 3 1101 i: 3 1110 n: 4 000 : 6 101
PicoLisp
Using a cons cells (freq . char) for leaves, and two cells (freq left . right) for nodes.
(de prio (Idx) (while (cadr Idx) (setq Idx @)) (car Idx) ) (let (A NIL P NIL L NIL) (for C (chop "this is an example for huffman encoding") (accu 'A C 1) ) # Count characters (for X A # Build index tree as priority queue (idx 'P (cons (cdr X) (car X)) T) ) (while (or (cadr P) (cddr P)) # Remove entries, insert as nodes (let (A (car (idx 'P (prio P) NIL)) B (car (idx 'P (prio P) NIL))) (idx 'P (cons (+ (car A) (car B)) A B) T) ) ) (setq P (car P)) (recur (P L) # Traverse and print (if (atom (cdr P)) (prinl (cdr P) " " L) (recurse (cadr P) (cons 0 L)) (recurse (cddr P) (cons 1 L)) ) ) )
- Output:
n 000 m 0100 o 1100 s 0010 c 01010 d 11010 g 00110 l 10110 p 01110 r 11110 t 00001 u 10001 a 1001 101 e 0011 f 1011 i 0111 x 01111 h 11111
PL/I
*process source attributes xref or(!); hencode: Proc Options(main); /*-------------------------------------------------------------------- * 28.12.013 Walter Pachl translated from REXX *-------------------------------------------------------------------*/ Dcl debug Bit(1) Init('0'b); Dcl (i,j,k) Bin Fixed(15); Dcl c Char(1); Dcl s Char(100) Var Init('this is an example for huffman encoding'); Dcl sc Char(1000) Var Init(''); Dcl sr Char(100) Var Init(''); Dcl 1 cocc(100), 2 c Char(1), 2 occ Bin Fixed(31); Dcl cocc_n Bin Fixed(15) Init(0); dcl 1 node, 2 id Bin Fixed(15), /* Node id */ 2 c Char(1), /* character */ 2 occ Bin Fixed(15), /* number of occurrences */ 2 left Bin Fixed(15), /* left child */ 2 rite Bin Fixed(15), /* right child */ 2 father Bin Fixed(15), /* father */ 2 digit Pic'9', /* digit (0 or 1) */ 2 term Pic'9'; /* 1=terminal node */ node=''; Dcl 1 m(100) Like node; Dcl m_n Bin Fixed(15) Init(0); Dcl father(100) Bin Fixed(15); Dcl 1 t(100), 2 char Char(1), 2 code Char(20) Var; Dcl t_n Bin Fixed(15) Init(0); Do i=1 To length(s); /* first collect used characters */ c=substr(s,i,1); /* and number of occurrences */ Do j=1 To cocc_n; If cocc(j).c=c Then Leave; End; If j<= cocc_n Then cocc(j).occ+=1; Else Do; cocc(j).c=c; cocc(j).occ=1; cocc_n+=1; End; End; Do j=1 To cocc_n; /* create initial node list */ node.id+=1; node.c=cocc(j).c; node.occ=cocc(j).occ; node.term=1; Call add_node; End; If debug Then Call show; Do While(pairs()); /* while there is more than one fatherless node */ Call mk_node; /* create a father node */ If debug Then Call show; End; Call show; /* show the node table */ Call mk_trans; /* create the translate table */ Put Edit('The translate table:')(Skip,a); Do i=1 To t_n; /* show it */ Put Edit(t(i).char,' -> ',t(i).code)(Skip,a,a,a); End; Call encode; /* encode the string s -> sc */ Put Edit('length(sc)=',length(sc)) /* show it */ (Skip,a,f(3)); Do i=1 By 70 To length(sc); Put Edit(substr(sc,i,70))(Skip,a); End; Call decode; /* decode the string sc -> sr */ Put Edit('input : ',s)(skip,a,a); Put Edit('result: ',sr)(skip,a,a); Return; add_node: Proc; /*-------------------------------------------------------------------- * Insert the node according to increasing occurrences *-------------------------------------------------------------------*/ il: Do i=1 To m_n; If m(i).occ>=node.occ Then Do; Do k=m_n To i By -1; m(k+1)=m(k); End; Leave il; End; End; m(i)=node; m_n+=1; End; show: Proc; /*-------------------------------------------------------------------- * Show the contents of the node table *-------------------------------------------------------------------*/ Put Edit('The list of nodes:')(Skip,a); Put Edit('id c oc l r f d t')(Skip,a); Do i=1 To m_n; Put Edit(m(i).id,m(i).c,m(i).occ, m(i).left,m(i).rite,m(i).father,m(i).digit,m(i).term) (Skip,f(2),x(1),a,4(f(3)),f(2),f(3)); End; End; mk_node: Proc; /*-------------------------------------------------------------------- * construct and store a new intermediate node or the top node *-------------------------------------------------------------------*/ Dcl z Bin Fixed(15); node=''; node.id=m_n+1; /* the next node id */ node.c='*'; ni=m_n+1; loop: Do i=1 To m_n; /* loop over node lines */ If m(i).father=0 Then Do; /* a fatherless node */ z=m(i).id; /* its id */ If node.left=0 Then Do; /* new node has no left child */ node.left=z; /* make this the lect child */ node.occ=m(i).occ; /* occurrences */ m(i).father=ni; /* store father info */ m(i).digit=0; /* digit 0 to be used */ father(z)=ni; /* remember z's father (redundant) */ End; Else Do; /* New node has already left child */ node.rite=z; /* make this the right child */ node.occ=node.occ+m(i).occ; /* add in the occurrences */ m(i).father=ni; /* store father info */ m(i).digit=1; /* digit 1 to be used */ father(z)=ni; /* remember z's father (redundant) */ Leave loop; End; End; End; Call add_node; End; pairs: Proc Returns(Bit(1)); /*-------------------------------------------------------------------- * Return true if there are at least 2 fatherless nodes *-------------------------------------------------------------------*/ Dcl i Bin Fixed(15); Dcl cnt Bin Fixed(15) Init(0); Do i=1 To m_n; If m(i).father=0 Then Do; cnt+=1; If cnt>1 Then Return('1'b); End; End; Return('0'b); End; mk_trans: Proc; /*-------------------------------------------------------------------- * Compute the codes for all terminal nodes (characters) * and store the relation char -> code in array t(*) *-------------------------------------------------------------------*/ Dcl (i,fi,fid,fidz,node,z) Bin Fixed(15); Dcl code Char(20) Var; Do i=1 To m_n; /* now we loop over all lines representing nodes */ If m(i).term Then Do; /* for each terminal node */ code=m(i).digit; /* its digit is the last code digit */ node=m(i).id; /* its id */ Do fi=1 To 1000; /* actually Forever */ fid=father(node); /* id of father */ If fid>0 Then Do; /* father exists */ fidz=zeile(fid); /* line that contains the father */ code=m(fidz).digit!!code; /* prepend the digit */ node=fid; /* look for next father */ End; Else /* no father (we reached the top */ Leave; End; If length(code)>1 Then /* more than one character in input */ code=substr(code,2); /* remove the the top node's 0 */ call dbg(m(i).c!!' -> '!!code); /* character is encoded this way*/ ti_loop: Do ti=1 To t_n; If t(ti).char>m(i).c Then Do; Do tj=t_n To ti By -1 t(tj+1)=t(tj); End; Leave ti_loop; End; End; t(ti).char=m(i).c; t(ti).code=code; t_n+=1; Call dbg(t(ti).char!!' -> '!!t(ti).code); End; End; End; zeile: Proc(nid) Returns(Bin Fixed(15)); /*-------------------------------------------------------------------- * find and return line number containing node-id *-------------------------------------------------------------------*/ Dcl (nid,i) Bin Fixed(15); do i=1 To m_n; If m(i).id=nid Then Return(i); End; Stop; End; dbg: Proc(txt); /*-------------------------------------------------------------------- * Show text if debug is enabled *-------------------------------------------------------------------*/ Dcl txt Char(*); If debug Then Put Skip List(txt); End; encode: Proc; /*-------------------------------------------------------------------- * encode the string s -> sc *-------------------------------------------------------------------*/ Dcl (i,j) Bin Fixed(15); Do i=1 To length(s); c=substr(s,i,1); Do j=1 To t_n; If c=t(j).char Then Leave; End; sc=sc!!t(j).code; End; End; decode: Proc; /*-------------------------------------------------------------------- * decode the string sc -> sr *-------------------------------------------------------------------*/ Dcl (i,j) Bin Fixed(15); Do While(sc>''); Do j=1 To t_n; If substr(sc,1,length(t(j).code))=t(j).code Then Leave; End; sr=sr!!t(j).char; sc=substr(sc,length(t(j).code)+1); End; End; End;
- Output:
The list of nodes: id c oc l r f d t 19 g 1 0 0 20 0 1 18 d 1 0 0 20 1 1 17 c 1 0 0 21 0 1 16 u 1 0 0 21 1 1 15 r 1 0 0 22 0 1 12 l 1 0 0 22 1 1 11 p 1 0 0 23 0 1 9 x 1 0 0 23 1 1 1 t 1 0 0 24 0 1 23 * 2 11 9 24 1 0 22 * 2 15 12 25 0 0 21 * 2 17 16 25 1 0 20 * 2 19 18 26 0 0 14 o 2 0 0 26 1 1 10 m 2 0 0 27 0 1 4 s 2 0 0 27 1 1 2 h 2 0 0 28 0 1 24 * 3 1 23 28 1 0 13 f 3 0 0 29 0 1 8 e 3 0 0 29 1 1 6 a 3 0 0 30 0 1 3 i 3 0 0 30 1 1 27 * 4 10 4 31 0 0 26 * 4 20 14 31 1 0 25 * 4 22 21 32 0 0 7 n 4 0 0 32 1 1 28 * 5 2 24 33 0 0 30 * 6 6 3 33 1 0 29 * 6 13 8 34 0 0 5 6 0 0 34 1 1 32 * 8 25 7 35 0 0 31 * 8 27 26 35 1 0 33 * 11 28 30 36 0 0 34 * 12 29 5 36 1 0 35 * 16 32 31 37 0 0 36 * 23 33 34 37 1 0 37 * 39 35 36 0 0 0 The translate table: -> 111 a -> 1010 c -> 00010 d -> 01101 e -> 1101 f -> 1100 g -> 01100 h -> 1000 i -> 1011 l -> 00001 m -> 0100 n -> 001 o -> 0111 p -> 100110 r -> 00000 s -> 0101 t -> 10010 u -> 00011 x -> 100111 length(sc)=157 1001010001011010111110110101111101000111111011001111010010010011000001 1101111110001110000011110000001111001100010010100011111101001000100111 01101101100101100 input : this is an example for huffman encoding result: this is an example for huffman encoding
PowerShell
functionGet-HuffmanEncodingTable($String){# Create leaf nodes$ID=0$Nodes=[char[]]$String|Group-Object|ForEach{$ID++;$_}|Select @{Label='Symbol';Expression={$_.Name}},@{Label='Count';Expression={$_.Count}},@{Label='ID';Expression={$ID}},@{Label='Parent';Expression={0}},@{Label='Code';Expression={''}}# Grow stems under leafsForEach($Branchin2..($Nodes.Count)){# Get the two nodes with the lowest count$LowNodes=$Nodes|Where Parent-eq0|Sort Count|Select -First2# Create a new stem node$ID++$Nodes+=''|Select @{Label='Symbol';Expression={''}},@{Label='Count';Expression={$LowNodes[0].Count+$LowNodes[1].Count}},@{Label='ID';Expression={$ID}},@{Label='Parent';Expression={0}},@{Label='Code';Expression={''}}# Put the two nodes in the new stem node$LowNodes[0].Parent=$ID$LowNodes[1].Parent=$ID# Assign 0 and 1 to the left and right nodes$LowNodes[0].Code='0'$LowNodes[1].Code='1'}# Assign coding to nodesForEach($Nodein$Nodes[($Nodes.Count-2)..0]){$Node.Code=($Nodes|Where ID-eq$Node.Parent).Code+$Node.Code}$EncodingTable=$Nodes|Where {$_.Symbol}|Select Symbol,Code|Sort Symbolreturn$EncodingTable}# Get table for given string$String="this is an example for huffman encoding"$HuffmanEncodingTable=Get-HuffmanEncodingTable$String# Display table$HuffmanEncodingTable|Format-Table-AutoSize# Encode string$EncodedString=$StringForEach($Nodein$HuffmanEncodingTable){$EncodedString=$EncodedString.Replace($Node.Symbol,$Node.Code)}$EncodedString
- Output:
Symbol Code ------ ---- 101 a 1100 c 01011 d 01100 e 1101 f 1110 g 01110 h 11111 i 1001 l 11110 m 0011 n 000 o 0100 p 10001 r 01111 s 0010 t 01010 u 01101 x 10000 0101011111100100101011001001010111000001011101100001100001110001111101101101111001000111110111111011011110111000111100000101110100001011010001100100100001110
Prolog
Works with SWI-Prolog
huffman:-L='this is an example for huffman encoding',atom_chars(L,LA),msort(LA,LS),packList(LS,PL),sort(PL,PLS),build_tree(PLS,A),coding(A,[],C),sort(C,SC),format('Symbol~t Weight~t~30|Code~n'),maplist(print_code,SC).build_tree([[V1|R1],[V2|R2]|T],AF):-VisV1+V2,A=[V,[V1|R1],[V2|R2]],(T=[]->AF=A;sort([A|T],NT),build_tree(NT,AF)).coding([_A,FG,FD],Code,CF):-(is_node(FG)->coding(FG,[0|Code],C1);leaf_coding(FG,[0|Code],C1)),(is_node(FD)->coding(FD,[1|Code],C2);leaf_coding(FD,[1|Code],C2)),append(C1,C2,CF).leaf_coding([FG,FD],Code,CF):-reverse(Code,CodeR),CF=[[FG,FD,CodeR]].is_node([_V,_FG,_FD]).print_code([N,Car,Code]):-format('~w :~t~w~t~30|',[Car,N]),forall(member(V,Code),write(V)),nl.packList([],[]).packList([X],[[1,X]]):-!.packList([X|Rest],[XRun|Packed]):-run(X,Rest,XRun,RRest),packList(RRest,Packed).run(V,[],[1,V],[]).run(V,[V|LRest],[N1,V],RRest):-run(V,LRest,[N,V],RRest),N1isN+1.run(V,[Other|RRest],[1,V],[Other|RRest]):-dif(V,Other).
- Output:
?- huffman. Symbol Weight Code c : 1 01010 d : 1 01011 g : 1 01100 l : 1 01101 p : 1 01110 r : 1 01111 t : 1 10000 u : 1 10001 x : 1 11110 h : 2 11111 m : 2 0010 o : 2 0011 s : 2 0100 a : 3 1001 e : 3 1100 f : 3 1101 i : 3 1110 n : 4 000 : 6 101
PureBasic
OpenConsole()SampleString.s="this is an example for huffman encoding"datalen=Len(SampleString)Structureztreelinked.cischar.cchar.cnumber.lleft.lright.lEndStructureDimmemc.c(datalen)CopyMemory(@SampleString,@memc(0),datalen*SizeOf(Character))Dimtree.ztree(255)Fori=0Todatalen-1tree(memc(i))\char=memc(i)tree(memc(i))\number+1tree(memc(i))\ischar=1NextSortStructuredArray(tree(),#PB_Sort_Descending,OffsetOf(ztree\number),#PB_Integer)Fori=0To255Iftree(i)\number=0ReDimtree(i-1)BreakEndIfNextdimsize=ArraySize(tree())Repeatmin1.l=0min2.l=0Fori=0TodimsizeIftree(i)\linked=0Iftree(i)\number<min1Ormin1=0min1=tree(i)\numberhmin1=iElseIftree(i)\number<min2Ormin2=0min2=tree(i)\numberhmin2=iEndIfEndIfNextIfmin1=0Ormin2=0BreakEndIfdimsize+1ReDimtree(dimsize)tree(dimsize)\number=tree(hmin1)\number+tree(hmin2)\numbertree(hmin1)\left=dimsizetree(hmin2)\right=dimsizetree(hmin1)\linked=1tree(hmin2)\linked=1ForEveri=0Whiletree(i)\ischar=1str.s=""k=iZNEXT:Iftree(k)\left<>0str="0"+strk=tree(k)\leftGotoZNEXTElseIftree(k)\right<>0str="1"+strk=tree(k)\rightGotoZNEXTEndIfPrintN(Chr(tree(i)\char)+" "+str)i+1WendInput()CloseConsole()
- Output:
110 n 000 e 1010 f 1001 a 1011 i 1110 h 0010 s 11111 o 0011 m 0100 x 01010 u 01011 l 01100 r 01101 c 01110 g 01111 p 10000 t 10001 d 11110
Python
A slight modification of the method outlined in the task description allows the code to be accumulated as the heap is manipulated.
The output is sorted first on length of the code, then on the symbols.
fromheapqimportheappush,heappop,heapifyfromcollectionsimportdefaultdictdefencode(symb2freq):"""Huffman encode the given dict mapping symbols to weights"""heap=[[wt,[sym,""]]forsym,wtinsymb2freq.items()]heapify(heap)whilelen(heap)>1:lo=heappop(heap)hi=heappop(heap)forpairinlo[1:]:pair[1]='0'+pair[1]forpairinhi[1:]:pair[1]='1'+pair[1]heappush(heap,[lo[0]+hi[0]]+lo[1:]+hi[1:])returnsorted(heappop(heap)[1:],key=lambdap:(len(p[-1]),p))txt="this is an example for huffman encoding"symb2freq=defaultdict(int)forchintxt:symb2freq[ch]+=1# in Python 3.1+:# symb2freq = collections.Counter(txt)huff=encode(symb2freq)print"Symbol\tWeight\tHuffman Code"forpinhuff:print"%s\t%s\t%s"%(p[0],symb2freq[p[0]],p[1])
- Output:
Symbol Weight Huffman Code 6 101 n 4 010 a 3 1001 e 3 1100 f 3 1101 h 2 0001 i 3 1110 m 2 0010 o 2 0011 s 2 0111 g 1 00000 l 1 00001 p 1 01100 r 1 01101 t 1 10000 u 1 10001 x 1 11110 c 1 111110 d 1 111111
An extension to the method outlined above is given here.
Alternative
This implementation creates an explicit tree structure, which is used during decoding. We also make use of a "pseudo end of file" symbol and padding bits to facilitate reading and writing encoded data to from/to a file.
"""Huffman encoding and decoding. Requires Python >= 3.7."""from__future__importannotationsfromcollectionsimportCounterfromheapqimportheapifyfromheapqimportheappushfromheapqimportheappopfromitertoolsimportchainfromitertoolsimportislicefromtypingimportBinaryIOfromtypingimportDictfromtypingimportIterablefromtypingimportOptionalfromtypingimportTupleLEFT_BIT="0"RIGHT_BIT="1"WORD_SIZE=8# Assumed to be a multiple of 8.READ_SIZE=WORD_SIZE//8P_EOF=1<<WORD_SIZEclassNode:"""Huffman tree node."""def__init__(self,weight:int,symbol:Optional[int]=None,left:Optional[Node]=None,right:Optional[Node]=None,):self.weight=weightself.symbol=symbolself.left=leftself.right=rightdefis_leaf(self)->bool:"""Return `True` if this node is a leaf node, or `False` otherwise."""returnself.leftisNoneandself.rightisNonedef__lt__(self,other:Node)->bool:returnself.weight<other.weightdefhuffman_tree(weights:Dict[int,int])->Node:"""Build a prefix tree from a map of symbol frequencies."""heap=[Node(v,k)fork,vinweights.items()]heapify(heap)# Pseudo end-of-file with a weight of 1.heappush(heap,Node(1,P_EOF))whilelen(heap)>1:left,right=heappop(heap),heappop(heap)node=Node(weight=left.weight+right.weight,left=left,right=right)heappush(heap,node)returnheappop(heap)defhuffman_table(tree:Node)->Dict[int,str]:"""Build a table of prefix codes by visiting every leaf node in `tree`."""codes:Dict[int,str]={}defwalk(node:Optional[Node],code:str=""):ifnodeisNone:returnifnode.is_leaf():assertnode.symbolcodes[node.symbol]=codereturnwalk(node.left,code+LEFT_BIT)walk(node.right,code+RIGHT_BIT)walk(tree)returncodesdefhuffman_encode(data:bytes)->Tuple[Iterable[bytes],Node]:"""Encode the given byte string using Huffman coding. Returns the encoded byte stream and the Huffman tree required to decode those bytes. """weights=Counter(data)tree=huffman_tree(weights)table=huffman_table(tree)return_encode(data,table),treedefhuffman_decode(data:Iterable[bytes],tree:Node)->bytes:"""Decode the given byte stream using a Huffman tree."""returnbytes(_decode(_bits_from_bytes(data),tree))def_encode(stream:Iterable[int],codes:Dict[int,str])->Iterable[bytes]:bits=chain(chain.from_iterable(codes[s]forsinstream),codes[P_EOF])# Pack bits (stream of 1s and 0s) one word at a time.whileTrue:word="".join(islice(bits,WORD_SIZE))# Most significant bit first.ifnotword:break# Pad last bits if they don't align to a whole word.iflen(word)<WORD_SIZE:word=word.ljust(WORD_SIZE,"0")# Byte order becomes relevant when READ_SIZE > 1.yieldint(word,2).to_bytes(READ_SIZE,byteorder="big",signed=False)def_decode(bits:Iterable[str],tree:Node)->Iterable[int]:node=treeforbitinbits:ifbit==LEFT_BIT:assertnode.leftnode=node.leftelse:assertnode.rightnode=node.rightifnode.symbol==P_EOF:breakifnode.is_leaf():assertnode.symbolyieldnode.symbolnode=tree# Back to the top of the tree.def_word_to_bits(word:bytes)->str:"""Return the binary representation of a word given as a byte string, including leading zeros up to WORD_SIZE. For example, when WORD_SIZE is 8: _word_to_bits(b'65') == '01000001' """i=int.from_bytes(word,"big")returnbin(i)[2:].zfill(WORD_SIZE)def_bits_from_file(file:BinaryIO)->Iterable[str]:"""Generate a stream of bits (strings of either "0" or "1") from file-like object `file`, opened in binary mode."""word=file.read(READ_SIZE)whileword:yield from_word_to_bits(word)word=file.read(READ_SIZE)def_bits_from_bytes(stream:Iterable[bytes])->Iterable[str]:"""Generate a stream of bits (strings of either "0" or "1") from an iterable of single byte byte-strings."""returnchain.from_iterable(_word_to_bits(byte)forbyteinstream)defmain():"""Example usage."""s="this is an example for huffman encoding"data=s.encode()# Need a byte stringencoded,tree=huffman_encode(data)# Pretty print the Huffman tableprint(f"Symbol Code\n------ ----")fork,vinsorted(huffman_table(tree).items(),key=lambdax:len(x[1])):print(f"{chr(k):<6}{v}")# Print the bit pattern of the encoded dataprint("".join(_bits_from_bytes(encoded)))# Encode then decodedecoded=huffman_decode(*huffman_encode(data))print(decoded.decode())if__name__=="__main__":main()
- Output:
Symbol Code ------ ---- n 000 110 m 0010 h 0101 i 1001 f 1010 e 1011 a 1110 r 00110 l 00111 c 01000 u 01001 x 01100 d 01101 t 01110 p 01111 Ā 10000 g 10001 o 11110 s 11111 011100101100111111110100111111110111000011010110110011100010011110011110111101010111100011011001010100110101010001011100001101011000010001111001101100100010001100000000 this is an example for huffman encoding
Quackery
To use this code you will need to load the higher-order words defined at Higher-order functions#Quackery and the priority queue words defined at Priority queue#Quackery.
The word huffmantree
takes a string and generates a tree from it suitable for Huffman decoding. To decode a single character, start with the whole tree and either 0 peek
or 1 peek
according to the next bit in the compressed stream until you reach a number (ascii character code.)
The word huffmanlist
will turn the Huffman tree into a nest of nests, each containing an ascii character code and a nest containing a Huffman code. The nests are sorted by ascii character code to facilitate binary splitting.
[ 2dup peek 1+ unrot poke ] is itemincr ( [ n --> [ ) [ [ 0 128 of ] constant swap witheach itemincr ' [ i^ join ] map ' [ 0 peek ] filter ] is countchars ( $ --> [ ) [ 0 peek dip [ 0 peek ] < ] is fewerchars ( [ [ --> b ) [ behead rot behead rot + unrot dip nested nested join join ] is makenode ( [ [ --> [ ) [ [ dup pqsize 1 > while frompq dip frompq makenode topq again ] frompq nip 0 pluck drop ] is maketree ( [ --> [ ) [ countchars pqwith fewerchars maketree ] is huffmantree ( $ --> [ ) [ stack ] is path.hfl ( --> s ) [ stack ] is list.hfl ( --> s ) forward is makelist ( [ --> ) [ dup size 1 = iff [ 0 peek path.hfl behead drop nested join nested list.hfl take join list.hfl put ] done unpack 1 path.hfl put makelist 0 path.hfl replace makelist path.hfl release ] resolves makelist ( [ --> ) [ sortwith [ 0 peek swap 0 peek < ] ] is charsort ( [ --> [ ) [ [] list.hfl put makelist list.hfl take charsort ] is huffmanlist ( [ --> [ ) [ sortwith [ 1 peek size swap 1 peek size < ] ] is codesort ( [ --> [ ) [ witheach [ unpack swap say ' "' emit say '" ' echo cr ] ] is echohuff ( [ --> [ ) $ "this is an example for huffman encoding" huffmantree huffmanlist say " Huffman codes sorted by character." cr dup echohuff cr say " Huffman codes sorted by code length." cr codesort echohuff
- Output:
Huffman codes sorted by character. " " [ 1 1 1 ] "a" [ 1 0 0 1 ] "c" [ 1 0 0 0 1 ] "d" [ 0 0 1 1 0 ] "e" [ 1 0 1 1 ] "f" [ 1 1 0 1 ] "g" [ 1 0 1 0 1 0 ] "h" [ 0 0 0 1 ] "i" [ 1 1 0 0 ] "l" [ 1 0 0 0 0 ] "m" [ 0 1 0 0 ] "n" [ 0 1 1 ] "o" [ 0 1 0 1 ] "p" [ 1 0 1 0 1 1 ] "r" [ 1 0 1 0 0 ] "s" [ 0 0 0 0 ] "t" [ 0 0 1 1 1 ] "u" [ 0 0 1 0 0 ] "x" [ 0 0 1 0 1 ] Huffman codes sorted by code length. " " [ 1 1 1 ] "n" [ 0 1 1 ] "a" [ 1 0 0 1 ] "e" [ 1 0 1 1 ] "f" [ 1 1 0 1 ] "h" [ 0 0 0 1 ] "i" [ 1 1 0 0 ] "m" [ 0 1 0 0 ] "o" [ 0 1 0 1 ] "s" [ 0 0 0 0 ] "c" [ 1 0 0 0 1 ] "d" [ 0 0 1 1 0 ] "l" [ 1 0 0 0 0 ] "r" [ 1 0 1 0 0 ] "t" [ 0 0 1 1 1 ] "u" [ 0 0 1 0 0 ] "x" [ 0 0 1 0 1 ] "g" [ 1 0 1 0 1 0 ] "p" [ 1 0 1 0 1 1 ]
Racket
#lang racket(requiredata/heapdata/bit-vector);; A node is either an interior, or a leaf.;; In either case, they record an item with an associated frequency.(structnode(freq)#:transparent)(structinteriornode(leftright)#:transparent)(structleafnode(val)#:transparent);; node<=?: node node -> boolean;; Compares two nodes by frequency.(define(node<=?xy)(<=(node-freqx)(node-freqy)));; make-huffman-tree: (listof leaf) -> interior-node(define(make-huffman-treeleaves)(definea-heap(make-heapnode<=?))(heap-add-all!a-heapleaves)(for([i(sub1(lengthleaves))])(definemin-1(heap-mina-heap))(heap-remove-min!a-heap)(definemin-2(heap-mina-heap))(heap-remove-min!a-heap)(heap-add!a-heap(interior(+(node-freqmin-1)(node-freqmin-2))min-1min-2)))(heap-mina-heap));; string->huffman-tree: string -> node;; Given a string, produces its huffman tree. The leaves hold the characters;; and their relative frequencies.(define(string->huffman-treestr)(defineht(make-hash))(definen(sequence-lengthstr))(for([chstr])(hash-update!htchadd1(λ()0)))(make-huffman-tree(for/list([(kv)(in-hashht)])(leaf(/vn)k))));; make-encoder: node -> (string -> bit-vector);; Given a huffman tree, generates the encoder function.(define(make-encodera-tree)(definedict(huffman-tree->dictionarya-tree))(lambda(a-str)(list->bit-vector(applyappend(for/list([cha-str])(hash-refdictch))))));; huffman-tree->dictionary: node -> (hashof val (listof boolean));; A helper for the encoder: maps characters to their code sequences.(define(huffman-tree->dictionarya-node)(defineht(make-hash))(letloop([a-nodea-node][path/rev'()])(cond[(interior?a-node)(loop(interior-lefta-node)(cons#fpath/rev))(loop(interior-righta-node)(cons#tpath/rev))][(leaf?a-node)(hash-set!ht(reversepath/rev)(leaf-vala-node))]))(for/hash([(kv)ht])(valuesvk)));; make-decoder: interior-node -> (bit-vector -> string);; Generates the decoder function from the tree.(define(make-decodera-tree)(lambda(a-bitvector)(define-values(decoded/rev_)(for/fold([decoded/rev'()][a-nodea-tree])([bita-bitvector])(definenext-node(cond[(notbit)(interior-lefta-node)][else(interior-righta-node)]))(cond[(leaf?next-node)(values(cons(leaf-valnext-node)decoded/rev)a-tree)][else(valuesdecoded/revnext-node)])))(applystring(reversedecoded/rev))));;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Example application:;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;(definemsg"this is an example for huffman encoding")(definetree(string->huffman-treemsg));; We can print out the mapping for inspection:(huffman-tree->dictionarytree)(defineencode(make-encodertree))(defineencoded(encodemsg));; Here's what the encoded message looks like:(bit-vector->stringencoded)(definedecode(make-decodertree));; Here's what the decoded message looks like:(decodeencoded)
Raku
(formerly Perl 6)
By building a tree
This version uses nested Array
s to build a tree like shown in this diagram, and then recursively traverses the finished tree to accumulate the prefixes.
subhuffman (%frequencies) { my@queue = %frequencies.map({ [.value, .key] }).sort; while@queue > 1 { given@queue.splice(0, 2) -> ([$freq1, $node1], [$freq2, $node2]) { @queue = (|@queue, [$freq1 + $freq2, [$node1, $node2]]).sort; } } hashgatherwalk@queue[0][1], ''; } multiwalk ($node, $prefix) { take$node => $prefix; } multiwalk ([$node1, $node2], $prefix) { walk$node1, $prefix ~ '0'; walk$node2, $prefix ~ '1'; }
Without building a tree
This version uses an Array
of Pair
s to implement a simple priority queue. Each value of the queue is a Hash
mapping from letters to prefixes, and when the queue is reduced the hashes are merged on-the-fly, so that the last one remaining is the wanted Huffman table.
subhuffman (%frequencies) { my@queue = %frequencies.map: { .value => (hash .key => '') }; while@queue > 1 { @queue.=sort; my$x = @queue.shift; my$y = @queue.shift; @queue.push: ($x.key + $y.key) => hash$x.value.deepmap('0' ~ *), $y.value.deepmap('1' ~ *); } @queue[0].value; } # Testingforhuffman'this is an example for huffman encoding'.comb.Bag { say"'{.key}' : {.value}"; } # To demonstrate that the table can do a round trip:say''; my$original = 'this is an example for huffman encoding'; my%encode-key = huffman$original.comb.Bag; my%decode-key = %encode-key.invert; my@codes = %decode-key.keys; my$encoded = $original.subst: /./, { %encode-key{$_} }, :g; my$decoded = $encoded .subst: /@codes/, { %decode-key{$_} }, :g; .sayfor$original, $encoded, $decoded;
- Output:
'x' : 11000 'p' : 01100 'h' : 0001 'g' : 00000 'a' : 1001 'e' : 1101 'd' : 110011 's' : 0111 'f' : 1110 'c' : 110010 'm' : 0010 ' ' : 101 'n' : 010 'o' : 0011 'u' : 10001 't' : 10000 'i' : 1111 'r' : 01101 'l' : 00001 this is an example for huffman encoding 1000000011111011110111110111101100101010111011100010010010011000000111011011110001101101101000110001111011100010100101010111010101100100011110011111101000000 this is an example for huffman encoding
Red
Red[file:%huffy.red];; message to encode:msg:"this is an example for huffman encoding";;map to collect leave knots per uniq character of messagem:makemap![]knot:makeobject![left:right:none;; pointer to left/right siblingcode:none;; first holds char for debugging, later binary codecount:depth:1;;occurence of character - length of branch];;-----------------------------------------set-code:func["recursive function to generate binary code sequence"wknotwcode[string!]][;;-----------------------------------------eitherwknot/left=none[wknot/code:wcode][set-codewknot/leftrejoin[wcode"1"]set-codewknot/rightrejoin[wcode"0"]]];;-- end func;-------------------------------merge-2knots:func["function to merge 2 knots into 1 new"t[block!]][;-------------------------------nknot:copyknot;; create new knotnknot/count:t/1/count+t/2/countnknot/right:t/1nknot/left:t/2nknot/depth:t/1/depth+1tab:remove/partt2;; delete first 2 knotsinserttnknot;; insert new generated knot];;-- end func;; count occurence of characters, save in map: m foreachchrmsg[eitherk:select/casemchr[k/count:k/count+1][put/casemchrnknot:copyknotnknot/code:chr]];; create sortable block (=tab) for use as prio queueforeachkkeys-ofm[appendtab:[]:m/:k];; build tree while[1<length?tab][sort/comparetabfunction[ab][a/count<b/countor(a/count=b/countand(a/depth>b/depth))]merge-2knotstab;; merge 2 knots with lowest count / max depth]set-codetab/1"";; generate binary codes, save at leave knot;; display codesforeachksortkeys-ofm[print[k" = "m/:k/code]appendcodes:""m/:k/code];; encode orig message stringforeachchrmsg[k:select/casemchrappendmsg-new:""k/code]print["length of encoded msg "length?msg-new]print["length of (binary) codes "length?codes]print["orig. message: "msgnewline"encoded message: ""^/"msg-new]prin"decoded: ";; decode message (destructive! ):while[notempty?msg-new][foreach[kv]body-ofm[ift:find/matchmsg-newv/code[prinkmsg-new:t]]]
- Output:
= 111 a = 1101 c = 00101 d = 00100 e = 1011 f = 1100 g = 10010 h = 1000 i = 1010 l = 00000 m = 0001 n = 011 o = 0101 p = 00001 r = 00111 s = 0100 t = 100111 u = 100110 x = 00110 length of encoded msg 157 length of (binary) codes 85 orig. message: this is an example for huffman encoding encoded message: 1001111000101001001111010010011111010111111011001101101000100001000001011111110001010011111110001001101100110000011101011111101101100101010100100101001110010 decoded: this is an example for huffman encoding
REXX
/* REXX ---------------------------------------------------------------* 27.12.2013 Walter Pachl* 29.12.2013 -"- changed for test of s=xrange('00'x,'ff'x)* 14.03.2018 -"- use format instead of right to diagnose size poblems* Stem m contains eventually the following node data* m.i.0id Node id* m.i.0c character* m.i.0o number of occurrences* m.i.0l left child* m.i.0r right child* m.i.0f father* m.i.0d digit (0 or 1)* m.i.0t 1=a terminal node 0=an intermediate or the top node*--------------------------------------------------------------------*/ParseArgs Ifs=''Thens='this is an example for huffman encoding'Say'We encode this string:'Says debug=0 o.=0 c.=0 codel.=0 code.='' father.=0 cl=''/* list of characters */doi=1Tolength(s)Callmemorizesubstr(s,i,1)EndIfdebugThenDoDoi=1Toc.0c=c.i Sayico.c EndEnd n.=0Doi=1Toc.0c=c.i n.i.0c=c n.i.0o=o.c n.i.0id=i Calldbgin.i.0idn.i.0cn.i.0o End n=c.0/* number of nodes */ m.=0Doi=1Ton/* construct initial array */Doj=1Tom.0/* sorted by occurrences */Ifm.j.0o>n.i.0oThenLeaveEndDok=m.0TojBy-1k1=k+1m.k1.0id=m.k.0id m.k1.0c=m.k.0c m.k1.0o=m.k.0o m.k1.0t=m.k.0t Endm.j.0id=i m.j.0c=n.i.0c m.j.0o=n.i.0o m.j.0t=1m.0=m.0+1EndIfdebugThenCallshow DoWhilepairs()>1/* while there are at least 2 fatherless nodes */Callmknode/* create and fill a new father node */IfdebugThenCallshow EndCallshow c.=0Doi=1Tom.0/* now we loop over all lines representing nodes */Ifm.i.0tThenDo/* for each terminal node */code=m.i.0d/* its digit is the last code digit */node=m.i.0id/* its id */Dofi=1To1000/* actually Forever */fid=father.node/* id of father */Iffid<>0ThenDo/* father exists */fidz=zeile(fid)/* line that contains the father */code=m.fidz.0d||code/* prepend the digit */node=fid/* look for next father */EndElse/* no father (we reached the top */LeaveEndIflength(code)>1Then/* more than one character in input */code=substr(code,2)/* remove the the top node's 0 */calldbgm.i.0c'->'code/* character is encoded this way */char=m.i.0c code.char=code z=codel.0+1codel.z=code codel.0=z char.code=char EndEndCallshow_char2code/* show used characters and corresponding codes */ codes.=0/* now we build the array of codes/characters */Doj=1Tocodel.0z=codes.0+1code=codel.j codes.z=code chars.z=char.code codes.0=z Calldbgcodes.z'----->'chars.z End sc=''/* here we ecnode the string */Doi=1Tolength(s)/* loop over input */c=substr(s,i,1)/* a character */sc=sc||code.c/* append the corresponding code */EndSay'Length of encoded string:'length(sc)Doi=1Tolength(sc)by70Saysubstr(sc,i,70)End sr=''/* now decode the string */Dosi=1To999Whilesc<>''Doi=codes.0To1By-1/* loop over codes */cl=length(codes.i)/* length of code */Ifleft(sc,cl)==codes.iThenDo/* found on top of string */sr=sr||chars.i/* append character to result */sc=substr(sc,cl+1)/* cut off the used code */Leave/* this was one character */EndEndEndSay'Input ="'s'"'Say'result="'sr'"'Exitshow:/*---------------------------------------------------------------------* show all lines representing node data*--------------------------------------------------------------------*/Say' i pp id c f l r d'Doi=1Tom.0Sayformat(i,3)format(m.i.0o,4)format(m.i.0id,3),format(m.i.0f,3)format(m.i.0l,3)format(m.i.0r,3)m.i.0dm.i.0t EndCalldbgcopies('-',21)Returnpairs:ProcedureExposem./*---------------------------------------------------------------------* return number of fatherless nodes*--------------------------------------------------------------------*/res=0Doi=1Tom.0Ifm.i.0f=0Thenres=res+1EndReturnres mknode:/*---------------------------------------------------------------------* construct and store a new intermediate or the top node*--------------------------------------------------------------------*/ new.=0 ni=m.0+1/* the next node id */Doi=1Tom.0/* loop over node lines */Ifm.i.0f=0ThenDo/* a fatherless node */z=m.i.0id/* its id */Ifnew.0l=0ThenDo/* new node has no left child */new.0l=z/* make this the lect child */new.0o=m.i.0o/* occurrences */m.i.0f=ni/* store father info */m.i.0d='0'/* digit 0 to be used */father.z=ni/* remember z's father (redundant) */EndElseDo/* New node has already left child */new.0r=z/* make this the right child */new.0o=new.0o+m.i.0o/* add in the occurrences */m.i.0f=ni/* store father info */m.i.0d=1/* digit 1 to be used */father.z=ni/* remember z's father (redundant) */LeaveEndEndEndDoi=1Tom.0/* Insert new node according to occurrences */Ifm.i.0o>=new.0oThenDoDok=m.0ToiBy-1k1=k+1m.k1.0id=m.k.0id m.k1.0o=m.k.0o m.k1.0c=m.k.0c m.k1.0l=m.k.0l m.k1.0r=m.k.0r m.k1.0f=m.k.0f m.k1.0d=m.k.0d m.k1.0t=m.k.0t EndLeaveEndEnd m.i.0id=ni m.i.0c='*' m.i.0o=new.0o m.i.0l=new.0l m.i.0r=new.0r m.i.0t=0 father.ni=0 m.0=ni Returnzeile:/*---------------------------------------------------------------------* find and return line number containing node-id*--------------------------------------------------------------------*/dofidz=1Tom.0Ifm.fidz.0id=arg(1)ThenReturnfidz EndCalldbgarg(1)'not found'Pull.dbg:/*---------------------------------------------------------------------* Show text if debug is enabled*--------------------------------------------------------------------*/Ifdebug=1ThenSayarg(1)Returnmemorize:ProcedureExposec.o./*---------------------------------------------------------------------* store characters and corresponding occurrences*--------------------------------------------------------------------*/ParseArgc Ifo.c=0ThenDoz=c.0+1c.z=c c.0=z Endo.c=o.c+1Returnshow_char2code:/*---------------------------------------------------------------------* show used characters and corresponding codes*--------------------------------------------------------------------*/ cl=xrange('00'x,'ff'x)Say'char --> code'DoWhilecl<>''ParseVarclc+1cl Ifcode.c<>''ThenSay' 'c'-->'code.c EndReturn
- Output:
We encode this string: this is an example for huffman encoding i pp id c f l r d 1 1 1 20 0 0 0 1 2 1 9 20 0 0 1 1 3 1 11 21 0 0 0 1 4 1 12 21 0 0 1 1 5 1 15 22 0 0 0 1 6 1 16 22 0 0 1 1 7 1 17 23 0 0 0 1 8 1 18 23 0 0 1 1 9 1 19 24 0 0 0 1 10 2 23 24 17 18 1 0 11 2 22 25 15 16 0 0 12 2 21 25 11 12 1 0 13 2 20 26 1 9 0 0 14 2 2 26 0 0 1 1 15 2 4 27 0 0 0 1 16 2 10 27 0 0 1 1 17 2 14 28 0 0 0 1 18 3 24 28 19 23 1 0 19 3 3 29 0 0 0 1 20 3 6 29 0 0 1 1 21 3 8 30 0 0 0 1 22 3 13 30 0 0 1 1 23 4 27 31 4 10 0 0 24 4 26 31 20 2 1 0 25 4 25 32 22 21 0 0 26 4 7 32 0 0 1 1 27 5 28 33 14 24 0 0 28 6 30 33 8 13 1 0 29 6 29 34 3 6 0 0 30 6 5 34 0 0 1 1 31 8 32 35 25 7 0 0 32 8 31 35 27 26 1 0 33 11 33 36 28 30 0 0 34 12 34 36 29 5 1 0 35 16 35 37 32 31 0 0 36 23 36 37 33 34 1 0 37 39 37 0 35 36 0 0 char --> code --> 111 a --> 1101 c --> 100110 d --> 100111 e --> 1010 f --> 1011 g --> 10010 h --> 0111 i --> 1100 l --> 00011 m --> 0101 n --> 001 o --> 1000 p --> 00010 r --> 00000 s --> 0100 t --> 01100 u --> 00001 x --> 01101 Length of encoded string: 157 0110001111100010011111000100111110100111110100110111010101000100001110 1011110111000000001110111000011011101101011101001111101000110011010001 00111110000110010 Input ="this is an example for huffman encoding" result="this is an example for huffman encoding"
Ruby
Uses a
package PriorityQueue
require'priority_queue'defhuffman_encoding(str)char_count=Hash.new(0)str.each_char{|c|char_count[c]+=1}pq=CPriorityQueue.new# chars with fewest count have highest prioritychar_count.each{|char,count|pq.push(char,count)}whilepq.length>1key1,prio1=pq.delete_minkey2,prio2=pq.delete_minpq.push([key1,key2],prio1+prio2)endHash[*generate_encoding(pq.min_key)]enddefgenerate_encoding(ary,prefix="")casearywhenArraygenerate_encoding(ary[0],"#{prefix}0")+generate_encoding(ary[1],"#{prefix}1")else[ary,prefix]endenddefencode(str,encoding)str.each_char.collect{|char|encoding[char]}.joinenddefdecode(encoded,encoding)rev_enc=encoding.invertdecoded=""pos=0whilepos<encoded.lengthkey=""whilerev_enc[key].nil?key<<encoded[pos]pos+=1enddecoded<<rev_enc[key]enddecodedendstr="this is an example for huffman encoding"encoding=huffman_encoding(str)encoding.to_a.sort.each{|x|px}enc=encode(str,encoding)dec=decode(enc,encoding)puts"success!"ifstr==dec
[" ", "111"] ["a", "1011"] ["c", "00001"] ["d", "00000"] ["e", "1101"] ["f", "1100"] ["g", "00100"] ["h", "1000"] ["i", "1001"] ["l", "01110"] ["m", "10101"] ["n", "010"] ["o", "0001"] ["p", "00101"] ["r", "00111"] ["s", "0110"] ["t", "00110"] ["u", "01111"] ["x", "10100"] success!
Rust
Adapted C++ solution.
usestd::collections::BTreeMap;usestd::collections::binary_heap::BinaryHeap;#[derive(Debug, Eq, PartialEq)]enumNodeKind{Internal(Box<Node>,Box<Node>),Leaf(char),}#[derive(Debug, Eq, PartialEq)]structNode{frequency: usize,kind: NodeKind,}implOrdforNode{fncmp(&self,rhs: &Self)-> std::cmp::Ordering{rhs.frequency.cmp(&self.frequency)}}implPartialOrdforNode{fnpartial_cmp(&self,rhs: &Self)-> Option<std::cmp::Ordering>{Some(self.cmp(&rhs))}}typeHuffmanCodeMap=BTreeMap<char,Vec<u8>>;fnmain(){lettext="this is an example for huffman encoding";letmutfrequencies=BTreeMap::new();forchintext.chars(){*frequencies.entry(ch).or_insert(0)+=1;}letmutprioritized_frequencies=BinaryHeap::new();forcounted_charinfrequencies{prioritized_frequencies.push(Node{frequency: counted_char.1,kind: NodeKind::Leaf(counted_char.0),});}whileprioritized_frequencies.len()>1{letleft_child=prioritized_frequencies.pop().unwrap();letright_child=prioritized_frequencies.pop().unwrap();prioritized_frequencies.push(Node{frequency: right_child.frequency+left_child.frequency,kind: NodeKind::Internal(Box::new(left_child),Box::new(right_child)),});}letmutcodes=HuffmanCodeMap::new();generate_codes(prioritized_frequencies.peek().unwrap(),vec![0u8;0],&mutcodes,);foritemincodes{print!("{}: ",item.0);forbitinitem.1{print!("{}",bit);}println!();}}fngenerate_codes(node: &Node,prefix: Vec<u8>,out_codes: &mutHuffmanCodeMap){matchnode.kind{NodeKind::Internal(refleft_child,refright_child)=>{letmutleft_prefix=prefix.clone();left_prefix.push(0);generate_codes(&left_child,left_prefix,out_codes);letmutright_prefix=prefix;right_prefix.push(1);generate_codes(&right_child,right_prefix,out_codes);}NodeKind::Leaf(ch)=>{out_codes.insert(ch,prefix);}}}
Output:
: 110 a: 1001 c: 101010 d: 10001 e: 1111 f: 1011 g: 101011 h: 0101 i: 1110 l: 01110 m: 0011 n: 000 o: 0010 p: 01000 r: 01001 s: 0110 t: 01111 u: 10100 x: 10000
Scala
objectHuffman{importscala.collection.mutable.{Map,PriorityQueue}sealedabstractclassTreecaseclassNode(left:Tree,right:Tree)extendsTreecaseclassLeaf(c:Char)extendsTreedeftreeOrdering(m:Map[Tree,Int])=newOrdering[Tree]{defcompare(x:Tree,y:Tree)=m(y).compare(m(x))}defstringMap(text:String)=textgroupBy(x=>Leaf(x):Tree)mapValues(_.length)defbuildNode(queue:PriorityQueue[Tree],map:Map[Tree,Int]){valright=queue.dequeuevalleft=queue.dequeuevalnode=Node(left,right)map(node)=map(left)+map(right)queue.enqueue(node)}defcodify(tree:Tree,map:Map[Tree,Int])={defrecurse(tree:Tree,prefix:String):List[(Char,(Int,String))]=treematch{caseNode(left,right)=>recurse(left,prefix+"0"):::recurse(right,prefix+"1")caseleaf@Leaf(c)=>c->((map(leaf),prefix))::Nil}recurse(tree,"")}defencode(text:String)={valmap=Map.empty[Tree,Int]++=stringMap(text)valqueue=newPriorityQueue[Tree]()(treeOrdering(map))++=map.keysIteratorwhile(queue.size>1){buildNode(queue,map)}codify(queue.dequeue,map)}defmain(args:Array[String]){valtext="this is an example for huffman encoding"valcode=encode(text)println("Char\tWeight\t\tEncoding")codesortBy(_._2._1)foreach{case(c,(weight,encoding))=>println("%c:\t%3d/%-3d\t\t%s"format(c,weight,text.length,encoding))}}}
- Output:
Char Weight Encoding t: 1/39 011000 p: 1/39 011001 r: 1/39 01101 c: 1/39 01110 x: 1/39 01111 g: 1/39 10110 l: 1/39 10111 u: 1/39 11000 d: 1/39 11001 o: 2/39 1010 s: 2/39 1101 m: 2/39 1110 h: 2/39 1111 f: 3/39 0000 a: 3/39 0001 e: 3/39 0010 i: 3/39 0011 n: 4/39 100 : 6/39 010
Scala (Alternate version)
// this version uses immutable data only, recursive functions and pattern matchingobjectHuffman{sealedtraitTree[+A]caseclassLeaf[A](value:A)extendsTree[A]caseclassBranch[A](left:Tree[A],right:Tree[A])extendsTree[A]// recursively build the binary tree needed to Huffman encode the textdefmerge(xs:List[(Tree[Char],Int)]):List[(Tree[Char],Int)]={if(xs.length==1)xselse{vall=xs.headvalr=xs.tail.headvalmerged=(Branch(l._1,r._1),l._2+r._2)merge((merged::xs.drop(2)).sortBy(_._2))}}// recursively search the branches of the tree for the required characterdefcontains(tree:Tree[Char],char:Char):Boolean=treematch{caseLeaf(c)=>if(c==char)trueelsefalsecaseBranch(l,r)=>contains(l,char)||contains(r,char)}// recursively build the path string required to traverse the tree to the required characterdefencodeChar(tree:Tree[Char],char:Char):String={defgo(tree:Tree[Char],char:Char,code:String):String=treematch{caseLeaf(_)=>codecaseBranch(l,r)=>if(contains(l,char))go(l,char,code+'0')elsego(r,char,code+'1')}go(tree,char,"")}defmain(args:Array[String]){valtext="this is an example for huffman encoding"// transform the text into a list of tuples.// each tuple contains a Leaf node containing a unique character and an Int representing that character's weightvalfrequencies=text.groupBy(chars=>chars).mapValues(group=>group.length).toList.map(x=>(Leaf(x._1),x._2)).sortBy(_._2)// build the Huffman Tree for this textvalhuffmanTree=merge(frequencies).head._1// output the resulting character codesprintln("Char\tWeight\tCode")frequencies.foreach(x=>println(x._1.value+"\t"+x._2+s"/${text.length}"+s"\t${encodeChar(huffmanTree,x._1.value)}"))}}
Char Weight Code x 1/39 01100 t 1/39 01101 u 1/39 00010 g 1/39 00011 l 1/39 00000 p 1/39 00001 c 1/39 100110 r 1/39 100111 d 1/39 10010 s 2/39 0111 m 2/39 0100 h 2/39 0101 o 2/39 1000 e 3/39 1100 f 3/39 1101 a 3/39 1010 i 3/39 1011 n 4/39 001 6/39 111
Scheme
(define(char-freqporttable)(if(eof-object?(peek-charport))table(char-freqport(add-char(read-charport)table))))(define(add-charchartable)(cond((null?table)(list(listchar1)))((eq?(caartable)char)(cons(listchar(+(cadartable)1))(cdrtable)))(#t(cons(cartable)(add-charchar(cdrtable))))))(define(nodeifytable)(map(lambda(x)(listx'()'()))table))(definenode-freqcadar)(define(huffman-treenodes)(let((queue(sortnodes(lambda(xy)(<(node-freqx)(node-freqy))))))(if(null?(cdrqueue))(carqueue)(huffman-tree(cons(list(list'notleaf(+(node-freq(carqueue))(node-freq(cadrqueue))))(carqueue)(cadrqueue))(cddrqueue))))))(define(list-encodingstreechars)(for-each(lambda(c)(format#t"~a:~a~%"c(encodectree)))chars))(define(encodechartree)(cond((null?tree)#f)((eq?(caartree)char)'())(#t(let((left(encodechar(cadrtree)))(right(encodechar(caddrtree))))(cond((not(orleftright))#f)(left(cons#\1left))(right(cons#\0right)))))))(define(decodedigitstree)(cond((not(eq?(caartree)'notleaf))(caartree))((eq?(cardigits)#\0)(decode(cdrdigits)(cadrtree)))(#t(decode(cdrdigits)(caddrtree)))))(defineinput"this is an example for huffman encoding")(definefreq-table(char-freq(open-input-stringinput)'()))(definetree(huffman-tree(nodeifyfreq-table)))(list-encodingstree(mapcarfreq-table))
- Output:
t:(1 0 0 1 1) h:(1 0 0 0) i:(0 0 1 1) s:(1 0 1 1) :(0 0 0) a:(0 0 1 0) n:(1 1 0) e:(0 1 0 1) x:(1 0 0 1 0) m:(1 0 1 0) p:(1 1 1 0 1) l:(1 1 1 0 0) f:(0 1 0 0) o:(0 1 1 1) r:(1 1 1 1 1) u:(1 1 1 1 0) c:(0 1 1 0 0 1) d:(0 1 1 0 0 0) g:(0 1 1 0 1)
SETL
var forest := {}, encTab := {}; plaintext := 'this is an example for huffman encoding'; ft := {}; (for c in plaintext) ft(c) +:= 1; end; forest := {[f, c]: [c, f] in ft}; (while 1 < #forest) [f1, n1] := getLFN(); [f2, n2] := getLFN(); forest with:= [f1+f2, [n1,n2]]; end; addToTable('', arb range forest); (for e = encTab(c)) print(c, ft(c), e); end; print(+/ [encTab(c): c in plaintext]); proc addToTable(prefix, node); if is_tuple node then addToTable(prefix + '0', node(1)); addToTable(prefix + '1', node(2)); else encTab(node) := prefix; end; end proc; proc getLFN(); f := min/ domain forest; n := arb forest{f}; forest less:= [f, n]; return [f, n]; end proc;
Sidef
funcwalk(n,s,h){if(n.contains(:a)){h{n{:a}}=ssay"#{n{:a}}: #{s}"returnnil}walk(n{:0},s+'0',h)walk(n{:1},s+'1',h)}funcmake_tree(text){varletters=Hash()text.each{|c|letters{c}:=0++}varnodes=letters.keys.map{|l|Hash(a=>l,freq=>letters{l})}varn=Hash()while(nodes.sort_by!{|c|c{:freq}}.len>1){n=Hash(:0=>nodes.shift,:1=>nodes.shift)n{:freq}=(n{:0}{:freq}+n{:1}{:freq})nodes.append(n)}walk(n,"",n{:tree}=Hash())returnn}funcencode(s,t){t=t{:tree}s.chars.map{|c|t{c}}.join}funcdecode(enc,tree){varn=treevarout=""enc.each{|bit|n=n{bit}if(n.contains(:a)){out+=n{:a}n=tree}}returnout}vartext="this is an example for huffman encoding"vartree=make_tree(text)varenc=encode(text,tree)sayencsaydecode(enc,tree)
- Output:
n: 000 s: 0010 o: 0011 h: 0100 l: 01010 g: 01011 x: 01100 c: 01101 d: 01110 u: 01111 p: 10000 t: 10001 i: 1001 : 101 f: 1100 a: 1101 e: 1110 r: 11110 m: 11111 1000101001001001010110010010101110100010111100110011011111110000010101110101110000111111010101000111111001100111111101000101111000001101001101110100100001011 this is an example for huffman encoding
Standard ML
datatype'ahuffman_tree=Leafof'a|Nodeof'ahuffman_tree*'ahuffman_treestructureHuffmanPriority=structtypepriority=int(* reverse comparison to achieve min-heap *)funcompare(a,b)=Int.compare(b,a)typeitem=int*charhuffman_treevalpriority:item->int=#1endstructureHPQueue=LeftPriorityQFn(HuffmanPriority)funbuildTreecharFreqs=letfunauxtrees=letval((f1,a),trees)=HPQueue.removetreesinifHPQueue.isEmptytreesthenaelseletval((f2,b),trees)=HPQueue.removetreesvaltrees=HPQueue.insert((f1+f2,Node(a,b)),trees)inauxtreesendendvaltrees=HPQueue.fromList(map(fn(c,f)=>(f,Leafc))charFreqs)inauxtreesendfunprintCodes(revPrefix,Leafc)=print(String.strc^"\t"^implode(revrevPrefix)^"\n")|printCodes(revPrefix,Node(l,r))=(printCodes(#"0"::revPrefix,l);printCodes(#"1"::revPrefix,r));letvaltest="this is an example for huffman encoding"valcharFreqs=HashTable.mkTable(HashString.hashStringoString.str,op=)(42,Empty)val()=app(fnc=>letvalold=getOpt(HashTable.findcharFreqsc,0)inHashTable.insertcharFreqs(c,old+1)end)(explodetest)valtree=buildTree(HashTable.listItemsicharFreqs)inprint"SYMBOL\tHUFFMAN CODE\n";printCodes([],tree)end
Swift
Rather than a priority queue of subtrees, we use the strategy of two sorted lists, one for leaves and one for nodes, and "merge" them as we iterate through them, taking advantage of the fact that any new nodes we create are bigger than any previously created nodes, so go at the end of the nodes list.
enumHuffmanTree<T>{caseLeaf(T)indirectcaseNode(HuffmanTree<T>,HuffmanTree<T>)funcprintCodes(prefix:String){switch(self){caselet.Leaf(c):print("\(c)\t\(prefix)")caselet.Node(l,r):l.printCodes(prefix+"0")r.printCodes(prefix+"1")}}}funcbuildTree<T>(freqs:[(T,Int)])->HuffmanTree<T>{assert(freqs.count>0,"must contain at least one character")// leaves sorted by increasing frequencyletleaves:[(Int,HuffmanTree<T>)]=freqs.sort{(p1,p2)inp1.1<p2.1}.map{(x,w)in(w,.Leaf(x))}// nodes sorted by increasing frequencyvarnodes=[(Int,HuffmanTree<T>)]()// iterate through leaves and nodes in order of increasing frequencyforvari=0,j=0;;{assert(i<leaves.count||j<nodes.count)// get subtree of least frequencyvare1:(Int,HuffmanTree<T>)ifj==nodes.count||i<leaves.count&&leaves[i].0<nodes[j].0{e1=leaves[i]i++}else{e1=nodes[j]j++}// if there's no subtrees left, then that one was the answerifi==leaves.count&&j==nodes.count{returne1.1}// get next subtree of least frequencyvare2:(Int,HuffmanTree<T>)ifj==nodes.count||i<leaves.count&&leaves[i].0<nodes[j].0{e2=leaves[i]i++}else{e2=nodes[j]j++}// create node from two subtreesnodes.append((e1.0+e2.0,.Node(e1.1,e2.1)))}}funcgetFreqs<S:SequenceTypewhereS.Generator.Element:Hashable>(seq:S)->[(S.Generator.Element,Int)]{varfreqs:[S.Generator.Element:Int]=[:]forcinseq{freqs[c]=(freqs[c]??0)+1}returnArray(freqs)}letstr="this is an example for huffman encoding"letcharFreqs=getFreqs(str.characters)lettree=buildTree(charFreqs)print("Symbol\tHuffman code")tree.printCodes("")
- Output:
Symbol Huffman code u 00000 t 00001 d 00010 r 00011 c 00100 l 00101 o 0011 m 0100 s 0101 n 011 h 1000 g 10010 p 100110 x 100111 f 1010 a 1011 i 1100 e 1101 111
Tcl
packagerequireTcl8.5packagerequirestruct::prioqueue prochuffmanEncode{strargs}{arraysetopts[concat-dumpfalse$args]setcharcount[dictcreate]foreachchar[split$str""]{dictincrcharcount$char}setpq[struct::prioqueue-dictionary];# want lower values to have higher prioritydictfor{charcount}$charcount{$pqput$char$count}while{[$pqsize]>1}{lassign[$pqpeekpriority2]p1p2 $pqput[$pqget2][expr{$p1+$p2}]}setencoding[walkTree[$pqget]]if{$opts(-dump)}{foreach{charhuffCode}[lsort-index1-stride2-commandcompare$encoding]{puts"$char\t[dict get $charcount $char]\t$huffCode"}}$pqdestroyreturn$encoding}procwalkTree{tree{prefix""}}{if{[llength$tree]<2}{return[list$tree$prefix]}lassign$treeleftright return[concat[walkTree$left"${prefix}0"][walkTree$right"${prefix}1"]]}proccompare{ab}{if{[stringlength$a]<[stringlength$b]}{return-1}if{[stringlength$a]>[stringlength$b]}{return1}return[stringcompare$a$b]}setstr"this is an example for huffman encoding"setencoding[huffmanEncode$str-dumptrue]puts$strputs[stringmap$encoding$str]
- Output:
n 4 000 6 101 s 2 0010 m 2 0011 o 2 0100 i 3 1001 a 3 1100 e 3 1101 f 3 1110 t 1 01010 x 1 01011 p 1 01100 l 1 01101 r 1 01110 u 1 01111 c 1 10000 d 1 10001 g 1 11110 h 2 11111 this is an example for huffman encoding 0101011111100100101011001001010111000001011101010111100001101100011011101101111001000111010111111011111110111000111100000101110100010000010010001100100011110
UNIX Shell
#!/bin/bashset-eu # make scratch directoryt="$(mktemp-d)"cd"${t:?mktemp failed}"trap'rm -r -- "$t"'EXIT # get character frequenciesdeclare-afreq=()whilereadaddrline;doforcin$line;do:$((freq[8#$c]++))donedone<<(od-b-v)# convert freqs into a bucket queuedeclare-ii=0forcin${!freq[@]};dofn="${freq[c]}.$((i++))"echo"$c:${freq[c]}">"$fn"done top2(){ls|sort-t.-k1,1n-k2,2n|sed2q;}set--$(top2)while[[$#-gt1]];dodeclare-il="${1%%.*}"r="${2%%.*}"# combine weights intofn="$((l+r)).$((i++))"# ... new node weightmkdir"$fn"mv"$1""$fn/0"mv"$2""$fn/1"set--$(top2)doneecho-e"Symbol\tWeight\tHuffman Code"cd"$fn" find.-typef-execgrep.{}+|tr-d./|awk-F:'{printf "%c\t%d\t%s\n", $2, $3, $1}'|sort-k2,2nr-k3,3n
Ursala
following the algorithm given above
#import std #import nat #import flo code_table = # takes a training dataset to a table <char: code...> -+ *^ ~&v?\~&iNC @v ~&t?\~&h ~&plrDSLrnPlrmPCAS/'01', ~&itB->h fleq-<&d; ^C\~&tt @hthPX ^V\~&lrNCC plus@bd, ^V(div@rrPlX,~&rlNVNC)^*D(plus:-0.@rS,~&)+ *K2 ^/~&h float+ length+- #cast %csAL table = code_table 'this is an example for huffman encoding'
a quick walk through the code starting from the bottom:
*K2 ^/~&h float+ length
compute character frequencies by partitioning the input list of characters by equality, and transforming each equivalence class to a pair containing its member and its cardinality represented as a floating point number^V(div@rrPlX,~&rlNVNC)^*D(plus:-0.@rS,~&)
construct a list of unary trees, one for each character class, with its normalized frequency in the root, and the character in the leaf~&itB->h
while the list contains more than one tree, do the following, and when done take the head of the listfleq-<&d;
sort the trees in increasing order by their roots^C\~&tt @hthPX ^V\~&lrNCC plus@bd
change the first two trees in the sorted list to a single binary tree whose root is the sum of their roots*^
visit the following function on each node of the tree obtained from the loop and propagate the results upward from the leaves~&v?\~&iNC
if the node is a leaf, construct a singleton list containing the pair of its root (a character) and the empty string (of bits)@v ~&t?\~&h
if there is only a single subtree, propagate the result already obtained for it~&plrDSLrnPlrmPCAS/'01'
otherwise there are two subtrees, hence two lists previously computed results propagating upward, so insert a zero into all of the bit strings in the results on the left, and a one into all the ones on the right, concatenate the left and right results, and propagate the contatenation upward
- Output:
< `r: '00000', `l: '00001', `c: '00010', `u: '00011', `n: '001', `m: '0100', `h: '0101', `g: '01100', `d: '01101', `o: '0111', `s: '1000', `t: '10010', `p: '100110', `x: '100111', `a: '1010', `f: '1011', `i: '1100', `e: '1101', ` : '111'>
Wren
Note that the results differ from Java because the PriorityQueue implementation is not the same.
import"./queue"forPriorityQueueclassHuffmanTree{constructnew(freq){_freq=freq}freq{_freq}compareTo(tree){_freq-tree.freq}}classHuffmanLeafisHuffmanTree{constructnew(freq,val){super(freq)_val=val}val{_val}}classHuffmanNodeisHuffmanTree{constructnew(l,r){super(l.freq+r.freq)_left=l_right=r}left{_left}right{_right}}varbuildTree=Fn.new{|charFreqs|vartrees=PriorityQueue.new()varindex=0for(freqincharFreqs){if(freq>0)trees.push(HuffmanLeaf.new(freq,String.fromByte(index)),-freq)index=index+1}if(trees.count==0)Fiber.abort("Something went wrong!")while(trees.count>1){vara=trees.pop()varb=trees.pop()varh=HuffmanNode.new(a[0],b[0])trees.push(h,-h.freq)}returntrees.pop()[0]}varprintCodes// recursive printCodes=Fn.new{|tree,prefix|if(treeisHuffmanLeaf){System.print("%(tree.val)\t%(tree.freq)\t%(prefix)")}elseif(treeisHuffmanNode){// traverse leftprefix=prefix+"0"printCodes.call(tree.left,prefix)prefix=prefix[0...-1]// traverse rightprefix=prefix+"1"printCodes.call(tree.right,prefix)prefix=prefix[0...-1]}}vartest="this is an example for huffman encoding"varfreqs=List.filled(256,0)for(cintest){varix=c.bytes[0]freqs[ix]=freqs[ix]+1}vartree=buildTree.call(freqs)System.print("SYMBOL\tWEIGHT\tHUFFMAN CODE")printCodes.call(tree,"")
- Output:
SYMBOL WEIGHT HUFFMAN CODE n 4 000 m 2 0010 o 2 0011 s 2 0100 c 1 01010 d 1 01011 g 1 01100 l 1 01101 p 1 01110 r 1 01111 t 1 10000 u 1 10001 a 3 1001 6 101 e 3 1100 f 3 1101 i 3 1110 x 1 11110 h 2 11111
Zig
conststd=@import("std");constNode=struct{frequency:usize,kind:union(enum){internal:struct{left:*Node,right:*Node,},leaf:u8,},fninitLeaf(frequency:usize,ch:u8)Node{return.{.frequency=frequency,.kind=.{.leaf=ch},};}fninitInternal(allocator:std.mem.Allocator,left_child:Node,right_child:Node,)!Node{constleft=tryallocator.create(Node);constright=tryallocator.create(Node);left.*=left_child;right.*=right_child;return.{.frequency=left_child.frequency+right_child.frequency,.kind=.{.internal=.{.left=left,.right=right}},};}fndeinit(self:Node,allocator:std.mem.Allocator)void{switch(self.kind){.internal=>|inner|{inner.left.deinit(allocator);inner.right.deinit(allocator);allocator.destroy(inner.left);allocator.destroy(inner.right);},.leaf=>{},}}fncompare(context:void,a:Node,b:Node)std.math.Order{_=context;returnstd.math.order(a.frequency,b.frequency);}};pubfnmain()!void{consttext="this is an example for huffman encoding";vargpa=std.heap.GeneralPurposeAllocator(.{}){};deferstd.debug.assert(gpa.deinit()==.ok);constallocator=gpa.allocator();varfrequencies=std.AutoHashMap(u8,usize).init(allocator);deferfrequencies.deinit();for(text)|ch|{constgop=tryfrequencies.getOrPut(ch);if(gop.found_existing){gop.value_ptr.*+=1;}else{gop.value_ptr.*=1;}}varprioritized_frequencies=std.PriorityQueue(Node,void,Node.compare).init(allocator,{});deferprioritized_frequencies.deinit();varfreq_it=frequencies.iterator();while(freq_it.next())|counted_char|{tryprioritized_frequencies.add(Node.initLeaf(counted_char.value_ptr.*,counted_char.key_ptr.*,));}while(prioritized_frequencies.len>1){tryprioritized_frequencies.add(tryNode.initInternal(allocator,prioritized_frequencies.remove(),prioritized_frequencies.remove(),));}constroot=prioritized_frequencies.items[0];deferroot.deinit(allocator);varcodes=std.AutoArrayHashMap(u8,[]constu8).init(allocator);defercodes.deinit();vararena=std.heap.ArenaAllocator.init(allocator);deferarena.deinit();trygenerateCodes(arena.allocator(),&root,&.{},&codes);conststdout=std.io.getStdOut().writer();varcode_it=codes.iterator();while(code_it.next())|item|{trystdout.print("{c}: {s}\n",.{item.key_ptr.*,item.value_ptr.*,});}}fngenerateCodes(arena:std.mem.Allocator,node:*constNode,prefix:[]constu8,out_codes:*std.AutoArrayHashMap(u8,[]constu8),)!void{switch(node.kind){.internal=>|inner|{constleft_prefix=tryarena.alloc(u8,prefix.len+1);std.mem.copy(u8,left_prefix,prefix);left_prefix[prefix.len]='0';trygenerateCodes(arena,inner.left,left_prefix,out_codes);constright_prefix=tryarena.alloc(u8,prefix.len+1);std.mem.copy(u8,right_prefix,prefix);right_prefix[prefix.len]='1';trygenerateCodes(arena,inner.right,right_prefix,out_codes);},.leaf=>|ch|{tryout_codes.put(ch,prefix);},}}
- Output:
n: 000 m: 0010 d: 00110 t: 00111 o: 0100 h: 0101 x: 01100 r: 01101 c: 01110 u: 01111 s: 1000 g: 10010 p: 100110 l: 100111 a: 1010 i: 1011 : 110 e: 1110 f: 1111
zkl
This code was adapted from Perl, Python and most of the other examples.
fcn buildHuffman(text){ //-->(encode dictionary, decode dictionary) ft:=Dictionary(); foreach c in (text){ ft[c]=ft.find(c,0)+1 } // leafs w/count // build the tree, which is a list of lists of ... tree:=ft.pump(List,fcn([(c,cnt)]){ //-->L( (cnt, ((sym,code))), ...) L(cnt, L(L(c,""))) }).copy(); // make it writable while(tree.len()>1){ // fake up a [lame] priorty queue tree=tree.sort(fcn(a,b){ a[0]>b[0] }); //prioritize high to low a,b:=tree.pop(-2,2); //remove 2 least frequent symbols mc:=fcn(n,c){ n[1] = c + n[1]; }; //(sym,code),"0"|"1" a[1].apply2(mc,"0"); b[1].apply2(mc,"1"); // mc(a[1],"0") tree.append( L(a[0]+b[0],a[1].extend(b[1])) ); //(a,b)-->new node }//-->L(L(39, L( L(" ","000"),L("e","0010"),L("a","0011") ... tree=tree[0][1].pump(List,fcn(i){ // flatten rather than traverse if(T.isType(i))return(Void.Recurse,i,self.fcn); i }); encodeTable:=tree.toDictionary(); // symbol:Huffman code decodeTable:=encodeTable.pump(Dictionary(),"reverse"); // code:symbol return(encodeTable,decodeTable); }
fcn encode(text,table){ text.pump(String,table.get) } fcn decode(bits,table){ // this is a horrible decoder, for testing only w:=bits.walker(); sink:=Sink(String); try{ s:=""; while(1){ s+=w.next(); if(c:=table.find(s)) { sink.write(c); s=""; } }}catch(TheEnd){} sink.close(); }
text:="this is an example for huffman encoding"; encodeTable,decodeTable := buildHuffman(text); encodeTable.pump(Console.println,fcn(kv){"%s : %s".fmt(kv.xplode())}); e:=encode(text,encodeTable); "Encode %d characters (%d bits) to %d bits (%d bytes):" .fmt(text.len(),text.len()*8,e.len(),(e.len()+7)/8).println(); println(e); 0'|Bits decoded to: "%s"|.fmt(decode(e,decodeTable)).println();
- Output:
a : 0011 c : 10101 d : 10100 e : 0010 f : 0110 g : 10111 h : 1000 i : 0101 l : 10110 m : 1001 n : 110 o : 01000 p : 11111 r : 11100 s : 0111 t : 01001 u : 11101 x : 11110 : 000 Encode 39 characters (312 bits) to 157 bits (20 bytes): 0100110000101011100001010111000001111000000101111000111001111111011000100000110010001110000010001110101100110100100111100000010110101010100010100010111010111 Bits decoded to: "this is an example for huffman encoding"
- Programming Tasks
- Compression
- 11l
- Ada
- BBC BASIC
- BBC BASIC examples needing attention
- Examples needing attention
- Bracmat
- C
- C sharp
- Pages with broken file links
- C++
- Clojure
- CoffeeScript
- Common Lisp
- D
- Eiffel
- Erlang
- F Sharp
- Factor
- Fantom
- Fortran
- FreeBASIC
- Go
- Groovy
- Haskell
- Icon
- Unicon
- Icon Programming Library
- J
- Java
- JavaScript
- Julia
- Kotlin
- Lua
- M2000 Interpreter
- Mathematica
- Wolfram Language
- Nim
- Oberon-2
- Objective-C
- OCaml
- Ol
- Perl
- Phix
- PHP
- Picat
- PicoLisp
- PL/I
- PowerShell
- Prolog
- PureBasic
- Python
- Quackery
- Racket
- Raku
- Red
- REXX
- Ruby
- RubyGems
- Rust
- Scala
- Scala (Alternate version)
- Scheme
- SETL
- Sidef
- Standard ML
- Swift
- Tcl
- Tcllib
- UNIX Shell
- Ursala
- Wren
- Wren-queue
- Zig
- Zkl
- Brlcad/Omit
- GUISS/Omit
- Lilypond/Omit
- Openscad/Omit
- TPP/Omit