English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية
Huffman coding is a lossless data compression algorithm. In this algorithm, variable length codes are assigned to input different characters. The code length is related to the frequency of character use. The most frequently used characters have the shortest codes, while longer codes are used for the least frequently used characters.
There are mainly two parts. The first is to create the Huffman tree, and the other is to traverse the tree to find the codes.
For example, consider some strings 'YYYZXXYYX', the frequency of character Y is greater than X, and the frequency of character Z is the smallest. Therefore, the code length of Y is less than X, and the code length of X will be less than Z.
The complexity of assigning codes based on the frequency of each character is O(n log n)
Input -String of different characters, for example, 'ACCEBFFFFAAXXBLKE'.
Output -The codes for different characters:
Data: K, Frequency: 1, Code: 0000 Data: L, Frequency: 1, Code: 0001 Data: E, Frequency: 2, Code: 001 Data: F, Frequency: 4, Code: 01 Data: B, Frequency: 2, Code: 100 Data: C, Frequency: 2, Code: 101 Data: X, Frequency: 2, Code: 110 Data: A, Frequency: 3, Code: 111
Input-String of different characters.
Output-The code for each character.
Begin define a node with character, frequency, left and right child of the node for Huffman tree. create a list 'freq' to store the frequency of each character, initially all are 0 for each character c in the string do increase the frequency for character ch in freq list. done for all type of character ch do if the frequency of ch is non zero then add ch and its frequency as a node of priority queue Q. done while Q is not empty do remove item from Q and assign it to left child of node remove item from Q and assign to the right child of node traverse the node to find the assigned code done End
Input-Huffman tree node n, and the code assigned in the last call
Output-code assigned to each character
if left child of node n ≠ φ then traverseNode(leftChild(n), code+’0’) //traverse through the left child traverseNode(rightChild(n), code+’1’) //traverse through the right child else display the character and data of current node.
#include<iostream> #include<queue> #include<string> using namespace std; struct node{ int freq; char data; const node *child0, *child1; node(char d, int f = -1{ //assign values in the node data = d; freq = f; child0 = NULL; child1 = NULL; } node(const node *c0, const node *c1{ data = 0; freq = c0->freq + c1->freq; child0=c0; child1=c1; } bool operator<(const node &a) const { //< 运算符用于在队列中查找优先级 return freq > a.freq; } void traverse(string code = "")const{ if(child0!=NULL){ child0->traverse(code+'0'); //以代码作为左子节点添加0 child1->traverse(code+'1); //添加 1 以代码作为右子节点 }else{ cout << "Data: " << data << ", Frequency: " << freq << ", Code: " << code << endl; } } }; void huffmanCoding(string str){ priority_queue<node> qu; int frequency[256]; for(int i = 0; i <256; i++) frequency[i] = 0; //清除所有频率 for(int i = 0; i < str.size(); i++{ frequency[int(str[i])]++; //增加频率 } for(int i = 0; i <256; i++{ if(frequency[i]){ qu.push(node(i, frequency[i])); } } while(qu.size() >1{ node *c0 = new node(qu.top()); //获取左子节点并从队列中移除 qu.pop(); node *c1 = new node(qu.top()); //获取右子节点并从队列中移除 qu.pop(); qu.push(node(c0, c1)); //将两个子节点的频率相加,然后再次加入队列 } cout << "The Huffman Code: " << endl; qu.top().traverse(); //遍历树以获取代码 } main(){ string str = "ACCEBFFFFAAXXBLKE"; //arbitrary string to get frequency huffmanCoding(str); }
Output Result
The Huffman Code: Data: K, Frequency: 1, Code: 0000 Data: L, Frequency: 1, Code: 0001 Data: E, Frequency: 2, Code: 001 Data: F, Frequency: 4, Code: 01 Data: B, Frequency: 2, Code: 100 Data: C, Frequency: 2, Code: 101 Data: X, Frequency: 2, Code: 110 Data: A, Frequency: 3, Code: 111