English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية

Tutorial on Redis Source Code Analysis: In-depth Explanation of Ziplist

Preface

The compressed list (ziplist) is composed of a series of specially encoded memory blocks and plays a very important role in the data storage optimization of Redis. This article summarizes the compressed list ziplist, which is frequently used in Redis. It is not an exaggeration to say that this data structure is everywhere in Redis, except for the list, many other data structures also use it as a transition, such as the SortedSet mentioned in the previous article. There is no need to say more, let's take a detailed look at the introduction.

1. Brief introduction to the data structure of compressed list ziplist

First, let's take a look at the structure of ziplist as a whole, as shown in the following figure:

Structure diagram of compressed list ziplist

It can be seen that there are many fields and different byte sizes, but this is exactly the essence of the compressed list. Let's summarize them one by one.

 

field meaning
zlbytes This field is the first field of the compressed list, which is an unsigned integer, occupying4bytes. It is used to indicate the number of bytes occupied by the entire compressed list (including itself).
zltail An unsigned integer, occupying4bytes. It is used to store the offset from the head of the compressed list to the last entry (not the tail element zlend), which is used in scenarios where fast jumping to the end of the list is required.
zllen An unsigned integer, occupying2bytes. It is used to store the total number of entries contained in the compressed list.
zlend Special entry used to represent the end of a compressed list. It occupies one byte and its value is always255。

Summarized as the head and tail of ziplist, let's summarize the entry field, which is of great importance.

Generally, an entry consists of prevlen, encoding, and entry-The data field consists of three fields, but when the entry is a very small integer, it will omit the entry according to the encoding.-The data field. The following is a summary in order:

The first is the field prevlen:represents the length of the previous entry, which has two encoding methods.

  • when the length is less than255bytes, stored with one byte.
  • when the length is greater than or equal to255when, stored with five bytes, the first byte will be set to255represents the length of the previous entry is represented by the next four bytes.

Then comes the field encoding:it will use different encoding methods according to the content of the current element, as follows:

1、if the element content is a string, the values of encoding are:

00xx xxxx :00开头表示 the length of the string uses6bits represent.

01xx xxxx | xxxx xxxx :01The starting bit indicates that the length of the string is determined by14bit represents, this14bits are used for big-endian storage.

1000 0000 | xxxx xxxx | xxxx xxxx | xxxx xxxx | xxxx xxxx :10The starting bit indicates that the next four bytes are the length of the string, this32bits are used for big-endian storage.

2、if the element content is a number, the values of encoding are:

1100 0000:represents the number of bytes occupied after2bytes.

1101 0000:represents the number of bytes occupied after4bytes.

1110 0000:represents the number of bytes occupied after8bytes.

1111 0000 :represents the number of bytes occupied after3bytes.

1111 1110 :represents the number of bytes occupied after1bytes.

1111 1111 :represents the last element in the compressed linked list (special encoding).

1111 xxxx :represents using the last4bit represents 0~12integer, since 0000,1110and1111Three have been occupied, that is to say, the four digits of xxxx here can only represent 0001~1101,converted to decimal is the number1~13,but Redis stipulates that it is used to represent 0~12,therefore when encountering this encoding, we need to take out the last four digits and subtract1to get the correct value.

The last is the field entry-data:If the value of the element is a string, then save the value of the element itself. If the value of the element is a very small number (by the encoding rules above, that is 0~12),then there is no such field.

The encoding of the compressed linked list is very complex, but this is also the essence of this data structure. Let's look at an example together:

Note:This example is mentioned in the Redis source code

//consists of elements2,5composing the compressed linked list
[0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
 |  |  | | | |
 zlbytes zltail entries "2" "5end
//The encoded content of the string "Hello World"
[02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]

The above is a segment represented in hexadecimal2,5a compressed linked list composed of two elements.

  • The first four bytes represent the total number of bytes occupied by the ziplist, because Redis uses little-endian storage, so15A byte is represented as 0f 00 00 00.
  • The following4bytes to represent the offset of the tail element, which is the offset from the ziplist header (zlbytes) to the last element (note: not the tail node), also using little-endian storage, so it is represented as 0c 00 00 00.
  • Behind it is the zllen field composed of two bytes, indicating the number of elements. In our example, there are two elements, plus little-endian storage, so it is represented as 02 00.
  • Next is the element itself. It is first represented by a variable-length word to indicate the length of the previous element.2as the first element, its size of the previous element is 0, so it occupies one byte 00. According to the encoding rules we mentioned above, element2and5belonging to 0~12The numbers between need to be represented by1111 is encoded in the format of xxx format,2converted to binary as 0010, plus1is 0011(plus1(the reason is explained above), combined to form element2is represented as 00 f3Similarly, element5represented as 02 f6。
  • The last is the tail element, which uses the unused encoding1111 1111That is255。

Next, let's insert a string element 'Hello World' at the end of this compressed linked list. First, let's see how to encode the string. According to the agreed encoding rules, the length of the previous element must be represented by a byte first. Here, the previous element is5It takes up two bytes, so the first byte is used to represent the length of the previous element 02The next is the encoding of the string. The length of the string to be added is11(spaces are also counted), converted to binary is1011According to the encoding rules of the string, using 0000 1011That is, converted to hexadecimal is 0b, and then added to the ASCII code of our string itself, combining to get the encoding of the string: [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]。

At this point, the entire compressed linked list becomes:

[0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [02 0b 48 65 6c 6c 6f 20 57 6f 72 6c 64] [ff]
 |  |  | | |   |   |
 zlbytes zltail entries "2" "5"   "Hello World"  end

Part Two: Analysis of ziplist Command Source Code

After understanding the encoding rules above, let's take a look at the source code of some operations on the compressed linked list ziplist. This article summarizes the basic principles of the compressed linked list through the creation, insertion, deletion, and search of elements.

Firstly, let's create:

//Define the header size of the compressed linked list composed of zlbytes, zltail, and zllen
#define ZIPLIST_HEADER_SIZE (sizeof(uint32_t)*2+sizeof(uint16_t))
//Create a compressed linked list and return a pointer to the list
unsigned char *ziplistNew(void) {
 //Here is why+1This is because the tail element occupies one byte, which is also the minimum size of a compressed list
 unsigned int bytes = ZIPLIST_HEADER_SIZE+1;
 //Allocate memory
 unsigned char *zl = zmalloc(bytes);
 //Set the size of the list
 ZIPLIST_BYTES(zl) = intrev32if be(bytes);
 //Set the offset of the last element
 ZIPLIST_TAIL_OFFSET(zl) = intrev32if be(ZIPLIST_HEADER_SIZE);
 //Set the number of elements
 ZIPLIST_LENGTH(zl) = 0;
 //Set the tail element (the above is just to apply for space)
 zl[bytes-1] = ZIP_END;
 return zl;
}

The logic for creating a compressed list is very simple, which is to apply for fixed space containing the header and tail nodes, and then initialize the list context.

Compared to creation, the source code for adding elements is very long. To facilitate understanding, let's sort out the logic of adding elements before looking at the source code.

  • First, we need to find the size of the preceding element at the specified insertion position because this attribute is part of the new element.
  • Then we need to encode the current element to obtain the corresponding encoding field and the field of the actual element value.
  • The prevlen field of the successor element of the newly inserted element needs to be updated because the element before it has changed. This may cause a cascading update (there is also this problem when deleting elements), because the size of the prevlen field is variable.

The above three steps are the core steps, and the rest include operations such as updating the tail node offset, modifying the number of elements in the list, etc. Of course, since the compressed list is implemented based on an array, memory copying is indispensable when inserting or deleting elements.

After summarizing the above steps, we start to analyze the source code step by step, which is quite long, let's take a look slowly:

//The four parameters in order are: compressed list, insertion position (the new element is inserted after the p element), element value, and element length
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
 //Here we save the length of the current list
 size_t curlen = intrev32if be(ZIPLIST_BYTES(zl)), reqlen;
 unsigned int prevlensize, prevlen = 0;
 size_t offset;
 int nextdiff = 0;
 unsigned char encoding = 0;
 long long value = 123456789;
 zlentry tail;
 //1. The purpose of this logic is to obtain the length of the preceding element
 if (p[0] != ZIP_END) {
 //If the element at the insertion position is not the last element, then get the length of the element
 //This is split for convenience later, prevlensize saves the length of the encoding field, and prevlen saves the length of the element itself
 ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
 } else {
 //If the element at the insertion position is the end element, then the new element needs to be inserted at the end of the linked list
 //Get the last element of the linked list (note: the last element is not equal to the end element)
 unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
 if (ptail[0] != ZIP_END) {
  //If the last element is not the end element, then this element is the preceding element of the new element, and the length of this element is obtained
  prevlen = zipRawEntryLength(ptail);
 }
 //Otherwise, it means that the linked list has no elements at all, that is, the length of the preceding element of the new element is 0
 }
 //2. Encode the new element to get the total size of the new element
 if (zipTryEncoding(s,slen,&value,&encoding)) {
 //If it is a number, then encode it as a number
 reqlen = zipIntSize(encoding);
 } else {
 //The element length is the length of the string
 reqlen = slen;
 }
 //The total length of the new element is the length of the value plus the length of the preceding element and the encoding element
 reqlen += zipStorePrevEntryLength(NULL,prevlen);
 reqlen += zipStoreEntryEncoding(NULL,encoding,slen);
 //If the insertion position is not at the end of the linked list, then the subsequent elements of the new element's prevlen field need to be judged
 //According to the encoding rules above, this field may need to be expanded
 int forcelarge = 0;
 nextdiff = (p[0] != ZIP_END) &63; zipPrevLenByteDiff(p,reqlen) : 0;
 if (nextdiff == -4 && reqlen < 4) {}}
 nextdiff = 0;
 forcelarge = 1;
 }
 //Expand the array to the new calculated size, since the address of the new array may change, therefore, the old offset is recorded here
 offset = p-zl;
 zl = ziplistResize(zl,curlen+reqlen+nextdiff);
 //Calculate the insertion position on the new array
 p = zl+offset;
 //If the newly inserted element is not at the end of the linked list
 if (p[0] != ZIP_END) {
 //copy the new element and its subsequent elements to the new array,-1for the last element
 memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);
 //The prevlen field of the subsequent element of the new element
 if (forcelarge)
  zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
 else
  zipStorePrevEntryLength(p+reqlen,reqlen);
 //Update the offset of the last element
 ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);
 //When the subsequent element of the new element is not only one, the new tail element offset needs to be added with nextdiff
 zipEntry(p+reqlen, &tail);
 if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
  ZIPLIST_TAIL_OFFSET(zl) =
  intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
 }
 } else {
 //The new element is inserted at the end of the list, and the offset of the tail is updated
 ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
 }
 //nextdiff != 0 indicates that the length of the subsequent element has changed, so we need to cascade update the subsequent elements of the subsequent elements
 if (nextdiff != 0) {
 offset = p-zl;
 zl = __ziplistCascadeUpdate(zl,p+reqlen);
 p = zl+offset;
 }
 //Write the new element into the list
 p += zipStorePrevEntryLength(p,prevlen);
 p += zipStoreEntryEncoding(p,encoding,slen);
 if (ZIP_IS_STR(encoding)) {
 memcpy(p,s,slen);
 } else {
 zipSaveInteger(p,value,encoding);
 }
 //The number of elements stored in the compressed ziplist+1
 ZIPLIST_INCR_LENGTH(zl,1;
 return zl;
}

After analyzing the logic of inserting elements, take a deep breath, it's really long, and there are many details.

Next, let's take a look at the process of deleting elements. Compared to adding, deletion is relatively simple. After clearing the current element, subsequent elements need to be copied one by one (this is the difference between array and linked list data structures), and then pay attention to whether cascading updates are needed. See the code above:

//The parameters are in order: compressed ziplist, the starting position of the element to be deleted, and the number of elements to be deleted
unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
 unsigned int i, totlen, deleted = 0;
 size_t offset;
 int nextdiff = 0;
 zlentry first, tail;
 //Read the element pointed to by p and store it in first
 zipEntry(p, &first);
 for (i = 0; p[0] != ZIP_END && i < num; i++) {}}
  //Count the total length deleted
  p += zipRawEntryLength(p);
  //Count the number of elements actually deleted
  deleted++;
 }
 //The number of bytes to be deleted
 totlen = p-first.p;
 if (totlen > 0) {
  if (p[0] != ZIP_END) {
   //Determine if the size of the element has changed
   nextdiff = zipPrevLenByteDiff(p, first.prevrawlen);
   //Modify the prevlen field of the element after the deleted element
   p -= nextdiff;
   zipStorePrevEntryLength(p, first.prevrawlen);
   //Update the offset of the end element
   ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);
   //When the successor element of the deleted element is not just one, the offset of the new end element needs to be added by nextdiff
   zipEntry(p, &tail);
   if (p[tail.headersize+tail.len] != ZIP_END) {
    ZIPLIST_TAIL_OFFSET(zl) =
     intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
   }
   //Move the remaining elements to the front
   memmove(first.p, p,
    intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1;
  } else {
   //Directly delete to the end of the list, so no memory copy is required, just modify the offset of the last element
   ZIPLIST_TAIL_OFFSET(zl) =
    intrev32ifbe((first.p-zl)-first.prevrawlen);
  }
  //Resize the array size
  offset = first.p-zl;
  zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
  //Modify the number of elements in the list
  ZIPLIST_INCR_LENGTH(zl,-deleted);
  p = zl+offset;
  //nextdiff != 0 indicates that the size of the element has changed, and a cascading update is required
  if (nextdiff != 0)
   zl = __ziplistCascadeUpdate(zl, p);
 }
 return zl;
}

Let's take a look at the element search operation:

//The parameters are in order: compressed list, value of the element to be found, length of the element to be found, number of elements skipped between comparisons
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip) {
 int skipcnt = 0;
 unsigned char vencoding = 0;
 long long vll = 0;
 //Keep looping as long as it's not the end element
 while (p[0] != ZIP_END) {
  unsigned int prevlensize, encoding, lensize, len;
  unsigned char *q;
  //Query the prevlen field of the current element in the linked list
  ZIP_DECODE_PREVLENSIZE(p, prevlensize);
  //Query the encoding field of the current element in the linked list
  ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len);
  q = p + prevlensize + lensize;
  //Reached the position of the element to be compared
  if (skipcnt == 0) {
   //If the current element in the linked list is a string
   if (ZIP_IS_STR(encoding)) {
    //Compare with the string to be searched
    if (len == vlen && memcmp(q, vstr, vlen) == 0) {
     //Matching success, then the pointer of the element to be searched
     return p;
    }
   } else {
    //If the current element is a number and vencoding is 0
    if (vencoding == 0) {
     //Attempt to encode the value to be searched as a number
     if (!zipTryEncoding(vstr, vlen, &vll, &vencoding)) {
      //If the encoding fails, it means the element to be searched is not a number at all
      //Then set vencoding to the maximum value, serving as a marker
      //That is, there is no need to try to encode the value to be searched as a number later
      vencoding = UCHAR_MAX;
     }
     assert(vencoding);
    }
    //If vencoding is not UCHAR_MAX, it means the element to be searched is successfully encoded as a number
    if (vencoding != UCHAR_MAX) {
     //Extract the current element from the linked list by number
     long long ll = zipLoadInteger(q, encoding);
     if (ll == vll) {
      //If two numbers are equal, return the element pointer
      return p;
     }
    }
   }
   //Reset the number of elements to be skipped
   skipcnt = skip;
  } else {
   //Skip elements
   skipcnt--;
  }
  //Traverse the next element
  p = q + len;
 }
 //Traversed the entire list, but did not find the element
 return NULL;
}

Here we have summarized the principles of the four basic operations of creating, adding, deleting, and searching compressed lists.

Summary of Compressed List ziplist Data Structure

The ziplist compression in redis is very widely used, and it is considered to be one of the most characteristic data structures in redis. The essence of this data structure actually lies in the coding rules summarized in the first part of the article. After understanding this part, you can simply read the source code to deepen your understanding.

It has to be said that the source code part is indeed a bit long, and it really requires patience. I was also very confused in the process of reading it. If you are interested in the source code, I suggest that you first sort out what needs to be done for a certain operation (such as the insertion element mentioned above), and then look at the code, which may be easier to understand.

Finally, leave a small question. Since the ziplist in redis has such a high memory utilization, why does it still provide the ordinary linked list structure for users to use?
There is no standard answer to this question. It depends on different people's opinions.

Summary

That's all for this article. I hope the content of this article has certain reference value for everyone's learning or work. If you have any questions, you can leave a message for communication. Thank you for your support of the呐喊 tutorial.

Declaration: The content of this article is from the Internet, and the copyright belongs to the original author. The content is contributed and uploaded by Internet users spontaneously. This website does not own the copyright, does not edit the content manually, and does not assume any relevant legal liability. If you find any content suspected of copyright infringement, please send an email to: notice#oldtoolbag.com (Please replace # with @ when sending an email for reporting. Provide relevant evidence, and once verified, this site will immediately delete the infringing content.)

You May Also Like