📜 ⬆️ ⬇️

Python internal list

I bring to your attention an article based on the publication of Laurent Luce about the implementation of working with lists in CPython. It can be useful to novice programmers in Python, or preparing for an interview. C functions are shown in an abbreviated version and with my comments. The full text of the functions can be found in the source code of CPython 2.7.

This article describes the implementation of the list object in CPython, the most popular implementation of Python. Lists in Python are a powerful tool, and it is interesting to know how they are arranged inside. Take a look at a simple script that adds several integer values ​​to the list and displays them:

>>> l = [] >>> l.append(1) >>> l.append(2) >>> l.append(3) >>> l [1, 2, 3] >>> for e in l: ... print e ... 1 2 3 

As you can see, the list is an iterable object.
')
C-structure of the list object

The list object in CPython is represented by the following structure in C. ob_item is a list of pointers to list items, and allocated is the amount of allocated memory.

 typedef struct { PyObject_VAR_HEAD PyObject **ob_item; Py_ssize_t allocated; } PyListObject; 

List initialization

Let's see what happens when creating an empty list, for example l = []. The PyList_New (0) function is called:

 /* size -   */ PyList_New(Py_ssize_t size) { //      nbytes = size * sizeof(PyObject *); //  ob_item if (size <= 0) op->ob_item = NULL; else { op->ob_item = (PyObject **) PyMem_MALLOC(nbytes); memset(op->ob_item, 0, nbytes); } //     op->allocated = size; return (PyObject *) op; } 

It is important to understand the difference between the allocated memory and the size of the list. The size of the list is the same as len (l). allocated is the amount of allocated memory, which can often be larger than the size of the list. This is done to prevent realloc from being called whenever elements are added. We will understand this in more detail later.

Adding items

We add an integer to the list: l.append (1). What happens at this moment? The C function PyList_Append () is called, which, in turn, calls the function app1 ():

 /* self -     v -     */ app1(PyListObject *self, PyObject *v) { //      ,   -  (-1) if (list_resize(self, n+1) == -1) return -1; //    ,    n  v PyList_SET_ITEM(self, n, v); //     0 return 0; } 

Let's look at the list_resize () function. It allocates more memory than necessary to prevent frequent calls to list_resize. Memory allocation is defined by the following sequence: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...

 /* self -     newsize -    */ list_resize(PyListObject *self, Py_ssize_t newsize) { //         -     if (allocated >= newsize && newsize >= (allocated >> 1)) { assert(self->ob_item != NULL || newsize == 0); Py_SIZE(self) = newsize; return 0; } /*      ,    *      */ new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6); /*    */ if (new_allocated > PY_SIZE_MAX - newsize) { PyErr_NoMemory(); return -1; } else { new_allocated += newsize; } //   if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *))) PyMem_RESIZE(items, PyObject *, new_allocated); return 0; } 

4 cells are now allocated for list items and the first one is an integer 1. In the image you can see that l [0] indicates an integer number that we added to the list. Squares marked with a dashed line indicate allocated, but not yet used memory.

The operation of adding an item to the list has O (1) complexity.

image

Add another item to the list: l.append (2). list_resize is called with n + 1 = 2, but since the size of the allocated memory is 4, there is no need to allocate additional memory. The same thing happens when you add two more integers: l.append (3), l.append (4). The picture shows what we have in the end:

image

Insert

Insert a new integer into position 1: l.insert (1.5) and see what happens at a low level. The ins1 () function is called:

 /* self -     where - ,      v -     */ ins1(PyListObject *self, Py_ssize_t where, PyObject *v) { n = Py_SIZE(self); //          if (list_resize(self, n+1) == -1) return -1; //   ,      if (where < 0) { where += n; if (where < 0) where = 0; } //     ,     if (where > n) where = n; //         items = self->ob_item; for (i = n; --i >= where; ) items[i+1] = items[i]; //    items[where] = v; return 0; } 

image

Squares marked with a dashed line indicate allocated, but not yet used memory. So, 8 cells are highlighted, but the size of the list is now 5.

The complexity of the insert operation is O (n).

Popping

When you push an item from the list, l.pop (), listpop () is called. The list_resize is called inside listpop () and if the new size is less than half of the allocated memory, then the list is compressed.

 /* self -     args - ,     */ listpop(PyListObject *self, PyObject *args) { //      Py_ssize_t i = -1; //    -  NULL if (Py_SIZE(self) == 0) { return NULL; } //    -    if (i < 0) i += Py_SIZE(self); v = self->ob_item[i]; //     -  list_resize     if (i == Py_SIZE(self) - 1) { status = list_resize(self, Py_SIZE(self) - 1); return v; } //      ,       status = list_ass_slice(self, i, i+1, (PyObject *)NULL); return v; } 

The complexity of the operation is O (1).

image

You can observe that the 4th cell still points to the whole, but it is important to understand that our list size is 4.

Push out another item. In list_resize (), size-1 = 4-1 = 3 fewer than half the selected cells, so the list is compressed to 6 cells and the new list size becomes 3.

You can observe that the 3 and 4 cells still indicate the whole, but it is important to understand that the size of the list is 3.

image

Deletion

In Python, the list object has a method for deleting an element with a given value: l.remove (5). Called listremove ().

 /* self -     v -     */ listremove(PyListObject *self, PyObject *v) { Py_ssize_t i; //    for (i = 0; i < Py_SIZE(self); i++) { int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ); //     -     Python None if (cmp > 0) { if (list_ass_slice(self, i, i+1, (PyObject *)NULL) == 0) Py_RETURN_NONE; return NULL; } else if (cmp < 0) return NULL; } //     -  NULL return NULL; } 


The complexity of the delete operation is O (n).

image

Source: https://habr.com/ru/post/273045/


All Articles