zend_string
, in the second - zend_ulong
.zval
structure, nowhere else. Zval
's can contain data of any type.HashTable
structure: struct _zend_array { zend_refcounted_h gc; union { struct { ZEND_ENDIAN_LOHI_4( zend_uchar flags, zend_uchar nApplyCount, zend_uchar nIteratorsCount, zend_uchar reserve) } v; uint32_t flags; /* 32 */ } u; uint32_t nTableMask; /* — nTableSize */ Bucket *arData; /* */ uint32_t nNumUsed; /* arData */ uint32_t nNumOfElements; /* arData */ uint32_t nTableSize; /* , */ uint32_t nInternalPointer; /* */ zend_long nNextFreeElement; /* */ dtor_func_t pDestructor; /* */ };
arData
, which is a kind of pointer to the memory area of the Bucket
chain. Bucket
itself is a single cell in the array: typedef struct _Bucket { zval val; /* */ zend_ulong h; /* ( ) */ zend_string *key; /* NULL */ } Bucket;
zval
will be stored in the Bucket
structure. Note that it does not use a pointer to zval
, but the structure itself. This is done because in PHP 7, zval
's are no longer placed on the heap (unlike PHP 5), but the target value stored in zval
in the form of a pointer (for example, a PHP string) can be placed in PHP 7.arData
.foreach
instruction to the array, you will receive the data exactly in the order in which they were placed in the array, regardless of their keys: $a = [9=>"foo", 2 => 42, []]; var_dump($a); array(3) { [9]=> string(3) "foo" [2]=> int(42) [10]=> array(0) { } }
zval
'ah, they are stored packed in a Bucket
' s, which are located in the fields of the arData
C-array. Like that: $a = [3=> 'foo', 8 => 'bar', 'baz' => []];
arData
array. In fact, this is a fast sequential memory scan, consuming very few processor cache resources. All data from arData
can be placed in one line, and access to each cell takes about 1 nanosecond. Note: in order to increase the efficiency of using the processor cache, arData
aligned in accordance with the 64-bit model (the optimizing alignment of 64-bit full instructions is also used). The hash table iteration code might look like this: size_t i; Bucket p; zval val; for (i=0; i < ht->nTableSize; i++) { p = ht->arData[i]; val = p.val; /* - val */ }
arData
. To perform this procedure, it is enough just to remember the next available cell in this array, which is stored in the nNumUsed
field. Each time we add a new value, we put it ht->nNumUsed++
. When the number of elements in nNumUsed
reaches the number of elements in the hash table ( nNumOfElements
), we run the compact or resize algorithm, which we will discuss below. idx = ht->nNumUsed++; /* */ ht->nNumOfElements++; /* */ /* ... */ p = ht->arData + idx; /* bucket arData */ p->key = key; /* , */ /* ... */ p->h = h = ZSTR_H(key); /* bucket */ ZVAL_COPY_VALUE(&p->val, pData); /* bucket': */
arData
array arData
not decrease or regroup. Otherwise, performance would be disastrous, since we would have to move data in memory. Therefore, when deleting data from a hash table, the corresponding cell in arData
simply marked in a special way: zval
UNDEF. size_t i; Bucket p; zval val; for (i=0; i < ht->nTableSize; i++) { p = ht->arData[i]; val = p.val; if (Z_TYPE(val) == IS_UNDEF) { continue; } /* - val */ }
arData
array needs to be reorganized so that the “empty” cells disappear (compaction).arData
. The key will be compressed, even if it is integer. This is necessary in order to fit into the array boundaries.arData
. After all, this will mean that the keys used for the array index directly correspond to the keys obtained from the hash, which violates one of the properties of the hash tables in PHP: preserving the order of the elements.arData[5]
, and the data bar is in arData[3]
, it turns out that the data bar go to the data foo. And when iterating arData
elements will not be transferred in the order in which they were added.arData
boundaries. But we do not use it as it is, unlike PHP 5. You must first convert the key using the translation table. It simply matches one integer value resulting from hashing / compression, and another integer value used for addressing within the arData
array.arData
vector. This does not allow the table to use another area of memory, because it is already located next to arData
and, therefore, remains in the same address space. This helps improve data locality. This is what the described scheme looks like in the case of an 8-element hash table (the minimum possible size):nTableMask
). The resulting value is an index that can be used to access the arData
transformation arData
(and not straight cells!).arData
. Two memory areas were merged, so we can store data in memory sequentially. nTableMask
corresponds to a negative value of the table size, so, taking it as a module, we get a value from 0 to –7. Now you can access the memory. When placing the whole arData
buffer in it, we calculate its size using the formula:* bucket' + * (uint32) .
#define HT_HASH_SIZE(nTableMask) (((size_t)(uint32_t)-(int32_t)(nTableMask)) * sizeof(uint32_t)) #define HT_DATA_SIZE(nTableSize) ((size_t)(nTableSize) * sizeof(Bucket)) #define HT_SIZE_EX(nTableSize, nTableMask) (HT_DATA_SIZE((nTableSize)) + HT_HASH_SIZE((nTableMask))) #define HT_SIZE(ht) HT_SIZE_EX((ht)->nTableSize, (ht)->nTableMask) Bucket *arData; arData = emalloc(HT_SIZE(ht)); /* */
(((size_t)(((ht)->nTableSize)) * sizeof(Bucket)) + (((size_t)(uint32_t)-(int32_t)(((ht)->nTableMask))) * sizeof(uint32_t)))
arData
and compare hashes and keys, checking whether this is necessary. If the data is incorrect, we go through the linked list using the field zval.u2.next
, which reflects the next cell for entering data.arData
. And this is one of the main reasons for increasing the performance of hash tables in PHP 7, as well as the entire language. idx = ht->nNumUsed++; /* */ ht->nNumOfElements++; /* */ /* ... */ p = ht->arData + idx; /* arData bucket */ p->key = key; /* */ /* ... */ p->h = h = ZSTR_H(key); /* bucket */ ZVAL_COPY_VALUE(&p->val, pData); /* bucket': */ nIndex = h | ht->nTableMask; /* */ Z_NEXT(p->val) = HT_HASH(ht, nIndex); /* */ HT_HASH(ht, nIndex) = HT_IDX_TO_HASH(idx); /* */
h = zend_string_hash_val(key); /* */ nIndex = h | ht->nTableMask; /* */ idx = HT_HASH(ht, nIndex); /* , */ while (idx != HT_INVALID_IDX) { /* */ p = HT_HASH_TO_BUCKET(ht, idx); /* bucket */ if ((p->key == key) || /* ? ? */ (p->h == h && /* ... */ p->key && /* ( ) */ ZSTR_LEN(p->key) == ZSTR_LEN(key) && /* */ memcmp(ZSTR_VAL(p->key), ZSTR_VAL(key), ZSTR_LEN(key)) == 0)) { /* ? */ _zend_hash_del_el_ex(ht, idx, p, prev); /* ! */ return SUCCESS; } prev = p; idx = Z_NEXT(p->val); /* */ } return FAILURE;
HT_INVALID_IDX
is a special flag that we put in the conversion table. It means "this transformation leads nowhere, no need to continue."arData
and two transformation cells, in which we place the HT_INVALID_IDX
flag. Then we apply a mask that directs the transformation to the first cell ( HT_INVALID_IDX
, there is no data here). #define HT_MIN_MASK ((uint32_t) -2) #define HT_HASH_SIZE(nTableMask) (((size_t)(uint32_t)-(int32_t)(nTableMask)) * sizeof(uint32_t)) #define HT_SET_DATA_ADDR(ht, ptr) do { (ht)->arData = (Bucket*)(((char*)(ptr)) + HT_HASH_SIZE((ht)->nTableMask)); } while (0) static const uint32_t uninitialized_bucket[-HT_MIN_MASK] = {HT_INVALID_IDX, HT_INVALID_IDX}; /* hash lazy init */ ZEND_API void ZEND_FASTCALL _zend_hash_init(HashTable *ht, uint32_t nSize, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC) { /* ... */ ht->nTableSize = zend_hash_check_size(nSize); ht->nTableMask = HT_MIN_MASK; HT_SET_DATA_ADDR(ht, &uninitialized_bucket); ht->nNumUsed = 0; ht->nNumOfElements = 0; }
uninitialized_bucket
). (ht)->nTableMask = -(ht)->nTableSize; HT_SET_DATA_ADDR(ht, pemalloc(HT_SIZE(ht), (ht)->u.flags & HASH_FLAG_PERSISTENT)); memset(&HT_HASH(ht, (ht)->nTableMask), HT_INVALID_IDX, HT_HASH_SIZE((ht)->nTableMask))
HT_HASH
macro allows HT_HASH
to access the transformation cells in the part of the memory-allocated buffer for which a negative offset is used. The table mask is always negative because the cells of the translation table are indexed to the minus from the beginning of the arData
buffer. Here C programming is revealed in all its glory: billions of cells are available to you, swim in this infinity, just don't drown.arBucket
C-array in memory, and put special UNDEF values in the empty cells. As a result, we cheerfully lose memory. Losses are calculated by the formula:( – ) * Bucket
arData
, rather than inserting spaces into the resulting spaces. Therefore, a situation may arise when we reach the end of arData
, although UNDEF cells are still present in it.arData[0]
to arData[7]
.arData
vector and eventually fill the empty cells by simply redistributing the data. When a table is given a command to resize, it first of all tries to compact itself. It then calculates whether it will have to increase again after compaction. And if it turns out that yes, then the table is doubled. After that, the vector arData
begins to occupy twice as much memory (realloc())
. If it is not necessary to increase, then the data is simply redistributed into cells already allocated in memory. It uses an algorithm that we cannot use every time we remove items, because it spends too many CPU resources, and the exhaust is not so great. You remember the famous programmer compromise between the processor and memory?arData
and fills each UNDEF cell with data from the next non-UUNDEF cell. Simplified it looks like this: Bucket *p; uint32_t nIndex, i; HT_HASH_RESET(ht); i = 0; p = ht->arData; do { if (UNEXPECTED(Z_TYPE(p->val) == IS_UNDEF)) { uint32_t j = i; Bucket *q = p; while (++i < ht->nNumUsed) { p++; if (EXPECTED(Z_TYPE_INFO(p->val) != IS_UNDEF)) { ZVAL_COPY_VALUE(&q->val, &p->val); q->h = p->h; nIndex = q->h | ht->nTableMask; q->key = p->key; Z_NEXT(q->val) = HT_HASH(ht, nIndex); HT_HASH(ht, nIndex) = HT_IDX_TO_HASH(j); if (UNEXPECTED(ht->nInternalPointer == i)) { ht->nInternalPointer = j; } q++; j++; } } ht->nNumUsed = j; break; } nIndex = p->h | ht->nTableMask; Z_NEXT(p->val) = HT_HASH(ht, nIndex); HT_HASH(ht, nIndex) = HT_IDX_TO_HASH(i); p++; } while (++i < ht->nNumUsed);
zend_string
, and the integer keys are immediately used as a hash. Therefore, you can meet zend_hash_add(ht, zend_string, data)
or zend_hash_index_add(ht, long, data)
.zend_hash_str_add(ht, char *, int, data)
. Remember that, be that as it may, the hash table will turn to zend_string
, turn your string key into it and calculate its hash, spending some amount of processor resources. If you can use zend_string
, use. Surely they have already calculated their hashes, so the API will simply take them. For example, the PHP compiler computes the hash of each part of the string that is used, like zend_string
. OPCache stores a similar hash in shared memory. As the author of extensions, I recommend to initialize all your zend_string
literals in zend_string
.zval
present in each Bucket
. In PHP 7, zvals can store any kind of data. In general, the hash table API expects you to package the data in zval
, which the API perceives as a value. zend_hash_str_add_mem(hashtable *, char *, size_t, void *) zend_hash_index_del(hashtable *, zend_ulong) zend_hash_update_ptr(hashtable *, zend_string *, void *) zend_hash_index_add_empty_element(hashtable *, zend_ulong)
zend_hash_index_find(hashtable *, zend_string *) : zval * zend_hash_find_ptr(hashtable *, zend_string *) : void * zend_hash_index_find(hashtable *, zend_ulong) : zval *
zend_hash_add_new()
, it is better not to use it. It uses the engine for internal needs. This API causes the hash table to store data, even if it is already available in the hash (the same key). As a result, you will have duplicates, which may not have the best effect on your work. So you can use this API only if you are completely sure that there is no data in the hash that you are going to add. This will avoid having to search for them.zend_symtable_api()
: static zend_always_inline zval *zend_symtable_update(HashTable *ht, zend_string *key, zval *pData) { zend_ulong idx; if (ZEND_HANDLE_NUMERIC(key, idx)) { /* handle numeric key */ return zend_hash_index_update(ht, idx, pData); } else { return zend_hash_update(ht, key, pData); } }
ZEND_HASH_FOREACH
: #define ZEND_HASH_FOREACH(_ht, indirect) do { \ Bucket *_p = (_ht)->arData; \ Bucket *_end = _p + (_ht)->nNumUsed; \ for (; _p != _end; _p++) { \ zval *_z = &_p->val; \ if (indirect && Z_TYPE_P(_z) == IS_INDIRECT) { \ _z = Z_INDIRECT_P(_z); \ } \ if (UNEXPECTED(Z_TYPE_P(_z) == IS_UNDEF)) continue; #define ZEND_HASH_FOREACH_END() \ } \ } while (0)
arData
, , arData
. -: - arData
. - , .arData
, . , .arData[0]
. , 2 * uint32 = 8 . . , , . ZEND_API void ZEND_FASTCALL zend_hash_packed_to_hash(HashTable *ht) { void *new_data, *old_data = HT_GET_DATA_ADDR(ht); Bucket *old_buckets = ht->arData; ht->u.flags &= ~HASH_FLAG_PACKED; new_data = pemalloc(HT_SIZE_EX(ht->nTableSize, -ht->nTableSize), (ht)->u.flags & HASH_FLAG_PERSISTENT); ht->nTableMask = -ht->nTableSize; HT_SET_DATA_ADDR(ht, new_data); memcpy(ht->arData, old_buckets, sizeof(Bucket) * ht->nNumUsed); pefree(old_data, (ht)->u.flags & HASH_FLAG_PERSISTENT); zend_hash_rehash(ht); }
u.flags
, , -. , -, . , . For example: static zend_always_inline zval *_zend_hash_index_add_or_update_i(HashTable *ht, zend_ulong h, zval *pData, uint32_t flag ZEND_FILE_LINE_DC) { uint32_t nIndex; uint32_t idx; Bucket *p; /* ... */ if (UNEXPECTED(!(ht->u.flags & HASH_FLAG_INITIALIZED))) { CHECK_INIT(ht, h < ht->nTableSize); if (h < ht->nTableSize) { p = ht->arData + h; goto add_to_packed; } goto add_to_hash; } else if (ht->u.flags & HASH_FLAG_PACKED) { /* ... */ } else if (EXPECTED(h < ht->nTableSize)) { p = ht->arData + h; } else if ((h >> 1) < ht->nTableSize && (ht->nTableSize >> 1) < ht->nNumOfElements) { zend_hash_packed_grow(ht); p = ht->arData + h; } else { goto convert_to_hash; } /* ... */
(_ - 2) * (uint32)
. , . void ZEND_FASTCALL zend_hash_real_init(HashTable *ht, zend_bool packed)
zend_hash_real_init()
— , «» ( zend_hash_init()
). , «», - . , . function m() { printf("%d\n", memory_get_usage()); } $a = range(1,20000); /* range() */ m(); for($i=0; $i<5000; $i++) { /* , * : * «» */ $a[] = $i; } m(); /* * - «», * */ $a['foo'] = 'bar'; m();
1406744 1406776 1533752
function m() { printf("%d\n", memory_get_usage()); } /* - . * 32 768 (2^15). */ for ($i=0; $i<32768; $i++) { $a[$i] = $i; } m(); /* */ for ($i=0; $i<32768; $i++) { unset($a[$i]); } m(); /* . - , * */ $a[] = 42; m();
1406864 1406896 1533872
unset()
arData
32 768 , UNDEF-zval'.nNumUsed
, arData
? , , . /* , , : */ /* (idx). * , */ $a[3] = 42; m();
1406864 1406896 1406896
Bucket
, zval
, long
( 42), . zval long. :) , 32 768 , , , . , , . ., , UNDEF-zval' «» .idx 0
), — UNDEF-zval. function m() { printf("%d\n", memory_get_usage()); } /* - . * 32 768 (2^15). */ for ($i=0; $i<32768; $i++) { /* , */ $a[' ' . $i] = $i; } m(); /* */ for ($i=0; $i<32768; $i++) { unset($a[' ' . $i]); } m(); /* . * , */ $a[] = 42; m();
2582480 1533936 1533936
unset()
, . 32 768 zend_string
, 1,5 . $a = ['foo', 1, 'bar'];
$a
— AST-. , , . OPCache AST-, ( ) , . OPCache . $ar = []; for ($i = 0; $i < 1000000; ++$i) { $ar[] = [1, 2, 3, 4, 5, 6, 7, 8]; }
$ar
. 8- . OPCache, 8- IS_IMMUTABLE $ar
, .$ar[42][3] = 'foo';
, 8- $ar[42]
. $a = [1]; $b = [1];
Source: https://habr.com/ru/post/308240/
All Articles