SMAS: “Sorted Multi Array Struct” —an alternative to a binary search tree

Introduction

Hello, habravchane. This is my first post in which I want to share the data structure of SMAS. At the end of the article, a C ++ class of the proposed data structure will be provided. The implementation is recursive, but my goal is to get the message across. Implementing a non-recursive version is the second thing. It is important to “hear” opinions.

What is wrong with the tree?

Yes, everything and you can ~~complete the article, thank you all.~~ It would spit on a significant memory overhead on auxiliary data in the form of references to left-right subtrees and a flag for balancing (different depending on the technique used - red-black, AVL, etc. trees). Another, not so much a minus, is a constant modification of the tree when it is inserted for balancing (the complexity and complexity of the methods for beginners is especially important here). At this, the minuses end and it's hard to imagine something better and more versatile (hash tables, IMHO, even cooler, but OX TORGES OF THESE COLLISIONS).

And why not sorted arrays?

Binary search in a sorted array is implemented with the same logarithmic complexity as in a binary search tree, but there is no need to link to smaller or larger subtrees, like in a binary search tree.

But what is wrong with sorted arrays?
')
Yes, everything is so, you can ~~finish the article and use sorted arrays instead of binary search trees to~~ enjoy the saved resources, if only we do not need to add-delete elements from such an array, because if we add new elements (delete old ones), the array will say: " ~~Sort~~ me completely ". But the trouble with sorting is that it is slow if you need to insert a new element somewhere between the others, especially at the beginning.

But there is an ideal case - when a new element is more than the others and it is enough to simply add it to the end. This is ideal - just one operation, but the probabilistic assessment of such an ideal case forces us to use binary trees or hash tables, or a huge number of other really beautiful and clever mechanisms. Based on probability theory, such a situation is rare - only if all elements are added to the array in ascending order (irony-analogy with a binary search tree, when sorted input data is the worst case of a tree degenerating into a linked list with linear complexity).

A special case, as the pattern?

But let's consider an array of just one element. It doesn't matter what value is entered - it always fits into the beginning of the empty to this array and the one-element array, naturally, is always sorted. And if we want to enter the 2nd element in our data structure? Let's just create another one-element array and get two sorted one-element arrays. Getting a new sorted array of two sorted ones is a simple task, although not very fast ... But in this case, we already need to sort the array not with every insertion, but with every second insertion. We use “deferred sorting”.

And if we want to make more than 2 elements? ~~Want-not harmful~~ - we will write a new 2-element array somewhere and continue to add elements to one-element arrays and also on the 4th element we will have 2 2 element sorted arrays, which we will be able to combine in the same way in one 4-element and so on, until the ~~blue screen of death pops up and the processor explodes with a black hole that pulls in all of the surrounding space-time~~ (something has blown me ...) until memory allows it.

Load distribution

It is clear that this algorithm already significantly reduces the number of sorts, relative to the sorting of a regular array, because the first level of arrays must be sorted once, the 2nd level - every 4th insert, the 3rd level - every 8th insert - actually logarithmic complexity.

We could calm down on this, but ~~for weaklings~~ we will try to distribute the load with each insert evenly, adding our “pending sorting” step by step and performing each step when inserting the next element for all levels - this will evenly distribute the load on all inserts - which is required .

Search

The search is a classic binary search on a sorted array with the only difference that there is a lot of arrays here and you need to search in turn for all. Therefore, the logarithmic complexity of the search is disputed (if I am not mistaken, this is double the logarithmic complexity). But in the task, this is quite acceptable, given the memory saved.

Implementation

While the kettle was boiling, I remembered that I wanted to write more, but then the kettle began to boil and his whistle made me forget that I wanted to write more ...

The implementation was quite confusing, because in my head there was only a superficial idea of how this whole car would go - the code was written almost at random, ~~so I’ll go to sleep with grief~~ , while the program structure of the class algorithm contains only 3 methods ( considering the constructor and destructor). The code does not contain anything other than working with arrays, cycles, and OOP ideologies — everything is simple and, I hope, understandable, to the ~~genius of~~ madness.

Actually code

#define _CRT_SECURE_NO_WARNINGS #include <stdio.h> #include <windows.h> #define undefined 2147483648 class ARRAY { private: ARRAY*nextLevel; unsigned int *keyResult, //          - keyLeft, keyRight *keyLeft, //     *keyRight, //     *keyTemp, //     *valResult, //   , *valLeft, //   *valRight, //  *valTemp, //  countResult,countLeft,countRight, //        1  2   0 . len;//  ( 0-  2  ) void upd() //       { if(keyResult) { if(countLeft<len) { if(countRight<len) { if(keyLeft[countLeft]<keyRight[countRight])//   1-   2- -   { keyResult[countResult]=keyLeft[countLeft]; valResult[countResult]=valLeft[countLeft]; countResult++; countLeft++; } else//   2-   1- -      { keyResult[countResult]=keyRight[countRight]; valResult[countResult]=valRight[countRight]; countResult++; countRight++; } } else // 2-        .     { keyResult[countResult]=keyLeft[countLeft]; valResult[countResult]=valLeft[countLeft]; countResult++; countLeft++; } } else { if(countRight<len) // 1-     -  .     { keyResult[countResult]=keyRight[countRight]; valResult[countResult]=valRight[countRight]; countResult++; countRight++; } //           -    } if(countResult==len*2) //      { if(!nextLevel)nextLevel=new ARRAY; // . ,      nextLevel->set(keyResult,valResult,len*2); keyResult=NULL; keyLeft=NULL; keyRight=NULL; keyTemp=NULL; valResult=NULL; valLeft=NULL; valRight=NULL; valTemp=NULL; } } if(nextLevel)nextLevel->upd(); } void set(unsigned int*Key,unsigned int*Val,unsigned int Len) { len=Len; if(!keyTemp){keyTemp=Key;valTemp=Val;} else { keyLeft=Key; valLeft=Val; keyRight=keyTemp; valRight=valTemp; keyTemp=NULL; valTemp=NULL; countResult=0; countLeft=0; countRight=0; keyResult=new unsigned int[len*2]; valResult=new unsigned int[len*2]; } upd(); } public: ~ARRAY() { if(keyResult)delete[]keyResult; if(keyLeft)delete[]keyLeft; if(keyRight)delete[]keyRight; if(keyTemp)delete[]keyTemp; if(valResult)delete[]valResult; if(valLeft)delete[]valLeft; if(valRight)delete[]valRight; if(valTemp)delete[]valTemp; if(nextLevel)delete nextLevel; } ARRAY() { keyResult=NULL; keyLeft=NULL; keyRight=NULL; keyTemp=NULL; valResult=NULL; valLeft=NULL; valRight=NULL; valTemp=NULL; countResult=0; countLeft=0; countRight=0; len=0; nextLevel=NULL; } void set(unsigned int Key,unsigned int Val) { unsigned int*k=new unsigned int[1];k[0]=Key; unsigned int*v=new unsigned int[1];v[0]=Val; set(k,v,1); } unsigned int get(unsigned int Key) { unsigned int l,r,i; if(keyTemp) { l=0;r=len; while(l!=r-1) { i=(l+r)/2; if(keyTemp[i]<Key)l=i; else if(keyTemp[i]>Key)r=i; else return valTemp[i]; } if(keyTemp[l]==Key)return valTemp[l]; } if(keyLeft) { l=0;r=len; while(l!=r-1) { i=(l+r)/2; if(keyLeft[i]<Key)l=i; else if(keyLeft[i]>Key)r=i; else return valLeft[i]; } if(keyLeft[l]==Key)return valLeft[l]; } if(keyRight) { l=0;r=len; while(l!=r-1) { i=(l+r)/2; if(keyRight[i]<Key)l=i; else if(keyRight[i]>Key)r=i; else return valRight[i]; } if(keyRight[l]==Key)return valRight[l]; } if(nextLevel)return nextLevel->get(Key); else return undefined; } }; void main() { ARRAY*arr=new ARRAY; char inp[256]; unsigned int Key,Val; while(gets(inp)&&inp[0]) { if(inp[0]=='+'){Key=atoi(&inp[1]);gets(inp);Val=atoi(inp);arr->set(Key,Val);} else if(inp[0]=='='){Key=atoi(&inp[1]);Val=arr->get(Key);if(Val!=undefined)printf("%d\n",Val);else printf("undefined\n");} else system("cls"); } delete arr; }

PS:
The algorithm does not implement the removal, although there are some ~~erotic~~ fantasies. But this is a separate article. Here I tried ~~to be clever in~~ communicating the very idea, and if someone has their own thoughts on the implementation of the deletion, we should say. Although in fact this structure was developed for a specific task that does not require removal or nothing at all except quick insertion and quick search, there was a desire to modify the class by implementing other methods that are often necessary in everyday work, which I will do when the ~~cancer hangs on the mountain~~ - someday

Source: https://habr.com/ru/post/306898/

All Articles