📜 ⬆️ ⬇️

Just about lists, dictionaries and sets or TOP 5 data structures



Hey. Her! Do not say “Damn it! I know the difference between the list and the vector; I don’t need this article. ” Please look under the cut and refresh your knowledge. I hope, however, that you can learn a lot more from this article, and some may finally figure out why there are so many types of data for collections of objects.

Introduction


It just so happens that in the programming of the collection there are many, there are not very many different entities - lists, arrays, vectors, sets, stacks, queues, associative arrays, and most of these data structures have several more subspecies.

There must be reasons why there are so many different variations for a simple representation of any set of objects.
')
Should there be a difference between a list and an array? Between an associative array and a hash table?

Collection


For starters, the most boring (yes, I love that). What is a collection in general?

A collection is a data structure (type, class, or even better, an interface) that is designed to contain a number of objects (depending on the language and terminology, they must be of the same type or may be of different types).

Different types of collections can be static or dynamic, i.e. change their size or remain constant, can be ordered (more precisely taking into account the order of the elements) and unordered (respectively not taking into account).

Several standard operations are provided above the collections (now we’ll talk about mutable, i.e. changeable collections), such as: getting the size, adding an element, deleting an element, searching (there is any element in the collection or not), there are a lot of them.

Okay, I did my secret duty, now let's go!

1 Vector (Vector, Array)



And what were you waiting for?

A vector (also known as a one-dimensional array) is an ordered set of elements with random access by numeric index. What is important to us in this definition? Never mind. Just kidding, in fact, almost every word is important to us:

Elements are accessed by a numeric index (usually starting with the 0th index, although there are exceptions), usually accessing an element of the collection by index is written as myFavoriteCats [i] or blackKitties [5]. Moreover, to denote this very number - the index, use the letter i.

And when one letter is not enough, they drag j and k here.

So, further we understand that access is arbitrary - it means we can refer to the elements under the indices 0, 42, 2014 and generally expect that the operation will be O (1) complexity, i.e. constant, and no matter which element we request, it will immediately return to us at the speed of light.

Next - the vector is an ordered collection, which is actually understandable - we have such concepts as the first, last element, for each specific element we can also name the previous and the next.

Release


Typically, a vector (as a low-level structure) will be a descriptor containing various information, inseparable from the structure itself (the only thing that is best to keep there is a vector size) and a pointer to the first element.

Such an implementation will allow for constant time to get access to an arbitrary element of the vector by its index, as well as allow copying, concatenation and other simple operations at a low level.

And indeed, it is very easy to get access to a certain element - we add an index to the pointer to the first item (with some corrections for the size of the data type) and we get a pointer to the desired item! It remains to dereference and we have the necessary kitty in the variable!

Okay, a vector is a cool structure, but it also has flaws (and who doesn’t have them?), For example, you can't just take a new element and add it to the vector! Especially cram it in the middle. It is also impossible to say that we have cats with numbers 0, 1 and 4, but not with numbers 2 and 3 (they used to be, but it turned out that they were dogs).

One can imagine a vector as a bookshelf with compartments, in each of which exactly one book fits. To shove a new novel by Dontsova between the 10th and 11th volume of the Great Sovetskaya Encyclopedia, you need to try hard and shift all volumes from the 11th to the 65th volume (you can cheat and put the 11th volume in the end, but I don’t he said, and in that case we will lose orderliness).


In my memory, everything is exactly that

Application


In our case, the vector would be perfect for the top 10 cutest kittens, because There is no need to add and delete elements (only change), there should be no gaps between the 1st and the 5th place, and it’s convenient to call the number.

Okay. In any case, the vector is cool, we just see what other collections are there.

2 List



First volume

Wow! Task list for today, shopping list in the store. Wedding guest list ... So. Get to the point.

We already know that the elements of the vector lie neatly one after another, beautifully and evenly. This gives us both advantages and disadvantages.

The list in this respect is the completely opposite thing - its elements can be scattered in memory as you please! Because of this, we lose the ability to quickly get an item on the index, and we can not quickly copy the entire list, but we get a pretty nice thing - we can insert items in constant time to any place! According to rumors, elements from the list are also deleted for O (1).

Implementation


Hm And what about the formal definition?

A list is an ordered set of elements, for each of which a pointer to the next one is stored (or for a doubly linked list and the next and previous one) list items.

For the last item in the list, we store a null pointer (in the diagrams I’ll use a null cat pointer, don't be intimidated).

Attention! In a canon list implementation, in order to get the size of the list, it is necessary to bypass the entire list — reaching the null pointer (linear time is O (n) complexity) and although in some implementations the size is cached in the list descriptor (or in the first element), not worth relying on it.


If I could, I would place one list item at the north pole, and the other somewhere in the area of ​​Betelgeuse

Application


The list would be suitable for (attention!) List of homeless kittens, sorted by age (ascending). We just need to add and remove items from the list often (you don’t think anything like that - kittens are taken away), and the first elements of the list are more often needed - I would take myself a little fluffy kitten, rather than an 8-year-old mana.

Okay. Lists are kind of a simple structure. What else is there?

3 Set



This is Seth

A similar concept is in mathematics, and more precisely in set theory. The set is different from both the vector and the list, although their implementation may be similar.

A set is an unordered set of elements, without repetitions. Wow And all? No you random access, nothing! Why is this necessary?

As we know in the vector, you can quickly get an element by index, in the list you can quickly add or remove an element, and what about the set?

In the set, you can quickly check whether there is any element inside, or it is not. Let's say if I wanted to find out if a particular cat is on my favorite list, then for the list and for the vector I would have to go through (in the worst case) all the elements!

Implementation


In the set, because it is out of order, you can sort the elements when adding and in which case arrange a binary search. Hm This is a paradox, a collection is unordered, but inside everything will be in order. Here it is important to understand that if you add a new element to the set, it is not a fact that it will go to the end.

In fact, when working with a set, one cannot rely on any order of elements at all, it can be any - that is why it is a set and an unordered collection.

It should be noted that the set can be implemented in many different ways, for example, you can use hashing to search for items even faster, so I’ll not consider the implementation in detail. Let me just say that you can cheat and use our knowledge on the lists.

In general, there are also ordered sets, sets with repetitions (multiset), and probably there should be an ordered multiset.


Set theory is easier if you take a lot of kittens

Application


A lot of ideal for a list of your favorite kittens, because they are many. Ha! Just kidding

But it really does, because such a collection does not need to be sorted (orderliness is not important) and we can easily check if any particular cat is in this set (let's say I have 100 kittens and loved ones I feed with shrimps).

Okay. The sets are good too, but is there really something else?

4 Vocabulary (Associative Array, Map, Dictionary)



Admit it is better than just a dictionary

A dictionary (it is an associative array) is the same vector, but with slight differences. Not only numbers, but also any other data types (even other collections!) Can act as an index (which will be called in the dictionary). Gaps are also allowed if we still use an integer as a key, for example, we may have an element associated with key 5, but there is no element associated with key 4.

What does all this mean in practice? The only thing is that in square brackets for the image to the element on the “index” we can indicate an arbitrary type, for example allMyCats [“Murka”].

Implementation


Naked it is clear that you can simply start an array (or list) of pairs (Key, Value) and add a special function that will run through this list and return a specific value by the key associated with it.

We also cannot say which pair is the first, which is the last, and that it used to be “Murka” or “Borka”, therefore the dictionary is considered an unordered structure.

Again, only one value can be associated with each key, so for the given example the dictionary doesn’t fit the dictionary in its purest form.

The implementation, as is the case with the set, can be completely different, you can arrange pairs by key and use binary search to get the element (in this case, the elements must be ordered). Again, you can implement a dictionary using key hashing, which is quite often used with strings.

Application


The most plausible and competent way is to use a dictionary along with a list, where the dictionary key is a string — the name of the cat, and the value — a list of cats with that name. This will allow you to quickly find all the cats named Murka and select the one that is currently needed.


It looks like this in memory std :: map <std :: color, std :: list <std :: cat >>

And I have news for you - the types of collections are over. Well, that's it. Generally no more. Totally.

5 Stack



Another cat will be Stack Overflow

Ha! I deceived you (I am joking) There are also a couple of data structures that represent collections.

So a stack is a collection with unusual access, more precisely with unusual rules regarding how elements can be added and removed.

It's simple - the added item, called the “last”, is the first to leave the stack.

The stack is very necessary and useful in programming. For example, using the stack, a nested procedure call is performed — the return address and the arguments of the function being called are saved to the stack.

Implementation


In the high-level implementation, there is nothing particularly interesting - a pointer to the list and elements are added to the beginning of this list, and removed from it.

In the low-level implementation (more precisely, how it is implemented in modern architectures) there are interesting moments.

The stack there is a small reserved area of ​​memory and two pointers are stored together with it - at the beginning of the stack (where the first added element is) and at the end of the stack - where the last added one lies.

If you put too much data on the stack, the program will end with a familiar error — Stack Overflow, which means that the pointer to the end of the stack has exceeded the upper allowable limit.

The reverse situation can also happen (Stack Underflow), if you try to pick up more from the stack than it does, but it does not occur in high-level languages ​​(understand why - we are not allowed to directly work with the stack).

If anyone is interested in how it all works - studying assembler for some popular architecture, like i386, can help you.

Application


One could describe in this place a stack of poor kittens as high as a mountain, but in fact in high-level languages ​​the stack is rarely needed, often there is enough recursion, which uses the stack implicitly. I didn’t use a contrived example (and couldn’t think of a normal one, I'm sorry), so let's go on to the next item.

miscellanea


In fact, there are a lot of collections, such as a queue, a two-way queue (decks), a two-linked list, a ring set, and priority queues.

There are trees (yes their whole forest!) And graphs.

There are probabilistic data structures, such as a probabilistic set and a list with gaps.

I really want to write about all this, but time and space in the habr is not always enough.

However, there are many (or vector) things related to the topic, which I would like to mention at least in passing, let the curious reader ask me to go read a smart book.

Strings


First of all, how strings are implemented in some languages ​​may seem strange. The simplest and most effective solution is probably the solution. C - a string is a character set, with a null character at the end, which allows you to do without a descriptor.

In C ++, std :: string is already more like a vector.

Well, in the old Pascal descriptor (most precisely, only the length) is stored in the zero element of the array.

In Haskell, a String is a list of characters ([Char]), which means that obtaining the length of a string has complexity O (n). But they are very convenient to run around recursively.

In general, a string is an ordered set of characters and no more. Which type of collection will be used is not important (well, I would not advise using a lot, ha!).

Queue


The queue is very similar to the stack and at the same time is its opposite - the first we get back is not the element that we added last, but the one that “stands in line” the longest. The queue is a very convenient structure, but despite the fact that the principle of its work is similar to the stack, there is a slight difference in the effective implementation.

For the stack, we could cheat and allocate an acceptable size of memory, in which case expanding it, because the stack either decreases or increases, because elements and are added and removed “from one end”. If we present the work of the queue, it will “crawl in memory” - the beginning will constantly move upwards, so the trick that is applicable to the stack will work worse and there will be much better to use a doubly linked list (and do not forget to store pointers to the first and the last items).

You can also try to implement the queue on two stacks, but this is also less effective.

There is also a deck (two-way queue - deque). It is possible to add elements both at the end and at the beginning. And take them, too, and from the end and from the beginning.

Conclusion



Wow I'm starting to repeat

I did not mention at all about the combination of various collections, thanks to which matrices and tables are formed. Also, I did not touch the trees, the ring set, wrote almost nothing about the queue, very little information on hashing (I did get away with a couple of words from this topic) and other optimization methods.

However, I think the article will fulfill its role - it will simply and clearly state the basics of data structures for readers of varying degrees of preparedness. And I will be happy to continue and highlight many (or all, ha!) Other topics in the same vein.

Thanks to those who were able to read this up to these lines (how did they endure it?).

Bye and good luck!

Source: https://habr.com/ru/post/232009/


All Articles