Python: collections, part 1/4: classification, general approaches and methods, conversion

Part 1	Part 2	Part 3	Part 4

A collection in Python is a program object (container variable) that stores a set of values of the same or different types and lets you access these values, as well as use special functions and methods that depend on the collection type.

A common problem when studying collections is that each type gets examined in some detail, but little attention is then paid to the overall picture: the similarities and differences between the types are not drawn out, and the same task is rarely shown solved for each collection side by side.

This is the problem I want to address in this series of articles: to look at a number of approaches to working with the standard Python collections by comparing the different types with each other, rather than one at a time as training materials usually do. I will also try to touch on some points that cause difficulties and mistakes for beginners.
Who this is for: Python learners who already have a basic understanding of collections and how to work with them, and who want to systematize and deepen that knowledge and fit it into a coherent picture.

We will consider the standard built-in collection types in Python: list (list), tuple (tuple), string (str), sets (set, frozenset), and dictionary (dict). Collections from the collections module will not be considered, although much of the article should apply to them as well.

1 Classification of collections

Explanation of terminology:

Indexing: each element of the collection has its own ordinal number, the index. This lets you access an element by its index and take slices, that is, select part of the collection by index. These topics will be discussed in detail in a separate article.

Uniqueness: each element can appear in the collection only once. This imposes a requirement that the elements be of immutable (more precisely, hashable) types; a list, for example, cannot be such an element.

Mutability: the collection allows members to be added or removed after it has been created.
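The three properties above can be checked directly in code. The following is a minimal sketch; the variable names are illustrative and do not come from the article:

```python
# Indexing: sequences (list, tuple, str) support access by position and slicing.
letters = ['a', 'b', 'c']
first = letters[0]   # 'a'
tail = letters[1:]   # ['b', 'c']

# Uniqueness: a set silently drops duplicate values.
unique = set([1, 2, 2, 3])   # {1, 2, 3}

# Mutability: a list can be changed in place, a tuple cannot.
letters.append('d')
point = (1, 2)
try:
    point[0] = 10            # tuples do not support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(first, tail, unique, letters, tuple_changed)
```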

A note on the dictionary (dict):

the dictionary itself is mutable: you can add and remove key: value pairs;
dictionary values may be mutable and need not be unique;
keys, however, must be immutable and unique, so, for example, a list cannot be a dictionary key, but a tuple can. The uniqueness of keys also implies the uniqueness of dictionary elements, the key: value pairs.

UPD: An important note from sakutylev: to serve as a dictionary key, an object must be hashable. A tuple may contain an unhashable element, in which case the tuple itself is also unhashable and cannot be used as a dictionary key.
```
a = (1, [2, 3], 4)
print(type(a))  # <class 'tuple'>
b = {a: 1}      # TypeError: unhashable type: 'list'
```
UPD: Thanks to morff for the attentiveness: empty braces {} create a dictionary, while braces with contents create either a set or a dictionary, depending on the syntax:
```
a = {}
print(type(a))  # <class 'dict'>
b = {1, 2, 3}
print(type(b))  # <class 'set'>
c = {'a': 1, 'b': 2}
print(type(c))  # <class 'dict'>
```
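One practical consequence is that an empty set cannot be written with braces. A minimal sketch, with illustrative variable names:

```python
empty_dict = {}      # empty braces always produce a dictionary
empty_set = set()    # an empty set has to be created with the set() constructor

print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>
```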

2 General approaches to working with any collection

Having sorted out the classification, let us look at what can be done with any standard collection regardless of its type (the examples use a list and a dictionary, but this works for all the other standard collection types considered here):

    # Define the variables used in the examples below (a list and a dictionary):
    my_list = ['a', 'b', 'c', 'd', 'e', 'f']
    my_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}

2.1 Printing collection items using the print() function

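    print(my_list)  # ['a', 'b', 'c', 'd', 'e', 'f']
    print(my_dict)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
    # Since Python 3.7 a dictionary keeps insertion order; in older versions
    # the printed order could differ from the order of creation.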

2.2 Counting the number of members of a collection using the len() function

    print(len(my_list))  # 6
    print(len(my_dict))  # 6 - each key: value pair counts as one element
    print(len('ab c'))   # 4 - a space also counts as one character

2.3 Checking whether an item belongs to a collection using the in operator

x in s returns True if the item x is in the collection s, and False if it is not.
There is also a check for non-membership, x not in s, which is simply the negation of the previous expression.

    my_list = ['a', 'b', 'c', 'd', 'e', 'f']
    print('a' in my_list)      # True
    print('q' in my_list)      # False
    print('a' not in my_list)  # False
    print('q' not in my_list)  # True

For a dictionary, the options are clear from the code below:

    my_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}
    print('a' in my_dict)           # True - without a method, the keys are checked
    print('a' in my_dict.keys())    # True - same as the line above
    print('a' in my_dict.values())  # False - 'a' is a key, not a value
    print(1 in my_dict.values())    # True

Can we check for pairs? Yes, we can!

    print(('a', 1) in my_dict.items())  # True
    print(('a', 2) in my_dict.items())  # False

For a string, you can search not only for a single character but also for a substring:

    print('ab' in 'abc')  # True

2.4 Traversing all elements of a collection in a for in loop

In this case, the elements of the collection are visited one after another until all of them have been processed.

    for elm in my_list:
        print(elm)

Pay attention to the following points:

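For non-indexed collections, the order in which elements are processed may differ from the order in which they were created. This applies to sets; dictionaries have preserved insertion order since Python 3.7.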

The dictionary loop has its own characteristics:

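    for elm in my_dict:
        # Iterating over a dictionary directly goes over its keys,
        # the same as: for elm in my_dict.keys()
        print(elm)

    for elm in my_dict.values():
        # To iterate over the values, use .values()
        print(elm)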

But most often we need the key: value pairs.

    for key, value in my_dict.items():
        # .items() yields (key, value) tuples,
        # which are unpacked into key and value
        print(key, value)
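When a predictable traversal order matters for a set or a dictionary, one option is to sort before looping. This is a minimal sketch with hypothetical sample data, not an approach taken from the article:

```python
# Hypothetical sample data for illustration.
my_set = {'c', 'a', 'b'}
my_dict = {'b': 2, 'a': 1, 'c': 3}

# sorted() accepts any iterable and always returns a new list.
ordered_elements = sorted(my_set)

# Sorting a dictionary sorts its keys; the values are then looked up by key.
ordered_pairs = [(key, my_dict[key]) for key in sorted(my_dict)]

for key, value in ordered_pairs:
    print(key, value)
```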

Possible error: do not change the number of elements of a collection in the loop body while iterating over that same collection! This produces errors that are not always obvious at first glance.

To avoid such side effects, you can, for example, iterate over a copy of the collection:
```
for elm in list(my_list):
    # The loop runs over a copy, so elements can safely be
    # added to or removed from the original my_list here.
    pass
```
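To make the warning concrete, the sketch below compares removing items while iterating over the list itself with iterating over a copy, and shows what a dictionary does in the same situation. The variable names are illustrative:

```python
# Removing items while iterating over the same list silently skips elements.
numbers = [1, 2, 2, 3]
for n in numbers:
    if n == 2:
        numbers.remove(n)
# One of the 2s survives, because the list shifted under the loop.

# Iterating over a copy gives the intended result.
safe = [1, 2, 2, 3]
for n in list(safe):
    if n == 2:
        safe.remove(n)

# A dictionary does not fail silently: changing its size raises RuntimeError.
d = {'a': 1, 'b': 2}
error_message = None
try:
    for key in d:
        del d[key]
except RuntimeError as exc:
    error_message = str(exc)

print(numbers, safe, error_message)
```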

2.5 Functions min(), max(), sum()

The min() and max() functions find the minimum and maximum elements respectively; they work not only for numeric values but also for strings, which are compared lexicographically.
sum() adds up all the elements, provided they are all numeric.

    print(min(my_list))            # a
    print(sum(my_dict.values()))   # 21
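A few details of these functions are easy to miss: string comparison is case-sensitive, a dictionary is compared by its keys, and sum() rejects strings. The sketch below uses hypothetical sample data to show this:

```python
words = ['pear', 'Apple', 'fig']

smallest = min(words)           # uppercase letters sort before lowercase ones
largest = max(words)
longest = max(words, key=len)   # compare by length instead of alphabetically

prices = {'a': 10, 'b': 30, 'c': 20}
max_key = max(prices)                            # a dictionary is compared by keys
max_value = max(prices.values())
key_of_max_value = max(prices, key=prices.get)   # key whose value is largest

# sum() starts from the integer 0, so summing strings raises TypeError.
try:
    sum(words)
    sum_failed = False
except TypeError:
    sum_failed = True
```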

3 Methods shared by some collections

A number of methods exist in more than one collection type and solve the same kind of task in each.

UPD: Important additions are in the third article: adding and removing elements of mutable collections.

Explanation of the methods and examples:

.count() counts occurrences of a given element in non-unique collections (string, list, tuple) and returns how many times the element occurs in the collection.
```
my_list = [1, 2, 2, 2, 2, 3]
print(my_list.count(2))  # 4 - the element 2 occurs four times
print(my_list.count(5))  # 0 - there is no such element in the collection
```

.index() returns the lowest index of the given item in indexed collections (string, list, tuple).

    my_list = [1, 2, 2, 2, 2, 3]
    print(my_list.index(2))  # 1 - the first 2 is at index 1 (indexing starts at zero!)
    print(my_list.index(5))  # ValueError: 5 is not in list - a missing item raises an error!
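Because .index() raises an exception for a missing item, a lookup that may fail is usually wrapped in try/except. The helper below is hypothetical (find_index is not a built-in and not from the article); it is only a sketch of one way to handle this:

```python
def find_index(collection, item, default=-1):
    """Hypothetical helper: first index of item, or default if it is absent."""
    try:
        return collection.index(item)
    except ValueError:
        # list, tuple and str all raise ValueError when the item is missing
        return default


my_list = [1, 2, 2, 2, 2, 3]
found = find_index(my_list, 2)
missing = find_index(my_list, 5)
in_string = find_index('abc', 'c')
in_tuple = find_index(('x', 'y'), 'z', default=None)
```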

.copy() returns a shallow (non-recursive) copy of the collection (list, dictionary, both kinds of set).

    my_set = {1, 2, 3}
    my_set_2 = my_set.copy()
    print(my_set_2 == my_set)  # True - the collections are equal, they hold the same elements
    print(my_set_2 is my_set)  # False - they are different objects with different ids
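What "shallow" means becomes visible with nested collections: the outer container is new, but the inner objects are shared. A minimal sketch with illustrative names, using copy.deepcopy from the standard library for comparison:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = original.copy()

shallow.append([5, 6])    # affects only the copy: the outer lists are independent
shallow[0].append(99)     # affects both: the inner list is the same object

deep = copy.deepcopy(original)   # recursive copy, inner lists are duplicated too
deep[0].append(100)              # does not touch the original

print(original)  # [[1, 2, 99], [3, 4]]
```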

.clear() is a method of mutable collections (list, dictionary, set) that removes all elements and leaves the collection empty.
```
my_set = {1, 2, 3}
print(my_set)  # {1, 2, 3}
my_set.clear()
print(my_set)  # set()
```

Special methods for comparing sets (set, frozenset)

set_a.isdisjoint(set_b) is True if set_a and set_b have no common elements.
set_b.issubset(set_a) is True if all elements of set_b belong to set_a, that is, set_b is entirely contained in set_a and is its subset.
set_a.issuperset(set_b) is True under the same condition, in which case set_a is a superset of set_b.

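    set_a = {1, 2, 3}
    set_b = {2, 1}  # the order of elements does not matter!
    set_c = {4}
    set_d = {1, 2, 3}
    print(set_a.isdisjoint(set_c))   # True - no common elements
    print(set_b.issubset(set_a))     # True - set_b is entirely contained in set_a, so set_b is a subset
    print(set_a.issuperset(set_b))   # True - set_b is entirely contained in set_a, so set_a is a superset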

If two sets are equal, each is both a subset and a superset of the other.

    print(set_a.issuperset(set_d))  # True
    print(set_a.issubset(set_d))    # True
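The same comparisons are also available as operators on sets. The sketch below is an addition, not part of the article; it shows the operator forms and one difference between them and the methods: the methods accept any iterable, while the operators require both operands to be sets.

```python
set_a = {1, 2, 3}
set_b = {1, 2}
set_d = {1, 2, 3}

is_subset = set_b <= set_a          # same as set_b.issubset(set_a)
is_superset = set_a >= set_b        # same as set_a.issuperset(set_b)
is_proper_subset = set_b < set_a    # subset and not equal
equal_not_proper = set_a < set_d    # equal sets are not proper subsets

# Methods accept any iterable, for example a list.
subset_of_list = set_b.issubset([1, 2, 3])

# Operators require a set on both sides.
try:
    set_b <= [1, 2, 3]
    operator_failed = False
except TypeError:
    operator_failed = True
```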

4 Converting one type of collection to another

Depending on the task at hand, a collection of one type can be converted to another type. As a rule, it is enough to pass the collection to the function that creates the other type (they are in the table above).

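    my_tuple = ('a', 'b', 'a')
    my_list = list(my_tuple)
    my_set = set(my_tuple)              # duplicate elements are lost!
    my_frozenset = frozenset(my_tuple)  # duplicate elements are lost!
    print(my_list, my_set, my_frozenset)
    # ['a', 'b', 'a'] {'a', 'b'} frozenset({'a', 'b'})
    # (the order of elements inside the sets may differ)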

Please note that data can be lost when converting one collection to another:

When converting to a set, duplicate elements are lost, since a set contains only unique elements! In fact, checking for uniqueness is usually the very reason to use a set in tasks that need it.
When converting an indexed collection to a non-indexed one, information about the order of the elements is lost, and in some cases this can be critical!
After converting to an immutable type, we can no longer change the elements of the collection: delete them, modify them, or add new ones. This can lead to errors in our data processing functions if they were written to work with mutable collections.
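If duplicates need to be removed but the original order matters, one common workaround is dict.fromkeys(), which relies on dictionaries preserving insertion order (Python 3.7+). This is a sketch of that technique with illustrative data, not something the article itself proposes:

```python
items = ['b', 'a', 'b', 'c', 'a']

# A set removes duplicates but gives no guarantee about order.
unique_unordered = set(items)

# dict.fromkeys() keeps the first occurrence of each item, in order (Python 3.7+).
unique_ordered = list(dict.fromkeys(items))

# Comparing lengths is a quick test for the presence of duplicates.
has_duplicates = len(items) != len(set(items))

print(unique_ordered)  # ['b', 'a', 'c']
```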

Additional details:

A dictionary cannot be created in the way shown above from a flat collection of values, since it consists of key: value pairs.

This limitation can be worked around by building the dictionary from keys and values combined with zip():
```
my_keys = ('a', 'b', 'c')
my_values = [1, 2]  # If the numbers of elements differ, the result has
                    # as many pairs as the shorter collection
my_dict = dict(zip(my_keys, my_values))
print(my_dict)  # {'a': 1, 'b': 2}
```
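Two related cases may be worth knowing: dict() accepts a collection that already consists of pairs, and itertools.zip_longest from the standard library can be used when no key should be dropped. A minimal sketch with illustrative data:

```python
from itertools import zip_longest

# A collection of 2-element pairs converts to a dictionary directly.
pairs = [('a', 1), ('b', 2)]
from_pairs = dict(pairs)

# zip_longest pads the shorter collection with None instead of truncating.
my_keys = ('a', 'b', 'c')
my_values = [1, 2]
padded = dict(zip_longest(my_keys, my_values))

# Going back: a dictionary converts to a list of keys or a list of pairs.
keys_only = list(from_pairs)
as_pairs = list(from_pairs.items())

print(padded)  # {'a': 1, 'b': 2, 'c': None}
```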

Creating a string from another collection:

    my_tuple = ('a', 'b', 'c')
    my_str = ''.join(my_tuple)
    print(my_str)  # abc
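Note that str.join() only accepts strings, so a collection of numbers has to be converted element by element first. A minimal sketch with illustrative data:

```python
numbers = (1, 2, 3)

# join() raises TypeError if any element is not a string.
try:
    ''.join(numbers)
    join_failed = False
except TypeError:
    join_failed = True

# Converting each element with str() first makes the join work.
joined = ', '.join(map(str, numbers))

# The reverse direction: a string splits into a list of its characters.
chars = list('abc')

print(joined)  # 1, 2, 3
```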

Possible error: if your collection contains mutable elements (for example, a list of lists), it cannot be converted into a set or frozenset, since the elements of a set must be immutable (hashable)!
```
my_list = [1, [2, 3], 4]
my_set = set(my_list)  # TypeError: unhashable type: 'list'
```
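One possible workaround, assuming the nesting is only one level deep, is to convert the inner lists to tuples before building the set. This is a sketch with illustrative data rather than a general solution:

```python
my_list = [1, [2, 3], 4, [2, 3]]

# Replace each inner list with an equivalent tuple, which is hashable.
hashable_items = [tuple(x) if isinstance(x, list) else x for x in my_list]

my_set = set(hashable_items)   # the duplicate (2, 3) is stored only once

print(len(my_set))  # 3
```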

Note: the most powerful and flexible approach, comprehensions (collection generator expressions), will be covered separately in the fourth part of the series, since they have many nuances and use cases that are rarely highlighted and deserve detailed analysis.

UPD: ShashkovS posted links in the comments to important and useful information on the algorithmic complexity of operations on collections:

TimeComplexity (aka "Big O" or "Big Oh") (in English)
Complexity of Python Operations (in English)


I invite you to discuss:

If I have been inaccurate somewhere or missed something important, write in the comments; important remarks will later be added to the article with attribution.
If some points are unclear and need clarification, ask your questions in the comments; either I or other readers will answer, and useful questions with answers will later be added to the article.

Source: https://habr.com/ru/post/319164/
