
Should Python strings be iterable?

And Guido created strings in the image of C; in the image of arrays of characters he created them. And Guido saw that it was good. Or was it?

Imagine you are writing perfectly idiomatic code that traverses some nested data. Beautiful is better than ugly, simple is better than complex, so you settle on the following:

    from collections.abc import Iterable


    def traverse(list_or_value, callback):
        if isinstance(list_or_value, Iterable):
            for item in list_or_value:
                traverse(item, callback)
        else:
            callback(list_or_value)

You write a unit test, and what do you know? It does not work. And it does not just fail quietly, it fails like this:
    >>> traverse({"status": "ok"}, print)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 4, in traverse
      File "<stdin>", line 4, in traverse
      File "<stdin>", line 4, in traverse
      [Previous line repeated 989 more times]
      File "<stdin>", line 2, in traverse
      File "/usr/local/opt/python/libexec/bin/../../Frameworks/Python.framework/Versions/3.7/lib/python3.7/abc.py", line 139, in __instancecheck__
        return _abc_instancecheck(cls, instance)
    RecursionError: maximum recursion depth exceeded in comparison

How? Why? In search of an answer, you immerse yourself in the wonderful world of collections of infinite depth.

Indeed, a string is the only built-in Iterable whose elements are always Iterables themselves: iterating over a one-character string yields that same string. You could, of course, construct another example by creating a list and appending it to itself a couple of times, but how often do you see that in your code? A string, meanwhile, is an Iterable of infinite depth that has crept into your production under cover of night.
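To make the "infinite depth" concrete, here is a minimal sketch. The first lines check that a one-character string contains itself as its only element. The guarded traverse below is one common workaround, not something the article prescribes: treating str and bytes as leaves is an assumption made for this example.

```python
from collections.abc import Iterable

# A one-character string yields itself when iterated,
# so a recursive descent into it never reaches a non-iterable leaf.
s = "a"
assert list(s) == [s]
assert isinstance(next(iter(s)), Iterable)


def traverse(list_or_value, callback):
    # Assumption for this sketch: str and bytes count as leaf values.
    is_leaf = isinstance(list_or_value, (str, bytes))
    if isinstance(list_or_value, Iterable) and not is_leaf:
        for item in list_or_value:
            traverse(item, callback)
    else:
        callback(list_or_value)


collected = []
# Iterating a dict yields its keys, so only "status" is visited.
traverse({"status": "ok"}, collected.append)
```

With the guard in place the call from the failing test terminates, and collected ends up holding the single key "status".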

Another example. Somewhere in your code you need to check repeatedly whether elements are present in containers. You decide to write a helper that speeds this up in various ways. You write a universal solution that relies only on the __contains__ method (the only method of the Container abstract base class), but then decide to add a super-optimization for a special case: a Collection. After all, you can simply iterate over it and build a set!

    import functools
    from typing import Collection, Container


    def faster_container(c: Container) -> Container:
        if isinstance(c, Collection):
            return set(c)
        return CachedContainer(c)


    class CachedContainer(object):
        def __init__(self, c: Container):
            self._contains = functools.lru_cache()(c.__contains__)

        def __contains__(self, stuff):
            return self._contains(stuff)

Aaand... your solution doesn't work! Here we go again!

    >>> c = faster_container(othello_text)
    >>> "Have you pray'd to-night, Desdemona?" in c
    False

(But the wrong answer came back really quickly...)

Why? Because a string in Python is a peculiar collection in which the semantics of __contains__ are not consistent with the semantics of __iter__ and __len__.

A string really is a Collection:

    >>> from collections.abc import Collection
    >>> issubclass(str, Collection)
    True

But a collection of ... what? __iter__ and __len__ believe that this is a collection of characters:

    >>> s = "foo"
    >>> len(s)
    3
    >>> list(s)
    ['f', 'o', 'o']

But __contains__ thinks it's a collection of substrings!

    >>> "oo" in s
    True
    >>> "oo" in list(s)
    False
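In user code, one pragmatic guard is to keep strings out of the set shortcut, because a set of characters answers a different question than substring membership. The following is only a sketch: excluding str and bytes is an assumption of this example, and the sample text is a short stand-in, not the othello_text from the session above.

```python
import functools
from collections.abc import Collection, Container


class CachedContainer:
    """Memoizes membership checks for an arbitrary container."""

    def __init__(self, c: Container):
        self._contains = functools.lru_cache()(c.__contains__)

    def __contains__(self, item):
        return self._contains(item)


def faster_container(c: Container) -> Container:
    # str and bytes test membership by substring, so converting them
    # to a set of their elements would change the meaning of "in".
    if isinstance(c, Collection) and not isinstance(c, (str, bytes)):
        return set(c)
    return CachedContainer(c)


# Hypothetical stand-in text for the example.
sample_text = "Have you pray'd to-night, Desdemona? Ay, my lord."
c = faster_container(sample_text)
found = "Have you pray'd to-night, Desdemona?" in c
```

Here found is True, because the string falls through to CachedContainer and keeps its substring semantics, while ordinary collections such as lists still take the set shortcut.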

What can be done?

Although the behavior of str.__contains__ may look odd next to the __contains__ implementations of other standard types, it is one of the many small things that make Python so convenient as a scripting language and let you write quick, readable code. I would not suggest changing this method, especially since we almost never use it to check for a single character in a string.

And do you know why, by the way? Because in a scripting language we almost never treat a string as a collection of characters! Manipulating individual characters or indexing into a string is mostly the stuff of interview problems. So maybe __iter__ should be removed from strings and hidden behind a method such as .chars()? That would solve both of the problems above.
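To get a feel for the proposal, here is a rough sketch of a string-like type without __iter__. The class name Text is hypothetical; no such type exists in Python. It is a plain wrapper rather than a str subclass, since the point is only to show how the traverse function from the first example behaves once character iteration is opt-in.

```python
from collections.abc import Collection, Iterable, Iterator


class Text:
    """Hypothetical string wrapper that is deliberately not iterable."""

    def __init__(self, value: str) -> None:
        self._value = value

    def chars(self) -> Iterator[str]:
        # Explicit opt-in to character-by-character iteration.
        return iter(self._value)

    def __contains__(self, item: str) -> bool:
        # Substring semantics, the same as str.__contains__.
        return item in self._value

    def __len__(self) -> int:
        return len(self._value)

    def __repr__(self) -> str:
        return f"Text({self._value!r})"


def traverse(list_or_value, callback):
    # The unmodified function from the first example.
    if isinstance(list_or_value, Iterable):
        for item in list_or_value:
            traverse(item, callback)
    else:
        callback(list_or_value)


seen = []
traverse([Text("status"), [Text("ok")]], seen.append)
```

Since Text defines no __iter__, it is neither an Iterable nor a Collection, so traverse treats it as a leaf, and a helper like faster_container would take its CachedContainer branch instead of building a set of characters.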

Time for a Friday discussion in the comments!

Source: https://habr.com/ru/post/451252/
