
How namedtuple works, or dynamic type creation

At Buruki we love more than just people and numbers. We are also tirelessly improving our main tool, the Python language. In this translated article, the author examines how namedtuple is built and, along the way, discusses one of the core concepts of the language.

A couple of days ago I was flying to San Francisco. There was no Internet on the plane, so I read the source code of the Python 2.7 standard library. The implementation of namedtuple struck me as particularly interesting, probably because it is actually much simpler than I had assumed.

Here is the source. If you have never heard of namedtuple, I recommend getting acquainted with this feature.

Code


    ################################################################################
    ### namedtuple
    ################################################################################

Wow! An impressive title, right?
It starts, as it should, with the function definition and an example of a good docstring.

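    def namedtuple(typename, field_names, verbose=False, rename=False):
        """Returns a new subclass of tuple with named fields.

        >>> Point = namedtuple('Point', 'x y')
        >>> Point.__doc__  # docstring for the new class
        'Point(x, y)'
        >>> p = Point(11, y=22)  # positional args or keywords
        >>> p[0] + p[1]  # indexable like a plain tuple
        33
        >>> x, y = p  # unpack like a regular tuple
        >>> x, y
        (11, 22)
        >>> p.x + p.y  # fields also accessible by name
        33
        >>> d = p._asdict()  # convert to a dictionary
        >>> d['x']
        11
        >>> Point(**d)  # convert from a dictionary
        Point(x=11, y=22)
        >>> p._replace(x=100)  # like str.replace() for named fields
        Point(x=100, y=22)

        """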

Next comes the parsing of the arguments. Note the use of basestring in the isinstance call: this way we detect a string whether the object is of type unicode or str (this only works in Python < 3.0, where basestring exists).

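        # Parse and validate the field names. Validation serves two purposes:
        # informative error messages and protection from template injection.
        if isinstance(field_names, basestring):
            field_names = field_names.replace(',', ' ').split()  # names separated by whitespace and/or commas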
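As an illustration of this parsing step, here is a minimal sketch for Python 3, where `basestring` no longer exists and `str` is used instead. The helper name `parse_field_names` is hypothetical and is not part of the standard library; it also folds in the `tuple(map(str, ...))` conversion that the original performs right after this step.

```python
def parse_field_names(field_names):
    """Accept 'x y', 'x, y' or any iterable of names; return a tuple of str.

    Hypothetical helper that mirrors the parsing step of namedtuple.
    """
    if isinstance(field_names, str):  # Python 3: str takes the place of basestring
        # Commas become spaces, then split() handles any run of whitespace.
        field_names = field_names.replace(',', ' ').split()
    return tuple(map(str, field_names))


print(parse_field_names('x y'))        # ('x', 'y')
print(parse_field_names('x, y,z'))     # ('x', 'y', 'z')
print(parse_field_names(['x', 'y']))   # ('x', 'y')
```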

If the rename argument is true, every invalid field name is replaced with a name derived from its position (_0, _1, and so on).

        field_names = tuple(map(str, field_names))
        if rename:
            names = list(field_names)
            seen = set()
            for i, name in enumerate(names):
                if (not all(c.isalnum() or c=='_' for c in name) or _iskeyword(name)
                    or not name or name[0].isdigit() or name.startswith('_')
                    or name in seen):
                    names[i] = '_%d' % i
                seen.add(name)
            field_names = tuple(names)
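The effect of `rename` is easy to observe with the real `collections.namedtuple`. The sketch below assumes Python 3, where the option behaves the same way; the type name `Row` and the field names are made up for the example.

```python
from collections import namedtuple

# 'class' is a keyword, the second 'id' is a duplicate, '_tmp' starts with an
# underscore and '2nd' starts with a digit: all four get positional names.
Row = namedtuple('Row', ['id', 'class', 'id', '_tmp', '2nd'], rename=True)

print(Row._fields)  # ('id', '_1', '_2', '_3', '_4')
```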

Note the generator expression wrapped in all(). This idiom, all(bool_expr(x) for x in things), is an extremely convenient way to state the desired result in a single expression.
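A small standalone sketch of the same idiom follows. The function name `only_word_chars` is hypothetical. One detail worth knowing: `all()` of an empty iterable is `True`, which is presumably why the renaming branch also tests `not name` explicitly.

```python
def only_word_chars(name):
    """True if every character is alphanumeric or an underscore."""
    return all(c.isalnum() or c == '_' for c in name)


print(only_word_chars('field_1'))   # True
print(only_word_chars('bad-name'))  # False
print(only_word_chars(''))          # True: all() of an empty iterable is True
```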

        for name in (typename,) + field_names:
            if not all(c.isalnum() or c=='_' for c in name):
                raise ValueError('Type names and field names can only contain alphanumeric characters and underscores: %r' % name)
            if _iskeyword(name):
                raise ValueError('Type names and field names cannot be a keyword: %r' % name)
            if name[0].isdigit():
                raise ValueError('Type names and field names cannot start with a number: %r' % name)

Check for duplicate names and for names that start with an underscore:

        seen_names = set()
        for name in field_names:
            if name.startswith('_') and not rename:
                raise ValueError('Field names cannot start with an underscore: %r' % name)
            if name in seen_names:
                raise ValueError('Encountered duplicate field name: %r' % name)
            seen_names.add(name)
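These checks can be seen in action with the real `collections.namedtuple`. This is a sketch assuming Python 3, where the exact wording of some messages differs from 2.7 but the exception type is still `ValueError`. The helper `is_rejected` and the sample specs are made up for the example.

```python
from collections import namedtuple


def is_rejected(spec):
    """Return True if namedtuple refuses the given field specification."""
    try:
        namedtuple('Point', spec)
    except ValueError:
        return True
    return False


# Duplicate name, leading underscore, keyword, leading digit.
bad_specs = ['x x', '_hidden y', 'x class', '1st y']

for spec in bad_specs:
    try:
        namedtuple('Point', spec)
    except ValueError as exc:
        print('%r -> %s' % (spec, exc))
```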

And now the real fun begins. (I'm sure that creating a data type at runtime is fun.) The field names are prepared in several forms for embedding into the code template. Note the neat use of the tuple's textual representation and slice notation to define argtxt.

        # Create and fill in the class template
        numfields = len(field_names)
        argtxt = repr(field_names).replace("'", "")[1:-1]  # tuple repr without parens or quotes
        reprtxt = ', '.join('%s=%%r' % name for name in field_names)
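To make these two strings concrete, here is what the same expressions produce for a `Point` with fields `x` and `y`. The expressions are copied from the source; the sample values are made up, and the snippet runs unchanged on Python 3.

```python
field_names = ('x', 'y')

# "('x', 'y')" -> "(x, y)" -> "x, y"
argtxt = repr(field_names).replace("'", "")[1:-1]
# '%%r' survives the first formatting pass as a literal '%r'
reprtxt = ', '.join('%s=%%r' % name for name in field_names)

print(argtxt)                               # x, y
print(reprtxt)                              # x=%r, y=%r
print(('Point(%s)' % reprtxt) % (11, 22))   # Point(x=11, y=22)

# With a single field the trailing comma of the tuple repr is kept, so the
# generated "(x,)" is still a one-element tuple rather than a bare name.
single = repr(('x',)).replace("'", "")[1:-1]
print(single)                               # x,
```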

And here is what goes on under the hood of namedtuple. This string will later be turned into Python code.

        template = '''class %(typename)s(tuple):
            '%(typename)s(%(argtxt)s)' \n
            __slots__ = () \n
            _fields = %(field_names)r \n
            def __new__(_cls, %(argtxt)s):
                'Create new instance of %(typename)s(%(argtxt)s)'
                return _tuple.__new__(_cls, (%(argtxt)s)) \n
            @classmethod
            def _make(cls, iterable, new=tuple.__new__, len=len):
                'Make a new %(typename)s object from a sequence or iterable'
                result = new(cls, iterable)
                if len(result) != %(numfields)d:
                    raise TypeError('Expected %(numfields)d arguments, got %%d' %% len(result))
                return result \n
            def __repr__(self):
                'Return a nicely formatted representation string'
                return '%(typename)s(%(reprtxt)s)' %% self \n
            def _asdict(self):
                'Return a new OrderedDict which maps field names to their values'
                return OrderedDict(zip(self._fields, self)) \n
            __dict__ = property(_asdict) \n
            def _replace(_self, **kwds):
                'Return a new %(typename)s object replacing specified fields with new values'
                result = _self._make(map(kwds.pop, %(field_names)r, _self))
                if kwds:
                    raise ValueError('Got unexpected field names: %%r' %% kwds.keys())
                return result \n
            def __getnewargs__(self):
                'Return self as a plain tuple.  Used by copy and pickle.'
                return tuple(self) \n\n''' % locals()

This is, in effect, the template for our new class.

Using locals() for string interpolation strikes me as very convenient. Python lacks simple interpolation of local variables. In Groovy and CoffeeScript, for example, you can write something like "{name} is {some_value}". But I think this Python version will do just fine: "{name} is {some_value}".format(**locals()).
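For comparison, here is a short sketch of the interpolation styles side by side. The article targets Python 2.7; the third variant assumes Python 3.6 or later, where f-strings provide this kind of interpolation directly. The function `describe` and its arguments are made up for the example.

```python
def describe(name, some_value):
    """Format the same message three ways and return all three strings."""
    # %-formatting against a mapping, as the namedtuple template does.
    percent_style = "%(name)s is %(some_value)s" % locals()
    # str.format with the local variables unpacked as keyword arguments;
    # extra keys in locals() are simply ignored.
    format_style = "{name} is {some_value}".format(**locals())
    # f-string (Python 3.6+): names are resolved in the enclosing scope.
    fstring_style = f"{name} is {some_value}"
    return percent_style, format_style, fstring_style


print(describe('answer', 42))
```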

You probably noticed that __slots__ is defined as an empty tuple. In this case Python does not create a per-instance dictionary to serve as the namespace, which saves a little memory. Thanks to the immutability inherited from the parent class (tuple) and the inability to add new attributes (because __slots__ = ()), instances of namedtuple types are value objects.
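The difference that `__slots__ = ()` makes can be checked directly. The two classes below, `Pair` and `LoosePair`, are hypothetical and exist only for this sketch.

```python
class Pair(tuple):
    __slots__ = ()      # no per-instance __dict__, no new attributes


class LoosePair(tuple):
    pass                # ordinary subclass: instances get a __dict__


p = Pair((1, 2))
q = LoosePair((1, 2))

q.note = 'allowed'      # stored in q.__dict__
try:
    p.note = 'rejected'
except AttributeError as exc:
    print('AttributeError:', exc)

print(hasattr(p, '__dict__'), hasattr(q, '__dict__'))  # False True
```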

Moving on. A read-only property is created for each field name. _itemgetter is itemgetter from the operator module; it returns a function of one argument, which is exactly what property needs.

        for i, name in enumerate(field_names):
            template += "        %s = _property(_itemgetter(%d), doc='Alias for field number %d')\n" % (name, i, i)
        if verbose:
            print template
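Written out by hand for a two-field type, the generated properties would look roughly like the sketch below (Python 3 syntax; the class is a simplified stand-in for what the template produces, not the exact generated code).

```python
from operator import itemgetter


class Point(tuple):
    __slots__ = ()
    # itemgetter(0) is a one-argument callable: itemgetter(0)(obj) == obj[0],
    # so it can serve directly as the getter of a property.
    x = property(itemgetter(0), doc='Alias for field number 0')
    y = property(itemgetter(1), doc='Alias for field number 1')


p = Point((11, 22))
print(p.x + p.y)  # 33

try:
    p.x = 5       # no setter was given, so the property is read-only
except AttributeError as exc:
    print('AttributeError:', exc)
```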

So we now have a big string of Python code. What do we do with it? Executing it in a restricted namespace seems reasonable. See how exec ... in is used here:

        # Execute the template string in a temporary namespace and
        # support tracing utilities by setting a value for frame.f_globals['__name__']
        namespace = dict(_itemgetter=_itemgetter, __name__='namedtuple_%s' % typename,
                         OrderedDict=OrderedDict, _property=property, _tuple=tuple)
        try:
            exec template in namespace
        except SyntaxError, e:
            raise SyntaxError(e.message + ':\n' + template)
        result = namespace[typename]

Very clever! The idea of executing a code string in an isolated namespace and then pulling the new type out of it is unusual to me. For details on exec, see Armin Ronacher's post.
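The same trick in Python 3 syntax, where `exec` is a function that takes the namespace as its second argument, might look like this. It is a pared-down sketch: the class source is written by hand rather than filled in from the template.

```python
source = '''
class Point(tuple):
    __slots__ = ()
    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))
'''

# A fresh dictionary serves as the global namespace for the executed code.
namespace = {'__name__': 'namedtuple_Point'}
exec(source, namespace)

# The class statement bound the name 'Point' inside that dictionary.
Point = namespace['Point']

print(Point(1, 2))        # (1, 2)
print(Point.__module__)   # namedtuple_Point, taken from namespace['__name__']
```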

Next, a bit of magic sets the __module__ of the new class to the module that called namedtuple:

        try:
            result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
        except (AttributeError, ValueError):
            pass
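The frame lookup can be tried in isolation. In this sketch, `caller_module_name` is a hypothetical helper; `sys._getframe` is a CPython implementation detail, which is why the original guards the call with `try`/`except`.

```python
import sys


def caller_module_name():
    """Return the __name__ found in the globals of whoever called us."""
    try:
        # Frame 0 is this function; frame 1 is its caller.
        return sys._getframe(1).f_globals.get('__name__', '__main__')
    except (AttributeError, ValueError):
        # sys._getframe is missing or does not support this depth.
        return None


print(caller_module_name())
```

Run as a script, this is expected to print `__main__`; imported and called from another module, it reports that module's name instead.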

And that is all!

        return result

Simple, isn't it?

Thoughts on implementation


The most interesting part of the code above is the dynamic execution of a code string in a namespace created solely for that execution. This trick highlights the simplicity of the Python data model: all namespaces, including modules and classes, are essentially dictionaries. Studying the internals of namedtuple proves the power of this simplicity once again.
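The claim that namespaces are dictionaries can be illustrated without `exec` at all. The sketch below (Python 3) writes into a module's dictionary and builds a class from a plain dict with the built-in `type`. The names `scratch` and `Pair` are made up for the example.

```python
import types

# A module's namespace is a real dict: writing a key creates an attribute.
mod = types.ModuleType('scratch')
vars(mod)['answer'] = 42
print(mod.answer)               # 42

# A class can be built from a name, a tuple of bases and a namespace dict.
Pair = type('Pair', (tuple,), {
    '__slots__': (),
    'first': property(lambda self: self[0]),
})
print(Pair((1, 2)).first)       # 1
print('first' in vars(Pair))    # True
```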
Using this technique, we could simplify the validation of field names. Instead of

    for name in (typename,) + field_names:
        if not all(c.isalnum() or c=='_' for c in name):
            raise ValueError('Type names and field names can only contain alphanumeric characters and underscores: %r' % name)
        if _iskeyword(name):
            raise ValueError('Type names and field names cannot be a keyword: %r' % name)
        if name[0].isdigit():
            raise ValueError('Type names and field names cannot start with a number: %r' % name)

we could write

    for name in (typename,) + field_names:
        try:
            exec ("%s = True" % name) in {}
        except (SyntaxError, NameError):
            raise ValueError('Invalid field name: %r' % name)

to test the validity of an identifier directly and concisely. But then we would lose precision in describing the problem when an error occurs. And since this is the standard library, explicit error messages make the current implementation the better choice.
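As a side note for readers on Python 3, which the article does not cover: strings there have an `isidentifier()` method, so a compact check is possible without executing anything. The helper `is_valid_name` below is a hypothetical sketch; like the exec variant above, it trades the precise error messages for brevity.

```python
from keyword import iskeyword


def is_valid_name(name):
    """True if name is a legal Python identifier and not a reserved word."""
    return name.isidentifier() and not iskeyword(name)


for candidate in ['x', 'field_1', 'class', '2nd', 'a-b', '']:
    print('%r -> %s' % (candidate, is_valid_name(candidate)))
```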

Conclusion


We are very lucky that the Python standard library is so easy to read. Keep that in mind and read the source code of the standard modules you use: it is easy and useful!

And more generally: explore the capabilities of the tools you use, and do not reinvent the wheel!

Source: https://habr.com/ru/post/189882/

