
Stop writing classes

A sign that an object should not be a class: it has only 2 methods, and one of them is the initializer, __init__. Every time you see this, think: "I guess I just need one function."

Every time you create a single instance of a class you wrote, use it once, and then throw it away, you should think: "Oh, this needs refactoring! It could be simpler, much simpler!"

A translation of a talk by Jack Diederich, one of the core developers of Python. The talk was given on March 9, 2012 at PyCon US.

All of you have read the Zen of Python, probably many times. Here are a few items from it:


This text was written by Tim Peters. He is smarter than you and me. How many people do you know who have a sorting algorithm named after them? That is the man who wrote the Zen of Python. And all of its points say: "Don't make it complicated. Keep it simple." That is what this talk is about.

So, first of all, do not make things complicated where they can be simple. Classes are complex, or can become very complex. Yet we keep complicating things even when we are trying to simplify. So in this talk we will read some code, learn to notice when we are going the wrong way, and see how to get back.

At work I tell my colleagues: "I hate code, and I want as little of it as possible in our product." We sell functionality, not code. Customers come to us not for the code but for the features. Every time code is deleted, that is good. There are four of us, and in the last year we stopped counting the number of lines in the product, but we keep shipping new functionality.

Classes



The first thing to remember from this talk is this code. It is the biggest abuse of classes found in the wild.

    class Greeting(object):
        def __init__(self, word='Hello'):
            self.word = word

        def greet(self, name):
            return '%s, %s!' % (self.word, name)

    >>> greeting = Greeting('Hola')
    >>> greeting.greet('Jorge')
    'Hola, Jorge!'


This is not a class, although it looks like one. The name is a noun, "Greeting". It takes arguments and stores them in __init__. Yes, it looks like a class. It has a method that reads the object's state and does something with it, as classes do. Below it is an example of how the class is used: create an instance of Greeting, then use that instance to do something.

But this is not a class, or at least it should not be one. The sign: it has only 2 methods, and one of them is the initializer, __init__. Every time you see this, think: "I guess I just need one function."

Every time you create a single instance of a class you wrote, use it once, and then throw it away, you should think: "Oh, this needs refactoring! It could be simpler, much simpler!"

    def greet(name):
        ob = Greeting('')
        print(ob.greet(name))
        return


This function is 4 lines of code. Here is how to do the same thing in just 2 lines:

    def greet(greeting, name):
        return '%s, %s!' % (greeting, name)

    import functools
    greet = functools.partial(greet, '')
    greet('')


If you always call a function with the same first argument, the standard library has a tool for that: functools.partial. Look at the code above: you bind the argument once, and the result can be called many times.
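As a minimal sketch of the same idea in Python 3, functools.partial fixes the first argument once and returns a callable that can be reused. The greeting word 'Hola' and the name `hola` are illustrative choices, not part of the talk's refactored code.

```python
import functools


def greet(greeting, name):
    """Format a greeting; a plain function, no class needed."""
    return '%s, %s!' % (greeting, name)


# Bind the first argument once; `hola` is an illustrative name.
hola = functools.partial(greet, 'Hola')

print(hola('Jorge'))
print(hola('Maria'))
```

The partial object remembers only the bound argument, so it plays the role that the instance of Greeting played before, without a class definition.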

I do not know how many of you have a degree in computer science; I do. I was taught concepts like

- separation of concerns
- decoupling
- encapsulation
- implementation hiding

In the 15 years since I graduated, I have not used these terms. When you hear these words, know that you are being fooled. The terms are not needed in themselves. When people use them, they mean completely different things, which only gets in the way of the conversation.

Example: pants turn into ...



Many of you use third-party libraries in your daily work. Every time you need to use someone else's code, the first thing to do is read it. You do not know what is in there, what the quality is, whether it has tests, and so on. You need to check the code before including it. Sometimes the code is hard to read.

The third-party API library, let's call it ShaurMail, consisted of 1 package, 22 modules, 20 classes, and 660 lines of code. I had to read all of it before including it in the product. But it was their official API, so we used it. Every time an API update came out, I had to read the diffs, because there was no telling what they had changed. You sent patches: did they make it into the update?

660 lines of code and 20 classes is a bit much when the program only needs to submit a list of email addresses and the text of a message, and find out which messages were not delivered and who unsubscribed.

What is class abuse? People often think they will need something in the future. ... They will not. Write things when you need them. The ShaurMail library has a ShaurHash module that contains 2 lines of code:

    class ShaurHash(dict):
        pass


Someone decided that a dict subclass would be needed later. It never was, but lines like the first one below are everywhere in the code:

    my_hash = ShaurMail.ShaurHash.ShaurHash(id='cat')
    d = dict(id='cat')
    d = {'id': 'cat'}


The second and third lines need no explanation to anyone. But the mantra "ShaurMail-ShaurHash-ShaurHash" was repeated everywhere. Repeating the word "Shaur" three times is another sign of excess. Repetition does nothing but harm. You annoy users by forcing them to write "Shaur" three times. (This is not a real company name; it is made up.)

Then they fired that guy and hired someone who knew what he was doing. Here is the second version of the API:

    class API:
        def __init__(self, key):
            self.header = dict(apikey=key)

        def call(self, method, params):
            request = urllib2.Request(
                self.url + method[0] + '/' + method[1],
                urllib.urlencode(params),
                self.header
            )
            try:
                response = json.loads(urllib2.urlopen(request).read())
                return response
            except urllib2.HTTPError as error:
                return dict(Error=str(error))


The first version had 660 lines; this one has 15. All this code does is use functions from the standard library. It can be read in full, easily, in seconds, and you immediately understand what it does. By the way, it also came with a 20-line test suite. That is how to write code. When they updated the API, I could read the changes in a couple of seconds.

But you may notice a problem here. The class has two methods, and one of them is __init__. The authors did not hide it: the second method is named call. Here is how to use this API:

 ShaurMail.API(key=' ').call(('mailing', 'statistics'), {'id': 1}) 


The line is long, so we make an alias and call it many times:

    ShaurMail.request = ShaurMail.API(key=' ').call
    ShaurMail.request(('mailing', 'statistics'), {'id': 1})


Notice that we are using this class as a function. A function is what it should be. If you see something like this, you know the class is not needed. So I sent them a third version of the API:

    ShaurMail_API = url = 'https://api.shaurmail.com/%s/%s'
    ShaurMail_API_KEY = ' '

    def request(noun, verb, **params):
        headers = {'apikey': ShaurMail_API_KEY}
        request = urllib2.Request(ShaurMail_API % (noun, verb),
                                  urllib.urlencode(params), headers)
        return json.loads(urllib2.urlopen(request).read())


It adds no files to our project at all, because I pasted it into the module where it is used. It does everything the 15-line API did, and everything the 660-line API did.

This is where we started and where we ended up:



It is easier to use, easier to write, and nobody has to figure out what is going on.
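The talk's code targets Python 2 (urllib2). A rough Python 3 sketch of the same single-function approach might look like the following. The URL is the fictional one from the example above, the key value 'SECRET' is a placeholder, and the split into a `build_request` helper is an assumption made here so the request can be inspected without any network access.

```python
import json
import urllib.parse
import urllib.request

# Fictional endpoint from the example above; the key is a placeholder.
SHAURMAIL_API = 'https://api.shaurmail.com/%s/%s'
SHAURMAIL_API_KEY = 'SECRET'


def build_request(noun, verb, **params):
    """Build (but do not send) the POST request for one API call."""
    data = urllib.parse.urlencode(params).encode('ascii')
    headers = {'apikey': SHAURMAIL_API_KEY}
    return urllib.request.Request(SHAURMAIL_API % (noun, verb), data, headers)


def request(noun, verb, **params):
    """Send the request and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(build_request(noun, verb, **params)) as reply:
        return json.loads(reply.read().decode('utf-8'))


# Inspect the request without sending it.
req = build_request('mailing', 'statistics', id=1)
print(req.full_url, req.data)
```

Either way the whole client stays a couple of module-level functions that can be pasted next to the code that calls them.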

Standard library



Those who came from Java may believe that namespaces are for taxonomy. That is not true. They are for preventing name collisions. Deep namespace hierarchies give nobody anything. ShaurMail.ShaurHash.ShaurHash is just extra words that people have to remember and type.

In the Python standard library the namespace is very shallow, because you either remember the name of the module or have to look it up in the documentation. It is no good if you have to work out a chain: which package to look in, which package inside that, which package inside that, and what the module in it is called. You should only need to know the name of the module.

To our shame, here is an example from our own code, and the same sins are visible in it:

 services.crawler.crawlerexceptions.ArticleNotFoundException 


It is a package containing a 2-line module: an exception class and a "pass". To use this exception you have to write "crawler" twice and "exception" twice. The name ArticleNotFoundException repeats itself. That is unnecessary. If you name exceptions, let it be EmptyBeer, BeerError, or BeerNotFound, but BeerNotFoundError is already too much.

You can simply use exceptions from the standard library. Everyone understands them. Unless you need to catch a specific condition, LookupError is fine. If an error report lands in your e-mail, you still have to read it, so it does not matter what the exception is called.

Besides, in code exceptions usually appear right after raise and except, so everyone sees at once that they are exceptions. So do not add "Exception" to the name.
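As a small illustration of reusing a standard exception, the sketch below raises the built-in LookupError instead of defining a custom "not found" class. The function name `find_article` and the sample data are hypothetical, chosen only to echo the crawler example.

```python
def find_article(articles, article_id):
    """Return the article title, or raise the built-in LookupError."""
    if article_id not in articles:
        raise LookupError('article not found: %r' % (article_id,))
    return articles[article_id]


articles = {'a1': 'Stop writing classes'}  # illustrative data

try:
    find_article(articles, 'missing')
except LookupError as error:
    print('lookup failed:', error)
```

Because KeyError and IndexError are themselves subclasses of LookupError, a caller that catches LookupError also handles failed dictionary and list lookups.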

The Python standard library has its rusty corners, but it is a very good example of code organization:



Ten files per package sounds like a lot, but that is only because of a few third-party projects that were added to the library; there are also packages of just 2 files. If you want to create a new exception, think twice, because the standard library gets by with 1 exception per 1200 lines of code.

I am not against classes in principle. Classes are needed when there is a lot of mutable data with functions tied to it. In everyday work, though, that happens infrequently. You regularly work with the standard library, and it already has suitable classes. They have been written for you.

The only exception in the Python library is the heapq module. A heap queue is an array kept in heap order, so that its smallest item always comes first. The heapq module has a dozen functions, and they all operate on the same heap. The first argument is always the same, which means a class really suggests itself here.

    heapify(data)
    heappush(data, item)
    heappop(data)
    heappushpop(data, item)
    heapreplace(data, item)

etc.
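A short sketch with a few real heapq calls shows the pattern: every function takes the heap list as its first argument. The sample numbers are arbitrary.

```python
import heapq

data = [5, 1, 4]
heapq.heapify(data)              # reorder the list in place into a heap
heapq.heappush(data, 2)          # add an item
smallest = heapq.heappop(data)   # remove and return the smallest item

print(smallest, data[0])
```

After the pop, `data[0]` is again the smallest remaining item, which is the only ordering guarantee a heap gives.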

Every time I need heapq, I take this class from my toolkit.

    import heapq

    class Heap(object):
        def __init__(self, data=None, key=lambda x: x):
            self.key = key
            # Store (key, item) pairs so the heap is ordered by key.
            self.heap = [(key(item), item) for item in data or []]
            heapq.heapify(self.heap)

        def pushleft(self, item):
            heapq.heappush(self.heap, (self.key(item), item))

        def popleft(self):
            return heapq.heappop(self.heap)[1]


Classes grow like weeds



The state of OAuth in Python is poor. Again, there are third-party libraries, and before using them in your project you need to read them.

I tried to use Google's URL shortener: I needed to take URLs and simply shorten them. Google has a project for this with 10,000 lines of code, 115 modules, and 207 classes. I wrote a rant about it on Google+, but few people saw it, and Guido commented: "I disclaim responsibility for the Google API code." With 10,000 lines of code there is bound to be something like ShaurMail in there. Here, for example, is the class Flow, which other classes inherit from.

    class Flow(object):
        pass

    class Storage(object):
        def put(self, data):
            _abstract()

        def get(self):
            _abstract()

    def _abstract():
        raise NotImplementedError


It is empty. But it has its own module, and each time you read a class that inherits from it, you have to go check that file and make sure again that the class is empty. Someone was looking into the future and decided: "I'll write 3 lines of code now so that I don't have to change these 3 lines later." And he took time from everyone who reads his library. There is another class, Storage, which does almost nothing. It handles errors correctly, using a standard exception, but it does so through an alias (_abstract), and again you have to go read their code to find out how it works.

I needed a week to implement OAuth2. It took a couple of days to read ten thousand lines of code, after which I started looking for other libraries. I found python-oauth2. It is the second version of python-oauth, but it does not actually work with OAuth2, which was not obvious at first. Still, this library is somewhat better than Google's: only 540 lines and 15 classes.

I rewrote it to be simpler still and called it python-foauth2. It is 135 lines of code and 3 classes, and that is still a lot; I have not refactored it enough. Here is one of those three classes:

    class Error(Exception):
        pass


Shameless!

Life



Last example. You have all seen Conway's Game of Life, even if you do not know its name. There is a grid of cells; on each turn you count the neighbors of every cell, and depending on them the cell will be either alive or dead. This produces beautiful stable patterns such as the glider: cells come alive at the front and die at the back, and the glider flies across the board.

The Game of Life is very simple: a board and a couple of rules. We give this problem at interviews, because if you cannot do it, we have nothing to talk about. Many candidates immediately say: "Cell is a noun. It needs a class." What properties does this class have? Position, alive or not, the state on the next turn, is that all? There are also the neighbors. Then they start describing the board. A board is a collection of cells, so it is a grid, and it has an "advance" method that computes the cells inside it.

    import itertools

    class Cell(object):
        def __init__(self, x, y, alive=True):
            self.x = x
            self.y = y
            self.alive = alive
            self.next = None

        def neighbors(self):
            # Yield the coordinates of the eight surrounding cells.
            for i, j in itertools.product(range(-1, 2), repeat=2):
                if (i, j) != (0, 0):
                    yield (self.x + i, self.y + j)

    class Board(object):
        def __init__(self):
            self.cells = {}  # { (x, y): Cell() }

        def advance(self):
            for (x, y), cell in self.cells.items():
                # Count the neighbors that are on the board and alive.
                alive_neighbors = sum(
                    1 for point in cell.neighbors()
                    if point in self.cells and self.cells[point].alive)
                cell.next = (alive_neighbors == 3 or
                             (alive_neighbors == 2 and cell.alive))


At this point I have to say "stop": we have a class Board with 2 methods, __init__ and "advance". It has one attribute, a dictionary, which means we should simply work with a dictionary. Note that there is no need to store a point's neighbors; they are already in the dictionary. Alive or not is just a boolean, so we will store only the coordinates of live cells. And since the dictionary would then hold nothing but True, what we need is not a dictionary but simply a set of coordinates. Finally, the "next" state is not needed either; you can simply build a new set of live cells.

    import itertools

    def neighbors(point):
        x, y = point
        for i, j in itertools.product(range(-1, 2), repeat=2):
            if any((i, j)):  # skip (0, 0), the point itself
                yield (x + i, y + j)

    def advance(board):
        newstate = set()
        # Only live cells and their neighbors can be alive next turn.
        recalc = board | set(itertools.chain(*map(neighbors, board)))
        for point in recalc:
            count = sum((neigh in board) for neigh in neighbors(point))
            if count == 3 or (count == 2 and point in board):
                newstate.add(point)
        return newstate

    glider = set([(0, 0), (1, 0), (2, 0), (0, 1), (1, 2)])
    for i in range(1000):
        glider = advance(glider)
    print(glider)


The result is a very simple, concise implementation of the game. Two classes are not needed here. At the bottom are the coordinates of the glider; they are placed on the board, and the glider flies. That is all. This is a complete implementation of the Game of Life.
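To see the glider actually fly, the sketch below restates the two functions in Python 3 so that it runs on its own, then checks where the pattern ends up after four generations. The variable names `board` and `shifted` are illustrative; the glider coordinates are the ones from the code above.

```python
import itertools


def neighbors(point):
    """Yield the eight cells surrounding `point`."""
    x, y = point
    for dx, dy in itertools.product((-1, 0, 1), repeat=2):
        if (dx, dy) != (0, 0):
            yield (x + dx, y + dy)


def advance(board):
    """Return the next generation for a set of live cell coordinates."""
    candidates = board | {n for cell in board for n in neighbors(cell)}
    new_board = set()
    for point in candidates:
        count = sum(n in board for n in neighbors(point))
        if count == 3 or (count == 2 and point in board):
            new_board.add(point)
    return new_board


glider = {(0, 0), (1, 0), (2, 0), (0, 1), (1, 2)}
board = glider
for _ in range(4):
    board = advance(board)

# After four generations the same shape reappears one cell over.
shifted = {(x - 1, y - 1) for x, y in glider}
print(board == shifted)
```

With these starting coordinates the shape repeats every four generations, moved by one cell along each axis, and the whole state is still a single set.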

Summary



1. If you see a class with two methods, one of which is __init__, it is not a class.
2. Do not create new exceptions if they are not needed (and they are not needed).
3. Simplify harder.

From the translator: in the comments I see that many took the talk as a complete rejection of OOP. That is a mistake. Point 1 of the summary clearly says what is not a class. Classes are needed, but the point of the talk is that they should not be abused.

Source: https://habr.com/ru/post/140581/

