📜 ⬆️ ⬇️

Unobvious optimization in speed when solving a specific problem in Python

Let's start


There is a SQL database. The task is described by three phrases:

Task more detailed

  1. The script should be executed very often.
  2. Data uploading consists in reading the result of the simplest SELECT * FROM table query from the database. In a table / view, rows are usually more than 100,000, columns ~ 100.
  3. Validation is a check of a set of conditions of the rowObject.Column1 == Value (<,>,! =) Condition and more complex checks. The point is that the check requires a column by name.
  4. Generation of a report on the result of checks.

Pay attention to paragraph 1.

The rest is not so interesting.
(I use the sqlite database as an example )

To use any ORM for such a task is at least strange. We do it in the forehead (to simplify, we unload the entire result into memory)
import sqlite3 conn = sqlite3.connect(filePath) result = tuple(row for row in conn.cursor().execute("SELECT * FROM test")) 

After the execution of the result contains a tuple of tuples. We need an object with attributes.
Complicate:
 ColsCount = 100 class RowWrapper(object): def __init__(self, values): self.Id = values[0] for x in xrange(ColsCount): setattr(self, "Col{0}".format(x), values[x + 1]) result = tuple(RowWrapper(row) for row in conn.cursor().execute(self.query)) 

We are ready to go to point 2. Or not? And let's measure the speed of both examples (the full test code is here ).
100,000 lines, 101 columns
I got in seconds:
Sample 1: 4.64823588605
Sample 2: 17.1091031498
On creation of instans of a class> 10 is spent
With ++ the programmer inside me wanted to do something about it.

The solution was found such

We use namedtuple from the collections module. I will not describe in detail here the principle of its work. I will give only a small example demonstrating the functionality we need.
 import collections columns = ('name', 'age', 'story') values = ('john', '99', '...blahblah...') SuperMan = collections.namedtuple('SuperMan', columns) firstSuperMan = SuperMan._make(values) print(firstSuperMan.name) print(firstSuperMan.age) print(firstSuperMan.story) 

And now an example in the context of the task:
 import collections columns = tuple(itertools.chain(('Id',), tuple("Col{0}".format(x) for x in xrange(ColsCount)))) TupleClass = collections.namedtuple("TupleClass", Columns) result = tuple(TupleClass._make(row) for row in conn.cursor().execute(self.query)) 

Measure the speed:
Sample 1: 4.30456730876
Sample 2: 15.3314512807
Sample 3: 4.67008026138
')
It is quite another matter. Full sample code with the creation of the base and measurements of speed look here

For examples in the article was used


Source: https://habr.com/ru/post/151163/


All Articles