Hi, Habrozhiteli!
Continuing the topic, in this post we sum up the competition for backend developers and go over common mistakes and good answers to the questions.
The competition consisted of eight assignments that test knowledge of Python and related technologies.
Question 1. What are the ways to organize parallel computing, and what are the advantages and disadvantages of each?
We received whole treatises on ways to split up computations, which is not surprising: the problem is very relevant today. To answer this question, it was enough to mention the two main approaches:
- Thread-level parallelism
Everyone who took the test knows well how computations run in parallel threads. But few mentioned the GIL (Global Interpreter Lock): a lock that lets only one thread execute Python bytecode at a time, so in CPython CPU-bound threads do not actually run in parallel, and every thread except the one holding the lock has to wait. It would also have been good to mention coroutine frameworks. These switch between tasks cooperatively inside a single thread, so several threads never compete for the GIL. However, they are fairly hard to use and do not by themselves solve the problem of blocking IO calls. Speaking of IO, there are libraries for asynchronous IO, for example twisted.internet.fdesc, which is part of Twisted, but working with these frameworks is quite labor-intensive.
- Process-level parallelism
This approach makes better use of multiple processor cores and is not affected by the GIL; however, synchronizing processes still presents some difficulties.
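Here is a minimal Python 3 sketch of both approaches, using the standard concurrent.futures module. The task functions, sizes and timings are illustrative assumptions, not taken from the competition answers. Threads suit IO-bound work, because a thread blocked on IO releases the GIL. CPU-bound pure-Python work is better spread across processes, since each process has its own interpreter and its own GIL.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_bound_task(n):
    # time.sleep() stands in for waiting on a socket or disk; like real
    # blocking IO calls, it releases the GIL while it waits.
    time.sleep(0.1)
    return n


def cpu_bound_task(n):
    # Pure-Python arithmetic holds the GIL, so threads would only take turns.
    return sum(i * i for i in range(n))


def run_io_in_threads(items):
    # Threads: cheap to start and share memory; fine for IO-bound work.
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        return list(pool.map(io_bound_task, items))


def run_cpu_in_processes(items):
    # Processes: each worker has its own interpreter and GIL, but arguments
    # and results have to be pickled and sent between processes.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_bound_task, items))


if __name__ == "__main__":
    start = time.perf_counter()
    print(run_io_in_threads([1, 2, 3, 4, 5]))      # [1, 2, 3, 4, 5]
    print(f"{time.perf_counter() - start:.2f} s")  # roughly 0.1 s rather than 0.5 s
    print(run_cpu_in_processes([1_000_000] * 4))
```

Keep the process pool under the `if __name__ == "__main__":` guard. On platforms that start workers by re-importing the main module, creating the pool at import time would spawn processes recursively.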
Question 2. Estimate the complexity of the algorithm and suggest ways to improve it.
```python
def uniq(iterable):
    result = []
    for i in iterable:
        if i not in result:
            result.append(i)
    return result
```
In fact, most respondents got the complexity wrong. The key line is:
```python
if i not in result:
```
Since result is a list, this check is O(k), where k is the current length of result, and k grows by up to one element on each iteration over iterable, from 0 up to n (the number of input elements). In the worst case, when all elements are distinct, the checks cost 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons. This is the sum of an arithmetic progression, so the overall complexity is O(n^2).
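To see the quadratic growth concretely, here is an illustrative Python 3 copy of uniq() instrumented to count the comparisons that the list membership check performs. The name uniq_comparisons is hypothetical, and the inner loop spells out by hand what `i not in result` does.

```python
def uniq_comparisons(iterable):
    """Same algorithm as uniq(), but also counts element comparisons."""
    result = []
    comparisons = 0
    for i in iterable:
        found = False
        for existing in result:  # what `i not in result` does internally
            comparisons += 1
            if existing == i:
                found = True
                break
        if not found:
            result.append(i)
    return result, comparisons


for n in (10, 100, 1000):
    _, count = uniq_comparisons(range(n))  # all distinct: the worst case
    print(n, count, n * (n - 1) // 2)      # the last two columns match
```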
To improve it, change the data structure. A set supports both insertion and membership tests in O(1) time on average (for more detail, read about the time complexity of Python's built-in types).
You can rewrite it like this:
```python
def uniq(iterable):
    result = set()
    for i in iterable:
        if i not in result:  # average O(1) lookup in a set
            result.add(i)    # sets have add(), not append()
    return list(result)
```
and the complexity becomes linear.
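Or, more concisely:
```python
def uniq(iterable):
    return list(set(iterable))
```
Note that both set-based versions require the elements to be hashable and, unlike the original, do not preserve the order in which elements first appear.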
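If the original order matters, a common pattern keeps a separate set of already-seen elements alongside the result list. It is still linear on average. This sketch is for Python 3, and the function names are illustrative. The dict.fromkeys variant is a shorter equivalent that relies on dicts preserving insertion order, which the language guarantees since Python 3.7.

```python
def uniq_ordered(iterable):
    # Keeps the first occurrence of each element, in input order.
    # Elements must be hashable, as with the set-based versions.
    seen = set()
    result = []
    for item in iterable:
        if item not in seen:  # average O(1) membership test
            seen.add(item)
            result.append(item)
    return result


def uniq_ordered_short(iterable):
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(iterable))


print(uniq_ordered([3, 1, 3, 2, 1]))        # [3, 1, 2]
print(uniq_ordered_short([3, 1, 3, 2, 1]))  # [3, 1, 2]
```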
Question 3. What is the difference and why?
```python
r = range(3)
a = r * 2
print a
```
In fact, no participant had any difficulty with this question, but for the curious reader we will explain.
Multiplying a list by an integer repeats its contents using a shallow copy: the new list holds references to the same element objects rather than copies of them.
That is, the expression a = r * 2 can equally be written as:
```python
a = r + r
```
This is what produces the effect described in the question.
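A short Python 3 illustration of what "shallow" means here. Two adjustments are needed for Python 3. First, range(3) is no longer a list there, and range(3) * 2 raises TypeError, so the sketch converts it with list() first. Second, the nested-list example is added to make the shared references visible; it is not part of the original question.

```python
r = list(range(3))
a = r * 2
print(a)  # [0, 1, 2, 0, 1, 2]

# With mutable elements the shallow copy becomes visible:
# all three entries are references to one and the same inner list.
grid = [[]] * 3
grid[0].append(1)
print(grid)  # [[1], [1], [1]]

# Building each inner list separately avoids the shared reference.
safe = [[] for _ in range(3)]
safe[0].append(1)
print(safe)  # [[1], [], []]
```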
Question 4. A developer came up with his own Point class for storing coordinates and performing primitive operations on them.
It looks like this:
```python
class Point():
    def __init__(self):
        self.a = None
        self.b = None
        self.c = None
    ...
```
Then he discovered that a program working with a large number of instances of this class uses a lot of memory: one million such objects take up at least 360 MB. What takes up so much memory, and how can the situation be improved?
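This question caused participants the most trouble. By default, every instance stores its attributes in its own __dict__ dictionary, and these per-instance dictionaries account for most of the memory. The most correct fix is to declare the __slots__ attribute, listing the names of all the fields: instances then get no __dict__, which greatly reduces the size of each object. In Python 2, __slots__ only works for new-style classes, which is why the class below inherits from object.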
```python
class Point(object):
    __slots__ = ('a', 'b', 'c')

    def __init__(self):
        self.a = None
        self.b = None
        self.c = None
```
There is also an alternative solution: replace the Point class with a tuple.
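To compare the options yourself, here is a rough Python 3 sketch using the standard tracemalloc module. The class names, the constructor arguments and the object count are illustrative assumptions. Absolute numbers differ between Python versions and platforms, so compare the rows with each other rather than with the 360 MB figure above.

```python
import tracemalloc


class PlainPoint:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


class SlottedPoint:
    __slots__ = ('a', 'b', 'c')  # fixed fields, no per-instance __dict__

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


def make_tuple(a, b, c):
    return (a, b, c)


def allocated_bytes(factory, count=100_000):
    """Approximate bytes still allocated after building `count` objects."""
    tracemalloc.start()
    objects = [factory(i, i, i) for i in range(count)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objects
    return current


for name, factory in [("plain class", PlainPoint),
                      ("__slots__ class", SlottedPoint),
                      ("tuple", make_tuple)]:
    print(f"{name:>15}: {allocated_bytes(factory) / 1e6:.1f} MB per 100,000 objects")
```

If named access matters, collections.namedtuple gives tuple-sized objects with attribute names.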
Question 5. What happens? Who is to blame? What is to be done?
```python
def add1Cent(sum):
    return sum + 0.01

s = 0
for i in range(5):
    s = add1Cent(s)
# now I have 5 cents!
print s == 0.05
```
This question is very simple, and no one got it wrong.
Indeed, float values are stored in the binary IEEE 754 format, in which 0.01 cannot be represented exactly, so every addition carries a small error. To work with currencies, it is better to use the Decimal type; if that is not possible, write your own float comparison function that compares values with a tolerance instead of exactly.
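A minimal Python 3 sketch of both suggestions follows. The first builds Decimal values from strings; building them from floats would copy the float's binary error. The second is a tolerance-based comparison using math.isclose from the standard library (Python 3.5+) in place of a hand-written function. The name add1CentDecimal is illustrative.

```python
import math
from decimal import Decimal


def add1Cent(amount):
    return amount + 0.01


def add1CentDecimal(amount):
    # Decimal('0.01') is exact in base 10, so no rounding error accumulates.
    return amount + Decimal('0.01')


s = 0
d = Decimal('0')
for _ in range(5):
    s = add1Cent(s)
    d = add1CentDecimal(d)

print(repr(s))                              # a binary float close to 0.05
print(d == Decimal('0.05'))                 # True
print(math.isclose(s, 0.05, rel_tol=1e-9))  # True: compared with a tolerance
print(Decimal(0.01) == Decimal('0.01'))     # False: float input keeps its error
```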
Question 6. Rewrite the following code using generators
```python
file = open('somefile.csv')
total = 0
for line in file:
    csv_list = line.split(',')
    if csv_list[5]:
        total += int(csv_list[5])
print ": ", total
```
What are the advantages of the resulting code? What are the cons?
This task caused no difficulties. The code can be rewritten in hundreds of ways; we expected something like:
```python
total = sum(int(line.split(',')[5]) for line in open('somefile.csv') if line.split(',')[5])
```
Pros: less code.
Cons: hard to read, and a 30% drop in performance (among other things, each line is now split twice).
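One way to keep the generator approach while staying readable is a small generator function that splits each line once, plus sum() over a generator expression. This sketch assumes Python 3 and the same simple comma splitting as the original, which means no quoted fields; those would need the csv module. The names column_values and column_total are illustrative.

```python
def column_values(lines, index=5):
    """Lazily yield the raw text of one comma-separated column per line."""
    for line in lines:
        # Strip the newline so an empty last column becomes '' (falsy)
        # instead of '\n', which int() would reject.
        fields = line.rstrip('\n').split(',')
        yield fields[index]


def column_total(lines, index=5):
    # Each line is split only once; empty cells are skipped.
    return sum(int(value) for value in column_values(lines, index) if value)


sample = ["a,b,c,d,e,1\n", "a,b,c,d,e,\n", "a,b,c,d,e,41\n"]
print(column_total(sample))  # 42
```

With a real file, pass the file object itself, for example with open('somefile.csv') as f: total = column_total(f). The file is then read line by line and closed afterwards. As in the original, a line with fewer than six fields still raises IndexError.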
Question 7. What is the difference between '' and u''? Where and why is this important?
Most of those who took the test stumbled on this question. This is surprising, since the topic is very important.
''
A byte string (type str in Python 2). By default, Python 2 assumes source files are in ASCII, which is why compilation fails if a string literal contains a non-ASCII character. This can be changed by declaring the encoding in the first or second line of the file, for example # coding: utf-8. The file is then read as UTF-8, and byte-string literals contain UTF-8-encoded bytes.
u''
A Unicode string (type unicode in Python 2). Unicode covers the characters of practically all of the world's writing systems, so the benefit of using unicode strings for text data is obvious, for example in multilingual applications.
Some Python operations implicitly convert unicode strings to byte strings using a default encoding, for example print and str(); with non-ASCII text this can raise UnicodeEncodeError.
Working with the csv and xml libraries is impossible without a solid understanding of the differences between these data types.
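In Python 3 the same distinction survives under different names. str is the Unicode type, like Python 2's u'', and bytes is the byte-string type, like Python 2's ''. Below is a short Python 3 sketch of converting between them explicitly; the sample text is an arbitrary choice.

```python
text = 'caf\u00e9'           # 'café': str holds Unicode code points
data = text.encode('utf-8')  # bytes: the UTF-8 encoding of that text

print(type(text).__name__, len(text))  # str 4
print(type(data).__name__, len(data))  # bytes 5 (the accented letter takes two bytes)
print(data.decode('utf-8') == text)    # True

try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    # ASCII has no code for the accented letter, so an explicit encode fails.
    print('cannot encode:', exc.reason)  # ordinal not in range(128)
```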
Question 8. How can the code be rewritten so that the constructor of class A is called in the constructor of class B?
```python
class A:
    def __init__(self):
        print "init in A"

class B(A):
    def __init__(self):
        print "init in B"

b = B()
```
The vast majority handled this question well. We will just note that the best answers contained two examples: one for new-style classes and one for old-style classes.
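Here is a sketch of those two variants, written so that it runs under both Python 2 and Python 3; the OldStyle class names are illustrative. A new-style class delegates to its parent through super(). An old-style class in Python 2 does not support super(), so it calls the parent's __init__ explicitly with self. In Python 3 every class is new-style, and both forms work.

```python
class A(object):
    def __init__(self):
        print("init in A")


class B(A):
    def __init__(self):
        # New-style classes: let super() find the next class in the MRO.
        super(B, self).__init__()
        print("init in B")


class OldStyleA:
    def __init__(self):
        print("init in OldStyleA")


class OldStyleB(OldStyleA):
    def __init__(self):
        # Old-style classes (Python 2): call the parent's __init__ directly.
        OldStyleA.__init__(self)
        print("init in OldStyleB")


b = B()          # prints "init in A", then "init in B"
c = OldStyleB()  # prints "init in OldStyleA", then "init in OldStyleB"
```

super() is generally preferred for new-style classes because it follows the method resolution order, which matters once multiple inheritance is involved.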
The best of the backenders
Today Maxim Avanov came to visit us: we gave him a tour of the office and introduced him to the team. Maxim spent 14 hours on a train to get to us from Cheboksary, where, as he told us, he runs a media site about computer games that he wants to bring to foreign markets.
Maxim on Island and startups:
It's great that there are people who see the point in taking their work seriously.
The fashion for startups and the chase for investor money are somewhat unhealthy.
You should do what you love and what you believe in.
Thanks to everyone who participated!
Team Island
PS: The next post will be devoted to analysts.
PPS: By the way, we have openings for Python developers!