Sort: key vs cmp

When sorting in Python 2, there are at least two ways to sort this “customize” sorting: these are the key and cmp parameters. The first was added only in Python 2.4, and the second was removed in Python 3.0. These facts seem to suggest that key indeed better. Who does not agree with this or not sure - please under the cat.

First, a small digression to the documentation, so that everything does not look too messy.

For sorting in Python, one usually uses either built-in `sorted`, or` list.sort`. Actually a challenge

 sorted(iterable, **kwargs)

equivalent to code

 lst = list(iterable); lst.sort(**kwargs); lst

so let's focus on `list.sort` later.

`list.sort` accepts three optional parameters:` key` is a function (actually any callable object) that accepts a list item and returns an object that will be used in the comparison during sorting instead of the original list item; `cmp` is a function that takes two elements of the list and returns -1, 0 or 1 depending on the relationship between the parameters passed; `reversed` - if` True`, then the list will be sorted in reverse order.
')
So, what to use if, for example, you need to sort the list of objects by the `id` field? Look at the `cmp` and` key` approaches:

 lst.sort(cmp=lambda x, y: cmp(x['id'], y['id'])) #

 lst.sort(key=lambda x: x['id']) # , ,

Not to be unfounded about the speed, a couple of tests:

 >>> lst = [{'id': x, 'name': 'test'} for x in xrange(1000)] >>> random.shuffle(lst) >>> def with_cmp(lst): ... lst.sort(cmp=lambda x, y: cmp(x['id'], y['id'])) ... >>> def with_key(lst): ... lst.sort(key=lambda x: x['id']) ... >>> timeit.timeit('with_cmp(lst[:])', setup='from __main__ import with_cmp, lst', number=1000) 2.7731389999389648 >>> timeit.timeit('with_key(lst[:])', setup='from __main__ import with_key, lst', number=1000) 0.55310797691345215

Why such a big difference? The fact is that if the `key` parameter is present, its value is applied to all list items only once at the beginning of sorting ( sorts ), while` cmp` values are used for each comparison! Given that the team-grade ( parsing algorithm ), which is used in Python, has an average score of nlog (n), we can assume that in our example lambda from `cmp` was called several times more often.

In fact, you can (and should) do it even faster - using not the slow Python function, but the native one written in C. The operator library helps a lot. Here's what the results from operator.itemgetter will look like (there are still docs.python.org/library/operator.html#operator.attrgetter , methodcalled and many other tasty things ):

 >>> from operator import itemgetter >>> def with_op(lst): ... lst.sort(key=itemgetter('id')) ... >>> timeit.timeit('with_op(lst[:])', setup='from __main__ import with_op, lst, itemgetter', number=1000) 0.4054520057716877

Total, 20 7x increase in speed in comparison with the first option - not bad, right?

Understood with speed, about clarity. I do not think that the use of `key` \` cmp` should be a matter of taste, for the latter is a great example of the abstraction that flows . Everything is fine as long as the function-value of the `cmp` parameter uses built-in` cmp`, which hides the details of the sorting mechanism, but what will you do when you are asked to predict the output of the following code:

 >>> def numeric_compare(x, y): return x - y >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)

Of course, you remember that `cmp` should return -1, 0 or 1, but what exactly? If `x` is greater than` y`, should it be 1 or -1? If you return 2, will there be an error or 2 will be interpreted as 1? Of course, it is possible to find answers to questions in half a minute, but why search? I think it is better to use a better abstraction, that is, the `key` parameter.

Warning questions, I agree that there are probably rare examples of tasks where the `key` is not enough. But these are exceptions, and there are also recipes for them (for example, one such is Sorting Mini-HOW TO: cmp_to_key ).

Thanks for attention.

PS Problem. Why does the following code behave so strangely:

 >>> ls = [(1,2,3), (2,1,3), (2,3,1)] >>> ls.sort(key=reversed) >>> ls [(1, 2, 3), (2, 3, 1), (2, 1, 3)] >>> ls.sort(key=reversed) >>> ls [(2, 1, 3), (1, 2, 3), (2, 3, 1)]

Answer:
`reversed` returns an object of type` <type 'reversed'> `for which rich-comparsion methods are undefined. Therefore, `list.sort` uses the` id` of objects that are constantly changing for comparison. Use `operator.itemgetter (slice (None, None, -1))` instead of `reversed`.

Source: https://habr.com/ru/post/138625/

All Articles

Sort: key vs cmp

More articles: