When one starts writing in Python, the typical reaction is disappointment at how slow it is compared to any compiled language. After a while, you learn numpy and find out it's actually not so bad.

Having spent a month with numpy, I found out that many things can be written in it.

Having spent a year with it, I found out that almost any algorithm may be vectorized, though it's sometimes non-trivial.

I'm still quite disappointed by most answers on Stack Overflow, where people fall back to plain Python loops for anything more complicated than computing the sum of an array.


For instance, suppose you need to compute the order statistics (the rank of each value) of an array.

The scipy.stats library has a function made specifically for this purpose:

from scipy.stats import rankdata
order_statistics = rankdata(initial_array)
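A small sketch of what rankdata returns, on a made-up array that contains a tie (the values are purely illustrative). Ranks start from 1, and with the default method='average' tied values share the mean of the ranks they occupy.

```python
import numpy as np
from scipy.stats import rankdata

# Illustrative input: the two 10s are tied
initial_array = np.array([30, 10, 20, 10])

# Default method='average': the tied 10s occupy ranks 1 and 2, so both get 1.5
average_ranks = rankdata(initial_array)

# method='ordinal': every value gets a distinct rank, ties broken by order of appearance
ordinal_ranks = rankdata(initial_array, method='ordinal')

print(average_ranks)  # ranks 4, 1.5, 3, 1.5
print(ordinal_ranks)  # ranks 4, 1, 3, 2
```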

Another option is to sort the array and keep track of the initial positions. Alternatively, you can do it in numpy with a one-liner:


order_statistics = numpy.argsort(numpy.argsort(initial_array))

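(Isn't this beautiful?) Note that these ranks start from 0, while rankdata's start from 1 and tied values share their average rank; pass kind='stable' to the inner argsort if ties should be ranked in order of appearance.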
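The sort-and-track approach and the one-liner can be compared side by side in a short sketch. The example array is made up and has no ties, so both give the same result. The explicit version sorts once and scatters each position back to where it came from; the one-liner works because the argsort of a permutation is its inverse permutation.

```python
import numpy as np

# Illustrative input without ties
initial_array = np.array([30, 10, 20, 40])

# Approach 1: sort once and keep track of the initial positions.
# order[k] is the index (in initial_array) of the k-th smallest element,
# so the element at position order[k] has rank k.
order = np.argsort(initial_array)
ranks_explicit = np.empty_like(order)
ranks_explicit[order] = np.arange(len(initial_array))

# Approach 2: the one-liner; argsort of the permutation `order` inverts it
ranks_oneliner = np.argsort(np.argsort(initial_array))

print(ranks_explicit)  # ranks 2, 0, 1, 3
print(ranks_oneliner)  # ranks 2, 0, 1, 3
```

Both produce 0-based ranks, so adding 1 matches rankdata on arrays without ties. The explicit version replaces the second sort with a linear-time scatter, which may matter for large arrays.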
Want to compute the mean value within each group of events? With a one-liner? Here you go:

means = numpy.bincount(group_indices, weights=values) / numpy.bincount(group_indices)
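A minimal sketch of the one-liner on hypothetical data, assuming group_indices holds non-negative integer group labels (numpy.bincount requires this). The weighted bincount gives the per-group sum of values and the plain bincount gives the per-group count, so their ratio is the per-group mean.

```python
import numpy as np

# Hypothetical data: six events, each assigned to one of three groups
group_indices = np.array([0, 1, 0, 2, 1, 0])
values = np.array([1.0, 4.0, 3.0, 10.0, 6.0, 5.0])

sums = np.bincount(group_indices, weights=values)  # sum of values in each group
counts = np.bincount(group_indices)                # number of events in each group
means = sums / counts

print(means)  # group means 3.0, 5.0, 10.0
```

A group label that never occurs gets a count of 0, so its mean comes out as nan and numpy emits a RuntimeWarning. Passing the same minlength to both bincount calls fixes the output length when the number of groups is known in advance.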

Writing an oblivious decision tree in numpy is very simple, and the computations run really fast.
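A minimal sketch of applying an oblivious tree. It assumes a hypothetical representation: level d splits on feature feature_ids[d] with threshold thresholds[d] (the same split for every node on that level), and leaf_values holds the 2**depth leaf predictions. Because every node on a level asks the same question, the path to a leaf is simply a binary number whose d-th bit is the outcome at level d, so no loop over samples or levels is needed.

```python
import numpy as np

def apply_oblivious_tree(X, feature_ids, thresholds, leaf_values):
    """Hypothetical helper: predict with an oblivious tree of depth len(feature_ids)."""
    # decisions[i, d] is True if sample i goes right at level d
    decisions = X[:, feature_ids] > thresholds
    # Treat the decisions as bits (level d has weight 2**d) to get a leaf index
    depth = len(feature_ids)
    leaf_indices = decisions.astype(np.int64) @ (2 ** np.arange(depth))
    return leaf_values[leaf_indices]

# Illustrative depth-2 tree: level 0 splits on feature 0, level 1 on feature 1
X = np.array([[0.5, 2.0],
              [1.5, 0.0],
              [1.5, 3.0]])
feature_ids = np.array([0, 1])
thresholds = np.array([1.0, 1.0])
leaf_values = np.array([10.0, 20.0, 30.0, 40.0])

predictions = apply_oblivious_tree(X, feature_ids, thresholds, leaf_values)
print(predictions)  # leaves 2, 1, 3 -> predictions 30.0, 20.0, 40.0
```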

As a non-trivial exercise: can you write the application of an ordinary (non-oblivious) decision tree in pure numpy? For simplicity, you may first consider only trees in which all leaves have the same depth.
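For readers who want to check their answer, here is one possible sketch for the equal-depth case. It assumes a hypothetical heap layout: internal node k has children 2k+1 (left) and 2k+2 (right), its split is (feature_ids[k], thresholds[k]), and the 2**depth leaves follow the internal nodes in order. The loop runs only over the tree depth; each step is vectorized over all samples.

```python
import numpy as np

def apply_complete_tree(X, feature_ids, thresholds, leaf_values):
    """Hypothetical helper for a tree whose leaves all have the same depth,
    stored in heap order (children of node k are 2k+1 and 2k+2)."""
    depth = len(leaf_values).bit_length() - 1  # len(leaf_values) == 2**depth
    rows = np.arange(len(X))
    nodes = np.zeros(len(X), dtype=np.int64)   # every sample starts at the root
    for _ in range(depth):
        # Each sample looks up the split of the node it currently sits in
        go_right = X[rows, feature_ids[nodes]] > thresholds[nodes]
        nodes = 2 * nodes + 1 + go_right
    # Leaves occupy heap positions 2**depth - 1 ... 2**(depth + 1) - 2
    return leaf_values[nodes - (2 ** depth - 1)]

# Illustrative depth-2 tree: root splits on feature 0, its children on feature 1
feature_ids = np.array([0, 1, 1])
thresholds = np.array([1.0, 5.0, 2.0])
leaf_values = np.array([10.0, 20.0, 30.0, 40.0])
X = np.array([[0.5, 6.0],
              [0.5, 1.0],
              [2.0, 1.0],
              [2.0, 3.0]])

predictions = apply_complete_tree(X, feature_ids, thresholds, leaf_values)
print(predictions)  # predictions 20.0, 10.0, 30.0, 40.0
```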

See also: