Skip to content Skip to sidebar Skip to footer

Numpy Vectorisation Of Python Object Array

Just a short question that I can't find the answer to before i head off for the day, When i do something like this: v1 = float_list_python = ... # v2 =

Solution 1:

numpy is fast because it performs numeric operations like this in fast compiled C code. In contrast the list operation operates at the interpreted Python level (streamlined as much as possible with Python bytecodes etc).

A numpy array of numeric type stores those numbers in a data buffer. At least in the simple cases this is just a block of bytes that C code can step through efficiently. The array also has shape and strides information that allows multidimensional access.

When you multiply the array by a scalar, it, in effect, calls a C function titled something like 'multiply_array_by_scalar', which does the multiplication in fast compiled code. So this kind of numpy operation is fast (compared to Python list code) regardless of the number of cores or other multi-processing/threading enhancements.

Arrays of objects do not have any special speed advantage (compared to lists), at least not at this time.

Look at my answer to a question about creating an array of arrays, https://stackoverflow.com/a/28284526/901925 I had to use iteration to initialize the values.

Have you done any time experiments? For example, construct an array, say (1000,2). Use tolist() to create an equivalent list of lists. And make a similar array of objects, with each object being a (2,) array or list (how much work did that take?). Now do something simple like len(x) for each of those sub lists.

Solution 2:

@hpaulj provided a good answer to your question. In general, from reading your question it occurred to me that you do not actually understand what "vectorization" does under the hood. This writeup is a pretty decent explanation of vectorization and how it enables faster computations - http://quantess.net/2013/09/30/vectorization-magic-for-your-computations/

With regards to point 1 - Distributing computations across multiple cores, this is not always the case with Numpy. However, there are libraries like numexpr that enable multithreaded, highly efficient Numpy array computations with support for several basic logical and arithmetic operators. Numexpr can be used to turbo charge critical computations when used in conjunction with Numpy as it avoids replicating large arrays in memory for vectorization routines (as is the case for Numpy) and can use all cores on your system to perform computations.

Post a Comment for "Numpy Vectorisation Of Python Object Array"