
Numpy: Selecting N Points Every M Points

If I have a numpy.ndarray that's, say, 300 points in size (1 x 300 for now), and I wanted to select 10 points every 30 points, how would I do that? In other words: I want the first

Solution 1:

To select 10 elements off each block of 30 elements, we can simply reshape into 2D and slice out the first 10 columns from each row -

a.reshape(-1,30)[:,:10]

The benefit is that the output is a view into the input, so it is virtually free and carries no extra memory overhead. Here is a sample run to verify that -

In [43]: np.random.seed(0)

In [44]: a = np.random.randint(0,9,(1,300))
    
In [48]: np.shares_memory(a,a.reshape(-1,30)[:,:10])
Out[48]: True

If you need a flattened version, use .ravel() -

a.reshape(-1,30)[:,:10].ravel()
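As a quick self-contained check of the reshape-and-slice idea, here is a minimal sketch. It assumes an input of np.arange(300) (chosen here so that each value equals its own index; it is not the random data from the sample run) and confirms that the result holds the first 10 values of every block of 30.

```python
import numpy as np

# Illustrative input (an assumption): 0..299, so each value equals its index.
a = np.arange(300)

# 10 rows (blocks) of 30 elements each; keep the first 10 columns of each row.
blocks = a.reshape(-1, 30)[:, :10]

# Flatten to 1D when a flat result is needed.
flat = blocks.ravel()

print(blocks.shape)          # (10, 10)
print(flat[:12].tolist())    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30, 31]
```

The same expression also works on the 1 x 300 array from the question, since reshape(-1,30) only depends on the total number of elements.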

Timings -

In [38]: a = np.random.randint(0,9,(300))

# @sacul's soln
In [39]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk,len(a)//len(msk))]
100000 loops, best of 3: 7.6 µs per loop

# From this post
In [40]: %timeit a.reshape(-1,30)[:,:10].ravel()
1000000 loops, best of 3: 1.07 µs per loop

In [41]: a = np.random.randint(0,9,(3000000))

# @sacul's soln
In [42]: %%timeit
    ...: msk = [True] * 10 + [False] * 20
    ...: out = a[np.tile(msk,len(a)//len(msk))]
100 loops, best of 3: 3.66 ms per loop

# From this post
In [43]: %timeit a.reshape(-1,30)[:,:10].ravel()
100 loops, best of 3: 2.32 ms per loop

# If you are okay with `2D` output, it is virtually free
In [44]: %timeit a.reshape(-1,30)[:,:10]
1000000 loops, best of 3: 519 ns per loop
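To rerun a comparison like this outside IPython, something along these lines could work. It is only a sketch using the standard-library timeit module; the function names mask_select and reshape_select are made up for illustration, and the numbers it prints will differ from the ones above depending on machine and NumPy version.

```python
import timeit
import numpy as np

a = np.random.randint(0, 9, 3_000_000)  # length is a multiple of 30

def mask_select(arr):
    # Boolean-mask approach: keep 10, drop 20, tiled over the array.
    msk = [True] * 10 + [False] * 20
    return arr[np.tile(msk, len(arr) // len(msk))]

def reshape_select(arr):
    # Reshape approach: first 10 columns of each 30-element row.
    return arr.reshape(-1, 30)[:, :10].ravel()

# Both approaches should return identical data.
same = np.array_equal(mask_select(a), reshape_select(a))
print("outputs identical:", same)

for fn in (mask_select, reshape_select):
    # Best of 3 repeats, 10 calls each, reported as time per call.
    t = min(timeit.repeat(lambda: fn(a), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {t * 1e3:.2f} ms per call")
```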

Generic case with 1D array

A. No. of elements being multiple of block length

For a 1D array a whose number of elements is a multiple of n, to select m elements from each block of n elements and get a 1D array output, we would have:

a.reshape(-1,n)[:,:m].ravel()

Note that the ravel() flattening step makes a copy there, because the sliced 2D result is not contiguous in memory. So, if possible, keep the unflattened 2D version for memory efficiency.

Sample run -

In [59]: m,n = 2,5

In [60]: N = 25

In [61]: a = np.random.randint(0,9,(N))

In [62]: a
Out[62]: 
array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1, 5, 8, 4,
       3, 0, 3])

# Select 2 elements off each block of 5 elements
In [63]: a.reshape(-1,n)[:,:m].ravel()
Out[63]: array([5, 0, 3, 5, 6, 8, 7, 7, 8, 4])
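The note about ravel() copying can be checked directly. The sketch below uses np.arange(25) as an illustrative input (an assumption, not the random array from the sample run) and compares the 2D slice with its flattened version.

```python
import numpy as np

a = np.arange(25)   # illustrative input
m, n = 2, 5

view2d = a.reshape(-1, n)[:, :m]   # reshape + basic slicing: a view
flat = view2d.ravel()              # non-contiguous input, so ravel copies

print(np.shares_memory(a, view2d))  # True
print(np.shares_memory(a, flat))    # False

# Writing through the view changes the original array...
view2d[0, 0] = -1
print(a[0])      # -1
# ...while the flattened copy keeps the old value.
print(flat[0])   # 0
```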

B. Generic no. of elements

We would leverage np.lib.stride_tricks.as_strided, inspired by this post, to select m elements from each block of n elements -

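import numpy as np

def skipped_view(a, m, n):
    # View a as rows of n elements (the last row may run past the end of a),
    # then keep the first m columns of each row.
    s = a.strides[0]
    strided = np.lib.stride_tricks.as_strided
    shp = ((a.size+n-1)//n,n)
    return strided(a,shape=shp,strides=(n*s,s), writeable=False)[:,:m]

def slice_m_everyn(a, m, n):
    a_slice2D = skipped_view(a,m,n)
    # Number of valid elements available in the trailing partial block
    extra = min(m,len(a)-n*(len(a)//n))
    # Total number of valid output elements
    L = m*(len(a)//n) + extra
    # Flatten (copy) and drop anything read from beyond the input
    return a_slice2D.ravel()[:L]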

Note that skipped_view returns a view into the input array, and possibly into a memory region not assigned to the input array. After that, flattening and slicing restrict the result to the desired output, and that result is a copy.

Sample run -

In [170]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(16))

In [171]: a
Out[171]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])

# Select 2 elements off each block of 5 elements
In [172]: slice_m_everyn(a, m=2, n=5)
Out[172]: array([5, 0, 3, 5, 6, 8, 7])

In [173]: np.random.seed(0)
     ...: a = np.random.randint(0,9,(19))

In [174]: a
Out[174]: array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7, 7, 8, 1])

# Select 2 elements off each block of 5 elements
In [175]: slice_m_everyn(a, m=2, n=5)
Out[175]: array([5, 0, 3, 5, 6, 8, 7, 7])
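If reading outside the input's memory through as_strided is a concern, a different technique gives the same output: reshape only the complete blocks and handle the trailing partial block separately. This is a sketch of that alternative, not the approach used above, and the name slice_m_everyn_safe is made up for illustration.

```python
import numpy as np

def slice_m_everyn_safe(a, m, n):
    """Take the first m elements of every block of n, for any 1D length."""
    a = np.asarray(a)
    k = a.size // n                                  # number of complete blocks
    head = a[:k * n].reshape(-1, n)[:, :m].ravel()   # complete blocks only
    tail = a[k * n:][:m]                             # up to m elements of the partial block
    return np.concatenate([head, tail])

# Same 16-element input as in the sample run above.
a = np.array([5, 0, 3, 3, 7, 3, 5, 2, 4, 7, 6, 8, 8, 1, 6, 7])
print(slice_m_everyn_safe(a, m=2, n=5).tolist())   # [5, 0, 3, 5, 6, 8, 7]
```

The trade-off is one extra concatenate call, in exchange for never touching memory beyond the array.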

Solution 2:

You could create a mask and index by the mask, repeated until it reaches the length of your array:

msk = [True] * 10 + [False] * 20

arr[np.tile(msk, len(arr)//len(msk))]

Minimal example:

In an array of 30 values, select 1 element, then skip 2 elements:

>>> arr
array([6, 7, 2, 7, 1, 9, 1, 4, 4, 8, 6, 5, 2, 6, 3, 6, 8, 5, 6, 7, 2, 1, 9,
       6, 7, 2, 1, 8, 2, 2])

>>> msk = [True] * 1 + [False] * 2
>>> arr[np.tile(msk, len(arr)//len(msk))]
array([6, 7, 1, 8, 2, 6, 6, 1, 7, 8])

Explanation:

msk is a boolean mask

>>> msk
[True, False, False]

You can then repeat that mask with np.tile until it is the same length as your original array. The number of repetitions is the length of your array divided by the length of your mask, so this assumes the array length is an exact multiple of the mask length:

>>> np.tile(msk, len(arr)//len(msk))
array([ True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False,  True, False, False,  True, False, False,
        True, False, False], dtype=bool)

Then it's a simple matter of indexing by a boolean mask, which numpy excels at.
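When the array length is not an exact multiple of the mask length, the tiled mask from len(arr)//len(msk) repetitions comes out shorter than the array. One way around that, sketched here under the assumption that a trailing partial block should still contribute its leading elements, is to tile one extra time and truncate. The helper name select_with_mask is hypothetical.

```python
import numpy as np

def select_with_mask(arr, keep, skip):
    """Keep `keep` elements, then skip `skip`, repeating over `arr`."""
    arr = np.asarray(arr)
    msk = [True] * keep + [False] * skip
    reps = -(-arr.size // len(msk))              # ceiling division
    full_mask = np.tile(msk, reps)[:arr.size]    # trim to the array's length
    return arr[full_mask]

arr = np.arange(10)   # 10 is not a multiple of 3
print(select_with_mask(arr, keep=1, skip=2).tolist())   # [0, 3, 6, 9]
```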

Solution 3:

If I understand correctly:

get = 10
skip = 20
k = [item for z in [np.arange(get) + idx for idx in np.arange(0, x.size, skip+get)] for item in z]

Then just index with that list

x[k]

Example:

x = np.arange(100)
x[k]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 30, 31, 32, 33, 34, 35, 36,
       37, 38, 39, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 90, 91, 92, 93,
       94, 95, 96, 97, 98, 99])
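The same index list can also be built without a Python-level loop by broadcasting the block start offsets against np.arange(get). This is a sketch of that variation rather than the code above; block_indices is a made-up name, and the final filter is an added assumption so that a short last block does not produce out-of-range indices.

```python
import numpy as np

def block_indices(size, get, skip):
    """Indices of the first `get` positions in every block of get + skip."""
    starts = np.arange(0, size, get + skip)            # block start offsets
    idx = (starts[:, None] + np.arange(get)).ravel()   # 2D grid of indices, flattened
    return idx[idx < size]                             # drop indices past the end

x = np.arange(100)
k = block_indices(x.size, get=10, skip=20)
print(x[k][-10:].tolist())   # [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
```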
