
How To Extract Arrays From An Arranged Numpy Array?

This is a follow-up to the post How to extract rows from an numpy array based on the content?, and I used the following code to split rows based on the content in the column.

Solution 1:

Here's an approach that treats the pair of elements in the first two columns of each row as an indexing tuple:

import numpy as np

# Convert each (col0, col1) pair to its linear-index equivalent
lidx = np.ravel_multi_index(arr[:,:2].T, arr[:,:2].max(0)+1)

# Sort lidx; the first index of each distinct value marks a group start, so split the sorted rows (axis=0) there
sidx = lidx.argsort(kind='stable')
out = np.split(arr[sidx], np.unique(lidx[sidx], return_index=True)[1][1:])

Sample run -

In [34]: arr
Out[34]: 
array([[2, 7, 5],
       [3, 4, 6],
       [2, 3, 5],
       [2, 7, 7],
       [4, 4, 7],
       [3, 4, 6],
       [2, 8, 5]])

In [35]: out
Out[35]: 
[array([[2, 3, 5]]), array([[2, 7, 5],
        [2, 7, 7]]), array([[2, 8, 5]]), array([[3, 4, 6],
        [3, 4, 6]]), array([[4, 4, 7]])]

For detailed information on treating a group of elements as an indexing tuple, please refer to this post.
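The conversion used above can be checked by hand on the sample array. With `dims = arr[:, :2].max(0) + 1` (here `[5, 9]`), `np.ravel_multi_index` maps each pair `(i, j)` to `i * dims[1] + j` in row-major order, so identical pairs get identical integers and sorting by that integer groups equal pairs together. A minimal sketch, assuming non-negative integer keys (with the default `mode='raise'`, negative values raise a `ValueError`); `keys`, `dims` and `manual` are illustrative names:

```python
import numpy as np

# Sample data from the run above
arr = np.array([[2, 7, 5],
                [3, 4, 6],
                [2, 3, 5],
                [2, 7, 7],
                [4, 4, 7],
                [3, 4, 6],
                [2, 8, 5]])

keys = arr[:, :2]            # the (col0, col1) pair of each row
dims = keys.max(axis=0) + 1  # grid shape large enough to hold every pair: [5, 9]

# Each pair (i, j) becomes i * dims[1] + j (row-major / C order)
lidx = np.ravel_multi_index(keys.T, tuple(dims))
print(lidx)  # [25 31 21 25 40 31 26]

# The same numbers computed by hand
manual = keys[:, 0] * dims[1] + keys[:, 1]
print(np.array_equal(lidx, manual))  # True
```

Equal pairs such as `(2, 7)` in rows 0 and 3 both map to 25, which is why sorting and splitting on `lidx` reproduces the grouping in the sample run.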

Solution 2:

The numpy_indexed package (disclaimer: I am its author) contains functionality to efficiently perform these types of operations:

import numpy_indexed as npi
npi.group_by(a[:, :2]).split(a)

It has decent test coverage, so I'd be surprised if it tripped on your seemingly straightforward test case.
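If adding a dependency isn't an option, a comparable grouping can be sketched in plain NumPy. This is not how numpy_indexed is implemented; it is a swapped-in technique based on `np.unique(..., axis=0, return_inverse=True)`, which numbers each distinct key row. Unlike the `ravel_multi_index` trick, it does not require non-negative keys. `split_by_key_columns` is a hypothetical helper name:

```python
import numpy as np

def split_by_key_columns(a, ncols=2):
    """Split the rows of `a` into groups sharing the same first `ncols` values.

    Plain-NumPy sketch; groups come out in sorted key order.
    """
    # inverse[i] is the group number of row i; groups are numbered in sorted key order
    _, inverse = np.unique(a[:, :ncols], axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against NumPy versions returning a non-1-D inverse
    # A stable sort keeps rows within each group in their original order
    order = np.argsort(inverse, kind="stable")
    # Group boundaries are where the sorted group number changes
    boundaries = np.flatnonzero(np.diff(inverse[order])) + 1
    return np.split(a[order], boundaries)

a = np.array([[2748309, 246211, 1],
              [2748309, 246211, 2],
              [2747481, 246201, 54]])
for group in split_by_key_columns(a):
    print(group)
```

For this array the result is the row with key `(2747481, 246201)` followed by the two rows with key `(2748309, 246211)`.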

Solution 3:

If I apply that split line directly to your array, I get your result: an empty array plus the original.

In [136]: np.split(a,np.unique(a[:,1],return_index=True)[1][1:])
Out[136]: 
[array([], shape=(0, 3), dtype=int32), 
 array([[2748309,  246211,       1],
        [2748309,  246211,       2],
        [2747481,  246201,      54]])]
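The empty piece comes from what `return_index` reports. `np.unique` returns the unique values in sorted order, and `return_index=True` gives the position of each value's first occurrence in the original, unsorted column. Those positions only mark group boundaries when equal values are contiguous and ascending. A small sketch on the same array:

```python
import numpy as np

a = np.array([[2748309, 246211, 1],
              [2748309, 246211, 2],
              [2747481, 246201, 54]])

# Unique values come back sorted; first_idx points into the ORIGINAL column
values, first_idx = np.unique(a[:, 1], return_index=True)
print(values)     # [246201 246211]
print(first_idx)  # [2 0]

# Dropping the first entry leaves [0], so np.split cuts before row 0:
# an empty piece followed by the whole array
pieces = np.split(a, first_idx[1:])
print([p.shape for p in pieces])  # [(0, 3), (3, 3)]
```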

But if I first sort the array on the second column, as the linked answer specifies, I get the desired result, with the two arrays in swapped order:

In [141]: sorted_a=a[np.argsort(a[:,1])]
In [142]: sorted_a
Out[142]: 
array([[2747481,  246201,      54],
       [2748309,  246211,       1],
       [2748309,  246211,       2]])
In [143]: np.split(sorted_a,np.unique(sorted_a[:,1],return_index=True)[1][1:])
Out[143]: 
[array([[2747481,  246201,      54]]), 
 array([[2748309,  246211,       1],
        [2748309,  246211,       2]])]
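The sort-then-split steps can be wrapped in a small helper. A minimal sketch, assuming integer keys in a single column; `split_on_column` is a hypothetical name, and `kind="stable"` is an added choice so that rows sharing a key keep their original relative order:

```python
import numpy as np

def split_on_column(a, col):
    """Split the rows of `a` into groups that share the same value in column `col`."""
    # Stable sort so rows with equal keys keep their original relative order
    sorted_a = a[np.argsort(a[:, col], kind="stable")]
    # On sorted data, first-occurrence indices are exactly the group boundaries
    _, starts = np.unique(sorted_a[:, col], return_index=True)
    # Skip starts[0] (always 0) to avoid an empty leading piece
    return np.split(sorted_a, starts[1:])

a = np.array([[2748309, 246211, 1],
              [2748309, 246211, 2],
              [2747481, 246201, 54]])
for group in split_on_column(a, 1):
    print(group)
```

Sorting first is what makes the `return_index` positions valid split points; without it, the indices refer to scattered rows and `np.split` produces the empty-plus-original result shown earlier.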
