Skip to content Skip to sidebar Skip to footer

Efficiently Generating Multiple Instances Of Numpy.random.choice Without Replacement

I'm new to Python. While reading, please mention any other suggestions regarding ways to improve my Python code. Question: How do I generate a 8xN dimensional array in Python con

Solution 1:

Create a random array of specified shape and then sort along the axis where you want to keep the limits, thus giving us a vectorized and very efficient solution. This would be based on this smart answer to MATLAB randomly permuting columns differently. Here's the implementation -

Sample run -

In [122]: N = 10

In [123]: np.argsort(np.random.rand(8,N),axis=0)+1
Out[123]: 
array([[7, 3, 5, 1, 1, 5, 2, 4, 1, 4],
       [8, 4, 3, 2, 2, 8, 5, 5, 6, 2],
       [1, 2, 4, 6, 5, 4, 4, 3, 4, 7],
       [5, 6, 2, 5, 8, 2, 7, 8, 5, 8],
       [2, 8, 6, 3, 4, 7, 1, 1, 2, 6],
       [6, 7, 7, 8, 6, 6, 3, 2, 7, 3],
       [4, 1, 1, 4, 3, 3, 8, 6, 8, 1],
       [3, 5, 8, 7, 7, 1, 6, 7, 3, 5]], dtype=int64)

Runtime tests -

In [124]: def sortbased_rand8(N):
     ...:     return np.argsort(np.random.rand(8,N),axis=0)+1
     ...: 
     ...: def rand_M(N):
     ...:     M = np.zeros(shape = (8, N))
     ...:     for i in range (0, N):
     ...:         M[:, i] = np.random.choice(8, size = 8, replace = False) + 1 
     ...:     return M
     ...: 

In [125]: N = 5000

In [126]: %timeit sortbased_rand8(N)
100 loops, best of 3: 1.95 ms per loop

In [127]: %timeit rand_M(N)
1 loops, best of 3: 233 ms per loop

Thus, awaits a 120x speedup!

Solution 2:

How about shuffling, that is to say, permuting?

import random
import numpy
from timeit import Timer 

defB_rand_M(N):
    a = numpy.arange(1,9)
    M = numpy.zeros(shape = (8, N))
    for i inrange (0, N):
        M[:, i] = numpy.random.permutation(a)
    return M

# your original implementationdefJ_rand_M(N):
    M = numpy.zeros(shape = (8, N))
    for i inrange (0, N):
        M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1return M

some timings:

defcompare(N):
    for f in (J_rand_M, B_rand_M):
        t = Timer(lambda: f(N)).timeit(6)
        print'time for %s(%s): %.6f' % (f.__name__, N, t)

for i inrange(6):
    print'N = 10^%s' % i
    compare(10**i)
    print

gives

N = 10^0
time forJ_rand_M(1): 0.001199
time forB_rand_M(1): 0.000080

N = 10^1
time forJ_rand_M(10): 0.001112
time forB_rand_M(10): 0.000335

N = 10^2
time forJ_rand_M(100): 0.011118
time forB_rand_M(100): 0.003022

N = 10^3
time forJ_rand_M(1000): 0.110887
time forB_rand_M(1000): 0.030528

N = 10^4
time forJ_rand_M(10000): 1.100540
time forB_rand_M(10000): 0.304696

N = 10^5
time forJ_rand_M(100000): 11.151576
time forB_rand_M(100000): 3.049474

Solution 3:

Just a comment on your runtime analysis of the problem - my intuition is that O(n) is the best possible runtime you can possibly obtain when generating O(n) truly random numbers.

Have you tried actually running your code with n = 10 million? Your assumption that the runtime will scale by 1000 when the input grows by a factor of 1000 may not be true in practice, as there is usually a constant term when executing any program (loading libraries, etc.), which may be significant depending on the problem.

That being said, it looks like the question linked by Eric Wright does a very thorough job and can easily be adapted to fit your question.

Solution 4:

Use below code for your array generation

import numpy as np
N=1e7# THe value you want to have
np.random.randint(1,high=8,size=(8,N))

Hope this helps, it will surely not going to take that much time.

Post a Comment for "Efficiently Generating Multiple Instances Of Numpy.random.choice Without Replacement"