Efficiently Generating Multiple Instances Of Numpy.random.choice Without Replacement
Solution 1:
Create a random array of specified shape and then sort along the axis where you want to keep the limits, thus giving us a vectorized and very efficient solution. This would be based on this smart answer
to MATLAB randomly permuting columns differently
. Here's the implementation -
Sample run -
In [122]: N = 10
In [123]: np.argsort(np.random.rand(8,N),axis=0)+1
Out[123]:
array([[7, 3, 5, 1, 1, 5, 2, 4, 1, 4],
[8, 4, 3, 2, 2, 8, 5, 5, 6, 2],
[1, 2, 4, 6, 5, 4, 4, 3, 4, 7],
[5, 6, 2, 5, 8, 2, 7, 8, 5, 8],
[2, 8, 6, 3, 4, 7, 1, 1, 2, 6],
[6, 7, 7, 8, 6, 6, 3, 2, 7, 3],
[4, 1, 1, 4, 3, 3, 8, 6, 8, 1],
[3, 5, 8, 7, 7, 1, 6, 7, 3, 5]], dtype=int64)
Runtime tests -
In [124]: def sortbased_rand8(N):
...: return np.argsort(np.random.rand(8,N),axis=0)+1
...:
...: def rand_M(N):
...: M = np.zeros(shape = (8, N))
...: for i in range (0, N):
...: M[:, i] = np.random.choice(8, size = 8, replace = False) + 1
...: return M
...:
In [125]: N = 5000
In [126]: %timeit sortbased_rand8(N)
100 loops, best of 3: 1.95 ms per loop
In [127]: %timeit rand_M(N)
1 loops, best of 3: 233 ms per loop
Thus, awaits a 120x
speedup!
Solution 2:
How about shuffling, that is to say, permuting?
import random
import numpy
from timeit import Timer
defB_rand_M(N):
a = numpy.arange(1,9)
M = numpy.zeros(shape = (8, N))
for i inrange (0, N):
M[:, i] = numpy.random.permutation(a)
return M
# your original implementationdefJ_rand_M(N):
M = numpy.zeros(shape = (8, N))
for i inrange (0, N):
M[:, i] = numpy.random.choice(8, size = 8, replace = False) + 1return M
some timings:
defcompare(N):
for f in (J_rand_M, B_rand_M):
t = Timer(lambda: f(N)).timeit(6)
print'time for %s(%s): %.6f' % (f.__name__, N, t)
for i inrange(6):
print'N = 10^%s' % i
compare(10**i)
print
gives
N = 10^0
time forJ_rand_M(1): 0.001199
time forB_rand_M(1): 0.000080
N = 10^1
time forJ_rand_M(10): 0.001112
time forB_rand_M(10): 0.000335
N = 10^2
time forJ_rand_M(100): 0.011118
time forB_rand_M(100): 0.003022
N = 10^3
time forJ_rand_M(1000): 0.110887
time forB_rand_M(1000): 0.030528
N = 10^4
time forJ_rand_M(10000): 1.100540
time forB_rand_M(10000): 0.304696
N = 10^5
time forJ_rand_M(100000): 11.151576
time forB_rand_M(100000): 3.049474
Solution 3:
Just a comment on your runtime analysis of the problem - my intuition is that O(n) is the best possible runtime you can possibly obtain when generating O(n) truly random numbers.
Have you tried actually running your code with n = 10 million? Your assumption that the runtime will scale by 1000 when the input grows by a factor of 1000 may not be true in practice, as there is usually a constant term when executing any program (loading libraries, etc.), which may be significant depending on the problem.
That being said, it looks like the question linked by Eric Wright does a very thorough job and can easily be adapted to fit your question.
Solution 4:
Use below code for your array generation
import numpy as np
N=1e7# THe value you want to have
np.random.randint(1,high=8,size=(8,N))
Hope this helps, it will surely not going to take that much time.
Post a Comment for "Efficiently Generating Multiple Instances Of Numpy.random.choice Without Replacement"