
Reading A Few Lines As Numpy Array From A Huge File

I have a text file that contains a billion words and their corresponding 300-dimensional word vectors. I need to extract a few thousand words and their word vectors from the file.

Solution 1:

I don't know if this is the fastest approach (probably not), but it works reasonably well (I tested it on a file with more than 100,000 lines):

import numpy as np

# Keep only the lines whose first token is one of the wanted words.
F = filter(lambda s: s.strip().split()[0] in word_set if s.strip() else False,
           open(fn, 'rt'))
x = np.genfromtxt(F, *yourargs, **yourkwds)

This is for Python 2. In Python 3 it seems one has to .encode() the input.
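A minimal, self-contained sketch of the same idea for a modern Python 3 / NumPy setup (the sample lines and word_set here are hypothetical stand-ins for the huge embeddings file; usecols assumes 3-dimensional vectors for brevity):

```python
import numpy as np

# Hypothetical sample standing in for the huge "word v1 v2 v3 ..." file.
lines = [
    "the 0.1 0.2 0.3",
    "cat 0.4 0.5 0.6",
    "dog 0.7 0.8 0.9",
]
word_set = {"cat", "dog"}

# Filter lines by their first token, then let genfromtxt parse only the
# numeric columns (column 0 is the word itself, so it is skipped).
wanted = (s for s in lines if s.strip() and s.strip().split()[0] in word_set)
x = np.genfromtxt(wanted, usecols=range(1, 4), dtype=float)

print(x.shape)  # one row per matched word, one column per dimension
```

With a real file, the generator would iterate over `open(fn, 'rt')` instead of the in-memory list, so only matching lines are ever handed to genfromtxt.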
