Find Gaps In A Sequence Of Strings
I have got a sequence of strings - 0000001, 0000002, 0000003.... upto 2 million. They are not contiguous. Meaning there are gaps. Say after 0000003 the next string might be 0000006
Solution 1:
You could sort the list of ids and then step through it once only:
deffind_gaps(ids):
"""Generate the gaps in the list of ids."""
j = 1for id_i insorted(ids):
whileTrue:
id_j = '%07d' % j
j += 1if id_j >= id_i:
breakyield id_j
>>> list(find_gaps(["0000001", "0000003", "0000006"]))
['0000002', '0000004', '0000005']
If the input list is already in order, then you can avoid the sorted
(though it does little harm: Python's adaptive mergesort is O(n) if the list is already sorted).
Solution 2:
For storing sequence of 2 millions ints you can use bitarray. Here each bit means one integer (the integer of that index in bitarray). Example code:
gaps = []
# bitarray is 0 based
a = bitarray.bitarray(total + 1)
a.setall(False)
for sid in curr_ids:
a[int(sid)] = Truefor i inrange(1, total):
ifnot a[i]:
gaps.append('%07d' %(i))
return gaps
Solution 3:
seq = *the sequence of strings*
n = 2000000gaps = set(str(i).zfill(7) for i in range(1,n+1)) - set(seq)
Solution 4:
I would suggest take it int rather than string for processing and then making it a string again in output
j=0
n=2000000
#create a list of int number from your string
foo = [i for i in range(n)]
#creating gaps
foo.remove(1)
foo.remove(50)
while j<n:
for i in foo:
if i>j:
print'%07d'%j
j+=1
j+=1
Post a Comment for "Find Gaps In A Sequence Of Strings"