List Manipulation In Python With Pop()

November 26, 2023 Post a Comment

In short, I need to remove multiple items from a list according to their indexes. However, I can't use pop because it shifts the indexes (without some clumsy compensating system).

Solution 1:

Are your lists large? If so, use ifilter from itertools to filter out elements that you don't want lazily (with no up front cost).

Lists not so large? Just use a list comprehension:

newlist = [x for x in oldlist if x not in ['a', 'c'] ]

This will create a new copy of the list. This is not generally an issue for efficiency unless you really care about memory consumption.

As a happy medium of syntax convenience and laziness ( = efficiency for large lists), you can construct a generator rather than a list by using () instead of []:

interestingelts = (x for x in oldlist if x not in ['a', 'c'])

After this, you can iterate over interestingelts, but you can't index into it:

for y in interestingelts:    # okprint y

 print interestingelts[0]     # not ok: generator allows sequential access only

Solution 2:

You want a list comprehension:

L = [c for c in L if c not in ['a', 'c']]

Or, if you really don't want to create a copy, go backwards:

foriinreversed(range(len(L))):
    ifL[i]in['a', 'c']:
        L.pop(i)    # delL[i]ismoreefficient

Thanks to ncoghlan for reversed() & phooji for del L[i] suggestions. (I decided to leave it as L.pop(i), since that's how the question was initially formulated.)

Also, as J.S. Sebastian correctly points out, going backwards is space efficient but time inefficient; most of the time a list comprehension or generator (L = (...) instead of L = [...]) is best.

Edit:

Ok, so since people seem to want something less ridiculously slow than the reversed method above (I can't imagine why... :) here's an order-preserving, in-place filter that should differ in speed from a list comprehension only by a constant. (This is akin to what I'd do if I wanted to filter a string in c.)

write_i = 0for read_i in range(len(L)):
    L[write_i] = L[read_i]
    if L[read_i] not in ['a', 'c']:
         write_i += 1

del L[write_i:]
print L
# output: ['b', 'd']

Solution 3:

Summary

use list comprehension (or genexpr) to remove multiple items from a list
if your input is a large byte-string then use str.translate() to delete characters
deleting one item at a time del L[i] is slow for large lists

If items are bytes as in your example you could use str.translate():

defremove_bytes(bytestr, delbytes):
    """
    >>> remove_bytes(b'abcd', b'ac') == b'bd'
    True
    """return bytestr.translate(None, delbytes)

In general multiple items could be removed using slicing:

defremove_inplace_without_order(L, delitems):
    """Remove all items from `L` that are in `delitems` (not preserving order).

    >>> L = list(range(4)); remove_inplace_without_order(L, [0,2]); L
    [3, 1]
    """
    idel = len(L) # items idel.. to be removedfor i inreversed(range(len(L))):
        if L[i] in delitems:
            idel -= 1
            L[i] = L[idel] # save `idel`-th itemdel L[idel:] # remove items all at once#NOTE: the function returns `None` (it means it modifies `L` inplace)

As @phooji and @senderle already mentioned list comprehension (or generator expression) is preferable in your case:

defremove_listcomp(L, delitems):
    return [x for x in L if x notin delitems]

Here's a performance comparison for L=list("abcd"*10**5); delitems="ac":

| function                     | time, msec |  ratio |
|------------------------------+------------+--------|
| list                         |       4.42 |    0.9 |
| remove_bytes                 |       4.88 |    1.0 |
| remove                       |       27.3 |    5.6 |
| remove_listcomp              |       36.8 |    7.5 |
| remove_inplace_without_order |       71.2 |   14.6 |
| remove_inplace_senderle2     |       83.8 |   17.2 |
| remove_inplace_senderle      |      15000 | 3073.8 |
#+TBLFM: $3=$2/@3$2;%.1f

Where

try:
    from itertools import ifilterfalse as filterfalse
except ImportError:
    from itertools import filterfalse # py3kdefremove(L, delitems):
    return filterfalse(delitems.__contains__, L)

defremove_inplace_senderle(L, delitems):
    for i inreversed(range(len(L))):
        if L[i] in delitems:
            del L[i]

defremove_inplace_senderle2(L, delitems):
    write_i = 0for read_i inrange(len(L)):
        L[write_i] = L[read_i]
        if L[read_i] notin delitems:
             write_i += 1del L[write_i:]

remove_inplace_senderle() is slow due to it uses O(N**2) algorithm. Each del L[i] might cause all items to the right to be moved left to close the gap.

The time column in the above table includes time it takes to create a new input list (the first row) due to some algorithms modify an input inplace.

Here's timings for the same input but without creating a new list on each iteration:

 | function        | time, msec | ratio |
 |-----------------+------------+-------|
 | remove_bytes    |      0.391 |     1 |
 | remove          |       24.3 |    62 |
 | remove_listcomp |       33.4 |    85 |
 #+TBLFM: $3=$2/@2$2;%d

The table shows that itertools.ifilterfalse() doesn't provide a significant improvement over listcomp.

In general it is not worth it or even harmful to think about performance for such tasks unless a profiler proven that this code is a bottleneck and it is important for your program. But it might be useful to be aware of alternative approaches that could provide more than an order of magnitude improvement in speed.

Python Playground

List Manipulation In Python With Pop()

Solution 1:

Solution 2:

Solution 3:

Summary

Post a Comment for "List Manipulation In Python With Pop()"