
Python Is Slow When Iterating Over A Large List

I am currently selecting a large list of rows from a database using pyodbc. The result is then copied into a large list, which I then try to iterate over. Before I aba…

Solution 1:

This should not be slow with native Python lists - but maybe the ODBC driver is returning a "lazy" object that tries to be smart but just gets slow. Try doing

allIDRows = list(clientItemsCursor.fetchall())

in your code and post further benchmarks.

(Python lists can get slow if you start inserting things in its middle, but just iterating over a large list should be fast)
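To see that plain-list iteration is not the bottleneck, here is a quick timing sketch over a 500,000-element list (the size is picked to match the later examples; no database involved):

```python
import time

# Stand-in for the question's result set: a plain 500k-element list.
big = list(range(500_000))

start = time.perf_counter()
count = 0
for _ in big:
    count += 1
elapsed = time.perf_counter() - start

print(count)              # → 500000
print(f"{elapsed:.3f}s")  # iterating a list this size is well under a second
```

If this finishes quickly but the database loop does not, the time is going to the driver or the query, not to Python's list iteration.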

Solution 2:

It's probably slow because you load all the results into memory first and then iterate over the list. Try iterating over the cursor instead.

And no, scripts shouldn't be that slow.

clientItemsCursor.execute("select ids from largetable where year = ?", year)
count = 0
for clientItemrow in clientItemsCursor:
    aID = str(clientItemrow[0])
    count += 1
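The same cursor-iteration pattern can be demonstrated end to end with the standard library's sqlite3 module (used here only as a stand-in for the pyodbc connection; the table and column names mirror the question but are otherwise illustrative):

```python
import sqlite3

# In-memory database standing in for the real ODBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, 2010)",
                 ((i,) for i in range(1000)))

cur = conn.execute("SELECT ids FROM largetable WHERE year = ?", (2010,))
count = 0
for row in cur:          # rows stream from the cursor; no big list is built
    aID = str(row[0])
    count += 1

print(count)  # → 1000
```

Iterating the cursor directly lets the driver fetch rows as you go instead of materializing the whole result set up front.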

Solution 3:

More investigation is needed here... consider the following script:

bigList = range(500000)
doSomething = ""
count = 0
arrayList = [[x] for x in bigList]  # takes a few seconds
for x in arrayList:
    doSomething += str(x[0])
    count += 1

This is pretty much the same as your script, minus the database stuff, and takes a few seconds to run on my not-terribly-fast machine.
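One caveat about that benchmark: the `doSomething += str(...)` line does repeated string concatenation, which CPython happens to optimize but which can be quadratic on other interpreters or when other references to the string exist. Collecting the pieces and joining once is the safer pattern and keeps the loop itself cheap:

```python
bigList = range(500000)
arrayList = [[x] for x in bigList]

count = 0
parts = []
for x in arrayList:
    parts.append(str(x[0]))   # accumulate pieces instead of += on a string
    count += 1
doSomething = "".join(parts)  # single O(n) join at the end

print(count)  # → 500000
```

If the original script is slow mainly in this loop, swapping in the join pattern is a quick way to rule string concatenation in or out as the culprit.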

Solution 4:

When you connect to your database directly (I mean at an SQL prompt), how many seconds does this query take?

When the query finishes, you get a message like this:

NNNNN rows in set (0.01 sec)

So, if that time is large and the query is slow even "natively", you may need to create an index on that table.
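As a sketch of what an index buys you, here is the idea against an sqlite3 stand-in (the index name is made up for illustration; with sqlite you can ask the planner directly whether the index is used, and other databases have their own EXPLAIN syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")

# An index on the filter column lets the engine seek instead of scanning
# the whole table on every query.
conn.execute("CREATE INDEX idx_largetable_year ON largetable (year)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT ids FROM largetable WHERE year = ?", (2010,)
).fetchall()
print(plan)  # the plan should mention idx_largetable_year rather than SCAN
```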

Solution 5:

This is slow because you are

  1. Getting all the results
  2. Allocating memory and assigning the values to that memory to create the list allIDRows
  3. Iterating over that list and counting.

If execute gives you back a cursor, then use the cursor to its advantage and start counting as you get rows back, saving the time and memory spent building the list.

clientItemsCursor.execute("select ids from largetable where year = ?", year)
count = 0
for clientItemrow in clientItemsCursor:
    count += 1

Other hints:

  • create an index on year
  • use 'select count(*) from ... where year = ?' to get the count; this will probably be optimised by the database
  • remove the aID line if it's not needed; it converts the first item of the row to a string even though the result is never used
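The count(*) hint is worth spelling out: let the database do the counting so only a single number crosses the wire. A minimal sketch, again using sqlite3 as a stand-in with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany("INSERT INTO largetable VALUES (?, ?)",
                 [(1, 2009), (2, 2010), (3, 2010)])

# One aggregate query replaces fetching and counting thousands of rows.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM largetable WHERE year = ?", (2010,)
).fetchone()
print(count)  # → 2
```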
