Python Is Slow When Iterating Over A Large List
Solution 1:
This should not be slow with native Python lists, but the ODBC driver may be returning a "lazy" object that tries to be smart and just ends up slow. Try just doing
allIDRows = list(clientItemsCursor.fetchall())
in your code and post further benchmarks.
(Python lists can get slow if you start inserting items in the middle, but just iterating over a large list should be fast.)
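To check that claim on your own machine, here is a rough timing sketch. The function names and the sizes (500000 items, 20000 inserts) are made up for illustration, and the numbers you get will depend on your hardware.

```python
import timeit

big_list = list(range(500_000))


def iterate(items):
    """Walk the list once, counting elements."""
    count = 0
    for _ in items:
        count += 1
    return count


def insert_in_middle(n):
    """Insert n items into the middle of a list; every insert shifts elements."""
    items = list(range(n))
    for i in range(n):
        items.insert(len(items) // 2, i)
    return items


iterate_seconds = timeit.timeit(lambda: iterate(big_list), number=1)
insert_seconds = timeit.timeit(lambda: insert_in_middle(20_000), number=1)
print(f"iterate over 500000 items: {iterate_seconds:.3f}s")
print(f"20000 inserts in the middle: {insert_seconds:.3f}s")
```

If plain iteration finishes in a fraction of a second here, the list itself is unlikely to be the bottleneck in the original script.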
Solution 2:
It's probably slow because you load all the results into memory first and then iterate over a list. Try iterating over the cursor instead.
And no, scripts shouldn't be that slow.
clientItemsCursor.execute("Select ids from largetable where year =?", year)
for clientItemrow in clientItemsCursor:
    aID = str(clientItemrow[0])
    count = count + 1
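If you want to try the cursor-iteration pattern without touching the real database, here is a self-contained sketch. It assumes the standard-library sqlite3 module as a stand-in for the ODBC connection, with made-up data in a table named after the one in the question; whether rows are actually streamed rather than buffered depends on the driver you use.

```python
import sqlite3

# sqlite3 stands in for the ODBC connection here (an assumption);
# the table and column names follow the question, the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO largetable (ids, year) VALUES (?, ?)",
    ((i, 2000 + i % 10) for i in range(100_000)),
)


def count_rows_streaming(connection, year):
    """Iterate over the cursor directly instead of calling fetchall() first."""
    cursor = connection.execute(
        "SELECT ids FROM largetable WHERE year = ?", (year,)
    )
    count = 0
    for row in cursor:
        a_id = str(row[0])  # same per-row work as the loop above
        count += 1
    return count


print(count_rows_streaming(conn, 2005))
```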
Solution 3:
More investigation is needed here... consider the following script:
bigList = range(500000)
doSomething = ""
count = 0  # initialise the counter before the loop
arrayList = [[x] for x in bigList] # takes a few seconds
for x in arrayList:
    doSomething += str(x[0])
    count += 1
This is pretty much the same as your script, minus the database stuff, and takes a few seconds to run on my not-terribly-fast machine.
Solution 4:
When you connect to your database directly (I mean at an SQL prompt), how many seconds does this query take to run?
When the query ends, you get a message like this:
NNNNN rows in set (0.01 sec)
So, if that time is large, the query itself is slow at the database level, and you may have to create an index on that table.
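You can make the same comparison from inside Python by timing the query and the loop separately. The sketch below is only an illustration: it assumes sqlite3 in place of your real driver, invented data, and a hypothetical helper named time_query_and_loop.

```python
import sqlite3
import time

# sqlite3 and the generated rows are stand-ins for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO largetable (ids, year) VALUES (?, ?)",
    ((i, 2000 + i % 10) for i in range(100_000)),
)


def time_query_and_loop(connection, year):
    """Return (row count, seconds spent fetching, seconds spent looping)."""
    start = time.perf_counter()
    rows = connection.execute(
        "SELECT ids FROM largetable WHERE year = ?", (year,)
    ).fetchall()
    query_seconds = time.perf_counter() - start

    start = time.perf_counter()
    count = 0
    for _ in rows:
        count += 1
    loop_seconds = time.perf_counter() - start
    return count, query_seconds, loop_seconds


count, query_seconds, loop_seconds = time_query_and_loop(conn, 2005)
print(f"{count} rows: query {query_seconds:.4f}s, loop {loop_seconds:.4f}s")
```

If nearly all the time lands in the query part, the fix belongs in the database rather than in the Python loop.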
Solution 5:
This is slow because you are
- Getting all the results
- Allocating memory and assigning the values to that memory to create the list allIDRows
- Iterating over that list and counting.
If execute gives you back a cursor, then use the cursor to its advantage: start counting as rows come back and save the time spent on memory allocation.
clientItemsCursor.execute("Select ids from largetable where year =?", year)
for clientItemrow in clientItemsCursor:
    count += 1
Other hints:
- create an index on year
- use 'select count(*) from ...' to get the count for the year; this will probably be optimised on the db.
- Remove the aID line if it is not needed; it converts the first item of each row to a string even though the result is never used.
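The first two hints could look something like this. Again sqlite3 stands in for the real database, the index name idx_largetable_year and the helper count_for_year are invented for the example, and the exact CREATE INDEX syntax may differ on your server.

```python
import sqlite3

# sqlite3 and the generated rows are stand-ins for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO largetable (ids, year) VALUES (?, ?)",
    ((i, 2000 + i % 10) for i in range(100_000)),
)

# Hint 1: an index on year, so the WHERE clause does not need a full table scan.
conn.execute("CREATE INDEX idx_largetable_year ON largetable (year)")


def count_for_year(connection, year):
    """Hint 2: let the database count, so only one row travels back to Python."""
    row = connection.execute(
        "SELECT COUNT(*) FROM largetable WHERE year = ?", (year,)
    ).fetchone()
    return row[0]


print(count_for_year(conn, 2005))
```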