Use Generators For Fetching Large DB Record Sets

This document presents a Python recipe for iterating over large result sets from a database cursor. It describes a ResultIter generator that uses the cursor's fetchmany() method to retrieve records from the database in batches, rather than fetching all records at once. The generator yields records one by one, providing an interface as simple as fetchall() while avoiding the memory cost of loading the entire result set at once.


USE GENERATORS FOR FETCHING LARGE DB RECORD SETS (PYTHON RECIPE)
BY CHRISTOPHER PRINOS
ACTIVESTATE CODE (HTTP://CODE.ACTIVESTATE.COM/RECIPES/137270/)

When using the Python DB API, it's tempting to always use a cursor's fetchall()
method so that you can easily iterate through a result set. For very large result
sets, though, this can be expensive in terms of memory (and time spent waiting for
the entire result set to come back). You can use fetchmany() instead, but then you
have to manage looping through the intermediate result sets. Here's a generator
that simplifies that for you.

Python, 11 lines
# This code requires Python 2.2.1 or later
from __future__ import generators # needs to be at the top of your module

def ResultIter(cursor, arraysize=1000):
    'An iterator that uses fetchmany to keep memory usage down'
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result

To iterate through the result of a query, you often see code like this:

# where con is a DB API 2.0 database connection object
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
for result in cursor.fetchall():
    doSomethingWith(result)

This is fine if fetchall() returns a small result set, but not so great if the query
result is very large or takes a long time to return. 'Very large' and 'long time' are
relative, of course, but in any case it's easy to see that cursor.fetchall() is going to
need enough memory to hold the entire result set at once. On top of that, the
doSomethingWith function isn't going to get called until the entire query finishes.

Fetching one row at a time with cursor.fetchone() is an option, but it doesn't take
advantage of the database's efficiency at returning multiple records in a single
query (as opposed to many queries).
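
For comparison, the fetchone() approach would look something like this (a minimal
sketch, assuming the same con connection object as in the other examples here):

# where con is a DB API 2.0 database connection object
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
while True:
    result = cursor.fetchone()
    if result is None: # fetchone() returns None when the rows run out
        break
    doSomethingWith(result)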
To address this, there's a cursor.fetchmany() method that returns the next 'n' rows
of the query, letting you strike a time/space compromise between the other two
options. The ResultIter function shown here provides a generator-based
implementation that lets you take advantage of fetchmany() while still using the
simple notation of fetchall().

ResultIter would be used like so:

...
# where con is a DB-API 2.0 database connection object
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
for result in ResultIter(cursor):
    doSomethingWith(result)

This looks similar to the code above, but internally the ResultIter generator is
chunking the database calls into a series of fetchmany() calls. The default here is
that 1000 records at a time are fetched, but you can change that according to your
own requirements (either by changing the default, or by passing the second
parameter to ResultIter()). As always, trying different values with the profiler is
probably a good idea... performance could vary based on schema, database type,
and/or choice of Python DB API 2.0 module.
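
If you want to experiment, a rough timing harness might look like the sketch below.
Assumptions: a populated HUGE_TABLE, the con connection object and ResultIter
function from above; time_arraysize is a hypothetical helper for illustration, not
part of the recipe.

import time

def time_arraysize(con, n):
    'Time one full pass over the query at a given fetchmany batch size'
    cursor = con.cursor()
    cursor.execute('select * from HUGE_TABLE')
    start = time.time()
    count = 0
    for result in ResultIter(cursor, arraysize=n):
        count = count + 1 # stand-in for real per-row work
    return time.time() - start, count

# Try a few batch sizes and compare
for n in (100, 1000, 10000):
    elapsed, count = time_arraysize(con, n)
    print 'arraysize=%d: %d rows in %.2f seconds' % (n, count, elapsed)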

Tags: database
