How to get a row-by-row MySQL ResultSet in Python

Tags: mysql python
By : itsadok
Source: Stackoverflow.com
Question!

MySQL ResultSets are by default retrieved completely from the server before any work can be done. For huge result sets, this becomes unusable. I would instead like to retrieve the rows one by one from the server.

In Java, following the instructions here (under "ResultSet"), I create a statement like this:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

This works nicely in Java. My question is: is there a way to do the same in python?

One thing I tried is to limit the query to 1,000 rows at a time, like this:

start_row = 0
while True:
    cursor = conn.cursor()
    cursor.execute("SELECT item FROM items LIMIT %d,1000" % start_row)
    rows = cursor.fetchall()
    if not rows:
        break
    start_row += 1000
    # Do something with rows...

However, this gets slower as start_row grows, presumably because MySQL has to scan past and discard all the skipped rows for each new OFFSET.
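For reference, a seek-based variant avoids the growing OFFSET cost by filtering on the last id seen instead (this assumes items has an indexed auto-increment id column); it still fetches in chunks rather than truly streaming:

last_id = 0
while True:
    cursor = conn.cursor()
    # Seek past the last row we saw instead of using OFFSET.
    cursor.execute("SELECT id, item FROM items WHERE id > %s "
                   "ORDER BY id LIMIT 1000", (last_id,))
    rows = cursor.fetchall()
    if not rows:
        break
    last_id = rows[-1][0]  # remember where this chunk ended
    # Do something with rows...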

And no, using fetchone() instead of fetchall() doesn't change anything.

Clarification:

The naive code I use to reproduce this problem looks like this:

import MySQLdb

conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
cur = conn.cursor()
print "Executing query"
cur.execute("SELECT * FROM bigtable");

print "Starting loop"
row = cur.fetchone()
while row is not None:
    print ", ".join([str(c) for c in row])
    row = cur.fetchone()

cur.close()
conn.close()

On a table of ~700,000 rows, this code runs quickly. But on a table of ~9,000,000 rows it prints "Executing query" and then hangs for a very long time. That is why it makes no difference whether I use fetchone() or fetchall().

By : itsadok


Answers

I got the best results by mixing bits from some of the other answers.

The key part was setting cursorclass=MySQLdb.cursors.SSDictCursor (for MySQLdb) or pymysql.cursors.SSDictCursor (for PyMySQL) in the connection settings. This lets the server hold the result set (the "SS" stands for "server side", as opposed to the default cursor, which brings the results client-side) and builds a dictionary out of each row (e.g. {'id': 1, 'name': 'Cookie Monster'}).

Then, when looping through the rows, there was an infinite loop in both Python 2.7 and 3.4, caused by while rows is not None: even once cur.fetchmany(size=10000) had no results left, it returned an empty list ([]) rather than None.

Actual example:

query = """SELECT * FROM my_table"""
conn = pymysql.connect(host=MYSQL_CREDENTIALS['host'], user=MYSQL_CREDENTIALS['user'],
                          passwd=MYSQL_CREDENTIALS['passwd'], charset='utf8', cursorclass = pymysql.cursors.SSDictCursor)
cur = conn.cursor()
results = cur.execute(query)
rows = cur.fetchmany(size=100)
while rows:
    for row in rows: 
        process(row)
    rows = cur.fetchmany(size=100)
cur.close()
conn.close()
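If you don't need explicit batching, the cursor is also iterable, so (as far as I can tell) the fetchmany loop can be replaced with a plain for loop; the server-side cursor still streams rows under the hood. A sketch, reusing the connection settings above:

conn = pymysql.connect(host=MYSQL_CREDENTIALS['host'],
                       user=MYSQL_CREDENTIALS['user'],
                       passwd=MYSQL_CREDENTIALS['passwd'],
                       charset='utf8',
                       cursorclass=pymysql.cursors.SSDictCursor)
cur = conn.cursor()
cur.execute("SELECT * FROM my_table")
for row in cur:       # rows stream from the server one at a time
    process(row)      # each row is a dict, e.g. {'id': 1, 'name': 'Cookie Monster'}
cur.close()
conn.close()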
By : Garren


Try using MySQLdb.cursors.SSDictCursor:

import MySQLdb
import MySQLdb.cursors

con = MySQLdb.connect(host=host,
                      user=user,
                      passwd=pwd,
                      charset=charset,
                      port=port,
                      cursorclass=MySQLdb.cursors.SSDictCursor)
cur = con.cursor()
cur.execute("select f1, f2 from my_table")  # "table" is a reserved word, so use a real table name
for row in cur:
    print row['f1'], row['f2']
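One caveat, to the best of my knowledge: with a server-side cursor the unread rows stay pending on the connection, so you should exhaust or close the cursor before running another query on the same connection. Wrapping the loop in try/finally keeps that safe:

cur = con.cursor()
try:
    cur.execute("select f1, f2 from my_table")
    for row in cur:
        print row['f1'], row['f2']
finally:
    cur.close()  # draining/closing frees the connection for the next query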


Did you try this version of fetchone? Or something different?

row = cursor.fetchone() 
while row is not None:
    # process
    row = cursor.fetchone()

Also, did you try this?

rows = cursor.fetchmany(size=1)
while rows:  # fetchmany returns an empty list, not None, when no rows are left
    # process
    rows = cursor.fetchmany(size=1)

Not all drivers support these, so you may have gotten errors or found them too slow.


Edit.

When it hangs on execute, you're waiting for the database. That's not a row-by-row Python thing; that's a MySQL thing.

MySQL prefers to fetch all rows as part of its own cache management. In JDBC this is turned off by setting the fetch size to Integer.MIN_VALUE (-2147483648).

The question is, what part of the Python DBAPI becomes the equivalent of the JDBC fetch_size?

I think it might be the arraysize attribute of the cursor. Try

cursor.arraysize = -2**31

And see if that forces MySQL to stream the result set instead of caching it.
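To see where the time actually goes, here is a quick timing sketch (using the question's bigtable and the default buffered cursor); if the analysis above is right, nearly all the time should land in execute(), not in the fetch loop:

import time
import MySQLdb

conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
cur = conn.cursor()

t0 = time.time()
cur.execute("SELECT * FROM bigtable")  # buffered: blocks until every row has arrived
t1 = time.time()
print "execute took %.1f seconds" % (t1 - t0)

n = 0
row = cur.fetchone()
while row is not None:                 # fetching now only reads client-side memory
    n += 1
    row = cur.fetchone()
print "fetched %d rows in %.1f seconds" % (n, time.time() - t1)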

By : S.Lott

