Skip to content
Blog
Go back

Benchmarking the Python SQLite3 Connection

We have several SQLite calls in our pipeline that look something like this

for file in file_list:
    conn = sqlite3.connect(database)
    c = conn.cursor()
    command = 'SELECT * FROM table'
    c.execute(command)
    conn.close()

I originally wrote our database calls this way because at the time (this was about 2 years ago), I was worried about the code crashing with an open database connection. I thought this would be vaguely “bad” so I was over enthusiastic about keeping connections closed. The design pattern stuck.

But someone pointed out to us that it would be faster to do something like this:

conn = sqlite3.connect(database)
c = conn.cursor()
for file in file_list:
    command = 'SELECT * FROM table'
    c.execute(command)
conn.close()

The reason being that there is some overhead involved when opening a Python SQLite3 connection. When you loop over the connection opening/closing steps you multiply this overhead.

My officemate went ahead and tested the overhead for running both design patterns over a test set of queries. Based on our results you can infer a connection overhead of 0.0134s. I took that data and assumed a linear increase for the overhead time for each additional query to extrapolate the results up to scale of our filesystem. The results are below. The first row are the real results and the next two are the extrapolated predictions.

MultipleSingleDeltaDelta %RecordsFactorNotes
0:02:230:02:050:00:1812.59%93001wfc3g flt files
0:19:130:16:480:02:2512.59%750008.0645All flt files
3:31:253:04:480:26:3612.59%82500088.7097All fits files

As you can see, we achieve a 13% speed up, which if we are querying the entire database saves us almost 30 min. Since we are running this process overnight this isn’t a huge deal. Going forward it would be smart to use the faster form. However, I’m not sure if it’s worth it to go back and fix our old SQLite calls as we have more pressing issues.

All in all, this was an interesting exercise and will probably lead to some more detailed profiling of our pipeline in the future.


Share this post on:

Previous Post
Pain with the Windows Command Prompt
Next Post
Hipster Passwords