When SQLite Row Count Made No Sense

A small application that I wrote for some personal purposes was recently facing too many "database locked" errors. Yes, it was using SQLite, but there wasn't any serious concurrent access happening. So, instead of outright making my program "scalable" or going for something like PostgreSQL, I decided to inspect the database file first.

First of all, the file was suspiciously large, measuring at 553 MiB. People say SQLite is capable of dealing with larger databases, but there was no reason for my database to be that big in the first place. So, the next move was to vacuum it, which failed saying "Error: stepping, database disk image is malformed". Naturally, I performed a recovery (took three minutes):

time sqlite3 main.corrupt.sqlite3 '.recover'|sqlite3 main.sqlite3

For some reason, the new file was even bigger (599 MiB). Vacuuming a newly constructed file is most probably meaningless, but I ran the command, taking another three minutes and making no change.

By this point, I had started suspecting that the database contained some spurious data. After all, my program was quick-and-dirty, offloading many checks and de-duplication to the database engine. So, I started inspecting the schema and each table, coming across this anomaly:

$ sqlite3 main.sqlite3 'select count(*) from wiki_url_by_nseid_imported;'
8556330

$ sqlite3 main.sqlite3 'select count(nseid) from wiki_url_by_nseid_imported;'
580

If you are wondering what the anomaly is, let it be known that nseid is the primary key of the table in question. How can the result of a row count by the primary key be any different from that of a plain row count? If the table has a primary key, each row should have a unique non-null value for it, meaning the counts cannot differ, even by one.

For a moment, I thought I had come across an unknown bug. Then I dug up this from the official documentation, a peculiarity I had forgotten:

According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a bug in some early versions, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY or the table is a WITHOUT ROWID table or a STRICT table or the column is declared NOT NULL, SQLite allows NULL values in a PRIMARY KEY column. SQLite could be fixed to conform to the standard, but doing so might break legacy applications. Hence, it has been decided to merely document the fact that SQLite allows NULLs in most PRIMARY KEY columns.

So yes, it's a bug, but it's known and has been deemed behavior. Apparently my program has a bug too (which tries to insert rows without a non-null primary key), which combined with the SQLite behavior resulted in those millions of spurious rows.

After deleting the NULL rows and vacuuming once again, the database file shrunk from 599 MiB to a mere 7.9 MiB. Now I have to add some explicit constraints to the database, find the issue in my program, and fix it.

So it always makes sense to inspect what you already have, instead of "scaling up" right away introducing new issues and keeping the old ones.

P.S.: While this write-up was about a problem in SQLite, it is actually one of the most robust pieces of software out there when used correctly and in the right context. Have a look at how it is tested.


Tags: sqlite, bugs, database, experience

Read more from Nandakumar at nandakumar.org/blog/