I finally had time to run a few benchmarks. I wanted to measure how the different back-ends compared to each other in query performance, memory consumption and disk usage. The back-ends being compared are SQLite (the current back-end used in xmms2-devel), S4 mmap (the “old” S4 back-end) and S4 midb (a new experimental S4 back-end).
Just to cover my back: I do not have much experience with benchmarks, and I am in no way an objective observer (slight bias towards S4, me?). If you see something that seems wrong, let me know; I probably made a mistake.
First we will look at query performance. This is perhaps the factor most visible to the end user. The time measured is the time spent inside the xmms_collection_client_query_infos function. It was measured by calling gettimeofday at the beginning and end of the function and printing the difference in microseconds. I also used a slightly modified nyxmms2 that uses xmms_coll_query_infos instead of xmms_coll_query_ids.
So, to the results. The table shows the average time used in the query, calculated as t̄ = (1/n) Σ t_i, where t_i is the time used on the i'th run, and the sample standard deviation, calculated as S² = (1/(n−1)) Σ (t_i − t̄)² and then taking the square root to get S. n was 10 for all benchmarks. The last column is the average time SQLite used divided by the average time S4 midb used.
| Query | SQLite | S4 mmap | S4 midb | SQLite / midb |
|---|---|---|---|---|
| “tracknr:4” OR “artist~foo” AND NOT “artist:weezer” | | | | |
As we can see, S4 is faster in all cases, ranging from about 2 to over 1,000 times faster. Why “one” is so much slower on SQLite I don’t know; it could be an inefficient SQL query, or something else. Also worth noting is that S4 is mostly bound by the size of the result set: queries with small result sets (“one”, for example) give short query times. I’ll get back to why later.
To get good performance S4 trades memory for speed. The big question is if the trade-off is worth it. For now we will settle for finding out exactly how much memory the different back-ends use. Memory consumption was measured by using massif, a valgrind tool. The back-ends were run two times, once by just starting up and shutting down and another time with a query for “*” before shutting down. That way I hope to visualize idle memory usage versus usage when searching. All numbers are in MiB.
As we can see, both S4 implementations add about 9 MiB when searching, compared to the 7 MiB SQLite adds. It may also seem like S4 with mmap is the big winner here, but what massif does not show is shared memory, the kind mmap uses. If that had been counted we would have added about 22 MiB to S4 mmap, bringing it up to about the same memory usage as S4 midb. So with a media library of 9,361 entries, S4 uses a little over twice the memory SQLite uses at idle.
Finally, the one most people probably will not care about with today’s abundance of disk space: disk usage. It’s simply the size of the datafile.
In terms of performance and disk space S4 (especially the midb version) is the clear winner, but it eats up about twice as much memory as SQLite. Fair tradeoff? Maybe for a 10,000-entry media library, but what about one with 100,000 entries? We would probably see memory usage around 200 MiB.
A note on S4 query performance
As I said earlier S4’s query speed is mostly bound by the size of the result set. To see why we have to dig a little. Running XMMS2 in Callgrind (another Valgrind tool) is a nice way to find bottlenecks and hotspots. Opening the generated callgrind.out file in KCachegrind reveals a few interesting things:
As we can see, xmms_medialib_query_ids (the function calling s4_query) contributes only 3.64% of the running time of the query. The rest of the time is spent fetching the properties (artist, album, …) of the entries found by the query and inserting them into the dictionary we return. We see that _entry_property_get_ is called 28,000 times. The media library this was run on has about 500,000 relationships (a relationship is an entry-to-entry mapping, for example (“songid”, 1) -> (“title”, “One”)). Using B+ leaves with room for 101 entries and a fill rate of 50% (it was measured to be around 53%, a bit below the 67% normally assumed for B+ trees), this gives about 10,000 B+ leaves. A simple traversal of the 10,000 leaves would probably be faster than calling _entry_property_get_ 28,000 times. An observation well worth taking into account when we design the S4 query layer.