PDA

View Full Version : Performance of SQLite and large databases



Bronx
2011-08-05, 12:33
Hi all

I've been a user of the Squeezebox for 8 years now; I'm a silent reader of the forum. I have all models except the SLIMP3 and Squeezebox 1 in operation. My database is very large and includes over 300,000 tracks. It runs stable and OK on a dedicated dual-core Pentium D with 3.0 GHz and 1 GB of memory.
I am always cautious with updates. I am running 7.5.4, use the scanner.exe from 7.5.2 due to crashes, and I'll only update in a few weeks when 7.6.x goes stable.

I worry about SQLite. Does anyone have experience with such large databases and SQLite? In the beta forum I've read somewhere that with very large databases SQLite may run unexpectedly slowly, possibly even slower than MySQL? Thank you for your answers.

Sincerely
Daniel

erland
2011-08-05, 12:42
Hi all

I've been a user of the Squeezebox for 8 years now; I'm a silent reader of the forum. I have all models except the SLIMP3 and Squeezebox 1 in operation. My database is very large and includes over 300,000 tracks. It runs stable and OK on a dedicated dual-core Pentium D with 3.0 GHz and 1 GB of memory.
I am always cautious with updates. I am running 7.5.4, use the scanner.exe from 7.5.2 due to crashes, and I'll only update in a few weeks when 7.6.x goes stable.

The scanner crashes are probably the ones that have been fixed in the 7.5.6 nightly.



I worry about SQLite. Does anyone have experience with such large databases and SQLite? In the beta forum I've read somewhere that with very large databases SQLite may run unexpectedly slowly, possibly even slower than MySQL? Thank you for your answers.

Nobody has actually been able to verify that it's slower with real testing, as far as I know. Apart from the issue some people currently have with the 7.6 scanner, the 7.6 release seems to be generally faster than 7.5.
So I wouldn't worry too much about this; SQLite is a lot better today than it was five years ago.

slate
2011-08-05, 12:54
pallfreeman got to 200k: http://forums.slimdevices.com/showthread.php?t=89205

For me the 7.6.x fullscan times are 4 times faster than 7.5.x.

If you are cautious then wait for 7.6.1 to be released.

Would you have problems if you tried 7.6.1 today? You might:
- if you use cue sheets
- if you use UNC paths
- if your tags are a mess
- if you have plenty of playlists, some of which are not updated

Anyway, when you do try it, please come back here and give some information about your library, hardware and scan times before and after.

Bronx
2011-08-05, 13:00
Wow, Erland and Slate, thanks for the quick response. I'll surely report back when I dare to use 7.6.x in a few days.
Daniel

Wirrunna
2011-08-05, 14:46
Bronx, add another GB of RAM, it's very cheap. Then set the new advanced performance option in 7.6.x to use the extra memory.
http://forums.slimdevices.com/showthread.php?t=88904&page=3 shows the scanning performance of one of my boxes, a mini-ITX with a low-power dual-core AMD X2 235e @ 2.7 GHz and 4 GB RAM, with the library on Samsung 5400 RPM EcoGreen HD204 2 TB drives and the system on a 500 GB WD Blue 2.5" drive, running Windows XP.

Philip Meyer
2011-08-06, 16:12
>I worry about SQLite. Does anyone have experience with such large
>databases with SQLite? In the beta forum I've read somewhere that with
>very large databases SQLite may run unexpectedly slow and possibly
>even slower than MySQL? Thank you for your answers.
>
I worried about the move to SQLite too. Not so much about scanning time; I had heard initial reports that general browsing may be slower for large libraries.

For my library (~40,000 tracks - titchy compared to yours!), 7.6.x is only marginally quicker at scanning (by about 5 mins), but it's less reliable at the moment.

But so far, it feels like 7.6.x is a bit faster for browsing, which I am much more interested in than scan time.

It may not be the transition from MySQL to SQLite specifically that has been the deciding factor in the performance gains, but rather a combination of other changes in the code: refactoring, optimising the number of queries made, better file scanning, and fewer scan phases to tidy up the DB after files have been scanned.

In my case, I had optimised my own MySQL instance, so it was already faster than the SBS built-in MySQL instance's default settings.

But it would be interesting to hear how performance scales with DB size - whether the effect on performance is linear or worse as the number of tracks increases.

Please post your findings when you give it a try.

JJZolx
2011-08-06, 16:49
I'm generating a very large test library right now and will post the results after it's been scanned under both SQLite and MySQL. Maybe sometime tomorrow or Monday.

It won't be possible to compare the overall scan times of SQLite/7.6 to MySQL/7.6, since a clear & rescan with MySQL in 7.6 hangs during the precaching of artwork. It's possible to precache the artwork afterward, however, by running a new & changed scan.

JJZolx
2011-08-06, 20:12
Ok, the new library has been generated: 1,000,000 tracks (Why mess around, right?)

I'm scanning it now with a 7.6.1 server running SQLite. I'm not certain, but I may already be seeing something. Discovery of the 1M files took just 9:28. Scanning the first 25k tracks took about 10 minutes. Now, at 50k tracks, the time for the 'scanning new files' pass is over 42 minutes. That's much slower per track than for my 33k track library, which takes less than 12 minutes in total to scan.

If it keeps degrading like this as the database grows larger, it's going to take a _very_ long time to scan the whole library. Maybe longer than I have the patience for.

It will be interesting to see how well SBS with MySQL handles scanning the same library. I may also have to scale back the test to something like 250k tracks.

bluegaspode
2011-08-06, 20:46
Does anyone know if an ANALYZE of the tables is done after every 50,000 tracks inserted (or some other count)?
Every major database needs this if bulk inserts and reads are happening at the same time (which I expect is what the scanner does?).

erland
2011-08-06, 21:32
Does anyone know if an ANALYZE of the tables is done after every 50,000 tracks inserted (or some other count)?
Every major database needs this if bulk inserts and reads are happening at the same time (which I expect is what the scanner does?).

As far as I can see, it runs ANALYZE (SQLite) or OPTIMIZE TABLE (MySQL) at the end of a full rescan and after every 100 tracks in a new/changed scan.
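For anyone curious what that pattern looks like in practice, here is a minimal, hypothetical sketch (not the actual SBS code, which is Perl; the table name is invented) of interleaving ANALYZE with inserts in SQLite so that the query planner's statistics stay fresh while reads and bulk inserts overlap:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, album TEXT)")
conn.execute("CREATE INDEX idx_tracks_album ON tracks (album)")

ANALYZE_EVERY = 100  # the interval the scanner reportedly uses in a new/changed scan

for i in range(1000):
    conn.execute("INSERT INTO tracks (title, album) VALUES (?, ?)",
                 (f"Track {i}", f"Album {i // 10}"))
    if (i + 1) % ANALYZE_EVERY == 0:
        conn.execute("ANALYZE")  # refresh planner statistics mid-scan

conn.execute("ANALYZE")  # final pass, analogous to the end of a full rescan
conn.commit()
```

ANALYZE populates the `sqlite_stat1` table, which SQLite's planner consults when choosing indexes, so running it periodically matters most when queries are issued against a table that is still growing.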

JJZolx
2011-08-06, 21:56
If it keeps degrading like this as the database grows larger, it's going to take a _very_ long time to scan the whole library. Maybe longer than I have the patience for.

Well, that just plain wasn't going to happen. It would have taken weeks.

Trying again with a 260k track library. I'll run the MySQL scan first this time.

erland
2011-08-06, 22:02
Well, that just plain wasn't going to happen. It would have taken weeks.

Trying again with a 260k track library. I'll run the MySQL scan first this time.

Just out of interest, do you have the "Database Memory Config" set to "Normal" or "High" ?

With a library of this size, it's probably recommended to set it to "High".

JJZolx
2011-08-06, 22:04
Just out of interest, do you have the "Database Memory Config" set to "Normal" or "High" ?

With a library of this size, it's probably recommended to set it to "High".

Yes, it's set to High.

Philip Meyer
2011-08-06, 23:56
>Does anyone know if an ANALYZE of the tables is done after every
>50,000 tracks inserted (or some other count)?
>Every major database needs this if bulk inserts and reads are
>happening at the same time (which I expect is what the scanner does?)

One suggestion during the "new schema" discussions was to have two phases: a first phase to read all tags for each song into a songs table, and a second phase to then process that content.

If that were the case, the scanner could delete/disable all indexes whilst bulk inserting songs during the first scan phase and then create the indexes at the end, as it wouldn't need to read any content back until then.

Then work out distinct albums with compilation flags, genres, years, etc by querying the song tags table.
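As an illustration of that suggested pattern, here is a hypothetical Python/SQLite sketch (not the actual scanner code; the table and index names are invented): bulk insert with no secondary indexes in place, then build the indexes once and query the derived facts afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, album TEXT, artist TEXT)")

# Phase 1: bulk insert raw tag data with no secondary indexes defined,
# so each INSERT only has to update the table itself.
conn.executemany(
    "INSERT INTO songs (title, album, artist) VALUES (?, ?, ?)",
    ((f"t{i}", f"al{i % 500}", f"ar{i % 200}") for i in range(20000)),
)
conn.commit()

# Phase 2: create the indexes once, after the bulk load, then derive
# aggregates (distinct albums, compilations, ...) by querying back.
conn.execute("CREATE INDEX idx_songs_album ON songs (album)")
conn.execute("CREATE INDEX idx_songs_artist ON songs (artist)")
albums = conn.execute("SELECT COUNT(DISTINCT album) FROM songs").fetchone()[0]
```

The point of the design is that index maintenance cost is paid once at the end instead of on every insert, which is exactly why it only works if nothing needs to read the table back during phase 1.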


I'm not sure how the scanner works now, but I assume it needs to read content back out of the DB when deciding how to process each scanned file. As it looks like there aren't any tidy-up phases any more, it must be running queries or reading content back out to determine whether a new song is being added to an album, whether the album should become a compilation, etc.

If that is the case, the scanner is going to become increasingly inefficient as the number of songs increases.

erland
2011-08-07, 01:21
I'm not sure how the scanner works now, but I assume it needs to read content back out of the DB when deciding how to process each scanned file. As it looks like there aren't any tidy-up phases any more, it must be running queries or reading content back out to determine whether a new song is being added to an album, whether the album should become a compilation, etc.

As far as I can see from the code it does some reading, but to know for sure one would have to enable logging on the database.



If that is the case, the scanner is going to become increasingly inefficient as the number of songs increases.

I have a generated library with 50 000 very small FLAC files and I can scan that within 10-15 minutes.

I also did some tests with a generated library the same size as my normal FLAC library of 3,500 tracks, and the results were:
- Real library (91 GB): full rescan 92 seconds
- flactruncate.pl library (31 MB): full rescan 32 seconds
- My randomly generated FLAC library (15 MB): full rescan 25 seconds

So the scanning time is greatly affected by file size. If the FLAC file size causes a performance loss that is linear in library size, it would mean that a 50,000-track FLAC library would take about 25-40 minutes in my setup.

I'm not sure if JJZolx is using a generated library with full size FLAC files or if it's a library with small FLAC files.

One thing that should be mentioned is that my generated library doesn't have any album covers, so if the performance is somehow related to artwork handling, that won't show up in my tests, at least not until I've added album covers to the test library.

If there is interest, I'll try to run some more tests in my environment with different library sizes. However, I can't run realistic tests of scanning performance because of limited disk space for storing full-size FLAC files, but I can run tests comparing SQLite/MySQL with small FLAC files, or compare browsing performance, which should be independent of FLAC file size.

The only issue is that I only have a virtual environment, so it's not going to be 100% accurate and comparable with a real environment.

arztde
2011-08-07, 01:39
I'd like to point to an exact comparison of MySQL and SQLite in the German forums. Maybe MrFloppy can give some comments in English here.

After comparing the two databases, the result is clearly in favour of SQLite.

http://forums.slimdevices.com/showthread.php?t=89118

JJZolx
2011-08-07, 09:36
Here are the results of the first test with a 260,000 file library. The FLAC files are all 10 seconds of silence, tagged exactly like my real FLAC library. To generate the library, I duplicated the albums in my 32.7k file library multiple times, changing the album and folder name each time by appending a number (01, 02, etc.). All of the tagging is otherwise identical. Covers were also copied to the library's folders.

These times were from the initial scans launched by starting up the server with an empty cache folder and (in the case of MySQL) an empty database. Both scans probably benefited from system caching, since I ran them soon after generating the library itself. I ran the MySQL scan first, so the SQLite scan, if anything, would have benefited more from caching.

Instead of giving total scan time, I'll list the time it took for the 'Discovering Files' pass and the 'Scanning new files' pass at 1/4, 1/2 and all the way through the library. This is mostly because the MySQL scan must be aborted during the precaching of artwork, so a total time can't be had, but it also shows how both slow down as the library database grows, and that SQLite is much more affected.



MySQL
------------------------------------
Discovering files 1:59
Scanning new @ 65k files 5:40
Scanning new @ 130k files 12:17
Scanning new @ 260k files 26:12

SQLite
------------------------------------
Discovering files 1:40
Scanning new @ 65k files 9:08
Scanning new @ 130k files 26:49
Scanning new @ 260k files 1:12:17


Second run, this time running full scans from the Basic Settings page in the web UI. Similar results.



MySQL
------------------------------------
Scanning new @ 260k files 27:11

SQLite
------------------------------------
Scanning new @ 260k files 1:15:41


I should note that the above results do not jibe with what I see when scanning my 32.7k file library, where SQLite is faster.
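Purely as arithmetic on the cumulative figures in the first table above (no new measurements), the per-track scan rate in each slice of the library can be derived like this, and it makes the scaling difference obvious: MySQL's cost per 1,000 files stays nearly flat while SQLite's roughly doubles and then keeps climbing:

```python
# Cumulative 'Scanning new files' times from the first run, in minutes,
# keyed by thousands of files scanned.
mysql  = {65: 5 + 40/60, 130: 12 + 17/60, 260: 26 + 12/60}   #  5:40, 12:17,   26:12
sqlite = {65: 9 + 8/60,  130: 26 + 49/60, 260: 72 + 17/60}   #  9:08, 26:49, 1:12:17

def minutes_per_1000(cumulative):
    """Minutes spent per 1,000 files in each successive slice of the scan."""
    rates, prev_n, prev_t = [], 0, 0.0
    for n in sorted(cumulative):
        rates.append(round((cumulative[n] - prev_t) / (n - prev_n), 2))
        prev_n, prev_t = n, cumulative[n]
    return rates

print("MySQL  min/1000 files per slice:", minutes_per_1000(mysql))
print("SQLite min/1000 files per slice:", minutes_per_1000(sqlite))
```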

erland
2011-08-07, 11:52
These times were from the initial scans launched by starting up the server with an empty cache folder and (in the case of MySQL) an empty database. Both scans probably benefited from system caching, since I ran them soon after generating the library itself. I ran the MySQL scan first, so the SQLite scan, if anything, would have benefited more from caching.

Would it be easy for you to run one of the tests with 7.5.6 as well?
The reason being that it would be interesting to know how 7.5.6 with MySQL compares with 7.6.1 with SQLite.

If it's easy to repeat the tests, it would also be interesting to see how much difference the "Database Memory Config" parameter in SBS Settings/Advanced/Performance makes.

I think I'll try to run the 50k and 100k test libraries I published earlier in my own setup to see if I get similar results to yours.

Finally, are your tests on Windows, Linux or OS X? I'm guessing Windows?

Philip Meyer
2011-08-07, 12:48
Here are the results of the first test with a 260,000 file library.

I've plotted those figures on a graph.

Interesting - it shows that SQLite isn't linear: performance gets progressively worse as the library size increases, whereas MySQL is very close to linear.

Obviously, this is a small sample size ;-)

This does seem to match my experience - at 40,000 songs, MySQL and SQLite scan times are about the same.

Philip Meyer
2011-08-07, 13:10
Is it possible to time how long it takes to browse the library using the SQLite and MySQL engines, or is it not possible to get a scan to complete to a satisfactory level with a MySQL DB - i.e. if artwork was turned off, for example?

Phil

kappclark
2011-08-07, 16:23
Is it possible to split the library up? It may be convenient to keep that many tracks in one folder, but why not split them up?

This may not be possible (I have never tried it, but then again, I only have a 65 GB library!)

erland
2011-08-08, 15:13
I've done some tests on my Ubuntu box comparing SQLite with MySQL, and my results are a bit different from JJZolx's.

I've cleared the file system cache before each test and executed each test case twice to ensure I get the same performance. All the tests started with an empty Cache directory; I then started the server and let it do a full rescan. The tests have been run in a virtual environment, but since I executed them twice, I feel they are repeatable and should therefore be fairly accurate.

The library I used was the generated 50,000 track test library I've posted about earlier in the beta forum. It contains very small FLAC files whose tags have been generated from a Discogs database dump. All music files were stored on a local drive. Statistics for the library are:

Total Tracks: 49,808
Total Albums: 4,313
Total Artists: 7,495
Total Genres: 13

The tests were executed on 7.6.1 r33004 (which contains a new SQLite version but doesn't yet incorporate the latest WAL-related changes Andy has made).

Anyway, the results are as follows (Discovering files + Scanning new files):

MySQL (High database memory setting)
* Discovering files: 00:00:33
* Scanning new files: 00:34:26

MySQL (Normal database memory setting)
* Discovering files: 00:00:34
* Scanning new files: 02:23:09

SQLite (High database memory setting)
* Discovering files: 00:00:30
* Scanning new files: 00:10:51

SQLite (Normal database memory setting)
* Discovering files: 00:00:30
* Scanning new files: 00:16:28

As you can see, my results are very different from the results JJZolx saw with 65 000 tracks. In all my tests SQLite is a lot faster than MySQL, which is the complete opposite of what JJZolx got earlier in his tests on Windows.

Another thing that's clear is that the "Database Memory Config" in SBS Settings/Advanced/Performance makes a big difference, especially with MySQL.
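I don't know what the "Normal" and "High" settings actually map to internally, but in SQLite terms a knob like this typically boils down to pragmas such as `cache_size`. A purely hypothetical sketch (the profile names echo the SBS setting, but the numbers are invented, not SBS's real values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical values: the actual numbers behind SBS's "Normal"/"High"
# settings are not documented in this thread.
MEMORY_PROFILES = {
    "Normal": {"cache_size": -2_000},    # negative = size in KiB (~2 MiB page cache)
    "High":   {"cache_size": -262_144},  # ~256 MiB page cache
}

def apply_profile(conn, name):
    for pragma, value in MEMORY_PROFILES[name].items():
        conn.execute(f"PRAGMA {pragma} = {value}")

apply_profile(conn, "High")
cache = conn.execute("PRAGMA cache_size").fetchone()[0]
```

A bigger page cache mainly helps when the working set (tables plus indexes touched during the scan) no longer fits, which would be consistent with the setting mattering more as the library grows.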

A quick analysis indicates that the difference could be:
- Linux vs Windows
- SQLite version (I think I used a newer version than JJZolx)
- MySQL version (I used the same as in 7.5, not sure what JJZolx uses)
- Tagging differences (my library is tagged differently than JJZolx's)
- FLAC size differences (my FLAC files are a bit smaller than JJZolx's)

I plan to run a few more tests with different library sizes to see if I get the same slowdown with a larger library that JJZolx saw in his tests. I'll get back with more information about this as soon as I've executed the tests.

Also, the artwork precaching phase does work in my Linux setup with MySQL in the above tests, but I didn't include it in these test results as I wanted the results to be comparable with those made by JJZolx. The time of the precaching phase seems to vary a lot between different test runs - I'm not sure exactly why, but it varies between 2 and 6 minutes in my tests with exactly the same configuration. The variation happens both with SQLite and MySQL, so it doesn't seem to be related to the database engine.

JJZolx
2011-08-08, 15:20
As you can see, my results are very different from the results JJZolx saw with 65 000 tracks. In all my tests SQLite is a lot faster than MySQL, which is the complete opposite of what JJZolx got earlier in his tests on Windows.

No, I did not test with a 65,000 track library. The library had 260,000 tracks. The results I posted above showed the elapsed time at different points in the scan.

I see results similar to yours when the library is small.

Philip Meyer
2011-08-08, 15:30
>A quick analysis indicates that the difference could be:
>- Linux vs Windows
>- SQLite version (I think I used a newer version than JJZolx)
>- MySQL version (I used the same as in 7.5, not sure what JJZolx uses)
>- Tagging differences (my library is tagged differently than JJZolx's)
>- FLAC size differences (my FLAC files are a bit smaller than JJZolx's)

For 7.5.x, I run my own instance of MySQL. I optimised the settings a bit, which seemed to make quite a difference compared to the built-in MySQL instance's settings. I think Jim has a similar setup.

How did you change the SQLite version? I thought this was built into SBS?

erland
2011-08-08, 15:52
>A quick analysis indicates that the difference could be:
>- Linux vs Windows
>- SQLite version (I think I used a newer version than JJZolx)
>- MySQL version (I used the same as in 7.5, not sure what JJZolx uses)
>- Tagging differences (my library is tagged differently than JJZolx's)
>- FLAC size differences (my FLAC files are a bit smaller than JJZolx's)

For 7.5.x, I run my own instance of MySQL. I optimised the settings a bit, which seemed to make quite a difference compared to the built-in MySQL instance's settings. I think Jim has a similar setup.

I used the "High" setting in "Database Memory Setting" parameter in SBS Settings/Advanced/Performance, but maybe you have done even more optimization ?

I used the MySQL version bundled with 7.5.6; I think it's 5.0.21 if I remember correctly.



How did you change the SQLite version? I thought this was built into SBS?

It is. Andy changed it in 7.6.1 r33004.

erland
2011-08-09, 22:00
No, I did not test with a 65,000 track library. The library had 260,000 tracks. The results I posted above showed the elapsed time at different points in the scan.

I see results similar to yours when the library is small.

Do you use any customized settings in your MySQL instance besides those in the "High" Database Memory Config ?
If so, could you share them ?

Which MySQL version are you using ?
I'm using MySQL 5.0.21.

I just tried with a 250,000 track library (with 7.6.1 r33004), and SQLite took about 4 hours for a full scan in my environment. MySQL is still in progress: about 6 hours after it started, it has still only reached 100,000 out of 250,000 tracks, so in my setup it's clearly slower than SQLite with a large library as well.

Maybe it's just Linux vs Windows but it feels a bit strange that we get such opposite results if it's just the operating system that differs.

Maybe it could be the CPU?
My virtual machine only has access to one of the CPU cores, and I'm guessing you have a multi-core CPU, which is probably an advantage for MySQL, both because it runs in a separate process and because it's probably more optimised for use with a multi-core CPU than SQLite.

If our MySQL configurations are the same, maybe I'll try to run the 250,000 track library directly on the laptop with a Core2Duo P8400 CPU to see if I see the same thing in a non-virtual environment with a dual-core CPU.

Besides Linux, I also have a Windows Vista installation on the laptop, so I could check how it looks in both Ubuntu and Vista on the same hardware. I'm not sure that's worth much, though, as Windows Vista is generally considered a lot slower than both XP and Windows 7, which people using Windows are probably more likely to be running.

erland
2011-08-10, 03:08
Maybe it could be the CPU?
My virtual machine only has access to one of the CPU cores, and I'm guessing you have a multi-core CPU, which is probably an advantage for MySQL, both because it runs in a separate process and because it's probably more optimised for use with a multi-core CPU than SQLite.

It's not the CPU, at least not with the "High" MySQL configuration provided with SBS. I just tried it on the Core2Duo laptop (still on Linux and 7.6.1 r33004 with MySQL 5.0.21) and I'm seeing the same thing as in the single-core virtual environment: MySQL is a lot slower than SQLite.

It's only using one of the cores, both with SQLite and MySQL, so the number of cores doesn't seem to have any effect, at least not as long as you don't use the system for other things while scanning.

Bronx
2014-12-09, 09:19
Hi all

Just for information:

I'm a user with a large database of 400k tracks. I was stuck on 7.5.6 due to the slow performance of SQLite. I just updated to 7.9 and I can say WOW! Thank you, thank you, thank you!!!

Scanning times (all with SSD, Intel Core Duo 2.5 GHz, 3 GB RAM):
7.5.6 delete and rebuild: about 12 hours / later versions with SQLite: 24+ hours (high memory setting)
7.5.6 new and changed tracks: about 1 hour / later versions with SQLite: 3 hours


7.9 delete and rebuild: about 3 1/2 hours (max. memory setting), a quarter of the 7.5.6/MySQL time
7.9 new and changed tracks: about 1/2 hour, half of the 7.5.6/MySQL time

The interface is snappy and the full-text search is incredible, returning results in under 1 second!

I'm very happy. Thank you Michael!

Bronx
2014-12-09, 09:25
...sorry, LMS 7.9 running under Windows 7 Home Premium.

Cheers Bronx