Why does a playlist-only scan require a full database cleanup?



smr888
2008-01-21, 09:11
I do not understand why a playlist-only scan should require a cleanup of the database.

I have been asked not to discuss this in the bug tracker; apparently extremely long, potentially unnecessary scanning is not considered a bug. (Possibly it's a feature?)

My comments from bug 6308 are listed below:

*****

I have four playlists. The scan of those playlists finishes in 28 seconds.

I am then forced to sit through:

"Merge Various Artists" (1547 of them), takes 5:04

"Database Cleanup #1" (31782), takes 1:34

"Database Cleanup #2", might have finished in 2 minutes or so, hard to tell.

So basically 9 minutes to scan my four playlists, with a lot of scans I didn't ask for.
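
As a rough check of the arithmetic above, here is a minimal sketch in Python. The phase names and times are the ones listed in this post; the 2:00 figure for Cleanup #2 is only the "2 minutes or so" estimate, and the helper name `to_seconds` is made up for illustration.

```python
def to_seconds(clock: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Times as reported in the post; Cleanup #2 is an estimate.
phases = {
    "Playlist scan": "0:28",
    "Merge Various Artists": "5:04",
    "Database Cleanup #1": "1:34",
    "Database Cleanup #2": "2:00",
}

total = sum(to_seconds(t) for t in phases.values())
playlist_share = to_seconds(phases["Playlist scan"]) / total

print(f"total: {total // 60}:{total % 60:02d}")
print(f"playlist scan share of total: {playlist_share:.0%}")
```

With these figures the total comes to about 9 minutes, of which the playlist scan itself is roughly one twentieth.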

*****

At the risk of trying your patience, can you explain to me why
doing the playlist checks you mention should require [the above listed scans] . . .

From the outside, I would have thought that a user-initiated playlist-only scan would have meant something like this:

"I have created this playlist. Please add it to your list of playlists."


It apparently also seems to mean:

"Please make sure every song in the playlist is present in the database so you can play it."


which is fine, but I would not have thought it _also_ meant

"and while you're at it, conduct an eight minute cleanup of my database."


and I'm not completely sure why it should have to mean that. Especially when those other database cleanup scans take place AFTER the playlist scan (if the messages to the user are to be believed), meaning that the playlist scan doesn't even benefit from scanning a newly clean database, if that was supposed to be the point.

*****

Any help appreciated. I cannot see how these additional database scans and cleanups could under any condition be considered necessary. If you want a full database cleanup, then do a full scan! If you've added new music, do a full scan! (Or a "new music scan".)

But if I just want to create a playlist and add it to my list of playlists, that process should not require anything beyond checking to make sure that the playlist entries are in the database -- should it?

andyg
2008-01-21, 09:16
If a playlist contains items which are not in your database, the playlist scan adds them to the database. This is probably why a cleanup is required.

smr888
2008-01-21, 09:19
Thanks Andy. I will give that a try, and see if that is what is causing this behavior.

smr888
2008-01-21, 09:25
Nope, that is not it. I took an existing playlist, whose entries were already all in the database, chopped it down to four songs, saved it under a different name, and requested a playlist-only scan. The playlist portion, as usual, took about 30 seconds.

Now it is running "Merge Various Artists" -- 1:38

Now it is doing the database cleanups, which will take another five minutes or so.

Is this really necessary?

Siduhe
2008-01-21, 09:33
Andy, I have exactly the same experience as the OP. My system does a full clear and rescan at 4am and I have not added any music today. I use MusicIP, but no changes to the database/cache there either. I also get 2 database cleanups on a playlist only scan as follows (c/p):

Playlist Scan ( of ) Complete 00:00:03
Merge Various Artists (0 of 0) Complete 00:00:13
Database Cleanup #1 (0 of 0) Complete 00:00:20
Database Cleanup #2 (0 of ) Complete 00:00:17
Database Optimize (0 of ) Complete 00:00:14
SqueezeCenter has finished scanning your music collection.
Total Time: 00:01:07

It's not an issue for me because the scans are done pretty quickly (my playlists are short), but I don't think it can be as a result of new music needing to be added to the database.
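
For anyone who wants to compare their own numbers, a minimal sketch in Python that parses phase lines in the format pasted above and adds them up. The regular expression is an assumption based only on the five lines shown here, not on any documented SqueezeCenter log format, and `parse_phases` is a hypothetical helper.

```python
import re

LOG = """\
Playlist Scan ( of ) Complete 00:00:03
Merge Various Artists (0 of 0) Complete 00:00:13
Database Cleanup #1 (0 of 0) Complete 00:00:20
Database Cleanup #2 (0 of ) Complete 00:00:17
Database Optimize (0 of ) Complete 00:00:14
"""

# "<phase name> (<progress>) Complete HH:MM:SS"
PHASE_LINE = re.compile(
    r"^(?P<name>.+?) \(.*?\) Complete (?P<h>\d+):(?P<m>\d+):(?P<s>\d+)$"
)


def parse_phases(log: str) -> list[tuple[str, int]]:
    """Return (phase name, seconds) for every line that looks like a phase."""
    phases = []
    for line in log.splitlines():
        match = PHASE_LINE.match(line.strip())
        if match:
            seconds = (
                int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
            )
            phases.append((match["name"], seconds))
    return phases


phases = parse_phases(LOG)
total = sum(seconds for _, seconds in phases)
post_scan = total - phases[0][1]  # everything after the playlist scan itself

print(f"total: {total // 60}:{total % 60:02d}")
print(f"post-scan phases: {post_scan} of {total} seconds")
```

The five phases add up to 67 seconds, which matches the reported total of 00:01:07; all but 3 of those seconds are spent after the playlist scan itself.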

kdf
2008-01-21, 09:37
On 21-Jan-08, at 8:25 AM, smr888 wrote:
>
> Is this really necessary?

for the time being, yes it is.
-kdf

smr888
2008-01-21, 11:29
>>for the time being, yes it is.<<

This is the second unhelpful response you've given me, kdf. Would you please not respond to me if you don't have something useful to say? I'd appreciate it. Your responses make me want to stop reporting bugs, and if that is your intention, you are succeeding admirably.

kdf
2008-01-21, 11:41
>
>>>for the time being, yes it is.<<
>
> This is the second unhelpful response you've given me, kdf.

I'm sorry you feel that way. However, it was intended as a very simple
response to a simple question. The process is there for many reasons.
Every check through playlists involves gathering new data (even if the
data is already in the database), which then needs post-scan processing to
sort out where it fits into the database. In time, we may work out ways
to get rid of post scan, but for now it's necessary. It isn't always about
NEW data, but doing as good a job as possible to match the playlist
information with all that has been done to organise the metadata for the
tracks referenced.

In many aspects, it's similar to a "check new and changed" scan, but in
this case only the files listed in playlists and the playlists themselves,
instead of the whole library.

-kdf

smr888
2008-01-21, 11:46
Thank you. Obviously there is more to the process than is dreamt of in my philosophy.

kdf
2008-01-21, 11:57
>
> Thank you. Obviously there is more to the process than is dreamt of in
> my philosophy.

If it helps, I believe the idea is that it should make db access cleaner,
faster and more stable later on.

What I don't know offhand is what the specifics are of phase '1' and '2'.

-kdf

JJZolx
2008-01-21, 13:28
Those aren't additional scans. They're phases of the scanning process.

Are they necessary when no new tracks have been added to the database? Probably not all of them. That could be raised as a bug or enhancement request. Lots of room for optimization.

kdf
2008-01-21, 13:47
>
> Lots of room for optimization.
>
Optimisation is an ongoing process, and not applicable as a bug report
unless there is a specific benchmark to be achieved. A goal without a
specific, measurable result is of dubious value.

The optimise process is simply a call for mysql to "optimise" the tables.
Even with no 'new data', there is manipulation. I would also be curious
as to how much activity the db is seeing not from scans, but from general
use: ratings updates, play counts, third-party plugin activity etc. All of
these would have an effect on the db and may play into the time taken for
mysql to
"OPTIMIZE TABLE tracks;"

However, as a fun investigation, try making schema_optimize.sql an empty
file and see what happens. It should certainly save time, but the result
on the db would probably depend on the contents of the db. This sql file
not only optimises tables, but rebuilds the contributor_albums in response
to bug 4882.
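
A minimal sketch of that experiment in Python, assuming you have located schema_optimize.sql in your own install (its path is not given in this thread, and the helper names below are hypothetical). It keeps a backup copy so the original file can be put back afterwards:

```python
import shutil
from pathlib import Path


def disable_sql_file(sql_path) -> Path:
    """Back up the SQL file next to itself, then make the original empty."""
    sql_path = Path(sql_path)
    backup = sql_path.with_name(sql_path.name + ".bak")
    if not backup.exists():  # never overwrite an earlier backup
        shutil.copy2(sql_path, backup)
    sql_path.write_text("")
    return backup


def restore_sql_file(sql_path) -> None:
    """Copy the backup made by disable_sql_file() over the emptied file."""
    sql_path = Path(sql_path)
    backup = sql_path.with_name(sql_path.name + ".bak")
    shutil.copy2(backup, sql_path)


# Example (the path is a placeholder, substitute your own):
# disable_sql_file("/path/to/schema_optimize.sql")
# ... run a playlist-only scan and note the timings ...
# restore_sql_file("/path/to/schema_optimize.sql")
```

Because the file also rebuilds contributor_albums, restoring it and running a normal scan afterwards seems prudent.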

-kdf

smr888
2008-01-21, 13:51
I tried to raise this as a bug (that is, to discuss it within an already reported bug, 6308, that was similar to what I was commenting on) and was told to discuss enhancements and the implementation out here in the forums . . .

JJZolx
2008-01-21, 14:02
> Optimisation is an ongoing process, and not applicable as a bug report
> unless there is a specific benchmark to be achieved. A goal without a
> specific, measurable result is of dubious value.

Not really, but you're entitled to your opinion.

If the server is doing A and B and C and D, while B, C and D are unnecessary in some cases, and taking 80% of the time required, then requesting that those processes be eliminated would seem to be a logical request.

kdf
2008-01-21, 14:08
>

> If the server is doing A and B and C and D, while B, C and D are
> unnecessary in some cases, and taking 80% of the time required, then
> requesting that those processes be eliminated would seem to be a
> logical request.

Eliminating isn't 'optimising'. Removing an entire process is a specific
goal, "make it faster" isn't.

If you want it removed, you can try this yourself as I have described.
Feel free to post your results to the already mentioned bug report if you
like.

-kdf

JJZolx
2008-01-21, 14:14
> Eliminating isn't 'optimising'.

Tell that to the guy waiting 10 minutes to scan a playlist when it should take 30 seconds.

kdf
2008-01-21, 14:22
>
> kdf;261193 Wrote:
>> Eliminating isn't 'optimising'.
>
> Tell that to the guy waiting 10 minutes to scan a playlist when it
> should take 30 seconds.

I'm referring to your choice of word, Jim, not saying elimination doesn't
have beneficial effects. Simple elimination, however, isn't the case
here, as it is a needed process, and it's far easier to demand change than
to implement it (less accountability in the former). I think it would be
useful to know why the OP's case seems to take noticeably longer than
other posted results, especially given the rather small playlist content.

-kdf

smr888
2008-01-21, 14:50
>>it's far easier to demand change than to implement it (less accountability in the former).<<

I agree with that. But one should consider the user experience. If a scan is described as "playlist-only" when it is actually "NOT tracks, then playlists, then every other housekeeping task we usually do", it is not well described.

Consider further the user preparing for a Christmas party, who is simply trying to get a playlist into SlimServer, and finds himself waiting ten minutes every time he tweaks the playlist and wants to try it again, and you might begin to understand the frustration that this problem engenders.

>>I think it would be useful to know why the OP's case seems to take noticeably longer than other posted results, especially given the rather small playlist content.<<

That is a good question, and I don't know the answer. The other posted result, where the scan phases all took tens of seconds, was significantly quicker than mine. Siduhe didn't mention how many tracks he is scanning, though . . . I have something like 31,000, which I take to be a large number. My machine is decently fast but by no means a barnburner, and it runs Win XP, which I believe is probably slower than Unix/Linux for the purpose.

I will mention that upping the database cache size from 10,000 to 100,000 seems to have resulted in the time for the subsequent scans being cut to about four minutes from eight, which is nice . . . but it's still a long time to turn around adding a playlist to the system. In addition, one has to be a bytehead to figure out how to increase the cache size. The system ought to prompt the user about resizing its own cache, or do it dynamically, or SOMETHING that isn't "require the user to find the proper configuration file and edit it."

I still don't completely understand why any further scanning is necessary for a playlist. As a list of tracks, any metadata searches or whatever of the playlists (are there such?) should just look through to the database tracks. I don't know what else you were alluding to when you said:

"Every check through playlists involves gathering new data (even if the data is already in the database), which then needs post-scan processing to sort out where it fits into the database. . . . It isn't always about NEW data, but doing as good a job as possible to match the playlist information with all that has been done to organise the metadata for the tracks referenced."

What is this new data? If the tracks are already in the database, there isn't any new data . . . if they are not, then by all means add them . . .

It sounds like the working assumption at design time was that playlists might contain tracks not already in the database, so the scanner had better handle that on the fly by adding them. IF that's correct, I guess it's nice that it does that, but not so nice if it does it at the cost of all the users like myself, who put together playlists in text files based on what's already on the server, and just want to add them to the Slim database.

JJZolx
2008-01-21, 15:24
> I'm referring to your choice of word, Jim, not saying elimination doesn't have beneficial effects.

If eliminating unnecessary steps in a process is not a form of optimizing that process, then I don't know what is.

kdf
2008-01-21, 16:27
>
> kdf;261206 Wrote:
>> I'm referring to your choice of word, Jim, not saying elimination
>> doesn't have beneficial effects.
>
> If eliminating unnecessary steps in a process is not a form of
> optimizing that process, then I don't know what is.

Only if it does any good. Premature optimisation...etc etc. You can
always ask Dean for the full quote. I'm not going to continue debating
with an oversimplification. Please feel free to discuss real details at
any time.

Anyway, I'm probably not the one you really want to discuss this with.
I'm likely not going to be the one to spend time testing,
debugging or fixing anything in the area any time soon.

-kdf

gregklanderman
2008-01-21, 21:49
>>>>> smr <smr888> writes:
> Consider further the user preparing for a Christmas party, who is
> simply trying to get a playlist into SlimServer, and finds himself
> waiting ten minutes every time he tweaks the playlist and wants to try
> it again, and you might begin to understand the frustration that this
> problem engenders.

Using playlists is very clunky right now. Just the fact that you have
to go select "rescan playlists" as you make changes makes it a pain.
Look at the timestamps; if the file changed, re-read it. Of course
you cannot do that today if the playlist scan takes 2-10 minutes. But
if the server is smart about only running those extra steps when a
library change is detected that could affect the result, and doesn't
re-scan playlist files whose timestamp is unchanged, and only re-scans
lazily as you browse into a playlist, it should be completely
seamless.
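
The timestamp check could be sketched roughly like this in Python. This is only an illustration of the idea, not how the server's scanner works; `changed_playlists` and the `last_seen` dictionary are hypothetical names.

```python
import os


def changed_playlists(paths, last_seen):
    """Return the playlist files whose modification time differs from the
    one recorded in last_seen, and record the new times.

    last_seen maps path -> modification time in nanoseconds, as stored
    by a previous call (start with an empty dict).
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        if last_seen.get(path) != mtime:
            changed.append(path)
            last_seen[path] = mtime
    return changed


# Example (file names are placeholders):
# seen = {}
# changed_playlists(["party.m3u", "jazz.m3u"], seen)  # first call: both
# changed_playlists(["party.m3u", "jazz.m3u"], seen)  # nothing edited: []
```

Only the files returned would need to be re-read; anything with an unchanged timestamp could be skipped.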

greg