Trackstat: Unrated tracks appear as rated with 1 star



chaug
2014-10-05, 08:12
I just realized that I don't have any unrated tracks anymore, although there should be hundreds or even thousands of them. It looks like all previously unrated tracks are now rated with 1 star (on a 10-star scale, or 0.4 on the five-star scale). The problem with this is that I can no longer exclude low-rated tracks from playing, because many (if not most) of the low-rated tracks are not actually "bad".

I suspect that the issue might be related to me changing the rating scale from 5 star to 10 star a couple of weeks ago, but I'm not sure.

In any case, I'd like to tell TrackStat to change all 1-star ratings to unrated. Is that possible?

chaug
2014-10-05, 08:35
OK, I need to qualify this and perhaps change my question. I said that I have no more unrated songs because when I selected TrackStat -> Not rated -> Not rated songs in the web interface, I got an empty list (after almost a minute of waiting). It turns out that the list is not empty; rather, the connection to the server timed out or something similar. I noticed that there are unrated songs, since they show "Unrated" in their context menu. So I ran "TrackStat -> Not rated -> Not rated songs" again, this time on my SB Touch, which had the advantage that it actually told me it had lost the connection to the server and allowed me to reconnect. After several tries it did reconnect and showed me a list of unrated songs. However, that list is rather strange, since 90 percent of the first 200 entries are the same song. In addition, all of the songs have a play count of 2 or more, which doesn't make sense: I have had TrackStat autorating turned on ever since I started using it, so there should not be any unrated song with a play count greater than 0.

So, in other words, my TrackStat database seems to be a mess, and I wonder if there is any way to edit it manually, e.g. in an Excel file or similar, in order to fix at least the most obvious errors?

chaug
2014-10-05, 09:36
I have managed to open the latest TrackStat backup file in Excel (just drag it into Excel!). It takes a couple of minutes until Excel displays the 20 MB XML file as a table, but it does. And what I see is that although I have about 9000 songs in my library, the TrackStat database has more than 65000 entries. Many songs have multiple entries with exactly the same data (I saw one that is represented 250 times!). Judging from the file size of the nightly TrackStat backups, this mysterious growth happens incrementally every day: the file grows by between 50 and 300 KB per day, even on days when I was on vacation and not a single song was played, added or deleted.

So let me modify my question once again: if I ever manage to clean up that mess manually in Excel, will TrackStat be able to re-import the resulting XML file?

And, of course: how can the duplication of tracks in TrackStat be avoided in the future?
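For anyone who wants to check their own backup before opening it in Excel, a short script can count how many `<track>` elements share the same file URL. This is a minimal sketch, assuming the backup is well-formed XML in which each `<track>` has a `<url>` child; the root element name `TrackStat`, the `<url>` and `<rating>` children and the sample data are hypothetical, so check the element names in your own backup file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_track_entries(xml_text: str) -> Counter:
    """Count how many <track> elements share each <url> value."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    # iter() visits every <track> element at any depth below the root.
    for track in root.iter("track"):
        # "url" is an assumed child element name; check your own backup file.
        counts[track.findtext("url")] += 1
    return counts


SAMPLE = """<TrackStat>
  <track><url>file:///music/a.flac</url><rating>80</rating></track>
  <track><url>file:///music/a.flac</url><rating>80</rating></track>
  <track><url>file:///music/b.flac</url><rating>60</rating></track>
  <historyentry><url>file:///music/a.flac</url><played>1412496000</played></historyentry>
</TrackStat>"""

counts = count_track_entries(SAMPLE)
duplicates = {url: n for url, n in counts.items() if n > 1}
print(duplicates)  # {'file:///music/a.flac': 2}
```

For a real backup, replace `ET.fromstring(xml_text)` with `ET.parse(path).getroot()`; a 20 MB file fits comfortably in memory.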

erland
2014-10-05, 14:06
Do you know if you have MusicBrainz tags on your music files?

MusicBrainz changed its philosophy a couple of years ago, so the same song on multiple albums now gets the same identity, and this can cause duplicate entries in the TrackStat database tables. There is an "Enable musicbrainz tags" option in TrackStat that can be used to disable the MusicBrainz-related logic if this is what's causing the problem.

To get rid of the duplicates, I think it should be enough to use the "Remove all data" option in the "Backup/Restore/Clear" section of the TrackStat settings page and then restore the TrackStat backup. The restore process runs in the background and can take some time, but it should only import one of the duplicate entries. If you want to be sure you can get back to the current situation, it might be a good idea to shut down LMS and take a backup of the library.db and persist.db files in the LMS Cache directory before you clear and restore the TrackStat data.

chaug
2014-10-05, 16:55
Do you know if you have MusicBrainz tags on your music files?

MusicBrainz changed its philosophy a couple of years ago, so the same song on multiple albums now gets the same identity, and this can cause duplicate entries in the TrackStat database tables. There is an "Enable musicbrainz tags" option in TrackStat that can be used to disable the MusicBrainz-related logic if this is what's causing the problem.

I have MusicBrainz tags enabled and have gone to great pains to put MusicBrainz IDs on as many of my tracks as possible. (Unfortunately, I haven't figured out how to submit the ones that are not yet in the MB database, so I still have some without.) But most of the duplicates I mentioned above have an MB ID; that's how I identified them: I sorted the table by MB ID.



To get rid of the duplicates, I think it should be enough to use the "Remove all data" option in the "Backup/Restore/Clear" section of the TrackStat settings page and then restore the TrackStat backup. The restore process runs in the background and can take some time, but it should only import one of the duplicate entries. If you want to be sure you can get back to the current situation, it might be a good idea to shut down LMS and take a backup of the library.db and persist.db files in the LMS Cache directory before you clear and restore the TrackStat data.

So you are saying that I can go ahead and edit the XML file in Excel, save it and import it into TrackStat? Great! But since this will be a couple of hours of work (i.e. deleting about 56000 duplicates), I'd like to make sure first that TrackStat is not going to start producing duplicates again afterwards. It seems to me that it is currently doing so every night (or at least that it becomes visible every night in the TrackStat backups), since the size of these backups is growing steadily every day. Why should that behaviour stop once I import the cleaned backup? I sense I need to do something with the plugin too...

erland
2014-10-05, 22:09
So you are saying that I can go ahead and edit the XML file in Excel, save it and import it into TrackStat? Great! But since this will be a couple of hours of work (i.e. deleting about 56000 duplicates), I'd like to make sure first that TrackStat is not going to start producing duplicates again afterwards. It seems to me that it is currently doing so every night (or at least that it becomes visible every night in the TrackStat backups), since the size of these backups is growing steadily every day. Why should that behaviour stop once I import the cleaned backup? I sense I need to do something with the plugin too...

If you have MusicBrainz tags and are affected by the new MusicBrainz behaviour, you need to disable the "Enable musicbrainz tags" option on the TrackStat settings page.

After this, you should be able to:
1. Shut down LMS and take a backup of the persist.db and library.db files in the LMS Cache directory (just as an extra precaution in case something goes wrong)
2. Start LMS
3. Go to the "Backup/Restore/Clear" section of the TrackStat settings page and click "Remove all data"
4. Specify the full path to the backup file (the one with all the duplicates) in the "Backup file" field
5. Click "Restore from file"
6. Wait; this can take a few hours with a large library. When the restore operation has finished, you will get an entry in the LMS server.log saying so.
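Step 1 can be scripted. This is a minimal sketch, assuming you pass in the path of your LMS Cache directory (its location depends on the platform, so the paths in the usage line are placeholders) and that LMS has already been shut down, as the step says; the function name `backup_lms_db` and the folder naming scheme are illustrative choices, not part of LMS or TrackStat.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List


def backup_lms_db(cache_dir: str, dest_root: str) -> List[Path]:
    """Copy library.db and persist.db into a new timestamped folder under dest_root.

    Run this only while LMS is shut down, so the database files are not being written.
    """
    cache = Path(cache_dir)
    sources = [cache / "library.db", cache / "persist.db"]
    # Check both files first, so a missing file does not leave a half-made backup.
    missing = [str(p) for p in sources if not p.exists()]
    if missing:
        raise FileNotFoundError(f"not found: {', '.join(missing)}")
    dest = Path(dest_root) / datetime.now().strftime("lms-db-backup-%Y%m%d-%H%M%S")
    dest.mkdir(parents=True)
    copied = []
    for src in sources:
        # copy2 also preserves the modification time, which helps identify the backup later.
        copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

Usage (placeholder paths): `backup_lms_db("/path/to/LMS/Cache", "/path/to/backups")`. To go back, shut down LMS again and copy the two files from the backup folder into the Cache directory.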

The whole process is described on the TrackStat documentation page:
- Clearing data: http://wiki.slimdevices.com/index.php/TrackStat_plugin#Issues_with_duplicate_musicbrainz_tags
- Restoring backup: http://wiki.slimdevices.com/index.php/TrackStat_plugin#Restore_statistics_after_upgrade.2Freinstall

Finally, just to be clear: you don't need to remove any duplicates from the backup file; the restore process should make sure it only restores one of the duplicate entries.
It's important that you disable the "Enable musicbrainz tags" option, otherwise you are very likely to get new duplicates again every day.

chaug
2014-10-06, 14:25
Yes, I can follow those instructions, but I'm still not sure whether this gets to the source of the problem. The thing is that my duplicate MB tags are not due to the same song existing on different records. Maybe I have some of those, but the vast majority of what is blowing up my TrackStat database are duplicate records of the exact same file. The one with the most duplicates actually had 652 dupes, and there were many more with hundreds of duplicates.

I have discovered a convenient feature in Excel that allows you to remove duplicates with just a few clicks. I used it and it removed 45000 duplicates (I defined a duplicate as having exactly the same data in every column of the table). After that cleanup, I still have 20000 records left, while I have around 9000 tracks on my server. The largest number of duplicate MB tags for any one track is now 7. The reason there are still duplicate MB tags left is that those records differ in some other aspect, such as "date added", "last played", "playcount" or "rating", but not (with very few exceptions) in the URL (i.e. the file). I can see that by having Excel highlight all duplicate values within a specific column.

As far as I can see, this means two things. First, my (main) problem is not the one described at http://wiki.slimdevices.com/index.php/TrackStat_plugin#Issues_with_duplicate_musicbrainz_tags

And second, I will lose quite a lot of data if I let TrackStat do the cleanup by following the instructions at http://wiki.slimdevices.com/index.php/TrackStat_plugin#Restore_statistics_after_upgrade.2Freinstall because (I suppose) TrackStat cannot reconcile the different "playcounts", "last played dates" or "ratings" that its database contains for identical files (not just identical MB tags, which will be gone by then).

As regards the first point, I would like to understand what the root of the problem is. Unfortunately, I have not the slightest clue.

As regards the second point, I can at least make a suggestion for an alternative way of getting rid of the duplicates: I guess I need to find a way to merge (rather than delete) the remaining duplicates in my backup file, i.e. to tell Excel that it should keep the latest "played" date, the highest "rating", and add up all the playcounts, or something like that. Or can Trackstat do that?
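The merge rule suggested above (latest played date, highest rating, summed play counts) is easy to express in code once the `<track>` records have been extracted from the backup. A minimal sketch, assuming the relevant fields have already been parsed into the hypothetical `TrackRecord` type below (timestamps as Unix seconds, ratings on an assumed 0-100 scale, `None` for missing values); it works on plain data and does not touch TrackStat itself.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class TrackRecord:
    url: str
    play_count: int
    last_played: Optional[int]  # Unix timestamp; None if never played
    rating: Optional[int]       # assumed 0-100 scale; None if unrated


def _max_optional(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """max() that treats None as 'no value' instead of failing."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def merge_duplicates(records: Iterable[TrackRecord]) -> List[TrackRecord]:
    """Collapse records with the same URL: latest play, highest rating, summed play count."""
    merged: dict = {}
    for rec in records:
        cur = merged.get(rec.url)
        if cur is None:
            # Copy so the caller's objects are not modified.
            merged[rec.url] = TrackRecord(rec.url, rec.play_count, rec.last_played, rec.rating)
            continue
        cur.play_count += rec.play_count
        cur.last_played = _max_optional(cur.last_played, rec.last_played)
        cur.rating = _max_optional(cur.rating, rec.rating)
    return list(merged.values())


rows = [
    TrackRecord("file:///music/a.flac", 2, 1412400000, 40),
    TrackRecord("file:///music/a.flac", 3, 1412500000, None),
    TrackRecord("file:///music/b.flac", 1, None, None),
]
for r in merge_duplicates(rows):
    print(r)
```

Summing play counts assumes the duplicate rows recorded separate plays. If the duplicates are copies of the same row, summing over-counts, and taking the maximum play count would be the more conservative choice.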

P.S. One quick comment on "Issues with duplicate musicbrainz tags" (http://wiki.slimdevices.com/index.php/TrackStat_plugin#Issues_with_duplicate_musicbrainz_tags): why is this an issue anyway? As far as I understand, the MusicBrainz policy is that "duplicate" IDs will only be given to identical recordings. That makes sense, because it simply is a duplicate, and I thought the whole point of enabling MusicBrainz tags in TrackStat was that TrackStat would recognize these duplicates as such and treat the same song as the same, even when it exists in two files, e.g. one on the original album and another on some compilation, so that if I rate the track on the original album, the same rating is applied to the same song on the compilation. I'd consider this a feature, not a bug, as they say.

erland
2014-10-06, 21:29
As regards the first point, I would like to understand what the root of the problem is. Unfortunately, I have not the slightest clue.

It's probably easiest if you shut down LMS and send me a zip with the library.db and persist.db files from your setup; you will find them in the LMS cache directory. This way I can look at your database and see what's going on.

Alternatively, you could install the free "Database Query" plugin, run its "TrackStat inconsistency/problems" report and post the result. I'm just a bit afraid that it can cause more problems, because it can take a while to run and I know some users have managed to corrupt their whole LMS database by aborting it in the middle of the run. If you try it, please make sure you have shut down LMS and taken a copy of the library.db and persist.db files beforehand, so you can restore them if something goes wrong.



As regards the second point, I can at least make a suggestion for an alternative way of getting rid of the duplicates: I guess I need to find a way to merge (rather than delete) the remaining duplicates in my backup file, i.e. to tell Excel that it should keep the latest "played" date, the highest "rating", and add up all the playcounts, or something like that. Or can Trackstat do that?

The restore operation itself will never remove data.

Currently the restore operation always overwrites the information. I'll consider adding a feature in the future that can pick the entry with the latest played date and highest rating, but it will probably not be added in the next couple of weeks.

Also, please note that the TrackStat backup file contains two types of elements:
- <track> : represents the current play count, added time and rating
- <historyentry> : represents each previous time a track has changed rating or been played

The <track> element should occur once for each track.
The <historyentry> element should occur once for each time a track has been played. It's normal for a track to have multiple <historyentry> elements, but they should have different values in their <played> or <rating> sub-elements.

It's possible to disable the history logging that produces the <historyentry> elements if you like; this is done through the "History" option on the TrackStat settings page. Disabling it won't delete the existing history, but you can then clear all TrackStat data and restore the backup, and with history disabled the restore process should only restore the <track> elements and not the <historyentry> elements from the backup file.
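The two rules described above (one `<track>` per file, and no two `<historyentry>` elements for the same file with identical `<played>` and `<rating>` values) can be checked directly against a backup file. A minimal sketch, assuming each element carries a `<url>` child identifying the file; the `<url>` child, the root element name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_backup(xml_text):
    """Report URLs with more than one <track>, and <historyentry> elements repeated exactly."""
    root = ET.fromstring(xml_text)
    track_urls = Counter(t.findtext("url") for t in root.iter("track"))
    # A history entry is only suspicious if url, played and rating are all the same.
    history_keys = Counter(
        (h.findtext("url"), h.findtext("played"), h.findtext("rating"))
        for h in root.iter("historyentry")
    )
    multi_tracks = {u: n for u, n in track_urls.items() if n > 1}
    repeated_history = {k: n for k, n in history_keys.items() if n > 1}
    return multi_tracks, repeated_history


SAMPLE = """<TrackStat>
  <track><url>file:///music/a.flac</url><rating>80</rating></track>
  <historyentry><url>file:///music/a.flac</url><played>1412400000</played><rating>80</rating></historyentry>
  <historyentry><url>file:///music/a.flac</url><played>1412500000</played><rating>80</rating></historyentry>
  <historyentry><url>file:///music/b.flac</url><played>1412500000</played></historyentry>
  <historyentry><url>file:///music/b.flac</url><played>1412500000</played></historyentry>
</TrackStat>"""

multi, repeated = check_backup(SAMPLE)
print(multi)     # {}
print(repeated)  # {('file:///music/b.flac', '1412500000', None): 2}
```

Here the two history entries for `a.flac` are normal (different `<played>` values), while the pair for `b.flac` breaks the rule.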



P.S. One quick comment on "Issues with duplicate musicbrainz tags" (http://wiki.slimdevices.com/index.php/TrackStat_plugin#Issues_with_duplicate_musicbrainz_tags): why is this an issue anyway? As far as I understand, the MusicBrainz policy is that "duplicate" IDs will only be given to identical recordings. That makes sense, because it simply is a duplicate, and I thought the whole point of enabling MusicBrainz tags in TrackStat was that TrackStat would recognize these duplicates as such and treat the same song as the same, even when it exists in two files, e.g. one on the original album and another on some compilation, so that if I rate the track on the original album, the same rating is applied to the same song on the compilation. I'd consider this a feature, not a bug, as they say.

It's a feature on musicbrainz side and it's the right way for them to handle it.

The problem for TrackStat is that it assumed that a MusicBrainz ID would identify a track uniquely (which it did a couple of years ago). But TrackStat can't presume that MusicBrainz tags exist, since only some users have them, so the TrackStat database is based on the file path and only uses MusicBrainz tags if they exist. Because of this, TrackStat will have two entries if you have two tracks in your library that represent the same recording. Normally this wouldn't be a problem, but due to how the TrackStat refresh operation is implemented, tracks with the same MusicBrainz ID will be duplicated each time the refresh operation runs. It would be possible to fix this, but it would significantly increase the scanning time, so I've avoided it so far.

chaug
2014-10-07, 07:51
It's probably easiest if you shut down LMS and send me a zip with the library.db and persist.db files from your setup; you will find them in the LMS cache directory. This way I can look at your database and see what's going on.

I sent you the files via email.



The restore operation itself will never remove data.

Currently the restore operation always overwrites the information. I'll consider adding a feature in the future that can pick the entry with the latest played date and highest rating, but it will probably not be added in the next couple of weeks.

I don't understand how you can say that restore will never remove data and then say that it will "overwrite" information. Is overwriting not a way of removing?



Also, please note that the TrackStat backup file contains two types of elements:
- <track> : represents the current play count, added time and rating
- <historyentry> : represents each previous time a track has changed rating or been played

The <track> element should occur once for each track.
The <historyentry> element should occur once for each time a track has been played. It's normal for a track to have multiple <historyentry> elements, but they should have different values in their <played> or <rating> sub-elements.

OK, that explains - at least partly - why the backup file is growing continuously. However, I still don't understand why it should grow on days when clearly no song has been played or added (i.e. when nobody was home).

More importantly, this double structure of the backup tells me that Excel was not doing a perfect job when automatically transforming it into a table. At least, it wasn't clear to me that there are two types of records. So I went back and chose a different way of importing the xml file (where you get to map the fields yourself) and I mapped only the track entries into the table and ignored the history.

The result is not very different, however: there are 52000 records (instead of 9000), and when I remove duplicates it takes away 41000, leaving me with 11000. In other words, there are 41000 completely identical track records and another 2000 partly identical ones, where (presumably) the URL is the same but other fields differ.

Anyway, you will see this when you look at the files. I just wanted to update my analysis of the XML file in case anyone ever wants to do the same.
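The Excel procedure described in this thread (map only the `<track>` elements, remove rows that are identical in every column, then look for URLs that still occur more than once) can also be reproduced in a few lines of code. A minimal sketch, assuming each `<track>` has `<url>` and `<playCount>` children; those element names and the sample data are hypothetical, so check them against your own backup.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def track_records(xml_text):
    """Turn each <track> element into a hashable, order-independent record."""
    root = ET.fromstring(xml_text)
    records = []
    for track in root.iter("track"):
        # Sorted (tag, text) pairs, so field order does not matter; missing text becomes "".
        records.append(tuple(sorted((child.tag, child.text or "") for child in track)))
    return records


def remove_exact_duplicates(records):
    """Keep the first occurrence of every record whose fields are all identical."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def partial_duplicates(records):
    """Group remaining records by URL and return the URLs that still occur more than once."""
    by_url = defaultdict(list)
    for rec in records:
        by_url[dict(rec).get("url")].append(rec)
    return {url: recs for url, recs in by_url.items() if len(recs) > 1}


SAMPLE = """<TrackStat>
  <track><url>file:///music/a.flac</url><playCount>3</playCount></track>
  <track><url>file:///music/a.flac</url><playCount>3</playCount></track>
  <track><url>file:///music/a.flac</url><playCount>5</playCount></track>
  <track><url>file:///music/b.flac</url><playCount>1</playCount></track>
</TrackStat>"""

unique = remove_exact_duplicates(track_records(SAMPLE))
partial = partial_duplicates(unique)
print(len(unique), list(partial))  # 3 ['file:///music/a.flac']
```

The partly identical records left in `partial` are the ones where a manual decision (or a merge rule) is needed.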

erland
2014-10-07, 08:41
I don't understand how you can say that restore will never remove data and then say that it will "overwrite" information. Is overwriting not a way of removing?

Sorry for the confusion, just to clarify this:

Let's say you have 4 duplicate records for file1 in the TrackStat database.
And that you remove all duplicates for file1 in the backup, so you only have one entry for file1 in the backup file.

Now, if you restore the backup file, one of the 4 duplicate records in the database will be updated during the restore process, but there will still be 4 records in the database. The only way to remove information from the TrackStat database is to use the "Remove all data" or "Delete unused statistic" buttons in the "Backup/Restore/Clear" section of the TrackStat settings page. This behaviour is intentional, to minimize the risk that a user accidentally does something that deletes TrackStat data.
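A toy model may make the distinction between "updating" and "removing" concrete. It follows the description above: the restore updates one existing record per file, inserts a record only when none exists, and never deletes anything. This is not TrackStat's actual code; the row layout is hypothetical, and which of several duplicate backup entries ends up winning in the real plugin is not stated in this thread (in the model, the last one does).

```python
from typing import Dict, List

Row = Dict[str, object]


def restore(db_rows: List[Row], backup_entries: List[Row]) -> List[Row]:
    """Toy model: update the first row with a matching URL, insert if none; never delete."""
    for entry in backup_entries:
        for row in db_rows:
            if row["url"] == entry["url"]:
                row.update(entry)  # only the first match is touched
                break
        else:
            db_rows.append(dict(entry))  # no row for this URL yet
    return db_rows


# Case 1: four duplicate rows in the database, one cleaned backup entry.
db = [{"url": "file1", "rating": None} for _ in range(4)]
restore(db, [{"url": "file1", "rating": 80}])
print(len(db), [r["rating"] for r in db])  # 4 [80, None, None, None]

# Case 2: "Remove all data" first, then restore a backup that still has duplicates.
db2 = restore([], [{"url": "file1", "rating": 60}, {"url": "file1", "rating": 80}])
print(db2)  # [{'url': 'file1', 'rating': 80}]
```

Case 1 is why cleaning the backup alone does not help; case 2 is why clearing first and then restoring removes the duplicates.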



I just wanted to update my analysis of the XML file in case anyone ever wants to do the same.

Thanks, I'll get back to you after I've looked at your database files. I will need some time to prepare an environment where I can load them, but I'll try to find time to do it later this week or early next week.

chaug
2014-10-18, 14:26
I followed the instructions for restoring TrackStat statistics from the backup (http://forums.slimdevices.com/showthread.php?102245-Trackstat-Unrated-tracks-appear-as-rated-with-1-star&p=792434&viewfull=1#post792434) and, of course, the procedure itself went fine. It also led to a significantly reduced size of the nightly TrackStat backups (from 26 MB to 6 MB), and it looks like the backups are no longer growing more than you'd expect from the additional history records.

Regarding "data loss" discussed above: I did not fix this manually (i.e. merge those records so that the correct last played date, rating and playcount gets imported), but I understand that the information is still in the backup file so that I can do that whenever I feel that the stats are too messed up, right?

Anyway, the main purpose of this post is to say that I think I have found out when the missing ratings are produced (i.e. why I have many TrackStat records with playcount > 1 but without a rating, despite autorating being turned on): it seems to happen when I skip ahead to the next song. It does not happen every time, but maybe every other time. So it seems there is some bug in TrackStat that prevents it from doing its job when the song is not played to the end.

As I write this, it occurs to me that it might depend on how much of the song has been played. So I made a quick test: I skipped the currently playing song at about 70 percent and then checked its rating: it was unrated. I then skipped the next song when it was only about 20 percent in: it was rated correctly.

Here are my threshold settings in Trackstat:
Minimum played percent = 1%
Automatic rating increase percentage = 95%
Automatic rating decrease percentage = 50%

What seems to be relevant here is the 50 percent threshold which is obviously triggered when I skip ahead before half the track is played. This works fine, according to my quick test above. But if I skip ahead at a later point in the track, the rating should not be decreased (and neither should it be increased unless more than 95% have been played). It seems that the error of missing ratings occurs in that window where the previous rating should be neither decreased nor increased. Apparently, Trackstat is so busy complying with the "do not change previous rating" rule that it forgets the "rate unrated track" rule. Is that possible?

erland
2014-10-18, 23:45
Regarding "data loss" discussed above: I did not fix this manually (i.e. merge those records so that the correct last played date, rating and playcount gets imported), but I understand that the information is still in the backup file so that I can do that whenever I feel that the stats are too messed up, right?

They will of course be in the old backup.
If you make a new backup the duplicates will be gone in that file, so if you want to go back it's probably a good idea to keep the old backup file.
Of course, as you continue to play music in your library, the old backup file will become inaccurate, so it's probably not worth keeping it once you have verified that things look the way you want.

If your rating/playback statistics look reasonably correct now, I would suggest that you just make sure to disable the musicbrainz setting on the TrackStat settings page and enjoy the music instead of investigating the duplicate issue further. It feels like it's going to be a lot of work to fix it manually, and I doubt it's worth the trouble if things look reasonably correct now.



Anyway, the main purpose of this post is to say that I think I have found out when the missing ratings are produced (i.e. why I have many TrackStat records with playcount > 1 but without a rating, despite autorating being turned on): it seems to happen when I skip ahead to the next song. It does not happen every time, but maybe every other time. So it seems there is some bug in TrackStat that prevents it from doing its job when the song is not played to the end.

As I write this, it occurs to me that it might depend on how much of the song has been played. So I made a quick test: I skipped the currently playing song at about 70 percent and then checked its rating: it was unrated. I then skipped the next song when it was only about 20 percent in: it was rated correctly.

Here are my threshold settings in Trackstat:
Minimum played percent = 1%
Automatic rating increase percentage = 95%
Automatic rating decrease percentage = 50%

What seems to be relevant here is the 50 percent threshold which is obviously triggered when I skip ahead before half the track is played. This works fine, according to my quick test above. But if I skip ahead at a later point in the track, the rating should not be decreased (and neither should it be increased unless more than 95% have been played). It seems that the error of missing ratings occurs in that window where the previous rating should be neither decreased nor increased. Apparently, Trackstat is so busy complying with the "do not change previous rating" rule that it forgets the "rate unrated track" rule. Is that possible?

I can confirm that you have found the problem. The current behaviour in TrackStat is that if an unrated track is played longer than the lower limit but not long enough to increase the rating, it's left unchanged rather than set to the default rating value. I'll put it on the todo list to fix this, but I want to make really sure I don't break anything else, so it will require a bit of investigation and may therefore take some time.
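The confirmed behaviour can be summarized as a small decision function: below the minimum-played threshold nothing happens, below the decrease threshold the rating goes down, at or above the increase threshold it goes up, and in between an unrated track is left unrated. A minimal sketch using the thresholds from this thread; the default rating of 60, the step of 10 and the 0-100 scale are hypothetical placeholders, not TrackStat's real values, and the `fixed` flag only illustrates the change discussed above, not how it will actually be implemented.

```python
from typing import Optional


def auto_rate(rating: Optional[int], played_pct: float,
              min_pct: float = 1.0, decrease_pct: float = 50.0, increase_pct: float = 95.0,
              default_rating: int = 60, step: int = 10,
              fixed: bool = False) -> Optional[int]:
    """Return the new rating (0-100, None = unrated) after a track stops playing.

    default_rating and step are hypothetical values, not TrackStat's real ones.
    """
    if played_pct < min_pct:
        return rating  # too short to count as a play
    base = default_rating if rating is None else rating
    if played_pct < decrease_pct:
        return max(base - step, 0)
    if played_pct >= increase_pct:
        return min(base + step, 100)
    # Neutral window: decrease_pct <= played_pct < increase_pct.
    if rating is None and fixed:
        return default_rating  # proposed fix: give unrated tracks the default rating
    return rating  # reported behaviour: an unrated track stays unrated


print(auto_rate(None, 70))              # None (the reported bug)
print(auto_rate(None, 70, fixed=True))  # 60
print(auto_rate(None, 20))              # 50
print(auto_rate(80, 70))                # 80
```

The two unrated cases match the quick test above: skipped at 70% leaves the track unrated, skipped at 20% gives it a rating.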

chaug
2014-10-19, 04:12
I can confirm that you have found the problem. The current behaviour in TrackStat is that if an unrated track is played longer than the lower limit but not long enough to increase the rating, it's left unchanged rather than set to the default rating value. I'll put it on the todo list to fix this, but I want to make really sure I don't break anything else, so it will require a bit of investigation and may therefore take some time.

Okay, but at least we have that one figured out! I guess, in the meantime, I will change my rating settings to

Automatic rating increase percentage = 95%
Automatic rating decrease percentage = 95%

It should do as a workaround. :)
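Why the workaround helps can be seen by listing which playback positions fall into the "neutral" window, where an unrated track currently stays unrated. A minimal sketch that treats positions as whole percentages and assumes the decrease threshold is exclusive and the increase threshold inclusive (how TrackStat treats the exact boundary values is an assumption).

```python
def zone(played_pct: int, min_pct: int, decrease_pct: int, increase_pct: int) -> str:
    """Classify how far into a track playback stopped."""
    if played_pct < min_pct:
        return "ignored"
    if played_pct < decrease_pct:
        return "decrease"
    if played_pct >= increase_pct:
        return "increase"
    return "neutral"  # where unrated tracks are currently left unrated


def neutral_window(min_pct: int, decrease_pct: int, increase_pct: int) -> list:
    """Whole percentages 0-100 that fall into the neutral window."""
    return [p for p in range(101) if zone(p, min_pct, decrease_pct, increase_pct) == "neutral"]


before = neutral_window(1, 50, 95)
after = neutral_window(1, 95, 95)
print(before[0], before[-1], len(before))  # 50 94 45
print(after)                               # []
```

With both thresholds at 95%, the neutral window is empty, so every play from 1% up to 94% lowers the rating. The gap is closed, at the cost of down-rating tracks that are skipped late.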

Just for the record, for anyone reading this thread: two questions remain unanswered at this point:

1. Where did those duplicates come from that are not based on duplicate MB IDs? [Update: this is probably a non issue, since there don't seem to be any non-MBID-related duplicates]

2. Where do those ratings of never played tracks come from (especially those rated at 3 or 4 percent)?

If anyone is experiencing similar issues despite having MB IDs turned off in TrackStat, please post here.