Is there anyone with a large music library, or with a library containing many different file formats, who could help me a bit?
It works with my small 3,500-track library, but I would like to try it with something larger.
I would like you to test the "Duplicate Detector" plugin, which is available in my testing repository:
http://erlandplugins.googlecode.com/svn/repository/trunk/testing.xml
You will need the latest Squeezebox Server 7.5 nightly or 7.6 nightly release, revision 31264 or later.
I would like you to install the plugin and:
1. Go to "Extras/Duplicate Detector" in the SBS web interface and start detection. You need to hit the "Refresh" link to see the current progress.
2. The default is to detect using the first 10000 audio bytes. If you don't get any incorrect duplicates reported, try decreasing this number in the Duplicate Detector settings page and see how low you can go.
3. Look in server.log and check whether you get any strange errors during the detection.
The plan for this is a lot more than detecting duplicates; this is just an initial experiment to make sure it's possible to uniquely identify music files by ignoring the tags and looking only at the audio data. The long-term intention is to use this to connect statistics and metadata to a track and still be able to handle the file being moved, renamed or re-tagged.
It won't detect duplicates that use different file formats. It only looks at the audio data, and since that differs between the FLAC and MP3 versions of the same track, it won't consider them duplicates.
It should consider two files duplicates if they have the same audio data but different tags.
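A minimal sketch of the idea described above, assuming the tag-free audio payload of each file has already been extracted (in the real plugin that is the scanner's job; here it is simply passed in as bytes): hash the first md5_size bytes of the audio and group files whose checksums match. Only md5_size mirrors the plugin's setting; the function names and sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def audio_checksum(audio: bytes, md5_size: int = 10000) -> str:
    """MD5 of the first md5_size bytes of an (assumed) tag-free audio payload."""
    return hashlib.md5(audio[:md5_size]).hexdigest()


def find_duplicates(tracks: dict[str, bytes], md5_size: int = 10000) -> list[list[str]]:
    """Group paths whose audio checksums match; only groups of two or more are duplicates."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, audio in tracks.items():
        groups[audio_checksum(audio, md5_size)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical payloads: tags are never part of the input, so re-tagging cannot change the hash.
tracks = {
    "a.flac": b"AUDIO-1" * 2000,
    "a_retagged.flac": b"AUDIO-1" * 2000,
    "b.flac": b"AUDIO-2" * 2000,
}
print(find_duplicates(tracks))  # [['a.flac', 'a_retagged.flac']]
```

In this toy data, calling find_duplicates(tracks, md5_size=6) lumps all three files together, because their first six bytes are identical. That is the kind of incorrect duplicate the testing steps ask you to look for when lowering the setting.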
Please report back how low or high you could set the setting without getting any incorrect duplicates reported.
Also report which operating system you have verified it on and what kind of music files you have (FLAC, MP3, ...).
The performance difference between 7.5 and 7.6 was very big in my library; 7.6 was many, many times faster.
-
2010-09-01, 23:17 #1
Need help to verify duplicate detection
Erland Isaksson (My homepage)
(Developer of many plugins/applets (both free and commercial).
If you like to encourage future presence on this forum and/or third party plugin/applet development, consider purchasing some plugins)
You may also want to try my Android apps Squeeze Display and RSS Photo Show
Interested in the future of music streaming ? ickStream - A world of music at your fingertips.
-
2010-09-03, 03:33 #2
I just tried the plugin with the default setting. I only get "Unable to calculate checksum" error messages. The web UI reports as many duplicates as there are files detected, and the .txt file for viewing the duplicate files is empty.
I guess there must be something wrong.
Version: 7.5.2 - r31223 @ Thu Aug 19 02:06:58 PDT 2010
Hostname: musik
Server IP Address: 192.168.1.101
Server HTTP Port Number: 9000
Operating system: Windows XP - EN - cp1252
Platform Architecture: 586
Perl Version: 5.10.0 - MSWin32-x86-multi-thread
MySQL Version: 5.0.22-community-nt-log
Total Players Recognized: 4
2 x SB3 (wired), Receiver (wired), Boom (wireless), Controller, iPeng on iPhone 4 & iPad, muso on remote computer running Win 7 64-bit | 7.7.3 on Win XP
-
2010-09-03, 07:43 #3
Senior Member
Join Date: Dec 2009
Location: Germany
Posts: 713
-
2010-09-03, 08:39 #4
For some reason the "Show duplicates" link downloads a file called duplicates.txt.rdp (Safari/OSX 10.6). Is there a reason you used a .bin file extension instead of just .txt?
I'm getting a lot of duplicates even with the default of 10000 bytes. I will look into that; it makes me worry I did something wrong in Audio::Scan.
-
2010-09-03, 09:06 #5
Yeah, it's LAME padding causing the problem, hmm...
-
2010-09-03, 09:43 #6
I didn't want it to open in the browser if the file got huge, so I wanted the content type to be "application/octet-stream". It works for me with Safari on OSX 10.6, but maybe that's because I'm running SBS on a separate Linux machine?
If you have any ideas why it appends .rdp, let me know.
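One common alternative to relying on "application/octet-stream" alone is the standard Content-Disposition: attachment header, which asks the browser to save the response rather than display it and suggests a file name; whether it avoids Safari's .rdp suffix is untested here. SBS itself is written in Perl, so the Python sketch below, using the standard library http.server module, only illustrates the headers. The helper, handler class and report contents are hypothetical.

```python
from http.server import BaseHTTPRequestHandler


def download_headers(filename: str, size: int) -> list[tuple[str, str]]:
    """Headers that ask the browser to save the report instead of displaying it."""
    return [
        # Plain text is still an accurate content type for a .txt report.
        ("Content-Type", "text/plain; charset=utf-8"),
        # 'attachment' requests a download; 'filename' suggests the saved name.
        ("Content-Disposition", f'attachment; filename="{filename}"'),
        ("Content-Length", str(size)),
    ]


class DuplicatesReportHandler(BaseHTTPRequestHandler):
    """Hypothetical handler serving a pre-built duplicates report."""

    report = b"/music/a.flac\n/music/a_retagged.flac\n"

    def do_GET(self) -> None:
        self.send_response(200)
        for name, value in download_headers("duplicates.txt", len(self.report)):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(self.report)
```

With this approach the content type can stay accurate while the disposition header alone decides that the file is downloaded, even when it is huge.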
Do you mean that LAME-encoded files will cause incorrect duplicates? Is this independent of the md5_size setting?
I've got reports from a user with a 100,000-track library, and so far he seems to get a different number of duplicates for each md5_size setting he tries. He has tested a couple of settings between 10,000 and 10,000,000, and all of them report different numbers of duplicates.
Is this an indication that MD5 might not be good enough for a large library?
-
2010-09-03, 09:54 #7
MD5 is fine; the problem is that the first 10000 bytes of these files are identical. I think the easiest way to deal with it is to take the bytes not from the very beginning of the file but from somewhere in the middle.
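A sketch of that suggestion, assuming the tag-free audio payload is available as bytes: hash md5_size bytes taken from the middle of the payload instead of from the start. The function names are hypothetical, and the sample data is a made-up stand-in for the shared padding discussed above: two different tracks with an identical lead-in.

```python
import hashlib


def checksum_from_start(audio: bytes, md5_size: int = 10000) -> str:
    """MD5 of the first md5_size bytes (sensitive to identical lead-ins)."""
    return hashlib.md5(audio[:md5_size]).hexdigest()


def checksum_from_middle(audio: bytes, md5_size: int = 10000) -> str:
    """MD5 of md5_size bytes centred on the middle of the payload.

    Falls back to hashing the whole payload when it is not longer than md5_size.
    """
    if len(audio) <= md5_size:
        return hashlib.md5(audio).hexdigest()
    start = (len(audio) - md5_size) // 2
    return hashlib.md5(audio[start:start + md5_size]).hexdigest()


# Hypothetical data: two different tracks sharing a 20000-byte identical lead-in.
lead_in = b"\x00" * 20000
track_a = lead_in + b"A" * 50000
track_b = lead_in + b"B" * 50000

print(checksum_from_start(track_a) == checksum_from_start(track_b))    # True (false positive)
print(checksum_from_middle(track_a) == checksum_from_middle(track_b))  # False
```

One trade-off to keep in mind: the middle window only lines up when both copies have the same payload length, so two copies of the same audio that differ in length (for example by extra bytes at one end) would land on different windows and not be reported as duplicates.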
-
2010-09-03, 10:02 #8
-
2010-09-03, 10:10 #9
On Sep 3, 2010, at 1:02 PM, erland wrote:
>
> andyg;574039 Wrote:
>> MD5 is fine, the problem is the first 10000 bytes of these files are
>> identical. I think the easiest way to deal with it is to not take the
>> bytes from the very beginning of the file but from somewhere in the
>> middle.
>>
> It just felt strange that using an md5_size of 100 000 reports a different
> number of duplicates than a setting of 500 000. Shouldn't the padding be
> irrelevant when using larger md5_size settings?
Yeah, any false-positive duplicates need to be investigated to find out why so many bytes are identical.
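One way to investigate a false positive, sketched under the same assumption that the tag-free payloads are available as bytes: measure how many leading bytes two flagged files share. A prefix-based checksum with any md5_size up to that length cannot tell them apart, which is one possible explanation for duplicate counts that change between 100 000 and 500 000. The helper names and sample data are hypothetical.

```python
from typing import Optional


def common_prefix_length(a: bytes, b: bytes) -> int:
    """Number of leading bytes two audio payloads have in common."""
    limit = min(len(a), len(b))
    chunk = 4096
    i = 0
    # Skip whole equal chunks first; slice comparison is much faster than a Python loop.
    while i + chunk <= limit and a[i:i + chunk] == b[i:i + chunk]:
        i += chunk
    # Then walk byte by byte to the first difference (or the end of the shorter payload).
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def min_separating_md5_size(a: bytes, b: bytes) -> Optional[int]:
    """Smallest md5_size for which the hashed prefixes differ; None if the payloads are identical."""
    if a == b:
        return None
    return common_prefix_length(a, b) + 1


# Hypothetical pair: different tracks sharing 150 000 identical leading bytes.
track_x = b"\x00" * 150_000 + b"X" * 1_000
track_y = b"\x00" * 150_000 + b"Y" * 1_000

print(common_prefix_length(track_x, track_y))     # 150000
print(min_separating_md5_size(track_x, track_y))  # 150001
```

For this pair, an md5_size of 100 000 would report the two files as duplicates while 500 000 would not.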
-
2010-09-04, 01:04 #10
Thanks, that did it.
I scanned my 115k+ track library and 9,548 duplicates were found with the default setting. When I clicked "Show duplicates", SBS stalled (music stopped playing, mysql at 50% CPU), and I left it that way overnight. There was no change the next morning, so I had to force SBS to quit. The same happened when I tried showing the duplicates again after restarting SBS.
When I checked the "Show duplicates" .txt file early in the scan, it worked, but it reported duplicates that were not actually duplicates (there was perhaps only half a page of data at that point).
Log file is attached.
