But in this whole discussion, has an actual case of collision been verified yet? I.e. a case where different data produced identical MD5s? I know that it is possible, but I assume that it should be very, very rare. (Like "won't happen before the next ice-age" rare.) And I know you've found different, non-identical files producing identical MD5s, but it hasn't been clear to me from this discussion that the regions examined in the non-identical files were, in fact, non-identical.
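For a rough sense of how rare "very, very rare" is, here is a back-of-the-envelope sketch of the birthday bound for accidental collisions. It assumes MD5 behaves like a uniform random 128-bit function on ordinary (non-crafted) data, and the library size of 100,000 files is a made-up figure for illustration only. Deliberately constructed MD5 collisions are known to exist, but that is a different situation from two music files colliding by chance.

```python
from fractions import Fraction


def md5_collision_bound(n_files: int, bits: int = 128) -> Fraction:
    """Upper bound on the chance that any two of n_files distinct inputs
    end up with the same hash, assuming a uniform random `bits`-bit hash."""
    pairs = n_files * (n_files - 1) // 2  # number of distinct file pairs
    return Fraction(pairs, 2 ** bits)


# 100,000 files is an assumed library size, purely for illustration.
p = md5_collision_bound(100_000)
print(f"chance of any accidental collision: about {float(p):.1e}")
```

Under those assumptions the bound comes out on the order of 1e-29, so identical checksums on different files are far more likely to come from the hashed regions really being identical than from a true collision.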
-
2010-09-13, 15:09 #61
Last edited by gharris999; 2010-09-13 at 15:13.
-
2010-09-13, 15:32 #62
Need help to verify duplicate detection
>But in this whole discussion, has an actual case of collision been
>verified yet? I.e. a case where different data produced identical MD5s?
>
I believe I had some collisions. I scanned once with 10000 bytes, which found some false positive duplicates. I rescanned with 5000 bytes and got a higher count of false positives.
Now, okay, that could be due to the padding issue with mp3 files, but there weren't just more hits when the byte count was reduced - I had different files detected in the two scans. This suggests an element of randomness to the hits.
I will re-run the test on 7.6, when I get the scanner to actually work!
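One thing worth separating out: a match on a shorter prefix is not by itself an MD5 collision. Here is a minimal sketch, assuming the checksum is simply an MD5 over the first N bytes of the data (the plugin's real logic isn't shown in this thread, and `prefix_md5` is a hypothetical helper). Two different files that share their first 6000 bytes match at 5000 bytes but not at 10000.

```python
import hashlib


def prefix_md5(data: bytes, num_bytes: int) -> str:
    """MD5 of the first num_bytes of data (hypothetical stand-in for the
    'Number of bytes' setting; not the plugin's actual code)."""
    return hashlib.md5(data[:num_bytes]).hexdigest()


# Stand-in for identical padding/silence at the start of two different tracks.
shared_start = bytes(6000)
track_a = shared_start + b"A" * 6000
track_b = shared_start + b"B" * 6000

same_at_5000 = prefix_md5(track_a, 5000) == prefix_md5(track_b, 5000)
same_at_10000 = prefix_md5(track_a, 10000) == prefix_md5(track_b, 10000)
print(same_at_5000, same_at_10000)  # True False
```

So a higher false-positive count at 5000 bytes is what you would expect from identical leading data. It doesn't explain different files showing up in the two scans, though.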
-
2011-07-02, 04:14 #63
I'd like some help testing the Duplicate Detector plugin algorithm again.
Could anyone who has a reasonably large library try running it again, provided you are using SBS 7.5.4 or later? My 3400-track library is way too small to test it properly.
Go to Plugins/Duplicate Detector/Settings and make sure "Number of bytes" is set to 10 000.
Then go to "Extras/Duplicate Detector" and click the "Start detection" link to initiate it. You can hit the "Refresh" link to see the progress.
After you've run it, I'm interested to hear:
- Which SBS version you used
- Number of "Checksum duplicates"
- Number of "Duplicates"
- What kind of files (FLAC, MP3, ...) you have in the library
- What kind of files (FLAC, MP3, ...) were incorrectly detected as duplicates or checksum duplicates
I'd expect that there are a few incorrect "checksum duplicates" but hopefully no incorrect "duplicates".
I'm especially interested to know how it works with:
- Non FLAC based cue sheets
- FLAC based cue sheets
- Non FLAC normal music files, for example MP3, AAC, WAV
- Both 7.5.4 and 7.6
Releases earlier than 7.5.4 aren't of interest, as I know it doesn't work reliably on them.
Erland Isaksson (My homepage)
(Developer of many plugins/applets (both free and commercial).
If you like to encourage future presence on this forum and/or third party plugin/applet development, consider purchasing some plugins)
You may also want to try my Android apps Squeeze Display and RSS Photo Show
Interested in the future of music streaming ? ickStream - A world of music at your fingertips.
-
2011-07-02, 10:27 #64
Hi Erland,
I've tried Duplicate Detector V0.3 with SBS 7.6-r32517 on Linux ReadyNas Pro(x86).
My library contains mostly FLAC, MP3 and AAC files (no cue sheets).
Scanning is fast and seems efficient. Here is the plugin output:
- Detecting using (number of bytes): 10000
- Detected: 66598
- Checksum duplicates: 434
- Incorrect duplicates: 162
- Duplicates: 272
Regards.
Volpone
STREAM => SqueezeBoxServer 7.6 Beta / ReadyNas Pro (x86) | SB3 - Duet - Boom - Ipeng (Iphone) - Touch
AUDIO => Rega DAC | NAD C162 pre-amp | NAD C272 power-amp | Triangle Celius 202 speakers
-
2011-07-02, 10:36 #65
Here are my statistics:
Duplicate detector 0.3
SBS 7.5.4 r32171 on Win XP
Detecting using (number of bytes): 10000
Detected: 120736
Checksum duplicates: 1052
Incorrect duplicates: 91
Duplicates: 961
A mix of mp3 and flac, mostly mp3, no cue sheets.
I will send you a PM with my incorrect duplicates.
Thanks for trying out things also on larger libraries. Let me know if you need any other help testing.
Last edited by vagskal; 2011-07-02 at 10:40.
2 x SB3 (wired), Receiver (wired), Boom (wireless), Controller, iPeng on iPhone 4 & iPad, muso on remote computer running Win 7 64-bit | 7.7.3 on Win XP
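For comparing the two reports above, here is a small sketch that checks the counts add up (checksum duplicates = incorrect duplicates + duplicates, which holds for both sets of numbers) and works out what share of the checksum matches turned out to be incorrect. The dictionary layout and the `incorrect_share` helper are just for illustration.

```python
# Figures copied from the two reports above.
reports = {
    "SBS 7.6-r32517 (Linux)": {"checksum": 434, "incorrect": 162, "duplicates": 272},
    "SBS 7.5.4 r32171 (Win XP)": {"checksum": 1052, "incorrect": 91, "duplicates": 961},
}


def incorrect_share(report: dict) -> float:
    """Fraction of checksum duplicates that were reported as incorrect."""
    return report["incorrect"] / report["checksum"]


for name, report in reports.items():
    # Sanity check: the three counts should be consistent with each other.
    assert report["incorrect"] + report["duplicates"] == report["checksum"]
    print(f"{name}: {incorrect_share(report):.1%} of checksum duplicates were incorrect")
```

That works out to roughly 37% for the 7.6 run and roughly 9% for the 7.5.4 run.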
-
2011-07-02, 22:27 #66
Erland Isaksson (My homepage)
-
2012-04-29, 02:00 #67
- SBS version: 7.7.1
- mp3 only, no cue sheets
- result:
Code:
Detecting using (number of bytes): 10000
Detected: 9548
Checksum duplicates: 6
Duplicates: 6
No incorrect "duplicates", the detected ones are for real. Of course there are many more duplicates in my library, which weren't detected - e.g. all my songs with duplicate musicbrainz ids (album vs. compilation issue).
-
2012-04-29, 02:10 #68
I'm guessing the duplicates that do exist in your library but aren't indicated might be based on different master CDs, causing the audio data to be almost but not exactly the same?
This plugin doesn't look at the tags at all, it only looks at the audio data in the files, so even if two tracks have the same musicbrainz identifier they won't be considered duplicates unless the audio data is also exactly the same.
Erland Isaksson (My homepage)
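To illustrate the "audio data only, not tags" point, here is a minimal sketch for the MP3 case. It assumes the tag in question is a leading ID3v2 tag and that the checksum is an MD5 over the first 10000 bytes after it. How the plugin actually locates the audio data isn't described in this thread, so `id3v2_size`, `audio_checksum` and `fake_mp3` are hypothetical helpers, and the tag payloads below are filler rather than real ID3 frames.

```python
import hashlib


def id3v2_size(data: bytes) -> int:
    """Total length of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    flags = data[5]
    size = 0
    for byte in data[6:10]:  # four "synchsafe" bytes, 7 useful bits each
        size = (size << 7) | (byte & 0x7F)
    footer = 10 if flags & 0x10 else 0  # optional ID3v2.4 footer
    return 10 + size + footer


def audio_checksum(data: bytes, num_bytes: int = 10000) -> str:
    """MD5 over the first num_bytes after the tag (assumed behaviour)."""
    start = id3v2_size(data)
    return hashlib.md5(data[start:start + num_bytes]).hexdigest()


def fake_mp3(tag_payload: bytes, audio: bytes) -> bytes:
    """Build test data: 10-byte ID3v2.3 header + filler payload + audio bytes."""
    n = len(tag_payload)
    size = bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])
    return b"ID3" + b"\x03\x00" + b"\x00" + size + tag_payload + audio


audio = bytes(range(256)) * 50  # same "audio" in both files
album_copy = fake_mp3(b"album tags" * 3, audio)
compilation_copy = fake_mp3(b"compilation tags, longer" * 10, audio)

print(audio_checksum(album_copy) == audio_checksum(compilation_copy))  # True
```

The two files differ as whole files because of the tags, but the checksum over the audio region is the same. The reverse also follows: identical tags say nothing about whether the audio bytes match.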
-
2012-04-29, 23:14 #69
hey Erland,
couldn't the detector do degrees? meaning, exact matches would be 100% certainty, but other files could be reported to be 98% likely to be dupes, that kind of thing?
that's what i would need, something that gave me percentages, b/c i believe i'll have lots of things that, to me, are dupes, but wouldn't pass the 100% exact match test.
-
2012-04-29, 23:26 #70
To support that, it would have to use a completely different technology. Currently it does a hash of parts of the audio data, and the hash can be completely different even if only a single bit in the data differs. For your usage, I think something based on acoustid.org or similar fingerprint technologies that look at the acoustics of the audio data would be preferred.
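As a rough illustration of the single-bit point, here is a small sketch using plain MD5 over a 10000-byte region. The data is synthetic and `bit_difference` is a hypothetical helper; it only shows the general hash behaviour, not the plugin's code.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = bytes(range(256)) * 40  # synthetic stand-in for audio data
flipped = bytearray(original)
flipped[5000] ^= 0x01              # flip exactly one bit inside the hashed region
flipped = bytes(flipped)

digest_a = hashlib.md5(original[:10000]).digest()
digest_b = hashlib.md5(flipped[:10000]).digest()

print("input bits changed: ", bit_difference(original, flipped))
print("digest bits changed:", bit_difference(digest_a, digest_b))
```

One changed input bit typically changes around half of the 128 digest bits, so the distance between two checksums says nothing about how similar the files are. That is why a "98% likely" score can't be derived from the hashes.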
Erland Isaksson (My homepage)



