  1. #61
    Senior Member gharris999's Avatar
    Join Date
    Apr 2005
    Location
    Santa Fe, NM
    Posts
    3,299
    Quote Originally Posted by Philip Meyer View Post
    >Don't worry about the performance optimizations yet, there are many
    >ways to solve that, I'm sure we can get decent performance on this in
    >one way or another. At the moment it's more important that we can
    >guarantee the uniqueness and that it works with all file formats.


    There is never a way to guarantee uniqueness with MD5.
    The more audio content that is checked, the lower the chance of false duplicates. Even checking the whole audio content would not completely rule them out, and checking all the data would be really costly in performance.

    I was exploring the idea of calculating the checksum on a block of data, and then if that is a duplicate re-calculate the checksum by reading more data. Nice idea in concept, to perform the check on a small subset unless necessary, but in reality I can't see how it would work.
    But in this whole discussion, has an actual case of collision been verified yet? I.e. a case where different data produced identical MD5s? I know that it is possible, but I assume that it should be very, very rare. (Like "won't happen before the next ice-age" rare.) And I know you've found different, non-identical files producing identical MD5s, but it hasn't been clear to me from this discussion that the regions examined in the non-identical files were, in fact, non-identical.
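
    To put a rough number on "very, very rare": the sketch below applies the standard birthday-bound approximation, assuming the checksums behave like uniformly random 128-bit values for ordinary, non-adversarial data. The function name is made up for illustration. (MD5 is known to be breakable by deliberately crafted inputs, but that is a different situation from accidental collisions in a music library.)

```python
import math

def md5_collision_probability(n_items: int, bits: int = 128) -> float:
    """Birthday-bound estimate of at least one accidental collision.

    Approximation: p = 1 - exp(-n(n-1) / 2^(bits+1)),
    assuming hash outputs are uniformly random (an assumption).
    """
    exponent = -n_items * (n_items - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)

# Library size taken from one of the reports in this thread.
print(md5_collision_probability(120_736))
```

    For a library of that size the estimate is many orders of magnitude below anything observable, so matching checksums on different files more plausibly mean that the hashed regions really were identical.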
    Last edited by gharris999; 2010-09-13 at 15:13.

  2. #62
    Senior Member Philip Meyer's Avatar
    Join Date
    Apr 2005
    Location
    UK
    Posts
    5,568

    Need help to verify duplicate detection

    >But in this whole discussion, has an actual case of collision been
    >verified yet? I.e. a case where different data produced identical MD5s?
    >

    I believe I had some collisions. I scanned once with 10000 bytes, which found some false positive duplicates. I rescanned with 5000 bytes and got a higher count of false positives.

    Now, okay, that could be due to the padding issue with mp3 files, but there weren't just more hits when the byte count was reduced: different files were detected in the two scans. This suggests an element of randomness to the hits.

    I will re-run the test on 7.6, when I get the scanner to actually work!
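
    For what it's worth, the "checksum a small block, then read more only on a match" idea from earlier in the thread could be sketched roughly as below. This is only an illustration, not how the plugin works: it hashes raw file bytes from the start of the file rather than decoded audio data, and the function names and block sizes are made up.

```python
import hashlib
from collections import defaultdict

def checksum(path, num_bytes):
    """MD5 of the first num_bytes of the file (None means the whole file)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(num_bytes)).hexdigest()

def find_duplicates(paths, sizes=(10_000, 100_000, None)):
    """Return groups of paths whose checksums match at every size.

    Only files that still collide are re-read with the next, larger size,
    so most files are read once with the smallest block.
    """
    groups = [list(paths)]
    for size in sizes:
        next_groups = []
        for group in groups:
            buckets = defaultdict(list)
            for path in group:
                buckets[checksum(path, size)].append(path)
            # Keep only buckets that still contain more than one file.
            next_groups.extend(g for g in buckets.values() if len(g) > 1)
        groups = next_groups
    return groups
```

    The cost of the expensive full-file pass is then paid only by the candidates that survived the cheaper passes.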

  3. #63
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    10,315
    I'd like some help testing the Duplicate Detector plugin algorithm again.

    Could anyone who has a reasonably large library and is using SBS 7.5.4 or later try running it again? My 3400-track library is way too small to test it properly.

    Go to Plugins/Duplicate Detector/Settings and make sure "Number of bytes" is set to 10 000.

    Then go to "Extras/Duplicate Detector" and click the "Start detection" link to initiate it. You can hit the "Refresh" link to see the progress.

    After you've run it, I'm interested to hear:
    - Which SBS version you used
    - Number of "Checksum duplicates"
    - Number of "Duplicates"
    - What kind of files (FLAC, MP3, ...) you have in the library
    - What kind of files (FLAC, MP3, ...) were incorrectly detected as duplicates or checksum duplicates

    I'd expect that there are a few incorrect "checksum duplicates" but hopefully no incorrect "duplicates".

    I'm especially interested to know how it works with:
    - Non FLAC based cue sheets
    - FLAC based cue sheets
    - Non FLAC normal music files, for example MP3, AAC, WAV
    - Both 7.5.4 and 7.6

    Releases earlier than 7.5.4 aren't of interest, as I know it doesn't work reliably on them.
    Erland Isaksson (My homepage)
    (Developer of many plugins/applets (both free and commercial).
    If you like to encourage future presence on this forum and/or third party plugin/applet development, consider purchasing some plugins)
    You may also want to try my Android apps Squeeze Display and RSS Photo Show
    Interested in the future of music streaming ? ickStream - A world of music at your fingertips.

  4. #64
    Senior Member
    Join Date
    Mar 2008
    Location
    Paris
    Posts
    150
    Quote Originally Posted by erland View Post
    I'd like some help testing the Duplicate Detector plugin algorithm again.
    Hi Erland,
    I've tried Duplicate Detector V0.3 with SBS 7.6-r32517 on Linux ReadyNas Pro(x86).
    My library contains mostly FLAC, MP3 and AAC files (no cue sheets).
    Scanning is fast and seems efficient. Here is the plugin output:
    - Detecting using (number of bytes): 10000
    - Detected: 66598
    - Checksum duplicates: 434
    - Incorrect duplicates: 162
    - Duplicates: 272

    Regards.
    Volpone
    STREAM => SqueezeBoxServer 7.6 Beta / ReadyNas Pro (x86) | SB3 - Duet - Boom - Ipeng (Iphone) - Touch
    AUDIO => Rega DAC | NAD C162 pre-amp | NAD C272 power-amp | Triangle Celius 202 speakers

  5. #65
    Senior Member vagskal's Avatar
    Join Date
    Oct 2008
    Location
    Sweden
    Posts
    643
    Here are my statistics:

    Duplicate detector 0.3
    SBS 7.5.4 r32171 on Win XP
    Detecting using (number of bytes): 10000
    Detected: 120736
    Checksum duplicates: 1052
    Incorrect duplicates: 91
    Duplicates: 961

    A mix of mp3 and flac, mostly mp3, no cue sheets.
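
    As a quick sanity check on the numbers reported so far, the snippet below computes what share of the checksum matches ended up as "Incorrect duplicates". It assumes the three counts relate as incorrect + duplicates = checksum duplicates, which holds for both reports but is my reading of the plugin output, not something documented here.

```python
# Counts copied from the two reports in this thread (10000 bytes hashed).
reports = {
    "SBS 7.6-r32517, 66598 tracks": {"checksum": 434, "incorrect": 162, "duplicates": 272},
    "SBS 7.5.4 r32171, 120736 tracks": {"checksum": 1052, "incorrect": 91, "duplicates": 961},
}

def incorrect_share(report):
    """Fraction of checksum duplicates that were flagged as incorrect."""
    # Sanity check on the assumed relationship between the three counts.
    assert report["incorrect"] + report["duplicates"] == report["checksum"]
    return report["incorrect"] / report["checksum"]

for name, report in reports.items():
    print(f"{name}: {incorrect_share(report):.1%} of checksum matches were incorrect")
```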

    I will send you a PM with my incorrect duplicates.

    Thanks for trying out things also on larger libraries. Let me know if you need any other help testing.
    Last edited by vagskal; 2011-07-02 at 10:40.
    2 x SB3 (wired), Receiver (wired), Boom (wireless), Controller, iPeng on iPhone 4 & iPad, muso on remote computer running Win 7 64-bit | 7.7.3 on Win XP

  6. #66
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    10,315
    Quote Originally Posted by volpone View Post
    Hi Erland,
    I've tried Duplicate Detector V0.3 with SBS 7.6-r32517 on Linux ReadyNas Pro(x86).
    My library contains mostly FLAC, MP3 and AAC files (no cue sheets).
    Scanning is fast and seems efficient. Here is the plugin output:
    - Detecting using (number of bytes): 10000
    - Detected: 66598
    - Checksum duplicates: 434
    - Incorrect duplicates: 162
    - Duplicates: 272

    Regards.
    Volpone
    Do all the files listed when you hit the "Show duplicates" link (the 272 files) look like real duplicates, or do any of them seem to have been incorrectly classified as duplicates?

  7. #67
    Junior Member
    Join Date
    Dec 2010
    Posts
    20
    - SBS version: 7.7.1
    - mp3 only, no cue sheets
    - result:
    Code:
    Detecting using (number of bytes): 10000
    Detected: 9548
    Checksum duplicates: 6 
    Duplicates: 6
    No incorrect "duplicates", the detected ones are for real. Of course there are many more duplicates in my library, which weren't detected - e.g. all my songs with duplicate musicbrainz ids (album vs. compilation issue).

  8. #68
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    10,315
    Quote Originally Posted by haschmich View Post
    - SBS version: 7.7.1
    - mp3 only, no cue sheets
    - result:
    Code:
    Detecting using (number of bytes): 10000
    Detected: 9548
    Checksum duplicates: 6 
    Duplicates: 6
    No incorrect "duplicates", the detected ones are for real. Of course there are many more duplicates in my library, which weren't detected - e.g. all my songs with duplicate musicbrainz ids (album vs. compilation issue).
    I'm guessing the duplicates that exist in your library but weren't detected come from different CD masters, so the audio data is almost but not exactly the same?

    This plugin doesn't look at the tags at all; it only looks at the audio data in the files. So even if two tracks have the same musicbrainz identifier, they won't be considered duplicates unless the audio data is also exactly the same.
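
    To illustrate what "only the audio data" can mean in practice, here is a minimal sketch that skips a leading ID3v2 tag in an MP3 file before hashing a fixed number of bytes. It is not the plugin's actual code: the function names are made up, and it ignores trailing tags (ID3v1, APE) and the metadata blocks of other formats such as FLAC.

```python
import hashlib

def id3v2_size(header: bytes) -> int:
    """Total length of a leading ID3v2 tag in bytes, or 0 if there is none."""
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    # The tag size is a "synchsafe" integer: 7 significant bits per byte.
    size = 0
    for byte in header[6:10]:
        size = (size << 7) | (byte & 0x7F)
    # Flag bit 0x10 signals an extra 10-byte footer after the tag body.
    footer = 10 if header[5] & 0x10 else 0
    return 10 + size + footer

def audio_checksum(path, num_bytes=10_000):
    """MD5 of the first num_bytes that follow any leading ID3v2 tag."""
    with open(path, "rb") as f:
        offset = id3v2_size(f.read(10))
        f.seek(offset)
        return hashlib.md5(f.read(num_bytes)).hexdigest()
```

    With this approach two copies of the same rip give the same checksum even if one of them has been retagged, while two different masters of the same song do not.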

  9. #69
    Banned MrSinatra's Avatar
    Join Date
    Nov 2005
    Location
    Pa
    Posts
    3,696
    hey Erland,

    couldn't the detector do degrees? meaning, exact matches would be 100% certainty, but other files could be reported to be 98% likely to be dupes, that kind of thing?

    thats what i would need, something that gave me percentages, b/c i believe i'll have lots of things that to me, are dupes, but wouldn't pass the 100% exact match test.

  10. #70
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    10,315
    Quote Originally Posted by MrSinatra View Post
    hey Erland,

    couldn't the detector do degrees? meaning, exact matches would be 100% certainty, but other files could be reported to be 98% likely to be dupes, that kind of thing?

    thats what i would need, something that gave me percentages, b/c i believe i'll have lots of things that to me, are dupes, but wouldn't pass the 100% exact match test.
    Supporting that would require a completely different technology. Currently it computes a hash of parts of the audio data, and the hash can be completely different even if only a single bit in the data differs. For your usage, I think something based on acoustid.org or a similar fingerprinting technology that looks at the acoustics of the audio data would be preferable.
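
    The single-bit point is easy to check with a few lines of Python. The sketch below is just a demonstration using the standard hashlib module (the helper name is made up): it flips one bit in a 10000-byte buffer and counts how many of the 128 MD5 output bits change.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which the MD5 digests of a and b differ."""
    da = int.from_bytes(hashlib.md5(a).digest(), "big")
    db = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(da ^ db).count("1")

original = bytes(10_000)       # 10000 zero bytes standing in for audio data
flipped = bytearray(original)
flipped[0] ^= 0x01             # flip a single bit in the first byte

print(differing_bits(original, bytes(flipped)), "of 128 digest bits differ")
```

    Typically around half of the digest bits change, so the distance between two hashes says nothing about how similar the inputs were. That is why a "98% likely" score cannot be derived from these checksums.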
