Finding duplicate tracks?



rkrug
2016-06-20, 04:59
Hi

My LMS runs on Linux; I use a Mac. How can I find duplicate tracks
(preferably with some fuzzy matching in tags)?

MusicID does it, but I haven't got around to installing it and running
the analysis yet.

Cheers,

Rainer
--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44

Fax (D): +49 - (0)3 21 21 25 22 44

email: Rainer (AT) krugs (DOT) de

Skype: RMkrug

PGP: 0x0F52F982

DJanGo
2016-06-20, 06:03
Hi,

First we have to talk about what a duplicate track is.

I have 12" singles/EPs and albums, and usually a track from a 12" is 100% identical to the one on the album.
Sometimes it isn't, even when the tags are the same (usually the 12" version is a few seconds longer).

But even when they are 100% identical, I won't trash either the 12" or the album version.
Yes, I could use playlists for this, but that's not to my taste.

So what's your decision?

JerryS
2016-06-20, 08:19
Hi,
> First we have to talk about what a duplicate track is.


Yes, I agree with that. In my fairly modest jazz collection I have 29 tracks of Body and Soul, 28 of All the Things You Are, 26 of 'Round Midnight, 26 of Caravan, 24 of Autumn Leaves, and so on, none of which are duplicates: they were recorded by different combinations of artists on different occasions, at different venues.

I can't see any easy way of picking out true duplicates programmatically. What I would do is use an SQL query on the LMS database to pick out duplicate track titles and then investigate further. Something like:

SELECT tracks.*
FROM tracks
INNER JOIN (SELECT titlesearch
            FROM tracks
            GROUP BY titlesearch
            HAVING COUNT(id) > 1) dup
        ON tracks.titlesearch = dup.titlesearch
ORDER BY tracks.titlesearch;

This works for me in MySQL, giving you all the information needed to investigate further, including file location and playtime. Sorry, I don't use SQLite, which is now the default database in LMS, but no doubt a bit of googling will find you a suitable query if the above doesn't work.
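For anyone on the default SQLite backend, the same kind of query can be run from Python's built-in sqlite3 module. This is a minimal sketch, assuming the LMS tracks table has url and secs columns alongside titlesearch, and that you point it at a copy of the LMS SQLite library database (assumed here to be library.db in the server's cache folder) so the running server isn't disturbed. The function name is hypothetical.

```python
import sqlite3
import sys

# Duplicate-title query; assumes LMS's tracks table has id, titlesearch,
# url (file location) and secs (playtime) columns.
DUPLICATE_TITLES_SQL = """
SELECT tracks.id, tracks.titlesearch, tracks.url, tracks.secs
FROM tracks
INNER JOIN (SELECT titlesearch
            FROM tracks
            GROUP BY titlesearch
            HAVING COUNT(id) > 1) dup
        ON tracks.titlesearch = dup.titlesearch
ORDER BY tracks.titlesearch, tracks.id
"""


def find_title_duplicates(conn):
    """Return (id, titlesearch, url, secs) rows for every track whose
    titlesearch value occurs more than once, grouped by title."""
    return conn.execute(DUPLICATE_TITLES_SQL).fetchall()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: argv[1] is the path to a copy of the LMS database.
    conn = sqlite3.connect(sys.argv[1])
    try:
        for row in find_title_duplicates(conn):
            print(*row, sep="\t")
    finally:
        conn.close()
```

Saved as, say, find_dupes.py, it would be run as: python find_dupes.py /path/to/copy-of-library.db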

There is an old military maxim that defences are best deployed at the perimeter. I take a lot of care when buying music to avoid compilations, or anything else without reasonable provenance, since these can slip duplicates into my collection.

Regards

JerryS

audiomuze
2016-06-20, 21:43
I know of two ways. One is DupeGuru Music Edition (https://www.hardcoded.net/dupeguru_me/), which is freeware. The second works for FLAC files only: it uses Python to build a SQLite table containing the metadata (excluding embedded artwork) of all tracks in a directory tree you point it at. For each track, the table includes the MD5 signature of the audio stream, which is automatically embedded in each FLAC file when it is created. Once the table is generated, you can run a query against it to list all duplicate MD5 entries, which you can then investigate further.
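The MD5 approach can be sketched with nothing but the Python standard library, because FLAC stores that signature at a fixed position: the last 16 bytes of the 34-byte STREAMINFO block, which must be the first metadata block after the fLaC marker. This is not the audiodb code itself, only an illustration under stated assumptions: files begin directly with fLaC (no prepended ID3 tag), only the signature and file location are stored rather than the full metadata, and the table and column names simply mirror the __md5sig query posted later in this thread. The function names are hypothetical.

```python
import os
import sqlite3


def flac_audio_md5(path):
    """Return the hex MD5 of the unencoded audio recorded in a FLAC file's
    STREAMINFO block, or None if it can't be read or was never computed."""
    with open(path, "rb") as f:
        # Assumption: the file starts with the 'fLaC' marker itself
        # (files with a prepended ID3v2 tag are skipped here).
        if f.read(4) != b"fLaC":
            return None
        header = f.read(4)
        if len(header) < 4:
            return None
        block_type = header[0] & 0x7F  # low 7 bits hold the block type
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length < 34:  # STREAMINFO is type 0, 34 bytes
            return None
        streaminfo = f.read(34)
    if len(streaminfo) < 34:
        return None
    md5 = streaminfo[18:34]  # the last 16 bytes of STREAMINFO
    if md5 == bytes(16):     # all zeros: the encoder didn't compute it
        return None
    return md5.hex()


def build_audio_table(root, conn):
    """Walk `root` and store (md5, directory, filename) for each FLAC file."""
    conn.execute("CREATE TABLE IF NOT EXISTS audio "
                 "(__md5sig TEXT, __dirpath TEXT, __filename TEXT)")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".flac"):
                sig = flac_audio_md5(os.path.join(dirpath, name))
                if sig is not None:
                    conn.execute("INSERT INTO audio VALUES (?, ?, ?)",
                                 (sig, dirpath, name))
    conn.commit()
```

Because the signature covers the decoded audio, re-tagging a file does not change it, which is what makes it useful for spotting identical rips filed under different tags.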

rkrug
2016-06-21, 01:16
Thanks, everybody, and you are all right.

The question is obviously what is a duplicate - that's why I put in the
"fuzzy matching of tags".

But I must say I like the SQL approach - I know sqlite a bit and will look
into this.
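One way to get the fuzzy matching of tags Rainer mentions is to normalise artist and title strings and compare them with difflib.SequenceMatcher from Python's standard library. This is a sketch rather than a recommendation: the normalisation rules and the 0.9 threshold are arbitrary assumptions, every pair is compared (fine for thousands of tracks, slow for very large libraries), and, as JerryS's jazz examples show, a high score only marks a candidate for manual review. The function names are hypothetical.

```python
import difflib
import re
from itertools import combinations


def normalise(text):
    """Lower-case, drop bracketed remarks such as '(Remastered)', replace
    punctuation with spaces and collapse runs of whitespace."""
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def fuzzy_duplicates(tracks, threshold=0.9):
    """Yield (track_a, track_b, ratio) for every pair of (artist, title)
    tuples whose normalised 'artist title' strings are at least
    `threshold` similar according to difflib."""
    keys = [normalise(f"{artist} {title}") for artist, title in tracks]
    for i, j in combinations(range(len(tracks)), 2):
        ratio = difflib.SequenceMatcher(None, keys[i], keys[j]).ratio()
        if ratio >= threshold:
            yield tracks[i], tracks[j], round(ratio, 3)
```

For example, ("Miles Davis", "'Round Midnight") and ("Miles Davis", "Round Midnight (Remastered)") normalise to the same string and are reported with a ratio of 1.0.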

By the way, how does the library view that hides the lossy version
when a lossless one exists work? Does it match all tags?

Thanks,

Rainer


mherger
2016-06-21, 01:47
> By the way - how does the library view which hides the lossy version
> when a lossless exist work? Match of all tags?

https://github.com/Logitech/slimserver/blob/public/7.9/Slim/Plugin/ExtendedBrowseModes/Libraries.pm#L30

As you can see from that code snippet, it's a rather simple check:
basically the same album, artist and title names, year and track number.
You could add duration, genres, etc.
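The kind of check Michael describes, plus the optional duration he suggests, can be sketched in Python as follows. This is an illustration, not the LMS Perl code: the Track fields and function name are hypothetical, tags are compared case-insensitively (an assumption), and max_secs_diff requires all copies in a group to lie within that many seconds of each other, so DJanGo's slightly longer 12" versions would not be flagged with a tight tolerance.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    path: str
    album: str
    artist: str
    title: str
    year: int
    tracknum: int
    secs: float


def tag_duplicates(tracks, max_secs_diff=None):
    """Group tracks sharing album, artist, title, year and track number.
    If max_secs_diff is given, a group only counts as duplicates when all
    of its durations lie within that many seconds of each other."""
    groups = defaultdict(list)
    for t in tracks:
        key = (t.album.casefold(), t.artist.casefold(), t.title.casefold(),
               t.year, t.tracknum)
        groups[key].append(t)

    result = []
    for members in groups.values():
        if len(members) < 2:
            continue
        if max_secs_diff is not None:
            secs = [m.secs for m in members]
            if max(secs) - min(secs) > max_secs_diff:
                continue
        result.append(members)
    return result
```

Leaving max_secs_diff as None reproduces the pure tag match; setting it to a few seconds adds the duration check.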

--

Michael

rkrug
2016-06-21, 01:57
Thanks.

Rainer

audiomuze
2016-06-21, 10:53
The code I was referring to is here: https://github.com/keith-g/audiodb

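Here's a query to find duplicate records based on the MD5 signature:

-- GROUP_CONCAT lists every copy; a bare __dirpath/__filename would show only one
SELECT
__md5sig, COUNT(*) AS copies,
GROUP_CONCAT(__dirpath || '/' || __filename, ' | ') AS files
FROM
audio
GROUP BY
__md5sig
HAVING
COUNT(*) > 1
ORDER BY
__md5sig;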
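The GROUP BY query produces one row per duplicated signature. To get every copy's full path as a separate entry, one option is a subquery on the same audio table, with the rows grouped in Python. This sketch assumes only the __md5sig, __dirpath and __filename columns used in the query above; the function name is hypothetical.

```python
import os
import sqlite3
from itertools import groupby

# Every row whose MD5 signature appears more than once, sorted so that
# copies of the same audio are adjacent.
DUPLICATE_FILES_SQL = """
SELECT __md5sig, __dirpath, __filename
FROM audio
WHERE __md5sig IN (SELECT __md5sig FROM audio
                   GROUP BY __md5sig HAVING COUNT(*) > 1)
ORDER BY __md5sig, __dirpath, __filename
"""


def duplicate_files(conn):
    """Map each duplicated audio MD5 to the sorted list of its file paths."""
    rows = conn.execute(DUPLICATE_FILES_SQL).fetchall()
    return {
        md5: [os.path.join(dirpath, name) for _, dirpath, name in group]
        for md5, group in groupby(rows, key=lambda row: row[0])
    }
```

Printing the returned dictionary gives one entry per duplicated recording, with every location it was found in.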

kelgbla
2018-10-16, 20:32
Update: MusicGuru is no longer available; it is now part of dupeGuru. A list of the most recommended duplicate file finder apps: https://www.tunesbro.com/best-duplicate-file-finder-mac.html

Blackfiction
2018-11-11, 09:19
Another program you could use is bliss. See blisshq.com. It will search for duplicates within the same album.

matka
2018-11-19, 11:16
I wonder whether Chromaprint fingerprinting of tracks could help you.

I've been looking into beets, the Python music library manager, lately. One of its plugins is for detecting duplicate tracks; I have not used it, so I can't comment on how well it works.
However, I have been using the chroma plugin to fingerprint tracks, and I'm quite impressed with its functionality.

Once you have fingerprinted all your tracks, it should be fairly straightforward to identify duplicates.
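As a rough illustration of why fingerprints make this straightforward: a raw Chromaprint fingerprint is a sequence of 32-bit integers (for example, as printed by the fpcalc tool with its -raw option), and two copies of the same recording produce mostly the same bits. The sketch below scores the fraction of matching bits, optionally trying small offsets. It is a simplified comparison, not AcoustID's matching algorithm or the beets chroma plugin's API; the function name is hypothetical, and any cut-off for calling two tracks duplicates (say, 0.9) is an assumption to tune against your own library.

```python
def fingerprint_similarity(fp_a, fp_b, max_offset=0):
    """Return the best fraction of matching bits between two raw Chromaprint
    fingerprints (lists of 32-bit ints), trying alignments of up to
    max_offset items in either direction."""
    best = 0.0
    for offset in range(-max_offset, max_offset + 1):
        # A positive offset skips items at the start of fp_a,
        # a negative one skips items at the start of fp_b.
        a = fp_a[offset:] if offset > 0 else fp_a
        b = fp_b[-offset:] if offset < 0 else fp_b
        n = min(len(a), len(b))
        if n == 0:
            continue
        # Count differing bits; masking keeps signed values to 32 bits.
        differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1")
                        for x, y in zip(a[:n], b[:n]))
        best = max(best, 1.0 - differing / (32 * n))
    return best
```

A small max_offset helps when one rip starts a fraction of a second earlier than the other; larger offsets make the comparison slower and more prone to chance matches.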

http://beets.io/