
Duplicate album titles from tags



jwgraves
2005-05-26, 05:30
I have 4000 songs by 120 artists on 500 albums, all MP3 files and all tags are in perfect shape.

Here's the problem. When 'browsing artists', a number of albums that are not by that artist appear in the list. If I click on the album title, I then see a few songs (not all of them) with the other artist's name displayed. I have finally seen the pattern here. The "duplicate" albums or songs are always titles that just happen to exist on an album by BOTH artists. For example: Footprints by Wayne Shorter and also by Miles Davis, on different albums. When viewing albums by Wayne Shorter I see all of his albums PLUS the one additional album by Miles Davis that also has a track called Footprints.

I've checked all the tags and am certain they are correct; other software I use reads them fine. I've also wiped the cache and rescanned the library, and I still have the problem.

Anyone else have this bug?

cdfreak
2005-05-26, 11:40
It's not a bug per se; it's more the way SlimServer handles albums. It already knows to treat "Greatest Hits" and "Best of" as common album names and split those albums out, but others (e.g., "Footprints" in your case) will show up in the same list when browse by album is chosen.

In SlimServer, under server settings, behavior, there is a section called "Common Album Titles". You'll see Greatest Hits, Best of, and Live already listed there. Add "Footprints" (and any other duplicated album names) to the list, click change, and then do a cache wipe and rescan. That should put them in separate folders. I'd add Best Of, The Best of, The Best Of as well...


hope it helps!

fuzzyT
2005-05-26, 11:53
jwgraves wrote:

> Here's the problem. When 'browsing artists' a number of albums that
> are not by that artist appear in the list.

It seems like the SQL that populates this display should be constraining
by both Album _and_ Artist.
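
For illustration, a minimal Python/sqlite3 sketch of the difference between constraining by album title alone and by album plus artist. The `tracks` table, its columns, and the sample rows are hypothetical; this is not SlimServer's actual schema or query.

```python
import sqlite3

# Hypothetical schema and data for illustration only; not SlimServer's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (title TEXT, artist TEXT, album TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [
        ("Song One", "Artist A", "Greatest Hits"),
        ("Song Two", "Artist A", "Greatest Hits"),
        ("Other Song", "Artist B", "Greatest Hits"),
    ],
)

# Constraining by album title alone pulls in Artist B's track as well.
by_title_only = conn.execute(
    "SELECT title FROM tracks WHERE album = ? ORDER BY title",
    ("Greatest Hits",),
).fetchall()

# Constraining by album and artist keeps the two albums apart.
by_title_and_artist = conn.execute(
    "SELECT title FROM tracks WHERE album = ? AND artist = ? ORDER BY title",
    ("Greatest Hits", "Artist A"),
).fetchall()

print(by_title_only)
print(by_title_and_artist)
```

The second query is the "Album and Artist" constraint; it would still split a various-artists compilation into one album per artist.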

--rt

cdfreak
2005-05-26, 13:02
But if you constrain by both album and artist, you'll have 10-15 of each compilation cd, and some soundtracks. In this case, you'd need to have a tag specified for "cdartist" (Various Artists) or to somehow note that it is a compilation cd or multiple artist album.

Or, you could rename the album to be more than just the common name, e.g. rename Footprints to "Artist - Footprints". But the display would look rather poor when it says "Miles Davis - Footprints by Miles Davis".

There's not a very easy solution from a database perspective without a field that is album unique. If it doesn't come from the id3 tag it needs to come from the file/folder location.

Going by *folder* might work: give each folder an ID# in the database and treat everything in that folder as one album. The scanner would need to make sure it was at the lowest level of folders, so that someone with a "Music/A/Aerosmith/Toys in the Attic" hierarchy is covered, as well as people who have just "Music/A/Aerosmith - Toys in the Attic" or (heaven forbid) "Music/Aerosmith - Toys in the Attic". No idea what the workload cost would be of one scanning method versus the other, though.
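
A rough Python sketch of the folder idea, assuming every album lives in its own lowest-level folder. The function name `assign_folder_album_ids` and the file names are made up for illustration; nothing here reflects how SlimServer scans.

```python
from pathlib import PurePosixPath


def assign_folder_album_ids(file_paths):
    """Map each file's immediate parent folder to a small integer album ID."""
    ids = {}
    for path in file_paths:
        # The parent of the file is always the lowest-level folder,
        # regardless of how deep the hierarchy above it is.
        folder = str(PurePosixPath(path).parent)
        if folder not in ids:
            ids[folder] = len(ids) + 1
    return ids


# The three layouts mentioned above, with hypothetical file names.
files = [
    "Music/A/Aerosmith/Toys in the Attic/01.mp3",
    "Music/A/Aerosmith/Toys in the Attic/02.mp3",
    "Music/A/Aerosmith - Toys in the Attic/01.mp3",
    "Music/Aerosmith - Toys in the Attic/01.mp3",
]
album_ids = assign_folder_album_ids(files)
for folder, album_id in album_ids.items():
    print(album_id, folder)
```

Using the file's immediate parent sidesteps the question of hierarchy depth, but it breaks as soon as two albums share a folder or one album is split across disc subfolders.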

fuzzyT
2005-05-26, 13:19
i wrote too quickly. to be more precise, i should have said:

"the SQL that populates this display should be constraining by an ID
unique to this album"

cdfreak wrote:
> But if you constrain by both album and artist, you'll have 10-15 of each
> compilation cd, and some soundtracks. In this case, you'd need to have a
> tag specified for "cdartist" (Various Artists) or to somehow note that
> it is a compilation cd or multiple artist album.

you're right. various artists albums are a special case. SS needs to
deal with this case generally. once that is done, then it should take
care of this case as well.

it seems like a workable approach to this problem should already exist.
do the standard tagging systems have a marker for compilations? how do
other library management and playback applications deal with this issue?

> Or, you could rename the album to be more than just the common name,
> like Footprints to be Artist - Footprints. But this would make displaying
> look rather poor when it says "Miles Davis - Footprints by Miles Davis".

could work, but seems a horrible hack.

> There's not a very easy solution from a database perspective without a
> field that is album unique. If it doesn't come from the id3 tag it
> needs to come from the file/folder location.

is there no unique album key in the SS DB schema? that's a real
weakness. i can see why it would be hard to nail down, as most audio
files are track-based and not album-based.

a key based on uniqueness of Artist-Album concatenation might be good
enough once the V/A problem is addressed.

> Going by *folder* giving each "folder" an ID# in the database, [...]

any solution that relies on identifying unique albums by path is going
to fail for users that don't follow the expected conventions. i suppose
it would be possible to allow user to enter custom masks to hint this
type of interpretation.

thoughts?

--rt

pfarrell
2005-05-26, 13:28
On Thu, 2005-05-26 at 16:19 -0400, ron thigpen wrote:

> "the SQL that populates this display should be constraining by an ID
> unique to this album"

> is there no unique album key in the SS DB schema? that's a real
> weakness. i can see why it would be hard to nail down, as most audio
> files are track-based and not album-based.

I haven't looked at the source recently, that is what the developers
list is about, but it sure had it when I last looked.

> a key based on uniqueness of Artist-Album concatenation might be good
> enough once the V/A problem is addressed.

This is really a hard problem.
The "standard" way most software on the 'net identifies a CD is
using the CDDB hash, which is not unique, neither is any rational
combination of artist/album names. While most people
don't have duplicate copies of CDs on their shelves (For some
reason, I have half a dozen where I forgot that I bought it already)
if you look at any large collection, you will see "identical"
CDs that are different. Due to lots of reasons, sometimes
they even have differing UPC codes, shorter or longer songs, etc.

Kinda like how we identify people by their "name" when
we know that "Pat Farrell" is hardly unique.

One of the beauties of moving SS 6.* to using SQL is that
you don't need a solution that is 100% applicable out of the box.
Special cases can be handled specially.


--
Pat
http://www.pfarrell.com/music/slimserver/slimsoftware.html

fuzzyT
2005-05-26, 13:47
Pat Farrell wrote:

> I haven't looked at the source recently, that is what the developers
> list is about, but it sure had it when I last looked.

if true, that's good news. any idea on how the value is set?

(happy to move the thread if this gets annoying...)

>>a key based on uniqueness of Artist-Album concatenation might be good
>>enough once the V/A problem is addressed.

> This is really a hard problem.

yeah, i understand just enough about this to see how it would be hard.

it helps that SS can probably just shoot for uniqueness _within the
music library_ and not in the world. also, i'm not sure uniqueness
would need to span refreshes of the database. for purposes of the SS
UI, it just needs to be a pointer to a group of track files.

the issue is how to define the boundaries of these file groups based
solely on information available in the tags (and perhaps pathname).

> The "standard" way most software on the 'net identifies a CD is
> using the CDDB hash, which is not unique, neither is any rational
> combination of artist/album names.

this is just a hash of the CD title track data isn't it? it might not
do the trick all by itself, but if it exists it would be a very strong
hint. hash/ID collisions should be reasonably rare. but not everyone's
tracks will have this tag.

is this working any differently for those using iTunes integration?

--rt

Dondi
2005-05-26, 14:03
This is a problem I ran into when sharing the music library on my network with my Windows Media Center computer and the MY MUSIC portion of its GUI/Player.

Windows MCE uses Windows Media Player on the back-end for its library (which acts somewhat differently than the plain ole Windows Media Player itself). The issue of multiple album listings of a single album occurred when the ALBUM ARTIST field was empty.... OR was improperly populated by the artist of that particular song instead of a generic value like 'COMPILATION' or 'VARIOUS ARTISTS'

Windows media center populates its album listing FIRST by the ALBUM ARTIST tag. Once this issue had wreaked havoc on my Media Center, the solution was as simple as going back through my compilations (and eventually the entire library) and making sure the ALBUM ARTIST field was populated correctly. Anything that was a soundtrack, I populated as 'SOUNDTRACK' in the ALBUM ARTIST field... same for TRIBUTE albums, COMPILATION, etc. This alleviated several issues with the way MCE listed ALBUMS: It solved the issue of having albums of the same title from a single artist mixed together with a single compilation album of the same title. It also alleviated the 'GREATEST HITS' issue as well as the issue of a 12-track compilation album being displayed as 12 separate albums of the same title.
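
A small Python sketch of grouping by ALBUM ARTIST first, falling back to the track artist when the field is empty. The tag names (`album_artist`, `artist`, `album`) and the sample tracks are assumptions for illustration, not the behavior of Windows Media Center or SlimServer.

```python
from collections import defaultdict


def album_group_key(tags):
    """Prefer the album artist; fall back to the track artist if it is missing."""
    owner = tags.get("album_artist") or tags.get("artist", "")
    return (owner.strip().lower(), tags.get("album", "").strip().lower())


def group_albums(tracks):
    """Group track titles under their (owner, album) key."""
    albums = defaultdict(list)
    for tags in tracks:
        albums[album_group_key(tags)].append(tags["title"])
    return dict(albums)


# Hypothetical tags: two same-titled albums and one two-artist soundtrack.
tracks = [
    {"title": "T1", "artist": "Artist A", "album": "Greatest Hits"},
    {"title": "T2", "artist": "Artist B", "album": "Greatest Hits"},
    {"title": "T3", "artist": "Artist C", "album": "Movie Songs",
     "album_artist": "SOUNDTRACK"},
    {"title": "T4", "artist": "Artist D", "album": "Movie Songs",
     "album_artist": "SOUNDTRACK"},
]
albums = group_albums(tracks)
for key, titles in albums.items():
    print(key, titles)
```

With these tags the two "Greatest Hits" albums stay separate while the soundtrack stays in one piece, which is the effect described above.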

This extra field has saved my library in Windows Media Center, and it seems that it may be of use with SlimServer. As an added precaution, in the case of a GREATEST HITS fiasco, I also gave every "common" album title an added identifier in the ALBUM field, e.g., GREATEST HITS: ARTIST.

My $.02
-- D

pfarrell
2005-05-26, 14:26
On Thu, 2005-05-26 at 16:47 -0400, ron thigpen wrote:
> Pat Farrell wrote:
> > The "standard" way most software on the 'net identifies a CD is
> > using the CDDB hash, which is not unique, neither is any rational
> > combination of artist/album names.
>
> this is just a hash of the CD title track data isn't it? it might not
> do the trick all by itself, but if it exists it would be a very strong
> hint. hash/ID collisions should be reasonably rare. but not everyone's
> tracks will have this tag.

It is not about the name or title. And it is not associated with the
specific track/song

It is a hash calculated from the number of tracks and the length of each
track. Since lots of pop/rock/country CDs have around 10 songs and the
songs are about 3 minutes long each, there are a lot of collisions
in those types of songs. In other areas, like Classical Symphonies,
there are usually only 6 movements per Album, and there is a fair amount
of variance, so you don't see that many conflicts.
I don't know what the actual statistics are, but when you use
any of the rippers that talk to CDDB (or freedb) to get track info,
you see it 'frequently' pop up a selection dialog box to let
the user resolve the collisions.
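
To make the collision point concrete, here is a Python sketch of the classic CDDB/freedb disc ID as it is commonly documented: a digit-sum checksum of the track start times, the total playing time in seconds, and the track count packed into 32 bits. The formula is written from memory of the published freedb description and the two sample track layouts are invented, so treat both as assumptions.

```python
def digit_sum(n):
    """Sum of the decimal digits of n."""
    return sum(int(d) for d in str(n))


def cddb_disc_id(track_starts_sec, leadout_sec):
    """Pack checksum, total length and track count into one 32-bit value."""
    checksum = sum(digit_sum(s) for s in track_starts_sec) % 255
    total_seconds = leadout_sec - track_starts_sec[0]
    return (checksum << 24) | (total_seconds << 8) | len(track_starts_sec)


# Two hypothetical discs: same length and track count, different track 2.
disc_a = cddb_disc_id([2, 182, 362], 542)
disc_b = cddb_disc_id([2, 191, 362], 542)
print("%08x %08x" % (disc_a, disc_b))
```

Only one byte of the ID depends on where the tracks actually start, and a digit sum is easy to repeat, so two discs with the same track count and running time have a fair chance of sharing an ID.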

Some metadata taggers store the value in the MP3/ogg/flac files,
others seem to not bother. It is really an "album" characteristic,
not a song/track characteristic.

An additional problem is that the cddb data (and the freedb data)
is not very clean. For many albums that I've prepared,
the dialog box from CDex will show three "names" that are clearly
the same album to a person, but that have differing genres, or
punctuation of the names (all the stuff that started this thread).

Still, it is a start, and with a little careful mangling, you
could probably automate 99% of the cases.


--
Pat
http://www.pfarrell.com/music/slimserver/slimsoftware.html

fuzzyT
2005-05-26, 14:51
Pat Farrell wrote:
> It is not about the name or title. And it is not associated with the
> specific track/song

all info concerning CDDB ID acknowledged. fit w/ what i know about it.

i'm not sure the CDDB ID is the best SS-AlbumID candidate, but i do
think the approach has merit.

problems with using CDDB directly include all of its warts: needs
external access, requires writing to a track's tag data, is only best
gathered at CD Rip time, may be missing in many files, and may not be in
a standard tag.

what i was thinking of would be more of an analogue of that approach,
implemented in SS. while scanning tags, SS would look at a minimally
sufficient amount of tag and pathname data to identify albums. some
rules might need to fire to use different seed data for certain variants
(ex: if composer not null use composer, if compilation ignore artist,
etc). some condensation or hash of this reviewed data could be stored in
the SS DB. it wouldn't make a good candidate Albums PK, but could prove
useful in doing 'SELECT DISTINCT's.
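
A Python sketch of what such a tag-derived fingerprint might look like, using the two example rules named above. The tag names, the `compilation` flag, and the choice of SHA-1 over a joined string are all assumptions for illustration; none of this is taken from the SS source.

```python
import hashlib


def album_fingerprint(tags, folder=""):
    """Hash a minimal set of tag and path data that should identify an album."""
    if tags.get("compilation"):
        owner = ""                      # rule: if compilation, ignore artist
    elif tags.get("composer"):
        owner = tags["composer"]        # rule: if composer not null, use composer
    else:
        owner = tags.get("artist", "")
    parts = (owner, tags.get("album", ""), folder)
    seed = "\x1f".join(p.strip().lower() for p in parts)
    return hashlib.sha1(seed.encode("utf-8")).hexdigest()


# Hypothetical tags for the two same-titled albums mentioned in this thread.
up_gbs = album_fingerprint({"artist": "Great Big Sea", "album": "Up"})
up_pg = album_fingerprint({"artist": "Peter Gabriel", "album": "Up"})

# Two tracks of one compilation with different track artists.
comp_1 = album_fingerprint({"artist": "X", "album": "Hits", "compilation": True})
comp_2 = album_fingerprint({"artist": "Y", "album": "Hits", "compilation": True})

print(up_gbs != up_pg, comp_1 == comp_2)
```

The value is only as good as the tags feeding it, and a composer-first rule would group tracks differently from an artist-first one, so the rule order is a design choice rather than a given.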

it would still depend on having pretty tight tags, but that's unavoidable.

just thinking out loud.

--rt

kolding
2005-05-26, 16:35
jwgraves wrote:

> Here's the problem. When 'browsing artists' a number of albums that are not by that artist appear in the list. If I click on the album title, I then see a few songs (not all of them) with the other artists name displayed. I have finally seen the pattern here. The "duplicate" albums or songs are always titles that just happen to exist on an album by BOTH artists. For example: Footprints by Wayne Shorter and also by Miles Davis, on different albums. When viewing albums by Wayne Shorter I see all of his albums PLUS the one additional album by Miles Davis that also has a track called Footprints.


On a similar note, I'd like to see a change happen where we could have some way to tag albums by different artists that have the same name as different. SlimServer really only works on the album name, so "Up" by Great Big Sea and "Up" by Peter Gabriel should get marked as separate albums, but if you Browse by Album, you go into "Up" and see all the songs. And worse, you see them in track order, so you see track 1 from one album, track 1 from the other, track 2 from the first, track 2 from the second, etc, etc, etc...

It would be nice to be able to add some sort of AlbumID tag to work around this.

Dondi
2005-05-26, 16:43
kolding wrote:

> On a similar note, I'd like to see a change happen where we could have some way to tag albums by different artists that have the same name as different. SlimServer really only works on the album name, so "Up" by Great Big Sea and "Up" by Peter Gabriel should get marked as separate albums, but if you Browse by Album, you go into "Up" and see all the songs. And worse, you see them in track order, so you see track 1 from one album, track 1 from the other, track 2 from the first, track 2 from the second, etc, etc, etc...
>
> It would be nice to be able to add some sort of AlbumID tag to work around this.

This is exactly what the ALBUM ARTIST field is for, and most tagging applications (Tag&Rename, for example) already support this field.

Here is a great thread discussing the value of the ALBUM ARTIST tag that may be insightful to some:

http://www.thegreenbutton.com/community/shwmessage.aspx?ForumID=42&MessageID=86549

-- D

Josh Coalson
2005-05-27, 16:11
--- Pat Farrell <pfarrell (AT) pfarrell (DOT) com> wrote:

> On Thu, 2005-05-26 at 16:47 -0400, ron thigpen wrote:
> > Pat Farrell wrote:
> > > The "standard" way most software on the 'net identifies a CD is
> > > using the CDDB hash, which is not unique, neither is any rational
> > > combination of artist/album names.
> >
> > this is just a hash of the CD title track data isn't it? it might not
> > do the trick all by itself, but if it exists it would be a very strong
> > hint. hash/ID collisions should be reasonably rare. but not everyone's
> > tracks will have this tag.
>
> It is not about the name or title. And it is not associated with the
> specific track/song
>
> It is a hash calculated from the number of tracks and the length of each
> track. Since lots of pop/rock/country CDs have around 10 songs and the
> songs are about 3 minutes long each, there are a lot of collisions
> in those types of songs. In other areas, like Classical Symphonies,
> there are usually only 6 movements per Album, and there is a fair amount
> of variance, so you don't see that many conflicts.
> I don't know what the actual statistics are, but when you use
> any of the rippers that talk to CDDB (or freedb) to get track info,
> you see it 'frequently' pop up a selection dialog box to let
> the user resolve the collisions.

yes. CDDB has many collisions because it's a very bad hash.
it doesn't use enough bits from the CD TOC and the hashing
formula is naive. but there are better ones, like CDindex,
which uses most of the TOC bits and SHA-1 for the hash. I
use a similar hash for my own collection (which uses even
more TOC bits and a slightly better message formulation).
practically speaking collisions with these hashes only occur
for CDs with identical TOCs.
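
For comparison, a Python sketch of a SHA-1 based TOC hash in the style of the MusicBrainz disc ID described at the link below. The message layout (first and last track numbers, lead-out offset, then 99 track offsets, all as uppercase hex) and the modified base64 alphabet are written from memory of that description, and the sample offsets are invented, so treat the details as assumptions.

```python
import base64
import hashlib


def toc_sha1_id(first_track, last_track, leadout_offset, track_offsets):
    """Hash the full TOC (frame offsets) with SHA-1 and encode it as text."""
    offsets = [0] * 100
    offsets[0] = leadout_offset                  # slot 0 holds the lead-out
    for number, offset in zip(range(first_track, last_track + 1), track_offsets):
        offsets[number] = offset                 # slots 1..99 hold track starts
    message = "%02X%02X" % (first_track, last_track)
    message += "".join("%08X" % o for o in offsets)
    digest = hashlib.sha1(message.encode("ascii")).digest()
    # URL-friendly base64 variant: '+' -> '.', '/' -> '_', '=' -> '-'
    return base64.b64encode(digest, altchars=b"._").decode("ascii").replace("=", "-")


# Two hypothetical three-track discs that differ only in where track 2 starts.
id_a = toc_sha1_id(1, 3, 40650, [150, 13650, 27150])
id_b = toc_sha1_id(1, 3, 40650, [150, 14325, 27150])
print(id_a)
print(id_b)
```

Because every offset goes into the hash at frame resolution, two discs with the same track count and running time but different track boundaries get different IDs, unlike the 32-bit CDDB value.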

http://wiki.musicbrainz.org/wiki.pl?DiscIDCalculation

Josh





jwgraves
2005-05-28, 04:37
cdfreak wrote:

> It's not a bug per se, more of the way Slimserver does the albums. It already knows to look for "Greatest Hits" and "Best of" as an album name and split the albums out...but others (ie, "Footprints" in your case) will show up in the same list when browse by album is chosen.
>
> In slimserver.. server settings...behavior there is a section called "Common Album Titles". You'll see Greatest Hits, Best of, and Live already listed there. Add "Footprints" (and any other duped album names) to the list, click change, and then do a cache wipe and re-scan. It should hopefully put them in separate folders now. I'd add Best Of, The Best of, The Best Of as well...
>
> hope it helps!


Thanks for the response, but I don't think that's what is happening here. The album title is not duplicated; the song title is. In my example, Footprints is on Miles Davis' album "Miles Smiles" and is also on Wayne Shorter's album "Adam's Apple", so I can't add an album called Footprints to the 'Common Album Titles' section.
Is SlimServer truly reading the MP3 tags, or does it create its music database from the directory structure?
It is definitely a bug (IMHO): when I view an artist, SlimServer shows an album that is not by that artist, just because that album contains a song title that the artist I'm viewing has also recorded. This is wrong and does not happen in any other music software I use (including portable music players).

kdf
2005-05-28, 07:41
Quoting jwgraves <jwgraves.1pqpgn (AT) no-mx (DOT) forums.slimdevices.com>:

> Is Slimserver truly reading the MP3 tags or does it create its music
> database from the directory structure?

yes, it definitely does use the tags. Only when no valid tags can be found
will it fall back to the directory structure, matching the
patterns set in server settings, formats, Guess Tags.

> It is definitely a bug (IMHO) because when I view an artist, Slimserver
> will show an album that is not by that artist, all because in that album
> is a song title that also happens to be by the original artist. This is
> wrong and does not happen on any other music software I use (including
> portable music players).

I'd suggest that you check your tags first. The server will look for other
contributor tags, such as composer, band, orig artist etc. Those are included
as 'artists' for the purposes of browse by artist and more. There is a server
setting for this under server settings, behavior (I think, might be another
tab). The setting is Composer in Artists. You may have to rescan (or even
wipe cache) after changing this setting to reindex all the artist info from
your songs.
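
A small Python sketch of how counting composers as artists could produce the symptom reported at the top of the thread. The function, the setting name, and the tag values (including Wayne Shorter as the composer tag on the Miles Davis track) are assumed for illustration; this is not the server's code.

```python
def albums_for_artist(tracks, name, composer_in_artists=True):
    """List albums on which `name` appears as a contributor."""
    albums = set()
    for tags in tracks:
        contributors = {tags["artist"]}
        if composer_in_artists and tags.get("composer"):
            contributors.add(tags["composer"])   # composer counted as an artist
        if name in contributors:
            albums.add(tags["album"])
    return sorted(albums)


# Assumed tag values for the two recordings of "Footprints".
tracks = [
    {"title": "Footprints", "artist": "Wayne Shorter",
     "composer": "Wayne Shorter", "album": "Adam's Apple"},
    {"title": "Footprints", "artist": "Miles Davis",
     "composer": "Wayne Shorter", "album": "Miles Smiles"},
]

with_composers = albums_for_artist(tracks, "Wayne Shorter")
without_composers = albums_for_artist(tracks, "Wayne Shorter",
                                      composer_in_artists=False)
print(with_composers)
print(without_composers)
```

With composers included, the Miles Davis album shows up under Wayne Shorter, and opening it would list only the tracks he contributed to, which matches the "a few songs (not all of them)" description.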

If you do eventually determine that this is not what is happening, please do file a bug
report and attach the two files in question that highlight this effect.

cheers,
kdf

-kdf

jwgraves
2005-05-30, 14:46
kdf,

Bingo! The files in question all had the "Composer" tag populated, and that was causing them to appear in both the artist's list of albums and the composer's list of albums. I tried changing the setting under "Behavior" to not include "composer, band and orchestra", but that had no effect (even after clearing the cache). The only way to fix the problem was to clear the "Composer" tag in all the files, clear the cache, and rescan the library.

Everything is working fine now....thanks for the help.