metadata musings

Mike Hartley
2005-04-06, 18:56
Sean,
PLEASE forgive my ignorance on this topic, but I'm interested because I am setting up a new dedicated server on a 3G wireless network for a new SB2 and am making format decisions. Specifically, this discussion of meta-data for the file has me interested. Is this basic track, album and artist information, or are you discussing additional information beyond this?

I'm trying to decide whether to keep the server as is with Win XP and copy my files over in their current state as WMA lossless or re-rip or convert everything to FLAC. I was also considering the radical step of using a Linux Deb distro for the server in conjunction with FLAC and bagging Win XP altogether.

However, this and some other research are giving me some second thoughts. From this and other discussions here, it seems that FLAC files can be difficult to create and manage. If the info you are talking about is the title/album/track/genre stuff, I find Win Media player retrieves it pretty reliably and ripping is super easy. Forgetting my inherent dislike of MS, if it's really that much more of a pain to manage ripping and tagging in FLAC, what's the advantage other than open format?

Help me out here! :-)

Mike

-----Original Message-----
From: Sean Goller [mailto:sean (AT) goller (DOT) net]
Sent: Wed 4/6/2005 9:23 PM
To: Slim Devices Discussion
Cc:
Subject: Re: [slim] metadata musings



michael wrote:
> Sean Goller <sean (AT) goller (DOT) net> writes:
> ...
>
>>>>Right now I'm just trying to see if someone's got an existing schema
>>>>that gets me 90% of the way there so I can get things up and running
>>>>reasonably quickly. If I can do that, then I can see if I can make
>>>>slimserver deal with the archive setup. That gets me a fairly robust
>>>>music archive/jukebox like thing.
>>>
>>>For what it's worth, slimserver will read a musicbrainz xml
>>>description of the flac file out of an app block. The thing that
>>>isn't available yet is a tool to insert the xml into the flac. But
>>>I'm sure it would be easy to make it read an external version of that
>>>xml as well. And the tools to pull the data from musicbrainz would
>>>lend themselves well to inclusion in an automated system like what you
>>>describe.
>
> ...
>
>>Err, could you expand on that a bit? Do you mean the xml description
>>of the cd represented by the flac file?
>
>
> yes.
>
>
>>I've been futzing with the
>>internal guts of libmusicbrainz all day, and I have code now that
>>generates the proper musicbrainz DiscID given a flac-generated (NOT
>>EAC) cuesheet text file. Now I'm working on turning that into code
>>that just extracts the cuesheet data from the image itself and
>>bypasses the whole textfile step.
>
>
> Would you be willing to share what you've put together? I've been
> working on the same thing, but so far my calculated DiscID doesn't
> match the one cdlookup returns.
>

Certainly. I managed to get a version working that operates directly on
the flac image. I'll warn you, it's frankensteined all to hell C code
and doesn't properly deal with byte order issues automatically. (the
code I'm putting up will work properly out of the box on x86) but it
gets the job done. It's at http://www.goller.net/imageid/imageid.tar.gz
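For anyone following along, here is a minimal Python sketch (not Sean's imageid code) of pulling the track positions straight out of a FLAC image's CUESHEET metadata block. The byte layout follows the FLAC format documentation as I understand it, and the function names are made up for illustration. Reading every field as explicitly big-endian sidesteps the x86-only byte order problem mentioned above. It takes the file contents as bytes for simplicity.

```python
import struct

CUESHEET_BLOCK = 5    # FLAC metadata block type number for CUESHEET


def read_cuesheet(data: bytes):
    """Return [(track_number, start_sample), ...] from a FLAC CUESHEET block.

    start_sample is the absolute position of the track's INDEX 01 (the
    track offset when there is no INDEX 01, as for the lead-out, which
    is track 170 on a CD). Returns None if there is no CUESHEET block.
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while pos + 4 <= len(data):
        header = data[pos]
        block_type = header & 0x7F
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        body = data[pos + 4:pos + 4 + length]
        pos += 4 + length
        if block_type == CUESHEET_BLOCK:
            return _parse_cuesheet(body)
        if header & 0x80:              # "last metadata block" flag
            break
    return None


def _parse_cuesheet(body: bytes):
    # Header: 128-byte catalog number, 8-byte lead-in sample count,
    # 259 bytes of is-CD flag plus reserved bits, then a 1-byte track count.
    pos = 128 + 8 + 259
    num_tracks = body[pos]
    pos += 1
    tracks = []
    for _ in range(num_tracks):
        # ">" = big-endian, FLAC's byte order regardless of the host CPU.
        offset, number = struct.unpack_from(">QB", body, pos)
        pos += 8 + 1 + 12 + 14         # offset, number, ISRC, flags + reserved
        num_indices = body[pos]
        pos += 1
        start = offset
        for _ in range(num_indices):
            index_offset, index_number = struct.unpack_from(">QB", body, pos)
            pos += 12                  # 8-byte offset, 1-byte number, 3 reserved
            if index_number == 1:
                start = offset + index_offset   # index offsets are track-relative
        tracks.append((number, start))
    return tracks
```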

I think the next step is to use the musicbrainz DiscID object to do the
generation instead, because I missed that on my first pass through the
library, and it's easier if musicbrainz maintains that particular part
of the code instead of me, or whomever. :)

>
>>So with respect to fetching things from musicbrainz, I'm at
>>http://musicbrainz.org/cdindex/69WPuFcKSoXA4Trt1kY4tGSg6Vo- here with
>>my test disc/image. How does that relate to what you're talking about?
>>Are you talking about the information at:
>>http://mm.musicbrainz.org/mm-2.1/album/94348021-7c03-4a58-bfcb-ee865f449200
>>?
>
>
> Essentially, but it needs to go one level of detail deeper than the
> web page does. (when you're querying their server directly, you can
> specify how deep to go.) So the result you're looking for in this case
> would look like the example I've attached.
>

Could you give me the actual url used to fetch that example? I'm being
dim I know, but it would help me out. :)


>
>>If so that means all that's left is some way of transforming a DiscID
>>into a GUID. Scanning musicbrainz' wiki, it looks like
>>MBQ_GetCDInfoFromCDIndexId will get me a list of album GUIDs that are
>>linked to that DiscId, which gets me the RDF url above.
>
>
> Sounds like you're on the right track. The only tricky bit is that to
> get both the album and the song detail, you have to query the server
> directly. (or at least I haven't found an easy way to do it through
> the web interface.)
>
> Now all you need to do is either stuff that xml into an application
> block in your flac, or modify slim server to recognise it external to
> your flac file. :)
>
> I hope that helps.
>
> -michael


That works. It seems to me that the simplest thing to do for a "new"
flac image would be to calculate the CDIndexId then fetch the metadata
from musicbrainz, caching it all in an external source (filesystem or
db) for future reference and keep the file itself pristine. This is one
way of automatically fixing the "outdated tags" problem that users of
metadata storage sites like musicbrainz have, with no ambiguity. If
SlimServer refetches the metadata once every N image references (20?
100?) then you're guaranteed to have about as up to date information as
you're going to get. (without pissing off musicbrainz for eating all
their bandwidth, that is)
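A rough sketch of the "cache externally, refetch every N references" idea, assuming some fetch function (hypothetical here) that turns a disc ID into a metadata dict. It keeps the cache in memory only; persisting it to the filesystem or a database, as suggested above, is left out.

```python
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Cache album metadata by disc ID; refetch after every `refresh_every` lookups."""

    def __init__(self, fetch: Callable[[str], dict], refresh_every: int = 100):
        self._fetch = fetch                    # hypothetical: disc ID -> metadata
        self._refresh_every = refresh_every
        # disc ID -> (metadata, number of lookups served since the last fetch)
        self._entries: Dict[str, Tuple[dict, int]] = {}

    def lookup(self, disc_id: str) -> dict:
        entry = self._entries.get(disc_id)
        if entry is None or entry[1] >= self._refresh_every:
            metadata, uses = self._fetch(disc_id), 0   # first use, or entry is stale
        else:
            metadata, uses = entry
        self._entries[disc_id] = (metadata, uses + 1)
        return metadata
```

With `refresh_every=100`, a disc is fetched on its first reference and then roughly once per hundred plays, which keeps the load on musicbrainz proportional to listening rather than to collection size.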

So since all this discussion is in the context of SlimServer, I guess
the way forward (at least from what I'm interested in) is to use the
perl module for accessing FLAC internal cuesheets, tie that to either a
perl binding to the musicbrainz DiscId object (best choice) or a
separate implementation of CDIndexId generation (reasonable) and then
use that as a basis for obtaining metadata within the server.
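As a sketch of the "separate implementation of CDIndexId generation" option (in Python rather than Perl), assuming the MusicBrainz disc ID scheme as I understand it: SHA-1 over the first and last track numbers and 100 sector offsets (lead-out first, unused slots zero) written as uppercase hex, then base64 with '+', '/' and '=' replaced by '.', '_' and '-'. The function names are hypothetical. Check the output against real disc IDs before trusting it; using the musicbrainz library's own DiscID code avoids maintaining this yourself.

```python
import base64
import hashlib

SAMPLES_PER_SECTOR = 588   # 44100 samples/s divided by 75 CD sectors/s
LEAD_IN_SECTORS = 150      # 2-second offset included in CD TOC addresses


def samples_to_sectors(sample_offset: int) -> int:
    """Convert a FLAC cuesheet sample offset to a TOC sector offset."""
    return sample_offset // SAMPLES_PER_SECTOR + LEAD_IN_SECTORS


def musicbrainz_disc_id(first: int, last: int, track_offsets, lead_out: int) -> str:
    """Disc ID from sector offsets that already include the 150-sector lead-in."""
    if len(track_offsets) != last - first + 1:
        raise ValueError("need exactly one offset per track from first to last")
    offsets = [0] * 100                  # slot 0 = lead-out, slot N = track N
    offsets[0] = lead_out
    for number, offset in enumerate(track_offsets, start=first):
        offsets[number] = offset
    text = "%02X%02X" % (first, last) + "".join("%08X" % o for o in offsets)
    digest = hashlib.sha1(text.encode("ascii")).digest()
    # Standard base64 output with '+', '/', '=' swapped for '.', '_', '-'.
    return base64.b64encode(digest).decode("ascii").translate(str.maketrans("+/=", "._-"))
```

A 20-byte digest always encodes to 28 characters ending in '-', which matches the shape of the `69WPuFcKSoXA4Trt1kY4tGSg6Vo-` ID quoted earlier.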

-Sean.

Sean Goller
2005-04-06, 19:28
Mike Hartley wrote:
> Sean,
> PLEASE forgive my ignorance on this topic, but I'm interested because I am setting up a new dedicated server on a 3G wireless network for a new SB2 and am making format decisions. Specifically, this discussion of meta-data for the file has me interested. Is this basic track, album and artist information, or are you discussing additional information beyond this?
>
> I'm trying to decide whether to keep the server as is with Win XP and copy my files over in their current state as WMA lossless or re-rip or convert everything to FLAC. I was also considering the radical step of using a Linux Deb distro for the server in conjunction with FLAC and bagging Win XP altogether.
>
> However, this and some other research are giving me some second thoughts. From this and other discussions here, it seems that FLAC files can be difficult to create and manage. If the info you are talking about is the title/album/track/genre stuff, I find Win Media player retrieves it pretty reliably and ripping is super easy. Forgetting my inherent dislike of MS, if it's really that much more of a pain to manage ripping and tagging in FLAC, what's the advantage other than open format?
>
> Help me out here! :-)
>
> Mike
>

Warning: Grandiose Plan Discussion Ahead

What I am trying to accomplish is to create a complete lossless archive
of all my CDs. By "complete" I mean I can accomplish anything I'd want
to do with the music on the original CD with the data in this archive,
without having to go through the drudgery of swapping CDs for days when
I want to perform some operation on my entire collection, like encoding
to the new lossy format of the day. FLAC gets me that, because I can use
EAC to rip the CD to WAV, encode that WAV to FLAC and embed the cuesheet
into the resulting image. That single file has enough information in it
to get me metadata information from musicbrainz or freedb whenever I
want it. I don't have to worry about the directory structure I've chosen
to store the images in, filename conventions, or anything. All I need is
the data in that file, and that data will *never* change. For all
intents and purposes I can mount whatever drive it's on read-only.

This attitude towards metadata is especially key with "relatively"
volatile metadata storage, like musicbrainz. If I treat metadata as
being external information that's cached as opposed to *part of the data
itself* then it's much simpler to update and maintain. The simplest and
most resource-abusive example is every time slimserver loads up a FLAC
image, it calculates the CDIndexId and fetches the metadata from
musicbrainz and caches it locally for reference while any track from it
is being played. Now SlimServer is *guaranteed* to have up-to-the-second
track information from musicbrainz. Would I suggest doing this? Heck no,
I pay for server bandwidth too. But writing a script that once a month
slowly walks through your collection updating metadata isn't beyond the
realm of possibility.
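Something like the following could serve as that monthly walker, run from whatever scheduler you use. It is a sketch: `disc_id_of` and `refresh` are hypothetical callbacks (say, a disc ID calculator and a cache refresh), and the 5-second pause is an arbitrary figure chosen to keep the load on the metadata server low.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def refresh_collection(images: Iterable[Path],
                       disc_id_of: Callable[[Path], str],
                       refresh: Callable[[str], None],
                       delay_seconds: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Refresh cached metadata for each image, pausing between requests.

    Returns the number of images processed.
    """
    count = 0
    for image in images:
        if count:
            sleep(delay_seconds)       # be polite: one request at a time, spaced out
        refresh(disc_id_of(image))
        count += 1
    return count
```

Passing `sleep` in as a parameter is only there so the pacing can be tested without actually waiting.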

In specific response to your question, it doesn't matter what the
metadata is. Since the CD's TOC is derivable directly from the FLAC
image, you can use that information to generate whatever token
(musicbrainz CDIndexId, freedb DiscID, what have you) you need to get
metadata from the internet.
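On the freedb side, the classic CDDB disc ID can likewise be computed from the TOC alone. A sketch, assuming sector offsets that already include the 150-sector lead-in: a digit-sum checksum of each track's start time in seconds, the disc length in seconds, and the track count are packed into 32 bits. The function name is hypothetical.

```python
def freedb_disc_id(track_offsets, lead_out: int) -> str:
    """CDDB/freedb disc ID from sector offsets (75 sectors per second)."""
    def digit_sum(n: int) -> int:
        return sum(int(d) for d in str(n))

    checksum = sum(digit_sum(offset // 75) for offset in track_offsets)
    total_seconds = lead_out // 75 - track_offsets[0] // 75
    disc_id = ((checksum % 255) << 24) | (total_seconds << 8) | len(track_offsets)
    return "%08x" % disc_id
```

For a one-track disc starting at sector 150 with the lead-out at sector 7650, this gives `02006401`: checksum 2, 100 seconds long, 1 track.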

I guess part of this comes from how you view music. Most ripping
software uses the track as the atomic unit of data, since that's how
people listen to it, individual songs. But once you start talking about
keeping a network-based copy of your entire music collection around, it
doesn't really make as much sense to break it down that far. FLAC
(especially with version 1.1.2) makes it very easy to extract TOC-based
sections of data from an image, which means it's easy to create
track-based "export" versions of your library. I don't care if the hot
new music player only plays AAC and doesn't understand FLAC, I can write
a script that walks over my entire archive and exports everything to AAC,
with the appropriate metadata fetched from a cache or on the fly. And
two years later, if I junk that player and get another one that only
plays OGG, I can delete my AAC library and re-export everything to OGG,
with little to no human intervention. SlimServer doesn't care if your
music storage is track-based or album-based, as long as it can get to an
individual track and provide metadata to the listener.
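A sketch of the per-track export step, assuming the image's track start positions are already known in samples, lead-out last. It only builds the `flac` decode command for each track, using the tool's `--skip` (samples to skip) and `--until` (absolute sample to stop at, not included) options; feeding the resulting WAV to an AAC or Ogg encoder is left to whatever encoder you prefer. Check the options against your installed flac version. Both function names are hypothetical.

```python
from typing import List, Tuple


def track_ranges(starts: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """(track, start_sample) pairs, lead-out last -> (track, start, end) ranges.

    Each track runs up to the next track's start, so any pregap ends up
    at the tail of the previous track.
    """
    return [(number, start, next_start)
            for (number, start), (_, next_start) in zip(starts, starts[1:])]


def decode_command(image: str, start: int, end: int, out_wav: str) -> List[str]:
    """Argument list for decoding samples [start, end) of an image to WAV."""
    return ["flac", "--decode", f"--skip={start}", f"--until={end}",
            "-o", out_wav, image]
```

Each command can then be run with `subprocess.run(cmd, check=True)`.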

As for FLAC creation and management, I highly suggest you look at
flacattack (http://www.uninformative.com/flacattack/) which coupled with
EAC (http://www.exactaudiocopy.org) is what I'm using to create FLAC
images. It's simple and relatively fast, depending on your hardware.
What I will probably end up doing is making a modified version of
flacattack that names the file according to the musicbrainz CDIndexId,
or Josh's hash if he cares to share the algorithm. :)

Whew!

Hope that answers your question. :)

-Sean.

Mike Hartley
2005-04-07, 06:03
Sean,
Thanks for the detail. And, yes, it does clarify. Funny you brought up
EAC. I just downloaded it last night. But all I could find on the site was
a series of Betas with no final. Are you using the latest Beta version, or
am I looking in the wrong place?

Mike

Sean Goller
2005-04-07, 09:58
Mike Hartley wrote:
> Sean,
> Thanks for the detail. And, yes, it does clarify. Funny you brought up
> EAC. I just downloaded it last night. But all I could find on the site was
> a series of Betas with no final. Are you using the latest Beta version, or
> am I looking in the wrong place?
>
> Mike

The latest beta is the right one. EAC has been in perpetual beta since
forever. :)

It's pretty stable though. I don't think it's ever crashed on me.

-S.

Christian Pernegger
2005-04-07, 12:04
I finally got the studio version of Pink Floyd's The Wall and
ripped/encoded it as FLAC, as always. The album was playing in
original order and on repeat...
The last track of disc 2 ended with someone talking, who gets cut off
rather abruptly, followed by about half a second of silence, then some
more silence at the beginning of disc 1, and more or less the same
melody continued for a few seconds.

Combined with the fact that the sb2 display is not exactly in sync
with the audio at the end of songs I first thought I was seeing things
and invalid data in the buffer, but foobar2000 does the same thing:
the last track is cut off. Is it supposed to be this way or is my rip
bad? I'm afraid my last 'real' cd player died some time ago so I can't
really check if I'm missing the last few samples.

Thanks,

C.

JJ
2005-04-07, 12:13
Can't you play red-book audio using your PC's CD/DVD drive? If you
connect the drive's audio output to the system's sound card you should be
able to.


----- Original Message -----
From: "Christian Pernegger" <pernegger (AT) gmail (DOT) com>
To: "Slim Devices Discussion" <discuss (AT) lists (DOT) slimdevices.com>
Sent: Thursday, April 07, 2005 1:04 PM
Subject: [slim] OT: Pink Floyd - The Wall strangeness


>I finally got the studio version of Pink Floyd's The Wall and
> ripped/encoded it as FLAC, as always. The album was playing in
> original order and on repeat...
> The last track of disc 2 ended with someone talking, who gets cut off
> rather abruptly, followed by about half a second of silence, then some
> more silence at the beginning of disc 1, and more or less the same
> melody continued for a few seconds.
>
> Combined with the fact that the sb2 display is not exactly in sync
> with the audio at the end of songs I first thought I was seeing things
> and invalid data in the buffer, but foobar2000 does the same thing:
> the last track is cut off. Is it supposed to be this way or is my rip
> bad? I'm afraid my last 'real' cd player died some time ago so I can't
> really check if I'm missing the last few samples.

Christian Pernegger
2005-04-07, 12:18
> Can't you play red-book audio using your PC's CD/DVD drive? If you
> connect the drive's audio output to the system's sound card you should be
> able to.

I'd have to dig up one of these analog CD audio cables and rummage
around in the machine - I'd rather not. Digital CD playback works fine
of course, but that's no different than ripping in the first place.

C.

Steve Bernard, Jr
2005-04-07, 12:29
On Apr 7, 2005 3:04 PM, Christian Pernegger <pernegger (AT) gmail (DOT) com> wrote:
> I finally got the studio version of Pink Floyd's The Wall and
> ripped/encoded it as FLAC, as always. The album was playing in
> original order and on repeat...
> The last track of disc 2 ended with someone talking, who gets cut off
> rather apruptly, followed by about half a second of silence, then some
> more silence at the beginning of disc 1, and more or less the same
> melody continued for a few seconds.

The snippet of "Outside the Wall" at the beginning of "In the Flesh?"
(the first song on the first disc) is a deliberate cyclical thing. If
you listen closely, at the very end of "Outside the Wall", the voice
begins, "Isn't this where-" and gets cut off. The beginning of "In
the Flesh?" continues the sentence with "-we came in?" with some more
of the accordion melody before the actual first song comes in.

I dunno if it was cut in such a way that it'd be seamless if played on
endless repeat like that, but it should be close. The little gap
might be part of your FLACs or inserted as the server queues up the
first album again. But the duplication of the music at the very
beginning of the record is supposed to be there.

-Steve

Christian Pernegger
2005-04-07, 12:54
> The snippet of "Outside the Wall" at the beginning of "In the Flesh?"
> (the first song on the first disc) is a deliberate cyclical thing. If
> you listen closely, at the very end of "Outside the Wall", the voice
> begins, "Isn't this where-" and gets cut off. The beginning of "In
> the Flesh?" continues the sentence with "-we came in?" with some more
> of the accordion melody before the actual first song comes in.
>
> I dunno if it was cut in such a way that it'd be seamless if played on
> endless repeat like that, but it should be close.

Ah, k. I didn't get what they were actually saying at the end of
Outside the Wall - so there really is nothing missing, just an extra
second of silence. Hmm, maybe I'll change it with a wave editor to be
truly cyclic. The idea is kind of neat.
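If you'd rather measure the gap than eyeball it in a wave editor, a small sketch like this counts the silent frames at each end of a decoded 16-bit PCM WAV track. The zero default threshold (only exact digital silence counts) is an assumption; raise it to tolerate dither. Dividing the counts by 44100 gives seconds. The function name is hypothetical.

```python
import array
import sys
import wave


def silence_at_edges(path: str, threshold: int = 0):
    """Return (leading, trailing) counts of silent frames in a 16-bit PCM WAV.

    A frame is silent when every channel's sample is within +/- threshold.
    An all-silent file reports all frames as leading and 0 as trailing.
    """
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expects 16-bit PCM")
        channels = w.getnchannels()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()             # WAV sample data is little-endian
    frames = len(samples) // channels

    def is_silent(frame: int) -> bool:
        base = frame * channels
        return all(abs(s) <= threshold for s in samples[base:base + channels])

    leading = 0
    while leading < frames and is_silent(leading):
        leading += 1
    trailing = 0
    while trailing < frames - leading and is_silent(frames - 1 - trailing):
        trailing += 1
    return leading, trailing
```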

Anyway, thanks for clearing that up!

C.

Christian Pernegger
2005-04-08, 04:24
> > The snippet of "Outside the Wall" at the beginning of "In the Flesh?"
> > (the first song on the first disc) is a deliberate cyclical thing. If
> > you listen closely, at the very end of "Outside the Wall", the voice
> > begins, "Isn't this where-" and gets cut off. The beginning of "In
> > the Flesh?" continues the sentence with "-we came in?" with some more
> > of the accordion melody before the actual first song comes in.
> >
> > I dunno if it was cut in such a way that it'd be seamless if played on
> > endless repeat like that, but it should be close.

I had another look at the files and indeed there shouldn't be much of
a gap at all.
How does slimserver handle playlists (or songs) on repeat? Is the
playlist in this case truly circular, with slimserver filling the
SB2's buffer with the first samples of the first track while the last
track is still playing, or does it let the buffer empty and then start
the playlist again once it has finished? The second method would
explain the gap.
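To make the distinction concrete, here is a toy model (not SlimServer's actual code, whose behavior is exactly the open question) of the "truly circular" case: the playlist's samples are treated as an endless loop, so the chunk that spans the end of the last track is topped up with the start of the first one instead of being sent short. Tracks are plain lists of sample values here, purely for illustration.

```python
from itertools import cycle, islice
from typing import Iterator, List


def gapless_repeat(tracks: List[List[int]], chunk: int) -> Iterator[List[int]]:
    """Yield fixed-size buffer fills from a playlist treated as circular."""
    samples = [s for track in tracks for s in track]
    if not samples:
        raise ValueError("empty playlist")
    stream = cycle(samples)            # last track wraps straight into the first
    while True:
        yield list(islice(stream, chunk))
```

In the drain-and-restart alternative, the final fill would come up short and playback would stall until the first track was queued again, which is where a gap like the one above would come from.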

Granted, that's not a problem on most music CDs, but I can imagine
gapless repeat being useful to people with ambient sound CDs and the
like.

C.