metadata musings [was SqueezeBox 2,Slimserver 6 and FLAC question]



Josh Coalson
2005-04-04, 21:56
--- Sean Goller <sean (AT) goller (DOT) net> wrote:
> Josh wrote:
>
> > I use single file with embedded cue for CDs. all my metadata is
> > in an external DB and I've been using custom s/w with a rio
> > receiver for playback. now that I have a SB2 to play with I want
> > to look into how to do the same thing (even if it means I have
> > to tag the FLACs somehow).
> >
> > Josh
> >
>
>
> Hi Josh,
> Would you be willing to share a bit on how you've implemented this?
> I'm currently wrestling with how to store flac images in a meaningful
> way, and hadn't considered external storage for metadata. What do you
> actually name your images? Something human-readable, or something more
> processing-oriented like the md5sum of the flac file? would you be
> willing to share your schema?

I currently use mysql for the store. CD primary keys are formed
from a custom 160-bit hash I compute from the CD TOC data. it
is an SHA-1 digest of a message composed by packing as many bits
as possible from the TOC (including track type). it's similar to
the musicbrainz/cdindex hash but uses more bits from the TOC to
further reduce chances of collision.
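
A minimal sketch of that kind of key in Python. The field layout here (track count, then start sector and a data/audio flag per track, then the lead-out) is an assumption made for illustration, not the actual packing described above; `toc_key` is a hypothetical name.

```python
import hashlib
import struct

def toc_key(tracks, leadout):
    """Return a 160-bit key (40 hex digits) for a CD table of contents.

    tracks  -- list of (start_sector, is_data) tuples, in track order
    leadout -- start sector of the lead-out
    The packing below is an assumed layout, chosen only to show the idea.
    """
    msg = struct.pack(">B", len(tracks))
    for start, is_data in tracks:
        # 32-bit start sector plus one byte for the track type
        msg += struct.pack(">IB", start, 1 if is_data else 0)
    msg += struct.pack(">I", leadout)
    return hashlib.sha1(msg).hexdigest()
```

Including the track type means two discs with identical offsets but a different audio/data layout get different keys.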

the schema is a little more complicated than id3v1 but less than
v2. I have mostly classical music so the main thing I wanted was
separation of performer and composer, multiple performers/composers
possible, hierarchical genre and mood, and keywords, and a simple
query language that uses it all for quickly composing dynamic
playlists (think "play all slow chamber music by j.s. bach random").
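
A rough sketch of what such a schema could look like, using SQLite from Python so it is self-contained (the store described above is mysql). All table, column and function names are hypothetical, and the query function only stands in for the query language; it is not the real schema.

```python
import sqlite3

# Assumed layout: people are linked to tracks through a credit table with a
# role, so one track can have several composers and several performers.
# Genres form a tree through the parent column.
SCHEMA = """
CREATE TABLE cd     (toc_hash TEXT PRIMARY KEY, title TEXT);
CREATE TABLE genre  (id INTEGER PRIMARY KEY, name TEXT,
                     parent INTEGER REFERENCES genre(id));
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track  (id INTEGER PRIMARY KEY,
                     cd TEXT REFERENCES cd(toc_hash),
                     number INTEGER, title TEXT, mood TEXT,
                     genre INTEGER REFERENCES genre(id));
CREATE TABLE credit (track INTEGER REFERENCES track(id),
                     person INTEGER REFERENCES person(id),
                     role TEXT CHECK (role IN ('composer', 'performer')));
"""

def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def tracks_by(db, composer, genre, mood):
    """Track titles by composer with the given mood, in genre or any
    sub-genre of it, returned in random order."""
    rows = db.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM genre WHERE name = ?
            UNION ALL
            SELECT g.id FROM genre g JOIN sub ON g.parent = sub.id)
        SELECT t.title FROM track t
        JOIN credit c ON c.track = t.id AND c.role = 'composer'
        JOIN person p ON p.id = c.person
        WHERE p.name = ? AND t.mood = ? AND t.genre IN (SELECT id FROM sub)
        ORDER BY RANDOM()""", (genre, composer, mood))
    return [row[0] for row in rows]
```

With this layout, something like `tracks_by(db, 'j.s. bach', 'chamber', 'slow')` would be one way to express the playlist example.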

I was planning to release it as open-source but never got the whole
way through. it's not in a shape to release, and metadata is a
personal thing; I was not sure that it would be as useful to anyone
else as it is to me.

Josh

Sean Goller
2005-04-05, 00:02
Josh Coalson wrote:
> --- Sean Goller <sean (AT) goller (DOT) net> wrote:
>
>>Josh wrote:
>>
>>
>>>I use single file with embedded cue for CDs. all my metadata is
>>>in an external DB and I've been using custom s/w with a rio
>>
>>receiver
>>
>>>for playback. now that I have a SB2 to play with I want to look
>>>into how to do the same thing (even if it means I have to tag
>>>the FLACs somehow).
>>>
>>>Josh
>>>
>>
>>
>>Hi Josh,
>> Would you be willing to share a bit on how you've implemented this?
>>I'm
>>currently wrestling with how to store flac images in a meaningful
>>way,
>>and hadn't considered external storage for metadata. What do you
>>actually name your images? Something human-readable, or something
>>more
>>processing-oriented like the md5sum of the flac file? would you be
>>willing to share your schema?
>
>
> I currently use mysql for the store. CD primary keys are formed
> from a custom 160 bit hash I compute from the CD TOC data. it
> is an SHA-1 digest of a message composed by packing as many bits
> as possible from the TOC (including track type). it's similar to
> the musicbrainz/cdindex hash but uses more bits from the TOC to
> further reduce chances of collision.
>
> the schema is a little more complicated than id3v1 but less than
> v2. I have mostly classical music so the main thing I wanted was
> separation of performer and composer, multiple performers/composers
> possible, hierarchical genre and mood, and keywords, and a simple
> query language that uses it all for quickly composing dynamic
> playlists (think "play all slow chamber music by j.s. bach random").
>
> I was planning to release it as open-source but never got the whole
> way through. it's not in a shape to release, and metadata is a
> personal thing; I was not sure that it would be as useful to anyone
> else as it is to me.
>
> Josh
>

Cool! I'd still like to know what you name the actual .flac files,
though. The hash? I was considering just doing an md5 of the flac image,
but it sounds like yours is better.

Here's where I'm coming from, which is something of a black box
approach. I'd like to be able to take a physical CD, stick it in a
drive, and archive it in such a way that I can run automated tools
against it and Do Things. Things like automatically extract the entire
archive into a lossy format du jour. When I first looked at using
slimserver what I figured on doing was using an mp3 extract of the
archive for it. However now I'm seeing that flac image support is
somewhat in place, so applying slimserver to the archive itself seems
feasible. While a useful hack, I'm not so thrilled about using the
vorbis CUESHEET tag to store metadata long-term, especially since
updating/modifying the data seems like a non-trivial process. (at least,
I haven't seen anything other than metaflac or foobar2000 that'll do it,
and manipulating a multiline tag is annoying at best) Overall it just
seems better to separate the audiodata from the metadata entirely.

Right now I'm just trying to see if someone's got an existing schema
that gets me 90% of the way there so I can get things up and running
reasonably quickly. If I can do that, then I can see if I can make
slimserver deal with the archive setup. That gets me a fairly robust
music archive/jukebox like thing.

-Sean.

michael
2005-04-05, 12:52
Sean Goller <sean (AT) goller (DOT) net> writes:
....
> While a useful hack, I'm not so thrilled about using the
> vorbis CUESHEET tag to store metadata long-term, especially since
> updating/modifying the data seems like a non-trivial process. (at
> least, I haven't seen anything other than metaflac or foobar2000
> that'll do it, and manipulating a multiline tag is annoying at best)
> Overall it just seems better to separate the audiodata from the
> metadata entirely.
>
> Right now I'm just trying to see if someone's got an existing schema
> that gets me 90% of the way there so I can get things up and running
> reasonably quickly. If I can do that, then I can see if I can make
> slimserver deal with the archive setup. That gets me a fairly robust
> music archive/jukebox like thing.

For what it's worth, slimserver will read a musicbrainz xml
description of the flac file out of an app block. The thing that
isn't available yet is a tool to insert the xml into the flac. But
I'm sure it would be easy to make it read an external version of that
xml as well. And the tools to pull the data from musicbrainz would
lend themselves well to inclusion in an automated system like what you
describe.

-michael

--
"good, fast, cheap: pick any two. (you can't have all three)"
- RFC 1925 (http://www.ietf.org/rfc/rfc1925.txt)

Sean Goller
2005-04-05, 17:35
michael wrote:
> Sean Goller <sean (AT) goller (DOT) net> writes:
> ...
>
>>While a useful hack, I'm not so thrilled about using the
>>vorbis CUESHEET tag to store metadata long-term, especially since
>>updating/modifying the data seems like a non-trivial process. (at
>>least, I haven't seen anything other than metaflac or foobar2000
>>that'll do it, and manipulating a multiline tag is annoying at best)
>>Overall it just seems better to separate the audiodata from the
>>metadata entirely.
>>
>>Right now I'm just trying to see if someone's got an existing schema
>>that gets me 90% of the way there so I can get things up and running
>>reasonably quickly. If I can do that, then I can see if I can make
>>slimserver deal with the archive setup. That gets me a fairly robust
>>music archive/jukebox like thing.
>
>
> For what it's worth, slimserver will read a musicbrainz xml
> description of the flac file out of an app block. The thing that
> isn't available yet is a tool to insert the xml into the flac. But
> I'm sure it would be easy to make it read an external version of that
> xml as well. And the tools to pull the data from musicbrainz would
> lend themselves well to inclusion in an automated system like what you
> describe.
>
> -michael
>
> --
> "good, fast, cheap: pick any two. (you can't have all three)"
> - RFC 1925 (http://www.ietf.org/rfc/rfc1925.txt)
>

Err, could you expand on that a bit? Do you mean the xml description of
the cd represented by the flac file? I've been futzing with the internal
guts of libmusicbrainz all day, and I have code now that generates the
proper musicbrainz DiscID given an flac-generated (NOT EAC) cuesheet
text file. Now I'm working on turning that into code that just extracts
the cuesheet data from the image itself and bypasses the whole textfile
step.
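
For reference, a small Python sketch of the DiscID calculation as MusicBrainz documents it: an SHA-1 over an ASCII hex string of the first track, last track, lead-out offset and 99 track offsets, then a base64 variant using '.', '_' and '-'. This is not the imageid code; the function names are hypothetical, it assumes the first track is track 1, and the sample-to-sector conversion assumes a standard CD-DA image with a 150-sector lead-in.

```python
import base64
import hashlib

SAMPLES_PER_SECTOR = 588  # 44100 samples/s divided by 75 sectors/s
PREGAP_SECTORS = 150      # assumed standard 2-second lead-in

def sector_offset(sample_offset):
    """Convert a FLAC cuesheet sample offset to a CD sector offset."""
    return sample_offset // SAMPLES_PER_SECTOR + PREGAP_SECTORS

def disc_id(first_track, last_track, leadout, offsets):
    """Compute a DiscID from sector offsets.

    offsets holds the start sector of each track; assumes first_track == 1
    so that offsets[0] belongs to track 1. Unused slots are zero-filled.
    """
    fields = [leadout] + list(offsets) + [0] * (99 - len(offsets))
    text = "%02X%02X" % (first_track, last_track)
    text += "".join("%08X" % value for value in fields)
    digest = hashlib.sha1(text.encode("ascii")).digest()
    # Swap '+' and '/' for '.' and '_', then '=' padding for '-'
    encoded = base64.b64encode(digest, altchars=b"._").decode("ascii")
    return encoded.replace("=", "-")
```

A 20-byte digest always encodes to 28 characters with one padding character, which is why these IDs end in "-".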

So with respect to fetching things from musicbrainz, I'm at
http://musicbrainz.org/cdindex/69WPuFcKSoXA4Trt1kY4tGSg6Vo- here with my
test disc/image. How does that relate to what you're talking about? Are
you talking about the information at:
http://mm.musicbrainz.org/mm-2.1/album/94348021-7c03-4a58-bfcb-ee865f449200
?

If so that means all that's left is some way of transforming a DiscID
into a GUID. Scanning musicbrainz' wiki, it looks like
MBQ_GetCDInfoFromCDIndexId will get me a list of album GUIDs that are
linked to that DiscId, which gets me the RDF url above.

-S.

michael
2005-04-06, 13:19

michael
2005-04-06, 13:23

Sean Goller
2005-04-06, 18:23
michael wrote:
> Sean Goller <sean (AT) goller (DOT) net> writes:
> ...
>
>>>>Right now I'm just trying to see if someone's got an existing schema
>>>>that gets me 90% of the way there so I can get things up and running
>>>>reasonably quickly. If I can do that, then I can see if I can make
>>>>slimserver deal with the archive setup. That gets me a fairly robust
>>>>music archive/jukebox like thing.
>>>
>>>For what it's worth, slimserver will read a musicbrainz xml
>>>description of the flac file out of an app block. The thing that
>>>isn't available yet is a tool to insert the xml into the flac. But
>>>I'm sure it would be easy to make it read an external version of that
>>>xml as well. And the tools to pull the data from musicbrainz would
>>>lend themselves well to inclusion in an automated system like what you
>>>describe.
>
> ...
>
>>Err, could you expand on that a bit? Do you mean the xml description
>>of the cd represented by the flac file?
>
>
> yes.
>
>
>>I've been futzing with the
>>internal guts of libmusicbrainz all day, and I have code now that
>>generates the proper musicbrainz DiscID given an flac-generated (NOT
>>EAC) cuesheet text file. Now I'm working on turning that into code
>>that just extracts the cuesheet data from the image itself and
>>bypasses the whole textfile step.
>
>
> Would you be willing to share what you've put together? I've been
> working on the same thing, but so far my calculated DiscID doesn't
> match the one cdlookup returns.
>

Certainly. I managed to get a version working that operates directly on
the flac image. I'll warn you, it's frankensteined all to hell C code
and doesn't properly deal with byte order issues automatically. (the
code I'm putting up will work properly out of the box on x86) but it
gets the job done. It's at http://www.goller.net/imageid/imageid.tar.gz

I think the next step is to use the musicbrainz DiscID object to do the
generation instead, because I missed that on my first pass through the
library, and it's easier if musicbrainz maintains that particular part
of the code instead of me, or whomever. :)

>
>>So with respect to fetching things from musicbrainz, I'm at
>>http://musicbrainz.org/cdindex/69WPuFcKSoXA4Trt1kY4tGSg6Vo- here with
>>my test disc/image. How does that relate to what you're talking about?
>>Are you talking about the information at:
>>http://mm.musicbrainz.org/mm-2.1/album/94348021-7c03-4a58-bfcb-ee865f449200
>>?
>
>
> Essentially, but it needs to go one level of detail deeper than the
> web page does. (when you're querying their server directly, you can
> specify how deep to go.) So the result you're looking for in this case
> would look like the example I've attached.
>

Could you give me the actual url used to fetch that example? I'm being
dim I know, but it would help me out. :)


>
>>If so that means all that's left is some way of transforming a DiscID
>>into a GUID. Scanning musicbrainz' wiki, it looks like
>>MBQ_GetCDInfoFromCDIndexId will get me a list of album GUIDs that are
>>linked to that DiscId, which gets me the RDF url above.
>
>
> Sounds like you're on the right track. The only tricky bit is that to
> get both the album and the song detail, you have to query the server
> directly. (or at least I haven't found an easy way to do it through
> the web interface.)
>
> Now all you need to do is either stuff that xml into an application
> block in your flac, or modify slim server to recognise it external to
> your flac file. :)
>
> I hope that helps.
>
> -michael


That works. It seems to me that the simplest thing to do for a "new"
flac image would be to calculate the CDIndexId then fetch the metadata
from musicbrainz, caching it all in an external source (filesystem or
db) for future reference and keep the file itself pristine. This is one
way of automatically fixing the "outdated tags" problem that users of
metadata storage sites like musicbrainz have, with no ambiguity. If
SlimServer refetches the metadata once every N image references (20?
100?) then you're guaranteed to have about as up to date information as
you're going to get. (without pissing off musicbrainz for eating all
their bandwidth, that is)
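
As a sketch of the "refetch every N references" idea in Python: the class name, the `fetch` callback and the default of 20 are assumptions for illustration, and nothing here is SlimServer code.

```python
class MetadataCache:
    """Cache metadata per disc ID, refetching after a fixed number of uses."""

    def __init__(self, fetch, refetch_every=20):
        self.fetch = fetch                  # callable: disc_id -> metadata
        self.refetch_every = refetch_every
        self.entries = {}                   # disc_id -> [metadata, use count]

    def get(self, disc_id):
        entry = self.entries.get(disc_id)
        # Fetch on first use, and again once the cached copy has been
        # handed out refetch_every times.
        if entry is None or entry[1] >= self.refetch_every:
            entry = self.entries[disc_id] = [self.fetch(disc_id), 0]
        entry[1] += 1
        return entry[0]
```

Here `fetch` would wrap whatever does the actual musicbrainz lookup, and the in-memory dict could be swapped for a filesystem or db store.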

So since all this discussion is in the context of SlimServer, I guess
the way forward (at least from what I'm interested in) is to use the
perl module for accessing FLAC internal cuesheets, tie that to either a
perl binding to the musicbrainz DiscId object (best choice) or a
separate implementation of CDIndexId generation (reasonable) and then
use that as a basis for obtaining metadata within the server.

-Sean.

michael
2005-04-07, 18:06
Sean Goller <sean (AT) goller (DOT) net> writes:
....
>>>I've been futzing with the
>>>internal guts of libmusicbrainz all day, and I have code now that
>>>generates the proper musicbrainz DiscID given an flac-generated (NOT
>>>EAC) cuesheet text file. Now I'm working on turning that into code
>>>that just extracts the cuesheet data from the image itself and
>>>bypasses the whole textfile step.
>>
>> Would you be willing to share what you've put together? I've been
>> working on the same thing, but so far my calculated DiscID doesn't
>> match the one cdlookup returns.
>
> Certainly. I managed to get a version working that operates directly
> on the flac image. I'll warn you, it's frankensteined all to hell C
> code and doesn't properly deal with byte order issues
> automatically. (the code I'm putting up will work properly out of the
> box on x86) but it gets the job done. It's at
> http://www.goller.net/imageid/imageid.tar.gz

Thanks. I'll take a look at that and see if it reveals what I've been
doing wrong in mine.

> I think the next step is to use the musicbrainz DiscID object to do
> the generation instead, because I missed that on my first pass through
> the library, and it's easier if musicbrainz maintains that particular
> part of the code instead of me, or whomever. :)

Yeah, I tried to do it by hand myself. It seemed easy enough from the
description of the process they have up. Oh well.

>>>So with respect to fetching things from musicbrainz, I'm at
>>>http://musicbrainz.org/cdindex/69WPuFcKSoXA4Trt1kY4tGSg6Vo- here with
>>> my test disc/image. How does that relate to what you're talking
>>> about? Are you talking about the information at:
>>>http://mm.musicbrainz.org/mm-2.1/album/94348021-7c03-4a58-bfcb-ee865f449200
>>>?
>> Essentially, but it needs to go one level of detail deeper than the
>> web page does. (when you're querying their server directly, you can
>> specify how deep to go.) So the result you're looking for in this case
>> would look like the example I've attached.
>>
>
> Could you give me the actual url used to fetch that example? I'm being
> dim I know, but it would help me out. :)

Well, as I said that's the tricky part. As far as I can tell, there's
not a way to control the query depth through the web interface.
A while ago I hacked one of their example programs (getalbum I think)
to spit out the data like what was attached. The example using your
disc was, well, an example. I quickly cut-n-pasted it together from
four different web urls. The data's all there, just not in one concise
url. :)

....

> That works. It seems to me that the simplest thing to do for a "new"
> flac image would be to calculate the CDIndexId then fetch the metadata
> from musicbrainz, caching it all in an external source (filesystem or
> db) for future reference and keep the file itself pristine.

I'm hoping to go one step further and stuff the data into an app
block in the flac itself. I've done this by hand, and it works. Now
I just need to put together a tool that takes a flac with a cuesheet,
calculates the id, does the lookup from the server, and stuffs the
results back into the file. (Then once that's working well, pester the
musicbrainz folks into integrating it into the next version of their
tagger tool.)
But I can certainly see the appeal of keeping that data in a db
instead/as well.
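
For the "stuff it into an app block" step, a sketch in Python of how the bytes of a FLAC APPLICATION metadata block are laid out per the FLAC format spec: a one-byte header (last-block flag plus block type 2), a 24-bit big-endian length, a 4-byte application ID, then the data. The application ID in the usage note is a placeholder, not whatever ID slimserver looks for, and splicing the block into a real file (fixing up the last-block flag on the neighbouring blocks) is not shown.

```python
APPLICATION_BLOCK_TYPE = 2  # block type for APPLICATION in the FLAC format

def application_block(app_id, payload, is_last=False):
    """Return the raw bytes of a FLAC APPLICATION metadata block."""
    if len(app_id) != 4:
        raise ValueError("application ID must be exactly 4 bytes")
    body = app_id + payload
    if len(body) >= 1 << 24:
        raise ValueError("block too large for a 24-bit length field")
    # High bit marks the last metadata block; low 7 bits are the type.
    header = bytes([(0x80 if is_last else 0) | APPLICATION_BLOCK_TYPE])
    header += len(body).to_bytes(3, "big")
    return header + body
```

For example, `application_block(b"TEST", xml_bytes)` wraps the XML under the placeholder ID "TEST".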

> This is
> one way of automatically fixing the "outdated tags" problem that users
> of metadata storage sites like musicbrainz have, with no ambiguity. If
> SlimServer refetches the metadata once every N image references (20?
> 100?) then you're guaranteed to have about as up to date information
> as you're going to get. (without pissing off musicbrainz for eating
> all their bandwidth, that is)
>
> So since all this discussion is in the context of SlimServer, I guess
> the way forward (at least from what I'm interested in) is to use the
> perl module for accessing FLAC internal cuesheets, tie that to either
> a perl binding to the musicbrainz DiscId object (best choice) or a
> separate implementation of CDIndexId generation (reasonable) and then
> use that as a basis for obtaining metadata within the server.

Sounds like a good plan. Let me know how you progress.

-michael

--
Dear Dad: Hate you, eloping with Mom. Taking your cigars
and sports car. -- Love, Sigmund