PDA

View Full Version : New Schema



dean
2008-09-27, 07:35
I thought I'd note here that Brandon has begun development on a major
rework of the SqueezeCenter database back-end and schema.

It's an ambitious effort, but his initial design is quite exciting.

He's put some notes up at http://wiki.slimdevices.com/index.php/NewSchema

As he says there: "Soon I'll have some very rough draft code running,
which will make it easier for us to discuss and fix all of the issues."

Look for more news soon.

-dean

SuperQ
2008-09-27, 09:01
I like it so far. I see it lists "multiple users/libraries", which is great for some of the ways I use the Squeezebox. At work we have a SqueezeCenter server that a number of users can dump music to. With multi-library support we could let users use their own Squeezebox/SoftSqueeze, but still allow everyone to select, from all libraries, what gets played over the Squeezebox with the stereo attached.

funkstar
2008-09-27, 09:22
Sounds great, that Wiki page does make interesting reading. Looking forward to hearing about more details etc.

Philip Meyer
2008-09-27, 09:33
>I thought I'd note here that Brandon has begun development on a major
>rework of the SqueezeCenter database back-end and schema.
>
>It's an ambitious effort, but his initial design is quite exciting.
>
>He's put some notes up at http://wiki.slimdevices.com/index.php/NewSchema
>
The design goals look great - includes many things that people would like to see. Including all possible artist contributors is a benefit I wasn't expecting (I would make use of Mix artist and cover band).

I'm worried about the move back to SQLite though. Does the database engine matter? Could it be a design goal that it works with any arbitrary database engine? It does sound like SQLite has improved over the last two years, but what are the actual reasons for moving away from MySQL? Is the memory footprint or performance of MySQL the concern? I like having my own external MySQL instance so I can poke around inside it, run my own queries, etc. Third-party plugins/applications also connect to the MySQL instance, e.g. Moose, although they will need rewrites for the schema changes anyway.
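For what it's worth, a SQLite database is just as open to ad-hoc poking as a MySQL instance, only through a file rather than a server socket. A minimal sketch using Python's stdlib sqlite3 module (the table and column names here are invented for illustration, not the actual SqueezeCenter schema):

```python
import sqlite3

# Hypothetical example: the "tracks" table below is made up; the real
# SqueezeCenter schema may look nothing like this.
con = sqlite3.connect(":memory:")  # in practice, point this at the library's .db file
con.execute("CREATE TABLE tracks (title TEXT, artist TEXT, rating INTEGER)")
con.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [("Song A", "Artist X", 5), ("Song B", "Artist Y", 3)],
)
con.commit()

# Ad-hoc querying, much like connecting to a MySQL instance:
for title, artist in con.execute(
    "SELECT title, artist FROM tracks WHERE rating >= 4"
):
    print(title, artist)
```

The same queries could equally be run from the sqlite3 command-line shell against the database file directly.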

I don't like the idea of storing the physical DB files with the music library content. This could cause no end of trouble with permissions. It imposes a restriction that the music library folder must be writeable, which is not the case today: think of read-only drives, or remote drives where reading is quick but writing is slow. It also makes it harder for third-party apps to work out where to find the databases, and for support to help users with configuration problems.

I'm also not sure about the idea of a configurable schema. One well-defined schema that allows for all of the design goals is all that is needed. The ability to modify the schema seems a bit over the top; it will cause headaches for third-party developers that need to know how to access the data, and it could also impact performance. I think different uses of tags (e.g. supporting tags for an iPod user) should be handled by alternative implementations of the scanner (scanner plugins?) rather than by modifying the database schema. "Concatenate these two scanned tags X and Y from the FLAC files, strip the leading << from them, and put the result into database attribute Z" sounds like an alternative rule for the scanner (business rules) rather than database schema (database layer).
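Philip's tag-concatenation example reads naturally as a small scanner-side transform. A hypothetical sketch in Python (the tag names X, Y and Z are his placeholders, not real SqueezeCenter tags):

```python
# Sketch of a scanner "business rule": concatenate tags X and Y, strip
# a leading "<<", and store the result as attribute Z. Purely
# illustrative; the real scanner has no such hook today.
def combine_tags(tags: dict) -> dict:
    combined = tags.get("X", "") + tags.get("Y", "")
    if combined.startswith("<<"):
        combined = combined[2:]
    out = dict(tags)
    out["Z"] = combined
    return out

print(combine_tags({"X": "<<Live at ", "Y": "Wembley"})["Z"])
```

A rule like this lives entirely in the scanning layer, so the database schema stays fixed while the tag handling varies.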

There's nothing on the wiki page about how data persistence is going to be solved, e.g. the ability to store song stats (ratings, etc.) and their linkage back to the original source files (coping with filename changes via MusicBrainz IDs or checksums, etc.).

Phil

JJZolx
2008-09-27, 09:41
> A key component of the design is a move back to SQLite in place of MySQL.

You've got to be kidding.

> Another key change is moving to multiple independent physical databases,
> one per library, stored with the library itself. By default, if your
> music is in /my/music/folder, the database will be stored there as well.

What about official support for MySQL and a single database for all libraries?

JJZolx
2008-09-27, 09:44
I'm also not sure about the idea of a configurable schema. One well-defined schema that allows for all of the design goals is all that is needed. The ability to modify the schema seems a bit over the top; it will cause headaches for third-party developers that need to know how to access the data, and it could also impact performance.

I agree. The more incomprehensible you make the database, the less likely plugin developers are going to want to go there.

dean
2008-09-27, 10:24
On Sep 27, 2008, at 9:41 AM, JJZolx wrote:

>
>> A key component of the design is a move back to SQLite in place of
> MySQL.
>
> You've got to be kidding.
Nope. We definitely need to make SC smaller to make it run on smaller
devices (like NAS drives).

> What about official support for MySQL and a single database for all
> libraries?
Part of Brandon's plan is to enable MySQL as an optional backend for
those who want to hack, like before.

And if you want one big library, then you are good!

dean
2008-09-27, 10:27
On Sep 27, 2008, at 9:33 AM, Phil Meyer wrote:
> Is the memory footprint or performance of MySQL the concern?

Yes, one of the major concerns.

> I like having my own external MySQL instance so I can poke around
> inside it, run my own queries, etc. Third-party plugins/
> applications also connect to a MySQL instance, eg. Moose, although
> they will need rewrites for the schema changes anyway.
Brandon's plan is to support MySQL as an optional backend for those
who want it.

> I don't like the idea of trying to store the physical DB files with
> the music library content. This could cause no end of trouble with
> permissions.
That could be an issue, but to allow dynamic libraries (think about
plugging in a USB stick with some music on it), it's a good
solution. Of course, if that volume is not writable, a very
reasonable fallback would be to put the library back in the cache
folder.

> I'm also not sure about the idea of configurable schema. One well-
> defined schema that allows for all of the design goals is all that
> is needed. The ability to modify the schema seems a bit over the
> top, and will cause headaches for third-party developers that would
> need to know how to access the data? It will also impact
> performance? I think different uses of tags (eg. support tags for
> an iPod user), should be handled by alternative implementations of
> the scanner (scanner plugins?), rather than modifying the database
> schema. "concatenate these two scanned tags X, Y from the FLAC
> format files, and strip the leading << from them, putting the result
> into the database attribute Z" sounds like an alternative rule for
> the scanner (business rules), rather than database schema (database
> layer).
Good point. I'll let Brandon address this issue.

> There's nothing on the wiki page about how data persistance is going
> to be solved. eg. ability to store song stats (ratings, etc), and
> their linkage back to the original source files (cope with filename
> changes via MUSICBRAINS id's or checksums, etc).
This has been discussed, but I'll also defer to Brandon on this.

Philip Meyer
2008-09-27, 11:06
>> I don't like the idea of trying to store the physical DB files with
>> the music library content. This could cause no end of trouble with
>> permissions.
>That could be an issue, but to allow dynamic libraries (think about
>plugging in a USB stick with some music on it), it's a good
>solution. Of course, if that volume is not writable, a very
>reasonable fallback would be to put the library back in the cache
>folder.

If a friend came over with a USB stick to play his music, I'm not sure he'd be too happy if some extra files got written to the device. Firewalls, virus scanners, etc. could get in the way. What if the device is writeable, but halfway through the writing process it runs out of space (more likely than on the local filesystem)? People with, say, a 2GB USB stick will often fill it with as many songs as possible; there may not be much free space on the device. You may be able to read songs from an iPod in disk mode, but would you want to write to an iPod?

What would need to be written anyway? If the device is dynamic in nature (plugged in, library written to the device, removed, taken away to another machine, more files added, then plugged back in), the library would need to be rescanned on reinsertion anyway. Wouldn't the equivalent of "Browse Music Folder" for an arbitrary folder or device be enough? Being able to play from dynamic devices may be useful (not sure I'd ever use it personally), but is there a need to build a persistent library from a dynamic source? Just the ability to play music from it (i.e. without writing back to the device) would be the aim. Writing some temporary cache data for performance reasons, maybe (e.g. for transcoding, or for caching folder scans and tag info).

erland
2008-09-27, 23:43
I thought I'd note here that Brandon has begun development on a major
rework of the SqueezeCenter database back-end and schema.

It's an ambitious effort, but his initial design is quite exciting.

Even though there are changes in there that I don't like, I'm really happy that major changes are starting to happen in SqueezeCenter to catch up with user needs.

It's hard to comment on some things as long as they are only shown on the wiki page as documentation, so I hope we'll soon see some visibility in svn. Some of my comments below might be completely wrong if I haven't interpreted the information on the wiki correctly.



A key component of the design is a move back to SQLite in place of MySQL. SQLite has seen a lot of improvement since we last used it, and removing the disk/code size and complexity of shipping and running an independent MySQL server is a win for us, especially on small platforms.

I've obviously misunderstood something; I thought NAS boxes weren't the target platform of SqueezeCenter. If you are targeting NAS boxes, we definitely need other changes than just the database schema. If I remember correctly, most people with a real computer got a big performance boost when we switched to MySQL, while most people with NAS boxes saw a performance decrease.

As I understand it, the web interface has always been slow on NAS boxes, so I suppose this means you are going to make other optimizations for slow hardware beyond just the database switch?
If you don't, my feeling is that you will only get halfway and disappoint everyone: decreased performance for users on fast hardware, and still too-slow performance on NAS boxes. How does the new Default skin work on NAS boxes? I think it communicates more with the server due to its use of Ajax.

As a user who runs SqueezeCenter on a real computer, I see nothing good coming from a switch to SQLite. However, as long as you make MySQL an option for advanced users, I'm happy.

I can understand the switch if the plan is to create a partnership with some of the NAS vendors to include a pre-installed SqueezeCenter on some NAS boxes.


Another key change is moving to multiple independent physical databases, one per library, stored with the library itself.

What's the main advantage of having separate libraries?
It has to get more complex if a user wants to make all music available in one listening room but only part of the music in another.
Why not just assign tracks to libraries within a single database?
Is the reason to get smaller database files and less complex SQL queries?



By default, if your music is in /my/music/folder, the database will be stored there as well.

My gut feeling tells me this means that the only way to define which music goes into a specific library is to put it under a specific main directory?
So there would be no way to have some music files available in several libraries without either duplicating the files or using softlinks/shortcuts?

I would have preferred a solution where the tag information in the files could be used to define which music goes into which library. This would be more flexible and independent of the directory structure on the disk. It would also open the door for third-party plugins to define libraries in ways you haven't thought of; for example, putting everything with "genre=Christmas" into a separate library which you only use during Christmas time.
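The "genre=Christmas" idea could be expressed as a predicate over a track's tags rather than a directory rule. A hypothetical sketch (nothing like this exists in SqueezeCenter today; all names are invented):

```python
# Tag-driven library membership: each library is defined by a rule
# over a track's tags, so membership is independent of where the file
# lives on disk. Purely illustrative.
LIBRARY_RULES = {
    "Christmas": lambda tags: tags.get("genre") == "Christmas",
    "Everything": lambda tags: True,
}

def libraries_for(tags):
    """Return the names of all libraries whose rule matches these tags."""
    return [name for name, rule in LIBRARY_RULES.items() if rule(tags)]

print(libraries_for({"genre": "Christmas", "artist": "Various"}))
print(libraries_for({"genre": "Rock"}))
```

A plugin could then contribute new rules without touching the directory layout, and one file could belong to several libraries at once.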

Storing the library database together with the music files will not be accepted by many Linux users. I suspect many of the more advanced Linux users have done as I have and mounted the music files directory as read-only, to make sure no application can trash the music files by accident. However, as long as there is a fallback that makes it possible to store the music database in the cache directory or somewhere similar, I'll be happy. For development purposes I would like it to be either a startup option or a preferences setting, so you can easily have different libraries that use the same music files. Today I use different start scripts for SqueezeCenter with the --cachedir parameter pointing to different Cache directories, which also means different database files.



The other big game-changing component of the design is to drop the idea of a one-size-fits-all database schema completely. Instead, that's replaced with a language for defining the schema

I really hope there is a plan for external data sources and third party scanners in this new solution.

IMO, one of the main limitations in SqueezeCenter today is that it isn't possible to include things in the SqueezeCenter database that aren't stored in tags.

IMO, tags should only be one source of information; it should be possible to retrieve other information from external internet sources and other applications. Externally retrieved information will not fit into the "artist role" concept; we are talking about things like "similar artists", "ratings", "categorisation/tagging", "related artists" and "concert locations".

Personally I would have preferred a standard schema that covers all types of information, but I can understand that for performance reasons it might be better to let each user define their own schema customizations.



The core of it will be a new independent chunk of code that implements a generic media library service and API, which can then be plugged into the back of SC in place of the current code. It will encapsulate all physical access to the on-disk library of media files itself as well as the database of metadata, the scanning to import that metadata, and the APIs for search/filter/browse/etc.

I suppose this means that third-party plugins can forget about using SQL directly?
I suspected this road was going to be closed at some point, and my gut feeling tells me this new solution is the end of it.

Or will there be some third party plugin api that makes it possible for third party plugins to extend the new generic library service ?

For any users of my Custom Browse, MultiLibrary and SQL Playlist third-party plugins reading this: if my feeling is right, this new schema probably means the end of the road for most (maybe all) of these plugins, so if you use them for a specific reason, make sure to wish for similar functionality in the new schema.
These plugins have always felt like a temporary solution until SqueezeCenter catches up and implements real support for similar functionality, so this isn't a bad thing as long as everything you need is included.

Philip Meyer
2008-09-28, 01:18
>My gut feeling tells me that this means that the only way to define
>which music that should go into a specific library is by putting it
>under a specific main directory ?
>
That was my worry too. Any design for the SqueezeCenter database should not put dependencies on the user's source folder structure. Someone may want a library of classical music; someone else may want a library of live music. There could be an overlap, or there could be several source folders that contain both classical and live music. The scanner should get the data into the SqueezeCenter library, and from there a user should be able to decide how to segregate it into music libraries.

>I would have preferred a solution where the tag information in the
>files could be used to define which music that goes into which library.
>
I think that is essential. An obvious configuration of a library for many users would be to base the content on a specific genre.

>This would make it more flexible and independent of the directory
>structure on the disk.
It needs to be totally independent of the source folder structure. A constraint that libraries could only be a sub-section of a main source folder would also be bad.

>Storing the library database together with the music file will not be
>accepted by many Linux users.
If all library information is stored in one place, such that it is always available to SqueezeCenter, SqueezeCenter would always see the library content even when a particular source folder is not currently visible (e.g. an external hard disk turned off). That would perhaps be desirable in this case (e.g. SqueezeCenter could detect when libraries are visible or not; the UI could show music info in grey if it is in a library that is off-line).


>I really hope there is a plan for external data sources and third party
>scanners in this new solution.
>
If the schema is not concrete, external data sources (plugins and other apps) will not know how to interpret the content; they would need extra information from SqueezeCenter, e.g. an API to describe the schema. I can't see that happening.

>IMO, one of the main limitations in SqueezeCenter today is that there
>isn't possible to include things in the SqueezeCenter database that
>isn't stored in tags.
>
I thought it was possible to create your own tables to store information from other sources, eg. ratings.

>Personally I would have preferred a standard schema that covers all
>type of information, but I can understand that for performance reasons
>it might be better to let each user define it's own schema
>customizations.
>
I would have thought for performance reasons, it's a good idea not to customise the schema.

>I suppose this means that third party plugins can forget to use SQL
>directly ?
>I suspected this road was going to be closed sometime and my gut
>feeling tells me this new solution is the end of it.
>
>Or will there be some third party plugin api that makes it possible for
>third party plugins to extend the new generic library service ?
>
I would have thought that the target would be for plugins not to need to write SQL, but to use the SqueezeCenter DB interface. That API could then improve responsiveness through cached queries, etc.

>For any users of my Custom Browse, MultiLibrary and SQL Playlist third
>party plugins that read this. If my feeling is right, this new schema
>probably means the end of the road for most(maybe all) of these
>plugins, so if you use these plugins for a specific reason, make sure
>to wish for similar functionality in the new schema.
>
>All these plugins has always felt like a temporary solution until
>SqueezeCenter catch up and implement real support for similar
>functionality, so this isn't a bad thing as long as everything you need
>is included.
>
I don't imagine for one moment that SC would ever contain all of the features provided through your plugins :( It's not just their ability to provide customisable ways of finding music to play, but also the configurable UI they provide to access that functionality.

I suggest that any developer working on the new schema who wants to see the capabilities users may expect should try using Erland's plugins.

Phil

eLR!C
2008-09-28, 01:21
Well, it seems I'm maybe the only one here who sees the migration to SQLite as a good thing: I like the idea of being able to back up the whole database and restore it with a simple rsync operation.

By the way, I hope the new metadata schema won't decrease performance (flat tables are always quicker for read operations...). Let's wait for a diagram or some code examples so that we can understand the details.
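One caveat with the rsync idea: copying a live SQLite file mid-write can produce an inconsistent snapshot. SQLite's online backup API takes a consistent copy instead; a minimal sketch using Python's stdlib sqlite3 module (3.7+), with in-memory databases standing in for the real files:

```python
import sqlite3

# The backup API copies a consistent snapshot of the source database,
# even while it is in use. Table contents here are placeholders.
src = sqlite3.connect(":memory:")  # stand-in for the library database file
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()

dst = sqlite3.connect(":memory:")  # in practice, a file such as backup.db
src.backup(dst)                    # consistent online copy

print(dst.execute("SELECT x FROM t").fetchone())
```

Rsync of the file is fine when the server is stopped; the backup API (or the sqlite3 shell's .backup command) is the safe option while it is running.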

erland
2008-09-28, 01:59
>I really hope there is a plan for external data sources and third party
>scanners in this new solution.
>
If the schema is not concrete, external data sources (plugins and other apps) will not know how to interpret the content; they would need extra information from SqueezeCenter, e.g. an API to describe the schema. I can't see that happening.

I really hope you are wrong about this; I would really miss the possibility of integrating information stored in places other than tags.

I don't think plugins for this would need to use the schema directly; a plugin would typically just need access to an API that makes it possible to add a custom attribute to a track, artist or album. Additionally, the plugin would of course have to be executed during the scanning process.



>Or will there be some third party plugin api that makes it possible for
>third party plugins to extend the new generic library service ?
>
I would have thought that the target would be for plugins not to need to write SQL, but to use the SqueezeCenter DB interface. That API could then improve responsiveness through cached queries, etc.

In theory I agree; however, to make that possible, the SqueezeCenter DB interface has to expose the same functionality as the underlying SQL database.

Today this connection is pretty tight, but I suspect the planned generic media library service, with an API to search/browse the database, will be more limited than today's possibilities.

This will be fine for 95% of all plugins; most plugins only want to list all tracks, artists or albums, get a specific track, artist or album, or do some simple querying like getting all tracks in a specific genre. However, it will probably not work for more advanced queries like the ones I use in, for example, TrackStat and Custom Browse.

I was thinking that it might be possible to provide an API in the media service that lets third-party plugins implement their own query functions, which could internally use SQL or DBIx directly if they want to. These query functions could then be exposed by the media service through its API. Kind of similar to what I think MusicIP does with filters and recipes.

Abstracting the database is a good thing in most cases, but if you do it the wrong way it also means that you can't use the possibilities of the underlying database engine.

With the new schema's multiple databases it might be hard for a plugin to extend the database with new tables, and the API on top of the media service might not make it possible to use the information in custom tables anyway. Today I use custom tables in the Dynamic Playlist, Custom Scan, Multi Library and TrackStat plugins.

But it feels like we are only guessing at the moment; it will probably get a lot easier to understand the new schema as soon as we see some code in svn.



I don't imagine for one moment that SC would ever contain all of the features provided through your plugins :( It's not just their ability to provide customisable ways of finding music to play, but also the configurable UI they provide to access that functionality.

I'm pretty sure my plugins provide more functionality and more customization than what's actually required by 95% of their users.

So I don't think SC will need to support all the features; it would be enough if it supported the most important ones. However, I currently have no idea which features are most important.

It's hard to say what the possibilities of using them with the new schema will be until we have seen some kind of API for the new media service. However, plugins like TrackStat (statistics browsing), Custom Browse and SQL Playlist (the more advanced playlists) depend heavily on the possibilities offered by the SQL language. So unless the new API offers the same functionality, I can't see a way to offer the same features after the schema change.

But again, we really need to see some code before we know for sure.

Philip Meyer
2008-09-28, 02:07
>Well, it seems I'm maybe the only one here who sees the migration to
>SQLite as a good thing: I like the idea of being able to back up the
>whole database and restore it with a simple rsync operation.

Surely it's easy to back up MySQL DBs too?

My understanding of the current plan is that there could be more than one SQLite DB, one per source music folder. Therefore you'd have to find each one to back it up.

But why the need to back up at all? In most cases it would be better to recreate the database by reading the source file tags again, unless there is other information in the DB(s) that you need to restore (such as ratings). I've always preferred to export the volatile parts out of the DB and back them up separately.
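Exporting the volatile parts separately, as described, could be as simple as dumping a ratings table keyed by file path. A hypothetical sketch (the table and column names are invented for illustration):

```python
import json
import sqlite3

# Export only the volatile data (ratings, play counts) keyed by file
# path, so it survives a full database rebuild. Schema is invented.
con = sqlite3.connect(":memory:")  # stand-in for the library database
con.execute("CREATE TABLE ratings (path TEXT PRIMARY KEY, rating INTEGER)")
con.execute("INSERT INTO ratings VALUES ('/music/a.flac', 5)")
con.commit()

dump = {path: rating for path, rating in con.execute("SELECT path, rating FROM ratings")}
backup = json.dumps(dump)  # write this string to a backup file

print(json.loads(backup))
```

After a rescan, the exported data could be re-applied by matching on path (or, more robustly, on MusicBrainz IDs or checksums as mentioned earlier in the thread).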

dean
2008-09-28, 07:34
On Sep 28, 2008, at 1:59 AM, erland wrote:
> But it feels like we are only guessing at the moment, it will probably
> get a lot easier to understand the new schema as soon as we get to see
> some code in svn.

Agreed. Maybe I jumped the gun here a little by starting the
discussion (Brandon wanted to wait until there was some code to
publish, but I pushed ahead).

My goal was to get folks thinking about the problem and bring up
issues (like the one about permissions in the library volume) early.
More eyes are better, as long as the conversation doesn't devolve to a
point where nothing can get done.

dean
2008-09-28, 07:39
On Sep 27, 2008, at 11:43 PM, erland wrote:
> I've obviously misunderstood something; I thought NAS boxes weren't
> the target platform of SqueezeCenter.
One of the goals of this New Schema effort is to reduce the system
requirements for SqueezeCenter. Even on desktop computers, SC's
memory and CPU footprint is too big. And the demand for smaller,
embedded versions of SC is growing.
>
> As I understand it, the web interface has always been slow on NAS
> boxes, so I suppose this means you are going to make other
> optimizations for slow hardware beyond just the database switch?
Yes and some already have been done. 7.2.1 is MUCH more efficient
than 7.1 on NAS drives like the ReadyNAS.

> As a user who runs SqueezeCenter on a real computer, I see nothing
> good coming from a switch to SQLite. However, as long as you make
> MySQL an option for advanced users, I'm happy.
Yep, that's part of the plan.

peterw
2008-09-28, 09:12
>Storing the library database together with the music file will not be
>accepted by many Linux users.
If all library information is stored in one place, such that it is always available to SqueezeCenter, SqueezeCenter would always see the library content even when a particular source folder is not currently visible (e.g. an external hard disk turned off). That would perhaps be desirable in this case (e.g. SqueezeCenter could detect when libraries are visible or not; the UI could show music info in grey if it is in a library that is off-line).


I also would like to see SC7 continue to work with read-only file libraries. Another benefit is that it's more likely the owner could put the music files on separate drives and spin them down when the SBs are not playing local music, for power, noise, and heat savings.

-Peter

JJZolx
2008-09-28, 09:24
One of the goals of this New Schema effort is to reduce the system
requirements for SqueezeCenter. Even on desktop computers, SC's
memory and CPU footprint is too big. And the demand for smaller,
embedded versions of SC is growing.

The memory size of bundled MySQL, as it's configured with SqueezeCenter, is a fraction of that of SqueezeCenter itself. I expect SQLite will use a lot more in-memory data structures, and you're just moving more database code into SqueezeCenter itself. Is there really much, if anything, to be gained?

One thing that I'm reading into the "flexible schema" idea is that a user would be told to use a different schema for, say, a classical music library. If that's the case:

- Will it be possible to have combined libraries?
- Or do a search across more than one library?

If I do a search for "Yo Yo Ma", will I be able to pull up his classical works, as well as his popular ones? I see keeping a separate library as a poor approach to the cataloging challenges presented by classical collections. But maybe I'm reading too much into it.

max.spicer
2008-09-28, 11:34
I also would like to see SC7 continue to work with read-only file libraries. Another benefit is that it's more likely the owner could put the music files on separate drives and spin them down when the SBs are not playing local music, for power, noise, and heat savings.

-Peter

Hear, hear! I really don't want to have to give any music program write access to my library unless I think it should be updating it. I've spent far too long ripping and tagging to have a bug in a program trash my library! No disrespect to SqueezeCenter.

Max

PS Yes, I've got backups, but that's not the point.

eLR!C
2008-09-28, 11:48
But why the need to backup? In most cases it would be better to recreate from reading the source file tags again. Unless there is other information in the DB(s) that you need to restore (such as ratings). I've always preferred to export the volatile parts out of the DB and back them up separately.

You are right. I was thinking that the next version (with metadata) may include tagging that is not in the media file itself, in which case a backup would be really useful (BTW, my server crashed 2 weeks ago and I would have been really happy to restore everything with a simple rsync operation).

Regarding the need to back up the DB separately, let's just say that at home I'm no "admin" and I don't want to elaborate complex backup strategies ;)

Anyway, since the MySQL backend will still be available, each of us will be happy with the next release :)

blblack
2008-09-29, 10:13
Well now that Dean let the cat out of the bag, I'll try to do a quick summary response to the major points raised here:

"I still want MySQL" - the backend database API will still be DBIx::Class, using ->deploy() support, so there's really no reason it can't support MySQL for power users and custom builds as well, that just won't be the default.

"I don't want the database stored with the library" - As noted on the wiki page, there should be support for optionally (or in the case of readonly, perhaps automatic) storing the database elsewhere. Aside from the readonly thing, another reason for this is performance. Some USB sticks might have the I/O rate to support streaming MP3s, but underperform compared to your main HDD when it comes to creating and searching a SQL database.

"Why not have a universal static schema?" - That's what we've been trying to have for years. I suppose eventually we could add enough columns to the database to support 99% of the users, but then most of those columns will go unused for most of the users anyways, and the 1%-ers will still make noise. This gives a level of flexibility that allows us to support all niche users without making the DB indexes larger than necessary for the common case. Also, while it is an abstract concern at this time, this type of flexibility paves the way for other future library types supported under the same system (as in photo, video, mixed-media, etc).

"With a dynamic schema nobody can write plugins that touch the DB" - That's true. I don't want any code outside of this new media library code touching the DB, whether it's a plugin or core SC code. One of the big benefits here is getting a strong separation between this code and the rest of SC, through a well-defined API. Anything (within reason) that people want to do in terms of customizing the scanner operations should be doable via the schema definition file (if not, then the code needs to be upgraded so that you can). On the other end for read operations, the API can be made rich enough to support your plugin without needing to let your plugin violate the API barrier. And yes, the API will include self-description. It's expected that code using the API will make initial calls to determine what types of metadata exist before executing queries on those metadata, for instance.

"What about non-tag-sourced metadata" - this is a really tough one, suggestions welcome. The current approach of this new design is that the library databases are essentially throwaway library metadata indexing systems, with no original content of their own. This has a lot of benefits, including that all code other than the scanner doesn't need to do any writes, that denormalization for performance doesn't really have a downside, and that users never "lose" anything by trashing the database - there should never be a reason to back it up.

The downside is there's no room in that model for things like ratings, which are dynamically updated via SC during "read-only" operations like browsing and playing. One option is to put them in the central database rather than the library database, although keeping that in sync is problematic (but doable by keying on the library name and the physical pathname within the library for per-track data). Chances are that with multi-user support most of these things (like ratings) are no longer purely track attributes anyways, but user<=>track attributes, and users will only exist in the central database.

It's going to be painful, and I fully expect backlash from the developer community, which is kinda why I wanted to get some draft code knocked out first to give you a better idea what this looks like. Coming soon on that.

Please keep in mind also that we as developers are 1%-er's too. Some of the concerns driving these changes are about making life easier for "normal" users - people who buy a product from a major retailer, click whatever looks like a "next" button 15 times in a row, and then hit play.

JJZolx
2008-09-29, 10:41
With multiple library databases, where will things like user profiles be kept? Will there be a "main" database or a database for things other than cataloging music libraries?

Will persistent data like track ratings and number of plays be kept per database? Per user per database?

blblack
2008-09-29, 10:50
With multiple library databases, where will things like user profiles be kept? Will there be a "main" database or a database for things other than cataloging music libraries?

Yes. The main thing being moved out to these per-library databases is basically anything that's sourced from scanning tags.


Will persistent data like track ratings and number of plays be kept per database? Per user per database?

Things like rating will have to be per-user per-track, and if they're in the central database with the user profiles, then the full key will need to be something like userid, libraryid, trackpath (tracks might have numeric ids internally assigned as well, but they can't be presumed consistent between rescans).
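To make the keying idea concrete, here's a minimal sketch of a central ratings table keyed by (userid, libraryid, trackpath), as described above, so ratings survive rescans that reassign numeric track ids. All table and column names are invented for illustration; this is not SC code, and it uses SQLite via Python just to demonstrate the composite key.

```python
import sqlite3

# Illustrative only: central DB ratings keyed by (user, library, track path),
# independent of any numeric track ids that rescans may reassign.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE rating (
        userid    INTEGER NOT NULL,
        libraryid INTEGER NOT NULL,
        trackpath TEXT    NOT NULL,   -- physical path within the library
        stars     INTEGER CHECK (stars BETWEEN 0 AND 5),
        PRIMARY KEY (userid, libraryid, trackpath)
    )
""")

# REPLACE keeps exactly one row per (user, library, track).
db.execute("REPLACE INTO rating VALUES (?, ?, ?, ?)",
           (1, 1, "Artist/Album/01. Track.flac", 4))

stars = db.execute(
    "SELECT stars FROM rating WHERE userid=? AND libraryid=? AND trackpath=?",
    (1, 1, "Artist/Album/01. Track.flac")).fetchone()[0]
print(stars)  # -> 4
```

The point of the composite primary key is that trashing and rebuilding a library database never touches this table; only a path change within the library can orphan a row.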

Mark Miksis
2008-09-29, 11:51
I'm not sure who's driving the "user profiles" effort for 7.3, but it might be useful to see a similar wiki description of the design goals alongside this discussion. Seems like there's a lot of interaction between the two...

CatBus
2008-09-29, 12:02
"I still want MySQL" - the backend database API will still be DBIx::Class, using ->deploy() support, so there's really no reason it can't support MySQL for power users and custom builds as well, that just won't be the default.

I honestly don't care one way or the other on this issue, but if the reason for SQLite is for better performance on low-end NAS devices, and NAS builds are typically custom builds, then wouldn't it make sense to have the default for non-NAS builds be MySQL and the default for NAS builds be SQLite?

erland
2008-09-29, 13:39
"What about non-tag-sourced metadata" - this is a really tough one, suggestions welcome. The current approach of this new design is that the library databases are essentially throwaway library metadata indexing systems, with no original content of their own. This has a lot of benefits, including that all code other than the scanner doesn't need to do any writes, that denormalization for performance doesn't really have a downside, and that users never "lose" anything by trashing the database - there should never be a reason to back it up.

Does all this also mean that with the current approach it is impossible for a plugin to store custom attributes in the database?

Today several of my plugins create additional tables in the database to store their information. Would this be impossible with this new approach, or will the plugin code be able to add attributes to the schema and then use the official media service API to read and write those attributes?




The downside is there's no room in that model for things like ratings, which are dynamically updated via SC during "read-only" operations like browsing and playing.

Well IMHO we really need a way to handle ratings and play statistics, so this needs to be solved.



One option is to put them in the central database rather than the library database, although keeping that in sync is problematic (but doable by keying on the library name and the physical pathname within the library for per-track data). Chances are that with multi-user support most of these things (like ratings) are no longer purely track attributes anyways, but user<=>track attributes, and users will only exist in the central database.

I'm not sure if this fits into the new scanner code, but the following enhancement request contains some extensions to the current scanner:
http://bugs.slimdevices.com/show_bug.cgi?id=6023

The patch provided in that enhancement report makes it possible for a plugin to register callback functions that should be called whenever a track is added or updated. This makes it a lot easier to keep things in sync if you have an external storage that needs to be synchronized with the SqueezeCenter database. Although, I'm not sure if this solution fits in the new schema design or not.

I've tried to keep things in sync by using URLs and MusicBrainz identifiers with the current SqueezeCenter database and custom tables. It gets very messy, so I would recommend that you avoid that route unless there is at least some support for getting events when a track changes in the library database. The URL/path is not enough, because it means that you lose all your custom data when you move the music folder to a new drive or a new machine, so if you decide to go this route you at least need to combine it with MusicBrainz identifiers or something similar.

It's hard to make further suggestions before we get to see some code, it still feels like I'm guessing a bit regarding how the new schema solution will work.

However, my spontaneous idea is that it would be useful to have some kind of API for scanner plugins. You need to support the bundled MusicIP (MusicMagic) and iTunes plugins anyway, so at least create an API which they use to write their scanned data into the SqueezeCenter database. This, together with some event mechanism which third-party plugins could use to detect when a track, artist or album has been changed, added or deleted in the library database, would enhance the possibilities a lot. Another big problem for third-party developers in the current scanner solution is that there is no way for a third-party plugin to run code within the scanner process; it is currently hard-coded to use ONLY the bundled MusicIP and iTunes plugins.

For me it's critical that I can persistently store custom things related to a track, album, artist from a plugin. If I can't do it through the API and I can't do it some unofficial way, most of my plugins won't work with the new schema. If that happens, I suppose I can always remain on 7.2 forever, but I really hope the possibilities with the new schema from a third party plugin point of view doesn't get that limited.



It's going to be painful, and I fully expect backlash from the developer community, which is kinda why I wanted to get some draft code knocked out first to give you a better idea what this looks like. Coming soon on that.

I'm looking forward to it.

I would recommend that you prioritize getting the code to a state where you feel comfortable showing it instead of trying to answer all the questions in this thread. It's easy to get the wrong idea and ask stupid questions when we only have some brief documentation to look at.



Please keep in mind also that we as developers are 1%-er's too. Some of the concerns driving these changes are about making life easier for "normal" users - people who buy a product from a major retailer, click whatever looks like a "next" button 15 times in a row, and then hit play.

Agreed, previously SqueezeCenter has focused a lot on the advanced users, so some focus on the "normal" users is definitely the right direction from a Logitech point of view.

It will probably get a lot of the advanced users a bit upset, but as long as this is only a small percentage of the total number of users it probably doesn't matter.

gharris999
2008-09-29, 14:55
Brandon: I'd like to ask a couple of nitty-gritty questions:

1). Will the new schema scheme mean the end of SC support for audio formats (e.g. flacs) with embedded cuesheets?

2). Most of the new schema talk I've seen has centered around providing more flexibility in terms of cataloging relationships between contributors and their roles. What about genres? Is the new schema definition language likely to support the concept of sub-genres?

E.G., if I want to browse my library like this:


Genre: 'Classical era' -> SubGenre: 'Chamber Music' -> SubSubGenre: 'String Works' -> Composer: 'Beethoven, L' -> Albums: ...list of albums having the above attributes..

...am I likely (assuming coherent tags in the cuesheet) to be able to do this under the new system?

peterw
2008-09-29, 15:07
"I don't want the database stored with the library" - As noted on the wiki page, there should be support for optionally (or in the case of readonly, perhaps automatic) storing the database elsewhere. Aside from the readonly thing, another reason for this is performance. Some USB sticks might have the I/O rate to support streaming MP3s, but underperform compared to your main HDD when it comes to creating and searching a SQL database.

And USB disk drives. My 80 GB Video iPod is dog slow at USB data transfer compared to a 2.5" drive in an el cheapo enclosure. I thought the Controller hardware supported USB host mode. So with the increasing importance of SqueezePlay, it's not hard to imagine a Squeezebox v4 with a USB port that could build a SQLite representation of my iPod's MP3s and send that database back to the SC7 host. If SC7 could deal with multiple sq3 databases and Squeezebox v4 ran Linux, we could have a nice way of allowing any SB in the house (even a SLIMP3?) to access music on an iPod connected to one SB4... Mmmmm!

Philip Meyer
2008-09-29, 16:23
Hi Brandon,

Thanks for providing more info - relieves a few concerns. I'm excited by the concepts in general, but the most important goal I'm interested in is performance (throughout the app).

Are there any timeframes at all for these things? Will it all be done in one release, or will we see a gradual change?

>"I still want MySQL" - the backend database API will still be
>DBIx::Class, using ->deploy() support, so there's really no reason it
>can't support MySQL for power users and custom builds as well, that
>just won't be the default.
>
Great!

>"I don't want the database stored with the library" - As noted on the
>wiki page, there should be support for optionally (or in the case of
>readonly, perhaps automatic) storing the database elsewhere. Aside
>from the readonly thing, another reason for this is performance. Some
>USB sticks might have the I/O rate to support streaming MP3s, but
>underperform compared to your main HDD when it comes to creating and
>searching a SQL database.
>
Yes, I can see many reasons for not doing it. I haven't seen a good reason for the change yet. I can't think of any other app that has ever stored its database alongside the source content. SC7 made great efforts to move cache, settings and logs into the correct place for each OS, and this seems to be moving away from that decision?

>"Why not have a universal static schema?"
I'm not against this in concept, if it works and doesn't cause inefficiencies. I've had bad experiences with things such as object-relational mappings that provide abstract access to the data layer. Things start out good, and then get chucked out because speed isn't good enough.

It's not clear exactly how this will work in combination with other goals. E.g. many users have music libraries containing both popular music and classical. Would they be able to have two databases - one for the popular music content with the default schema, and one for the classical content with an alternative schema? What is the relationship with the scanning code - a different scanner per schema? How will that work when the pop and classical source files are in the same source folder structure? How will the UI work - will it be transparent as to what schema type will be used for each type of library? How would browsing/searching work across libraries with different schemas? It would seem an almost total rewrite of the whole application would be required to access different possibilities when new schema types are introduced.

Would the default schema be pop-oriented, and remove support for classical tags to gain speed improvements - ie. would not support composer, conductor, orchestra tags?

>"With a dynamic schema nobody can write plugins that touch the DB" -
>That's true. I don't want any code outside of this new media library
>code touching the DB, whether it's a plugin or core SC code.
>
That's a good thing, but the API needs to be well documented and easy to use.

It will mean other applications (such as rich GUIs like Moose) would have to go through the Perl API or the CLI to reach the database. There are applications today, written in other languages, that access the database directly. Maybe the CLI needs to be enhanced to provide the same level of access as the DB API.

I guess I am worried about the transition period, where there are many plugins that won't work and would require rewrites to get going. The uptake to the new SC may be slow as people don't want to lose their favourite plugins.

>"What about non-tag-sourced metadata" - this is a really tough one,
>suggestions welcome.
>
Could store non-tag metadata in a different database - keep it away from the databases that can be trashed and rebuilt.

>The downside is there's no room in that model for things like ratings,
>which are dynamically updated via SC during "read-only" operations like
>browsing and playing. One option is to put them in the central database
>rather than the library database, although keeping that in sync is
>problematic (but doable by keying on the library name and the physical
>pathname within the library for per-track data).
>
I believe there was talk to avoid physical pathname dependencies. If possible use another identification mechanism such that source files could move and a rescan would detect that a file on a new path is the same as a previously scanned song and thus ratings etc would not be lost.

>It's going to be painful, and I fully expect backlash from the
>developer community, which is kinda why I wanted to get some draft code
>knocked out first to give you a better idea what this looks like.
>Coming soon on that.
>
Great.

>Please keep in mind also that we as developers are 1%-er's too. Some
>of the concerns driving these changes are about making life easier for
>"normal" users - people who buy a product from a major retailer, click
>whatever looks like a "next" button 15 times in a row, and then hit
>play.
>
In my day job I'm a developer, and although I have developed a few (simple!) SC plugins and patches, I'm fairly practical-minded when it comes to using SC and functionality. I need SC to be simple for my wife to use too!

Phil

Philip Meyer
2008-09-29, 16:28
>Will persistent data like track ratings and number of plays be kept per
>database? Per user per database?
>
As well as providing database schema changes and a nice API, is anyone actively thinking how the UI will work? eg. if there's more than one user profile, how to change users on each type of UI (web UI, SB player UI, controller, CLI). Can different players have different user profiles selected? There's all kind of considerations such as effects on synced playback if one of the players in the sync group is switched to a different user.

Philip Meyer
2008-09-29, 16:53
>> Please keep in mind also that we as developers are 1%-er's too. Some
>> of the concerns driving these changes are about making life easier for
>> "normal" users - people who buy a product from a major retailer, click
>> whatever looks like a "next" button 15 times in a row, and then hit
>> play.
>>
>Agreed, previously SqueezeCenter has focused a lot on the advanced
>users, so some focus on the "normal" users are definitely the right
>direction from a Logitech point of view.
>
>It will probably get a lot of the advanced users a bit upset, but as
>long as this is only a few percentage of the total number of users it
>probably doesn't matter.
>
Not sure I totally agree. Obviously the common user must be the focus, but I feel that the average user is more tech or gadget-aware. Most portable music players (eg. iPods) support ratings and smart playlists (eg. play music rated 3* or higher). It must be important to make it easy to install and play music. But they have that now in SC7.2 - none of these things will make that easier (can only get harder).

I don't think many of the changes are about making life easier for "normal" users - they are answering demands for functionality that normal users have been requesting for the last few years.

Eventually all new users will want to do something more advanced than plug-and-play - whether it be new ways of browsing/searching their music library or adding ratings, etc. A casual user is not likely to need all functions, but a collection of casual users may collectively want a large proportion of the functions.

The really big issue to be resolved is what the new UI will be like to support the extra functionality without being complicated for new users.

I suggest the installer would by default create a single library, with a single user profile. The user would not see any user-selection or library selection options unless additional users/library are configured.

Will there be one SqueezeNetwork account per user profile?

erland
2008-09-29, 21:52
Things like rating will have to be per-user per-track, and if they're in the central database with the user profiles, then the full key will need to be something like userid, libraryid, trackpath (tracks might have numeric ids internally assigned as well, but they can't be presumed consistent between rescans).

Please just remember that when implementing smart playlists we need to be able to combine the rating and statistic information in the central database with the category information in the library databases with decent performance.

For example, queries like:
"All tracks in Pop genre with a rating>3"
"100 most played tracks in Rock genre"
"100 top rated tracks by artist yyy which belongs to user/library xxx"
"All unrated tracks within the Jazz genre"
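The example queries above can be sketched against SQLite using ATTACH, which is presumably how a central ratings database and a per-library database would be joined. The schema and names below are invented for illustration; the real schema is still being designed.

```python
import sqlite3

# autocommit mode, so ATTACH isn't blocked by an implicit open transaction
central = sqlite3.connect(":memory:", isolation_level=None)
central.execute("CREATE TABLE rating (userid INT, trackpath TEXT, stars INT)")
central.execute("INSERT INTO rating VALUES (1,'pop1.flac',5), (1,'pop2.flac',2)")

# Attach a (hypothetical) per-library database alongside the central one.
central.execute("ATTACH ':memory:' AS lib")
central.execute("CREATE TABLE lib.track (path TEXT, genre TEXT)")
central.execute("INSERT INTO lib.track VALUES "
                "('pop1.flac','Pop'), ('pop2.flac','Pop'), ('jazz1.flac','Jazz')")

# "All tracks in Pop genre with a rating > 3"
rows = central.execute("""
    SELECT t.path
      FROM lib.track AS t
      JOIN rating AS r ON r.trackpath = t.path
     WHERE t.genre = 'Pop' AND r.userid = 1 AND r.stars > 3
""").fetchall()
print(rows)  # -> [('pop1.flac',)]
```

Cross-database joins like this work, but the performance Erland asks about depends on indexes on both sides of the join, which is exactly the kind of thing the schema design would need to guarantee.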

Will the library service API cover both the library databases and the central database, or will it just cover the library databases?

Will it be possible to share information between different user profiles? For example, having shared playlists and shared libraries?

hickinbottoms
2008-09-30, 00:03
"What about non-tag-sourced metadata" - this is a really tough one, suggestions welcome. The current approach of this new design is that the library databases are essentially throwaway library metadata indexing systems, with no original content of their own. This has a lot of benefits, including that all code other than the scanner doesn't need to do any writes, that denormalization for performance doesn't really have a downside, and that users never "lose" anything by trashing the database - there should never be a reason to back it up.


Thinking about my LazySearch plugin, I'm keen that plugins can have additional tags on tracks even if they are not persistent over rescans as this would fulfil my plugin's requirements (although clearly not for some other 3rd party plugins).

From reading the wiki page I think the plan is to support this using some kind of schema definition file stored with the library. However, I think an important requirement is how that schema can be extended by plugins. At the moment the wiki page says that any modification to the schema requires a full clear-and-rescan, and suggests that for new 'tags' to be supported this central file must be modified. That would make installing plugins such as mine much more difficult and, presumably, wouldn't make the transition towards a built-in plugin manager any easier either, since it would involve more than just controlling the files in the plugin folder.

May I suggest that there is some way for plugins to overlay some kind of 'custom' plan onto the library plan to allow this to be handled automatically? Something similar to how they can have their own custom filetype or format maps at the moment would do. There would probably have to be restrictions to that, such as only adding to the existing plan and the additional tags must not already be in the plan.

I also second Erland's request for the scanner hooks - I need those to add the custom data to the database during scanning without complex and unreliable hacks to try and detect when the database has changed.

Interesting, though - looking forward to seeing some of these ideas in practice. I'm not particularly looking forward to updating my plugin again (I still get the shivers when I think back to the transition to DBIx), but that's just being selfish!

Thanks for keeping us updated.

Stuart

cdoherty
2008-09-30, 11:41
On Mon, Sep 29, 2008 at 12:02:46PM -0700, CatBus said:
> I honestly don't care one way or the other on this issue, but if the
> reason for SQLite is for better performance on low-end NAS devices, and
> NAS builds are typically custom builds, then wouldn't it make sense to
> have the default for non-NAS builds be MySQL and the default for NAS
> builds be SQLite?

I'd imagine that one of Logitech's new Slim product ideas is something
like a one-piece Squeezebox+NAS device, so people don't have to have a PC
running at all; and that that's what's driving the SQLite thing. Which
means, among other things, that the NAS-like deployment won't be a custom
build any more. =)

(It's a fine product idea; whether SQLite is a needed part of the
solution seems to be up for debate. I'm a little skeptical myself, just
based on Firefox 3's SQLite experience, but I have a server at home and
I'm not really impacted.)

Chris




-------------------------------
Chris Doherty
chris [at] randomcamel.net

"I think," said Christopher Robin, "that we ought to eat
all our provisions now, so we won't have so much to carry."
-- A. A. Milne
-------------------------------

bossanova808
2008-09-30, 23:19
One request - please whatever you do, don't tie anything to location on disk of a track as seems to be implied will be the case - people re-arrange things all the time...surely some sort of MD5 or similar approach is the way to get a persistent reference to tracks between scans? I.e. a repeatable, calculated value that links tracks to data such as ratings etc, that is in NO way tied to the physical nature of the tracks on disk.

I'm probably misinterpreting but think it is worth stressing...for example, it would be nice if somehow I could swap in a FLAC version of an album for an MP3 one, and SC were clever enough to go - hey, the track length and metadata match, this must be the same song I have previously stored info about. But even more basic stuff like moving files from one folder to another when I've accidentally dropped something in the wrong spot of my 400GB music library.

Multi libraries/users is a good idea, but all being built off one larger library must be supported - i.e. I can define a 'sub library' for my wife and my daughter, from the main music library (mine). It would certainly not be great to have to have multiple *physical* libraries to achieve this...
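One cheap way to get the path-independent reference bossanova808 asks for is to hash stable tag metadata rather than the audio bytes. This is a hypothetical illustration (not anything SC has committed to), and it deliberately rounds the duration so an MP3 and a FLAC rip of the same recording land on the same key.

```python
import hashlib

def track_key(title, artist, album, duration_secs):
    """Illustrative path-independent track id: hash stable tag metadata,
    which costs nothing compared to decoding audio, so stored info (ratings
    etc.) can follow a track when files move or change container format."""
    basis = "\x00".join([title.lower(), artist.lower(),
                         album.lower(), str(round(duration_secs))])
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()

# Same recording in two containers, slightly different reported durations,
# stored at different paths -> same key:
k1 = track_key("Fortress Europe", "Asian Dub Foundation",
               "Enemy of the Enemy", 233.2)
k2 = track_key("Fortress Europe", "Asian Dub Foundation",
               "Enemy of the Enemy", 233.4)
print(k1 == k2)  # -> True
```

The obvious weakness is that retagging changes the key, which is why acoustic identifiers like MusicBrainz PUIDs keep coming up in this thread as the more robust option.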

Listener
2008-10-02, 13:05
dean and brandon,

I'm one of those niche users who's been waiting for some progress for 2 1/2 years. Thanks for a start.

I hope you will focus on getting good functional specs and performance targets. Many of the posts in this thread seem to skip directly to arguing for or against a particular implementation approach.

Bill

vrobin
2008-10-04, 07:12
I'm watching this topic from a distance, and I'm just dropping this note to insist on the need (IMHO) for some sort of data persistence for stats, ratings, scanning optimization, etc.

If you don't want it in the core API, don't forget to design the new DB architecture with this in mind and initiate an official module for this persistence.

My 2 cents as they say!
PS: all in all, this announcement is really good news, like the "fast forward" rewrite announcement

MrSinatra
2008-10-10, 01:47
i'd like to add my comments to this thread, although some of my ideas may be broader than just the new DB schema design. i did read the wiki article and i agree its all very exciting.

as someone who came to slim stuff after becoming pretty good at winamp and similar apps, i have to say there was a steep learning curve, as i just did not find the experience intuitive.

the first thing i would suggest is that slim needs to make the entire scanner component modular, and allow it to basically be like a plugin, so different users can use different plugins to do their scans. these plugins could have their own set of options that relate just to scanning. obviously slim would provide the default. but imagine picking a scanning plugin that only looks for SOME things, it could be a lot quicker for infrant users, or one that just does artwork, etc...

secondly, i think slim really needs to concentrate on delineating between scanner options, (pre scanning) and library options (post scanning).

obviously the best solution is to design the schema such that it isn't necessary to delineate such a difference and trigger automatic rescans, but if thats not possible, it should be obvious which options are which.

thirdly i hope that the schema will be designed so that slim "logics" such as auto-triggering rescans, Various Artists logic, Greatest Hits logic, and whatever else are NOT necessary. no other program has these "features" which i consider drawbacks and they aren't intuitive. slim should work based off of tags, filenames, and locations by default, and any interpretation of them via such logic should only be a secondary opt in feature, altho i do think you should keep it for those who already use it.

(also, folder location could be used to inform on file DB characteristics)

4th, i don't know if user prefs and config is gonna be in the DB or not, but i also think its important for slim to be able to be able to set different defaults for different types of users for new features. so if u add a new feature, u'd have one default for existing users, and another default for a new user.

sebp
2008-10-13, 15:32
One request - please whatever you do, don't tie anything to location on disk of a track as seems to be implied will be the case - people re-arrange things all the time...surely some sort of MD5 or similar approach is the way to get a persistent reference to tracks between scans?
Don't use MD5 sums.
Please don't!

Why?
Just look at this :

Raoul is a Mac Mini with a 2GHz Core2Duo CPU:


raoul:~ seb$ time flac -dc "/c/media/Music/flac/Asian Dub Foundation/Enemy of The Enemy/01. Fortress Europe.flac" | md5sum

flac 1.1.4, Copyright (C) 2000,2001,2002,2003,2004,2005,2006,2007 Josh Coalson
flac comes with ABSOLUTELY NO WARRANTY. This is free software, and you are
welcome to redistribute it under certain conditions. Type `flac' for details.

01. Fortress Europe.flac: done
632a57170d57f9c2177b846ff90f2647 -

real 0m1.331s
user 0m1.585s
sys 0m0.101s

Computing MD5 sum for this 3m53s song took 1.33 seconds.
I have 10500+ tracks in my collection, so average scanning time overhead would be 3h52m!

Now, the same test on Barney, a ReadyNAS NV+ with a crappy CPU:

barney:~# time flac -dc "/c/media/Music/flac/Asian Dub Foundation/Enemy of The Enemy/01. Fortress Europe.flac" | md5sum

flac 1.1.1, Copyright (C) 2000,2001,2002,2003,2004 Josh Coalson
flac comes with ABSOLUTELY NO WARRANTY. This is free software, and you are
welcome to redistribute it under certain conditions. Type `flac' for details.

01. Fortress Europe.flac: done
632a57170d57f9c2177b846ff90f2647 -

real 0m42.359s
user 0m40.120s
sys 0m1.960s

It took 42.35s to compute one song's MD5 sum!!!
Overhead would be 120+ HOURS! 5 DAYS!

Sure you could use the FLAC file's embedded MD5SUM.
But how would you treat other files?

Just forget about this, please ...
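As sebp notes, FLAC files already carry an MD5 of the decoded audio: it is the last 16 bytes of the mandatory STREAMINFO metadata block, per the FLAC format spec. So for FLAC (though not for other formats) the sum can be read in microseconds instead of decoding the whole file. A minimal sketch, with only basic sanity checks:

```python
import struct

def flac_embedded_md5(path):
    """Read the MD5 of the decoded audio straight from a FLAC file's
    STREAMINFO block, avoiding the expensive full decode timed above.
    STREAMINFO is the first metadata block, fixed at 34 bytes, and its
    final 16 bytes are the MD5 of the unencoded audio data."""
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("not a FLAC file")
        header = f.read(4)                    # metadata block header
        block_type = header[0] & 0x7F         # low 7 bits: block type
        length = struct.unpack(">I", b"\x00" + header[1:4])[0]
        if block_type != 0 or length != 34:   # STREAMINFO must come first
            raise ValueError("missing STREAMINFO")
        info = f.read(34)
        return info[18:34].hex()              # last 16 bytes are the MD5
```

Note this is the MD5 of the raw samples as stored by the encoder, so it identifies the audio content; it won't match the `md5sum` of the WAV stream piped out of `flac -dc` above, and it obviously does nothing for MP3 or other formats.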

sebp
2008-10-13, 16:10
That being said, this new db design sounds really exciting, and I'm both thankful for SD staff to keep in mind that lots of people are actually running SC on low powered devices, and for Erland and Stuart for the work they've accomplished unleashing SC with awesome features.

Back to the custom tables :
I'm not sure how big they could grow, and am not very familiar with SQLite, so forgive my ignorance if this is stupid: why not use this DBMS to store them in the plugin's directory rather than somewhere in the main db schema?
This way they could at least be immune to full main db wipeouts.

Philip Meyer
2008-10-14, 00:58
I assume this new schema development will be done on a separate branch, not directly in 7.3/trunk?

mherger
2008-10-14, 01:02
> I assume this new schema development will be done on a separate branch, not directly in 7.3/trunk?

Correct.

--

Michael

MrSinatra
2008-10-14, 02:28
so if i DL 7.3 betas its not in there? how/where do i get this other branch?

and/or when will it be in the "normal" 7.3 betas?

mherger
2008-10-14, 02:59
> so if i DL 7.3 betas its not in there?

No.

> how/where do i get this other branch?

Don't know yet. It will be announced.

> and/or when will it be in the "normal" 7.3 betas?

When there's code to test.

--

Michael

funkstar
2008-10-20, 01:39
so if i DL 7.3 betas its not in there? how/where do i get this other branch?

and/or when will it be in the "normal" 7.3 betas?
Have a look at the Roadmap (http://wiki.slimdevices.com/index.php/SoftwareRoadmap). It indicates that the new schema could very well slip to 8.0.

I'm not too surprised about that. If there is no code in there at the moment, there is no way it's going to be integrated and fully tested before 7.3 ships at the end of November (Andy suggested this as the ship date for 7.3, as opposed to Nov 1st as listed on the Wiki).

Anyway, a fundamental change in the way the backend database and scanner operate sounds like a full version upgrade to me. SqueezePlay and NewStreaming are more than enough to justify a .1 version update. :)

pounce
2008-11-07, 17:43
One of the goals of this New Schema effort is to reduce the system
requirements for SqueezeCenter. Even on desktop computers, SC's
memory and CPU footprint is too big.

Can we put real numbers on this so there might be target goals on memory and CPU use? What's "too big"?


Knowing that the *only* real driver for this is to support weak devices and systems, it makes me wonder if there might be room for two SC products: a NAS build and a desktop/server build. Sure, splitting the code base for a product is normally a bad idea, but I could see some interesting things coming from this approach. If you create a product specifically for going ultralight, then you are free to cut features (like supporting plugins) to ensure performance and to simplify development. With your heavyweight product you can go crazy and support all customers that want more features in exchange for hardware requirements.

I would imagine that performance tweaks and caching approaches required in the lightweight product would make it into the heavyweight product, thus improving its performance over time. I could also see the two products working together: if a customer had some sort of Slim/NAS device but also had the heavyweight client, maybe the heavy client could do a lot of the heavy lifting for the NAS and populate its database or cache for enhanced performance. I'm just thinking out loud here.

I personally would like to see MySQL for the power and concurrency. SQLite seems like it kills extensibility, and I'm skeptical that the read/write locking of SQLite would be a non-issue; you have to go to multiple DBs to avoid it. My gut says the db choice is going to add much more complexity when it comes to crunching the data. There is a maximum of 30 attached DBs with SQLite on a 32-bit system, and the default is 10. This is going to put a hard limit on using a db per library (if you want to access them all over the same connection).
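
For concreteness, a minimal sketch of SQLite's attached-database mechanism using Python's bundled sqlite3 module (all table and schema names here are invented for illustration, not from any SC design):

```python
import sqlite3

# Main database; a real install would use on-disk paths instead of :memory:.
main = sqlite3.connect(":memory:")

# Attach a second (per-library) database under its own schema name.  Each
# ATTACH consumes one slot against SQLite's compile-time SQLITE_MAX_ATTACHED
# limit (default 10; the post above notes a ceiling of 30 on 32-bit builds).
main.execute("ATTACH DATABASE ':memory:' AS lib_alice")
main.execute("CREATE TABLE lib_alice.tracks (path TEXT PRIMARY KEY, title TEXT)")
main.execute("INSERT INTO lib_alice.tracks VALUES ('/music/a.flac', 'Song A')")

# Queries address attached libraries by schema prefix over one connection.
rows = main.execute("SELECT title FROM lib_alice.tracks").fetchall()
print(rows)
```

So a db-per-library layout is workable on a single connection, but only up to the attached-DB limit, which is the hard cap being pointed out.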

egd
2008-11-09, 04:30
Things like rating will have to be per-user per-track, and if they're in the central database with the user profiles, then the full key will need to be something like userid, libraryid, trackpath (tracks might have numeric ids internally assigned as well, but they can't be presumed consistent between rescans).

Isn't this a good time to think about using Musicbrainz' PUID or something similar to uniquely and persistently identify a track?
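
A minimal sketch of what such a composite key could look like, assuming a hypothetical user_track_stat table (table and column names are illustrative, not from the actual design):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Per-user, per-library persistent data keyed by track path, since numeric
# track ids can't be presumed stable across rescans.
con.execute("""
    CREATE TABLE user_track_stat (
        user_id    INTEGER NOT NULL,
        library_id INTEGER NOT NULL,
        track_path TEXT    NOT NULL,
        rating     INTEGER,
        PRIMARY KEY (user_id, library_id, track_path)
    )
""")
con.execute("INSERT INTO user_track_stat VALUES (1, 1, '/music/a.flac', 4)")
# The same track can carry a different rating for another user.
con.execute("INSERT INTO user_track_stat VALUES (2, 1, '/music/a.flac', 2)")
rating = con.execute(
    "SELECT rating FROM user_track_stat WHERE user_id=? AND track_path=?",
    (1, "/music/a.flac")).fetchone()[0]
print(rating)
```

The composite primary key is what makes ratings per-user per-track; the weak link remains track_path, which is exactly why a persistent track identifier keeps coming up.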

egd
2008-11-09, 05:31
http://www.mysql.com/products/embedded/ Any reason not to go with MySQL Embedded rather than revert to SQLite? I've not heard of a single music application that hasn't experienced performance issues using large databases under SQLite.

pounce
2008-11-09, 10:33
isn't this a good time to think about using Musicbrainz' PUID or something similar to uniquely and persistently identify a track?

This has my vote. I think using the Musicbrainz/MusicIP PUIDs/GUIDs is a good idea. Chomping through a collection to get these would take some time though, and might be a killer for a NAS install... but I think there is great value.

Philip Meyer
2008-11-09, 11:19
>This has my vote. I think using the MusicIP's PUID/GUID's is a good
>idea. Chomping through a collection to get these would take some time
>though and might be a killer for a NAS install...but I think there is
>great value.

I vote AGAINST it.

Reasons are:
1. I won't archive analysis from MusicIP to my files - MusicIP has some issues that trash tag content, and may even affect the music content.
2. MusicIP doesn't support as many formats as SC, so it won't uniquely identify every song.
3. Extra processing time

A solution has to reliably work for all file formats, and be easy to understand for beginners/new customers.

Take the hit - never change source file name/location.
It may be possible to have some logic for detecting when source location has changed.

pounce
2008-11-09, 11:55
I don't know if I would discount using this so quickly. The files don't *have* to be touched; you could just use the info in the db. Since we are just talking in general, and really discussing how you might technically use the info, I think we can keep an open mind.

I don't have any issues with using as many tracking IDs as possible. They can't be the main index for the tracks anyway, since the db is not complete.

http://musicbrainz.org/stats.html

You will also need to be able to track multiple versions of the same track in different formats...

Having the PUIDs could only add value and speed things up over time. Each person may also be helping to perfect the Musicbrainz/MusicIP dbs.

What's the cost of an extra column in a few tables? ;)

Maybe a separate thread discussing the topic would be wise if there is enough interest.

Philip Meyer
2008-11-09, 12:35
>Maybe a separate thread discussing the topic would be wise if there is
>enough interest.
It's been discussed before.

There are many different apps that add unique ids to tags, but they are all different. What would SC do if it found several different tags, each uniquely identifying the song? Which would take precedence?

At best, if SC scanner supported the tags, it would be an additional thing, because many users would not have the tags and would not want to add them.

SC *could* write its own unique id tag to each file, but that's not going to happen either (many people don't like SC touching their files, and the files may be read-only).

Phil

pounce
2008-11-09, 12:52
Is there a main thread somewhere to pick up this discussion?

It's hard for me to take the position of inaction just because there are challenges. From my point of view there is only a downside to *not* using GUIDs for tracking. Using the same GUIDs across many thousands of users could only add value over time.

I'm not sure why any of this has to change the files. A person could tag files, but the tracking *can* be external to the files.

There is a need for referential integrity of the data. You can do it locally, but I think it's a losing battle to argue against the value of referential integrity that spans multiple users and potentially other external services.

erland
2008-11-09, 14:58
There is a need for referential integrity of the data. You can do it locally, but I think it's a losing battle to argue against the value of referential integrity that spans multiple users and potentially other external services.

SqueezeCenter already supports Musicbrainz tags, isn't this enough ?

People are complaining about the scan time already today, so using something that requires heavy calculation or retrieval of information from external sources would probably result in unacceptable scanning times. It might be acceptable if you calculate/retrieve it only once, but that requires SqueezeCenter to write to the music files, which won't be accepted by many users.

pounce
2008-11-09, 15:16
...but that requires SqueezeCenter to write to the music files which won't be accepted by many users.

Why does it have to be required?

Couldn't there be a multi-pass scanning approach? First populate with SC IDs/indexes, then over time populate PUIDs. The latter could be transparent to users and happen over days or even weeks if so desired.

erland
2008-11-09, 15:27
Why does it have to be required?

It has to store the information somewhere where it will automatically follow the track if you move it or rename it, and I can't see where you would locally store it besides inside the music file. It could of course be optional to write it to the music files, so for users that don't want to write to the music files it would be re-calculated/retrieved during every scan. It could also happen in the background as you suggest; this is what MusicIP analysis does. The problem is just that you can't completely use the system before the identifier has been re-calculated/retrieved, since it is required to re-connect tracks to the old persistent data (ratings, play counts, ...)

erland
2008-11-09, 15:36
Is there a main thread somewhere to pick up this discussion?

It was discussed earlier in this thread:
http://forums.slimdevices.com/showthread.php?t=48575

So if you want to read what has been said, read that thread.
If you feel we should discuss it more, I would suggest either continuing here or starting a completely new thread.

egd
2008-11-09, 15:44
>This has my vote. I think using the MusicIP's PUID/GUID's is a good
>idea. Chomping through a collection to get these would take some time
>though and might be a killer for a NAS install...but I think there is
>great value.

I vote AGAINST it.

Reasons are:
1. I won't archive analysis from MusicIP to my files - MusicIP has some issues that trash tag content, and may even affect the music content.
2. MusicIP doesn't support as many formats as SC, so it won't uniquely identify every song.
3. Extra processing time

A solution has to reliably work for all file formats, and be easy to understand for beginners/new customers.

Take the hit - never change source file name/location.
It may be possible to have some logic for detecting when source location has changed.

The PUID belongs to Musicbrainz and can be written with Picard. MiP doesn't even have to feature in the equation, in fact it doesn't currently write PUIDs at all.

egd
2008-11-09, 15:48
It might be acceptable if you calculate/retrieve it only once, but that requires SqueezeCenter to write to the music files which won't be accepted by many users.

It could be added independently of SC using Picard or some such tool. SC could encourage its use by enabling persistent data IF the user has PUIDs. That way it's opt-in and users can do it in their own time.

erland
2008-11-09, 16:11
It could be added independently of SC using Picard or some such tool. SC could encourage its use by enabling persistent data IF the user has PUIDs. That way it's opt in and users can do it in their own time.

What benefit would we get by using PUID instead of the existing Musicbrainz Id tags that are already supported?

Philip Meyer
2008-11-09, 16:13
>The PUID belongs to Musicbrainz and can be written with Picard. MiP
>doesn't even have to feature in the equation, in fact it doesn't
>currently write PUIDs at all.
I meant that there are several apps that write unique strings to tags to identify songs. MiP can store fingerprint analysis, which could also be used to identify songs.

CD rippers can write ISRC values, which also could uniquely identify songs (not sure if they are guaranteed to be unique though).

egd
2008-11-09, 22:48
What benefit would we get by using PUID instead of the existing Musicbrainz Id tags that already is supported ?

If it uniquely identifies a track then there's no need for the PUID.

egd
2008-11-09, 23:21
I've been looking at this thread and the database schema discussion thread and it strikes me that we don't have requirements consensus. Until that happy medium is reached there's little benefit to be had in designing a revised schema, or am I missing something?

Can we somehow capture requirements through a gatekeeper, such that all requirements/wishes are listed and we are clearly able to distinguish between what is accepted and what is rejected? That way the schema could be designed with a comprehensive understanding of the targeted functional requirements and associated business rules.

Optimising the schema is likely one of the few really tangible opportunities available to maximise performance, so I'm hoping the schema isn't changed only to have a lot of workarounds implemented later.

Just my .02

erland
2008-11-09, 23:39
I've been looking at this thread and the database schema discussion thread and it strikes me that we don't have requirements consensus. Until that happy medium is reached there's little benefit to be had in designing a revised schema, or am I missing something?

No, you haven't missed anything; you are correct that there are no written requirements publicly available besides those mentioned on the wiki page in the initial post of this thread.

The problem is that the people who make themselves heard here are often developers or geeks, while this also has to work for normal users. I think this might be one reason why Logitech has decided to gather the requirements themselves and not make their detailed ideas public until they have written some code that works well enough to demonstrate the new schema. Sometimes it's actually faster to write some code to demonstrate it than to try to describe everything in detail.

So unfortunately, at the moment it seems to me that the best thing is just to wait and see what Logitech comes up with. As soon as they feel ready to show us something we can comment on it and suggest appropriate additions.

The only written requirements I know of are those available on this wiki page:
http://wiki.slimdevices.com/index.php/NewSchema

By the way, I completely agree with you that it would have been better if there were some more detailed requirements available. The wiki page shows some things but it also leaves big holes open which sometimes makes me a bit worried.

pounce
2008-11-10, 10:46
I'm willing to put some time into collaborative schema design. I think there are some good minds here. If there was an opportunity to really work on the future schema I'd wager the community here could come up with some creative and efficient models.

Is being part of the schema design even an option for the community, or do we only get to provide feedback? I'm not sure how the product team works.

I know I'm still against using SQLite, but am more than willing to work on the schema(s) regardless of db type.

If there was any sort of diagram for the new schema maybe the community could at least provide feedback on what there is so far?

erland
2008-11-10, 12:10
I'm willing to put some time into collaborative schema design. I think there are some good minds here. If there was an opportunity to really work on the future schema I'd wager the community here could come up with some creative and efficient models.

Is being part of the schema design even an option for the community, or do we only get to provide feedback? I'm not sure how the product team works.

I know I'm still against using SQLite, but am more than willing to work on the schema(s) regardless of db type.

If there was any sort of diagram for the new schema maybe the community could at least provide feedback on what there is so far?
Brandon who I believe is working on it answered some questions back in this post:
http://forums.slimdevices.com/showthread.php?p=345244#post345244

This is all we have for the moment, and as I understood it he wanted to get some code up and running to make it easier to show the concept.

pounce
2008-11-10, 12:15
Can't wait to see some purdy diagrams ;)

So, what do people think about stored procedures and triggers? Do they have a place in the design or do people think they are evil?

egd
2008-11-10, 13:31
Can't wait to see some purdy diagrams ;)

So, what do people think about stored procedures and triggers? Do they have a place in the design or do people think they are evil?

difficult to comment without context of the problems being addressed, but in principle I'm all for them.

erland
2008-11-10, 16:51
So, what do people think about stored procedures and triggers? Do they have a place in the design or do people think they are evil?

It depends; they can definitely be useful, but you also have to be careful not to create a dependency on a specific database product.
We want this solution to work with MySQL even though the default installation will probably come with SQLite as the database.

pounce
2008-11-10, 17:51
It depends; they can definitely be useful, but you also have to be careful not to create a dependency on a specific database product.
We want this solution to work with MySQL even though the default installation will probably come with SQLite as the database.

I'm not giving up hope just yet ;)

Given that this product has a relatively small number of tables and the complexity is generally low, I'd imagine that supporting two versions of SPs/triggers/functions to cover SQLite and MySQL may not be that much of an issue. I think some SPs could really simplify some code and help with better db support.

What do we think about serializing album art/thumbnails into the db? How about preferences etc?

erland
2008-11-10, 18:00
What do we think about serializing album art/thumbnails into the db? How about preferences etc?

Why ? What would the advantage be ?

Any changes made should be made for a reason, they should either solve a problem we have today or enhance the system with new features.

The database is currently completely cleaned in some situations, so my feeling is that storing preferences in it might not be a good idea.

pounce
2008-11-10, 18:11
Just thinking outside the box a little. If major changes are being made, I'm thinking of opportunities. Persisting as much as possible into the db allows for easy backups and portability. Serializing artwork into the db *could* provide some performance gains in certain situations, but it also ensures that all tag information is in one place.

mherger
2008-11-11, 01:33
> Just thinking outside of the box a little. If major changes are being
> made I'm thinking of opportunities. Persisting as much as possible into
> the db allows for easy backups and portability.

How is backing up a DB simpler than backing up a text file?

--

Michael

pounce
2008-11-11, 10:39
It's one file vs. many. Some file systems give you versioning on files (VSS, Time Machine, ZFS snapshots etc). It's a little simpler if your settings and data are captured as one: it prevents things being out of sync and the need to "team" your files for snapshots.

mherger
2008-11-11, 15:39
> It's one file vs. many. Some file system types you you have versioning
> on files. (VSS, Time Machine, ZFS snapshots etc).

I only know a bit about VSS, but it requires the DB to support snapshots. You can't just copy away a DB; it would be corrupted. I guess the same applies at least to Time Machine. There's really not much simpler than backing up a text file.

--

Michael
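
For what it's worth, SQLite itself offers an online backup API that takes a consistent snapshot from a live connection, which sidesteps the raw-copy corruption problem. A minimal sketch using Python's sqlite3 binding (the prefs table is invented; whether SC's Perl stack would expose this API is a separate question):

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE prefs (key TEXT PRIMARY KEY, value TEXT)")
src.execute("INSERT INTO prefs VALUES ('audiodir', '/music')")
src.commit()

# The online backup API copies page by page while the source connection
# stays live, unlike a raw file copy, which can catch the DB mid-write.
# (Connection.backup requires Python 3.7+.)
dst = sqlite3.connect(":memory:")
src.backup(dst)
value = dst.execute(
    "SELECT value FROM prefs WHERE key='audiodir'").fetchone()[0]
print(value)
```

So a DB-aware backup is possible without stopping the server, but it does require tooling that speaks SQLite, which is the extra complexity being weighed against plain text files.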

pounce
2008-11-11, 16:36
The purpose of persisting settings to the db is to simplify and aid the portability of the data/settings. Storing everything that is unique about an install (other than the actual assets - audio, video or whatever) in one file/db makes things easier and can support new features better than using a combination of db and text files.

Since we aren't talking about a specific implementation or technical details, it's premature to discount the idea based on the assumption that an implementation would have no means to quiesce the db before performing backups. Heck, let's not forget you can shut it down for backups as a simple solution. Supporting easier backups is one of the points.

To challenge you: what is the downside of persisting settings to a db? Couldn't there be simple solutions to work around any concerns? I work with several apps that persist settings to the db, and it works really well. Some extract the settings on startup and create local read-only files to use; when changes are made to the settings in-flight, the read-only files are updated to reflect what's in the db. If local files are needed for technical reasons, it's pretty easy to accomplish that and still persist things to the db to ensure data integrity and ease of backups.

kdf
2008-11-11, 17:12
Interesting. I personally can't think of anything more portable and
accessible than plain text, and it doesn't require you to stop the
server to make a backup.

The focus really needs to be on fixing issues with metadata. Prefs handling isn't broken, so working on it will only take away needed resources for little gain.

-kdf

Philip Meyer
2008-11-11, 17:15
I don't see any point in storing settings in the DB.

Also, the local settings file contains information on what database to connect to. If the settings are in the DB, some additional information would still need to be stored to handle startup before connecting to a DB.

Having settings stored in a settings folder means:

1. average users can see the settings - easier to change them too.
2. If the settings get corrupted, a user could restore the app settings, without needing to restore the database.
3. Different settings can quickly be swapped in, eg. when running a production server and various test environments.
4. Reading settings from files is probably quicker than retrieving from a DB.

Some settings *could* be stored in the DB, but I don't really see any benefit for this type of app.

Large-scale apps sometimes persist settings in the DB. eg. load balanced application servers, so that each load balanced app server shares the same settings.

Phil

kdf
2008-11-11, 17:26
On 11-Nov-08, at 4:15 PM, Phil Meyer wrote:
>
> Large-scale apps sometimes persist settings in the DB. eg. load
> balanced application servers, so that each load balanced app server
> shares the same settings.

Indeed. Replication in SN-land makes good sense. SC, it's just far
simpler with text.
-kdf

pounce
2008-11-11, 20:50
Ok, so how does SC support backups today? What is the best practice for backups and what changes could be made to make backups a feature of the product?

I won't argue the ease of backing up files vs. a db, BUT if the system needs a backup, how do people do it today? You need to shut down the application and grab files from multiple folders to ensure you have everything that represents the state of the application.

Putting settings in the db would make things simpler for a backup feature. Keeping settings in a db can allow for multiple versions of settings very easily.

I am happy to have argued the point regardless ;)

mherger
2008-11-11, 23:55
> Ok, so how does SC support backups today?

Backup the user profiles (Windows), or /etc/ on Linux/Unix, or /Users/you/ on osx. Pretty much standard paths which should be included by the most basic backup application.


> What is the best practice for
> backups and what changes could be made to make backups a feature of the
> product?

http://wiki.slimdevices.com/index.php/Backup

> I wont argue the ease of backing up files vs a db, BUT if the system
> needs a backup how do people do it today?

I fear most users don't do backups... just my experience with friends and family.

> You need to shut down the
> application and grab files from multiple folders to ensure you have
> everything that represents the state of the application.

Only needed if you care about not doing a scan after restore. For most of us backing up a few text files will do, as rescanning is easier than shutting down SC every time you want to get a backup of the DB.

> Putting settings in the db would make things simpler for a backup
> feature.

Not for me.

> Keeping settings in a db can allow for multiple versions of
> settings very easily.

That's what I expect my backup tool to do.

> I am happy to have argued the point regardless

can't disagree here :-)

--

Michael

pounce
2008-11-12, 01:14
> Putting settings in the db would make things simpler for a backup
> feature.

Not for me.


Why not for you? I don't mean this in a flip way at all; I'm just curious. To be fair, I rate putting settings in the db low in priority. But as long as people are going to be dramatically changing the schema, the API to access the data, the database type, and the approach to catering to low-power devices like NAS, shouldn't just about every major feature be questioned and tossed about to see whether it requires a schema change to address the usual key concerns of performance, scalability, data integrity, etc.?

So, I really don't want to beat this dead horse further, but maybe one more kick: what if settings were stored as JSON strings in a table? You could save multiple instances and tag the current version. I just see reducing client-side files as much as possible as a good thing; it could help protect users from themselves and reduce any potential permissions-related issues. The app has to use the db, so this is one thing that will always be there.
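
A minimal sketch of the versioned JSON-settings idea, purely illustrative (the settings_history table and its columns are invented, not part of any proposed SC schema):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Each row is a complete settings snapshot serialized as JSON, with exactly
# one row flagged as current; older rows remain as rollback points.
con.execute("""
    CREATE TABLE settings_history (
        version INTEGER PRIMARY KEY AUTOINCREMENT,
        current INTEGER NOT NULL DEFAULT 0,
        payload TEXT    NOT NULL
    )
""")

def save_settings(settings):
    # Demote the previous snapshot, then insert the new one as current.
    con.execute("UPDATE settings_history SET current = 0")
    con.execute(
        "INSERT INTO settings_history (current, payload) VALUES (1, ?)",
        (json.dumps(settings),))

save_settings({"audiodir": "/music", "scan_interval": 60})
save_settings({"audiodir": "/music", "scan_interval": 30})  # newer snapshot

row = con.execute(
    "SELECT payload FROM settings_history WHERE current = 1").fetchone()
current = json.loads(row[0])
print(current["scan_interval"])
```

Rolling back is then just re-flagging an older row as current, which is the "multiple versions of settings" property being claimed.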

Anyway, thanks for entertaining the idea :)

mherger
2008-11-12, 01:33
>> > Putting settings in the db would make things simpler for a backup
>> > feature.
>> Not for me.
> Why not for you?

I think for the exact reasons I gave in my two previous postings.

--

Michael

Philip Meyer
2008-11-12, 02:54
There is no need for me to back up the database. I see it as merely a cache of metadata that is held in the source music files.

The only persistent data in the database that I need to backup are my song statistics (TrackStat ratings, last played, etc), and that is done by extracting the information out as xml and backing up the xml file.

I would not consider importing all music content into the database, just so I would have ease of backup of my actual music content.

Settings files are nice because there are separate files for each plugin, etc. You could determine changes over time, store incremental backups, restore a single setting file, etc. A single database backup is not as useful.

Eg. if I install the wrong version of a plugin and the settings get messed up, I can simply restore the previous version of the setting file, or even delete the setting file and it would recreate with defaults. If the settings were held in the database, that's a lot harder for such maintenance.

Phil

fuzzyT
2008-11-12, 10:45
The requirements list for a good persistent track id appears to be something like this:

uniquely identifies the track
uniqueness spans the context of a single SC's track DB
uniqueness could be more global, but doesn't need to be
ID to track relation persists through wipe and rebuild SC DB operations
ID to track relation persists (or is recoverable after) file move operations
ID generation should be efficient and fast
ID generation would yield same ID for a track that had tagging changes

Except for "persists file move operations (PFMO)", full path/file name seems to be an excellent candidate ID. Solving the PFMO problem may be easier than tackling the problems inherent in some of the other candidate IDs.

Possible solution to PFMO: convert IDs on FMO. The system would need to recognize, or be told, when an FMO had occurred. Once it did, it would update path references to the new location, both in the track data and the referring data (eg user_track_stat).

What makes this reasonable: FMOs are probably a relatively rare occurrence. And a user would certainly know when they were happening. Some UI could be devised to allow a user to tell SC when a move had occurred. For instance, the SC UI allows creation and edit on library path. An input action or prompt-on-edit could be added to support moves. If support is added for multiple file locations, then the new (post-FMO) location could be added to SC's location list followed by a scan of just the new location, and the question/offer of "is this a file move, do you want to transfer track stats?" or some such. The system could also attempt to recognize when a move had occurred.
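
A minimal sketch of the "convert IDs on FMO" step: when the user declares a move, rewrite the path prefix in the track table and every path-keyed referring table. Table and column names are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tracks (path TEXT PRIMARY KEY, title TEXT)")
con.execute("CREATE TABLE user_track_stat (track_path TEXT, rating INTEGER)")
con.execute("INSERT INTO tracks VALUES ('/old/music/a.flac', 'Song A')")
con.execute("INSERT INTO user_track_stat VALUES ('/old/music/a.flac', 5)")

def apply_move(con, old_prefix, new_prefix):
    """User-declared file move: rewrite the path prefix in every table
    that keys data by track path, so persistent data survives the move."""
    for table, col in (("tracks", "path"), ("user_track_stat", "track_path")):
        con.execute(
            f"UPDATE {table} SET {col} = ? || substr({col}, ?) "
            f"WHERE {col} LIKE ? || '%'",
            (new_prefix, len(old_prefix) + 1, old_prefix))

apply_move(con, "/old/music", "/new/music")
rating = con.execute(
    "SELECT rating FROM user_track_stat WHERE track_path='/new/music/a.flac'"
).fetchone()[0]
print(rating)
```

Because the rewrite touches referring tables in the same transaction, path-keyed stats follow the files without any fingerprinting or tag writing, which is the appeal of this approach over content-based IDs.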

fuzzyT
2008-11-12, 10:56
It appears that we may be modeling a number of new concepts. Just to spark discussion, it may help to lay some of these out for inspection.

Users - Would need ids, descriptive attributes, perhaps credentials. Would map to libraries. May map to user classes.

User Classes - For purposes of permissions. Controlling access to admin settings, user prefs, perhaps other functions. Perhaps an anon user for party mode support {admin, user, anon}.

File locations - filesystem locations for music files, support for multiple locations, support for online/offline locations, support for ad-hoc addition of locations (USB plug-ins, etc), independent scanning of locations.

Libraries - subsets of all scanned files, definable by various means: genre, file locations, perhaps others.

User/Library Associations - subset of libraries that users are able to access, are currently accessing, support for defaults.
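
Purely to make the concepts above concrete, a hypothetical SQLite sketch of these entities and their relationships (every name here is invented, not from the actual NewSchema design):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Users, with a class for permission purposes.
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        class TEXT NOT NULL CHECK (class IN ('admin', 'user', 'anon'))
    );
    -- Filesystem locations, which may be offline (unplugged USB etc).
    CREATE TABLE locations (
        id     INTEGER PRIMARY KEY,
        path   TEXT NOT NULL,
        online INTEGER NOT NULL DEFAULT 1
    );
    -- Libraries: named subsets of all scanned files.
    CREATE TABLE libraries (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Many-to-many: which users may access which libraries, with a default.
    CREATE TABLE user_library (
        user_id    INTEGER REFERENCES users(id),
        library_id INTEGER REFERENCES libraries(id),
        is_default INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (user_id, library_id)
    );
""")
con.execute("INSERT INTO users VALUES (1, 'alice', 'user')")
con.execute("INSERT INTO libraries VALUES (1, 'everything')")
con.execute("INSERT INTO user_library VALUES (1, 1, 1)")
n = con.execute("SELECT count(*) FROM user_library").fetchone()[0]
print(n)
```

The join table is what carries both "subset of libraries a user can access" and the per-user default, keeping users and libraries independently definable.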

pounce
2008-11-12, 12:40
Ok, I'm dropping the settings/db topic.


There is no need for me to back up the database. I see it as merely a cache of metadata that is held in the source music files...

Ah hah! This was helpful. Since the people participating in the discussion here have a lot more time with this app than me, I probably have a different perspective on things: I'm seeing the db as the future source of record for metadata.


I personally would like to treat the DB as a source of record for metadata for the assets being cataloged. I would prefer to store and update tag information in the db rather than in the actual audio files, and I'd like to see plugins that update tags from external sources put their data in the database rather than touching the actual files. Because of this, I think of the data in the db as a little more valuable and not easily/quickly recreated.

I have some ideas about new features and behaviors. It's hard for me to get these down in text. It's always much easier to talk about them, but let me give it a shot.

SC db as source of record:

For discussion, let's assume that we are only talking about music files, and that all of the files are read-only due to media type or permissions. Let's also assume that there is only enough information in the tags to generally ID the file - maybe we just have Album and Track, some files have no tags and numbers for file names, and they are all in one big folder. So, we scan and gather all we can into the db. We know where all the files are and we can at least show the user as much as we have, but that's not much and probably not a great user experience.

Because we want users to have the best experience possible, we want to fill in as much of the missing data as we can and identify the tracks that lack it. We turn to plugins that use external sources for data, like Musicbrainz, MusicIP, Amazon or whatever. Since our files are being treated as read-only, we put the tag info in the db. Because we aren't touching the files, we have unlimited ability to collect as much data as we want. We can even collect data for things we don't have. Let's say we want a feature that shows all album information for any album that the user might have in their collection: maybe they just have one track from an album, but we want to be able to show what other tracks were on the album and information about those tracks. We can go to external services like Musicbrainz, get the track information for the release, and populate that in the db. Now we have information in the db for things that aren't currently in the collection, but it adds value for the user.

Maybe Logitech partners with companies that sell music, and they want to offer a feature that allows you to buy tracks to fill in missing music in your collection. Since you are storing all of the info regarding the albums you have, you can easily show what's missing in the context of what's playing or what's being searched. Maybe the user really likes the track they are listening to because of the sax player. They want to listen to more of this sax player from their collection. Since we have collected huge amounts of info about the user's tracks from third parties, we know which tracks from other albums this sax player contributed to. We can now show the tracks that the user has, but we can also show the tracks that are missing based on album ownership, and give the user some options for going external to listen to - and maybe buy - that track.

That's a bit long-winded, but I think there is a lot of value in treating the db as a source of record for metadata. It's required if the tracks are read-only (and you want to fill in info), but it may also be wise not to touch any files when collecting data from external sources. If we have the data, we can also offer to sync it back to the tracks if that's what the user wants. SC could allow for easier editing of tags and validating them pre-flight. Any time we go to external sources for data, or need to use something like MusicDNA or Picard, there is a time/performance hit; if we need those kinds of things, rescans would not be very efficient.

Obviously, what I am talking about isn't in the product today, so this might be mildly off-topic if we are talking strict schema changes to support today's functionality. However, thinking about features like this points to how the existing schema could change to support today's features and enable new ones in the future without needing additional schema changes.

I'd be happy to discuss this in a different topic if desired. I could also sketch out some schema designs that I am thinking of that could support scaling out metadata as well as permissions etc. I'm hesitant to spend the time without seeing and understanding where the new schema is in the design, but happy to bounce the ideas around.

egd
2008-11-12, 13:03
I think http://www.qsonix.com/Public/Product_Que_Overview.aspx provides some insight into the kind of things pounce is describing. It's definitely not the SC of today.

Mark Miksis
2008-11-19, 17:27
Do we know yet what the minimum required SQLite version will be for 8.0? What version of SQLite will be shipped with 8.0? I'm just starting to think about whether the Linux packages should include it or require it as a dependency...

andyg
2008-11-19, 18:04
On Nov 19, 2008, at 7:27 PM, Fletch wrote:

>
> Do we know yet what the minimum required SQLite version will be for
> 8.0?
> What version of SQLite will be shipped with 8.0? I'm just starting to
> think about whether the Linux packages should include it or require it
> as a dependency...

Too early. And the answer will be none, DBD::SQLite includes the
entire database engine.

Mark Miksis
2008-11-19, 18:07
Too early. And the answer will be none, DBD::SQLite includes the entire database engine.

Ah, well that should make it a non-issue anyway. Thanks.

blblack
2008-11-20, 08:52
Do we know yet what the minimum required SQLite version will be for 8.0? What version of SQLite will be shipped with 8.0? I'm just starting to think about whether the Linux packages should include it or require it as a dependency...

We'll actually be bundling a custom build of SQLite rather than using vendor packages I believe.

blblack
2008-11-20, 09:21
Brandon, who I believe is working on it, answered some questions back in this post:
http://forums.slimdevices.com/showthread.php?p=345244#post345244

This is all we have for the moment, and as I understood it he wanted to get some code up and running to make it easier to show the concept.

Yes, and this has been slow going.

To recap and answer some of the questions that have come up recently:

The overall intent here is not to come up with something that's even more complex than what we've got that solves all possible problems in one amazing fixed schema design.

The core of this effort is to come up with some fresh code for dealing with physical media libraries that's well-encapsulated, separate from the rest of SC, simple in design, but flexible enough to support current and future needs (mostly via configuration (which can contain code), but also by new code as well obviously).

The end result will be something that can be used from SC (or any other similar app) as so:



my $lib = Slim::MediaLibrary->init(
    path    => "/place/where/mp3s/are",
    db_path => "/I/want/metadata/elsewhere",
    engine  => "mysql",
    spec    => "iTunes.spec",
);

(most of the above could be left out, resulting in the SC default
spec, SQLite, and metadata stored in-place with the library)

my ($scanner_pid, $progress_fh) = $lib->start_async_scan();

my $results = $lib->ResultSet('Track')->search(.....);


Except that the columns, as well as how they are mapped from tags, are customizable via the spec (and then we tack on the other features like saving search criteria as encoded strings, which allows the concept of a virtual library).

As for unique track ID's, while I'm all for referential integrity and globally persistent identifiers, there's no way we're going to get such things out of everyone's library in a consistent and simple way. Some libraries might even just be Artist/Album/Track.mp3 hierarchies with no tag data.

Currently new_schema is using the relative path (within the library) combined with start and end times/offsets (for multi-track files split via cue sheet) as the unique key for tracks. I've considered hashing the key as well, just to make the user's (user as in SC coders) life easier.

So for instance a track's natural key (which has a unique constraint) might be ("dir/dir/file.mp3", 4473, 9999), and then we'll also manually hash that via md5 or something similar (and perhaps shorter and cheaper) for the actual primary key, and use that in references back and forth between the library code and SC. It still wouldn't be global to all users, and it still won't survive intra-library file moves, but at least it would survive wipe+rescan. Of course, if we go shorter and simpler than md5, we might have to deal with collisions. Another option is to simply concatenate the string form of the key using null bytes (since a null byte doesn't start a valid UTF-8 sequence).
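That key scheme might be sketched like so (the actual new_schema code is Perl; this Python version is purely illustrative, including the md5 choice and the null-byte join):

```python
import hashlib

def track_key(rel_path, start, end):
    """Build the natural key for a track (library-relative path plus
    cue-sheet start/end offsets) and hash it so the rest of the app
    can pass around a short, fixed-size ID."""
    # Join with null bytes: a null byte never starts a valid UTF-8
    # sequence, so the concatenation is unambiguous.
    natural = b"\x00".join(
        [rel_path.encode("utf-8"), str(start).encode(), str(end).encode()])
    return hashlib.md5(natural).hexdigest()

# The same file rescanned later yields the same key (survives wipe+rescan)...
assert track_key("dir/dir/file.mp3", 4473, 9999) == \
       track_key("dir/dir/file.mp3", 4473, 9999)
# ...but moving the file within the library changes it.
assert track_key("other/file.mp3", 4473, 9999) != \
       track_key("dir/dir/file.mp3", 4473, 9999)
```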

pounce
2008-11-20, 10:49
Might it be possible to take a smaller sample of the file for the purpose of a fingerprint? I have to wonder whether, if you take x bytes from the start of the data for a given format, you might have unique enough information to ID the file if it moves. Then you wouldn't have to md5 the entire file; you would just take a small sample and md5 or sha1 hash it.
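A minimal sketch of that sampled-fingerprint idea, assuming a fixed sample size and a caller-supplied tag-header offset so retagging doesn't change the hash (the 64 KB figure and function name are arbitrary, not anything SC actually does):

```python
import hashlib
import os
import tempfile

SAMPLE_BYTES = 64 * 1024  # hypothetical sample size

def quick_fingerprint(path, skip=0):
    """Hash only the first SAMPLE_BYTES of data instead of the whole
    file, so moved files can be re-identified cheaply. `skip` would be
    the size of the tag header (e.g. an ID3v2 block) so that editing
    tags doesn't change the fingerprint."""
    with open(path, "rb") as f:
        f.seek(skip)
        return hashlib.sha1(f.read(SAMPLE_BYTES)).hexdigest()

# Demo with a throwaway file: hashing past the fake "tag header"
# gives the same result as hashing the audio bytes directly.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"TAGHEADER" + b"\xde\xad" * 1000)
assert quick_fingerprint(f.name, skip=len(b"TAGHEADER")) == \
       hashlib.sha1(b"\xde\xad" * 1000).hexdigest()
os.unlink(f.name)
```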

vrobin
2008-11-20, 15:40
As for unique track ID's, while I'm all for referential integrity and globally persistent identifiers, there's no way we're going to get such things out of everyone's library in a consistent and simple way. Some libraries might even just be Artist/Album/Track.mp3 hierarchies with no tag data.


Would it be difficult to let the user choose their own set of unique IDs?
By default, the key would be the composition of path+offset, but the average user could specify another key like:

mp3 > comment tag
flac > my-id-vorbistag
* > default

It would be easy to ship a batch tool with SC that would generate Vorbis tags for FLAC files, letting the user decide whether they prefer persistence of data or zero file modification.

(did you think of using guid/uuid ? )
http://search.cpan.org/~rjbs/Data-UUID-1.149/UUID.pm
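That per-format spec could be sketched as a simple extension-to-tag mapping with a wildcard fallback (all names here are hypothetical, not an actual SC config format):

```python
# Hypothetical per-format ID spec, as suggested above: which tag
# (if any) supplies the persistent ID for each file type, with a
# "*" entry as the fallback.
ID_SPEC = {
    "mp3": "comment",           # read the ID from the COMMENT tag
    "flac": "my-id-vorbistag",  # a custom Vorbis tag
    "*": None,                  # default: fall back to the path+offset key
}

def persistent_id(ext, tags, default_key):
    """Return the user-chosen ID tag's value if the spec names one
    and the file actually carries it; otherwise use the default key."""
    tag = ID_SPEC.get(ext, ID_SPEC["*"])
    if tag and tags.get(tag):
        return tags[tag]
    return default_key

# A flac with the custom tag keeps its chosen ID; a wav falls back.
assert persistent_id("flac", {"my-id-vorbistag": "abc-123"}, "k1") == "abc-123"
assert persistent_id("wav", {}, "k2") == "k2"
```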

egd
2008-11-20, 22:18
I'm going to go out on a limb here and suggest that if persistence is important, we're going to have to accept that the most efficient way of achieving this is through a unique identifier such as that suggested by vrobin as opposed to alternative approaches such as tying files to specific paths.

If we can accept the above, then the next logical thing to accept is that it will be necessary to generate and write this unique identifier as a tag in the track's metadata. I accept and support the general notion that SC does not/ should never write to a user's audio files. However, that is not to say that we cannot develop a distinct standalone tool to generate and write the metadata. It's a one-off operation, backups can be made, and files can be tested afterward. Hell, just about everyone edits their tags using a range of tools and nobody's hung up about it...why should this be any different?

erland
2008-11-20, 22:37
So for instance a track's natural key (which has a unique constraint) might be ("dir/dir/file.mp3", 4473, 9999), and then we'll also manually hash that via md5 or something similar (and perhaps shorter and cheaper) for the actual primary key, and use that in references back and forth between the library code and SC. It still wouldn't be global to all users, and it still won't survive intra-library file moves, but at least it would survive wipe+rescan.


People buy new, larger hard drives, which means they need to be able to move their music to a new drive. In this situation it's also common that music has previously been stored on two different drives and you want to move the contents of the two previous drives onto the new single larger drive.

This is a situation which we at minimum need to handle regarding persistence. As long as the paths are relative it sounds like we might be able to handle at least some of these cases, so even though your suggestion isn't perfect it might be acceptable.

IMHO, it's critical that the statistics at least survive rescans and SqueezeCenter upgrades.

For other kinds of situations it just needs to be possible to re-connect the statistics manually, for example by creating a SQL script in a text editor. This is something that could probably be provided as a third-party utility not bundled with standard SqueezeCenter, since it's only needed in special situations.

It's a pity that the MusicBrainz identifier tags, which are unique and already supported by SqueezeCenter, can't be used in this new schema. People who have bothered to tag their music with MusicBrainz identifiers to get persistent statistics today should be able to do so in the future too.

mherger
2008-11-20, 23:47
> However, that is not to say that we cannot develop a distinct
> standalone tool to generate and write the metadata.

Ahmm... isn't this what MusicIP, musicbrainz and others do?

Michael

erland
2008-11-21, 00:03
> However, that is not to say that we cannot develop a distinct
> standalone tool to generate and write the metadata.

Ahmm... isn't this what MusicIP, musicbrainz and others do?

Yes, so we need the possibility to optionally use the identifiers generated by those tools.

vrobin
2008-11-21, 01:44
> However, that is not to say that we cannot develop a distinct
> standalone tool to generate and write the metadata.

Ahmm... isn't this what MusicIP, musicbrainz and others do?

Michael
Yes, so we need the possibility to optionally use the identifiers generated by those tools.

Yes it is (what MusicIP does), and I won't refuse the use of MusicIP as the "de facto ID generator", but as writing to a media file is some sort of ethical/philosophical point, and as SC is quite versatile, I think that letting the user choose their persistence mode looks realistic and wouldn't be the hardest part of the work.

Personally, I would use FLAC's native "md5 audio content hash" and a UUID stored in some unused ID3 tag for mp3 files, but if the point is "no persistence, or MusicIP signature for persistence", I would say nothing against it, just "so be it, I'm glad persistence exists, no matter how!" :).

And as everybody here can have their own ideas/desires on the subject... ;)

Philip Meyer
2008-11-21, 02:43
All this stuff about composite keys consisting of file paths and tag values, etc., sounds frightening, and solves nothing as far as I can see.

The primary key for each table should be a simple int identifier - easy and quick to use to uniquely identify a record once it is in the database, and there's never a need for it to change. Other unique identifiers can be stored as normal properties of the table, used just by the scanner, and NULLable as they may not always contain data. It's the job of the scanner to get info from the source files into the database, and upon rescan to match the metadata from a source file to a record already in the database.

The file path can usually uniquely identify a track - if that matches, there's no need to look for anything else.
If there isn't a match, then use some other identifier - MusicBrainz ID, FLAC fingerprint, whatever possibilities there are.
If it's still not found, you can try to find partial file path matches (like I recently did in a patch to Erland's TrackStat plugin), which will handle cases where files have been moved to a different disk or base path.
Alternative matching mechanisms are possible; eg. read track no, song title, album and artist from the file and see if there is already a matching song in the database.

But the result is a simple integer primary key. Adding a filepath into the key causes problems when the filepath does change - even if the scanner re-marries the file to an existing database record, the primary key would need to change, and thus anything that references it.

Efficiency of scanning is less important than efficiency of using SC once metadata is scanned.
Efficiency of scanning should still be quick in the usual case where filepaths haven't changed, but if they are, songs can still be reconnected to the DB content.
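That layout - a surrogate integer primary key, with external identifiers as ordinary nullable columns used only for re-matching - might look something like this in SQLite (table and column names are illustrative, not the actual new_schema DDL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tracks (
    id             INTEGER PRIMARY KEY,  -- stable surrogate key
    url            TEXT NOT NULL UNIQUE, -- current file path
    musicbrainz_id TEXT,                 -- nullable external IDs, used
    flac_md5       TEXT                  -- only by the scanner to re-match
);
""")
con.execute("INSERT INTO tracks (url, musicbrainz_id) VALUES (?, ?)",
            ("music/a/b.mp3", "mbid-1"))

# The file moves: the scanner re-matches on musicbrainz_id and updates
# the url, while the primary key (and everything referencing it, such
# as ratings or play counts) stays the same.
(tid,) = con.execute(
    "SELECT id FROM tracks WHERE musicbrainz_id = ?", ("mbid-1",)).fetchone()
con.execute("UPDATE tracks SET url = ? WHERE id = ?", ("flac/a/b.mp3", tid))
assert con.execute("SELECT id FROM tracks WHERE url = ?",
                   ("flac/a/b.mp3",)).fetchone() == (tid,)
```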

Remember, there is persistent data at the moment only for songs (such as song ratings). However, there's no reason why there shouldn't be persistent album data (eg. an album rating) - iTunes has both song and album rating properties.

Phil

mflint
2008-11-21, 06:28
I'm in total agreement with Phil Meyer. My personal golden rule for primary keys is "primary key should never contain any data which is meaningful to the user - because the user will want to change it".

MrSinatra
2008-11-21, 07:13
i'm somewhat out of my depth, and i prob don't understand all the ramifications and so on, but based on what i think i know i'd say this:

"persistence" is probably something most average users won't want, need, or care about, and certainly wouldn't expect. (i however, am interested)

so, having said that as a given, i think the way i would handle the issue, would be for slim to make a standalone utility, whose only function is:

to add a tag most tagging apps don't see, but that some could be configured to see. this tag should be something that appears as nonsense, but is really a hex value or md5 value or something like that.

so in a mp3, it would be something like:

TXXX SCTAG = XD3JA5LDM8FH3KD5F

the point of which would be to uniquely ID the file regardless of where it went.

in this way, all files could be uniquely tagged, and only users who cared about persistence would run the standalone utility, and SC still would not write tags of any kind itself, which is i think a good idea.

the one issue i see is when an album is "upgraded." lets say you have a "doors" CD from 1988 and u get a new one thats remastered. when you rip it in, if you want it to replace (and delete) the 1988 one, thats easy enough to do, but all the "persistent" stats would not follow the new files, UNLESS you had some way of applying the old SCTAGs to the new files.

that too could be problematic if we are "fingerprinting" the files.

so, all in all, this is something to think about. maybe we need TWO tags. one is a song fingerprint, one is a transferable SCTAG unique ID, (and it would have to be verboten to allow two files to share it).

vrobin
2008-11-21, 07:25
the one issue i see is when an album is "upgraded." lets say you have a "doors" CD from 1988 and u get a new one thats remastered. when you rip it in, if you want it to replace (and delete) the 1988 one, thats easy enough to do, but all the "persistent" stats would not follow the new files, UNLESS you had some way of applying the old SCTAGs to the new files.

that too could be problematic if we are "fingerprinting" the files.

so, all in all, this is something to think about. maybe we need TWO tags. one is a song fingerprint, one is a transferable SCTAG unique ID, (and it would have to be verboten to allow two files to share it).

I think the basic ideas you're writing are very close to mine, and as such, I think that would be a very good trade-off ;)
(but I don't agree that people don't care about persistence; everybody likes statistics... you want proof? It's in iTunes' features ;) ).

The "album replacement" case you're describing seems rare enough not to handle it in any way other than writing a manual "howto" for power users.

But the point of "fingerprint versus UUID" is more important. I don't really have an opinion, and I think both have advantages and drawbacks. This depends on how you want the program to behave:
- Like MusicIP, which cares about "the song", so you prefer to count a song as played "several times" even if it comes from several albums.
- Like a real ID, where a file or file part (with cue) is the main thing to track.

As SC is music-oriented software, not file-oriented software, I think fingerprinting would be more suited, but this can lead to weird problems... I think we have yet another good flame war ongoing ;).

Philip Meyer
2008-11-21, 13:05
>"persistence" is probably something most average users won't want,
>need, or care about, and certainly wouldn't expect. (i however, am
>interested)
>
Average users don't need to know anything about how persistence will work, but they will care about it if it doesn't work!

Without persistence, each time a scan is performed, they could lose all play statistics, ratings, etc.

>so, having said that as a given, i think the way i would handle the
>issue, would be for slim to make a standalone utility, whose only
>function is:
>
Doesn't work. Not all music sources will have tags: imports from iTunes, MusicIP, cue files. Some users still use WAV files, which don't have tags, and thus the metadata comes from guessing values from the file path. As I said before, you can never guarantee having a unique identifier in all files.

Phil

MrSinatra
2008-11-21, 13:46
>"persistence" is probably something most average users won't want,
>need, or care about, and certainly wouldn't expect. (i however, am
>interested)
>
Average users don't need to know anything about how persistence will work, but they will care about it if it doesn't work!

Without persistence, each time a scan is performed, they could lose all play statistics, ratings, etc.

it would be like winamp... the DB works until you clear it, or move the file. i call this "not robustly" persistent. thats just how winamp does it tho.

my point however was that a lot of users wouldn't care about it, working or not, or to what degree its robustly persistent.



>so, having said that as a given, i think the way i would handle the
>issue, would be for slim to make a standalone utility, whose only
>function is:
>
Doesn't work. Not all music sources will have tags: imports from iTunes, MusicIP, cue files. Some users still use WAV files, which don't have tags, and thus the metadata comes from guessing values from the file path. As I said before, you can never guarantee having a unique identifier in all files.

Phil

does work.

its totally ok to limit robust persistence to the majority of users. if users have formats that don't support tags, then if they want robust persistence, they need to convert to formats that do support tags. otherwise, go without. we are talking about a small minority here.

its like asking SC to support replaygain for files without tags, its nonsense.

egd
2008-11-21, 15:56
> However, that is not to say that we cannot develop a distinct
> standalone tool to generate and write the metadata.

Ahmm... isn't this what MusicIP, musicbrainz and others do?

Michael

Yes it is, albeit their analysis takes a lot of time as their ID is universal to that version of a song. As most users would likely balk at the idea of having to wait days etc. before being able to generate a library, why not go with a random string of sorts that's totally independent of the song's acoustic properties and other metadata? Generating and writing these would be quick and have a user up and running in no time with persistence in place. What it won't do is universally identify a file, e.g. like MusicBrainz's PUID - all that means is that there's no tie-in to MusicBrainz or MiP or for that matter anyone else's library.

There's nothing stopping the user using Musicbrainz' tools at an earlier or later point to do their tagging or MiP to do their bidding for playback.

egd
2008-11-21, 16:04
Efficiency of scanning is less important than efficiency of using SC once metadata is scanned. Efficiency of scanning should still be quick in the usual case where filepaths haven't changed, but if they are, songs can still be reconnected to the DB content.

A unique ID gets around all the limitations and complexities of having to code and maintain a series of matching routines that follow a series of scenarios trying to match data, so it'd be highly efficient and, IMHO very importantly, also ensures SC does not develop external dependencies to function.


Remember, there is persistent data at the moment only for songs (such as song ratings). However, there's no reason why there shouldn't be persistent album data (eg. an album rating) - iTunes has both song and album rating properties.

Very good point - I'd love to be able to add persistent album and artist metadata in the future and expose that data via the UI or other means that interact with SC.

egd
2008-11-21, 16:15
the one issue i see is when an album is "upgraded." lets say you have a "doors" CD from 1988 and u get a new one thats remastered. when you rip it in, if you want it to replace (and delete) the 1988 one, thats easy enough to do, but all the "persistent" stats would not follow the new files, UNLESS you had some way of applying the old SCTAGs to the new files.

that too could be problematic if we are "fingerprinting" the files.

so, all in all, this is something to think about. maybe we need TWO tags. one is a song fingerprint, one is a transferable SCTAG unique ID, (and it would have to be verboten to allow two files to share it).

Our thinking is on the same page re the unique identifier, however in regards to replacements/ upgrades etc. I'd make it simple - if you've ripped tracks and intend using them to replace existing tracks and want the metadata in relation to the older tracks to persist in the replacements, use a tag editor and copy and paste the tags from the old to the new. IMHO this isn't something SC should concern itself with.

egd
2008-11-21, 16:27
its totally ok to limit robust persistence to the majority of users. if users have formats that don't support tags, then if they want robust persistence, they need to convert to formats that do support tags. otherwise, go without. we are talking about a small minority here.

its like asking SC to support replaygain for files without tags, its nonsense.

Absolutely 100% agree with this. Trying to cater to every possible user implementation is impossibly complex and inefficient. Much rather there's a set of underlying functionality that comes with a set of "system requirements" where it's a simple binary choice -- if I want to make use of X I must do/have Y. In addition, using file tags alongside iTunes, MusicIP etc. is not necessarily mutually exclusive.

MrSinatra
2008-11-21, 22:05
Our thinking is on the same page re the unique identifier, however in regards to replacements/ upgrades etc. I'd make it simple - if you've ripped tracks and intend using them to replace existing tracks and want the metadata in relation to the older tracks to persist in the replacements, use a tag editor and copy and paste the tags from the old to the new. IMHO this isn't something SC should concern itself with.

right, i agree, i was mainly saying that a single fingerprint tag wouldn't suffice as it wouldn't cover the scenario, i think two tags are necessary.

also, i'd be more open to phils concern about non-tag formats IF it wasn't such a negligible minority. i think SC should in general follow a "majority rules" philosophy.

Philip Meyer
2008-11-22, 02:51
>also, i'd be more open to phils concern about non-tag formats IF it
>wasn't such a negligible minority. i think SC should in general follow
>a "majority rules" philosophy.
>
I don't think it's a particularly small minority - quite a few people retrieve their information from iTunes/MIP sources, leaving the SC Library path blank. This is how support currently tell users to configure SC to avoid issues with duplicate tracks. All you get through those interfaces is what iTunes/MIP API's provide.

Also, I don't think many users would be too pleased if they scanned their content in, started to collect stats information on their library, rescanned and lost all stats. Then they discovered that they should have run some other tool on their files to add some new tag just for SC's purpose.

The discussion is irrelevant, because I really can't see SC creating a stand-alone cross-platform tool that will modify peoples music files, supporting many different file types, tag formats, etc.

SC should work in its smallest capacity like every other app does - the filepath identifies the track. The cleverness should be in the scanner when it detects new files to see if it is actually an old file that has changed file location. There could be several different ways to reattach a moved file to an existing primary key in the track table.

This could involve identification through tag content (look for a known set of unique identifier tags, or match on artist, album, track no, song, etc).
Perhaps scanner hooks could also be put in, so that third-party plugins could write their own reattachment checks.
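The cascade of re-attachment checks described above might be sketched like this (field names and matcher order are illustrative; a real scanner would query the DB rather than scan an in-memory list):

```python
def reattach(db, scanned):
    """Try to match a scanned file against existing track records,
    cheapest/most-confident check first. `db` is a list of dicts
    standing in for track rows; all field names are illustrative."""
    matchers = [
        # 1. exact file path: the common case, nothing moved
        lambda t: t["url"] == scanned["url"],
        # 2. a known unique identifier tag (MusicBrainz etc.), if present
        lambda t: scanned.get("mbid") and t.get("mbid") == scanned["mbid"],
        # 3. partial path match: drive letter or base path changed
        lambda t: t["url"].endswith("/" + scanned["relpath"]),
        # 4. last resort: match on tag content
        lambda t: (t["artist"], t["album"], t["tracknum"]) ==
                  (scanned["artist"], scanned["album"], scanned["tracknum"]),
    ]
    for match in matchers:
        for track in db:
            if match(track):
                return track["id"]
    return None  # genuinely new file

db = [{"id": 7, "url": "D:/music/doors/01.mp3", "mbid": None,
       "artist": "The Doors", "album": "The Doors", "tracknum": 1}]
# The drive letter changed, but the tail of the path still matches.
moved = {"url": "E:/music/doors/01.mp3", "mbid": None,
         "relpath": "music/doors/01.mp3",
         "artist": "The Doors", "album": "The Doors", "tracknum": 1}
assert reattach(db, moved) == 7
```

Third-party hooks would then just be extra entries in the matcher list.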

MrSinatra
2008-11-24, 05:36
>also, i'd be more open to phils concern about non-tag formats IF it
>wasn't such a negligible minority. i think SC should in general follow
>a "majority rules" philosophy.
>
I don't think it's a particularly small minority - quite a few people retrieve their information from iTunes/MIP sources, leaving the SC Library path blank. This is how support currently tell users to configure SC to avoid issues with duplicate tracks. All you get through those interfaces is what iTunes/MIP API's provide.

i admit, i know very very little about this. yet another reason to hate itunes.

but what are you saying? are you saying that itunes does, or does not, report its own dbase persistent stats into SC? or are you saying it could, but SC doesn't do anything with them yet?

it sounds to me like SC just doesn't use its own DB, and instead uses itunes DB, is this right? its own DB is totally unpopulated?

moreover, if SC "cheats" and simply doesn't populate its own dbase with the itunes info, then why should a user have an expectation that SC would keep stats independently of that?

SC should either use itunes to populate the SC dbase, or it shouldn't. if it doesn't a SC user should simply be told they won't get SC stats if they are using itunes integration.

all the above is contingent that i actually understand the way it actually works, and i don't know that i do.


Also, I don't think many users would be too pleased if they scanned their content in, started to collect stats information on their library, rescanned and lost all stats. Then they discovered that they should have run some other tool on their files to add some new tag just for SC's purpose.

the SC interface could make this plain; moreover, without the tags, SC would not keep stats, so there'd be a message saying "do this if you want stats."

the thing is, without the tags, this is how winamp works. it has two modes kind of like SC now, "clear everything" and "rescan." the difference is winamp's scanner is exponentially faster and more reliable.

so on rescan, it keeps the stats, and yet is very robust catching all the changes and edits one might make, (that only rarely aren't reflected in realtime without a rescan i might add).

however, if you clear and rescan, you start from scratch, nothing is kept, even if you didn't change any tags or move any files.

winamp can do ratings, played last, play count, etc... and even if u have a wav file, or whatever tagless format, as long as you don't move it or clear the library, you can even assign replaygain values or keep stats, whatever, for that file. the DB will have info, not the file or tags.

but i don't expect it to persist, once moved or cleared.

i frankly found it a bit confusing, b/c winamp didn't make it clear at first what info would persist, (via tags), and what wouldn't.


The discussion is irrelevant, because I really can't see SC creating a stand-alone cross-platform tool that will modify peoples music files, supporting many different file types, tag formats, etc.

why? i mean, ok, if slim doesn't want to, why not support another app that does it? is there anything out there that does fingerprint tags and unique ID tags?


SC should work in its smallest capacity like every other app does - the filepath identifies the track.

i disagree. i move my files from time to time. lots of people do. what if just the drive letter changed? say i move from a USB ext to a rackmount raid solution?


The cleverness should be in the scanner when it detects new files to see if it is actually an old file that has changed file location. There could be several different ways to reattach a moved file to an existing primary key in the track table.

SC already tries to be too clever by half. logics logics logics, none of them very logical.

and then you want to create some kind of voodoo where SC figures out somehow what moved where? how would it do so, and KNOW a track was the same exact one, without some kind of unique ID tag?

why try to be clever? brute force is so much better. a unique tag takes out any and all guesswork, and works for most people, even with the itunes issue still outstanding. (either SC should use itunes info for stats, or they just shouldn't support stats for itunes integrated users... if the SC dbase is otherwise empty, seems reasonable to me).


This could involve identification through tag content (look for a known set of unique identifier tags, or match on artist, album, track no, song, etc).
Perhaps scanner hooks could also be put in, so that third-party plugins could write their own reattachment checks.

you lost me on the scanner bit, but sounds interesting. i have been pushing for the scanner function to be completely modular and work as a plugin, as something to be considered in tandem with the new schema. they seem to want the db to be customizable, so the scanner should be too.

as to examining all these different tags in a song to ID it, what if you change them?

how about this:

why not support the brute force method, and then develop alternative methods that others could optionally turn off? that would be the best of both worlds.

Philip Meyer
2008-11-30, 14:27
>SC already tries to be too clever by half. logics logics logics, none
>of them very logical.
>
SC is not clever - it follows simple logical rules, that work for most people. There's no AI, or guesswork.

>and then you want to create some kind of voodoo where SC figures out
>somehow what moved where?
>
No, I'm saying it should be kept simple, but could support many different possible ways of matching up files with the SC DB content when files have moved.

Unique ID's are a possibility, but there's no need to make that part of the key, effectively meaning that every source would require a unique ID of the same type. That's nonsense - there's many different ways of coming up with unique ID's - different tags, checksums on the music content, checksums on the file content, etc. It shouldn't be in the SC DB keys that identify records in the database.

The scanner could use many different types of unique ID tags, and store them in the DB, and check for them if the url of the file being scanned doesn't exist in the DB.

>how would it do so, and KNOW a track was the same exact one, without some kind of unique ID tag?
>
The scanner could also re-attach files to their original SC DB record using other techniques, including partially matching urls. eg. if a drive letter changes but the rest of the path is the same. It could also match on file size, creation date, artist, album, song, track number tags. These are all things that could work without a user having to tag unique IDs, or without having to work out checksums, which would be a lot slower.

dean
2008-11-30, 15:33
On Nov 30, 2008, at 1:27 PM, Phil Meyer wrote:
>> and then you want to create some kind of voodoo where SC figures out
>> somehow what moved where?
>>
> No, I'm saying it should be kept simple, but could support many
> different possible ways of matching up files with the SC DB content
> when files have moved.
I think this makes sense. If there are known unique ID keys
(MusicBrainz, Gracenote/CDDB, whatever) those could be used. If none
exists, then it seems reasonable to degrade to a file path as an
identifier. That file path could be relative (to the music folder
root, say).

gharris999
2008-11-30, 20:48
Would it be difficult to let the user choose their own set of unique IDs?
By default, the key would be the composition of path+offset, but the average user could specify another key like:

mp3 > comment tag
flac > my-id-vorbistag
* > default

It would be easy to ship a batch tool with SC that would generate Vorbis tags for FLAC files, letting the user decide whether they prefer persistence of data or zero file modification.

(did you think of using guid/uuid ? )
http://search.cpan.org/~rjbs/Data-UUID-1.149/UUID.pm

FLAC already calculates and embeds an MD5 signature in each FLAC file. Wouldn't this be good enough? Is MD5 particularly vulnerable to collisions?
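For what it's worth, that MD5 (a hash of the unencoded audio) lives in the last 16 bytes of FLAC's STREAMINFO block, which is always the first metadata block, so it can be read without decoding any audio. A minimal sketch, ignoring edge cases like an ID3v2 tag prepended to the stream, and noting that the field is all zeros if the encoder didn't set it:

```python
def flac_audio_md5(data: bytes) -> str:
    """Extract the MD5 signature of the unencoded audio from the
    start of a FLAC stream."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # 4-byte metadata block header: 1 byte flags/type, 3 bytes length.
    if data[4] & 0x7F != 0:  # block type 0 = STREAMINFO
        raise ValueError("STREAMINFO is not the first metadata block")
    length = int.from_bytes(data[5:8], "big")  # always 34 for STREAMINFO
    streaminfo = data[8:8 + length]
    return streaminfo[-16:].hex()  # MD5 is the last 16 bytes

# Demo with a hand-built minimal FLAC header: a 34-byte STREAMINFO
# whose last 16 bytes are a known value.
fake = (b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big")
        + bytes(18) + bytes(range(16)))
assert flac_audio_md5(fake) == bytes(range(16)).hex()
```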

MrSinatra
2008-12-03, 11:14
>SC already tries to be too clever by half. logics logics logics, none
>of them very logical.
>
SC is not clever - it follows simple logical rules, that work for most people. There's no AI, or guesswork.

untrue.

the "logic" of VA assumes that ONE TPE1 mismatch for an mp3 means something is a comp. that's guesswork, that's bogus.



>and then you want to create some kind of voodoo where SC figures out
>somehow what moved where?
>
No, I'm saying it should be kept simple, but could support many different possible ways of matching up files with the SC DB content when files have moved.

Unique ID's are a possibility, but there's no need to make that part of the key, effectively meaning that every source would require a unique ID of the same type. That's nonsense - there's many different ways of coming up with unique ID's - different tags, checksums on the music content, checksums on the file content, etc. It shouldn't be in the SC DB keys that identify records in the database.

The scanner could use many different types of unique ID tags, and store them in the DB, and check for them if the url of the file being scanned doesn't exist in the DB.

what if the file moved, AND changed? my beef with your system is it isn't foolproof.

a unique ID tag would be foolproof. and keep in mind i am saying this tag should be separate from a "fingerprint" MD5 type tag.

not to mention, this scheme sounds like it would increase scan times and horsepower needed. and i can only guess how difficult troubleshooting it would be, since the whole process and expectations would probably be as clear as mud.

i'm not sure whats nonsense about what i'm proposing? whats the beef with a tag that contains a unique identifier for all my files? i don't see any issues with it.

again, i'm not against your idea, but i just think if they do it it should be optional, just as my idea would be optional. if you don't like my idea, don't use unique tags in your files. if i don't want your way, allow me to be able to opt out of it in options.

(btw, a standalone utility that would assign unique tags could also first scan your files to be sure it never uses the same one twice, if thats your beef)



>how would it do so, and KNOW a track was the same exact one, without some kind of unique ID tag?
>
The scanner could also re-attach files to their original SC DB record using other techniques, including partially matching urls. eg. if a drive letter changes but the rest of the path is the same. It could also match on file size, creation date, artist, album, song, track number tags. These are all things that could work without a user having to tag unique IDs, or without having to work out checksums, which would be a lot slower.

the last thing i want is SC having to "think" anymore than it already does, or scans having to take longer.

i prefer a simple, foolproof method. if others want the voodoo and the lack of clarity, not to mention extra horsepower needed, then fine by me, but PLEASE allow me to opt out of that if i choose.

Philip Meyer
2008-12-03, 13:59
>the "logic" of VA assumes that ONE TPE1 mismatch for a mp3 means
>something is a comp. thats guesswork, thats bogus.
>
What is wrong with that logic? If there are different performing artists on songs on an album, the album is regarded as a compilation. The majority of the time, it is the correct decision. If it did nothing, it would be wrong more often.

What else would you expect it to do (if there wasn't an album artist)? Assume the first artist? Assume the second artist? Assume that each different artist was in fact a song on a different album?

Some logic is needed, and the logic that is right the majority of the time is the correct logic to use.

>what if the file moved, AND changed? my beef with your system is it
>isn't foolproof.
>
That would be an unusual change, outside the normal case.

If a file was moved, scanned, changed, scanned, (or changed, scanned, moved, scanned) that would work.

If not, it could work if there were unique id's. If that's not present either, any persistent data for those files that have changed and moved would not be re-attached, but could remain in the database (so if files were moved back, eg. a network drive letter changed, they would be re-attached again).

>a unique ID tag would be foolproof. and keep in mind i am saying this
>tag should be separate from a "fingerprint" MD5 type tag.
>
Only if songs were tagged with a unique tag, which means the user must do it before moving or changing files, or there is no comeback.
Tagging all music could be a costly activity that users may not want to do. SC should not depend on it, but could optionally use the tags if they are present.

>not to mention, this scheme sounds like it would increase scan times and
>horsepower needed.
It would be just as fast if files haven't moved/changed.

If SC had to read ID tags (could need to support several different types), and compare with the content previously scanned, that could be longer too.

>and i can only guess how difficult troubleshooting
>it would be since the whole process and expectations would probably be
>as clear as mud.
>
I don't see why that would be the case.

>i'm not sure whats nonsense about what i'm proposing? whats the beef
>with a tag that contains a unique identifier for all my files? i don't
>see any issues with it.
>
When you get an idea, it always has to be done exactly as you think; all other ideas become irrelevant, you don't read posts closely, and you turn everything into a battle.

As I said, I'm not against using unique ID tags. Far from it. It is an option for identifying music and re-attaching changed source files back to previously scanned metadata.

However, it is one of many ways, and should not be the only way, because there's a reliance on everyone having unique id tags, and that simply will not be the case with everyone (WAV files, iTunes, MusicIP for starters), or will require users to tag before the first scan.

Support would have to tell users how to re-tag to insert unique ID's. Then you have to ensure that ID's really are unique. What if files are copied and not re-tagged? It's not guaranteed foolproof.

Filepaths are unique - you can't have two identical filepaths with different content. All local music has a filepath, without the user needing to do anything.

I am against making unique ID's have meaning to primary keys in the DB schema. I would also be against making the song url (filepath) the unique song id too.

Unique id tags can be scanned and stored as fields in the tables, and used by the scanner for finding files when rescanned, but the DB song id doesn't need to depend on a unique ID from the source.

MrSinatra
2008-12-04, 21:15
>the "logic" of VA assumes that ONE TPE1 mismatch for a mp3 means
>something is a comp. thats guesswork, thats bogus.
>
What is wrong with that logic? If there are different performing artists on songs on an album, the album is regarded as a compilation.

exactly... thats an erroneous assumption to make. you said there "was no guesswork" and thats false. the "logic" of the assumption can't be described as anything other than guesswork.

if i have a sinatra CD, duets, and it has differing TPE1 info, does that mean its a comp? no it doesn't. its not a comp, no matter how you slice it, it isn't a comp.

if ray charles is a guest with billy joel on one track of a 4 cd box set that is otherwise billy joel, does that make it a comp? no.

sure, it is right SOMETIMES, but as i pointed out before, it got at least half of my albums wrong, (ie. of those that it called comps).


The majority of the time, it is the correct decision. If it did nothing, it would be wrong more often.

not in my case, it was about 50/50. but my whole point has always been that it should be optional. the VA logic does not work for a lot of people, its documented all over the forums and bugzilla, but you just think everyone should live in a SC only world, and use its rules exclusively.

you say it would be wrong more often... i say it would simply act in an expected manner. i don't know of any other app that uses a logic to ID comps, altho maybe itunes does, or uses its own apple DB to know somehow... i admit i don't know how itunes does it if it does, but i am fairly sure its the only mainstream app that attempts it.


What else would you expect it to do (if there wasn't an album artist)? Assume the first artist? Assume the second artist? Assume that each different artist was in fact a song on a different album?

Some logic is needed, and the logic that is right the majority of the time is the correct logic to use.

like treating TPE2 as album artist by default? ;)

seriously, lets say i have one cd that is a comp (soundtrack to a film say of many artists) and one that isn't (bj boxset) and both have TPE1 mismatches and i used the proposed option to turn VA logic off.

in the artist list, the billy joel/ray charles track should show up as a separate artist, and the soundtrack should show up as separate artists for all the tracks on it as well.

if browsing albums by album name, all the artists should appear under the album name they are from in track order. if browsing albums by artist name, then the album should show up multiple times.

would that be so horrible?

i could then SEE and understand how SC was acting, and decide if i want to set comp tags, or album artist tags, to pro-actively modify how SC deals with each differing album. that to me is better than as a new user trying to figure out what the hell is going on, and then how to undo all the "logical" stuff SC is doing i don't want it to do!

remember, if i was ONLY using SC, it would be fine to judiciously leave TPE2 blank on actual comps and let it find such and such as a comp. but that is NOT feasible to play nicely with most other mainstream apps.

having said all that... i expect most users will either:

have album artist set (via TPE2) for both comps and non-comps, since thats what gracenote does. (so users wouldn't have anything SC thought was a comp and the logic is useless under such conditions)

or

THINK they have it set, since some apps, like winamp, can just "assume" that TPE1 should fill the "album artist" field in its DBs if TPE2 is blank. (thus users will be surprised when some things are called comps, and some other things aren't, and they don't know why)

i'm just pointing all these things out, i've always agreed with you philosophically that the big apps should never have mis-used TPE2, but there's nothing anyone can do about it now, and i want my files to play nice wherever they may find themselves.



>what if the file moved, AND changed? my beef with your system is it
>isn't foolproof.
>
That would be an unusual change, outside the normal case.

If a file was moved, scanned, changed, scanned, (or changed, scanned, moved, scanned) that would work.

If not, it could work if there were unique id's. If that's not present either, any persistent data for those files that have changed and moved would not be re-attached, but could remain in the database (so if files were moved back, eg. a network drive letter changed, they would be re-attached again).

i don't think its that unusual. if you move something, you might want to change it at the same time... for instance, if i move eva cassidy albums from blues to easy listening, i might also change their genre.

also, my new rips tend to stay in one folder, and go thru several changes or moves to their tags. i try to fix all the tags before i move them, but this isn't always the case. meanwhile it could be days and i've done several SC scans. who knows at what point SC will catch them?

since everyone archives differently, you could see where this could get unwieldy.

again, i want to point out i'm not AGAINST your way... i simply want it to be OPTIONAL. meaning, if i don't want SC to use the voodoo, logic, or whatever heuristic is created then i should have the right to opt out at the very least.



>a unique ID tag would be foolproof. and keep in mind i am saying this
>tag should be separate from a "fingerprint" MD5 type tag.
>
Only if songs were tagged with a unique tag, which means the user must do it before moving or changing files, or there is no comeback.
Tagging all music could be a costly activity that users may not want to do. SC should not depend on it, but could optionally use the tags if they are present.

agreed!

so, if users want my way, they merely have to add the tags (which is how they would opt in) and SC would always use them, and if they want your way, they need to enable the heuristic logic you describe. (or if not opt in, at least be able to opt out of your proposed logic, but that wouldn't stop SC from using the unique ID tags if present)

i'm not sure whats "costly" about adding unique ID tags in an automated way however?



>not to mention, this scheme sounds like it would increase scan times and
>horsepower needed.
It would be just as fast if files haven't moved/changed.

for a full clear and rescan? or just a normal scan for changes?

and what if a lot of files moved? how would an Infrant handle that?


If SC had to read ID tags (could need to support several different types), and compare with the content previously scanned, that could be longer too.

well, there is no way of adding this feature that won't to some degree add time, esp if the data is meant to persist even thru full clears and rescans.

but it seems to me that a match/no match of a single tag would be quicker than matching/no matching a bunch of possible variables, no? i mean, which way would take longer if the same dataset and conditions were applied to it?

if i misunderstand you, my apologies, but i'm just trying to envision this.



>and i can only guess how difficult troubleshooting
>it would be since the whole process and expectations would probably be
>as clear as mud.
>
I don't see why that would be the case.

well, slim isn't great at documenting logics, and / or the expected behaviors given certain data. we've seen that time and time again with comps, comp tags, VA logic, greatest hits, multidiscs, etc...

so if someone wanted to know why x y or z for data persistence wasn't persisting, it could be problematic to troubleshoot.



>i'm not sure whats nonsense about what i'm proposing? whats the beef
>with a tag that contains a unique identifier for all my files? i don't
>see any issues with it.
>
When you get an idea, it always has to be done exactly as you think; all other ideas become irrelevant, you don't read posts closely, and you turn everything into a battle.

phil, with all due respect, i was responding to what you said:

phil said:
"Unique ID's are a possibility, but there's no need to make that part of the key, effectively meaning that every source would require a unique ID of the same type. That's nonsense"

and my question still stands. you said its nonsense, and i say why? why can't SC use a unique ID tag?

if the "key" is the field in the SC dbase that would uniquely ID the file, why can't an unique ID tag simply populate it? why couldn't your way populate a second "key" field? and SC could then use [either/or] or both. (if both were present, it could populate a third key field with the two values to be the master key)

and btw, i'm not the one saying anything personal, i am merely doing what you do, and what everyone has a right to do: vigorously defending my position, in order to achieve the best outcome. i have no problem being wrong, but i just want to be SHOWN i am wrong, if in fact i am.

MrSinatra
2008-12-04, 21:15
As I said, I'm not against using unique ID tags. Far from it. It is an option for identifying music and re-attaching changed source files back to previously scanned metadata.

However, it is one of many ways, and should not be the only way, because there's a reliance on everyone having unique id tags, and that simply will not be the case with everyone (WAV files, iTunes, MusicIP for starters), or will require users to tag before the first scan.

ok, i just want to be able to opt out of your system. and i don't think slim should have to support persistent stats for wav, itunes, or MiP.


Support would have to tell users how to re-tag to insert unique ID's. Then you have to ensure that ID's really are unique. What if files are copied and not re-tagged? It's not guaranteed foolproof.

if the file is copied, the unique tag in it is copied with it.

the utility should be able to determine the ids will be unique.

i don't think support would have a problem with this. the app i propose would be very optionless.


Filepaths are unique - you can't have two identical filepaths with different content. All local music has a filepath, without the user needing to do anything.

I am against making unique ID's have meaning to primary keys in the DB schema. I would also be against making the song url (filepath) the unique song id too.

Unique id tags can be scanned and stored as fields in the tables, and used by the scanner for finding files when rescanned, but the DB song id doesn't need to depend on a unique ID from the source.

i just don't see why? the tag would mean SC doesn't need to take time generating keys from some methodology.

but anyway, i think both ideas can easily coexist. i think a user should be able to employ one of them, none of them, or both of them. the decision for what to use, and how, should be left to the user.

erland
2008-12-04, 21:43
Is it just me who feels like this thread has gone completely off topic?

Wouldn't it be preferable to have:
- One thread discussing the VA/Compilation logic (or preferably use one of the existing ones in the General section of the forum)
- One thread discussing the identification issue (if it needs to be discussed more)
- Let this thread focus on the new schema, where Brandon can announce when he has something that he feels is worth commenting on.

I'm pretty sure both the identification and the VA/Compilation issues could be discussed forever; the only way to get an end to them is if one of the lead developers of SqueezeCenter just makes a decision. I'm pretty sure they've already heard all the arguments in the various directions, so discussing it back and forth in the Developers section of the forum isn't really, IMHO, going to get us anywhere closer to a solution.

MrSinatra
2008-12-04, 22:00
erland, you're right. the VA/comp stuff was really just a manifestation of a larger philosophical argument about SC and logics. i was just trying to say that SC should allow users to opt out of all logics, since they aren't always desirable... which leads to this proposed idea of yet another logic to develop for dbase persistent stats.

so i do think that argument, over how to do persistent stats, is well placed in this thread if the new schema is to support it. all i ask is that a logic developed for it be optional, and in addition to a bruteforce simple reading of a unique tag ID, to act as the SC DB key.

as i mentioned, SC could also use both, but should only do so if a user wants SC to.

Philip Meyer
2008-12-06, 10:09
>exactly... thats an erroneous assumption to make. you said there "was
>no guesswork" and thats false. the "logic" of the assumption can't be
>described as anything other than guesswork.
>
There is no artificial intelligence in SC. No neural networks - no guesswork.

There are logical decisions being made.

>if i have a sinatra CD, duets, and it has differing TPE1 info, does
>that mean its a comp? no it doesn't. its not a comp, no matter how
>you slice it, it isn't a comp.
>
It is, if you haven't set an album artist. If you have not given enough information in tags, then you may have undesirable results. I know you have album artist tags, so in your case this album doesn't appear as a compilation.

>sure, it is right SOMETIMES, but as i pointed out before, it got at
>least half of my albums wrong, (ie. of those that it called comps).
>
But as you keep telling me, you have album artist tags religiously set on all of your music collection, so you should never have any compilations. If you do, there's something else wrong.

Anyway, this is off-topic, and discussed to death elsewhere!

We should end this part of the discussion here.

Phil

Philip Meyer
2008-12-06, 11:27
>again, i want to point out i'm not AGAINST your way... i simply want
>it to be OPTIONAL. meaning, if i don't want SC to use the voodoo,
>logic, or whatever heuristic is created then i should have the right to
>opt out at the very least.
>
I don't think you are against my way, you just haven't understood properly. I believe you are not actually a developer, and perhaps misunderstand some terminology.

We were discussing the PRIMARY KEY for storing data in the database. There is no need for a primary key to be displayed to the user at all. The simpler and smaller the better. It is used to ensure that each row stored in a table is unique, and can be found quickly through an index. It doesn't (and shouldn't) have to contain any physical information about the content of the row. If it DID contain information that could change, then the primary key would change, and that means anything referring to the key (foreign keys) would also need to change, and that is not a good thing to do.

There can be other information stored in the row. For songs, I'd expect the properties of the song to be stored (track number, title, etc), and the filename, and any additional metadata, such as identifiers read from tags. There could be more than one - ISRC, MusicBrainz, MusicIP Fingerprint values, etc. None of these things should be the PRIMARY KEY, as none can be guaranteed to be unique, and are not as easy to pass around in the software as a simple unique number (usually primary keys are sequential autonumbers, starting at 1 for example).

The first song that is read would eg. be stored with a primary key number of 1, the second 2, etc...

When rescanning, an existing record will be found by matching various properties from the music file (tags, filename, calculated checksum, etc) with values previously stored in the rows of the table. A quick mechanism is required for finding a single matching row; additional indexes can be added to columns in the table that will help find matching rows quickly.

If a row is found, the primary key can then be used to subsequently identify the row that needs to be updated.

So for example, if the filename changed, and the scanner manages to find a match between the content of the file and the existing row in the table, the information in that row can be changed without changing the primary key.
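That flow - surrogate key, match by filename first, fall back to a stored id tag, update the row in place - can be mocked up with SQLite in a few lines. Table and column names here are made up for illustration, not the real SC schema:

```python
import sqlite3

# Toy model of the matching described above: a surrogate INTEGER PRIMARY KEY,
# with the url and an optional unique-id tag stored as ordinary indexed
# columns. Names are illustrative only.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE tracks (
    id    INTEGER PRIMARY KEY,   -- surrogate key, not derived from file data
    url   TEXT NOT NULL UNIQUE,  -- a filepath is unique at any moment in time
    mb_id TEXT                   -- e.g. a MusicBrainz id tag, may be NULL
)""")
db.execute("CREATE INDEX idx_mb ON tracks (mb_id)")

def rescan(url, mb_id):
    """Match by url first, then by id tag; insert a new row if neither hits."""
    row = db.execute("SELECT id FROM tracks WHERE url = ?", (url,)).fetchone()
    if row is None and mb_id is not None:
        row = db.execute("SELECT id FROM tracks WHERE mb_id = ?",
                         (mb_id,)).fetchone()
    if row is None:
        return db.execute("INSERT INTO tracks (url, mb_id) VALUES (?, ?)",
                          (url, mb_id)).lastrowid
    db.execute("UPDATE tracks SET url = ?, mb_id = ? WHERE id = ?",
               (url, mb_id, row[0]))
    return row[0]

first = rescan("file:///music/a.flac", "mb-123")  # initial scan
moved = rescan("file:///moved/a.flac", "mb-123")  # moved file, tag unchanged
# first == moved: the row keeps its id, so stats keyed on it survive the move
```

The point of the sketch is that the id column never changes even when every physical property of the file does, which is exactly why foreign keys to it stay valid.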


As I have said before, "unique id" tags can be read and stored in the track table, and used to find tracks when the scanner is run. I am in no way against this - never have been. As I said last time, I am against picking any piece of physical data, including url and unique tags for use as the PRIMARY KEY. More so "unique tags" because they are not guaranteed to exist in all sources and are not guaranteed to be unique.

The scanner could look for these unique id tags, and then search the DB for a matching record. Filename (url) is no different in this respect, as this must already be unique. In most cases, most of the time the filenames will match, because people don't move files often. The scanner already has a filename, as it must have that in order to read tags from it, so the search can be done before even having to open the file to read tags, which could help with performance, potentially. If a file is found that doesn't exist in the DB table, then it could also check if some other tag was present and see if that exists in the table and then change that row to contain the new filename and properties and tags from the file.

Note that tags containing apparently "unique id" may not actually be unique. Files/tags could accidentally be copied in the filesystem. Also the same song may exist on several compilation albums, and thus the music content may result in the same id, but be located in different places legitimately. So if these are stored with unique key indexes, they could cause failures when trying to write new rows to the table. As such, non-unique id tags would need additional handling to prevent errors.
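That caveat in concrete form: with a plain (non-unique) index, duplicate id tags still insert cleanly, and the scanner can tie-break on the stored filepath. Again an illustrative toy, not SC's schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tracks "
            "(id INTEGER PRIMARY KEY, url TEXT UNIQUE, tag_id TEXT)")
con.execute("CREATE INDEX idx_tag ON tracks (tag_id)")  # deliberately NOT unique

# The "unique" tag was copied along with the file, so two rows share it,
# and both inserts still succeed because the index is non-unique:
con.execute("INSERT INTO tracks (url, tag_id) VALUES ('/comps/vol1/song.mp3', 'abc')")
con.execute("INSERT INTO tracks (url, tag_id) VALUES ('/comps/vol2/song.mp3', 'abc')")

def find_track(url, tag_id):
    """Prefer the row whose stored url also matches; otherwise any row
    carrying the tag; None if the tag is unknown."""
    rows = con.execute("SELECT id, url FROM tracks WHERE tag_id = ?",
                       (tag_id,)).fetchall()
    exact = [r for r in rows if r[1] == url]
    picked = exact or rows
    return picked[0][0] if picked else None
```

Had `idx_tag` been declared UNIQUE, the second insert would have raised an IntegrityError, which is the failure mode being warned about.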

Phil

MrSinatra
2008-12-08, 01:41
>exactly... thats an erroneous assumption to make. you said there "was
>no guesswork" and thats false. the "logic" of the assumption can't be
>described as anything other than guesswork.
>
There is no artificial intelligence in SC. No neural networks - no guesswork.

strawman.

i agree this issue doesn't belong here, but i hope the developers will read my response to this post here:

http://forums.slimdevices.com/showthread.php?t=56078

i think the value is in examining the flawed paradigm of the VA logic. esp with the new schema being discussed, it seems a good time to do it.

MrSinatra
2008-12-08, 02:13
>again, i want to point out i'm not AGAINST your way... i simply want
>it to be OPTIONAL. meaning, if i don't want SC to use the voodoo,
>logic, or whatever heuristic is created then i should have the right to
>opt out at the very least.
>
I don't think you are against my way, you just haven't understood properly. I believe you are not actually a developer, and perhaps misunderstand some terminology.

i've made clear i'm not a developer. but i think i understand the terms, while i appreciate the breakdown below, none of it was other than what i thought. that isn't to say i'm not out of my depth, but if i understand what i think i do, then i have a problem with it.


We were discussing the PRIMARY KEY for storing data in the database. There is no need for a primary key to be displayed to the user at all. The simpler and smaller the better. It is used to ensure that each row stored in a table is unique, and can be found quickly through an index. It doesn't (and shouldn't) have to contain any physical information about the content of the row. If it DID contain information that could change, then the primary key would change, and that means anything referring to the key (foreign keys) would also need to change, and that is not a good thing to do.

There can be other information stored in the row. For songs, I'd expect the properties of the song to be stored (track number, title, etc), and the filename, and any additional metadata, such as identifiers read from tags. There could be more than one - ISRC, MusicBrainz, MusicIP Fingerprint values, etc. None of these things should be the PRIMARY KEY, as none can be guaranteed to be unique, and are not as easy to pass around in the software as a simple unique number (usually primary keys are sequential autonumbers, starting at 1 for example).

The first song that is read would eg. be stored with a primary key number of 1, the second 2, etc...

i'm with you so far... but i simply don't like the idea, b/c i don't trust it. what if a file is deleted, never meant to return? does its primary key (and row) live on forever? b/c thats what it sounds like.

and i assume some kind of heuristic is made so that all these data properties make a value that then relates to the primary key, yes?

what value[s] are in the row with the primary key that don't change? b/c a URL can change, as can tag data, and both can change at the same time between scans.

that seems like a computational load. i guess i'm wondering how much change a row can have before a file is either not properly re-identified, or worse, mis-identified. this is where i am skeptical, and prefer bruteforce unique tag ID alone. even if the method is totally solid, i also question the resources SC will need to do it.


When rescanning, an existing record will be found by matching various properties from the music file (tags, filename, calculated checksum, etc) with values previously stored in the rows of the table. A quick mechanism is required for finding a single matching row; additional indexes can be added to columns in the table that will help find matching rows quickly.

If a row is found, the primary key can then be used to subsequently identify the row that needs to be updated.

So for example, if the filename changed, and the scanner manages to find a match between the content of the file and the existing row in the table, the information in that row can be changed without changing the primary key.

see, i guess i'm just skeptical of how much tolerance the process would have. too little and it doesn't recognize the file. too much, and it misidentifies it. but i could be being paranoid. yet i am also loath to put more stress on SC, as big as it is.

i like a system whose clarity is its strength, if also its weakness. personally, i'd prefer SC relating stats to a unique ID in a given file. that put all the power in handling that feature in my hands.

i'd be cool with the tag being the primary key for a table of stats to persist, BUT i could live with it just being the only value in the row the primary key matches to.


As I have said before, "unique id" tags can be read and stored in the track table, and used to find tracks when the scanner is run. I am in no way against this - never have been. As I said last time, I am against picking any piece of physical data, including url and unique tags for use as the PRIMARY KEY. More so "unique tags" because they are not guaranteed to exist in all sources and are not guaranteed to be unique.

right, but thats why i suggested SC develop a standalone app that could basically guarantee it.


The scanner could look for these unique id tags, and then search the DB for a matching record. Filename (url) is no different in this respect, as this must already be unique. In most cases, most of the time the filenames will match, because people don't move files often. The scanner already has a filename, as it must have that in order to read tags from it, so the search can be done before even having to open the file to read tags, which could help with performance, potentially. If a file is found that doesn't exist in the DB table, then it could also check if some other tag was present and see if that exists in the table and then change that row to contain the new filename and properties and tags from the file.

Note that tags containing apparently "unique id" may not actually be unique. Files/tags could accidentally be copied in the filesystem. Also the same song may exist on several compilation albums, and thus the music content may result in the same id, but be located in different places legitimately. So if these are stored with unique key indexes, they could cause failures when trying to write new rows to the table. As such, non-unique id tags would need additional handling to prevent errors.

Phil

the unique tag ID i proposed had nothing to do with the song or song fingerprint ID.

in any case, i understand what you are saying, but i simply have my doubts. it may be a wonderful paradise, but you think the VA logic is perfect too, so again, i have my doubts. i'm not saying you're wrong about this, i'm just saying that i, as an end user, want total control over how SC handles this issue, and if SC used a tag to tie stats to, i could. anything else that isn't at least optional, robs me of that control.

Philip Meyer
2008-12-09, 15:32
>what if a file is deleted, never meant to return? does its
>primary key (and row) live on forever? b/c thats what it sounds like.
>
It's a function of the scanner to detect music that doesn't exist, and remove it from the DB.

The primary key is irrelevant. Any music that was in the DB before that doesn't exist as a source now, is potentially something that should be purged from the DB.

>and i assume some kind of heuristic is made so that all these data
>properties make a value that then relates to the primary key, yes?
>
The primary key should not be made from any physical properties of the data it represents.

>what value[s] are in the row with the primary key that don't change?
Any properties of the row could change, except for the primary unique key. If the primary key was made up from properties of the row, then effectively the primary key would need to change too, and so would anything that references that key.

>that seems like a computational load. i guess i'm wondering how much
>change a row can have before a file is either not properly
>re-identified, or worse, mis-identified. this is where i am skeptical,
>and prefer bruteforce unique tag ID alone. even if the method is
>totally solid, i also question the resources SC will need to do it.
>
I'm afraid you've lost me. I don't think you understand at all.

If a file is identified in the relevant table, then any new data associated with the file can be amended in the matching table row. Files that cannot be associated to existing rows in the table are added as new rows. Rows that are not matched to any file are redundant rows that can be purged.

It is possible that a row exists for a song but the song is moved and its tags are changed; if there are also no unique ID tags in the file, the old row will be deleted and a new row created. Any additional related persistent data (stats) would be lost when the old row is deleted.

As long as the file is associated to an existing row, the primary key is used to update data if anything has changed, and no persistent data that refers to the song using its primary key will be affected.

All of this is simple, normal database processing.
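To make the above concrete, here is a sketch of that reconciliation pass against SQLite (the engine under discussion), matching on URL for simplicity. The table and column names are illustrative only, not SqueezeCenter's actual schema:

```python
import sqlite3

def reconcile(conn, scanned):
    """Reconcile scanned files (url -> title) against the tracks table:
    update matched rows in place, insert new files, purge unmatched rows."""
    cur = conn.cursor()
    for url, title in scanned.items():
        row = cur.execute("SELECT id FROM tracks WHERE url = ?", (url,)).fetchone()
        if row:
            # Matched: amend the existing row; its primary key never changes,
            # so persistent data keyed on it is unaffected.
            cur.execute("UPDATE tracks SET title = ? WHERE id = ?", (title, row[0]))
        else:
            # New file: add a row with a fresh surrogate key.
            cur.execute("INSERT INTO tracks (url, title) VALUES (?, ?)", (url, title))
    # Rows no longer backed by any scanned file are redundant: purge them.
    if scanned:
        marks = ",".join("?" * len(scanned))
        cur.execute(f"DELETE FROM tracks WHERE url NOT IN ({marks})", list(scanned))
    else:
        cur.execute("DELETE FROM tracks")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)")
conn.execute("INSERT INTO tracks (url, title) VALUES ('file:///a.flac', 'Old Title')")
conn.execute("INSERT INTO tracks (url, title) VALUES ('file:///gone.flac', 'Deleted')")
reconcile(conn, {"file:///a.flac": "New Title", "file:///b.flac": "Brand New"})
```

After this run, a.flac's row is updated in place (same primary key), b.flac gets a new row, and gone.flac's row is purged — which is exactly the "simple, normal database processing" described above.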

>right, but thats why i suggested SC develop a standalone app that could
>basically guarantee it.
>
No, it can't guarantee it. To do so, it would have to run on all source data, be able to write to read-only sources, and be able to modify files via API calls to MusicIP and iTunes, all as a first step before doing any scan. Even then, it would need some hashing mechanism guaranteed to create unique IDs for songs: a song that is absolutely identical in content (e.g. ripped twice to different folders) would need to yield two different unique IDs, while a file whose content is subsequently modified would get a new unique ID and thus no longer match.

Any idea how much extra processing time that would take?

The whole concept is horrendous; just forget it!

FLAC files automatically carry a fingerprint (a kind of unique ID/checksum) calculated over the music content, such that if only the tags change, the fingerprint remains the same. This could serve as an identifier, removing the need for a custom app to store a custom tag, and could be used to re-identify songs in the DB following a rescan if files have moved.
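That fingerprint is real and cheap to read: a FLAC stream's mandatory first metadata block (STREAMINFO) ends with an MD5 signature of the unencoded audio, which survives tag-only edits. A minimal sketch of reading it, with a synthetic byte string standing in for a real file:

```python
def flac_audio_md5(data: bytes) -> str:
    """Return the MD5 signature of the unencoded audio, read from a FLAC
    stream's STREAMINFO block.  Tag-only edits leave this value unchanged."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Metadata block header: 1 byte last-flag/type, 3 bytes big-endian length.
    if data[4] & 0x7F != 0:  # type 0 = STREAMINFO, required to come first
        raise ValueError("STREAMINFO must be the first metadata block")
    length = int.from_bytes(data[5:8], "big")
    streaminfo = data[8:8 + length]
    # The MD5 signature is the final 16 bytes of the 34-byte STREAMINFO.
    return streaminfo[-16:].hex()

# Synthetic stream for illustration: magic, STREAMINFO header (length 0x22
# = 34), 18 bytes of audio parameters, then a recognisable fake MD5.
fake = b"fLaC" + bytes([0x00, 0x00, 0x00, 0x22]) + b"\x00" * 18 + bytes(range(16))
```

Note the caveat: the MD5 is only unique per audio content, so two rips of the same track collide, which is exactly the non-unique-ID handling problem discussed earlier in the thread.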

A single tag applied to all tag formats and subsequently used as the primary key is madness - it ain't gonna happen gov! Honest.

>as an end user, want total control over how SC handles this issue,
>and if SC used a tag to tie stats to, i could.
>anything else that isn't at least optional, robs me of that control.
>
End users should not need to know anything about unique identifiers in music files.
Everything should be as transparent as possible.
It should be possible to use unique identifiers that are present in files when they are available, but they must not be mandatory. Persistent data should be stored in the database and survive rescans as far as possible without needing unique identifiers, because most average users will not have any.

Moonbase
2008-12-10, 21:13
End users should not need to know anything about unique identifiers in music files.
Everything should be as transparent as possible.
It should be possible to use unique identifiers that are present in files when they are available, but they must not be mandatory. Persistent data should be stored in the database and survive rescans as far as possible without needing unique identifiers, because most average users will not have any.

I’m with you. Being able to use UFID or some MusicBrainz unique IDs is great if they’re there, but we shouldn’t (and can’t) depend on them without making SC a »techie-only« and »non user-friendly« thing. And of course you’re right re primary keys. No dependence on some funny tag values. For non-database people, I’d suggest: Just forget about »primary keys«. In modern database designs, these are just some funny internal numbers and nobody is going to see those anyway. *g


Here’s a kinda »real-world example« for non-techies:

You might think your »primary key« is your surname. Not so, anymore. What would happen if Mr. Moonbase decided to marry a beautiful girl and change his name to hers? Right: each and every occurrence of »Moonbase, Alfred E., Mr.« would have to be changed to »Beautiful, Alfred E., Mr.«. Which turns out to be not so primary after all.

Say we stored some data about Mr. Moonbase:

NAMES FILE
Key: »Moonbase, Alfred E., Mr.«
Name: »Moonbase, Alfred E., Mr.«

ADDRESS FILE
Key: »Moonbase, Alfred E., Mr.«
Address: »3, Some Street, Sometown, ZIPCode, Germany«

EMPLOYEE FILE
Key: »Moonbase, Alfred E., Mr.«
Position: »CEO«
Salary: »€ 1 million«

See what would have to be changed? About a zillion places …

(This would be like maybe using some unique MusicBrainz identifier for a track and then suddenly switching to the all-new, even greater »SuperHirn« service. They’d of course use another unique ID for each track.)


So instead we use an »internal funny number«, say »911«, that has nothing to do with Mr. Moonbase except that it is his »internal number« (much like ID numbers on passports). Now we can go and store the same data as above, using our »funny number«:

NAMES FILE
Key: 911
Name: »Moonbase, Alfred E., Mr.«

ADDRESS FILE
Key: 911
Address: »3, Some Street, Sometown, ZIPCode, Germany«

EMPLOYEE FILE
Key: 911
Position: »CEO«
Salary: »€ 1 million«

We can keep all things together by the »funny number« 911—which has no relevance to any of Mr. Moonbase’s data!

Now what needs to be changed when Mr. Moonbase marries and becomes Mr. Beautiful? You guessed it: JUST the Name in the names file—all other data still being kept together by the internal »funny number« (also called the »primary key« by us techies—it just sounds better, ya know?):

NAMES FILE
Key: 911
Name: »Beautiful, Alfred E., Mr.«

(Hope that was not too off-topic. And explains the concept. A little. And yes, I enjoyed reading M.A.D. in olden days … ;-)

Dogberry2
2008-12-30, 14:48
Things might be getting a little side-tracked by deep-level detailed discussions of database design. It's been my experience that requirements/discovery meetings go smoother when discussion focuses on user requirements, without trying to get into the design or the "how-to" specifics. In all the requirements meetings I've been in with users and designers, users pretty much always want to talk about the "how to do it" of the thing rather than the "what results it should give", so it's a common enough reaction. But when they're steered back to describing what results they want and how they want those results presented, leaving the details of how to get there to the designers, the project moves along much more quickly and smoothly.

Once a long list of desired results is in hand, the designers can sift through it and decide what makes sense, what is possible, and what is reasonable in terms of the project timeline, and go back to the users to refine the requirements. Things can get thrown off the list because (for example) they don't fit within the project scope, or they will cost too much (in terms of man-hours and effort) for the benefit they'll return, or whatever, but once the list is pared down and refined, then the design team can go off and do the design. And the users should not be involved in that. Their interest lies in whether the finished system will deliver the results agreed upon; how those results are provided inside the system is not within their purview. Whenever users start trying to tell me how my database should be put together, or how my software should work inside, I politely (but firmly) tell them not to worry about what goes on inside the black box; as long as specific inputs produce the correct outputs, how the black box does it is not their concern.

I'm not trying to insult anybody here; I know there are a lot of professional software engineers and developers hereabouts, including myself. But for the purpose of this project, we are the users. We can request that the DB-API have certain features to allow plug-ins of a certain nature to be built (e.g. "The API should have a callable routine to allow us to do this-and-such. . ."), and I'm sure the designers at Slim will sift through such requests and weigh them against criteria such as I mentioned above (does it make sense? is it possible? is it cost-effective in terms of manpower and time? will the resultant capability be meaningful to a sufficient number of customers to make it worth what it takes to include it? etc.), and then explain their reasoning in giving it a thumbs-up or -down. And after a couple of iterations through such discussions, they'll have a final list of requirements for what the DB-API should do, and they'll go off and design it to do those things. This is pretty much standard procedure for a project life cycle. But getting into nitty-gritty haggling over how the DB architecture should look and work, at this point, is probably not going to be terribly productive. The DB-API is, to us outside of Slim, a wall; inside it is a black box, and we don't really have any say in what goes on in there. Nor should we: the API is our interface. We can lobby for certain functions to be included in the API, and maybe they will be, and maybe they won't, but inside the API, it's Slim's world, not ours. The most we should be doing is saying "I'd like to see a function available that, when I pass it X and Y, it does Z. That would let me do such-and-such."

We can assume (or hope, if you prefer) that the design engineers and developers at Slim will follow well-established, fundamentally sound principles of database and software design in putting together their black box. Things like not proliferating many-to-many relationships all over the place, not having "smart" primary keys (i.e. dependent on specific data fields), setting up sensible indexing schemes, etc. But as outsiders -- as users -- our best approach will be to simply list the functions/features we would like to see available in the API, at a basic level, then explain what use we would make of it and why we think that would be a good thing, and then let them sift through the requests and justifications and make their decisions about what they can/should include in their design. Trying to tell them how to design it, down to the bare metal, isn't likely to help them; they already have their hands full. And design by a committee of users is never a good idea.

cmcneil
2009-04-13, 19:42
He's put some notes up at http://wiki.slimdevices.com/index.php/NewSchema

-dean

From a quick gander at the notes, and without reading this very long thread, this sounds suspiciously like the dreaded Entity-Attribute-Value (EAV) model - a well-known db design anti-pattern. I sure hope that's not where you're headed with this ;)

gharris999
2009-04-14, 15:43
For those of us who aren't such db wonks: http://en.wikipedia.org/wiki/Entity-attribute-value_model

In particular, quoting from the "downsides" section:

Inefficient queries. Where you would execute a simple query returning 20 columns from a single table, you end up with 20 self-joins, one for each column. It makes for illegible code and dreadful performance as volumes grow (scalability is very bad). This downside can be mitigated by use of any PIVOT extensions in a database's query language or through the use of complex expressions—one per "column"—that allow the table to be joined to only once by ignoring the values seen for columns the expression is not targeted for.

I certainly don't have the db coding chops necessary to adequately assess the validity of those criticisms. But can we agree that the minimum test db for Brandon ought to be >50k tracks? Otherwise, I'm afraid that those of us with large-ish libraries will end up with a boat anchor.
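For the non-wonks, the self-join problem in that quoted passage is easy to demonstrate in miniature: in an EAV table, every extra attribute you want back in one result row costs another join, where a conventional table needs none. A toy comparison in SQLite (schema invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conventional design: one row per track, one column per attribute.
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, artist TEXT);
    INSERT INTO tracks VALUES (1, 'So What', 'Miles Davis');

    -- EAV design: one row per (entity, attribute, value) triple.
    CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT);
    INSERT INTO eav VALUES (1, 'title', 'So What'), (1, 'artist', 'Miles Davis');
""")

# One simple single-table query in the conventional design...
plain = conn.execute("SELECT title, artist FROM tracks WHERE id = 1").fetchone()

# ...versus one self-join per extra attribute in the EAV design.
eav = conn.execute("""
    SELECT t.value, a.value
    FROM eav t JOIN eav a ON a.entity = t.entity
    WHERE t.entity = 1 AND t.attribute = 'title' AND a.attribute = 'artist'
""").fetchone()
```

Scale the two-attribute case up to the twenty columns the Wikipedia passage mentions, and across a 50k+ track library, and the scalability worry becomes clear.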

cmcneil
2009-04-14, 17:48
For those of us who aren't such db wonks: http://en.wikipedia.org/wiki/Entity-attribute-value_model

In particular, quoting from the "downsides" section:

Inefficient queries. Where you would execute a simple query returning 20 columns from a single table, you end up with 20 self-joins, one for each column. It makes for illegible code and dreadful performance as volumes grow (scalability is very bad). This downside can be mitigated by use of any PIVOT extensions in a database's query language or through the use of complex expressions—one per "column"—that allow the table to be joined to only once by ignoring the values seen for columns the expression is not targeted for.

I certainly don't have the db coding chops necessary to adequately assess the validity of those criticisms. But can we agree that the minimum test db for Brandon ought to be >50k tracks? Otherwise, I'm afraid that those of us with large-ish libraries will end up with a boat anchor.

I do ;)

Hey, I could be WAY off base... maybe they aren't going that way at all. As I said, I didn't read the whole thread; it seemed to be a discussion of the benefit of surrogate keys for the non-technical... that's a religious question anyway :)

This is just my inference from the notes posted, so take it with a large grain of salt, more a heads up than a criticism... from a big BIG Squeezebox fan.

I read the wikipedia pointer you posted and I agree with most of the criticism of the EAV model - which every smart db architect "invents" at some point in their career, by the way - but the real problem I can see looming down that road is scalability, especially if there are any future plans for running SqueezeCenter on an embedded appliance-type device.

The challenges of actually implementing the design... on sqlite no less... well I'll leave that to the guys at Squeeze who I'm sure will test their solution rigorously. The part that drives me nuts is that people are always looking for a solution to relational databases. Well guess what - RDBs ARE the solution to the problem of managing large volumes of data, not a problem in search of a solution.

Well I'm going to ring off of this now and leave it to the folks who have brought us all a truly great music streaming system... keep up the great work Squeeze team.

But do look into the literature on the EAV db model (if you haven't already) before you commit to it if that is indeed where you're headed.

erland
2009-04-14, 19:44
From a quick gander at the notes, and without reading this very long thread, this sounds suspiciously like the dreaded Entity Attribute Value model - a well known db design anti-pattern. I sure hope that's not where you're headed with this ;)

I don't think you have to worry. If I have understood it correctly, the plan is to create a description file listing the entities/attributes you want to handle, and then generate a specific database structure from that description. That is completely different from the EAV model, which is designed to make it possible to put anything into a single generic database structure.
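The distinction can be made concrete: a generated schema turns an entity description into ordinary per-entity tables at build time, so queries stay plain single-table selects rather than EAV self-joins. A toy sketch of the idea (the description format here is invented for illustration; nothing has been published about the real one):

```python
import sqlite3

# Hypothetical description of entities and their attributes, standing in
# for whatever file format the new schema might actually use.
description = {
    "track": ["title TEXT", "year INTEGER"],
    "album": ["name TEXT"],
}

def generate_ddl(desc):
    """Emit one concrete CREATE TABLE statement per described entity."""
    for entity, attrs in desc.items():
        cols = ", ".join(["id INTEGER PRIMARY KEY"] + attrs)
        yield f"CREATE TABLE {entity} ({cols})"

conn = sqlite3.connect(":memory:")
for ddl in generate_ddl(description):
    conn.execute(ddl)
conn.execute("INSERT INTO track (title, year) VALUES ('So What', 1959)")

# The result is a normal statically-typed table, queried without self-joins.
row = conn.execute("SELECT title, year FROM track").fetchone()
```

The generic description lives only at schema-generation time; at query time the database sees conventional tables, which is why this avoids the EAV performance trap.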

Of course I might be completely wrong since we haven't seen any details yet, but based on the answers we got earlier in the thread, I'm pretty sure they aren't targeting an EAV design.

Jeff Flowerday
2009-04-15, 22:16
For those of us who aren't such db wonks: http://en.wikipedia.org/wiki/Entity-attribute-value_model

In particular, quoting from the "downsides" section:

Inefficient queries. Where you would execute a simple query returning 20 columns from a single table, you end up with 20 self-joins, one for each column. It makes for illegible code and dreadful performance as volumes grow (scalability is very bad). This downside can be mitigated by use of any PIVOT extensions in a database's query language or through the use of complex expressions—one per "column"—that allow the table to be joined to only once by ignoring the values seen for columns the expression is not targeted for.

I certainly don't have the db coding chops necessary to adequately assess the validity of those criticisms. But can we agree that the minimum test db for Brandon ought to be >50k tracks? Otherwise, I'm afraid that those of us with large-ish libraries will end up with a boat anchor.

I say >100,000; I'm already approaching 70,000 and will be nearing 100,000 before the end of the year.

egd
2009-05-19, 23:23
I say >100,000; I'm already approaching 70,000 and will be nearing 100,000 before the end of the year.

Agreed, I'm already over the 100k mark and with Erland's plugins the tables are massive.

egd
2009-05-26, 20:55
I'm in the process of adding a few tables to the SC db which I would like to make persistent. In order to achieve this and integrate with SC, I need to find a common link. The obvious choice (unfortunately) is the actual track name and path string, but before I go down this path: has the new schema settled, or is it likely to settle, on a unique identifier (with persistence) of some sort? Are there any details of the revised schema available?

erland
2009-05-26, 22:35
I'm in the process of adding a few tables to the SC db which I would like to make persistent. In order to achieve this and integrate with SC, I need to find a common link. The obvious choice (unfortunately) is the actual track name and path string, but before I go down this path: has the new schema settled, or is it likely to settle, on a unique identifier (with persistence) of some sort? Are there any details of the revised schema available?

If you need anything soon, I would build upon what's available. In TrackStat I use tracks.url and, optionally, tracks.musicbrainz_id if available; SqueezeCenter itself uses the same approach internally for its tracks_persistent table. MusicBrainz IDs are good since they make it possible to handle renaming and restructuring of the music files, but you can't rely on them alone, since most users don't have them. If you use MusicBrainz IDs, you probably also want to perform a synchronization operation after a SqueezeCenter rescan to refresh the track URLs in your tables.
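That synchronization step might look like the following in outline: after a rescan, repoint the plugin's rows at the new track URLs wherever a MusicBrainz ID gives a match, and leave ID-less rows for URL matching. The table names are stand-ins, not TrackStat's real schema:

```python
import sqlite3

def sync_urls(conn):
    """After a rescan, refresh the plugin table's track URLs by matching
    on musicbrainz_id where one is present."""
    conn.execute("""
        UPDATE plugin_stats
        SET url = (SELECT t.url FROM tracks t
                   WHERE t.musicbrainz_id = plugin_stats.musicbrainz_id)
        WHERE musicbrainz_id IS NOT NULL
          AND EXISTS (SELECT 1 FROM tracks t
                      WHERE t.musicbrainz_id = plugin_stats.musicbrainz_id)
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (url TEXT, musicbrainz_id TEXT);
    CREATE TABLE plugin_stats (url TEXT, musicbrainz_id TEXT, playcount INTEGER);
    -- This file moved during the rescan: same MusicBrainz ID, new URL.
    INSERT INTO tracks VALUES ('file:///new/a.flac', 'mbid-1');
    INSERT INTO plugin_stats VALUES ('file:///old/a.flac', 'mbid-1', 42);
    -- No MusicBrainz ID: can only be matched by URL, so it is left alone.
    INSERT INTO plugin_stats VALUES ('file:///old/b.flac', NULL, 7);
""")
sync_urls(conn)
```

The stats row for the moved file survives the rename; the ID-less row is the case where only the URL link remains, which is why MusicBrainz IDs alone can't carry the whole scheme.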

The last information I've seen indicates that the next major SqueezeCenter version will be released in early summer; there is no way the new schema will be introduced in that version, since it hasn't even been released for alpha testing yet.

I would personally be surprised if we see the new schema in an official release this year; my guess would be sometime next year. However, it has been strangely quiet in this area, so I'm not sure the plans for a new schema are still alive. It would be nice to get some kind of indication from Logitech about what's going on with the new schema and when they plan to release more details about it.