Home of the Squeezebox™ & Transporter® network music players.
  1. #21
    Senior Member JJZolx's Avatar
    Join Date
    Apr 2005
    Location
    Colorado
    Posts
    11,536
    Quote Originally Posted by Triode View Post
    2) The current code supports incremental scanning - this is definitely a user
    requirement, but it leads to complexity to get a consistent database and has
    been the cause of a number of bugs - we ended up resolving an inconsistent
    contributor_album table by recreating it after each scan, for instance.
    I seem to recall, way back when, suggesting the contributor_album table. In hindsight it wasn't a very good idea, as the relationships can be (or should be) derived from contributor_track and the tracks-to-albums relationships. Trying to keep data consistent when there are such redundancies in the database can be a handful. The Years table is another example.
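    As a minimal sketch of what "derivable" could look like, the redundant table can be replaced by a view that is computed from the other two relationships. The table and column names here are simplified assumptions, not the actual SqueezeCenter schema:

```python
import sqlite3

# Hypothetical, simplified tables; not the actual SqueezeCenter schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, album INTEGER);
CREATE TABLE contributor_track (contributor INTEGER, track INTEGER, role INTEGER);

-- The redundant table becomes a view, so it cannot drift out of sync.
CREATE VIEW contributor_album AS
    SELECT DISTINCT ct.contributor, t.album, ct.role
    FROM contributor_track ct
    JOIN tracks t ON t.id = ct.track;
""")

conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)",
                 [(1, "Track A", 10), (2, "Track B", 10), (3, "Track C", 11)])
conn.executemany("INSERT INTO contributor_track VALUES (?, ?, ?)",
                 [(100, 1, 1), (100, 2, 1), (200, 3, 1)])

rows = conn.execute(
    "SELECT contributor, album, role FROM contributor_album "
    "ORDER BY contributor, album"
).fetchall()
print(rows)
```

    The trade-off is that the view is evaluated at query time, so whether it is fast enough for browsing would have to be measured.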

  2. #22
    Senior Member JJZolx's Avatar
    Join Date
    Apr 2005
    Location
    Colorado
    Posts
    11,536
    Quote Originally Posted by Dan Sully View Post
    * erland shaped the electrons to say...

    >The basic principles for the browsing schema are:
    >- We get rid of the albums, contributors and genres tables and replace
    >these with a more generic "categories" table that handles all the
    >categories a track can be categorized into. This will make it possible
    >to do very flexible browsing mechanisms to browse the library by any
    >category with the same SQL code.


    Welcome to normalization hell.
    Hell is right. I've worked with databases normalized to the n'th degree and they're nightmares. Yeah, they're flexible alright. Trying to troubleshoot apps by looking at data in the tables becomes virtually impossible.

  3. #23
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    11,048
    Quote Originally Posted by Fred View Post
    Right. Noble goal but I would not overdo it or the code will be (as is much of today's "generic" code) replete with special handling for major types.

    I mean album has different properties than year, above and beyond being a way to select a list of tracks. For example, coming from album you want tracknum sort, whereas it makes little sense by year. This ends up more or less elegantly managed in some form of generic code with tidbits of specifics inserted here and there. This is pretty hard to manage, IMHO, because yes, you make the change once, but is it really applicable to all of the minor subcases you did not have in mind when changing the generic code?

    If we have 2 phases, then creating a specific browse-by-year table should not be a major issue. I mean, take most of the processing hit creating tables, not making generic acrobatics while browsing.
    You might be correct, but what I'm struggling with is how you would handle flexible browsing and browsing based on custom tags if you have specific tables for all "objects". When we talk about custom tags this would basically mean that you need to dynamically create tables based on the custom tag names.

    The current solution with separate tables for albums, contributors and genres is fine as long as we can pre-define the order in which they should be browsed. However, it starts to get complex when a user sometimes likes to start with a Composer, then select a Genre and finally an Album, and the same user in another situation likes to start with a Genre, then select an Album artist and then an Album.

    I'm not saying that it won't be possible, but the Perl code will get fairly complex since it needs to generate the DBIx queries dynamically based on what the user selects. It doesn't get easier when we also start to talk about custom tags where only the user actually knows what the custom tag represents.

    Of course, we can choose to go with an easier more hardcoded solution similar to what we have today, but then we probably need to limit the flexibility and not fully support bug #2700 and #2701.

    In the patch I posted yesterday in this thread I had a variant of the two solutions. The albums, contributors and genres tables of today remained as they are, but I added a separate tags table which also contains albums, contributors and genres and can be used if you like to browse in a flexible way.
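    To illustrate the kind of dynamic query generation involved, here is a rough sketch of browsing a generic tags table in any order. The `tags(track, name, value)` layout and the `browse` helper are assumptions made for this example, not the schema from the patch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per (track, tag name, tag value).
conn.execute("CREATE TABLE tags (track INTEGER, name TEXT, value TEXT)")
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", [
    (1, "COMPOSER", "Bach"),  (1, "GENRE", "Classical"),  (1, "ALBUM", "Goldberg Variations"),
    (2, "COMPOSER", "Bach"),  (2, "GENRE", "Classical"),  (2, "ALBUM", "Cello Suites"),
    (3, "COMPOSER", "Glass"), (3, "GENRE", "Classical"),  (3, "ALBUM", "Glassworks"),
    (4, "COMPOSER", "Glass"), (4, "GENRE", "Soundtrack"), (4, "ALBUM", "The Hours"),
])

def browse(conn, next_tag, selections=()):
    """List distinct values of next_tag for tracks matching every earlier selection."""
    sql = ["SELECT DISTINCT t0.value FROM tags t0"]
    params = []
    # One self-join per level the user has already chosen.
    for i, (name, value) in enumerate(selections, start=1):
        sql.append(
            f"JOIN tags t{i} ON t{i}.track = t0.track "
            f"AND t{i}.name = ? AND t{i}.value = ?"
        )
        params += [name, value]
    sql.append("WHERE t0.name = ? ORDER BY t0.value")
    params.append(next_tag)
    return [row[0] for row in conn.execute(" ".join(sql), params)]

# Composer -> Genre -> Album
print(browse(conn, "COMPOSER"))
print(browse(conn, "GENRE", [("COMPOSER", "Glass")]))
print(browse(conn, "ALBUM", [("COMPOSER", "Glass"), ("GENRE", "Soundtrack")]))
# Genre -> Album, using the same code
print(browse(conn, "ALBUM", [("GENRE", "Classical")]))
```

    The same function serves any browse order and any custom tag name, at the cost of one self-join per level, which is where the performance worry comes from.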

    Do you (or someone else) have a suggestion of how to handle custom tags if we keep the structure where each object type has its own table ?
    Erland Isaksson (My homepage)
    Developer of many plugins/applets

  4. #24
    Senior Member JJZolx's Avatar
    Join Date
    Apr 2005
    Location
    Colorado
    Posts
    11,536
    Quote Originally Posted by erland View Post
    You might be correct, but what I'm struggling with is how you would handle flexible browsing and browsing based on custom tags
    Is it planned to offer such flexibility? Is it necessary?

    The current database handles most genres of music fairly well. There are, of course, a lot of calls for being able to handle classical music. That's pretty complex, but also fairly well-known what's needed. Added support for contributor roles like composers, orchestras, conductors, soloists/performers, added organization by 'work' instead of 'album'.

    Then add individual user profiles, and perhaps multiple libraries and you're already going to have a system that's much more flexible than just about anything available.

  5. #25
    Senior Member vrobin's Avatar
    Join Date
    May 2007
    Posts
    460
    Limiting the number of tables sounds like a good idea, as does being able to use "generic code", but we must do something to prevent data duplication. When the string "Carlos Santana" is in the database, it should be stored only once, and everything that wants "Carlos Santana" should use the corresponding 'id'.

    Can we agree that the base object ('base' in the sense of 'the most elementary') of the DB is the song, or must we say that the base object of the DB is the file? (You know... some silly people like to use an embedded cuesheet in a single album file - I'm only kidding.)

    If the base object is the song, we can achieve some sort of generic property object attached to the base object. To improve performance, let's say we'll have 'string property', 'integer property' and let's say 'blob property'.

    Thus, a base object doesn't have an artist, or an album artist, and a date, and a position, and a subtrack index. A base object only has:
    an integer property called 'year',
    a string property called 'album name'
    an integer property called 'album id' (because the album name alone isn't enough),
    a blob property called 'album cover'
    an integer called 'track index'
    and so on.

    The db would look like:

    base_object_table:
    - base_object_db_uid (database standard uid)
    - base_object_id (some unique id like md5 audio or musicip fingerprint)

    base_object_TO_integer_property_table:
    - base_object_db_uid
    - integer_property_db_uid

    base_object_TO_string_property_table:
    - base_object_db_uid
    - string_property_db_uid

    base_object_TO_blob_property_table:
    - base_object_db_uid
    - blob_property_db_uid

    blob_table:
    - blob_property_db_uid
    - blob_binary_data

    string_table:
    - string_property_db_uid
    - stringvalue
    - order_stringvalue (a dedicated ordered_string table)

    integer_table:
    - integer_property_db_uid
    - integervalue

    This schema could be generic enough, rather optimized and yet not too difficult to crawl generically...
    The properties should be well defined, and maybe we can find two or more different kinds of string properties (a basic string property like 'genre', a doubly indirected string property like 'credits' where strings are associated in pairs of role and name).
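    A rough sketch of the string part of this layout, to show how a value such as "Carlos Santana" ends up stored once and referenced by id. The `property_name` column and the two helper functions are assumptions added for the example; the listing above does not say where the property name lives:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_object (
    base_object_db_uid INTEGER PRIMARY KEY,
    base_object_id     TEXT UNIQUE
);
CREATE TABLE string_table (
    string_property_db_uid INTEGER PRIMARY KEY,
    stringvalue            TEXT UNIQUE      -- each string is stored only once
);
CREATE TABLE base_object_to_string_property (
    base_object_db_uid     INTEGER REFERENCES base_object,
    string_property_db_uid INTEGER REFERENCES string_table,
    property_name          TEXT             -- assumption: not in the listing above
);
""")

def string_uid(value):
    """Return the id of a string, inserting it only if it is not stored yet."""
    conn.execute("INSERT OR IGNORE INTO string_table (stringvalue) VALUES (?)", (value,))
    row = conn.execute(
        "SELECT string_property_db_uid FROM string_table WHERE stringvalue = ?",
        (value,)).fetchone()
    return row[0]

def add_song(object_id, **props):
    """Insert a base object and link each string property to it by id."""
    song_uid = conn.execute(
        "INSERT INTO base_object (base_object_id) VALUES (?)", (object_id,)).lastrowid
    for name, value in props.items():
        conn.execute("INSERT INTO base_object_to_string_property VALUES (?, ?, ?)",
                     (song_uid, string_uid(value), name))

def songs_with(name, value):
    """Generic lookup: all base objects having string property name = value."""
    return [r[0] for r in conn.execute("""
        SELECT b.base_object_id
        FROM base_object b
        JOIN base_object_to_string_property l ON l.base_object_db_uid = b.base_object_db_uid
        JOIN string_table s ON s.string_property_db_uid = l.string_property_db_uid
        WHERE l.property_name = ? AND s.stringvalue = ?
        ORDER BY b.base_object_id""", (name, value))]

add_song("md5:aaa", artist="Carlos Santana", album="Abraxas")
add_song("md5:bbb", artist="Carlos Santana", album="Supernatural")
print(songs_with("artist", "Carlos Santana"))
```

    The integer and blob tables would follow the same pattern.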

    I hope these thoughts are useful...

    Robin
    Last edited by vrobin; 2008-06-09 at 14:52.

  6. #26
    Senior Member Philip Meyer's Avatar
    Join Date
    Apr 2005
    Location
    UK
    Posts
    5,596

    Thoughts regarding new database schema ?

    If performance is the driving force, I'm not sure it's a great idea to hold everything generically. Several tables with fewer rows may be better for performance to avoid seeks/scans.

  7. #27
    Senior Member Philip Meyer's Avatar
    Join Date
    Apr 2005
    Location
    UK
    Posts
    5,596

    Thoughts regarding new database schema ?

    >If it has to be massaged to be browsed, it goes in the first DB.
    >

    I am worried about performance of things like MusicIP. Whenever the current MusicIP plugin determines that new content is available in the MusicIP service, it scans the whole of the MusicIP source data into the library. If a single new song goes into the first DB and the whole of the second DB is thrown away and reproduced from the first, that will affect current playback, and performance will be bad.

    I think of the first DB as a cache for reading the local file tags.

    Is it worth considering one "scan" DB for each source? A list of source music folders could be configured, creating one source DB for each configured source path. Then if a source folder is changed or deleted, the other sources are unaffected.

    >> I think id's in the first DB have to be URL-based (URL or make a unique
    >> id from the URL).

    >
    >The question is what do you want the ID to survive. If you ID on full
    >paths, then changing the name of the disk on Mac OS X would change the
    >path, therefore the ID. MusicBrainz tries to create an ID surviving a
    >remix...
    >I'd ID on partial URL (from the library root - so moving it keeps it
    >intact from SC POV).
    >

    URLs from library root may help, but are not foolproof. Files have a habit of moving location or being renamed due to tagging organisers.

    A "scan" DB could be URL based, but the second DB could have different IDs, determined by URL or other unique tag values that may be found in the data from the "scan" DB. E.g. if a MusicBrainz ID is present, use that as the ID instead of the URL.
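    Something along these lines, as a sketch only. The function name, the id prefixes and the tag key `MUSICBRAINZ_TRACKID` are assumptions for the example; the scanner may expose the tag under a different name:

```python
def stable_id(tags, url):
    """Prefer a MusicBrainz id found in the scanned tags, else fall back to the URL."""
    # Assumed tag key; adjust to whatever name the scan DB actually stores.
    mbid = (tags.get("MUSICBRAINZ_TRACKID") or "").strip().lower()
    if mbid:
        return "mbid:" + mbid
    return "url:" + url

tagged = {"TITLE": "Oye Como Va",
          "MUSICBRAINZ_TRACKID": "0A1B2C3D-0000-0000-0000-000000000000"}
untagged = {"TITLE": "Oye Como Va"}

print(stable_id(tagged, "file:///music/abraxas/02.flac"))
print(stable_id(untagged, "file:///music/abraxas/02.flac"))
```

    With this, a tagged file keeps its id when it is moved or renamed, while an untagged file still changes id with its URL.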

  8. #28
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    11,048
    Quote Originally Posted by JJZolx View Post
    Is it planned to offer such flexibility? Is it necessary?

    The current database handles most genres of music fairly well. There are, of course, a lot of calls for being able to handle classical music. That's pretty complex, but also fairly well-known what's needed. Added support for contributor roles like composers, orchestras, conductors, soloists/performers, added organization by 'work' instead of 'album'.

    Then add individual user profiles, and perhaps multiple libraries and you're already going to have a system that's much more flexible than just about anything available.
    Good points.

    I have a feeling we need to decide what to target:
    1. We can choose to define custom tag names for all relevant things and include them into the current database schema or an optimized version of the current schema.

    2. We can choose a more generic approach where we let the user configure which custom tags he would like SqueezeCenter to scan and use.


    The advantage with 1 will be easier configuration and the increased possibility to optimize the database. The disadvantage with 1 would be that it will probably be quite hard to implement flexible browsing as suggested in: http://bugs.slimdevices.com/show_bug.cgi?id=2701

    The advantage with 2 will be that it will be pretty easy to handle flexible browsing as described in bug #2701, but the disadvantage is a more complex configuration and probably decreased performance in some situations.
    Erland Isaksson (My homepage)
    Developer of many plugins/applets

  9. #29
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    11,048
    Quote Originally Posted by vrobin View Post
    Can we agree that the base object ('base' in the sense of 'the most elementary') of the DB is the song, or must we say that the base object of the DB is the file? (You know... some silly people like to use an embedded cuesheet in a single album file - I'm only kidding.)
    I think it needs to represent a song if we want to continue handling FLAC files with cuesheets.

    Quote Originally Posted by vrobin View Post
    If the base object is the song, we can achieve some sort of generic property object attached to the base object. To improve performance, let's say we'll have 'string property', 'integer property' and let's say 'blob property'.

    Thus, a base object doesn't have an artist, or an album artist, and a date, and a position, and a subtrack index. A base object only has:
    an integer property called 'year',
    a string property called 'album name'
    an integer property called 'album id' (because the album name alone isn't enough),
    a blob property called 'album cover'
    an integer called 'track index'
    and so on.

    ...

    This schema could be generic enough, rather optimized and yet not too difficult to crawl generically...
    The properties should be well defined, and maybe we can find two or more different kinds of string properties (a basic string property like 'genre', a doubly indirected string property like 'credits' where strings are associated in pairs of role and name).
    I'm not sure we need to have separate tables per datatype.
    As I see it there are basically a number of situations where a generic approach would be preferred, and in all of them besides sorting an integer can probably be handled in the same way as a text field:
    - When browsing
    - When listing object details
    - When searching
    - When sorting

    We can always generate special sort values when transferring data from the scanning database schema to the browsing database schema. This takes care of integers, so they can be handled the same way as strings when sorting, by generating "0001" instead of "1".
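    A minimal sketch of such a sort value generator. The function name and the default width are assumptions, and it only handles non-negative integers:

```python
def sort_key(value, width=4):
    """Return a string that sorts correctly under plain text comparison."""
    text = str(value).strip()
    if text.isdigit():
        # Left-pad non-negative integers so that "9" sorts before "10".
        return text.zfill(width)
    # Everything else is treated as text and compared case-insensitively.
    return text.upper()

years = [1999, 2008, 85]
print(sorted(str(y) for y in years))       # plain text comparison
print(sorted(sort_key(y) for y in years))  # padded sort values
```

    The width has to be at least as large as the longest number expected, otherwise the ordering breaks again.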

    Blobs are a special case, but as far as I've seen blobs are only used for track titles (not sure why) and album art. Album art is a special case anyway, since we need to know that the blob contains an image, which should render differently than a text string, so in this case I'm not sure a generic approach is needed.

    I think we will get performance problems when we have EVERYTHING in generic tables, so I think some kind of combination where the things that need to be flexible are in generic tables and the rest is stored in hard-coded columns would be good enough.

    I feel that a central question here is what controls the behaviour:

    Browsing:
    - Is it the perl browse code or the database contents that controls which objects you should be able to browse ?

    Search:
    - Is it the perl search code or the database contents that controls which objects you should be able to search ?

    Sorting:
    - Is it the perl sort code or the database contents that controls which objects you should be able to sort by ?

    Details:
    - Is it the perl song/album info code or the database contents that controls which objects you should see when showing the details of an object ?

    If we let the database contents control these things, we will get a very flexible solution where new search/sort/browse options could be easily added to the system.

    If we always let the Perl code control these things, WE need to decide in which ways the user should be able to browse/search/sort his library, pretty much the same as we do today. Based on the current knowledge regarding music in this community, though, I think we can make a pretty good decision that will satisfy most users (but not all). So in this situation it's basically just a question of which objects need to be added to the database to better support classical music.
    Erland Isaksson (My homepage)
    Developer of many plugins/applets

  10. #30
    Senior Member erland's Avatar
    Join Date
    Dec 2005
    Location
    Sweden
    Posts
    11,048
    Quote Originally Posted by Philip Meyer View Post
    I am worried about performance of things like MusicIP. Whenever the current MusicIP plugin determines that new content is available in the MusicIP service, it scans the whole of the MusicIP source data into the library. If a single new song goes into the first DB and the whole of the second DB is thrown away and reproduced from the first, that will affect current playback, and performance will be bad.
    If parts (or all) of the transformation between the first and second DB is done with SQL scripts, it would hopefully be pretty fast. But it will probably still interrupt the music, because the currently playing song might get a new id in the second database.

    Does MusicIP integration really need to perform a rescan while music is playing ?
    What if it waits until the music stops or until the user manually selects to rescan ?

    Quote Originally Posted by Philip Meyer View Post
    Is it worth considering one "scan" DB for each source? A list of source music folders could be configured, creating one source DB for each configured source path. Then if a source folder is changed or deleted, the other sources are unaffected.
    I don't think it needs to be a separate DB, but we should probably include a "source" column in the first DB to indicate which scanning module detected the song. This way it will also be possible to rescan based on a single scanning module.
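    Roughly like this, as a sketch. The `scanned_tracks` table, its columns and the `rescan_source` helper are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "first DB" table: every row remembers which scanning module produced it.
conn.execute("""
    CREATE TABLE scanned_tracks (
        url    TEXT,
        source TEXT,
        title  TEXT,
        PRIMARY KEY (url, source)
    )
""")

def rescan_source(source, found):
    """Replace only the rows owned by one scanning module; other sources stay untouched."""
    with conn:  # delete and insert happen in a single transaction
        conn.execute("DELETE FROM scanned_tracks WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO scanned_tracks (url, source, title) VALUES (?, ?, ?)",
            [(url, source, title) for url, title in found])

def count(source):
    return conn.execute(
        "SELECT COUNT(*) FROM scanned_tracks WHERE source = ?", (source,)).fetchone()[0]

rescan_source("files", [("file:///a.flac", "A"), ("file:///b.flac", "B")])
rescan_source("musicip", [("file:///a.flac", "A")])
# MusicIP reports new content: only its own rows are rebuilt.
rescan_source("musicip", [("file:///a.flac", "A"), ("file:///b.flac", "B")])
print(count("files"), count("musicip"))
```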

    Quote Originally Posted by Philip Meyer View Post
    URLs from library root may help, but are not foolproof. Files have a habit of moving location or being renamed due to tagging organisers.

    A "scan" DB could be URL based, but the second DB could have different IDs, determined by URL or other unique tag values that may be found in the data from the "scan" DB. E.g. if a MusicBrainz ID is present, use that as the ID instead of the URL.
    The problem with MusicBrainz IDs is that they require users to tag their music files in a special way. My guess is that fewer than 50% of SqueezeCenter users have MusicBrainz tags.

    For FLAC files there is already a checksum of the audio portion of the file stored inside the FLAC file, so this should be very fast to retrieve and use.
    Does anyone know if the FLAC checksum will be unique within a library ?
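    For reference, a sketch of reading that checksum directly. It relies on the FLAC format placing the STREAMINFO block first, with the 16-byte MD5 of the unencoded audio as its last field. The function name is made up, and files that carry an ID3v2 tag in front of the "fLaC" marker are not handled:

```python
def flac_audio_md5(path):
    """Return the audio MD5 stored in a FLAC STREAMINFO block as hex, or None."""
    with open(path, "rb") as f:
        # 4-byte marker + 4-byte block header + 34-byte STREAMINFO block
        header = f.read(42)
    if len(header) < 42 or header[:4] != b"fLaC":
        return None
    block_type = header[4] & 0x7F                 # top bit is the "last block" flag
    block_len = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_len < 34:
        return None                               # first block is not STREAMINFO
    md5 = header[26:42]                           # last 16 bytes of STREAMINFO
    if md5 == bytes(16):
        return None                               # all zeros: signature not set
    return md5.hex()
```

    Since the checksum covers only the decoded audio, two copies of the same rip would share it, and some encoders leave it as zeros, so on its own it probably cannot be assumed unique within a library.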

    For other file types we need some kind of identification to uniquely identify the music file so statistics data such as ratings, play counts and last played time can survive a rescan. It would probably be enough to calculate a checksum based on portions of the audio file, so we probably don't have to calculate a checksum of the whole file. Alternatives would be something like the PUIDs used by MusicDNS, but I suspect these are going to cause an even larger performance problem.
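    A sketch of such a partial checksum, hashing the audio length plus three chunks (start, middle, end). The function name and parameters are assumptions. In particular `audio_start` and `audio_end` would have to come from a tag parser, so that retagging the file does not change the result:

```python
import hashlib
import os

def partial_audio_checksum(path, audio_start=0, audio_end=None, chunk_size=65536):
    """Hash the audio length plus up to three chunks instead of the whole file."""
    if audio_end is None:
        audio_end = os.path.getsize(path)
    length = max(audio_end - audio_start, 0)
    digest = hashlib.md5(str(length).encode("ascii"))
    # Offsets of the first, middle and last chunk of the audio region.
    middle = audio_start + max((length - chunk_size) // 2, 0)
    last = max(audio_end - chunk_size, audio_start)
    with open(path, "rb") as f:
        for offset in sorted({audio_start, middle, last}):
            f.seek(offset)
            digest.update(f.read(min(chunk_size, audio_end - offset)))
    return digest.hexdigest()
```

    Called with the default arguments it covers the whole file including tags, so it would only survive moves and renames, not retagging.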

    I believe that one of the reasons why ratings haven't been included in SqueezeCenter so far is this problem. A user is going to get VERY upset if all his ratings suddenly are lost just because he has restructured his library a bit or moved it to a new disk or a new main directory.
    Erland Isaksson (My homepage)
    Developer of many plugins/applets
