PDA

View Full Version : Re: Primary key for Song: Strawman SQL databaseintegration thoughts



John A. Tamplin
2004-08-13, 08:08
On Fri, 13 Aug 2004, Peter Speck wrote:

> > Yes, I would suggest a maximum length of any varchar as 254 (some DBs
> > limit it to 255 including the length byte and use the length value 255
> > to
> > denote null [as opposed to 0 being a string containing no characters]).
>
> Sounds like the "640K ought to be enough for everyone" syndrome.

If you read the context of my statement, it was because some databases
have limits on what you can put in a varchar column. Longer than that,
and you have to use a clob which behaves rather differently and is
frequently awkward to use in the perl DBI (due to issues detecting the
type during binding) -- for example, in Informix you can not directly
insert a clob/blob value, you have to insert with a null and then do an
update.

My suggestion is also for the default, so a user could easily change it if
they needed to and their database supported it. In fact, I am not sure
SQLite even enforces length restrictions, so it wouldn't affect users with
the out-of-the-box setup.

In my database implementation, I store only the relative path to the
files, so I can move things around without having to munge the paths in
the DB (for those that want files stored in multiple areas it would be
easy to have a path_prefix_id etc that selects the appropriate one).

> Vidur chose SQLite. The documentation says: "SQLite is typeless. The
> datatype for the column does not restrict what data may be put in that
> column. All information is stored as null-terminated strings."
> http://www.sqlite.org/datatype3.html
> http://www.sqlite.org/lang.html#createtable

Yes, but just because SQLite was chosen for the default does not mean
consideration for other databases should be ignored. IMO, the only change
required for a different database should be the DDL to create the schema.
For the most part, that means changing the type and lengths of fields are
ok, but you specifically should not assume you can put arbtirary strings
in a number field just because SQLite lets you.

--
John A. Tamplin jat (AT) jaet (DOT) org
770/436-5387 HOME 4116 Manson Ave
Smyrna, GA 30082-3723

Peter Speck
2004-08-13, 10:54
On 13/8-2004, at 17:08, John A. Tamplin wrote:

> On Fri, 13 Aug 2004, Peter Speck wrote:
>
>>> Yes, I would suggest a maximum length of any varchar as 254 (some DBs
>>> limit it to 255 including the length byte and use the length value
>>> 255
>>> to
>>> denote null [as opposed to 0 being a string containing no
>>> characters]).
>>
>> Sounds like the "640K ought to be enough for everyone" syndrome.
>
> If you read the context of my statement, it was because some databases
> have limits on what you can put in a varchar column. Longer than that,
> and you have to use a clob [...]

I would argue that those databases are broken w.r.t. SlimServers needs.
A *lot* of people have relative paths longer than 255 chars.
Especially among those with classical music.

So I see it as a fundamental requirement that a database supports
varchar(n), n >= 1000, to be useable for SlimServer.

> My suggestion is also for the default, so a user could easily change
> it if
> they needed to and their database supported it.

I would argue that SQLite is *the* database, as SlimServer will ship
with it, and +90% of all users won't even know what an SQL database is,
nor which one SlimServer uses, nor its limitations. Most users can't
set the time on their VCR. The geeks are not the market, they are part
of it (an important part), but SlimDevices cannot survive if they can
only sell to geeks (geek == those who are able to edit a configuration
file).

It has to work out of the box with superlong paths: SlimDevices is an
electronics company, just like Sony (albeit a bit more geeky). You
never expect having to modify internals when buying Sony hifi
equipment.

If somebody wants to use Oracle, DB2, Informix or other "weird"
databases, I would put the burden on those users. I would definitely
keep all kinds of hacks for databases which only support varchar(255)
out of the mainstream SlimServer (it is already complex enough as it
is).

> In fact, I am not sure
> SQLite even enforces length restrictions, so it wouldn't affect users
> with
> the out-of-the-box setup.

Yes, but I definitely despise 'hacks', where things work because the
database doesn't implement/enforce the SQL standard fully.

> In my database implementation, I store only the relative path to the
> files, so I can move things around without having to munge the paths in
> the DB (for those that want files stored in multiple areas it would be
> easy to have a path_prefix_id etc that selects the appropriate one).

SlimServer already has multiple input dirs: one specified by the user
and one virtual with files from iTunes xml import.

BTW: for iTunes support, it would be nice to have a extra column in the
track table:
itunes_trackid int
It is for iTunes' internal track id's (named "Track ID" in the xml
file), and this will make iTunes -> SlimServer sync much easier (and
faster and less memory hungry), as it then won't have to guess which
track is which based on paths.

BTW: do you support network streams?
E.g. http://64.236.34.67:80/stream/1024
or http://www.digitallyimported.com/mp3/eurodance.pls

I would suggest using URLs instead of paths. A song can have a full URL
(network streams), or partial which is relative to the URL specified by
path_prefix_id. SlimServer currently uses URLs internally (iTunes
too). My example path starts with
file:///Volumes/frede/Music/Woldemar%20Nelson... note the 3 slashes at
the beginning.

As iTunes can have mp3 files outside iTunes main Music folder, having
both absolute and relative URLs is needed. iTunes uses its Music
folder as a kind of "default download directory", but users can place
their files wherever they want.

Calculation of relative URLs -> absolute is already fleshed out in
SlimServer, so it doesn't add new complexity.

>> Vidur chose SQLite. The documentation says: "SQLite is typeless. The
>> datatype for the column does not restrict what data may be put in that
>> column. All information is stored as null-terminated strings."
>> http://www.sqlite.org/datatype3.html
>> http://www.sqlite.org/lang.html#createtable
>
> Yes, but just because SQLite was chosen for the default does not mean
> consideration for other databases should be ignored.

I think we agree on this, except that I take starting point in SQLite
and makes it possible to use e.g. PostgreSQL, Oracle (and other
databases with huge varchar support), instead of Informix and hope it
works due to lax implementation of SQL in SQLite.

> IMO, the only change
> required for a different database should be the DDL to create the
> schema.
> For the most part, that means changing the type and lengths of fields
> are
> ok, but you specifically should not assume you can put arbitrary
> strings
> in a number field just because SQLite lets you.

We agree on this. I think that is the most dirty aspect of SQLite (I
was very much surprised by it), and would definitely prefer it used
strict type-checking.

(don't take this as bashing, I'm just very frank sometimes)

----
- Peter Speck

vidurapparao
2004-08-13, 11:45
It seems like we may be putting the cart before the horse by focusing
too much on a specific schema - that's the reason the branch code has a
transitional schema. It seems to me that the entity relationship model
and the API should come before the schema details. Pat Farrell's
original proposal was trying to kick off a discussion of the entity
relationship model, but it seems like some of the focus has moved to the
SQL details. I'd like to move the discussion back to the former.

Having said that, I want to support a couple of things that Peter said
below. The choice of SQLite was made with some (maybe not a lot, but at
least some) thought. The goal of this project is not specifically to
support SQL or even RDBMSs, but to have a reasonably performant - both
in terms of speed and memory usage - data storage solution. The Perl
hash table solution that we're currently using was failing on both
counts with large libraries.

Berkeley DB (as Frédéric Miserey had previously suggested) was a
potential option, but it seemed like a RDBMS made more sense given the
inherently relational nature of the data (my experience was that BDB
became ungainly when trying to represent the many-to-many relationships
we have here). SQLite seemed the right compromise between the simplicity
and light overhead of BDB and the robustness and scalability of
standalone RDBMSs. The specific features of SQLite that I liked were:

* It's an in-process database solution. For the amount of data we're
talking about, I believe that this is better than the inherent
overhead introduced by dealing with the IPC necessary for
standalone databases.
* It's easy to install. In fact, the DBD::SQLite driver package
actually contains SQLite! This alleviates the packaging issues
that would be necessary for other databases.
* It performs well and with low overhead for the scale of data that
we're talking about. Specifically, it's a good option for tables
with up to10s or 100s of thousands of rows. We probably won't need
to deal with the next order of magnitude higher.
* It has a great license. Specifically, in the DBD::SQLite
packaging, it's available with the Perl license.

I don't think we should preclude people plugging in other databases, but
we're not going to bend over backwards to support it. As John suggests,
it's possible that only the DDL need be different for different
databases. The current code in the branch allows for this. It's possible
that we may have to make decisions in the future that favor the
strengths of SQLite or work around the weaknesses, while putting other
databases at a disadvantage. I'm comfortable making those choices.

Please don't read this as a flame of any sort...it's meant to be a
gentle nudging of the direction of the discussion. :-)

--Vidur

Peter Speck wrote:

> On 13/8-2004, at 17:08, John A. Tamplin wrote:
>
>> On Fri, 13 Aug 2004, Peter Speck wrote:
>>
>>>> Yes, I would suggest a maximum length of any varchar as 254 (some DBs
>>>> limit it to 255 including the length byte and use the length value 255
>>>> to
>>>> denote null [as opposed to 0 being a string containing no
>>>> characters]).
>>>
>>>
>>> Sounds like the "640K ought to be enough for everyone" syndrome.
>>
>>
>> If you read the context of my statement, it was because some databases
>> have limits on what you can put in a varchar column. Longer than that,
>> and you have to use a clob [...]
>
>
> I would argue that those databases are broken w.r.t. SlimServers
> needs. A *lot* of people have relative paths longer than 255 chars.
> Especially among those with classical music.
>
> So I see it as a fundamental requirement that a database supports
> varchar(n), n >= 1000, to be useable for SlimServer.
>
>> My suggestion is also for the default, so a user could easily change
>> it if
>> they needed to and their database supported it.
>
>
> I would argue that SQLite is *the* database, as SlimServer will ship
> with it, and +90% of all users won't even know what an SQL database
> is, nor which one SlimServer uses, nor its limitations. Most users
> can't set the time on their VCR. The geeks are not the market, they
> are part of it (an important part), but SlimDevices cannot survive if
> they can only sell to geeks (geek == those who are able to edit a
> configuration file).
>
> It has to work out of the box with superlong paths: SlimDevices is an
> electronics company, just like Sony (albeit a bit more geeky). You
> never expect having to modify internals when buying Sony hifi equipment.
>
> If somebody wants to use Oracle, DB2, Informix or other "weird"
> databases, I would put the burden on those users. I would definitely
> keep all kinds of hacks for databases which only support varchar(255)
> out of the mainstream SlimServer (it is already complex enough as it is).
>
>> In fact, I am not sure
>> SQLite even enforces length restrictions, so it wouldn't affect users
>> with
>> the out-of-the-box setup.
>
>
> Yes, but I definitely despise 'hacks', where things work because the
> database doesn't implement/enforce the SQL standard fully.
>
>> In my database implementation, I store only the relative path to the
>> files, so I can move things around without having to munge the paths in
>> the DB (for those that want files stored in multiple areas it would be
>> easy to have a path_prefix_id etc that selects the appropriate one).
>
>
> SlimServer already has multiple input dirs: one specified by the user
> and one virtual with files from iTunes xml import.
>
> BTW: for iTunes support, it would be nice to have a extra column in
> the track table:
> itunes_trackid int
> It is for iTunes' internal track id's (named "Track ID" in the xml
> file), and this will make iTunes -> SlimServer sync much easier (and
> faster and less memory hungry), as it then won't have to guess which
> track is which based on paths.
>
> BTW: do you support network streams?
> E.g. http://64.236.34.67:80/stream/1024
> or http://www.digitallyimported.com/mp3/eurodance.pls
>
> I would suggest using URLs instead of paths. A song can have a full
> URL (network streams), or partial which is relative to the URL
> specified by path_prefix_id. SlimServer currently uses URLs
> internally (iTunes too). My example path starts with
> file:///Volumes/frede/Music/Woldemar%20Nelson... note the 3 slashes
> at the beginning.
>
> As iTunes can have mp3 files outside iTunes main Music folder, having
> both absolute and relative URLs is needed. iTunes uses its Music
> folder as a kind of "default download directory", but users can place
> their files wherever they want.
>
> Calculation of relative URLs -> absolute is already fleshed out in
> SlimServer, so it doesn't add new complexity.
>
>>> Vidur chose SQLite. The documentation says: "SQLite is typeless. The
>>> datatype for the column does not restrict what data may be put in that
>>> column. All information is stored as null-terminated strings."
>>> http://www.sqlite.org/datatype3.html
>>> http://www.sqlite.org/lang.html#createtable
>>
>>
>> Yes, but just because SQLite was chosen for the default does not mean
>> consideration for other databases should be ignored.
>
>
> I think we agree on this, except that I take starting point in SQLite
> and makes it possible to use e.g. PostgreSQL, Oracle (and other
> databases with huge varchar support), instead of Informix and hope it
> works due to lax implementation of SQL in SQLite.
>
>> IMO, the only change
>> required for a different database should be the DDL to create the
>> schema.
>> For the most part, that means changing the type and lengths of fields
>> are
>> ok, but you specifically should not assume you can put arbitrary strings
>> in a number field just because SQLite lets you.
>
>
> We agree on this. I think that is the most dirty aspect of SQLite (I
> was very much surprised by it), and would definitely prefer it used
> strict type-checking.
>
> (don't take this as bashing, I'm just very frank sometimes)
>
> ----
> - Peter Speck
>
>

Will McDonald
2004-08-13, 13:19
On Fri, Aug 13, 2004 at 11:45:44AM -0700, Vidur Apparao wrote:
> I don't think we should preclude people plugging in other databases, but
> we're not going to bend over backwards to support it. As John suggests,

Thanks, Vidur, for the summary (and the work!). While I agree that it doesn't
make sense for SD to bend over backwards to support other databases, I believe
that many people on this list, and an admittedly small number of "power users"
will find it a great benefit that it *shouldn't* require a lot of work to
support other DBs. I think most of us currently hold the belief that a
moderate amount of thought will allow us to design a multi-backend-capable
solution.

That being said, I'd like to add an additional request: when it comes time to
design the schema and schema-api interfaces, let's take another slow,
thoughtful approach so we can create an easily-extensible schema and api. This
will please the 5% of power users who may want to add an additional column to
the database and modify a web skin to take advantage of that (like me). Much
more importantly to SD, though, is that as we geeks test out and discuss new
solutions to scratch our individual itches (like hasErrors for me), we may
discover great new features that can be incorporated into the default release.
This has been the case for slimserver in general, and I expect the effect to
be even greater with relational databases.

We can save the details of this discussion until later, but that will probably
mean a place for users to define new schemas, etc, that won't get overwritten;
cleanly defined and abstracted APIs, and the ability to hook into database
code from many places in the server.

I now return you to your regularly scheduled discussion. :-)

-w

--
---------Will McDonald-----------------will (AT) upl (DOT) cs.wisc.edu----------
GPG encrypted mail preferred. Join the web-o-trust! Key ID: F4332B28

Peter Speck
2004-08-13, 14:18
On 13/8-2004, at 22:19, Will McDonald wrote:

> That being said, I'd like to add an additional request: when it comes
> time to
> design the schema and schema-api interfaces, let's take another slow,
> thoughtful approach so we can create an easily-extensible schema and
> api.

That time is now, it's quickly passing by.

I've not spend much time browsing the new code, so this list of wishes
might be very off the mark:

1) Supporting other SQL databases will share 99% of code for SQLite, so
I think Slim::Music::DBI should be split into a generic
Slim::Music::SQL::SQLBase and a very small specific subclass
Slim::Music::SQL::SQLite (connecting to db).

2) We should have a set of official base classes
Slim::Music::Info::Track, Album, Genre, ... Currently some are
embedded+implemented in Slim::Music::DBI and the rest of the server
(all the old code) uses hash tables and a lot of accessors in Info.pm

3) Kill the accessors in Info.pm: composer, band, conductor, ....

4) Change track identifier from url to a database specific id (black
box for the rest of the server, but something memory-light like an
integer). At least not require it to be identical to the file location,
but some internal url, e.g. x-slim://1234567

5) Reconsider caching of information from the database. My guess is
that it would not be needed, as long as the rest of the server uses a
Slim::Music::Info::Track object and not refetches it all the time. That
is, instead of passing a track_id (currently an url) as paramater to
all functions, pass this Track.pm object instead.

6) Kill the rest of Info.pm: make a subdir with *.pm used for scanning
music dirs and iTunes support. So the Music pkg only contains the
Track, Album, Genre, ... and DataSource.pm

(I know this means major breakage all over the map)

> This
> will please the 5% of power users who may want to add an additional
> column to
> the database and modify a web skin to take advantage of that (like
> me). Much
> more importantly to SD, though, is that as we geeks test out and
> discuss new
> solutions to scratch our individual itches (like hasErrors for me), we
> may
> discover great new features that can be incorporated into the default
> release.

This brings up a point about how schemas are handled. From experience,
I will assume SlimServer 5.4 will use one schema, 5.4.1 another one,
etc... The number of columns in the track table will grow.
Possible, some information is user-specified: hasErrors, or some
scoring value for shuffle logic (like iTunes' 1-5 stars). The user
would be annoyed if all his information is deleted whenever he upgrades
SlimServer to a new version.

I can see 2 possible solutions (for a SQL based database):
1) keep the information in 2 tables: one for information that comes out
of the id3 tags, and one with the users information. Whenever
SlimServer fetches info from the database, these two tables have to be
joined. (This solution isn't perfect: what to do when the users table
is extended?)

2) keeping the definition of the tables in a format that perl
understands. From this it can create the "create table ..." sql
statement than is currently contained in server/SQL/SQLite/dbcreate.sql
file. Furthermore, it can compare the perl-specification to the layout
of the current database, make a diff and on-the-fly add/drop/modify
columns in the database. It would probably be needed for some changes
to be specified in perl, e.g. column renames. This could be made as
"to go from version 5 to 6, rename column foo to bar" or possible even
a list of SQL statements to execute.
We could go so far, as having a convention that says that columns
starting with some prefix, e.g. "local_", is for the users own
experiments, and is ignored by the perl-diff-patcher.

----
- Peter Speck

nothing lasts as long as the temporary
- Napoleon