Re: Slimserver footprint, time and space, and an architectural suggestion



John A. Tamplin
2005-06-06, 10:15
On Mon, 6 Jun 2005, Jeff Coffler wrote:

> > [...] Unless it is multithreaded (and most databases I know of, including
> > commercial ones, have limited multithreading support -- usually you need a
> > separate process for each connection for best results), you are now in a
> > worse situation in that any long-running query will block the streaming.
>
> This is total hogwash. I work with both free and commercial databases all
> day long in my job, developing a carrier-grade product that has hundreds (if
> not thousands) of concurrent threads.
>
> We work with both Berkeley DB (free) from Sleepycat Corporation and Oracle
> (commercial). Both have excellent multi-threaded support. We generally
> have *ONE* process, and we have no problems having many many threads
> accessing the database concurrently. I'll grant you that things weren't so
> rosy with Oracle in this regard, but that was - what - 10 years ago? There
> have been many many releases where, quite simply, Oracle has no problems
> with multithreaded access. And Berkeley DB has never had problems with
> multithreaded access.

Berkeley DB is an indexed file storage mechanism, not a database. By
database in the context of Slimserver, I am referring to databases as
accessed by the Perl DBI interface, which specifically refers to SQL
databases.

From perldoc DBI, version 1.40:
---
Threads and Thread Safety

Perl 5.7 and later support a new threading model called iThreads.
(The old "5.005 style" threads are not supported by the DBI.)

In the iThreads model each thread has its own copy of the perl
interpreter. When a new thread is created the original perl interpreter
is 'cloned' to create a new copy for the new thread.

If the DBI and drivers are loaded and handles created before the thread
is created then it will get a cloned copy of the DBI, the drivers and
the handles.

However, the internal pointer data within the handles will refer to
the DBI and drivers in the original interpreter. Using those handles in
the new interpreter thread is not safe, so the DBI detects this and croaks
on any method call using handles that don't belong to the current thread
(except for DESTROY).

Because of this (possibly temporary) restriction, newly created
threads must make their own connections to the database. Handles can't
be shared across threads.

But BEWARE, some underlying database APIs (the code the DBD driver uses to
talk to the database, often supplied by the database vendor) are not thread
safe. If it's not thread safe, then allowing more than one thread to enter
the code at the same time may cause subtle/serious problems. In some cases
allowing more than one thread to enter the code, even if not at the same
time, can cause problems. You have been warned.

Using DBI with perl threads is not yet recommended for production
environments.
---

Note that given the above, if it does work you can only make it work by
having a separate connection from each thread. If you have thousands of
threads, that is a scalability problem on the database server. Some C
APIs support multithreaded access using a single connection, and with
others you can build your own synchronization around connections.
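
As a rough illustration of the connection-per-thread rule, here is a minimal
sketch in Python rather than Perl. It is an analogue, not DBI: Python's
standard sqlite3 module happens to enforce a similar restriction by default,
refusing any call on a connection from a thread other than the one that
created it. The database file, the songs table, and the helper names are all
hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical database file, used only for this illustration.
DB_PATH = os.path.join(tempfile.mkdtemp(), "library.db")

# Handle created in the main thread, comparable to a handle that exists
# before the worker threads are started.
main_conn = sqlite3.connect(DB_PATH)
main_conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT)")
main_conn.execute("INSERT INTO songs (title) VALUES ('Track One')")
main_conn.commit()

_local = threading.local()


def thread_connection():
    """Return the calling thread's own connection, opening it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(DB_PATH)
        _local.conn = conn
    return conn


results = {}


def use_shared_handle():
    # Reusing the main thread's handle is rejected by the sqlite3 module
    # (its default check_same_thread=True setting).
    try:
        main_conn.execute("SELECT title FROM songs").fetchall()
        results["shared"] = "ok"
    except sqlite3.ProgrammingError:
        results["shared"] = "rejected"


def use_own_handle():
    # A connection opened inside the thread works normally.
    conn = thread_connection()
    rows = conn.execute("SELECT title FROM songs").fetchall()
    results["own"] = [title for (title,) in rows]
    conn.close()


for target in (use_shared_handle, use_own_handle):
    worker = threading.Thread(target=target)
    worker.start()
    worker.join()

main_conn.close()
```

With one connection per thread, the number of open connections grows with the
number of threads, which is the scalability concern raised above.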

I haven't tried multithreaded access via DBD::Oracle in a few years (the
last time was against 8i), so it is possible that has been fixed since, but
it was certainly much more recently than 10 years ago that it didn't work.

Until the most recent version of Informix, ESQL/C connection state was
kept in global variables, thus limiting you to one connection per process.
In the current perl multithreading model, that means only one thread can have
DB access.

> Obviously, the database backend chosen by Slim Devices isn't as well behaved
> in this context, but then they didn't need it to be given their short-term
> needs. But there's no question to me that backend databases exist that
> behave quite well with multi-threaded access.

Have you successfully used DBI in multithreaded perl applications, or are you
referring to other APIs?

> Not sure I totally agree here. Obviously, you can't tolerate database
> blocking here. But then an argument can be made that if there are two
> processes (one being a streamer and one being a UI interface), the UI
> interface should give all data it needs to the streaming process (via IPC or
> some other mechanism). After all, if you're after performance, having the
> UI interface look up a bunch of stuff from the DB, only to have the
> streaming process do the same - that just seems like substandard performance
> (compared to the "cost" for an IPC call).

If you take that approach, then it sounds like you have added another
thread/process (I keep saying process because I am not sure threaded perl
is sufficiently portable to all the platforms where it is needed,
specifically Windows) -- a streaming process, a database process, a UI process,
and perhaps a scanning process if that is separated from the others.

The databases I have worked with wind up keeping much of the working set
of rows cached in RAM, so sending a query for something recently fetched
has little cost (especially if it is a singleton select, which would normally
be the case for looking up song info). So, the extra overhead is having to
process the query, which if the code prepares it ahead of time is very small.
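
To make the "prepare it ahead of time" point concrete, here is a small sketch
in Python's standard sqlite3 module (not Perl DBI, where the equivalent would
be preparing a statement once and executing it many times). The sqlite3 module
keeps a per-connection cache of compiled statements, so repeating the same SQL
text with a different parameter should not need a fresh parse. The songs
table, its columns, and song_info are hypothetical, and because SQLite is
embedded this only illustrates statement reuse, not a server-side row cache.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, artist TEXT)"
)
conn.executemany(
    "INSERT INTO songs (id, title, artist) VALUES (?, ?, ?)",
    [(1, "Track One", "Artist A"), (2, "Track Two", "Artist B")],
)
conn.commit()

# The SQL text is fixed; only the bound parameter changes between calls,
# so the compiled statement can be reused from the connection's cache.
LOOKUP_SQL = "SELECT title, artist FROM songs WHERE id = ?"


def song_info(song_id):
    """Singleton select: fetch at most one row by primary key."""
    return conn.execute(LOOKUP_SQL, (song_id,)).fetchone()


first = song_info(2)
again = song_info(2)     # same statement text, same parameter
missing = song_info(99)  # no such row, so fetchone() returns None
```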

The advantage gained is that you don't have to implement your own complicated
protocol for figuring out what to cache, what to send to another process
(probably including some kind of notification interface so a different
process can request what it wants to be notified for). I think just having
multiple connections to the database and letting it do its job is a better
plan. Obviously, that doesn't work so well for databases that are embedded
into each process, such as DBD::SQLite, DBD::XBase, or others, since they
will have to hit the disk each time.

--
John A. Tamplin jat (AT) jaet (DOT) org
770/436-5387 HOME 4116 Manson Ave
Smyrna, GA 30082-3723

Jeff Coffler
2005-06-06, 11:17
On Mon, 6 Jun 2005, John Tamplin wrote:

>> > [...] Unless it is multithreaded (and most databases I know of, including
>> > commercial ones, have limited multithreading support -- usually you need a
>> > separate process for each connection for best results), you are now in a
>> > worse situation in that any long-running query will block the streaming.

Okay, you're making more sense here. Phew! What you're discussing (in your
latest message) is a PERL limitation, not a database limitation.

Berkeley DB (which is a database, by the way - just a BTREE database - and
it does have a PERL interface to access it) and Oracle both have no problems
with many, many threads in one process. Neither requires a
connection-per-thread model.

The PERL limitations are *PERL* issues, not database issues.

I generally access databases from C++ code. No problems with hundreds or
thousands of threads there. Generally, with Oracle, we have a relatively
small number of connections shared between many different threads (it's
pretty trivial to do this). With Berkeley DB (or any "local" database),
there's no issue with "connections" to the database (certainly not with
Berkeley DB, probably not with others - but this is database specific, of
course).

>> Obviously, the database backend chosen by Slim Devices isn't as well behaved
>> in this context, but then they didn't need it to be given their short-term
>> needs. But there's no question to me that backend databases exist that
>> behave quite well with multi-threaded access.
>
> Have you successfully used DBI in multithreaded perl applications, or are you
> referring to other APIs?

I'm referring to languages. C++, typically.

PERL brings up a separate set of issues with multithreading, each of which
is a PERL limitation. True "threads" have been around for a long time, and
work quite well on nearly all mainstream platforms.

PERL, on the other hand, hasn't been working with threads for long. I'm not
aware of any "serious" multithreaded PERL applications. I'd hate to be the
first to venture into that territory.

I read your comment of "Databases don't work well with multiple threads" as
a "blanket comment", not a PERL specific comment. That was the disconnect.

-- Jeff