Re: Strawman SQL database integration thoughts



John A. Tamplin
2004-08-12, 11:41
On Thu, 12 Aug 2004, Pat Farrell wrote:

> I disagree. The performance is no slower than reading the file
> and is only done occasionally. Library maintenance is a very low
> frequency occurrence. The hash function is so fast that the
> IO time (especially in Perl or Java) overwhelms it.
>
> Try it, you'll see it isn't a major issue.

Calculating an MD5 on a song is not a cheap operation, and whether it is
I/O-bound or CPU-bound (I suspect the latter), it is still load that isn't
necessary. Even if you did use a hash for the song, you would still want to
store it so you didn't have to recompute it every time you need to figure
out the id. In that case, you can just store the hash (or, even better,
Relatable's TRM ID) as a field in the database, which you can check when
you want to find duplicates but don't have to compute each time.
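
As a rough sketch of what "store it as a field" could look like, the Python
below uses SQLite with a serial id as the key and the MD5 kept in an ordinary
indexed column. The table name, column names and function names are all made
up for illustration and are not taken from the strawman schema.

```python
import hashlib
import sqlite3


def file_md5(path, chunk_size=1 << 16):
    """Hash a file in chunks so a large song is never fully loaded in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_library(db_path=":memory:"):
    """Create the (hypothetical) song table if it does not exist yet."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS song (
               id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- serial key
               path TEXT NOT NULL,
               md5  TEXT                                -- stored field, not a key
           )"""
    )
    conn.execute("CREATE INDEX IF NOT EXISTS song_md5 ON song (md5)")
    return conn


def add_song(conn, path):
    """Compute the hash once, at import time, and store it with the row."""
    cur = conn.execute(
        "INSERT INTO song (path, md5) VALUES (?, ?)", (path, file_md5(path))
    )
    return cur.lastrowid


def find_duplicates(conn):
    """Return (md5, count) for hashes shared by several rows; no file is re-read."""
    rows = conn.execute(
        "SELECT md5, COUNT(*) FROM song GROUP BY md5 HAVING COUNT(*) > 1"
    )
    return rows.fetchall()
```

The duplicate check is then a query against the stored column, so the cost of
hashing is paid only when a file is added.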

> The problem with an auto increment key for the song id [ which I use
> in several other places in the strawman schema], is that you want
> identical songs to have the same songID. We can argue about
> what is "identical" and I threw out three possible definitions.

To me, that is an issue for the import routine (i.e., when you add new
songs) -- it can detect a match with another song based on whatever
criteria you have and ask you what to do with it.
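
A minimal sketch of such an import routine, assuming the match criterion and
the "what to do with it" decision are both passed in as functions. The Song
fields, the two example criteria and the "skip"/"add" answers are all
hypothetical, and the list stands in for the song table.

```python
from dataclasses import dataclass


@dataclass
class Song:
    song_id: int
    title: str
    artist: str
    md5: str


def same_bits(a, b):
    """One possible definition of identical: the stored hashes are equal."""
    return a.md5 == b.md5


def same_tags(a, b):
    """Another possible definition: title and artist match, ignoring case."""
    return (a.title.lower(), a.artist.lower()) == (b.title.lower(), b.artist.lower())


def import_song(library, title, artist, md5, is_match, on_match):
    """Add a song to `library` (a list of Song) and return its id.

    `on_match(existing, candidate)` is asked about each match and returns
    "skip" to reuse the existing row or "add" to store the new one as well.
    """
    # len + 1 stands in for the database's serial counter (no deletions here).
    candidate = Song(len(library) + 1, title, artist, md5)
    for existing in library:
        if is_match(existing, candidate) and on_match(existing, candidate) == "skip":
            return existing.song_id
    library.append(candidate)
    return candidate.song_id
```

In an interactive tool `on_match` would prompt the user; in a batch import it
could apply a fixed policy.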

> Seems to me that if the Greatest Hits album contains the same (exactly) song
> as the main album, then it is only one song, and one songID.
> If it is another take of the song, another mix, master, etc. then the
> bits will be different and the hash will be different.

The flip side is that two encodes of the same original CDDA bits may be
very slightly different (say, from a later version of the encoder) -- would
you really want them to be two different songs? I don't think a hash of
the data is an appropriate key for the index. If you do want that
functionality, then store the hash as a field in the song table and use a
serial field for the unique key.
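
To illustrate the point, here is a small Python sketch in which the byte
strings are made-up stand-ins for two encodes of the same track that differ
in a single bit. Their MD5 values are different, so a hash-derived key would
treat them as two songs.

```python
import hashlib

# Stand-in for one encode of a track (not real audio data).
original = bytes(range(256)) * 64

# Stand-in for a second encode: identical except for one flipped bit.
changed = bytearray(original)
changed[100] ^= 0x01
reencoded = bytes(changed)

key_a = hashlib.md5(original).hexdigest()
key_b = hashlib.md5(reencoded).hexdigest()

# Count how many of the 32 hex digits differ between the two keys.
differing = sum(x != y for x, y in zip(key_a, key_b))
print(key_a != key_b, differing)
```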

--
John A. Tamplin jat (AT) jaet (DOT) org
770/436-5387 HOME 4116 Manson Ave
Smyrna, GA 30082-3723