Music Library Archiving Solution



Dave Rodger
2004-05-22, 13:34
Hey--

I'm wondering if anyone has recommendations for what they use to back up
their music collection to archival DVDs? My library's around 35GB or so,
all MP3s at this point; it's on a Linux server, but the DVD burner's on the
Windows box. I'm looking for something that can do both Full and
Incremental (because I add new music only occasionally, and don't want to
burn the whole collection each time if I don't need to.) What are people
using?

Thanks.

-dave

bob villielm
2004-05-22, 15:34
Dave Rodger wrote:
> I'm wondering if anyone has recommendations for what they use to back up
> their music collection to archival DVDs? My library's around 35GB or so,
> all MP3s at this point; it's on a Linux server, but the DVD burner's on the
> Windows box. I'm looking for something that can do both Full and
> Incremental (because I add new music only occasionally, and don't want to
> burn the whole collection each time if I don't need to.) What are people
> using?

A 160GB hard drive in a USB 2.0 enclosure.
Fast, flexible, and total cost a hair over $100.
For software, Vico Biscotti's fsync:

http://www.vicobiscotti.it/eng/fsync.htm

-bob

Harald Walker
2004-05-22, 15:46
Dave Rodger wrote:

>I'm wondering if anyone has recommendations for what they use to back up
>their music collection to archival DVDs? My library's around 35GB or so,
>all MP3s at this point; it's on a Linux server, but the DVD burner's on the
>Windows box. I'm looking for something that can do both Full and
>Incremental (because I add new music only occasionally, and don't want to
>burn the whole collection each time if I don't need to.) What are people
>using?
>
>
I just collect new files in a separate folder. It takes a lot of time
anyhow to complete the tags and collect cover artwork. Once there is
enough for a full DVD-R, I burn it and move the files to the proper
location in the library. Since the Slimserver doesn't really mind where
the files are, a quick rescan is enough. In the past I've been burning
to CD-R. That pile is quite large but good enough as a security backup.
I am now migrating those files to DVD-R as well.

Regards,

Harald

Lars Kellogg-Stedman
2004-05-22, 18:49
> I'm wondering if anyone has recommendations for what they use to back up
> their music collection to archival DVDs?

I'm using Amanda (http://www.amanda.org/) to back up to a DLT tape
library. Workin' great :).

> I'm looking for something that can do both Full and
> Incremental (because I add new music only occasionally, and don't want to
> burn the whole collection each time if I don't need to.)

A second hard drive and rsync (http://samba.anu.edu.au/rsync/) is
probably going to be the cheapest solution. Rsync means you're only
copying data that has changed, so after the initial sync backups are
going to be relatively quick.

If you really want to burn to DVD, then take a look at the GNU tar
documentation -- tar supports an 'incremental' mode that could probably
be made to do what you want.

You could also try something as simple as:

find . -newer TIMESTAMPFILE -print

To find all of the songs you've added since you last touched
TIMESTAMPFILE, and burn them to DVD. Then touch TIMESTAMPFILE.
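
For anyone who would rather script this than use find, here is a rough Python sketch of the same idea. It is only an illustration: the function names are made up, and it assumes the timestamp file lives outside the music directory.

```python
import os
from pathlib import Path

def files_newer_than(root, stamp):
    """Return paths under root modified after the stamp file, sorted."""
    cutoff = os.stat(stamp).st_mtime
    newer = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = Path(dirpath) / name
            if path.stat().st_mtime > cutoff:
                newer.append(path)
    return sorted(newer)

def touch(stamp):
    """Set the stamp's modification time to now, creating it if needed."""
    Path(stamp).touch()
```

Burn whatever files_newer_than() returns, and call touch() only after the burn has succeeded, so a failed burn doesn't lose track of files.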

-- Lars

2004-05-22, 20:52
Several people have already suggested a second hard drive.

Before I suggest anything - the first question that needs to
be answered is this: What are you trying to protect yourself from?
Are you only concerned about hard drive failures? Are you only
concerned about accidentally deleting or corrupting your data?
Or both?

If you only care about hard drive crashes - then another hard
drive would be a good idea - even better would be something like
a mirrored drive using some kind of RAID. On Linux, setting up
a RAID configuration is pretty simple and only requires the
additional hard drive. I don't know about Windows.

Of course this will not protect you from accidental deletion or
corruption.

If you also want to protect yourself from data corruption then
some kind of removable storage and incremental capability is
a good idea.

As others have suggested - maybe some kind of USB or FireWire drive
in an external enclosure. Probably at least two such external drives
would be good. If you have the time I guess you could also use DVDs.
Though backing up to 7+ DVDs is not my idea of fun.

And for the protection against both kinds of disasters you would need
to do both: RAID plus some kind of removable storage.

Good luck. Tell us what you do.



Dave Rodger writes:
> Hey--
>
> I'm wondering if anyone has recommendations for what they use to back up
> their music collection to archival DVDs? My library's around 35GB or so,
> all MP3s at this point; it's on a Linux server, but the DVD burner's on the
> Windows box. I'm looking for something that can do both Full and
> Incremental (because I add new music only occasionally, and don't want to
> burn the whole collection each time if I don't need to.) What are people
> using?
>
> Thanks.
>
> -dave
>
>
>

Jeffrey Gordon
2004-05-23, 05:13
OK, here is what I do; however, it is probably not the simplest solution.

I wrote a few perl scripts that handle my music files. The first one
takes my music and puts it in "buckets". Buckets are sequentially
numbered directories that will not exceed a certain size limit controlled
by my program. I set the limit to 4.7GB, the size of a DVD. This makes
backing up to DVD simple. Since the music does not change, I only need to
back up any bucket once, except for the "last" bucket that is not full;
as I add to the collection I have to update that last DVD until it is
full. I use a DVD-RW for the "last" bucket. The first backup is a pain,
but beyond that it is very simple.
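
The bucket idea could be sketched roughly as follows. This is not the perl script described above, just a minimal Python illustration; the names and the 4,700,000,000-byte limit are assumptions, and a real disc needs some headroom for filesystem overhead.

```python
DVD_BYTES = 4_700_000_000  # assumed nominal capacity of a single-layer DVD

def assign_buckets(files, limit=DVD_BYTES):
    """Place (name, size) pairs into numbered buckets, in arrival order.

    Returns a list of buckets, each a list of names. A new bucket is
    opened when the next file would push the current one past the limit,
    so earlier buckets never change once a later one exists.
    """
    buckets = [[]]
    used = 0
    for name, size in files:
        if size > limit:
            raise ValueError(f"{name} is larger than one bucket")
        if used + size > limit:
            buckets.append([])
            used = 0
        buckets[-1].append(name)
        used += size
    return buckets
```

Because files are only ever appended, only the final bucket needs re-burning, which is why a rewritable disc suits it.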

Now you may be thinking: but I cannot handle not having my music in a
nice Genre->Artist->Album structure. Well, that is where the next script
comes in. It scans the buckets, creates a directory structure of
your choosing based on the tag data, and creates symlinks back to the
files in the buckets. This is really nice in that I can change the way
I store my music on the fly, or store it in different structures. I
have a "by Album" and a "by Artist" structure. Of course, this will not
work on Windows.
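
A minimal Python sketch of such a symlink view, under some assumptions: the tags have already been read into a dict per file (tag parsing is left out), and build_view and its parameters are hypothetical names, not the script described above.

```python
import os
from pathlib import Path

def build_view(tracks, view_root, layout=("genre", "artist", "album")):
    """Create a tree of symlinks under view_root pointing at bucket files.

    tracks is an iterable of (path, tags) pairs, where tags is a dict
    holding the fields named in layout. Missing fields become "Unknown".
    """
    view_root = Path(view_root)
    for path, tags in tracks:
        path = Path(path).resolve()
        # Replace path separators so a tag value cannot create extra levels.
        parts = [str(tags.get(key, "Unknown")).replace(os.sep, "_") for key in layout]
        folder = view_root.joinpath(*parts)
        folder.mkdir(parents=True, exist_ok=True)
        link = folder / path.name
        if not link.is_symlink():
            link.symlink_to(path)
```

Calling it twice with different layout tuples and different view_root directories gives two views of the same buckets without copying any audio.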

This is probably a bit much for most people but I like it and it does
the job well. My music is stored in FLAC format and I keep two copies
of my music on DVDs, a set at home and a set at work for "off-site".

Dave Rodger wrote:

>Hey--
>
>I'm wondering if anyone has recommendations for what they use to back up
>their music collection to archival DVDs? My library's around 35GB or so,
>all MP3s at this point; it's on a Linux server, but the DVD burner's on the
>Windows box. I'm looking for something that can do both Full and
>Incremental (because I add new music only occasionally, and don't want to
>burn the whole collection each time if I don't need to.) What are people
>using?
>
>Thanks.
>
>-dave
>
>
>

Stuffed Crust
2004-05-23, 05:18
On Sat, May 22, 2004 at 04:34:11PM -0400, Dave Rodger wrote:
> I'm wondering if anyone has recommendations for what they use to back up
> their music collection to archival DVDs? My library's around 35GB or so,
> all MP3s at this point; it's on a Linux server, but the DVD burner's on the
> Windows box. I'm looking for something that can do both Full and
> Incremental (because I add new music only occasionally, and don't want to
> burn the whole collection each time if I don't need to.) What are people
> using?

I use a set of customized scripts tied into a database that:

1) keeps track of all files and their MD5sums
2) keeps track of when files were added
3) keeps track of what DVD they were backed up on

The checksums help me find duplicates and ensure that when I move files
around, I don't back up the same file twice. The dates let me do things
like "what have I added since X?". Once I have about four and a half
gigs of new stuff, I run another script which generates an md5sum
checksum file and a file index for the DVD and passes those into mkisofs
to generate the ISO file, and then I proceed to burn that disc.
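
To make the three things being tracked concrete, here is a small SQLite sketch in Python. The schema and function names are invented for illustration and are not the scripts or database described in this post.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path   TEXT PRIMARY KEY,
    md5    TEXT NOT NULL,
    added  TEXT NOT NULL,   -- ISO date the file was first seen
    dvd    INTEGER          -- NULL until the file has been burned
)
"""

def open_catalog(db_path=":memory:"):
    """Open (or create) the catalog database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn

def add_file(conn, path, md5, added):
    """Record a newly seen file that has not been burned yet."""
    conn.execute(
        "INSERT INTO files (path, md5, added) VALUES (?, ?, ?)",
        (path, md5, added),
    )

def pending(conn, since="0000-00-00"):
    """Paths added on or after `since` whose content is on no DVD yet."""
    rows = conn.execute(
        """SELECT path FROM files
           WHERE dvd IS NULL AND added >= ?
             AND md5 NOT IN (SELECT md5 FROM files WHERE dvd IS NOT NULL)
           ORDER BY path""",
        (since,),
    )
    return [row[0] for row in rows]

def duplicates(conn):
    """Checksums that appear under more than one path."""
    rows = conn.execute(
        "SELECT md5 FROM files GROUP BY md5 HAVING COUNT(*) > 1 ORDER BY md5"
    )
    return [row[0] for row in rows]
```

Matching on the checksum rather than the path is what keeps a moved or renamed file from being queued for a second burn.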

The md5sum files also let me easily verify the integrity of the files on
the DVD (it's saved my bacon a couple of times) and I use it to mark
in the database which files were verified correctly.

Periodically I do a full verification and find more errors, and when I
do, I re-backup those files as needed.
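
A rough Python sketch of the checksum-file-and-verify step, assuming a manifest with one "digest, two spaces, relative path" line per file (meant to resemble what md5sum prints). The function names are made up, and the manifest is assumed to live outside the directory being listed.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def write_manifest(root, manifest):
    """Write one 'digest  relative/path' line for every file under root."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    lines = [f"{md5_of(p)}  {p.relative_to(root).as_posix()}\n" for p in files]
    Path(manifest).write_text("".join(lines))

def verify_manifest(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    bad = []
    for line in Path(manifest).read_text().splitlines():
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or md5_of(target) != expected:
            bad.append(rel)
    return bad
```

Running verify_manifest() against a mounted disc gives the list of files that need to go onto a fresh backup.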

- Pizza [105 DVDs and counting]
--
Solomon Peachy ICQ: 1318344
Melbourne, FL JID: pitha (AT) myjabber (DOT) net
Quidquid latine dictum sit, altum viditur

Stuffed Crust
2004-05-23, 07:56
On Sun, May 23, 2004 at 05:52:04AM -0700, Doug Wise wrote:
> Hard drives are very inexpensive these days. For $150 I have found
> 7200rpm 250GB drives that make great off-site archives.

And $150 will buy you on the order of ~800 gigs worth of DVD-Rs. :)
It's not like the MP3s will change once encoded/tagged.

Your backup strategy depends on what you're protecting against. In my
case, individual drive failures are handled by a RAID array. The DVDs
are to safeguard against the filesystem getting corrupted, the array
failing altogether, the building burning down, etc.

- Pizza
--
Solomon Peachy ICQ: 1318344
Melbourne, FL JID: pitha (AT) myjabber (DOT) net
Quidquid latine dictum sit, altum viditur

Steve Baumgarten
2004-05-23, 17:50
> A second hard drive and rsync (http://samba.anu.edu.au/rsync/) is
> probably going to be the cheapest solution. Rsync means you're only
> copying data that has changed, so after the initial sync backups are
> going to be relatively quick.

Windows users can also have a look at robocopy.exe, a part of the Windows
Server 2003 Resource Kit Tools (which runs just fine on XP Home):

http://www.microsoft.com/downloads/details.aspx?FamilyID=9D467A69-57FF-4AE7-96EE-B18C4790CFFD&displaylang=en

It has a few options, the simplest of which is the "mirror" option:

robocopy source_dir dest_dir /mir

I use this myself (via a perl script) to mirror "My Documents" on my C:
drive to my F: drive each morning, rotating through directories named for
days of the week. (So basically I have a rolling 7-day backup of the
contents of "My Documents", handy if you find you deleted or changed
something and want to grab a copy from, say, last Tuesday.)
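
The day-of-week rotation could look something like this in Python (a sketch only, not the perl script mentioned above; mirror_command and the example paths are hypothetical). It only builds the command line, using the same /mir option shown earlier.

```python
import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def mirror_command(source, backup_root, day=None):
    """Build the robocopy call that mirrors source into a weekday folder."""
    day = day or datetime.date.today()
    dest = f"{backup_root}\\{DAYS[day.weekday()]}"
    return ["robocopy", source, dest, "/mir"]
```

The returned list can be handed to subprocess.run() on Windows. Robocopy reports some successful runs with a nonzero exit code, so check its documentation before treating every nonzero code as a failure.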

As with rsync, it will only copy new/changed files, and also like rsync
(but unlike normal Windows file copies), it does so "robustly", so you can
trust the results of a file copy. (Microsoft claims this is what the
"robo" part of "robocopy" stands for -- robust -- though personally I'd be
happier if Windows operated robustly by default.)

SBB

Pat Farrell
2004-05-23, 20:53
Hi Stuffed One...

At 08:18 AM 5/23/2004, Stuffed Crust wrote:
>I use a set of customized scripts tied into a database that:
>1) keeps track of all files and their MD5sums
>The checksums help me find duplicates and ensure that when I move files
>around, I don't back up the same file twice.

I've been thinking about the same concept, maybe using SHA1
instead of MD5, but that doesn't change much.

Question for you:

Do you just hash the whole file contents? (which is easy and fast) or
do you explicitly skip or explicitly include any ID3 or Ogg tags?

I can see including the tags, and I can see that it is easy to run
taggers that make small changes.
(changing the artist from "The Police" to "Sting and The Police"
is technically a change, but practically is not).

Clearly files that are really different, compressed at different
rates, different formats, etc. are different (and the hash will
show that) but I'm not sure what the "correct" or "best" answer
is on whether the tags are part of the file.

Clearly as we move to external databases, and integrating
external data sources (e.g. Mood Logic or URLs to artwork)
we want to keep the "essential file" the same as much as possible.

Backing up the database is a separate issue.

Thanks
Pat

Steve Baumgarten
2004-05-24, 06:35
> Does robocopy have a way of deleting orphans from the source directory?

Yes, in fact that's what the "/mir" option does: it deletes files in the
destination that no longer exist in the source. When the command
completes, the destination directory will be an exact copy of the source
directory.

Of course you can also tell it to simply copy new/changed files and leave
deleted files in the destination directory -- there are lots of options,
it's very much like rsync, though of course Microsoft had to roll their
own solution, as always. (Though to be fair it also handles
Windows-specific security/permissioning, not an issue with XP Home but
certainly one with W2K and XP Pro.)

SBB

Stuffed Crust
2004-05-24, 12:15
On Sun, May 23, 2004 at 11:53:21PM -0400, Pat Farrell wrote:
> I've been thinking about the same concept, maybe using SHA1
> instead of MD5, but that doesn't change much.

For practical purposes, SHA1 just gives you an extra 32 bits' worth of
checksum (160 bits versus MD5's 128).

> Do you just hash the whole file contents? (which is easy and fast) or
> do you explicitly skip or explicitly include any ID3 or Ogg tags?

It's just simpler to hash the whole file contents; I rarely ever change
files or re-tag anything, and frankly the ID3 tag parsing/stripping
means potentially reading the whole file anyway, so you're really not
saving anything.
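
For comparison, here is roughly what the tag-skipping alternative would involve, as a Python sketch. It assumes an MP3 with at most one ID3v2 tag at the front and one ID3v1 tag at the end, ignores other tag formats (APE, Lyrics3, Ogg comments), and works on bytes already read into memory, so it does read the whole file either way.

```python
import hashlib

def synchsafe(b):
    """Decode a 4-byte ID3v2 synchsafe integer (7 useful bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def audio_sha1(data):
    """SHA-1 of MP3 bytes with a leading ID3v2 and trailing ID3v1 tag removed."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        start = 10 + synchsafe(data[6:10])   # 10-byte header plus tag body
        if data[5] & 0x10:                   # footer flag adds 10 more bytes
            start += 10
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128                           # ID3v1 is a fixed 128-byte trailer
    return hashlib.sha1(data[start:end]).hexdigest()
```

With this, retagging a file leaves the hash unchanged, which is exactly the behaviour you may or may not want for backup purposes.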

> I can see including the tags, and I can see that it is easy to run
> taggers that make small changes.
> (changing the artist from "The Police" to "Sting and The Police"
> is technically a change, but practically is not).

The question here is whether or not you want that to be considered a
"change" for backup purposes; I know I would.

> Clearly as we move to external databases, and integrating
> external data sources (e.g. Mood Logic or URLs to artwork)
> we want to keep the "essential file" the same as much as possible.

My personal feeling is that all "static" metadata should be part of the
file itself -- things like the artist/title/etc. Anything which changes
normally, such as clip rankings/statistics, should be stored externally,
because frankly mp3 files make a lousy database.

That said, I do cache some of the ID3 tag info in my database, but it's
there just to make searching a bit easier.

> Backing up the database is a separate issue.

But pretty damn easy. :)

- Pizza
--
Solomon Peachy ICQ: 1318344
Melbourne, FL JID: pitha (AT) myjabber (DOT) net
Quidquid latine dictum sit, altum viditur

Pat Farrell
2004-06-02, 11:14
At 07:07 AM 5/24/2004, Dolf Dijkstra wrote:
>>Isn't musicbrainz (http://www.musicbrainz.org/) trying to solve this
>>issue by creating an acoustic fingerprint?

From what I can tell, looking at musicbrainz and at
http://www.relatable.com/ (the company behind the patented
acoustic fingerprint), maybe. I was not able to find
any technical information on either website, although there are lots
of words about it being a patented process. The source
code has copyright notices and GPL licensing terms.
It looks like the critical fingerprint generation logic is
distributed as binary only, with a header file for C/C++ programmers
to link to.

They admit to reading the first 30 seconds of the file's audio.
Which raises the obvious question: what if it is different in the 31st second?

There are comments about dithering and normalization of the signal,
although they say that they don't do it. It's not possible to tell
without looking at the code.

It is also not clear that they can actually deliver what they want. Many
simple transformations, such as gain normalization, noise removal
or compression, would change the numerical values all over the place
without being very audible to a listener.

To me, whether the MusicBrainz scheme works or fails is not interesting.
It is not open source, it uses patented technology. It breaks the fundamental
spirit that attracted me to Slim Devices and made me buy a SqueezeBox.

It also is not clear that it is better (or worse) than an MD5 or SHA1 of
the music frames.

Pat

Roy M. Silvernail
2004-06-02, 13:09
Pat Farrell wrote:
> At 07:07 AM 5/24/2004, Dolf Dijkstra wrote:
> >>Isn't musicbrainz (http://www.musicbrainz.org/) trying to solve this
> >>issue by creating an acoustic fingerprint?

> They admit to reading the first 30 seconds of the file's audio.
> Which raises the obvious question: what if it is different in the 31st
> second?

Doesn't matter much. I tried out the Windows tagger whilst bored at work
and wanting to clean up tags on a couple thousand tracks. I didn't keep
detailed records, but MusicBrainz misidentified at least 25% of the
tracks, and most of the misses were pretty wide. Maybe 1 in 5 misses
had *some* version of the track in the possibilities list. The rest
were way wrong. In some cases (only a few), MB id'd the wrong version
of a track when the two possibilities were clearly different
performances (live vs. studio or different live cuts). Worst offenders
were tracks grabbed from net streams by streamripper, but you'd expect
that. Apparently, filename and existing metadata are not used in the
process.

> To me, whether the MusicBrainz scheme works or fails is not interesting.
> It is not open source, it uses patented technology. It breaks the
> fundamental
> spirit that attracted me to Slim Devices and made me buy a SqueezeBox.

Well, it may be closed-source, patented and inaccurate, but at least
it's glacially slow. I was lucky to see bursts of 10-12 id's a minute.
Mostly it was hovering around 5-6 for an average 10 seconds per id.
Every id involves contacting the mothership, too, so you can forget
offline operation.

> It also is not clear that it is better (or worse) than an MD5 or SHA1 of
> the music frames.

Worse, no question. For auditing the integrity of a backup (which is
where this thread started), you don't care what the track is, only that
the copies are bit-identical. Heck, for that purpose a CRC-32 would
probably work, though file verification is I/O bound, so the CPU
overhead of the hash function is most likely down in the noise.
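
As a quick illustration, both checks can be computed in the same pass over the file, so the choice of hash costs no extra reads. This is a sketch in Python with made-up function names:

```python
import hashlib
import zlib

def checksums(path, chunk=1 << 20):
    """Return (crc32, sha1 hex digest) of a file from a single read pass."""
    crc = 0
    sha = hashlib.sha1()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            crc = zlib.crc32(block, crc)
            sha.update(block)
    return crc & 0xFFFFFFFF, sha.hexdigest()

def same_content(a, b):
    """True when two files produce identical CRC-32 and SHA-1 values."""
    return checksums(a) == checksums(b)
```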
--
Roy M. Silvernail is roy (AT) rant-central (DOT) com, and you're not
Never Forget: It's Only 1's and 0's!
SpamAssassin->procmail->/dev/null->bliss
http://www.rant-central.com

Pat Farrell
2004-06-02, 13:47
At 04:09 PM 6/2/2004, Roy M. Silvernail wrote:
>Pat Farrell wrote:
>> >>Isn't musicbrainz (http://www.musicbrainz.org/) trying to solve this
>> >>issue by creating an acoustic fingerprint?
>>
>>They admit to reading the first 30 seconds of the file's audio.
>>Which raises the obvious question: what if it is different in the 31st
>>second?
>
> In some cases (only a few), MB id'd the wrong version of a track when
> the two possibilities were clearly different performances (live vs.
> studio or different live cuts). Worst offenders were tracks grabbed from
> net streams by streamripper, but you'd expect that. Apparently, filename
> and existing metadata are not used in the process.

It is clear from their website that the engine is seen as a DRM,
and so filename and metadata can't be used (or at least relied on)
because the evil music stealers (tm) will mis-name and mis-tag files.

So it slowly mis-identifies the songs it thinks people are stealing. Great.


>Worse, no question. For auditing the integrity of a backup (which is
>where this thread started), you don't care what the track is, only that
>the copies are bit-identical. Heck, for that purpose a CRC-32 would
>probably work, though file verification is I/O bound, so the CPU overhead
>of the hash function is most likely down in the noise.

I was looking for more than auditing the backup when I spun off the "keys to ..."
thread. I want to use some handle as the primary key into the whole database
of all knowledge about the tune, from genre to artist, studio, musicians
on hand, mood, feel, happiness, etc.

On the hash itself: yes, exactly, the file I/O so overwhelms the total time that
you could do nearly anything. Which is why I always say to use SHA1
over simpler, faster, and adequate hashes. It is the way coolest,
government approved, cryptographically strong, buzzword compatible
one, and you won't know that you are using it. Plus, if you want fewer
bits, just chop off the front or back end.
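
Trimming the digest really is a one-liner. A tiny Python sketch, where track_key and the 16-hex-digit (64-bit) default are arbitrary choices for illustration:

```python
import hashlib

def track_key(data, hex_digits=16):
    """Return the first hex_digits characters of the SHA-1 hex digest."""
    return hashlib.sha1(data).hexdigest()[:hex_digits]
```

Fewer bits means a higher chance of two different files sharing a key, so the length is a trade-off against the size of the collection.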

BTW, there is a file copy with SHA1 verification in the source code
to my SqueezeBox utilities.....
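
For readers who just want the general shape of a copy with SHA1 verification, here is a minimal Python sketch. It is not the code from the utilities mentioned above; the function names are made up.

```python
import hashlib
import shutil

def sha1_of(path, chunk=1 << 20):
    """Return the hex SHA-1 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()

def verified_copy(src, dst):
    """Copy src to dst, then re-read both and compare SHA-1 digests."""
    shutil.copyfile(src, dst)
    expected = sha1_of(src)
    actual = sha1_of(dst)
    if expected != actual:
        raise OSError(f"copy of {src} does not match: {expected} != {actual}")
    return actual
```

Note that re-reading the destination right after writing may be served from the operating system's cache rather than the disk, so this catches copy errors more reliably than media errors.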

Pat