Backing up tags?



chris.mason
2006-08-18, 06:00
I have all my CDs (about 220) ripped to FLAC. These files are kept on a dedicated hard disk on my (multi purpose) PC. I spent a considerable amount of time ripping them, but it wasn't so much the ripping that took the time as getting the tagging how I wanted it, and getting my tagging schema right.

I regard my original CDs as backups for the rips, but I'd like to know if anyone knows of a way to back up the tags themselves, and associate them with the original CDs somehow. I'm more concerned about having to retag all my music than I am about having to re-rip the CDs.

Thanks!
Chris.

ceejay
2006-08-18, 06:15
Chris

IMHO the simplest way to do this is to get an external USB drive (250GB should do you, and doesn't cost a lot) and periodically sync your music library to it. That covers you for the rips and the tagging. Leave it disconnected when not backing up for extra security.

Associating a set of tags with a freshly re-ripped CD sounds hard to me...

Ceejay

Mark Lanctot
2006-08-18, 07:34
> Associating a set of tags with a freshly re-ripped CD sounds hard to me...

I think EAC does this. When I make changes to a field retrieved from the Internet freedb (mostly capitalization), if I eject and insert the CD, my changes are kept.

I'm not sure how persistent this is though - I've only noticed it in the same EAC session by ejecting and reloading. EAC will let you keep a local CDDB though. I'm not sure what happens when you modify a field - does it get written to the local CDDB? If it did, that would be exactly what Chris is looking for.

chris.mason
2006-08-18, 07:37
I think a number of ripping tools locally cache freedb/cddb changes you make, and some give provision for uploading those changes, matching against CD ID. Question is, as you say, how to keep that information?

I mean, if you could keep all this info, matched against some unique CD identifier, then when you re-ripped CDs, it would be easy to properly tag them again.

Mark Lanctot
2006-08-18, 07:44
> Question is, as you say, how to keep that information?

*If* the changes were written to the local CDDB, then you'd just have to back up the local CDDB as you would your music files.

The EAC setting indicates a local CDDB is contained in a directory because the default setting is:


Local freedb path: C:\CDDB\

You can download the freedb here: http://www.freedb.org/modules.php?name=Sections&sop=viewarticle&artid=12 although there are some problems doing this in Windows as the page outlines.

funkstar
2006-08-18, 09:35
Tag&Rename will let you export the tags in a variety of ways

chris.mason
2006-08-18, 11:02
Really? I'll take a look. I'm wondering if musicbrainz might be useful as well...

My problem of course is that I need to back up the tags on FLAC files, which don't have any direct relationship with the CDs they were ripped from. Also, if I did re-rip the CDs, I'd need to find a way of marrying the saved tag data and the CD.

JJZolx
2006-08-18, 15:40
I'm surprised that you'd think the tagging of the files is more time consuming than the ripping.

Using mp3tag and some 'Action Groups' it takes me about 10-20 seconds to fully tag most albums after they've been ripped by EAC. This adds ARTISTSORT, ALBUMSORT, and COMPILATION tags and in some cases renames the file to be more in line with my filenaming convention, which EAC can't always do.

The exception to the above is various artist albums, where it takes a bit more time getting the ARTISTSORT tags correct. Also, I add DISC and DISCC tags to multi-disc albums, but that only takes a couple of additional seconds of work.

I'd recommend taking the approach of backing up the library as a whole, and not considering the CDs as backups except under extraordinary circumstances. This would take care of backing up both your tag data and the rips and should keep you from having to ever rerip the CD collection.

Robin Bowes
2006-08-18, 17:47
JJZolx wrote:
> I'm surprised that you'd think the tagging of the files is more time
> consuming than the ripping.

Actually writing the tags doesn't take long, but typing them all in
does, especially for classical albums.

>
> Using mp3tag and some 'Action Groups' it takes me about 10-20 seconds
> to fully tag most albums after they've been ripped by EAC. This adds
> ARTISTSORT, ALBUMSORT, and COMPILATION tags and in some cases renames
> the file to be more in line with my filenaming convention, which EAC
> can't always do.

That's only if the CD is in FreeDB and you're not all that fussy about
checking for typos, consistency, etc.

> The exception to the above is various artist albums, where it takes a
> bit more time gettin the ARTISTSORT tags correct. Also, I add DISC and
> DISCC tags to multi-disc albums, but that only takes a couple of
> additional seconds of work.
>
> I'd recommend taking the approach of backing up the library as a whole,
> and not considering the CDs as backups except under extraordinary
> circumstances. This would take care of backing up both your tag data
> and the rips and should keep you from having to ever rerip the CD
> collection.


I would agree with that.

R.

pfarrell
2006-08-18, 18:08
Robin Bowes wrote:
> JJZolx wrote:
>
>>I'm surprised that you'd think the tagging of the files is more time
>>consuming than the ripping.
>
> Actually writing the tags doesn't take long, but typing them all in
> does, especially for classical albums.

I agree, especially for classical: I find those albums are missing from
FreeDB at least 70% of the time, and the entries that do exist are flat
out wrong about 20% of the time.

>>Using mp3tag and some 'Action Groups' it takes me about 10-20 seconds
>>to fully tag most albums after they've been ripped by EAC. This adds
>>ARTISTSORT, ALBUMSORT, and COMPILATION tags and in some cases renames
>>the file to be more in line with my filenaming convention, which EAC
>>can't always do.
>
>
> That's only if the CD is in FreeDB and you're not all that fussy about
> checking for typos, consistency, etc.

The FreeDB records are terrible for most of the types of music that I
listen to. Jazz, bluegrass, most symphonic and chamber music, etc.

It is OK for classic rock and current pop/alt/whatever.
I rarely care about any of those styles.

The more I care about the quality of tags, the less happy I am
with the data in FreeDB.


--
Pat
http://www.pfarrell.com/music/slimserver/slimsoftware.html

chris.mason
2006-08-20, 04:17
In my opinion, metadata is the most significant aspect of looking after a digital music collection - it's all about the data. The higher the quality of your tagging (and quality in this instance is clearly something you define for yourself), the more control you have, and the easier it is to make use of/access your collection.

I have a collection of pop, rock, alternative stuff, metal and classical music. FreeDB and CDDB are just not good enough to provide consistent tagging, imho. They're a good start certainly, but need work, especially when it comes to defining genres, and working on classical music.

When I ripped my classical music I took the decision to rip each piece of music on a CD as a separate album, so I could find them easily. So, for example, a CD containing Shostakovich's 5th and 10th Symphonies is ripped as two albums. Also, I had to develop my own (very simple) classical music genre tagging schema, making use of the tag separator you can specify in SlimServer.

So, after all this work, I'd like to not lose it, hence why I say I'm more concerned about backing up my tags than the music itself right now, as the music is already on CD. I'd back everything up if I had the disk space.

tommypeters
2006-08-20, 05:33
SlimServer also creates and uses a "Tag Database" so it doesn't have to read the tags from the individual files until they get changed, but I don't know if that info is enough and I don't know if you can backup and restore it...

chris.mason
2006-08-20, 05:51
That could be just what I need in fact. I just need to keep an export of the tablespace in the database. Then perhaps I could write a script to match up track names or something like that if restoration is necessary.

Mark Lanctot
2006-08-20, 07:30
> I'd back everything up if I had the disk space.

You really should look into this. With a USB hard disc enclosure, this only gets tricky if you have a music collection larger than 500 GB, although I think there are 750 GB HDDs now.

When you set up a proper backup program, it only takes a lot of time to first load everything onto the backup. After that, only changes are recorded and it takes minutes.

The disc is separate from your PC, saving it from a power supply failure (which can destroy ALL the hard drives attached to it). You can then relocate the USB disc to another room, another floor or even another building.

Backing up only gets really complicated/expensive when you can't fit everything onto one really large disc. Then you need to look at a PC or NAS with RAID as a backup device.

With a hard drive, the question isn't *if* it will fail, it's *when*. Beyond 3 years, you're on borrowed time.

chris.mason
2006-08-20, 11:44
Yeah, I'm completely with you on the backup approach. I do in fact have all the rest of my data backed up. I have the PC boot every morning, back up everything and do some disk maintenance work. As you say, HDD failure is a matter of when, not if.
It just seems a waste of disk space to me to back up the rips when it's the tags I want to back up, which take up a minuscule amount of space.

smst
2006-08-21, 01:52
I too back up my metadata -- it definitely took more of my time to type all that text in (and look up the data if the CD liner notes were incomplete), and to locate cover art, than to change disc and hit a button in EAC.

I back up my metadata with a small Python script which walks over my 'Music' folder and creates a mirror structure containing the metadata: every audio file has its metadata extracted with vorbiscomment, metaflac (for surround-sound FLACs), etc and written to a text file in the mirror structure. Every image file (representing cover art) is copied also. All told I end up with a structure which zips up to about 50MB, which I can back up separately.

As part of my ripping process I added a disc ID to each file's tags. I ripped CDs to Ogg Vorbis; in future I might re-rip to FLAC (if I develop the ability to hear the difference!) and at that time will be able to re-use the metadata. (I'll just write another Python script to build a mapping between disc IDs and metadata sets, then choose the correct set from a small list when I rip each disc again.)
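
A minimal sketch of the kind of mirroring script described above (not smst's actual script). It assumes FLAC files only, that `metaflac` is on the PATH, and that its `--export-tags-to=-` option is used to print the tags; the function names and the `.txt` naming are made up for illustration, and character-encoding handling is simplified.

```python
import os
import shutil
import subprocess

AUDIO_EXTS = {".flac"}                    # assumption: FLAC only in this sketch
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}    # treated as cover art and copied as-is


def read_flac_tags(path, metaflac="metaflac"):
    """Return the Vorbis comments of one FLAC file as NAME=value text."""
    result = subprocess.run(
        [metaflac, "--export-tags-to=-", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def dump_metadata(music_dir, dump_dir, read_tags=read_flac_tags):
    """Mirror music_dir into dump_dir: one text file of tags per audio file."""
    written = []
    for root, _dirs, files in os.walk(music_dir):
        rel = os.path.relpath(root, music_dir)
        target = os.path.normpath(os.path.join(dump_dir, rel))
        os.makedirs(target, exist_ok=True)
        for name in sorted(files):
            src = os.path.join(root, name)
            ext = os.path.splitext(name)[1].lower()
            if ext in AUDIO_EXTS:
                out = os.path.join(target, name + ".txt")
                with open(out, "w", encoding="utf-8") as fh:
                    fh.write(read_tags(src))
                written.append(out)
            elif ext in IMAGE_EXTS:
                shutil.copy2(src, os.path.join(target, name))
    return written
```

Passing the tag reader in as `read_tags` keeps the directory walk independent of any one tool, so a `vorbiscomment`-based reader for Ogg Vorbis files could be swapped in the same way.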

chris.mason
2006-08-21, 01:59
This sounds like exactly what I need to do: Can you elaborate further on what tools you used to do this?

smst
2006-08-21, 03:25
To dump the metadata I use a short Python script, which I've attached to this post ('dump2.zip'). I've tested it with Python 2.3, and it should work with later versions; I haven't tried earlier releases. If you don't have Python, you can download it here:

http://python.org/download/

Install it, and allow it to associate .PY files with itself so you can run those files easily. Download dump2.zip from this post, and extract the file somewhere.

I'll write these instructions as if you're using Windows; users of other operating systems can hopefully extrapolate from these instructions. Right-click on the extracted file dump2.py in Windows Explorer, and choose "Edit with IDLE", which will open the file in a Python IDE. If you don't have that option, try "Edit" which should open PythonWin. (You could even use Notepad, but be very careful not to change the whitespace at the beginning of each line. PythonWin and IDLE will ensure it doesn't get messed up. Editors like Wordpad, Word, etc are NOT suitable.)

At the top of the file are four variables. You need to edit the bits between the single quotes to tell the utility how your system is set up.

If you have Ogg Vorbis files, you'll need VorbisComment:

http://www.xiph.org/downloads/

Edit VORBIS_COMMENT_EXECUTABLE at the top of the file to point to the EXE (although the default value might be correct). If you don't have that executable, set the value to an empty string (so Ogg Vorbis files will be skipped):
VORBIS_COMMENT_EXECUTABLE = ''
(That's two consecutive single-quote marks.)

If you have FLAC files, you'll need MetaFLAC (should be in the main Windows installer):

http://flac.sourceforge.net/download.html

Edit METAFLAC_EXECUTABLE to point to the EXE; set to an empty string if you don't have it.

If you have MP3 files, you'll need Ned Batchelder's id3reader module for Python:

http://www.nedbatchelder.com/code/modules/id3reader.html

Copy that PY file into 'lib/site-packages' inside your Python installation directory (probably 'Program Files/Python24' or similar). There's no configuration to edit for that.

Finally, set ALL_MUSIC_DIR in dump2.py to point at the directory containing all your audio files, and set DUMPED_MD_DIR to point at a directory which should hold the metadata files. Save your modified dump2.py.
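
For illustration only, the four settings might end up looking something like this. The variable names are the ones given above, but every path is a made-up example, and `enabled_extensions` is a hypothetical helper (not part of dump2.py) that shows the "empty string means skip" convention.

```python
# Hypothetical example values; every path here is an assumption.
VORBIS_COMMENT_EXECUTABLE = r'C:\Tools\vorbiscomment.exe'
METAFLAC_EXECUTABLE = ''          # empty string: FLAC files are skipped
ALL_MUSIC_DIR = r'D:\Music'
DUMPED_MD_DIR = r'D:\MusicMetadata'


def enabled_extensions(vorbis_exe, metaflac_exe):
    """Return the file extensions that have a tag-reading tool configured."""
    tools = {'.ogg': vorbis_exe, '.flac': metaflac_exe}
    return sorted(ext for ext, exe in tools.items() if exe)


print(enabled_extensions(VORBIS_COMMENT_EXECUTABLE, METAFLAC_EXECUTABLE))
```

Raw strings (the `r'...'` form) are used here so that the backslashes in Windows paths are not read as escape sequences.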

Open a Windows Command Prompt (Start -> Run, type 'cmd' and hit ENTER). If dump2.py is not on C: drive, change drives by typing the drive letter and a colon, then ENTER. Use 'cd' to change directories to the place where dump2.py is saved, and then type 'dump2.py' (followed by ENTER) to run the utility.

If it all works, you'll see a ream of text keeping you apprised of what directories are being created and which files are being considered, eg:
C:\>f:

F:\>cd Ripping

F:\Ripping>dump2.py
Making directory: Ripped
Making directory: Ripped\#
Making directory: Ripped\A
Making directory: Ripped\B
Making directory: Ripped\C
...
Dumping MD for directory: Ripped\W\Wolfman - [2004] For Lovers [CD Single]
Track 01 02 jpg
Dumping MD for directory: Ripped\W\World Party - [1997] Beautiful Dream [CD Single] [Disc 1 of 3]
Track 01 02 03 04 jpg
Making directory: Ripped\Z\Zero 7 - [2001] Simple Things
Making directory: Ripped\Z\Zero 7 - [2004] When It Falls
Dumping MD for directory: Ripped\Z\Zero 7 - [2001] Simple Things
Track 01 02 03 04 05 06 07 08 09 10 11 12 jpg
Dumping MD for directory: Ripped\Z\Zero 7 - [2004] When It Falls
Track 01 02 03 04 05 06 07 08 09 10 11 jpg

Then zip up the directory full of data, and back it up somewhere.

smst
2006-08-21, 03:29
> This sounds like exactly what I need to do: Can you elaborate further on what tools you used to do this?
Additional: to obtain the Disc ID when I was ripping, I had some Python code which used the 'DiscID' and 'mci' modules to work it out (but they only worked in Python 2.2, so that complicated things a bit). It's part of a larger framework which is too complex to post here right now, but as part of the ripping process I had EAC invoke some code which calculated the Disc ID, as well as the number of tracks and the length of each one. (This last datum allowed my script to alert me if it saw a gap before track one of more than a few seconds, so I knew to look out for a hidden song.)

The Disc ID isn't completely necessary for what I propose to do in future: it's mainly calculated using track lengths, I think, and indeed one could imagine building that inverse ID-to-metadata table I mentioned before using a list of track lengths instead of the ID. That future work has not been started. :-)
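
For reference, a sketch of the classic CDDB/freedb disc ID calculation, which does indeed depend only on where each track starts and how long the disc is. This is not the 'DiscID' module mentioned above; the function names are made up, and the inputs are assumed to be track start offsets in CD frames (75 per second, including the usual 150-frame lead-in) plus the lead-out offset.

```python
def digit_sum(n):
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def freedb_disc_id(offsets, leadout):
    """Compute the 32-bit freedb disc ID.

    offsets: start of each track in frames (75 frames per second)
    leadout: start of the lead-out in frames
    """
    # Checksum over the start time (whole seconds) of every track.
    checksum = sum(digit_sum(off // 75) for off in offsets)
    # Total playing time in seconds, from the first track to the lead-out.
    length = leadout // 75 - offsets[0] // 75
    return ((checksum % 0xFF) << 24) | (length << 8) | len(offsets)


# A made-up three-track disc: tracks start at 2 s, 200 s and 400 s.
print("%08x" % freedb_disc_id([150, 15000, 30000], 45000))  # prints 08025603
```

Only one byte of the ID comes from the checksum; the rest is the total length and the track count, which is why different discs can share an ID.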

chris.mason
2006-08-21, 04:18
Wow, thanks for posting all that - I'll give it a go later when I get home. I'm interested in how you intend to tie tags to CDs. I wonder if using the checksum produced by AccurateRip might be of use? Of course that would mean re-ripping my CDs to get the checksums again..! A hash based on track length might be useful, but would have to be combined with other data to ensure uniqueness surely?

smst
2006-08-21, 04:45
Uniqueness is overrated. :-) The Disc ID used by CDDB etc isn't unique, but is good enough -- if there's more than one possibility, client software just asks the user which one it should choose.

That's what I'll do on a re-rip (to FLAC or because I've lost the main data for some reason): calculate the Disc ID, and match it up to the Disc IDs I have stored in those text files (actually, I'll create a second folder structure named after the Disc IDs first, to make it easy to look it up). If there's more than one directory which matches, that's fine -- I'll just have to choose the correct one (and I anticipate that I'll be able to make the decision based on album/artist easily). Indeed, even if there's only one choice for a Disc ID, I'll still want to confirm it's correct (that's the kind of anal behaviour that's got me backing up the metadata in the first place).
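
A rough sketch of that lookup, assuming each dumped text file contains a line of the form `DISCID=xxxxxxxx`. The tag name, the `.txt` extension and the function names are all assumptions for illustration; an ID that maps to several directories simply yields several candidates to choose from.

```python
import os
from collections import defaultdict


def build_disc_index(dump_dir, tag="DISCID"):
    """Map each disc ID found in the dumped tag files to its album directories."""
    index = defaultdict(set)
    prefix = tag.upper() + "="
    for root, _dirs, files in os.walk(dump_dir):
        for name in files:
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as fh:
                for line in fh:
                    if line.upper().startswith(prefix):
                        disc_id = line.split("=", 1)[1].strip().lower()
                        index[disc_id].add(root)
    return {disc_id: sorted(dirs) for disc_id, dirs in index.items()}


def candidates(index, disc_id):
    """Return every album directory recorded for disc_id (possibly several)."""
    return index.get(disc_id.lower(), [])
```

Because the result is always a list, the "ask the user which one" step is the same whether there is one candidate or ten.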

The AccurateRip checksum: I don't know if that's suitable or not. Is it a checksum of the actual accurately-ripped audio data? If so, matching it on a second rip might not be practical -- it depends on the quality of your hardware.

The track lengths should be fine, really -- the Disc ID is just a distillation of that into a fixed-length value (so 99-track CDs don't lead to a very long ID, say). In fact, with suitable code the track lengths offer an advantage in that you could allow a tolerance of a second or two either side of the correct length. Track lengths can be derived from your ripped data (no need to re-insert the CD!) but there's a disadvantage: if you've split any tracks into separate songs (when there's a hidden track after minutes of silence, I personally split it into a separate numbered file) there'll be the potential for a mismatch. (But there again, the code which looks for matches could just start with track one and keep going until it's happy -- that could involve ignoring the last track if enough prior tracks matched.)
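
One way the tolerance idea could be sketched, with lengths in whole seconds. The function names, the two-second default and the "ignore the last track" rule are illustrative assumptions rather than tested code from a real ripping setup.

```python
def matching_prefix(cd_lengths, file_lengths, tolerance=2):
    """Count how many tracks, starting at track one, agree within tolerance."""
    count = 0
    for cd_len, file_len in zip(cd_lengths, file_lengths):
        if abs(cd_len - file_len) > tolerance:
            break
        count += 1
    return count


def plausible_match(cd_lengths, file_lengths, tolerance=2, min_tracks=1):
    """Decide whether a set of ripped files plausibly came from this CD.

    The last CD track is allowed to differ, which covers the case where a
    hidden song was split off into its own file after ripping.
    """
    matched = matching_prefix(cd_lengths, file_lengths, tolerance)
    needed = max(min_tracks, len(cd_lengths) - 1)
    return matched >= needed


# CD has a long final track; the rip split it into a song plus a hidden track.
print(plausible_match([200, 180, 600], [201, 179, 240, 30]))  # True
```

A single-track disc still has to match on its one track, so mix CDs with one long track gain nothing from the last-track allowance.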

Some ideas to get you going. :-)

verdemar
2006-08-21, 05:34
I have tagged all the music where I've spent much time adding information in tags (custom tags etc) with MusicBrainz, and keep a "backup" of that music in lossy format on my laptop.

If I get trouble with my discs with flac files, I may rerip and retag with Musicbrainz and use the MBIDs to sync my custom tags afterwards. Or at least that was the idea.

egd
2006-08-21, 14:18
> Backing up only gets really complicated/expensive when you can't fit everything onto one really large disc. Then you need to look at a PC or NAS with RAID as a backup device.
>
> With a hard drive, the question isn't *if* it will fail, it's *when*. Beyond 3 years, you're on borrowed time.

I'm with you on this - once you have a serious number of CDs ripped, backing up gets expensive and can be complicated.

I'm currently using a NAS with 4x500GB in RAID5 which is around 64% full. The good thing (I guess) is that my entire CD collection is now ripped so there shouldn't be a need to further extend the storage.

Not wanting to rerip and/or tag any CDs again I back up to an LTO2 tape device (1 x full, periodic incrementals). All up, the convenience and pleasure of digital can become quite an expensive pastime. Weighed up against reverting to CDs and losing the ability to mix/match and rediscover artists/albums/tracks through tools like MusicIP I'm comfortable that in my case it is money well spent. Long live the Squeezebox.

egd
2006-08-21, 17:24
> Using mp3tag and some 'Action Groups' it takes me about 10-20 seconds to fully tag most albums after they've been ripped by EAC. This adds ARTISTSORT, ALBUMSORT, and COMPILATION tags and in some cases renames the file to be more in line with my filenaming convention, which EAC can't always do.
>
> The exception to the above is various artist albums, where it takes a bit more time getting the ARTISTSORT tags correct. Also, I add DISC and DISCC tags to multi-disc albums, but that only takes a couple of additional seconds of work.

You're doing more with mp3tag than I thought to do. Would you care to post your mp3tag configuration files?

Pale Blue Ego
2006-08-22, 04:32
Doing ARTIST/ALBUM/TRACK tags is fairly simple. What has cost me gobs of time and effort are the YEAR and GENRE tags. The correct YEAR is sometimes quite difficult to determine, and GENRE tags are very subjective.

I find it's best to just mirror the whole collection on an external drive. Luckily, single-drive capacities have kept ahead of my growing collection. With 750GB and 1TB drives out now or available soon, I think I'll be safe for the foreseeable future.

street_samurai
2006-08-23, 10:33
> Uniqueness is overrated. :-)

Agreed. However, I've found that my biggest problem with freedb and most probably with the track length algorithm is that a number of my CDs are mixes or single track CDs. When I go to freedb to look these up I often get 10 or more possible matches. Mostly because these mixes tend to be one track and the exact length of the CD. Not sure how to avoid this issue... just pointing it out. May not be an issue depending on the kind of music you listen to.

ss.

chris.mason
2006-08-24, 01:16
> Doing ARTIST/ALBUM/TRACK tags is fairly simple. What has cost me gobs of time and effort are the YEAR and GENRE tags. The correct YEAR is sometimes quite difficult to determine, and GENRE tags are very subjective.
>
> I find it's best to just mirror the whole collection on an external drive. Luckily, single-drive capacities have kept ahead of my growing collection. With 750GB and 1TB drives out now or available soon, I think I'll be safe for the foreseeable future.

Yeah, this has been my issue as well. Sorting Artist/album/track is usually about correcting spelling mistakes. Getting Year and Genre right (as far as I'm concerned) takes time. For my money, if tags aren't self-consistent, then they lose their value. Good tagging really brings your collection to life through the ways you can access it.