Scanning huge disks question



ob_kook
2005-08-21, 23:38
I am a new user of SlimServer and am currently using it in conjunction with WinAmp (although this will change soon, as I am getting an SB2).

My music collection is on a storage server that presents a "virtual disk" over my network via iSCSI. My actual collection takes up about 120GB of space with about 22,000 songs, but the virtual disk is read by the application as a 2TB volume. This is normal as I am using a feature on the storage server which allocates physical storage as it is needed - much the same way that virtual memory works.

The SlimServer is running on the storage server as well with the 2TB "MusicServer" volume being served to it as a read only volume.

My question is whether SlimServer will try to scan the entire disk, or only the data? I tried to complete a scan yesterday and it ultimately failed after "scanning" for more than 12 hours.

I will try again this evening, but was curious if anyone could tell me how SlimServer's scanning methodology behaves.

Thanks!

radish
2005-08-22, 06:22
You give it a root path and it does a tree scan from there (not sure if it's depth or breadth first). So if your music directory is set to the root it will scan the whole disc, otherwise only the portion of it you specify.

ob_kook
2005-08-22, 07:13
Thanks. My directory is set to the root since I have many genre folders, and this root is only associated with music. So I guess the other question is how long it would take to scan 2 TB! :)

I guess I can work around this by creating a folder at the root and placing all my genre folders within it, right?

radish
2005-08-22, 08:53
Well, it won't scan 2TB if you don't really have 2TB of files, will it? Available capacity is completely irrelevant; it's the actual number of files that matters. If you have it scanning through 2TB of actual files then sure, it will be slow, but you said you actually only have 120GB, which is not that big. It takes about 10 mins to scan somewhat more than that on my system.

Is the root it's scanning from dedicated to music or is it the root of the entire 2TB filesystem (with all your other non-music stuff)?

ob_kook
2005-08-22, 16:06
Well it won't scan 2TB if you don't really have 2TB of files will it?

The above was my original question. And considering that you can scan more than I have in about 10 minutes, I have to assume that my original scan ran into some other problem that caused it to hang, rather than trying to scan the entire disk (including the null data). I'll try scanning it again and see what happens later tonight.

In answer to your last question, there is no other data on the MusicServer volume. It is recognized by the O/S as a very large disk, has been assigned its own drive letter, but the back end storage application assigns physical chunks from an available pool only as they are required. Hence I have 3 X 2TB logical volumes (movies, music, and pictures respectively), but only 370GB physical.

JJZolx
2005-08-22, 16:55
SlimServer will scan 22,000 tracks. It doesn't matter if those tracks take up 50 megabytes or 50 terabytes, and it probably wouldn't make much difference in the time it takes to scan, since SlimServer is only looking at metadata - tags, file name, file path, etc. It certainly won't be "scanning" the unused space in the disk volume.

BTW, which iSCSI storage server are you using?

MrC
2005-08-22, 17:26
ob_kook, it seems you might have a little misunderstanding about the difference between disks, partitions, and filesystems, and how applications see these things. If I'm being presumptuous, my apologies in advance. I've discovered that many people have the same misunderstanding.

A simplistic view is that for each disk, there might be one or more partitions, and for each partition, there is a filesystem. A filesystem is essentially just a database that keeps track of its contents, which are files and more directories. Each directory can have more files and directories and so on. There's more to it than that, but that's the basic idea.

All but a few key utilities see only what's inside the filesystem, namely files and directories, since that is what the operating system returns when applications ask.

Slimserver, being just a normal application, asks the operating system for a list of files and directories from the root of the music folder. It doesn't care about partitions, or how many disks there are in your system, or what size each disk is (it could find out, if it cared or needed to) - it just cares about files and directories.

Slimserver looks in the root music folder, gets a list of files, and the list of directories. It examines some key data about each file it cares about (ie. music files), and stores that information in its own database. It also descends into each found directory and repeats the process.
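
The scan described above can be sketched in a few lines of Python. This is not SlimServer's code; it is a minimal illustration assuming a "music file" is recognized by its extension (the MUSIC_EXTENSIONS set is hypothetical), and it records only path, size, and modification time where a real scanner would also parse tags. Note that nothing in it asks how big the volume is.

```python
import os
from pathlib import Path

# Hypothetical: which file extensions count as music for this sketch.
MUSIC_EXTENSIONS = {".mp3", ".flac", ".ogg", ".m4a", ".wma", ".wav"}


def scan_music_folder(root):
    """Depth-first walk of root, returning per-file metadata for music files.

    Only directory listings and file metadata are requested from the OS;
    the volume's total or free capacity is never consulted, and file
    contents are not read (a real scanner would also parse tags here).
    """
    records = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Descend into each subdirectory and repeat the process.
                records.extend(scan_music_folder(entry.path))
            elif entry.is_file() and Path(entry.name).suffix.lower() in MUSIC_EXTENSIONS:
                info = entry.stat()
                records.append({
                    "path": entry.path,
                    "size": info.st_size,
                    "mtime": info.st_mtime,
                })
    return records
```

Pointed at a 2TB thin-provisioned volume holding 22,000 tracks, the work done by a walk like this scales with the number of files and directories visited, not with the advertised capacity.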

As should be obvious now, it doesn't need to ask how large the disk is, or how large each partition is - it only cares about what most applications would care about, and that is files and directories within the filesystem (aka. volume, C drive, /usr/local, etc.).

Consider that you can create a very small partition on a very large disk, and create a very small filesystem within that small partition. To most applications, only the list of files and directories matters.

Hope this helps a bit.

ob_kook
2005-08-22, 20:17
Thanks for the primer, MrC! I think I do have a fundamental understanding, but this was helpful. The reason I started heading down that path was that SlimServer kept reporting that it was scanning even after 10 hours. I also know that backup software doing a block-level backup copies every block on a disk, whether or not it contains null data. That got me wondering if SlimServer might be doing the same thing. But obviously SlimServer is an app at the filesystem level - a silly rat-hole to have gone down, really.

Even though my collection is not that big, I doubt that very many users are presenting 2TB of disk, but then again if the O/S can handle it, there should be no issue with the app...

My assumption now is that this was an anomaly, and I will do a re-scan and see what happens.

JJZolx - I am using SANmelody by DataCore

JJZolx
2005-08-22, 20:26
So this is a software product that runs under Windows? What kind of hardware do you have it running on?

http://datacore.com

ob_kook
2005-08-22, 20:39
Yes, it is a software program which emulates a LUN and serves the "disk" across a network (iSCSI or Fibre Channel).

I run it on a 1.4GHz AMD / 512MB RAM XP Pro homegrown whitebox with 4 internal SATA HDD. The software uses the system memory as advanced storage cache and therefore is significantly faster than a NAS box. It also works at the block level as opposed to the file level.

I serve my music disk to a main machine where I do all my ripping, encoding, and tag maintenance, in R/W mode, and serve the same disk to any computer or device (including the SlimServer) in read only mode.

Actually, I work for the manufacturer and don't want to get into an advertising thing on the forum, but would be happy to answer any questions you have privately.

seanadams
2005-08-22, 20:46
Go ahead and tell us! I love the idea of virtualized storage. I am surprised that reasonable solutions to this haven't reached the mass market yet. I would love to just hang another cheap-o Buffalo NAS onto the network when I need more space (on a single volume). Bonus points if the software layer automatically manages it all in some RAID-5-like fashion.

ob_kook
2005-08-22, 21:22
Well if the CEO sez OK...!

A big high-end storage array is just a server anyway when you think about it: CPUs, channels, cache (for performance), and software (or firmware) - oh, and also a bunch of disk drives (made by Seagate, Maxtor, Samsung, etc.). DataCore took that idea and decided to leverage commodity kit for the underlying hardware, and focus on the software.

What you get is a portable software storage controller. The O/S of the storage server discovers the disk assets that are physically attached, and SANmelody uses a GUI to create the virtual disks. The same GUI is used to map these LUNs to any computer on your network. These application computers see it simply as a disk that they can mount, initialize, format and start banging away with I/O.

There are also features that are typically associated with very high-end storage arrays: snapshot copies, synchronous mirroring for high availability, asynchronous IP mirroring for disaster recovery, and auto provisioning.

This last one is interesting and was part of my original question. We create a pool of unallocated disk chunks. SANmelody can then create an unlimited number of virtual disks of up to 2TB each. The net result is that I can have, say, ten 2TB volumes allocated to applications (20TB) even though my physical storage is only a few hundred gigs. As the pool is consumed, a warning is issued at a user-definable level (the default is 80% allocated), and the administrator can add more storage to the pool. The application is unaware of this, so there is no disruption or downtime when it comes time to add storage.
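
The auto provisioning idea can be modelled roughly as below. This is a toy sketch, not DataCore's implementation: the class names, the chunk granularity, and the integer percentage check are assumptions; only the shared pool, allocation on first write, and a warning threshold defaulting to 80% come from the post.

```python
class ChunkPool:
    """Shared pool of physical chunks, handed out only when a virtual disk writes."""

    def __init__(self, total_chunks, warn_percent=80):
        self.total_chunks = total_chunks
        self.used = 0
        self.warn_percent = warn_percent  # 80 mirrors the default mentioned above

    def allocate(self):
        if self.used >= self.total_chunks:
            raise RuntimeError("pool exhausted: add physical disks to the pool")
        self.used += 1
        return self.used - 1  # id of the newly allocated physical chunk

    def add_chunks(self, count):
        # Growing the pool is invisible to the virtual disks drawing from it.
        self.total_chunks += count

    def needs_attention(self):
        # Integer comparison avoids floating-point edge cases at the threshold.
        return self.used * 100 >= self.warn_percent * self.total_chunks


class VirtualDisk:
    """Advertises a large size but maps virtual chunks to physical ones lazily."""

    def __init__(self, pool, size_chunks):
        self.pool = pool
        self.size_chunks = size_chunks  # what the OS sees, e.g. "2TB"
        self.mapping = {}               # virtual chunk index -> physical chunk id

    def write(self, chunk_index):
        if not 0 <= chunk_index < self.size_chunks:
            raise IndexError("write beyond the advertised disk size")
        if chunk_index not in self.mapping:
            # First write to this region: take a chunk from the shared pool.
            self.mapping[chunk_index] = self.pool.allocate()

    def physical_chunks(self):
        return len(self.mapping)
```

For example, three virtual disks can each advertise 1,000 chunks against a pool of only 10; writing 8 distinct chunks trips the 80% warning, and calling add_chunks clears it without the disks noticing.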

To get an idea, you can download a free 30-day evaluation copy from the DataCore website (follow the link and hit "try": http://datacore.com/products/prod_SANmelody.asp).

Of course, this is an SMB product and still may be priced too high for home use, but there is an entry level "Lite" version that allows for up to 4 HDD to be "virtualized" and that product lists at $199, which is doable for most I think.

RAID can be done at the disk level. i.e. if your adapter will do RAID on the disks and create a LUN, SANmelody will then take that LUN and virtualize it.

JJZolx
2005-08-22, 21:24
Since Sean has given it his blessing, I'll ask a couple more questions.

Does the software also offer redundancy via RAID, or does it do JBOD (spanning) or something like RAID0 (striping) in order to combine the disks into a single large volume? What happens in the event of a single disk failure - do you lose only some files, or do you lose the whole volume?

JJZolx
2005-08-22, 21:53
Looks like we posted at nearly the same time.

It sounds to me that if you can add disks to the storage pool then it would likely be using JBOD, which generally means that the loss of a single drive can take out the whole array.

Would SANmelody not work if you use software RAID at the OS level?
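
To make the spanning/striping distinction concrete, here is a minimal sketch of how a logical block number might map onto member disks under JBOD versus RAID 0. The function names and stripe size are illustrative assumptions. With spanning, a failed disk removes one contiguous range of blocks; with striping, any file covering several stripes has pieces on several disks. Neither keeps a redundant copy, so a filesystem laid over either volume may well become unusable after a single disk failure.

```python
def jbod_locate(block, disk_sizes):
    """Spanning (JBOD): member disks are concatenated end to end.

    Returns (disk index, block offset on that disk).
    """
    for disk, size in enumerate(disk_sizes):
        if block < size:
            return disk, block
        block -= size
    raise IndexError("block is beyond the end of the spanned volume")


def raid0_locate(block, num_disks, stripe_blocks):
    """Striping (RAID 0): fixed-size stripes rotate across equal-size disks.

    Returns (disk index, block offset on that disk).
    """
    stripe, offset = divmod(block, stripe_blocks)
    disk = stripe % num_disks
    row = stripe // num_disks  # how many stripes precede this one on the same disk
    return disk, row * stripe_blocks + offset


def disks_touched(first_block, count, locate):
    """Set of member disks holding any of blocks first_block .. first_block + count - 1."""
    return {locate(b)[0] for b in range(first_block, first_block + count)}
```

For example, a 16-block file at the start of a two-disk volume sits entirely on disk 0 when spanned, but on both disks when striped with 4-block stripes.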

ob_kook
2005-08-22, 21:55
Redundancy can be achieved in several ways.

1. HA mirroring - in this case you have 2 storage servers that mirror each write as it comes down from the app. You can configure a second NIC on the app server with a failover driver, so that you can still access your data in the event of a disk, path, or storage server failure.

2. Host based mirror - create 2 disks and let the app server do the mirroring.

3. Storage RAID - the storage HBA or controller does RAID5, 0, 1 etc.
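
Option 1 relies on synchronous mirroring plus path failover. A minimal in-memory sketch of that idea follows; Backend and SyncMirror are hypothetical stand-ins for the two storage servers and the failover driver, not a real API. A write returns only once every reachable copy has it, and a read falls back to the surviving copy when one server drops out.

```python
class Backend:
    """Hypothetical stand-in for one storage server's copy of a virtual disk."""

    BLOCK_SIZE = 512

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, lba, data):
        if not self.online:
            raise OSError(f"{self.name} is unreachable")
        self.blocks[lba] = data

    def read(self, lba):
        if not self.online:
            raise OSError(f"{self.name} is unreachable")
        # Unwritten blocks read back as zeros in this sketch.
        return self.blocks.get(lba, bytes(self.BLOCK_SIZE))


class SyncMirror:
    """Sends every write to both copies; reads fail over to whichever copy answers."""

    def __init__(self, primary, secondary):
        self.paths = [primary, secondary]

    def write(self, lba, data):
        written = 0
        for backend in self.paths:
            try:
                backend.write(lba, data)
                written += 1
            except OSError:
                pass  # a real system would log this and resynchronize later
        # The write returns (is acknowledged) only after every reachable copy has it.
        if written == 0:
            raise OSError("no copy of the disk is reachable")

    def read(self, lba):
        for backend in self.paths:
            try:
                return backend.read(lba)
            except OSError:
                continue  # path or server down: try the other copy
        raise OSError("no copy of the disk is reachable")
```

A real mirror also has to resynchronize the failed side when it comes back; the sketch simply skips the offline copy.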

If you are asking specifically in terms of the auto provisioning feature, there are a couple of things you can do here too.

1. Create 2 separate pools, and do a copy or snapshot from a disk in pool A to a disk in pool B.

2. Build RAID 1+0 sets from the disks before adding them to the pool. This should ensure that if a disk is lost, the whole pool is not.

Personally, I created 2 separate "pools" (albeit one of the pools contains a single disk) and take a snapshot from one disk to the other. This way, I can also roll back to the snapshot if I want.
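
One common way to implement a snapshot with rollback is copy-on-write: before a block changes for the first time after the snapshot, its old contents are saved elsewhere (here, standing in for the disk in the second pool). The thread doesn't say whether SANmelody snapshots work exactly this way, so treat this as a generic sketch; the class and method names are hypothetical.

```python
class SnapshotDisk:
    """Copy-on-write snapshot sketch: old block contents are saved on first change."""

    def __init__(self):
        self.blocks = {}   # live disk: lba -> bytes
        self.saved = None  # snapshot area: lba -> original bytes (None = never written)

    def take_snapshot(self):
        self.saved = {}

    def write(self, lba, data):
        if self.saved is not None and lba not in self.saved:
            # First change since the snapshot: preserve what was there before.
            self.saved[lba] = self.blocks.get(lba)
        self.blocks[lba] = data

    def rollback(self):
        if self.saved is None:
            raise RuntimeError("no snapshot to roll back to")
        # Undo every block that changed since the snapshot was taken.
        for lba, original in self.saved.items():
            if original is None:
                del self.blocks[lba]
            else:
                self.blocks[lba] = original
        self.saved = None
```

Only blocks that actually change after the snapshot consume space in the snapshot area, which is why this approach pairs naturally with a small second pool.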

Did this answer your question?

ob_kook
2005-08-22, 22:03
Looks like we did it again! The answer to the question above is that you can use a software RAID with SANmelody, but in that case cannot use the auto provisioning feature as it requires the use of raw chunks.

MrC
2005-08-22, 22:13
Very cool device.

Sorry, ob_kook, for the first-grade-level primer - after reading your subsequent posts, I'm truly embarrassed.

You said the device does block-level vs. file level access? I'm curious about this - can you explain more? Somewhere there's a file-to-block translation.

ob_kook
2005-08-22, 22:43
MrC - don't worry about it! I know a bit about storage due to my job, but am in no way an engineer and am always challenged by F/S issues and networking. As I said, your post was helpful.

What I meant by block level is that although SANmelody runs on Windows, it does not put any kind of signature on the disk. It is served as raw blocks. Therefore, it can create and serve disks to any operating system (well, any that has either a Fibre Channel or an iSCSI driver written for it) including Windows, Linux, various flavors of Unix, Novell, Mac, and even VMware. Each of those can use virtual disks as if you actually plugged a physical drive into it.
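
The "raw blocks" point can be illustrated with a toy model of what an initiator sees: a numbered array of fixed-size blocks with no notion of files. The RawLun class below is hypothetical and stores only blocks that have been written, so in this model a 2TB LUN costs no memory until it is used; unwritten blocks read back as zeros. Whatever filesystem the client OS lays down is, to the server, just more blocks.

```python
class RawLun:
    """Hypothetical model of a LUN as an initiator sees it: numbered blocks, no files."""

    def __init__(self, num_blocks, block_size=512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._blocks = {}  # sparse: only blocks that have been written are stored

    def _check(self, lba):
        if not 0 <= lba < self.num_blocks:
            raise IndexError("LBA out of range")

    def read_block(self, lba):
        self._check(lba)
        # Unwritten blocks read back as zeros in this sketch.
        return self._blocks.get(lba, bytes(self.block_size))

    def write_block(self, lba, data):
        self._check(lba)
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block long")
        # The target stores bytes; it does not know or care what filesystem they form.
        self._blocks[lba] = bytes(data)

    def blocks_in_use(self):
        return len(self._blocks)
```

For instance, RawLun(num_blocks=2**32) models 2 TiB at 512-byte blocks; a client writing a made-up "TOYFS" signature to block 0 is all it takes to "format" it as far as the server is concerned. A real client would write a partition table and its own filesystem structures instead.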

seanadams
2005-08-22, 22:56
Would it be fair to describe it as a "lower cost, DIY, potentially higher performance" alternative to a standalone iSCSI system (such as Promise)?

I haven't used iSCSI yet myself but it seems like a sound concept... RAID subsystems work great but are too expensive on a per-GB basis for home servers.

ob_kook
2005-08-22, 23:42
At the lowest entry point, it would be fair to say what you mentioned above, with the addition that it is much easier to create and allocate disks on the fly and with no disruption using SANmelody as opposed to a standalone solution.

You can essentially build an iSCSI storage array using a SATA adapter (such as one from Adaptec, LSI, Promise, etc.), a PC, and a $200 piece of software. If you use RAID at the board level, you can scale this up to 2TB with the Lite version (64TB with the full-blown version), and allocate storage to any computer on your network.

For SMBs and enterprises, the fact that they can keep their storage management practices in place while upgrading the underlying hardware, and also use different manufacturers' storage arrays, is a huge benefit. (Traditionally, when you upgrade a storage array you also "throw away" the software functionality that is bound to the hardware, as opposed to upgrading the server as you would with most applications.)

Regarding iSCSI, we have found that its performance approximates that of FC at the 1Gb level (without using TOEs).