Announcement

Collapse
No announcement yet.

SimpleLibraryViews plugin update increases library build time by a factor of 10

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • adhawkins
    replied
    Originally posted by mherger View Post
    SQLiteStudio has built-in support for regexp (https://sqlitestudio.pl).
    And my regex almost did work. I'm going to edit it (needs a '$' matcher
    at the end). SQLiteStudio didn't need to escape the slash ("[^\/]"). I'm not sure
    about DBI. Give it a try, and change to just "[^/]" if it doesn't work.
    To be honest, I think I'll take your advice and stick with code I understand. Regular expressions have always been something I've been a bit uncomfortable with, so keeping the code simple is a good idea I think. My new version does seem relatively performant which is good.

    I've also taken your advice and rewritten the plugin so it does a single search of the filesystem per scan. That code is now pushed to GitHub in the 'recurse' branch if you wanted to take a look at what I believe is the final version.

    Just rewriting your 'build' script from several years ago, as in its current state it didn't seem to handle versions with letters in them.

    Once again, thanks so much for all your assistance in getting this far. Perl is not a language I use regularly, so there's been a lot of re-learning required in this work!

    Andy

    Leave a comment:


  • adhawkins
    replied
    Originally posted by mherger View Post

    > my $pathSearch = Slim::Utils::Misc::fileURLFromPath($dir) . File::Spec->catfile('', '') . "%";


    I still believe this could fail on Windows. Would you have any Windows
    testers? catfile() should not be used with a URL. Just use the forward
    slash "/" instead.
    Yes, I've replaced that. I deliberately used catfile because I wanted to make sure I used the correct separator under Windows. You are of course correct though, the URI format that's actually used in the database should only use the '/' as a separator, so I've changed that.

    Cheers

    Andy

    Leave a comment:


  • adhawkins
    replied
    Originally posted by mherger View Post
    Yeah, that's trickier. Hmm... You'd probably have to use a different SQL query using REGEXP instead of LIKE for non-recursive. Then create a regex which would match the path, but no more slash after that.

    Code:
    if ($nonRecursive) {
       $pathSearch = '^\Q' . Slim::Utils::Misc::fileURLFromPath($dir) . '/\E[^\/]*';
    }
    Again: I haven't tested this. I'm not sure it would work. Please give it a try. I'd have to check what regexes SQL supports.
    Looks like the DBD::SQLite module registers a 'regexp' function that just uses Perl regexps. So as long as that is a valid Perl RE it should be Ok.

    Was going to try it in the command line client, but looks like that might not work. A bit of googling suggested there was an extension to sqlite that handled Perl REs, so will have a look at that.

    Thanks as ever.

    Andy

    Leave a comment:


  • mherger
    replied
    SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

    > Perhaps not the most efficient way of doing it, but it basically mirrors
    > what the original code does. It seems to work correctly.


    Whatever works! Sometimes optimizations aren't worth the hassle. If
    you're happy with the result then it's better to keep the code you
    understand and know how to maintain.

    > my $pathSearch = Slim::Utils::Misc::fileURLFromPath($dir) . File::Spec->catfile('', '') . "%";


    I still believe this could fail on Windows. Would you have any Windows
    testers? catfile() should not be used with a URL. Just use the forward
    slash "/" instead.

    Leave a comment:


  • adhawkins
    replied
    Thanks for the info, will go back and take a look at applying it to the code.

    Regarding the 'non-recursive', I've managed to implement this and it seems to be fairly efficient. What I do is:

    Iterate over all the directories I found the file in.
    Run a query on the database to select all the files that match this directory (which will also include the ones below it)
    For each result, get the directory its in.
    If this directory is the same as the directory I searched for, add it in.

    Perhaps not the most efficient way of doing it, but it basically mirrors what the original code does. It seems to work correctly.

    The code is below:

    Code:
    	my $sth_recursive_insert;
    	my $sth_select;
    	my $sth_insert;
    
    	if ($recursive) {
    		$sth_recursive_insert = $dbh->prepare('
    			INSERT OR IGNORE INTO library_track (library, track)
    			SELECT ?, tracks.id
    			FROM tracks
    			WHERE url like ?
    			AND content_type NOT IN ("cpl", "src", "ssp", "dir")
    		');
    	} else {
    		$sth_select = $dbh->prepare('
    			SELECT id, url FROM tracks
    				WHERE url like ? AND content_type NOT IN ("cpl", "src", "ssp", "dir")
    				ORDER BY url
    		');
    
    		$sth_insert = $dbh->prepare('
    			INSERT OR IGNORE INTO library_track (library, track) values (?, ?)
    		');
    	}
    
    	foreach my $dir ( @includeDirs ) {
    		my $pathSearch = Slim::Utils::Misc::fileURLFromPath($dir) . File::Spec->catfile('', '') . "%";
    		main::DEBUGLOG && $log->is_debug && $log->debug("$libName: Including '$dir', pathSearch: '$pathSearch'");
    
    		if ($recursive) {
    			main::DEBUGLOG && $log->is_debug && $log->debug("Doing recursive");
    
    			$sth_recursive_insert->execute($id, $pathSearch);
    		} else {
    			$sth_select->execute($pathSearch);
    
    			while ( my ($trackid, $url) = $sth_select->fetchrow_array ) {
    				my $trackDir = dirname(Slim::Utils::Misc::pathFromFileURL($url));
    
    				main::DEBUGLOG && $log->is_debug && $log->debug("dir for '$url' is '$trackDir'");
    
    				if ( $trackDir eq $dir ) {
    					main::DEBUGLOG && $log->is_debug && $log->debug("Inserting");
    
    					$sth_insert->execute($id, $trackid);
    				}
    			}
    		}
    	}
    How does that look?

    Andy

    Leave a comment:


  • mherger
    replied
    Originally posted by adhawkins View Post
    One question, to go through all the libraries I'm iterating over the result of 'Slim::Utils::Misc::getAudioDirs()'. Is that the right thing to do?
    I'm not sure I understand. Yes, you'll have to can all of getAudioDirs(). But no, you don't have to do this for all libraries. You could do it once, and collect a list of all library name files you find. Keep this around in a (module) global variable:

    Code:
    {
       library1 => [ dir1, dir2, .. ],
       library2 => [ dir2, dirx, .. ],
       ...
    }
    So for the next library you'd only have to evaluate the folders in respective hash entry instead of crawling the filesystem once again.

    Originally posted by adhawkins View Post
    Any suggestions on how to do this filtering? Perl is far from my first
    language!
    Sort the $dirs before processing them. Keep track of the previous value. If your current folder is a sub-folder of the previous, skip its processing

    Code:
    	my $previousDir;
    	foreach my $dir ( sort { $a cmp $b } @includeDirs ) {
    		next if $previousDir && $dir =~ /^\Q$previousDir\E/;
    		$previousDir = $dir;
    
    		my $pathSearch = Slim::Utils::Misc::fileURLFromPath($dir) . '/%';
    		$log->debug("$libName: Including '$dir', pathSearch: '$pathSearch'");
    		$sth_insert->execute($id, $pathSearch);
    	}
    You might notice I've modified the $pathSearch, too: using catfile() for the URL is wrong. That function would concatenate paths and file names following the operating system's conventions (eg. "/" vs. "" on Windows). But the URL used for the $pathSearch would always have a forward slash (to be confirmed - haven't tested this code!).

    Originally posted by adhawkins View Post
    Also, I've had a request to make the recursive behaviour configurable.
    Given the way the scanning code works, is there any way I could achieve
    this? The SQL SELECT statement that finds the tracks would need to only
    accept ones that are in a directory that has a matching file.
    Yeah, that's trickier. Hmm... You'd probably have to use a different SQL query using REGEXP instead of LIKE for non-recursive. Then create a regex which would match the path, but no more slash after that.

    Code:
    if ($nonRecursive) {
       $pathSearch = '^\Q' . Slim::Utils::Misc::fileURLFromPath($dir) . '/\E[^\/]*$';
    }
    Again: I haven't tested this. I'm not sure it would work. Please give it a try. I'd have to check what regexes SQL supports.
    Last edited by mherger; 2022-05-22, 11:13.

    Leave a comment:


  • adhawkins
    replied
    Also, I've had a request to make the recursive behaviour configurable. Given the way the scanning code works, is there any way I could achieve this? The SQL SELECT statement that finds the tracks would need to only accept ones that are in a directory that has a matching file. I guess I could make the select / insert a two stage process, select all the files that match, then go through these and only accept ones that are in the actual directory?

    Andy

    Leave a comment:


  • adhawkins
    replied
    Originally posted by mherger View Post
    As the first one would include the second one anyway you should filter
    out sub-folders of folders already included. This would prevent
    redundant processing. But the result otherwise should be the same, as
    the query does "INSERT OR IGNORE" - ignore a command if the resulting
    record already exists.
    Any suggestions on how to do this filtering? Perl is far from my first language!

    Andy

    Leave a comment:


  • adhawkins
    replied
    Ok, I've made those changes as suggested. Code is a lot simpler, and also much faster. My original 7 second library build time is now about 15 seconds (compared to 90 seconds for the first attempt at the 'recursive' implementation).

    One question, to go through all the libraries I'm iterating over the result of 'Slim::Utils::Misc::getAudioDirs()'. Is that the right thing to do?

    Thanks for the assistance, much appreciated as usual.

    Andy

    Leave a comment:


  • mherger
    replied
    SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

    >> So you're saying I should use File::Next to search for
    >> '.simple-library-views-<libname>' files. Then for each one of these it
    >> finds, call the query you quote passing in
    >> file://dirname(fullpath-to-slv-file)/.

    >
    > Only the path, not including the file name, of course. But basically
    > yes, this.


    That said: it would always be good to understand where you're loosing
    time before re-implementing everything :-). Did you ever try to measure
    time for the various parts in your code and compare with the older version?

    Leave a comment:


  • mherger
    replied
    SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

    > So you're saying I should use File::Next to search for
    > '.simple-library-views-<libname>' files. Then for each one of these it
    > finds, call the query you quote passing in
    > file://dirname(fullpath-to-slv-file)/.


    Only the path, not including the file name, of course. But basically
    yes, this.

    > Will your query handle if (for
    > example) they have library files in both:
    >
    > - /var/spool/music/A/Andy/
    > - /var/spool/music/A/Andy/Album 1


    As the first one would include the second one anyway you should filter
    out sub-folders of folders already included. This would prevent
    redundant processing. But the result otherwise should be the same, as
    the query does "INSERT OR IGNORE" - ignore a command if the resulting
    record already exists.

    Leave a comment:


  • adhawkins
    replied
    Originally posted by mherger View Post
    I'd definitely recommend using File::Next or File::Find to get a list of
    library names with the folders they use. Then query the DB based on the
    folder. As you'd be able to insert based on the url that could be a
    pretty efficient SQL query. Something like

    INSERT OR IGNORE INTO library_track (library, track)
    SELECT ?, tracks.id
    FROM tracks
    WHERE url like ?
    AND content_type NOT IN ("cpl", "src", "ssp", "dir")

    where arguments are the library ID and the "file://path/to/folder%"
    (haven't tested this, but I assume you get the point).
    So you're saying I should use File::Next to search for '.simple-library-views-<libname>' files. Then for each one of these it finds, call the query you quote passing in file://dirname(fullpath-to-slv-file)/. Will your query handle if (for example) they have library files in both:
    • /var/spool/music/A/Andy/
    • /var/spool/music/A/Andy/Album 1


    This would find the same track ID twice, would the query handle that?

    Originally posted by mherger View Post
    Oh, and you got quite a few "libary" instead of "library" in your readme
    :-).
    Whoops! Nobody has spotted that since I first published the plugin something like 6 years ago!

    Andy

    Leave a comment:


  • mherger
    replied
    SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

    > Can you say what change causes the issue? Maybe you can go back in git
    > history to see what commit changed behaviour?


    I believe that even if you don't hit the filesystem, you're going to
    test the same folders over and over again: if none of your files is
    found, you'd walk up the folder hierarchy. And you'd do this over and
    over again. Sure, by checking the cache (which could become pretty big
    in a large collection) you cut short. But still, you do this several times.

    > From a first quick look I was wondering whether it wouldn't be more
    > efficient to search for your your simple-library-views-* files, then map
    > these to the tracks, rather than the opposite?


    I'd definitely recommend using File::Next or File::Find to get a list of
    library names with the folders they use. Then query the DB based on the
    folder. As you'd be able to insert based on the url that could be a
    pretty efficient SQL query. Something like

    INSERT OR IGNORE INTO library_track (library, track)
    SELECT ?, tracks.id
    FROM tracks
    WHERE url like ?
    AND content_type NOT IN ("cpl", "src", "ssp", "dir")

    where arguments are the library ID and the "file://path/to/folder%"
    (haven't tested this, but I assume you get the point).

    Oh, and you got quite a few "libary" instead of "library" in your readme
    :-).

    Leave a comment:


  • mherger
    replied
    SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

    > However, it appears that the build time has increased by a factor of
    > around 10 (on my LMS it's gone from < 10 seconds to about 90 seconds).
    > Another tester has seen a similar increase, from about 30 seconds to
    > over 10 minutes.


    Can you say what change causes the issue? Maybe you can go back in git
    history to see what commit changed behaviour?

    From a first quick look I was wondering whether it wouldn't be more
    efficient to search for your your simple-library-views-* files, then map
    these to the tracks, rather than the opposite?

    Leave a comment:


  • adhawkins
    replied
    A quick look through the code (my Perl is rusty!) suggests that inMediaFolder is calling down into pathFromFileURL, which does involve a trip to the filesystem, but this result should be cached. If my use of 'inMediaFolder' OK as I rise up the directory tree, to make sure I don't leave the defined media folders?

    I'm tempted to take this check out and just allow it to go right up to the root to see if it has any beneficial effect on the build time.

    Andy

    Leave a comment:

Working...
X