Announcement

Collapse
No announcement yet.

SimpleLibraryViews plugin update increases library build time by a factor of 10

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

    SimpleLibraryViews plugin update increases library build time by a factor of 10

    > Looks like the DBD::SQLite module registers a 'regexp' function that
    > just uses Perl regexps. So as long as that is a valid Perl RE it should
    > be Ok.
    >
    > Was going to try it in the command line client, but looks like that
    > might not work. A bit of googling suggested there was an extension to
    > sqlite that handled Perl REs, so will have a look at that.


    SQLiteStudio has built-in support for regexp (https://sqlitestudio.pl).
    And my regex almost did work. I'm going to edit it (needs a '$' matcher
    at the end). SQLiteStudio didn't need to escape the slash ("[^\/]"). I'm not sure
    about DBI. Give it a try, and change to just "[^/]" if it doesn't work.
    Last edited by mherger; 2022-05-22, 11:15.

    #2
    SimpleLibraryViews plugin update increases library build time by a factor of 10

    Hi all,

    I've updated my SimpleLibraryViews plugin to allow for the library definition file to work for a directory and all of its subdirectories. I believe I'm caching the files existence such that there should only be one trip to the filesystem for each check in a directory.

    However, it appears that the build time has increased by a factor of around 10 (on my LMS it's gone from < 10 seconds to about 90 seconds). Another tester has seen a similar increase, from about 30 seconds to over 10 minutes.

    Anyone care to take a look at the library code to see if I've done anything stupid? The code in question starts here:



    I'm calling the following LMS functions within the loop. Do any of them involve a trip to the filesystem, which may explain it?
    • Slim::Music::Info::isFileURL
    • Slim::Utils::Misc:athFromFileURL
    • Slim::Utils::Misc::inMediaFolder


    Appreciate any advice you more knowledgeable people might be able to offer!

    Andy

    Comment


      #3
      A quick look through the code (my Perl is rusty!) suggests that inMediaFolder is calling down into pathFromFileURL, which does involve a trip to the filesystem, but this result should be cached. If my use of 'inMediaFolder' OK as I rise up the directory tree, to make sure I don't leave the defined media folders?

      I'm tempted to take this check out and just allow it to go right up to the root to see if it has any beneficial effect on the build time.

      Andy

      Comment


        #4
        SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

        > However, it appears that the build time has increased by a factor of
        > around 10 (on my LMS it's gone from < 10 seconds to about 90 seconds).
        > Another tester has seen a similar increase, from about 30 seconds to
        > over 10 minutes.


        Can you say what change causes the issue? Maybe you can go back in git
        history to see what commit changed behaviour?

        From a first quick look I was wondering whether it wouldn't be more
        efficient to search for your your simple-library-views-* files, then map
        these to the tracks, rather than the opposite?

        Michael

        "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
        (LMS: Settings/Information)

        Comment


          #5
          SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

          > Can you say what change causes the issue? Maybe you can go back in git
          > history to see what commit changed behaviour?


          I believe that even if you don't hit the filesystem, you're going to
          test the same folders over and over again: if none of your files is
          found, you'd walk up the folder hierarchy. And you'd do this over and
          over again. Sure, by checking the cache (which could become pretty big
          in a large collection) you cut short. But still, you do this several times.

          > From a first quick look I was wondering whether it wouldn't be more
          > efficient to search for your your simple-library-views-* files, then map
          > these to the tracks, rather than the opposite?


          I'd definitely recommend using File::Next or File::Find to get a list of
          library names with the folders they use. Then query the DB based on the
          folder. As you'd be able to insert based on the url that could be a
          pretty efficient SQL query. Something like

          INSERT OR IGNORE INTO library_track (library, track)
          SELECT ?, tracks.id
          FROM tracks
          WHERE url like ?
          AND content_type NOT IN ("cpl", "src", "ssp", "dir")

          where arguments are the library ID and the "file://path/to/folder%"
          (haven't tested this, but I assume you get the point).

          Oh, and you got quite a few "libary" instead of "library" in your readme
          :-).
          Michael

          "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
          (LMS: Settings/Information)

          Comment


            #6
            Originally posted by mherger View Post
            I'd definitely recommend using File::Next or File::Find to get a list of
            library names with the folders they use. Then query the DB based on the
            folder. As you'd be able to insert based on the url that could be a
            pretty efficient SQL query. Something like

            INSERT OR IGNORE INTO library_track (library, track)
            SELECT ?, tracks.id
            FROM tracks
            WHERE url like ?
            AND content_type NOT IN ("cpl", "src", "ssp", "dir")

            where arguments are the library ID and the "file://path/to/folder%"
            (haven't tested this, but I assume you get the point).
            So you're saying I should use File::Next to search for '.simple-library-views-<libname>' files. Then for each one of these it finds, call the query you quote passing in file://dirname(fullpath-to-slv-file)/. Will your query handle if (for example) they have library files in both:
            • /var/spool/music/A/Andy/
            • /var/spool/music/A/Andy/Album 1


            This would find the same track ID twice, would the query handle that?

            Originally posted by mherger View Post
            Oh, and you got quite a few "libary" instead of "library" in your readme
            :-).
            Whoops! Nobody has spotted that since I first published the plugin something like 6 years ago!

            Andy

            Comment


              #7
              SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

              > So you're saying I should use File::Next to search for
              > '.simple-library-views-<libname>' files. Then for each one of these it
              > finds, call the query you quote passing in
              > file://dirname(fullpath-to-slv-file)/.


              Only the path, not including the file name, of course. But basically
              yes, this.

              > Will your query handle if (for
              > example) they have library files in both:
              >
              > - /var/spool/music/A/Andy/
              > - /var/spool/music/A/Andy/Album 1


              As the first one would include the second one anyway you should filter
              out sub-folders of folders already included. This would prevent
              redundant processing. But the result otherwise should be the same, as
              the query does "INSERT OR IGNORE" - ignore a command if the resulting
              record already exists.
              Michael

              "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
              (LMS: Settings/Information)

              Comment


                #8
                SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

                >> So you're saying I should use File::Next to search for
                >> '.simple-library-views-<libname>' files. Then for each one of these it
                >> finds, call the query you quote passing in
                >> file://dirname(fullpath-to-slv-file)/.

                >
                > Only the path, not including the file name, of course. But basically
                > yes, this.


                That said: it would always be good to understand where you're loosing
                time before re-implementing everything :-). Did you ever try to measure
                time for the various parts in your code and compare with the older version?
                Michael

                "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
                (LMS: Settings/Information)

                Comment


                  #9
                  Ok, I've made those changes as suggested. Code is a lot simpler, and also much faster. My original 7 second library build time is now about 15 seconds (compared to 90 seconds for the first attempt at the 'recursive' implementation).

                  One question, to go through all the libraries I'm iterating over the result of 'Slim::Utils::Misc::getAudioDirs()'. Is that the right thing to do?

                  Thanks for the assistance, much appreciated as usual.

                  Andy

                  Comment


                    #10
                    Originally posted by mherger View Post
                    As the first one would include the second one anyway you should filter
                    out sub-folders of folders already included. This would prevent
                    redundant processing. But the result otherwise should be the same, as
                    the query does "INSERT OR IGNORE" - ignore a command if the resulting
                    record already exists.
                    Any suggestions on how to do this filtering? Perl is far from my first language!

                    Andy

                    Comment


                      #11
                      Also, I've had a request to make the recursive behaviour configurable. Given the way the scanning code works, is there any way I could achieve this? The SQL SELECT statement that finds the tracks would need to only accept ones that are in a directory that has a matching file. I guess I could make the select / insert a two stage process, select all the files that match, then go through these and only accept ones that are in the actual directory?

                      Andy

                      Comment


                        #12
                        Originally posted by adhawkins View Post
                        One question, to go through all the libraries I'm iterating over the result of 'Slim::Utils::Misc::getAudioDirs()'. Is that the right thing to do?
                        I'm not sure I understand. Yes, you'll have to can all of getAudioDirs(). But no, you don't have to do this for all libraries. You could do it once, and collect a list of all library name files you find. Keep this around in a (module) global variable:

                        Code:
                        {
                           library1 => [ dir1, dir2, .. ],
                           library2 => [ dir2, dirx, .. ],
                           ...
                        }
                        So for the next library you'd only have to evaluate the folders in respective hash entry instead of crawling the filesystem once again.

                        Originally posted by adhawkins View Post
                        Any suggestions on how to do this filtering? Perl is far from my first
                        language!
                        Sort the $dirs before processing them. Keep track of the previous value. If your current folder is a sub-folder of the previous, skip its processing

                        Code:
                        	my $previousDir;
                        	foreach my $dir ( sort { $a cmp $b } @includeDirs ) {
                        		next if $previousDir && $dir =~ /^\Q$previousDir\E/;
                        		$previousDir = $dir;
                        
                        		my $pathSearch = Slim::Utils::Misc::fileURLFromPath($dir) . '/%';
                        		$log->debug("$libName: Including '$dir', pathSearch: '$pathSearch'");
                        		$sth_insert->execute($id, $pathSearch);
                        	}
                        You might notice I've modified the $pathSearch, too: using catfile() for the URL is wrong. That function would concatenate paths and file names following the operating system's conventions (eg. "/" vs. "" on Windows). But the URL used for the $pathSearch would always have a forward slash (to be confirmed - haven't tested this code!).

                        Originally posted by adhawkins View Post
                        Also, I've had a request to make the recursive behaviour configurable.
                        Given the way the scanning code works, is there any way I could achieve
                        this? The SQL SELECT statement that finds the tracks would need to only
                        accept ones that are in a directory that has a matching file.
                        Yeah, that's trickier. Hmm... You'd probably have to use a different SQL query using REGEXP instead of LIKE for non-recursive. Then create a regex which would match the path, but no more slash after that.

                        Code:
                        if ($nonRecursive) {
                           $pathSearch = '^\Q' . Slim::Utils::Misc::fileURLFromPath($dir) . '/\E[^\/]*$';
                        }
                        Again: I haven't tested this. I'm not sure it would work. Please give it a try. I'd have to check what regexes SQL supports.
                        Last edited by mherger; 2022-05-22, 11:13.
                        Michael

                        "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
                        (LMS: Settings/Information)

                        Comment


                          #13
                          Thanks for the info, will go back and take a look at applying it to the code.

                          Regarding the 'non-recursive', I've managed to implement this and it seems to be fairly efficient. What I do is:

                          Iterate over all the directories I found the file in.
                          Run a query on the database to select all the files that match this directory (which will also include the ones below it)
                          For each result, get the directory its in.
                          If this directory is the same as the directory I searched for, add it in.

                          Perhaps not the most efficient way of doing it, but it basically mirrors what the original code does. It seems to work correctly.

                          The code is below:

                          Code:
                          	my $sth_recursive_insert;
                          	my $sth_select;
                          	my $sth_insert;
                          
                          	if ($recursive) {
                          		$sth_recursive_insert = $dbh->prepare('
                          			INSERT OR IGNORE INTO library_track (library, track)
                          			SELECT ?, tracks.id
                          			FROM tracks
                          			WHERE url like ?
                          			AND content_type NOT IN ("cpl", "src", "ssp", "dir")
                          		');
                          	} else {
                          		$sth_select = $dbh->prepare('
                          			SELECT id, url FROM tracks
                          				WHERE url like ? AND content_type NOT IN ("cpl", "src", "ssp", "dir")
                          				ORDER BY url
                          		');
                          
                          		$sth_insert = $dbh->prepare('
                          			INSERT OR IGNORE INTO library_track (library, track) values (?, ?)
                          		');
                          	}
                          
                          	foreach my $dir ( @includeDirs ) {
                          		my $pathSearch = Slim::Utils::Misc::fileURLFromPath($dir) . File::Spec->catfile('', '') . "%";
                          		main::DEBUGLOG && $log->is_debug && $log->debug("$libName: Including '$dir', pathSearch: '$pathSearch'");
                          
                          		if ($recursive) {
                          			main::DEBUGLOG && $log->is_debug && $log->debug("Doing recursive");
                          
                          			$sth_recursive_insert->execute($id, $pathSearch);
                          		} else {
                          			$sth_select->execute($pathSearch);
                          
                          			while ( my ($trackid, $url) = $sth_select->fetchrow_array ) {
                          				my $trackDir = dirname(Slim::Utils::Misc::pathFromFileURL($url));
                          
                          				main::DEBUGLOG && $log->is_debug && $log->debug("dir for '$url' is '$trackDir'");
                          
                          				if ( $trackDir eq $dir ) {
                          					main::DEBUGLOG && $log->is_debug && $log->debug("Inserting");
                          
                          					$sth_insert->execute($id, $trackid);
                          				}
                          			}
                          		}
                          	}
                          How does that look?

                          Andy

                          Comment


                            #14
                            SimpleLibraryViews plugin update increases librarybuild time by a factor of 10

                            > Perhaps not the most efficient way of doing it, but it basically mirrors
                            > what the original code does. It seems to work correctly.


                            Whatever works! Sometimes optimizations aren't worth the hassle. If
                            you're happy with the result then it's better to keep the code you
                            understand and know how to maintain.

                            > my $pathSearch = Slim::Utils::Misc::fileURLFromPath($dir) . File::Spec->catfile('', '') . "%";


                            I still believe this could fail on Windows. Would you have any Windows
                            testers? catfile() should not be used with a URL. Just use the forward
                            slash "/" instead.
                            Michael

                            "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
                            (LMS: Settings/Information)

                            Comment


                              #15
                              Originally posted by mherger View Post
                              Yeah, that's trickier. Hmm... You'd probably have to use a different SQL query using REGEXP instead of LIKE for non-recursive. Then create a regex which would match the path, but no more slash after that.

                              Code:
                              if ($nonRecursive) {
                                 $pathSearch = '^\Q' . Slim::Utils::Misc::fileURLFromPath($dir) . '/\E[^\/]*';
                              }
                              Again: I haven't tested this. I'm not sure it would work. Please give it a try. I'd have to check what regexes SQL supports.
                              Looks like the DBD::SQLite module registers a 'regexp' function that just uses Perl regexps. So as long as that is a valid Perl RE it should be Ok.

                              Was going to try it in the command line client, but looks like that might not work. A bit of googling suggested there was an extension to sqlite that handled Perl REs, so will have a look at that.

                              Thanks as ever.

                              Andy

                              Comment

                              Working...
                              X