    Working on a plugin - Scanning question


    #2
    Working on a plugin - Scanning question

    Hi,

    I'm working on a project where the library needs to be scanned. There are two scanning modes:
    1.) within the main process of LMS
    2.) within the scanning module (import module)

    How do I trigger the first and the second programmatically?

    Thanks

    mamema



      #3
      Hi mamema

      I moved your thread to the dev forum, as the DIY forum is more for the hardware tinkerers.

      > I'm working on a project where the library needs to be scanned. There
      > are two scanning modes:
      > 1.) within the main process of LMS

      The scanner will always run in a separate process if a plugin is involved. You can ignore this use case. It's been this way for years (7.7?) now. I'd concentrate on LMS8 going forward.

      > 2.) within the scanning module (import module)

      You need to define an "importmodule" item in the install.xml file (see e.g. https://github.com/michaelherger/Spo...nstall.xml#L10). This tells LMS which module to use in the importer.
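
      For illustration, the element at that link boils down to a single line naming the importer module; a minimal sketch, with a hypothetical placeholder for the module name:

      Code:
      	<importmodule>Plugins::MyPlugin::Importer</importmodule>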

      In your importer module, you'd define an initPlugin() function which would register the importer:

      Code:
      	Slim::Music::Import->addImporter($class, {
      		'type'   => 'post',   # run after the regular file scan
      		'weight' => 85,       # ordering relative to other importers
      		'use'    => 1,        # importer is enabled
      	});

      The type defines what kind of importer this is. There are basically three: file (import some files into the library), post (post-processing after the file scan has been done), and artwork. They're executed in this order. Most likely you'd be using the post type, as you're not dealing with importing the files themselves, nor artwork, but want to manipulate the data that was read during the file scan.

      The weight helps the scanner define the order in which the importers are executed.

      startScan() is what the scanner calls when it's time to execute your code. Do whatever you need to do in there.
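
      Putting the pieces together, a minimal sketch of what the importer module might look like; the package name is a hypothetical placeholder, and the endImporter() call at the end follows the pattern used by existing importers (treat it as an assumption to verify):

      Code:
      	package Plugins::MyPlugin::Importer;   # hypothetical name

      	use strict;

      	use Slim::Music::Import;

      	sub initPlugin {
      		my $class = shift;

      		# register this module as a post-scan importer
      		Slim::Music::Import->addImporter($class, {
      			'type'   => 'post',
      			'weight' => 85,
      			'use'    => 1,
      		});
      	}

      	sub startScan {
      		my $class = shift;

      		# ... do the actual post-scan work here ...

      		# signal the scanner that this importer has finished
      		Slim::Music::Import->endImporter($class);
      	}

      	1;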

      Once you've got this basically running, we can start to look into nice-to-haves, like reporting progress etc.
      Michael

      "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
      (LMS: Settings/Information)



        #4
        Originally posted by mherger

        The type defines what kind of importer this is. There are basically three: file (import some files into the library), post (post-processing after the file scan has been done), and artwork. They're executed in this order. Most likely you'd be using the post type, as you're not dealing with importing the files themselves, nor artwork, but want to manipulate the data that was read during the file scan.

        The weight helps the scanner define the order in which the importers are executed.

        startScan() is what the scanner calls when it's time to execute your code. Do whatever you need to do in there.
        Sorry for interrupting the discussion with a related question.

        Michael, would it be appropriate to implement a potentially long-running operation that needs to be executed at LMS startup as an importer?
        If so, how would the importer be triggered to run from initPlugin? Should one just initiate a new/changed files rescan in initPlugin, or is there a better way to do it?

        Asking because I know some of my plugins do things at LMS startup that can potentially take a bit of time. The plugin doesn't work properly before the operation is finished, but it has always felt bad to risk hanging the whole of LMS. I use main::idleStreams() where possible, and I've also experimented a bit with Slim::Utils::Scheduler::add_task, but for these to work well the plugin needs to be able to divide the work into smaller tasks, which isn't always easy.
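
        As an aside, the divide-into-steps pattern with Slim::Utils::Scheduler might look roughly like the sketch below. This is my illustration, assuming add_task() keeps calling the task while it returns true and drops it once it returns false; the work queue and worker are hypothetical:

        Code:
        	use Slim::Utils::Scheduler;

        	my @queue = getWorkItems();   # hypothetical list of per-track jobs

        	Slim::Utils::Scheduler::add_task(sub {
        		my $item = shift @queue;

        		processItem($item);       # hypothetical per-item worker

        		# true = schedule me again, false = task is done
        		return scalar @queue;
        	});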
        Last edited by erland; 2021-02-25, 20:32.
        Erland Lindmark (My homepage)
        Developer of many plugins/applets
        Starting with LMS 8.0 I no longer support my plugins/applets (see here for more information)



          #5
          Working on a plugin - Scanning question

          > Michael, would it be appropriate to implement a potentially long-running
          > operation that needs to be executed at LMS startup as an importer?


          Yes, definitely.

          But why would you need to run this on every restart?

          > If so, how would the importer be triggered to run from initPlugin?
          > Should one just initiate a new/changed files rescan in initPlugin, or is
          > there a better way to do it?


          Be nice and give LMS a few seconds to start up. Then run a rescan:

          Code:
          	Slim::Control::Request::executeRequest(undef, ['rescan']);
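
          As a sketch of the whole "wait a bit, then rescan" idea from a plugin's initPlugin(), assuming the standard Slim::Utils::Timers API; the 15-second delay is an arbitrary choice:

          Code:
          	use Time::HiRes ();
          	use Slim::Utils::Timers;
          	use Slim::Control::Request;

          	sub initPlugin {
          		my $class = shift;

          		# give LMS some time to finish starting up, then trigger a rescan
          		Slim::Utils::Timers::setTimer(undef, Time::HiRes::time() + 15, sub {
          			Slim::Control::Request::executeRequest(undef, ['rescan']);
          		});
          	}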
          Michael

          "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
          (LMS: Settings/Information)



            #6
            Originally posted by mherger
            > Michael, would it be appropriate to implement a potentially long-running
            > operation that needs to be executed at LMS startup as an importer?


            Yes, definitely.

            But why would you need to run this on every restart?

            > If so, how would the importer be triggered to run from initPlugin?
            > Should one just initiate a new/changed files rescan in initPlugin, or is
            > there a better way to do it?


            Be nice and give LMS a few seconds to start up. Then run a rescan:

            Slim::Control::Request::executeRequest(undef, ['rescan']);
            It's not only startup related; it's more the point that some of Erland's plugins scan stuff in a way that brings the LMS web server to a halt, whether at startup or during runtime. Especially for people like me, with large libraries, this can take hours. So the idea is to use a kind of scanning that doesn't interfere with the LMS web server at all, or, as Erland suggests, to divide the work into chunks...

            What would be your best design proposal to get a scan done with 80,000 files and still be able to use LMS?

            For example:
            [21-02-25 01:45:44.0012] Plugins::TrackStat::Storage::refreshTracks (1371) Finished updating urls in statistic data based on musicbrainz ids, updated 108090 items : It took 12190.793819 seconds

            3 hours, 23 min :-)
            Last edited by mamema; 2021-02-26, 10:20.



              #7
              Working on a plugin - Scanning question

              > What would be your best design proposal to get a scan done with 80,000
              > files and still be able to use LMS?


              I'd go the "run external scanner" route. It's much simpler than trying
              to split up a large chunk of work into smaller pieces and get them
              done without blocking the single-threaded server. And most systems
              nowadays have multiple cores. Let's take advantage of them!

              But first of all I'd try to understand what is taking so long, and why.
              Some understanding of the underlying SQL can help a lot. Are you
              repeatedly running the same queries? Could some results be cached? Is
              the performance CPU bound? Or disk IO?
              Michael

              "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
              (LMS: Settings/Information)



                #8
                Originally posted by mherger
                > What would be your best design proposal to get a scan done with 80,000
                > files and still be able to use LMS?


                But first of all I'd try to understand what is taking so long, and why.
                Some understanding of the underlying SQL can help a lot. Are you
                repeatedly running the same queries? Could some results be cached? Is
                the performance CPU bound? Or disk IO?
                The running Perl script uses 100% of one CPU core, the one LMS is running on, and because of that LMS halts.

                I'm not sure yet if disk I/O is a problem. I think it shouldn't be, as it is an 8-disk NAS which isn't running any other heavy disk I/O. But I need to verify that again.

                The involved SQL queries are:

                UPDATE tracks,track_statistics SET track_statistics.url=tracks.url,track_statistics.urlmd5=tracks.urlmd5 where tracks.musicbrainz_id is not null and tracks.musicbrainz_id=track_statistics.musicbrainz_id and track_statistics.url!=tracks.url and length(tracks.url)<".($useLongUrls?512:256);

                CREATE temp table temp_track_statistics as select tracks.url,tracks.urlmd5,tracks.musicbrainz_id from tracks join track_statistics on tracks.musicbrainz_id=track_statistics.musicbrainz_id where track_statistics.musicbrainz_id is not null and track_statistics.urlmd5!=tracks.urlmd5";

                UPDATE track_statistics SET url=(select url from temp_track_statistics where musicbrainz_id=track_statistics.musicbrainz_id),urlmd5=(select urlmd5 from temp_track_statistics where musicbrainz_id=track_statistics.musicbrainz_id) where exists (select url from temp_track_statistics where musicbrainz_id=track_statistics.musicbrainz_id)";

                I'll test now whether an INNER JOIN instead of a WHERE clause will speed things up.
                Last edited by mamema; 2021-02-26, 11:43.



                  #9
                  Originally posted by mherger
                  But first of all I'd try to understand what is taking so long, and why.
                  Some understanding of the underlying SQL can help a lot.
                  I'll try to describe the thoughts behind the refresh operation in TrackStat below (as far as I remember) and how I think it can be optimised. It consists of a number of different parts, and currently the whole operation always runs in the main LMS process: at LMS startup, after a rescan-done event, and when initiated from the TrackStat settings page.
                  Part 1 is the one that takes 3.5 hours for mamema; I don't know how long the other parts take.

                  The parts are:

                  Part 1: Recover statistic data based on musicbrainz tags

                  - The issue is that when a track is moved or renamed, the standard LMS scanner deletes the old track and adds a new track, and the rating, play count and last played time are lost. TrackStat tries to recover the old data by using the musicbrainz tag, which will be the same in the renamed file.
                  - I think this would be possible to handle in an importer instead, triggered when a new track is added to the database, which then checks if there is previous data for the same musicbrainz tag. Running it per track, instead of creating a temporary table and running the update statement mamema mentions, would help a lot. I think this part is also only necessary during scanning; I don't see any reason to run it at LMS startup.

                  Part 2: Update md5 url value

                  - I don't remember the exact purpose of this, but I suspect it's related to part 1. Since part 1 updates the url, it's important to also update the md5 representation of the url.
                  - Would likely be possible to handle the same way as part 1.

                  Part 3: Analyze command

                  - Run the analyze command to optimise the performance of the TrackStat table.
                  - Not sure analyze can be run on a per-track basis, so this should probably run as a post-scan action. Don't think it's needed at LMS startup.

                  Part 4: Updating musicbrainz ids

                  - Copies fresh musicbrainz ids from the LMS tracks table to the TrackStat track_statistics table. The typical scenario is that a user has added or updated the musicbrainz tags on a music file.
                  - Should be possible to handle as an importer which is called when a track is either updated or added to the database. Running it per track will likely cause less disturbance than running it as a post-scan operation over the whole database. Running it per track will also be better in the scenario where a new/changed rescan is performed and only a single track has been changed.

                  Part 5: Analyze command

                  - Run the analyze command to optimise the performance of the TrackStat table.
                  - Same as part 3; not sure why it's executed multiple times, possibly other parts of the refresh operation were faster if analyze was executed before them.

                  Part 6: Fill TrackStat table with new tracks

                  - Copies tracks in the LMS tracks table into the TrackStat table if they don't already exist in TrackStat.
                  - This needs to be executed at LMS startup for the scenario when TrackStat is first installed, but during scanning it should be possible to handle this the same way as part 1, with an importer that executes when a new track is added to the database.

                  Part 7: Analyze command

                  - Run the analyze command to optimise the performance of the TrackStat table.
                  - Same as part 3; not sure why it's executed multiple times, possibly other parts of the refresh operation were faster if analyze was executed before them.

                  Part 8: Update ratings in LMS database

                  - The purpose of this part is to update ratings in the LMS tracks_persistent table with the value from the TrackStat table.
                  - Not really sure this is required any more; it might be there for historical reasons, from when tracks_persistent didn't always survive upgrades. TrackStat should normally write ratings both to the LMS table and the TrackStat table; possibly this might not happen when a TrackStat backup is restored, and that could be the reason for this part.
                  - Not sure it makes sense to run this during scanning; it feels more like something that is appropriate to run at LMS startup, if it's even needed anymore.

                  Part 9: Analyze command

                  - Run the analyze command to optimise the performance of the TrackStat table.
                  - Same as part 3; not sure why it's executed multiple times, possibly other parts of the refresh operation were faster if analyze was executed before them.

                  Part 10: Update added time in TrackStat table

                  - The purpose is to add the "added" time to the TrackStat table in case it didn't exist there before. An important part is that the added time is only set once for a track, because its meaning in TrackStat is when a track was first added to the library. So it is intentionally not updated if you retag an existing track and update its modification time.
                  - Should be possible to run as an importer that executes when a track is added or updated in the LMS tables. Don't think this is necessary to run at LMS startup. I think this might already be covered by part 6 above if it's implemented as I suggest.

                  Part 11: Analyze command

                  - Run the analyze command to optimise the performance of the TrackStat table.
                  - Same as part 3; not sure why it's executed multiple times, possibly other parts of the refresh operation were faster if analyze was executed before them.

                  Part 12: Update play counts in TrackStat table

                  - The purpose is to add the play count to the TrackStat table in case it didn't exist there before. An important part is that the play count is only added once for a track in the refresh operation, because the LMS play count logic is different from TrackStat's. TrackStat has logic so the play count isn't increased if a track is skipped soon after it has started to play.
                  - Should be possible to run as an importer that executes when a track is added or updated in the LMS tables. I think this might already be covered by part 6 above if it's implemented as I suggest.

                  Part 13: Update last played time in TrackStat table

                  - The purpose is to add the last played time to the TrackStat table in case it didn't exist there before. An important part is that the last played time is only added once for a track in the refresh operation, because the LMS last played time logic is different from TrackStat's. TrackStat has logic so the last played time isn't updated if a track is skipped soon after it has started to play.
                  - Should be possible to run as an importer that executes when a track is added or updated in the LMS tables. I think this might already be covered by part 6 above if it's implemented as I suggest.

                  Part 14: Analyze command

                  - Run the analyze command to optimise the performance of the TrackStat table.
                  - Same as part 3; not sure why it's executed multiple times, possibly other parts of the refresh operation were faster if analyze was executed before them.

                  Part 15: Update rating in TrackStat table

                  - The purpose is to add the rating to the TrackStat table in case it didn't exist there before. An important part is that the rating is only added once for a track in the refresh operation, because TrackStat considers itself to be the master of ratings, so they should never be overwritten by anyone except through the TrackStat CLI commands.
                  - Should be possible to run as an importer that executes when a track is added or updated in the LMS tables. I think this might already be covered by part 6 above if it's implemented as I suggest.

                  Part 16: Analyze command

                  - Run the analyze command to optimise the performance of the TrackStat table.
                  - Same as part 3; not sure why it's executed multiple times, possibly other parts of the refresh operation were faster if analyze was executed before them.

                  Part 17: Set rating to null if 0

                  - Updates the TrackStat tables so any track with rating 0 is set to null.
                  - TrackStat assumes an unrated track has a value of null and not 0, to simplify the logic so it doesn't have to handle both 0 and null in other places.
                  - I think this might already be covered by part 6 above if it's implemented as I suggest.

                  There are additionally 10 parts related to historical data, but I suggest starting with the above if someone wants to look into it.
                  Erland Lindmark (My homepage)
                  Developer of many plugins/applets
                  Starting with LMS 8.0 I no longer support my plugins/applets (see here for more information)



                    #10
                    Following up on my previous post; I reached the 10,000 character forum message limit...

                    I think most of the refresh operation in TrackStat would be possible to replace with:
                    - An importer that's triggered when a track is added to or updated in the database, and executes the relevant parts of the refresh operation for that specific track. If I remember correctly you can implement a trackDeleted and a trackChanged function in the importer, which will be called by the LMS scanner when a track is deleted or when a track is added/updated (see the sketch after this list).
                    - An importer that's triggered at LMS startup when no data exists in the TrackStat tables, and possibly manually from the settings page. The purpose would be to fill TrackStat with information from LMS initially, when TrackStat has just been installed. It could even be that this doesn't have to be an importer, because if it only needs to handle the case when the TrackStat tables are empty, it should be a fairly simple SQL statement, unless I'm missing something.
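
                    A rough sketch of the per-track shape described above; the package name is a placeholder, and the hook names and signatures follow the recollection in the first bullet, so verify both against the LMS scanner source before relying on them:

                    Code:
                    	package Plugins::TrackStat::Importer;   # hypothetical placeholder

                    	use strict;

                    	# Called when a track is added or updated (name as recalled above).
                    	sub trackChanged {
                    		my ($class, $url) = @_;
                    		# refresh the statistics for this single track only
                    	}

                    	# Called when a track is deleted (name as recalled above).
                    	sub trackDeleted {
                    		my ($class, $url) = @_;
                    		# stash rating/play count so they can be recovered if the
                    		# same musicbrainz id reappears under a new url
                    	}

                    	1;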

                    TrackStat is implemented as it is because the concept of importers running after each track has been scanned didn't exist when TrackStat was developed back in 2006.

                    As a side note, there are similar issues in some of my other plugins: some run SQL statements at startup and some do heavy XML parsing to create a configuration cache, so the idea of running an importer at startup can likely be used in some of my other plugins as well. The Custom Scan implementation should definitely be replaced by importers if someone decides to look at improving it; I started doing that but never reached a non-beta version before I ran out of time.
                    Erland Lindmark (My homepage)
                    Developer of many plugins/applets
                    Starting with LMS 8.0 I no longer support my plugins/applets (see here for more information)



                      #11
                      I need to go through your extensive reply later, but one point bugs me:

                      >- The issue is that when a track is moved or renamed, the standard LMS scanner deletes the old track and adds a new track, and the rating, play count and last played time are lost.

                      I don't move stuff around, nor do I rename anything.
                      But the extensive 3.5 hour run is always triggered when update scanning takes place.

                      The only thing happening at the moment is heavy musicip fingerprinting, but with the file change date preserved.

                      So either I don't understand your "rename" sentence, or there is something to investigate...



                        #12
                        Originally posted by mamema
                        The involved SQL queries are:

                        UPDATE tracks,track_statistics SET track_statistics.url=tracks.url,track_statistics.urlmd5=tracks.urlmd5 where tracks.musicbrainz_id is not null and tracks.musicbrainz_id=track_statistics.musicbrainz_id and track_statistics.url!=tracks.url and length(tracks.url)<".($useLongUrls?512:256);
                        The above is only used for MySQL, and I don't think LMS supports MySQL these days, so you can probably ignore that.

                        Originally posted by mamema
                        CREATE temp table temp_track_statistics as select tracks.url,tracks.urlmd5,tracks.musicbrainz_id from tracks join track_statistics on tracks.musicbrainz_id=track_statistics.musicbrainz_id where track_statistics.musicbrainz_id is not null and track_statistics.urlmd5!=tracks.urlmd5";

                        UPDATE track_statistics SET url=(select url from temp_track_statistics where musicbrainz_id=track_statistics.musicbrainz_id),urlmd5=(select urlmd5 from temp_track_statistics where musicbrainz_id=track_statistics.musicbrainz_id) where exists (select url from temp_track_statistics where musicbrainz_id=track_statistics.musicbrainz_id)";

                        I'll test now whether an INNER JOIN instead of a WHERE clause will speed things up.
                        If you'd like to go the quick-and-dirty route and just optimise these SQL statements a bit, I wonder if some indexes need to be created for the temp table. For the permanent track_statistics table I create a number of indexes here in the code: https://github.com/erland/lms-tracks...torage.pm#L365
                        It could be that these indexes aren't transferred automatically to the temp table, and in that case that could be the issue. However, it is a bit strange, because looking at the SQL I would expect the temp table to be empty unless you have renamed/moved music files. I wonder if you have multiple tracks with the same musicbrainz id in your database, because that would probably cause the temp table to contain some data. Multiple tracks with the same musicbrainz id are known to result in data in the track_statistics table becoming incorrectly duplicated, so I've suggested to people who have these kinds of duplicates in their library to disable musicbrainz support in TrackStat.
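
                        For illustration, creating such an index on the temp table might look like the sketch below, assuming a DBI handle from Slim::Schema; the index name and column choice are my assumptions, based on the musicbrainz_id lookups in the statements quoted above:

                        Code:
                        	my $dbh = Slim::Schema->dbh;

                        	# the UPDATE repeatedly looks up temp_track_statistics rows by
                        	# musicbrainz_id, so an index on that column should replace the
                        	# per-row full-table scans with indexed lookups
                        	$dbh->do(
                        		'CREATE INDEX IF NOT EXISTS temp_track_statistics_mbid_idx ' .
                        		'ON temp_track_statistics (musicbrainz_id)'
                        	);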
                        Erland Lindmark (My homepage)
                        Developer of many plugins/applets
                        Starting with LMS 8.0 I no longer support my plugins/applets (see here for more information)



                          #13
                          Originally posted by mamema
                          I need to go through your extensive reply later, but one point bugs me:

                          >- The issue is that when a track is moved or renamed, the standard LMS scanner deletes the old track and adds a new track, and the rating, play count and last played time are lost.

                          I don't move stuff around, nor do I rename anything.
                          But the extensive 3.5 hour run is always triggered when update scanning takes place.

                          The only thing happening at the moment is heavy musicip fingerprinting, but with the file change date preserved.

                          So either I don't understand your "rename" sentence, or there is something to investigate...
                          The SQL statement in the refresh operation still executes, but it isn't supposed to do anything. As mentioned in my post a few seconds ago, I would expect the temp table to be empty in your case. However, if you have multiple tracks with the same musicbrainz id, that could explain it, because then the temp table would not be empty, and it would likely cause a lot of duplicates in the track_statistics table.

                          You could run the SELECT that creates the temp table against your database and see if it returns anything. If you aren't able to connect directly to the database, you can use the Database Query plugin and create a free-form query to run the SELECT statement.
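
                          That is, the SELECT part of the CREATE statement quoted earlier, run on its own (reformatted here, with the surrounding Perl string fragments stripped):

                          Code:
                          	SELECT tracks.url, tracks.urlmd5, tracks.musicbrainz_id
                          	FROM tracks
                          	JOIN track_statistics
                          	  ON tracks.musicbrainz_id = track_statistics.musicbrainz_id
                          	WHERE track_statistics.musicbrainz_id IS NOT NULL
                          	  AND track_statistics.urlmd5 != tracks.urlmd5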
                          Erland Lindmark (My homepage)
                          Developer of many plugins/applets
                          Starting with LMS 8.0 I no longer support my plugins/applets (see here for more information)



                            #14
                            Originally posted by erland
                            You could run the SELECT that creates the temp table against your database and see if it returns anything. If you aren't able to connect directly to the database, you can use the Database Query plugin and create a free-form query to run the SELECT statement.
                            I'm running my 3 hour scan right now, and I've modified the plugin to not drop the temp table if it is created, so I can look into it.

                            If you're right, and I'm sure you are, then the import scan module approach would be the best way.
                            Last edited by mamema; 2021-02-26, 14:36.



                              #15
                              Working on a plugin - Scanning question

                              > The involved SQL queries are:

                              Time each of these queries to figure out which one burns the most CPU cycles.

                              Code:
                              	use Time::HiRes qw(time);   # optional: sub-second resolution

                              	my $t = time();
                              	runSQL1();                  # whatever
                              	warn "SQL1: " . (time() - $t);

                              Or similar. Poor man's instrumentation.

                              If you want to go fancy, you can look into installing NYTProf and
                              profiling the runs (https://metacpan.org/pod/Devel::NYTProf). This will
                              further slow down the process and take additional time to generate
                              useful reports. You might want to try with a smaller collection first.
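
                              A rough sketch of that workflow, assuming you invoke the standalone scanner script directly; adjust the path to scanner.pl for your install:

                              Code:
                              	# run the scanner under the profiler; writes nytprof.out
                              	perl -d:NYTProf scanner.pl --rescan

                              	# turn the collected data into browsable HTML reports
                              	nytprofhtml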
                              Michael

                              "It doesn't work - what shall I do?" - "Please check your server.log and/or scanner.log file!"
                              (LMS: Settings/Information)

