[Announce] Music Similarity DSTM mixer


  • [Announce] Music Similarity DSTM mixer

    This is an alpha release of a DSTM mixer that uses Musly to obtain 'similar' tracks, and then (optionally) uses Essentia to filter these 'similar' tracks by BPM, key (using the Camelot Wheel), loudness, and (if the tracks were analysed on Linux) other attributes such as danceability, aggressiveness, etc.
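For anyone unfamiliar with the Camelot Wheel: each key is written as a number from 1 to 12 plus a letter (A for minor, B for major). As a common rule of thumb, two keys are treated as compatible when they are identical, share a number, or are neighbours on the wheel with the same letter. The sketch below only illustrates that general rule. The function name is hypothetical and nothing here is taken from the plugin, whose actual key filtering may well differ.

```python
def camelot_compatible(key_a: str, key_b: str) -> bool:
    """Return True if two Camelot codes (e.g. '8A', '12B') are compatible.

    Illustrative rule of thumb only, not the plugin's implementation.
    """
    num_a, mode_a = int(key_a[:-1]), key_a[-1].upper()
    num_b, mode_b = int(key_b[:-1]), key_b[-1].upper()
    if num_a == num_b:
        # Same key, or relative major/minor (same number, other letter).
        return True
    if mode_a != mode_b:
        return False
    # Same letter: neighbours on the wheel, wrapping round from 12 to 1.
    return (num_a - num_b) % 12 in (1, 11)


print(camelot_compatible("8A", "9A"))   # neighbours, same letter
print(camelot_compatible("12B", "1B"))  # wrap-around neighbours
print(camelot_compatible("8A", "10A"))  # two steps apart
```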

    There are two parts to this plugin:
    1. The DSTM plugin, which is a standard LMS Plugin - https://github.com/CDrummond/lms-musicsimilarity
    2. A python script (music-similarity) used to analyse music tracks, create a 'similarity' database, and provide access to query track similarity via a simple HTTP API - https://github.com/CDrummond/music-similarity


    The 'music-similarity' script analyses your tracks and saves its results to an SQLite database and a Musly 'jukebox' file. (This 'jukebox' is what Musly uses to compute similarities, and contains binary data taken from the SQLite database. The jukebox can be created 'on the fly', but this can take some time, hence a cached version is written to, and read from, disk.) Musly itself is very fast at analysing tracks: by default my modified version analyses the middle 2 minutes of each track, starting no later than 3 1/2 minutes into the track. Essentia, however, is much slower. As a rough guide, using both Musly and Essentia, my 7-year-old, 8-core i7 laptop with an SSD analyses around 1140 tracks/hour.
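The "middle 2 minutes, starting no later than 3 1/2 minutes in" rule can be pictured with a small calculation. This is only a sketch of the behaviour described above, with a hypothetical function name; it is not Musly's code. The defaults mirror the "Extraction length: 120s extraction start: -210s" values that the script logs further down this thread, on the assumption that a negative start means "centre the excerpt, but begin no later than that many seconds in".

```python
def excerpt_window(duration, length=120.0, start=-210.0):
    """Return (begin, end), in seconds, of the excerpt to analyse.

    Assumed semantics (a sketch, not Musly's implementation):
    - tracks no longer than `length` are analysed in full;
    - a negative `start` centres the excerpt, capped at -start seconds;
    - a non-negative `start` is used as-is, clamped so the excerpt fits.
    """
    if length <= 0 or length >= duration:
        return 0.0, float(duration)
    if start < 0:
        centred = (duration - length) / 2.0
        begin = min(centred, -start)
    else:
        begin = min(start, duration - length)
    return begin, begin + length


print(excerpt_window(300))  # 5 minute track: (90.0, 210.0)
print(excerpt_window(600))  # 10 minute track: (210.0, 330.0)
```

So a 5 minute track is sampled from its centre, while a 10 minute track hits the 3 1/2 minute cap.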

    Essentia is used to extract attributes of each track - its BPM, key, and loudness. If Essentia has been compiled with Gaia/SVM (or Tensorflow) it can also extract other attributes - such as danceability, aggressiveness, etc. Supporting these extra attributes on non-Linux systems will require rebuilding of Essentia. These attributes are stored in the SQLite database, therefore the Essentia binary is only required when analysing tracks and is not required when providing the similarity service.

    The script is run in analysis mode via:

    Code:
    ./music-similarity/music-similarity.py -a m -l DEBUG
    Once your tracks have been analysed the script needs to be run in 'server' mode to allow the LMS plugin to query for similar tracks.

    Code:
    ./music-similarity/music-similarity.py -l DEBUG
    Further details may be found on the Music Similarity github page.

    [Edit] I have removed links to the 'music-similarity' ZIPs, as I will no longer be updating them. When MusicSimilarity is released it will be as one ZIP file without the Essentia binaries, Essentia models, and ffmpeg executables. Links to these will be given in the 'INSTALL.md' files for each relevant OS.

    On my personal setup I analyse tracks on my Linux laptop (e.g. using 2) and then copy the SQLite database and jukebox files to my Raspberry Pi4 - where LMS (using 5) and the similarity API server (using 3) run.

    MusicSimilarity is still under development so I fully expect there to be issues - especially on the Windows side, as I'm a 100% Linux user and have not fully tested this setup. However, I'm interested to get feedback on how well this works for others, perhaps even what are the best default settings to use for the plugin side. The LMS plugin adds "Create Similarity Mix", etc, items to LMS's context menus. These were added at the request of "afriend", but are not something I actively use and I'm more interested in feedback on the DSTM side (which is my main use case).

    If you are going to test, I suggest you create a small subset (e.g. 500 tracks) of your Library - so that you are sure the config works, etc. before spending hours analysing 1000s of tracks.
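One way to build such a test library is to copy a random sample of tracks into a separate folder, keeping the artist/album layout. The following is a minimal sketch, not part of music-similarity: the function names, the extension list, and the default of 500 are all assumptions for illustration.

```python
import random
import shutil
from pathlib import Path

# Assumed set of audio extensions; adjust to match your library.
AUDIO_EXTS = {".flac", ".mp3", ".ogg", ".m4a"}


def pick_subset(library, count=500, seed=0):
    """Return up to `count` randomly chosen audio files below `library`."""
    files = sorted(p for p in Path(library).rglob("*")
                   if p.is_file() and p.suffix.lower() in AUDIO_EXTS)
    # A fixed seed makes the same sample come out on every run.
    random.Random(seed).shuffle(files)
    return files[:count]


def copy_subset(library, dest, count=500, seed=0):
    """Copy the chosen files into `dest`, preserving sub-folders."""
    chosen = pick_subset(library, count, seed)
    for src in chosen:
        target = Path(dest) / src.relative_to(library)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
    return len(chosen)


# Example (paths are placeholders):
# copy_subset("/mnt/Music", "/mnt/TestLibrary", count=500)
```

The copy can then be passed to the analysis run in place of the full library.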

    [Edit] As stated later in this thread, I have found that 'bliss-rs' can be used as a replacement for Musly, under Linux at least, and this (with my library) seems to provide better mixes.
    Last edited by cpd73; 2022-02-06, 09:43.
    Material debug: 1. Launch via http: //SERVER:9000/material/?debug=json (Use http: //SERVER:9000/material/?debug=json,cometd to also see update messages, e.g. play queue) 2. Open browser's developer tools 3. Open console tab in developer tools 4. REQ/RESP messages sent to/from LMS will be logged here.

  • #2
    Hi Craig

    Have just noticed this, and I'm going to give it a go on my Windows setup... I'll report back when I can - but I only have a few minutes this evening, then I'm away for a couple of days - so it may be a slow process.

    The first odd thing I've hit, having installed the various required Python components, is this error when I try to analyse a sample library file set:

    2021-12-28 20:33:57 E Failed to open Musly shared library (G:\Installation Files\Squeezebox\music-similarity\windows\mingw32\mingw64\libmusly.dll)!

    That path looks odd, as it has both mingw32 and mingw64 in it - where it should likely be one or the other.

    I can probably work around it by manipulating the folders... that's my current thinking at least! :-)

    (FYI - I first tried copying the 64 folder into the 32 one, but it didn't like that - so copying the files from the 32 folder into the 64 folder did the trick.)

    Matt.
    Last edited by mruddo; 2021-12-28, 20:50.



    • #3
      OK, so that got the analysis running, but I've not had much luck with it working...

      I edited config.json to include the path for the db - but I'm not sure I've understood local correctly - below it's set to where I installed the files.

      So I've changed:

      "db":"%USERPROFILE%\\MusicSimilarity",
      "local":"%USERPROFILE%\\Music"


      to:

      "db":"M:\\MusicSimilarity",
      "local":"G:\\Installation Files\\Squeezebox\\music-similarity"


      When I run the analysis I get this:

      G:\Installation Files\Squeezebox\music-similarity>music-similarity.py --analyse "M:\TestLibrary"
      2021-12-28 20:51:32 I Have 681 files to analyze
      2021-12-28 20:51:32 I Extraction length: 120s extraction start: -210s
      2021-12-28 20:51:32 I Analyzing with Musly and Essentia (high level)


      ...so far so good, but then there's just a lot of this:

      musly_track_analyze_audiofile failed for M:\TestLibrary\Chvrches\Every Open Eye\01-Never Ending Circles.flac
      musly_track_analyze_audiofile failed for M:\TestLibrary\Chvrches\Every Open Eye\03-Keep You On My Side.flac
      musly_track_analyze_audiofile failed for M:\TestLibrary\Chvrches\Every Open Eye\04-Make Them Gold.flac
      musly_track_analyze_audiofile failed for M:\TestLibrary\Chvrches\Every Open Eye\05-Clearest Blue.flac


      ...for every flac file it finds.

      And occasionally this sort of error:

      musly_track_analyze_audiofile failed for M:\TestLibrary\Chvrches\Love Is Dead\11-Really Gone.flac
      Process Process-106:
      Traceback (most recent call last):
      File "C:\Users\Matt\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\process.py", line 258, in _bootstrap
      self.run()
      File "C:\Users\Matt\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\process.py", line 93, in run
      self._target(*self._args, **self._kwargs)
      File "G:\Installation Files\Squeezebox\music-similarity\lib\analysis.py", line 30, in analyze_audiofile
      mres = mus.analyze_file(db_path, abs_path, extract_len, extract_start)
      File "G:\Installation Files\Squeezebox\music-similarity\lib\musly.py", line 200, in analyze_file
      if self.mus.musly_track_analyze_audiofile(self.mj, abs_path.encode(), extract_len, extract_start, mtrack) == -1:
      OSError: exception: access violation reading 0x00000024


      That's about as far as I've got in the brief time I have this evening... I'll give it another go in due course.

      Thanks,

      Matt



      • #4
        Alpha2

        I've updated the initial post with links to alpha2 ZIPs. I've tested these under a Windows10 VM, and they appear to work. I tested with 6 files - 2 ogg, 2 mp3, and 2 flac files. I did notice one oddity in that on an initial run only the output of the 2 ogg files was saved to the SQLite database. But, on a subsequent run the mp3 and flac results were added. Odd. I'll investigate further later. I was caching the essentia output to JSON files, so perhaps there is an issue (under Windows) there. But at least this shows some progress...


        • #5
          This looks really cool. Will test for sure.

          Thanks for sharing mate ;-)



          • #6
            I'm back, and giving it another go... This time, with Alpha 2, I seem to be getting further - although I believe my main issue was an outdated Python install, so I've upgraded to the latest 3.10.1 64bit release.

            Analysis (of those 681 FLAC files) is under way.

            It's definitely processing, as my CPUs are maxed out at 100% with multiple "streaming_extractor_music.exe" instances running.

            As you've mentioned the database is cached, I'm presuming that's why I'm not seeing it grow at the moment - but I'll keep an eye on it. Since you mentioned a possible issue writing to the DB on the first pass, are you able to advise a tool I could use to simply query the db file for content to get an idea what's in there?* I can then check that for you and report back.

            One thing I have noticed is that there seems to be a problem analysing files with any accented characters. e.g.

            musly_track_analyze_audiofile failed for M:\TestLibrary\Gorillaz\Demon Days\07-El Mañana.flac
            musly_track_analyze_audiofile failed for M:\TestLibrary\Gorillaz\Song Machine, Season One- Strange Timez (Deluxe)\10-Désolé (feat. Fatoumata Diawara) (Extended Version).flac


            Other than that - it's busy doing something, and I'll let you know how it goes when complete.

            *I've done my own research on that one and it seems SQLite should do the trick. I can see there's a tracks and tracks_tmp table in there, but nothing in there as I write. I'll check again when the analysis completes. I'm starting to think a 681 flac file sample library may have been a little ambitious as a first pass! :-)
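For anyone else wanting to peek inside the database without installing a separate tool, Python's built-in sqlite3 module is enough. The sketch below lists every table and its row count. The function name is made up for illustration, and the database file name is left for you to fill in, as it isn't given in this thread. Note that sqlite3.connect creates an empty file if the path does not exist, so double-check the path.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        # Table names come from the database itself, so quoting is enough.
        return {name: conn.execute(
                    f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
                for name in names}
    finally:
        conn.close()


# Example (replace with the path to your own database file):
# print(table_row_counts("/path/to/your/database.db"))
```

Running it during and after an analysis pass shows whether rows are arriving in the tracks table.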
            Last edited by mruddo; 2021-12-30, 20:02. Reason: Should now be able to query the DB myself.



            • #7
              Originally posted by mruddo
              As you've mentioned the database is cached, I'm presuming that's why I'm not seeing it grow at the moment - but I'll keep an eye on it. Since you mentioned a possible issue writing to the DB on the first pass, are you able to advise a tool I could use to simply query the db file for content to get an idea what's in there?* I can then check that for you and report back.
              DB is saved every 500 tracks. For me this is roughly every 1/2 hour.

              Originally posted by mruddo
              One thing I have noticed is that there seems to be a problem analysing files with any accented characters.
              Yeah, I can see how that'd be an issue. For windows builds the Musly library invokes the ffprobe and ffmpeg executables, passing the track's filepath on the commandline. I'm guessing this is where it breaks with non-ASCII characters. I'll see if I can re-write the code to use a better method.
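To see in advance which files are likely to be affected, the library can be scanned for paths containing anything outside plain ASCII. This is a rough diagnostic sketch with hypothetical function names, not something the script provides; it simply assumes the guess above (non-ASCII characters in the filepath) is the cause.

```python
from pathlib import Path


def has_non_ascii(text):
    """Return True if `text` contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in text)


def non_ascii_paths(library, pattern="*.flac"):
    """Return files below `library` whose full path has non-ASCII characters."""
    return sorted(str(p) for p in Path(library).rglob(pattern)
                  if has_non_ascii(str(p)))


print(has_non_ascii("07-El Ma\u00f1ana.flac"))           # True (accented n)
print(has_non_ascii("01-Never Ending Circles.flac"))     # False
print(has_non_ascii("Don\u2019t Stop.flac"))             # True (curly apostrophe)
```

Calling non_ascii_paths on the library folder gives a list that can be compared against the "failed for" lines in the log.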


              • #8
                Well, I can't be too sure, but it took about 3 hours to scan those 681 files, and I ended up with 661 in the database. Which confirms it was just those with accented characters (and variations on the apostrophe) that were excluded. So as far as I can tell, there were no issues with the writing of results to the db.

                I was also using EAC to extract some CD audio at the time, so the PC was busy with that too - but all in all, my crude maths suggests it might take almost 6 days to scan my entire FLAC library, and I think that still sounds feasible. (Back in the MusicIP days this seemed like a never-ending task, and my library was much smaller then.)
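The crude maths can be written out as below. The 681 files and "about 3 hours" come from this post and the 1140 tracks/hour from the first post; the function names are made up, and the library size printed at the end is only what "almost 6 days" would imply at that rate, not a figure stated anywhere in this thread.

```python
def tracks_per_hour(tracks, hours):
    """Average analysis rate."""
    return tracks / hours


def tracks_in(hours, rate):
    """How many tracks fit into `hours` at `rate` tracks/hour."""
    return hours * rate


rate = tracks_per_hour(681, 3)      # 681 files in roughly 3 hours
print(rate)                         # 227.0 tracks/hour
print(tracks_in(6 * 24, rate))      # 32688.0 tracks in six full days
print(1140 / rate)                  # how much faster the quoted laptop figure is
```

The rate here was measured while the PC was also busy ripping CDs, so it is probably on the pessimistic side.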

                I then launched the script, picked a track and got what sounded like a pretty good mix. I've not had chance to explore in detail, but so far I've not been using the restrict to genre option. I'm not sure what balance of key/bpm/energy you're using for the mixes, but when I played a slower track, the following mix seemed similarly paced, and likewise with a heavier track.

                Genres are a tricky one, as I suspect many of mine are just plain wrong - so an option to select genres would likely be useful, as a generic selection of "Pop", "Rock" and "Indie" would likely mix from a broad selection. Similarly, it might just be better to have an exclusion list - e.g. ignore "Audio Book", "Language Course" etc.

                So far, so good though! This looks like an interesting prospect.

                Thank you!
                Last edited by mruddo; 2021-12-30, 22:54.



                • #9
                  Originally posted by mruddo
                  One thing I have noticed is that there seems to be a problem analysing files with any accented characters. e.g.
                  I've updated the Windows ZIP to alpha3. This should resolve the non-ASCII character issue.

                  I'm still seeing issues where on the 1st run some files fail the analysis, but if I re-run then they are OK. Could be the VM I am running in is just running out of memory.

                  One minor note: You can configure music-similarity to 'cache' the Essentia output to JSON files, so that if you ever need to re-run then it can use the cached results. These cache filenames are based upon the track filenames. For Alpha3, under Windows, I'm ignoring non-ASCII characters, so they are simply removed. In the latest code, however, they get replaced with an underscore. The implication is that if you analyse with alpha3 and save the cached copies, a newer release might not use the cached versions. I only mention this so that you are aware; the default config does not create these cache files - so a non-issue for now.
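To make the mismatch concrete, here is a small illustration of the two behaviours as described above. It is a sketch only: the function names are hypothetical, the real cache naming scheme is not shown in this thread, and the regular expression is just one way to express "non-ASCII character".

```python
import re

# Matches any single character outside 7-bit ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def name_with_removal(name):
    """Alpha3-style, as described: non-ASCII characters are dropped."""
    return NON_ASCII.sub("", name)


def name_with_underscore(name):
    """Later-code style, as described: each one becomes an underscore."""
    return NON_ASCII.sub("_", name)


track = "07-El Ma\u00f1ana.flac"
print(name_with_removal(track))     # 07-El Maana.flac
print(name_with_underscore(track))  # 07-El Ma_ana.flac
```

Because the two results differ, a cache file written under one scheme would not be found when looked up under the other.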


                  • #10
                    "Struggling" to get this working ...
                    Steps taken:
                    1. Git cloned the repository into the home dir on my Ubuntu 20.04 server
                    2. installed requirements using pip
                    3. adapted config.json in music-similarity root directory
                      Code:
                      {
                       "paths":{
                        "db":"/home/bart/music-similarity/",
                        "local":"/mnt/Music/",
                        "lms":"/mnt/Music/",
                        "cache":"/home/bart/music-similarity/cache/"
                       },
                       "lmsdb":"/var/lib/squeezeboxserver/cache/library.db"
                      }
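A quick sanity check of such a config can rule out simple path mistakes before starting a long analysis. The sketch below is not part of music-similarity: the function name is hypothetical, the key names are taken from the config posted above, and it assumes every entry under "paths" should be an existing directory and "lmsdb" an existing file.

```python
import json
import os


def check_config(config_path):
    """Return a list of problems found in a config like the one above."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    problems = []
    for key, value in config.get("paths", {}).items():
        # Expand ~ and environment variables ($VAR, or %VAR% on Windows).
        path = os.path.expanduser(os.path.expandvars(value))
        if not os.path.isdir(path):
            problems.append(f"paths.{key}: {path} is not a directory")
    lmsdb = config.get("lmsdb")
    if lmsdb and not os.path.isfile(lmsdb):
        problems.append(f"lmsdb: {lmsdb} is not a file")
    return problems


# Example:
# for problem in check_config("config.json"):
#     print(problem)
```

An empty list only means the paths exist; it says nothing about whether the script itself can read them.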

                    When launching the analysis this tells me the shared library can't be found, it is however in the place indicated.
                    Code:
                    bart@p700:~/music-similarity$ ./music-similarity.py --analyse /mnt/Music
                    2021-12-31 12:44:20 I Found: /home/bart/music-similarity/linux/x86-64/essentia_streaming_extractor_music
                    2021-12-31 12:44:20 E Failed to open Musly shared library (/home/bart/music-similarity/linux/x86-64/libmusly.so)!
                    [Attached image: Screenshot_28.png]
                    Main System: Marantz SR-5015 + Adam Audio T8V + Teufel Ultima 20 Mk 3 + BK Monolith+ FF + Lenovo T560 + Kodi + LG OLED65B26LA + UP-Board running Daphile
                    Kitchen: Touch + Ikea ENEBY 30
                    Home-Office: SqueezeLite-X + Topping DX3 Pro + NAD 312 + TMA Premium 905



                    • #11
                      Originally posted by bakker_be
                      When launching the analysis this tells me the shared library can't be found, it is however in the place indicated.
                      I'm guessing this is due to the musly library having been built on Fedora. I'll install Ubuntu in a VM and rebuild the library on that.


                      • #12
                        Originally posted by cpd73
                        I'm guessing this is due to the musly library having been built on Fedora. I'll install Ubuntu in a VM and rebuild the library on that.
                        When I started supporting squeezelite, I found the plethora of different libc versions to be a constant issue. I eventually discovered that building on an older OS virtually eliminated the problem.

                        The intel 32 and 64bit binaries are still built on CentOS 6 VMs.
                        Ralphy

                        1-Touch, 5-Classics, 3-Booms, 2-UE Radio
                        Squeezebox client builds donations always appreciated.



                        • #13
                          I'm currently mid-way through a second pass analysis of my 681 files, and so far there's only been the one failure - so it's looking like the change to allow accented characters has worked successfully. (Incidentally, I believe the track that failed this time was OK in the last run.)

                          I'll update later with info on how long it took this time when my PC wasn't quite so busy.

                          I may re-run later to see if it plugs any gaps from the first pass. In your README it says:

                          If re-run new tracks will be added, and old (non-existent) will be removed. Pass `--keep-old` to keep these old tracks.

                          Does this mean it will actually skip the tracks already there as well, or will it re-analyse them? I guess I'll find out soon enough if I give it a go.



                          • #14
                            Originally posted by mruddo
                            If re-run new tracks will be added, and old (non-existent) will be removed. Pass `--keep-old` to keep these old tracks.

                            Does this mean it will actually skip the tracks already there as well, or will it re-analyse them? I guess I'll find out soon enough if I give it a go.
                            If you re-run, then new, or un-analysed, tracks will be analysed and added. Tracks that have already been analysed are skipped. Any tracks that are in the DB but not on disk will be removed from the DB. If you want to keep these non-existent tracks in the DB, then pass "--keep-old".
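Put another way, a re-run boils down to two set differences. The sketch below only illustrates the behaviour just described; the function name is hypothetical and this is not the script's actual code.

```python
def plan_rerun(on_disk, in_db, keep_old=False):
    """Return (to_analyse, to_remove) for a re-run.

    on_disk: track paths currently found in the music folder
    in_db:   track paths already stored in the database
    """
    on_disk, in_db = set(on_disk), set(in_db)
    # Only tracks not yet in the DB need analysing.
    to_analyse = sorted(on_disk - in_db)
    # Tracks that vanished from disk are dropped unless keep_old is set,
    # mirroring what "--keep-old" is described as doing.
    to_remove = [] if keep_old else sorted(in_db - on_disk)
    return to_analyse, to_remove


disk = ["a.flac", "b.flac", "c.flac"]
db = ["b.flac", "old.flac"]
print(plan_rerun(disk, db))                 # (['a.flac', 'c.flac'], ['old.flac'])
print(plan_rerun(disk, db, keep_old=True))  # (['a.flac', 'c.flac'], [])
```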


                            • #15
                              Originally posted by bakker_be
                              When launching the analysis this tells me the shared library can't be found, it is however in the place indicated.
                              It's not that the library cannot be found, it's that it cannot be opened. The reason for this is that you need to install the ffmpeg libraries. You can do this by just installing ffmpeg itself (sudo apt install ffmpeg) or by installing the libavformat, libavcodec, and libavutil libraries.
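A "failed to open" error for a file that clearly exists usually means one of its own dependencies could not be resolved. The sketch below is a rough way to check this from Python with the standard ctypes module. The function names are hypothetical, and it is not how music-similarity itself reports the problem.

```python
import ctypes
import ctypes.util


def try_load(lib_path):
    """Return None if the shared library loads, else the loader's message."""
    try:
        ctypes.CDLL(lib_path)
        return None
    except OSError as err:
        # On Linux the message typically names the dependency that is missing.
        return str(err)


def missing_libraries(names=("avformat", "avcodec", "avutil")):
    """Return the library names the system's search cannot locate."""
    return [name for name in names
            if ctypes.util.find_library(name) is None]


# Example, using the path from the log above:
# print(try_load("/home/bart/music-similarity/linux/x86-64/libmusly.so"))
# print(missing_libraries())
```

If missing_libraries() returns an empty list after installing ffmpeg and try_load() still fails, the cause is probably something else, such as a libc mismatch.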

                              I've installed 20.04 in a VM, installed ffmpeg, and analysis works.