  1. #31 - cpd73 (Senior Member, joined Mar 2017, 2,731 posts)
    Quote Originally Posted by Roland0 View Post
    It's a manual configuration (basically a 2-dim array, where x (list of genres) is "same" genres, and y (scale: 0-1) is distance (= genre similarity)).
    Any chance you could share the code for this, as I still don't understand how it could work. e.g. If I have 4 tracks:

    1. Genre = Metal
    2. Genre = Pop
    3. Genre = Rock
    4. Genre = Classical

    What value would be the 'genre similarity' for each of these? The value stored in the tree is set *before* the seed track is known. Or do you update the tree after the seed track is known, i.e. set 'genre sim' each time a seed is used? I can see that working, but wonder how slow it would be. I'm guessing you update the tree for each new seed (or change in the seed's genre).

    Quote Originally Posted by Roland0 View Post
    This results in a local optimum, since the result of the nn search will include tracks which wouldn't have been included if genres had been a criterion (and thus exclude tracks which should have been included).
    Comparing nn search without genres to nn search with genres
    To be honest, with my data-set it did not make much difference. And I use random tracks for the similarity set anyway, so it's a non-issue for me.

    Quote Originally Posted by Roland0 View Post
    You may want to switch to pykdtree ( no dependencies, faster)
    Thanks. I'll have a look. Speed really isn't an issue, but if its API is similar, then why not.
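    For reference, the two APIs do look near-identical - a quick sketch with made-up array names, assuming scipy's cKDTree as the current tree:

    Code:
    import numpy as np
    from scipy.spatial import cKDTree          # assumed current implementation
    from pykdtree.kdtree import KDTree         # suggested alternative

    features = np.random.rand(20000, 16)       # dummy data: one attribute vector per track
    seed = features[0]

    # scipy: query with a single point
    tree = cKDTree(features)
    dist, idx = tree.query(seed, k=1000)

    # pykdtree: query() expects a 2-D array of query points
    ptree = KDTree(features)
    pdist, pidx = ptree.query(seed.reshape(1, -1), k=1000)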

    Quote Originally Posted by Roland0 View Post
    If skip_rows is used, won't your app run out of tracks after 1000 (haven't looked at the LMS plugin and how it handles used tracks, though - maybe it resets them) ?
    My LMS plugin sends 5 seed tracks to be used, plus up to the last 100 tracks in the queue (so tracks are not repeated). skip_rows is set to these and (not currently, as the code is missing this) any tracks accepted from a seed. So, there will be enough tracks left from the 1000 to handle this. 1000 is probably way too high, but it's quick enough that it makes no difference. I only use skip_rows as it's an integer and will be (very slightly) quicker to filter on.
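    For anyone curious, the skip_rows handling is essentially just an integer membership test on the query result - a simplified sketch, not the actual plugin code:

    Code:
    import numpy as np
    from scipy.spatial import cKDTree

    # dummy data: one attribute vector per track, row index == track row id
    features = np.random.rand(20000, 16)
    tree = cKDTree(features)

    def similar(seed_vec, skip_rows, k=1000, want=50):
        """Return up to `want` nearest rows, excluding the seed/queue rows in skip_rows."""
        skip = set(skip_rows)                    # integer row ids -> cheap comparisons
        dist, idx = tree.query(seed_vec, k=k)
        keep = [(d, i) for d, i in zip(dist, idx) if i not in skip]
        return keep[:want]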

  2. #32 - cpd73 (Senior Member, joined Mar 2017, 2,731 posts)
    Quote Originally Posted by Roland0 View Post
    You may want to switch to pykdtree ( no dependencies, faster)
    From a very quick and unscientific test, this appears to be slower...

  3. #33 - Roland0 (Senior Member, joined Aug 2012, Austria, 1,190 posts)
    Quote Originally Posted by cpd73 View Post
    Any chance you could share the code for this,
    I've copied the relevant code to this file.

    1. Genre = Metal
    2. Genre = Pop
    3. Genre = Rock
    4. Genre = Classical
    What value would be the 'genre similarity' for each of these?
    As mentioned, there are two different methods (get_remapped_genres_static and get_remapped_genres_delta). Assuming a genre_map of
    [[Metal, Heavy Metal], [Rock], [Pop], [Folk], [Classical]]

    static:
    1. Genre = Metal → 0
    2. Genre = Pop → 0.4
    3. Genre = Rock → 0.2
    4. Genre = Classical → 0.8

    dynamic (delta), seed = Heavy Metal:
    1. Genre = Metal → 0.5
    2. Genre = Pop → 1
    3. Genre = Rock → 1
    4. Genre = Classical → 1
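    A minimal sketch that reproduces the values above (the linked file is the real code; this is only to illustrate the idea):

    Code:
    # each inner list is a group of "same" genres; the group order encodes similarity
    GENRE_MAP = [["Metal", "Heavy Metal"], ["Rock"], ["Pop"], ["Folk"], ["Classical"]]

    def group_of(genre):
        for i, group in enumerate(GENRE_MAP):
            if genre in group:
                return i
        return None

    def genre_static(genre):
        """Fixed per-genre value: position of its group in the map, scaled to 0-1."""
        i = group_of(genre)
        return 1.0 if i is None else i / len(GENRE_MAP)

    def genre_delta(genre, seed_genre):
        """Value relative to the seed: 0 = same genre, 0.5 = same group, 1 = different group."""
        if genre == seed_genre:
            return 0.0
        g = group_of(genre)
        if g is not None and g == group_of(seed_genre):
            return 0.5
        return 1.0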

    The value stored in the tree is set *before* the seed track is known. Or do you update the tree after the seed track is known, i.e. set 'genre sim' each time a seed is used? I can see that working, but wonder how slow it would be. I'm guessing you update the tree for each new seed
    For the dynamic method, yes, this means rebuilding the tree, but this only takes ~8ms for 20K tracks (n.b.: all query times I've posted before include a rebuild, see below for the reason). Looping through 20k tracks to set the genre took a long time (~400ms) with the initial naive approach, but with a lookup, it's very fast (and has to be done only once per seed):
    20k tracks / 8929 genres:
    DEBUG:lmsessim.lib.tracks_source:get_remapped_genres_delta time: 40ms
    The total query time is fine even with this included:
    DEBUG:lmsessim.lib.tracks_source:get_similars time: 77ms
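    The lookup is nothing fancy - one value per distinct genre, then a remap and a rebuild. A sketch with made-up variable names (cKDTree only as an example), using genre_delta() from the sketch above:

    Code:
    import numpy as np
    from scipy.spatial import cKDTree

    # assumed in-memory layout: audio feature matrix plus a parallel array of genre strings
    audio = np.random.rand(20000, 16)
    genres = np.random.choice(["Metal", "Heavy Metal", "Rock", "Pop", "Classical"], 20000)

    def build_tree_for_seed(seed_genre):
        # compute the delta once per distinct genre; per track it is then only a dict lookup
        lut = {g: genre_delta(g, seed_genre) for g in np.unique(genres)}
        genre_col = np.array([lut[g] for g in genres]).reshape(-1, 1)
        return cKDTree(np.hstack([audio, genre_col]))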

    To be honest, with my data-set it did not make much difference.
    Now I'm a bit confused - have you already implemented and tested genres-as-values with your data set?
    Anyway, my point was just that your method will not give the most similar n tracks. Whether this is relevant depends, of course, on that actually being a requirement.

    My LMS plugin sends 5 seed tracks to be used, plus up to the last 100 tracks in the queue (so tracks are not repeated). skip_rows is set to these and (not currently, as the code is missing this) any tracks accepted from a seed. So, there will be enough tracks left from the 1000 to handle this. 1000 is probably way too high, but it's quick enough that it makes no difference. I only use skip_rows as it's an integer and will be (very slightly) quicker to filter on.
    I'm simply deleting the used tracks from the in-memory track data and rebuilding the tree. That seemed simpler, and it ensures the query returns the most similar tracks.
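    Deleting is just a boolean mask plus a rebuild, roughly (names are illustrative; track_ids is assumed to be a numpy array of LMS track ids):

    Code:
    import numpy as np
    from scipy.spatial import cKDTree

    def remove_used(features, track_ids, used_idx):
        """Drop already-played rows from the in-memory data and rebuild the tree."""
        mask = np.ones(len(features), dtype=bool)
        mask[list(used_idx)] = False
        features, track_ids = features[mask], track_ids[mask]
        return features, track_ids, cKDTree(features)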

    From a very quick and unscientific test, this appears to be slower…
    Strange, it's about twice as fast here:
    building / querying kd-tree 20 times: 84ms vs. 143ms
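    In case anyone wants to reproduce it, the test loop is basically this (dummy data; absolute numbers will of course depend on the data set and how the libraries were built):

    Code:
    import time
    import numpy as np
    from scipy.spatial import cKDTree
    from pykdtree.kdtree import KDTree

    pts = np.random.rand(20000, 16)
    seed = pts[:1]

    def bench(make_tree, runs=20):
        """Build the tree and run one k=1000 query, `runs` times; return total ms."""
        t0 = time.perf_counter()
        for _ in range(runs):
            make_tree(pts).query(seed, k=1000)
        return (time.perf_counter() - t0) * 1000

    print("pykdtree:", bench(KDTree))
    print("scipy   :", bench(cKDTree))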

  4. #34 - cpd73 (Senior Member, joined Mar 2017, 2,731 posts)
    Quote Originally Posted by Roland0 View Post
    I've copied the relevant code to this file.
    Thanks. Not being a Python coder, and never having used numpy before, I must admit I don't fully understand the code - but I'll get there...

    Quote Originally Posted by Roland0 View Post
    Now I'm a bit confused - have you already implemented and tested genres-as-values with your data set?
    My previous, non-tree code got the euclidean distance from SQL, took it apart, added genre similarity, recalculated the euclidean distance, and re-sorted. So, for that linear search, genre similarity was added as a value. Based upon that data and the data now, there is not a massive difference.

    Quote Originally Posted by Roland0 View Post
    Anyway, my point was just that your method will not give the most similar n tracks. Whether this is relevant depends, of course, on that actually being a requirement.
    Well, yes, my code does find the most similar tracks - based upon their audio attributes. It then adds the genre to adjust things slightly. But I want the similarity based upon the audio, not the metadata - at least for now. To me, the genre stuff is just there to help move a slightly less audio-wise-similar track higher up the list. For this use-case, I think I prefer my code as is.

    [Edit] And - as I recalculate the euclidean distance with genre sim and then re-sort - it does find the most similar. It's just the last few % of the 1000 items that I ask for that will be slightly incorrect. But I don't really use all of these 1000 anyway - so the top few that I do use are the most similar (inc. genre). Therefore, for my use case, my method is perfectly fine. The fact that the last (e.g.) 100 of my 1000 might be incorrect is irrelevant, as I (probably) won't be looking past the first 500 anyway. I only ask for so many to cater for filtering.
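    In simplified form, the re-sort is just this - a rough sketch of the idea (not the actual code), assuming genre is folded in as one extra squared term:

    Code:
    import numpy as np

    def rerank_with_genre(dist, idx, genre_sim, want=500):
        """dist/idx: the k=1000 nearest by audio attributes (from the tree query);
        genre_sim[i]: 0-1 genre distance of candidate i vs. the seed.
        Fold genre in as one extra euclidean term and re-sort."""
        adjusted = np.sqrt(dist ** 2 + genre_sim ** 2)
        order = np.argsort(adjusted)
        return idx[order][:want], adjusted[order][:want]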

    But, as stated many times, I'm no expert, so I'm just playing around. However, I am actively using mixes created by this, and (so far) I am happy with them.
    Last edited by cpd73; 2021-02-06 at 15:28.

  5. #35 - Roland0 (Senior Member, joined Aug 2012, Austria, 1,190 posts)
    Quote Originally Posted by cpd73 View Post
    My previous, non-tree code got the euclidean distance from SQL, took it apart, added genre similarity, recalculated the euclidean distance, and re-sorted. So, for that linear search, genre similarity was added as a value. Based upon that data and the data now, there is not a massive difference.
    Since your previous code used the same approach (first search without genre, then add genre to result, second search only in results), I'd expect it to exhibit the same characteristics.

    [Edit] And - as I recalculate the euclidean distance with genre sim and then re-sort - it does find the most similar. It's just the last few % of the 1000 items that I ask for that will be slightly incorrect.
    But I don't really use all of these 1000 anyway - so the top few that I do use are the most similar (inc. genre). Therefore, for my use case, my method is perfectly fine. The fact that the last (e.g.) 100 of my 1000 might be incorrect is irrelevant, as I (probably) won't be looking past the first 500 anyway. I only ask for so many to cater for filtering.
    To estimate the number of different / missing tracks, one would need to know how large the (pair-wise) deltas of the distances typically are, and how much of an adjustment the genre values make to them.
    So, taking my track data with a typical search, we have a max distance of 1.760, and thus, with a genre adjustment of 0.1-0.7 (your values), a mean distance adjustment of 0.214.
    Histogram of distances for one search:
    [attachment: ess-tracks_hist1k_3k.png]


    More interesting, a histogram of the pairwise delta distances:
    [attachment: ess-tracks_hist-distances_3k.png]

    The high values for low deltas indicate a high sensitivity to small changes in the distance.

    Testing seems to confirm this (pool: 20k tracks; method: find the top 50 using both methods (k=1000, identical genre values for both), then compare the top-50 results for missing tracks):
    number of test runs: 100
    missing tracks avg: 23.72 (47.44%)
    which means only ~50% of the top 50 were included in the result from the first (k=1000) search
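    The test itself is just a set comparison, roughly (a sketch with made-up array names; the seed's own genre value is assumed to be 0):

    Code:
    import numpy as np
    from scipy.spatial import cKDTree

    def count_missing(audio, genre_col, seed_idx, top=50, k=1000):
        """genre_col: per-track genre distance vs. the seed (seed's own value = 0).
        Exact method: genre is a dimension of the tree.
        Two-stage method: k nearest by audio only, then re-ranked with genre folded in."""
        full = np.hstack([audio, genre_col.reshape(-1, 1)])
        exact = cKDTree(full).query(full[seed_idx], k=top)[1]

        dist, cand = cKDTree(audio).query(audio[seed_idx], k=k)
        rerank = cand[np.argsort(np.sqrt(dist ** 2 + genre_col[cand] ** 2))][:top]
        return len(set(exact) - set(rerank))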

    (disclaimer: for simplification, I've made a number of assumptions about the data which are debatable (concerning distribution and randomness). This does, however, affect only the diagrams, not the argument itself.)
    Last edited by Roland0; 2021-02-11 at 05:32.

  6. #36 - cpd73 (Senior Member, joined Mar 2017, 2,731 posts)
    My code finds the 1000 most acoustically similar tracks (using the essentia attributes). It then re-sorts these taking genre into account, with the idea of adjusting the positions slightly. I'm of the opinion that the acoustic properties should take precedence over the metadata.

    I've just tried modifying my code to take the first 30000 tracks (which will be all of them, as I only have ~24k), add genre, and re-sort. This should be the same as adding the genre and rebuilding the tree. I then compared this to my existing code of taking 1000 and adding genre, etc. The first ~540 tracks are the same. Changing my code to get the first 2000 results in no differences (i.e. the same order as for all tracks). As always, this will depend upon the data-set.
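    The comparison itself is trivial - just check how far down the two re-sorted lists agree (sketch, not the actual code):

    Code:
    def common_prefix(order_all, order_k):
        """Number of leading positions where the two re-sorted result lists agree."""
        n = 0
        for a, b in zip(order_all, order_k):
            if a != b:
                break
            n += 1
        return n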

    In either case, 1000 or 2000, the returned results from the /api/similar call are the same. So, for my use-case (and with my data set), the results are correct.

    My reluctance to calculate the genre diff on all tracks and re-populate the tree is based upon the fact that, for me, this is slower. I do not disagree that your approach is theoretically more correct, just that in practice (and for what it is used for) there is no difference.

    [Edit] I have updated my code to add genre to the tree. This is slower, but not massively so. Results are the same (for my use case)...
    Last edited by cpd73; 2021-02-11 at 10:52.
