Any chance you could share the code for this? I still don't understand how it could work. E.g., if I have 4 tracks:
1. Genre = Metal
2. Genre = Pop
3. Genre = Rock
4. Genre = Classical
What would the 'genre similarity' value be for each of these, given that the value stored in the tree is set *before* the seed track is known? Or do you update the tree after the seed track is known, i.e. set 'genre sim' each time a seed is used? I can see that working, but wonder how slow it would be. I'm guessing you update the tree for each new seed (or change in the seed's genre).
To be honest, with my data set it did not make much difference. And I use random tracks for the similarity set anyway, so it's a non-issue for me.
Thanks. I'll have a look. Speed really isn't an issue, but if it has a similar API, then why not.
My LMS plugin sends 5 seed tracks to be used, and up to the last 100 tracks in the queue (so that a track is not repeated). skip_rows is set to these and (not yet, as the code is missing this) to any tracks accepted from a seed. So there will be enough tracks left from the 1000 to handle this. 1000 is probably way too high, but it's quick enough that it makes no difference. I only use skip_rows because it's an integer and will be (very slightly) quicker to filter on.
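The skip_rows filtering described above can be sketched roughly like this. It is a minimal illustration, not the plugin's code: it assumes the search returns (distance, row_id) pairs already sorted by distance, and `filter_candidates` is a hypothetical helper name.

```python
"""Hypothetical sketch: filter a k-nearest candidate list by a skip set.

Assumptions (not from the original plugin): candidates arrive as
(distance, row_id) pairs already sorted by distance, and skip_rows is a
set of integer row ids (seeds, recently queued tracks, accepted tracks).
"""
from typing import Iterable, List, Set, Tuple


def filter_candidates(candidates: Iterable[Tuple[float, int]],
                      skip_rows: Set[int],
                      wanted: int) -> List[Tuple[float, int]]:
    """Return up to `wanted` candidates whose row id is not in skip_rows."""
    result = []
    for dist, row in candidates:
        if row in skip_rows:  # integer set membership is a cheap check
            continue
        result.append((dist, row))
        if len(result) == wanted:
            break
    return result


# Example: seeds and recent queue entries are excluded from the 1000 hits.
candidates = [(0.1 * i, i) for i in range(1000)]
skip_rows = {0, 1, 2, 3, 4, 10, 11}
top = filter_candidates(candidates, skip_rows, wanted=5)
print([row for _, row in top])  # [5, 6, 7, 8, 9]
```

Asking for many more candidates than are needed is what leaves enough tracks after filtering; the loop stops as soon as it has `wanted` results.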
2021-02-05, 02:32 #31
Material debug:
1. Launch via http://SERVER:9000/material/?debug=json (use http://SERVER:9000/material/?debug=json,cometd to also see update messages, e.g. play queue)
2. Open the browser's developer tools
3. Open the console tab in the developer tools
4. REQ/RESP messages sent to/from LMS will be logged here.
2021-02-05, 02:38 #32
2021-02-06, 11:13 #33
I've copied the relevant code to this file.
1. Genre = Metal
2. Genre = Pop
3. Genre = Rock
4. Genre = Classical
What value would be the 'genre similarity' for each of these?
Assuming a genre_map of:
[[Metal, Heavy Metal], [Rock], [Pop], [Folk], [Classical]]
Static:
1. Genre = Metal 0
2. Genre = Pop 0.4
3. Genre = Rock 0.2
4. Genre = Classical 0.8
Dynamic (seed = Heavy Metal):
1. Genre = Metal 0.5
2. Genre = Pop 1
3. Genre = Rock 1
4. Genre = Classical 1
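The two sets of numbers above can be reproduced with the following sketch. The rules are inferred from the example values, not taken from the linked code: the static value is the group's index in genre_map divided by the number of groups, and the dynamic value is 0 for the seed's own genre, 0.5 for another genre in the same group, and 1 otherwise. `static_value` and `dynamic_value` are hypothetical names.

```python
"""Sketch reproducing the static and dynamic genre values from the example.

Assumptions: static value = group index / number of groups; dynamic value
(relative to the seed) = 0 for the same genre, 0.5 for a different genre in
the same group, 1 otherwise. Inferred from the numbers, not from the code.
"""
from typing import Dict, List

GENRE_MAP: List[List[str]] = [["Metal", "Heavy Metal"], ["Rock"], ["Pop"],
                              ["Folk"], ["Classical"]]

# Map each genre to the index of the group it belongs to.
GROUP_OF: Dict[str, int] = {g: i for i, grp in enumerate(GENRE_MAP) for g in grp}


def static_value(genre: str) -> float:
    """One fixed number per genre, usable as an extra kd-tree coordinate."""
    return GROUP_OF[genre] / len(GENRE_MAP)


def dynamic_value(genre: str, seed: str) -> float:
    """Genre distance that can only be computed once the seed is known."""
    if genre == seed:
        return 0.0
    if GROUP_OF[genre] == GROUP_OF[seed]:
        return 0.5
    return 1.0


for g in ["Metal", "Pop", "Rock", "Classical"]:
    print(g, static_value(g), dynamic_value(g, seed="Heavy Metal"))
```

In the static encoding, the distance between two genres depends only on their positions in genre_map; the dynamic values have to be recomputed for every seed, which is the update cost being measured below.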
What would the 'genre similarity' value be for each of these, given that the value stored in the tree is set *before* the seed track is known? Or do you update the tree after the seed track is known, i.e. set 'genre sim' each time a seed is used? I can see that working, but wonder how slow it would be. I'm guessing you update the tree for each new seed
20k tracks / 8929 genres:
DEBUG:lmsessim.lib.tracks_source:get_remapped_genres_delta time : 40ms
Total query time is fine even with this included:
DEBUG:lmsessim.lib.tracks_source:get_similars time: 77ms
To be honest, with my data-set it did not make much difference.
Anyway, my point was just that your method will not necessarily return the n most similar tracks. Whether this matters depends, of course, on whether that is actually a requirement.
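A minimal sketch of why the two-stage method can miss tracks, assuming the genre value is added as one extra Euclidean coordinate, i.e. combined distance = sqrt(d_audio² + g²). Brute-force search via `heapq.nsmallest` stands in for the kd-tree; `full_search` and `two_stage` are hypothetical names, not functions from either codebase.

```python
"""Sketch: why "top k by audio, then re-rank with genre" can miss tracks.

Assumptions: each track has audio features and a genre value g already
computed for the current seed; combined distance = sqrt(d_audio^2 + g^2).
Brute force stands in for the kd-tree; the ranking logic is the same.
"""
import heapq
import math
from typing import List, Sequence, Tuple

Track = Tuple[int, Sequence[float], float]  # (row id, audio features, genre value)


def combined(seed: Sequence[float], track: Track) -> float:
    _, feats, g = track
    return math.sqrt(math.dist(seed, feats) ** 2 + g ** 2)


def full_search(seed, tracks: List[Track], n: int) -> List[int]:
    """Genre included from the start (like a tree built with genre)."""
    best = heapq.nsmallest(n, tracks, key=lambda t: combined(seed, t))
    return [t[0] for t in best]


def two_stage(seed, tracks: List[Track], n: int, k: int) -> List[int]:
    """Take the k nearest by audio only, then re-sort those k with genre."""
    pool = heapq.nsmallest(k, tracks, key=lambda t: math.dist(seed, t[1]))
    best = heapq.nsmallest(n, pool, key=lambda t: combined(seed, t))
    return [t[0] for t in best]


# Tiny 1-D example: track 3 is farthest by audio but has a perfect genre match.
seed = [0.0]
tracks = [(1, [0.30], 1.0), (2, [0.35], 1.0), (3, [0.40], 0.0)]
print(full_search(seed, tracks, n=1))      # [3]
print(two_stage(seed, tracks, n=1, k=2))   # [1]
```

Track 3 never enters the k=2 audio-only pool, so re-sorting cannot recover it; with k=3 (all tracks) the two-stage method also returns [3].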
My LMS plugin sends 5 seed tracks to be used, and up to the last 100 tracks in the queue (so that a track is not repeated). skip_rows is set to these and (not yet, as the code is missing this) to any tracks accepted from a seed. So there will be enough tracks left from the 1000 to handle this. 1000 is probably way too high, but it's quick enough that it makes no difference. I only use skip_rows because it's an integer and will be (very slightly) quicker to filter on.
From a very quick and unscientific test, this appears to be slower:
Building/querying the kd-tree 20 times: 84ms vs. 143ms
Various SW: Web Interface | TUI | Playlist Editor / Generator | Music Classification | Similar Music | Announce | EventTrigger | DB Optimizer | Chiptunes | LMSlib2go | ...
Various HowTos: build a self-contained LMS | Bluetooth/ALSA | Control LMS with any device | ...
2021-02-06, 14:46 #34
Thanks. Not being a Python coder, and never having used NumPy before, I must admit I don't fully understand the code, but I'll get there...
My previous, non-tree code got the Euclidean distance from SQL, took it apart, added genre similarity, recalculated the Euclidean distance, and re-sorted. So for that linear search, genre similarity was added as a value. Comparing that data with the data now, there is not a massive difference.
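"Taking the distance apart and recalculating" can be sketched as follows, assuming the genre value is treated as one extra coordinate so the new distance is sqrt(d² + g²). `add_genre` is a hypothetical helper, not code from the plugin; the check shows it matches computing the distance with genre as an extra dimension directly.

```python
"""Sketch: folding a genre value into an existing Euclidean distance.

Assumption: the genre value g is one extra coordinate, so the new distance
is sqrt(d^2 + g^2), where d is the audio-only distance returned by SQL.
Names here are illustrative, not from the plugin.
"""
import math


def add_genre(audio_dist: float, genre_value: float) -> float:
    # Undo the square root, add the squared genre term, take the root again.
    return math.sqrt(audio_dist ** 2 + genre_value ** 2)


# Same result as computing the distance over audio features plus genre directly.
seed, track = [0.0, 0.0], [0.18, 0.24]             # audio distance 0.3
direct = math.dist(seed + [0.0], track + [0.4])    # genre as a 3rd coordinate
print(add_genre(math.dist(seed, track), 0.4), direct)  # both ~0.5
```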
Well, yes, my code does find the most similar tracks, based upon their audio attributes. It then adds the genre to adjust things slightly. But I want the similarity to be based upon the audio, not the metadata, at least for now. To me, the genre stuff is just there to help move a track that is slightly less similar audio-wise higher up the list. For this use case, I think I prefer my code as is.
[Edit] And, as I recalculate the Euclidean distance with genre sim and then re-sort, it does find the most similar. It's just the last few % of the 1000 items that I ask for that will be slightly incorrect. But I don't really use all of these 1000 anyway, so the top few that I do use are the most similar (incl. genre). Therefore, for my use case, my method is perfectly fine. The fact that the last (e.g.) 100 of my 1000 might be incorrect is irrelevant, as I (probably) won't be looking past the first 500 anyway. I only ask for so many to cater for filtering.
But, as stated many times, I'm no expert, so I'm just playing around. However, I am actively using mixes created by this, and (so far) I am happy with them.
Last edited by cpd73; 2021-02-06 at 15:28.
2021-02-11, 05:30 #35
Since your previous code used the same approach (first search without genre, then add genre to the results, then a second search only within those results), I'd expect it to exhibit the same characteristics.
[Edit] And, as I recalculate the Euclidean distance with genre sim and then re-sort, it does find the most similar. It's just the last few % of the 1000 items that I ask for that will be slightly incorrect.
But I don't really use all of these 1000 anyway, so the top few that I do use are the most similar (incl. genre). Therefore, for my use case, my method is perfectly fine. The fact that the last (e.g.) 100 of my 1000 might be incorrect is irrelevant, as I (probably) won't be looking past the first 500 anyway. I only ask for so many to cater for filtering.
So, taking my track data with a typical search, we have a max distance of 1.760, and thus, with a genre adjustment of 0.1-0.7 (your values), a mean distance adjustment of 0.214.
Histogram of distances for one search:
More interesting, the histogram of pairwise distance deltas:
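The high counts for small deltas indicate a high sensitivity to small changes in distance: when many tracks are separated by only a tiny distance, even a small genre adjustment can reorder them.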
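One way to quantify that sensitivity, as a sketch: sort the distances from one search, take the gaps between neighbouring distances, and count how many are smaller than a typical genre adjustment. Reading "pairwise delta" as the gap between neighbours is an assumption, as are the helper names and the made-up sample distances.

```python
"""Sketch: how tightly packed are the distances returned by one search?

Assumption: "pairwise delta" is read as the gap between neighbouring
distances after sorting. If many gaps are smaller than a genre adjustment,
adding genre can reorder many results. The sample data is made up.
"""
from typing import List


def neighbour_gaps(distances: List[float]) -> List[float]:
    ds = sorted(distances)
    return [b - a for a, b in zip(ds, ds[1:])]


def fraction_below(gaps: List[float], threshold: float) -> float:
    # Share of gaps small enough to be overturned by an adjustment of `threshold`.
    return sum(g < threshold for g in gaps) / len(gaps) if gaps else 0.0


distances = [0.50, 0.51, 0.52, 0.60, 0.61, 0.90]
gaps = neighbour_gaps(distances)
print(fraction_below(gaps, threshold=0.1))
```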
Testing (pool: 20k tracks; method: find the top 50 using both methods, with k=1000 and identical genre values for both, then compare the two top-50 results for missing tracks) seems to confirm this:
number of test runs: 100
missing tracks avg: 23.72 (47.44%)
This means only ~50% of the top 50 were included in the result from the first (k=1000) search.
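The comparison metric can be sketched like this: count how many of the exact top n are absent from the approximate top n, then average over runs. The ranked lists below are placeholders, and `missing_count`/`summarize` are hypothetical names; the reported 23.72 missing out of 50 corresponds to 47.44%.

```python
"""Sketch of the comparison metric: how many of the exact top n are missing
from the approximate top n, averaged over several runs.

Assumption: each run yields two ranked lists of row ids (exact = genre in
the search from the start, approx = k=1000 first, then re-sorted). The
lists below are placeholders, not data from the real 100 test runs.
"""
from typing import List, Sequence, Tuple


def missing_count(exact: Sequence[int], approx: Sequence[int]) -> int:
    return len(set(exact) - set(approx))


def summarize(runs: List[Tuple[Sequence[int], Sequence[int]]],
              n: int) -> Tuple[float, float]:
    """Return (average missing tracks, average missing as % of n)."""
    avg = sum(missing_count(e, a) for e, a in runs) / len(runs)
    return avg, 100.0 * avg / n


runs = [([1, 2, 3, 4], [1, 2, 5, 6]), ([7, 8, 9, 10], [7, 8, 9, 11])]
print(summarize(runs, n=4))  # (1.5, 37.5)
```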
(Disclaimer: for simplification, I've made a number of assumptions about the data which are debatable (concerning distribution and randomness). These, however, affect only the diagrams, not the argument itself.)
Last edited by Roland0; 2021-02-11 at 05:32.
2021-02-11, 06:13 #36
My code finds the 1000 most acoustically similar tracks (using the Essentia attributes). It then re-sorts these taking genre into account, with the idea of adjusting the positions slightly. I'm of the opinion that the acoustic properties should take precedence over the metadata.
I've just tried modifying my code to take the first 30000 tracks (which will be all of them, as I have only ~24k), add genre, and re-sort. This should be the same as adding the genre and rebuilding the tree. I then compared this to my existing code of taking 1000, adding genre, etc. The first ~540 tracks are the same. Changing my code to get the first 2000 results in no differences (i.e. the same order as for all tracks). As always, this will depend upon the data set.
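A check like "the first ~540 tracks are the same" amounts to measuring the common prefix of two rankings. A minimal sketch, assuming both rankings are lists of track ids in ranked order; `common_prefix` is a hypothetical helper.

```python
"""Sketch: measure how many leading positions two rankings share.

Assumption: both inputs are lists of track ids in ranked order, e.g. the
k=1000 re-sort versus the re-sort of all tracks. Illustrative only.
"""
from typing import Sequence


def common_prefix(a: Sequence[int], b: Sequence[int]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:  # first position where the two orders diverge
            break
        n += 1
    return n


print(common_prefix([4, 8, 15, 16, 23], [4, 8, 15, 42, 23]))  # 3
```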
In either case, 1000 or 2000, the results returned from the /api/similar call are the same. So, for my use case (and with my data set), the results are correct.
My reluctance to calculate the genre diff for all tracks and re-populate the tree is based upon the fact that, for me, this is slower. I do not disagree that your approach is theoretically more correct, just that in practice (and for what it is used for) there is no difference.
[Edit] I have updated my code to add genre to the tree. This is slower, but not massively so. Results are the same (for my use case)...
Last edited by cpd73; 2021-02-11 at 10:52.