Home of the Squeezebox™ & Transporter® network music players.
  1. #21
    Senior Member
    Join Date
    Mar 2017
    Posts
    2,731
    Quote Originally Posted by Roland0 View Post
    2021-01-30 21:52:00 DEBUG Query time:73
    2021-01-30 21:52:00 DEBUG Total time:233
    Not that this really matters, but... I assume these are from my code? (The debug output looks like mine.) Odd that the python-side processing took so much longer than the SQL query (73ms SQL, 160ms python). On my Pi4 the SQL takes 292ms, python 76ms - but I have more tracks, so the SQL will take longer; however, it's limited to 2500 tracks, so it should be the same number on the python side. Guess the Pi4 is faster.
    Last edited by cpd73; 2021-01-31 at 07:06.
    Material debug: 1. Launch via http://SERVER:9000/material/?debug=json (Use http://SERVER:9000/material/?debug=json,cometd to also see update messages, e.g. play queue) 2. Open browser's developer tools 3. Open console tab in developer tools 4. REQ/RESP messages sent to/from LMS will be logged here.

  2. #22
    Senior Member
    Join Date
    Aug 2012
    Location
    Austria
    Posts
    1,190
    Quote Originally Posted by cpd73 View Post
    Not that this really matters, but... I assume these are from my code? (debug looks like mine).
    yes

    Odd that the python side processing took so much longer than the SQL query (73ms SQL, 160ms python). On my Pi4 the SQL takes 292ms, python 76ms - but I have more tracks, so the SQL will be longer, but its limited to 2500 tracks so should be the same number python side.
    with 20000 (synthetic) tracks in DB:

    Code:
    ** Querying for 50 tracks:
    2021-02-01 03:26:56 DEBUG    Query time:297
    2021-02-01 03:26:56 DEBUG    Total time:472
    
    DEBUG:lmsessim.lib.tracks_source:total time:27
    seems consistent, k-d tree still ~10 times faster, SQL time is the same as on the Pi4

    Guess the Pi4 is faster.
    Seems unlikely to account for this difference, even if that's the case (the HC2's SoC is a 2GHz ARM core (although an older design than the Pi 4's SoC))
    Maybe runtime environment (python version (3.7), ...)
    Various SW: Web Interface | TUI | Playlist Editor / Generator | Music Classification | Similar Music | Announce | EventTrigger | DB Optimizer | Chiptunes | LMSlib2go | ...
    Various HowTos: build a self-contained LMS | Bluetooth/ALSA | Control LMS with any device | ...

  3. #23
    Senior Member
    Join Date
    Mar 2017
    Posts
    2,731
    Quote Originally Posted by Roland0 View Post
    with 20000 (synthetic) tracks in DB:

    Code:
    ** Querying for 50 tracks:
    2021-02-01 03:26:56 DEBUG    Query time:297
    2021-02-01 03:26:56 DEBUG    Total time:472
    
    DEBUG:lmsessim.lib.tracks_source:total time:27
    seems consistent, k-d tree still ~10 times faster
    Yeah, I'm not surprised that k-d tree is faster. (I'd expect the algo itself to be faster, but you are also only fetching 50 tracks, whereas my SQL fetches 1500 (was 2500), reducing that to 50 makes a difference (but still nowhere near k-d).) What I am surprised about is the slowness of the processing after the SQL part. You're seeing ~175ms to iterate the 1500 (or 2500) returned rows, whereas I only see ~75ms. Anyhow, doesn't really matter, I was just curious.

    I thought about adding, or at least looking into, a k-d tree for my code, but I add filtering based upon title, artist, album, albumartist, and genre - and I'd either need to hold all of this in memory (which is probably doable), or query the DB for this info on-demand. Not sure the savings are worth the effort... If you asked your k-d tree implementation for 1500 tracks, what would its time be then?

    p.s. Thanks for your help with this, it has improved my mixer, and nice to have the feedback from someone who knows what they're talking about. (Plus helps me learn things I've never even thought about before...)

  4. #24
    Senior Member
    Join Date
    Aug 2012
    Location
    Austria
    Posts
    1,190
    Quote Originally Posted by cpd73 View Post
    What I am surprised about is the slowness of the processing after the SQL part. You're seeing ~175ms to iterate the 1500 (or 2500) returned rows, whereas I only see ~75ms.
    Which python version are you using? If it's > 3.7, that may be the explanation.
    I recompiled python 3.7 with more optimisations (lto, pgo), which actually made quite a difference:
    Code:
    2021-02-01 16:43:42 DEBUG    Query time:265
    2021-02-01 16:43:42 DEBUG    Total time:392
    I thought about adding, or at least looking into, k-d tree for my code, but I add filtering based upon title, artist, album, albumartist, and genre - and I'd either need to hold all of this in memory (which is probably doable), or query the DB for this info on-demand.
    While I agree that any performance gains won't matter for this kind of application, I find the in-memory approach more elegant. My code reads the DB and does all data processing (normalization / attribute weights, mapping, track filtering) exactly once (at startup). The k-d tree is (re-)built from this cached data and used by all queries (of course, this would work with a linear search as well).

    If you asked you're k-d tree implemtation for 1500 tracks what would its time be then?
    still very fast (and that's worst case, since all track data is randomly generated):
    Code:
    ** Querying for 50 tracks (kd-tree query: 1500):
    DEBUG:lmsessim.lib.tracks_source:kd tree #results: 1501
    DEBUG:lmsessim.lib.tracks_source:total time:34
    nice to have the feedback
    Well, without this discussion, I wouldn't have written a new application in the first place. However, I'd be interested to get feedback about the analysis / classification itself (and find better methods), but I guess this forum isn't really the place for that. It's a shame that MusicIP didn't open source their sw after pulling it from the market.
    At least, Essentia is making progress with the Tensorflow-based analysis (Gaia/SVM seems dead, though). I've just discovered that they now include TempoCNN for global BPM, which I'm rather keen to try since I've always found the current results rather inconsistent (e.g. classical music (w/o percussion) with BPM >150).

  5. #25
    Senior Member
    Join Date
    Mar 2017
    Posts
    2,731
    Quote Originally Posted by Roland0 View Post
    Which python version are you using?
    3.7.3

    Quote Originally Posted by Roland0 View Post
    While I agree that any performance gains won't matter for this kind of application, I find the in-memory approach more elegant.
    I ported my code to cKDTree - which is quite a bit faster than my crude SQL

  6. #26
    Senior Member
    Join Date
    Aug 2012
    Location
    Austria
    Posts
    1,190
    Quote Originally Posted by cpd73 View Post
    I ported my code to cKDTree
    If you use KDTree from sklearn.neighbors instead, you can easily test different distance metrics (['euclidean', 'cityblock', 'chebyshev'] will give (slightly) different results)
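    For reference, the three metric names correspond to these definitions; a plain-Python sketch with made-up attribute values (sklearn's KDTree selects the metric via its `metric` parameter):

```python
def euclidean(a, b):
    # square-root of the sum of squared differences
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cityblock(a, b):
    # sum of absolute differences (Manhattan distance)
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # largest single-attribute difference
    return max(abs(x - y) for x, y in zip(a, b))

seed, track = (0.2, 0.8), (0.5, 0.4)
print(round(euclidean(seed, track), 3))  # 0.5
print(round(cityblock(seed, track), 3))  # 0.7
print(round(chebyshev(seed, track), 3))  # 0.4
```

    Note how cityblock lets several small differences add up, while chebyshev only cares about the single worst attribute - hence the (slightly) different rankings.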

    Some observations (based on a _cursory_ reading of your code):
    - the way you use genres is fundamentally different from what I do: I model the genre (mapped to a float) as a track attribute to be used in the distance calculation, you add it to the distance. Considering (if euclidean) 0 ≤ dist ≤ 3.7 (theoretically, in my data set it's 0.31 ≤ dist ≤ 2.4) and 0.1 ≤ genre ≤ 0.7, this results in a massively disproportional adjustment.
    - not sure I understand the point of any of the square/sqrt/div(max_sim) in lines 197-202

  7. #27
    Senior Member
    Join Date
    Mar 2017
    Posts
    2,731
    Quote Originally Posted by Roland0 View Post
    If you use KDTree from sklearn.neighbors instead, you can easily test different distance metrics ( [ 'euclidean', 'cityblock', 'chebyshev'] will give (slightly) different results)
    I did use that one before, but switched. Not really sure there is much difference.

    Quote Originally Posted by Roland0 View Post
    Some observations (based on a _cursory_ reading of your code):
    - the way you use genres is fundamentally different from what I do: I model the genre (mapped to a float) as a track attribute to be used in the distance calculation, you add it to the distance. Considering (if euclidean) 0 ≤ dist ≤ 3.7 (theoretically, in my data set it's 0.31 ≤ dist ≤ 2.4) and 0.1 ≤ genre ≤ 0.7, this results in a massively disproportional adjustment.
    Not sure I follow. How can you map a genre to a float? "Rock=0.1"? "Pop=0.9"? My genre "difference" is trivial: either it's the same genre, in a set with the genre, or different.

    How does it result in a "massively disproportional adjustment"??? The distance is the 'euclidean' distance, which (AFAIK) is "square-root( square(A1-B1) + ... + square(An-Bn) )". The KDTree gives me the distances using all attribs bar genre, but then I want to add in the genre 'difference'. To do this I take the square of the tree distance, add in the genre 'distance', and re-take the square-root. What's wrong with that?

    My thinking:

    1. Result from KDTree = square-root( square(seed['danceable']-track['danceable']) + square(seed['aggressive']-track['aggressive']) + ... )
    2. Taking the square of this = square(seed['danceable']-track['danceable']) + square(seed['aggressive']-track['aggressive']) + ...
    3. Adding on my genre difference = square(seed['danceable']-track['danceable']) + square(seed['aggressive']-track['aggressive']) + ... + square(genre_diff)
    4. Convert back = square-root( square(seed['danceable']-track['danceable']) + square(seed['aggressive']-track['aggressive']) + ... + square(genre_diff) )
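    The four steps above boil down to one line (function name hypothetical, values purely illustrative):

```python
import math

def combine_with_genre(tree_dist, genre_diff):
    """Fold a genre difference into a euclidean distance returned by the
    k-d tree: squaring the distance recovers the sum of squared attribute
    differences, to which the squared genre term is added before
    re-taking the square-root."""
    return math.sqrt(tree_dist ** 2 + genre_diff ** 2)

# illustrative 3-4-5 triangle, not realistic 0..1 attribute values:
print(combine_with_genre(3, 4))  # 5.0
```

    This is exactly equivalent to having run the euclidean distance over the attribute vector with genre_diff appended as one extra component.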


    Quote Originally Posted by Roland0 View Post
    - not sure I understand the point of any of the square/sqrt/div(max_sim) in lines 197-202
    My distances are a % of the max. Max would be the square-root of "num attribs" (as attribs are all 0..1, attrib max=1, max attrib difference = square(1-0) = 1) - so similarity is in range 0..1
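    One way to read that normalisation, assuming euclidean distance and all attributes in 0..1 (names hypothetical):

```python
import math

def distance_to_similarity(dist, num_attribs):
    """Normalise a euclidean distance to a 0..1 similarity.  With every
    attribute in 0..1, the worst case per attribute is a difference of 1,
    so the maximum possible distance is sqrt(num_attribs)."""
    return 1.0 - dist / math.sqrt(num_attribs)

print(distance_to_similarity(0.0, 12))            # 1.0 (identical tracks)
print(distance_to_similarity(math.sqrt(12), 12))  # 0.0 (maximally different)
```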
    Last edited by cpd73; 2021-02-04 at 03:44.

  8. #28
    Senior Member
    Join Date
    Aug 2012
    Location
    Austria
    Posts
    1,190
    Quote Originally Posted by cpd73 View Post
    How can you map a genre to a float? "Rock=0.1" ? "Pop=0.9" ? My genre "difference" is trivial either its the same genre, in a set with the genre, or different.
    Actually, I'm not sure if a comparison (seed to track) is required. Currently, I simply map all genres to values 0.0-1.0. If genres are considered identical (e.g. Neo Classical = Classical), they get the same value (e.g. 0.2). The more different they are considered, the greater the distance (e.g. Classical=0.2 Heavy Metal=0.9). I'll have to check if calculating this for each track based on comparison (seed to track) makes (much of) a difference.

    How does it result in a "massively disproportional adjustment" ??? The distance is the 'euclidean' distance, which (AFAIK) is "square-root( square(A1-B1) + .... square(An-Bn) )" The KDTree gives me the distances using all attribs bar genre, but then I want to add in the genre 'difference'. To do this I take the square of the tree distance, add in the genre 'distance', and re-take the square-root. What's wrong with that?
    Sorry, that was really badly explained (and based on a slight mis-reading of your code as well). What I meant is
    - I model the genre (mapped to a float) as a track attribute _for all tracks_:
    e.g. track is:
    Code:
     [5.7575762e-01, 3.0000010e-14, 9.9985665e-01, 2.1053423e-01, 1.6725375e-05, 5.9682566e-01, 6.4656746e-01, 4.0207755e-02, 1.2756313e-01, 5.1360805e-02, 7.8181326e-01, 9.9079335e-01]
    track with genre ( if result of genre mapping is 0.5):
    Code:
    [5.7575762e-01, 3.0000010e-14, 9.9985665e-01, 2.1053423e-01, 1.6725375e-05, 5.9682566e-01, 6.4656746e-01, 4.0207755e-02, 1.2756313e-01, 5.1360805e-02, 7.8181326e-01, 9.9079335e-01, 0.5]
    - this is then used to build the k-d tree for the nearest neighbour search
    - your code does the nearest neighbour search for k=1000 _without_ genres, adds genre modifier to similarity, and then sorts by similarity.
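    A sketch of that layout (attribute values and genre mapping are made up; a brute-force search stands in here for the k-d tree query, which would return the same nearest neighbour):

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# hypothetical mapping of genres to floats; "same" genres share a value
GENRE_MAP = {'Classical': 0.2, 'Neo Classical': 0.2, 'Heavy Metal': 0.9}

def make_vector(attribs, genre):
    # append the genre float so it takes part in the distance calculation
    return attribs + [GENRE_MAP[genre]]

tracks = [
    make_vector([0.1, 0.9], 'Classical'),
    make_vector([0.1, 0.9], 'Heavy Metal'),
]
seed = make_vector([0.1, 0.8], 'Classical')

# brute-force stand-in for the k-d tree nearest-neighbour query
nearest = min(tracks, key=lambda t: euclidean(seed, t))
print(nearest)  # the Classical track wins, despite identical audio attributes
```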

  9. #29
    Senior Member
    Join Date
    Mar 2017
    Posts
    2,731
    Quote Originally Posted by Roland0 View Post
    Actually, I'm not sure if a comparison (seed to track) is required. Currently, I simply map all genres to values 0.0-1.0. If genres are considered identical (e.g. Neo Classical = Classical), they get the same value (e.g. 0.2). The more different they are considered, the greater the distance (e.g. Classical=0.2 Heavy Metal=0.9). I'll have to check if calculating this for each track based on comparison (seed to track) makes (much of) a difference.
    Identical to what? How can you know Classical and Metal are those values, and that far apart? (I'm probably being dumb here, so forgive me.) But I really don't get that. I don't see how mapping a genre to a value allows you to know how close two genres are - unless you have methodically obtained all genres in the DB and by hand set which ones are similar. But even then that's genre A to genre B, and the DB will have lots of genres.

    Quote Originally Posted by Roland0 View Post
    - your code does the nearest neighbour search for k=1000 _without_ genres, adds genre modifier to similarity, and then sorts by similarity.
    Yes, that is exactly what I'm doing. As I want to know if seed track X and candidate track Y are the same, similar, or different genres to each other. I can't see how the genres can be mapped to a number that can be used to compute distance so that they can be in the KDTree info. So, I have my tree with the measurable attributes, get 1000 tracks (to cater for filtering, etc), then add the genre similarity to this. I know this could skew things a bit, but not that much.

    I mainly want the similarity based upon the audio properties (so Essentia attributes), then if 2 tracks had a similar distance, but one was a closer match on genre, it would be used.

  10. #30
    Senior Member
    Join Date
    Aug 2012
    Location
    Austria
    Posts
    1,190
    Quote Originally Posted by cpd73 View Post
    Identical to what? How can you know Classical and Metal are those values, and that far apart?
    It's a manual configuration (basically a 2-dim array, where x (a list of genres) is the "same" genres, and y (scale 0-1) is the distance (= genre similarity)).
    Not sure if this is any better/worse than the seed/candidate same/similar/different approach (see below), so I've made this switchable and will do some testing.

    I can't see how the genres can be mapped to a number that can be used to compute distance so that they can be in the KDTree info.
    seed track X:
    g=0
    candidate track Y:
    same genre: g=0
    similar: g=0.5
    different g=1
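    That seed-relative scheme as a small sketch (function name and the 'similar' sets are hypothetical):

```python
def genre_component(seed_genre, cand_genre, similar_sets):
    """Map the seed/candidate genre relation to the extra k-d tree
    attribute described above: 0 for the same genre, 0.5 when both fall
    into one 'similar genres' set, 1 otherwise.  The seed itself always
    gets 0 for this component."""
    if seed_genre == cand_genre:
        return 0.0
    for group in similar_sets:
        if seed_genre in group and cand_genre in group:
            return 0.5
    return 1.0

SIMILAR = [{'Classical', 'Neo Classical'}, {'Heavy Metal', 'Hard Rock'}]
print(genre_component('Classical', 'Neo Classical', SIMILAR))  # 0.5
print(genre_component('Classical', 'Heavy Metal', SIMILAR))    # 1.0
```

    Since this component depends on the seed, the candidate vectors change per seed track - presumably why the tree is (re-)built from the cached data for each query, as described earlier.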

    The reason I prefer this approach is that there is only one nn (nearest neighbour) search, which includes genres. With your method, there is the nn search without genres, then the sorting with genres (and filtering etc.). This results in a local optimum, since the result of the nn search will include tracks which wouldn't have been included if genres had been a criterion (and thus exclude tracks which should have been included).
    Comparing nn search without genres to nn search with genres:
    testruns: 20 results/test: 50 avg not_in_both: 10.5
    so ~20% of non-optimal tracks.

    Also:
    - You may want to switch to pykdtree ( no dependencies, faster)
    - If skip_rows is used, won't your app run out of tracks after 1000 (I haven't looked at the LMS plugin and how it handles used tracks, though - maybe it resets them)?
