Home of the Squeezebox™ & Transporter® network music players.
  1. #91
    Senior Member SAL9K's Avatar
    Join Date
    Oct 2020
    Posts
    139
    Well, that was quick, already got a stuck event w/ wlanpoke running.

    Code:
    Kitchen wlanpoke.sh 0.8.4.1 4/6/2021 launched 2022-05-22_21:23:00 1653279780 Options: -x -W slow logging to /var/log/
    Sun May 22 22:50:02 PDT 2022 ( 22:50:02 up 1:27, load average: 1.39, 1.95, 2.28 )
    
    eth1      AR6000 802.11g  ESSID:"xxxxxxxx"  
              Mode:Managed  Frequency:2.432 GHz  Access Point: xx:xx:xx:xx:xx:xx   
              Bit Rate=54 Mb/s   Tx-Power=15 dBm   Sensitivity=0/3  
              Retry:on   
              Encryption key:off
              Link Quality:60/94  Signal level:-35 dBm  Noise level:-96 dBm
              Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
              Tx excessive retries:11  Invalid misc:0   Missed beacon:0
    
    Ping: settings, events, and failed pings [0..n]; Step: full reset results, status; Gaps, Resets: # and recents
    
    Ping 2s-1q6f Events, Fails[0..7] 5216s :    Qr:0 Fr:0   Wr:0 Wc:0  [ 1408 0 0 0 0 0 0 0 ]
    Step 0:0, limit:results: [ 12: 18: 26: 37: 53: ]   Wlan: Rate=54 Quality:59/94 level:-36 retries:11
    Gaps:0 @1653284996 -Gap+OK secs: +5216,
    Resets:0 @1653284996 -Gap+OK secs: +5216,
    Kitchen was master; all the other Radios show similar info, with no gaps or wifi resets, and the worst RSSI is Garage at -58 dBm. This is at least some indication that there is no significant wifi dropout, yet the sync'd random mix is still getting stuck. Ugh.
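
    For anyone comparing these summaries across several Radios, here is a minimal Python sketch that pulls the counters out of the text. It assumes the layout shown in the log above (a `Ping` line ending in a bracketed list of counters, plus `Gaps:` and `Resets:` lines). The function names are made up for illustration and are not part of wlanpoke, and reading the first bracketed counter as "rounds with no failed pings" is a guess based on the header line.

```python
import re


def parse_wlanpoke_summary(text):
    """Extract the bracketed ping counters and the gap/reset counts."""
    summary = {}
    # The data line starts with "Ping" and ends in "[ n n n ... ]".
    hist = re.search(r"^\s*Ping\b.*\[\s*([\d\s]+?)\s*\]", text, re.MULTILINE)
    if hist:
        summary["ping_histogram"] = [int(n) for n in hist.group(1).split()]
    for key in ("Gaps", "Resets"):
        match = re.search(rf"^\s*{key}:(\d+)", text, re.MULTILINE)
        if match:
            summary[key.lower()] = int(match.group(1))
    return summary


def looks_quiet(summary):
    """True when there are no gaps, no resets and no failed-ping counts.

    Assumption: index 0 of the histogram counts rounds without failures,
    so only the later entries indicate trouble.
    """
    hist = summary.get("ping_histogram", [])
    return (
        summary.get("gaps") == 0
        and summary.get("resets") == 0
        and sum(hist[1:]) == 0
    )
```

    Run against the Kitchen summary above, this should report zero gaps, zero resets and nothing outside the first counter.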
    Last edited by SAL9K; 2022-05-22 at 22:59.

  2. #92
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    21,779
    Quote Originally Posted by SAL9K View Post
    Kitchen was master; all the other Radios show similar info, with no gaps or wifi resets, and the worst RSSI is Garage at -58 dBm. This is at least some indication that there is no significant wifi dropout, yet the sync'd random mix is still getting stuck. Ugh.
    A map of last digits of MAC to device names would help so that older logs can be fully used.

    The problem is initiated by bad wifi. Garage seems to be the problem device in one log.

    The "underrun" and subsequent lack of a "bufferReady" event for the problem device (all other Radios make bufferReady after rebuffering) seems to create a state that is not expected by LMS code. I think the issue occurs when the "underrun" device is master. Possibly there is a schism where the problem device (having bailed) moves to the next track with its duration while all other devices play out the current track.

    Logging a series of tests with different masters may be helpful to check whether wifi problem with master is a significant factor.

    I've been trying to artificially force an underrun situation but haven't succeeded.

    I have a mesh wifi (WIFI 5) and as my tests have gone on, the sync performance has improved with fewer errors - I suspect MIMO is a factor and the router has learnt about the devices. My only Radio now reports consistent 100% wifi - to get degraded wifi, I put it in a lidded cooking pot with only a small gap for the cable - 85% and still no problems.

    The modern Wifi6/WIFI5 issue causes Radio to drop off totally. This is not the situation here.

    The Wifi 6/Wifi 5 options that should be disabled:
    * Roaming
    * RTS/CTS adaptations intended to improve performance
    * Channel bonding - always use 20 MHz, do not allow bonding to 40 MHz.

    and as suggested before - fix the router channel at 1, 6 or 11.
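
    The reasoning behind 1, 6 and 11 can be checked with a little arithmetic. This is a minimal sketch, assuming the standard 2.4 GHz channel plan (centre frequency 2407 + 5 x channel MHz for channels 1-13) and a 20 MHz channel width; the function names are illustrative only.

```python
def centre_mhz(channel):
    """Centre frequency in MHz of a 2.4 GHz channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("expected a 2.4 GHz channel between 1 and 13")
    return 2407 + 5 * channel


def overlaps(a, b, width_mhz=20):
    """True when two channels of the given width share spectrum."""
    return abs(centre_mhz(a) - centre_mhz(b)) < width_mhz


# Channels 1, 6 and 11 are 25 MHz apart, so none of them overlap.
for pair in [(1, 6), (6, 11), (3, 6), (5, 6)]:
    print(pair, "overlap" if overlaps(*pair) else "clear")
```

    Channel 5 works out to 2432 MHz, which matches the 2.432 GHz shown in the Kitchen iwconfig output earlier in the thread, and it overlaps a neighbour sitting on 6.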

  3. #93
    Senior Member SAL9K's Avatar
    Join Date
    Oct 2020
    Posts
    139
    Quote Originally Posted by bpa View Post
    A map of last digits of MAC to device names would help so that older logs can be fully used.

    The problem is initiated by bad wifi.

    The modern Wifi6/WIFI5 issue causes Radio to drop off totally. This is not the situation here.

    The Wifi 6/Wifi 5 options that should be disabled:
    * Roaming
    * RTS/CTS adaptations intended to improve performance
    * Channel bonding - always use 20 MHz, do not allow bonding to 40 MHz.

    and as suggested before - fix the router channel at 1, 6 or 11.
    Yes, I fixed channel b/w to 20MHz, disabled roaming, disabled beamforming, and was experimenting with the auto-channel (my router preferred control channels 3 & 5) but have since moved to fixed ch 11 (yes, use only 1, 6, 11 on 2.4G). I don't see a RTS/CTS setting on the Asus RT-AX86U. I get the same stuck result with default router settings (beamforming, Wifi6, auto channel, etc.) as with the manually selected, supposedly "optimal" 2.4G settings. I've seen no difference in terms of 2.4G performance to the Radios in my environment. So, for my particular situation, it appears that any WiFi issues are due to neighbor AP interference and/or inherent Radio-specific Atheros radio h/w or driver issues.

    One interesting data point is that the random mix became stuck at the end, with wlanpoke reporting no gaps or wireless resets of any kind. I don't know if this is a pattern, but many stuck events are occurring with Glenn Gould's Goldberg Variations as the next track (a flac file integrity check shows this album is OK)! It's likely a coincidence (as I have ~100 classical albums), but the Goldberg Variations are typically very short tracks, on the order of 1 min each. Is there some relationship here, moving from a long track (10-20 min) to a short one? I'm likely reaching here, but what if a dropout occurs and sync tries to recover, only it tries to set the recovery time to a time "past" the end of the next track (the next track being a very short Gould variation)?
    Last edited by SAL9K; 2022-05-23 at 10:30.

  4. #94
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    21,779
    Quote Originally Posted by SAL9K View Post
    Yes, I fixed channel b/w to 20MHz, disabled roaming, disabled beamforming
    I think you can leave beam forming - I don't think it can cause disconnects and it is possible that it can slightly improve connection.

    , and was experimenting with the auto-channel (my router preferred control channels 3 & 5) but have since moved to fixed ch 11 (yes, use only 1,6,11 on 2.4G). I don't see a RTS/CTS setting on the Asus RT-AX86U.
    Check out Professional Settings / RTS Threshold - and then ignore.

    However Preamble Type might be worth testing (with a lot of caution).

    I get the same stuck result with default router settings (beamforming, Wifi6, auto channel, etc.) as with the manually selected, supposedly "optimal" 2.4G settings. I've seen no difference in terms of 2.4G performance to the Radios in my environment. So, for my particular situation, it appears that any WiFi issues are due to neighbor AP interference and/or inherent Radio-specific Atheros radio h/w or driver issues.
    Wifi 6 features will never improve an 802.11g Radio but can break an 802.11g Radio connection.

    One interesting data point is that the random mix became stuck at the end, with wlanpoke reporting no gaps or wireless resets of any kind. I don't know if this is a pattern, but many stuck events are occurring with Glenn Gould's Goldberg Variations as the next track (a flac file integrity check shows this album is OK)! It's likely a coincidence (as I have ~100 classical albums), but the Goldberg Variations are typically very short tracks, on the order of 1 min each. Is there some relationship here, moving from a long track (10-20 min) to a short one? I'm likely reaching here, but what if a dropout occurs and sync tries to recover, only it tries to set the recovery time to a time "past" the end of the next track (the next track being a very short Gould variation)?
    I suspect the issue is traffic related, i.e. network load and the radios' transmissions colliding and then having to wait to TX. I'm trying to get a more detailed definition of the issue (IMO) so that somebody else may be able to fix it. The problem looks like the buffer being filled with one track but the player then being told to start playing at the end of the next track's duration - so the player is waiting for a time that may never happen. This instance is triggered by Underrun (i.e. network problems) but may only happen with a shortish track and at/near transitions, which is why it is hard to reproduce even artificially.
    Last edited by bpa; 2022-05-23 at 11:23.

  5. #95
    Senior Member SAL9K's Avatar
    Join Date
    Oct 2020
    Posts
    139
    Quote Originally Posted by bpa View Post
    Check out Professional Settings / RTS Threshold - and then ignore.

    However Preamble Type might be worth testing (with a lot of caution).
    Ok, in professional settings I see:

    Preamble Type: Long
    AMPDU RTS: Enabled
    RTS Threshold: 2347

    So, disable the AMPDU RTS, and tinker with Preamble Type [Long, Short]? The RTS Threshold is oddly specific.

    Edit: I'm reading that if RTS Threshold is set to >= 2346, then RTS/CTS is disabled.
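
    That reading can be sanity-checked with a small sketch. It assumes the usual rule that RTS/CTS is only used for frames larger than the threshold, and a legacy 802.11 maximum frame size of 2346 bytes; both are assumptions stated here rather than anything taken from the Asus documentation, and the names are illustrative.

```python
MAX_FRAME_BYTES = 2346  # assumed legacy 802.11 maximum frame size


def uses_rts(frame_bytes, rts_threshold):
    """RTS/CTS is assumed to apply only to frames above the threshold."""
    return frame_bytes > rts_threshold


def rts_effectively_disabled(rts_threshold, max_frame_bytes=MAX_FRAME_BYTES):
    """If even the largest frame does not exceed the threshold, RTS never fires."""
    return not uses_rts(max_frame_bytes, rts_threshold)


for threshold in (500, 2346, 2347):
    state = "disabled" if rts_effectively_disabled(threshold) else "active"
    print(threshold, state)
```

    Under those assumptions the router's 2347 is just "one more than the largest possible frame", which would explain the oddly specific number.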
    Last edited by SAL9K; 2022-05-23 at 12:39.

  6. #96
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    21,779
    Quote Originally Posted by SAL9K View Post
    Ok, in professional settings I see:

    Preamble Type: Long
    AMPDU RTS: Enabled
    RTS Threshold: 2347

    So, disable the AMPDU RTS, and tinker with Preamble Type [Long, Short]? The RTS Threshold is oddly specific.

    Edit: I'm reading that if RTS Threshold is set to >= 2346, then RTS/CTS is disabled.
    Do nothing with the RTS stuff. Leave as is. Ignore them. If it is disabled now - leave it - AFAIK that is the conservative approach necessary for legacy devices. Its setting is now known.

    The manual had a note about "Preamble" regarding legacy networks (i.e. 802.11g) - that is why I suggest seeing if it has an effect. If "Preamble" setting has an effect I expect it to be very subtle - but this bug is subtle.

  7. #97
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    21,779
    Having identified the distractions, I looked at logs again, trying to understand the sequence of events - however since only 2 full logs are posted - this may be drawing a conclusion from limited data.

    * Problems happen after underrun (player buffer is empty) is logged.
    * Underrun was logged by 2 players in one log and 3 players in the other.
    * Just before Underrun - "bailing as playPoint too old" appears - these are only logged when the timestamp is > 3 secs old, so messages with a slightly smaller delay (e.g. 2.9) may have been missed. When bailing happens - LMS gives up trying to keep players in sync but keeps playing.

    This means audio data has not been received by at least 2 players for a period of maybe 2-3 secs. It is bad for "underrun" to happen to one player, but two or three at the same time is very unusual.
    Heartbeat status message from player has not been received by LMS for more than 3 secs.
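
    To check whether that sequence repeats in further logs, something like the following could be run over each log file. It is only a sketch: it searches for the two phrases quoted above as plain substrings and makes no assumption about the rest of the LMS log line format. The function names are hypothetical.

```python
MARKERS = ("bailing as playPoint too old", "underrun")


def find_markers(lines, markers=MARKERS):
    """Return (line_number, marker) for every line containing a marker."""
    events = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for marker in markers:
            if marker.lower() in lowered:
                events.append((number, marker))
    return events


def first_seen(events):
    """Map each marker to the first line number where it appeared."""
    first = {}
    for number, marker in events:
        first.setdefault(marker, number)
    return first
```

    For example, `first_seen(find_markers(open(path)))` with `path` pointing at a saved log shows whether the bailing message really does precede the first underrun in that log.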

    I see two possibilities
    * LMS server is good but problem in wifi network - maybe overloading, maybe Radio specific, maybe interference but it persists.
    or
    * LMS server "pauses" for some reason - LMS is single threaded so that an issue in a system call (e.g. NAS is slow to respond to file read) will affect LMS operations.

    While there is clearly an issue with LMS recovery - syncing players assumes a good network and for whatever reason there are gaps in network transmissions. I feel that fixing the LMS recovery problem will not fix your playing sync issues - it may just make the lack of syncing more apparent.

    More log files would help confirm if the sequence is consistent or not.

    To remove LMS server as possible cause
    * if NAS is networked to LMS server - move files onto local connection (e.g. USB)
    * Turn off all other applications on Ubuntu system.
    * Minimise other plugins & streaming services on LMS server.

  8. #98
    Senior Member SAL9K's Avatar
    Join Date
    Oct 2020
    Posts
    139
    Quote Originally Posted by bpa View Post
    Having identified the distractions, I looked at logs again, trying to understand the sequence of events - however since only 2 full logs are posted - this may be drawing a conclusion from limited data.

    * Problems happen after underrun (player buffer is empty) is logged.
    * Underrun was logged by 2 players in one log and 3 players in the other.
    * Just before Underrun - "bailing as playPoint too old" appears - these are only logged when the timestamp is > 3 secs old, so messages with a slightly smaller delay (e.g. 2.9) may have been missed. When bailing happens - LMS gives up trying to keep players in sync but keeps playing.

    This means audio data has not been received by at least 2 players for a period of maybe 2-3 secs. It is bad for "underrun" to happen to one player, but two or three at the same time is very unusual.
    Heartbeat status message from player has not been received by LMS for more than 3 secs.

    I see two possibilities
    * LMS server is good but problem in wifi network - maybe overloading, maybe Radio specific, maybe interference but it persists.
    or
    * LMS server "pauses" for some reason - LMS is single threaded so that an issue in a system call (e.g. NAS is slow to respond to file read) will affect LMS operations.

    While there is clearly an issue with LMS recovery - syncing players assumes a good network and for whatever reason there are gaps in network transmissions. I feel that fixing the LMS recovery problem will not fix your playing sync issues - it may just make the lack of syncing more apparent.

    More log files would help confirm if the sequence is consistent or not.

    To remove LMS server as possible cause
    * if NAS is networked to LMS server - move files onto local connection (e.g. USB)
    * Turn off all other applications on Ubuntu system.
    * Minimise other plugins & streaming services on LMS server.
    LMS Server: Intel NUC7i5 w/ Ubuntu 20.04, 16 GB RAM, 256GB Samsung 960 Evo NVMe SSD, Intel i5-7260U 2.2GHz-3.4GHz dual-core
    File Server: Synology DS-712+ w/ 2 TB WD Red Drives, RAID-1

    The NUC is dedicated to LMS; NUC and NAS are connected over 1 GbE to the same Asus RT-AX86U router. I can easily saturate the 1 GbE to the Synology on both read/write, and the most CPU load I see from LMS is ~3% (on a single core) during a FLAC 24b/96k transcode down to 24b/48k for the Radio.

    You think I may have a CPU or LAN file server bandwidth issue?
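
    As a rough check on the bandwidth side of that question, here is the arithmetic for the stream rates involved. It is a back-of-envelope sketch assuming stereo audio and uncompressed PCM as the upper bound (FLAC would be lower); `pcm_mbps` is an illustrative name.

```python
def pcm_mbps(sample_rate_hz, bits_per_sample, channels=2):
    """Uncompressed PCM bit rate in Mb/s (upper bound for a FLAC stream)."""
    return sample_rate_hz * bits_per_sample * channels / 1_000_000


per_radio = pcm_mbps(48_000, 24)   # 24b/48k stream sent to each Radio
source = pcm_mbps(96_000, 24)      # 24b/96k file read from the NAS
print(f"per Radio: {per_radio:.3f} Mb/s")
print(f"four Radios: {4 * per_radio:.3f} Mb/s")
print(f"source file read: {source:.3f} Mb/s")
```

    Even with four synced Radios the total stays under 10 Mb/s, well below both the 1 GbE link and the 54 Mb/s nominal wifi rate in the log, so average throughput is unlikely to be the constraint. Short pauses are a separate matter that this arithmetic does not address.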

  9. #99
    Senior Member SAL9K's Avatar
    Join Date
    Oct 2020
    Posts
    139
    From the wlanpoke testing I've done so far, I got one stuck event where wlanpoke didn't detect any gaps in the WiFi connection and LMS remained stuck at end of track, and one event overnight last night where wlanpoke reset the WiFi on master player Kitchen, apparently due to a ~40s WiFi gap, and the random mix also stopped playing at the end of the 31 track playlist (random mix was set up for historic=10, lookahead=20); random mix stopped adding the lookahead tracks to keep the playlist going even though wlanpoke reset Kitchen WiFi and the player remained connected to 2.4G after the reset.

    From what I've gathered thus far, I do strongly agree that WiFi is the fundamental problem: 2.4G neighbor friendly fire coupled with inherently flaky 2.4G Radio hardware/driver issues, which together are running amok with the LMS requirement that the sync'd players have very stable/reliable connections. It seems that to get this 4xRadio sync to be stable, the Radios need better WiFi handling, and LMS needs to be more tolerant of single player dropouts. Neither of which is likely to happen on this legacy platform. C'est la vie.

    I don't see a path forward here, unfortunately. I'll continue running wlanpoke, as it seems to be keeping the WiFi alive on the Radios, versus the occasional red WiFi icon of death, requiring a full Radio reboot. Perhaps adding an external WiFi-to-LAN connection to each Radio could be a workaround I may try in the future.
    Last edited by SAL9K; 2022-05-24 at 11:15.

  10. #100
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    21,779
    Quote Originally Posted by SAL9K View Post
    It seems that to get this 4xRadio sync to be stable, the Radios need better WiFi handling, and LMS needs to be more tolerant of single player dropouts. Neither of which is likely to happen on this legacy platform. C'est la vie.
    I don't have 4 Radios but my one Radio was rock solid even when I put it in a cooking pot to get the wifi level down. So I think it is a combination issue.

    I don't see a path forward here, unfortunately. I'll continue running wlanpoke, as it seems to be keeping the WiFi alive on the Radios, versus the occasional red WiFi icon of death, requiring a full Radio reboot. Perhaps adding an external WiFi-to-LAN connection to each Radio could be a workaround I may try in the future.
    I'm not running wlanpoke and I have no issues with a WIFI5 (Netgear Orbi) router so the problem may be hit or miss depending on environment.

    By changing LMS code, I've introduced "playpoint too old" errors - and everything kept going, albeit badly synced as expected.
    I've had the Radio in a cooking pot to get the wifi signal level down to 60% - no problems.
    I've put the Radio on ethernet and then pulled the cable until "underrun" near track transitions and then reconnected - again everything recovered and synced. So LMS is very tolerant of dropouts on single players.

    Your wifi issue could be degradation of the network rather than loss, so that short packets get through but some long packets don't - so you don't get network disconnect issues, just degradation. This could be spotted by looking at /proc/net/netstat on the Radio to see if there are any TCP errors. Similarly on the LMS server - check netstat for retransmissions.
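
    A minimal sketch for reading those counters, assuming the usual Linux layout of /proc/net/netstat (and /proc/net/snmp): pairs of lines with the same prefix, one listing counter names and the next listing values. Counter names vary between kernel versions, so the filter just looks for "retrans" or "timeout" in the name rather than relying on specific ones. Python is unlikely to be available on the Radio itself, so this assumes the file contents are copied off and parsed elsewhere. The function names are illustrative.

```python
def parse_proc_net(text):
    """Parse name/value line pairs into a {"Prefix.Name": value} dict."""
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            continue  # not a matching header/value pair
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


def suspicious(counters, needles=("retrans", "timeout")):
    """Non-zero counters whose name hints at retransmissions or timeouts."""
    return {
        name: value
        for name, value in counters.items()
        if value and any(n in name.lower() for n in needles)
    }
```

    On the LMS server this can be used directly, e.g. `suspicious(parse_proc_net(open("/proc/net/netstat").read()))`. Taking a reading before and after a stuck event and comparing the two would show whether the counters move.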

    Since multiple players suffer Underrun at the same time, the other possibility that deserves to be checked is an LMS server issue. Is there a possibility that, for some reason, when Random mix is doing a DB search from the NAS there is a big delay or a pause in network activity? It would be noticeable on the WebUI. Could there be something else on the server or network causing a hiccough (e.g. a bad network cable)?
