Community Build Radio Firmware


  • Hello POMdev, I downloaded the latest 'Main' version of your script and copied onto my SBR. I copied the rcS.local. Example into etc/init.d and substituted the existing rcS.local file. I've not made any other changes. When I run the script I am getting the following error:
    sh: 169: unknown operand

    Have I missed a step?

    Even though something may not be quite right, the script is helping the situation. When the network connection is dropped/blocked, the connection is being recovered and the streaming resumes after 10 secs or so. Problem is that the connection is dropping every 5 mins or so.

    Thank you.

    Jon
    Last edited by Londonist; 2021-03-21, 13:03.

    Comment


    • Originally posted by Londonist
      Hello POMdev, I downloaded the latest 'Main' version of your script and copied onto my SBR. I copied the rcS.local. Example into etc/init.d and substituted the existing rcS.local file. I've not made any other changes. When I run the script I am getting the following error:
      sh: 169: unknown operand
      Have I missed a step?

      Even though something may not be quite right, the script is helping the situation. When the network connection is dropped/blocked, the connection is being recovered and the streaming resumes after 10 secs or so. Problem is that the connection is dropping every 5 mins or so.
      (1 failure / 5 min) x 60 min/h x 24 h/day = 288 failures per day. That's a lot, but not atypical. Your radio is suffering from your or your neighbor's new wifi protocols. See this user's post and his code on GitHub to make your own graph.

      Regarding the 10 seconds to resume streaming, what is the source? How long is the pause with a local "My Music" library? You may need to increase buffering (somehow) to make up for these breaks. With 4 synchronized (the worst case) radios here, there are occasional pauses of under 1 second (similar to turning a radio on) to 5 seconds when using Pandora or another web station, but not on the local library.

      In the last 2-3 days here, the seven radios here have logged the number of failed pings preceding a successful ping. There were 582395 successful pings that had no prior failures, and 3532 pings with one prior failure. (Ethernet connections have almost no failures.) However, wlanpoke detected and recovered from failures 455 times, with 78 times requiring a full reset. Fortunately for me, the environment here has improved a lot in the past few weeks, the earlier numbers were much higher, with clusters of failures a minute or so apart, followed by relative calm.

      After downloading and installing the main (staging) branch (the latest release or the development branch is recommended instead), I did not observe the "sh: 169: unknown operand" error. I don't know what that '169' is: a line number, an operand? There is a variable test against '169' in the script ($IPFIRST == "169") to test for an unusable auto configuration address, but I don't see how '169' could become an operand, and line 169 of the script is a comment... So, to troubleshoot, how and when do you see this error? At launch? Periodically? What is the launch command line in rcS.local? Are you running a netcat (nc or ncat) tcp logging server? Any other info?
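
One plausible cause of a message shaped like "sh: 169: unknown operand" (an assumption; the thread does not state the actual cause) is a test such as [ $IPFIRST == "169" ] running while IPFIRST is empty and unquoted, which leaves the test with only an operator and "169". The sketch below shows a defensive form of the same check. The function name is_autoconf and the variable ipfirst are hypothetical and are not taken from the wlanpoke source.

```shell
#!/bin/sh
# Hypothetical sketch, not the wlanpoke code.
# is_autoconf IP: succeed when the first octet of IP is 169,
# i.e. the address looks like an auto configuration address.
is_autoconf() {
    ipfirst=${1%%.*}          # text before the first dot; empty if IP is empty
    [ "$ipfirst" = "169" ]    # quoted operand, and POSIX "=" rather than "=="
}
```

With the quotes in place, an empty address simply makes the test false instead of producing a shell error, e.g. `is_autoconf "" || echo "not an auto configuration address"`.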

      Comment


      • Originally posted by POMdev
        This has produced output condensed into the following ~25 hour report for 7 radios:

        Code:
              number of failed pings: ----------------------- ----- quick, full resets
        dev/loc      params    index:     0   1  2 3 4  5 6 7  8  9 10 
        Ofc  2S Ping 2s3q6f Fails[9]: 33429 225  1 0 0  0 0 0  0  0 0
        bsmt  M Ping 2s3q6f Fails[9]: 33381  58  0 0 1 69 2 0 75  3 0
        LR   1F Ping 2s3q6f Fails[9]: 32106 178  3 0 4 54 2 0 76 16 0
        Kit  1E Ping 2s3q6f Fails[9]: 33261 168  1 1 3 67 1 0 77  5 0
        MBR  2E Ping 2s3q6f Fails[9]: 28918 748 58 3 1 36 2 0 53 11 0    (22 hours)
        Ofc  2N Ping 2s3q6f Fails[9]: 34452 130  1 1 0 21 0 0 32 10 0
        bsmt  W Ping 2s3q6f Fails[9]:  7487  14  0 0 0  0 0 0  0  0 0    (<6 hrs)
                ---- activity -------    ok        q      f    q  f
        A quick reset was performed after 3 failed pings, and a full reset after 6. Array indexes [0 - 7] display the number of failed pings counted before the system had a successful ping. Array indexes [8] and [9] count the number of times the quick and full reset actions were performed. The full reset sets the failed ping counter to 1 (to keep trying), so index [7] is zero. Index [10] is unused.

        By far, the pings succeed with zero intervening failures, so the [0] numbers are large. The radios had some 1, and just a few 2 ping failures, which likely would not be noticed. Three ping failures before recovery were not seen in earlier testing, so the radios performed a quick reset [8] at the 3rd ping failure. These succeeded most of the time, with recoveries shown in indexes [3..6]. These lower outages might not cause interruption. Full recoveries were required in 3/75 to 10/32 of the cases. The reason the quick reset works or doesn't is unknown at this time. The full recovery has worked reliably for the last 6 months.
        Could you please try to explain this a little more clearly? I'm not completely following your explanation of the various numbers. I assume that based on your explanation, "Ping 2s3q6f" indicates that it's pinging every 2 seconds and doing a quick reset after 3 failed pings and a full reset after 6 failed pings. But I get lost on the 10 array digits. What's the difference between array indexes 0-7 if they're all counts of failed pings? When does a failed ping increment index 0 vs 1 vs 2, etc? What event(s) triggers each index to reset (if they do reset)?

        Comment


        • Originally posted by RichardDavies
          Could you please try to explain this a little more clearly? I'm not completely following your explanation of the various numbers. I assume that based on your explanation, "Ping 2s3q6f" indicates that it's pinging every 2 seconds and doing a quick reset after 3 failed pings and a full reset after 6 failed pings. But I get lost on the 10 array digits. What's the difference between array indexes 0-7 if they're all counts of failed pings? When does a failed ping increment index 0 vs 1 vs 2, etc? What event(s) triggers each index to reset (if they do reset)?
          Failed pings do not increment the Ping Fails array, successful pings do. Instead, failed pings increment a counter that counts the number of failed pings so far. When and if the ping succeeds, the array value indexed by that counter is incremented. So, if the index [2] has the value of, say, 12, then the next time the ping succeeds, if the failed ping counter is 2, the index [2] value becomes 13. Otherwise some other index value is incremented. The failed pings counter is reset to 0 when the ping succeeds.

          This means that array indexes [0..7] are all counts of failed pings coming before a successful ping. So, if a ping succeeded, then failed 2 times before a success, that success would increment the [2] index, which counts the number of times the ping test failed 2 times before succeeding. This roughly counts how long connectivity was lost.

          If all the pings worked perfectly, then there would be zero failed pings, so the [0] index would increment on each successful ping test. Index [6] is a count of how many times the ping test failed 6 times before it finally succeeded, and did not trigger the full reset. Since in this case the quick reset was done after 3 failed pings, that means that 3 ping tests after the quick reset, the ping succeeded, just in the nick of time to avoid the full reset. If the ping had not succeeded, the full reset parameter, 6, would trigger a full reset.
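
The counting described above can be sketched as a small model. This is not the wlanpoke code: the function name tally, the 1/0 input string, and the exact full-reset trigger (a failure beyond the 6th, which is one reading consistent with index [6] being nonzero and index [7] being zero) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical model of the Ping Fails array; not the wlanpoke source.
QUICK=3   # quick reset when the failed ping counter reaches this ("3q")
FULL=6    # full reset when the counter would exceed this ("6f"); assumed

# tally RESULTS: RESULTS is a string of 1 (ping ok) and 0 (ping failed).
# Prints the counters for indexes [0]..[10] on one line.
tally() {
    results=$1
    fails=0
    i=0
    while [ "$i" -le 10 ]; do eval "c$i=0"; i=$((i + 1)); done

    while [ -n "$results" ]; do
        rest=${results#?}          # everything after the first character
        r=${results%"$rest"}       # the first character
        results=$rest
        if [ "$r" = 1 ]; then
            # Success: count it under the number of preceding failures.
            eval "c$fails=\$((\$c$fails + 1))"
            fails=0
        else
            fails=$((fails + 1))
            if [ "$fails" -gt "$FULL" ]; then
                c9=$(($c9 + 1))    # full reset performed
                fails=1            # counter wraps to 1 to keep trying
            elif [ "$fails" -eq "$QUICK" ]; then
                c8=$(($c8 + 1))    # quick reset performed
            fi
        fi
    done
    echo "$c0 $c1 $c2 $c3 $c4 $c5 $c6 $c7 $c8 $c9 $c10"
}
```

For example, `tally 00001` prints `0 0 0 0 1 0 0 0 1 0 0`: one quick reset in [8] and one recovery after four failed pings in [4].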

          The array is important for tuning the reset algorithm. We don't want to reset too quickly before the radio's 'network stack' (driver, etc.) can do their things. And we don't want to do a full reset too soon after a quick reset because the quick reset evidently takes a few pings to recover. These recoveries are counted in the Failed Pings array. Index [0] is a counter of the times the ping test did not fail. [1..2] count recoveries by the network stack by itself plus the full reset (which wraps the counter back to 1). Indexes [3..6] count recoveries by the quick method.

          It seems that the number of quick recoveries [8] less the sum of [3..6] is equal to the number of full resets [9] because the full reset is triggered when the quick recovery does not recover after the 6th failed ping.
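
That relationship can be checked against the rows of the report quoted earlier in this thread. The helper below is a sketch for desktop analysis only; the name check_row and its output format are made up for illustration, and it assumes each row keeps the "Fails[9]:" label followed by the eleven counters.

```shell
#!/bin/sh
# Hypothetical desktop helper; not part of wlanpoke.
# check_row LINE: LINE is one report row. Prints the quick reset count [8],
# the sum of indexes [3..6], the full reset count [9], and [8] minus that sum.
check_row() {
    printf '%s\n' "$1" | awk '{
        sub(/^.*Fails\[9\]:/, "")                # drop everything before the counters
        split($0, c, " ")                        # c[1]..c[11] hold indexes [0]..[10]
        recovered = c[4] + c[5] + c[6] + c[7]    # indexes [3..6]
        printf "quick=%d recovered=%d full=%d expected_full=%d\n", c[9], recovered, c[10], c[9] - recovered
    }'
}
```

For the "bsmt  M" row this prints `quick=75 recovered=72 full=3 expected_full=3`, so the observed and expected full reset counts agree for that radio.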

          The indexes are reset whenever wlanpoke is restarted. This might be because of a modification and relaunch (killing the previous instance), or booting the radio, which launches wlanpoke.

          Perhaps the "Log File Analysis" manual.txt section should be revised for improved clarity. There was a revision in version 0.7.3, published 3/21/2021.

          BTW, there is a trade-off between script complexity and the desire to troubleshoot or tune the system. We want the script to take minimal resources (cpu, memory, 'disk' space) away from the music player or operating system (especially not the networking!). And it needs to be easy to understand, modify, and maintain. This means that analysis has to be done on the desktop (i.e., by the user). This script is more complicated than most. If it were a compiled executable, it could do much more, and more efficiently, but then users would not be able to vet (important!) or easily modify it, or add features to support their own investigations, or fix bugs.
          Last edited by POMdev; 2021-03-21, 22:32.

          Comment


          • Wireless Connectivity Mitigation Update 0.7.5

            Reply to the post from the 'lost connection' thread.
            Originally posted by mrw
            A successful re-association will trigger the wpa_action script into "kicking" udhcpc into renewing the lease. I think that taking the interface down at this point with ifconfig simply prevents/hinders udhcpc from doing its job. One shouldn't need to be "kicking" udhcpc again. ...
            Ok. The ifconfig down/up was removed (it seems like forever ago) based on your previous post and its slow performance.

            Leaving the re-association alone as in the original 0.7.0 script had caused problems for two users, and I saw failures reported in the UI, and a missing gateway, although the existing connection and in-subnet ping was still working. This suggests that the wpa_action script to renew the lease failed or wasn't being invoked after the re-association. Checking the 7 radios here, wpa_cli was not running on 4 of them! So, for these radios at that time, a simple re-association would not be enough to keep SqueezePlay or jive happy.

            The latest 0.7.5 code on the GitHub development branch includes a function to check if wpa_cli is running, and, if not, to relaunch it. This is called before every ping test. Preliminary testing here shows that the wpa_cli is being relaunched at more or less random times unrelated to wireless connectivity, including on radios that have had no serious ping failures or restarts.
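
A minimal sketch of such a check, assuming a pidof command is available on the radio; this is not the 0.7.5 code. The function name ensure_running and the IS_RUNNING hook (there only so the check can be swapped out or stubbed) are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; not the wlanpoke 0.7.5 implementation.
IS_RUNNING=${IS_RUNNING:-pidof}   # command that succeeds if the named process exists

# ensure_running NAME CMD...: run CMD only when no process called NAME is found.
ensure_running() {
    name=$1
    shift
    if $IS_RUNNING "$name" >/dev/null 2>&1; then
        return 0                  # already running, nothing to do
    fi
    echo "relaunching $name"      # leave a trace for the log
    "$@"
}
```

Called before each ping test, this might look like `ensure_running wpa_cli /usr/sbin/wpa_cli -B -a/etc/network/wpa_action`, using the launch command quoted later in this thread.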

            Either wpa_cli is exiting on its own, or something else (I hope not the script) is killing it. Of course, the hard reset kills and relaunches it, but that has always worked. I doubt it is the script. I will read through the wpa_cli code and see about adding or enabling debug logging to pinpoint this phenomenon. BTW, the recEvent utility also casts suspicion on wpa_supplicant or wpa_client, see the earlier post on this.

            Thank you all very much for your help investigating this issue. Because my environment has become so quiet, others testing the script has become more important to this and to finding and fixing the root cause of the issue.
            Last edited by POMdev; 2021-03-25, 00:24.

            Comment


            • Originally posted by POMdev
              Checking the 7 radios here, wpa_cli was not running on 4 of them! So, for these radios at that time, a simple re-association would not be enough to keep SqueezePlay or jive happy.

              The latest 0.7.5 code on the GitHub development branch includes a function to check if wpa_cli is running, and, if not, to relaunch it. This is called before every ping test. Preliminary testing here shows that the wpa_cli is being relaunched at more or less random times unrelated to wireless connectivity, including on radios that have had no serious ping failures or restarts.

              Either wpa_cli is exiting on its own, or something else (I hope not the script) is killing it. Of course, the hard reset kills and relaunches it, but that has always worked. I doubt it is the script. I will read through the wpa_cli code and see about adding or enabling debug logging to pinpoint this phenomenon. BTW, the recEvent utility also casts suspicion on wpa_supplicant or wpa_client, see the earlier post on this.
              I think it would be very helpful to find out what is killing it. If it is not the script, then that may hint at a problem with wpa_supplicant, or with its control socket.

              Some additional notes. There is room for error!

              The reason, I think, that wpa_cli is being run as a daemon is to react to events from wpa_supplicant. The only thing it is actually doing is to "kick" udhcpc into renewing the lease when a connection event is noted. As far as I can tell, that is all it is doing.

              SqueezePlay itself does not require wpa_cli be running, other than to deliver that kick, because it communicates with wpa_supplicant directly through its control socket (as does wpa_cli). The socket is /var/run/wpa_supplicant/eth1.

              SqueezePlay does restart wpa_cli when it brings a network up. (Networking.lua function _ifiup calls restartWpaCli).

              SqueezePlay itself does have the ability to respond to events from wpa_supplicant if it chooses. But it does not so choose. See attach / detach following jive.net.Networking:request(). That might be a route towards logging wpa_supplicant events.

              wpa_cli will work from the command line regardless of whether it is running as a daemon. But it is the daemonized instance that would "kick" udhcpc when a reconnection takes place.

              Comment


              • Originally posted by mrw
                I think it would be very helpful to find out what is killing it. If it is not the script, then that may hint at a problem with wpa_supplicant, or with its control socket.
                ...
                SqueezePlay does restart wpa_cli when it brings a network up. (Networking.lua function _ifiup calls restartWpaCli).
                ...
                SqueezePlay itself does have the ability to respond to events from wpa_supplicant if it chooses. But it does not so choose. See attach / detach following jive.net.Networking:request(). That might be a route towards logging wpa_supplicant events.
                Using a debug build of both will be a start. I hope to get to that soon, but first the script needs to always work again. I hope that's finished with today's new version.

                As mentioned before, I think there is a problem with one or both of the two wpa_* programs, since the driver messages displayed by recEvent stop before an outage is detected, then resume after wpa_cli and wpa_supplicant are killed (see prior message). recEvent could do some logging directly, but the events come streaming out at too high a rate for reasonable logging, and the events just prior to the event stream stopping seemed uninformative at first glance (2 weeks ago).

                If SqueezePlay restarts wpa_cli, I guess it is safe to do so in the script...??? What about restarting wpa_supplicant (thinking about that as a next script step)?

                I think that the wpa_ daemons should handle the networking by themselves. SqueezePlay shouldn't have to deal with a faulty network stack. The stack should be fixed. This may be possible.

                Comment


                • Mitigation Script 0.7.6 Addresses Reliability

                  Two radios running 0.7.5 here lost connections last night, so the recent reliability issues can now be addressed, thanks to the logs being preserved.

                  In the meantime, a version 0.7.6 has been uploaded to the GitHub development and main branches. This version is intended for stable operation. It is important that it always work. It achieves this by setting the default quick reset timing to disable the method, now suggested only for troubleshooting.

                  The current quick reset method has two issues. First, every 24 seconds the script tests for the gateway, and if, at that moment, the gateway is invalid because it is still 'recovering' from the quick reset, the script becomes ineffective from then on; the radio eventually loses its connection again, and this time the loss is not mitigated. Second, the current quick method's effects don't seem to last as long as the full reset, making this coincidence more likely. Because of these, the new version 0.7.6 sets the quick method's default timing to disable it so that the script will work as reliably as in the past.
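
How the script tests for the gateway is not shown in this thread. As an illustration only, one way to see whether a default gateway is currently present is to filter the output of `route -n`; the function name default_gateway is hypothetical, and the column layout assumed here is the usual Destination / Gateway / Genmask / Flags one.

```shell
#!/bin/sh
# Hypothetical sketch; not the wlanpoke gateway test.
# default_gateway: read `route -n` style output on stdin and print the
# gateway of the default route (destination 0.0.0.0, flags containing G).
# Prints nothing when there is no default route.
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}
```

A caller could then treat an empty result as "gateway not ready yet" and retry later, e.g. `gw=$(route -n | default_gateway); [ -n "$gw" ] || echo "no gateway yet"`, rather than giving up permanently.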

                  The basically undeveloped code used to handle network gateway changes will be re-written in 0.8.0. Once that has been tested, an improved quick method can be reinstated by default. But reliability comes first.

                  Thanks to all you contributors out there. BTW, another first: I was able to take a battery-powered radio that had lost its connection, connect it to Ethernet while it was still on, use SqueezePlay to switch to Ethernet, and upload the logs from /var/log. That was the first time doing this has been successful for me!

                  Comment


                  • Originally posted by ralphy
                    There is also a kernel patch to fix the Wireless Event too big messages but I still get the occasional one in /var/log/messages.
                    The only Wireless Event too big message that I can recall seeing is eth1 (WE) : Wireless Event too big (33).

                    '33' being the length of the offending message. Have you ever seen any others with different lengths?

                    I found a commented patch, otherwise identical to yours, here: https://www.spinics.net/lists/linux-.../msg21543.html.

                    The relevant maximum permitted message length is IW_CUSTOM_MAX, which is defined to be 256. So the "Wireless Event too big" messages that I see are outside the scope of this patch.

                    The kernel message is being sourced from here, in wireless_send_event (I think): https://github.com/Logitech/squeezeo...s/wext.c#L1225

                    We might get a little more information by printing out the command as well, i.e. by replacing:
                    Code:
                    printk(KERN_ERR "%s (WE) : Wireless Event too big (%d)\n", dev->name, wrqu->data.length);
                    with
                    Code:
                    printk(KERN_ERR "%s (WE) : Wireless Event (cmd=0x%04X) too big (%d)\n", dev->name, cmd, wrqu->data.length);
                    The replacement being motivated by this patch taken up within the current kernel code:


                    Perhaps something to slip in on a next test build. It might help track it down, even though it may not actually matter.

                    Comment


                    • Originally posted by POMdev
                      After downloading and installing the main (staging) branch (the latest release or the development branch is recommended instead), I did not observe the "sh: 169: unknown operand" error. I don't know what that '169' is: a line number, an operand? There is a variable test against '169' in the script ($IPFIRST == "169") to test for an unusable auto configuration address, but I don't see how '169' could become an operand, and line 169 of the script is a comment... So, to troubleshoot, how and when do you see this error? At launch? Periodically? What is the launch command line in rcS.local? Are you running a netcat (nc or ncat) tcp logging server? Any other info?
                      Hello, it has been a long time since I was tinkering with Linux command line actions, so I only did the minimum as per your manual.txt file to get the script installed. I just SSH'd in and the "sh: 169: unknown operand" message has not appeared. The SB has been rebooted a number of times during the week, as it couldn't find the LMS.

                      LMS is running on PCP, which is also on the wireless network. The PCP is on the 5 GHz network.

                      To upgrade the script to 0.7.6 do I need to overwrite the contents of the install directory? Or just selectively replace certain files? If the latter, which ones?
                      /EDIT - I overwrote the files updated in the last two days and managed to get the webserver process running - the output is attached. I thought I had the logging running ok, but it seems to have hung up after adding one line to netcat on my desktop.

                      Thank you for your continued efforts.

                      log 27mar21 1550gmt.txt
                      Last edited by Londonist; 2021-03-27, 16:53. Reason: update

                      Comment


                      • Originally posted by POMdev
                        If SqueezePlay restarts wpa_cli, I guess it is safe to do so in the script...??? What about restarting wpa_supplicant (thinking about that as a next script step)?
                        The following sequence of events seems to work. One point is that, in wpa_supplicant.conf, the current network is not "disabled", so, on restarting, wpa_supplicant will automatically attempt to reconnect. I think that's how it is, anyway.

                        wpa_cli will not run without wpa_supplicant running, basically (I think) because the control socket (/var/run/wpa_supplicant/eth1) is not present. Latest source code seems to add an -r option to keep it running regardless.

                        SqueezePlay appears to be unfazed, it seems to re-establish its connection to the control socket when wpa_supplicant is restarted. (See /tmp for extant sockets created by wpa_cli and SqueezePlay).

                        But I don't know how robust it all is...

                        Sourcing this script:
                        Code:
                        killall wpa_cli
                        killall wpa_supplicant
                        # Note - udhcpc is left running
                        /usr/sbin/wpa_supplicant -B -Dwext -ieth1 -c/etc/wpa_supplicant.conf
                        /usr/sbin/wpa_cli -B -a/etc/network/wpa_action
                        # I don't seem to need this, because it takes about 10 seconds
                        # for a reconnection to occur. So wpa_cli is already running
                        # by the time a "CONNECT" message is generated, and it will
                        # trigger wpa_action to do this. 
                        #kill -usr1 `cat /var/run/udhcpc.eth1.pid`
                        Gives this log output:
                        Code:
                        Mar 27 13:25:46 kernel: [62748.835724] eth1 (WE) : Wireless Event too big (33)
                        Mar 27 13:25:46 kernel: [62748.848882] AR6000 disconnected from xx:xx:xx:xx:xx:xx 
                        Mar 27 13:25:46 kernel: [62748.977828] eth1 (WE) : Wireless Event too big (33)
                        Mar 27 13:25:46 kernel: [62748.983375] AR6000 disconnected
                        Mar 27 13:25:46 kernel: [62749.544881] eth1 (WE) : Wireless Event too big (33)
                        Mar 27 13:25:46 kernel: [62749.549242] AR6000 disconnected
                        Mar 27 13:25:56 kernel: [62759.564458] AR6000 disconnected
                        Mar 27 13:25:57 kernel: [62759.823446] channel hint set to 2437
                        Mar 27 13:25:57 kernel: [62759.843344] AR6000 disconnected
                        Mar 27 13:25:57 kernel: [62759.903100] AR6000 connected event on freq 2437 with bssid xx:xx:xx:xx:xx:xx  listenInterval=100, beaconInterval = 100, beaconIeLen = 22 assocReqLen=62 assocRespLen =76
                        Mar 27 13:25:57 kernel: [62759.918990] Network: Infrastructure
                        Mar 27 13:25:57 root: wpa_action eth1 DISCONNECTED
                        Mar 27 13:25:57 root: wpa_action eth1 CONNECTED
                        Mar 27 13:25:57 udhcpc[7967]: Performing a DHCP renew
                        Mar 27 13:25:57 udhcpc[7967]: Sending renew...
                        Edit:
                        Here's a second log dump. It seemed to take rather longer to get fully going, I think.
                        Code:
                        Mar 27 13:33:55 kernel: [63238.320007] eth1 (WE) : Wireless Event too big (33)
                        Mar 27 13:33:55 kernel: [63238.335212] AR6000 disconnected from xx:xx:xx:xx:xx:xx 
                        Mar 27 13:33:55 kernel: [63238.451288] AR6000 disconnected
                        Mar 27 13:33:55 kernel: [63238.461172] eth1 (WE) : Wireless Event too big (33)
                        Mar 27 13:33:56 kernel: [63239.041798] eth1 (WE) : Wireless Event too big (33)
                        Mar 27 13:33:56 kernel: [63239.044869] AR6000 disconnected
                        Mar 27 13:34:06 kernel: [63249.061645] AR6000 disconnected
                        Mar 27 13:34:06 kernel: [63249.330126] channel hint set to 2437
                        Mar 27 13:34:06 kernel: [63249.342965] AR6000 disconnected
                        Mar 27 13:34:06 root: wpa_action eth1 DISCONNECTED
                        Mar 27 13:34:07 kernel: [63249.670334] AR6000 connected event on freq 2437 with bssid xx:xx:xx:xx:xx:xx  listenInterval=100, beaconInterval = 100, beaconIeLen = 22 assocReqLen=62 assocRespLen =76
                        Mar 27 13:34:07 kernel: [63249.685526] Network: Infrastructure
                        Mar 27 13:34:07 root: wpa_action eth1 CONNECTED
                        Mar 27 13:34:07 udhcpc[7967]: Performing a DHCP renew
                        Mar 27 13:34:07 root: udhcpc_action eth1 deconfig ip=
                        Mar 27 13:34:07 udhcpc[7967]: Sending discover...
                        Mar 27 13:34:10 udhcpc[7967]: Sending discover...
                        Mar 27 13:34:13 udhcpc[7967]: Sending discover...
                        Mar 27 13:34:15 kernel: [63257.889974] AR6000 connected event on freq 2437 with bssid xx:xx:xx:xx:xx:xx  listenInterval=100, beaconInterval = 100, beaconIeLen = 22 assocReqLen=62 assocRespLen =76
                        Mar 27 13:34:15 kernel: [63257.906164] Network: Infrastructure
                        Mar 27 13:34:17 root: udhcpc_action eth1 leasefail ip=
                        Mar 27 13:34:37 udhcpc[7967]: Sending discover...
                        Mar 27 13:34:37 udhcpc[7967]: Sending select for 192.168.1.64...
                        Mar 27 13:34:40 udhcpc[7967]: Lease of 192.168.1.64 obtained, lease time 86400
                        Mar 27 13:34:40 root: udhcpc_action eth1 bound ip=192.168.1.64
                        Last edited by mrw; 2021-03-27, 14:39. Reason: Additional log

                        Comment


                        • Originally posted by Londonist
                          Hello, it has been a long time since I was tinkering with Linux command line actions, so I only did the minimum as per your manual.txt file to get the script installed. I just SSH'd in and the "sh: 169: unknown operand" message has not appeared. The SB has been rebooted a number of times during the week, as it couldn't find the LMS.
                          ...
                          To upgrade the script to 0.7.6 do I need to overwrite the contents of the install directory? Or just selectively replace certain files? If the latter, which ones? ...
                          The instructions will overwrite the existing contents. "mv -f" will overwrite the existing files. (The "sh: 169: unknown operand" message was a bug fixed in 0.7.4. It showed up only occasionally.)

                          The SB has been rebooted a number of times during the week, as it couldn't find the LMS

                          That may be another matter entirely, but if so, we can try to help you solve it. Did ssh to the radio work? Could the radio ping the router? Could the ssh session ping the LMS machine? Could the LMS machine ping the radio? Could your desktop ping the LMS machine? What does your router show about the LMS machine? It may be that the LMS is also affected. But rebooting the radio solves the problem?

                          Comment


                          • Silly question perhaps, but has anyone taken a look at the upstream, fully GPLed driver for the AR600x family, 'ath6kl'?

                            Which exact model is used in the Radio?

                            Comment


                            • Originally posted by POMdev
                              The instructions will overwrite the existing contents. "mv -f" will overwrite the existing files. (The "sh: 169: unknown operand" message was a bug fixed in 0.7.4. It showed up only occasionally.)

                              The SB has been rebooted a number of times during the week, as it couldn't find the LMS

                              That may be another matter entirely, but if so, we can try to help you solve it. Did ssh to the radio work? Could the radio ping the router? Could the ssh session ping the LMS machine? Could the LMS machine ping the radio? Could your desktop ping the LMS machine? What does your router show about the LMS machine? It may be that the LMS is also affected. But rebooting the radio solves the problem?
                              I think the LMS is fine. I never have a problem finding the PCP main page/LMS via Squeezer on my phone.

                              Logging established on one of the SBR reboots - contents below.

                              1121.log

                              Comment


                              • Originally posted by mrw
                                The following sequence of events seems to work. One point is that, in wpa_supplicant.conf, the current network is not "disabled", so, on restarting, wpa_supplicant will automatically attempt to reconnect. I think that's how it is, anyway.
                                wpa_cli will not run without wpa_supplicant running, basically (I think) because the control socket (/var/run/wpa_supplicant/eth1) is not present. Latest source code seems to add an -r option to keep it running regardless.
                                SqueezePlay appears to be unfazed, it seems to re-establish its connection to the control socket when wpa_supplicant is restarted. (See /tmp for extant sockets created by wpa_cli and SqueezePlay).

                                But I don't know how robust it all is...

                                Sourcing this script: ... Gives this log output: ... Here's a second log dump. It seemed to take rather longer to get fully going, I think.
                                ...
                                What is your radio's signal level? That 45 seconds does seem like a long time. I set my radio on the TTL serial connection to the far-away AP for a Link Quality:32/94 Signal level:-63 dBm, pretty low, although some users have even lower. At boot, this radio connects after 29 seconds.
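
The 45 seconds is the gap between the first line of the second log dump (13:33:55) and the "bound" line (13:34:40). A small sketch for computing such gaps from syslog time stamps follows; the function names to_seconds and elapsed are made up for illustration, and the code assumes two-digit HH:MM:SS fields on the same day.

```shell
#!/bin/sh
# Hypothetical log analysis helpers for the desktop; not part of wlanpoke.
# to_seconds HH:MM:SS: print the time of day in seconds.
to_seconds() {
    h=${1%%:*}
    rest=${1#*:}
    m=${rest%%:*}
    s=${rest#*:}
    # Strip one leading zero so values like 08 are not parsed as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# elapsed START END: print the number of seconds from START to END.
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}
```

`elapsed 13:33:55 13:34:40` prints 45, while the first dump (13:25:46 to the renew at 13:25:57) gives 11.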

                                However, during testing of 0.8.0, and an "ifconfig eth1 down", some truly horrifying delays were observed:
                                Code:
                                Mar 26 21:23:53 root: wlan stopped
                                Mar 26 21:23:53 root: wlan: starting
                                					what's this?
                                Mar 26 21:31:06 root: Starting wpa_supplicant
                                Mar 26 21:31:10 root: Started wpa_supplicant
                                Mar 26 21:31:10 root: wlan started
                                I added additional logging to /etc/init.d/wlan (wlan_0801.zip, attached), because there were long times associated with killing and/or launching udhcpc.

                                Here, long delays are caused by the radio not completing the DHCP handshakes because of poor reception. Because DHCP can take so long, the 0.8.0 script has been modified to increase hold-off delays after a reset before attempting a subsequent full reset. The script will be in the development branch "soon."
                                ...
                                On another note, my environment here has become too quiet. The only real action is from the radio with the -63 dBm signal. What I need is one of those Wi-Fi 6 or mesh routers that are causing all the trouble. I will buy one, but I need a recommendation. A used one from last year that has not been updated would be perfect, as it might be the worst offender. And do I need a client as well? Please, I need everyone's input on this. Thank you.
                                Last edited by POMdev; 2021-03-27, 20:05.

                                Comment
