Home of the Squeezebox™ & Transporter® network music players.
Page 3 of 6 FirstFirst 12345 ... LastLast
Results 21 to 30 of 51
  1. #21
    Junior Member
    Join Date
    Dec 2020
    Posts
    15
    Quote Originally Posted by d6jg View Post
    There is definitely something in the one of the latest Windows update that causes some issues with networking where the Red Cross status on drive mapping occurs post boot up but doesn’t restore when you try to access. If you restart it then works. I’ve not got to the bottom of it yet.
    Also I’ve noted that IPv6 is forced on even if not enabled in DHCP and you will find an entry for :: in your DNS settings unless you switch it off.
    There is also an SMB 1.0 auto removal process that has been inserted.
    So when we are talking about Squeezelite failing I think we need to know what specific build of W10 we are talking about.
    Regarding Windoze10:
    * This issue exists on my two windows machines, both running the latest version 20H2, with all patches applied.
    * This issue has existed for 1-2 years, so it also existed on older windoze 10, it's not specific to the latest windows.
    * By chance, I have yesterday received my new Dell XPS-15 9500, so now I have a 3rd windoze computer I can use for testing. I set up the system, installed all patches, and upgraded to version 20H2, like the other machines.
    * I ran the test on the new system, and I confirmed that this issue also exists on this new machine. The issue is identical, and repeatable across all systems: the issue comes up after the first 5 minutes of operation, and the outage lasts for 30 seconds, and keeps coming up every 5, 10, 15 or 20 minutes after that.
    * On the Synology side, this happened when running LMS 7.9.7, and still exists after upgrading to the newer LMS 8.0.1

    Regarding network drive mappings:
    * For years, I have been annoyed by Windoze, every time I log in, my network drives are disconnected, and I see a message saying "some of your network drives could no be connected". I have to click on each drive to reactivate them.
    * With my Synology storage, (running Samba) I *never* have trouble reconnecting. I have NOT noticed this issue you mention: that you get the Red Cross (drive disconnected) upon bootup, but it won't reconnect, even when you select the drive.
    * As a test, I have disconnected all my drive mappings, and rerun the test. Outages happened 4 times over 45 minutes, at 10, 15, 30 and 40-minute marks. No change in overall behavior.

    Regarding IPv6:
    C:\Users\admin>ping nas
    Pinging nas.local [fe80::211:32ff:fe92:9693%4] with 32 bytes of data:
    Reply from fe80::211:32ff:fe92:9693%4: time<1ms
    Reply from fe80::211:32ff:fe92:9693%4: time<1ms
    We can see that DNS resolves its IPv6 address, but I can also ping it with IPv4:
    C:\Users\admin>ping 192.168.0.60
    Pinging 192.168.0.60 with 32 bytes of data:
    Reply from 192.168.0.60: bytes=32 time<1ms TTL=64
    Reply from 192.168.0.60: bytes=32 time<1ms TTL=64
    My Synology is set up with both, and I verified this in the configuration. To test whether IPv6 is causing issues, I have disabled IPv6 on the Synology, and rerun my test.
    Now when I ping it, it resolves as IPv4:
    C:\Users\admin>ping nas
    Pinging nas.local [192.168.0.60] with 32 bytes of data:
    Reply from 192.168.0.60: bytes=32 time<1ms TTL=64
    The outage happened as usual, this time, instead of after 5 minutes, it took 10 minutes before the first outage happened.
    Thus we can say that this issue is not affected by the presence of IPv6.

    Another test: install location.
    * I have installed SqueezeLite into C:\program files (x86)\SqueezeLite
    * This might be considered unusual, because it's just a command-line program with no installer. I use admin permission to copy the files there. This is just be consistent with where windoze puts its installed programs.
    * What if this causes some issue? Maybe it causes confusion for windoze, which needs things written to the registry when items are installed there. Or maybe the fact that the path contains whitespaces causes some issues for SqueezeLite, which also uses linux-type file systems that don't have spaces and are case-sensitive.
    * To put my mind at ease, I reran the test with the software installed in c:\temp\SqueezeLite
    * Results: Outage at 5 minute and 20 minute mark. No difference.

    I'll post separately with my testing with the new "test3" build.
    Last edited by foopydog; 2021-01-07 at 19:24.

  2. #22
    Junior Member
    Join Date
    Dec 2020
    Posts
    15
    Quote Originally Posted by ralphy View Post
    I uploaded 1318-test3 which prints the error code for the slimproto socket calls as well. You'll find a patch file included in that zip with the changes.
    Ok, so I got the test3 build, and ran some more tests.
    Looking at the patch file, I see that the changes. Now slimproto.send_packet() is now showing the last_error instead of strerror(last_error).
    The same change was made to stream.send_header().

    When I ran the test, the socket error came up once by itself, and the same error code is now reported in both functions:

    [15:16:35.171] process_strm:280 strm command t
    [15:16:35.180] sendSTAT:195 STAT: STMt
    [15:16:36.211] sendSTAT:195 STAT: STMt
    [15:16:37.231] sendSTAT:195 STAT: STMt
    [15:16:39.231] sendSTAT:195 STAT: STMt
    [15:18:04.091] stream_thread:404 end of stream
    [15:18:40.430] sendDSCO:214 DSCO: 0
    [15:18:40.430] send_packet:114 failed writing to socket: 2745
    [15:18:40.430] sendSTAT:195 STAT: STMt
    [15:18:40.430] send_packet:114 failed writing to socket: 2745
    [15:18:40.430] slimproto_run:563 error reading from socket: 2745
    [15:18:40.571] slimproto:932 connected
    [15:18:40.571] sendHELO:148 mac: dc:41:a9:a3:eb:a1
    [15:18:40.571] sendHELO:150 cap: CanHTTPS=1,Model=squeezelite,AccuratePlayPoints=1, HasDigitalOut=1,HasPolarityInversion=1,Firmware=v1 .9.8-1318-test3,ModelName=SqueezeLite,MaxSampleRate=384000,d sf,dff,alc,aac,ogg,ops,ogf,flc,aif,pcm,mp3

    The error code 2745 was reported now in both functions, which is no surprise. I tried searching the web for .h files with these error-codes but I couldn't find it.
    I'd be curious to know what this error code means.

    Today, I did a whole lot of testing of various scenarios to try to narrow down the problem a bit further, but so far, the problem remains elusive.

  3. #23
    Senior Member ralphy's Avatar
    Join Date
    Jan 2006
    Location
    Canada
    Posts
    2,841
    The error codes are documented by microsoft here.

    I'm displaying the error code in hex, I should have prefixed it with 0x2745

    But the description is vague.

    WSAECONNABORTED

    10053 (0x2745)

    An established connection was aborted by the software in your host machine.

    This site suggests it indicates that The connection has been closed.

    Thanks for trying the different builds. I need to investigate if anything can be done to gracefully handle the error and why the call to convert the error code to a message returns Unknown error.
    Ralphy

    1-Touch, 5-Classics, 3-Booms, 2-UE Radio
    Squeezebox client builds donations always appreciated.

  4. #24
    Senior Member
    Join Date
    Dec 2020
    Posts
    126
    Yeah, that was one of the two returns I expected to see.

    The `vague description`is somewhat of a trademark for Microsoft - I guess it helps them sell courses in understanding WTF might be going on and in fact most people that do are very proud to be able to call themselves MSCE when they pass. What you need to understand here is that this error is a client-side generated error. Effectively what it means is that something between your application and the server, but not the server itself, actively says 'NO'.

    Yes that is still pretty vague, but if that Pi you own is connected through the same switch to the LMS host that pretty much confines the issue to the Windows machine and you'll most likely be looking at some kind of malware blocker.
    Last edited by gordonb3; 2021-01-08 at 13:12.

  5. #25
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    20,346
    Quote Originally Posted by gordonb3 View Post
    you'll most likely be looking at some kind of malware blocker.
    A few years ago - some 3rd party security software used to insert themselves into the Winsock stack to catch malware which used "pipes" - a few caused strange problem with LMS.
    To check if the security software was the problem, it had to be uninstalled, disabling it was not sufficient to stop its bad behaviour.

    Have you got non Microsoft security software installed ?

  6. #26
    Senior Member
    Join Date
    Feb 2011
    Location
    Cheshire, UK
    Posts
    5,728
    Quote Originally Posted by foopydog View Post
    Ok, so I got the test3 build, and ran some more tests.
    Looking at the patch file, I see that the changes. Now slimproto.send_packet() is now showing the last_error instead of strerror(last_error).
    The same change was made to stream.send_header().

    When I ran the test, the socket error came up once by itself, and the same error code is now reported in both functions:

    [15:16:35.171] process_strm:280 strm command t
    [15:16:35.180] sendSTAT:195 STAT: STMt
    [15:16:36.211] sendSTAT:195 STAT: STMt
    [15:16:37.231] sendSTAT:195 STAT: STMt
    [15:16:39.231] sendSTAT:195 STAT: STMt
    [15:18:04.091] stream_thread:404 end of stream
    [15:18:40.430] sendDSCO:214 DSCO: 0
    [15:18:40.430] send_packet:114 failed writing to socket: 2745
    [15:18:40.430] sendSTAT:195 STAT: STMt
    [15:18:40.430] send_packet:114 failed writing to socket: 2745
    [15:18:40.430] slimproto_run:563 error reading from socket: 2745
    [15:18:40.571] slimproto:932 connected
    [15:18:40.571] sendHELO:148 mac: dc:41:a9:a3:eb:a1
    [15:18:40.571] sendHELO:150 cap: CanHTTPS=1,Model=squeezelite,AccuratePlayPoints=1, HasDigitalOut=1,HasPolarityInversion=1,Firmware=v1 .9.8-1318-test3,ModelName=SqueezeLite,MaxSampleRate=384000,d sf,dff,alc,aac,ogg,ops,ogf,flc,aif,pcm,mp3

    The error code 2745 was reported now in both functions, which is no surprise. I tried searching the web for .h files with these error-codes but I couldn't find it.
    I'd be curious to know what this error code means.

    Today, I did a whole lot of testing of various scenarios to try to narrow down the problem a bit further, but so far, the problem remains elusive.
    Youĺve misunderstood what I meant on ipv6.
    IPv6 is not enabled on the network. There is no ipv6 dns server.
    When ipV6 is switched on on the client a rogue ipv6 dns server :: gets inserted in your dns server settings as the default viz

    ::
    8.8.8.8
    8.8.4.4

    nslookup then fails to find anything

    This is new
    VB2.4 storage QNAP TS419p (NFS)
    Living Room Joggler & Pi4/Khadas -> Onkyo TXNR686 -> Celestion F20s
    Office Joggler & Pi3 -> Denon RCD N8 -> Celestion F10s
    Dining Room SB Boom
    Kitchen UE Radio (upgraded to SB Radio)
    Bedroom (Bedside) Pi Zero+DAC ->ToppingTP21 ->AKG Headphones
    Bedroom (TV) & Bathroom SB Touch ->Denon AVR ->Mordaunt Short M10s + Kef ceiling speakers
    Guest Room Joggler > Topping Amp -> Wharfedale Modus Cubes
    Everything controlled by iPeng & Material on iOS

  7. #27
    Junior Member
    Join Date
    Jan 2021
    Posts
    15
    Quote Originally Posted by ralphy View Post
    The error codes are documented by microsoft here.

    I'm displaying the error code in hex, I should have prefixed it with 0x2745

    But the description is vague.

    WSAECONNABORTED

    10053 (0x2745)

    An established connection was aborted by the software in your host machine.

    This site suggests it indicates that The connection has been closed.

    Thanks for trying the different builds. I need to investigate if anything can be done to gracefully handle the error and why the call to convert the error code to a message returns Unknown error.
    I'm also experiencing dropouts. While reproducible, the exact timing of the dropouts depend on circumstances like stream buffer size and mp3 bitrate.
    When loging to a terminal with -d all=sdebug the dropouts do occur. When logging same to a file with -f <logfile> however, the dropouts are gone. When they occur, one thread in squeezelite starts using nearly 100% of one cpu core, above 95% of that in kernel mode. Following the onset of 100% cpu usage the stream buffer runs empty and then the dropout occurs. To see this I had to run squeezelite from tailored program, redirecting the error stream, logging to memory and writing it all to a file after the tests so as not to alter squeezelite's internal timing. I tested with squeezelite v1.9.8-1307 (the test versions like 1318-test3 seem to be gone).

    To me, what I found, points to a possibly spurious race condition, which may be hard to reproduce on a specific machine. E.g. I can avoid triggering the dropouts when using a big stream buffer like -b 16384. It may very well be, that error 0x2745 is a consequence of a race condition as opposed to a cause.
    I could do some more testing if you would like to give me some pointers where to look further.
    Last edited by fpo; 2021-01-11 at 20:41.

  8. #28
    Senior Member ralphy's Avatar
    Join Date
    Jan 2006
    Location
    Canada
    Posts
    2,841
    Quote Originally Posted by fpo View Post
    I'm also experiencing dropouts. While reproducible, the exact timing of the dropouts depend on circumstances like stream buffer size and mp3 bitrate.
    When loging to a terminal with -d all=sdebug the dropouts do occur. When logging same to a file with -f <logfile> however, the dropouts are gone. When they occur, one thread in squeezelite starts using nearly 100% of one cpu core, above 95% of that in kernel mode. Following the onset of 100% cpu usage the stream buffer runs empty and then the dropout occurs. To see this I had to run squeezelite from tailored program, redirecting the error stream, logging to memory and writing it all to a file after the tests so as not to alter squeezelite's internal timing. I tested with squeezelite v1.9.8-1307 (the test versions like 1318-test3 seem to be gone).

    To me, what I found, points to a possibly spurious race condition, which may be hard to reproduce on a specific machine. E.g. I can avoid triggering the dropouts when using a big stream buffer like -b 16384. It may very well be, that error 0x2745 is a consequence of a race condition as opposed to a cause.
    I could do some more testing if you would like to give me some pointers where to look further.
    Thank you for taking the time to look into this.

    Yes, I did remove the test versions and have adding new r1323 32/64 windows binaries.

    I don't use windows for much of anything, so I'm not sure what to suggest at this point.
    Last edited by ralphy; 2021-01-13 at 06:29.
    Ralphy

    1-Touch, 5-Classics, 3-Booms, 2-UE Radio
    Squeezebox client builds donations always appreciated.

  9. #29
    Junior Member
    Join Date
    Jan 2021
    Posts
    15
    Quote Originally Posted by ralphy View Post
    ...
    Yes, I did remove the test versions and have adding new r1323 32/64 windows binaries.
    ...
    I tested with v1.9.8.1323/64. Sometimes a title runs through till the end, but in most cases still dropouts after 3 minutes. Symptom is again 100% cpu time on one core.

    Testing with "-b256 -d all=sdebug" I see lines like

    [16:20:37.944] stream_thread:428 streambuf read 3756 bytes

    before the onset of 100% cpu time but not thereafter. The streambuffer runs empty, then the ouput buffer, then the dropout.

    [16:20:38.044] decode_thread:75 streambuf bytes: 262143 outputbuf space: 241055
    ...
    [16:20:48.468] decode_thread:75 streambuf bytes: 1913 outputbuf space: 180431
    [16:20:48.538] sendSTAT:166 ms_played: 173186 (frames_played: 7637537 device_frames: 0)
    [16:20:48.538] sendSTAT:195 STAT: STMt
    [16:20:48.538] sendSTAT:200 received bytesL: 4469675 streambuf: 1913 outputbuf: 3339376 calc elapsed: 173186 real elapsed: 173203 (diff: -17) device: 0 delay: 0
    ...
    [16:20:58.487] decode_thread:75 streambuf bytes: 1913 outputbuf space: 3527999
    [16:20:58.517] sendSTAT:166 ms_played: 182621 (frames_played: 8055983 device_frames: 3072)
    [16:20:58.517] sendSTAT:195 STAT: STMt
    [16:20:58.517] sendSTAT:200 received bytesL: 4469675 streambuf: 1913 outputbuf: 0 calc elapsed: 182621 real elapsed: 183172 (diff: -551) device: 69 delay: 16
    [16:20:58.517] sendSTAT:166 ms_played: 182621 (frames_played: 8055983 device_frames: 3072)

    The 100% cpu time is from one thread. Using ProcessExplorer this thread's start address is
    squeezelite-x64.exe+0x88790
    This thread usually is way below 1% cpu time.

    Attaching Visual Studio Debugger and halting squeezelite while in 100% cpu mode shows alternate locations
    squeezelite-x64.exe!00007ff7f06d8a31
    mswsock.dll!00007ffa4e96c5fe

    This seems to point to a continued polling originating from a squeezelite thread.
    So much towards the symptoms I see.

    Reasoning over it:
    Whatever the problem is, squeezelite should not be producing 100% cpu time, most of which is in kernel mode, since that in itself could lead to all sorts of strange problems from as of yet unknown race conditions. These race conditions could originate from code one has no control over.

    My strategy would be:
    - find the source/reason fo the continued polling
    - insert a sleep with randomized duration (e.g. 5..50ms)
    This could already solve the problem in case there is a hidden race condition that is only triggered when cpu time is tight.
    In case one cannot recover
    - abort the polling after some time (e.g. 5 seconds)
    - reset the connection
    - start the stream anew and skip to the current point in the streambuffer (that I guess would be a major change)

    In any case thanks for your time doing this for all of us for free.
    It is much appreciated.

  10. #30
    Junior Member
    Join Date
    Dec 2020
    Posts
    15
    Quote Originally Posted by bpa View Post
    A few years ago - some 3rd party security software used to insert themselves into the Winsock stack to catch malware which used "pipes" - a few caused strange problem with LMS.
    To check if the security software was the problem, it had to be uninstalled, disabling it was not sufficient to stop its bad behaviour.

    Have you got non Microsoft security software installed ?
    From the OP: On all my Windoze machines, the Microsoft-provided security software is the only one I have. I don't have any 3rd party anti-virus or anti-malware software installed. As previously noted, I also ran the same test on a brand-new laptop with not much software installed on it. I agree that Windoze OS is implicated, but what's causing it? We've seen the WSAECONNABORTED which only tells us what we already know, which is that the connection was closed, but why was it closed?

    I think that the suggestion above points towards the cause. The Squeezelite client has a socket open, and suddenly, there is no socket. Consider: SqueezeLite on RaspberryPi with TinyLinux has no issues, but only Windows machines have the issue. While the source code for SqueezeLite is (almost) identical between these two OS, they have very different TCP/IP transport implementations. It is therefore plausible that the root cause is in the way SqueezeLite manages its socket connections on Windows -- or the Winsock itself. (utils.c, stream.c, slimproto.c)
    For example:
    Code:
    #if WIN
    	u_long iMode = 1;
    	ioctlsocket(s, FIONBIO, &iMode);
    #else
    	int flags = fcntl(s, F_GETFL,0);
    	fcntl(s, F_SETFL, flags | O_NONBLOCK);
    #endif
    Here we can see the difference between Windows and Linux implementations.

    Since I'm the only crazy person that seems to have this problem, I need to find one build of Windows that doesn't show this problem. That could be done by stripping Windows down, piece by piece, until the problem disappears. Or, I could stand up a Windows VM, either Windows7 or an older Windows10 with no software installed. Right now, I don't have Hyper-V or VirtualBox set up, so that could take some time. I'll try that later tonight. And if anyone else has any other suggestions, i'm all ears.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •