Regarding Windoze10:
* This issue exists on my two windows machines, both running the latest version 20H2, with all patches applied.
* This issue has existed for 1-2 years, so it also existed on older windoze 10, it's not specific to the latest windows.
* By chance, I have yesterday received my new Dell XPS-15 9500, so now I have a 3rd windoze computer I can use for testing. I set up the system, installed all patches, and upgraded to version 20H2, like the other machines.
* I ran the test on the new system, and I confirmed that this issue also exists on this new machine. The issue is identical, and repeatable across all systems: the issue comes up after the first 5 minutes of operation, and the outage lasts for 30 seconds, and keeps coming up every 5, 10, 15 or 20 minutes after that.
* On the Synology side, this happened when running LMS 7.9.7, and still exists after upgrading to the newer LMS 8.0.1
Regarding network drive mappings:
* For years, I have been annoyed by Windoze, every time I log in, my network drives are disconnected, and I see a message saying "some of your network drives could no be connected". I have to click on each drive to reactivate them.
* With my Synology storage, (running Samba) I *never* have trouble reconnecting. I have NOT noticed this issue you mention: that you get the Red Cross (drive disconnected) upon bootup, but it won't reconnect, even when you select the drive.
* As a test, I have disconnected all my drive mappings, and rerun the test. Outages happened 4 times over 45 minutes, at 10, 15, 30 and 40-minute marks. No change in overall behavior.
Regarding IPv6:
C:\Users\admin>ping nasWe can see that DNS resolves its IPv6 address, but I can also ping it with IPv4:
Pinging nas.local [fe80::211:32ff:fe92:9693%4] with 32 bytes of data:
Reply from fe80::211:32ff:fe92:9693%4: time<1ms
Reply from fe80::211:32ff:fe92:9693%4: time<1ms
C:\Users\admin>ping 192.168.0.60My Synology is set up with both, and I verified this in the configuration. To test whether IPv6 is causing issues, I have disabled IPv6 on the Synology, and rerun my test.
Pinging 192.168.0.60 with 32 bytes of data:
Reply from 192.168.0.60: bytes=32 time<1ms TTL=64
Reply from 192.168.0.60: bytes=32 time<1ms TTL=64
Now when I ping it, it resolves as IPv4:
C:\Users\admin>ping nasThe outage happened as usual, this time, instead of after 5 minutes, it took 10 minutes before the first outage happened.
Pinging nas.local [192.168.0.60] with 32 bytes of data:
Reply from 192.168.0.60: bytes=32 time<1ms TTL=64
Thus we can say that this issue is not affected by the presence of IPv6.
Another test: install location.
* I have installed SqueezeLite into C:\program files (x86)\SqueezeLite
* This might be considered unusual, because it's just a command-line program with no installer. I use admin permission to copy the files there. This is just be consistent with where windoze puts its installed programs.
* What if this causes some issue? Maybe it causes confusion for windoze, which needs things written to the registry when items are installed there. Or maybe the fact that the path contains whitespaces causes some issues for SqueezeLite, which also uses linux-type file systems that don't have spaces and are case-sensitive.
* To put my mind at ease, I reran the test with the software installed in c:\temp\SqueezeLite
* Results: Outage at 5 minute and 20 minute mark. No difference.
I'll post separately with my testing with the new "test3" build.
Results 21 to 30 of 51
-
2021-01-07, 16:48 #21
- Join Date
- Dec 2020
- Posts
- 15
Last edited by foopydog; 2021-01-07 at 19:24.
-
2021-01-07, 19:21 #22
- Join Date
- Dec 2020
- Posts
- 15
Ok, so I got the test3 build, and ran some more tests.
Looking at the patch file, I see that the changes. Now slimproto.send_packet() is now showing the last_error instead of strerror(last_error).
The same change was made to stream.send_header().
When I ran the test, the socket error came up once by itself, and the same error code is now reported in both functions:
[15:16:35.171] process_strm:280 strm command t
[15:16:35.180] sendSTAT:195 STAT: STMt
[15:16:36.211] sendSTAT:195 STAT: STMt
[15:16:37.231] sendSTAT:195 STAT: STMt
[15:16:39.231] sendSTAT:195 STAT: STMt
[15:18:04.091] stream_thread:404 end of stream
[15:18:40.430] sendDSCO:214 DSCO: 0
[15:18:40.430] send_packet:114 failed writing to socket: 2745
[15:18:40.430] sendSTAT:195 STAT: STMt
[15:18:40.430] send_packet:114 failed writing to socket: 2745
[15:18:40.430] slimproto_run:563 error reading from socket: 2745
[15:18:40.571] slimproto:932 connected
[15:18:40.571] sendHELO:148 mac: dc:41:a9:a3:eb:a1
[15:18:40.571] sendHELO:150 cap: CanHTTPS=1,Model=squeezelite,AccuratePlayPoints=1, HasDigitalOut=1,HasPolarityInversion=1,Firmware=v1 .9.8-1318-test3,ModelName=SqueezeLite,MaxSampleRate=384000,d sf,dff,alc,aac,ogg,ops,ogf,flc,aif,pcm,mp3
The error code 2745 was reported now in both functions, which is no surprise. I tried searching the web for .h files with these error-codes but I couldn't find it.
I'd be curious to know what this error code means.
Today, I did a whole lot of testing of various scenarios to try to narrow down the problem a bit further, but so far, the problem remains elusive.
-
2021-01-08, 05:18 #23
The error codes are documented by microsoft here.
I'm displaying the error code in hex, I should have prefixed it with 0x2745
But the description is vague.
WSAECONNABORTED
10053 (0x2745)
An established connection was aborted by the software in your host machine.
This site suggests it indicates that The connection has been closed.
Thanks for trying the different builds. I need to investigate if anything can be done to gracefully handle the error and why the call to convert the error code to a message returns Unknown error.Ralphy
1-Touch, 5-Classics, 3-Booms, 2-UE Radio
Squeezebox client builds donations always appreciated.
-
2021-01-08, 13:09 #24
- Join Date
- Dec 2020
- Posts
- 79
Yeah, that was one of the two returns I expected to see.
The `vague description`is somewhat of a trademark for Microsoft - I guess it helps them sell courses in understanding WTF might be going on and in fact most people that do are very proud to be able to call themselves MSCE when they pass. What you need to understand here is that this error is a client-side generated error. Effectively what it means is that something between your application and the server, but not the server itself, actively says 'NO'.
Yes that is still pretty vague, but if that Pi you own is connected through the same switch to the LMS host that pretty much confines the issue to the Windows machine and you'll most likely be looking at some kind of malware blocker.Last edited by gordonb3; 2021-01-08 at 13:12.
-
2021-01-08, 13:38 #25
- Join Date
- Oct 2005
- Location
- Ireland
- Posts
- 20,090
A few years ago - some 3rd party security software used to insert themselves into the Winsock stack to catch malware which used "pipes" - a few caused strange problem with LMS.
To check if the security software was the problem, it had to be uninstalled, disabling it was not sufficient to stop its bad behaviour.
Have you got non Microsoft security software installed ?
-
2021-01-08, 14:26 #26
- Join Date
- Feb 2011
- Location
- Cheshire, UK
- Posts
- 5,437
You’ve misunderstood what I meant on ipv6.
IPv6 is not enabled on the network. There is no ipv6 dns server.
When ipV6 is switched on on the client a rogue ipv6 dns server :: gets inserted in your dns server settings as the default viz
::
8.8.8.8
8.8.4.4
nslookup then fails to find anything
This is newVB2.4 storage QNAP TS419p (NFS)
Living Room Joggler & Pi4/Khadas -> Onkyo TXNR686 -> Celestion F20s
Office Joggler & Pi3 -> Denon RCD N8 -> Celestion F10s
Dining Room SB Boom
Kitchen UE Radio (upgraded to SB Radio)
Bedroom (Bedside) Pi Zero+DAC ->ToppingTP21 ->AKG Headphones
Bedroom (TV) & Bathroom SB Touch ->Denon AVR ->Mordaunt Short M10s + Kef ceiling speakers
Guest Room Joggler > Topping Amp -> Wharfedale Modus Cubes
Everything controlled by iPeng & Material on iOS
-
2021-01-11, 20:04 #27
- Join Date
- Jan 2021
- Posts
- 15
I'm also experiencing dropouts. While reproducible, the exact timing of the dropouts depend on circumstances like stream buffer size and mp3 bitrate.
When loging to a terminal with -d all=sdebug the dropouts do occur. When logging same to a file with -f <logfile> however, the dropouts are gone. When they occur, one thread in squeezelite starts using nearly 100% of one cpu core, above 95% of that in kernel mode. Following the onset of 100% cpu usage the stream buffer runs empty and then the dropout occurs. To see this I had to run squeezelite from tailored program, redirecting the error stream, logging to memory and writing it all to a file after the tests so as not to alter squeezelite's internal timing. I tested with squeezelite v1.9.8-1307 (the test versions like 1318-test3 seem to be gone).
To me, what I found, points to a possibly spurious race condition, which may be hard to reproduce on a specific machine. E.g. I can avoid triggering the dropouts when using a big stream buffer like -b 16384. It may very well be, that error 0x2745 is a consequence of a race condition as opposed to a cause.
I could do some more testing if you would like to give me some pointers where to look further.Last edited by fpo; 2021-01-11 at 20:41.
-
2021-01-13, 05:30 #28
Last edited by ralphy; 2021-01-13 at 06:29.
Ralphy
1-Touch, 5-Classics, 3-Booms, 2-UE Radio
Squeezebox client builds donations always appreciated.
-
2021-01-13, 10:34 #29
- Join Date
- Jan 2021
- Posts
- 15
I tested with v1.9.8.1323/64. Sometimes a title runs through till the end, but in most cases still dropouts after 3 minutes. Symptom is again 100% cpu time on one core.
Testing with "-b256 -d all=sdebug" I see lines like
[16:20:37.944] stream_thread:428 streambuf read 3756 bytes
before the onset of 100% cpu time but not thereafter. The streambuffer runs empty, then the ouput buffer, then the dropout.
[16:20:38.044] decode_thread:75 streambuf bytes: 262143 outputbuf space: 241055
...
[16:20:48.468] decode_thread:75 streambuf bytes: 1913 outputbuf space: 180431
[16:20:48.538] sendSTAT:166 ms_played: 173186 (frames_played: 7637537 device_frames: 0)
[16:20:48.538] sendSTAT:195 STAT: STMt
[16:20:48.538] sendSTAT:200 received bytesL: 4469675 streambuf: 1913 outputbuf: 3339376 calc elapsed: 173186 real elapsed: 173203 (diff: -17) device: 0 delay: 0
...
[16:20:58.487] decode_thread:75 streambuf bytes: 1913 outputbuf space: 3527999
[16:20:58.517] sendSTAT:166 ms_played: 182621 (frames_played: 8055983 device_frames: 3072)
[16:20:58.517] sendSTAT:195 STAT: STMt
[16:20:58.517] sendSTAT:200 received bytesL: 4469675 streambuf: 1913 outputbuf: 0 calc elapsed: 182621 real elapsed: 183172 (diff: -551) device: 69 delay: 16
[16:20:58.517] sendSTAT:166 ms_played: 182621 (frames_played: 8055983 device_frames: 3072)
The 100% cpu time is from one thread. Using ProcessExplorer this thread's start address is
squeezelite-x64.exe+0x88790This thread usually is way below 1% cpu time.
Attaching Visual Studio Debugger and halting squeezelite while in 100% cpu mode shows alternate locations
squeezelite-x64.exe!00007ff7f06d8a31
mswsock.dll!00007ffa4e96c5fe
This seems to point to a continued polling originating from a squeezelite thread.
So much towards the symptoms I see.
Reasoning over it:
Whatever the problem is, squeezelite should not be producing 100% cpu time, most of which is in kernel mode, since that in itself could lead to all sorts of strange problems from as of yet unknown race conditions. These race conditions could originate from code one has no control over.
My strategy would be:
- find the source/reason fo the continued polling
- insert a sleep with randomized duration (e.g. 5..50ms)
This could already solve the problem in case there is a hidden race condition that is only triggered when cpu time is tight.
In case one cannot recover
- abort the polling after some time (e.g. 5 seconds)
- reset the connection
- start the stream anew and skip to the current point in the streambuffer (that I guess would be a major change)
In any case thanks for your time doing this for all of us for free.
It is much appreciated.
-
2021-01-13, 11:16 #30
- Join Date
- Dec 2020
- Posts
- 15
From the OP: On all my Windoze machines, the Microsoft-provided security software is the only one I have. I don't have any 3rd party anti-virus or anti-malware software installed. As previously noted, I also ran the same test on a brand-new laptop with not much software installed on it. I agree that Windoze OS is implicated, but what's causing it? We've seen the WSAECONNABORTED which only tells us what we already know, which is that the connection was closed, but why was it closed?
I think that the suggestion above points towards the cause. The Squeezelite client has a socket open, and suddenly, there is no socket. Consider: SqueezeLite on RaspberryPi with TinyLinux has no issues, but only Windows machines have the issue. While the source code for SqueezeLite is (almost) identical between these two OS, they have very different TCP/IP transport implementations. It is therefore plausible that the root cause is in the way SqueezeLite manages its socket connections on Windows -- or the Winsock itself. (utils.c, stream.c, slimproto.c)
For example:
Code:#if WIN u_long iMode = 1; ioctlsocket(s, FIONBIO, &iMode); #else int flags = fcntl(s, F_GETFL,0); fcntl(s, F_SETFL, flags | O_NONBLOCK); #endif
Since I'm the only crazy person that seems to have this problem, I need to find one build of Windows that doesn't show this problem. That could be done by stripping Windows down, piece by piece, until the problem disappears. Or, I could stand up a Windows VM, either Windows7 or an older Windows10 with no software installed. Right now, I don't have Hyper-V or VirtualBox set up, so that could take some time. I'll try that later tonight. And if anyone else has any other suggestions, i'm all ears.