Thanks for the explanation around the unknown error strings. I'll look into that for the 64-bit build. For this session, I'm building 32-bit only and squeezelite still can't resolve the error code to a string.
In an effort to determine what the unknown socket errors are, I've uploaded an r1318-test2 build which prints the hex error value instead of trying to convert it to a string.
Would you try this build in debug mode?
Thanks,
-
2021-01-03, 06:17 #11
Last edited by ralphy; 2021-01-03 at 06:50.
Ralphy
1-Touch, 5-Classics, 3-Booms, 2-UE Radio
Squeezebox client builds donations always appreciated.
-
2021-01-04, 11:06 #12
- Join Date
- Dec 2020
- Posts
- 15
No, this is the 32-bit build 1313 which ralphy put up to help troubleshoot this issue.
>> Is it possible that your hard drives are set to park when they are not in use?
Yes. Here is how they are set up:
Hibernation happens after 20 minutes, and I have also set up logging for this. Looking at the logs, I see the last hibernation activity was 8 months ago. That makes sense, because this page: https://www.synology.com/en-global/k...re_hibernation
explains that certain services, such as the Logitech Media Server need constant access to disk, so if they are running, hibernation usually doesn't happen, and that is the case with me.
>>applications report 8-byte length winsock errors that need to be masked 0x0000FFFF to resolve to a usable 10xxx decimal error code.
In that case, it would be helpful to use a preprocessor directive (#if/#ifdef) to check for 32/64 bitness, and use the suggested logic to resolve the error code. I hope ralphy can easily make the change, and I'll rerun my test to see if I can get that error code.
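As a rough illustration of the masking described in the quote above, here is a minimal Python sketch. Squeezelite itself is written in C; Python is used only to show the arithmetic. The function name `mask_winsock_error` and the sample raw value are hypothetical and not taken from the squeezelite source.

```python
WINSOCK_MASK = 0x0000FFFF  # mask value quoted above


def mask_winsock_error(raw: int) -> int:
    """Keep only the low 16 bits, where the 10xxx Winsock code lives."""
    return raw & WINSOCK_MASK


# Hypothetical raw value: stray high bits above a 10060 error code.
raw_value = (0xABCD << 16) | 10060
print(mask_winsock_error(raw_value))  # 10060
```

A value that is already a plain 10xxx code passes through the mask unchanged, so applying it unconditionally should be harmless.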
>> if you have installed `green` drives they will actually do that themselves without the OS telling them.
I am aware that these desktop-type drives have this power-saving 'feature' which causes problems for NAS. I have Toshiba NAS drives, and these are not the 'green' type drives.
Looking at the hard drives:
admin@nas:~$ sudo smartctl -a /dev/sda
smartctl 6.5 (build date May 7 2020) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: TOSHIBA
Product: HDWN160
Revision: FS1M
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x50000398ac581f2a
Serial number: 78NUK021FAXG
Device type: disk
Local Time is: Mon Jan 4 09:47:49 2021 PST
SMART support is: Unavailable - device lacks SMART capability.
=== START OF READ SMART DATA SECTION ===
Current Drive Temperature: 0 C
Drive Trip Temperature: 0 C
Error Counter logging not supported
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Here we don't see all the information, but with the Synology Storage Manager I can see SMART and other info about my 4 disks in raid-5.
We can see that the Synology reports good health, and this is good for all drives. With 15,000 hours on the drives, they have had about 2 years of operation. I run a monthly health report on the drives, and they have always reported good health. There is no reason to think that these issues are related to drive health.
Also:
admin@nas:~$ sudo btrfs device stats .
[/dev/mapper/cachedev_0].write_io_errs 0
[/dev/mapper/cachedev_0].read_io_errs 0
[/dev/mapper/cachedev_0].flush_io_errs 0
[/dev/mapper/cachedev_0].corruption_errs 0
[/dev/mapper/cachedev_0].generation_errs 0
I'm going to play with Wireshark to see if I can gain some insight into the TCP activity surrounding this outage issue.
Thanks
-
2021-01-04, 12:41 #13
- Join Date
- Dec 2020
- Posts
- 86
Oh... That's pretty interesting. Apparently the Synology shell is something of a virtualization layer on top of the actual OS. Shame that the SSM tool does not display vital drive information like LCC, but with a 20 minute sleep setting I would not expect issues with park/unpark. Should this be the case anyway you should definitely be able to identify it by a ticking sound coming from the unit. Do note that RAID 5 is a three disk configuration, so one of your disks must function as a spare. Does SSM tell you which one is which, i.e. which one is the spare? Two types of logic might apply here, just follow the numerical order or skip one of the centre two disks to reduce heat exchange between the disks, so drawing any conclusions from it will most likely prove wrong, but I would be very suspicious if the spare happened to be sda.
That said, I think we may have gotten a bit off-track here. I was wondering if you could verify this behaviour with the Duet you own, and then noticed that you had mentioned running piCorePlayer as well and not seeing the issue there. That implies that the problem is on the client side, not the server. It may be interesting to verify TCP connectivity on other protocols, in particular SMB. Do you have active drive mappings to the Synology on the Windows client? If not, create one and keep an Explorer window with folder listing open without clicking the mapped drive. If after some time (probably about 5 minutes or so) you see a red X appear over the mapped drive this means that the TCP connection has actively been disconnected, which should not happen.
Last edited by gordonb3; 2021-01-04 at 12:44.
-
2021-01-05, 05:24 #14
Have you had a chance to try this?
Ralphy
-
2021-01-05, 14:26 #15
- Join Date
- Dec 2020
- Posts
- 15
>> Apparently the Synology shell is something of a virtualization layer on top of the actual OS.
Yes, sort of. The user interface for Synology DSM runs on Linux. It's not a virtualization layer as such, but just a user interface on top of regular Linux. In the case of my Storage Manager, it does provide a bit more insight than smartctl, but that's probably because Synology developed other tools to manage the RAID array. They call it "SHR - Synology Hybrid Raid", which is like regular RAID, but allows different types of disk, and volumes can grow/shrink.
>> Do note that RAID 5 is a three disk configuration, so one of your disks must function as a spare. Does SSM tell you which one is which, i.e. which one is the spare?
Actually that's not correct. Raid-5 has one-disk redundancy, but there is no concept of a "spare". Think of it like this: There are 4 disks in the array, all are equal to each other, and when a file is written to the array, the parity is calculated, and stored alongside the file, and it's distributed across disks. Then, if any one of the four disks fails, a new disk is inserted, and all the missing files are reconstructed, one by one, based on the parity.
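To make the parity idea concrete, here is a minimal Python sketch. It assumes XOR parity over a single stripe of three equally sized data blocks plus one parity block, as on a four-disk array. Real RAID-5 computes parity per stripe rather than per file and rotates which disk holds the parity block; the function names here are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recover the one missing block of a stripe from all the others."""
    return xor_blocks(surviving_blocks)


data = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks in one stripe
parity = xor_blocks(data)           # the fourth block holds the parity

# Simulate losing the disk that held the second data block.
recovered = rebuild_missing([data[0], data[2], parity])
print(recovered == data[1])  # True
```

Any single block, data or parity, can be rebuilt the same way from the remaining three, which is why no disk needs to sit idle as a spare.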
>>verify this behaviour with the Duet
Now I can't because I no longer have a Duet. I had one for years, but recently sold both on ebay. I have replaced the Duet with my Raspberry Pi.
I can say for sure that this issue does NOT exist with the SqueezeLite running on piCorePlayer on the Raspberry Pi 3B+. I tested it again yesterday, and it ran all day with not a single hiccup.
>>That implies that the problem is on the client side, not the server.
Yes I agree. I also used the Synology LMS for years with the Duet, and also never saw an issue like this. That's why I presumed that this is a windows-specific client-side issue.
I also re-ran tests on BOTH my windows machines separately, and both have the same problem with audio dropouts, and the exact same behavior too.
1. Dell XPS-13, Core i7 6th gen, 16GB memory, windows home.
2. Asrock Mini ITX, Core i7-8700K, 16GB memory, windows pro.
For both machines, the behavior is the same:
* Interruptions happen every 5, 10, 15 or 20 minutes, almost exactly these time-intervals
* The interruption always lasts for 30 seconds, and then resumes.
>>Do you have active drive mappings to the Synology on the Windows client?
Yes, the Synology is running Samba, and both windows systems have drive-mappings to the Synology over Samba. I don't see how this could affect the behavior of the SqueezeLite, because it doesn't talk to the Synology over Samba.
>> If after some time (probably about 5 minutes or so) you see a red X appear over the mapped drive
Good suggestion, but I confirmed that there is no network outage. I know that when you unplug the ethernet cable, or reboot the NAS, I would see those "red X" showing that the drive-mapping had become disconnected, and when the connection is restored, you only need to click on the drive-mapping, to reactivate it. That feature works as expected, but in this case, there is no network outage. That is, when SqueezeLite loses audio, I can immediately navigate to samba shares, and the network connectivity continues normally. I have examined both windows and synology systems during the outage, but everything is normal. Thus we can say that the SqueezeLite client is the only cause of the outage.
But since it's hard to reproduce on another Windows system, we still don't know that for sure. There could be some weird routing issue with my LAN on which these 2 machines sit, but I don't know what that could be. It could be some weird Windows thing but what? The right way to fix this, is to debug it on my machine. I'm not set up with Visual Studio, but I could set it up and do a debug session, if we exhaust the basic options.
Separately, I'll reply to ralphy and test the new build he put up.
John
-
2021-01-05, 15:13 #16
- Join Date
- Dec 2020
- Posts
- 15
Ok so I got the new build, and tested again today. The outages happened, as before. But I did NOT see any socket errors, after several outages over 45 minutes.
Yesterday, I did see the socket errors a couple of times, but as I've pointed out, the socket errors happened AFTER the audio dropout, so they seem to be a symptom and don't point towards the cause. But yesterday, I did notice the socket error happen when the console was blocked, so I was able to force the error as follows:
1. In the console, with debug output running, use the mouse to select text in the console. This causes console output to be blocked, until you right-click the console.
2. While the console is blocked, use the squeezebox controller to seek ahead on the track. This forces activity in SqueezeLite, but the application is blocked, due to the console.
3. Right-click the console to unblock it. I see all the buffered console text appear, and now the socket error is there.
[14:57:11.357] stream_thread:404 end of stream
[14:57:11.358] sendDSCO:214 DSCO: 0
[14:57:12.368] sendSTAT:195 STAT: STMt
[14:57:14.380] sendSTAT:195 STAT: STMt
[14:57:14.859] process:527 strm
[14:58:10.318] process_strm:280 strm command t
[14:58:10.328] sendSTAT:195 STAT: STMt
[14:58:10.328] mad_decode:207 end of stream
[14:58:10.338] mad_decode:292 gapless: early end - trimming padding from end
[14:58:10.338] decode_thread:100 decode complete
[14:58:10.338] slimproto_run:715 output underrun
[14:58:10.348] sendSTAT:195 STAT: STMd
[14:58:10.358] send_packet:114 failed writing to socket: 2745
[14:58:10.358] sendSTAT:195 STAT: STMu
[14:58:10.378] send_packet:114 failed writing to socket: 2745
[14:58:10.378] slimproto_run:578 error reading from socket: Unknown error
[14:58:10.518] slimproto:932 connected
[14:58:10.518] sendHELO:148 mac: 70:85:c2:a8:ce:68
[14:58:10.525] sendHELO:150 cap: CanHTTPS=1,Model=squeezelite,AccuratePlayPoints=1,HasDigitalOut=1,HasPolarityInversion=1,Firmware=v1.9.8-1318-test2,ModelName=SqueezeLite,MaxSampleRate=384000,dsf,dff,alc,aac,ogg,ops,ogf,flc,aif,pcm,mp3
Note that this is NOT the error we're talking about, this is some other issue.
I would not expect the console to be blocked, in normal operation, so this does not indicate any issue by itself, I was just forcing it to misbehave.
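One way to spot stalls like the jump of roughly 55 seconds between 14:57:14.859 and 14:58:10.318 in the log above (here caused by the blocked console) is to scan the timestamps for large gaps. This is a minimal Python sketch, assuming the `[HH:MM:SS.mmm]` prefix shown in the log; the function name and the 5-second threshold are arbitrary choices, and it ignores rollover at midnight.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^\[(\d{2}:\d{2}:\d{2}\.\d{3})\]")


def find_gaps(lines, threshold_s=5.0):
    """Return (previous, current, seconds) for each jump above threshold_s."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        current = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta > threshold_s:
                gaps.append((previous.time(), current.time(), delta))
        previous = current
    return gaps


sample = [
    "[14:57:14.380] sendSTAT:195 STAT: STMt",
    "[14:57:14.859] process:527 strm",
    "[14:58:10.318] process_strm:280 strm command t",
]
for start, end, seconds in find_gaps(sample):
    print(f"{start} -> {end}: {seconds:.3f} s")
```

Running the same scan over a full debug log from a real dropout might show whether squeezelite itself goes quiet during the 30-second interruption.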
Also, looking at the code yesterday, I did notice that the "retry" code exists in 2 places: send_packet() and slimproto_run().
It seems that the change you made was only to send_packet(), and the hex value 2745 was printed there. The other one still shows "Unknown error".
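Since the test build prints the error value in hex, the 2745 in the log above can be converted to the usual decimal Winsock code. This is a small Python sketch; the lookup table holds only a few entries from the Winsock error code list, and the function name is hypothetical.

```python
# A few Winsock error codes (decimal) and their symbolic names.
WINSOCK_NAMES = {
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",
}


def describe_hex_error(hex_text: str) -> str:
    """Convert a hex error string to decimal and look up its name."""
    code = int(hex_text, 16)
    name = WINSOCK_NAMES.get(code, "unknown")
    return f"0x{code:04X} = {code} ({name})"


print(describe_hex_error("2745"))  # 0x2745 = 10053 (WSAECONNABORTED)
```

So the value logged by send_packet() corresponds to 10053, which Winsock documents as a connection aborted by software on the local machine.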
Ralphy can you tell me how to find the details of your most recent commit? Last time you provided a link to it, but I don't see a way to get it through github. I'm logged in to github, but I don't see how to find the commits. Please help me navigate to the commits.
Thanks
-
2021-01-06, 02:10 #17
- Join Date
- Dec 2020
- Posts
- 86
The only four-disk RAID configuration I know of is RAID 1+0, aka RAID 10, which combines striping with mirroring. RAID 5 distributes disk blocks in a 2/3 ratio across three disks, meaning that (unlike with RAID 1) no disk holds a complete copy of the data. A bonus is that this improves disk read speed because the data must be read from two disks. A fourth disk is always a spare with RAID 5, but I guess that a smart controller might rotate the roles on a regular basis to get the best out of disk lifetime.
PS something that most people tend to overlook is that the disks entered into a RAID configuration are usually the same brand and size and also bought at the same time, thus with a high probability of being from the same manufacturing batch and consequently with a high probability of failing around the same time. RAID is not a substitute for backups and you wouldn't be the first one to lose a substantial amount of data on account of a second disk failing during RAID reconstruction.
-
2021-01-06, 06:24 #18
I don't use github for squeezelite development, so the 1318-test2 changes are not there. I've attached a patch file with the test2 code changes. I forgot to include the patch file in the zip.
I uploaded 1318-test3 which prints the error code for the slimproto socket calls as well. You'll find a patch file included in that zip with the changes.
Ralphy
-
2021-01-06, 14:59 #19
- Join Date
- Feb 2011
- Location
- Cheshire, UK
- Posts
- 5,464
There is definitely something in one of the latest Windows updates that causes some issues with networking, where the red cross status on a drive mapping appears after boot-up but doesn't clear when you try to access the drive. If you restart, it then works. I've not got to the bottom of it yet.
Also I’ve noted that IPv6 is forced on even if not enabled in DHCP and you will find an entry for :: in your DNS settings unless you switch it off.
There is also an SMB 1.0 auto removal process that has been inserted.
So when we are talking about Squeezelite failing I think we need to know what specific build of W10 we are talking about.
VB2.4 storage QNAP TS419p (NFS)
Living Room Joggler & Pi4/Khadas -> Onkyo TXNR686 -> Celestion F20s
Office Joggler & Pi3 -> Denon RCD N8 -> Celestion F10s
Dining Room SB Boom
Kitchen UE Radio (upgraded to SB Radio)
Bedroom (Bedside) Pi Zero+DAC ->ToppingTP21 ->AKG Headphones
Bedroom (TV) & Bathroom SB Touch ->Denon AVR ->Mordaunt Short M10s + Kef ceiling speakers
Guest Room Joggler > Topping Amp -> Wharfedale Modus Cubes
Everything controlled by iPeng & Material on iOS
-
2021-01-07, 14:37 #20
- Join Date
- Dec 2020
- Posts
- 15
Ok, that's interesting. Not sure if it applies here, if that's only peer-to-peer configuration - as I'm using a Synology DS918+ to run the Logitech Media Server.
What we have 'going on' is a client that uses a socket to connect to a host. Pretty simple, and I would not expect the o/s to unilaterally kill a socket. And if windows did this because it "hasn't seen any activity" then ok, but in this case, you couldn't argue that there was no activity.