Home of the Squeezebox™ & Transporter® network music players.
  1. #11
    Senior Member ralphy's Avatar
    Join Date
    Jan 2006
    Location
    Canada
    Posts
    2,790
    Thanks for the explanation of the unknown error strings. I'll look into that for the 64-bit build. For this session, I'm building 32-bit only and squeezelite still can't resolve the error code to a string.

    In an effort to determine what the unknown socket errors are, I've uploaded an r1318-test2 build which prints the hex error value instead of trying to convert it to a string.

    Would you try this build in debug mode?

    Thanks,
    Last edited by ralphy; 2021-01-03 at 06:50.
    Ralphy

    1-Touch, 5-Classics, 3-Booms, 2-UE Radio
    Squeezebox client builds donations always appreciated.

  2. #12
    Junior Member
    Join Date
    Dec 2020
    Posts
    15
    Quote Originally Posted by gordonb3 View Post
    Is SqueezeLite a 64-bit app?
    No, this is the 32-bit build 1313 which ralphy put up to help troubleshoot this issue.

    >> Is it possible that your hard drives are set to park when they are not in use?

    Yes. Here is how they are set up:
    [Attachment: synology-hibernation.JPG]

    Hibernation is set to happen after 20 minutes of inactivity, and I have also set up logging for this. Looking at the logs, I see the last hibernation activity was 8 months ago. That makes sense, because this page: https://www.synology.com/en-global/k...re_hibernation
    explains that certain services, such as Logitech Media Server, need constant access to disk, so while they are running hibernation usually doesn't happen, and that is the case for me.

    >>applications report 8-byte length winsock errors that need to be masked 0x0000FFFF to resolve to a usable 10xxx decimal error code.

    In that case, it would be helpful to use a preprocessor directive (#if/#ifdef) to check for 32/64 bitness, and use the suggested logic to resolve the error code. I hope ralphy can easily make the change, and I'll rerun my test to see if I can get that error code.
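
    As a rough illustration of the masking step discussed above (a sketch in Python, not squeezelite's actual C code): the raw value is ANDed with 0x0000FFFF and the result is looked up in a table of names. The helper resolve_winsock_error() and the sample raw value are made up for this example, and WINSOCK_NAMES is only a small hand-picked subset of the Winsock error list.

```python
# Illustration only: WINSOCK_NAMES is a hand-picked subset of Winsock codes,
# and resolve_winsock_error() is a hypothetical helper, not squeezelite code.
WINSOCK_NAMES = {
    10035: "WSAEWOULDBLOCK",
    10053: "WSAECONNABORTED",
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",
    10061: "WSAECONNREFUSED",
}


def resolve_winsock_error(raw):
    """Mask a raw error value to its low 16 bits and look up a name."""
    code = raw & 0x0000FFFF
    return code, WINSOCK_NAMES.get(code, "unknown")


# A made-up 8-byte value whose upper bits are noise; the low 16 bits are 0x2746.
print(resolve_winsock_error(0xFFFFFFFF00002746))  # (10054, 'WSAECONNRESET')
```

    In a C build the same mask would presumably sit behind the 32/64-bit check mentioned above; that part is not shown here.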

    >> if you have installed `green` drives they will actually do that themselves without the OS telling them.
    I am aware that these desktop-type drives have this power-saving 'feature' which causes problems for NAS. I have Toshiba NAS drives, and these are not the 'green' type drives.

    Looking at the hard drives:
    admin@nas:~$ sudo smartctl -a /dev/sda
    smartctl 6.5 (build date May 7 2020) [x86_64-linux-4.4.59+] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Vendor: TOSHIBA
    Product: HDWN160
    Revision: FS1M
    User Capacity: 6,001,175,126,016 bytes [6.00 TB]
    Logical block size: 512 bytes
    Physical block size: 4096 bytes
    LU is fully provisioned
    Rotation Rate: 7200 rpm
    Form Factor: 3.5 inches
    Logical Unit id: 0x50000398ac581f2a
    Serial number: 78NUK021FAXG
    Device type: disk
    Local Time is: Mon Jan 4 09:47:49 2021 PST
    SMART support is: Unavailable - device lacks SMART capability.

    === START OF READ SMART DATA SECTION ===
    Current Drive Temperature: 0 C
    Drive Trip Temperature: 0 C

    Error Counter logging not supported

    [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
    Device does not support Self Test logging

    Here we don't see all the information, but with the Synology Storage Manager I can see SMART and other info about my 4 disks in raid-5.
    [Attachment: Drive-health.JPG]

    Also:
    admin@nas:~$ sudo btrfs device stats .
    [/dev/mapper/cachedev_0].write_io_errs 0
    [/dev/mapper/cachedev_0].read_io_errs 0
    [/dev/mapper/cachedev_0].flush_io_errs 0
    [/dev/mapper/cachedev_0].corruption_errs 0
    [/dev/mapper/cachedev_0].generation_errs 0
    We can see that the Synology reports good health for all drives. With 15,000 hours on the drives, they have had about 2 years of operation. I run a monthly health report on the drives, and they have always reported good health. There is no reason to think that these issues are related to drive health.

    I'm going to play with Wireshark to see if I can gain some insight into the TCP activity surrounding this outage issue.

    Thanks

  3. #13
    Oh... That's pretty interesting. Apparently the Synology shell is something of a virtualization layer on top of the actual OS. Shame that the SSM tool does not display vital drive information like LCC, but with a 20 minute sleep setting I would not expect issues with park/unpark. Should this be the case anyway you should definitely be able to identify it by a ticking sound coming from the unit. Do note that RAID 5 is a three disk configuration, so one of your disks must function as a spare. Does SSM tell you which one is which, i.e. which one is the spare? Two types of logic might apply here, just follow the numerical order or skip one of the centre two disks to reduce heat exchange between the disks, so drawing any conclusions from it will most likely prove wrong, but I would be very suspicious if the spare happened to be sda.

    That said, I think we may have gotten a bit off-track here. I was wondering if you could verify this behaviour with the Duet you own, and then noticed that you had mentioned running piCorePlayer as well and not seeing the issue there. That implies that the problem is on the client side, not the server. It may be interesting to verify TCP connectivity on other protocols, in particular SMB. Do you have active drive mappings to the Synology on the Windows client? If not, create one and keep an Explorer window with a folder listing open without clicking the mapped drive. If after some time (probably about 5 minutes or so) you see a red X appear over the mapped drive, this means that the TCP connection has actively been disconnected, which should not happen.
    Last edited by gordonb3; 2021-01-04 at 12:44.

  4. #14
    Senior Member ralphy's Avatar
    Join Date
    Jan 2006
    Location
    Canada
    Posts
    2,790
    Quote Originally Posted by foopydog View Post
    No, this is the 32-bit build 1313 which ralphy put up to help troubleshoot this issue.

    >>applications report 8-byte length winsock errors that need to be masked 0x0000FFFF to resolve to a usable 10xxx decimal error code.

    In that case, it would be helpful to use a preprocessor directive (#if/#ifdef) to check for 32/64 bitness, and use the suggested logic to resolve the error code. I hope ralphy can easily make the change, and I'll rerun my test to see if I can get that error code.
    Have you had a chance to try this?
    Ralphy


  5. #15
    Junior Member
    Join Date
    Dec 2020
    Posts
    15
    >> Apparently the Synology shell is something of a virtualization layer on top of the actual OS.

    Yes, sort of. The user interface for Synology DSM runs on Linux. It's not a virtualization layer as such, just a user interface on top of regular Linux. In the case of my Storage Manager, it does provide a bit more insight than smartctl, but that's probably because Synology developed other tools to manage the RAID array. They call it "SHR - Synology Hybrid Raid", which is like regular RAID, but allows different types of disk, and volumes can grow/shrink.

    >> Do note that RAID 5 is a three disk configuration, so one of your disks must function as a spare. Does SSM tell you which one is which, i.e. which one is the spare?

    Actually that's not correct. Raid-5 has one-disk redundancy, but there is no concept of a "spare". Think of it like this: There are 4 disks in the array, all are equal to each other, and when a file is written to the array, the parity is calculated, and stored alongside the file, and it's distributed across disks. Then, if any one of the four disks fails, a new disk is inserted, and all the missing files are reconstructed, one by one, based on the parity.
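
    To make the parity idea concrete, here is a minimal sketch of how one stripe can be rebuilt. It assumes plain XOR parity and a made-up 4-disk stripe of three data blocks plus one parity block; real arrays work on blocks rather than files, rotate the parity position from stripe to stripe, and Synology's SHR may differ in detail. The helper xor_blocks() is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 4-disk array: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] failed; rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

    Any single missing block, data or parity, can be recovered the same way, which is why no disk has to be set aside as a dedicated spare.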

    >>verify this behaviour with the Duet

    Now I can't because I no longer have a Duet. I had one for years, but recently sold both on ebay. I have replaced the Duet with my Raspberry Pi.
    I can say for sure that this issue does NOT exist with the SqueezeLite running on piCorePlayer on the Raspberry Pi 3B+. I tested it again yesterday, and it ran all day with not a single hiccup.

    >>That implies that the problem is on the client side, not the server.

    Yes, I agree. I also used the Synology LMS for years with the Duet, and never saw an issue like this. That's why I presumed that this is a Windows-specific client-side issue.
    I also re-ran tests on BOTH my Windows machines separately, and both have the same problem with audio dropouts, and the exact same behavior too.
    1. Dell XPS-13, Core i7 6th gen, 16GB memory, Windows Home.
    2. Asrock Mini ITX, Core i7-8700K, 16GB memory, Windows Pro.
    For both machines, the behavior is the same:
    * Interruptions happen every 5, 10, 15 or 20 minutes, almost exactly these time-intervals
    * The interruption always lasts for 30 seconds, and then resumes.

    >>Do you have active drive mappings to the Synology on the Windows client?

    Yes, the Synology is running Samba, and both windows systems have drive-mappings to the Synology over Samba. I don't see how this could affect the behavior of the SqueezeLite, because it doesn't talk to the Synology over Samba.

    >> If after some time (probably about 5 minutes or so) you see a red X appear over the mapped drive

    Good suggestion, but I confirmed that there is no network outage. I know that when I unplug the Ethernet cable, or reboot the NAS, I see the "red X" showing that the drive mapping has become disconnected, and when the connection is restored, I only need to click on the drive mapping to reactivate it. That feature works as expected, but in this case, there is no network outage. That is, when SqueezeLite loses audio, I can immediately navigate to Samba shares, and the network connectivity continues normally. I have examined both the Windows and Synology systems during the outage, but everything is normal. Thus we can say that the SqueezeLite client is the only cause of the outage.

    But since it's hard to reproduce on another Windows system, we still don't know that for sure. There could be some weird routing issue with my LAN on which these 2 machines sit, but I don't know what that could be. It could be some weird Windows thing but what? The right way to fix this, is to debug it on my machine. I'm not set up with Visual Studio, but I could set it up and do a debug session, if we exhaust the basic options.

    Separately, I'll reply to ralphy and test the new build he put up.
    John

  6. #16
    Junior Member
    Join Date
    Dec 2020
    Posts
    15
    Quote Originally Posted by ralphy View Post
    Have you had a chance to try this?
    Ok so I got the new build, and tested again today. The outages happened, as before. But I did NOT see any socket errors, after several outages over 45 minutes.

    Yesterday, I did see the socket errors a couple of times, but as I've pointed out, the socket errors happened AFTER the audio dropout, and seem to be a symptom rather than pointing towards the cause. But yesterday, I did notice the socket error happen when the console was blocked, so I was able to force the error as follows:

    1. In the console, with debug output running, use the mouse to select text in the console. This causes console output to be blocked, until you right-click the console.
    2. While the console is blocked, use the squeezebox controller to seek ahead on the track. This forces activity in SqueezeLite, but the application is blocked, due to the console.
    3. Right-click the console to unblock it. I see all the buffered console text appear, and now the socket error is there.

    [14:57:11.357] stream_thread:404 end of stream
    [14:57:11.358] sendDSCO:214 DSCO: 0
    [14:57:12.368] sendSTAT:195 STAT: STMt
    [14:57:14.380] sendSTAT:195 STAT: STMt
    [14:57:14.859] process:527 strm
    [14:58:10.318] process_strm:280 strm command t
    [14:58:10.328] sendSTAT:195 STAT: STMt
    [14:58:10.328] mad_decode:207 end of stream
    [14:58:10.338] mad_decode:292 gapless: early end - trimming padding from end
    [14:58:10.338] decode_thread:100 decode complete
    [14:58:10.338] slimproto_run:715 output underrun
    [14:58:10.348] sendSTAT:195 STAT: STMd
    [14:58:10.358] send_packet:114 failed writing to socket: 2745
    [14:58:10.358] sendSTAT:195 STAT: STMu
    [14:58:10.378] send_packet:114 failed writing to socket: 2745
    [14:58:10.378] slimproto_run:578 error reading from socket: Unknown error
    [14:58:10.518] slimproto:932 connected
    [14:58:10.518] sendHELO:148 mac: 70:85:c2:a8:ce:68
    [14:58:10.525] sendHELO:150 cap: CanHTTPS=1,Model=squeezelite,AccuratePlayPoints=1,HasDigitalOut=1,HasPolarityInversion=1,Firmware=v1.9.8-1318-test2,ModelName=SqueezeLite,MaxSampleRate=384000,dsf,dff,alc,aac,ogg,ops,ogf,flc,aif,pcm,mp3

    Note that this is NOT the error we're talking about, this is some other issue.
    I would not expect the console to be blocked, in normal operation, so this does not indicate any issue by itself, I was just forcing it to misbehave.
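
    For anyone reading the log above: the test2 build was described as printing the error value in hex, so 2745 read as hexadecimal is 10053 decimal, which the Winsock error list names WSAECONNABORTED. The sketch below pulls both pieces of information out of a few of the log lines: the long silence while the console was blocked, and the decoded error value. The function names, the 5-second threshold, and the assumption that the value after "socket:" is bare hex are choices made for this example only.

```python
import re
from datetime import datetime

# A few lines copied from the log above.
LOG = """\
[14:57:14.380] sendSTAT:195 STAT: STMt
[14:57:14.859] process:527 strm
[14:58:10.318] process_strm:280 strm command t
[14:58:10.358] send_packet:114 failed writing to socket: 2745
"""

LINE = re.compile(r"\[(\d{2}:\d{2}:\d{2}\.\d{3})\] (.*)")


def parse(log):
    """Yield (timestamp, message) pairs from squeezelite debug lines."""
    for line in log.splitlines():
        m = LINE.match(line.strip())
        if m:
            yield datetime.strptime(m.group(1), "%H:%M:%S.%f"), m.group(2)


def find_gaps(entries, threshold_s=5.0):
    """Return (seconds, message) for lines that follow a long silence."""
    gaps = []
    for (t0, _), (t1, msg) in zip(entries, entries[1:]):
        delta = (t1 - t0).total_seconds()
        if delta > threshold_s:
            gaps.append((round(delta, 3), msg))
    return gaps


def socket_error_codes(entries):
    """Decode the value after 'socket:' as hex (assumed test2 format)."""
    codes = []
    for _, msg in entries:
        m = re.search(r"socket: ([0-9A-Fa-f]+)$", msg)
        if m:
            codes.append(int(m.group(1), 16))
    return codes


entries = list(parse(LOG))
print(find_gaps(entries))           # one gap of about 55 seconds
print(socket_error_codes(entries))  # [10053]
```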

    Also, looking at the code yesterday, I did notice that the "retry" code exists in 2 places: send_packet() and slimproto_run().
    It seems that the change you made was only to send_packet(), and the value 2745 was printed there. The other one still shows "Unknown error".

    Ralphy can you tell me how to find the details of your most recent commit? Last time you provided a link to it, but I don't see a way to get it through github. I'm logged in to github, but I don't see how to find the commits. Please help me navigate to the commits.

    Thanks

  7. #17
    Quote Originally Posted by foopydog View Post
    >> Do note that RAID 5 is a three disk configuration, so one of your disks must function as a spare. Does SSM tell you which one is which, i.e. which one is the spare?

    Actually that's not correct. Raid-5 has one-disk redundancy, but there is no concept of a "spare". Think of it like this: There are 4 disks in the array, all are equal to each other, and when a file is written to the array, the parity is calculated, and stored alongside the file, and it's distributed across disks. Then, if any one of the four disks fails, a new disk is inserted, and all the missing files are reconstructed, one by one, based on the parity.
    The only four-disk RAID configuration I know of is RAID 1+0, aka RAID 10, which combines striping with mirroring. RAID 5 distributes disk blocks in a 2/3 ratio across three disks, meaning that (unlike with RAID 1) no disk holds a complete copy of the data. A bonus is that this improves disk read speed because the data must be read from two disks. A fourth disk is always a spare with RAID 5, but I guess that a smart controller might rotate the roles on a regular basis to get the best out of disk lifetime.

    PS something that most people tend to overlook is that the disks entered into a RAID configuration are usually the same brand and size and also bought at the same time, thus with a high probability of being from the same manufacturing batch and consequently with a high probability of failing around the same time. RAID is not a substitute for backups and you wouldn't be the first one to lose a substantial amount of data on account of a second disk failing during RAID reconstruction.


    >> If after some time (probably about 5 minutes or so) you see a red X appear over the mapped drive

    Good suggestion, but I confirmed that there is no network outage. I know that when I unplug the Ethernet cable, or reboot the NAS, I see the "red X" showing that the drive mapping has become disconnected, and when the connection is restored, I only need to click on the drive mapping to reactivate it. That feature works as expected, but in this case, there is no network outage. That is, when SqueezeLite loses audio, I can immediately navigate to Samba shares, and the network connectivity continues normally. I have examined both the Windows and Synology systems during the outage, but everything is normal. Thus we can say that the SqueezeLite client is the only cause of the outage.
    Actually, you would see this behaviour when you create a drive mapping between two Windows workstations (10/8/7 - non server). It's a sort of OS license-/resource saving mechanism of the workstation that acts as a `server` to forcibly close the connection if it hasn't seen any activity on it for some time. When you click the deactivated mapping it instantly reconnects, but of course if you have something more vital going on like a database connection, that will have been broken and the application will need to be restarted. We see that issue a lot on tiny businesses (<5 computers).

  8. #18
    Senior Member ralphy's Avatar
    Join Date
    Jan 2006
    Location
    Canada
    Posts
    2,790
    Quote Originally Posted by foopydog View Post
    Also, looking at the code yesterday, I did notice that the "retry" code exists in 2 places: send_packet() and slimproto_run().
    It seems that the change you made was only to send_packet(), and the value 2745 was printed there. The other one still shows "Unknown error".

    Ralphy can you tell me how to find the details of your most recent commit? Last time you provided a link to it, but I don't see a way to get it through github. I'm logged in to github, but I don't see how to find the commits. Please help me navigate to the commits.

    Thanks
    I don't use github for squeezelite development, so the 1318-test2 changes are not there. I forgot to include the patch file in the test2 zip, so I've attached a patch file with the test2 code changes here.

    I uploaded 1318-test3 which prints the error code for the slimproto socket calls as well. You'll find a patch file included in that zip with the changes.
    Attached Files
    Ralphy


  9. #19
    Senior Member
    Join Date
    Feb 2011
    Location
    Cheshire, UK
    Posts
    5,464
    There is definitely something in one of the latest Windows updates that causes some issues with networking, where the red cross status on a drive mapping occurs post boot-up but doesn’t restore when you try to access it. If you restart, it then works. I’ve not got to the bottom of it yet.
    Also I’ve noted that IPv6 is forced on even if not enabled in DHCP and you will find an entry for :: in your DNS settings unless you switch it off.
    There is also an SMB 1.0 auto removal process that has been inserted.
    So when we are talking about Squeezelite failing I think we need to know what specific build of W10 we are talking about.
    VB2.4 storage QNAP TS419p (NFS)
    Living Room Joggler & Pi4/Khadas -> Onkyo TXNR686 -> Celestion F20s
    Office Joggler & Pi3 -> Denon RCD N8 -> Celestion F10s
    Dining Room SB Boom
    Kitchen UE Radio (upgraded to SB Radio)
    Bedroom (Bedside) Pi Zero+DAC ->ToppingTP21 ->AKG Headphones
    Bedroom (TV) & Bathroom SB Touch ->Denon AVR ->Mordaunt Short M10s + Kef ceiling speakers
    Guest Room Joggler > Topping Amp -> Wharfedale Modus Cubes
    Everything controlled by iPeng & Material on iOS

  10. #20
    Junior Member
    Join Date
    Dec 2020
    Posts
    15
    Quote Originally Posted by gordonb3 View Post
    It's a sort of OS license-/resource saving mechanism of the workstation that acts as a `server` to forcibly close the connection if it hasn't seen any activity on it for some time. When you click the deactivated mapping it instantly reconnects, but of course if you have something more vital going on like a database connection, that will have been broken and the application will need to be restarted. We see that issue a lot on tiny businesses (<5 computers).
    Ok, that's interesting. Not sure if it applies here, if that's only peer-to-peer configuration - as I'm using a Synology DS918+ to run the Logitech Media Server.
    What we have 'going on' is a client that uses a socket to connect to a host. Pretty simple, and I would not expect the OS to unilaterally kill a socket. And if Windows did this because it "hasn't seen any activity" then ok, but in this case, you couldn't argue that there was no activity.
