Home of the Squeezebox™ & Transporter® network music players.
  1. #11
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    17,921
    Quote Originally Posted by ScottAM View Post
    It looks to me as if the squeezebox is in fact reading the TCP stream more or less steadily (once the buffer is full) - but at a certain point it just stops adding what it reads to the squeezebox's play buffer in any sort of timely fashion, and what it has just gets used up in a more or less linear fashion. Before that point, the server is clearly having no problem keeping the squeezebox fed. Once it starts stuttering, the fullness bounces around between 0.0 and 0.04, if that.
    That sounds like correct behaviour right up to the stuttering, so it seems your server has waited too long to send data - the player runs out, gets topped up, and then runs out again, with the server too late in providing the data every time. The problem may be as far back as the initialisation phase (e.g. the threshold).

    You are presenting your analysis from your point of view, which may or may not be right. As we don't know your system, any suggestions we make will be based on our assumptions about LMS.
    It would be better to supply a raw pcap file of a short track playing (filtered to include just the server and player IPs) so we can analyse the data without preconceptions.

    Additional info on when you hear the stuttering would be helpful.

    edit:

    What is the audio format of the stream sent to the player ?
    Last edited by bpa; 2019-09-20 at 00:07.

  2. #12
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    17,921
    What is the audio format of the stream? It would help make sense of the periodic buffer emptiness.

    It looks like about 6 empties per 20 secs (hard to tell when measuring from a screen with no raw data).

    If the buffer being sent is 16k, this means about 3 secs per 16k - are you sending 16k of PCM 44.1kHz 16-bit / 2-channel data?

    Why not compressed ?
    Why not 32k which is what LMS uses ?
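    For reference, the arithmetic behind the question: assuming uncompressed PCM at 44.1 kHz, 16-bit, 2 channels (an assumption - the thread hasn't confirmed the format yet), one 16 KiB chunk carries only about 93 ms of audio:

```cpp
// Back-of-envelope: how much audio does one 16 KiB chunk of PCM carry?
// Assumed stream parameters (not yet confirmed in the thread):
// 44.1 kHz sample rate, 16-bit (2 bytes) samples, 2 channels.
constexpr double kBytesPerSecond  = 44100.0 * 2 * 2;                // 176,400 B/s
constexpr double kChunkBytes      = 16.0 * 1024.0;                  // one 16 KiB chunk
constexpr double kSecondsPerChunk = kChunkBytes / kBytesPerSecond;  // ~0.093 s of audio
```

    So a steady PCM feed needs roughly eleven 16 KiB chunks per second; compressed FLAC would stretch each chunk several times further, which is why the format matters for interpreting the buffer behaviour.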

  3. #13
    Junior Member
    Join Date
    Sep 2019
    Posts
    18
    I've tried various buffer sizes, from 4k to 128k. Buffer size doesn't change the behaviour, at least in that range.

    The song in question is a .flac file. If you want to look at it it's at http://ikrast.herbison.com:82/Johannes as Benedictus.flac. The other .flac file, Carol of the bells, doesn't exhibit much if any stutter. Please delete any copy you take when done.

    The timing data I collected from the server is in the same directory, as why.csv. The columns are:

    . time since start of server in ms (I started the song about 83 seconds in)

    . nanoseconds spent reading the next chunk of the file. Negligible in all cases (internal SSD).

    . Number of bytes the read returned from the file. 16K every time.

    . nanoseconds spent in the socket send() call. Initially small, but at millisecond 84393 it goes up sharply and ends up taking a significant fraction of a second each time.

    . The number of bytes send() claims to have transferred, 16k each time.

    . the amount queued/unacknowledged in the system TCP buffer, reported by ioctl and ...OUTQ. Doesn't seem remarkable.

    . The squeezebox's buffer fullness. These are requested periodically and the last value fetched is written to each row. This immediately approaches 1 and stays there, but starts to nosedive around ms 187826. Note that the size and rate of the TCP sends() doesn't change significantly; the same general rate of sending that kept the player topped off before suddenly isn't feeding the buffer much of anything. Note that Nagle is off - not that it matters much in this case, and turning it back on didn't change anything.

    It's worth noting again that the loop that reads from the disk and writes to the socket, is running with Linux's real time priority of 97, and nothing else in the system runs that high. I don't believe there are pauses being injected by scheduling - the time to send() each buffer dominates the loop and it's presumably that slow because the squeezebox has quenched the stream.

    The code is at s.cpp. The loop at line 19 is there to make sure the whole buffer gets sent in case send() writes only part of the buffer, but it's always taken exactly once because res == howMany is always true. The logger (ts) is lightweight, writing to an SSD, and isn't injecting timing issues.
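    The partial-write loop described here is the standard pattern for blocking sockets; a minimal sketch (the function name sendAll is illustrative, not taken from s.cpp):

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>

// Blocking "send the whole buffer" loop: keeps calling send() until every
// byte has been handed to the kernel, handling short writes and EINTR.
bool sendAll(int fd, const char* buf, size_t howMany) {
    size_t sent = 0;
    while (sent < howMany) {
        ssize_t res = send(fd, buf + sent, howMany - sent, 0);
        if (res < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real error or peer gone
        }
        sent += static_cast<size_t>(res);  // short write: loop for the rest
    }
    return true;
}
```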

    The big mystery here is why are the send()s as slow as they are when the squeezebox is clearly draining the play buffer. When the play buffer is 99.9% full it makes sense that it would pause the TCP stream - and clearly those pauses don't prevent the play buffer from staying 98+% full for most of the play. The question isn't "why isn't the server sending fast enough." I think it's sending whenever it's allowed to.

    I haven't figured out wireshark yet, but when I do I'll try to capture just the music stream. Pointers welcome; it's a complicated unfamiliar interface.

  4. #14
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    17,921
    Quote Originally Posted by ScottAM View Post
    I haven't figured out wireshark yet, but when I do I'll try to capture just the music stream. Pointers welcome; it's a complicated unfamiliar interface.
    I'll read the rest of the post later, but please, please just give a pcap dump of playing a track. You can use wireshark (enable logging, then save) or tcpdump - no analysis, just raw data.

    A few years ago I spent a lot of time looking at stuttering with the SB3 when playing live RealAudio streams which had to be transcoded - trying to do clever things with the data never worked. The problem can come from anywhere in the chain of processes ultimately leading to the TCP socket.

    Processing power and OS "realtime" scheduling are not issues here - Slimserver was able to run on an NSLU2, a 266MHz ARM processor with only 32MB RAM and a very early OpenWrt standard Linux.

    The issue is the flow of data from source to player. How many processes? How is the output socket controlled? If there are processes/threads, how is data exchanged (e.g. pipes, shared memory)?
    My gut feeling is that output to TCP socket should either be non-blocking or else in a dedicated thread/process which has lots of data ready to send.

  5. #15
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    17,921
    Just taking a look at the code in s.cpp - is musicSocket opened as blocking or non-blocking?

    If opened as blocking, then possibly the chunkiness of buffers, which do not align with how data is consumed by the player, may leave the player with too few bytes in its buffer.

    The LMS way of handling streaming audio is to use a non-blocking socket and then write as much data as possible until EWOULDBLOCK is returned (i.e. do not sleep on notional internal buffer boundaries - replenish instead) - sleep only when the output buffers are all full. The sleep can then be at least 200ms; 16ms seems to be too small.
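    The pattern described - write until the kernel refuses more, then sleep and retry - can be sketched like this (a minimal sketch; the function names are illustrative, not from s.cpp or LMS):

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <cerrno>
#include <cstddef>

// Put the socket into non-blocking mode so send() returns EWOULDBLOCK
// instead of stalling when the kernel's send buffer is full.
void makeNonBlocking(int fd) {
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

// Write as much as the kernel will accept; returns bytes consumed from buf.
// The caller sleeps (e.g. ~200 ms) when this stops short, then retries.
size_t drainToSocket(int fd, const char* buf, size_t len) {
    size_t sent = 0;
    while (sent < len) {
        ssize_t res = send(fd, buf + sent, len - sent, 0);
        if (res > 0) {
            sent += static_cast<size_t>(res);
        } else if (res < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
            break;      // kernel send buffer full: caller sleeps and retries
        } else if (res < 0 && errno == EINTR) {
            continue;   // interrupted by a signal: retry immediately
        } else {
            break;      // real error or peer gone
        }
    }
    return sent;
}
```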

    If I can understand the dynamics of why.csv, I'll see if it sheds light.

  6. #16
    Junior Member
    Join Date
    Sep 2019
    Posts
    18
    Quote Originally Posted by bpa View Post
    Just taking a look at the code in s.cpp - is musicSocket opened as blocking or non-blocking?

    If opened as blocking, then possibly the chunkiness of buffers, which do not align with how data is consumed by the player, may leave the player with too few bytes in its buffer.

    The LMS way of handling streaming audio is to use a non-blocking socket and then write as much data as possible until EWOULDBLOCK is returned (i.e. do not sleep on notional internal buffer boundaries - replenish instead) - sleep only when the output buffers are all full. The sleep can then be at least 200ms; 16ms seems to be too small.

    If I can understand the dynamics of why.csv, I'll see if it sheds light.
    Blocking. Let me point out that doing it this way worked for years, when the code ran on Windows and then on a Raspberry Pi. It appears that moving it to Linux Mint on a NUC is what made it fail (to prove that I'd need to move it back to a Pi, which is some work). Maybe Linux Mint handles sockets differently than Raspbian in some deep, strange way, but I'd be shocked.

    I think this is where I am confused. TCP is a stream protocol. The receiver can't (or at least had better not be able to) tell whether the sender is using blocking or nonblocking calls. It's all just a stream of bytes to the receiver. Usually you use nonblocking writes when you have one thread driving several sockets and the thread doesn't want to get caught pending on any one of them when others are still writable; so usually you'd use select() to figure out which one is writable, go write that one, and let select() block you when everything's full. It's an old-school approach and maybe LMS does it that way, but I have a thread per client, so I should - in theory - be able to just write(), and if it blocks because the client hasn't caught up yet, that's fine; that's what blocking writes are for. There should be no difference between repeatedly writing to a non-blocking socket until it returns EWOULDBLOCK (and then, what, backing off and trying to send again later?) and doing a write that does, in fact, block. The receiver should see no difference because it's the same bytestream either way; if anything, blocking writes should deliver the bytes faster because there's no userland code getting involved to move the next piece along.

    But I will go down the path of madness and implement nonblocking writes of small pieces of the stream, and sleep briefly when I get EWOULDBLOCK, because I have nothing to lose. I mean maybe Linux has some terrible bug that causes blocking send() to fail to return promptly when bytes are delivered. That's about unthinkable but it's the only idea I can come up with.

    Of possible interest: I installed LMS and it upgraded the Classic to version 131, as expected. This combination is able to play the song without stuttering (so at least the hardware is OK). But when I made my own server talk to the upgraded Classic, it failed after just a few seconds (it didn't stutter, it just stopped, and the fullness was 0 when it stopped).

    So clearly LMS knows something about writing buffers to squeezeboxen that I don't. Unfortunately I'm bad at reading Perl, but I can see it's using nonblocking writes, so...

  7. #17
    Senior Member
    Join Date
    May 2008
    Location
    Canada
    Posts
    5,166
    Any chance the NUC has jumbo frames enabled? Have you looked at the options set on your NUC regarding MSS and MTU - any potential issue with pMTUd? Similar questions about TCP window size.
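    One concrete way to check the MSS question from the sending side is to query TCP_MAXSEG on the connected socket (a sketch; the helper name is illustrative):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Ask the kernel what maximum segment size it is using on this TCP socket.
// On an unconnected socket this reports the default; after connect() it
// reflects the negotiated MSS for the path. Returns -1 on error.
int currentMss(int fd) {
    int mss = 0;
    socklen_t len = sizeof(mss);
    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
        return -1;
    return mss;
}
```

    On a standard 1500-byte MTU path the connected value is typically around 1448-1460; a much larger value would suggest jumbo frames are in play.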

  8. #18
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    17,921
    Up to the posting of your code, all we had was your description of the problem and your interpretation of the behaviour, so all we could do was offer places to look where things are different. That said, my comments are my interpretation of the situation coming from an LMS POV, and so are also biased.

    A pcap dump of the network conversation between your server and the player would enable the exchange to be analysed and compared to LMS, to see if there is anything unusual or different (e.g. jumbo frames). TCP is a streaming protocol, but the flow of data can be lumpy or smooth depending on the data handling.

    I am well aware your system has worked for many years on a Pi - so I'm trying to think what is different from a network and computing POV.
    * between Squeezelite (e.g. Linux based) & older IPK players such as Classic.
    * between Pi and NUC
    * between your system and LMS

    Squeezelite vs Classic - the main difference is larger buffers. There are two types of internal player buffer: received data and decoded data. IIRC, with the Classic, data is first put into the received-data buffer but not acknowledged until the audio is moved into the decoded buffer (i.e. ready to play), so the TCP window edge moves when audio is played and space is made in the decoded buffer. With Linux-based players, data is acknowledged as it is received and put into an internal buffer. Linux-based players can also have unacknowledged data kept in Linux network buffers. So sending data promptly after the TCP window edge moves is critical to smooth playing on the Classic but not so critical on Linux-based players, all the more so given the Classic's smaller buffers.

    Pi vs NUC - I'm guessing the Pi is a Pi 1 or 2. The Pi 1 had only one core, but the Pi 2/3/4 and the Intel Celeron all have four cores. If a Pi 1 was used, that single core is significant. The NUC is a lot faster and has a more advanced, faster network chip. All this adds up to "timing" probably being the underlying issue. On the Pi your system was probably more "linear" as it was slower, used blocking I/O, and did more of its work in software. The NUC will be faster, so behaviour may be more parallel; there may have been a latent issue here all along. In LMS, a stuttering issue showed up on Windows systems (mainly high end and low end) - it turned out to be a bug in event handling (e.g. write complete, read ready) which had been in LMS for many years but only showed up with newer, faster x86 systems with more than 4 cores. Newer network hardware may also have resulted in changed network options.

    Your system & LMS - not enough info regarding the protocol implementation, in case slimproto frames are used differently or in a different order, or the network connection is set up differently (e.g. TCP options). The only thing that can be seen is the use of blocking vs non-blocking I/O. Blocking is more linear, whereas non-blocking is better able to handle tasks in parallel - this is an observation.

    To solve this problem, I think a pcap dump is essential, if only to eliminate a class of possible problems. It will also show what is happening and what is going wrong, and we can possibly work backward to the source of the problem using the why.csv data. Don't put effort into rewriting code - put it into getting the raw pcap captured data to understand the problem.

    At the moment I am comparing why.csv with an LMS equivalent to see if the buffer filling/transmission "behaviour" is different or similar.
    Last edited by bpa; 2019-09-21 at 00:14.

  9. #19
    Junior Member
    Join Date
    Sep 2019
    Posts
    18
    I'll be posting a tcpdump real soon now. I checked, and the MTU setting is 1500. I tried with pMTUd both on and off; there's no difference, and I wouldn't expect any. There's a single, reasonably modern switch between the server and the squeezeboxes (I've tried a couple, actually) and they wouldn't be messing with packets.

    Implementing nonblocking writes made no difference, but I'd have been horrified if it had. As far as I can tell, LMS is single threaded (if so, not a shock - threads are a horror show in Perl, just like most everything else in Perl). So it was forced to use nonblocking I/O everywhere and use select() to figure out what to do next. It wasn't because that was the better choice or had special characteristics - they just didn't have another option, and the LMS code is the spaghetti it is because they didn't have better choices. They get credit for making it work at all, but the code is littered with bug tracking and you can begin to see why the product died.

    With the Classic at 131 and using my server, many songs hit the dreaded empty play buffer condition almost immediately. In 131, when that happens it just halts playing, so instead of stuttering it stops cold. These are all good things as far as my testing is concerned - the stuttering was hard on my nerves, and so was waiting for over a minute to see if things would fail each time. Plus I was getting very tired of hearing the first 40 seconds of Benedictus over and over.

    The new symptoms are interesting. Since I'm getting play buffer fullness reports every 5 seconds now, I can see that initially the buffer gets some trivial amount of data - 0.01 or so - then it rises a little but quickly crashes back to 0. How fast it crashes back to 0 depends on the song. For some songs, though, it never hits 0; the buffer fullness rises towards 0.99 and the song plays straight through. When I can see the fullness starting to crash towards 0, if I'm quick enough I can pause the stream, wait a few seconds, and continue, and things are better for a while.

    There's an interesting aspect to that. When a song is paused, my server keeps sending the song stream at the squeezebox. The local OS TCP buffer is just about always maxed out, or nearly so, and it's obvious that the squeezebox isn't "sipping" from the stream very quickly, though why I don't know - it's got nothing else to do when the music is paused, and you'd think it would drink the data at network speeds. But it doesn't, and more interestingly... while the song is paused, I'm still querying the buffer fullness over and over, and it does not move. Even though it's taking in data from the TCP stream, it's not going into the play buffer - or if it is, it's forgetting to report that. And then I unpause the stream and things get very weird - instead of the buffer fullness jumping to a much higher number (due to all the previous sipping it was doing), it instead rises in steps over a number of seconds. I don't know where it was squirreling away the data I shipped (or why it read it so slowly) but it doesn't seem to end up in the Classic's play buffer unless the song is playing!

    What I'm doing now is running strace on LMS - the code is too messy to follow, and all that matters is what system calls it's making anyway. I can see, so far, that it's just sending stream data in a nonblocking fashion in a way that's probably close to identical to what my server is doing. But, while I have to dive deeper into this, I think it's sending different strm commands at the very beginning of the song. It's possible LMS is deliberately or accidentally delaying the start of playing until there's more of the song shipped - in effect, pausing play long enough to let even the glacially slow Classic's reads make a little progress.

    I'll strace my server, and LMS, and take a hard look at any differences.

    I wish I could get my hands on the original coders for LMS - I bet they have stories to tell.


    Funny note, unrelated to the problem - when I installed LMS, I gave it exactly one song in the library. After fussing with getting everything working, I navigated to it with the remote and played it - not hard when there's just one song. It played fine - except the intro was longer! I have no idea how it happened, but LMS had somehow jumped from the one song in my local library to an online version of the same song... I had a few horrible moments of wondering if my .flac file had been corrupt all this time and the old firmware had just skipped part of the song for years - but no, I just needed to restart things and make sure it was really on the local library. I still have no idea how it found a different version of Benedictus...

  10. #20
    Senior Member
    Join Date
    Oct 2005
    Location
    Ireland
    Posts
    17,921
    Quote Originally Posted by ScottAM View Post
    I wish I could get my hands on the original coders for LMS - I bet they have stories to tell.
    Remember, Slimserver was originally written for the SliMP3, which had a tiny amount of buffer space, and it had to run on a Linux box with a slow processor and little memory. So resorting to tricks and "odd" behaviour is understandable in that context.

    Like you, at first I was horrified by LMS, but LMS is not bad once you get your head around a map of the system and how to separate the "threads" (I used to work on embedded systems in network devices using simple non-preemptive scheduler loops, so it's familiar). What is good about LMS is that it works across many platforms, and how it has been extended beyond its original intent.

    Enable logging for network.protocol.slimproto to see the LMS side of the slimproto conversation, although for some info DEBUG is necessary, and then it becomes too verbose.
    Enable network.http at INFO to see how the buffer writes to the player go (i.e. LMS attempts 32k then, like your system, sends remainders until all are gone).
