PDA

View Full Version : Playback problem: buffer underrun



John Lilley
2004-08-12, 15:14
Greetings,

I've been having a persistent problem with my network of slimp3 players (not
squeezebox), since updating to the 5.x release.

I'm currently on server version 5.2.1, which I believe is the latest. My
three slimp3 players are firmware version 2.3. The server is running on RH
8.0 Linux distribution with a 3x160 RAID 5 in software for track storage.
It is running on a 1Ghz Athalon with 800MB memory. We have about 225GB and
35000 tracks.

The symptom is: for about every hour or two of playback, the playback will
hang for about 15 seconds before continuing. This occurs with any of the
three slimp3 devices. Another symptom is that, while playback is stalled,
the HTTP interface is also non-responsive, leading me to believe that there
is some critical section, garbage collection, etc that is taking place and
locking the entire server for that time period. I've done tests while this
glitch occurs such as running other programs, checking load, and running
vmstat, but there appears to be nothing unusual happening in a system-wide
sense. I've also tested the players with flood-ping, and successfully
round-tripped millions of echo packets.

I've been in contact with Slimdevices support about this issue, and Kevin
helped me track it down using logging facilities to determine that I have a
buffer underrun problem. After a few email exchanges, I was unable to get
further response from support, so I'm hoping that the discussion group can
help carry this forward. Below are key parts of the conversations we've
had. Any help would be much appreciated, as well as contact from anyone
with similar issues.


Thanks,
John Lilley


ME: ============== 7/7/2004 ================
I'm presently experiencing an occasional "glitch" while playing, it seems
unrelated to the file or playlist. The system will stop playing in the
middle of the song, and after a few seconds either pick up where it left off
or jump to the next song in the playlist.
This seems to happen about once every 20 minutes. I'd like to think its
related to a spike in system activity, but xload isn't displaying anything
unusual.
Is there anything I can do to log what's happening? Perhaps something that
would log the player buffer state or TCP traffic so I can see if something
is happening around breaks in the traffic?

SUPPORT: ============== 7/8/2004 ================
Refreshing the browser shouldn't cause behavior like this. You may want to
turn on -d_stream and see if possibly the players buffer is emptying out or
something. But problems like this are more typically related to a networking
issue... What's your network setup like?

ME: ============== 7/9/2004 ================
I did run a ping -f test from the slimserver machine, and sent a couple
million ping packets to the player with no packet loss. So I don't think
we're dealing with a network problem at the hardware level.
Meanwhile, I notice that the symptoms are really two separate things:
1) More common. If I have a browser window open to the server, it will
sometimes cause the playback to stall when it refreshes the playlist. It
seems to take 10-20 seconds to refresh this window at times. I have the
playlist refresh set to a fairly long period (100 seconds). Is it possible
that it is reading the playlist files again (for MP3 tags) when it
refreshes?
2) Less common. Even with no browser window open, I notice that the player
will occasionally (maybe once every hour or two) cut off a song, pause for a
few seconds, and skip to the next song.

SUPPORT: ============== 7/9/2004 ================
Did -d_stream show anything interesting?

ME: ============== 7/12/2004 ================
Meanwhile, I'm curious if the slimserver doesn't have some periodic event
that would cause it to enter a critical section that could block other
activity? Perhaps there is somewhere in the code where the background
process that's scanning the files gets finished with a complete scan and
wants to sync up the main library?

SUPPORT:
<Don't have copy of message, but basically no, there shouldn't be any such
periodic event that locks things down>

ME: ============== 7/12/2004 ================
Here's what I'm getting in the log during one of these "events"
with --d_stream turned on:
2004-07-12 15:15:21.8596 stream not readable
2004-07-12 15:16:21.7806 ***Stream underrun: 88
2004-07-12 15:16:21.7956 00:04:20:04:0b:ceBuffer drained, pausing playback
2004-07-12 15:16:21.8431 00:04:20:04:0b:ce Buffer full, starting playback
I'm also seeing these messages occasionally, but it doesn't seem to hurt
anything:
2004-07-12 15:12:09.7138 00:04:20:04:0b:ce 1089666729.71369 Timeout on seq:
10139

ME: ============== 7/12/2004 ================
Here's another log snippet. The pause occurred during the 18:16:58 time
period. I don't know if the preceding entries matter at all.
2004-07-12 18:10:21.5495 00:04:20:04:0b:ce 1089677421.54946 Timeout on seq:
60557
2004-07-12 18:14:05.4936 stream not readable
2004-07-12 18:16:58.8964 ***Stream underrun: 64
2004-07-12 18:16:58.8975 00:04:20:04:0b:ceBuffer drained, pausing playback
2004-07-12 18:16:58.9085 00:04:20:04:0b:ce Buffer full, starting playback

ME: ============== 7/13/2004 ================
Here's another occurrence of the "pause" event. This is with the very
latest server, just installed this morning.
2004-07-13 10:14:23.6181 00:04:20:04:0b:ce 1089735263.61802 Timeout on seq:
27897
2004-07-13 10:14:23.6689 00:04:20:04:0b:ce 1089735263.66881 Timeout on seq:
27897
2004-07-13 10:25:54.8151 00:04:20:04:0b:ce 1089735954.81505 Timeout on seq:
49415
2004-07-13 10:30:00.9205 ***Stream underrun: 432
2004-07-13 10:30:00.9214 00:04:20:04:0b:ceBuffer drained, pausing playback
2004-07-13 10:30:00.9512 00:04:20:04:0b:ce Buffer full, starting playback
I don't know that the timeouts are relevant, but included them anyway. Is
there anything else I could be logging to help track this down? I keep
suspecting that there is either some splike in system load that clobbers the
streaming threads, or else some critical section in the slimserver that's
blocking the streaming threads while something is updated, but I'm not
seeing any unusual load events.


ME: ============== 7/13/2004 ================
Did the logs I sent previously tell you anything about what might be going
on? It is definitely a buffer underrun, but the question is why...
Here's the evidence I can see:
1) Logs indicate a buffer underrun, which persists for 5-10 seconds.
2) While the playback is hung waiting for the buffer underrun to correct
itself, the web interface and players are unresponsive. This would seem to
indicate some central block on the server's execution.
3) While the playback is hung, other activity on the Linux machine seems
fine. I can access it over the network, browse the disks, etc. So it
wouldn't seem to be a general system problem.
4) I can flood-ping the player with millions of packets and zero loss.

Jason Holtzapple
2004-08-12, 15:29
John Lilley wrote:
> The symptom is: for about every hour or two of playback, the playback will
> hang for about 15 seconds before continuing. This occurs with any of the
> three slimp3 devices. Another symptom is that, while playback is stalled,
> the HTTP interface is also non-responsive, leading me to believe that there
> is some critical section, garbage collection, etc that is taking place and
> locking the entire server for that time period.

If you have the tag cache enabled, I believe it is saved to disk every hour.
That shouldn't be enough to cause your symptoms unless your system is already
stressed from other tasks. You could use --d_info debugging to see when the
save is happening to see if it is the culprit.

--Jason

Jack Coates
2004-08-12, 21:17
> Greetings,
>
> I've been having a persistent problem with my network of slimp3 players
> (not
> squeezebox), since updating to the 5.x release.
>
> I'm currently on server version 5.2.1, which I believe is the latest. My
> three slimp3 players are firmware version 2.3. The server is running on
> RH
> 8.0 Linux distribution with a 3x160 RAID 5 in software for track storage.
> It is running on a 1Ghz Athalon with 800MB memory. We have about 225GB
> and
> 35000 tracks.
....

I'd suspect that's not enough memory for that many tracks. Got other stuff
running too?

cat /proc/`pidof slimserver`/status | grep VmSize

free -m

You say vmstat looks okay, I assume you're watching for cpu, disk i/o and
swap pages i/o?

--
Jack At Monkeynoodle.Org: It's A Scientific Venture...
"Believe what you're told; there'd be chaos if everyone thought for
themselves." -- Top Dog hotdog stand, Berkeley, CA

Scooter
2006-03-18, 20:25
John,

I have just started to get similar problems with my Slimp3 player (doesn't seem to affect the later players. They have bigger buffers but I haven't done much testing to see if the interface hangs on those from time to time as well). Mine are all connected wirelessly, and so I always look at this first whenever theres a problem. It's run for 3 years without issue but now for some reason, this one player skips every few minutes, and the interface locks up too. If the outage is short, it will recover. If longer, the player chnages state to "Stopped".

I did much the same tests as you have done - pinging the player from the server and observing the lights on the WiFi bridge at the player end tell me that packets are still streaming to the player during these outages. I observed the processor load on the server which was always very light. Just to be sure I reniced all the slimserver user's procs to -10.

I also fired up a laptop as a player with a WiFi card and placed it next to the WiFi bridge for the player - this plays just fine, but I was just playing the "stream.mp3" in a software player so the buffer was most likely huge in comparison.

Polling the status on the WiFi bridge, it consistently gets 80% signal strength. I swapped out the 11b WiFi bridge for a 11g one - no change. Re-flashed the AP to latest firmware. Reflashed the Slimp3 to latest firmware. Also tried ramping up the TX power on the AP - which is reflected in the signal strength at the player end but makes no difference - as throughout all these outages, the network appears to perform faultlessly.

As packets are still coming thick and fast when playback stops (and the counter halts a few seconds earlier on the player), I suspect that the fault lies in the firmware on the player (the server is, after all, still transmitting the stream). This is why I reflashed it, in case the image had become corrupted somehow but it had no effect. It may be that it's the ACKs and other replies from player to server that are failing to be received, and so after a while the server (or some other layer in the network stack) gives up. I played around with the antennae settings on my AP

I'm going to experiment with the player some more tomorrow on a wired connection straight into the server's switch to see if I can get it to do the same (and eliminate the network as the problem).

One other thing I've considered: I notice a lot of ARP traffic on my network, with the server constantly asking for the MAC of the player, and vice versa etc (plus every other device chattering ARP). I might try some static ARP entries for the players and see if that cures it. In the back of my mind though - I keep coming back to the fact that this all worked fine up until a couple of days ago - and as far as I can tell - nothing has changed.

Cheers,
Owen

Scooter
2006-03-19, 16:56
Just noticed this thread is quite old :P

Anyway - connected the Slimp3 up directly to the switch. I'm still seeing lots of this:-

2006-03-19 21:45:15.5702 00:04:20:03:01:a2 1142804715.57026 Timeout on seq: 21877
2006-03-19 21:45:15.6230 00:04:20:03:01:a2 1142804715.62301 Timeout on seq: 21877
2006-03-19 21:45:15.6758 00:04:20:03:01:a2 1142804715.67577 Timeout on seq: 21877
2006-03-19 21:45:15.7284 00:04:20:03:01:a2 1142804715.72845 Timeout on seq: 21877
2006-03-19 21:45:15.7812 00:04:20:03:01:a2 1142804715.78119 Timeout on seq: 21877
2006-03-19 21:45:15.8339 00:04:20:03:01:a2 1142804715.83391 Timeout on seq: 21877
2006-03-19 21:45:15.8866 00:04:20:03:01:a2 1142804715.88666 Timeout on seq: 21877

But crucially - no buffer underuns, so despite the initial evidence, it looks as though this is caused by the wireless connection - which makes me happier, as this is something I can do something about! I've got the signal strength up to 90% now at the SLimp3 site with "quality" at "60". Not sure what this number represents but it's bigger than it was. Doesn't seem to have improved anything though, although I noticed the latency on that leg is quite erratic typically from 4 to 10ms with the occasional spike. Tried limiting the transmit rate at the player end to 1Mb/s (as most of the data flows into the player).

I don't see any of these timeouts on the SB2, and presumably that's because it uses a completely new TCP based protocol designed to work on higher latency and more complex networks. Maybe it's time to retire the Slimp3 to a wired location. Quite frustrating though as it worked in this location for years. I resisted buying more SB2's as they're not the prettiest of boxes, but the new SB3's look nice.

Scooter
2006-03-21, 18:36
The solution:

http://www.econnecthomeplug.co.uk/products.aspx

Works faultlesly so far. Doesn't work when plugged into extension trailers - but less latency than 802.11g. This application doesn't really test the throughput, but tests I read indicate good results.

For 42 each - forget all your WiFi woes! (ok you can forget them with a roll of CAT5 too, but I'm lazy..) :P