Network & Server Health Plugin



Triode
2005-10-03, 16:53
Tomorrow's 6.2 nightly should include an extra plugin which collects data on network and server health. Please try it and let me know what you think. [Should appear in "settings" section of the home page]

The plugin collects data on key network and server metrics in the form of logging the number of samples that fall within a certain range, together with max, min and average values.
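The collection scheme described above can be sketched as follows. This is a minimal illustration in Python, not the plugin's code: the class name `BucketLog`, its methods and the sample values are all assumptions made for the example.

```python
class BucketLog:
    """Counts samples per range and tracks max, min and average."""

    def __init__(self, upper_bounds):
        # upper_bounds must be ascending; one extra overflow bucket catches
        # every value that is >= the last bound.
        self.upper_bounds = list(upper_bounds)
        self.counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0
        self.sum = 0.0
        self.min = None
        self.max = None

    def log(self, value):
        # Walk the buckets in order and increment the first one the value fits.
        for i, bound in enumerate(self.upper_bounds):
            if value < bound:
                self.counts[i] += 1
                break
        else:
            self.counts[-1] += 1
        self.total += 1
        self.sum += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    def average(self):
        return self.sum / self.total if self.total else 0.0


# Hypothetical buffer-fill samples in percent: buckets < 10, < 20, ... < 100, >= 100.
fill = BucketLog(range(10, 101, 10))
for sample in (5, 85, 95, 99.9, 100):
    fill.log(sample)
```

Keeping only counters plus a running sum, min and max means memory use stays fixed no matter how long collection runs.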

The Summary section attempts to summarise the health of a player by looking at the data collected. This is based on thresholds built into the plugin which decide whether, for example, the signal strength is good, poor or intermittent. These need tuning - please let me have your feedback on this.
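A threshold-based summary of this kind might look like the sketch below. The function name and all three threshold values are placeholders invented for illustration; the thread does not state the plugin's actual thresholds.

```python
def summarise_signal(samples, weak_below=30, poor_fraction=0.5,
                     intermittent_fraction=0.05):
    """Classify signal health from signal-strength samples given in percent.

    weak_below, poor_fraction and intermittent_fraction are illustrative
    placeholders, not the values built into the plugin.
    """
    if not samples:
        return "no data"
    # Fraction of samples that fall below the "weak" level.
    weak = sum(1 for s in samples if s < weak_below) / len(samples)
    if weak >= poor_fraction:
        return "poor"
    if weak >= intermittent_fraction:
        return "intermittent"
    return "good"
```

For example, a player whose signal is weak in 10% of samples would be reported as "intermittent" under these made-up thresholds.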

The Player and Server performance sections below this show graphs of all the data recorded.

Information gathered at present:

Per player:
- signal strength
- buffer fill
- congestion on control connection [user interface connection to player]

Server:
- response time of server
- accuracy and length of timer tasks *
- length of scheduled tasks *
[* probably only of interest in diagnosing server issues within plugins and to developers etc]

Feedback definitely appreciated - especially on the text descriptions and on the thresholds used to trigger the summary status.

Data collection is only enabled if you turn it on via the plugin or start the server with --perfmon. However, I don't expect there to be any noticeable performance penalty from leaving it enabled.

Thanks to kdf for the help with the web page.

Adrian

dean
2005-10-03, 20:50
While this is very cool, I'm a little nervous about two points:

1. It's pretty late in the 6.2 cycle to add this in the release.
I'd like feedback on the usefulness vs. risk of this from folks.

2. I don't think that a link to this belongs on the main home page.
Can we have it under Server Settings -> Plugins please?

Thanks,

dean

dean
2005-10-04, 08:10
One thing I noted was that I consistently have 6% of samples with
buffer fullness of zero while playing a song overnight.

I _think_ this is when we're playing out the end of the song during
playout_play or playout_stop.

I suggest that we not sample when in those playmodes since it's
acceptable to have an arbitrarily low buffer fullness when we're at
the end of a song.

Does this make sense?

-dean

MrC
2005-10-04, 09:25
Hello All,

I'd like to say Thank You for the plug-in !

Dean requested feedback regarding the plugin's value. Personally, I would think it invaluable for everyone who helps diagnose and respond to users' problems via technical support or online. So my vote would be to ensure that it gets into 6.2.

Buffer emptiness is part of normal operation towards the end of a track, and it would be expected. I think putting in special logic to filter out end-of-track statistics is probably not necessary, and skews reality. This is best left to the diagnosticians. Since you can clear the counters, and since the plugin is for diagnosis only, I would suggest leaving it as is (especially if it will impact inclusion into 6.2!). Those who need it during diagnosis can clear the counters before track playback and watch updates during playback for anomalies.

If possible, a future enhancement might be to allow the page to be undocked (or launchable in a new window), as having statistics available while browsing other web pages would be very useful.

mherger
2005-10-04, 09:54
> If possible, a future enhancement might be to allow the page to be
> undocked (or launchable in a new window), as having statistics
> available while browsing other web pages would be very useful.

You can always open pages in a new window (or tab in FF/Opera). Right
click the link to get the option to do so. You could then bookmark that
page.

--

Michael

-----------------------------------------------------------
Help translate SlimServer by using the
StringEditor Plugin (http://www.herger.net/slim/)

kdf
2005-10-04, 09:59
Quoting MrC <MrC.1wdyw1 (AT) no-mx (DOT) forums.slimdevices.com>:

> If possible, a future enhancement might be to allow the page to be
> undocked (or launchable in a new window), as having statistics
> available while browsing other web pages would be very useful.

Right click on "Enable Performance Monitoring" and you should get a
context menu with the option to open in a new window (IE), or right
click anywhere in the frame and look for the option to open the frame
in a new tab/window (Firefox).

-kdf




kdf
2005-10-04, 10:10
I agree this should be a helpful tool for tracking down performance
issues. Probably more so for those who spend their time tweaking for
every extra percent.

some notes:

Using the Default skin, once entering the Network Health page, there is
no easy way home aside from the back button. Some sort of pwd_list would
be more consistent with the rest of the webUI.

Given that this is a performance-based feature, has there been any
profiling done to:
1) make sure that when disabled, performance is the same as before this
was added (to address some of Dean's concern)
2) track how much (if any) performance drop may be due to performance
monitoring when enabled.

-kdf



Triode
2005-10-04, 11:29
> Given that this is a performance-based feature, has there been any
> profiling done to:
> 1) make sure that when disabled, performance is the same as before
> this was added

When disabled, the only additional overhead is a few extra checks of the global variable $::perfmon (just like adding a few more debug statements). I can't measure any CPU load from this. There is a little more memory use, as a data structure is set up at startup for each item monitored. But other than this you should not notice it...!



> 2) track how much (if any) performance drop may be due to performance
> monitoring when enabled.

I have monitored CPU load and time to log, and can't notice any significant difference when logging. The performance-critical measurements incur less penalty when the logged value lands in a bucket near the beginning of the list, since the log routine walks an array of buckets to decide which counter to increment.

So in summary, yes I've profiled and not measured anything significant.
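The two cost claims above (a single flag check when disabled, and a cheaper walk for values that land in early buckets) can be illustrated with a small sketch. The `Monitor` class and the bucket bounds are hypothetical; the `enabled` attribute stands in for the server's $::perfmon flag.

```python
class Monitor:
    """Linear-walk bucket logger that does nothing unless enabled."""

    def __init__(self, bounds, enabled=False):
        self.bounds = list(bounds)                  # ascending upper bounds
        self.counts = [0] * (len(self.bounds) + 1)  # plus one overflow bucket
        self.enabled = enabled                      # stand-in for $::perfmon

    def log(self, value):
        """Record value and return how many bucket comparisons were needed."""
        if not self.enabled:
            return 0  # disabled: the flag check is the only cost
        comparisons = 0
        for i, bound in enumerate(self.bounds):
            comparisons += 1
            if value < bound:
                self.counts[i] += 1
                return comparisons
        self.counts[-1] += 1
        return comparisons


off = Monitor([1, 2, 5, 10, 50])
on = Monitor([1, 2, 5, 10, 50], enabled=True)
```

With these bounds, a value of 0.5 is counted after one comparison, while a value of 60 needs all five before it falls into the overflow bucket. Ordering the buckets so that common values land early keeps the typical cost low.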

fcm4711
2005-10-04, 12:21
Hi all

Even though my SLIMP3 player is playing music w/o problem I get the following warning which is a bit confusing:

Warnings
Slimserver cannot find a player. If you own a player this could be due to your network blocking connection between the player and server. Please check your network and/or server firewall does not block connection to TCP & UDP port 3483.

Felix

Triode
2005-10-04, 12:35
Felix,

For the moment, would:
"Performance measurements are not available for a SLIMP3 player."

be more appropriate?

(I don't have a SLIMP3 and know the protocols are different, so I don't know how to log/interpret them correctly.)

Dean,

Yes, logging buffer fill only while in playmode 'play' gets rid of most of the zero buffer fill samples.
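The filtering agreed here amounts to a one-line guard. A minimal sketch, assuming the playmode is available as a string and using the mode names mentioned in this thread; the function name and the sample data are made up for illustration:

```python
def should_sample_buffer(playmode):
    """Sample buffer fullness only during normal playback."""
    return playmode == "play"


# Hypothetical (playmode, buffer fullness %) readings.
readings = [("play", 95.0), ("play", 88.0),
            ("playout_play", 0.0), ("playout_stop", 0.0)]
kept = [fullness for mode, fullness in readings if should_sample_buffer(mode)]
```

The zero readings taken while the end of a track plays out are dropped, so they no longer count against the player's health summary.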

mkosma
2005-10-04, 14:10
Triode: the performance measurement stuff is very cool. I like it a lot.

One question, though. The on-screen display as well as your "buffer fullness" histogram report negative fullness %. What's up with that? Here's an example after playing one song (and experiencing my start-of-track glitch that is the subj of another thread):

< 10 : 62 : 33% ################
< 20 : 2 : 1%
< 30 : 1 : 1%
< 40 : 1 : 1%
< 50 : 2 : 1%
< 60 : 1 : 1%
< 70 : 2 : 1%
< 80 : 28 : 15% #######
< 90 : 42 : 23% ###########
< 100 : 45 : 24% ############
>=100 : 0 : 0%
max : 99.931717
min : -300.000000
avg : 3.117672


Is this normal?
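For readers interpreting the listing above, each row is "range : sample count : share of all samples", followed by a bar. The sketch below reproduces that layout. The scaling (share rounded to the nearest percent, one '#' per full 2%) is inferred from the sample output and is an assumption, not taken from the plugin source.

```python
def render_histogram(labels, counts):
    """Render rows as 'label : count : pct% bar'."""
    total = sum(counts)
    lines = []
    for label, count in zip(labels, counts):
        pct = 100.0 * count / total if total else 0.0
        bar = "#" * int(pct / 2)  # inferred scaling: one '#' per full 2%
        lines.append(f"{label} : {count} : {round(pct)}% {bar}".rstrip())
    return lines


# Counts copied from the listing above.
LABELS = ["< 10", "< 20", "< 30", "< 40", "< 50", "< 60",
          "< 70", "< 80", "< 90", "< 100", ">=100"]
COUNTS = [62, 2, 1, 1, 2, 1, 2, 28, 42, 45, 0]
lines = render_histogram(LABELS, COUNTS)
```

Note that the min of -300 falls in the "< 10" bucket along with every other low value, so the histogram alone cannot reveal negative readings; only the min line does.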

Triode
2005-10-04, 14:11
Thanks for the feedback.

I've updated the plugin based on feedback received thus far:
1) Moved to help section of home page
2) Added home / pwd link at top of page
3) Stopped logging buffer fullness while playing out the end of tracks [though on SB2 it will still record lower fill at the start of tracks due to the size of the buffer]
4) Added explicit message for Slimp3 (rather than can't find a player message)

Adrian

MrC
2005-10-04, 14:12
> The on-screen display as well as your "buffer fullness" histogram report negative fullness %. What's up with that?

I reported negative buffer situations to the softsqueeze author, as I saw it on softsqueeze a number of times. There has been no response. Are you seeing this on SB or softsqueeze? I presume softsqueeze.

And since the negative number is appearing now under the Health page, it would seem the problem is not related to softsqueeze, but to the lower level software?

Triode
2005-10-04, 14:20
Re negative buffer fill - I think this is just with softsqueeze - I see it too with softsqueeze but not with real squeezeboxes.

Is there a bug registered against this - if not it would be good to raise one. Though I think the author is busy at present.

MrC
2005-10-04, 14:27
Bug 2251 created:

http://bugs.slimdevices.com/show_bug.cgi?id=2251

cbemoore
2005-10-04, 14:57
I have 2 Squeezeboxen, but the plugin will only give the statistics for the first one (regardless of which player I have selected in the right hand pane). Is there any way to get stats for both my players?

Chris

Triode
2005-10-04, 15:09
It should work for all players. Try going to the home page, selecting the relevant player in the right panel and then clicking on the plugin. Which skin are you using?

If that does not work, you can force the player in the url to the right one - the page takes ?player=<playerid>

cbemoore
2005-10-04, 15:17
I just tried going back to the home page and doing a complete refresh, and it now seems to work.

I guess there was a cached entry somewhere before....

Thanks! All working perfectly now!!

Triode
2005-10-04, 15:25
Re negative buffer fill with softsqueeze - I think I've found the problem and it is a server bug, so fixable without Richard's help. Will look to resolve.

MrC
2005-10-04, 15:32
> Re negative buffer fill with softsqueeze - I think I've found the problem and it is a server bug, so fixable without Richard's help. Will look to resolve.

Ah, excellent!

mkosma
2005-10-05, 05:29
I stayed up late to test this fix, but it doesn't look like the windows exe made it into the 10-5 nightly build. Was there a problem? I'll try it out as soon as it shows up.

MrC
2005-11-09, 10:11
A number of users have noticed low buffer rates, but are not able to diagnose further because they don't know exactly why. See one example at:

http://forums.slimdevices.com/newreply.php?do=newreply&noquote=1&p=64101

I'm wondering, is it possible to modify the Network health plugin to add information that shows the connection or transfer rates? Perhaps buckets like:

< 1mbps : 85 : 80% ###################
< 2mbps : 20 : 10% ##
...
< 54mbps : 0 : 0%
< 108mbps : 0 : 0%

While this would not show what is causing the low transfer rates, it would at least be demonstrative for users to see that indeed their connection and transfer rates are historically very slow.

seanadams
2005-11-09, 10:35
It's not easy to provide useful data here because during normal operation we only measure the transfer speed, not link capacity. That's not too useful because it's just going to tell you the bit-rate of whatever you're playing.

Statistics about packet retransmission rates and such are not available to the server because they happen in the OS kernel. In squeezebox firmware we could collect some data, but only what the receiving end sees, which isn't as helpful.

We'd need to add some other machinery, e.g. a TCP throughput test or ping test which the user would initiate.

pfarrell
2005-11-09, 10:41
seanadams said:
> We'd need to add some other machinery, e.g. a TCP throughput test or
> ping test which the user would initiate.

It is usually fairly easy (a SMOP) for a client to record the
effective incoming data rate. From that, and a little bit
of knowledge about what the stream is (e.g. FLAC, so expect 70kb/s
or low-res MP3, so expect 10kb/s) you can have the client know
that it isn't being fed fast enough (although it can also count
dropouts due to buffer empty). So with two separate metrics,
the client could tell a suitable server that this link is
unlikely to work the way people want. Even suggest doing
something to fix it, like transcoding to a lower rate, or
using a wired connection, or turning off the microwave oven.

The good part of this is that a lot of accuracy is not needed.
You just want to know one or so significant digit.
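The client-side check described above comes down to dividing bytes received by elapsed time and comparing the result with the stream's expected rate. A rough sketch, assuming rates are expressed in kilobits per second (1 kbit = 1000 bits); the function names are invented for the example:

```python
def effective_rate_kbps(bytes_received, seconds):
    """Average incoming data rate over the measurement window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_received * 8 / 1000.0 / seconds


def link_keeps_up(bytes_received, seconds, stream_kbps):
    """True if data arrived at least as fast as the stream consumes it."""
    return effective_rate_kbps(bytes_received, seconds) >= stream_kbps
```

For instance, 240000 bytes received in 10 seconds is 192 kbps, which is just enough for a 192 kbps stream, whereas 120000 bytes in the same window is not. As noted, only about one significant digit is needed, so a coarse byte counter and timer are sufficient.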

Pat
http://www.pfarrell.com

MrC
2005-11-09, 11:21
> It's not easy to provide useful data here because during normal operation we only measure the transfer speed, not link capacity. That's not too useful because it's just going to tell you the bit-rate of whatever you're playing.

I had figured link capacity would be unlikely, but I'm not sure I follow your reasoning about transfer rate vs. bit rate. Aren't the two separate because of the buffer? In other words, shouldn't 10 seconds of 192kbps music be transferred to an SB[23] faster than it is played? If so, wouldn't a little timer math give you true transfer speed and not playback speed?


> Statistics about packet retransmission rates and such are not available to the server because they happen in the OS kernel.

Yes, understood.


> We'd need to add some other machinery, e.g. a TCP throughput test or ping test which the user would initiate.

This was actually another suggestion I was going to ask about. It might be a very nice diagnostic tool to separate out the server and its operations from just network data xfer to the device. It seems the vast majority of user problems with dropouts are due to network problems.

Triode
2005-11-09, 12:01
Sean,

One thing I wondered about was whether it is possible to get other stats out of the wireless mac and report them back to the server. Signal strength is useful, but it does not seem to capture the case of noise bursts etc which cause loss of throughput. I'm thinking of anything which gives a good summary of instantaneous wireless throughput.

I did think about something to proactively test the link, but didn't find an easy answer. To send pings in a non blocking manner requires a reasonable amount of code (and possibly root access).

In linux, netstat shows the TCP send-Q which would be a useful metric. Unfortunately there doesn't seem to be an api to get at it [unless anyone knows differently?]

Adrian

MrC
2005-11-09, 12:12
I'm not sure you'll be able to set up and send pings fast enough to get the data you want.

In Linux, you can get your data via /proc. Try:

cat /proc/net/tcp
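On Linux, the fifth whitespace-separated field of each row in /proc/net/tcp is tx_queue:rx_queue in hexadecimal, and the port in local_address is hexadecimal too. A sketch of pulling out the send-queue depth for one local port follows; the function name and the sample rows are made up for illustration, with port 3483 chosen because it is the player port mentioned earlier in this thread.

```python
def send_queues(proc_net_tcp_text, local_port):
    """Return (remote_address_hex, tx_queue_bytes) for sockets on local_port."""
    results = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        local, remote, queues = fields[1], fields[2], fields[4]
        port = int(local.rsplit(":", 1)[1], 16)      # port is hex
        if port != local_port:
            continue
        tx_queue, _rx_queue = (int(q, 16) for q in queues.split(":"))
        results.append((remote, tx_queue))
    return results


# Made-up sample in the /proc/net/tcp layout (0D9B hex = port 3483).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0D9B 0200007F:C350 01 00000A28:00000000 00:00000000 00000000     0        0 12345
   1: 00000000:2328 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12346
"""
queues = send_queues(SAMPLE, 3483)
```

On a real Linux system the text would come from reading /proc/net/tcp. A send queue that stays non-zero suggests the link is not draining data as fast as the server writes it.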

Triode
2005-11-09, 12:54
MrC,

Yes thanks - this info is in /proc for linux - do you know if there is any way to get similar info out of any other OSs? [If we are going to add it to the plugin I would like to do something which covers more than just linux users as they don't seem to be the people having problems...]

Re pings - I was actually planning to infer from the round trip time whether there was congestion on the link rather than try to saturate it with pings. If there is a way to get to the TCP state then this gets this info anyway.

So does anyone know how to get tcp state data out of Windows and Mac OS?

Adrian

MrC
2005-11-09, 22:20
> MrC,
>
> Yes thanks - this info is in /proc for linux - do you know if there is any way to get similar info out of any other OSs? [If we are going to add it to the plugin I would like to do something which covers more than just linux users as they don't seem to be the people having problems...]
> ...
> So does anyone know how to get tcp state data out of Windows and Mac OS?


Are you interested in SNMP?



> Re pings - I was actually planning to infer from the round trip time whether there was congestion on the link rather than try to saturate it with pings.

I wonder if you'll hit in between windows of high and low xfer rates. A small ping, at even 1500 bytes, is a very small sample relative to a 30 meg FLAC file transfer. Perhaps a larger sample size would be necessary?

radish
2005-11-10, 08:05
Under Windows "netstat -e" will get you basic ethernet stats (counts of bytes, packets, errors, etc). The -s option will also list protocol stats (TCP, ICMP, IP etc). If you want to get at the stats programmatically there's a COM interface but I don't know much about it.

Triode
2005-11-10, 12:03
> Under Windows "netstat -e" will get you basic ethernet stats (counts of bytes, packets, errors, etc). The -s option will also list protocol stats (TCP, ICMP, IP etc). If you want to get at the stats programmatically there's a COM interface but I don't know much about it.

I really want the TCP send-Q depth per socket (well, actually for the sockets used by slimserver). I can't find any command line options which do this with Windows' netstat.

Any pointers to the COM interface appreciated, especially if it provides more internal state info.

MrC
2005-11-10, 15:41
I've been looking through quite a few resources for Windows and have yet to find an analogous Send-Q/Recv-Q statistic. Winsock does not seem to provide such statistics, nor do any Windows network diag utilities that I've been able to find. I can look a little more.

However, the more I think about this, the more I'm convinced the right way to provide this capability is via the wireless client (the SB itself). Many network card providers also provide their own client diagnostic tools for the same reason - it's the only thing they can control in a mixed environment. It feels like the server should not have to go through the OS to attempt to calculate how often the wireless client is having trouble.

Triode
2005-11-12, 05:07
There doesn't seem to be a simple way of collecting TCP stats from the server OS.

So taking a slightly different tack, I was thinking of reusing the display scrolling code as a packet generator and creating a plugin to test link capacity by varying scrolling rates [eventually to become part of the health plugin]. This will only work with graphics players as it relies on the relatively large size of each display frame [1290 bytes in the case of SB2/3]. On a wired network I can sustain 4Mbps to an SB2 with no TCP queue build-up using this method. Above ~5M there is some TCP congestion, probably due to the small TCP window used for the slimproto session. However, the area of interest is below this, so this is probably a reasonable approach. It also has the side effect that network performance is visible.

Any thoughts?

What link rates should it attempt to test at? I am thinking:
128k, 192k, 320k, 1M, 1.5M, 2M, 4M
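To get a feel for what these test rates demand of the scrolling code, the frame rate needed for each can be worked out from the 1290-byte frame size quoted above. The sketch assumes "k" and "M" mean 1000 and 1000000 bits per second and ignores protocol overhead; both are simplifications, and the names are invented for the example.

```python
FRAME_BYTES = 1290  # display frame size quoted for SB2/3


def frames_per_second(rate_bps, frame_bytes=FRAME_BYTES):
    """Display frames per second needed to generate rate_bps of traffic."""
    return rate_bps / (frame_bytes * 8)


# The proposed test rates, in bits per second.
RATES_BPS = [128_000, 192_000, 320_000, 1_000_000,
             1_500_000, 2_000_000, 4_000_000]
for rate in RATES_BPS:
    print(f"{rate / 1000:>6.0f} kbps -> {frames_per_second(rate):6.1f} frames/s")
```

Under these assumptions 128k needs roughly 12 frames per second and 4M needs close to 390, which gives an idea of how hard the highest rate drives the player.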

MrC
2005-11-12, 10:18
This is an interesting idea, and should provide some good diagnostics.

Any reason not to go down to 64k and below? Since you can bitrate limit down to 64k, it would be good to catch this case too. I was also thinking that one step lower would close the case if the user couldn't sustain 32k.

If there can be a user selectable rate (or better yet, configuration of rates to be used, with yours and the two I mention above if you agree), then that could catch any boundary cases. The increasingly larger gaps might hide some anomalies.

When you're ready, I'll be happy to run some tests if you want.

Triode
2005-11-12, 10:37
Happy to add 32k and 64k.

I'm playing with a prototype at present which is driven from the player user interface. It requires a very small patch to the server to instrument the scrolling code.

Are you on the dev list - if so I'll post the plugin + patch there [assuming you are happy to patch the server], otherwise will post here.

Also need to add some protection as 4M seems to render a SB1 unresponsive to any IR input...

MrC
2005-11-12, 10:59
Ok, cool. Yup, I'm (foolishly) on all lists but the German and Spanish ones. Patches to the dev list are fine. I keep up with the 6.5 trunk and can do 6.x as per your convenience.

Triode
2005-11-18, 16:00
Following this discussion, I've put together a simple plugin which tests network bandwidth at a number of rates.

I would like some feedback from people on this, to see if it really does detect the throughput people see on their wireless networks. Assuming it proves useful I will look to add to it and probably integrate it with the Health plugin.

Please try the attached and feedback your results. It should work for graphics players on 6.2 and later slimservers.
Unzip the file and place it in the Plugins directory of your slimserver install & restart the server.

To use:
1) Select the "Network Test" option from the Plugin menu
2) Press down to select the network bandwidth at which you want to test.
3) The screen displays the rate, 1 second %age throughput and average %age for that rate [average calculated over the period you leave it running for]

You should be able to stream successfully at the highest rate at which 100% throughput is recorded.
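The rule of thumb above can be written down as a small helper for anyone recording their results. The function name and the result figures are made up for illustration; only the "highest rate with 100% throughput" rule comes from the post.

```python
def highest_sustained_rate(results, required_pct=100.0):
    """Return the highest tested rate whose average throughput met required_pct.

    results maps test rate in kbps -> average % throughput shown on the player.
    Returns None when no tested rate met the requirement.
    """
    passing = [rate for rate, pct in results.items() if pct >= required_pct]
    return max(passing) if passing else None


# Hypothetical readings noted down from the player display.
readings = {128: 100, 192: 100, 320: 100, 1000: 87, 2000: 41}
best = highest_sustained_rate(readings)
```

With these example readings the link sustains 320 kbps, so streams at or below that rate should play without the network being the bottleneck.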

Note: for SB1, if you drive the link too fast, the remote may become unresponsive - the test can be stopped by powering off the player from the web interface (I can't find a way round this).

Edit - Version 0.2 for server 6.2/6.3. This plugin is included with the server from 6.5.

abdomen
2005-11-22, 07:32
> Following this discussion, I've put together a simple plugin which tests network bandwidth at a number of rates.

Since this could be useful to a lot of people, I added it to the wiki on a new "diagnostic" plugins page: http://wiki.slimdevices.com/index.cgi?PluginDiagnostics

I wasn't sure just how to reference the Health plugin, but perhaps it should be added to that page, even if it is (will be? I haven't upgraded to 6.2 yet to see) included with SlimServer.

Triode
2005-11-22, 09:19
I'm aiming to include this in 6.5 as part of the Health plugin - watch this space. [I'm working on a web interface for it now to link it to the health plugin..]

In the mean time for 6.2, yes please use the one attached here.

bglad
2005-11-22, 10:30
Triode, your NetTest plugin looks very useful, works so far and matches the results I'd expect - thanks

mkosma
2005-11-22, 12:45
This is extremely cool. It shows me that my network connection from home to work (the one on which I used to experience stuttering but haven't in a long while) is able to sustain 256kbps without any doubt, and can usually maintain 100%/100% at 320kbps. Given that I was running bitrate limited to 128kbps, throughput was not my issue as several folks originally suspected.

Now I just have to figure out why the player still freezes occasionally....

thanks!

Triode
2005-11-22, 15:15
I've just added a version of the network tester to the health plugin in 6.5. Please test the next nightly to see what you think. [New version is integrated into health web page, but can also be driven from the player menu.]

danco
2005-11-23, 09:52
In the information about the plugin, it might be useful to say what the figures mean. Specifically, what bandwidth would one need for uncompressed streaming (wav or aiff) and for flac? mp3 can go up to 320k, can't it?

Also, what about percentages of less than 100%? How much trouble is one likely to get at 90%? At 80%?

Or aren't these questions to which a definite answer can be given?

Triode
2005-11-23, 12:45
I hope the web page in 6.5 explains how the figures are measured and what they mean. Is this comment against the web page or the stand alone plugin? (very happy to take feedback on the words I've used though!)

danco
2005-11-24, 01:18
Thanks for that. I am using 6.2.1 (the latest official release of SlimServer), so my comments applied to the stand-alone version. If a description is incorporated in 6.5, I think too few of us will be using the stand-alone version to make it worth putting a more detailed description there.

MrC
2005-11-25, 12:57
> This is extremely cool. It shows me that my network connection from home to work (the one on which I used to experience stuttering but haven't in a long while) is able to sustain 256kbps without any doubt, and can usually maintain 100%/100% at 320kbps. Given that I was running bitrate limited to 128kbps, throughput was not my issue as several folks originally suspected.
>
> Now I just have to figure out why the player still freezes occasionally....
>
> thanks!

Can you really conclude this given the data you present? You indicate that playback is now fine AND network performance is currently sufficient. And you indicate that it used to fail previously - but had no network measurements during those periods. One could presume that your network was troublesome then, but no longer so now.

mecouc
2005-12-23, 04:53
I get a message about an invalid zip file when I try to download this plugin.

MrC
2005-12-23, 10:17
> I get a message about an invalid zip file when I try to download this plugin.

Go here and download: http://wiki.slimdevices.com/index.cgi?PluginDiagnostics

The zip file is fine.