Free work done for SlimD people -- gonna fix bug?



tbessie
2007-01-14, 17:09
Hey Slim Devices folks...

With all this free debugging being done for you by Christopher et al., are you guys planning on fixing bug #3778?

It looks like the cause of the bug has been found for you... it would be nice if someone would reply in that thread ( http://bugs.slimdevices.com/show_bug.cgi?id=3778 ) and say what you're thinking of doing about it.

I realize that you have many bugs to work on, but this one requires a reboot each time to make the SB reconnect... I'd call that 'severe'.

- Tim

peter
2007-01-15, 02:05
tbessie wrote:
> I realize that you have many bugs to work on, but this one requires a
> reboot each time to make the SB reconnect... I'd call that 'severe'.
>

Sounds very severe, but I have 4 SB3's and I restart my slimserver
fairly regularly and I've *never* seen that problem. So perhaps it only
affects a small number of people, which would make it *less* severe,
wouldn't it...?

Regards,
Peter

tbessie
2007-01-15, 12:26
peter wrote:
> tbessie wrote:
> > I realize that you have many bugs to work on, but this one requires a
> > reboot each time to make the SB reconnect... I'd call that 'severe'.
>
> Sounds very severe, but I have 4 SB3's and I restart my slimserver
> fairly regularly and I've *never* seen that problem. So perhaps it only
> affects a small number of people, which would make it *less* severe,
> wouldn't it...?
>
> Regards,
> Peter

I would say the problem itself is 'severe', but priority is 'low-medium' because of the number of people it affects, as per common bug database schemes.

Since it affects ME, of course I'd say it should be High priority. :-)

- Tim

snarlydwarf
2007-01-15, 13:01
It looks like a curious bug but the real catch is that for some people (like you), it happens a lot, but for others, they don't see it... which makes it hard to diagnose.

Your first post implies it has been diagnosed, but I don't see that in the thread there... did I miss a way to duplicate this? (Computers may -seem- random and chaotic, but they are really, well, programmed and predictable... so things should be reproducible... but finding the exact conditions is the tricky part.)

Is there anything special about your setup that this happens?

Looking at the packet capture, I see this:


12:49:58.508392 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:49:59.427438 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:50:00.349080 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:50:01.577808 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:50:02.499364 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:50:07.414606 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:50:08.336120 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:50:09.565006 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20

That is the Squeezebox at 192.168.1.20 saying, "Hey, my server was at 192.168.1.24, what is the MAC address of that?" And getting no response.

Which would mean the server is no longer at 192.168.1.24, or the server is broken... at the networking level. (Perhaps a firewall? But this happens long before any software from SlimDevices gets involved... the TCP/IP stack on 192.168.1.24 should be responding all by itself.)
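If you want to check a longer capture for the same pattern without reading every line, something like the following rough sketch could help. It assumes the text format tcpdump printed above; the function name `unanswered_arp_targets` and the regexes are my own and only cover the two ARP line shapes seen in this thread.

```python
import re
from collections import Counter

# Patterns for the two ARP line shapes that appear in this capture.
ARP_REQUEST = re.compile(r"arp who-has (\S+) .*tell (\S+)")
ARP_REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")


def unanswered_arp_targets(lines):
    """Return {target_ip: request_count} for targets that never sent a reply."""
    requests = Counter()
    answered = set()
    for line in lines:
        match = ARP_REQUEST.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = ARP_REPLY.search(line)
        if match:
            answered.add(match.group(1))
    return {ip: count for ip, count in requests.items() if ip not in answered}


# Two of the request lines from the capture above.
capture = [
    "12:49:58.508392 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), "
    "length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20",
    "12:49:59.427438 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), "
    "length 60: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20",
]

print(unanswered_arp_targets(capture))  # {'192.168.1.24': 2}
```

A target that shows up here with a growing count and never answers is the "who-has, no response" situation described above.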

Intermixed with these, I see this:

12:50:47.658257 00:04:20:06:68:e4 > Broadcast, ethertype IPv4 (0x0800), length 144: IP (tos 0x0, ttl 64, id 33474, offset 0, flags [none], length: 130) 192.168.1.20.3483 > 255.255.255.255.8900: [udp sum ok] UDP, length: 102


That is the Squeezebox, again. This time it is sending a UDP broadcast out, saying, "Hey, anyone listening on UDP/8900? Hello?"

I have no idea why it is doing that: the usual "slim discovery" port is 3483.
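To spot every packet like that in a long dump, a small filter along these lines might do. It is only a sketch: the helper names `udp_endpoints` and `is_odd_broadcast` are made up for illustration, and the only fact it relies on is the 3483 discovery port mentioned above.

```python
import re

DISCOVERY_PORT = 3483  # the usual discovery port, as noted in this thread

# tcpdump prints UDP endpoints as ip.port, e.g. 192.168.1.20.3483
ENDPOINTS = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def udp_endpoints(line):
    """Return (src_ip, src_port, dst_ip, dst_port), or None if the line has none."""
    match = ENDPOINTS.search(line)
    if match is None:
        return None
    src_ip, src_port, dst_ip, dst_port = match.groups()
    return src_ip, int(src_port), dst_ip, int(dst_port)


def is_odd_broadcast(line):
    """True for a limited broadcast sent to anything other than the discovery port."""
    endpoints = udp_endpoints(line)
    if endpoints is None:
        return False
    _, _, dst_ip, dst_port = endpoints
    return dst_ip == "255.255.255.255" and dst_port != DISCOVERY_PORT


# The broadcast from the capture above, joined onto one line.
line = (
    "12:50:47.658257 00:04:20:06:68:e4 > Broadcast, ethertype IPv4 (0x0800), "
    "length 144: IP (tos 0x0, ttl 64, id 33474, offset 0, flags [none], "
    "length: 130) 192.168.1.20.3483 > 255.255.255.255.8900: [udp sum ok] "
    "UDP, length: 102"
)

print(udp_endpoints(line))
print(is_odd_broadcast(line))
```

Run over the whole log, this would show whether the port 8900 broadcasts only happen while the server is unreachable.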

On reboot, the SB in that log then connects immediately to Squeezenetwork; I don't even see it trying to connect to a local server until it looks like it is logged off SN:


12:55:15.559548 00:04:20:06:68:e4 > Broadcast, ethertype IPv4 (0x0800), length 60: IP (tos 0x0, ttl 64, id 58, offset 0, flags [none], length: 46) 192.168.1.20.3483 > 255.255.255.255.3483: [udp sum ok] UDP, length: 18
12:55:15.560833 00:0c:76:bd:46:d5 > 00:04:20:06:68:e4, ethertype IPv4 (0x0800), length 60: IP (tos 0x0, ttl 128, id 21302, offset 0, flags [none], length: 46) 192.168.1.24.3483 > 192.168.1.20.3483: [udp sum ok] UDP, length: 18


That is the SB again asking "hello? anyone talking to me?" with a UDP broadcast, but this time it is on the right port and gets an answer.

And oddly:


12:55:17.778834 00:04:20:06:68:e4 > Broadcast, ethertype ARP (0x0806), length 42: arp who-has 192.168.1.24 (Broadcast) tell 192.168.1.20
12:55:17.779664 00:0c:76:bd:46:d5 > 00:04:20:06:68:e4, ethertype ARP (0x0806), length 60: arp reply 192.168.1.24 is-at 00:0c:76:bd:46:d5

Now ARP works and the server is responding. It is the exact same thing that is at the start, but this time 192.168.1.24 feels like responding.

If the server responded like that at the start of the log, it should have worked... but again, ARP responses are low-level IP stuff (well at the border between IP and Ether layers...) and an application program can't change the responses... only a firewall or broken IP stack should break them.

Is there a firewall on that machine? What is breaking ARP?

oreillymj
2007-01-15, 15:02
These things are far less common than they used to be, but it seems like there's possibly a dodgy NIC/cable somewhere on the network?

JJZolx
2007-01-15, 15:45
Interesting philosophical question - If a bug is severe, but only a certain percentage of people are experiencing it, then what kind of priority should it be given? If it were a bug that was a show-stopper for those people, then I'm sure it'd be given a much higher priority. But there are workarounds and in the end the bug is mostly just an inconvenience, although it can be _very_ inconvenient if you experience it.

I think one problem with that bug is that a number of people have mistakenly jumped on it and this either frustrates or irritates the developers. This happens with many bugs, but that one is particularly susceptible. Squeezeboxes that can't connect to SlimServer are a very common issue, but for any one of dozens of reasons. Without digging into debugging and packet sniffing and etc. it can be difficult or impossible for the end user to tell the difference.

snarlydwarf
2007-01-15, 16:08
JJZolx wrote:
> I think one problem with that bug is that a number of people have mistakenly jumped on it and this either frustrates or irritates the developers. This happens with many bugs, but that one is particularly susceptible. Squeezeboxes that can't connect to SlimServer are a very common issue, but for any one of dozens of reasons. Without digging into debugging and packet sniffing and etc. it can be difficult or impossible for the end user to tell the difference.

That is true, and there is even a question on that bug report if there is more than one bug being talked about... without being able to reproduce it, who knows... Hence why I ignore all but the packet dump.

The only broken thing (from the SB anyway) that I see is the discovery broadcast from port 3483 to port 8900... that is backwards: it should be sending to 255.255.255.255:3483.

No idea why it would do that, though that still doesn't answer the "why is ARP not replying?" question... but since the bug is supposedly related to shutting the server down, perhaps that is related.

What I see in the packet dump is that the SB is trying really hard to connect to the server, getting no response. When rebooted it connects to Squeezenetwork just fine, and when it is logged out of SN, it connects to the local server.

tbessie, do you know how to use ethereal or tcpdump to create a packet capture so we can see what it is doing?
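For reference, a capture along these lines could be started roughly as follows. This is a sketch under assumptions: the interface name `eth0` and the helper name `build_tcpdump_command` are placeholders of mine, and tcpdump normally needs root privileges. The flags are standard tcpdump ones: `-e` prints the Ethernet header, `-n` skips name lookups, `-vv` is verbose, and `-w` writes raw packets to a file.

```python
import shlex


def build_tcpdump_command(interface, squeezebox_ip, outfile=None):
    """Build a tcpdump argv capturing ARP plus all traffic to/from the player."""
    cmd = ["tcpdump", "-i", interface, "-e", "-n", "-vv"]
    if outfile:
        # Save raw packets so the file can be opened later in Ethereal/Wireshark.
        cmd += ["-w", outfile]
    # pcap filter: every ARP frame, plus anything sent by or to the Squeezebox.
    cmd.append(f"arp or host {squeezebox_ip}")
    return cmd


# Print the command rather than running it; eth0 is an assumed interface name.
print(shlex.join(build_tcpdump_command("eth0", "192.168.1.20")))
```

The printed command can be pasted into a root shell on the server, or the list can be handed to `subprocess.run` instead.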

tbessie
2007-01-15, 16:34
snarlydwarf wrote:
> That is true, and there is even a question on that bug report if there is more than one bug being talked about... without being able to reproduce it, who knows... Hence why I ignore all but the packet dump.
>
> The only broken thing (from the SB anyway) that I see is the discovery broadcast from port 3483 to port 8900... that is backwards: it should be sending to 255.255.255.255:3483.
>
> No idea why it would do that, though that still doesn't answer the "why is ARP not replying?" question... but since the bug is supposedly related to shutting the server down, perhaps that is related.
>
> What I see in the packet dump is that the SB is trying really hard to connect to the server, getting no response. When rebooted it connects to Squeezenetwork just fine, and when it is logged out of SN, it connects to the local server.
>
> tbessie, do you know how to use ethereal or tcpdump to create a packet capture so we can see what it is doing?

I haven't used them, but I can surely set them up/use them as needed. I'm a software developer myself, so I understand the intricacies of these sorts of problems, don't get me wrong. I just was under the (mistaken?) impression that the information Christopher had provided would be enough to set the SB developers on the right path.

But I'll reread the bug report and this thread and see if I can produce a nice trace for y'all.

- Tim

tbessie
2007-01-15, 23:21
snarlydwarf wrote:
> tbessie, do you know how to use ethereal or tcpdump to create a packet capture so we can see what it is doing?

Well, I ran Ethereal (well, Wireshark) with 3 different scenarios. I also unplugged any other devices I had connected to make sure there was as little traffic as possible. The IPs and MACs are as below. Perhaps this will help diagnose the problem? It LOOKS like the same symptoms Christopher was having (tho' I'm not a network expert):

192.168.1.101 / 00:13:20:09:de:cd (IntelCor_09:de:cd) : The computer running SlimServer
192.168.1.150 / 00:04:20:06:db:3d (SlimDevi_06:db:3d) : Static IP / MAC for my SqueezeBox
192.168.1.1 / 00:18:39:c1:84:b0 (Cisco-Li_c1:84:b0) : My Wireless Router

1) SlimServer is on, my SqueezeBox is plugged in but off, and I press the "On" button on the SqueezeBox. (I get the "waking up" and "connecting" messages on the SqueezeBox, but it does not connect, and goes blank. I press "On" again, and the same thing happens.)
( http://www.sonic.net/~tbessie/misc/capture-squeezebox-onbutton-2x.html )


2) SlimServer is on, my SqueezeBox is plugged in but off, and I do nothing but capture (no pressing of buttons)
( http://www.sonic.net/~tbessie/misc/capture-squeezebox-nopush.html )

3) SlimServer is on, my SqueezeBox is plugged in but off. I press "On", get the "waking up" message, but press "back", go to "set up networking" and run through the menus. The SqueezeBox restarts and connects to the network again, and I press "forward" until the SqueezeBox AGAIN goes blank. Then I press "On", and the SqueezeBox connects to SlimServer.
( http://www.sonic.net/~tbessie/misc/capture-squeezebox-on-reset-on-connect.html )